Re: [PATCH 0/2] Fix the assert failure in scsi_dma_complete

2020-08-17 Thread Li Qiang
Paolo Bonzini wrote on Tue, Aug 18, 2020 at 1:05 AM:

> On 15/08/20 16:19, Li Qiang wrote:
> > Currently 'megasas_map_sgl' just returns success when 'iov_count=0',
> > even though the 'cmd' doesn't contain any iov. This causes the assert
> > in 'scsi_dma_complete' to fail, because in 'dma_blk_cb' the condition
> > 'dbs->sg_cur_index == dbs->sg->nsg' is true and 'dma_complete' is
> > called directly. However, at that point no aiocb has been returned.
> >
> > This is LP#1878263:
> >
> > -->https://bugs.launchpad.net/qemu/+bug/1878263
> >
> > To solve this, we treat 'iov_count=0' as an error.
> > In the first patch, I use -1 to indicate an error, and in the second
> > patch I treat 'iov_count=0' as an error.
> >
> > Li Qiang (2):
> >   hw: megasas: return -1 when 'megasas_map_sgl' fails
> >   hw: megasas: consider 'iov_count=0' is an error in megasas_map_sgl
> >
> >  hw/scsi/megasas.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
>
> Queued, thanks.  But do you have a qtest for this?
>

Okay, I will prepare a qtest for this soon.

Thanks,
Li Qiang


>
> Paolo
>
>
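For reference, the check described in the cover letter amounts to something
like the following rough sketch (a paraphrase in the style of
hw/scsi/megasas.c, not the actual patches):

/*
 * Sketch only: treat an empty scatter/gather list as a mapping failure so
 * that the caller fails the command instead of submitting a zero-length DMA
 * transfer, which later trips the assert in scsi_dma_complete(). Field and
 * function names follow hw/scsi/megasas.c but are assumptions here.
 */
static int megasas_map_sgl(MegasasState *s, MegasasCmd *cmd, union mfi_sgl *sgl)
{
    int iov_count = cmd->frame->header.sge_count;

    if (iov_count == 0) {
        return -1;   /* no iov at all: report an error to the caller */
    }
    /* ... map each SGE into cmd->qsg as before, returning -1 on failure ... */
    return 0;
}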


Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-17 Thread via
On 8/17/20 8:28 PM, Alistair Francis wrote:
>
> On Mon, Aug 17, 2020 at 11:12 AM via  wrote:
>> Hi Anup,
>>
>> On 8/17/20 11:30 AM, Bin Meng wrote:
>>>
>>> Hi Anup,
>>>
>>> On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  wrote:
 On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
> From: Bin Meng 
>
> This adds support for Microchip PolarFire SoC Icicle Kit board.
> The Icicle Kit board integrates a PolarFire SoC, with one SiFive
> E51 core plus four U54 cores, many on-chip peripherals, and an FPGA.
 Nice Work !!! This is very helpful.
>>> Thanks!
>>>
 The Microchip HSS is quite convoluted. It has:
 1. DDR Init
 2. Boot device support
 3. SBI support using OpenSBI as library
 4. Simple TEE support

 I think point 1) and 2) above should be part of U-Boot SPL.
 The point 3) can be OpenSBI FW_DYNAMIC.

 Lastly, for point 4), we are working on a new OpenSBI feature using
 which we can run independent Secure OS and Non-Secure OS using
 U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
 PolarFire).

 Do you have plans for adding U-Boot SPL support for this board ??
>>> + Cyril Jean from Microchip
>>>
>>> I will have to leave this question to Cyril to comment.
>>>
>> I currently do not have a plan to support U-Boot SPL. The idea of the
>> HSS is to contain all the silicon specific initialization and
>> configuration code within the HSS before jumping to U-Boot S-mode. I
>> would rather keep all this within the HSS for the time being. I would
>> wait until we reach production silicon before attempting to move this to
>> U-Boot SPL, as the HSS is likely to contain some opaque silicon-related
>> changes for a while yet.
> That is unfortunate, a lot of work has gone into making the boot flow
> simple and easy to use.
>
> QEMU now includes OpenSBI by default to make it easy for users to boot
> Linux. The Icicle Kit board is now the most difficult QEMU board to
> boot Linux on. Not to mention it makes it hard, if not impossible, to
> support it in standard tool flows such as meta-riscv.
>
> Alistair

If it is such a problem we can add a U-Boot SPL stage and the HSS can be 
treated as standard SoC ROM code.

Cyril.




Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-17 Thread Alistair Francis
On Mon, Aug 17, 2020 at 11:12 AM via  wrote:
>
> Hi Anup,
>
> On 8/17/20 11:30 AM, Bin Meng wrote:
> >
> > Hi Anup,
> >
> > On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  wrote:
> >> On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
> >>> From: Bin Meng 
> >>>
> >>> This adds support for Microchip PolarFire SoC Icicle Kit board.
> >>> The Icicle Kit board integrates a PolarFire SoC, with one SiFive
> >>> E51 core plus four U54 cores, many on-chip peripherals, and an FPGA.
> >> Nice Work !!! This is very helpful.
> > Thanks!
> >
> >> The Microchip HSS is quite convoluted. It has:
> >> 1. DDR Init
> >> 2. Boot device support
> >> 3. SBI support using OpenSBI as library
> >> 4. Simple TEE support
> >>
> >> I think point 1) and 2) above should be part of U-Boot SPL.
> >> The point 3) can be OpenSBI FW_DYNAMIC.
> >>
> >> Lastly, for point 4), we are working on a new OpenSBI feature using
> >> which we can run independent Secure OS and Non-Secure OS using
> >> U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
> >> PolarFire).
> >>
> >> Do you have plans for adding U-Boot SPL support for this board ??
> > + Cyril Jean from Microchip
> >
> > I will have to leave this question to Cyril to comment.
> >
> I currently do not have a plan to support U-Boot SPL. The idea of the
> HSS is to contain all the silicon specific initialization and
> configuration code within the HSS before jumping to U-Boot S-mode. I
> would rather keep all this within the HSS for the time being. I would
> wait until we reach production silicon before attempting to move this to
> U-Boot SPL, as the HSS is likely to contain some opaque silicon-related
> changes for a while yet.

That is unfortunate, a lot of work has gone into making the boot flow
simple and easy to use.

QEMU now includes OpenSBI by default to make it easy for users to boot
Linux. The Icicle Kit board is now the most difficult QEMU board to
boot Linux on. Not to mention it makes it hard, if not impossible, to
support it in standard tool flows such as meta-riscv.

Alistair

>
>
> Regards,
>
> Cyril.
>



Re: [PATCH] hw/sd/allwinner-sdhost: Use AddressSpace for DMA transfers

2020-08-17 Thread Niek Linnenbank
Hi Philippe,

Nice improvement, I didn't know about this API. Makes sense to use it indeed.
The patch works fine. I tested your patches by applying the previous two
sets first, and then this one.
It ran well with the avocado tests and also with the official image
OrangePi_pc_debian_stretch_server_linux5.3.5_v1.0.img.

Tested-by: Niek Linnenbank 
Reviewed-by: Niek Linnenbank 

Regards,
Niek



On Fri, Aug 14, 2020 at 1:01 PM Philippe Mathieu-Daudé 
wrote:

> Allow the device to execute the DMA transfers in a different
> AddressSpace.
>
> The A10 and H3 SoC keep using the system_memory address space,
> but via the proper dma_memory_access() API.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Based-on: <20200814092346.21825-1-f4...@amsat.org>
> "hw/sd: Use sdbus_read_data/sdbus_write_data for multiple bytes access"
>
> Tested with:
>   AVOCADO_ALLOW_LARGE_STORAGE=1 avocado run -t machine:orangepi-pc -t
> machine:cubieboard tests/acceptance/
> ---
>  include/hw/sd/allwinner-sdhost.h |  6 ++
>  hw/arm/allwinner-a10.c   |  2 ++
>  hw/arm/allwinner-h3.c|  2 ++
>  hw/sd/allwinner-sdhost.c | 37 ++--
>  4 files changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/include/hw/sd/allwinner-sdhost.h
> b/include/hw/sd/allwinner-sdhost.h
> index d94606a853..839732ebf3 100644
> --- a/include/hw/sd/allwinner-sdhost.h
> +++ b/include/hw/sd/allwinner-sdhost.h
> @@ -71,6 +71,12 @@ typedef struct AwSdHostState {
>  /** Interrupt output signal to notify CPU */
>  qemu_irq irq;
>
> +/** Memory region where DMA transfers are done */
> +MemoryRegion *dma_mr;
> +
> +/** Address space used internally for DMA transfers */
> +AddressSpace dma_as;
> +
>  /** Number of bytes left in current DMA transfer */
>  uint32_t transfer_cnt;
>
> diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
> index e258463747..d404f31e02 100644
> --- a/hw/arm/allwinner-a10.c
> +++ b/hw/arm/allwinner-a10.c
> @@ -155,6 +155,8 @@ static void aw_a10_realize(DeviceState *dev, Error
> **errp)
>  }
>
>  /* SD/MMC */
> +object_property_set_link(OBJECT(&s->mmc0), "dma-memory",
> + OBJECT(get_system_memory()), &error_fatal);
>  sysbus_realize(SYS_BUS_DEVICE(&s->mmc0), &error_fatal);
>  sysbus_mmio_map(SYS_BUS_DEVICE(&s->mmc0), 0, AW_A10_MMC0_BASE);
>  sysbus_connect_irq(SYS_BUS_DEVICE(&s->mmc0), 0, qdev_get_gpio_in(dev,
> 32));
> diff --git a/hw/arm/allwinner-h3.c b/hw/arm/allwinner-h3.c
> index ff92ded82c..43a8d3dc48 100644
> --- a/hw/arm/allwinner-h3.c
> +++ b/hw/arm/allwinner-h3.c
> @@ -349,6 +349,8 @@ static void allwinner_h3_realize(DeviceState *dev,
> Error **errp)
>  sysbus_mmio_map(SYS_BUS_DEVICE(&s->sid), 0, s->memmap[AW_H3_SID]);
>
>  /* SD/MMC */
> +object_property_set_link(OBJECT(&s->mmc0), "dma-memory",
> + OBJECT(get_system_memory()), &error_fatal);
>  sysbus_realize(SYS_BUS_DEVICE(&s->mmc0), &error_fatal);
>  sysbus_mmio_map(SYS_BUS_DEVICE(&s->mmc0), 0, s->memmap[AW_H3_MMC0]);
>  sysbus_connect_irq(SYS_BUS_DEVICE(&s->mmc0), 0,
> diff --git a/hw/sd/allwinner-sdhost.c b/hw/sd/allwinner-sdhost.c
> index f9eb92c09e..e82afb75eb 100644
> --- a/hw/sd/allwinner-sdhost.c
> +++ b/hw/sd/allwinner-sdhost.c
> @@ -21,7 +21,10 @@
>  #include "qemu/log.h"
>  #include "qemu/module.h"
>  #include "qemu/units.h"
> +#include "qapi/error.h"
>  #include "sysemu/blockdev.h"
> +#include "sysemu/dma.h"
> +#include "hw/qdev-properties.h"
>  #include "hw/irq.h"
>  #include "hw/sd/allwinner-sdhost.h"
>  #include "migration/vmstate.h"
> @@ -306,7 +309,7 @@ static uint32_t
> allwinner_sdhost_process_desc(AwSdHostState *s,
>  uint8_t buf[1024];
>
>  /* Read descriptor */
> -cpu_physical_memory_read(desc_addr, desc, sizeof(*desc));
> +dma_memory_read(&s->dma_as, desc_addr, desc, sizeof(*desc));
>  if (desc->size == 0) {
>  desc->size = klass->max_desc_size;
>  } else if (desc->size > klass->max_desc_size) {
> @@ -331,22 +334,24 @@ static uint32_t
> allwinner_sdhost_process_desc(AwSdHostState *s,
>
>  /* Write to SD bus */
>  if (is_write) {
> -cpu_physical_memory_read((desc->addr & DESC_SIZE_MASK) +
> num_done,
> -  buf, buf_bytes);
> +dma_memory_read(&s->dma_as,
> +(desc->addr & DESC_SIZE_MASK) + num_done,
> +buf, buf_bytes);
>  sdbus_write_data(&s->sdbus, buf, buf_bytes);
>
>  /* Read from SD bus */
>  } else {
>  sdbus_read_data(&s->sdbus, buf, buf_bytes);
> -cpu_physical_memory_write((desc->addr & DESC_SIZE_MASK) +
> num_done,
> -   buf, buf_bytes);
> +dma_memory_write(&s->dma_as,
> + (desc->addr & DESC_SIZE_MASK) + num_done,
> + buf, buf_bytes);
>  }
>  num_done += buf_bytes;
>  }
>
>  /* Clear 

Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-17 Thread Alberto Garcia
On Mon 17 Aug 2020 05:53:07 PM CEST, Kevin Wolf wrote:
> Maybe the difference is in allocating 64k at once instead of doing a
> separate allocation for every 4k block? But with the extent size hint
> patches to file-posix, we should allocate 1 MB at once by default now
> (if your test image was newly created). Can you check whether this is
> in effect for your image file?

Ehmm... is that hint supported in ext4 or only in xfs?

Berto
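(For reference, the extent size hint discussed above is a per-file attribute
that the file-posix patches set at image creation time. A rough standalone
sketch of how such a hint is applied to a file is below; the ioctl and flag
names are the generic Linux ones, and whether a given filesystem actually
honours them is exactly the open question here.)

#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Sketch: ask the filesystem to allocate extents of at least hint_bytes
 * for this file. XFS honours FS_XFLAG_EXTSIZE; support elsewhere varies. */
static int set_extent_size_hint(int fd, unsigned int hint_bytes)
{
    struct fsxattr attr;

    if (ioctl(fd, FS_IOC_FSGETXATTR, &attr) < 0) {
        return -1;
    }
    attr.fsx_xflags |= FS_XFLAG_EXTSIZE;
    attr.fsx_extsize = hint_bytes;
    return ioctl(fd, FS_IOC_FSSETXATTR, &attr);
}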



Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-17 Thread via
Hi Anup,

On 8/17/20 11:30 AM, Bin Meng wrote:
>
> Hi Anup,
>
> On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  wrote:
>> On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
>>> From: Bin Meng 
>>>
>>> This adds support for Microchip PolarFire SoC Icicle Kit board.
>>> The Icicle Kit board integrates a PolarFire SoC, with one SiFive
>>> E51 core plus four U54 cores, many on-chip peripherals, and an FPGA.
>> Nice Work !!! This is very helpful.
> Thanks!
>
>> The Microchip HSS is quite convoluted. It has:
>> 1. DDR Init
>> 2. Boot device support
>> 3. SBI support using OpenSBI as library
>> 4. Simple TEE support
>>
>> I think point 1) and 2) above should be part of U-Boot SPL.
>> The point 3) can be OpenSBI FW_DYNAMIC.
>>
>> Lastly, for point 4), we are working on a new OpenSBI feature using
>> which we can run independent Secure OS and Non-Secure OS using
>> U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
>> PolarFire).
>>
>> Do you have plans for adding U-Boot SPL support for this board ??
> + Cyril Jean from Microchip
>
> I will have to leave this question to Cyril to comment.
>
I currently do not have a plan to support U-Boot SPL. The idea of the 
HSS is to contain all the silicon specific initialization and 
configuration code within the HSS before jumping to U-Boot S-mode. I 
would rather keep all this within the HSS for the time being. I would 
wait until we reach production silicon before attempting to move this to 
U-Boot SPL, as the HSS is likely to contain some opaque silicon-related
changes for a while yet.


Regards,

Cyril.



Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-17 Thread Nir Soffer
On Thu, Aug 13, 2020 at 7:36 PM Kevin Wolf  wrote:

> Instead of implementing qemu-nbd --offset in the NBD code, just put a
> raw block node with the requested offset on top of the user image and
> rely on that doing the job.
>
> Not only does this simplify the nbd_export_new() interface and bring it
> closer to the set of options that the nbd-server-add QMP command offers,
> it also eliminates a potential source of bugs in the NBD code, which
> previously had to add the offset manually in all relevant places.
>

Just to make sure I understand this correctly -

qemu-nbd can work with:

$ qemu-nbd 'json:{"driver": "file", "filename": "test.raw"}'

And:

$ qemu-nbd 'json:{"driver": "raw", "file": {"driver": "file",
"filename": "test.raw"}}'

I assumed that we always create the raw node?

oVirt always uses the second form to make it easier to support offset,
size, and backing.
https://github.com/oVirt/ovirt-imageio/blob/2021164d064227d7c5e03c8da087adc66e3a577e/daemon/ovirt_imageio/_internal/qemu_nbd.py#L104

This also seems to be the way libvirt builds the nodes using -blockdev.

Do we have a way to visualize the internal node graph used by
qemu-nbd/qemu-img?

Nir

Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h |  4 ++--
>  blockdev-nbd.c  |  9 +
>  nbd/server.c| 34 +-
>  qemu-nbd.c  | 27 ---
>  4 files changed, 32 insertions(+), 42 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index c8c5cb6b61..3846d2bac8 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -329,8 +329,8 @@ typedef struct NBDExport NBDExport;
>  typedef struct NBDClient NBDClient;
>
>  BlockExport *nbd_export_create(BlockExportOptions *exp_args, Error
> **errp);
> -NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> -  uint64_t size, const char *name, const char
> *desc,
> +NBDExport *nbd_export_new(BlockDriverState *bs,
> +  const char *name, const char *desc,
>const char *bitmap, bool readonly, bool shared,
>void (*close)(NBDExport *), bool writethrough,
>BlockBackend *on_eject_blk, Error **errp);
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index a1dc11bdd7..16cda3b052 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -154,7 +154,6 @@ BlockExport *nbd_export_create(BlockExportOptions
> *exp_args, Error **errp)
>  BlockDriverState *bs = NULL;
>  BlockBackend *on_eject_blk;
>  NBDExport *exp = NULL;
> -int64_t len;
>  AioContext *aio_context;
>
>  assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
> @@ -192,12 +191,6 @@ BlockExport *nbd_export_create(BlockExportOptions
> *exp_args, Error **errp)
>
>  aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(aio_context);
> -len = bdrv_getlength(bs);
> -if (len < 0) {
> -error_setg_errno(errp, -len,
> - "Failed to determine the NBD export's length");
> -goto out;
> -}
>
>  if (!arg->has_writable) {
>  arg->writable = false;
> @@ -206,7 +199,7 @@ BlockExport *nbd_export_create(BlockExportOptions
> *exp_args, Error **errp)
>  arg->writable = false;
>  }
>
> -exp = nbd_export_new(bs, 0, len, arg->name, arg->description,
> arg->bitmap,
> +exp = nbd_export_new(bs, arg->name, arg->description, arg->bitmap,
>   !arg->writable, !arg->writable,
>   NULL, false, on_eject_blk, errp);
>  if (!exp) {
> diff --git a/nbd/server.c b/nbd/server.c
> index 774325dbe5..92360d1f08 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -89,7 +89,6 @@ struct NBDExport {
>  BlockBackend *blk;
>  char *name;
>  char *description;
> -uint64_t dev_offset;
>  uint64_t size;
>  uint16_t nbdflags;
>  QTAILQ_HEAD(, NBDClient) clients;
> @@ -1507,8 +1506,8 @@ static void nbd_eject_notifier(Notifier *n, void
> *data)
>  aio_context_release(aio_context);
>  }
>
> -NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> -  uint64_t size, const char *name, const char
> *desc,
> +NBDExport *nbd_export_new(BlockDriverState *bs,
> +  const char *name, const char *desc,
>const char *bitmap, bool readonly, bool shared,
>void (*close)(NBDExport *), bool writethrough,
>BlockBackend *on_eject_blk, Error **errp)
> @@ -1516,9 +1515,17 @@ NBDExport *nbd_export_new(BlockDriverState *bs,
> uint64_t dev_offset,
>  AioContext *ctx;
>  BlockBackend *blk;
>  NBDExport *exp;
> +int64_t size;
>  uint64_t perm;
>  int ret;
>
> +size = bdrv_getlength(bs);
> +if (size < 0) {
> +error_setg_errno(errp, -size,
> + "Failed to 

Re: [PATCH] hw: add a number of SPI-flash's of m25p80 family

2020-08-17 Thread i . kononenko
No, the ext ID wasn't checked on real HW.
I just copied it from the official U-Boot repository:
https://github.com/u-boot/u-boot/blob/789bfb52668ee609b2043de645e2f94bbd24fd1f/drivers/mtd/spi/spi-nor-ids.c#L183

Do I need to read it from real HW and compare?

On 12.08.2020 10:27, Cédric Le Goater wrote:
> On 8/11/20 10:37 PM, Igor Kononenko wrote:
>> Support a following SPI flashes:
>> * mx66l51235f
>> * mt25ql512ab
>>
>> Signed-off-by: Igor Kononenko 
>> ---
>>  hw/block/m25p80.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>> index 8227088441..bf1f833784 100644
>> --- a/hw/block/m25p80.c
>> +++ b/hw/block/m25p80.c
>> @@ -219,6 +219,7 @@ static const FlashPartInfo known_devices[] = {
>>  { INFO("mx25l12855e", 0xc22618,  0,  64 << 10, 256, 0) },
>>  { INFO("mx25l25635e", 0xc22019,  0,  64 << 10, 512, 0) },
>>  { INFO("mx25l25655e", 0xc22619,  0,  64 << 10, 512, 0) },
>> +{ INFO("mx66l51235f", 0xc2201a,  0,  64 << 10, 1024, ER_4K | 
>> ER_32K) },
>>  { INFO("mx66u51235f", 0xc2253a,  0,  64 << 10, 1024, ER_4K | 
>> ER_32K) },
>>  { INFO("mx66u1g45g",  0xc2253b,  0,  64 << 10, 2048, ER_4K | 
>> ER_32K) },
>>  { INFO("mx66l1g45g",  0xc2201b,  0,  64 << 10, 2048, ER_4K | 
>> ER_32K) },
>> @@ -237,6 +238,7 @@ static const FlashPartInfo known_devices[] = {
>>  { INFO("n25q128", 0x20ba18,  0,  64 << 10, 256, 0) },
>>  { INFO("n25q256a",0x20ba19,  0,  64 << 10, 512, ER_4K) },
>>  { INFO("n25q512a",0x20ba20,  0,  64 << 10, 1024, ER_4K) },
>> +{ INFO("mt25ql512ab", 0x20ba20, 0x1044, 64 << 10, 1024, ER_4K | ER_32K) 
>> },
> 
> Have you checked the extended ID on real HW?
> 
> C. 
> 
>>  { INFO_STACKED("n25q00",0x20ba21, 0x1000, 64 << 10, 2048, ER_4K, 4) 
>> },
>>  { INFO_STACKED("n25q00a",   0x20bb21, 0x1000, 64 << 10, 2048, ER_4K, 4) 
>> },
>>  { INFO_STACKED("mt25ql01g", 0x20ba21, 0x1040, 64 << 10, 2048, ER_4K, 2) 
>> },
>>
> 

-- 
best,

Igor Kononenko



Re: [PATCH 0/2] Fix the assert failure in scsi_dma_complete

2020-08-17 Thread Paolo Bonzini
On 15/08/20 16:19, Li Qiang wrote:
> Currently 'megasas_map_sgl' just returns success when 'iov_count=0',
> even though the 'cmd' doesn't contain any iov. This causes the assert
> in 'scsi_dma_complete' to fail, because in 'dma_blk_cb' the condition
> 'dbs->sg_cur_index == dbs->sg->nsg' is true and 'dma_complete' is
> called directly. However, at that point no aiocb has been returned.
> 
> This is LP#1878263:
> 
> -->https://bugs.launchpad.net/qemu/+bug/1878263
> 
> To solve this, we treat 'iov_count=0' as an error.
> In the first patch, I use -1 to indicate an error, and in the second
> patch I treat 'iov_count=0' as an error.
> 
> Li Qiang (2):
>   hw: megasas: return -1 when 'megasas_map_sgl' fails
>   hw: megasas: consider 'iov_count=0' is an error in megasas_map_sgl
> 
>  hw/scsi/megasas.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 

Queued, thanks.  But do you have a qtest for this?

Paolo




Re: [PATCH 0/2] block; scsi-generic: Fix max transfer size calculation

2020-08-17 Thread Paolo Bonzini
On 12/08/20 00:51, Dmitry Fomichev wrote:
> When a host-managed zoned device is passed through to the
> guest system using scsi-generic driver, the maximum i/o size for the
> drive at the guest may end up being larger than at the host, causing
> i/o errors while accessing the backing zoned drive at the host system.
> 
> Two problems prevent correct setting of the maximum i/o transfer size
> at the guest in this configuration. One issue is specific to
> host-managed zone devices - scsi-generic driver doesn't recognize the
> SCSI type of HM-zoned devices. The other problem is that file-posix
> code for finding max_segments system value doesn't correctly handle
> SG nodes.
> 
> The following two patches fix these problems.
> 
> Based-on: <20200424084338.26803-16-arm...@redhat.com>
> 
> Dmitry Fomichev (2):
>   file-posix: Correctly read max_segments of SG nodes
>   scsi-generic: Fix HM-zoned device scan
> 
>  block/file-posix.c   | 55 +++-
>  hw/scsi/scsi-generic.c   | 10 +---
>  include/scsi/constants.h |  1 +
>  3 files changed, 39 insertions(+), 27 deletions(-)
> 

The patches are more or less unrelated; I have queued the second, while
the first is outside my maintenance area.

Paolo




Re: [PATCH 2/2] scsi-generic: Fix HM-zoned device scan

2020-08-17 Thread Alistair Francis
On Tue, Aug 11, 2020 at 3:52 PM Dmitry Fomichev  wrote:
>
> Several important steps during device scan depend on SCSI type of the
> device. For example, max_transfer property is only determined and
> assigned if the device has the type of TYPE_DISK.
>
> Host-managed ZBC disks retain most of the properties of regular SCSI
> drives, but they have their own SCSI device type, 0x14. This prevents
> the proper assignment of the max_transfer property for HM-zoned devices in
> the scsi-generic driver, leading to I/O errors if the maximum i/o size
> calculated at the guest exceeds the host value.
>
> To fix this, define TYPE_ZBC to have the standard value from SCSI ZBC
> standard spec. Several scan steps that were previously done only for
> TYPE_DISK devices, are now performed for the SCSI devices having
> TYPE_ZBC too.
>
> Reported-by: Johannes Thumshirn 
> Signed-off-by: Dmitry Fomichev 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/scsi/scsi-generic.c   | 10 ++
>  include/scsi/constants.h |  1 +
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 86ed0a3822..2cb23ca891 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -162,7 +162,8 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, 
> SCSIDevice *s)
>  }
>  }
>
> -if (s->type == TYPE_DISK && (r->req.cmd.buf[1] & 0x01)) {
> +if ((s->type == TYPE_DISK || s->type == TYPE_ZBC) &&
> +(r->req.cmd.buf[1] & 0x01)) {
>  page = r->req.cmd.buf[2];
>  if (page == 0xb0) {
>  uint32_t max_transfer =
> @@ -299,10 +300,11 @@ static void scsi_read_complete(void * opaque, int ret)
>  }
>  blk_set_guest_block_size(s->conf.blk, s->blocksize);
>
> -/* Patch MODE SENSE device specific parameters if the BDS is opened
> +/*
> + * Patch MODE SENSE device specific parameters if the BDS is opened
>   * readonly.
>   */
> -if ((s->type == TYPE_DISK || s->type == TYPE_TAPE) &&
> +if ((s->type == TYPE_DISK || s->type == TYPE_TAPE || s->type == 
> TYPE_ZBC) &&
>  blk_is_read_only(s->conf.blk) &&
>  (r->req.cmd.buf[0] == MODE_SENSE ||
>   r->req.cmd.buf[0] == MODE_SENSE_10) &&
> @@ -617,7 +619,7 @@ static void 
> scsi_generic_read_device_identification(SCSIDevice *s)
>  void scsi_generic_read_device_inquiry(SCSIDevice *s)
>  {
>  scsi_generic_read_device_identification(s);
> -if (s->type == TYPE_DISK) {
> +if (s->type == TYPE_DISK || s->type == TYPE_ZBC) {
>  scsi_generic_set_vpd_bl_emulation(s);
>  } else {
>  s->needs_vpd_bl_emulation = false;
> diff --git a/include/scsi/constants.h b/include/scsi/constants.h
> index 874176019e..2a32c08b5e 100644
> --- a/include/scsi/constants.h
> +++ b/include/scsi/constants.h
> @@ -218,6 +218,7 @@
>  #define TYPE_ENCLOSURE  0x0d/* Enclosure Services Device */
>  #define TYPE_RBC0x0e/* Simplified Direct-Access Device */
>  #define TYPE_OSD0x11/* Object-storage Device */
> +#define TYPE_ZBC0x14/* Host-managed Zoned SCSI Device */
>  #define TYPE_WLUN   0x1e/* Well known LUN */
>  #define TYPE_NOT_PRESENT0x1f
>  #define TYPE_INACTIVE   0x20
> --
> 2.21.0
>
>



Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 17:31, Alberto Garcia wrote:
> On Mon 17 Aug 2020 12:10:19 PM CEST, Kevin Wolf wrote:
> >> Since commit c8bb23cbdbe / QEMU 4.1.0 (and if the storage backend
> >> allows it) writing to an image created with preallocation=metadata
> >> can be slower (20% in my tests) than writing to an image with no
> >> preallocation at all.
> >
> > A while ago we had a case where commit c8bb23cbdbe was actually
> > reported as a major performance regression, so it's a big "it
> > depends".
> >
> > XFS people told me that they consider this code a bad idea. Just
> > because it's a specialised "write zeroes" operation, it's not
> > necessarily fast on filesystems. In particular, on XFS, ZERO_RANGE
> > causes a queue drain with O_DIRECT (probably hurts cases with high
> > queue depths) and additionally even a page cache flush without
> > O_DIRECT.
> >
> > So in a way this whole thing is a two-edged sword.
> 
> I see... on ext4 the improvements are clearly visible. Are we not
> detecting this for xfs? We do have an s->is_xfs flag.

My understanding is that XFS and ext4 behave very similar in this
respect. It's not a clear loss on XFS either, some cases are improved.
But cases that get a performance regression exist, too. It's a question
of the workload, the file system state (e.g. fragmentation of the image
file) and the storage.

So I don't think checking for a specific filesystem is going to improve
things.

> >> a) shall we include a warning in the documentation ("note that this
> >> preallocation mode can result in worse performance")?
> >
> > To be honest, I don't really understand this case yet. With metadata
> > preallocation, the clusters are already marked as allocated, so why
> > would handle_alloc_space() even be called? We're not allocating new
> > clusters after all?
> 
> It's not called, what happens is what you say below:
> 
> > Or are you saying that ZERO_RANGE + pwrite on a sparse file (= cluster
> > allocation) is faster for you than just the pwrite alone (= writing to
> > already allocated cluster)?
> 
> Yes, 20% faster in my tests (4KB random writes), but in the latter case
> the cluster is already allocated only at the qcow2 level, not on the
> filesystem. preallocation=falloc is faster than preallocation=metadata
> (preallocation=off sits in the middle).

Hm, this feels wrong. Doing more operations should never be faster than
doing fewer operations.

Maybe the difference is in allocating 64k at once instead of doing a
separate allocation for every 4k block? But with the extent size hint
patches to file-posix, we should allocate 1 MB at once by default now
(if your test image was newly created). Can you check whether this is in
effect for your image file?

Kevin
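To make the comparison concrete, here is a minimal standalone sketch (plain
C, not QEMU code) of the two write paths being discussed: a 4k pwrite() into
a region that is sparse at the filesystem level (the preallocation=metadata
case), versus ZERO_RANGE over the whole 64k cluster followed by the same 4k
pwrite() (roughly what handle_alloc_space() causes on cluster allocation).
The file name and sizes are arbitrary; a real benchmark would use O_DIRECT
and aligned buffers.

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    memset(buf, 0xaa, sizeof(buf));

    int fd = open("test.img", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        return 1;
    }

    /* Path A: plain 4k write into a hole (cluster allocated only in qcow2
     * metadata, not on the filesystem). */
    pwrite(fd, buf, sizeof(buf), 0);

    /* Path B: zero the whole 64k cluster first, then do the 4k write
     * (what the allocating write path does since commit c8bb23cbdbe). */
    fallocate(fd, FALLOC_FL_ZERO_RANGE, 64 * 1024, 64 * 1024);
    pwrite(fd, buf, sizeof(buf), 64 * 1024);

    close(fd);
    return 0;
}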




Re: [RFC PATCH 14/22] block/export: Move AioContext from NBDExport to BlockExport

2020-08-17 Thread Max Reitz
On 17.08.20 17:22, Kevin Wolf wrote:
> On 17.08.2020 at 16:56, Max Reitz wrote:
>> On 13.08.20 18:29, Kevin Wolf wrote:
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  include/block/export.h |  6 ++
>>>  nbd/server.c   | 26 +-
>>>  2 files changed, 19 insertions(+), 13 deletions(-)
>>
>> Reviewed-by: Max Reitz 
>>
>>> diff --git a/include/block/export.h b/include/block/export.h
>>> index f44290a4a2..5459f79469 100644
>>> --- a/include/block/export.h
>>> +++ b/include/block/export.h
>>> @@ -33,6 +33,12 @@ struct BlockExport {
>>>   * the export.
>>>   */
>>>  int refcount;
>>> +
>>> +/*
>>> + * The AioContex whose lock needs to be held while calling
>>
>> *AioContext
>>
>>> + * BlockExportDriver callbacks.
>>
>> Hm.  But other blk_exp_* functions (i.e. the refcount manipulation
>> functions) are fair game?
> 
> Hmm... The assumption was that ref/unref are only called from the main
> thread, but maybe that's not true? So maybe blk_exp_*() shouldn't lock
> the AioContext internally, but require that the lock is already held, so
> that they can be called both from within the AioContext (where we don't
> want to lock a second time) and from the main context.
> 
> I also guess we need a separate mutex to protect the exports list if
> unref can be called from different threads.
> 
> And probably the existing NBD server code has already the same problems
> with respect to different AioContexts.
> 
>>> + */
>>> +AioContext *ctx;
>>>  };
>>>  
>>>  extern const BlockExportDriver blk_exp_nbd;
>>> diff --git a/nbd/server.c b/nbd/server.c
>>> index 2bf30bb731..b735a68429 100644
>>> --- a/nbd/server.c
>>> +++ b/nbd/server.c
>>
>> [...]
>>
>>> @@ -1466,7 +1464,7 @@ static void blk_aio_attached(AioContext *ctx, void 
>>> *opaque)
>>>  
>>>  trace_nbd_blk_aio_attached(exp->name, ctx);
>>>  
>>> -exp->ctx = ctx;
>>> +exp->common.ctx = ctx;
>>
>> (Not sure if I’m missing anything in that regard), but perhaps after
>> patch 21 we can move this part to the common block export code, and
>> maybe make it call a BlockExportDriver callback (that handles the rest
>> of this function).
> 
> Could probably be done. Not every export driver may support switching
> AioContexts, but we can make it conditional on having the callback.

Good point.

> So do I understand right from your comments to the series in general
> that you would prefer to make this series more complete, even if that
> means that it becomes quite a bit longer?

I’m not necessarily asking for this now, I’m mostly asking whether you
have the same idea as me on things like this.  I don’t mind too much
leaving this in an unfinished state as long as we both agree that it’s
kind of unfinished.

Sorry if this is a bit frustrating to you because you wrote in the cover
letter that indeed you are unsure about how complete you want to do
this.  The problem is that I don’t know exactly what things you’re
referring to, so I just point out everything that stands out to me.  If
you’re aware of those things, and we can work on them later, then that’s OK.

OTOH...  Yes, from a design standpoint, I think it makes sense to pull
out as much specialized code as possible from NBD into the generalized
block export code.  But I say that as a reviewer.  You would have to do
that, so I want to leave it to you how much work you think is reasonable
to put into that.  Leaving a couple of rough edges here and there
shouldn’t be a problem.  (Or maybe leaving something to me for when I
add fuse export code.)

Max





Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-17 Thread Alberto Garcia
On Mon 17 Aug 2020 12:10:19 PM CEST, Kevin Wolf wrote:
>> Since commit c8bb23cbdbe / QEMU 4.1.0 (and if the storage backend
>> allows it) writing to an image created with preallocation=metadata
>> can be slower (20% in my tests) than writing to an image with no
>> preallocation at all.
>
> A while ago we had a case where commit c8bb23cbdbe was actually
> reported as a major performance regression, so it's a big "it
> depends".
>
> XFS people told me that they consider this code a bad idea. Just
> because it's a specialised "write zeroes" operation, it's not
> necessarily fast on filesystems. In particular, on XFS, ZERO_RANGE
> causes a queue drain with O_DIRECT (probably hurts cases with high
> queue depths) and additionally even a page cache flush without
> O_DIRECT.
>
> So in a way this whole thing is a two-edged sword.

I see... on ext4 the improvements are clearly visible. Are we not
detecting this for xfs? We do have an s->is_xfs flag.

>> a) shall we include a warning in the documentation ("note that this
>> preallocation mode can result in worse performance")?
>
> To be honest, I don't really understand this case yet. With metadata
> preallocation, the clusters are already marked as allocated, so why
> would handle_alloc_space() even be called? We're not allocating new
> clusters after all?

It's not called, what happens is what you say below:

> Or are you saying that ZERO_RANGE + pwrite on a sparse file (= cluster
> allocation) is faster for you than just the pwrite alone (= writing to
> already allocated cluster)?

Yes, 20% faster in my tests (4KB random writes), but in the latter case
the cluster is already allocated only at the qcow2 level, not on the
filesystem. preallocation=falloc is faster than preallocation=metadata
(preallocation=off sits in the middle).

>> b) why don't we also initialize preallocated clusters with
>>QCOW_OFLAG_ZERO? (at least when there's no subclusters involved,
>>i.e. no backing file). This would make reading from them (and
>>writing to them, after this patch) faster.
>
> Because the idea with metadata preallocation is that you don't have to
> perform any COW and update any metadata because everything is already
> allocated. If you set the zero flag, you get cluster allocations with
> COW again, defeating the whole purpose of the preallocation.

Fair enough.

Berto



Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-17 Thread Alberto Garcia
On Mon 17 Aug 2020 05:53:07 PM CEST, Kevin Wolf wrote:
>> > Or are you saying that ZERO_RANGE + pwrite on a sparse file (=
>> > cluster allocation) is faster for you than just the pwrite alone (=
>> > writing to already allocated cluster)?
>> 
>> Yes, 20% faster in my tests (4KB random writes), but in the latter
>> case the cluster is already allocated only at the qcow2 level, not on
>> the filesystem. preallocation=falloc is faster than
>> preallocation=metadata (preallocation=off sits in the middle).
>
> Hm, this feels wrong. Doing more operations should never be faster
> than doing fewer operations.
>
> Maybe the difference is in allocating 64k at once instead of doing a
> separate allocation for every 4k block?

That's what I imagine, yes. I'll have a look at your patches and tell
you.

Berto



Re: [RFC PATCH 09/22] nbd: Add writethrough to block-export-add

2020-08-17 Thread Max Reitz
On 17.08.20 16:32, Kevin Wolf wrote:
> On 17.08.2020 at 15:51, Max Reitz wrote:
>> On 17.08.20 15:13, Kevin Wolf wrote:
>>> On 17.08.2020 at 14:56, Max Reitz wrote:
 On 13.08.20 18:29, Kevin Wolf wrote:
> qemu-nbd allows use of writethrough cache modes, which mean that write
> requests made through NBD will cause a flush before they complete.
> Expose the same functionality in block-export-add.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 7 ++-
>  blockdev-nbd.c | 2 +-
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index 1fdc55c53a..4ce163411f 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -167,10 +167,15 @@
>  # Describes a block export, i.e. how single node should be exported on an
>  # external interface.
>  #
> +# @writethrough: If true, caches are flushed after every write request 
> to the
> +#export before completion is signalled. (since: 5.2;
> +#default: false)
> +#
>  # Since: 4.2
>  ##
>  { 'union': 'BlockExportOptions',
> -  'base': { 'type': 'BlockExportType' },
> +  'base': { 'type': 'BlockExportType',
> +'*writethrough': 'bool' },
>'discriminator': 'type',
>'data': {
>'nbd': 'BlockExportOptionsNbd'

 Hm.  I find it weird to have @writethrough in the base but @device in
 the specialized class.

 I think everything that will be common to all block exports should be in
 the base, and that probably includes a node-name.  I’m aware that will
 make things more tedious in the code, but perhaps it would be a nicer
 interface in the end.  Or is the real problem that that would create
 problems in the storage daemon’s command line interface, because then
 the specialized (legacy) NBD interface would no longer be compatible
 with the new generalized block export interface?
>>>
>>> Indeed. I think patch 15 has what you're looking for.
>>
>> Great. :)
>>
>> Discussions where both participants have the same opinion from the
>> start are the best ones.
> 
> Makes things a lot easier.
> 
> Maybe I should try to move patch 15 earlier. The series is mostly just
> in the order that I wrote things, but there were also a few nasty
> dependencies in the part the generalises things from NBD to BlockExport.
> So I'm not sure if this is a patch that can be moved.
> 
 Anyway, @writable might be a similar story.  A @read-only may make sense
 in general, I think.
>>>
>>> Pulling @writable up is easier than a @read-only, but that's a naming
>>> detail.
>>
>> Sure.
>>
>>> In general I agree, but this part isn't addressed in this series yet.
>>> Part of the reason why this is an RFC was to find out if I should
>>> include things like this or if we'll do it when we add another export
>>> type or common functionality that needs the same option.
>>
>> Sure, sure.
> 
> So should I or not? :-)

Can we delay it until after this series?  I.e., as long as it retains
the name “writable”, would pulling it up into BlockExportOptions be a
compatible change?

If so, then I suppose we could do it afterwards.  But I think it does
make the most sense to “just” do it as part of this series.

>> Meta: I personally don’t like RFC patches very much.  RFC to me means
>> everything is fair game, and reviewers should be free to let their
>> thoughts wander and come up with perhaps wild ideas, just trying to
>> gauge what everyone thinks.
>>
>> When I’m the submitter, I tend to get defensive then, because I’ve
>> invested time in writing the code already, so I tend to be biased
>> against fundamental changes.  (Horrible personal trait.  I’m working
>> on it.)
> 
> This makes sense. Nobody likes having to rewrite their RFC series.
> 
> But there is one thing I dread even more: Polishing the RFC series for
> another week until I can send it out as non-RFC and _then_ having to
> rewrite it.

Yes.  Especially bad with tests.

>> As a reviewer, the code and thus some fleshed out design is there
>> already, so it’s difficult to break free from that and find completely
>> different solutions to the original problem.
>> (I kind of ventured in that direction for this patch, and it seems like
>> you immediately noticed that my response was different from usual and
>> pointed out the RFC status, perhaps to make me feel more comfortable in
>> questioning the fundamental design more.  Which I noticed, hence this
>> wall of text.)
> 
> Basically just telling you that I was already interested in your input
> for this point specifically when I sent the series.

OK :)

>> Perhaps I’m wrong.  Perhaps it’s just myself (the points I’ve just
>> listed are definitely my own personal weaknesses), but I can’t help but
>> project and assume that others may feel similar, at least in 

Re: [RFC PATCH 15/22] block/export: Move device to BlockExportOptions

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 17:13, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
> > Every block export needs a block node to export, so move the 'device'
> > option from BlockExportOptionsNbd to BlockExportOptions.
> > 
> > To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs
> > to be wrapped by a new type NbdServerAddOptions that adds 'device' back
> > because nbd-server-add doesn't use the BlockExportOptions base type.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  qapi/block-export.json | 27 +--
> >  block/export/export.c  | 26 --
> >  block/monitor/block-hmp-cmds.c |  6 +++---
> >  blockdev-nbd.c |  4 ++--
> >  qemu-nbd.c |  2 +-
> >  5 files changed, 47 insertions(+), 18 deletions(-)
> 
> (Code diff looks good, just a question on the interface:)
> 
> > diff --git a/qapi/block-export.json b/qapi/block-export.json
> > index 4ce163411f..d68f3bf87e 100644
> > --- a/qapi/block-export.json
> > +++ b/qapi/block-export.json
> 
> [...]
> 
> > @@ -167,6 +179,8 @@
> >  # Describes a block export, i.e. how single node should be exported on an
> >  # external interface.
> >  #
> > +# @device: The device name or node name of the node to be exported
> > +#
> 
> Wouldn’t it be better to restrict ourselves to a node name here?
> (Bluntly ignoring the fact that doing so would make this patch an
> incompatible change, which would require some reordering in this series,
> unless we decide to just ignore that intra-series incompatibility.)

We already have intra-series incompatibility, so I wouldn't mind that.

> OTOH...  What does “better” mean.  It won’t hurt anyone to keep this as
> @device.  It’s just that I feel like if we had no legacy burden and did
> this all from scratch, we would just make it @node-name.
> 
> Did you deliberately decide against @node-name?

At first I thought I could still share code between nbd-server-add and
block-export-add, but that's not the case. Then I guess the only other
reason I have is consistency with other QMP commands. I won't pretend
it's a strong one.

Kevin




[PULL 1/3] async: rename event_notifier_dummy_cb/poll()

2020-08-17 Thread Stefan Hajnoczi
The event_notifier_*() prefix can be confused with the EventNotifier
APIs that are also called event_notifier_*().

Rename the functions to aio_context_notifier_*() to make it clear that
they relate to the AioContext::notifier field.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Paolo Bonzini 
Message-id: 20200806131802.569478-2-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 util/async.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/async.c b/util/async.c
index 1319eee3bc..d9f987e133 100644
--- a/util/async.c
+++ b/util/async.c
@@ -445,12 +445,12 @@ static void aio_timerlist_notify(void *opaque, 
QEMUClockType type)
 aio_notify(opaque);
 }
 
-static void event_notifier_dummy_cb(EventNotifier *e)
+static void aio_context_notifier_dummy_cb(EventNotifier *e)
 {
 }
 
 /* Returns true if aio_notify() was called (e.g. a BH was scheduled) */
-static bool event_notifier_poll(void *opaque)
+static bool aio_context_notifier_poll(void *opaque)
 {
 EventNotifier *e = opaque;
 AioContext *ctx = container_of(e, AioContext, notifier);
@@ -508,8 +508,8 @@ AioContext *aio_context_new(Error **errp)
 
 aio_set_event_notifier(ctx, &ctx->notifier,
false,
-   event_notifier_dummy_cb,
-   event_notifier_poll);
+   aio_context_notifier_dummy_cb,
+   aio_context_notifier_poll);
 #ifdef CONFIG_LINUX_AIO
 ctx->linux_aio = NULL;
 #endif
-- 
2.26.2



[PULL 3/3] aio-posix: keep aio_notify_me disabled during polling

2020-08-17 Thread Stefan Hajnoczi
Polling only monitors the ctx->notified field and does not need the
ctx->notifier EventNotifier to be signalled. Keep ctx->aio_notify_me
disabled while polling to avoid unnecessary EventNotifier syscalls.

This optimization improves virtio-blk 4KB random read performance by
18%. The following results are with an IOThread and the null-co block
driver:

Test     IOPS        Error
Before   244518.62   ± 1.20%
After    290706.11   ± 0.44%

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20200806131802.569478-4-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 util/aio-posix.c | 47 +--
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 1b2a3af65b..f7f13ebfc2 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -464,9 +464,6 @@ static bool remove_idle_poll_handlers(AioContext *ctx, 
int64_t now)
  *
  * Polls for a given time.
  *
- * Note that ctx->notify_me must be non-zero so this function can detect
- * aio_notify().
- *
  * Note that the caller must have incremented ctx->list_lock.
  *
  * Returns: true if progress was made, false otherwise
@@ -476,7 +473,6 @@ static bool run_poll_handlers(AioContext *ctx, int64_t 
max_ns, int64_t *timeout)
 bool progress;
 int64_t start_time, elapsed_time;
 
-assert(ctx->notify_me);
 assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
 
 trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
@@ -520,8 +516,6 @@ static bool run_poll_handlers(AioContext *ctx, int64_t 
max_ns, int64_t *timeout)
  * @timeout: timeout for blocking wait, computed by the caller and updated if
  *polling succeeds.
  *
- * ctx->notify_me must be non-zero so this function can detect aio_notify().
- *
  * Note that the caller must have incremented ctx->list_lock.
  *
  * Returns: true if progress was made, false otherwise
@@ -556,6 +550,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
 int ret = 0;
 bool progress;
+bool use_notify_me;
 int64_t timeout;
 int64_t start = 0;
 
@@ -566,33 +561,39 @@ bool aio_poll(AioContext *ctx, bool blocking)
  */
 assert(in_aio_context_home_thread(ctx));
 
-/* aio_notify can avoid the expensive event_notifier_set if
+qemu_lockcnt_inc(&ctx->list_lock);
+
+if (ctx->poll_max_ns) {
+start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+}
+
+timeout = blocking ? aio_compute_timeout(ctx) : 0;
+progress = try_poll_mode(ctx, &timeout);
+assert(!(timeout && progress));
+
+/*
+ * aio_notify can avoid the expensive event_notifier_set if
  * everything (file descriptors, bottom halves, timers) will
  * be re-evaluated before the next blocking poll().  This is
  * already true when aio_poll is called with blocking == false;
  * if blocking == true, it is only true after poll() returns,
  * so disable the optimization now.
  */
-if (blocking) {
+use_notify_me = timeout != 0;
+if (use_notify_me) {
 atomic_set(&ctx->notify_me, atomic_read(&ctx->notify_me) + 2);
 /*
- * Write ctx->notify_me before computing the timeout
- * (reading bottom half flags, etc.).  Pairs with
+ * Write ctx->notify_me before reading ctx->notified.  Pairs with
  * smp_mb in aio_notify().
  */
 smp_mb();
-}
-
-qemu_lockcnt_inc(&ctx->list_lock);
 
-if (ctx->poll_max_ns) {
-start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+/* Don't block if aio_notify() was called */
+if (atomic_read(&ctx->notified)) {
+timeout = 0;
+}
 }
 
-timeout = blocking ? aio_compute_timeout(ctx) : 0;
-progress = try_poll_mode(ctx, &timeout);
-assert(!(timeout && progress));
-
 /* If polling is allowed, non-blocking aio_poll does not need the
  * system call---a single round of run_poll_handlers_once suffices.
  */
@@ -600,12 +601,14 @@ bool aio_poll(AioContext *ctx, bool blocking)
 ret = ctx->fdmon_ops->wait(ctx, &ready_list, timeout);
 }
 
-if (blocking) {
+if (use_notify_me) {
 /* Finish the poll before clearing the flag.  */
-atomic_store_release(&ctx->notify_me, atomic_read(&ctx->notify_me) - 
2);
-aio_notify_accept(ctx);
+atomic_store_release(&ctx->notify_me,
+ atomic_read(&ctx->notify_me) - 2);
 }
 
+aio_notify_accept(ctx);
+
 /* Adjust polling time */
 if (ctx->poll_max_ns) {
 int64_t block_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
-- 
2.26.2



[PULL 0/3] Block patches

2020-08-17 Thread Stefan Hajnoczi
The following changes since commit d0ed6a69d399ae193959225cdeaa9382746c91cc:

  Update version for v5.1.0 release (2020-08-11 17:07:03 +0100)

are available in the Git repository at:

  https://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 44277bf914471962c9e88e09c859aae65ae109c4:

  aio-posix: keep aio_notify_me disabled during polling (2020-08-13 13:34:14 +0100)


Pull request



Stefan Hajnoczi (3):
  async: rename event_notifier_dummy_cb/poll()
  async: always set ctx->notified in aio_notify()
  aio-posix: keep aio_notify_me disabled during polling

 util/aio-posix.c | 47 +--
 util/async.c | 36 +++-
 2 files changed, 48 insertions(+), 35 deletions(-)

-- 
2.26.2



Re: [RFC PATCH 11/22] qemu-nbd: Use blk_exp_add() to create the export

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 16:27, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
> > With this change, NBD exports are only created through the BlockExport
> > interface any more. This allows us finally to move things from the NBD
> > layer to the BlockExport layer if they make sense for other export
> > types, too.
> 
> I see.
> 
> > blk_exp_add() returns only a weak reference, so the explicit
> > nbd_export_put() goes away.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  include/block/export.h |  2 ++
> >  include/block/nbd.h|  1 +
> >  block/export/export.c  |  2 +-
> >  blockdev-nbd.c |  8 +++-
> >  qemu-nbd.c | 28 ++--
> >  5 files changed, 33 insertions(+), 8 deletions(-)
> 
> [...]
> 
> > diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> > index d5b084acc2..8dd127af52 100644
> > --- a/blockdev-nbd.c
> > +++ b/blockdev-nbd.c
> 
> [...]
> 
> > @@ -176,7 +182,7 @@ BlockExport *nbd_export_create(BlockExportOptions 
> > *exp_args, Error **errp)
> >  
> >  assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
> >  
> > -if (!nbd_server) {
> > +if (!nbd_server && !is_qemu_nbd) {
> 
> (This begs the question of how difficult it would be to let qemu-nbd use
> QMP’s nbd-server-start, but I will not ask it, for I fear the answer.)

(It would probably include something along the lines of "patches
welcome". (I initially wanted to do this, but came to the conclusion
that it wasn't for this series when I noticed the socket activation
code and discussed with danpb on IRC how to integrate it in
SocketAddress.))

> >  error_setg(errp, "NBD server not running");
> >  return NULL;
> >  }
> > diff --git a/qemu-nbd.c b/qemu-nbd.c
> > index 48aa8a9d46..d967b8fcb9 100644
> > --- a/qemu-nbd.c
> > +++ b/qemu-nbd.c
> 
> [...]
> 
> > @@ -1050,9 +1050,27 @@ int main(int argc, char **argv)
> >  
> >  bs->detect_zeroes = detect_zeroes;
> >  
> > -export = nbd_export_new(bs, export_name,
> > -export_description, bitmap, readonly, shared > 
> > 1,
> > -writethrough, &error_fatal);
> > +nbd_server_is_qemu_nbd(true);
> > +
> > +export_opts = g_new(BlockExportOptions, 1);
> > +*export_opts = (BlockExportOptions) {
> > +.type   = BLOCK_EXPORT_TYPE_NBD,
> > +.has_writethrough   = true,
> > +.writethrough   = writethrough,
> > +.u.nbd = {
> > +.device = g_strdup(bdrv_get_node_name(bs)),
> > +.has_name   = true,
> > +.name   = g_strdup(export_name),
> > +.has_description= !!export_description,
> > +.description= g_strdup(export_description),
> > +.has_writable   = true,
> > +.writable   = !readonly,
> > +.has_bitmap = !!bitmap,
> > +.bitmap = g_strdup(bitmap),
> > +},
> > +};
> > +blk_exp_add(export_opts, &error_fatal);
> 
> Why not use the already-global qmp_block_export_add(), if we don’t need
> the return value here?  (Will we require it at some point?)

I can do that. I needed the return value originally, but after some
reshuffling of the series it magically disappeared.

Kevin




Re: [RFC PATCH 15/22] block/export: Move device to BlockExportOptions

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Every block export needs a block node to export, so move the 'device'
> option from BlockExportOptionsNbd to BlockExportOptions.
> 
> To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs
> to be wrapped by a new type NbdServerAddOptions that adds 'device' back
> because nbd-server-add doesn't use the BlockExportOptions base type.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 27 +--
>  block/export/export.c  | 26 --
>  block/monitor/block-hmp-cmds.c |  6 +++---
>  blockdev-nbd.c |  4 ++--
>  qemu-nbd.c |  2 +-
>  5 files changed, 47 insertions(+), 18 deletions(-)

(Code diff looks good, just a question on the interface:)

> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index 4ce163411f..d68f3bf87e 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json

[...]

> @@ -167,6 +179,8 @@
>  # Describes a block export, i.e. how single node should be exported on an
>  # external interface.
>  #
> +# @device: The device name or node name of the node to be exported
> +#

Wouldn’t it be better to restrict ourselves to a node name here?
(Bluntly ignoring the fact that doing so would make this patch an
incompatible change, which would require some reordering in this series,
unless we decide to just ignore that intra-series incompatibility.)

OTOH...  What does “better” mean.  It won’t hurt anyone to keep this as
@device.  It’s just that I feel like if we had no legacy burden and did
this all from scratch, we would just make it @node-name.

Did you deliberately decide against @node-name?

Max





Re: [RFC PATCH 15/22] block/export: Move device to BlockExportOptions

2020-08-17 Thread Max Reitz
On 17.08.20 17:27, Kevin Wolf wrote:
> On 17.08.2020 at 17:13, Max Reitz wrote:
>> On 13.08.20 18:29, Kevin Wolf wrote:
>>> Every block export needs a block node to export, so move the 'device'
>>> option from BlockExportOptionsNbd to BlockExportOptions.
>>>
>>> To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs
>>> to be wrapped by a new type NbdServerAddOptions that adds 'device' back
>>> because nbd-server-add doesn't use the BlockExportOptions base type.
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  qapi/block-export.json | 27 +--
>>>  block/export/export.c  | 26 --
>>>  block/monitor/block-hmp-cmds.c |  6 +++---
>>>  blockdev-nbd.c |  4 ++--
>>>  qemu-nbd.c |  2 +-
>>>  5 files changed, 47 insertions(+), 18 deletions(-)
>>
>> (Code diff looks good, just a question on the interface:)
>>
>>> diff --git a/qapi/block-export.json b/qapi/block-export.json
>>> index 4ce163411f..d68f3bf87e 100644
>>> --- a/qapi/block-export.json
>>> +++ b/qapi/block-export.json
>>
>> [...]
>>
>>> @@ -167,6 +179,8 @@
>>>  # Describes a block export, i.e. how single node should be exported on an
>>>  # external interface.
>>>  #
>>> +# @device: The device name or node name of the node to be exported
>>> +#
>>
>> Wouldn’t it be better to restrict ourselves to a node name here?
>> (Bluntly ignoring the fact that doing so would make this patch an
>> incompatible change, which would require some reordering in this series,
>> unless we decide to just ignore that intra-series incompatibility.)
> 
> We already have intra-series incompatibility, so I wouldn't mind that.
> 
>> OTOH...  What does “better” mean.  It won’t hurt anyone to keep this as
>> @device.  It’s just that I feel like if we had no legacy burden and did
>> this all from scratch, we would just make it @node-name.
>>
>> Did you deliberately decide against @node-name?
> 
> At first I thought I could still share code between nbd-server-add and
> block-export-add, but that's not the case. Then I guess the only other
> reason I have is consistency with other QMP commands. I won't pretend
> it's a strong one.

If “using @node-name would be more natural” doesn’t resonate with you,
then I suppose we should just leave it as @device.  It isn’t harder to
handle for qemu, and maybe it makes transitioning easier for some users
(although I can’t quite imagine how).

Max





Re: [RFC PATCH 13/22] block/export: Move refcount from NBDExport to BlockExport

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Having a refcount makes sense for all types of block exports. It is also
> a prerequisite for keeping a list of all exports at the BlockExport
> level.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/export.h | 10 ++
>  include/block/nbd.h|  2 --
>  block/export/export.c  | 14 
>  blockdev-nbd.c |  2 +-
>  nbd/server.c   | 72 +++---
>  5 files changed, 58 insertions(+), 42 deletions(-)

[...]

> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 8dd127af52..a8b7b785e7 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -232,7 +232,7 @@ BlockExport *nbd_export_create(BlockExportOptions 
> *exp_args, Error **errp)
>  /* The list of named exports has a strong reference to this export now 
> and
>   * our only way of accessing it is through nbd_export_find(), so we can 
> drop
>   * the strong reference that is @exp. */
> -nbd_export_put(exp);
> +blk_exp_unref((BlockExport*) exp);

:/

Less so because of the asterisk, but more so because of “another
instance of a cast because we can’t access a BlockExport’s fields”.

>   out:
>  aio_context_release(aio_context);
> diff --git a/nbd/server.c b/nbd/server.c
> index 4c594e6558..2bf30bb731 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c

[...]

> @@ -1537,7 +1536,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs,
>  
>  exp = g_new0(NBDExport, 1);
>  exp->common = (BlockExport) {
> -.drv = &blk_exp_nbd,
> +.drv        = &blk_exp_nbd,
> +.refcount   = 1,
>  };

This makes me wish...  Ah, for patch 16, I see. :)

Reviewed-by: Max Reitz 





Re: [RFC PATCH 14/22] block/export: Move AioContext from NBDExport to BlockExport

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 16:56, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
> > Signed-off-by: Kevin Wolf 
> > ---
> >  include/block/export.h |  6 ++
> >  nbd/server.c   | 26 +-
> >  2 files changed, 19 insertions(+), 13 deletions(-)
> 
> Reviewed-by: Max Reitz 
> 
> > diff --git a/include/block/export.h b/include/block/export.h
> > index f44290a4a2..5459f79469 100644
> > --- a/include/block/export.h
> > +++ b/include/block/export.h
> > @@ -33,6 +33,12 @@ struct BlockExport {
> >   * the export.
> >   */
> >  int refcount;
> > +
> > +/*
> > + * The AioContex whose lock needs to be held while calling
> 
> *AioContext
> 
> > + * BlockExportDriver callbacks.
> 
> Hm.  But other blk_exp_* functions (i.e. the refcount manipulation
> functions) are fair game?

Hmm... The assumption was that ref/unref are only called from the main
thread, but maybe that's not true? So maybe blk_exp_*() shouldn't lock
the AioContext internally, but require that the lock is already held, so
that they can be called both from within the AioContext (where we don't
want to lock a second time) and from the main context.

I also guess we need a separate mutex to protect the exports list if
unref can be called from different threads.

And probably the existing NBD server code has already the same problems
with respect to different AioContexts.

> > + */
> > +AioContext *ctx;
> >  };
> >  
> >  extern const BlockExportDriver blk_exp_nbd;
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 2bf30bb731..b735a68429 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> 
> [...]
> 
> > @@ -1466,7 +1464,7 @@ static void blk_aio_attached(AioContext *ctx, void 
> > *opaque)
> >  
> >  trace_nbd_blk_aio_attached(exp->name, ctx);
> >  
> > -exp->ctx = ctx;
> > +exp->common.ctx = ctx;
> 
> (Not sure if I’m missing anything in that regard), but perhaps after
> patch 21 we can move this part to the common block export code, and
> maybe make it call a BlockExportDriver callback (that handles the rest
> of this function).

Could probably be done. Not every export driver may support switching
AioContexts, but we can make it conditional on having the callback.

So do I understand right from your comments to the series in general
that you would prefer to make this series more complete, even if that
means that it becomes quite a bit longer?

Kevin




[PULL 2/3] async: always set ctx->notified in aio_notify()

2020-08-17 Thread Stefan Hajnoczi
aio_notify() does not set ctx->notified when called with
ctx->notify_me disabled. Therefore ctx->notify_me needs to be enabled
during polling.

This is suboptimal since expensive event_notifier_set(&ctx->notifier)
and event_notifier_test_and_clear(&ctx->notifier) calls are required
when ctx->notify_me is enabled.

Change aio_notify() so that ctx->notified is always set, regardless of
ctx->notify_me. This will make polling cheaper since
ctx->notify_me can remain disabled. Move the
event_notifier_test_and_clear() to the fd handler function (which is now
no longer an empty function so "dummy" has been dropped from its name).

The next patch takes advantage of this by optimizing polling in
util/aio-posix.c.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20200806131802.569478-3-stefa...@redhat.com

[Paolo Bonzini pointed out that the smp_wmb() in aio_notify_accept()
should be smp_mb() but the comment should say smp_wmb() instead of
smp_mb(). Fixed.
--Stefan]

Signed-off-by: Stefan Hajnoczi 
---
 util/async.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/util/async.c b/util/async.c
index d9f987e133..4266745dee 100644
--- a/util/async.c
+++ b/util/async.c
@@ -419,25 +419,32 @@ LuringState *aio_get_linux_io_uring(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
+/*
+ * Write e.g. bh->flags before writing ctx->notified.  Pairs with smp_mb in
+ * aio_notify_accept.
+ */
+smp_wmb();
+atomic_set(&ctx->notified, true);
+
+/*
+ * Write ctx->notified before reading ctx->notify_me.  Pairs
  * with smp_mb in aio_ctx_prepare or aio_poll.
  */
 smp_mb();
 if (atomic_read(&ctx->notify_me)) {
 event_notifier_set(&ctx->notifier);
-atomic_mb_set(&ctx->notified, true);
 }
 }
 
 void aio_notify_accept(AioContext *ctx)
 {
-if (atomic_xchg(&ctx->notified, false)
-#ifdef WIN32
-|| true
-#endif
-) {
-event_notifier_test_and_clear(&ctx->notifier);
-}
+atomic_set(&ctx->notified, false);
+
+/*
+ * Write ctx->notified before reading e.g. bh->flags.  Pairs with smp_wmb
+ * in aio_notify.
+ */
+smp_mb();
 }
 
 static void aio_timerlist_notify(void *opaque, QEMUClockType type)
@@ -445,8 +452,11 @@ static void aio_timerlist_notify(void *opaque, 
QEMUClockType type)
 aio_notify(opaque);
 }
 
-static void aio_context_notifier_dummy_cb(EventNotifier *e)
+static void aio_context_notifier_cb(EventNotifier *e)
 {
+AioContext *ctx = container_of(e, AioContext, notifier);
+
+event_notifier_test_and_clear(&ctx->notifier);
 }
 
 /* Returns true if aio_notify() was called (e.g. a BH was scheduled) */
@@ -508,7 +518,7 @@ AioContext *aio_context_new(Error **errp)
 
 aio_set_event_notifier(ctx, &ctx->notifier,
false,
-   aio_context_notifier_dummy_cb,
+   aio_context_notifier_cb,
aio_context_notifier_poll);
 #ifdef CONFIG_LINUX_AIO
 ctx->linux_aio = NULL;
-- 
2.26.2



Re: [RFC PATCH 14/22] block/export: Move AioContext from NBDExport to BlockExport

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/export.h |  6 ++
>  nbd/server.c   | 26 +-
>  2 files changed, 19 insertions(+), 13 deletions(-)

Reviewed-by: Max Reitz 

> diff --git a/include/block/export.h b/include/block/export.h
> index f44290a4a2..5459f79469 100644
> --- a/include/block/export.h
> +++ b/include/block/export.h
> @@ -33,6 +33,12 @@ struct BlockExport {
>   * the export.
>   */
>  int refcount;
> +
> +/*
> + * The AioContex whose lock needs to be held while calling

*AioContext

> + * BlockExportDriver callbacks.

Hm.  But other blk_exp_* functions (i.e. the refcount manipulation
functions) are fair game?

> + */
> +AioContext *ctx;
>  };
>  
>  extern const BlockExportDriver blk_exp_nbd;
> diff --git a/nbd/server.c b/nbd/server.c
> index 2bf30bb731..b735a68429 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c

[...]

> @@ -1466,7 +1464,7 @@ static void blk_aio_attached(AioContext *ctx, void 
> *opaque)
>  
>  trace_nbd_blk_aio_attached(exp->name, ctx);
>  
> -exp->ctx = ctx;
> +exp->common.ctx = ctx;

(Not sure if I’m missing anything in that regard), but perhaps after
patch 21 we can move this part to the common block export code, and
maybe make it call a BlockExportDriver callback (that handles the rest
of this function).

>  QTAILQ_FOREACH(client, &exp->clients, next) {
>  qio_channel_attach_aio_context(client->ioc, ctx);
> @@ -1484,13 +1482,13 @@ static void blk_aio_detach(void *opaque)
>  NBDExport *exp = opaque;
>  NBDClient *client;
>  
> -trace_nbd_blk_aio_detach(exp->name, exp->ctx);
> +trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
>  
>  QTAILQ_FOREACH(client, &exp->clients, next) {
>  qio_channel_detach_aio_context(client->ioc);
>  }
>  
> -exp->ctx = NULL;
> +exp->common.ctx = NULL;

(same here)

Max





Re: [RFC PATCH 11/22] qemu-nbd: Use blk_exp_add() to create the export

2020-08-17 Thread Max Reitz
On 17.08.20 16:27, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
>> With this change, NBD exports are only created through the BlockExport
>> interface any more. This allows us finally to move things from the NBD
>> layer to the BlockExport layer if they make sense for other export
>> types, too.
> 
> I see.
> 
>> blk_exp_add() returns only a weak reference, so the explicit
>> nbd_export_put() goes away.
>>
>> Signed-off-by: Kevin Wolf 
>> ---
>>  include/block/export.h |  2 ++
>>  include/block/nbd.h|  1 +
>>  block/export/export.c  |  2 +-
>>  blockdev-nbd.c |  8 +++-
>>  qemu-nbd.c | 28 ++--
>>  5 files changed, 33 insertions(+), 8 deletions(-)
> 
> [...]
> 
>> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
>> index d5b084acc2..8dd127af52 100644
>> --- a/blockdev-nbd.c
>> +++ b/blockdev-nbd.c
> 
> [...]
> 
>> @@ -176,7 +182,7 @@ BlockExport *nbd_export_create(BlockExportOptions 
>> *exp_args, Error **errp)
>>  
>>  assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
>>  
>> -if (!nbd_server) {
>> +if (!nbd_server && !is_qemu_nbd) {
> 
> (This begs the question of how difficult it would be to let qemu-nbd use
> QMP’s nbd-server-start, but I will not ask it, for I fear the answer.)
> 
>>  error_setg(errp, "NBD server not running");
>>  return NULL;
>>  }
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index 48aa8a9d46..d967b8fcb9 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
> 
> [...]
> 
>> @@ -1050,9 +1050,27 @@ int main(int argc, char **argv)
>>  
>>  bs->detect_zeroes = detect_zeroes;
>>  
>> -export = nbd_export_new(bs, export_name,
>> -export_description, bitmap, readonly, shared > 
>> 1,
>> -writethrough, &error_fatal);
>> +nbd_server_is_qemu_nbd(true);
>> +
>> +export_opts = g_new(BlockExportOptions, 1);
>> +*export_opts = (BlockExportOptions) {
>> +.type   = BLOCK_EXPORT_TYPE_NBD,
>> +.has_writethrough   = true,
>> +.writethrough   = writethrough,
>> +.u.nbd = {
>> +.device = g_strdup(bdrv_get_node_name(bs)),
>> +.has_name   = true,
>> +.name   = g_strdup(export_name),
>> +.has_description= !!export_description,
>> +.description= g_strdup(export_description),
>> +.has_writable   = true,
>> +.writable   = !readonly,
>> +.has_bitmap = !!bitmap,
>> +.bitmap = g_strdup(bitmap),
>> +},
>> +};
>> +blk_exp_add(export_opts, &error_fatal);
> 
> Why not use the already-global qmp_block_export_add(), if we don’t need
> the return value here?  (Will we require it at some point?)

In the context of patch 13, which adds more blk_exp_* functions, it
makes sense to make blk_exp_add() global, and then to use it here.  So:

Reviewed-by: Max Reitz 





Re: [RFC PATCH 09/22] nbd: Add writethrough to block-export-add

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 15:51, Max Reitz wrote:
> On 17.08.20 15:13, Kevin Wolf wrote:
> > On 17.08.2020 at 14:56, Max Reitz wrote:
> >> On 13.08.20 18:29, Kevin Wolf wrote:
> >>> qemu-nbd allows use of writethrough cache modes, which mean that write
> >>> requests made through NBD will cause a flush before they complete.
> >>> Expose the same functionality in block-export-add.
> >>>
> >>> Signed-off-by: Kevin Wolf 
> >>> ---
> >>>  qapi/block-export.json | 7 ++-
> >>>  blockdev-nbd.c | 2 +-
> >>>  2 files changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/qapi/block-export.json b/qapi/block-export.json
> >>> index 1fdc55c53a..4ce163411f 100644
> >>> --- a/qapi/block-export.json
> >>> +++ b/qapi/block-export.json
> >>> @@ -167,10 +167,15 @@
> >>>  # Describes a block export, i.e. how single node should be exported on an
> >>>  # external interface.
> >>>  #
> >>> +# @writethrough: If true, caches are flushed after every write request 
> >>> to the
> >>> +#export before completion is signalled. (since: 5.2;
> >>> +#default: false)
> >>> +#
> >>>  # Since: 4.2
> >>>  ##
> >>>  { 'union': 'BlockExportOptions',
> >>> -  'base': { 'type': 'BlockExportType' },
> >>> +  'base': { 'type': 'BlockExportType',
> >>> +'*writethrough': 'bool' },
> >>>'discriminator': 'type',
> >>>'data': {
> >>>'nbd': 'BlockExportOptionsNbd'
> >>
> >> Hm.  I find it weird to have @writethrough in the base but @device in
> >> the specialized class.
> >>
> >> I think everything that will be common to all block exports should be in
> >> the base, and that probably includes a node-name.  I’m aware that will
> >> make things more tedious in the code, but perhaps it would be a nicer
> >> interface in the end.  Or is the real problem that that would create
> >> problems in the storage daemon’s command line interface, because then
> >> the specialized (legacy) NBD interface would no longer be compatible
> >> with the new generalized block export interface?
> > 
> > Indeed. I think patch 15 has what you're looking for.
> 
> Great. :)
> 
> Discussions where both participants have the same opinion from the
> start are the best ones.

Makes things a lot easier.

Maybe I should try to move patch 15 earlier. The series is mostly just
in the order that I wrote things, but there were also a few nasty
dependencies in the part the generalises things from NBD to BlockExport.
So I'm not sure if this is a patch that can be moved.

> >> Anyway, @writable might be a similar story.  A @read-only may make sense
> >> in general, I think.
> > 
> > Pulling @writable up is easier than a @read-only, but that's a naming
> > detail.
> 
> Sure.
> 
> > In general I agree, but this part isn't addressed in this series yet.
> > Part of the reason why this is an RFC was to find out if I should
> > include things like this or if we'll do it when we add another export
> > type or common functionality that needs the same option.
> 
> Sure, sure.

So should I or not? :-)

> Meta: I personally don’t like RFC patches very much.  RFC to me means
> everything is fair game, and reviewers should be free to let their
> thoughts wander and come up with perhaps wild ideas, just trying to
> gauge what everyone thinks.
> 
> When I’m the submitter, I tend to get defensive then, because I’ve
> invested time in writing the code already, so I tend to be biased
> against fundamental changes.  (Horrible personal trait.  I’m working
> on it.)

This makes sense. Nobody likes having to rewrite their RFC series.

But there is one thing I dread even more: Polishing the RFC series for
another week until I can send it out as non-RFC and _then_ having to
rewrite it.

> As a reviewer, the code and thus some fleshed out design is there
> already, so it’s difficult to break free from that and find completely
> different solutions to the original problem.
> (I kind of ventured in that direction for this patch, and it seems like
> you immediately noticed that my response was different from usual and
> pointed out the RFC status, perhaps to make me feel more comfortable in
> questioning the fundamental design more.  Which I noticed, hence this
> wall of text.)

Basically just telling you that I was already interested in your input
for this point specifically when I sent the series.

> Perhaps I’m wrong.  Perhaps it’s just myself (the points I’ve just
> listed are definitely my own personal weaknesses), but I can’t help but
> project and assume that others may feel similar, at least in part.
> So I feel like RFCs that consist of patches tend to at least lock me in
> to the solution that’s present.  I find them difficult to handle, both
> as a submitter and as a reviewer.
> 
> All in all, that means on either side I tend to handle patch RFCs as
> “Standard series, just tests missing”.  Not sure if that’s ideal.  Or
> maybe that’s exactly what patch RFCs are?
> 
> (Of course, it can and should be argued that even 

Re: [RFC PATCH 12/22] nbd/server: Simplify export shutdown

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Closing an export is somewhat convoluted because nbd_export_close() and
> nbd_export_put() call each other and the ways they actually end up being
> nested is not necessarily obvious.
> 
> However, it is not really necessary to call nbd_export_close() from
> nbd_export_put() when putting the last reference because it only does
> three things:
> 
> 1. Close all clients. We're going to refcount 0 and all clients hold a
>reference, so we know there is no active client any more.
> 
> 2. Close the user reference (represented by exp->name being non-NULL).
>The same argument applies: If the export were still named, we would
>still have a reference.
> 
> 3. Freeing exp->description. This is really cleanup work to be done when
>the export is finally freed. There is no reason to already clear it
>while clients are still in the process of shutting down.

Convincing.

> So after moving the cleanup of exp->description, the code can be
> simplified so that only nbd_export_close() calls nbd_export_put(), but
> never the other way around.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  nbd/server.c | 17 -
>  1 file changed, 4 insertions(+), 13 deletions(-)

Reviewed-by: Max Reitz 





Re: [RFC PATCH 11/22] qemu-nbd: Use blk_exp_add() to create the export

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> With this change, NBD exports are only created through the BlockExport
> interface any more. This allows us finally to move things from the NBD
> layer to the BlockExport layer if they make sense for other export
> types, too.

I see.

> blk_exp_add() returns only a weak reference, so the explicit
> nbd_export_put() goes away.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/export.h |  2 ++
>  include/block/nbd.h|  1 +
>  block/export/export.c  |  2 +-
>  blockdev-nbd.c |  8 +++-
>  qemu-nbd.c | 28 ++--
>  5 files changed, 33 insertions(+), 8 deletions(-)

[...]

> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index d5b084acc2..8dd127af52 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c

[...]

> @@ -176,7 +182,7 @@ BlockExport *nbd_export_create(BlockExportOptions 
> *exp_args, Error **errp)
>  
>  assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
>  
> -if (!nbd_server) {
> +if (!nbd_server && !is_qemu_nbd) {

(This begs the question of how difficult it would be to let qemu-nbd use
QMP’s nbd-server-start, but I will not ask it, for I fear the answer.)

>  error_setg(errp, "NBD server not running");
>  return NULL;
>  }
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 48aa8a9d46..d967b8fcb9 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c

[...]

> @@ -1050,9 +1050,27 @@ int main(int argc, char **argv)
>  
>  bs->detect_zeroes = detect_zeroes;
>  
> -export = nbd_export_new(bs, export_name,
> -export_description, bitmap, readonly, shared > 1,
> -writethrough, &error_fatal);
> +nbd_server_is_qemu_nbd(true);
> +
> +export_opts = g_new(BlockExportOptions, 1);
> +*export_opts = (BlockExportOptions) {
> +.type   = BLOCK_EXPORT_TYPE_NBD,
> +.has_writethrough   = true,
> +.writethrough   = writethrough,
> +.u.nbd = {
> +.device = g_strdup(bdrv_get_node_name(bs)),
> +.has_name   = true,
> +.name   = g_strdup(export_name),
> +.has_description= !!export_description,
> +.description= g_strdup(export_description),
> +.has_writable   = true,
> +.writable   = !readonly,
> +.has_bitmap = !!bitmap,
> +.bitmap = g_strdup(bitmap),
> +},
> +};
> +blk_exp_add(export_opts, &error_fatal);

Why not use the already-global qmp_block_export_add(), if we don’t need
the return value here?  (Will we require it at some point?)

Max

> +qapi_free_BlockExportOptions(export_opts);
>  
>  if (device) {
>  #if HAVE_NBD_DEVICE





Re: Choice of BDRV_REQUEST_MAX_SECTORS

2020-08-17 Thread Peter Lieven

On 17.08.20 at 15:44, Eric Blake wrote:

On 8/17/20 7:32 AM, Peter Lieven wrote:

Hi,


I am currently debugging a performance issue in qemu-img convert. I think I 
have found the cause and will send a patch later.

But is there any reason why BDRV_REQUEST_MAX_SECTORS is not at least aligned 
down to 8 (4k sectors)?

Any operation that is not able to determine an optimal or maximum request 
size from a driver will likely use this value and

might generate unaligned requests (as qemu-img convert does) as soon as the 
first iteration of an operation is done.


Vladimir's work to make the block layer 64-bit clean should help resolve this, 
as we will be relying more heavily on values aligned to the driver's limits 
rather than arbitrary 32-bit cutoffs.



Would it be worth aligning this constant if it is not at all eliminated by 
Vladimir's work?


Peter






Re: [RFC PATCH 10/22] nbd: Remove NBDExport.close callback

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> The export close callback is unused by the built-in NBD server. qemu-nbd
> uses it only during shutdown to wait for the unrefed export to actually
> go away. It can just use nbd_export_close_all() instead and do without
> the callback.
> 
> This removes the close callback from nbd_export_new() and makes both
> callers of it more similar.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h |  3 +--
>  blockdev-nbd.c  |  2 +-
>  nbd/server.c|  9 +
>  qemu-nbd.c  | 14 --
>  4 files changed, 7 insertions(+), 21 deletions(-)

Reviewed-by: Max Reitz 





Re: [RFC PATCH 09/22] nbd: Add writethrough to block-export-add

2020-08-17 Thread Max Reitz
On 17.08.20 15:13, Kevin Wolf wrote:
> On 17.08.2020 at 14:56, Max Reitz wrote:
>> On 13.08.20 18:29, Kevin Wolf wrote:
>>> qemu-nbd allows use of writethrough cache modes, which mean that write
>>> requests made through NBD will cause a flush before they complete.
>>> Expose the same functionality in block-export-add.
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  qapi/block-export.json | 7 ++-
>>>  blockdev-nbd.c | 2 +-
>>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/qapi/block-export.json b/qapi/block-export.json
>>> index 1fdc55c53a..4ce163411f 100644
>>> --- a/qapi/block-export.json
>>> +++ b/qapi/block-export.json
>>> @@ -167,10 +167,15 @@
>>>  # Describes a block export, i.e. how single node should be exported on an
>>>  # external interface.
>>>  #
>>> +# @writethrough: If true, caches are flushed after every write request to 
>>> the
>>> +#export before completion is signalled. (since: 5.2;
>>> +#default: false)
>>> +#
>>>  # Since: 4.2
>>>  ##
>>>  { 'union': 'BlockExportOptions',
>>> -  'base': { 'type': 'BlockExportType' },
>>> +  'base': { 'type': 'BlockExportType',
>>> +'*writethrough': 'bool' },
>>>'discriminator': 'type',
>>>'data': {
>>>'nbd': 'BlockExportOptionsNbd'
>>
>> Hm.  I find it weird to have @writethrough in the base but @device in
>> the specialized class.
>>
>> I think everything that will be common to all block exports should be in
>> the base, and that probably includes a node-name.  I’m aware that will
>> make things more tedious in the code, but perhaps it would be a nicer
>> interface in the end.  Or is the real problem that that would create
>> problems in the storage daemon’s command line interface, because then
>> the specialized (legacy) NBD interface would no longer be compatible
>> with the new generalized block export interface?
> 
> Indeed. I think patch 15 has what you're looking for.

Great. :)

Discussions where both participants have the same opinion from the start
are the best ones.

>> Anyway, @writable might be a similar story.  A @read-only may make sense
>> in general, I think.
> 
> Pulling @writable up is easier than a @read-only, but that's a naming
> detail.

Sure.

> In general I agree, but this part isn't addressed in this series yet.
> Part of the reason why this is an RFC was to find out if I should
> include things like this or if we'll do it when we add another export
> type or common functionality that needs the same option.

Sure, sure.


Meta: I personally don’t like RFC patches very much.  RFC to me means
everything is fair game, and reviewers should be free to let their
thoughts wander and come up with perhaps wild ideas, just trying to
gauge what everyone thinks.

When I’m the submitter, I tend to get defensive then, because I’ve
invested time in writing the code already, so I tend to be biased
against fundamental changes.  (Horrible personal trait.  I’m working on it.)

As a reviewer, the code and thus some fleshed out design is there
already, so it’s difficult to break free from that and find completely
different solutions to the original problem.
(I kind of ventured in that direction for this patch, and it seems like
you immediately noticed that my response was different from usual and
pointed out the RFC status, perhaps to make me feel more comfortable in
questioning the fundamental design more.  Which I noticed, hence this
wall of text.)

Perhaps I’m wrong.  Perhaps it’s just myself (the points I’ve just
listed are definitely my own personal weaknesses), but I can’t help but
project and assume that others may feel similar, at least in part.
So I feel like RFCs that consist of patches tend to at least lock me in
to the solution that’s present.  I find them difficult to handle, both
as a submitter and as a reviewer.

All in all, that means on either side I tend to handle patch RFCs as
“Standard series, just tests missing”.  Not sure if that’s ideal.  Or
maybe that’s exactly what patch RFCs are?

(Of course, it can and should be argued that even for standard series, I
shouldn’t be afraid of questioning the fundamental design still.  But
that’s hard...)


But, well.  The alternative is writing pure design RFCs, and then you
tend to get weeks of slow discussion, drawing everything out.  Which
isn’t ideal either.  Or is that just a baseless prejudice I have?

>> Basically, I think that the export code should be separate from the code
>> setting up the BlockBackend that should be exported, so all options
>> regarding that BB should be common; and those options are @node-name,
>> @writethrough, and @read-only.  (And perhaps other things like
>> @resizable, too, even though that isn’t something to consider for NBD.)
> 
> Do you mean that the BlockBackend should already be created by the
> generic block export layer?

It would certainly be nice, if it were feasible, don’t you think?

We don’t have to bend backwards for it, but maybe it 

Re: Choice of BDRV_REQUEST_MAX_SECTORS

2020-08-17 Thread Eric Blake

On 8/17/20 7:32 AM, Peter Lieven wrote:

Hi,


I am currently debugging a performance issue in qemu-img convert. I 
think I have found the cause and will send a patch later.


But is there any reason why BDRV_REQUEST_MAX_SECTORS is not at least 
aligned down to 8 (4k sectors)?


Any operation that is not able to determine an optimal or maximum 
request size from a driver will likely use this value and


might generate unaligned requests (as qemu-img convert does) as soon as 
the first iteration of an operation is done.


Vladimir's work to make the block layer 64-bit clean should help resolve 
this, as we will be relying more heavily on values aligned to the 
driver's limits rather than arbitrary 32-bit cutoffs.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [RFC PATCH 04/22] block/export: Add BlockExport infrastructure and block-export-add

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 15:19, Max Reitz wrote:
> On 17.08.20 14:45, Kevin Wolf wrote:
> > On 17.08.2020 at 12:03, Max Reitz wrote:
> >> On 13.08.20 18:29, Kevin Wolf wrote:
> >>> We want to have a common set of commands for all types of block exports.
> >>> Currently, this is only NBD, but we're going to add more types.
> >>>
> >>> This patch adds the basic BlockExport and BlockExportDriver structs and
> >>> a QMP command block-export-add that creates a new export based on the
> >>> given BlockExportOptions.
> >>>
> >>> qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().
> >>>
> >>> Signed-off-by: Kevin Wolf 
> >>> ---
> >>>  qapi/block-export.json |  9 ++
> >>>  include/block/export.h | 32 +
> >>>  include/block/nbd.h|  3 +-
> >>>  block/export/export.c  | 57 ++
> >>>  blockdev-nbd.c | 19 -
> >>>  nbd/server.c   | 15 +-
> >>>  Makefile.objs  |  6 ++--
> >>>  block/Makefile.objs|  2 ++
> >>>  block/export/Makefile.objs |  1 +
> >>>  9 files changed, 132 insertions(+), 12 deletions(-)
> >>>  create mode 100644 include/block/export.h
> >>>  create mode 100644 block/export/export.c
> >>>  create mode 100644 block/export/Makefile.objs
> >>
> >> Nothing of too great importance below.  But it’s an RFC, so comments I
> >> will give.
> >>
> >>> diff --git a/block/export/export.c b/block/export/export.c
> >>> new file mode 100644
> >>> index 00..3d0dacb3f2
> >>> --- /dev/null
> >>> +++ b/block/export/export.c
> >>> @@ -0,0 +1,57 @@
> >>> +/*
> >>> + * Common block export infrastructure
> >>> + *
> >>> + * Copyright (c) 2012, 2020 Red Hat, Inc.
> >>> + *
> >>> + * Authors:
> >>> + * Paolo Bonzini 
> >>> + * Kevin Wolf 
> >>> + *
> >>> + * This work is licensed under the terms of the GNU GPL, version 2 or
> >>> + * later.  See the COPYING file in the top-level directory.
> >>> + */
> >>> +
> >>> +#include "qemu/osdep.h"
> >>> +
> >>> +#include "block/export.h"
> >>> +#include "block/nbd.h"
> >>> +#include "qapi/error.h"
> >>> +#include "qapi/qapi-commands-block-export.h"
> >>> +
> >>> +static const BlockExportDriver* blk_exp_drivers[] = {
> >>  ^^
> >> Asterisk placement *cough*
> >>
> >>> +&blk_exp_nbd,
> >>> +};
> >>
> >> Not sure whether I like this better than the block driver way of
> >> registering block drivers with a constructor.  It requires writing less
> >> code, at the expense of making the variable global.  So I think there’s
> >> no good reason to prefer the block driver approach.
> > 
> > I guess I can see one reason why we may want to switch to the
> > registration style eventually: If we we want to make export drivers
> > optional modules which may or may not be present.
> 
> Good point.
> 
> >> Maybe my hesitance comes from the variable being declared (as extern) in
> >> a header file (block/export.h).  I think I would prefer it if we put
> >> that external reference only here in this file.  Would that work, or do
> >> you have other plans that require blk_exp_nbd to be visible outside of
> >> nbd/server.c and this file here?
> > 
> > Hm, do we have precedent for "public, but not really" variables?
> > Normally I expect public symbols to be declared in a header file.
> 
> Hm, yes.
> 
> tl;dr: I was wrong about a local external reference being nicer.  But I
> believe there is a difference between externally-facing header files
> (e.g. block.h) and internal header files (e.g. block_int.h).  I don’t
> know which of those block/export.h is supposed to be.
> 
> (And of course it doesn’t even matter at all, really.)
> 
> 
> non-tl;dr:
> 
> We have a similar case for bdrv_{file,raw,qcow2}, but those are at least
> in a *_int.h.  I can’t say I like that style.
> 
> OK, let me try to figure out what my problem with this is.
> 
> I think if a module (in this case the NBD export code) exports
> something, it should be available in the respective header (i.e., some
> NBD header), not in some other header.  A module’s header should present
> what it exports to the rest of the code.  The export code probably
> doesn’t want to export the NBD driver object, it wants to import it,
> actually.  So if it should be in a header file, it should be in an NBD
> header.
> 
> Now none of our block drivers has a header file for exporting symbols to
> the rest of the block code, which is why their symbols have been put
> into block_int.h.  I think that’s cutting corners, but can be defended
> by saying that block_int.h is not for exporting anything, but just
> collects stuff internal to the block layer, so it kind of fits there.
> 
> (Still, technically, I believe bdrv_{file,raw,qcow2} should be exported
> by each respective block driver in a driver-specific header file.  If
> that isn’t the case, it doesn’t really matter to me whether it’s put
> into a dedicated header file to collect internal stuff (block_int.h) 

Re: [RFC PATCH 07/22] block/export: Remove magic from block-export-add

2020-08-17 Thread Max Reitz
On 17.08.20 14:49, Kevin Wolf wrote:
> On 17.08.2020 at 13:41, Max Reitz wrote:
>> On 13.08.20 18:29, Kevin Wolf wrote:
>>> nbd-server-add tries to be convenient and adds two questionable
>>> features that we don't want to share in block-export-add, even for NBD
>>> exports:
>>>
>>> 1. When requesting a writable export of a read-only device, the export
>>>is silently downgraded to read-only. This should be an error in the
>>>context of block-export-add.
>>>
>>> 2. When using a BlockBackend name, unplugging the device from the guest
>>>will automatically stop the NBD server, too. This may sometimes be
>>>what you want, but it could also be very surprising. Let's keep
>>>things explicit with block-export-add. If the user wants to stop the
>>>export, they should tell us so.
>>>
>>> Move these things into the nbd-server-add QMP command handler so that
>>> they apply only there.
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  include/block/nbd.h   |  3 ++-
>>>  block/export/export.c | 44 ++-
>>>  blockdev-nbd.c| 10 --
>>>  nbd/server.c  | 19 ---
>>>  qemu-nbd.c|  3 +--
>>>  5 files changed, 58 insertions(+), 21 deletions(-)

[...]

>>> +}
>>> +
> >>> +export = blk_exp_add(&export_opts, errp);
>>> +if (!export) {
>>> +return;
>>> +}
>>> +
>>> +/*
>>> + * nbd-server-add removes the export when the named BlockBackend used 
>>> for
>>> + * @device goes away.
>>> + */
>>> +on_eject_blk = blk_by_name(arg->device);
>>> +if (on_eject_blk) {
>>> +nbd_export_set_on_eject_blk(export, on_eject_blk);
>>> +}
>>>  }
>>
>> The longer it gets, the more I think maybe it should be in some NBD file
>> like blockdev-nbd.c after all.
> 
> Fair enough. Though I think blockdev-nbd.c in the root directory is
> something that shouldn't even exist.

Absolutely.  But unless you (or someone™ else) does something about
it, we may as well continue to (ab)use it. O:)

Max





Re: [RFC PATCH 04/22] block/export: Add BlockExport infrastructure and block-export-add

2020-08-17 Thread Max Reitz
On 17.08.20 14:45, Kevin Wolf wrote:
> On 17.08.2020 at 12:03, Max Reitz wrote:
>> On 13.08.20 18:29, Kevin Wolf wrote:
>>> We want to have a common set of commands for all types of block exports.
>>> Currently, this is only NBD, but we're going to add more types.
>>>
>>> This patch adds the basic BlockExport and BlockExportDriver structs and
>>> a QMP command block-export-add that creates a new export based on the
>>> given BlockExportOptions.
>>>
>>> qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  qapi/block-export.json |  9 ++
>>>  include/block/export.h | 32 +
>>>  include/block/nbd.h|  3 +-
>>>  block/export/export.c  | 57 ++
>>>  blockdev-nbd.c | 19 -
>>>  nbd/server.c   | 15 +-
>>>  Makefile.objs  |  6 ++--
>>>  block/Makefile.objs|  2 ++
>>>  block/export/Makefile.objs |  1 +
>>>  9 files changed, 132 insertions(+), 12 deletions(-)
>>>  create mode 100644 include/block/export.h
>>>  create mode 100644 block/export/export.c
>>>  create mode 100644 block/export/Makefile.objs
>>
>> Nothing of too great importance below.  But it’s an RFC, so comments I
>> will give.
>>
>>> diff --git a/block/export/export.c b/block/export/export.c
>>> new file mode 100644
>>> index 00..3d0dacb3f2
>>> --- /dev/null
>>> +++ b/block/export/export.c
>>> @@ -0,0 +1,57 @@
>>> +/*
>>> + * Common block export infrastructure
>>> + *
>>> + * Copyright (c) 2012, 2020 Red Hat, Inc.
>>> + *
>>> + * Authors:
>>> + * Paolo Bonzini 
>>> + * Kevin Wolf 
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>>> + * later.  See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +
>>> +#include "block/export.h"
>>> +#include "block/nbd.h"
>>> +#include "qapi/error.h"
>>> +#include "qapi/qapi-commands-block-export.h"
>>> +
>>> +static const BlockExportDriver* blk_exp_drivers[] = {
>>  ^^
>> Asterisk placement *cough*
>>
>>> +&blk_exp_nbd,
>>> +};
>>
>> Not sure whether I like this better than the block driver way of
>> registering block drivers with a constructor.  It requires writing less
>> code, at the expense of making the variable global.  So I think there’s
>> no good reason to prefer the block driver approach.
> 
> I guess I can see one reason why we may want to switch to the
> registration style eventually: If we want to make export drivers
> optional modules which may or may not be present.

Good point.

>> Maybe my hesitance comes from the variable being declared (as extern) in
>> a header file (block/export.h).  I think I would prefer it if we put
>> that external reference only here in this file.  Would that work, or do
>> you have other plans that require blk_exp_nbd to be visible outside of
>> nbd/server.c and this file here?
> 
> Hm, do we have precedent for "public, but not really" variables?
> Normally I expect public symbols to be declared in a header file.

Hm, yes.

tl;dr: I was wrong about a local external reference being nicer.  But I
believe there is a difference between externally-facing header files
(e.g. block.h) and internal header files (e.g. block_int.h).  I don’t
know which of those block/export.h is supposed to be.

(And of course it doesn’t even matter at all, really.)


non-tl;dr:

We have a similar case for bdrv_{file,raw,qcow2}, but those are at least
in a *_int.h.  I can’t say I like that style.

OK, let me try to figure out what my problem with this is.

I think if a module (in this case the NBD export code) exports
something, it should be available in the respective header (i.e., some
NBD header), not in some other header.  A module’s header should present
what it exports to the rest of the code.  The export code probably
doesn’t want to export the NBD driver object, it wants to import it,
actually.  So if it should be in a header file, it should be in an NBD
header.

Now none of our block drivers has a header file for exporting symbols to
the rest of the block code, which is why their symbols have been put
into block_int.h.  I think that’s cutting corners, but can be defended
by saying that block_int.h is not for exporting anything, but just
collects stuff internal to the block layer, so it kind of fits there.

(Still, technically, I believe bdrv_{file,raw,qcow2} should be exported
by each respective block driver in a driver-specific header file.  If
that isn’t the case, it doesn’t really matter to me whether it’s put
into a dedicated header file to collect internal stuff (block_int.h) or
just imported locally (with an external declaration) where it’s used.
Probably the dedicated header file is cleaner after all, right.)

Maybe block/export.h is the same in that it’s just supposed to collect
symbols used internally by the export code, then it isn’t wrong to put
it 

Re: [RFC PATCH 09/22] nbd: Add writethrough to block-export-add

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 14:56, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
> > qemu-nbd allows use of writethrough cache modes, which mean that write
> > requests made through NBD will cause a flush before they complete.
> > Expose the same functionality in block-export-add.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  qapi/block-export.json | 7 ++-
> >  blockdev-nbd.c | 2 +-
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/qapi/block-export.json b/qapi/block-export.json
> > index 1fdc55c53a..4ce163411f 100644
> > --- a/qapi/block-export.json
> > +++ b/qapi/block-export.json
> > @@ -167,10 +167,15 @@
> >  # Describes a block export, i.e. how single node should be exported on an
> >  # external interface.
> >  #
> > +# @writethrough: If true, caches are flushed after every write request to 
> > the
> > +#export before completion is signalled. (since: 5.2;
> > +#default: false)
> > +#
> >  # Since: 4.2
> >  ##
> >  { 'union': 'BlockExportOptions',
> > -  'base': { 'type': 'BlockExportType' },
> > +  'base': { 'type': 'BlockExportType',
> > +'*writethrough': 'bool' },
> >'discriminator': 'type',
> >'data': {
> >'nbd': 'BlockExportOptionsNbd'
> 
> Hm.  I find it weird to have @writethrough in the base but @device in
> the specialized class.
>
> I think everything that will be common to all block exports should be in
> the base, and that probably includes a node-name.  I’m aware that will
> make things more tedious in the code, but perhaps it would be a nicer
> interface in the end.  Or is the real problem that that would create
> problems in the storage daemon’s command line interface, because then
> the specialized (legacy) NBD interface would no longer be compatible
> with the new generalized block export interface?

Indeed. I think patch 15 has what you're looking for.

> Anyway, @writable might be a similar story.  A @read-only may make sense
> in general, I think.

Pulling @writable up is easier than a @read-only, but that's a naming
detail.

In general I agree, but this part isn't addressed in this series yet.
Part of the reason why this is an RFC was to find out if I should
include things like this or if we'll do it when we add another export
type or common functionality that needs the same option.

> Basically, I think that the export code should be separate from the code
> setting up the BlockBackend that should be exported, so all options
> regarding that BB should be common; and those options are @node-name,
> @writethrough, and @read-only.  (And perhaps other things like
> @resizable, too, even though that isn’t something to consider for NBD.)

Do you mean that the BlockBackend should already be created by the
generic block export layer?
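
(To make sure we're talking about the same thing, I imagine something
roughly like this in blk_exp_add(); just a sketch, the option names and the
changed create() signature are invented:)

BlockDriverState *bs = bdrv_lookup_bs(NULL, export->node_name, errp);
if (!bs) {
    return NULL;
}

blk = blk_new(bdrv_get_aio_context(bs),
              BLK_PERM_CONSISTENT_READ |
              (export->writable ? BLK_PERM_WRITE : 0),
              BLK_PERM_ALL);
if (blk_insert_bs(blk, bs, errp) < 0) {
    return NULL;
}
blk_set_enable_write_cache(blk, !export->writethrough);

exp = drv->create(blk, export, errp);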

Kevin




Re: [RFC PATCH 08/22] nbd: Add max-connections to nbd-server-start

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 14:37, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
> > This is a QMP equivalent of qemu-nbd's --share option, limiting the
> 
> *--shared
> 
> > maximum number of clients that can attach at the same time.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  qapi/block-export.json | 10 --
> >  include/block/nbd.h|  3 ++-
> >  block/monitor/block-hmp-cmds.c |  2 +-
> >  blockdev-nbd.c | 33 ++---
> >  qemu-storage-daemon.c  |  4 ++--
> >  5 files changed, 39 insertions(+), 13 deletions(-)
> 
> I suppose this is part of this series so that patch 11 can happen?

More like because initially I thought it would be needed for patch 11,
and when I realised that server != export and it's not really needed, I
still didn't want to throw the patch away...

I could make it a patch separate from this series if that's helpful.

Kevin




Re: [RFC PATCH 09/22] nbd: Add writethrough to block-export-add

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> qemu-nbd allows use of writethrough cache modes, which mean that write
> requests made through NBD will cause a flush before they complete.
> Expose the same functionality in block-export-add.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 7 ++-
>  blockdev-nbd.c | 2 +-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index 1fdc55c53a..4ce163411f 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -167,10 +167,15 @@
>  # Describes a block export, i.e. how single node should be exported on an
>  # external interface.
>  #
> +# @writethrough: If true, caches are flushed after every write request to the
> +#export before completion is signalled. (since: 5.2;
> +#default: false)
> +#
>  # Since: 4.2
>  ##
>  { 'union': 'BlockExportOptions',
> -  'base': { 'type': 'BlockExportType' },
> +  'base': { 'type': 'BlockExportType',
> +'*writethrough': 'bool' },
>'discriminator': 'type',
>'data': {
>'nbd': 'BlockExportOptionsNbd'

Hm.  I find it weird to have @writethrough in the base but @device in
the specialized class.

I think everything that will be common to all block exports should be in
the base, and that probably includes a node-name.  I’m aware that will
make things more tedious in the code, but perhaps it would be a nicer
interface in the end.  Or is the real problem that that would create
problems in the storage daemon’s command line interface, because then
the specialized (legacy) NBD interface would no longer be compatible
with the new generalized block export interface?

Anyway, @writable might be a similar story.  A @read-only may make sense
in general, I think.

Basically, I think that the export code should be separate from the code
setting up the BlockBackend that should be exported, so all options
regarding that BB should be common; and those options are @node-name,
@writethrough, and @read-only.  (And perhaps other things like
@resizable, too, even though that isn’t something to consider for NBD.)

Max





Re: [RFC PATCH 07/22] block/export: Remove magic from block-export-add

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 13:41, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
> > nbd-server-add tries to be convenient and adds two questionable
> > features that we don't want to share in block-export-add, even for NBD
> > exports:
> > 
> > 1. When requesting a writable export of a read-only device, the export
> >is silently downgraded to read-only. This should be an error in the
> >context of block-export-add.
> > 
> > 2. When using a BlockBackend name, unplugging the device from the guest
> >will automatically stop the NBD server, too. This may sometimes be
> >what you want, but it could also be very surprising. Let's keep
> >things explicit with block-export-add. If the user wants to stop the
> >export, they should tell us so.
> > 
> > Move these things into the nbd-server-add QMP command handler so that
> > they apply only there.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  include/block/nbd.h   |  3 ++-
> >  block/export/export.c | 44 ++-
> >  blockdev-nbd.c| 10 --
> >  nbd/server.c  | 19 ---
> >  qemu-nbd.c|  3 +--
> >  5 files changed, 58 insertions(+), 21 deletions(-)
> 
> [...]
> 
> > diff --git a/block/export/export.c b/block/export/export.c
> > index 3d0dacb3f2..2d5f92861c 100644
> > --- a/block/export/export.c
> > +++ b/block/export/export.c
> 
> [...]
> 
> > @@ -34,24 +36,56 @@ static const BlockExportDriver 
> > *blk_exp_find_driver(BlockExportType type)
> >  return NULL;
> >  }
> >  
> > -void qmp_block_export_add(BlockExportOptions *export, Error **errp)
> > +static BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
> >  {
> >  const BlockExportDriver *drv;
> >  
> >  drv = blk_exp_find_driver(export->type);
> >  if (!drv) {
> >  error_setg(errp, "No driver found for the requested export type");
> > -return;
> > +return NULL;
> >  }
> >  
> > -drv->create(export, errp);
> > +return drv->create(export, errp);
> > +}
> > +
> > +void qmp_block_export_add(BlockExportOptions *export, Error **errp)
> > +{
> > +blk_exp_add(export, errp);
> >  }
> 
> Interesting.  I would have added it this way from the start then (with a
> note that we’ll need it later).
> 
> >  void qmp_nbd_server_add(BlockExportOptionsNbd *arg, Error **errp)
> >  {
> > -BlockExportOptions export = {
> > +BlockExport *export;
> > +BlockDriverState *bs;
> > +BlockBackend *on_eject_blk;
> > +
> > +BlockExportOptions export_opts = {
> >  .type = BLOCK_EXPORT_TYPE_NBD,
> >  .u.nbd = *arg,
> 
> This copies *arg’s contents...
> 
> >  };
> > -qmp_block_export_add(&export, errp);
> > +
> > +/*
> > + * nbd-server-add doesn't complain when a read-only device should be
> > + * exported as writable, but simply downgrades it. This is an error 
> > with
> > + * block-export-add.
> > + */
> > +bs = bdrv_lookup_bs(arg->device, arg->device, NULL);
> > +if (bs && bdrv_is_read_only(bs)) {
> > +arg->writable = false;
> 
> ...and here you only modify the original *arg, but not
> export_opts.u.nbd.  So I don’t think this will have any effect.

I thought I had tested this... Well, good catch, thanks.

> > +}
> > +
> > +export = blk_exp_add(&export_opts, errp);
> > +if (!export) {
> > +return;
> > +}
> > +
> > +/*
> > + * nbd-server-add removes the export when the named BlockBackend used 
> > for
> > + * @device goes away.
> > + */
> > +on_eject_blk = blk_by_name(arg->device);
> > +if (on_eject_blk) {
> > +nbd_export_set_on_eject_blk(export, on_eject_blk);
> > +}
> >  }
> 
> The longer it gets, the more I think maybe it should be in some NBD file
> like blockdev-nbd.c after all.

Fair enough. Though I think blockdev-nbd.c in the root directory is
something that shouldn't even exist.

But I guess I can just leave the functions where they are and we can
move the file another day.

> [...]
> 
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 92360d1f08..0b84fd30e2 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -1506,11 +1506,22 @@ static void nbd_eject_notifier(Notifier *n, void 
> > *data)
> >  aio_context_release(aio_context);
> >  }
> >  
> > +void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
> > +{
> > +NBDExport *nbd_exp = container_of(exp, NBDExport, common);
> > +assert(exp->drv == &blk_exp_nbd);
> > +
> 
> I think asserting that the nbd_exp->eject_notifier is unused so far
> would make sense (e.g. just checking that eject_notifier_blk is NULL).

Makes sense.
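
I.e. something like (sketch):

void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
{
    NBDExport *nbd_exp = container_of(exp, NBDExport, common);

    assert(exp->drv == &blk_exp_nbd);
    assert(!nbd_exp->eject_notifier_blk);

    /* ... rest as in the patch ... */
}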

Kevin




Re: [RFC PATCH 04/22] block/export: Add BlockExport infrastructure and block-export-add

2020-08-17 Thread Kevin Wolf
On 17.08.2020 at 12:03, Max Reitz wrote:
> On 13.08.20 18:29, Kevin Wolf wrote:
> > We want to have a common set of commands for all types of block exports.
> > Currently, this is only NBD, but we're going to add more types.
> > 
> > This patch adds the basic BlockExport and BlockExportDriver structs and
> > a QMP command block-export-add that creates a new export based on the
> > given BlockExportOptions.
> > 
> > qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  qapi/block-export.json |  9 ++
> >  include/block/export.h | 32 +
> >  include/block/nbd.h|  3 +-
> >  block/export/export.c  | 57 ++
> >  blockdev-nbd.c | 19 -
> >  nbd/server.c   | 15 +-
> >  Makefile.objs  |  6 ++--
> >  block/Makefile.objs|  2 ++
> >  block/export/Makefile.objs |  1 +
> >  9 files changed, 132 insertions(+), 12 deletions(-)
> >  create mode 100644 include/block/export.h
> >  create mode 100644 block/export/export.c
> >  create mode 100644 block/export/Makefile.objs
> 
> Nothing of too great importance below.  But it’s an RFC, so comments I
> will give.
> 
> > diff --git a/block/export/export.c b/block/export/export.c
> > new file mode 100644
> > index 00..3d0dacb3f2
> > --- /dev/null
> > +++ b/block/export/export.c
> > @@ -0,0 +1,57 @@
> > +/*
> > + * Common block export infrastructure
> > + *
> > + * Copyright (c) 2012, 2020 Red Hat, Inc.
> > + *
> > + * Authors:
> > + * Paolo Bonzini 
> > + * Kevin Wolf 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > + * later.  See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +
> > +#include "block/export.h"
> > +#include "block/nbd.h"
> > +#include "qapi/error.h"
> > +#include "qapi/qapi-commands-block-export.h"
> > +
> > +static const BlockExportDriver* blk_exp_drivers[] = {
>  ^^
> Asterisk placement *cough*
> 
> > +&blk_exp_nbd,
> > +};
> 
> Not sure whether I like this better than the block driver way of
> registering block drivers with a constructor.  It requires writing less
> code, at the expense of making the variable global.  So I think there’s
> no good reason to prefer the block driver approach.

I guess I can see one reason why we may want to switch to the
registration style eventually: If we want to make export drivers
optional modules which may or may not be present.
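
A registration-style version could look roughly like this (sketch; the
helper and the init function are invented, block_init() is the existing
constructor macro from qemu/module.h):

static const BlockExportDriver *blk_exp_drivers[BLOCK_EXPORT_TYPE__MAX];
static int blk_exp_driver_count;

void blk_exp_register_driver(const BlockExportDriver *drv)
{
    assert(blk_exp_driver_count < BLOCK_EXPORT_TYPE__MAX);
    blk_exp_drivers[blk_exp_driver_count++] = drv;
}

/* in nbd/server.c */
static void nbd_export_init(void)
{
    blk_exp_register_driver(&blk_exp_nbd);
}
block_init(nbd_export_init);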

> Maybe my hesitance comes from the variable being declared (as extern) in
> a header file (block/export.h).  I think I would prefer it if we put
> that external reference only here in this file.  Would that work, or do
> you have other plans that require blk_exp_nbd to be visible outside of
> nbd/server.c and this file here?

Hm, do we have precedent for "public, but not really" variables?
Normally I expect public symbols to be declared in a header file.

> > +static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
> > +{
> > +int i;
> > +
> > +for (i = 0; i < ARRAY_SIZE(blk_exp_drivers); i++) {
> > +if (blk_exp_drivers[i]->type == type) {
> > +return blk_exp_drivers[i];
> > +}
> > +}
> 
> How bad would it be to define blk_exp_drivers as
> blk_exp_drivers[BLOCK_EXPORT_TYPE__MAX] and use the BlockExportType as
> the driver index so we don’t have to loop here?
> 
> Not that it matters performance-wise.  Just something I wondered.

Might be nicer indeed. It would be incompatible with a registration
model, though, so if we're not sure yet what we want to have in the long
term, maybe the more neutral way is to leave it as it is.

> > +return NULL;
> 
> Why not e.g. g_assert_not_reached()?
> 
> (If the BlockExportType were used as the index, I’d assert that
> type < ARRAY_SIZE(blk_exp_drivers) && blk_exp_drivers[type] != NULL.  I
> don’t think there’s a reason for graceful handling.)

Same thing actually. This works as long as all drivers are always
present.

Now I understand that the current state is somewhat inconsistent in that
it uses a simple array of things that are always present, but has
functions that work as if it were dynamic. I don't mind this
inconsistency very much, but if you do, I guess I could implement a
registration type thing right away.

Kevin




Re: [RFC PATCH 08/22] nbd: Add max-connections to nbd-server-start

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> This is a QMP equivalent of qemu-nbd's --share option, limiting the

*--shared

> maximum number of clients that can attach at the same time.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 10 --
>  include/block/nbd.h|  3 ++-
>  block/monitor/block-hmp-cmds.c |  2 +-
>  blockdev-nbd.c | 33 ++---
>  qemu-storage-daemon.c  |  4 ++--
>  5 files changed, 39 insertions(+), 13 deletions(-)

I suppose this is part of this series so that patch 11 can happen?

Reviewed-by: Max Reitz 





Choice of BDRV_REQUEST_MAX_SECTORS

2020-08-17 Thread Peter Lieven

Hi,


I am currently debugging a performance issue in qemu-img convert. I think I 
have found the cause and will send a patch later.

But is there any reason why BDRV_REQUEST_MAX_SECTORS is not at least aligned 
down to 8 (4k sectors)?

Any operation that is not able to determine an optimal or maximum request 
size from a driver will likely use this value and

might generate unaligned requests (as qemu-img convert does) as soon as the 
first iteration of an operation is done.


Peter





Re: [PATCH v9 4/5] vhost-user block device backend server

2020-08-17 Thread Coiby Xu

On Thu, Jun 18, 2020 at 05:57:40PM +0200, Kevin Wolf wrote:

On 14.06.2020 at 20:39, Coiby Xu wrote:

By making use of libvhost-user, a block device drive can be shared with
the connected vhost-user client. Only one client can connect to the
server at a time.

Since vhost-user-server needs a block drive to be created first, delay
the creation of this object.

Signed-off-by: Coiby Xu 
---
 block/Makefile.objs  |   1 +
 block/export/vhost-user-blk-server.c | 669 +++
 block/export/vhost-user-blk-server.h |  35 ++
 softmmu/vl.c |   4 +
 4 files changed, 709 insertions(+)
 create mode 100644 block/export/vhost-user-blk-server.c
 create mode 100644 block/export/vhost-user-blk-server.h

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 3635b6b4c1..0eb7eff470 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -24,6 +24,7 @@ block-obj-y += throttle-groups.o
 block-obj-$(CONFIG_LINUX) += nvme.o

 block-obj-y += nbd.o
+block-obj-$(CONFIG_LINUX) += export/vhost-user-blk-server.o 
../contrib/libvhost-user/libvhost-user.o
 block-obj-$(CONFIG_SHEEPDOG) += sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(if $(CONFIG_LIBISCSI),y,n) += iscsi-opts.o
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
new file mode 100644
index 00..bbf2ceaa9b
--- /dev/null
+++ b/block/export/vhost-user-blk-server.c
@@ -0,0 +1,669 @@
+/*
+ * Sharing QEMU block devices via vhost-user protocal
+ *
+ * Author: Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "block/block.h"
+#include "vhost-user-blk-server.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/block-backend.h"
+#include "util/block-helpers.h"
+
+enum {
+VHOST_USER_BLK_MAX_QUEUES = 1,
+};
+struct virtio_blk_inhdr {
+unsigned char status;
+};
+
+
+typedef struct VuBlockReq {
+VuVirtqElement *elem;
+int64_t sector_num;
+size_t size;
+struct virtio_blk_inhdr *in;
+struct virtio_blk_outhdr out;
+VuServer *server;
+struct VuVirtq *vq;
+} VuBlockReq;
+
+
+static void vu_block_req_complete(VuBlockReq *req)
+{
+VuDev *vu_dev = &req->server->vu_dev;
+
+/* IO size with 1 extra status byte */
+vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
+vu_queue_notify(vu_dev, req->vq);
+
+if (req->elem) {
+free(req->elem);
+}
+
+g_free(req);
+}
+
+static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
+{
+return container_of(server, VuBlockDev, vu_server);
+}
+
+static int coroutine_fn
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
+  uint32_t iovcnt, uint32_t type)
+{
+struct virtio_blk_discard_write_zeroes desc;
+ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
+if (unlikely(size != sizeof(desc))) {
+error_report("Invalid size %ld, expect %ld", size, sizeof(desc));
+return -EINVAL;
+}
+
+VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
+uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
+  le32_to_cpu(desc.num_sectors) << 9 };
+if (type == VIRTIO_BLK_T_DISCARD) {
+if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
+return 0;
+}
+} else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
+if (blk_co_pwrite_zeroes(vdev_blk->backend,
+ range[0], range[1], 0) == 0) {
+return 0;
+}
+}
+
+return -EINVAL;
+}
+
+
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
+{
+VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
+BlockBackend *backend = vdev_blk->backend;
+blk_co_flush(backend);
+}
+
+
+struct req_data {
+VuServer *server;
+VuVirtq *vq;
+VuVirtqElement *elem;
+};
+
+static void coroutine_fn vu_block_virtio_process_req(void *opaque)
+{
+struct req_data *data = opaque;
+VuServer *server = data->server;
+VuVirtq *vq = data->vq;
+VuVirtqElement *elem = data->elem;
+uint32_t type;
+VuBlockReq *req;
+
+VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
+BlockBackend *backend = vdev_blk->backend;
+
+struct iovec *in_iov = elem->in_sg;
+struct iovec *out_iov = elem->out_sg;
+unsigned in_num = elem->in_num;
+unsigned out_num = elem->out_num;
+/* refer to hw/block/virtio_blk.c */
+if (elem->out_num < 1 || elem->in_num < 1) {
+error_report("virtio-blk request missing headers");
+free(elem);
+return;
+}
+
+req = g_new0(VuBlockReq, 1);
+req->server = server;
+req->vq = vq;
+req->elem = elem;
+
+if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
+sizeof(req->out)) != 

Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Instead of implementing qemu-nbd --offset in the NBD code, just put a
> raw block node with the requested offset on top of the user image and
> rely on that doing the job.
> 
> This does not only simplify the nbd_export_new() interface and bring it
> closer to the set of options that the nbd-server-add QMP command offers,
> but in fact it also eliminates a potential source for bugs in the NBD
> code which previously had to add the offset manually in all relevant
> places.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h |  4 ++--
>  blockdev-nbd.c  |  9 +
>  nbd/server.c| 34 +-
>  qemu-nbd.c  | 27 ---
>  4 files changed, 32 insertions(+), 42 deletions(-)

[...]

> diff --git a/nbd/server.c b/nbd/server.c
> index 774325dbe5..92360d1f08 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c

[...]

> @@ -1569,7 +1574,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, 
> uint64_t dev_offset,
>  exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
>NBD_FLAG_SEND_FAST_ZERO);
>  }
> -assert(size <= INT64_MAX - dev_offset);
> +assert(size <= INT64_MAX);

Forgot to note: I think we can drop this assertion altogether now.

Max





Re: [RFC PATCH 07/22] block/export: Remove magic from block-export-add

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> nbd-server-add tries to be convenient and adds two questionable
> features that we don't want to share in block-export-add, even for NBD
> exports:
> 
> 1. When requesting a writable export of a read-only device, the export
>is silently downgraded to read-only. This should be an error in the
>context of block-export-add.
> 
> 2. When using a BlockBackend name, unplugging the device from the guest
>will automatically stop the NBD server, too. This may sometimes be
>what you want, but it could also be very surprising. Let's keep
>things explicit with block-export-add. If the user wants to stop the
>export, they should tell us so.
> 
> Move these things into the nbd-server-add QMP command handler so that
> they apply only there.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h   |  3 ++-
>  block/export/export.c | 44 ++-
>  blockdev-nbd.c| 10 --
>  nbd/server.c  | 19 ---
>  qemu-nbd.c|  3 +--
>  5 files changed, 58 insertions(+), 21 deletions(-)

[...]

> diff --git a/block/export/export.c b/block/export/export.c
> index 3d0dacb3f2..2d5f92861c 100644
> --- a/block/export/export.c
> +++ b/block/export/export.c

[...]

> @@ -34,24 +36,56 @@ static const BlockExportDriver 
> *blk_exp_find_driver(BlockExportType type)
>  return NULL;
>  }
>  
> -void qmp_block_export_add(BlockExportOptions *export, Error **errp)
> +static BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
>  {
>  const BlockExportDriver *drv;
>  
>  drv = blk_exp_find_driver(export->type);
>  if (!drv) {
>  error_setg(errp, "No driver found for the requested export type");
> -return;
> +return NULL;
>  }
>  
> -drv->create(export, errp);
> +return drv->create(export, errp);
> +}
> +
> +void qmp_block_export_add(BlockExportOptions *export, Error **errp)
> +{
> +blk_exp_add(export, errp);
>  }

Interesting.  I would have added it this way from the start then (with a
note that we’ll need it later).

>  void qmp_nbd_server_add(BlockExportOptionsNbd *arg, Error **errp)
>  {
> -BlockExportOptions export = {
> +BlockExport *export;
> +BlockDriverState *bs;
> +BlockBackend *on_eject_blk;
> +
> +BlockExportOptions export_opts = {
>  .type = BLOCK_EXPORT_TYPE_NBD,
>  .u.nbd = *arg,

This copies *arg’s contents...

>  };
> -qmp_block_export_add(&export, errp);
> +
> +/*
> + * nbd-server-add doesn't complain when a read-only device should be
> + * exported as writable, but simply downgrades it. This is an error with
> + * block-export-add.
> + */
> +bs = bdrv_lookup_bs(arg->device, arg->device, NULL);
> +if (bs && bdrv_is_read_only(bs)) {
> +arg->writable = false;

...and here you only modify the original *arg, but not
export_opts.u.nbd.  So I don’t think this will have any effect.
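
I suppose it would have to modify the copy instead, e.g.:

    bs = bdrv_lookup_bs(arg->device, arg->device, NULL);
    if (bs && bdrv_is_read_only(bs)) {
        export_opts.u.nbd.writable = false;
    }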

> +}
> +
> +export = blk_exp_add(&export_opts, errp);
> +if (!export) {
> +return;
> +}
> +
> +/*
> + * nbd-server-add removes the export when the named BlockBackend used for
> + * @device goes away.
> + */
> +on_eject_blk = blk_by_name(arg->device);
> +if (on_eject_blk) {
> +nbd_export_set_on_eject_blk(export, on_eject_blk);
> +}
>  }

The longer it gets, the more I think maybe it should be in some NBD file
like blockdev-nbd.c after all.

[...]

> diff --git a/nbd/server.c b/nbd/server.c
> index 92360d1f08..0b84fd30e2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1506,11 +1506,22 @@ static void nbd_eject_notifier(Notifier *n, void 
> *data)
>  aio_context_release(aio_context);
>  }
>  
> +void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
> +{
> +NBDExport *nbd_exp = container_of(exp, NBDExport, common);
> +assert(exp->drv == _exp_nbd);
> +

I think asserting that the nbd_exp->eject_notifier is unused so far
would make sense (e.g. just checking that eject_notifier_blk is NULL).
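
I.e. just adding something like this next to the existing assertion:

    assert(!nbd_exp->eject_notifier_blk);   /* not set twice */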

Max

> +blk_ref(blk);
> +nbd_exp->eject_notifier_blk = blk;
> +nbd_exp->eject_notifier.notify = nbd_eject_notifier;
> +blk_add_remove_bs_notifier(blk, &nbd_exp->eject_notifier);
> +}
> +





Re: [PATCH 7/7] hw/scsi/scsi-disk: Replace magic '512' value by BDRV_SECTOR_SIZE

2020-08-17 Thread Kevin Wolf
Am 14.08.2020 um 10:28 hat Philippe Mathieu-Daudé geschrieben:
> Use self-explicit definitions instead of magic '512' value.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

In this patch, BDRV_SECTOR_SIZE actually looks correct to me. The values
have already been converted from s->qdev.blocksize in these instances,
so it's not about the sector size of the virtual disk, but purely about
QEMU's internal handling.

Reviewed-by: Kevin Wolf 




Re: [PATCH 3/7] hw/ide/core: Replace magic '512' value by BDRV_SECTOR_SIZE

2020-08-17 Thread Kevin Wolf
Am 14.08.2020 um 10:28 hat Philippe Mathieu-Daudé geschrieben:
> Use self-explicit definitions instead of magic '512' value.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

BDRV_SECTOR_SIZE is the arbitrary unit in which some block layer
functions and variables work (such as bs->total_sectors). It happens to
be 512.

IDE disks have a sector size, too. Actually, two of them, a physical and
a logical one. The more important one is the logical one. We happen to
emulate only IDE devices for which the logical block size is 512 bytes
(ide_dev_initfn() errors out otherwise).

But just because two constants both happen to be 512 in practice, they
are not the same thing.

So if we want to replace all magic 512 values, we should probably
introduce a new IDE_SECTOR_SIZE and then decide case by case whether
IDE_SECTOR_SIZE or BDRV_SECTOR_SIZE is meant. I think most (if not all)
of the places you converted in this patch need to be IDE_SECTOR_SIZE.
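
E.g. (just a sketch; the name and header placement are made up):

    /* e.g. in include/hw/ide/internal.h */
    #define IDE_SECTOR_SIZE 512

    /* and then in hw/ide/core.c, wherever 512 means the logical sector
     * size of the emulated disk (illustrative hunk, not an actual line
     * from core.c): */
    -    nb_sectors = limit / 512;
    +    nb_sectors = limit / IDE_SECTOR_SIZE;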

Kevin




Re: [PATCH 2/7] hw/ide/core: Trivial typo fix

2020-08-17 Thread Kevin Wolf
Am 14.08.2020 um 10:28 hat Philippe Mathieu-Daudé geschrieben:
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Kevin Wolf 




Re: [PATCH 1/7] block/null: Make more explicit the driver default size is 1GiB

2020-08-17 Thread Kevin Wolf
Am 14.08.2020 um 10:28 hat Philippe Mathieu-Daudé geschrieben:
> As it is not obvious the default size for the null block driver
> is 1 GiB, replace the obfuscated '1 << 30' magic value by a
> definition using IEC binary prefixes.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/null.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/null.c b/block/null.c
> index 15e1d56746..8354def367 100644
> --- a/block/null.c
> +++ b/block/null.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qstring.h"
> @@ -21,6 +22,7 @@
>  
>  #define NULL_OPT_LATENCY "latency-ns"
>  #define NULL_OPT_ZEROES  "read-zeroes"
> +#define NULL_OPT_SIZE(1 * GiB)

Let's use a different naming schema for option names and option default
values, and an empty line between the definition for both. The way this
patch has it, it looks like another option name until you look at the
actual value.
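
I.e. something like this (NULL_DEFAULT_SIZE is just an example name):

    #define NULL_OPT_LATENCY "latency-ns"
    #define NULL_OPT_ZEROES  "read-zeroes"

    #define NULL_DEFAULT_SIZE (1 * GiB)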

Kevin




[PATCH] qemu-img: Explicit number replaced by a constant

2020-08-17 Thread Yi Li
Signed-off-by: Yi Li 
---
 qemu-img.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 5308773811..a0fbc2757c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1181,7 +1181,7 @@ static int64_t find_nonzero(const uint8_t *buf, int64_t n)
 }
 
 /*
- * Returns true iff the first sector pointed to by 'buf' contains at least
+ * Returns true if the first sector pointed to by 'buf' contains at least
  * a non-NUL byte.
  *
  * 'pnum' is set to the number of sectors (including and immediately following
@@ -1200,10 +1200,10 @@ static int is_allocated_sectors(const uint8_t *buf, int 
n, int *pnum,
 *pnum = 0;
 return 0;
 }
-is_zero = buffer_is_zero(buf, 512);
+is_zero = buffer_is_zero(buf, BDRV_SECTOR_SIZE);
 for(i = 1; i < n; i++) {
-buf += 512;
-if (is_zero != buffer_is_zero(buf, 512)) {
+buf += BDRV_SECTOR_SIZE;
+if (is_zero != buffer_is_zero(buf, BDRV_SECTOR_SIZE)) {
 break;
 }
 }
@@ -2489,8 +2489,8 @@ static int img_convert(int argc, char **argv)
 }
 }
 
-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, s.total_sectors * 512,
-&error_abort);
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+s.total_sectors * BDRV_SECTOR_SIZE, &error_abort);
 ret = add_old_style_options(out_fmt, opts, out_baseimg, NULL);
 if (ret < 0) {
 goto out;
-- 
2.25.3






Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Instead of implementing qemu-nbd --offset in the NBD code, just put a
> raw block node with the requested offset on top of the user image and
> rely on that doing the job.
> 
> This does not only simplify the nbd_export_new() interface and bring it
> closer to the set of options that the nbd-server-add QMP command offers,
> but in fact it also eliminates a potential source for bugs in the NBD
> code which previously had to add the offset manually in all relevant
> places.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h |  4 ++--
>  blockdev-nbd.c  |  9 +
>  nbd/server.c| 34 +-
>  qemu-nbd.c  | 27 ---
>  4 files changed, 32 insertions(+), 42 deletions(-)

Reviewed-by: Max Reitz 





Re: [PATCH v3 00/12] preallocate filter

2020-08-17 Thread Vladimir Sementsov-Ogievskiy

Hmm, strange. Probably we need to guard this with #ifdef FUSE_SUPER_MAGIC.
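
Something like this in is_fuse() should do (untested):

    static bool is_fuse(int fd)
    {
    #if defined(__linux__) && defined(FUSE_SUPER_MAGIC)
        struct statfs buf;

        if (fstatfs(fd, &buf) == 0 && buf.f_type == FUSE_SUPER_MAGIC) {
            return true;
        }
    #endif
        return false;
    }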

But again, it's a problem of patch 12, which is just an RFC not meant to be merged.

17.08.2020 12:48, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20200817091553.283155-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

   CC  block/nvme.o
   CC  block/nbd.o
/tmp/qemu-test/src/block/file-posix.c: In function 'is_fuse':
/tmp/qemu-test/src/block/file-posix.c:334:35: error: 'FUSE_SUPER_MAGIC' 
undeclared (first use in this function)
  if (ret == 0 && buf.f_type == FUSE_SUPER_MAGIC) {
^
/tmp/qemu-test/src/block/file-posix.c:334:35: note: each undeclared identifier 
is reported only once for each function it appears in
   CC  block/sheepdog.o
make: *** [block/file-posix.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
   File "./tests/docker/docker.py", line 709, in 
---
 raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=ee7ebe61e5fd4a018bbb2a16ec4a5a54', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-o73ap74n/src/docker-src.2020-08-17-05.46.38.25129:/var/tmp/qemu:z,ro',
 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ee7ebe61e5fd4a018bbb2a16ec4a5a54
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-o73ap74n/src'
make: *** [docker-run-test-quick@centos7] Error 2

real1m49.146s
user0m8.506s


The full log is available at
http://patchew.org/logs/20200817091553.283155-1-vsement...@virtuozzo.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com




--
Best regards,
Vladimir



Re: [PATCH v3 00/12] preallocate filter

2020-08-17 Thread Vladimir Sementsov-Ogievskiy

Aha, if we want commit 11, we'll also need a stub function for win32.
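
E.g. an always-false stub (whether it should live in block/file-win32.c or
in a stubs/ file is a separate question):

    bool bdrv_is_file_on_fuse(BlockDriverState *bs)
    {
        return false;
    }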

17.08.2020 12:45, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20200817091553.283155-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

   LINKqemu-edid.exe
/usr/lib/gcc/x86_64-w64-mingw32/9.2.1/../../../../x86_64-w64-mingw32/bin/ld: 
block/qcow2.o: in function `qcow2_vz_insert_prealloc_filter':
/tmp/qemu-test/src/block/qcow2.c:1878: undefined reference to 
`bdrv_is_file_on_fuse'
collect2: error: ld returned 1 exit status
make: *** [/tmp/qemu-test/src/rules.mak:124: qemu-io.exe] Error 1
make: *** Waiting for unfinished jobs
   GEN x86_64-softmmu/hmp-commands.h
   GEN x86_64-softmmu/hmp-commands-info.h
---
   CC  aarch64-softmmu/hw/arm/xlnx-zynqmp.o
/usr/lib/gcc/x86_64-w64-mingw32/9.2.1/../../../../x86_64-w64-mingw32/bin/ld: 
../block/qcow2.o: in function `qcow2_vz_insert_prealloc_filter':
/tmp/qemu-test/src/block/qcow2.c:1878: undefined reference to 
`bdrv_is_file_on_fuse'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:219: qemu-system-x86_64w.exe] Error 1
make: *** [Makefile:527: x86_64-softmmu/all] Error 2
   CC  aarch64-softmmu/hw/arm/xlnx-zcu102.o
   CC  aarch64-softmmu/hw/arm/xlnx-versal.o
   CC  aarch64-softmmu/hw/arm/xlnx-versal-virt.o
---
   LINKaarch64-softmmu/qemu-system-aarch64w.exe
/usr/lib/gcc/x86_64-w64-mingw32/9.2.1/../../../../x86_64-w64-mingw32/bin/ld: 
../block/qcow2.o: in function `qcow2_vz_insert_prealloc_filter':
/tmp/qemu-test/src/block/qcow2.c:1878: undefined reference to 
`bdrv_is_file_on_fuse'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:219: qemu-system-aarch64w.exe] Error 1
make: *** [Makefile:527: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
   File "./tests/docker/docker.py", line 709, in 
 sys.exit(main())
---
 raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=46c939bfc42848c6bf98acc32e769fb0', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-v9t81w9y/src/docker-src.2020-08-17-05.40.50.16028:/var/tmp/qemu:z,ro',
 'qemu/fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=46c939bfc42848c6bf98acc32e769fb0
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-v9t81w9y/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real4m43.217s
user0m8.945s


The full log is available at
http://patchew.org/logs/20200817091553.283155-1-vsement...@virtuozzo.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com




--
Best regards,
Vladimir



Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-17 Thread Bin Meng
Hi Anup,

On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  wrote:
>
> On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
> >
> > From: Bin Meng 
> >
> > This adds support for Microchip PolarFire SoC Icicle Kit board.
> > The Icicle Kit board integrates a PolarFire SoC, with one SiFive's
> > E51 plus four U54 cores and many on-chip peripherals and an FPGA.
>
> Nice Work !!! This is very helpful.

Thanks!

>
> >
> > For more details about Microchip PolarFire Soc, please see:
> > https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga
> >
> > The Icicle Kit board information can be found here:
> > https://www.microsemi.com/existing-parts/parts/152514
> >
> > Unlike SiFive FU540, the RISC-V core reset vector is at 0x2022.
> > The RISC-V CPU and HART code has been updated to set the core's
> > reset vector based on a configurable property from the machine code.
> >
> > The following peripherals are created as unimplemented devices:
> >
> > - Bus Error Unit 0/1/2/3/4
> > - L2 cache controller
> > - SYSREG
> > - MPUCFG
> > - IOSCBCFG
> > - GPIO
> >
> > The following peripherals are emulated:
> > - SiFive CLINT
> > - SiFive PLIC
> > - PolarFire SoC Multi-Mode UART
> > - PolarFire SoC DMA
> > - Cadence eMMC/SDHCI controller
> > - Cadence Gigabit Ethernet MAC
> >
> > Some bugs in the SD card code are fixed during development.
> >
> > The BIOS image used by this machine is hss.bin, aka Hart Software
> > Services, which can be built from:
> > https://github.com/polarfire-soc/hart-software-services
> >
> > To launch this machine:
> > $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 \
> > -bios path/to/hss.bin -sd path/to/sdcard.img \
> > -nic tap,ifname=tap,script=no,model=cadence_gem \
> > -display none -serial stdio \
> > -chardev socket,id=serial1,path=serial1.sock,server,wait \
> > -serial chardev:serial1
>
> Currently, it is fine to use HSS (with OpenSBI v0.6 as a library) but
> this is not aligned with the existing booting flow of many RISC-V
> systems.

Yep, unfortunately this is the case currently.

>
> It will be nice to have standard U-Boot RISC-V boot-flow working
> on Microchip PolarFire SoC:
> U-Boot SPL (BIOS) => FW_DYNAMIC (Generic) => U-Boot S-mode
>

Agreed.

> The Microchip HSS is quite convoluted. It has:
> 1. DDR Init
> 2. Boot device support
> 3. SBI support using OpenSBI as library
> 4. Simple TEE support
>
> I think point 1) and 2) above should be part of U-Boot SPL.
> The point 3) can be OpenSBI FW_DYNAMIC.
>
> Lastly,for point 4), we are working on a new OpenSBI feature using
> which we can run independent Secure OS and Non-Secure OS using
> U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
> PolarFire).
>
> Do you have plans for adding U-Boot SPL support for this board ??

+ Cyril Jean from Microchip

I will have to leave this question to Cyril to comment.

Regards,
Bin



Re: [RFC PATCH 05/22] qemu-storage-daemon: Use qmp_block_export_add()

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> No reason to duplicate the functionality locally, we can now just reuse
> the QMP command block-export-add for --export.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-storage-daemon.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)

Reviewed-by: Max Reitz 





Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-17 Thread Kevin Wolf
Am 14.08.2020 um 16:57 hat Alberto Garcia geschrieben:
> Hi,
> 
> the patch is self-explanatory, but I'm using the cover letter to raise
> a couple of related questions.
> 
> Since commit c8bb23cbdbe / QEMU 4.1.0 (and if the storage backend
> allows it) writing to an image created with preallocation=metadata can
> be slower (20% in my tests) than writing to an image with no
> preallocation at all.

A while ago we had a case where commit c8bb23cbdbe was actually reported
as a major performance regression, so it's a big "it depends".

XFS people told me that they consider this code a bad idea. Just because
it's a specialised "write zeroes" operation, it's not necessarily fast
on filesystems. In particular, on XFS, ZERO_RANGE causes a queue drain
with O_DIRECT (probably hurts cases with high queue depths) and
additionally even a page cache flush without O_DIRECT.

So in a way this whole thing is a two-edged sword.

> So:
> 
> a) shall we include a warning in the documentation ("note that this
>preallocation mode can result in worse performance")?

To be honest, I don't really understand this case yet. With metadata
preallocation, the clusters are already marked as allocated, so why
would handle_alloc_space() even be called? We're not allocating new
clusters after all?

Or are you saying that ZERO_RANGE + pwrite on a sparse file (= cluster
allocation) is faster for you than just the pwrite alone (= writing to
already allocated cluster)?

> b) why don't we also initialize preallocated clusters with
>QCOW_OFLAG_ZERO? (at least when there's no subclusters involved,
>i.e. no backing file). This would make reading from them (and
>writing to them, after this patch) faster.

Because the idea with metadata preallocation is that you don't have to
perform any COW and update any metadata because everything is already
allocated. If you set the zero flag, you get cluster allocations with
COW again, defeating the whole purpose of the preallocation.

Kevin




Re: [PATCH v2 2/3] hw/sd: sd: Correct the maximum size of a Standard Capacity SD Memory Card

2020-08-17 Thread Philippe Mathieu-Daudé
On 8/17/20 12:03 PM, Bin Meng wrote:
> Per the SD spec, Standard Capacity SD Memory Card (SDSC) supports
> capacity up to and including 2 GiB.
> 
> Fixes: 2d7adea4fe ("hw/sd: Support SDHC size cards")
> Signed-off-by: Bin Meng 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
> 
> Changes in v2:
> - fix SDSC size check in sd_set_csd() too
> 
>  hw/sd/sd.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 3226404..254d713 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -50,6 +50,8 @@
>  
>  //#define DEBUG_SD 1
>  
> +#define SDSC_MAX_CAPACITY   (2 * GiB)
> +
>  typedef enum {
>  sd_r0 = 0,/* no response */
>  sd_r1,/* normal response command */
> @@ -313,7 +315,7 @@ static void sd_ocr_powerup(void *opaque)
>  /* card power-up OK */
>  sd->ocr = FIELD_DP32(sd->ocr, OCR, CARD_POWER_UP, 1);
>  
> -if (sd->size > 1 * GiB) {
> +if (sd->size > SDSC_MAX_CAPACITY) {
>  sd->ocr = FIELD_DP32(sd->ocr, OCR, CARD_CAPACITY, 1);
>  }
>  }
> @@ -385,7 +387,7 @@ static void sd_set_csd(SDState *sd, uint64_t size)
>  uint32_t sectsize = (1 << (SECTOR_SHIFT + 1)) - 1;
>  uint32_t wpsize = (1 << (WPGROUP_SHIFT + 1)) - 1;
>  
> -if (size <= 1 * GiB) { /* Standard Capacity SD */
> +if (size <= SDSC_MAX_CAPACITY) { /* Standard Capacity SD */
>  sd->csd[0] = 0x00;   /* CSD structure */
>  sd->csd[1] = 0x26;   /* Data read access-time-1 */
>  sd->csd[2] = 0x00;   /* Data read access-time-2 */
> 



[PATCH v2 2/3] hw/sd: sd: Correct the maximum size of a Standard Capacity SD Memory Card

2020-08-17 Thread Bin Meng
Per the SD spec, Standard Capacity SD Memory Card (SDSC) supports
capacity up to and including 2 GiB.

Fixes: 2d7adea4fe ("hw/sd: Support SDHC size cards")
Signed-off-by: Bin Meng 

---

Changes in v2:
- fix SDSC size check in sd_set_csd() too

 hw/sd/sd.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 3226404..254d713 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -50,6 +50,8 @@
 
 //#define DEBUG_SD 1
 
+#define SDSC_MAX_CAPACITY   (2 * GiB)
+
 typedef enum {
 sd_r0 = 0,/* no response */
 sd_r1,/* normal response command */
@@ -313,7 +315,7 @@ static void sd_ocr_powerup(void *opaque)
 /* card power-up OK */
 sd->ocr = FIELD_DP32(sd->ocr, OCR, CARD_POWER_UP, 1);
 
-if (sd->size > 1 * GiB) {
+if (sd->size > SDSC_MAX_CAPACITY) {
 sd->ocr = FIELD_DP32(sd->ocr, OCR, CARD_CAPACITY, 1);
 }
 }
@@ -385,7 +387,7 @@ static void sd_set_csd(SDState *sd, uint64_t size)
 uint32_t sectsize = (1 << (SECTOR_SHIFT + 1)) - 1;
 uint32_t wpsize = (1 << (WPGROUP_SHIFT + 1)) - 1;
 
-if (size <= 1 * GiB) { /* Standard Capacity SD */
+if (size <= SDSC_MAX_CAPACITY) { /* Standard Capacity SD */
 sd->csd[0] = 0x00; /* CSD structure */
 sd->csd[1] = 0x26; /* Data read access-time-1 */
 sd->csd[2] = 0x00; /* Data read access-time-2 */
-- 
2.7.4




[PATCH v2 0/3] hw/sd: Add Cadence SDHCI emulation

2020-08-17 Thread Bin Meng
This series is spun off from the following series as it is hw/sd
centric, so that it can be picked up separately by Philippe.

http://patchwork.ozlabs.org/project/qemu-devel/list/?series=195648

This series fixed 2 SD card issues, and added a new model for
Cadence SDHCI controller.

Patch "[09/18] hw/sd: sdhci: Make sdhci_poweron_reset() internal visible"
in this series per the review comments.

Changes in v2:
- remove the pointless zero initialization
- fix SDSC size check in sd_set_csd() too
- use 's' for the model state
- call device_cold_reset() in cadence_sdhci_reset()
- add .impl in cadence_sdhci_ops
- move Cadence specific register defines to cadence_sdhci.c
- use 'sdhci' instead of 'slot' to represent SDHCIState
- use sysbus_mmio_get_region() to access SDHCI model's memory region
- initialize TYPE_SYSBUS_SDHCI in the instance_init() so that users
  of Cadence SDHCI do not have to do that themselves
- propagate irq and 'sd-bus' from generic-sdhci

Bin Meng (3):
  hw/sd: sd: Fix incorrect populated function switch status data
structure
  hw/sd: sd: Correct the maximum size of a Standard Capacity SD Memory
Card
  hw/sd: Add Cadence SDHCI emulation

 hw/sd/Kconfig |   4 +
 hw/sd/Makefile.objs   |   1 +
 hw/sd/cadence_sdhci.c | 200 ++
 hw/sd/sd.c|   9 +-
 include/hw/sd/cadence_sdhci.h |  46 ++
 5 files changed, 257 insertions(+), 3 deletions(-)
 create mode 100644 hw/sd/cadence_sdhci.c
 create mode 100644 include/hw/sd/cadence_sdhci.h

-- 
2.7.4




[PATCH v2 3/3] hw/sd: Add Cadence SDHCI emulation

2020-08-17 Thread Bin Meng
Cadence SD/SDIO/eMMC Host Controller (SD4HC) is an SDHCI compatible
controller. The SDHCI compatible registers start from offset 0x200,
which are called Slot Register Set (SRS) in its datasheet.

This creates a Cadence SDHCI model built on top of the existing
generic SDHCI model. Cadence specific Host Register Set (HRS) is
implemented to make guest software happy.

Signed-off-by: Bin Meng 

---

Changes in v2:
- use 's' for the model state
- call device_cold_reset() in cadence_sdhci_reset()
- add .impl in cadence_sdhci_ops
- move Cadence specific register defines to cadence_sdhci.c
- use 'sdhci' instead of 'slot' to represent SDHCIState
- use sysbus_mmio_get_region() to access SDHCI model's memory region
- initialize TYPE_SYSBUS_SDHCI in the instance_init() so that users
  of Cadence SDHCI do not have to do that themselves
- propagate irq and 'sd-bus' from generic-sdhci

 hw/sd/Kconfig |   4 +
 hw/sd/Makefile.objs   |   1 +
 hw/sd/cadence_sdhci.c | 200 ++
 include/hw/sd/cadence_sdhci.h |  46 ++
 4 files changed, 251 insertions(+)
 create mode 100644 hw/sd/cadence_sdhci.c
 create mode 100644 include/hw/sd/cadence_sdhci.h

diff --git a/hw/sd/Kconfig b/hw/sd/Kconfig
index c5e1e55..633b9af 100644
--- a/hw/sd/Kconfig
+++ b/hw/sd/Kconfig
@@ -19,3 +19,7 @@ config SDHCI_PCI
 default y if PCI_DEVICES
 depends on PCI
 select SDHCI
+
+config CADENCE_SDHCI
+bool
+select SDHCI
diff --git a/hw/sd/Makefile.objs b/hw/sd/Makefile.objs
index 0d1df17..4d500a6 100644
--- a/hw/sd/Makefile.objs
+++ b/hw/sd/Makefile.objs
@@ -10,3 +10,4 @@ common-obj-$(CONFIG_OMAP) += omap_mmc.o
 common-obj-$(CONFIG_PXA2XX) += pxa2xx_mmci.o
 common-obj-$(CONFIG_RASPI) += bcm2835_sdhost.o
 common-obj-$(CONFIG_ASPEED_SOC) += aspeed_sdhci.o
+common-obj-$(CONFIG_CADENCE_SDHCI) += cadence_sdhci.o
diff --git a/hw/sd/cadence_sdhci.c b/hw/sd/cadence_sdhci.c
new file mode 100644
index 000..4f01d63
--- /dev/null
+++ b/hw/sd/cadence_sdhci.c
@@ -0,0 +1,200 @@
+/*
+ * Cadence SDHCI emulation
+ *
+ * Copyright (c) 2020 Wind River Systems, Inc.
+ *
+ * Author:
+ *   Bin Meng 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "migration/vmstate.h"
+#include "hw/irq.h"
+#include "hw/sd/cadence_sdhci.h"
+#include "sdhci-internal.h"
+
+/* HRS - Host Register Set (specific to Cadence) */
+
+#define CADENCE_SDHCI_HRS00 0x00/* general information */
+#define CADENCE_SDHCI_HRS00_SWR BIT(0)
+#define CADENCE_SDHCI_HRS00_POR_VAL 0x0001
+
+#define CADENCE_SDHCI_HRS04 0x10/* PHY access port */
+#define CADENCE_SDHCI_HRS04_WR  BIT(24)
+#define CADENCE_SDHCI_HRS04_RD  BIT(25)
+#define CADENCE_SDHCI_HRS04_ACK BIT(26)
+
+#define CADENCE_SDHCI_HRS06 0x18/* eMMC control */
+#define CADENCE_SDHCI_HRS06_TUNE_UP BIT(15)
+
+/* SRS - Slot Register Set (SDHCI-compatible) */
+
+#define CADENCE_SDHCI_SRS_BASE  0x200
+
+#define TO_REG(addr)((addr) / sizeof(uint32_t))
+
+static void cadence_sdhci_instance_init(Object *obj)
+{
+CadenceSDHCIState *s = CADENCE_SDHCI(obj);
+
+object_initialize_child(OBJECT(s), "cadence-sdhci.sdhci",
+&s->sdhci, TYPE_SYSBUS_SDHCI);
+}
+
+static void cadence_sdhci_reset(DeviceState *dev)
+{
+CadenceSDHCIState *s = CADENCE_SDHCI(dev);
+
+memset(s->regs, 0, CADENCE_SDHCI_REG_SIZE);
+s->regs[TO_REG(CADENCE_SDHCI_HRS00)] = CADENCE_SDHCI_HRS00_POR_VAL;
+
+device_cold_reset(DEVICE(&s->sdhci));
+}
+
+static uint64_t cadence_sdhci_read(void *opaque, hwaddr addr, unsigned int 
size)
+{
+uint32_t val = 0;
+CadenceSDHCIState *s = opaque;
+
+if (addr < CADENCE_SDHCI_REG_SIZE) {
+val = s->regs[TO_REG(addr)];
+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out-of-bounds read at 0x%" HWADDR_PRIx "\n",
+  __func__, addr);
+}
+
+return (uint64_t)val;
+}
+
+static void cadence_sdhci_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned int size)
+{
+CadenceSDHCIState *s = opaque;
+uint32_t val32 = (uint32_t)val;
+
+switch (addr) {
+case 

[PATCH v2 1/3] hw/sd: sd: Fix incorrect populated function switch status data structure

2020-08-17 Thread Bin Meng
At present the function switch status data structure bits [399:376]
are wrongly populated. These 3 bytes encode the function switch status
for the 6 function groups, with 4 bits per group, starting from
function group 6 at bit 399, then followed by function group 5 at
bit 395, and so on.

However, the code mistakenly fills in the function group 1 status
at bit 399. This fixes the code logic.

Fixes: a1bb27b1e9 ("SD card emulation (initial implementation)")
Signed-off-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 

---

Changes in v2:
- remove the pointless zero initialization

 hw/sd/sd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index fad9cf1..3226404 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -806,11 +806,12 @@ static void sd_function_switch(SDState *sd, uint32_t arg)
 sd->data[11] = 0x43;
 sd->data[12] = 0x80;   /* Supported group 1 functions */
 sd->data[13] = 0x03;
+
 for (i = 0; i < 6; i ++) {
 new_func = (arg >> (i * 4)) & 0x0f;
 if (mode && new_func != 0x0f)
 sd->function_group[i] = new_func;
-sd->data[14 + (i >> 1)] = new_func << ((i * 4) & 4);
+sd->data[16 - (i >> 1)] |= new_func << ((i % 2) * 4);
 }
 memset(&sd->data[17], 0, 47);
 stw_be_p(sd->data + 64, sd_crc16(sd->data, 64));
-- 
2.7.4




Re: [RFC PATCH 04/22] block/export: Add BlockExport infrastructure and block-export-add

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> We want to have a common set of commands for all types of block exports.
> Currently, this is only NBD, but we're going to add more types.
> 
> This patch adds the basic BlockExport and BlockExportDriver structs and
> a QMP command block-export-add that creates a new export based on the
> given BlockExportOptions.
> 
> qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json |  9 ++
>  include/block/export.h | 32 +
>  include/block/nbd.h|  3 +-
>  block/export/export.c  | 57 ++
>  blockdev-nbd.c | 19 -
>  nbd/server.c   | 15 +-
>  Makefile.objs  |  6 ++--
>  block/Makefile.objs|  2 ++
>  block/export/Makefile.objs |  1 +
>  9 files changed, 132 insertions(+), 12 deletions(-)
>  create mode 100644 include/block/export.h
>  create mode 100644 block/export/export.c
>  create mode 100644 block/export/Makefile.objs

Nothing of too great importance below.  But it’s an RFC, so comments I
will give.

> diff --git a/block/export/export.c b/block/export/export.c
> new file mode 100644
> index 00..3d0dacb3f2
> --- /dev/null
> +++ b/block/export/export.c
> @@ -0,0 +1,57 @@
> +/*
> + * Common block export infrastructure
> + *
> + * Copyright (c) 2012, 2020 Red Hat, Inc.
> + *
> + * Authors:
> + * Paolo Bonzini 
> + * Kevin Wolf 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later.  See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "block/export.h"
> +#include "block/nbd.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-commands-block-export.h"
> +
> +static const BlockExportDriver* blk_exp_drivers[] = {
 ^^
Star placement *cough*

> +&blk_exp_nbd,
> +};

Not sure whether I like this better than the block driver way of
registering block drivers with a constructor.  It requires writing less
code, at the expense of making the variable global.  So I think there’s
no good reason to prefer the block driver approach.

Maybe my hesitance comes from the variable being declared (as extern) in
a header file (block/export.h).  I think I would prefer it if we put
that external reference only here in this file.  Would that work, or do
you have other plans that require blk_exp_nbd to be visible outside of
nbd/server.c and this file here?

> +static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
> +{
> +int i;
> +
> +for (i = 0; i < ARRAY_SIZE(blk_exp_drivers); i++) {
> +if (blk_exp_drivers[i]->type == type) {
> +return blk_exp_drivers[i];
> +}
> +}

How bad would it be to define blk_exp_drivers as
blk_exp_drivers[BLOCK_EXPORT_TYPE__MAX] and use the BlockExportType as
the driver index so we don’t have to loop here?

Not that it matters performance-wise.  Just something I wondered.

> +return NULL;

Why not e.g. g_assert_not_reached()?

(If the BlockExportType were used as the index, I’d assert that
type < ARRAY_SIZE(blk_exp_drivers) && blk_exp_drivers[type] != NULL.  I
don’t think there’s a reason for graceful handling.)
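
I.e., roughly (untested):

    static const BlockExportDriver *blk_exp_drivers[BLOCK_EXPORT_TYPE__MAX] = {
        [BLOCK_EXPORT_TYPE_NBD] = &blk_exp_nbd,
    };

    static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
    {
        assert(type < ARRAY_SIZE(blk_exp_drivers) && blk_exp_drivers[type]);
        return blk_exp_drivers[type];
    }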

> +}

[...]

> +void qmp_nbd_server_add(BlockExportOptionsNbd *arg, Error **errp)
> +{
> +BlockExportOptions export = {
> +.type = BLOCK_EXPORT_TYPE_NBD,
> +.u.nbd = *arg,
> +};
> +qmp_block_export_add(&export, errp);
> +}

Hm.  I think I’d’ve kept this in blockdev-nbd.c, actually, but, well.
It’ll stay a one-off.

> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 98ee1b6170..a1dc11bdd7 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c

[...]

> @@ -217,6 +220,8 @@ void qmp_nbd_server_add(BlockExportOptionsNbd *arg, Error 
> **errp)
>  
>   out:
>  aio_context_release(aio_context);
> +/* TODO Remove the cast: Move to server.c which can access fields of exp 
> */
> +return (BlockExport*) exp;

*cough*

(But if it’s moved soon anyway so we can use &exp->common, then whatever.)

Max





Re: [PATCH v3 00/12] preallocate filter

2020-08-17 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200817091553.283155-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/nvme.o
  CC  block/nbd.o
/tmp/qemu-test/src/block/file-posix.c: In function 'is_fuse':
/tmp/qemu-test/src/block/file-posix.c:334:35: error: 'FUSE_SUPER_MAGIC' 
undeclared (first use in this function)
 if (ret == 0 && buf.f_type == FUSE_SUPER_MAGIC) {
   ^
/tmp/qemu-test/src/block/file-posix.c:334:35: note: each undeclared identifier 
is reported only once for each function it appears in
  CC  block/sheepdog.o
make: *** [block/file-posix.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=ee7ebe61e5fd4a018bbb2a16ec4a5a54', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-o73ap74n/src/docker-src.2020-08-17-05.46.38.25129:/var/tmp/qemu:z,ro',
 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ee7ebe61e5fd4a018bbb2a16ec4a5a54
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-o73ap74n/src'
make: *** [docker-run-test-quick@centos7] Error 2

real1m49.146s
user0m8.506s


The full log is available at
http://patchew.org/logs/20200817091553.283155-1-vsement...@virtuozzo.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v3 00/12] preallocate filter

2020-08-17 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200817091553.283155-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  LINKqemu-edid.exe
/usr/lib/gcc/x86_64-w64-mingw32/9.2.1/../../../../x86_64-w64-mingw32/bin/ld: 
block/qcow2.o: in function `qcow2_vz_insert_prealloc_filter':
/tmp/qemu-test/src/block/qcow2.c:1878: undefined reference to 
`bdrv_is_file_on_fuse'
collect2: error: ld returned 1 exit status
make: *** [/tmp/qemu-test/src/rules.mak:124: qemu-io.exe] Error 1
make: *** Waiting for unfinished jobs
  GEN x86_64-softmmu/hmp-commands.h
  GEN x86_64-softmmu/hmp-commands-info.h
---
  CC  aarch64-softmmu/hw/arm/xlnx-zynqmp.o
/usr/lib/gcc/x86_64-w64-mingw32/9.2.1/../../../../x86_64-w64-mingw32/bin/ld: 
../block/qcow2.o: in function `qcow2_vz_insert_prealloc_filter':
/tmp/qemu-test/src/block/qcow2.c:1878: undefined reference to 
`bdrv_is_file_on_fuse'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:219: qemu-system-x86_64w.exe] Error 1
make: *** [Makefile:527: x86_64-softmmu/all] Error 2
  CC  aarch64-softmmu/hw/arm/xlnx-zcu102.o
  CC  aarch64-softmmu/hw/arm/xlnx-versal.o
  CC  aarch64-softmmu/hw/arm/xlnx-versal-virt.o
---
  LINKaarch64-softmmu/qemu-system-aarch64w.exe
/usr/lib/gcc/x86_64-w64-mingw32/9.2.1/../../../../x86_64-w64-mingw32/bin/ld: 
../block/qcow2.o: in function `qcow2_vz_insert_prealloc_filter':
/tmp/qemu-test/src/block/qcow2.c:1878: undefined reference to 
`bdrv_is_file_on_fuse'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:219: qemu-system-aarch64w.exe] Error 1
make: *** [Makefile:527: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=46c939bfc42848c6bf98acc32e769fb0', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-v9t81w9y/src/docker-src.2020-08-17-05.40.50.16028:/var/tmp/qemu:z,ro',
 'qemu/fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=46c939bfc42848c6bf98acc32e769fb0
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-v9t81w9y/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real4m43.217s
user0m8.945s


The full log is available at
http://patchew.org/logs/20200817091553.283155-1-vsement...@virtuozzo.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 09/12] blockdev: Fix a memleak in drive_backup_prepare()

2020-08-17 Thread Kevin Wolf
Am 14.08.2020 um 18:02 hat Pan Nengyuan geschrieben:
> 'local_err' is not propagated in the error path, which will cause
> a memleak. Fix it.
> 
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 

I wonder if using ERRP_GUARD() wouldn't simplify this. Anyway, the fix
looks correct:

Reviewed-by: Kevin Wolf 
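
(For reference, with ERRP_GUARD() the function body could pass errp down
directly and check *errp, roughly like this sketch, where do_something
and arg are just placeholders; no local_err and no error_propagate()
would be needed:

    ERRP_GUARD();

    do_something(arg, errp);
    if (*errp) {
        return;
    }
)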




Re: [PATCH 10/12] block/file-posix: fix a possible undefined behavior

2020-08-17 Thread Kevin Wolf
Am 14.08.2020 um 18:02 hat Pan Nengyuan geschrieben:
> local_err is not initialized to NULL, which will cause an assert error as below:
> qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed.
> 
> Fixes: c6447510690
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 

Reviewed-by: Kevin Wolf 




[PATCH v3 12/12] block/qcow2: automatically insert preallocate filter when on FUSE

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
vstorage has slow allocation, so this patch detects vstorage
(I hope we don't use other FUSE filesystems) and inserts a preallocate
filter between the qcow2 node and its file child.

The following test executes more than 10 times faster
(43.2s -> 3.9s for me) with this patch. (/newssd3 is the mount point of
vstorage; both cs and mds are on the same local ssd disk)

IMG=/newssd3/z
FILE_OPTS=file.filename=$IMG
COUNT=15000
CHUNK=64K
CLUSTER=1M
rm -f $IMG
./qemu-img create -f qcow2 -o cluster_size=$CLUSTER $IMG 1G
./qemu-img bench -c $COUNT -d 1 -s $CHUNK -w -t none -f qcow2 $IMG

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6ad6bdc166..f56507158e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1868,6 +1868,40 @@ static void coroutine_fn qcow2_open_entry(void *opaque)
 qemu_co_mutex_unlock(&s->lock);
 }
 
+static int qcow2_vz_insert_prealloc_filter(BlockDriverState *bs, Error **errp)
+{
+QDict *options;
+BlockDriverState *filter_bs;
+Error *local_err = NULL;
+int flags;
+
+if (!bdrv_is_file_on_fuse(bs->file->bs)) {
+/* Nothing to do */
+return 0;
+}
+
+/* Assume it's a vstorage */
+options = qdict_new();
+qdict_put_str(options, "driver", "preallocate");
+qdict_put_str(options, "file", bs->file->bs->node_name);
+flags = bdrv_is_read_only(bs->file->bs) ? 0 : BDRV_O_RDWR;
+filter_bs = bdrv_open(NULL, NULL, options, flags, errp);
+if (!filter_bs) {
+return -EINVAL;
+}
+
+bdrv_replace_node(bs->file->bs, filter_bs, &local_err);
+
+/*
+ * On failure we want to remove filter_bs, on success it's referenced now 
by
+ * qcow2 node.
+ */
+bdrv_unref(filter_bs);
+
+error_propagate(errp, local_err);
+return local_err ? -EINVAL : 0;
+}
+
 static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -1886,6 +1920,10 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 return -EINVAL;
 }
 
+if (qcow2_vz_insert_prealloc_filter(bs, errp) < 0) {
+return -EINVAL;
+}
+
 /* Initialise locks */
 qemu_co_mutex_init(&s->lock);
 
-- 
2.18.0




[PATCH v3 11/12] block: add bdrv_is_file_on_fuse helper

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
Add a function to check whether a node is a file-posix node on top of a
file in a FUSE file system.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  2 ++
 block/file-posix.c| 21 +
 2 files changed, 23 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 877fda06a4..51e957f6fb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -787,4 +787,6 @@ int coroutine_fn bdrv_co_copy_range(BdrvChild *src, 
uint64_t src_offset,
 BdrvChild *dst, uint64_t dst_offset,
 uint64_t bytes, BdrvRequestFlags 
read_flags,
 BdrvRequestFlags write_flags);
+
+bool bdrv_is_file_on_fuse(BlockDriverState *bs);
 #endif
diff --git a/block/file-posix.c b/block/file-posix.c
index 560d1c0a94..4100b8dc89 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -324,6 +324,27 @@ static bool dio_byte_aligned(int fd)
 return false;
 }
 
+static bool is_fuse(int fd)
+{
+#ifdef __linux__
+struct statfs buf;
+int ret;
+
+ret = fstatfs(fd, &buf);
+if (ret == 0 && buf.f_type == FUSE_SUPER_MAGIC) {
+return true;
+}
+#endif
+return false;
+}
+
+bool bdrv_is_file_on_fuse(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+
+return !strcmp(bs->drv->format_name, "file") && is_fuse(s->fd);
+}
+
 /* Check if read is allowed with given memory buffer and length.
  *
  * This function is used to check O_DIRECT memory buffer and request alignment.
-- 
2.18.0




[PATCH v3 09/12] iotests.py: add filter_img_check

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
Add an analog of the bash _filter_qemu_img_check to the Python framework.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 369e9918b4..ef3da4ee61 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -402,6 +402,10 @@ def filter_img_info(output, filename):
 lines.append(line)
 return '\n'.join(lines)
 
+def filter_img_check(msg):
+msg = re.sub(r'.*allocated.*fragmented.*compressed clusters', '', msg)
+return re.sub(r'Image end offset: [0-9]+', '', msg).strip()
+
 def filter_imgfmt(msg):
 return msg.replace(imgfmt, 'IMGFMT')
 
-- 
2.18.0




[PATCH v3 08/12] iotests.py: add verify_o_direct helper

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
Add a Python notrun helper similar to _check_o_direct for bash tests.
To be used in the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 717b5b652c..369e9918b4 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -1083,6 +1083,12 @@ def _verify_aio_mode(supported_aio_modes: Sequence[str] 
= ()) -> None:
 if supported_aio_modes and (aiomode not in supported_aio_modes):
 notrun('not suitable for this aio mode: %s' % aiomode)
 
+def verify_o_direct() -> None:
+with FilePath('test_o_direct') as f:
+qemu_img_create('-f', 'raw', f, '1M')
+if 'O_DIRECT' in qemu_io('-f', 'raw', '-t', 'none', '-c', 'quit', f):
+notrun(f'file system at {test_dir} does not support O_DIRECT')
+
 def supports_quorum():
 return 'quorum' in qemu_img_pipe('--help')
 
-- 
2.18.0




[PATCH v3 02/12] block/io.c: drop assertion on double waiting for request serialisation

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
The comment states that on a misaligned request we should have already
been waiting. But for bdrv_padding_rmw_read, we called
bdrv_mark_request_serialising with align = request_alignment, and now
we serialise with align = cluster_size. So we may have to wait again
with the larger alignment.

Note that the only user of BDRV_REQ_SERIALISING is backup, which issues
cluster-aligned requests, so it seems the assertion should not fire for
now. But it's wrong anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index ad3a51ed53..b18680a842 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1881,7 +1881,6 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t 
offset, uint64_t bytes,
   BdrvTrackedRequest *req, int flags)
 {
 BlockDriverState *bs = child->bs;
-bool waited;
 int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 
 if (bs->read_only) {
@@ -1893,15 +1892,7 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t 
offset, uint64_t bytes,
 assert(!(flags & ~BDRV_REQ_MASK));
 
 if (flags & BDRV_REQ_SERIALISING) {
-waited = bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
-/*
- * For a misaligned request we should have already waited earlier,
- * because we come after bdrv_padding_rmw_read which must be called
- * with the request already marked as serialising.
- */
-assert(!waited ||
-   (req->offset == req->overlap_offset &&
-req->bytes == req->overlap_bytes));
+bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
 } else {
 bdrv_wait_serialising_requests(req);
 }
-- 
2.18.0




[PATCH v3 05/12] block: bdrv_mark_request_serialising: split non-waiting function

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
We'll need a separate function that only "marks" a request serialising
with the specified alignment but does not wait for conflicting
requests. So it will behave like the old bdrv_mark_request_serialising()
did before bdrv_wait_serialising_requests_locked() was merged into it.

To reduce the possible mess, let's do the following:

The public function that does both marking and waiting will be called
bdrv_make_request_serialising(), and the private function that only
"marks" will be called tracked_request_set_serialising().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h |  3 ++-
 block/file-posix.c|  2 +-
 block/io.c| 34 ++
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38dec0275b..4d56a1b141 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1039,7 +1039,8 @@ extern unsigned int bdrv_drain_all_count;
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState 
*old_parent);
 
-bool coroutine_fn bdrv_mark_request_serialising(BdrvTrackedRequest *req, 
uint64_t align);
+bool coroutine_fn bdrv_make_request_serialising(BdrvTrackedRequest *req,
+uint64_t align);
 BdrvTrackedRequest *coroutine_fn bdrv_co_get_self_request(BlockDriverState 
*bs);
 
 int get_tmp_filename(char *filename, int size);
diff --git a/block/file-posix.c b/block/file-posix.c
index 9a00d4190a..560d1c0a94 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2957,7 +2957,7 @@ raw_do_pwrite_zeroes(BlockDriverState *bs, int64_t 
offset, int bytes,
 req->bytes = end - req->offset;
 req->overlap_bytes = req->bytes;
 
-bdrv_mark_request_serialising(req, bs->bl.request_alignment);
+bdrv_make_request_serialising(req, bs->bl.request_alignment);
 }
 #endif
 
diff --git a/block/io.c b/block/io.c
index 36bbe4b9b1..96b1b9cf5f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -778,15 +778,14 @@ bdrv_wait_serialising_requests_locked(BdrvTrackedRequest 
*self)
 return waited;
 }
 
-bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
+/* Called with req->bs->reqs_lock held */
+static void tracked_request_set_serialising(BdrvTrackedRequest *req,
+uint64_t align)
 {
-BlockDriverState *bs = req->bs;
 int64_t overlap_offset = req->offset & ~(align - 1);
 uint64_t overlap_bytes = ROUND_UP(req->offset + req->bytes, align)
- overlap_offset;
-bool waited;
 
-qemu_co_mutex_lock(&bs->reqs_lock);
 if (!req->serialising) {
 atomic_inc(&req->bs->serialising_in_flight);
 req->serialising = true;
@@ -794,9 +793,6 @@ bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, 
uint64_t align)
 
 req->overlap_offset = MIN(req->overlap_offset, overlap_offset);
 req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
-waited = bdrv_wait_serialising_requests_locked(req);
-qemu_co_mutex_unlock(&bs->reqs_lock);
-return waited;
 }
 
 /**
@@ -882,6 +878,20 @@ static bool coroutine_fn 
bdrv_wait_serialising_requests(BdrvTrackedRequest *self
 return waited;
 }
 
+bool bdrv_make_request_serialising(BdrvTrackedRequest *req, uint64_t align)
+{
+bool waited;
+
+qemu_co_mutex_lock(&req->bs->reqs_lock);
+
+tracked_request_set_serialising(req, align);
+waited = bdrv_wait_serialising_requests_locked(req);
+
+qemu_co_mutex_unlock(&req->bs->reqs_lock);
+
+return waited;
+}
+
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
size_t size)
 {
@@ -1492,7 +1502,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
  * with each other for the same cluster.  For example, in copy-on-read
  * it ensures that the CoR read and write operations are atomic and
  * guest writes cannot interleave between them. */
-bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
+bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
 } else {
 bdrv_wait_serialising_requests(req);
 }
@@ -1903,7 +1913,7 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t 
offset, uint64_t bytes,
 assert(!(flags & ~BDRV_REQ_MASK));
 
 if (flags & BDRV_REQ_SERIALISING) {
-bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
+bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
 } else {
 bdrv_wait_serialising_requests(req);
 }
@@ -2069,7 +2079,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild 
*child,
 
 padding = bdrv_init_padding(bs, offset, bytes, );
 if (padding) {
-bdrv_mark_request_serialising(req, align);
+bdrv_make_request_serialising(req, align);
 
 bdrv_padding_rmw_read(child, 

[PATCH v3 10/12] iotests: add 298 to test new preallocate filter driver

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/298 | 50 ++
 tests/qemu-iotests/298.out |  6 +
 tests/qemu-iotests/group   |  1 +
 3 files changed, 57 insertions(+)
 create mode 100644 tests/qemu-iotests/298
 create mode 100644 tests/qemu-iotests/298.out

diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
new file mode 100644
index 00..4f2087352a
--- /dev/null
+++ b/tests/qemu-iotests/298
@@ -0,0 +1,50 @@
+#!/usr/bin/env python3
+#
+# Test for preallocate filter
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import log
+
+iotests.script_initialize(supported_fmts=['qcow2'])
+iotests.verify_o_direct()
+
+size = 10 * 1024 * 1024
+disk = iotests.file_path('disk')
+
+iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(size))
+
+opts = f'driver={iotests.imgfmt},' \
+f'file.driver=preallocate,file.file.filename={disk}'
+p = iotests.QemuIoInteractive('--image-opts', '-t', 'none', opts)
+
+log(p.cmd('write 0 1M'), filters=[iotests.filter_qemu_io])
+p.cmd('flush')
+
+if os.path.getsize(disk) > 100 * 1024 * 1024:
+log('file in progress is big, preallocation works')
+
+p.close()
+
+if os.path.getsize(disk) < 10 * 1024 * 1024:
+log('file is small after close')
+
+# Check that there are no leaks.
+log(iotests.qemu_img_pipe('check', '-f', 'qcow2', disk),
+filters=[iotests.filter_img_check])
diff --git a/tests/qemu-iotests/298.out b/tests/qemu-iotests/298.out
new file mode 100644
index 00..baf8f8425c
--- /dev/null
+++ b/tests/qemu-iotests/298.out
@@ -0,0 +1,6 @@
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+file in progress is big, preallocation works
+file is small after close
+No errors were found on the image.
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 025ed5238d..ac4772b43f 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -306,6 +306,7 @@
 295 rw
 296 rw
 297 meta
+298 auto quick
 299 auto quick
 301 backing quick
 302 quick
-- 
2.18.0




[PATCH v3 07/12] block: introduce preallocate filter

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding the protocol file) on writes
crossing EOF. It improves performance on file systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 docs/system/qemu-block-drivers.rst.inc |  26 +++
 qapi/block-core.json   |  20 +-
 block/preallocate.c| 291 +
 block/Makefile.objs|   1 +
 4 files changed, 337 insertions(+), 1 deletion(-)
 create mode 100644 block/preallocate.c

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..5bfa4f4116 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -952,3 +952,29 @@ on host and see if there are locks held by the QEMU 
process on the image file.
 More than one byte could be locked by the QEMU instance, each byte of which
 reflects a particular permission that is acquired or protected by the running
 block driver.
+
+Filter drivers
+~~
+
+QEMU supports several filter drivers, which do not store any data but perform
+additional tasks, hooking I/O requests.
+
+.. program:: filter-drivers
+.. option:: preallocate
+
+  The preallocate filter driver is intended to be inserted between format
+  and protocol nodes and does preallocation of some additional space
+  (expanding the protocol file) on write. It may be used for
+  file-systems with slow allocation.
+
+  Supported options:
+
+  .. program:: preallocate
+  .. option:: prealloc-align
+
+On preallocation, align file length to this number, default 1M.
+
+  .. program:: preallocate
+  .. option:: prealloc-size
+
+How much to preallocate, default 128M.
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 197bdc1c36..b40448063b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2805,7 +2805,7 @@
 'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
 'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
-'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
+'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
 { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
 'sheepdog',
 'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
@@ -3074,6 +3074,23 @@
   'data': { 'aes': 'QCryptoBlockOptionsQCow',
 'luks': 'QCryptoBlockOptionsLUKS'} }
 
+##
+# @BlockdevOptionsPreallocate:
+#
+# Filter driver intended to be inserted between format and protocol node
+# and do preallocation in protocol node on write.
+#
+# @prealloc-align: on preallocation, align file length to this number,
+# default 1048576 (1M)
+#
+# @prealloc-size: how much to preallocate, default 134217728 (128M)
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockdevOptionsPreallocate',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { '*prealloc-align': 'int', '*prealloc-size': 'int' } }
+
 ##
 # @BlockdevOptionsQcow2:
 #
@@ -3979,6 +3996,7 @@
   'null-co':'BlockdevOptionsNull',
   'nvme':   'BlockdevOptionsNVMe',
   'parallels':  'BlockdevOptionsGenericFormat',
+  'preallocate':'BlockdevOptionsPreallocate',
   'qcow2':  'BlockdevOptionsQcow2',
   'qcow':   'BlockdevOptionsQcow',
   'qed':'BlockdevOptionsGenericCOWFormat',
diff --git a/block/preallocate.c b/block/preallocate.c
new file mode 100644
index 00..6dbad8a0a2
--- /dev/null
+++ b/block/preallocate.c
@@ -0,0 +1,291 @@
+/*
+ * preallocate filter driver
+ *
+ * The driver performs preallocate operation: it is injected above
+ * some node, and before each write over EOF it does additional preallocating
+ * write-zeroes request.
+ *
+ * Copyright (c) 2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu/units.h"
+#include "block/block_int.h"
+
+
+typedef struct BDRVPreallocateState {
+int64_t prealloc_size;
+int64_t prealloc_align;
+
+/*
+ * Filter is started as not-active, 

[PATCH v3 03/12] block/io: split out bdrv_find_conflicting_request

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
To be reused separately later in the series.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 71 +++---
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/block/io.c b/block/io.c
index b18680a842..5b96715058 100644
--- a/block/io.c
+++ b/block/io.c
@@ -727,43 +727,54 @@ static bool tracked_request_overlaps(BdrvTrackedRequest 
*req,
 return true;
 }
 
+/* Called with self->bs->reqs_lock held */
+static BdrvTrackedRequest *
+bdrv_find_conflicting_request(BdrvTrackedRequest *self)
+{
+BdrvTrackedRequest *req;
+
+QLIST_FOREACH(req, &self->bs->tracked_requests, list) {
+if (req == self || (!req->serialising && !self->serialising)) {
+continue;
+}
+if (tracked_request_overlaps(req, self->overlap_offset,
+ self->overlap_bytes))
+{
+/*
+ * Hitting this means there was a reentrant request, for
+ * example, a block driver issuing nested requests.  This must
+ * never happen since it means deadlock.
+ */
+assert(qemu_coroutine_self() != req->co);
+
+/*
+ * If the request is already (indirectly) waiting for us, or
+ * will wait for us as soon as it wakes up, then just go on
+ * (instead of producing a deadlock in the former case).
+ */
+if (!req->waiting_for) {
+return req;
+}
+}
+}
+
+return NULL;
+}
+
 static bool coroutine_fn
 bdrv_wait_serialising_requests_locked(BlockDriverState *bs,
   BdrvTrackedRequest *self)
 {
 BdrvTrackedRequest *req;
-bool retry;
 bool waited = false;
 
-do {
-retry = false;
-QLIST_FOREACH(req, >tracked_requests, list) {
-if (req == self || (!req->serialising && !self->serialising)) {
-continue;
-}
-if (tracked_request_overlaps(req, self->overlap_offset,
- self->overlap_bytes))
-{
-/* Hitting this means there was a reentrant request, for
- * example, a block driver issuing nested requests.  This must
- * never happen since it means deadlock.
- */
-assert(qemu_coroutine_self() != req->co);
-
-/* If the request is already (indirectly) waiting for us, or
- * will wait for us as soon as it wakes up, then just go on
- * (instead of producing a deadlock in the former case). */
-if (!req->waiting_for) {
-self->waiting_for = req;
-qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
-self->waiting_for = NULL;
-retry = true;
-waited = true;
-break;
-}
-}
-}
-} while (retry);
+while ((req = bdrv_find_conflicting_request(self))) {
+self->waiting_for = req;
+qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+self->waiting_for = NULL;
+waited = true;
+}
+
 return waited;
 }
 
-- 
2.18.0




Re: [RFC PATCH 03/22] qapi: Rename BlockExport to BlockExportOptions

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> The name BlockExport will be used for the struct containing the runtime
> state of block exports, so change the name of export creation options.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 12 ++--
>  block/monitor/block-hmp-cmds.c |  6 +++---
>  blockdev-nbd.c |  2 +-
>  qemu-storage-daemon.c  |  8 
>  4 files changed, 14 insertions(+), 14 deletions(-)

Reviewed-by: Max Reitz 





Re: [PATCH 0/7] block: Use definitions instead of magic values

2020-08-17 Thread Stefano Garzarella
On Fri, Aug 14, 2020 at 10:28:34AM +0200, Philippe Mathieu-Daudé wrote:
> Trivial block patches:
> - Fix a typo
> - Replace '1 << 30' by '1 * GiB' in null-co
> - Replace 512 by BDRV_SECTOR_SIZE when appropriate.
> 
> Philippe Mathieu-Daudé (7):
>   block/null: Make more explicit the driver default size is 1GiB
>   hw/ide/core: Trivial typo fix
>   hw/ide/core: Replace magic '512' value by BDRV_SECTOR_SIZE
>   hw/ide/ahci: Replace magic '512' value by BDRV_SECTOR_SIZE
>   hw/ide/atapi: Replace magic '512' value by BDRV_SECTOR_SIZE
>   hw/ide/pci: Replace magic '512' value by BDRV_SECTOR_SIZE
>   hw/scsi/scsi-disk: Replace magic '512' value by BDRV_SECTOR_SIZE
> 
>  block/null.c|  4 +++-
>  hw/ide/ahci.c   |  5 +++--
>  hw/ide/atapi.c  |  8 
>  hw/ide/core.c   | 25 +
>  hw/ide/pci.c|  2 +-
>  hw/scsi/scsi-disk.c | 44 +++-
>  6 files changed, 47 insertions(+), 41 deletions(-)

Series:
Reviewed-by: Stefano Garzarella 


Thanks for the cleaning that makes the code more readable!
Stefano




[PATCH v3 00/12] preallocate filter

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
Hi all!

Here is a filter, which does preallocation on write.

In Virtuozzo we have to deal with a custom distributed storage
solution, where allocation is a relatively expensive operation. We have
to work around it in QEMU, so here is a new filter.

Patches 1-10 introduce the new filter and are suggested for merging
to master.

Patches 11 and 12 are here just to show how we are going to use the feature.
I don't know whether they can be generally useful; ideas are welcome.

Differences from v1:
1-6 are new and substitute the bdrv_co_range_try_lock mechanism used in v1
07: add note to docs/system/qemu-block-drivers.rst.inc
add open options
rebase on new BDRV_REQ_NO_WAIT flag
drop bs->file check in _co_flush()
add active logic, to make it possible to insert filter dynamically
08,09: new
10: - use new iotests.verify_o_direct()
10: - add qemu-img check call
11,12: not for commit

Vladimir Sementsov-Ogievskiy (12):
  block: simplify comment to BDRV_REQ_SERIALISING
  block/io.c: drop assertion on double waiting for request serialisation
  block/io: split out bdrv_find_conflicting_request
  block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
  block: bdrv_mark_request_serialising: split non-waiting function
  block: introduce BDRV_REQ_NO_WAIT flag
  block: introduce preallocate filter
  iotests.py: add verify_o_direct helper
  iotests.py: add filter_img_check
  iotests: add 298 to test new preallocate filter driver
  block: add bdrv_is_file_on_fuse helper
  block/qcow2: automatically insert preallocate filter when on FUSE

 docs/system/qemu-block-drivers.rst.inc |  26 +++
 qapi/block-core.json   |  20 +-
 include/block/block.h  |  22 +-
 include/block/block_int.h  |   3 +-
 block/file-posix.c |  23 +-
 block/io.c | 131 ++-
 block/preallocate.c| 291 +
 block/qcow2.c  |  38 
 block/Makefile.objs|   1 +
 tests/qemu-iotests/298 |  50 +
 tests/qemu-iotests/298.out |   6 +
 tests/qemu-iotests/group   |   1 +
 tests/qemu-iotests/iotests.py  |  10 +
 13 files changed, 554 insertions(+), 68 deletions(-)
 create mode 100644 block/preallocate.c
 create mode 100644 tests/qemu-iotests/298
 create mode 100644 tests/qemu-iotests/298.out

-- 
2.18.0




[PATCH v3 04/12] block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
bs is linked in req, so there is no need to pass it separately. Most of
the tracked-requests API doesn't take a bs argument. Actually, after this
patch only tracked_request_begin() has it, and that is on purpose.

While here, also add a comment about what "_locked" means.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5b96715058..36bbe4b9b1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -761,16 +761,16 @@ bdrv_find_conflicting_request(BdrvTrackedRequest *self)
 return NULL;
 }
 
+/* Called with self->bs->reqs_lock held */
 static bool coroutine_fn
-bdrv_wait_serialising_requests_locked(BlockDriverState *bs,
-  BdrvTrackedRequest *self)
+bdrv_wait_serialising_requests_locked(BdrvTrackedRequest *self)
 {
 BdrvTrackedRequest *req;
 bool waited = false;
 
 while ((req = bdrv_find_conflicting_request(self))) {
 self->waiting_for = req;
-qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+qemu_co_queue_wait(&req->wait_queue, &self->bs->reqs_lock);
 self->waiting_for = NULL;
 waited = true;
 }
@@ -794,7 +794,7 @@ bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, 
uint64_t align)
 
 req->overlap_offset = MIN(req->overlap_offset, overlap_offset);
 req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
-waited = bdrv_wait_serialising_requests_locked(bs, req);
+waited = bdrv_wait_serialising_requests_locked(req);
 qemu_co_mutex_unlock(&bs->reqs_lock);
 return waited;
 }
@@ -876,7 +876,7 @@ static bool coroutine_fn 
bdrv_wait_serialising_requests(BdrvTrackedRequest *self
 }
 
 qemu_co_mutex_lock(&bs->reqs_lock);
-waited = bdrv_wait_serialising_requests_locked(bs, self);
+waited = bdrv_wait_serialising_requests_locked(self);
 qemu_co_mutex_unlock(&bs->reqs_lock);
 
 return waited;
-- 
2.18.0




[PATCH v3 01/12] block: simplify comment to BDRV_REQ_SERIALISING

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
1. BDRV_REQ_NO_SERIALISING doesn't exist anymore, so don't mention it.

2. We are going to add one more user of BDRV_REQ_SERIALISING, so the
   comment about backup becomes a bit confusing here. The use case in
   backup is documented in block/backup.c, so let's just drop the
   duplication here.

3. The fact that BDRV_REQ_SERIALISING is only for write requests was
   not mentioned. Add a note.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 6e36154061..b8f4e86e8d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -53,16 +53,7 @@ typedef enum {
  * content. */
 BDRV_REQ_WRITE_UNCHANGED= 0x40,
 
-/*
- * BDRV_REQ_SERIALISING forces request serialisation for writes.
- * It is used to ensure that writes to the backing file of a backup process
- * target cannot race with a read of the backup target that defers to the
- * backing file.
- *
- * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to
- * BDRV_REQ_NO_SERIALISING. A more descriptive name for the latter might be
- * _DO_NOT_WAIT_FOR_SERIALISING, except that is too long.
- */
+/* Forces request serialisation. Use only with write requests. */
 BDRV_REQ_SERIALISING= 0x80,
 
 /* Execute the request only if the operation can be offloaded or otherwise
-- 
2.18.0




[PATCH v3 06/12] block: introduce BDRV_REQ_NO_WAIT flag

2020-08-17 Thread Vladimir Sementsov-Ogievskiy
Add a flag to make a serialising request fail instead of waiting: if
there are conflicting requests, just return an error immediately. It
will be used in the upcoming preallocate filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  9 -
 block/io.c| 13 -
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index b8f4e86e8d..877fda06a4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -67,8 +67,15 @@ typedef enum {
  * written to qiov parameter which may be NULL.
  */
 BDRV_REQ_PREFETCH  = 0x200,
+
+/*
+ * If we need to wait for other requests, just fail immediately. Used
+ * only together with BDRV_REQ_SERIALISING.
+ */
+BDRV_REQ_NO_WAIT = 0x400,
+
 /* Mask of valid flags */
-BDRV_REQ_MASK   = 0x3ff,
+BDRV_REQ_MASK   = 0x7ff,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/block/io.c b/block/io.c
index 96b1b9cf5f..fc6d44d302 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1911,9 +1911,20 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t 
offset, uint64_t bytes,
 assert(!(bs->open_flags & BDRV_O_INACTIVE));
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
+assert(!((flags & BDRV_REQ_NO_WAIT) && !(flags & BDRV_REQ_SERIALISING)));
 
 if (flags & BDRV_REQ_SERIALISING) {
-bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
+qemu_co_mutex_lock(&bs->reqs_lock);
+
+tracked_request_set_serialising(req, bdrv_get_cluster_size(bs));
+
+if ((flags & BDRV_REQ_NO_WAIT) && bdrv_find_conflicting_request(req)) {
+return -EBUSY;
+}
+
+bdrv_wait_serialising_requests_locked(req);
+
+qemu_co_mutex_unlock(&bs->reqs_lock);
 } else {
 bdrv_wait_serialising_requests(req);
 }
-- 
2.18.0
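
To make the intended use of the new flag concrete, here is a sketch of a
caller in the style of the preallocate filter introduced later in this
series: it issues a serialising zero-write and simply gives up when that
would mean waiting for a conflicting request. BDRV_REQ_SERIALISING,
BDRV_REQ_NO_WAIT and bdrv_co_pwrite_zeroes() are real interfaces; the
function itself and its fallback policy are assumptions for illustration,
not code taken from the series.

/*
 * Sketch of a caller (not from this series): extend the image with
 * zeroes, but skip the preallocation if it would have to wait for a
 * conflicting in-flight request.
 */
static int coroutine_fn preallocate_tail_zeroes(BdrvChild *file,
                                                int64_t offset, int bytes)
{
    int ret = bdrv_co_pwrite_zeroes(file, offset, bytes,
                                    BDRV_REQ_SERIALISING |
                                    BDRV_REQ_NO_WAIT);
    if (ret == -EBUSY) {
        /*
         * A conflicting request is in flight. Preallocation is only an
         * optimisation, so do not block the guest write; just skip it.
         */
        return 0;
    }

    return ret;
}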




Re: [PATCH 10/12] block/file-posix: fix a possible undefined behavior

2020-08-17 Thread Stefano Garzarella
On Fri, Aug 14, 2020 at 12:02:39PM -0400, Pan Nengyuan wrote:
> local_err is not initialized to NULL; it will cause an assert error as below:
> qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed.
> 
> Fixes: c6447510690
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 
> ---
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Aarushi Mehta 
> Cc: qemu-block@nongnu.org
> ---
>  block/file-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefano Garzarella 

Thanks,
Stefano

> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 9a00d4190a..697a7d9eea 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2113,7 +2113,7 @@ static void raw_aio_attach_aio_context(BlockDriverState 
> *bs,
>  #endif
>  #ifdef CONFIG_LINUX_IO_URING
>  if (s->use_linux_io_uring) {
> -Error *local_err;
> +Error *local_err = NULL;
>  if (!aio_setup_linux_io_uring(new_context, _err)) {
>  error_reportf_err(local_err, "Unable to use linux io_uring, "
>   "falling back to thread pool: ");
> -- 
> 2.18.2
> 
> 




Re: [RFC PATCH 02/22] qapi: Create block-export module

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Move all block export related types and commands from block-core to the
> new QAPI module block-export.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-core.json | 166 --
>  qapi/block-export.json   | 172 +++
>  qapi/qapi-schema.json|   1 +
>  include/block/nbd.h  |   2 +-
>  block/monitor/block-hmp-cmds.c   |   1 +
>  blockdev-nbd.c   |   2 +-
>  qemu-storage-daemon.c|   2 +-
>  qapi/Makefile.objs   |   5 +-
>  storage-daemon/qapi/qapi-schema.json |   1 +
>  9 files changed, 181 insertions(+), 171 deletions(-)
>  create mode 100644 qapi/block-export.json

[...]

> diff --git a/qapi/block-export.json b/qapi/block-export.json
> new file mode 100644
> index 00..62f4938e83
> --- /dev/null
> +++ b/qapi/block-export.json
> @@ -0,0 +1,172 @@

There should probably be a filetype modeline here (not trying to get
into any hornets’ nest here, but... :)).

With that:

Reviewed-by: Max Reitz 





Re: [RFC PATCH 01/22] nbd: Remove unused nbd_export_get_blockdev()

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h | 2 --
>  nbd/server.c| 5 -
>  2 files changed, 7 deletions(-)

Reviewed-by: Max Reitz 





Re: [PATCH v2 00/16] hw/block/nvme: dma handling and address mapping cleanup

2020-08-17 Thread Klaus Jensen
On Jul 30 00:06, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This series consists of patches that refactor dma read/write and add a
> number of address mapping helper functions.
> 
> v2:
>   * hw/block/nvme: add mapping helpers
> - Add an assert in case of out of bounds array access. (Maxim)
> 
>   * hw/block/nvme: remove redundant has_sg member
> - Split the fix for the missing qemu_iov_destroy into a fresh patch
>   ("hw/block/nvme: destroy request iov before reuse"). (Minwoo)
> 
>   * hw/block/nvme: pass request along for tracing [DROPPED]
> - Dropped the patch and replaced it with a simple patch that just adds
>   tracing to the nvme_map_prp function ("hw/block/nvme: add tracing to
>   nvme_map_prp"). (Minwoo)
> 
>   * hw/block/nvme: add request mapping helper
> - Changed the name from nvme_map to nvme_map_dptr. (Minwoo, Maxim)
> 
>   * hw/block/nvme: add check for mdts
> - Don't touch the documentation for the cmb_size_mb and max_ioqpairs
>   parameters in this patch. (Minwoo)
> 
>   * hw/block/nvme: refactor NvmeRequest clearing [DROPPED]
> - Keep NvmeRequest structure clearing as "before use". (Maxim)
> 
>   * hw/block/nvme: add a namespace reference in NvmeRequest
>   * hw/block/nvme: remove NvmeCmd parameter
> - Squash these two patches together into "hw/block/nvme: add ns/cmd
>   references in NvmeRequest".
> 
>   * hw/block/nvme: consolidate qsg/iov clearing
> - Move the qsg/iov destroys to a new nvme_req_exit function that is called
>   after the cqe has been posted.
> 
>   * hw/block/nvme: remove explicit qsg/iov parameters
> - New patch. The nvme_map_prp() function doesn't require the qsg and iov
>   parameters since it can just get them from the passed NvmeRequest.
> 
> Based-on: <20200706061303.246057-1-...@irrelevant.dk>
> 
> Klaus Jensen (16):
>   hw/block/nvme: memset preallocated requests structures
>   hw/block/nvme: add mapping helpers
>   hw/block/nvme: replace dma_acct with blk_acct equivalent
>   hw/block/nvme: remove redundant has_sg member
>   hw/block/nvme: destroy request iov before reuse
>   hw/block/nvme: refactor dma read/write
>   hw/block/nvme: add tracing to nvme_map_prp
>   hw/block/nvme: add request mapping helper
>   hw/block/nvme: verify validity of prp lists in the cmb
>   hw/block/nvme: refactor request bounds checking
>   hw/block/nvme: add check for mdts
>   hw/block/nvme: be consistent about zeros vs zeroes
>   hw/block/nvme: add ns/cmd references in NvmeRequest
>   hw/block/nvme: consolidate qsg/iov clearing
>   hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp
>   hw/block/nvme: remove explicit qsg/iov parameters
> 
>  block/nvme.c  |   4 +-
>  hw/block/nvme.c   | 506 +++---
>  hw/block/nvme.h   |   4 +-
>  hw/block/trace-events |   4 +
>  include/block/nvme.h  |   4 +-
>  5 files changed, 340 insertions(+), 182 deletions(-)
> 
> -- 
> 2.27.0
> 
> 

Thanks for the reviews everyone.

Pushed to nvme-next.



Re: [PATCH v6 2/2] nvme: allow cmb and pmr to be enabled on same device

2020-08-17 Thread Klaus Jensen
On Jul 30 00:50, Klaus Jensen wrote:
> On Jul 29 15:01, Andrzej Jakowski wrote:
> > So far it was not possible to have CMB and PMR emulated on the same
> > device, because BAR2 was used exclusively by either PMR or CMB. This
> > patch places CMB at a BAR4 offset so it does not conflict with MSI-X vectors.
> > 
> > Signed-off-by: Andrzej Jakowski 
> > ---
> 
> Well, I'm certainly happy now. LGTM!
> 
> Reviewed-by: Klaus Jensen 
> 

Is anyone willing to chip in with another review on this?