Re: [PATCH] hw/nvme: fix validation of ASQ and ACQ

2021-08-23 Thread Klaus Jensen
On Aug 23 19:47, Keith Busch wrote:
> On Mon, Aug 23, 2021 at 02:20:18PM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > Address 0x0 is a valid address. Fix the admin submission and completion
> > queue address validation to not error out on this.
> 
> Indeed, there are environments that can use that address. It's a host error if
> the controller was enabled with invalid queue addresses anyway. The controller
> only needs to verify the lower bits are clear, which we do later.
> 
> Reviewed-by: Keith Busch 
> 

Thanks Keith,

Yeah, I noticed this with a VFIO-based driver where the IOVAs typically
start at 0x0.

And yes, I specifically refrained from adding any other sanity checks on
the addresses. I.e., we could add a check for ASQ != ACQ, but who are we
to judge ;)

Applied to nvme-next!


signature.asc
Description: PGP signature


Re: [PATCH] hw/nvme: fix validation of ASQ and ACQ

2021-08-23 Thread Keith Busch
On Mon, Aug 23, 2021 at 02:20:18PM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Address 0x0 is a valid address. Fix the admin submission and completion
> queue address validation to not error out on this.

Indeed, there are environments that can use that address. It's a host error if
the controller was enabled with invalid queue addresses anyway. The controller
only needs to verify the lower bits are clear, which we do later.

Reviewed-by: Keith Busch 



Re: [PATCH v7 07/33] block/block-copy: introduce block_copy_set_copy_opts()

2021-08-23 Thread Vladimir Sementsov-Ogievskiy

23.08.2021 15:05, Vladimir Sementsov-Ogievskiy wrote:

10.08.2021 17:55, Hanna Reitz wrote:

On 04.08.21 11:37, Vladimir Sementsov-Ogievskiy wrote:

We'll need a possibility to set compress and use_copy_range options
after initialization of the state. So make corresponding part of
block_copy_state_new() separate and public.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block-copy.h |  2 ++
  block/block-copy.c | 66 +-
  2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 734389d32a..f0ba7bc828 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,8 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
   int64_t cluster_size, bool 
use_copy_range,
   bool compress, Error **errp);
+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress);
  void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
  void block_copy_state_free(BlockCopyState *s);
diff --git a/block/block-copy.c b/block/block-copy.c
index 58b4345a5a..84c29fb233 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -315,21 +315,11 @@ static uint32_t block_copy_max_transfer(BdrvChild 
*source, BdrvChild *target)
   target->bs->bl.max_transfer));
  }
-BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- int64_t cluster_size, bool use_copy_range,
- bool compress, Error **errp)
+/* Function should be called prior any actual copy request */


Given this is something the caller should know, I’d prefer putting this into 
the block-copy.h header.

However, I realize we have a wild mix of this in qemu and often do put such 
information into the C source file, so...


+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress)
  {
-    BlockCopyState *s;
-    BdrvDirtyBitmap *copy_bitmap;
  bool is_fleecing;
-
-    copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
-   errp);
-    if (!copy_bitmap) {
-    return NULL;
-    }
-    bdrv_disable_dirty_bitmap(copy_bitmap);
-
  /*
   * If source is in backing chain of target assume that target is going to 
be
   * used for "image fleecing", i.e. it should represent a kind of snapshot 
of
@@ -344,24 +334,12 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
   * For more information see commit f8d59dfb40bb and test
   * tests/qemu-iotests/222
   */
-    is_fleecing = bdrv_chain_contains(target->bs, source->bs);
+    is_fleecing = bdrv_chain_contains(s->target->bs, s->source->bs);
-    s = g_new(BlockCopyState, 1);
-    *s = (BlockCopyState) {
-    .source = source,
-    .target = target,
-    .copy_bitmap = copy_bitmap,
-    .cluster_size = cluster_size,
-    .len = bdrv_dirty_bitmap_size(copy_bitmap),
-    .write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0) |
-    (compress ? BDRV_REQ_WRITE_COMPRESSED : 0),
-    .mem = shres_create(BLOCK_COPY_MAX_MEM),
-    .max_transfer = QEMU_ALIGN_DOWN(
-    block_copy_max_transfer(source, target),
-    cluster_size),
-    };
+    s->write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0) |
+    (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);


Shouldn’t we keep the is_fleecing check in block_copy_state_new()? We must 
perform it at least once, but there is no benefit in repeating it on every 
block_copy_set_copy_opts() call, right?



I think, it may help if user calls block_copy_set_copy_opts() after graph change




On the other hand, nobody actually call this function after generic graph 
change. And intended usage is start backup when source is already backing child 
of target.. Will change it, at least for code be more obvious and not raise 
questions.


--
Best regards,
Vladimir



Re: [PATCH v7 25/33] iotests.py: VM: add own __enter__ method

2021-08-23 Thread Vladimir Sementsov-Ogievskiy

04.08.2021 19:27, John Snow wrote:


On Wed, Aug 4, 2021 at 5:39 AM Vladimir Sementsov-Ogievskiy mailto:vsement...@virtuozzo.com>> wrote:

In superclass __enter__ method is defined with return value type hint
'QEMUMachine'. So, mypy thinks that return value of VM.__enter__ is
QEMUMachine. Let's redefine __enter__ in VM class, to give it correct
type hint.

Signed-off-by: Vladimir Sementsov-Ogievskiy mailto:vsement...@virtuozzo.com>>
Reviewed-by: Max Reitz mailto:mre...@redhat.com>>
---
  tests/qemu-iotests/iotests.py | 4 
  1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 89663dac06..025e288ddd 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -568,6 +568,10 @@ def remote_filename(path):
  class VM(qtest.QEMUQtestMachine):
      '''A QEMU VM'''

+    # Redefine __enter__ with proper type hint
+    def __enter__(self) -> 'VM':
+        return self
+
      def __init__(self, path_suffix=''):
          name = "qemu%s-%d" % (path_suffix, os.getpid())
          super().__init__(qemu_prog, qemu_opts, name=name,
-- 
2.29.2



First and foremost:

Reviewed-by: John Snow mailto:js...@redhat.com>>
Acked-by: John Snow mailto:js...@redhat.com>>

A more durable approach might be to annotate QEMUMachine differently such that 
subclasses get the right types automatically. See if this following snippet 
works instead of adding a new __enter__ method.

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 971ed7e8c6a..2e410103569 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -36,6 +36,7 @@
      Sequence,
      Tuple,
      Type,
+    TypeVar,
  )

  from qemu.qmp import (  # pylint: disable=import-error
@@ -73,6 +74,9 @@ class AbnormalShutdown(QEMUMachineError):
      """


+_T = TypeVar('_T', bound='QEMUMachine')
+
+
  class QEMUMachine:
      """
      A QEMU VM.
@@ -166,7 +170,7 @@ def __init__(self,
          self._remove_files: List[str] = []
          self._user_killed = False

-    def __enter__(self) -> 'QEMUMachine':
+    def __enter__(self: _T) -> _T:
          return self

      def __exit__(self,
@@ -182,8 +186,8 @@ def add_monitor_null(self) -> None:
          self._args.append('-monitor')
          self._args.append('null')

-    def add_fd(self, fd: int, fdset: int,
-               opaque: str, opts: str = '') -> 'QEMUMachine':
+    def add_fd(self: _T, fd: int, fdset: int,
+               opaque: str, opts: str = '') -> _T:
          """
          Pass a file descriptor to the VM
          """


That looks better, I'll go this way, thanks!

--
Best regards,
Vladimir



Re: [PATCH v7 17/33] block/copy-before-write: bdrv_cbw_append(): drop unused compress arg

2021-08-23 Thread Vladimir Sementsov-Ogievskiy

10.08.2021 18:17, Hanna Reitz wrote:

On 04.08.21 11:37, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/copy-before-write.h | 1 -
  block/backup.c    | 2 +-
  block/copy-before-write.c | 7 +++
  3 files changed, 4 insertions(+), 6 deletions(-)


Well, I mean, it isn’t exactly syntactically unused before this patch; it’s 
only unused because this patch drops it from cbw_init(), too.

It definitely is functionally unused, though, so:

Reviewed-by: Hanna Reitz 

I wonder whether we should drop @use_copy_range and @compress from 
block_copy_state_new(), too, though.



Yes they are useless now.. I can add a patch.


--
Best regards,
Vladimir



hw/nvme: fix verification of select field in namespace attachment

2021-08-23 Thread Naveen
Fix is added to check for reserved value in select field for 
namespace attachment

Signed-off-by: Naveen Nagar 
Signed-off-by: Klaus Jensen 
cc: Minwoo Im 

---
 hw/nvme/ctrl.c   | 13 +
 include/block/nvme.h |  5 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6baf9e0..2c59c74 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5191,7 +5191,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
 uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
-bool attach = !(dw10 & 0xf);
+uint8_t sel = dw10 & 0xf;
 uint16_t *nr_ids = [0];
 uint16_t *ids = [1];
 uint16_t ret;
@@ -5224,7 +5224,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
 }
 
-if (attach) {
+switch (sel) {
+case NVME_NS_ATTACHMENT_ATTACH:
 if (nvme_ns(ctrl, nsid)) {
 return NVME_NS_ALREADY_ATTACHED | NVME_DNR;
 }
@@ -5235,7 +5236,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 
 nvme_attach_ns(ctrl, ns);
 nvme_select_iocs_ns(ctrl, ns);
-} else {
+break;
+case NVME_NS_ATTACHMENT_DETACH:
 if (!nvme_ns(ctrl, nsid)) {
 return NVME_NS_NOT_ATTACHED | NVME_DNR;
 }
@@ -5244,8 +5246,11 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 ns->attached--;
 
 nvme_update_dmrsl(ctrl);
+break;
+default:
+return NVME_INVALID_FIELD | NVME_DNR;
 }
-
+
 /*
  * Add namespace id to the changed namespace id list for event clearing
  * via Get Log Page command.
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 77aae01..e3bd47b 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1154,6 +1154,11 @@ enum NvmeIdCtrlCmic {
 NVME_CMIC_MULTI_CTRL= 1 << 1,
 };
 
+enum NvmeNsAttachmentOperation {
+NVME_NS_ATTACHMENT_ATTACH = 0x0,
+NVME_NS_ATTACHMENT_DETACH = 0x1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
1.8.3.1




Re: [PATCH v6 1/6] qmp: add QMP command x-debug-query-virtio

2021-08-23 Thread Markus Armbruster
Back from my summer break, please excuse the delay.

Jonah Palmer  writes:

> On 8/7/21 8:35 AM, Markus Armbruster wrote:
>> QAPI schema review only.
>>
>> Jonah Palmer  writes:
>>
>>> From: Laurent Vivier 
>>>
>>> This new command lists all the instances of VirtIODevice with
>>> their path and virtio type.
>>>
>>> Signed-off-by: Laurent Vivier 
>>> Reviewed-by: Eric Blake 
>>> Signed-off-by: Jonah Palmer 
>> [...]
>>
>>> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
>>> index 4912b97..0c89789 100644
>>> --- a/qapi/qapi-schema.json
>>> +++ b/qapi/qapi-schema.json
>>> @@ -91,5 +91,6 @@
>>>   { 'include': 'misc.json' }
>>>   { 'include': 'misc-target.json' }
>>>   { 'include': 'audio.json' }
>>> +{ 'include': 'virtio.json' }
>>>   { 'include': 'acpi.json' }
>>>   { 'include': 'pci.json' }
>>> diff --git a/qapi/virtio.json b/qapi/virtio.json
>>> new file mode 100644
>>> index 000..804adbe
>>> --- /dev/null
>>> +++ b/qapi/virtio.json
>>> @@ -0,0 +1,72 @@
>> Please insert at the beginning
>>
>> # -*- Mode: Python -*-
>> # vim: filetype=python
>> #
>
> Will do.
>
>>> +##
>>> +# = Virtio devices
>>> +##
>>> +
>>> +##
>>> +# @VirtioType:
>>> +#
>>> +# An enumeration of Virtio device types.
>>> +#
>>> +# Since: 6.1
>> 6.2 now, here and below.
>
> Okay, will update for entire series.
>
>>
>>> +##
>>> +{ 'enum': 'VirtioType',
>>> +  'data': [ 'unknown', 'virtio-net', 'virtio-blk', 'virtio-console',
>>> +'virtio-rng', 'virtio-balloon', 'virtio-iomem', 'virtio-rpmsg',
>>> +'virtio-scsi', 'virtio-9p', 'virtio-mac80211-wlan',
>>> +'virtio-serial', 'virtio-caif', 'virtio-memory-balloon',
>>> +'unknown-14', 'unknown-15', 'virtio-gpu', 'virtio-clock',
>>> +'virtio-input', 'vhost-vsock', 'virtio-crypto', 
>>> 'virtio-signal-dist',
>>> +'virtio-pstore', 'virtio-iommu', 'virtio-mem', 'unknown-25',
>>> +'vhost-user-fs', 'virtio-pmem', 'unknown-28', 
>>> 'virtio-mac80211-hwsim' ]
>> Please limit line length to approximately 70 characters.
>
> Fixed, sorry about that. Also, should I continue this up to 
> 'virtio-bluetooth'? E.g:
>
> ...
> 'virtio-mac80211-hwsim', 'unknown-30', 'unknown-31',
> 'unknown-32', 'unknown-33', 'unknown-34', 'unknown-35',
> 'unknown-36', 'unknown-37', 'unknown-38', 'unknown-39',
> 'virtio-bluetooth' ]

Hmm...  how is this enum used?  In this patch:

VirtioInfoList *qmp_x_debug_query_virtio(Error **errp)
{
VirtioInfoList *list = NULL;
VirtioInfoList *node;
VirtIODevice *vdev;

QTAILQ_FOREACH(vdev, _list, next) {
DeviceState *dev = DEVICE(vdev);
node = g_new0(VirtioInfoList, 1);
node->value = g_new(VirtioInfo, 1);
node->value->path = g_strdup(dev->canonical_path);
--->node->value->type = qapi_enum_parse(_lookup, vdev->name,
--->VIRTIO_TYPE_UNKNOWN, NULL);
QAPI_LIST_PREPEND(list, node->value);
}

return list;
}

This maps VirtioDevice member @name (a string) to an enum value.

As far as I can tell, this member is set in virtio_init() only, to its
argument.  All callers pass string literals.  They also pass a numeric
device_id, and the two seem to match, e.g. "virtio-blk" and
VIRTIO_ID_BLOCK.

Thus, the pairs (numeric device ID, string device ID) already exist in
the code, but not in a way that lets you map between them.  To get that,
you *duplicate* the pairs in QAPI.

Having two copies means we get to keep them consistent.  Can we avoid
that?

The enum has a special member 'unknown' that gets used when @name does
not parse as enum member, i.e. we failed at keeping the copies
consistent.  I'm afraid this sweeps a programming error under the rug.

The enum has a bunch of dummy members like 'unknown-14' to make QAPI
generate numeric enum values match the device IDs.  Error prone.  Can't
see offhand why we need them to match.

What about the following.  Define a map from numeric device ID to
string, like so

const char *virtio_device_name[] = {
[VIRTIO_ID_NET] = "virtio-net",
[VIRTIO_ID_BLOCK] = "virtio-blk",
...
}

This lets you

* map device ID to string by subscripting virtio_device_name[].
  Guarding with assert(device_id < G_N_ELEMENTS(virtio_device_name)) may
  be advisable.

* map string to device ID by searching virtio_device_name[].  May want a
  function for that,

Now you can have virtio_init() map parameter @device_id to string, and
drop parameter @name.

Then you have two options:

1. With QAPI enum VirtioType

   Define it without the special and the dummy members.

   To map from string to QAPI enum, use qapi_enum_parse().

   To map from QAPI enum to string, use VirtioType_str().

   To map from QAPI enum to device ID, map through string.

2. Without QAPI enum VirtioType

   Simply use uint16_t device_id, just like struct VirtioDevice.

The choice between 1. and 2. depends on 

Re: [PATCH 3/3] qcow2: handle_dependencies(): relax conflict detection

2021-08-23 Thread Vladimir Sementsov-Ogievskiy

20.08.2021 16:21, Hanna Reitz wrote:

On 24.07.21 15:38, Vladimir Sementsov-Ogievskiy wrote:

There is no conflict and no dependency if we have parallel writes to
different subclusters of one cluster when cluster itself is already
allocated. So, relax extra dependency.

Measure performance:
First, prepare build/qemu-img-old and build/qemu-img-new images.

cd scripts/simplebench
./img_bench_templater.py

Paste the following to stdin of running script:

qemu_img=../../build/qemu-img-{old|new}
$qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
$qemu_img bench -c 10 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
 -w -t none -n /ssd/x.qcow2

The result:

All results are in seconds

--  -  -
 old    new
-s 2K   6.7 ± 15%  6.2 ± 12%
  -7%
-s 2K -o 512    13 ± 3%    11 ± 5%
  -16%
-s $((1024*2+512))  9.5 ± 4%   8.4
  -12%
--  -  -

So small writes are more independent now and that helps to keep deeper
io queue which improves performance.

271 iotest output becomes racy for three allocation in one cluster.
Second and third writes may finish in different order. Second and
third requests don't depend on each other any more. Still they both
depend on first request anyway. Keep only one for consistent output.


I mean, we could also just filter the result (`s/\(20480\|40960\)/FILTERED/` or 
something).  Perhaps there was some idea behind doing three writes, I don’t 
know exactly.

I think I’d prefer a filter, because I guess this is the only test that 
actually will do two subcluster requests in parallel...?



Reasonable, will do




Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-cluster.c  | 11 +++
  tests/qemu-iotests/271 |  4 +---
  tests/qemu-iotests/271.out |  2 --
  3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 967121c7e6..8f56de5516 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1403,6 +1403,17 @@ static int handle_dependencies(BlockDriverState *bs, 
uint64_t guest_offset,
  continue;
  }
+    if (old_alloc->keep_old_clusters &&
+    (end <= l2meta_cow_start(old_alloc) ||
+ start >= l2meta_cow_end(old_alloc)))
+    {
+    /*
+ * Clusters intersect but COW areas don't. And cluster itself is
+ * already allocated. So, there is no actual conflict.
+ */
+    continue;
+    }
+
  /* Conflict */
  if (start < old_start) {
diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index 599b849cc6..939e88ee88 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -866,7 +866,7 @@ echo
  _concurrent_io()
  {
-# Allocate three subclusters in the same cluster.
+# Allocate two subclusters in the same cluster.
  # This works because handle_dependencies() checks whether the requests
  # allocate the same cluster, even if the COW regions don't overlap (in
  # this case they don't).
@@ -876,7 +876,6 @@ break write_aio A
  aio_write -P 10 30k 2k
  wait_break A
  aio_write -P 11 20k 2k
-aio_write -P 12 40k 2k
  resume A
  aio_flush
  EOF
@@ -888,7 +887,6 @@ cat <




--
Best regards,
Vladimir



[PATCH] hw/nvme: fix validation of ASQ and ACQ

2021-08-23 Thread Klaus Jensen
From: Klaus Jensen 

Address 0x0 is a valid address. Fix the admin submission and completion
queue address validation to not error out on this.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 8 
 hw/nvme/trace-events | 2 --
 2 files changed, 10 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6baf9e0420d5..ff784851137e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5623,14 +5623,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 trace_pci_nvme_err_startfail_sq();
 return -1;
 }
-if (unlikely(!asq)) {
-trace_pci_nvme_err_startfail_nbarasq();
-return -1;
-}
-if (unlikely(!acq)) {
-trace_pci_nvme_err_startfail_nbaracq();
-return -1;
-}
 if (unlikely(asq & (page_size - 1))) {
 trace_pci_nvme_err_startfail_asq_misaligned(asq);
 return -1;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index 430eeb395b24..ff6cafd520df 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -159,8 +159,6 @@ pci_nvme_err_invalid_setfeat(uint32_t dw10) "invalid set 
features, dw10=0x%"PRIx
 pci_nvme_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 
0x%"PRIx16""
 pci_nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are 
non-admin completion queues"
 pci_nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are 
non-admin submission queues"
-pci_nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin 
submission queue address is null"
-pci_nvme_err_startfail_nbaracq(void) "nvme_start_ctrl failed because the admin 
completion queue address is null"
 pci_nvme_err_startfail_asq_misaligned(uint64_t addr) "nvme_start_ctrl failed 
because the admin submission queue address is misaligned: 0x%"PRIx64""
 pci_nvme_err_startfail_acq_misaligned(uint64_t addr) "nvme_start_ctrl failed 
because the admin completion queue address is misaligned: 0x%"PRIx64""
 pci_nvme_err_startfail_page_too_small(uint8_t log2ps, uint8_t maxlog2ps) 
"nvme_start_ctrl failed because the page size is too small: log2size=%u, min=%u"
-- 
2.32.0




Re: [PATCH v7 07/33] block/block-copy: introduce block_copy_set_copy_opts()

2021-08-23 Thread Vladimir Sementsov-Ogievskiy

10.08.2021 17:55, Hanna Reitz wrote:

On 04.08.21 11:37, Vladimir Sementsov-Ogievskiy wrote:

We'll need a possibility to set compress and use_copy_range options
after initialization of the state. So make corresponding part of
block_copy_state_new() separate and public.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block-copy.h |  2 ++
  block/block-copy.c | 66 +-
  2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 734389d32a..f0ba7bc828 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,8 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
   int64_t cluster_size, bool 
use_copy_range,
   bool compress, Error **errp);
+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress);
  void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
  void block_copy_state_free(BlockCopyState *s);
diff --git a/block/block-copy.c b/block/block-copy.c
index 58b4345a5a..84c29fb233 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -315,21 +315,11 @@ static uint32_t block_copy_max_transfer(BdrvChild 
*source, BdrvChild *target)
   target->bs->bl.max_transfer));
  }
-BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- int64_t cluster_size, bool use_copy_range,
- bool compress, Error **errp)
+/* Function should be called prior any actual copy request */


Given this is something the caller should know, I’d prefer putting this into 
the block-copy.h header.

However, I realize we have a wild mix of this in qemu and often do put such 
information into the C source file, so...


+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress)
  {
-    BlockCopyState *s;
-    BdrvDirtyBitmap *copy_bitmap;
  bool is_fleecing;
-
-    copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
-   errp);
-    if (!copy_bitmap) {
-    return NULL;
-    }
-    bdrv_disable_dirty_bitmap(copy_bitmap);
-
  /*
   * If source is in backing chain of target assume that target is going to 
be
   * used for "image fleecing", i.e. it should represent a kind of snapshot 
of
@@ -344,24 +334,12 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
   * For more information see commit f8d59dfb40bb and test
   * tests/qemu-iotests/222
   */
-    is_fleecing = bdrv_chain_contains(target->bs, source->bs);
+    is_fleecing = bdrv_chain_contains(s->target->bs, s->source->bs);
-    s = g_new(BlockCopyState, 1);
-    *s = (BlockCopyState) {
-    .source = source,
-    .target = target,
-    .copy_bitmap = copy_bitmap,
-    .cluster_size = cluster_size,
-    .len = bdrv_dirty_bitmap_size(copy_bitmap),
-    .write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0) |
-    (compress ? BDRV_REQ_WRITE_COMPRESSED : 0),
-    .mem = shres_create(BLOCK_COPY_MAX_MEM),
-    .max_transfer = QEMU_ALIGN_DOWN(
-    block_copy_max_transfer(source, target),
-    cluster_size),
-    };
+    s->write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0) |
+    (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);


Shouldn’t we keep the is_fleecing check in block_copy_state_new()? We must 
perform it at least once, but there is no benefit in repeating it on every 
block_copy_set_copy_opts() call, right?



I think, it may help if user calls block_copy_set_copy_opts() after graph change


--
Best regards,
Vladimir



Re: [PATCH V2] block/rbd: implement bdrv_co_block_status

2021-08-23 Thread Peter Lieven

Am 22.08.21 um 23:02 schrieb Ilya Dryomov:

On Tue, Aug 10, 2021 at 3:41 PM Peter Lieven  wrote:

the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff features which depends on the object-map

Hi Peter,

Nit: "has the fast-diff feature which depends on the object-map and
exclusive-lock features"



will reword in V3.





and exclusive-lock. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven 
---
V1->V2:
- add commit comment [Stefano]
- use failed_post_open [Stefano]
- remove redundant assert [Stefano]
- add macro+comment for the magic -9000 value [Stefano]
- always set *file if its non NULL [Stefano]

  block/rbd.c | 125 
  1 file changed, 125 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index dcf82b15b8..8692e76f40 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -88,6 +88,7 @@ typedef struct BDRVRBDState {
  char *namespace;
  uint64_t image_size;
  uint64_t object_size;
+uint64_t features;
  } BDRVRBDState;

  typedef struct RBDTask {
@@ -983,6 +984,13 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
  s->image_size = info.size;
  s->object_size = info.obj_size;

+r = rbd_get_features(s->image, >features);
+if (r < 0) {
+error_setg_errno(errp, -r, "error getting image features from %s",
+ s->image_name);
+goto failed_post_open;
+}

The object-map and fast-diff features can be enabled/disabled while the
image is open so this should probably go to qemu_rbd_co_block_status().


+
  /* If we are using an rbd snapshot, we must be r/o, otherwise
   * leave as-is */
  if (s->snap != NULL) {
@@ -1259,6 +1267,122 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
  return spec_info;
  }

+typedef struct rbd_diff_req {
+uint64_t offs;
+uint64_t bytes;
+int exists;
+} rbd_diff_req;
+
+/*
+ * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
+ * value in the callback routine. Choose a value that does not conflict with
+ * an existing exitcode and return it if we want to prematurely stop the
+ * execution because we detected a change in the allocation status.
+ */
+#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
+
+static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len,
+   int exists, void *opaque)
+{
+struct rbd_diff_req *req = opaque;
+
+assert(req->offs + req->bytes <= offs);
+
+if (req->exists && offs > req->offs + req->bytes) {
+/*
+ * we started in an allocated area and jumped over an unallocated area,
+ * req->bytes contains the length of the allocated area before the
+ * unallocated area. stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (req->exists && !exists) {
+/*
+ * we started in an allocated area and reached a hole. req->bytes
+ * contains the length of the allocated area before the hole.
+ * stop further processing.
+ */
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+if (!req->exists && exists && offs > req->offs) {
+/*
+ * we started in an unallocated area and hit the first allocated
+ * block. req->bytes must be set to the length of the unallocated area
+ * before the allocated area. stop further processing.
+ */
+req->bytes = offs - req->offs;
+return QEMU_RBD_EXIT_DIFF_ITERATE2;
+}
+
+/*
+ * assert that we catched all cases above and allocation state has not

catched -> caught


+ * changed during callbacks.
+ */
+assert(exists == req->exists || !req->bytes);
+req->exists = exists;
+
+/*
+ * assert that we either return an unallocated block or have got callbacks
+ * for all allocated blocks present.
+ */
+assert(!req->exists || offs == req->offs + req->bytes);
+req->bytes = offs + len - req->offs;
+
+return 0;
+}
+
+static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t 
offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+