Re: [Qemu-block] [PATCH v2] vmdk: align end of file to a sector boundary

2018-09-12 Thread Fam Zheng
On Thu, 09/13 10:31, yuchen...@synology.com wrote:
> From: yuchenlin 
> 
> There is a rare case in which the size of the last compressed cluster
> is larger than the cluster size, which will cause the file not to be
> aligned at the sector boundary.

The code looks good to me. Can you also explain in the comment why it is
important to align the file size to a sector boundary?

Fam

> 
> Signed-off-by: yuchenlin 
> ---
> v1 -> v2:
> * Add a more detailed comment.
> * Add QEMU_ALIGN_UP to show the intention more clearly.
> * Add a newline at the end of bdrv_truncate.
> 
> thanks
> 
>  block/vmdk.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index a9d0084e36..2c9e86d98f 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1698,6 +1698,27 @@ static int coroutine_fn
>  vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
> uint64_t bytes, QEMUIOVector *qiov)
>  {
> +if (bytes == 0) {
> +/* The caller will write bytes 0 to signal EOF.
> + * When we receive it, we align the EOF to a sector boundary. */
> +BDRVVmdkState *s = bs->opaque;
> +int i, ret;
> +int64_t length;
> +
> +for (i = 0; i < s->num_extents; i++) {
> +length = bdrv_getlength(s->extents[i].file->bs);
> +if (length < 0) {
> +return length;
> +}
> +length = QEMU_ALIGN_UP(length, BDRV_SECTOR_SIZE);
> +ret = bdrv_truncate(s->extents[i].file, length,
> +PREALLOC_MODE_OFF, NULL);
> +if (ret < 0) {
> +return ret;
> +}
> +}
> +return 0;
> +}
>  return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
>  }
>  
> -- 
> 2.18.0
> 
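For readers following the alignment discussion above: QEMU_ALIGN_UP(length, BDRV_SECTOR_SIZE)
simply rounds the length up to the next multiple of 512 bytes. A minimal standalone sketch of
that arithmetic (the macro bodies mirror QEMU's osdep.h helpers and are reproduced here purely
for illustration, not taken from the patch):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Rounding helpers in the shape of QEMU's osdep.h macros (for illustration). */
#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define QEMU_ALIGN_UP(n, m)   QEMU_ALIGN_DOWN((n) + (m) - 1, (m))

#define BDRV_SECTOR_SIZE 512

int main(void)
{
    /* A compressed cluster left the extent file at an unaligned length. */
    int64_t length = 65536 + 313;
    int64_t aligned = QEMU_ALIGN_UP(length, BDRV_SECTOR_SIZE);

    /* Truncating the extent to 'aligned' pads it to the next sector boundary. */
    printf("length=%" PRId64 " aligned=%" PRId64 "\n", length, aligned);
    return 0;
}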




[Qemu-block] [PATCH v2] vmdk: align end of file to a sector boundary

2018-09-12 Thread yuchenlin
From: yuchenlin 

There is a rare case in which the size of the last compressed cluster
is larger than the cluster size, which will cause the file not to be
aligned at the sector boundary.

Signed-off-by: yuchenlin 
---
v1 -> v2:
* Add a more detailed comment.
* Add QEMU_ALIGN_UP to show the intention more clearly.
* Add a newline at the end of bdrv_truncate.

thanks

 block/vmdk.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/block/vmdk.c b/block/vmdk.c
index a9d0084e36..2c9e86d98f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1698,6 +1698,27 @@ static int coroutine_fn
 vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov)
 {
+if (bytes == 0) {
+/* The caller will write bytes 0 to signal EOF.
+ * When we receive it, we align the EOF to a sector boundary. */
+BDRVVmdkState *s = bs->opaque;
+int i, ret;
+int64_t length;
+
+for (i = 0; i < s->num_extents; i++) {
+length = bdrv_getlength(s->extents[i].file->bs);
+if (length < 0) {
+return length;
+}
+length = QEMU_ALIGN_UP(length, BDRV_SECTOR_SIZE);
+ret = bdrv_truncate(s->extents[i].file, length,
+PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+return ret;
+}
+}
+return 0;
+}
 return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
 }
 
-- 
2.18.0




Re: [Qemu-block] [PATCH] vmdk: align end of file to a sector boundary

2018-09-12 Thread yuchenlin

On 2018-09-12 19:54, Fam Zheng wrote:

On Tue, 08/28 11:17, yuchen...@synology.com wrote:

From: yuchenlin 

There is a rare case in which the size of the last compressed cluster
is larger than the cluster size, which will cause the file not to be
aligned at the sector boundary.

Signed-off-by: yuchenlin 
---
 block/vmdk.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/vmdk.c b/block/vmdk.c
index a9d0084e36..a8ae7c65d2 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1698,6 +1698,24 @@ static int coroutine_fn
 vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov)
 {
+if (bytes == 0) {
+/* align end of file to a sector boundary. */
+BDRVVmdkState *s = bs->opaque;
+int i, ret;
+int64_t length;
+
+for (i = 0; i < s->num_extents; i++) {
+length = bdrv_getlength(s->extents[i].file->bs);
+if (length < 0) {
+return length;
+}


Could you add "length = QEMU_ALIGN_UP(length, BDRV_SECTOR_SIZE);" to 
show the

intention more clearly?

Fam



Thank you for your effort, I will do it.

yuchenlin

+ret = bdrv_truncate(s->extents[i].file, length, PREALLOC_MODE_OFF, NULL);
+if (ret < 0) {
+return ret;
+}
+}
+return 0;
+}
 return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
 }

--
2.17.0






Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] nvme: PCI/e configuration from specification

2018-09-12 Thread Eric Blake

On 9/12/18 2:53 PM, Gersner wrote:

Hi Daniel,

Sorry for the long round-trips, we had a busy month. We have implemented
all the changes. Waiting for a final clarification.

Should the new patches be posted on this thread or a new one?


Best to post a v2 as a new top-level thread (our CI tools don't spot 
patch revisions buried in reply to an existing thread).


Also, it's best to avoid top-posting on technical lists, as it's harder 
to follow the flow of your email.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] nvme: PCI/e configuration from specification

2018-09-12 Thread Gersner
Hi Daniel,

Sorry for the long round-trips, we had a busy month. We have implemented
all the changes. Waiting for a final clarification.

Should the new patches be posted on this thread or a new one?

Thanks for your time.

Gersner.

On Thu, Aug 30, 2018 at 6:45 PM Daniel Verkamp  wrote:

> Hi Shimi,
>
> On Sun, Aug 26, 2018 at 2:50 PM Gersner  wrote:
> >
> > Hi Daniel,
> > Thanks for taking a look. Comments are inline.
> >
> > Gersner.
> >
> > On Sun, Jul 15, 2018 at 9:21 AM Daniel Verkamp  wrote:
> >>
> >> On Fri, Jun 22, 2018 at 4:22 AM, Shimi Gersner 
> wrote:
> >> > PCI/e configuration currently does not meet specifications.
> >> >
> >> > Patch includes various configuration changes to support specifications
> >> > - BAR2 to return zero when read and CMD.IOSE is not set.
> >> > - Expose NVME configuration through IO space (Optional).
> >> > - PCI Power Management v1.2.
> >> > - PCIe Function Level Reset.
> >> > - Disable QEMUs default use of PCIe Link Status (DLLLA).
> >> > - PCIe missing AOC compliance flag.
> >> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
> >> [...]
> >> >  n->num_namespaces = 1;
> >> >  n->num_queues = 64;
> >> > -n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
> >> > +n->reg_size = pow2ceil(0x2000 + 2 * (n->num_queues + 1) * 4);
> >>
> >> The existing math looks OK to me (maybe already 4 bytes larger than
> >> necessary?). The controller registers should be 0x1000 bytes for the
> >> fixed register set, then 2 * 4 bytes for each doorbell pair (+ 1 is
> >> for the admin queue, and CAP.DSTRD is set to 0, so there is no extra
> >> padding between queues). The result is also rounded up to a power of
> >> two, so the result will typically be 8 KB.  What is the rationale for
> >> this change?
> >
> > You are correct, this change shouldn't be here. It was made during tests
> > against the nvme compliance which failed here.
>
> If the NVMe compliance test requires it, that is probably sufficient
> reason to change it - although it would be interesting to hear their
> rationale.
>
> > The specification states that bits 0 to 13 are RO, which implicitly suggests
> > that the minimal size of this BAR should be at least 16K as this is a standard
> > way to determine the BAR size on PCI (AFAIK). On the contrary it doesn't
> > fit very well with the memory map laid out in 3.1 Register Definition
> > of the 1.3b nvme spec. Any idea?
>
> You're right, the MLBAR section of the NVMe spec does seem to indicate
> that up to bit 13 should be reserved/RO.  This is also probably a good
> enough reason to make the change, as long as this reason is provided.
>
> >
> > Additionally it does look like it has an extra 4 bytes.
> >
> >>
> >>
> >> >  n->ns_size = bs_size / (uint64_t)n->num_namespaces;
> >> >
> >> >  n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> >> > @@ -1245,6 +1295,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >> >  pci_register_bar(&n->parent_obj, 0,
> >> >  PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> >> >  &n->iomem);
> >> > +
> >> > +// Expose the NVME memory through Address Space IO (Optional by spec)
> >> > +pci_register_bar(&n->parent_obj, 2, PCI_BASE_ADDRESS_SPACE_IO, &n->iomem);
> >>
> >> This looks like it will register the whole 4KB+ NVMe register set
> >> (n->iomem) as an I/O space BAR, but this doesn't match the spec (see
> >> NVMe 1.3c section 3.2, "Index/Data Pair registers"). The I/O BAR (if
> >> implemented) should just contain two 4-byte registers, Index and Data,
> >> that can be used to indirectly access the NVMe register set.  (Just
> >> for curiosity, do you know of any software that uses this feature? It
> >> could be useful for testing the implementation.)
> >
> > Very nice catch. We indeed totally misinterpreted the specification here.
> >
> > We use the compliance suite for sanity, but it doesn't actually use this feature,
> > just verifies the configuration of the registers.
> >
> > We will go with rolling back this feature as this is optional.
>
> This is probably the right move; I don't know of any real hardware
> devices that implement it (and therefore I doubt any software will
> require it).  However, it should not be too difficult to add an
> implementation of this feature; if you are interested, take a look at
> the ahci_idp_read/ahci_idp_write functions - the NVMe index/data pair
> should be very similar.
>
Nice, it does seem easy to implement. We decided to go without it due to the lack
of real-world software we could test the new facility against.


>
> > Question - I'm having a hard time determining from the specification - as
> > this is optional, how do device drivers determine if it is available? Is it
> > simply the CMD.IOSE flag from PCI? And if so, what is
> > the standard way in QEMU to disable the flag completely?
>
> Based on the wording in NVMe 1.3 section 3.2, it seems like the only
> indication of 
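Background for the BAR sizing point raised above: on PCI, software discovers a BAR's size by
writing all 1s to it, reading the value back, and looking at which low bits the device forces
to zero, so read-only zero bits 0..13 imply a region of at least 16 KiB. A standalone sketch
of that probe (illustrative values only, not QEMU code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Value read back after writing 0xFFFFFFFF: device holds bits 0..13 at 0. */
    uint32_t readback = 0xFFFFC000u;

    /* Mask off the low 4 type/prefetch bits of a memory BAR, then invert + 1. */
    uint32_t size = ~(readback & ~0xFu) + 1;

    printf("BAR size: %u bytes\n", size);    /* prints 16384 */
    return 0;
}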

[Qemu-block] [PATCH 3/3] aio-posix: do skip system call if ctx->notifier polling succeeds

2018-09-12 Thread Paolo Bonzini
Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when
polling", 2018-08-15), by not reporting progress, causes aio_poll to execute the
system call when polling succeeds because of ctx->notifier.  This introduces
latency before the call to aio_bh_poll() and negates the advantages of
polling, unfortunately.

The fix builds on the previous patch, separating the effect of polling on
the timeout from the progress reported to aio_poll().  ctx->notifier
does zero the timeout, causing the caller to skip the system call,
but it does not report progress, so that the bug fix of commit 70232b5253
still stands.

Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa
Signed-off-by: Paolo Bonzini 
---
 util/aio-posix.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 21d8987179..efa79b9e62 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -498,10 +498,11 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
 QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 if (!node->deleted && node->io_poll &&
 aio_node_check(ctx, node->is_external) &&
-node->io_poll(node->opaque) &&
-node->opaque != &ctx->notifier) {
+node->io_poll(node->opaque)) {
 *timeout = 0;
-progress = true;
+if (node->opaque != &ctx->notifier) {
+progress = true;
+}
 }
 
 /* Caller handles freeing deleted nodes.  Don't do it here. */
-- 
2.17.1




[Qemu-block] [PATCH 1/3] aio-posix: fix concurrent access to poll_disable_cnt

2018-09-12 Thread Paolo Bonzini
It is valid for an aio_set_fd_handler to happen concurrently with
aio_poll.  In that case, poll_disable_cnt can change under the heels
of aio_poll, and the assertion on poll_disable_cnt can fail in
run_poll_handlers.

Therefore, this patch simply checks the counter on every polling
iteration.  There are no particular needs for ordering, since the
polling loop is terminated anyway by aio_notify at the end of
aio_set_fd_handler.

Signed-off-by: Paolo Bonzini 
---
 util/aio-posix.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 131ba6b4a8..5c29380575 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -211,6 +211,7 @@ void aio_set_fd_handler(AioContext *ctx,
 AioHandler *node;
 bool is_new = false;
 bool deleted = false;
+int poll_disable_change;
 
 qemu_lockcnt_lock(&ctx->list_lock);
 
@@ -244,11 +245,9 @@ void aio_set_fd_handler(AioContext *ctx,
 QLIST_REMOVE(node, node);
 deleted = true;
 }
-
-if (!node->io_poll) {
-ctx->poll_disable_cnt--;
-}
+poll_disable_change = -!node->io_poll;
 } else {
+poll_disable_change = !io_poll - (node && !node->io_poll);
 if (node == NULL) {
 /* Alloc and insert if it's not already there */
 node = g_new0(AioHandler, 1);
@@ -257,10 +256,6 @@ void aio_set_fd_handler(AioContext *ctx,
 
 g_source_add_poll(&ctx->source, &node->pfd);
 is_new = true;
-
-ctx->poll_disable_cnt += !io_poll;
-} else {
-ctx->poll_disable_cnt += !io_poll - !node->io_poll;
 }
 
 /* Update handler with latest information */
@@ -274,6 +269,15 @@ void aio_set_fd_handler(AioContext *ctx,
 node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
 }
 
+/* No need to order poll_disable_cnt writes against other updates;
+ * the counter is only used to avoid wasting time and latency on
+ * iterated polling when the system call will be ultimately necessary.
+ * Changing handlers is a rare event, and a little wasted polling until
+ * the aio_notify below is not an issue.
+ */
+atomic_set(&ctx->poll_disable_cnt,
+   atomic_read(&ctx->poll_disable_cnt) + poll_disable_change);
+
 aio_epoll_update(ctx, node, is_new);
 qemu_lockcnt_unlock(&ctx->list_lock);
 aio_notify(ctx);
@@ -525,7 +529,6 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
 
 assert(ctx->notify_me);
 assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
-assert(ctx->poll_disable_cnt == 0);
 
 trace_run_poll_handlers_begin(ctx, max_ns);
 
@@ -533,7 +536,8 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
 
 do {
 progress = run_poll_handlers_once(ctx);
-} while (!progress && qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < end_time);
+} while (!progress && qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < end_time
+ && !atomic_read(&ctx->poll_disable_cnt));
 
 trace_run_poll_handlers_end(ctx, progress);
 
@@ -552,7 +556,7 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
  */
 static bool try_poll_mode(AioContext *ctx, bool blocking)
 {
-if (blocking && ctx->poll_max_ns && ctx->poll_disable_cnt == 0) {
+if (blocking && ctx->poll_max_ns && !atomic_read(&ctx->poll_disable_cnt)) {
 /* See qemu_soonest_timeout() uint64_t hack */
 int64_t max_ns = MIN((uint64_t)aio_compute_timeout(ctx),
  (uint64_t)ctx->poll_ns);
-- 
2.17.1
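A side note on the pattern used in this patch: the polling loop only needs a plain atomic
read of the counter on each iteration, since a momentarily stale value merely costs one extra
pass before falling back to the system call. A minimal standalone C11 sketch of that idea
(all names invented for illustration, not QEMU code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int poll_disable_cnt;    /* bumped by another thread */

static bool poll_once(void)
{
    /* ...check file descriptors without a system call... */
    return false;                      /* no progress in this sketch */
}

static void poll_loop(int max_iterations)
{
    for (int i = 0; i < max_iterations; i++) {
        /* Relaxed read is enough: a stale value only wastes one pass. */
        if (atomic_load_explicit(&poll_disable_cnt, memory_order_relaxed)) {
            break;                     /* give up and do the blocking syscall */
        }
        if (poll_once()) {
            return;                    /* progress: skip the syscall entirely */
        }
    }
}

int main(void)
{
    poll_loop(1000);
    atomic_fetch_add_explicit(&poll_disable_cnt, 1, memory_order_relaxed);
    poll_loop(1000);                   /* exits immediately now */
    printf("done\n");
    return 0;
}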





[Qemu-block] [PATCH 2/3] aio-posix: compute timeout before polling

2018-09-12 Thread Paolo Bonzini
This is a preparation for the next patch, and also a very small
optimization.  Compute the timeout only once, before invoking
try_poll_mode, and adjust it in run_poll_handlers.  The adjustment
is the polling time when polling fails, or zero (non-blocking) if
polling succeeds.

Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa
Signed-off-by: Paolo Bonzini 
---
 util/aio-posix.c  | 59 +++
 util/trace-events |  4 ++--
 2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 5c29380575..21d8987179 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -490,7 +490,7 @@ static void add_pollfd(AioHandler *node)
 npfd++;
 }
 
-static bool run_poll_handlers_once(AioContext *ctx)
+static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
 {
 bool progress = false;
 AioHandler *node;
@@ -500,6 +500,7 @@ static bool run_poll_handlers_once(AioContext *ctx)
 aio_node_check(ctx, node->is_external) &&
 node->io_poll(node->opaque) &&
 node->opaque != &ctx->notifier) {
+*timeout = 0;
 progress = true;
 }
 
@@ -522,31 +523,38 @@ static bool run_poll_handlers_once(AioContext *ctx)
  *
  * Returns: true if progress was made, false otherwise
  */
-static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
+static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
 {
 bool progress;
-int64_t end_time;
+int64_t start_time, elapsed_time;
 
 assert(ctx->notify_me);
 assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
 
-trace_run_poll_handlers_begin(ctx, max_ns);
-
-end_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + max_ns;
+trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
 
+start_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 do {
-progress = run_poll_handlers_once(ctx);
-} while (!progress && qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < end_time
+progress = run_poll_handlers_once(ctx, timeout);
+elapsed_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time;
+} while (!progress && elapsed_time < max_ns
  && !atomic_read(&ctx->poll_disable_cnt));
 
-trace_run_poll_handlers_end(ctx, progress);
+/* If time has passed with no successful polling, adjust *timeout to
+ * keep the same ending time.
+ */
+if (*timeout != -1) {
+*timeout -= MIN(*timeout, elapsed_time);
+}
 
+trace_run_poll_handlers_end(ctx, progress, *timeout);
 return progress;
 }
 
 /* try_poll_mode:
  * @ctx: the AioContext
- * @blocking: busy polling is only attempted when blocking is true
+ * @timeout: timeout for blocking wait, computed by the caller and updated if
+ *polling succeeds.
  *
  * ctx->notify_me must be non-zero so this function can detect aio_notify().
  *
@@ -554,19 +562,16 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
  *
  * Returns: true if progress was made, false otherwise
  */
-static bool try_poll_mode(AioContext *ctx, bool blocking)
+static bool try_poll_mode(AioContext *ctx, int64_t *timeout)
 {
-if (blocking && ctx->poll_max_ns && !atomic_read(&ctx->poll_disable_cnt)) {
-/* See qemu_soonest_timeout() uint64_t hack */
-int64_t max_ns = MIN((uint64_t)aio_compute_timeout(ctx),
- (uint64_t)ctx->poll_ns);
+/* See qemu_soonest_timeout() uint64_t hack */
+int64_t max_ns = MIN((uint64_t)*timeout, (uint64_t)ctx->poll_ns);
 
-if (max_ns) {
-poll_set_started(ctx, true);
+if (max_ns && !atomic_read(&ctx->poll_disable_cnt)) {
+poll_set_started(ctx, true);
 
-if (run_poll_handlers(ctx, max_ns)) {
-return true;
-}
+if (run_poll_handlers(ctx, max_ns, timeout)) {
+return true;
 }
 }
 
@@ -575,7 +580,7 @@ static bool try_poll_mode(AioContext *ctx, bool blocking)
 /* Even if we don't run busy polling, try polling once in case it can make
  * progress and the caller will be able to avoid ppoll(2)/epoll_wait(2).
  */
-return run_poll_handlers_once(ctx);
+return run_poll_handlers_once(ctx, timeout);
 }
 
 bool aio_poll(AioContext *ctx, bool blocking)
@@ -605,8 +610,14 @@ bool aio_poll(AioContext *ctx, bool blocking)
 start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
 
-progress = try_poll_mode(ctx, blocking);
-if (!progress) {
+timeout = blocking ? aio_compute_timeout(ctx) : 0;
+progress = try_poll_mode(ctx, &timeout);
+assert(!(timeout && progress));
+
+/* If polling is allowed, non-blocking aio_poll does not need the
+ * system call---a single round of run_poll_handlers_once suffices.
+ */
+if (timeout || atomic_read(&ctx->poll_disable_cnt)) {
 assert(npfd == 0);
 
 /* fill pollfds */
@@ -620,8 +631,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
 }
 }
 
-timeout = 
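To make the timeout bookkeeping described in the commit message above concrete, here is a
small standalone sketch (invented names, not QEMU code): the caller computes the blocking
timeout once; a successful poll zeroes it so the later system call will not block, while
unsuccessful polling has its elapsed time subtracted so the overall deadline stays the same.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static int poll_once(int64_t *timeout_ns)
{
    /* A successful poll would set *timeout_ns = 0 and return 1. */
    (void)timeout_ns;
    return 0;
}

static void busy_poll(int64_t max_ns, int64_t *timeout_ns)
{
    int64_t start = now_ns(), elapsed = 0;

    do {
        if (poll_once(timeout_ns)) {
            return;                          /* *timeout_ns is already 0 */
        }
        elapsed = now_ns() - start;
    } while (elapsed < max_ns);

    if (*timeout_ns != -1) {                 /* -1 means "block forever" */
        *timeout_ns -= elapsed < *timeout_ns ? elapsed : *timeout_ns;
    }
}

int main(void)
{
    int64_t timeout = 5 * 1000 * 1000;       /* 5 ms budget for the blocking wait */
    busy_poll(1 * 1000 * 1000, &timeout);    /* burn up to 1 ms polling first */
    printf("remaining timeout: %lld ns\n", (long long)timeout);
    return 0;
}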

[Qemu-block] [PATCH 0/3] aio-posix: polling mode bug fixes

2018-09-12 Thread Paolo Bonzini
Patch 1 fixes a too-strict assertion that could fire when aio_poll
is called in parallel with aio_set_fd_handler.

Patch 2 and 3 reinstate the performance benefits of polling, which were
essentially disabled by commit 70232b5253 ("aio-posix: Don't count
ctx->notifier as progress when polling", 2018-08-15).

Please review carefully! :)

Thanks,

Paolo

Paolo Bonzini (3):
  aio-posix: fix concurrent access to poll_disable_cnt
  aio-posix: compute timeout before polling
  aio-posix: do skip system call if ctx->notifier polling succeeds

 util/aio-posix.c  | 88 +++
 util/trace-events |  4 +--
 2 files changed, 53 insertions(+), 39 deletions(-)

-- 
2.17.1




Re: [Qemu-block] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained

2018-09-12 Thread Denis V. Lunev
On 09/12/2018 04:15 PM, Kevin Wolf wrote:
> On 12.09.2018 at 14:03, Denis Plotnikov wrote:
>> On 10.09.2018 15:41, Kevin Wolf wrote:
>>> On 29.06.2018 at 14:40, Denis Plotnikov wrote:
 Fixes the problem of ide request appearing when the BDS is in
 the "drained section".

 Without the patch the request can come and be processed by the main
 event loop, as the ide requests are processed by the main event loop
 and the main event loop doesn't stop when its context is in the
 "drained section".
 The request execution is postponed until the end of "drained section".

 The patch doesn't modify ide specific code, as well as any other
 device code. Instead, it modifies the infrastructure of asynchronous
 Block Backend requests, in favor of postponing the requests arisen
 when in "drained section" to remove the possibility of request appearing
 for all the infrastructure clients.

 This approach doesn't make vCPU processing the request wait until
 the end of request processing.

 Signed-off-by: Denis Plotnikov 
>>> I generally agree with the idea that requests should be queued during a
>>> drained section. However, I think there are a few fundamental problems
>>> with the implementation in this series:
>>>
>>> 1) aio_disable_external() is already a layering violation and we'd like
>>> to get rid of it (by replacing it with a BlockDevOps callback from
>>> BlockBackend to the devices), so adding more functionality there
>>> feels like a step in the wrong direction.
>>>
>>> 2) Only blk_aio_* are fixed, while we also have synchronous public
>>> interfaces (blk_pread/pwrite) as well as coroutine-based ones
>>> (blk_co_*). They need to be postponed as well.
>> Good point! Thanks!

Should we really prohibit all public interfaces, as they are reused
inside the block layer?

There is also a problem which has not been stated in clear words yet.
We have a potential deadlock in the code under the following
conditions, which should also be taken into consideration.


bdrv_co_pwritev
    bdrv_inc_in_flight
    bdrv_aligned_pwritev
        notifier_list_with_return_notify
            backup_before_write_notify
                backup_do_cow
                    backup_cow_with_bounce_buffer
                        blk_co_preadv

Here blk_co_preadv() must finish its work before we
release the notifier and finish the request initiated
from the controller, which has incremented the
in-flight counter.

Thus we should differentiate requests initiated at the
controller level and requests initiated in the block layer.
This is sad but true.

The idea of touching only these interfaces was to avoid
interference with the block jobs code. It has become clear that
the approach is a mistake and we should have a
segregation by request kind. Thus the idea of a
flag for use in the controller code should not be that
awful.


>>> blk_co_preadv/pwritev() are the common point in the call chain for
>>> all of these variants, so this is where the fix needs to live.
>> Using the common point might be a good idea, but in case aio requests we
>> also have to mane completions which out of the scope of
>> blk_co_p(read|write)v:
> I don't understand what you mean here (possibly because I fail to
> understand the word "mane") and what completions have to do with
> queueing of requests.
>
> Just to clarify, we are talking about the following situation, right?
> bdrv_drain_all_begin() has returned, so all the old requests have
> already been drained and their completion callback has already been
> called. For any new requests that come in, we need to queue them until
> the drained section ends. In other words, they won't reach the point
> where they could possibly complete before .drained_end.

Such requests should not reach that point once they start to
execute, EXCEPT for notifiers. There is a big problem with synchronous
requests, which can queue new requests, and those requests have to be finished.

Den



Re: [Qemu-block] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-09-12 Thread Alberto Garcia
On Thu 21 Jun 2018 03:06:22 PM CEST, Kevin Wolf wrote:
>> > Actually, do we ever use bdrv_reopen() for flags other than
>> > read-only?  Maybe we should get rid of that flags nonsense and
>> > simply make it a bdrv_reopen_set_readonly() taking a boolean.
>> 
>> I think that's a good idea. There's however one case where other
>> flags are changed: reopen_backing_file() in block/replication.c that
>> also clears BDRV_O_INACTIVE. I'd need to take a closer look to that
>> code to see what we can do about it.
>
> Uh, and that works?
>
> Clearing BDRV_O_INACTIVE with bdrv_reopen() (which means, without
> calling bdrv_invalidate_cache()) is surely suspicious. Probably this
> code is buggy.
>
> Another reason for removing flags from the interface: We didn't do any
> sanity checks for the flag changes.

I'm working on removing the flags parameter from bdrv_reopen() and
friends, and I think this is the only clearly strange thing on the way.

The documentation is in https://wiki.qemu.org/Features/BlockReplication
or in docs/block-replication.txt.

As far as I can see there's the following backing chain:

  [secondary] <- [hidden] <- [active]

There's an NBD server writing on [secondary], a backup job copying data
from [secondary] to [hidden], and the VM writing to [active].

The reopen_backing_file() function in question simply puts the two
backing images in read-write mode (clearing BDRV_O_INACTIVE along the
way) so the NBD server and the backup job can write to them.

This is done by replication_start(), which can only be reached with the
QMP command "xen-set-replication". There's a comment that says "The
caller of the function MUST make sure vm stopped" but no one seems to
check that the vm is indeed stopped (?). This API was btw added half a
year after the replication driver, and before this point the only user
of replication_start() was a unit test.

Without having tested the COLO / replication code before, I don't see
any reason why it would make sense to clear the BDRV_O_INACTIVE flag on
the backing files, or why that flag would be there in the first place
(and I'm not even talking about whether that flag is supposed to be
set/cleared with a simple bdrv_reopen()). So I'm thinking to simply
remove it from there.

If anyone else has more information I'd be happy to hear it.

Berto



Re: [Qemu-block] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained

2018-09-12 Thread Kevin Wolf
On 12.09.2018 at 16:53, Denis Plotnikov wrote:
> On 12.09.2018 16:15, Kevin Wolf wrote:
> > On 12.09.2018 at 14:03, Denis Plotnikov wrote:
> > > On 10.09.2018 15:41, Kevin Wolf wrote:
> > > > On 29.06.2018 at 14:40, Denis Plotnikov wrote:
> > > > > Fixes the problem of ide request appearing when the BDS is in
> > > > > the "drained section".
> > > > > 
> > > > > Without the patch the request can come and be processed by the main
> > > > > event loop, as the ide requests are processed by the main event loop
> > > > > and the main event loop doesn't stop when its context is in the
> > > > > "drained section".
> > > > > The request execution is postponed until the end of "drained section".
> > > > > 
> > > > > The patch doesn't modify ide specific code, as well as any other
> > > > > device code. Instead, it modifies the infrastructure of asynchronous
> > > > > Block Backend requests, in favor of postponing the requests arisen
> > > > > when in "drained section" to remove the possibility of request 
> > > > > appearing
> > > > > for all the infrastructure clients.
> > > > > 
> > > > > This approach doesn't make vCPU processing the request wait until
> > > > > the end of request processing.
> > > > > 
> > > > > Signed-off-by: Denis Plotnikov 
> > > > 
> > > > I generally agree with the idea that requests should be queued during a
> > > > drained section. However, I think there are a few fundamental problems
> > > > with the implementation in this series:
> > > > 
> > > > 1) aio_disable_external() is already a layering violation and we'd like
> > > >  to get rid of it (by replacing it with a BlockDevOps callback from
> > > >  BlockBackend to the devices), so adding more functionality there
> > > >  feels like a step in the wrong direction.
> > > > 
> > > > 2) Only blk_aio_* are fixed, while we also have synchronous public
> > > >  interfaces (blk_pread/pwrite) as well as coroutine-based ones
> > > >  (blk_co_*). They need to be postponed as well.
> > > Good point! Thanks!
> > > > 
> > > >  blk_co_preadv/pwritev() are the common point in the call chain for
> > > >  all of these variants, so this is where the fix needs to live.
> > > Using the common point might be a good idea, but in case aio requests we
> > > also have to mane completions which out of the scope of
> > > blk_co_p(read|write)v:
> > 
> > I don't understand what you mean here (possibly because I fail to
> > understand the word "mane") and what completions have to do with
> mane = make
> > queueing of requests.
> > 
> > Just to clarify, we are talking about the following situation, right?
> > bdrv_drain_all_begin() has returned, so all the old requests have
> > already been drained and their completion callback has already been
> > called. For any new requests that come in, we need to queue them until
> > the drained section ends. In other words, they won't reach the point
> > where they could possibly complete before .drained_end.
> Yes
> 
> To make it clear: I'm trying to defend the idea that putting the postponing
> routine in blk_co_preadv/pwritev is not the best choice and that's why:
> 
> If I understood your idea correctly, if we do the postponing inside
> blk_co_p(write|read)v we don't know whether we do synchronous or
> asynchronous request.
> We need to know this because if we postpone an async request then, later, on
> the postponed requests processing, we must make "a completion" for that
> request stating that it's finally "done".

Yes, for AIO requests, the completion callback must be called
eventually. This is not different between normal and postponed requests,
though. This is why blk_aio_read/write_entry() call blk_aio_complete()
before they return. This call will be made for postponed requests, too,
so there is nothing that you would need to do additionally inside
blk_co_preadv/pwritev().

> Furthermore, for sync requests, if we postpone them, we must block the
> clients that issued them until the postponed requests have been processed on
> leaving the drained section. This would require an additional notification
> mechanism. Instead, we can just check whether we can proceed in
> blk_p(write|read) and if not (we're in drained) wait there.

Again, this is the same for normal requests. The BDRV_POLL_WHILE() in
blk_prw() already implements the waiting. You don't need another
mechanism.

> We avoid the things above if we do the postponing in blk_aio_prwv and wait in
> blk_prw without postponing.
> 
> What do you think?
> 
> > 
> > > static void blk_aio_write_entry(void *opaque) {
> > >  ...
> > >  rwco->ret = blk_co_pwritev(...);
> > > 
> > >  blk_aio_complete(acb);
> > >  ...
> > > }
> > > 
> > > This makes the difference.
> > > I would suggest adding waiting until "drained_end" is done on the
> > > synchronous read/write at blk_prw
> > 
> > It is possible, but then the management becomes a bit more complicated
> > because you have more than just a list of Coroutines that 
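A standalone toy model of the queuing idea under discussion (plain callbacks stand in for
coroutines; this only illustrates the control flow, not QEMU's API): while the quiesce
counter is non-zero, new requests are put on a list instead of running, and drained_end
drains the list and runs them.

#include <stdio.h>

/* Toy model: callbacks stand in for queued coroutines. */
typedef void (*RequestFn)(int arg);

#define MAX_QUEUED 16

static int quiesce_counter;
static struct { RequestFn fn; int arg; } queued[MAX_QUEUED];
static int n_queued;

static void drained_begin(void)
{
    quiesce_counter++;
}

static void drained_end(void)
{
    if (--quiesce_counter == 0) {
        /* Resume everything that arrived while we were drained. */
        for (int i = 0; i < n_queued; i++) {
            queued[i].fn(queued[i].arg);
        }
        n_queued = 0;
    }
}

static void submit_request(RequestFn fn, int arg)
{
    if (quiesce_counter > 0 && n_queued < MAX_QUEUED) {
        queued[n_queued].fn = fn;          /* postpone until drained_end() */
        queued[n_queued].arg = arg;
        n_queued++;
        return;
    }
    fn(arg);                               /* not drained: run immediately */
}

static void do_write(int sector)
{
    printf("write sector %d\n", sector);
}

int main(void)
{
    drained_begin();
    submit_request(do_write, 1);           /* queued, not executed yet */
    drained_end();                         /* queued request runs here */
    submit_request(do_write, 2);           /* runs immediately */
    return 0;
}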

Re: [Qemu-block] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained

2018-09-12 Thread Denis Plotnikov




On 12.09.2018 16:15, Kevin Wolf wrote:

On 12.09.2018 at 14:03, Denis Plotnikov wrote:

On 10.09.2018 15:41, Kevin Wolf wrote:

On 29.06.2018 at 14:40, Denis Plotnikov wrote:

Fixes the problem of ide request appearing when the BDS is in
the "drained section".

Without the patch the request can come and be processed by the main
event loop, as the ide requests are processed by the main event loop
and the main event loop doesn't stop when its context is in the
"drained section".
The request execution is postponed until the end of "drained section".

The patch doesn't modify ide specific code, as well as any other
device code. Instead, it modifies the infrastructure of asynchronous
Block Backend requests, in favor of postponing the requests arisen
when in "drained section" to remove the possibility of request appearing
for all the infrastructure clients.

This approach doesn't make vCPU processing the request wait until
the end of request processing.

Signed-off-by: Denis Plotnikov 


I generally agree with the idea that requests should be queued during a
drained section. However, I think there are a few fundamental problems
with the implementation in this series:

1) aio_disable_external() is already a layering violation and we'd like
 to get rid of it (by replacing it with a BlockDevOps callback from
 BlockBackend to the devices), so adding more functionality there
 feels like a step in the wrong direction.

2) Only blk_aio_* are fixed, while we also have synchronous public
 interfaces (blk_pread/pwrite) as well as coroutine-based ones
 (blk_co_*). They need to be postponed as well.

Good point! Thanks!


 blk_co_preadv/pwritev() are the common point in the call chain for
 all of these variants, so this is where the fix needs to live.

Using the common point might be a good idea, but in case aio requests we
also have to mane completions which out of the scope of
blk_co_p(read|write)v:


I don't understand what you mean here (possibly because I fail to
understand the word "mane") and what completions have to do with

mane = make

queueing of requests.

Just to clarify, we are talking about the following situation, right?
bdrv_drain_all_begin() has returned, so all the old requests have
already been drained and their completion callback has already been
called. For any new requests that come in, we need to queue them until
the drained section ends. In other words, they won't reach the point
where they could possibly complete before .drained_end.

Yes

To make it clear: I'm trying to defend the idea that putting the 
postponing routine in blk_co_preadv/pwritev is not the best choice and 
that's why:


If I understood your idea correctly, if we do the postponing inside
blk_co_p(write|read)v we don't know whether we do synchronous or 
asynchronous request.
We need to know this because if we postpone an async request then, 
later, on the postponed requests processing, we must make "a 
completion" for that request stating that it's finally "done".


Furthermore, for sync requests, if we postpone them, we must block the
clients that issued them until the postponed requests have been processed on
leaving the drained section. This would require an additional notification
mechanism. Instead, we can just check whether we can proceed in
blk_p(write|read) and if not (we're in drained) wait there.


We avoid the things above if we do the postponing in blk_aio_prwv and wait
in blk_prw without postponing.


What do you think?




static void blk_aio_write_entry(void *opaque) {
 ...
 rwco->ret = blk_co_pwritev(...);

 blk_aio_complete(acb);
 ...
}

This makes the difference.
I would suggest adding waiting until "drained_end" is done on the
synchronous read/write at blk_prw


It is possible, but then the management becomes a bit more complicated
because you have more than just a list of Coroutines that you need to
wake up.

One thing that could be problematic in blk_co_preadv/pwritev is that
blk->in_flight would count even requests that are queued if we're not
careful. Then a nested drain would deadlock because the BlockBackend
would never say that it's quiesced.


   >

3) Within a drained section, you want requests from other users to be
 blocked, but not your own ones (essentially you want exclusive
 access). We don't have blk_drained_begin/end() yet, so this is not
 something to implement right now, but let's keep this requirement in
 mind and choose a design that allows this.

There is an idea to distinguish the requests that should be done without
respect to "drained section" by using a flag in BdrvRequestFlags. The
requests with a flag set should be processed anyway.


I don't think that would work because the accesses can be nested quite
deeply in functions that can be called from anywhere.

But possibly all of the interesting cases are directly calling BDS
functions anyway and not BlockBackend.

I hope it's so 

[Qemu-block] [PULL 2/4] block/rbd: Attempt to parse legacy filenames

2018-09-12 Thread Jeff Cody
When we converted rbd to get rid of the older key/value-centric
encoding format, we broke compatibility with image files with backing
file strings encoded in the old format.

This leaves a bit of an ugly conundrum, and a hacky solution.

If the initial attempt to parse the "proper" options fails, it assumes
that we may have an older key/value encoded filename.  Fall back to
attempting to parse the filename, and extract the required options from
it.  If that fails, pass along the original error message.

We do not support mixed modern usage alongside legacy keyvalue pair
usage.

A deprecation warning has been added, although care should be taken
when actually deprecating since the impact is not limited to
commandline or qapi usage, but also to opening existing images.

Reviewed-by: Eric Blake 
Signed-off-by: Jeff Cody 
Message-id: 15b332e5432ad069441f7275a46080f465d789a0.1536704901.git.jc...@redhat.com
Signed-off-by: Jeff Cody 
---
 block/rbd.c | 53 +++--
 1 file changed, 51 insertions(+), 2 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index b199450f9f..5090e4f662 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -678,6 +678,33 @@ static int qemu_rbd_convert_options(QDict *options, BlockdevOptionsRbd **opts,
 return 0;
 }
 
+static int qemu_rbd_attempt_legacy_options(QDict *options,
+   BlockdevOptionsRbd **opts,
+   char **keypairs)
+{
+char *filename;
+int r;
+
+filename = g_strdup(qdict_get_try_str(options, "filename"));
+if (!filename) {
+return -EINVAL;
+}
+qdict_del(options, "filename");
+
+qemu_rbd_parse_filename(filename, options, NULL);
+
+/* keypairs freed by caller */
+*keypairs = g_strdup(qdict_get_try_str(options, "=keyvalue-pairs"));
+if (*keypairs) {
+qdict_del(options, "=keyvalue-pairs");
+}
+
+r = qemu_rbd_convert_options(options, opts, NULL);
+
+g_free(filename);
+return r;
+}
+
 static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
@@ -700,8 +727,30 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 
 r = qemu_rbd_convert_options(options, &opts, &local_err);
 if (local_err) {
-error_propagate(errp, local_err);
-goto out;
+/* If keypairs are present, that means some options are present in
+ * the modern option format.  Don't attempt to parse legacy option
+ * formats, as we won't support mixed usage. */
+if (keypairs) {
+error_propagate(errp, local_err);
+goto out;
+}
+
+/* If the initial attempt to convert and process the options failed,
+ * we may be attempting to open an image file that has the rbd options
+ * specified in the older format consisting of all key/value pairs
+ * encoded in the filename.  Go ahead and attempt to parse the
+ * filename, and see if we can pull out the required options. */
+r = qemu_rbd_attempt_legacy_options(options, &opts, &keypairs);
+if (r < 0) {
+error_propagate(errp, local_err);
+goto out;
+}
+/* Take care whenever deciding to actually deprecate; once this ability
+ * is removed, we will not be able to open any images with legacy-styled
+ * backing image strings. */
+error_report("RBD options encoded in the filename as keyvalue pairs "
+ "is deprecated.  Future versions may cease to parse "
+ "these options in the future.");
 }
 
 /* Remove the processed options from the QDict (the visitor processes
-- 
2.17.1




[Qemu-block] [PULL 1/4] block/rbd: pull out qemu_rbd_convert_options

2018-09-12 Thread Jeff Cody
Code movement to pull the conversion from Qdict to BlockdevOptionsRbd
into a helper function.

Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
Signed-off-by: Jeff Cody 
Message-id: 5b49a980f2cde6610ab1df41bb0277d00b5db893.1536704901.git.jc...@redhat.com
Signed-off-by: Jeff Cody 
---
 block/rbd.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index ca8e5bbace..b199450f9f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -655,12 +655,34 @@ failed_opts:
 return r;
 }
 
+static int qemu_rbd_convert_options(QDict *options, BlockdevOptionsRbd **opts,
+Error **errp)
+{
+Visitor *v;
+Error *local_err = NULL;
+
+/* Convert the remaining options into a QAPI object */
+v = qobject_input_visitor_new_flat_confused(options, errp);
+if (!v) {
+return -EINVAL;
+}
+
+visit_type_BlockdevOptionsRbd(v, NULL, opts, &local_err);
+visit_free(v);
+
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+return 0;
+}
+
 static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
 BDRVRBDState *s = bs->opaque;
 BlockdevOptionsRbd *opts = NULL;
-Visitor *v;
 const QDictEntry *e;
 Error *local_err = NULL;
 char *keypairs, *secretid;
@@ -676,19 +698,9 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 qdict_del(options, "password-secret");
 }
 
-/* Convert the remaining options into a QAPI object */
-v = qobject_input_visitor_new_flat_confused(options, errp);
-if (!v) {
-r = -EINVAL;
-goto out;
-}
-
-visit_type_BlockdevOptionsRbd(v, NULL, &opts, &local_err);
-visit_free(v);
-
+r = qemu_rbd_convert_options(options, &opts, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
-r = -EINVAL;
 goto out;
 }
 
-- 
2.17.1




[Qemu-block] [PULL 0/4] Block patches

2018-09-12 Thread Jeff Cody
The following changes since commit 19b599f7664b2ebfd0f405fb79c14dd241557452:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-08-27-v2' 
into staging (2018-08-27 16:44:20 +0100)

are available in the Git repository at:

  git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to 8af2eb7b43a1b694fd6d1d090025027d6b72caac:

  block/rbd: add deprecation documentation for filename keyvalue pairs 
(2018-09-12 08:51:45 -0400)


Block patches for RBD


Jeff Cody (4):
  block/rbd: pull out qemu_rbd_convert_options
  block/rbd: Attempt to parse legacy filenames
  block/rbd: add iotest for rbd legacy keyvalue filename parsing
  block/rbd: add deprecation documentation for filename keyvalue pairs

 block/rbd.c| 89 --
 qemu-deprecated.texi   | 15 +++
 tests/qemu-iotests/231 | 62 ++
 tests/qemu-iotests/231.out |  9 
 tests/qemu-iotests/group   |  1 +
 5 files changed, 162 insertions(+), 14 deletions(-)
 create mode 100755 tests/qemu-iotests/231
 create mode 100644 tests/qemu-iotests/231.out

-- 
2.17.1




[Qemu-block] [PULL 4/4] block/rbd: add deprecation documentation for filename keyvalue pairs

2018-09-12 Thread Jeff Cody
Signed-off-by: Jeff Cody 
Message-id: 647f5b5ab7efd8bf567a504c832b1d2d6f719b23.1536704901.git.jc...@redhat.com
Signed-off-by: Jeff Cody 
---
 qemu-deprecated.texi | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index 1b9c007f12..8d285b281e 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -190,6 +190,21 @@ used instead.
 In order to prevent QEMU from automatically opening an image's backing
 chain, use ``"backing": null'' instead.
 
+@subsubsection rbd keyvalue pair encoded filenames: "" (since 3.1.0)
+
+Options for ``rbd'' should be specified according to its runtime options,
+like other block drivers.  Legacy parsing of keyvalue pair encoded
+filenames is useful to open images with the old format for backing files;
+These image files should be updated to use the current format.
+
+Example of legacy encoding:
+
+@code{json:@{"file.driver":"rbd", "file.filename":"rbd:rbd/name"@}}
+
+The above, converted to the current supported format:
+
+@code{json:@{"file.driver":"rbd", "file.pool":"rbd", "file.image":"name"@}}
+
 @subsection vio-spapr-device device options
 
 @subsubsection "irq": "" (since 3.0.0)
-- 
2.17.1




[Qemu-block] [PULL 3/4] block/rbd: add iotest for rbd legacy keyvalue filename parsing

2018-09-12 Thread Jeff Cody
This is a small test that will check for the ability to parse
both legacy and modern options for rbd.

The way the test is set up is for failure to occur, but without
having to wait to timeout on a non-existent rbd server.  The error
messages in the success path show that the arguments were parsed.

The failure behavior prior to the patch series that has this test is
qemu-img complaining about mandatory options (e.g. 'pool') not being
provided.

Reviewed-by: Eric Blake 
Signed-off-by: Jeff Cody 
Message-id: f830580e339b974a83ed4870d11adcdc17f49a47.1536704901.git.jc...@redhat.com
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/231 | 62 ++
 tests/qemu-iotests/231.out |  9 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 72 insertions(+)
 create mode 100755 tests/qemu-iotests/231
 create mode 100644 tests/qemu-iotests/231.out

diff --git a/tests/qemu-iotests/231 b/tests/qemu-iotests/231
new file mode 100755
index 00..3e283708b4
--- /dev/null
+++ b/tests/qemu-iotests/231
@@ -0,0 +1,62 @@
+#!/bin/bash
+#
+# Test legacy and modern option parsing for rbd/ceph.  This will not
+# actually connect to a ceph server, but rather looks for the appropriate
+# error message that indicates we parsed the options correctly.
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=jc...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+rm "${BOGUS_CONF}"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt generic
+_supported_proto rbd
+_supported_os Linux
+
+BOGUS_CONF=${TEST_DIR}/ceph-$$.conf
+touch "${BOGUS_CONF}"
+
+_filter_conf()
+{
+sed -e "s#$BOGUS_CONF#BOGUS_CONF#g"
+}
+
+# We expect this to fail, with no monitor ip provided and a null conf file.  Just want it
+# to fail in the right way.
+$QEMU_IMG info "json:{'file.driver':'rbd','file.filename':'rbd:rbd/bogus:conf=${BOGUS_CONF}'}" 2>&1 | _filter_conf
+$QEMU_IMG info "json:{'file.driver':'rbd','file.pool':'rbd','file.image':'bogus','file.conf':'${BOGUS_CONF}'}" 2>&1 | _filter_conf
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/231.out b/tests/qemu-iotests/231.out
new file mode 100644
index 00..579ba11c16
--- /dev/null
+++ b/tests/qemu-iotests/231.out
@@ -0,0 +1,9 @@
+QA output created by 231
+qemu-img: RBD options encoded in the filename as keyvalue pairs is deprecated.  Future versions may cease to parse these options in the future.
+unable to get monitor info from DNS SRV with service name: ceph-mon
+no monitors specified to connect to.
+qemu-img: Could not open 'json:{'file.driver':'rbd','file.filename':'rbd:rbd/bogus:conf=BOGUS_CONF'}': error connecting: No such file or directory
+unable to get monitor info from DNS SRV with service name: ceph-mon
+no monitors specified to connect to.
+qemu-img: Could not open 'json:{'file.driver':'rbd','file.pool':'rbd','file.image':'bogus','file.conf':'BOGUS_CONF'}': error connecting: No such file or directory
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 743790745b..31f6e77dcb 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -226,3 +226,4 @@
 226 auto quick
 227 auto quick
 229 auto quick
+231 auto quick
-- 
2.17.1




Re: [Qemu-block] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained

2018-09-12 Thread Kevin Wolf
On 12.09.2018 at 14:03, Denis Plotnikov wrote:
> On 10.09.2018 15:41, Kevin Wolf wrote:
> > On 29.06.2018 at 14:40, Denis Plotnikov wrote:
> > > Fixes the problem of ide request appearing when the BDS is in
> > > the "drained section".
> > > 
> > > Without the patch the request can come and be processed by the main
> > > event loop, as the ide requests are processed by the main event loop
> > > and the main event loop doesn't stop when its context is in the
> > > "drained section".
> > > The request execution is postponed until the end of "drained section".
> > > 
> > > The patch doesn't modify ide specific code, as well as any other
> > > device code. Instead, it modifies the infrastructure of asynchronous
> > > Block Backend requests, in favor of postponing the requests arisen
> > > when in "drained section" to remove the possibility of request appearing
> > > for all the infrastructure clients.
> > > 
> > > This approach doesn't make vCPU processing the request wait until
> > > the end of request processing.
> > > 
> > > Signed-off-by: Denis Plotnikov 
> > 
> > I generally agree with the idea that requests should be queued during a
> > drained section. However, I think there are a few fundamental problems
> > with the implementation in this series:
> > 
> > 1) aio_disable_external() is already a layering violation and we'd like
> > to get rid of it (by replacing it with a BlockDevOps callback from
> > BlockBackend to the devices), so adding more functionality there
> > feels like a step in the wrong direction.
> > 
> > 2) Only blk_aio_* are fixed, while we also have synchronous public
> > interfaces (blk_pread/pwrite) as well as coroutine-based ones
> > (blk_co_*). They need to be postponed as well.
> Good point! Thanks!
> > 
> > blk_co_preadv/pwritev() are the common point in the call chain for
> > all of these variants, so this is where the fix needs to live.
> Using the common point might be a good idea, but in case aio requests we
> also have to mane completions which out of the scope of
> blk_co_p(read|write)v:

I don't understand what you mean here (possibly because I fail to
understand the word "mane") and what completions have to do with
queueing of requests.

Just to clarify, we are talking about the following situation, right?
bdrv_drain_all_begin() has returned, so all the old requests have
already been drained and their completion callback has already been
called. For any new requests that come in, we need to queue them until
the drained section ends. In other words, they won't reach the point
where they could possibly complete before .drained_end.

> static void blk_aio_write_entry(void *opaque) {
> ...
> rwco->ret = blk_co_pwritev(...);
> 
> blk_aio_complete(acb);
> ...
> }
> 
> This makes the difference.
> I would suggest adding waiting until "drained_end" is done on the
> synchronous read/write at blk_prw

It is possible, but then the management becomes a bit more complicated
because you have more than just a list of Coroutines that you need to
wake up.

One thing that could be problematic in blk_co_preadv/pwritev is that
blk->in_flight would count even requests that are queued if we're not
careful. Then a nested drain would deadlock because the BlockBackend
would never say that it's quiesced.

>   >
> > 3) Within a drained section, you want requests from other users to be
> > blocked, but not your own ones (essentially you want exclusive
> > access). We don't have blk_drained_begin/end() yet, so this is not
> > something to implement right now, but let's keep this requirement in
> > mind and choose a design that allows this.
> There is an idea to distinguish the requests that should be done without
> respect to "drained section" by using a flag in BdrvRequestFlags. The
> requests with a flag set should be processed anyway.

I don't think that would work because the accesses can be nested quite
deeply in functions that can be called from anywhere.

But possibly all of the interesting cases are directly calling BDS
functions anyway and not BlockBackend.

> > I believe the whole logic should be kept local to BlockBackend, and
> > blk_root_drained_begin/end() should be the functions that start queuing
> > requests or let queued requests resume.
> > 
> > As we are already in coroutine context in blk_co_preadv/pwritev(), after
> > checking that blk->quiesce_counter > 0, we can enter the coroutine
> > object into a list and yield. blk_root_drained_end() calls aio_co_wake()
> > for each of the queued coroutines. This should be all that we need to
> > manage.
> In my understanding by using brdv_drained_begin/end we want to protect a
> certain BlockDriverState from external access but not the whole BlockBackend
> which may involve using a number of BlockDriverState-s.
> I though it because we could possibly change a backing file for some
> BlockDriverState. And for the time of changing we 

Re: [Qemu-block] [PATCH v4 0/4] block/rbd: enable filename parsing on open

2018-09-12 Thread Jeff Cody
On Tue, Sep 11, 2018 at 06:32:29PM -0400, Jeff Cody wrote:
> Changes from v3:
> 
> 
> Patch 4: Typo fixed [Eric]
>  Added examples [Eric]
> 
> Changes from v2:
> =
> 
> Patch 4: New, document deprecation. [Eric]
> Patch 3,2: Add r-b's
> 
> 
> Changes from v1:
> =
> 
> Patch 1: Don't pass unused BlockDriverState to helper function
> 
> Patch 2: Do not allow mixed usage; fail if keyvalue is present [Eric]
>  Add deprecation warning [John]
>  Pull legacy parsing code into function [John]
>  Fixed filename leak
> 
> Patch 3: New; iotest 231. [Eric]
> 
> 
> iotest failure on current master:
> 
>  QA output created by 231
> -qemu-img: RBD options encoded in the filename as keyvalue pairs is 
> deprecated.  Future versions may cease to parse these options in the future.
> -unable to get monitor info from DNS SRV with service name: ceph-mon
> -no monitors specified to connect to.
> -qemu-img: Could not open 
> 'json:{'file.driver':'rbd','file.filename':'rbd:rbd/bogus:conf=BOGUS_CONF'}': 
> error connecting: No such file or directory
> +qemu-img: Could not open 
> 'json:{'file.driver':'rbd','file.filename':'rbd:rbd/bogus:conf=BOGUS_CONF'}': 
> Parameter 'pool' is missing
>  unable to get monitor info from DNS SRV with service name: ceph-mon
>  no monitors specified to connect to.
>  qemu-img: Could not open 
> 'json:{'file.driver':'rbd','file.pool':'rbd','file.image':'bogus','file.conf':'BOGUS_CONF'}':
>  error connecting: No such file or directory
> Failures: 231
> Failed 1 of 1 tests
> 
> Jeff Cody (4):
>   block/rbd: pull out qemu_rbd_convert_options
>   block/rbd: Attempt to parse legacy filenames
>   block/rbd: add iotest for rbd legacy keyvalue filename parsing
>   block/rbd: add deprecation documentation for filename keyvalue pairs
> 
>  block/rbd.c| 89 --
>  qemu-deprecated.texi   | 15 +++
>  tests/qemu-iotests/231 | 62 ++
>  tests/qemu-iotests/231.out |  9 
>  tests/qemu-iotests/group   |  1 +
>  5 files changed, 162 insertions(+), 14 deletions(-)
>  create mode 100755 tests/qemu-iotests/231
>  create mode 100644 tests/qemu-iotests/231.out
> 
> -- 
> 2.17.1
> 

Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc block

-Jeff



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/rbd: Attempt to parse legacy filenames

2018-09-12 Thread Jeff Cody
On Wed, Sep 12, 2018 at 08:42:15AM -0400, Jeff Cody wrote:
> On Wed, Sep 12, 2018 at 12:38:56PM +0200, Kevin Wolf wrote:
> > On 11.09.2018 at 20:37, Jeff Cody wrote:
> > > On Tue, Sep 11, 2018 at 02:22:31PM -0400, John Snow wrote:
> > > > Once we load the image, will the header get rewritten into a compliant
> > > > format?
> > > 
> > > Hmm - I think in some code paths, but not all.  I don't think the answer 
> > > is
> > > 'yes' universally, alas.
> > 
> > Can't we explicitly call BdrvChildRole.update_filename() for all parents
> > when we open a legacy filename? We'd just need to add the callback to
> > child_file, which would propagate it to the parents of the format layer,
> > and then just opening the image with a legacy backing file link once
> > would fix the problem for this image.
> >
> 
> Yes, that is a good idea.  I will spin a v5 with that added.

On second thought, how about we address updating the legacy filename header
in a separate patch series?

-Jeff



Re: [Qemu-block] [PATCH 2/2] virtio-scsi/virtio-blk: Disable poll handlers when stopping vq handler

2018-09-12 Thread Paolo Bonzini
On 12/09/2018 13:50, Fam Zheng wrote:
>> I think it's okay if it is invoked.  The sequence is first you stop the
>> vq, then you drain the BlockBackends, then you switch AioContext.  All
>> that matters is the outcome when virtio_scsi_dataplane_stop returns.
> Yes, but together with vIOMMU, it also effectively leads to a virtio_error(),
> which is not clean. QEMU stderr when this call happens (with patch 1 but not
> this patch):
> 
> 2018-09-12T11:48:10.193023Z qemu-system-x86_64: vtd_iommu_translate: detected 
> translation failure (dev=02:00:00, iova=0x0)
> 2018-09-12T11:48:10.193044Z qemu-system-x86_64: New fault is not recorded due 
> to compression of faults
> 2018-09-12T11:48:10.193061Z qemu-system-x86_64: virtio: zero sized buffers 
> are not allowed

But with iothread, virtio_scsi_dataplane_stop runs in another thread
than the iothread; in that case you still have a race where the iothread
can process the vq before aio_disable_external and print the error.

IIUC the guest has cleared the IOMMU page tables _before_ clearing the
DRIVER_OK bit in the status field.  Could this be a guest bug?

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/rbd: Attempt to parse legacy filenames

2018-09-12 Thread Jeff Cody
On Wed, Sep 12, 2018 at 12:38:56PM +0200, Kevin Wolf wrote:
> Am 11.09.2018 um 20:37 hat Jeff Cody geschrieben:
> > On Tue, Sep 11, 2018 at 02:22:31PM -0400, John Snow wrote:
> > > Once we load the image, will the header get rewritten into a compliant
> > > format?
> > 
> > Hmm - I think in some code paths, but not all.  I don't think the answer is
> > 'yes' universally, alas.
> 
> Can't we explicitly call BdrvChildRole.update_filename() for all parents
> when we open a legacy filename? We'd just need to add the callback to
> child_file, which would propagate it to the parents of the format layer,
> and then just opening the image with a legacy backing file link once
> would fix the problem for this image.
>

Yes, that is a good idea.  I will spin a v5 with that added.



Re: [Qemu-block] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained

2018-09-12 Thread Denis Plotnikov




On 10.09.2018 15:41, Kevin Wolf wrote:

Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:

Fixes the problem of ide request appearing when the BDS is in
the "drained section".

Without the patch the request can come and be processed by the main
event loop, as the ide requests are processed by the main event loop
and the main event loop doesn't stop when its context is in the
"drained section".
The request execution is postponed until the end of "drained section".

The patch doesn't modify ide specific code, as well as any other
device code. Instead, it modifies the infrastructure of asynchronous
Block Backend requests, in favor of postponing the requests arisen
when in "drained section" to remove the possibility of request appearing
for all the infrastructure clients.

This approach doesn't make vCPU processing the request wait until
the end of request processing.

Signed-off-by: Denis Plotnikov 


I generally agree with the idea that requests should be queued during a
drained section. However, I think there are a few fundamental problems
with the implementation in this series:

1) aio_disable_external() is already a layering violation and we'd like
to get rid of it (by replacing it with a BlockDevOps callback from
BlockBackend to the devices), so adding more functionality there
feels like a step in the wrong direction.

2) Only blk_aio_* are fixed, while we also have synchronous public
interfaces (blk_pread/pwrite) as well as coroutine-based ones
(blk_co_*). They need to be postponed as well.

Good point! Thanks!


blk_co_preadv/pwritev() are the common point in the call chain for
all of these variants, so this is where the fix needs to live.
Using the common point might be a good idea, but in the case of aio requests
we also have to manage completions, which are out of the scope of
blk_co_p(read|write)v:


static void blk_aio_write_entry(void *opaque)
{
    ...
    rwco->ret = blk_co_pwritev(...);

    blk_aio_complete(acb);
    ...
}

This is what makes the difference.
I would suggest making the synchronous read/write path in blk_prw wait until
"drained_end" is done.



3) Within a drained section, you want requests from other users to be
blocked, but not your own ones (essentially you want exclusive
access). We don't have blk_drained_begin/end() yet, so this is not
something to implement right now, but let's keep this requirement in
mind and choose a design that allows this.
There is an idea to mark the requests that should be processed regardless of
the "drained section" with a flag in BdrvRequestFlags. The requests with this
flag set would be processed anyway.


I believe the whole logic should be kept local to BlockBackend, and
blk_root_drained_begin/end() should be the functions that start queuing
requests or let queued requests resume.

As we are already in coroutine context in blk_co_preadv/pwritev(), after
checking that blk->quiesce_counter > 0, we can enter the coroutine
object into a list and yield. blk_root_drained_end() calls aio_co_wake()
for each of the queued coroutines. This should be all that we need to
manage.
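A minimal sketch of that queuing, as it could look inside block/block-backend.c (the QueuedRequest type and the queued_requests field are invented for illustration; quiesce_counter, blk_root_drained_end(), aio_co_wake() and the coroutine/queue primitives are the existing names):

typedef struct QueuedRequest {
    Coroutine *co;
    QSIMPLEQ_ENTRY(QueuedRequest) next;
} QueuedRequest;

/* assumed new field in struct BlockBackend:
 *     QSIMPLEQ_HEAD(, QueuedRequest) queued_requests;
 */

/* called at the top of blk_co_preadv()/blk_co_pwritev() */
static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
{
    if (blk->quiesce_counter > 0) {
        QueuedRequest req = { .co = qemu_coroutine_self() };

        QSIMPLEQ_INSERT_TAIL(&blk->queued_requests, &req, next);
        qemu_coroutine_yield();              /* sleep until drained_end */
    }
}

static void blk_root_drained_end(BdrvChild *child)
{
    BlockBackend *blk = child->opaque;

    if (--blk->quiesce_counter == 0) {
        while (!QSIMPLEQ_EMPTY(&blk->queued_requests)) {
            QueuedRequest *req = QSIMPLEQ_FIRST(&blk->queued_requests);

            QSIMPLEQ_REMOVE_HEAD(&blk->queued_requests, next);
            aio_co_wake(req->co);            /* re-enter the queued coroutine */
        }
    }
}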
In my understanding, by using bdrv_drained_begin/end we want to protect a
certain BlockDriverState from external access, but not the whole
BlockBackend, which may involve a number of BlockDriverState-s.
I thought this because we could possibly change a backing file for some
BlockDriverState. And for the time of changing we need to prevent
external access to it but keep the io going.
By using blk_root_drained_begin/end() we put all the BlockDriverState-s
linked to that root into the "drained section".

Does it have to be so?

Denis



Kevin



--
Best,
Denis



Re: [Qemu-block] [PATCH] vmdk: align end of file to a sector boundary

2018-09-12 Thread Fam Zheng
On Wed, 09/12 17:52, yuchenlin wrote:
> 
> Fam Zheng  wrote on 2018-09-12 17:34:
> > On Tue, 08/28 11:17, yuchen...@synology.com wrote:
> > > From: yuchenlin 
> > > 
> > > There is a rare case which the size of last compressed cluster
> > > is larger than the cluster size, which will cause the file is
> > > not aligned at the sector boundary.
> > 
> > I don't understand. Doesn't it mean that if you force the alignment by
> > truncating out the extra bytes, some data is lost?
> 
> You can take qcow2_co_pwritev_compressed in block/qcow2.c as an example.
> bdrv_getlength will return the length in bytes, which is always a multiple
> of BDRV_SECTOR_SIZE.
> After truncating to this size, the vmdk file is extended up to the sector
> boundary.
> > > 
> > > Signed-off-by: yuchenlin 
> > > ---
> > >  block/vmdk.c | 18 ++
> > >  1 file changed, 18 insertions(+)
> > > 
> > > diff --git a/block/vmdk.c b/block/vmdk.c
> > > index a9d0084e36..a8ae7c65d2 100644
> > > --- a/block/vmdk.c
> > > +++ b/block/vmdk.c
> > > @@ -1698,6 +1698,24 @@ static int coroutine_fn
> > >  vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
> > > uint64_t bytes, QEMUIOVector *qiov)
> > >  {
> > > +if (bytes == 0) {
> > 
> > Where is this bytes == 0 condition from?
> 
> From the end of convert_do_copy in qemu-img.c.
> 
>     if (s->compressed && !s->ret) {
>         /* signal EOF to align */
>         ret = blk_pwrite_compressed(s->target, 0, NULL, 0);
>         if (ret < 0) {
>             return ret;
>         }
>     }

I see. Please mention this in the comment of the if check.

Fam

> 
> It signals the EOF to the block driver.
> > > +/* align end of file to a sector boundary. */
> > > +BDRVVmdkState *s = bs->opaque;
> > > +int i, ret;
> > > +int64_t length;
> > > +
> > > +for (i = 0; i < s->num_extents; i++) {
> > > +length = bdrv_getlength(s->extents[i].file->bs);
> > > +if (length < 0) {
> > > +return length;
> > > +}
> > > +ret = bdrv_truncate(s->extents[i].file, length, PREALLOC_MODE_OFF, NULL);
> > > +if (ret < 0) {
> > > +return ret;
> > > +}
> > > +}
> > > +return 0;
> > > +}
> > >  return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
> > >  }
> > > 
> > > -- 
> > > 2.17.0
> > 
> > Fam
> 
> 
> yuchenlin
> 



Re: [Qemu-block] [PATCH 2/2] virtio-scsi/virtio-blk: Disable poll handlers when stopping vq handler

2018-09-12 Thread Fam Zheng
On Wed, 09/12 13:11, Paolo Bonzini wrote:
> On 12/09/2018 03:31, Fam Zheng wrote:
> >>>
> >>> ctx is qemu_aio_context here, so there's no interaction with IOThread.
> >> In this case, it should be okay to have the reentrancy, what is the bug
> >> that this patch is fixing?
> > The same symptom as in the previous patch: virtio_scsi_handle_cmd_vq hangs. 
> > The
> > reason it hangs is fixed by the previous patch, but I don't think it should 
> > be
> > invoked as we're in the middle of virtio_scsi_dataplane_stop(). Applying 
> > either
> > one of the two patches avoids the problem, but this one is more superficial.
> > What do you think?
> 
> I think it's okay if it is invoked.  The sequence is first you stop the
> vq, then you drain the BlockBackends, then you switch AioContext.  All
> that matters is the outcome when virtio_scsi_dataplane_stop returns.

Yes, but together with vIOMMU, it also effectively leads to a virtio_error(),
which is not clean. QEMU stderr when this call happens (with patch 1 but not
this patch):

2018-09-12T11:48:10.193023Z qemu-system-x86_64: vtd_iommu_translate: detected 
translation failure (dev=02:00:00, iova=0x0)
2018-09-12T11:48:10.193044Z qemu-system-x86_64: New fault is not recorded due 
to compression of faults
2018-09-12T11:48:10.193061Z qemu-system-x86_64: virtio: zero sized buffers are 
not allowed

Fam



Re: [Qemu-block] [PATCH 2/2] virtio-scsi/virtio-blk: Disable poll handlers when stopping vq handler

2018-09-12 Thread Paolo Bonzini
On 12/09/2018 03:31, Fam Zheng wrote:
>>>
>>> ctx is qemu_aio_context here, so there's no interaction with IOThread.
>> In this case, it should be okay to have the reentrancy, what is the bug
>> that this patch is fixing?
> The same symptom as in the previous patch: virtio_scsi_handle_cmd_vq hangs. 
> The
> reason it hangs is fixed by the previous patch, but I don't think it should be
> invoked as we're in the middle of virtio_scsi_dataplane_stop(). Applying 
> either
> one of the two patches avoids the problem, but this one is more superficial.
> What do you think?

I think it's okay if it is invoked.  The sequence is first you stop the
vq, then you drain the BlockBackends, then you switch AioContext.  All
that matters is the outcome when virtio_scsi_dataplane_stop returns.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/rbd: Attempt to parse legacy filenames

2018-09-12 Thread Kevin Wolf
Am 11.09.2018 um 20:37 hat Jeff Cody geschrieben:
> On Tue, Sep 11, 2018 at 02:22:31PM -0400, John Snow wrote:
> > Once we load the image, will the header get rewritten into a compliant
> > format?
> 
> Hmm - I think in some code paths, but not all.  I don't think the answer is
> 'yes' universally, alas.

Can't we explicitly call BdrvChildRole.update_filename() for all parents
when we open a legacy filename? We'd just need to add the callback to
child_file, which would propagate it to the parents of the format layer,
and then just opening the image with a legacy backing file link once
would fix the problem for this image.

Kevin



Re: [Qemu-block] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb

2018-09-12 Thread Kevin Wolf
Am 12.09.2018 um 09:41 hat Fam Zheng geschrieben:
> On Wed, 09/05 11:33, Sergio Lopez wrote:
> > AIO Coroutines shouldn't be managed by an AioContext different than the
> > one assigned when they are created. aio_co_enter avoids entering a
> > coroutine from a different AioContext, calling aio_co_schedule instead.
> > 
> > Scheduled coroutines are then entered by co_schedule_bh_cb using
> > qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> > current AioContext obtained with qemu_get_current_aio_context.
> > Eventually, co->ctx will be set to the AioContext passed as an argument
> > to qemu_aio_coroutine_enter.
> > 
> > This means that, if an IO Thread's AioContext is being processed by the
> > Main Thread (due to aio_poll being called with a BDS AioContext, as it
> > happens in AIO_WAIT_WHILE among other places), the AioContext from some
> > coroutines may be wrongly replaced with the one from the Main Thread.
> > 
> > This is the root cause behind some crashes, mainly triggered by the
> > drain code at block/io.c. The most common are these abort and failed
> > assertion:
> > 
> > util/async.c:aio_co_schedule
> > 456 if (scheduled) {
> > 457 fprintf(stderr,
> > 458 "%s: Co-routine was already scheduled in '%s'\n",
> > 459 __func__, scheduled);
> > 460 abort();
> > 461 }
> > 
> > util/qemu-coroutine-lock.c:
> > 286 assert(mutex->holder == self);
> > 
> > But it's also known to cause random errors at different locations, and
> > even SIGSEGV with broken coroutine backtraces.
> > 
> > By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> > pass the correct AioContext as an argument, making sure co->ctx is not
> > wrongly altered.
> > 
> > Signed-off-by: Sergio Lopez 
> > ---
> >  util/async.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/util/async.c b/util/async.c
> > index 05979f8014..c10642a385 100644
> > --- a/util/async.c
> > +++ b/util/async.c
> > @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
> >  
> >  /* Protected by write barrier in qemu_aio_coroutine_enter */
> > atomic_set(&co->scheduled, NULL);
> > -qemu_coroutine_enter(co);
> > +qemu_aio_coroutine_enter(ctx, co);
> >  aio_context_release(ctx);
> >  }
> >  }
> 
> Kevin, could you test this patch together with your next version of the drain
> fix series? Since they are related, it's better if you could include it in 
> your
> series or even apply it yourself. Peter is not processing pull requests, so
> scattering fixes in various trees will do no good.

Apparently I forgot to send an email, but I already applied this to my
block branch.

Kevin
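For reference, the rebinding described in the commit message happens because qemu_coroutine_enter() is, as far as I recall, just this thin wrapper in util/qemu-coroutine.c, so it always binds the coroutine to the caller's AioContext:

void qemu_coroutine_enter(Coroutine *co)
{
    qemu_aio_coroutine_enter(qemu_get_current_aio_context(), co);
}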



Re: [Qemu-block] [PATCH] vmdk: align end of file to a sector boundary

2018-09-12 Thread yuchenlin

Fam Zheng  wrote on 2018-09-12 17:34:
> On Tue, 08/28 11:17, yuchen...@synology.com wrote:
> > From: yuchenlin 
> > 
> > There is a rare case which the size of last compressed cluster
> > is larger than the cluster size, which will cause the file is
> > not aligned at the sector boundary.
> 
> I don't understand. Doesn't it mean that if you force the alignment by
> truncating out the extra bytes, some data is lost?

You can take qcow2_co_pwritev_compressed in block/qcow2.c as an example.
bdrv_getlength will return the length in bytes, which is always a multiple
of BDRV_SECTOR_SIZE.
After truncating to this size, the vmdk file is extended up to the sector
boundary.
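To make that concrete, here is a standalone illustration with made-up numbers (BDRV_SECTOR_SIZE is redefined locally and the round-up is hand-expanded; this is not code from the patch):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define BDRV_SECTOR_SIZE 512   /* redefined locally so this builds standalone */

int main(void)
{
    /* hypothetical unaligned EOF after the last compressed cluster */
    int64_t file_size = 66051;
    /* bdrv_getlength() reports nb_sectors * BDRV_SECTOR_SIZE, i.e. the size
     * rounded up to the next sector boundary */
    int64_t reported = (file_size + BDRV_SECTOR_SIZE - 1)
                       / BDRV_SECTOR_SIZE * BDRV_SECTOR_SIZE;

    /* truncating the extent to 'reported' pads 66051 -> 66560 with zeroes,
     * so nothing is cut off */
    printf("%" PRId64 " -> %" PRId64 "\n", file_size, reported);
    return 0;
}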
> > 
> > Signed-off-by: yuchenlin 
> > ---
> >  block/vmdk.c | 18 ++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/block/vmdk.c b/block/vmdk.c
> > index a9d0084e36..a8ae7c65d2 100644
> > --- a/block/vmdk.c
> > +++ b/block/vmdk.c
> > @@ -1698,6 +1698,24 @@ static int coroutine_fn
> >  vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
> > uint64_t bytes, QEMUIOVector *qiov)
> >  {
> > +if (bytes == 0) {
> 
> Where is this bytes == 0 condition from?

From the end of convert_do_copy in qemu-img.c.

    if (s->compressed && !s->ret) {
        /* signal EOF to align */
        ret = blk_pwrite_compressed(s->target, 0, NULL, 0);
        if (ret < 0) {
            return ret;
        }
    }

It signals the EOF to the block driver.
> > +/* align end of file to a sector boundary. */
> > +BDRVVmdkState *s = bs->opaque;
> > +int i, ret;
> > +int64_t length;
> > +
> > +for (i = 0; i < s->num_extents; i++) {
> > +length = bdrv_getlength(s->extents[i].file->bs);
> > +if (length < 0) {
> > +return length;
> > +}
> > +ret = bdrv_truncate(s->extents[i].file, length, PREALLOC_MODE_OFF, NULL);
> > +if (ret < 0) {
> > +return ret;
> > +}
> > +}
> > +return 0;
> > +}
> >  return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
> >  }
> > 
> > -- 
> > 2.17.0
> 
> Fam


yuchenlin



Re: [Qemu-block] [PATCH] vmdk: align end of file to a sector boundary

2018-09-12 Thread Fam Zheng
On Tue, 08/28 11:17, yuchen...@synology.com wrote:
> From: yuchenlin 
> 
> There is a rare case which the size of last compressed cluster
> is larger than the cluster size, which will cause the file is
> not aligned at the sector boundary.

I don't understand. Doesn't it mean that if you force the alignment by
truncating out the extra bytes, some data is lost?

> 
> Signed-off-by: yuchenlin 
> ---
>  block/vmdk.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index a9d0084e36..a8ae7c65d2 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1698,6 +1698,24 @@ static int coroutine_fn
>  vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
> uint64_t bytes, QEMUIOVector *qiov)
>  {
> +if (bytes == 0) {

Where is this bytes == 0 condition from?

> +/* align end of file to a sector boundary. */
> +BDRVVmdkState *s = bs->opaque;
> +int i, ret;
> +int64_t length;
> +
> +for (i = 0; i < s->num_extents; i++) {
> +length = bdrv_getlength(s->extents[i].file->bs);
> +if (length < 0) {
> +return length;
> +}
> +ret = bdrv_truncate(s->extents[i].file, length, 
> PREALLOC_MODE_OFF, NULL);
> +if (ret < 0) {
> +return ret;
> +}
> +}
> +return 0;
> +}
>  return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
>  }
>  
> -- 
> 2.17.0
> 

Fam



Re: [Qemu-block] [PATCH v8 0/8] Take the image size into account when allocating the L2 cache

2018-09-12 Thread Leonid Bloch

Hi Kevin & All,

Sorry it took so long to send a new version! I had some very urgent 
things popping up on several fronts. I will send the new version over 
this weekend.


Thanks for the reminder, and sorry again.

Leonid.

On 9/10/18 5:33 PM, Kevin Wolf wrote:

Hi Leonid,

Am 13.08.2018 um 03:07 hat Leonid Bloch geschrieben:

This series makes the qcow2 L2 cache assignment aware of the image size,
with the intention for it to cover the entire image. The importance of
this change is in noticeable performance improvement, especially with
heavy random I/O. The memory overhead is not big in most cases, as only
1 MB of cache for every 8 GB of image size is used. For cases with very
large images and/or small cluster sizes, or systems with limited RAM
resources, there is an upper limit on the default L2 cache: 32 MB. To
modify this limit one can use the already existing 'l2-cache-size' and
'cache-size' options. Moreover, this fixes the behavior of 'l2-cache-size',
as it was documented as the *maximum* L2 cache size, but in practice
behaved as the absolute size.

To compensate for the memory overhead, which may increase with this
behavior, the cache-clean-interval is set to 10 minutes by default
(it was disabled by default before).

The L2 cache is also resized accordingly, by default, if the image is
resized.
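The 1 MB per 8 GB figure follows from the defaults; a quick standalone check (my arithmetic, assuming the default 64 KiB cluster size and 8-byte L2 entries):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t cluster_size = 64 * 1024;                    /* qcow2 default */
    uint64_t image_size   = 8ULL * 1024 * 1024 * 1024;    /* 8 GiB */
    uint64_t l2_entries   = image_size / cluster_size;    /* one per cluster */
    uint64_t cache_bytes  = l2_entries * 8;               /* 8 bytes per entry */

    /* prints "1 MiB": covering 8 GiB of image needs 1 MiB of L2 cache */
    printf("%llu MiB\n", (unsigned long long)(cache_bytes >> 20));
    return 0;
}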


after digging through the most urgent part of my post-vacation backlog,
I saw this series again. It's been a while, but I don't think much is
missing, so let's get this finished now. :-)

We had some discussion in the v7 thread after you posted v8, and you
mentioned there that you'd do some changes in a v9 (at first sight, just
some test case stuff). Are you still planning to send that v9?

Kevin





Re: [Qemu-block] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb

2018-09-12 Thread Fam Zheng
On Wed, 09/05 11:33, Sergio Lopez wrote:
> AIO Coroutines shouldn't be managed by an AioContext different than the
> one assigned when they are created. aio_co_enter avoids entering a
> coroutine from a different AioContext, calling aio_co_schedule instead.
> 
> Scheduled coroutines are then entered by co_schedule_bh_cb using
> qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> current AioContext obtained with qemu_get_current_aio_context.
> Eventually, co->ctx will be set to the AioContext passed as an argument
> to qemu_aio_coroutine_enter.
> 
> This means that, if an IO Thread's AioContext is being processed by the
> Main Thread (due to aio_poll being called with a BDS AioContext, as it
> happens in AIO_WAIT_WHILE among other places), the AioContext from some
> coroutines may be wrongly replaced with the one from the Main Thread.
> 
> This is the root cause behind some crashes, mainly triggered by the
> drain code at block/io.c. The most common are these abort and failed
> assertion:
> 
> util/async.c:aio_co_schedule
> 456 if (scheduled) {
> 457 fprintf(stderr,
> 458 "%s: Co-routine was already scheduled in '%s'\n",
> 459 __func__, scheduled);
> 460 abort();
> 461 }
> 
> util/qemu-coroutine-lock.c:
> 286 assert(mutex->holder == self);
> 
> But it's also known to cause random errors at different locations, and
> even SIGSEGV with broken coroutine backtraces.
> 
> By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> pass the correct AioContext as an argument, making sure co->ctx is not
> wrongly altered.
> 
> Signed-off-by: Sergio Lopez 
> ---
>  util/async.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 05979f8014..c10642a385 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
>  
>  /* Protected by write barrier in qemu_aio_coroutine_enter */
>  atomic_set(&co->scheduled, NULL);
> -qemu_coroutine_enter(co);
> +qemu_aio_coroutine_enter(ctx, co);
>  aio_context_release(ctx);
>  }
>  }

Kevin, could you test this patch together with your next version of the drain
fix series? Since they are related, it's better if you could include it in your
series or even apply it yourself. Peter is not processing pull requests, so
scattering fixes in various trees will do no good.

Fam



Re: [Qemu-block] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb

2018-09-12 Thread Fam Zheng
On Wed, 09/05 11:33, Sergio Lopez wrote:
> AIO Coroutines shouldn't be managed by an AioContext different than the
> one assigned when they are created. aio_co_enter avoids entering a
> coroutine from a different AioContext, calling aio_co_schedule instead.
> 
> Scheduled coroutines are then entered by co_schedule_bh_cb using
> qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> current AioContext obtained with qemu_get_current_aio_context.
> Eventually, co->ctx will be set to the AioContext passed as an argument
> to qemu_aio_coroutine_enter.
> 
> This means that, if an IO Thread's AioContext is being processed by the
> Main Thread (due to aio_poll being called with a BDS AioContext, as it
> happens in AIO_WAIT_WHILE among other places), the AioContext from some
> coroutines may be wrongly replaced with the one from the Main Thread.
> 
> This is the root cause behind some crashes, mainly triggered by the
> drain code at block/io.c. The most common are these abort and failed
> assertion:
> 
> util/async.c:aio_co_schedule
> 456 if (scheduled) {
> 457 fprintf(stderr,
> 458 "%s: Co-routine was already scheduled in '%s'\n",
> 459 __func__, scheduled);
> 460 abort();
> 461 }
> 
> util/qemu-coroutine-lock.c:
> 286 assert(mutex->holder == self);
> 
> But it's also known to cause random errors at different locations, and
> even SIGSEGV with broken coroutine backtraces.
> 
> By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> pass the correct AioContext as an argument, making sure co->ctx is not
> wrongly altered.
> 
> Signed-off-by: Sergio Lopez 
> ---
>  util/async.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 05979f8014..c10642a385 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
>  
>  /* Protected by write barrier in qemu_aio_coroutine_enter */
>  atomic_set(&co->scheduled, NULL);
> -qemu_coroutine_enter(co);
> +qemu_aio_coroutine_enter(ctx, co);
>  aio_context_release(ctx);
>  }
>  }
> -- 
> 2.17.0
> 

Reviewed-by: Fam Zheng