Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-24 Thread Peter Lieven

Am 24.03.22 um 12:06 schrieb Hanna Reitz:

On 24.03.22 11:42, Peter Lieven wrote:

Am 24.03.22 um 11:40 schrieb Stefano Garzarella:

On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:

Am 22.03.22 um 10:38 schrieb Hanna Reitz:

On 21.03.22 09:31, Stefano Garzarella wrote:

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 17:47 schrieb Stefano Garzarella :

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :


On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :


Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

  assert(!qiov || qiov->size == bytes);

+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+    /*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+    if (offset + bytes > s->image_size) {
+    int r = qemu_rbd_resize(bs, offset + bytes);
+    if (r < 0) {
+    return r;
+    }
+    }
+    }
+
  r = rbd_aio_create_completion(,
(rbd_callback_t) qemu_rbd_completion_cb, );
  if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
   int64_t bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags)
{
-    BDRVRBDState *s = bs->opaque;
-    /*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-    if (offset + bytes > s->image_size) {
-    int r = qemu_rbd_resize(bs, offset + bytes);
-    if (r < 0) {
-    return r;
-    }
-    }
  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

-- 2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we had a 
little discussion about features that could be interesting (e.g. persistent 
dirty bitmaps for incremental backup).

In any case the support is quite simple and does not affect other use cases 
since we only increase the size when we go beyond the current size.

IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata might be a use 
case.


Yep.


Be aware that the current approach will serialize requests. If there is a real 
use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe it's okay for 
now, but I agree we could do better ;-)


There might also be a problem if a write for a higher offset past eof will be executed shortly before a write to a slightly lower offset past eof. The second resize will fail as it would shrink the image. We would need proper locking to avoid 
this. Maybe we need to check if we write past eof. If yes, take a lock around the resize op and then check again if it’s still eof and only resize if true.


I thought rbd_resize() was synchronous. Indeed when you said this could 
serialize writes it sounded like confirmation to me.

Since we call rbd_resize() before rbd_aio_writev(), I thought this case could 
not occur.

Can you please elaborate?


Seconding this request, because if rbd_resize() is allowed to shrink data, it 
being asynchronous might cause data corruption.

I’ll keep your patch because I find this highly unlikely, though: 
qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
qemu_coroutine_yield().

The only other possibility that comes to my mind is that rbd_resize() might delay the actual resize operation, but I would still expect consecutive resize requests to be executed in order, and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
immediately after the rbd_resize() (with no yielding in between), everything should be executed in the order that we expect.



Maybe my assumption of parallelism 

Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-24 Thread Hanna Reitz

On 24.03.22 11:42, Peter Lieven wrote:

Am 24.03.22 um 11:40 schrieb Stefano Garzarella:

On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:

Am 22.03.22 um 10:38 schrieb Hanna Reitz:

On 21.03.22 09:31, Stefano Garzarella wrote:

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:



Am 18.03.2022 um 17:47 schrieb Stefano Garzarella 
:


On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:



Am 18.03.2022 um 09:25 schrieb Stefano Garzarella 
:


On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:



Am 17.03.2022 um 17:26 schrieb Stefano Garzarella 
:


Commit d24f80234b ("block/rbd: increase dynamically the 
image size")
added a workaround to support growing images (eg. qcow2), 
resizing

the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it 
when

the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,


  assert(!qiov || qiov->size == bytes);

+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+    /*
+ * RBD APIs don't allow us to write more than 
actual size, so in order
+ * to support growing images, we resize the image 
before write

+ * operations that exceed the current size.
+ */
+    if (offset + bytes > s->image_size) {
+    int r = qemu_rbd_resize(bs, offset + bytes);
+    if (r < 0) {
+    return r;
+    }
+    }
+    }
+
  r = rbd_aio_create_completion(,
(rbd_callback_t) qemu_rbd_completion_cb, );
  if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn 
qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
   int64_t bytes, QEMUIOVector 
*qiov,

BdrvRequestFlags flags)
{
-    BDRVRBDState *s = bs->opaque;
-    /*
- * RBD APIs don't allow us to write more than actual 
size, so in order
- * to support growing images, we resize the image 
before write

- * operations that exceed the current size.
- */
-    if (offset + bytes > s->image_size) {
-    int r = qemu_rbd_resize(bs, offset + bytes);
-    if (r < 0) {
-    return r;
-    }
-    }
  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, 
RBD_AIO_WRITE);

}

-- 2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and 
here [1] we had a little discussion about features that could 
be interesting (e.g. persistent dirty bitmaps for incremental 
backup).


In any case the support is quite simple and does not affect 
other use cases since we only increase the size when we go 
beyond the current size.


IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional 
metadata might be a use case.


Yep.

Be aware that the current approach will serialize requests. If 
there is a real use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe 
it's okay for now, but I agree we could do better ;-)


There might also be a problem if a write for a higher offset past 
eof will be executed shortly before a write to a slightly lower 
offset past eof. The second resize will fail as it would shrink 
the image. We would need proper locking to avoid this. Maybe we 
need to check if we write past eof. If yes, take a lock around 
the resize op and then check again if it’s still eof and only 
resize if true.


I thought rbd_resize() was synchronous. Indeed when you said this 
could serialize writes it sounded like confirmation to me.


Since we call rbd_resize() before rbd_aio_writev(), I thought this 
case could not occur.


Can you please elaborate?


Seconding this request, because if rbd_resize() is allowed to 
shrink data, it being asynchronous might cause data corruption.


I’ll keep your patch because I find this highly unlikely, though: 
qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
qemu_coroutine_yield().


The only other possibility that comes to my mind is that 
rbd_resize() might delay the actual resize operation, but I would 
still expect consecutive resize requests to be executed in order, 
and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
immediately after the rbd_resize() (with no yielding in between), 
everything should be executed in the order that we expect.



Maybe my assumption of parallelism here was 

Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-24 Thread Peter Lieven

Am 24.03.22 um 11:40 schrieb Stefano Garzarella:

On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:

Am 22.03.22 um 10:38 schrieb Hanna Reitz:

On 21.03.22 09:31, Stefano Garzarella wrote:

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 17:47 schrieb Stefano Garzarella :

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :


On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :


Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

  assert(!qiov || qiov->size == bytes);

+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+    /*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+    if (offset + bytes > s->image_size) {
+    int r = qemu_rbd_resize(bs, offset + bytes);
+    if (r < 0) {
+    return r;
+    }
+    }
+    }
+
  r = rbd_aio_create_completion(,
    (rbd_callback_t) qemu_rbd_completion_cb, );
  if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
   int64_t bytes, QEMUIOVector *qiov,
   BdrvRequestFlags flags)
{
-    BDRVRBDState *s = bs->opaque;
-    /*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-    if (offset + bytes > s->image_size) {
-    int r = qemu_rbd_resize(bs, offset + bytes);
-    if (r < 0) {
-    return r;
-    }
-    }
  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

-- 2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we had a 
little discussion about features that could be interesting (e.g.  persistent 
dirty bitmaps for incremental backup).

In any case the support is quite simple and does not affect other use cases 
since we only increase the size when we go beyond the current size.

IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata might be a use 
case.


Yep.


Be aware that the current approach will serialize requests. If there is a real 
use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe it's okay for 
now, but I agree we could do better ;-)


There might also be a problem if a write for a higher offset past eof will be executed shortly before a write to a slightly lower offset past eof. The second resize will fail as it would shrink the image. We would need proper locking to avoid this. 
Maybe we need to check if we write past eof. If yes, take a lock around the resize op and then check again if it’s still eof and only resize if true.


I thought rbd_resize() was synchronous. Indeed when you said this could 
serialize writes it sounded like confirmation to me.

Since we call rbd_resize() before rbd_aio_writev(), I thought this case could 
not occur.

Can you please elaborate?


Seconding this request, because if rbd_resize() is allowed to shrink data, it 
being asynchronous might cause data corruption.

I’ll keep your patch because I find this highly unlikely, though: 
qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
qemu_coroutine_yield().

The only other possibility that comes to my mind is that rbd_resize() might delay the actual resize operation, but I would still expect consecutive resize requests to be executed in order, and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
immediately after the rbd_resize() (with no yielding in between), everything should be executed in the order that we expect.



Maybe my assumption of parallelism here was wrong. I 

Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-24 Thread Stefano Garzarella

On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:

Am 22.03.22 um 10:38 schrieb Hanna Reitz:

On 21.03.22 09:31, Stefano Garzarella wrote:

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 17:47 schrieb Stefano Garzarella :

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :


On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :


Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

  assert(!qiov || qiov->size == bytes);

+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+    /*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+    if (offset + bytes > s->image_size) {
+    int r = qemu_rbd_resize(bs, offset + bytes);
+    if (r < 0) {
+    return r;
+    }
+    }
+    }
+
  r = rbd_aio_create_completion(,
    (rbd_callback_t) qemu_rbd_completion_cb, );
  if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
   int64_t bytes, QEMUIOVector *qiov,
   BdrvRequestFlags flags)
{
-    BDRVRBDState *s = bs->opaque;
-    /*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-    if (offset + bytes > s->image_size) {
-    int r = qemu_rbd_resize(bs, offset + bytes);
-    if (r < 0) {
-    return r;
-    }
-    }
  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

-- 2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we had a 
little discussion about features that could be interesting (e.g.  persistent 
dirty bitmaps for incremental backup).

In any case the support is quite simple and does not affect other use cases 
since we only increase the size when we go beyond the current size.

IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata might be a use 
case.


Yep.


Be aware that the current approach will serialize requests. If there is a real 
use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe it's okay for 
now, but I agree we could do better ;-)


There might also be a problem if a write for a higher offset 
past eof will be executed shortly before a write to a slightly 
lower offset past eof. The second resize will fail as it would 
shrink the image. We would need proper locking to avoid this. 
Maybe we need to check if we write past eof. If yes, take a lock 
around the resize op and then check again if it’s still eof and 
only resize if true.


I thought rbd_resize() was synchronous. Indeed when you said this could 
serialize writes it sounded like confirmation to me.

Since we call rbd_resize() before rbd_aio_writev(), I thought this case could 
not occur.

Can you please elaborate?


Seconding this request, because if rbd_resize() is allowed to shrink data, it 
being asynchronous might cause data corruption.

I’ll keep your patch because I find this highly unlikely, though: 
qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
qemu_coroutine_yield().

The only other possibility that comes to my mind is that 
rbd_resize() might delay the actual resize operation, but I would 
still expect consecutive resize requests to be executed in order, 
and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
immediately after the rbd_resize() (with no yielding in between), 
everything should be executed in the order that we expect.



Maybe my assumption of parallelism here was wrong. I was thinking of:


Request A: write at 

Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-24 Thread Peter Lieven

Am 22.03.22 um 10:38 schrieb Hanna Reitz:

On 21.03.22 09:31, Stefano Garzarella wrote:

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 17:47 schrieb Stefano Garzarella :

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :


On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :


Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

  assert(!qiov || qiov->size == bytes);

+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+    /*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+    if (offset + bytes > s->image_size) {
+    int r = qemu_rbd_resize(bs, offset + bytes);
+    if (r < 0) {
+    return r;
+    }
+    }
+    }
+
  r = rbd_aio_create_completion(,
    (rbd_callback_t) qemu_rbd_completion_cb, );
  if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
   int64_t bytes, QEMUIOVector *qiov,
   BdrvRequestFlags flags)
{
-    BDRVRBDState *s = bs->opaque;
-    /*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-    if (offset + bytes > s->image_size) {
-    int r = qemu_rbd_resize(bs, offset + bytes);
-    if (r < 0) {
-    return r;
-    }
-    }
  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

--
2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we had a 
little discussion about features that could be interesting (e.g.  persistent 
dirty bitmaps for incremental backup).

In any case the support is quite simple and does not affect other use cases 
since we only increase the size when we go beyond the current size.

IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata might be a use 
case.


Yep.


Be aware that the current approach will serialize requests. If there is a real 
use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe it's okay for 
now, but I agree we could do better ;-)


There might also be a problem if a write for a higher offset past eof will be executed shortly before a write to a slightly lower offset past eof. The second resize will fail as it would shrink the image. We would need proper locking to avoid this. 
Maybe we need to check if we write past eof. If yes, take a lock around the resize op and then check again if it’s still eof and only resize if true.


I thought rbd_resize() was synchronous. Indeed when you said this could 
serialize writes it sounded like confirmation to me.

Since we call rbd_resize() before rbd_aio_writev(), I thought this case could 
not occur.

Can you please elaborate?


Seconding this request, because if rbd_resize() is allowed to shrink data, it 
being asynchronous might cause data corruption.

I’ll keep your patch because I find this highly unlikely, though: 
qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
qemu_coroutine_yield().

The only other possibility that comes to my mind is that rbd_resize() might delay the actual resize operation, but I would still expect consecutive resize requests to be executed in order, and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
immediately after the rbd_resize() (with no yielding in between), everything should be executed in the order that we expect.



Maybe my assumption of parallelism here was wrong. I was thinking of:


Request A: write at offset (EOL + 4k).

Request A: rbd_resize is invoked (size EOL + 4k)


Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-22 Thread Hanna Reitz

On 21.03.22 09:31, Stefano Garzarella wrote:

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:



Am 18.03.2022 um 17:47 schrieb Stefano Garzarella 
:


On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:



Am 18.03.2022 um 09:25 schrieb Stefano Garzarella 
:


On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:



Am 17.03.2022 um 17:26 schrieb Stefano Garzarella 
:


Commit d24f80234b ("block/rbd: increase dynamically the image 
size")

added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,


  assert(!qiov || qiov->size == bytes);

+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+    /*
+ * RBD APIs don't allow us to write more than actual 
size, so in order
+ * to support growing images, we resize the image 
before write

+ * operations that exceed the current size.
+ */
+    if (offset + bytes > s->image_size) {
+    int r = qemu_rbd_resize(bs, offset + bytes);
+    if (r < 0) {
+    return r;
+    }
+    }
+    }
+
  r = rbd_aio_create_completion(,
    (rbd_callback_t) 
qemu_rbd_completion_cb, );

  if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn 
qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,

   int64_t bytes, QEMUIOVector *qiov,
   BdrvRequestFlags flags)
{
-    BDRVRBDState *s = bs->opaque;
-    /*
- * RBD APIs don't allow us to write more than actual size, 
so in order

- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-    if (offset + bytes > s->image_size) {
-    int r = qemu_rbd_resize(bs, offset + bytes);
-    if (r < 0) {
-    return r;
-    }
-    }
  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, 
RBD_AIO_WRITE);

}

--
2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here 
[1] we had a little discussion about features that could be 
interesting (e.g.  persistent dirty bitmaps for incremental backup).


In any case the support is quite simple and does not affect other 
use cases since we only increase the size when we go beyond the 
current size.


IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata 
might be a use case.


Yep.

Be aware that the current approach will serialize requests. If 
there is a real use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe 
it's okay for now, but I agree we could do better ;-)


There might also be a problem if a write for a higher offset past eof 
will be executed shortly before a write to a slightly lower offset 
past eof. The second resize will fail as it would shrink the image. 
We would need proper locking to avoid this. Maybe we need to check if 
we write past eof. If yes, take a lock around the resize op and then 
check again if it’s still eof and only resize if true.


I thought rbd_resize() was synchronous. Indeed when you said this 
could serialize writes it sounded like confirmation to me.


Since we call rbd_resize() before rbd_aio_writev(), I thought this 
case could not occur.


Can you please elaborate?


Seconding this request, because if rbd_resize() is allowed to shrink 
data, it being asynchronous might cause data corruption.


I’ll keep your patch because I find this highly unlikely, though: 
qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
qemu_coroutine_yield().


The only other possibility that comes to my mind is that rbd_resize() 
might delay the actual resize operation, but I would still expect 
consecutive resize requests to be executed in order, and since we call 
rbd_aio_writev()/rbd_aio_write_zeroes() immediately after the 
rbd_resize() (with no yielding in between), everything should be 
executed in the order that we expect.


Hanna




Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-21 Thread Stefano Garzarella

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 17:47 schrieb Stefano Garzarella :

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :


On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :


Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

  assert(!qiov || qiov->size == bytes);

+if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+/*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+if (offset + bytes > s->image_size) {
+int r = qemu_rbd_resize(bs, offset + bytes);
+if (r < 0) {
+return r;
+}
+}
+}
+
  r = rbd_aio_create_completion(,
(rbd_callback_t) qemu_rbd_completion_cb, );
  if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
   int64_t bytes, QEMUIOVector *qiov,
   BdrvRequestFlags flags)
{
-BDRVRBDState *s = bs->opaque;
-/*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-if (offset + bytes > s->image_size) {
-int r = qemu_rbd_resize(bs, offset + bytes);
-if (r < 0) {
-return r;
-}
-}
  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

--
2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] 
we had a little discussion about features that could be interesting 
(e.g.  persistent dirty bitmaps for incremental backup).


In any case the support is quite simple and does not affect other 
use cases since we only increase the size when we go beyond the 
current size.


IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata 
might be a use case.


Yep.

Be aware that the current approach will serialize requests. If there 
is a real use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe it's 
okay for now, but I agree we could do better ;-)


There might also be a problem if a write for a higher offset past eof 
will be executed shortly before a write to a slightly lower offset past 
eof. The second resize will fail as it would shrink the image. We would 
need proper locking to avoid this. Maybe we need to check if we write 
past eof. If yes, take a lock around the resize op and then check again 
if it’s still eof and only resize if true.


I thought rbd_resize() was synchronous. Indeed when you said this could 
serialize writes it sounded like confirmation to me.


Since we call rbd_resize() before rbd_aio_writev(), I thought this case 
could not occur.


Can you please elaborate?

Thanks,
Stefano




Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-21 Thread Stefano Garzarella

On Sat, Mar 19, 2022 at 02:23:18PM +0100, Ilya Dryomov wrote:

On Sat, Mar 19, 2022 at 1:40 PM Ilya Dryomov  wrote:


On Fri, Mar 18, 2022 at 9:25 AM Stefano Garzarella  wrote:
>
> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
> >
> >
> >> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :
> >>
> >> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> >> added a workaround to support growing images (eg. qcow2), resizing
> >> the image before write operations that exceed the current size.
> >>
> >> We recently added support for write zeroes and without the
> >> workaround we can have problems with qcow2.
> >>
> >> So let's move the resize into qemu_rbd_start_co() and do it when
> >> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> >>
> >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> >> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> >> Signed-off-by: Stefano Garzarella 
> >> ---
> >> block/rbd.c | 26 ++
> >> 1 file changed, 14 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index 8f183eba2a..6caf35cbba 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,
> >>
> >> assert(!qiov || qiov->size == bytes);
> >>
> >> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> >> +/*
> >> + * RBD APIs don't allow us to write more than actual size, so in 
order
> >> + * to support growing images, we resize the image before write
> >> + * operations that exceed the current size.
> >> + */
> >> +if (offset + bytes > s->image_size) {
> >> +int r = qemu_rbd_resize(bs, offset + bytes);
> >> +if (r < 0) {
> >> +return r;
> >> +}
> >> +}
> >> +}
> >> +
> >> r = rbd_aio_create_completion(,
> >>   (rbd_callback_t) qemu_rbd_completion_cb, 
);
> >> if (r < 0) {
> >> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState 
*bs, int64_t offset,
> >>  int64_t bytes, QEMUIOVector *qiov,
> >>  BdrvRequestFlags flags)
> >> {
> >> -BDRVRBDState *s = bs->opaque;
> >> -/*
> >> - * RBD APIs don't allow us to write more than actual size, so in order
> >> - * to support growing images, we resize the image before write
> >> - * operations that exceed the current size.
> >> - */
> >> -if (offset + bytes > s->image_size) {
> >> -int r = qemu_rbd_resize(bs, offset + bytes);
> >> -if (r < 0) {
> >> -return r;
> >> -}
> >> -}
> >> return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, 
RBD_AIO_WRITE);
> >> }
> >>
> >> --
> >> 2.35.1
> >>
> >
> >Do we really have a use case for growing rbd images?
>
> The use case is to have a qcow2 image on rbd.
> I don't think it's very common, but some people use it and here [1] we
> had a little discussion about features that could be interesting (e.g.
> persistent dirty bitmaps for incremental backup).

RBD supports that natively (object-map and fast-diff image features).
The granularity is higher than a typical QCOW2 granularity (64K vs 4M)
but I have never heard of that being a concern.


Sorry, I meant to say lower (more coarse grained) above.

RBD's default object size is 4M and that is the granularity at which
dirtiness is typically tracked.  It is possible to ask for a byte-level
incremental diff but it is obviously slow.


I didn't know that, thanks for letting me know!

Stefano




Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-19 Thread Peter Lieven



> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella :
> 
> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>> 
>> 
 Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :
>>> 
>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
 
 
>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :
> 
> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> added a workaround to support growing images (eg. qcow2), resizing
> the image before write operations that exceed the current size.
> 
> We recently added support for write zeroes and without the
> workaround we can have problems with qcow2.
> 
> So let's move the resize into qemu_rbd_start_co() and do it when
> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> Signed-off-by: Stefano Garzarella 
> ---
> block/rbd.c | 26 ++
> 1 file changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index 8f183eba2a..6caf35cbba 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
> qemu_rbd_start_co(BlockDriverState *bs,
> 
>   assert(!qiov || qiov->size == bytes);
> 
> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> +/*
> + * RBD APIs don't allow us to write more than actual size, so in 
> order
> + * to support growing images, we resize the image before write
> + * operations that exceed the current size.
> + */
> +if (offset + bytes > s->image_size) {
> +int r = qemu_rbd_resize(bs, offset + bytes);
> +if (r < 0) {
> +return r;
> +}
> +}
> +}
> +
>   r = rbd_aio_create_completion(,
> (rbd_callback_t) qemu_rbd_completion_cb, 
> );
>   if (r < 0) {
> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState 
> *bs, int64_t offset,
>int64_t bytes, QEMUIOVector *qiov,
>BdrvRequestFlags flags)
> {
> -BDRVRBDState *s = bs->opaque;
> -/*
> - * RBD APIs don't allow us to write more than actual size, so in 
> order
> - * to support growing images, we resize the image before write
> - * operations that exceed the current size.
> - */
> -if (offset + bytes > s->image_size) {
> -int r = qemu_rbd_resize(bs, offset + bytes);
> -if (r < 0) {
> -return r;
> -}
> -}
>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
> }
> 
> --
> 2.35.1
> 
 
 Do we really have a use case for growing rbd images?
>>> 
>>> The use case is to have a qcow2 image on rbd.
>>> I don't think it's very common, but some people use it and here [1] we had 
>>> a little discussion about features that could be interesting (e.g.  
>>> persistent dirty bitmaps for incremental backup).
>>> 
>>> In any case the support is quite simple and does not affect other use cases 
>>> since we only increase the size when we go beyond the current size.
>>> 
>>> IMHO we can have it in :-)
>>> 
>> 
>> The QCOW2 alone doesn’t make much sense, but additional metadata might be a 
>> use case.
> 
> Yep.
> 
>> Be aware that the current approach will serialize requests. If there is a 
>> real use case, we might think of a better solution.
> 
> Good point, but it only happens when we have to resize, so maybe it's okay 
> for now, but I agree we could do better ;-)

There might also be a problem if a write for a higher offset past eof will be 
executed shortly before a write to a slightly lower offset past eof. The second 
resize will fail as it would shrink the image. We would need proper locking to 
avoid this. Maybe we need to check if we write past eof. If yes, take a lock 
around the resize op and then check again if it’s still eof and only resize if 
true.

Peter

> 
> Thanks,
> Stefano
> 





Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-19 Thread Ilya Dryomov
On Sat, Mar 19, 2022 at 1:40 PM Ilya Dryomov  wrote:
>
> On Fri, Mar 18, 2022 at 9:25 AM Stefano Garzarella  
> wrote:
> >
> > On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
> > >
> > >
> > >> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :
> > >>
> > >> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> > >> added a workaround to support growing images (eg. qcow2), resizing
> > >> the image before write operations that exceed the current size.
> > >>
> > >> We recently added support for write zeroes and without the
> > >> workaround we can have problems with qcow2.
> > >>
> > >> So let's move the resize into qemu_rbd_start_co() and do it when
> > >> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> > >>
> > >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> > >> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> > >> Signed-off-by: Stefano Garzarella 
> > >> ---
> > >> block/rbd.c | 26 ++
> > >> 1 file changed, 14 insertions(+), 12 deletions(-)
> > >>
> > >> diff --git a/block/rbd.c b/block/rbd.c
> > >> index 8f183eba2a..6caf35cbba 100644
> > >> --- a/block/rbd.c
> > >> +++ b/block/rbd.c
> > >> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
> > >> qemu_rbd_start_co(BlockDriverState *bs,
> > >>
> > >> assert(!qiov || qiov->size == bytes);
> > >>
> > >> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> > >> +/*
> > >> + * RBD APIs don't allow us to write more than actual size, so 
> > >> in order
> > >> + * to support growing images, we resize the image before write
> > >> + * operations that exceed the current size.
> > >> + */
> > >> +if (offset + bytes > s->image_size) {
> > >> +int r = qemu_rbd_resize(bs, offset + bytes);
> > >> +if (r < 0) {
> > >> +return r;
> > >> +}
> > >> +}
> > >> +}
> > >> +
> > >> r = rbd_aio_create_completion(,
> > >>   (rbd_callback_t) 
> > >> qemu_rbd_completion_cb, );
> > >> if (r < 0) {
> > >> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState 
> > >> *bs, int64_t offset,
> > >>  int64_t bytes, QEMUIOVector *qiov,
> > >>  BdrvRequestFlags flags)
> > >> {
> > >> -BDRVRBDState *s = bs->opaque;
> > >> -/*
> > >> - * RBD APIs don't allow us to write more than actual size, so in 
> > >> order
> > >> - * to support growing images, we resize the image before write
> > >> - * operations that exceed the current size.
> > >> - */
> > >> -if (offset + bytes > s->image_size) {
> > >> -int r = qemu_rbd_resize(bs, offset + bytes);
> > >> -if (r < 0) {
> > >> -return r;
> > >> -}
> > >> -}
> > >> return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, 
> > >> RBD_AIO_WRITE);
> > >> }
> > >>
> > >> --
> > >> 2.35.1
> > >>
> > >
> > >Do we really have a use case for growing rbd images?
> >
> > The use case is to have a qcow2 image on rbd.
> > I don't think it's very common, but some people use it and here [1] we
> > had a little discussion about features that could be interesting (e.g.
> > persistent dirty bitmaps for incremental backup).
>
> RBD supports that natively (object-map and fast-diff image features).
> The granularity is higher than a typical QCOW2 granularity (64K vs 4M)
> but I have never heard of that being a concern.

Sorry, I meant to say lower (more coarse grained) above.

RBD's default object size is 4M and that is the granularity at which
dirtiness is typically tracked.  It is possible to ask for a byte-level
incremental diff but it is obviously slow.

Thanks,

Ilya



Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-19 Thread Ilya Dryomov
On Fri, Mar 18, 2022 at 9:25 AM Stefano Garzarella  wrote:
>
> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
> >
> >
> >> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :
> >>
> >> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> >> added a workaround to support growing images (eg. qcow2), resizing
> >> the image before write operations that exceed the current size.
> >>
> >> We recently added support for write zeroes and without the
> >> workaround we can have problems with qcow2.
> >>
> >> So let's move the resize into qemu_rbd_start_co() and do it when
> >> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> >>
> >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> >> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> >> Signed-off-by: Stefano Garzarella 
> >> ---
> >> block/rbd.c | 26 ++
> >> 1 file changed, 14 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index 8f183eba2a..6caf35cbba 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
> >> qemu_rbd_start_co(BlockDriverState *bs,
> >>
> >> assert(!qiov || qiov->size == bytes);
> >>
> >> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> >> +/*
> >> + * RBD APIs don't allow us to write more than actual size, so in 
> >> order
> >> + * to support growing images, we resize the image before write
> >> + * operations that exceed the current size.
> >> + */
> >> +if (offset + bytes > s->image_size) {
> >> +int r = qemu_rbd_resize(bs, offset + bytes);
> >> +if (r < 0) {
> >> +return r;
> >> +}
> >> +}
> >> +}
> >> +
> >> r = rbd_aio_create_completion(,
> >>   (rbd_callback_t) qemu_rbd_completion_cb, 
> >> );
> >> if (r < 0) {
> >> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState 
> >> *bs, int64_t offset,
> >>  int64_t bytes, QEMUIOVector *qiov,
> >>  BdrvRequestFlags flags)
> >> {
> >> -BDRVRBDState *s = bs->opaque;
> >> -/*
> >> - * RBD APIs don't allow us to write more than actual size, so in order
> >> - * to support growing images, we resize the image before write
> >> - * operations that exceed the current size.
> >> - */
> >> -if (offset + bytes > s->image_size) {
> >> -int r = qemu_rbd_resize(bs, offset + bytes);
> >> -if (r < 0) {
> >> -return r;
> >> -}
> >> -}
> >> return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, 
> >> RBD_AIO_WRITE);
> >> }
> >>
> >> --
> >> 2.35.1
> >>
> >
> >Do we really have a use case for growing rbd images?
>
> The use case is to have a qcow2 image on rbd.
> I don't think it's very common, but some people use it and here [1] we
> had a little discussion about features that could be interesting (e.g.
> persistent dirty bitmaps for incremental backup).

RBD supports that natively (object-map and fast-diff image features).
The granularity is higher than a typical QCOW2 granularity (64K vs 4M)
but I have never heard of that being a concern.

Thanks,

Ilya



Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-19 Thread Ilya Dryomov
On Thu, Mar 17, 2022 at 5:26 PM Stefano Garzarella  wrote:
>
> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> added a workaround to support growing images (eg. qcow2), resizing
> the image before write operations that exceed the current size.
>
> We recently added support for write zeroes and without the
> workaround we can have problems with qcow2.
>
> So let's move the resize into qemu_rbd_start_co() and do it when
> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> Signed-off-by: Stefano Garzarella 
> ---
>  block/rbd.c | 26 ++
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 8f183eba2a..6caf35cbba 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
> qemu_rbd_start_co(BlockDriverState *bs,
>
>  assert(!qiov || qiov->size == bytes);
>
> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> +/*
> + * RBD APIs don't allow us to write more than actual size, so in 
> order
> + * to support growing images, we resize the image before write
> + * operations that exceed the current size.
> + */
> +if (offset + bytes > s->image_size) {
> +int r = qemu_rbd_resize(bs, offset + bytes);
> +if (r < 0) {
> +return r;
> +}
> +}
> +}
> +
>  r = rbd_aio_create_completion(,
>(rbd_callback_t) qemu_rbd_completion_cb, 
> );
>  if (r < 0) {
> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
> int64_t offset,
>   int64_t bytes, QEMUIOVector *qiov,
>   BdrvRequestFlags flags)
>  {
> -BDRVRBDState *s = bs->opaque;
> -/*
> - * RBD APIs don't allow us to write more than actual size, so in order
> - * to support growing images, we resize the image before write
> - * operations that exceed the current size.
> - */
> -if (offset + bytes > s->image_size) {
> -int r = qemu_rbd_resize(bs, offset + bytes);
> -if (r < 0) {
> -return r;
> -}
> -}
>  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>  }
>
> --
> 2.35.1
>

Reviewed-by: Ilya Dryomov 

Thanks,

Ilya



Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-18 Thread Stefano Garzarella

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:




Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :

On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :


Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

   assert(!qiov || qiov->size == bytes);

+if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+/*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+if (offset + bytes > s->image_size) {
+int r = qemu_rbd_resize(bs, offset + bytes);
+if (r < 0) {
+return r;
+}
+}
+}
+
   r = rbd_aio_create_completion(,
 (rbd_callback_t) qemu_rbd_completion_cb, );
   if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
int64_t bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags)
{
-BDRVRBDState *s = bs->opaque;
-/*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-if (offset + bytes > s->image_size) {
-int r = qemu_rbd_resize(bs, offset + bytes);
-if (r < 0) {
-return r;
-}
-}
   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

--
2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we had a 
little discussion about features that could be interesting (e.g.  persistent 
dirty bitmaps for incremental backup).

In any case the support is quite simple and does not affect other use cases 
since we only increase the size when we go beyond the current size.

IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata might 
be a use case.


Yep.

Be aware that the current approach will serialize requests. If there is 
a real use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe it's 
okay for now, but I agree we could do better ;-)


Thanks,
Stefano




Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-18 Thread Hanna Reitz

On 17.03.22 17:26, Stefano Garzarella wrote:

Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
  block/rbd.c | 26 ++
  1 file changed, 14 insertions(+), 12 deletions(-)


Thanks, applied to my block branch:

https://gitlab.com/hreitz/qemu/-/commits/block

Hanna




Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-18 Thread Peter Lieven



> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :
> 
> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>> 
>> 
 Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :
>>> 
>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>> added a workaround to support growing images (eg. qcow2), resizing
>>> the image before write operations that exceed the current size.
>>> 
>>> We recently added support for write zeroes and without the
>>> workaround we can have problems with qcow2.
>>> 
>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>> 
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>> Signed-off-by: Stefano Garzarella 
>>> ---
>>> block/rbd.c | 26 ++
>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>> 
>>> diff --git a/block/rbd.c b/block/rbd.c
>>> index 8f183eba2a..6caf35cbba 100644
>>> --- a/block/rbd.c
>>> +++ b/block/rbd.c
>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
>>> qemu_rbd_start_co(BlockDriverState *bs,
>>> 
>>>assert(!qiov || qiov->size == bytes);
>>> 
>>> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>> +/*
>>> + * RBD APIs don't allow us to write more than actual size, so in 
>>> order
>>> + * to support growing images, we resize the image before write
>>> + * operations that exceed the current size.
>>> + */
>>> +if (offset + bytes > s->image_size) {
>>> +int r = qemu_rbd_resize(bs, offset + bytes);
>>> +if (r < 0) {
>>> +return r;
>>> +}
>>> +}
>>> +}
>>> +
>>>r = rbd_aio_create_completion(,
>>>  (rbd_callback_t) qemu_rbd_completion_cb, 
>>> );
>>>if (r < 0) {
>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState 
>>> *bs, int64_t offset,
>>> int64_t bytes, QEMUIOVector *qiov,
>>> BdrvRequestFlags flags)
>>> {
>>> -BDRVRBDState *s = bs->opaque;
>>> -/*
>>> - * RBD APIs don't allow us to write more than actual size, so in order
>>> - * to support growing images, we resize the image before write
>>> - * operations that exceed the current size.
>>> - */
>>> -if (offset + bytes > s->image_size) {
>>> -int r = qemu_rbd_resize(bs, offset + bytes);
>>> -if (r < 0) {
>>> -return r;
>>> -}
>>> -}
>>>return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>> }
>>> 
>>> --
>>> 2.35.1
>>> 
>> 
>> Do we really have a use case for growing rbd images?
> 
> The use case is to have a qcow2 image on rbd.
> I don't think it's very common, but some people use it and here [1] we had a 
> little discussion about features that could be interesting (e.g.  persistent 
> dirty bitmaps for incremental backup).
> 
> In any case the support is quite simple and does not affect other use cases 
> since we only increase the size when we go beyond the current size.
> 
> IMHO we can have it in :-)
> 

The QCOW2 alone doesn’t make much sense, but additional metadata might be a use 
case.
Be aware that the current approach will serialize requests. If there is a real 
use case, we might think of a better solution.

Peter

> Thanks,
> Stefano
> 
> [1] https://lore.kernel.org/all/20190415080452.GA6031@localhost.localdomain/
> 





Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-18 Thread Stefano Garzarella

On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :

Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

assert(!qiov || qiov->size == bytes);

+if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+/*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+if (offset + bytes > s->image_size) {
+int r = qemu_rbd_resize(bs, offset + bytes);
+if (r < 0) {
+return r;
+}
+}
+}
+
r = rbd_aio_create_completion(,
  (rbd_callback_t) qemu_rbd_completion_cb, );
if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
 int64_t bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
{
-BDRVRBDState *s = bs->opaque;
-/*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-if (offset + bytes > s->image_size) {
-int r = qemu_rbd_resize(bs, offset + bytes);
-if (r < 0) {
-return r;
-}
-}
return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

--
2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we 
had a little discussion about features that could be interesting (e.g.  
persistent dirty bitmaps for incremental backup).


In any case the support is quite simple and does not affect other use 
cases since we only increase the size when we go beyond the current 
size.


IMHO we can have it in :-)

Thanks,
Stefano

[1] 
https://lore.kernel.org/all/20190415080452.GA6031@localhost.localdomain/





Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-17 Thread Peter Lieven



> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :
> 
> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> added a workaround to support growing images (eg. qcow2), resizing
> the image before write operations that exceed the current size.
> 
> We recently added support for write zeroes and without the
> workaround we can have problems with qcow2.
> 
> So let's move the resize into qemu_rbd_start_co() and do it when
> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> Signed-off-by: Stefano Garzarella 
> ---
> block/rbd.c | 26 ++
> 1 file changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index 8f183eba2a..6caf35cbba 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
> qemu_rbd_start_co(BlockDriverState *bs,
> 
> assert(!qiov || qiov->size == bytes);
> 
> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> +/*
> + * RBD APIs don't allow us to write more than actual size, so in 
> order
> + * to support growing images, we resize the image before write
> + * operations that exceed the current size.
> + */
> +if (offset + bytes > s->image_size) {
> +int r = qemu_rbd_resize(bs, offset + bytes);
> +if (r < 0) {
> +return r;
> +}
> +}
> +}
> +
> r = rbd_aio_create_completion(,
>   (rbd_callback_t) qemu_rbd_completion_cb, 
> );
> if (r < 0) {
> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
> int64_t offset,
>  int64_t bytes, QEMUIOVector *qiov,
>  BdrvRequestFlags flags)
> {
> -BDRVRBDState *s = bs->opaque;
> -/*
> - * RBD APIs don't allow us to write more than actual size, so in order
> - * to support growing images, we resize the image before write
> - * operations that exceed the current size.
> - */
> -if (offset + bytes > s->image_size) {
> -int r = qemu_rbd_resize(bs, offset + bytes);
> -if (r < 0) {
> -return r;
> -}
> -}
> return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
> }
> 
> -- 
> 2.35.1
> 

Do we really have a use case for growing rbd images?

Peter




[PATCH] block/rbd: fix write zeroes with growing images

2022-03-17 Thread Stefano Garzarella
Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
 block/rbd.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,
 
 assert(!qiov || qiov->size == bytes);
 
+if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+/*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+if (offset + bytes > s->image_size) {
+int r = qemu_rbd_resize(bs, offset + bytes);
+if (r < 0) {
+return r;
+}
+}
+}
+
 r = rbd_aio_create_completion(,
   (rbd_callback_t) qemu_rbd_completion_cb, );
 if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
  int64_t bytes, QEMUIOVector *qiov,
  BdrvRequestFlags flags)
 {
-BDRVRBDState *s = bs->opaque;
-/*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-if (offset + bytes > s->image_size) {
-int r = qemu_rbd_resize(bs, offset + bytes);
-if (r < 0) {
-return r;
-}
-}
 return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
 }
 
-- 
2.35.1