[Qemu-devel] [Bug 601946] Re: [Feature request] qemu-img multi-threaded compressed image conversion

2017-10-06 Thread Quentin Casasnovas
@Jinank I have not started working on this at all, so please go ahead!
Let me know if I can help with testing or anything; we make quite
extensive use of nbd and qcow2 images internally.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/601946

Title:
  [Feature request] qemu-img multi-threaded compressed image conversion

Status in QEMU:
  New

Bug description:
  Feature request:
  qemu-img multi-threaded compressed image conversion

  Suppose I want to convert a raw image to a compressed qcow2. Multi-
  threaded conversion would be much faster, because the bottleneck is
  compressing the data.
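The idea behind the request can be sketched outside of QEMU (this is an illustration, not QEMU code): split the input into qcow2-sized clusters, compress the clusters on a worker pool, and emit the results in input order. Python's zlib releases the GIL while compressing, so even plain threads scale on real data here.

```python
# Sketch of multi-threaded compressed conversion (illustration only).
import zlib
from concurrent.futures import ThreadPoolExecutor

CLUSTER_SIZE = 64 * 1024  # qcow2 default cluster size


def compress_image(data, workers=8):
    """Compress `data` cluster by cluster on a thread pool.

    Returns the compressed clusters in their original order, so the
    caller can still write them out sequentially.
    """
    clusters = [data[i:i + CLUSTER_SIZE]
                for i in range(0, len(data), CLUSTER_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though the compression
        # itself runs in parallel across the pool.
        return list(pool.map(lambda c: zlib.compress(c, 6), clusters))
```

A real implementation would also overlap reading, compressing, and writing; the point is only that compression, the bottleneck, parallelizes cleanly per cluster.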

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/601946/+subscriptions



[Qemu-devel] [Bug 601946] Re: [Feature request] qemu-img multi-threaded compressed image conversion

2017-01-21 Thread Quentin Casasnovas
That was also my feeling, so it's nice to get a confirmation!

Another related improvement would be to allow qemu-nbd to write compressed
blocks to its backing image: today, if you use a qcow2 image with
compression, any block that gets written ends up uncompressed in the
resulting image, and you need to recompress the image offline with qemu-img.

Would you have any pointers/documentation on how best to implement this,
so that both qemu-img and qemu-nbd can use multithreaded compressed writes?
I'm totally new to QEMU's block subsystem.




[Qemu-devel] [Bug 601946] Re: [Feature request] qemu-img multi-threaded compressed image conversion

2017-01-10 Thread Quentin Casasnovas
It looks like qcow2_write_compressed() has been removed and turned into
a QEMU coroutine in qemu 2.8.0 (released in December 2016) to support
live compressed backups.  Any pointers to start working on this?  We
have servers with 128 CPUs and it's very sad to see them compress on a
single CPU, taking tens of minutes instead of a few seconds.. :)




Re: [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-11 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 02:34:18PM -0600, Eric Blake wrote:
> On 05/06/2016 02:45 AM, Quentin Casasnovas wrote:
> > When running fstrim on a filesystem mounted through qemu-nbd with
> > --discard=on, fstrim would fail with I/O errors:
> > 
> >   $ fstrim /k/spl/ice/
> >   fstrim: /k/spl/ice/: FITRIM ioctl failed: Input/output error
> > 
> > and qemu-nbd was spitting these:
> > 
> >   nbd.c:nbd_co_receive_request():L1232: len (94621696) is larger than max 
> > len (33554432)
> 
> Your patch duplicates what is already present in qemu:
> 
> commit eb38c3b67018ff8069e4f674a28661931a8a3e4f
> Author: Paolo Bonzini <pbonz...@redhat.com>
> Date:   Thu Jan 7 14:32:42 2016 +0100
> 
> nbd-server: do not check request length except for reads and writes
> 
> Only reads and writes need to allocate memory correspondent to the
> request length.  Other requests can be sent to the storage without
> allocating any memory, and thus any request length is acceptable.
> 
> Reported-by: Sitsofe Wheeler <sits...@yahoo.com>
> Cc: qemu-bl...@nongnu.org
> Reviewed-by: Max Reitz <mre...@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> 
> For the purposes of qemu-stable, it's better to backport the existing
> patch than to write a new version of it.
> 

Ha, sorry, I missed this! I wouldn't have tried to debug/fix it myself otherwise :)

> It also helps to state what version of qemu you were testing, as it is
> obviously not the (soon-to-be-released) version 2.6 which already has
> the fix.
>

I was using qemu-2.5.0-rc3 on Gentoo, but this was also verified on some
Debian systems, which appear to be on 1.1.2+dfsg-6a+deb7u12, and on Ubuntu
Xenial, 2.0.0+dfsg-2ubuntu1.22.

I wrote the patch on top of https://github.com/bonzini/qemu.git:master
(a7e00e2) which didn't contain the fix last Friday.

Anyway, cool if the fix is going into mainline :)

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 05:23:21PM +0100, Alex Bligh wrote:
> 
> On 10 May 2016, at 17:04, Quentin Casasnovas <quentin.casasno...@oracle.com> 
> wrote:
> 
> > Alright, I assumed by 'length of an NBD request', the specification was
> > talking about the length of.. well, the request as opposed to whatever is
> > in the length field of the request header.
> 
> With NBD_CMD_TRIM the length in the header field is 32 bit and specifies
> the length of data to trim, not the length of the data transferred (which
> is none).
> 
> > Is there a use case where you'd want to split up a single big TRIM request
> > in smaller ones (as in some hardware would not support it or something)?
> > Even then, it looks like this splitting up would be hardware dependant and
> > better implemented in block device drivers.
> 
> Part of the point of the block size extension is to push such limits to the
> client.
> 
> I could make up use cases (e.g. that performing a multi-gigabyte trim in
> a single threaded server will effectively block all other I/O), but the
> main reason in my book is orthogonality, and the fact the client needs
> to do some breaking up anyway.
> 
> > I'm just finding odd that something that fits inside the length field can't
> > be used.
> 
> That's a different point. That's Qemu's 'Denial of Service Attack'
> prevention, *not* maximum block sizes. It isn't dropping it because
> of a maximum block size parameter. If it doesn't support the block size
> extension which the version you're looking at does not, it's meant
> to handle requests up to 2^32-1 long EXCEPT that it MAY error requests
> so long as to cause a denial of service attack. As this doesn't fit
> into that case (it's a TRIM), it shouldn't be erroring it on that grounds.
> 
> I agree Qemu should fix that.
> 
> (So in a sense Eric and I are arguing about something irrelevant to
> your current problem, which is how this would work /with/ the block
> size extensions, as Eric is in the process of implementing them).
> 

Riight!  OK understood, thanks for the explanation.

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 05:54:44PM +0200, Quentin Casasnovas wrote:
> On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> > On 05/10/2016 09:41 AM, Alex Bligh wrote:
> > > 
> > > On 10 May 2016, at 16:29, Eric Blake <ebl...@redhat.com> wrote:
> > > 
> > >> So the kernel is currently one of the clients that does NOT honor block
> > >> sizes, and as such, servers should be prepared for ANY size up to
> > >> UINT_MAX (other than DoS handling).
> > > 
> > > Interesting followup question:
> > > 
> > > If the kernel does not fragment TRIM requests at all (in the
> > > same way it fragments read and write requests), I suspect
> > > something bad may happen with TRIM requests over 2^31
> > > in size (particularly over 2^32 in size), as the length
> > > field in nbd only has 32 bits.
> > > 
> > > Whether it supports block size constraints or not, it is
> > > going to need to do *some* breaking up of requests.
> > 
> > Does anyone have an easy way to cause the kernel to request a trim
> > operation that large on a > 4G export?  I'm not familiar enough with
> > EXT4 operation to know what file system operations you can run to
> > ultimately indirectly create a file system trim operation that large.
> > But maybe there is something simpler - does the kernel let you use the
> > fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> > FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
> > 
> 
> It was fairly reproducible here, we just used a random qcow2 image with
> some Debian minimal system pre-installed, mounted that qcow2 image through
> qemu-nbd then compiled a whole kernel inside it.  Then you can make clean
> and run fstrim on the mount point.  I'm assuming you can go faster than
> that by just writing a big file to the qcow2 image mounted without -o
> discard, delete the big file, then remount with -o discard + run fstrim.
> 

Looks like there's an easier way:

 $ qemu-img create -f qcow2 foo.qcow2 10G
 $ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
 $ mkfs.ext4 /dev/nbd0
 mke2fs 1.42.13 (17-May-2015)
 Discarding device blocks: failed - Input/output error
 Creating filesystem with 2621440 4k blocks and 655360 inodes
 Filesystem UUID: 25aeb51f-0dea-4c1d-8b65-61f6bcdf97e9
 Superblock backups stored on blocks:
   32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

 Allocating group tables: done
 Writing inode tables: done
 Creating journal (32768 blocks): done
 Writing superblocks and filesystem accounting information: done

Notice the "Discarding device blocks: failed - Input/output error" line; I
bet that is mkfs.ext4 trying to trim all the blocks prior to writing the
filesystem, and it gets an I/O error while doing so.  I haven't verified it
is the same problem, but if it isn't, simply mount the resulting filesystem
and run fstrim on it:

 $ mount -o discard /dev/nbd0 /tmp/foo
 $ fstrim /tmp/foo
 fstrim: /tmp/foo: FITRIM ioctl failed: Input/output error

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 04:49:57PM +0100, Alex Bligh wrote:
> 
> On 10 May 2016, at 16:45, Quentin Casasnovas <quentin.casasno...@oracle.com> 
> wrote:
> 
> > I'm by no mean an expert in this, but why would the kernel break up those
> > TRIM commands?  After all, breaking things up makes sense when the length
> > of the request is big, not that much when it only contains the request
> > header, which is the case for TRIM commands.
> 
> 1. You are assuming that the only reason for limiting the size of operations 
> is
>limiting the data transferred within one request. That is not necessarily
>the case. There are good reasons (if only orthogonality) to have limits
>in place even where no data is transferred.
> 
> 2. As and when the blocksize extension is implemented in the kernel (it isn't
>now), the protocol requires it.
> 
> 3. The maximum length of an NBD trim operation is 2^32. The maximum length
>of a trim operation is larger. Therefore the kernel needs to do at least
>some breaking up.

Alright, I assumed that by 'length of an NBD request' the specification was
talking about the length of.. well, the request, as opposed to whatever is
in the length field of the request header.

Is there a use case where you'd want to split up a single big TRIM request
into smaller ones (as in, some hardware would not support it or something)?
Even then, it looks like this splitting up would be hardware dependent and
better implemented in the block device drivers.

I just find it odd that something that fits inside the length field can't
be used.  I do agree with your point number 3: obviously, if the length
field type doesn't allow anything bigger than a u32, then the kernel has
to do some breaking up in that case.
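For illustration, the kind of breaking up discussed in point 3 could look like the following client-side helper (a hypothetical sketch, not kernel or QEMU code): split an arbitrarily large trim range into pieces that each fit the 32-bit length field of an NBD request.

```python
# Hypothetical helper: split a trim range into u32-sized NBD requests.
NBD_MAX_LEN = 2**32 - 1  # largest value the u32 length field can carry


def split_trim(offset, length, max_len=NBD_MAX_LEN):
    """Yield (offset, length) pairs, each no larger than max_len."""
    while length > 0:
        chunk = min(length, max_len)
        yield (offset, chunk)
        offset += chunk
        length -= chunk
```

Any trim that already fits in the length field (like the 94621696-byte one from the logs) passes through as a single request; only ranges above 2^32 - 1 get split.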

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> On 05/10/2016 09:41 AM, Alex Bligh wrote:
> > 
> > On 10 May 2016, at 16:29, Eric Blake  wrote:
> > 
> >> So the kernel is currently one of the clients that does NOT honor block
> >> sizes, and as such, servers should be prepared for ANY size up to
> >> UINT_MAX (other than DoS handling).
> > 
> > Interesting followup question:
> > 
> > If the kernel does not fragment TRIM requests at all (in the
> > same way it fragments read and write requests), I suspect
> > something bad may happen with TRIM requests over 2^31
> > in size (particularly over 2^32 in size), as the length
> > field in nbd only has 32 bits.
> > 
> > Whether it supports block size constraints or not, it is
> > going to need to do *some* breaking up of requests.
> 
> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export?  I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
> 

It was fairly reproducible here: we just used a random qcow2 image with a
minimal Debian system pre-installed, mounted that qcow2 image through
qemu-nbd, then compiled a whole kernel inside it.  Then you can run make
clean and fstrim on the mount point.  I'm assuming you can go faster than
that by just writing a big file to the qcow2 image mounted without -o
discard, deleting the big file, then remounting with -o discard and
running fstrim.

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
> Eric,
> 
> On 10 May 2016, at 16:29, Eric Blake  wrote:
> >>> Maybe we should revisit that in the spec, and/or advertise yet another
> >>> block size (since the maximum size for a trim and/or write_zeroes
> >>> request may indeed be different than the maximum size for a read/write).
> >> 
> >> I think it's up to the server to either handle large requests, or
> >> for the client to break these up.
> > 
> > But the question at hand here is whether we should permit servers to
> > advertise multiple maximum block sizes (one for read/write, another one
> > for trim/write_zero, or even two [at least qemu tracks a separate
> > maximum trim vs. write_zero sizing in its generic block layer]), or
> > merely stick with the current wording that requires clients that honor
> > maximum block size to obey the same maximum for ALL commands, regardless
> > of amount of data sent over the wire.
> 
> In my view, we should not change this. Block sizes maxima are not there
> to support DoS prevention (that's a separate phrase). They are there
> to impose maximum block sizes. Adding a different maximum block size
> for different commands is way too overengineered. There are after
> all reasons (especially without structured replies) why you'd want
> different maximum block sizes for writes and reads. If clients support
> block sizes, they will necessarily have to have the infrastructure
> to break requests up.
> 
> IE maximum block size should continue to mean maximum block size.
> 
> >> 
> >> The core problem here is that the kernel (and, ahem, most servers) are
> >> ignorant of the block size extension, and need to guess how to break
> >> things up. In my view the client (kernel in this case) should
> >> be breaking the trim requests up into whatever size it uses as the
> >> maximum size write requests. But then it would have to know about block
> >> sizes which are in (another) experimental extension.
> > 
> > Correct - no one has yet patched the kernel to honor block sizes
> > advertised through what is currently an experimental extension.
> 
> Unsurprising at it's still experimental, and only settled down a couple
> of weeks ago :-)
> 
> >  (We
> > have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> > minimum block size,
> 
> Technically that is 'out of band transmission of block size
> constraints' :-)
> 
> > but I haven't audited whether the kernel actually
> > guarantees that all client requests are sent aligned to the value passed
> > that way - but we have nothing to set the maximum size,
> 
> indeed
> 
> > and are at the
> > mercy of however the kernel currently decides to split large requests).
> 
> I am surprised TRIM doesn't get broken up the same way READ and WRITE
> do.
>

I'm by no means an expert in this, but why would the kernel break up those
TRIM commands?  After all, breaking things up makes sense when the length
of the request payload is big, not so much when the request only consists
of its header, which is the case for TRIM commands.

What am I missing?

Quentin



[Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-06 Thread Quentin Casasnovas
When running fstrim on a filesystem mounted through qemu-nbd with
--discard=on, fstrim would fail with I/O errors:

  $ fstrim /k/spl/ice/
  fstrim: /k/spl/ice/: FITRIM ioctl failed: Input/output error

and qemu-nbd was spitting out these errors:

  nbd.c:nbd_co_receive_request():L1232: len (94621696) is larger than max len (33554432)

Enabling debug output for the NBD driver in the Linux kernel showed that
the request length field sent was the one received, and that qemu-nbd
returned 22 (EINVAL) as the error code:

  EXT4-fs (nbd0p1): mounted filesystem with ordered data mode. Opts: discard
  block nbd0: request 880094c0cc18: dequeued (flags=1)
  block nbd0: request 880094c0cc18: sending control (read@5255168,4096B)
  block nbd0: request 880094c0cc18: got reply
  block nbd0: request 880094c0cc18: got 4096 bytes data
  block nbd0: request 880094c0cc18: done
  block nbd0: request 8801728796d8: dequeued (flags=1)
  block nbd0: request 8801728796d8: sending control (trim/discard@39464960,45056B)
  block nbd0: request 8801728796d8: got reply
  block nbd0: request 8801728796d8: done
  block nbd0: request 880172879ae0: dequeued (flags=1)
  block nbd0: request 880172879ae0: sending control (trim/discard@39653376,16384B)
  block nbd0: request 880172879ae0: got reply
  block nbd0: request 880172879ae0: done
  block nbd0: request 880172879d90: dequeued (flags=1)
  block nbd0: request 880172879d90: sending control (trim/discard@40644608,94621696B)

  block nbd0: Other side returned error (22)
 ^^

The length of the request seems huge, but this is really just the
filesystem telling the block device driver "this length should be trimmed";
unlike for an NBD_CMD_READ or NBD_CMD_WRITE, we will not try to read/write
that amount of data from/to the NBD socket.  It is thus safe to remove the
length check for NBD_CMD_TRIM.

I've confirmed this with both the protocol documentation at:

 https://github.com/yoe/nbd/blob/master/doc/proto.md

and the kernel-side implementation of the nbd device
(drivers/block/nbd.c), which only sends the request header, with no data,
for an NBD_CMD_TRIM.
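To make the point concrete, here is a sketch of what a TRIM request looks like on the wire, assuming the classic (pre-structured-reply) NBD request layout from the protocol document above — u32 magic, u32 type, u64 handle, u64 offset, u32 length — with no payload following the header:

```python
# Sketch of an NBD_CMD_TRIM request per the classic NBD wire format
# (assumed from the protocol document; not taken from QEMU source).
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_TRIM = 4


def trim_request(handle, offset, length):
    """Build the 28-byte request header; a TRIM carries no data after it."""
    # ">IIQQI": big-endian u32 magic, u32 type, u64 handle, u64 offset, u32 len
    return struct.pack(">IIQQI", NBD_REQUEST_MAGIC, NBD_CMD_TRIM,
                       handle, offset, length)
```

So the 94621696-byte trim from the logs above is still only 28 bytes on the socket; the length field describes the range to trim, not data to transfer.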

With this fix in, I am now able to run fstrim on my qcow2 images and keep
them small (or at least keep their size proportional to the amount of data
present on them).

Signed-off-by: Quentin Casasnovas <quentin.casasno...@oracle.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: <qemu-devel@nongnu.org>
CC: <qemu-sta...@nongnu.org>
CC: <qemu-triv...@nongnu.org>
---
 nbd.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/nbd.c b/nbd.c
index b3d9654..e733669 100644
--- a/nbd.c
+++ b/nbd.c
@@ -1209,6 +1209,11 @@ static ssize_t nbd_co_send_reply(NBDRequest *req, struct 
nbd_reply *reply,
 return rc;
 }
 
+static bool nbd_should_check_request_size(const struct nbd_request *request)
+{
+return (request->type & NBD_CMD_MASK_COMMAND) != NBD_CMD_TRIM;
+}
+
 static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request 
*request)
 {
 NBDClient *client = req->client;
@@ -1227,7 +1232,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, 
struct nbd_request *reque
 goto out;
 }
 
-if (request->len > NBD_MAX_BUFFER_SIZE) {
+if (nbd_should_check_request_size(request) &&
+request->len > NBD_MAX_BUFFER_SIZE) {
 LOG("len (%u) is larger than max len (%u)",
 request->len, NBD_MAX_BUFFER_SIZE);
 rc = -EINVAL;
-- 
2.8.1