Re: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate

2020-09-15 Thread Nir Soffer
On Mon, Sep 14, 2020 at 8:32 PM Daniel P. Berrangé  wrote:
>
> On Mon, Aug 31, 2020 at 05:01:27PM +0300, Nir Soffer wrote:
> > If fallocate() is not supported, posix_fallocate() falls back to
> > inefficient allocation, writing one byte for every 4k bytes[1]. This is
> > very slow compared with writing zeros. In oVirt we measured ~400%
> > improvement in allocation time when replacing posix_fallocate() with
> > manually writing zeroes[2].
> >
> > We also know that posix_fallocated() does not work well when using OFD
> > locks[3]. We don't know the reason yet for this issue yet.
> >
> > Change preallocate_falloc() to use fallocate() instead of
> > posix_falloate(), and fall back to full preallocation if not supported.
> >
> > Here are quick test results with this change.
> >
> > Before (qemu-img-5.1.0-2.fc32.x86_64):
> >
> > $ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
> > Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 
> > preallocation=falloc
> >
> > real 0m42.100s
> > user 0m0.602s
> > sys 0m4.137s
> >
> > NFS stats:
> > calls  retransauthrefrshwrite
> > 15715830  1572205   1571321
> >
> > After:
> >
> > $ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 
> > 6g
> > Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 
> > preallocation=falloc
> >
> > real 0m15.551s
> > user 0m0.070s
> > sys 0m2.623s
> >
> > NFS stats:
> > calls  retransauthrefrshwrite
> > 24620  0  24624 24567
> >
> > [1] 
> > https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96
> > [2] https://bugzilla.redhat.com/1850267#c25
> > [3] https://bugzilla.redhat.com/1851097
>
> This bug appears to be private to RH employees only, so rather than link
> to it, please summarize any important facts in it for benefit of nonn-RH
> QEMU contributors.

Thanks, I missed that detail when linking to the bug. The bug is public now.

> > Signed-off-by: Nir Soffer 
> > ---
> >  block/file-posix.c | 32 +-
> >  docs/system/qemu-block-drivers.rst.inc | 11 +
> >  docs/tools/qemu-img.rst| 11 +
> >  qapi/block-core.json   |  4 ++--
> >  4 files changed, 25 insertions(+), 33 deletions(-)
>
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>




Re: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate

2020-09-14 Thread Daniel P . Berrangé
On Mon, Aug 31, 2020 at 05:01:27PM +0300, Nir Soffer wrote:
> If fallocate() is not supported, posix_fallocate() falls back to
> inefficient allocation, writing one byte for every 4k bytes[1]. This is
> very slow compared with writing zeros. In oVirt we measured ~400%
> improvement in allocation time when replacing posix_fallocate() with
> manually writing zeroes[2].
> 
> We also know that posix_fallocated() does not work well when using OFD
> locks[3]. We don't know the reason yet for this issue yet.
> 
> Change preallocate_falloc() to use fallocate() instead of
> posix_falloate(), and fall back to full preallocation if not supported.
> 
> Here are quick test results with this change.
> 
> Before (qemu-img-5.1.0-2.fc32.x86_64):
> 
> $ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
> Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc
> 
> real 0m42.100s
> user 0m0.602s
> sys 0m4.137s
> 
> NFS stats:
> calls      retrans    authrefrshwrite
> 1571583    0          1572205   1571321
> 
> After:
> 
> $ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
> Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc
> 
> real 0m15.551s
> user 0m0.070s
> sys 0m2.623s
> 
> NFS stats:
> calls      retrans    authrefrshwrite
> 24620      0          24624 24567  
> 
> [1] 
> https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96
> [2] https://bugzilla.redhat.com/1850267#c25
> [3] https://bugzilla.redhat.com/1851097

This bug appears to be private to RH employees only, so rather than link
to it, please summarize any important facts in it for benefit of nonn-RH
QEMU contributors.

> 
> Signed-off-by: Nir Soffer 
> ---
>  block/file-posix.c | 32 +-
>  docs/system/qemu-block-drivers.rst.inc | 11 +
>  docs/tools/qemu-img.rst| 11 +
>  qapi/block-core.json   |  4 ++--
>  4 files changed, 25 insertions(+), 33 deletions(-)

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate

2020-09-01 Thread Alberto Garcia
On Mon 31 Aug 2020 04:01:27 PM CEST, Nir Soffer wrote:
> If fallocate() is not supported, posix_fallocate() falls back to
> inefficient allocation, writing one byte for every 4k bytes[1]. This is
> very slow compared with writing zeros. In oVirt we measured ~400%
> improvement in allocation time when replacing posix_fallocate() with
> manually writing zeroes[2].
>
> We also know that posix_fallocated() does not work well when using OFD
> locks[3]. We don't know the reason yet for this issue yet.
>
> Change preallocate_falloc() to use fallocate() instead of
> posix_falloate(), and fall back to full preallocation if not
> supported.


>  case PREALLOC_MODE_FALLOC:
>  result = preallocate_falloc(fd, current_length, offset, errp);
> -goto out;
> +if (result != -ENOTSUP)
> +goto out;
> +/* If fallocate() is not supported, fallback to full preallocation. 
> */

With the missing braces in this if statement,

Reviewed-by: Alberto Garcia 

Berto



[PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate

2020-08-31 Thread Nir Soffer
If fallocate() is not supported, posix_fallocate() falls back to
inefficient allocation, writing one byte for every 4k bytes[1]. This is
very slow compared with writing zeros. In oVirt we measured ~400%
improvement in allocation time when replacing posix_fallocate() with
manually writing zeroes[2].

We also know that posix_fallocated() does not work well when using OFD
locks[3]. We don't know the reason yet for this issue yet.

Change preallocate_falloc() to use fallocate() instead of
posix_falloate(), and fall back to full preallocation if not supported.

Here are quick test results with this change.

Before (qemu-img-5.1.0-2.fc32.x86_64):

$ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc

real 0m42.100s
user 0m0.602s
sys 0m4.137s

NFS stats:
calls      retrans    authrefrshwrite
1571583    0          1572205   1571321

After:

$ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc

real 0m15.551s
user 0m0.070s
sys 0m2.623s

NFS stats:
calls      retrans    authrefrshwrite
24620      0          24624 24567  

[1] 
https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96
[2] https://bugzilla.redhat.com/1850267#c25
[3] https://bugzilla.redhat.com/1851097

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 32 +-
 docs/system/qemu-block-drivers.rst.inc | 11 +
 docs/tools/qemu-img.rst| 11 +
 qapi/block-core.json   |  4 ++--
 4 files changed, 25 insertions(+), 33 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 341ffb1cb4..eac3c0b412 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1835,36 +1835,24 @@ static int allocate_first_block(int fd, size_t max_size)
 static int preallocate_falloc(int fd, int64_t current_length, int64_t offset,
   Error **errp)
 {
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
 int result;
 
 if (offset == current_length)
 return 0;
 
-/*
- * Truncating before posix_fallocate() makes it about twice slower on
- * file systems that do not support fallocate(), trying to check if a
- * block is allocated before allocating it, so don't do that here.
- */
-
-result = -posix_fallocate(fd, current_length,
-  offset - current_length);
+result = do_fallocate(fd, 0, current_length, offset - current_length);
 if (result != 0) {
-/* posix_fallocate() doesn't set errno. */
-error_setg_errno(errp, -result,
- "Could not preallocate new data");
+error_setg_errno(errp, -result, "Could not preallocate new data");
 return result;
 }
 
 if (current_length == 0) {
 /*
- * posix_fallocate() uses fallocate() if the filesystem supports
- * it, or fallback to manually writing zeroes. If fallocate()
- * was used, unaligned reads from the fallocated area in
- * raw_probe_alignment() will succeed, hence we need to allocate
- * the first block.
+ * Unaligned reads from the fallocated area in raw_probe_alignment()
+ * will succeed, hence we need to allocate the first block.
  *
- * Optimize future alignment probing; ignore failures.
+ * Optimizes future alignment probing; ignore failures.
  */
 allocate_first_block(fd, offset);
 }
@@ -1973,10 +1961,12 @@ static int handle_aiocb_truncate(void *opaque)
 }
 
 switch (prealloc) {
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
 case PREALLOC_MODE_FALLOC:
 result = preallocate_falloc(fd, current_length, offset, errp);
-goto out;
+if (result != -ENOTSUP)
+goto out;
+/* If fallocate() is not supported, fallback to full preallocation. */
 #endif
 case PREALLOC_MODE_FULL:
 result = preallocate_full(fd, current_length, offset, errp);
@@ -3080,7 +3070,7 @@ static QemuOptsList raw_create_opts = {
 .name = BLOCK_OPT_PREALLOC,
 .type = QEMU_OPT_STRING,
 .help = "Preallocation mode (allowed values: off"
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
 ", falloc"
 #endif
 ", full)"
diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..8e4acf397e 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -25,11 +25,12 @@ This section describes each format and the options that are 
supported for it.
   .. program:: raw
   .. option:: preallocation
 
-Preallocation mode (allowed values: ``off``, ``falloc``,
-``full``). ``falloc`` mode preallocates space for image by
-