bug#51345: dd with conv=fsync sometimes returns when its writes are still cached

2022-01-28 Thread Paul Eggert
I found a bit of time to work on this and installed the attached patch,
which should address the issue. Thanks for reporting it.

From 3368b8745046aeaa89f418f560e714b374f1a560 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Fri, 28 Jan 2022 00:01:07 -0800
Subject: [PATCH] dd: synchronize output after write errors

Problem reported by Sworddragon (Bug#51345).
* src/dd.c (cleanup): Synchronize output unless dd has been interrupted.
(synchronize_output): New function, split out from dd_copy.
Update conversions_mask so synchronization is done at most once.
(main): Do not die with the output file open, since we want to be
able to synchronize it before exiting.  Synchronize output before
exiting.
---
 NEWS               |  3 ++
 doc/coreutils.texi | 10 +++---
 src/dd.c           | 76 +-
 3 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/NEWS b/NEWS
index 15c9428bd..757abee15 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,9 @@ GNU coreutils NEWS -*- outline -*-
   padding them with zeros to 9 digits.  It uses clock_getres and
   clock_gettime to infer the clock resolution.
 
+  dd conv=fsync now synchronizes output even after a write error,
+  and similarly for dd conv=fdatasync.
+
   timeout --foreground --kill-after=... will now exit with status 137
   if the kill signal was sent, which is consistent with the behavior
   when the --foreground option is not specified.  This allows users to
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index c17406550..088d1764c 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9397,14 +9397,16 @@ Continue after read errors.
 @item fdatasync
 @opindex fdatasync
 @cindex synchronized data writes, before finishing
-Synchronize output data just before finishing.  This forces a physical
-write of output data.
+Synchronize output data just before finishing,
+even if there were write errors.
+This forces a physical write of output data.
 
 @item fsync
 @opindex fsync
 @cindex synchronized data and metadata writes, before finishing
-Synchronize output data and metadata just before finishing.  This
-forces a physical write of output data and metadata.
+Synchronize output data and metadata just before finishing,
+even if there were write errors.
+This forces a physical write of output data and metadata.
 
 @end table
 
diff --git a/src/dd.c b/src/dd.c
index 957ad129e..4ddc6db12 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -939,6 +939,8 @@ iclose (int fd)
   return 0;
 }
 
+static int synchronize_output (void);
+
 static void
 cleanup (void)
 {
@@ -948,6 +950,13 @@ cleanup (void)
   alignfree (obuf);
 #endif
 
+  if (!interrupt_signal)
+    {
+      int sync_status = synchronize_output ();
+      if (sync_status)
+        exit (sync_status);
+    }
+
   if (iclose (STDIN_FILENO) != 0)
     die (EXIT_FAILURE, errno, _("closing input file %s"), quoteaf (input_file));
 
@@ -2377,17 +2386,33 @@ dd_copy (void)
       && 0 <= reported_w_bytes && reported_w_bytes < w_bytes)
     print_xfer_stats (0);
 
-  if ((conversions_mask & C_FDATASYNC) && ifdatasync (STDOUT_FILENO) != 0)
+  return exit_status;
+}
+
+/* Synchronize output according to conversions_mask.
+   Do this even if w_bytes is zero, as fsync and fdatasync
+   flush out write requests from other processes too.
+   Clear bits in conversions_mask so that synchronization is done only once.
+   Return zero if successful, an exit status otherwise.  */
+
+static int
+synchronize_output (void)
+{
+  int exit_status = 0;
+  int mask = conversions_mask;
+  conversions_mask &= ~ (C_FDATASYNC | C_FSYNC);
+
+  if ((mask & C_FDATASYNC) && ifdatasync (STDOUT_FILENO) != 0)
     {
       if (errno != ENOSYS && errno != EINVAL)
         {
           error (0, errno, _("fdatasync failed for %s"), quoteaf (output_file));
           exit_status = EXIT_FAILURE;
         }
-      conversions_mask |= C_FSYNC;
+      mask |= C_FSYNC;
     }
 
-  if ((conversions_mask & C_FSYNC) && ifsync (STDOUT_FILENO) != 0)
+  if ((mask & C_FSYNC) && ifsync (STDOUT_FILENO) != 0)
     {
       error (0, errno, _("fsync failed for %s"), quoteaf (output_file));
       return EXIT_FAILURE;
@@ -2460,6 +2485,16 @@ main (int argc, char **argv)
            | (conversions_mask & C_EXCL ? O_EXCL : 0)
            | (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC));
 
+      off_t size;
+      if ((INT_MULTIPLY_WRAPV (seek_records, output_blocksize, &size)
+           || INT_ADD_WRAPV (seek_bytes, size, &size))
+          && !(conversions_mask & C_NOTRUNC))
+        die (EXIT_FAILURE, 0,
+             _("offset too large: "
+               "cannot truncate to a length of seek=%"PRIdMAX""
+               " (%td-byte) blocks"),
+             seek_records, output_blocksize);
+
   /* Open the output file w

bug#51345:

2021-10-29 Thread Paul Eggert

On 10/29/21 05:38, Pádraig Brady wrote:

I'm leaning towards improving this by always
doing an fsync on exit if we get a read or write error
and have successfully written any data, so that
previously written data is sync'd as requested.


Sounds like a good idea to me too.






bug#51345:

2021-10-29 Thread Sworddragon
I got an email that somebody replied to this report, but it looks like they
sent it to the wrong mail address (maybe the messages will be merged in
order). I will still reply to it here:

> Suggestion as a possible workaround: Have a look at random(4) and
> random(7), and ask yourself whether your use of /dev/random, rather
> than /dev/urandom, is really necessary for your application. If not,
> you might try /dev/urandom instead and report what you observe.
>
> As documented in those man pages, there are good reasons to avoid using
> /dev/random, not the least of which is that it blocks frequently (every
> time the entropy pool is exhausted), whereas /dev/urandom does not.
> That alone may explain the execution time inconsistencies you're
> reporting.

I'm aware of the differences between the two, and I don't see how the use
of /dev/random here could explain any of the issues:

- The drastically increased writing times occur only after dd has already
reported that the device is out of space (assuming this means dd has
finished its explicit writing task, i.e. writing to the transparent
underlying cache), and dd's times until that point were fine. That implies
that either significant blocking never occurred or dd correctly handles
blocking input files by asynchronously continuing to write to the output
file to avoid unnecessary delays.
- That dd returns while the buffers are not flushed also can't be explained
by the use of /dev/random, unless there were some utterly bizarre bug
somewhere.

But I still did some tests with /dev/urandom and conv=fsync, and I saw all
3 cases there too: normal writing times that are slightly longer
(133.247s), drastically increased writing times (235.906s), and dd
returning early (56.4327s) while an immediately executed sync still
blocked for over a minute.

Also, regarding the slightly longer writing times (~133s with conv=fsync
compared to ~129s with oflag=direct): I noticed that with conv=fsync the
LED of the USB thumb drive starts to blink a few seconds after dd starts
showing status progress, so I assume Knoppix's kernel/settings cause the
cache to be flushed with a slight delay. That seems more or less normal
and would explain this specific case of being ~4s slower.


And while I was writing this the next message got in:


> Ah right. What's happening is dd is not doing the fsync()
> as it's exiting early due to write(2) getting ENOSPC.

But that would not make much sense, for 2 reasons:

1. The error message about the device running out of space is the expected
behavior here, and there is no rationale for dd to exit at that point
instead of still calling fsync().
2. In most attempts, after dd printed the error message about no free
space it still blocked for over a minute, and an immediately executed
sync always returned right away.

So it looks like dd sometimes does call fsync() on ENOSPC and sometimes
doesn't. And why a successful call to fsync() is then sometimes ~2.5 times
slower is also a mystery.


> So this is a gotcha that should at least be documented.
> Though I'm leaning towards improving this by always
> doing an fsync on exit if we get a read or write error
> and have successfully written any data, so that
> previously written data is sync'd as requested.

Yes, ensuring that fsync() gets called seems to be the better option here.
But this still leaves the question from above: why does dd seemingly do
this sometimes already on ENOSPC? Maybe a race between ENOSPC and fsync()
in dd is causing this bug? Possibly the occasionally very delayed writes
(e.g. ~235s instead of ~133s) play a role here?
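
One way to test that hypothesis outside of dd would be a small C program
that replays the same sequence and times the fsync() explicitly. A minimal
sketch (illustrative only; /dev/sdX is a placeholder for a disposable
device, and short writes are ignored for brevity):

  /* Write until ENOSPC, then time the explicit fsync(), mimicking
     "dd if=... of=/dev/sdX bs=1M conv=fsync".  */
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <time.h>
  #include <unistd.h>

  int
  main (void)
  {
    int fd = open ("/dev/sdX", O_WRONLY);   /* placeholder device */
    if (fd < 0) { perror ("open"); return 1; }

    static char buf[1024 * 1024];           /* 1 MiB, like bs=1M */
    memset (buf, 0xA5, sizeof buf);

    while (write (fd, buf, sizeof buf) >= 0)
      continue;                             /* keep writing until error */
    if (errno != ENOSPC)                    /* expected: device is full */
      { perror ("write"); return 1; }

    struct timespec t0, t1;
    clock_gettime (CLOCK_MONOTONIC, &t0);
    int r = fsync (fd);                     /* where dd should block */
    clock_gettime (CLOCK_MONOTONIC, &t1);
    printf ("fsync: %s, took %.1f s\n", r ? strerror (errno) : "ok",
            (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return close (fd) != 0;
  }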


bug#51345:

2021-10-29 Thread Pádraig Brady

On 28/10/2021 22:59, Sworddragon wrote:

Although I'm not using Linux as my main system anymore and wanted to avoid
getting into too much work, I found some time to do some tests, as this
issue bugs me just too much.


You could try running the following immediately after,
to see if it also returns quickly:

   blockdev --flushbufs /dev/sdb


Yes, this command also blocks for a bit over 1 minute when this issue
occurs.


Right, that suggests the conv=fsync to dd was ineffective.


Here is the output (I had to freely translate the strings since
this Knoppix instance is German-only, so they might be slightly
inaccurate; also, I had to type all the text since it was executed on a
different system, but I carefully checked that I did not introduce any
typos):

root@Microknoppix:~# dd if=/dev/random of=/dev/sdb bs=1M conv=fsync
status=progress
1039138816 Bytes(1,0 GB, 991 MiB) copied, 56 s, 18,5 MB/s
dd: Error on writing '/dev/sdb': The device has not enough free space
999+0 records in
998+0 records out


Ah right. What's happening is dd is not doing the fsync()
as it's exiting early due to write(2) getting ENOSPC.

As you've seen, you can avoid the need for fsync()
to flush buffers with oflag=direct.
A reason that might also be faster is that, depending on your free
memory, you would avoid churning the kernel caches.

Another way to at least ensure the conv=fsync was effective
would be to not write too much.  I.e. you could determine
the exact size of the disk (e.g. with `blockdev --getsize64 /dev/sdb`)
and then use an appropriate bs= and count=. That's awkward though,
and difficult to do with good performance, since larger block sizes
generally do not align with the device size.
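
For what it's worth, the size lookup itself is straightforward. A minimal
sketch (my illustration; /dev/sdX is a placeholder) using the BLKGETSIZE64
ioctl that blockdev --getsize64 is built on:

  /* Query the device size and derive count= for a given block size.  */
  #include <fcntl.h>
  #include <linux/fs.h>      /* BLKGETSIZE64 */
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  int
  main (void)
  {
    int fd = open ("/dev/sdX", O_RDONLY);   /* placeholder device */
    if (fd < 0) { perror ("open"); return 1; }

    uint64_t size;
    if (ioctl (fd, BLKGETSIZE64, &size) != 0)
      { perror ("ioctl"); return 1; }
    close (fd);

    uint64_t bs = 1024 * 1024;              /* 1 MiB */
    printf ("bs=%ju count=%ju (tail of %ju bytes left unwritten)\n",
            (uintmax_t) bs, (uintmax_t) (size / bs),
            (uintmax_t) (size % bs));
    return 0;
  }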

So this is a gotcha that should at least be documented.
Though I'm leaning towards improving this by always
doing an fsync on exit if we get a read or write error
and have successfully written any data, so that
previously written data is sync'd as requested.
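
In outline, that proposal amounts to something like the following sketch
(assumed names, not dd's actual code; the patch at the top of this thread
is the real implementation):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* Stand-in for dd's read/write loop: returns 0, or an exit status
     on a read or write error (e.g. ENOSPC).  */
  static int
  copy_all (int out_fd)
  {
    (void) out_fd;
    return 0;
  }

  int
  main (void)
  {
    int out_fd = STDOUT_FILENO;
    int status = copy_all (out_fd);

    /* The key change: do the requested fsync even when the copy
       failed, so previously written data still reaches the device.  */
    if (fsync (out_fd) != 0)
      {
        perror ("fsync");
        status = EXIT_FAILURE;
      }
    return status;
  }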

cheers,
Pádraig





bug#51345:

2021-10-28 Thread Sworddragon
Although I'm not using Linux as my main system anymore and wanted to avoid
getting into too much work, I found some time to do some tests, as this
issue bugs me just too much.

> You could try running the following immediately after,
> to see if it also returns quickly:
>
>   blockdev --flushbufs /dev/sdb

Yes, this command also blocks for a bit over 1 minute when this issue
occurs. Here is the output (I had to freely translate the strings since
this Knoppix instance is German-only, so they might be slightly
inaccurate; also, I had to type all the text since it was executed on a
different system, but I carefully checked that I did not introduce any
typos):

root@Microknoppix:~# dd if=/dev/random of=/dev/sdb bs=1M conv=fsync
status=progress
1039138816 Bytes(1,0 GB, 991 MiB) copied, 56 s, 18,5 MB/s
dd: Error on writing '/dev/sdb': The device has not enough free space
999+0 records in
998+0 records out
1047526912 Bytes (1,0 GB, 999 MiB) copied, 57,0283 s, 18,4 MB/s
root@Microknoppix:~# time blockdev --flushbufs /dev/sdb

real1m9,747s
user0m0,001s
sys 0m0,005s


(A bit offtopic, but maybe another minor bug: the first line of the copied
bytes differs slightly with every attempt (991 MiB, 994 MiB, 997 MiB,
etc.) but the last line seems to be always the same down to the last byte.
I had the impression that the output from status=progress does not do a
final update once dd throws the write error because the device ran out of
space; is my assumption correct? If so, it would probably make sense, and
be helpful for others' debugging attempts, if dd did a final update. But
back to the original topic.)


> Also with rates status.
>
> dd if=/dev/random of=/dev/sdb bs=1M oflag=direct status=progress

With conv=fsync the bug usually occurs every 3-4 attempts (but the sample
size is small), so I decided to do 20 attempts with oflag=direct. The
issue did not appear once; every attempt took roughly 129 seconds (+/- 0.5
seconds), and a sync executed afterwards always returned immediately. Does
direct I/O signal the related storage device to not use its built-in
cache? If not, this implies that the USB thumb drive's firmware is very
likely not faulty and does not contribute to this issue.

But I noticed something else (I had also noticed it in the past, but it
only got my attention after all the stable oflag=direct writes): with
conv=fsync the writing time is sometimes around twice as high as it should
be. Doing 3 more attempts with it, dd showed me final times of 244,382s,
239,127s and 135,389s, while the status progress always stopped after 56s.
So when waiting for the buffers to be flushed after dd reported the device
out of space, the times differ very significantly: at best roughly as fast
as oflag=direct, and at worst ~2.5 times slower than they should be.


To sum this up, we have 2 issues:

1. The status progress from status=progress does not make a final update
(which would be useful) - but that is offtopic and probably easy to fix,
so I guess there is no more for me to do here.
2. fsync often either returns early, while the buffers are not fully
flushed yet, or it drastically increases the writing time compared to
normal attempts (even the normal attempts are apparently quite a bit
slower than direct I/O, while probably the opposite should be the case).
This is very likely either a bug in dd, or something in the Linux kernel's
fsync() implementation is very broken.


bug#51345: dd with conv=fsync sometimes returns when its writes are still cached

2021-10-25 Thread Bob Proulx
Sworddragon wrote:
> On Knoppix 9.1 with the Linux kernel 5.10.10-64 x86_64 and GNU coreutils
> 8.32 I wanted to overwrite my USB thumb drive a few times with random data
> via "dd if=/dev/random of=/dev/sdb bs=1M conv=fsync". While it usually
> takes ~2+ minutes to perform this action, dd once returned after less than
> 60 seconds, which made me a bit curious.

I suggest another try using oflag=direct instead of conv=fsync.

dd if=/dev/random of=/dev/sdb bs=1M oflag=direct

Also, with transfer-rate status:

dd if=/dev/random of=/dev/sdb bs=1M oflag=direct status=progress

Here is the documentation for it.

  ‘oflag=FLAG[,FLAG]...’

 ‘direct’
  Use direct I/O for data, avoiding the buffer cache.  Note that
  the kernel may impose restrictions on read or write buffer
  sizes.  For example, with an ext4 destination file system and
  a Linux-based kernel, using ‘oflag=direct’ will cause writes
  to fail with ‘EINVAL’ if the output buffer size is not a
  multiple of 512.
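
That EINVAL behavior is easy to demonstrate. A minimal C sketch (my
example, not from the manual) that writes one aligned and one misaligned
length through O_DIRECT:

  #define _GNU_SOURCE        /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int
  main (void)
  {
    int fd = open ("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) { perror ("open"); return 1; }

    void *buf;
    /* The buffer address must also be aligned; 4096 is a safe choice. */
    if (posix_memalign (&buf, 4096, 4096) != 0) return 1;
    memset (buf, 0, 4096);

    if (write (fd, buf, 4096) < 0)   /* multiple of 512: should work */
      perror ("write 4096");
    if (write (fd, buf, 1000) < 0)   /* not a multiple: expect EINVAL */
      perror ("write 1000");

    free (buf);
    return close (fd) != 0;
  }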

Bob






bug#51345:

2021-10-24 Thread Sworddragon
> You could try running the following immediately after,
> to see if it also returns quickly:
>
>blockdev --flushbufs /dev/sdb


The issue does not always reproduce, and the related USB thumb drive has
already been prepared and now stores important data, so that is not an
easy task. The USB thumb drive is also a pretty old device (roughly 10
years or even older) with only 1 GB of storage space. Given that dd with
conv=fsync returned after half of its usual writing time, I consider it
unlikely that the controller of the USB thumb drive has its own dedicated
512 MiB buffer attached to it.


> Well we're relying on the kernel here to not return from fsync()
> until appropriate.

But the question is whether there is a minor, unobvious bug somewhere in
the controlling logic of dd that might still cause such behavior. I also
checked the man pages for the sync() and fsync() calls, and they are
actually quite interesting. fsync() is described as flushing the caches so
that data can be retrieved even after crashes/reboots. The interesting
part is that it is then described as blocking until the device reports
completion. But what happens if the device reports completion while the
kernel still sees cached writes in its own memory (since storage devices
are like their own small computers and could lie or have faulty firmware)?
If fsync() returns early here, it would not be against the documentation
in the man page. sync() is simpler: it describes itself as writing all
pending modifications to file (meta)data to the underlying filesystems. If
it returned after the device reports completion while the kernel still
sees cached writes in its own context, that would strictly be a bug. The
interesting part here is that the notes section of sync() sets sync(),
syncfs() and fsync() equal in their guarantees.

With this information I see 3 possibilities here:

1. This is a bug in the controlling logic of dd that might not be obvious
at all.
2. This is a bug in fsync() or somewhere lower down in the Linux kernel.
3. Returning early is the intended behavior of fsync() and does not
strictly conflict with the man page.

If the last is the case, it might be worth proposing a change to the Linux
kernel to additionally ensure that all cached writes have been sent out
from the kernel's context before fsync() can return. It would also mean
that currently users can't rely on fsync() (e.g. via dd with conv=fsync)
to ensure the data has been flushed - instead they would need to take
additional action, like executing a sync in the terminal afterwards.
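
As a sketch of that "additional action" in C rather than shell
(illustrative only; /dev/sdX is a placeholder and BLKFLSBUF generally
requires root), one could follow the fsync() with the same BLKFLSBUF
ioctl that blockdev --flushbufs issues:

  #include <fcntl.h>
  #include <linux/fs.h>      /* BLKFLSBUF */
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  int
  main (void)
  {
    int fd = open ("/dev/sdX", O_WRONLY);   /* placeholder device */
    if (fd < 0) { perror ("open"); return 1; }

    /* The documented route: block until the device reports completion. */
    if (fsync (fd) != 0)
      perror ("fsync");

    /* Belt and braces: flush and invalidate the device's buffer cache,
       regardless of which process wrote the data.  */
    if (ioctl (fd, BLKFLSBUF, 0) != 0)
      perror ("ioctl BLKFLSBUF");

    return close (fd) != 0;
  }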


bug#51345: dd with conv=fsync sometimes returns when its writes are still cached

2021-10-23 Thread Pádraig Brady

On 23/10/2021 09:18, Sworddragon wrote:

On Knoppix 9.1 with the Linux kernel 5.10.10-64 x86_64 and GNU coreutils
8.32 I wanted to overwrite my USB thumb drive a few times with random data
via "dd if=/dev/random of=/dev/sdb bs=1M conv=fsync". While it usually
takes ~2+ minutes to perform this action, dd once returned after less than
60 seconds, which made me a bit curious. On a later attempt dd returned
from this command after ~1 minute again, but the LED on my USB thumb drive
was still blinking, and an immediately executed sync on the terminal
blocked for ~1 minute; once it returned, the LED on my USB thumb drive
also stopped blinking.

Knoppix 9.1 was booted as a live system from a DVD, and the only other
persistent storage attached to it was an internal HDD, which had already
been overwritten the same way in a past boot session.


Unfortunately Linux is not my main operating system anymore, so I randomly
encountered this issue on a nearly 1 year old distribution, which might be
slightly outdated. But the issue is pretty severe, as it can lead to
significant data loss, so it is worth at least reporting (and the "bad"
habit of always calling sync after dd will nonetheless probably stick
around for quite a while now).



Well we're relying on the kernel here to not return from fsync()
until appropriate.  You could try running the following immediately after,
to see if it also returns quickly:

  blockdev --flushbufs /dev/sdb

Note a subsequent `sync /dev/sdb` would probably not be effective,
since that would only consider data written by the sync process
(which would be nothing).
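
For reference, `sync FILE` in coreutils roughly reduces to the following
sketch (my simplification of its documented behavior): open the given
name and fsync that new descriptor.

  /* Roughly what `sync /dev/sdb` does: fsync a freshly opened
     descriptor for the given name.  */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int
  main (int argc, char **argv)
  {
    if (argc != 2)
      { fprintf (stderr, "usage: %s FILE\n", argv[0]); return 1; }
    int fd = open (argv[1], O_RDONLY | O_NONBLOCK);
    if (fd < 0) { perror ("open"); return 1; }
    if (fsync (fd) != 0) { perror ("fsync"); return 1; }
    return close (fd) != 0;
  }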

cheers,
Pádraig





bug#51345: dd with conv=fsync sometimes returns when its writes are still cached

2021-10-23 Thread Sworddragon
On Knoppix 9.1 with the Linux kernel 5.10.10-64 x86_64 and GNU coreutils
8.32 I wanted to overwrite my USB thumb drive a few times with random data
via "dd if=/dev/random of=/dev/sdb bs=1M conv=fsync". While it usually
takes ~2+ minutes to perform this action, dd once returned after less than
60 seconds, which made me a bit curious. On a later attempt dd returned
from this command after ~1 minute again, but the LED on my USB thumb drive
was still blinking, and an immediately executed sync on the terminal
blocked for ~1 minute; once it returned, the LED on my USB thumb drive
also stopped blinking.

Knoppix 9.1 was booted as a live system from a DVD, and the only other
persistent storage attached to it was an internal HDD, which had already
been overwritten the same way in a past boot session.


Unfortunately Linux is not my main operating system anymore, so I randomly
encountered this issue on a nearly 1 year old distribution, which might be
slightly outdated. But the issue is pretty severe, as it can lead to
significant data loss, so it is worth at least reporting (and the "bad"
habit of always calling sync after dd will nonetheless probably stick
around for quite a while now).