Thanks for your comment!

On 2016-12-18 John Reiser wrote:
> On 12/18/2016 11:30 AM, Lasse Collin wrote:
> > There's a bug report about data loss due to a computer losing power.
> > fsync() or fdatasync() in xz would quite likely have avoided the
> > data loss.
> >
> >     https://bugs.debian.org/814089
> >
> > I considered adding fsync() to xz long ago (it probably was before
> > 5.0.0). I didn't add it because it had a huge performance impact
> > when compressing many small files, and that downside still exists.
> > gzip and bzip2 don't use fsync(). On the other hand, performance
> > doesn't matter so much if the chance of data loss is too high.
> 
> The chance of data loss is somewhat small.  Also, in practice the
> actual damages can be bounded by regular backups [in the case of bug
> 814089 above, a USB flash memory drive would have served], and/or by
> using the option "data=journal" when mounting an ext4 file system.

I agree that the chance is fairly small and backups can help, but on
the other hand it feels stupid that one can lose data in these
situations.

Zero-length files (especially config files) on ext4 were a hot topic in
2009 because with some programs it happened often enough to be a real
problem (it's not acceptable to require users to cherry-pick dozens of
desktop environment config files from backups when the computer crashes
or loses power). File system people said that apps should use fsync(),
and some app developers said it's too slow when one only needs a
barrier, i.e. when only the write ordering matters and not the time
when the data gets written to the disk.

Despite the push to make apps use fsync() in 2009, many still don't
use it. For example, "mv /hdd1/foo /hdd2/" using GNU coreutils creates a
file on the second hard disk and unlinks the file from the first disk,
and never calls fsync() on the target file (or the directory containing
it). So in this situation mv has a potential data loss issue roughly
comparable to that of xz, gzip, and bzip2. Yet I feel that it's good
that mv doesn't call fsync() because it would make it slower,
especially when moving many small files. I use cp + sync + rm when I'm
paranoid about this.
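
For illustration only, the cp + sync + rm sequence could be sketched in
Python roughly like this. The function name paranoid_move is made up
for this example, and os.sync() is assumed to be available (Unix,
Python 3.3 or later). Like sync(1), it flushes everything that is
dirty, not just the copied file, so it is simple but not cheap.

```python
import os
import shutil


def paranoid_move(src, dst):
    """Copy src to dst, flush dirty buffers to disk, then remove src.

    Hypothetical helper mirroring "cp src dst && sync && rm src".
    """
    shutil.copy2(src, dst)  # cp: copy the data and basic metadata
    os.sync()               # sync: ask the kernel to write out all dirty data
    os.remove(src)          # rm: drop the source only after sync has returned
```

The important part is the order: the source is removed only after the
sync call has returned.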

> For the truly paranoid, LD_PRELOAD a shared library which intercepts
> the rename() system call, and apply fsync()/fdatasync().

This won't work so well with xz. This isn't the typical open + write +
close + rename sequence that is used e.g. with many config files. xz
doesn't call rename(); it calls unlink() on the source file instead.
ext4 has a hack, controlled by the mount option auto_da_alloc (enabled
by default), that handles the config file situation but not the xz
behavior (which is fine; I'm not blaming ext4 at all).
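
To make the difference concrete, here is a rough Python sketch of a
write + fsync + unlink sequence. It is not what xz does (xz is written
in C and, as said, doesn't call fsync()); it only shows where the sync
calls would have to go when the last step is unlink() instead of
rename(). The names convert_then_unlink, fsync_dir, and transform are
hypothetical, and opening a directory read-only to fsync() it is
assumed to work as it does on Linux.

```python
import os


def fsync_dir(path):
    """fsync() a directory so that a newly created entry reaches the disk."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def convert_then_unlink(src, dst, transform):
    """Write transform(contents of src) to dst, sync it, then unlink src."""
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(transform(data))
        fout.flush()             # move Python's userspace buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the file to disk
    # Sync the directory entry of dst too, then remove the source.
    fsync_dir(os.path.dirname(os.path.abspath(dst)))
    os.unlink(src)
```

There is no rename() anywhere in this sequence, so a library that only
intercepts rename() would never get a chance to add the fsync() calls.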

Also, requiring LD_PRELOAD hacks isn't user friendly and few will know
about it until they have had their first accident.

Having an --fsync option in xz and telling users to put it in the
XZ_DEFAULTS environment variable would be less bad, but again, few
would do so until they have had their first accident. Defaulting to
fsync and having --no-fsync would avoid this, but it would annoy another
set of people.

> Keep the existing non-fsync() behavior of the xz app.

So far everyone I've asked has been in favor of the non-fsync()
behavior. I will go with that in xz 5.2.3 which I plan to release very
soon, perhaps even during this year. :-)

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
