Thanks for your comment!

On 2016-12-18 John Reiser wrote:
> On 12/18/2016 11:30 AM, Lasse Collin wrote:
> > There's a bug report about data loss due computer losing power.
> > fsync() or fdatasync() in xz would quite likely have avoided the
> > data loss.
> >
> > https://bugs.debian.org/814089
> >
> > I considered adding fsync() to xz a long ago (it probably was before
> > 5.0.0). I didn't add it because it had a huge performance impact
> > when compressing many small files, and that downside still exists.
> > gzip and bzip2 don't use fsync(). On the other hand, performance
> > doesn't matter so much if there's a too high chance of data loss.
>
> The chance of data loss is somewhat small. Also, in practice the
> actual damages can be bounded by regular backups [in the case of bug
> 814089 above, a USB flash memory drive would have served], and/or by
> using the option "data=journal" when mounting an ext4 file system.

I agree that the chance is fairly small and backups can help, but on
the other hand it feels stupid that one can lose data in these
situations.

Zero-length files (especially config files) on ext4 were a hot topic
in 2009 because with some programs they occurred often enough to be a
real problem (it's not acceptable to require users to cherry-pick
dozens of desktop environment config files from backups when the
computer crashes or loses power). File system people said that apps
should use fsync(), and some app developers said it's too slow when
one only needs a barrier, i.e. when only the write ordering matters
and not the time at which the data reaches the disk.

Despite the push to make apps use fsync() in 2009, many still don't
use it. For example, "mv /hdd1/foo /hdd2/" using GNU coreutils creates
a file on the second hard disk and unlinks the file from the first
disk, and it never calls fsync() on the target file (or on the
directory containing it). So in this situation mv has a potential data
loss issue roughly comparable to that of xz, gzip, and bzip2. Yet I
feel that it's good that mv doesn't call fsync() because it would make
mv slower, especially when moving many small files. I use
cp + sync + rm when I'm paranoid about this.

> For the truly paranoid, LD_PRELOAD a shared library which intercepts
> the rename() system call, and apply fsync()/fdatasync().

This won't work so well with xz. xz doesn't do the typical
open + write + close + rename that is done e.g. with many config
files. Instead of rename() there is unlink() in xz. ext4 has a hack
exposed via the option auto_da_alloc (enabled by default) that handles
the config file situation but not the xz behavior (which is fine; I'm
not blaming ext4 at all). Also, requiring LD_PRELOAD hacks isn't user
friendly, and few will know about it until they have had their first
accident.

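To make the ordering concrete, here is a rough sketch of the
write + fsync + unlink sequence discussed above. It is Python for
brevity and is not xz's actual code: zlib stands in for the real
compressor, and the names compress_then_unlink and use_fsync are made
up for illustration. It also assumes a POSIX system where a directory
can be opened and passed to fsync().

```python
import os
import zlib


def compress_then_unlink(src, dst, use_fsync=True):
    """Write a compressed copy of src to dst, then remove src.

    Hypothetical helper for illustration only; zlib is a stand-in
    for the real compressor.
    """
    with open(src, "rb") as fin:
        data = fin.read()

    # Mode "xb" creates the file exclusively and fails if dst exists.
    with open(dst, "xb") as fout:
        fout.write(zlib.compress(data))
        fout.flush()  # move Python's buffered data to the kernel
        if use_fsync:
            # Ask the kernel to push the file data to the storage device.
            os.fsync(fout.fileno())

    if use_fsync:
        # Sync the containing directory too, so that the new directory
        # entry is also on disk before the original goes away.
        dirfd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)

    # Without the fsync() calls above, a power loss shortly after this
    # unlink() can leave neither the original nor a complete target.
    os.unlink(src)
```

Calling it with use_fsync=False gives the fast behavior that the tools
have today; the cost of the fsync() calls is most visible when
processing many small files.
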
Having a --fsync option in xz and telling users to put it in the
XZ_DEFAULTS environment variable would be less bad, but again, few
would do so until they have had their first accident. Defaulting to
fsync() and having --no-fsync would avoid this, but it would annoy
another set of people.

> Keep the existing non-fsync() behavior of the xz app.

So far everyone I've asked has been in favor of the non-fsync()
behavior. I will go with that in xz 5.2.3, which I plan to release
very soon, perhaps even during this year. :-)

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode