>The underlying problem with ext4 is that some kde executables do
>something like this:
>1a) open and read data from file x, close file x
>1b) open and truncate file x
>1c) write data to file x
>1d) close file x
>
>or
>
>2a) open and read data from file x, close file x
>2b) open and truncate file x.new
>2c) write data to file x.new
>2d) close file x.new
>2e) rename file x.new to file x
>
>Concerning case 1) I think ZFS may lose data if power is lost right
>after 1b) and open(xxx,O_WRONLY|O_TRUNC|O_CREAT) is issued in a
>transaction group separately from the one containing 1c/1d.

Yes, I would assume that is possible, but the chance of it happening is
small.  Other filesystems prefer to write the metadata promptly, but
ZFS will easily wait until the file is completely written.

And UFS has the extra problem that it can change the file size, so that
reading shows garbage in the file.  (It changes the inode, possibly because
it's in the log, but it hasn't written the data yet.)  We've seen that
problem with UFS and the /etc/*_* driver files, precisely because we didn't
flush/fsync.  (And in some cases we used fsync(fileno(file)), but the
new content was still in the stdio buffer.)
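
To illustrate the stdio pitfall: fsync() only pushes out what the kernel
already has, so the stdio buffer has to be flushed with fflush() first.
A minimal sketch, assuming a POSIX system; write_durably() is a
hypothetical helper name, not something from the code discussed here:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to `path` and ask the kernel to put
 * it on stable storage.  Returns 0 on success, -1 on error. */
int write_durably(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    if (fputs(text, fp) == EOF)
        goto fail;

    /* Step 1: move the data from the stdio buffer into the kernel.
     * Without this, fsync() below may have nothing new to write. */
    if (fflush(fp) != 0)
        goto fail;

    /* Step 2: ask the kernel to write the file to stable storage. */
    if (fsync(fileno(fp)) != 0)
        goto fail;

    return fclose(fp) == 0 ? 0 : -1;

fail:
    fclose(fp);
    return -1;
}
```

Calling fsync(fileno(fp)) without the preceding fflush(fp) is the mistake
described above: the call succeeds, but the new content is still sitting
in user space.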

Only "versioned" filesystems can make the first sequence work.

>Concerning case 2) I cannot see ZFS losing any data, because of
>copy-on-write and transaction grouping.
>
>Theodore Ts'o (ext4 developer) commented that both cases are flawed and
>cannot be supported correctly, because of a lacking fsync() before
>close. Is this correct? His comment is over here:
>https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

Perhaps we should tell Theodore Ts'o that ZFS gets this right :-)

I'm assuming that all transactions in group N happened before those in
group N+1, at least when it comes to the partial order in which the
transactions happen.
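
For code that cannot rely on that ordering, sequence 2 with the fsync()
that Theodore Ts'o asks for could look roughly like the sketch below.
It assumes a POSIX system; replace_file() is a hypothetical helper name,
and the ".new" suffix simply follows the x.new naming in the quoted
steps.  It does not fsync the containing directory, which some
filesystems may also need for the rename itself to be durable.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `data` by
 * writing path.new, fsync()ing it, and renaming it over `path`.
 * Returns 0 on success, -1 on error. */
int replace_file(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* 2b) open and truncate x.new */
    int fd = open(tmp, O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* 2c) write the data, retrying on short writes and EINTR */
    const char *p = data;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }

    /* The extra step: force the new contents out before the rename. */
    if (fsync(fd) != 0)
        goto fail;

    /* 2d) close x.new */
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* 2e) rename x.new to x */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

With the fsync() in place, the rename is only issued once the new
contents have been handed to stable storage, so the outcome no longer
depends on how the filesystem orders data and metadata writes.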

Casper

_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
