On Tue, Aug 19, 2008 at 2:14 AM, Joerg Sonnenberger <[EMAIL PROTECTED]> wrote:
> On Sun, Aug 17, 2008 at 08:40:20PM +1000, Dmitri Nikulin wrote:
>> I personally believe that Unix should have had a transactional file IO
>> API from the start, so that all modern file systems would implement it
>> and atomicity would be the standard, not a rare feature.
>
> I am not exactly sure what you mean with "atomicity", but can you
> demonstrate even *one* filesystem where writes of two processes are
> atomic relative to each other? I don't know any.

The COW approach in ZFS appears to do exactly that. The block pointer
is not updated to the new copy of the block until it's finished
writing, so one process can write and the other won't read the new
block until the write completes and therefore the block pointer is
updated. That's regardless of whether or not the change has been
finalised on-disk.

> There are also very good reasons why Unix filesystem IO never was
> transactional. It is way too expensive and complex to allow that.

I never said transactional IO should be the *only* way of doing IO.
Having a transactional API available would mean most file systems
would support it at least within the requirements. Most applications
that really need it end up doing it their own way, which is fine in
theory, but on some relatively unsafe filesystems like UFS and ext2,
even a very thorough transactional file format (e.g. SQLite) can be
corrupted, especially if the operation involves changing the metadata
like the file size, which is something you do all the time while
accumulating database inserts.

Even on ext3 in writeback (aka fast but unsafe) mode you can have a
situation where the metadata has been updated but the actual file data
hasn't been written yet, so it's anyone's guess what's actually in the
file on the next boot. The solution is to use full journaling (very
slow) or at least ordered mode (default). Most filesystems in use
today have neither!

-- 
Dmitri Nikulin

Centre for Synchrotron Science
Monash University
Victoria 3800, Australia
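
For reference, here is a rough sketch of the "doing it their own way" pattern mentioned above: write a temporary file, fsync it, then rename it over the original. It assumes a POSIX system and Python 3. The name `atomic_write` is made up for illustration and does not come from any library discussed in this thread. It only covers whole-file replacement, not general transactions, and how much it really guarantees still depends on the filesystem underneath.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via write-temp, fsync, rename.

    Hypothetical helper: readers should see either the old file or the
    new one, assuming POSIX rename semantics on a single filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the file data to disk *before* the rename, so the
            # metadata change cannot land ahead of the data.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself is recorded (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `atomic_write("db.bin", b"new contents")` either leaves the old `db.bin` in place or swaps in the complete new one. The explicit fsync before the rename is the application doing by hand what ordered mode does for it.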
