On 26-Jul-09, at 11:08 AM, Frank Middleton wrote:

On 07/25/09 04:30 PM, Carson Gaspar wrote:

No. You'll lose unwritten data, but won't corrupt the pool, because
the on-disk state will be sane, as long as your iSCSI stack doesn't
lie about data commits or ignore cache flush commands. Why is this so
difficult for people to understand? Let me create a simple example
for you.

Are you sure about this example? AFAIK metadata refers to things like
the file's name, atime, ACLs, etc., etc. Your example seems to be more
about how a journal works, which has little to do with metadata other
than to manage it.

Now if you were too lazy to bother to follow the instructions properly, we could end up with bizarre things. This is what happens when storage
lies and re-orders writes across boundaries.

On 07/25/09 07:34 PM, Toby Thain wrote:

The problem is assumed *ordering*. In this respect VB ignoring flushes
and real hardware are not going to behave the same.

Why? An ignored flush is ignored. It may be more likely in VB, but it
can always happen.

And whenever it does: guess what happens?

It mystifies me that VB would in some way alter
the ordering.

Carson already went through a more detailed explanation. Let me try a different one:

ZFS issues writes A, B, C, FLUSH, D, E, F.

case 1) the semantics of the flush* allow ZFS to presume that A, B, C are all 'committed' at the point that D is issued. You can understand that A, B, C may be done in any order, and D, E, F may be done in any order, due to the numerous abstraction layers involved - all the way down to the disk's internal scheduling. ANY of these layers can affect the ordering of durable, physical writes _in the absence of a flush/barrier_.

case 2) but if the flush does NOT occur with the necessary semantics, the ordering of ALL SIX operations is now indeterminate, and by the time ZFS issues D, any of the first 3 (A, B, C) may well not have been committed at all. There is a very good chance this will violate an integrity assumption. (I haven't studied the source, so I can't point you to a specific design detail or line; I am working from how I understand transactional/journaled systems to work. Assuming my argument is valid, I am sure a ZFS engineer can cite a specific violation.)
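To make the two cases concrete, here is a toy model in Python. It is only a sketch of the argument above, not of ZFS or of any real storage stack: the names `durable_states` and `violates_ordering` are made up for illustration, and the model assumes that writes between flushes may become durable in any order and that a crash can happen at any moment.

```python
from itertools import combinations

# The write sequence from the example above.
OPS = ["A", "B", "C", "FLUSH", "D", "E", "F"]


def durable_states(ops, honor_flush):
    """Enumerate every set of writes that could be durable at a crash.

    Toy assumption: within one flush epoch, any subset of the writes
    may have reached stable storage. An honored flush closes the epoch,
    so everything before it is durable before later writes are issued.
    An ignored flush closes nothing.
    """
    epochs = [[]]
    for op in ops:
        if op == "FLUSH":
            if honor_flush:
                epochs.append([])
        else:
            epochs[-1].append(op)

    states = set()
    committed = []  # writes guaranteed durable by earlier honored flushes
    for epoch in epochs:
        for size in range(len(epoch) + 1):
            for subset in combinations(epoch, size):
                states.add(frozenset(committed) | frozenset(subset))
        committed.extend(epoch)
    return states


def violates_ordering(state):
    """True if any of D, E, F is durable while A, B, C are not all durable."""
    return bool(state & {"D", "E", "F"}) and not {"A", "B", "C"} <= state


if __name__ == "__main__":
    for honor in (True, False):
        states = durable_states(OPS, honor)
        bad = [s for s in states if violates_ordering(s)]
        print(f"honor_flush={honor}: {len(states)} possible states, "
              f"{len(bad)} violate the assumed ordering")
```

With the flush honored, no reachable state has D, E or F on disk without all of A, B, C. With the flush ignored, most of the reachable states break that assumption.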

As has already been mentioned in this context, I think by David Magda, ordinary hardware will show this problem _if flushes are not functioning_ (an unusual case on bare metal), while on VirtualBox this is the default!


...

Doesn't the ZIL effectively make ZFS into a journalled file system?

Of course ZFS is transactional, as are other filesystems and software systems, such as RDBMSs. But the integrity of such systems depends on a hardware flush primitive that actually works. We are getting hoarse repeating this.

--Toby

* Essentially 'commit' semantics: Flush synchronously, operation is complete only when data is durably stored.
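For readers who have not used a flush primitive from application code, the same 'commit' pattern looks roughly like this in Python. This is a minimal sketch: `commit_write` and the `COMMIT` marker are hypothetical names for illustration, and `os.fsync` only asks the OS to push data to the device. Whether the device (or a hypervisor in between) really makes it durable is exactly the point under discussion.

```python
import os


def commit_write(path, payload, commit_marker=b"COMMIT\n"):
    """Append payload, flush it, and only then append a commit marker.

    The marker is written after the first fsync returns, so if the
    flush is honored, a marker on disk implies the payload is on disk.
    """
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push it to stable storage
        f.write(commit_marker)
        f.flush()
        os.fsync(f.fileno())   # make the marker itself durable
```

If the first `fsync` is silently ignored somewhere below the OS, the marker can reach the disk before the payload, which is the case 2 failure described above.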

...
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
