This has been mentioned time and time again (and in much more detail) on the ZFS forums. The best post I've seen summing up ZFS at this point said, essentially, that ZFS is focused on data integrity, but data integrity does not automatically mean data availability.
Unfortunately that's a subtle distinction, and I feel ZFS is going to catch a few people out. I hit the pool hang / timeout issue with iSCSI, where a single offline device hung the entire pool. At the time I pointed out that if ZFS relies on the device drivers to spot failures, then a single device fault can hang your entire pool, and that fault can be anything that wasn't planned for, or simply a bug in the driver.

To my mind it's the classic blacklist / whitelist problem. ZFS should assume that drivers can fail, and do its best to keep the pool available regardless, instead of assuming the driver code is perfect. It already checks the integrity of the data (so it clearly doesn't trust the hardware or drivers completely). Surely adding a timeout check on I/O isn't impossible.
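To make the idea concrete, here's a minimal user-land sketch of that kind of watchdog: the potentially-blocking read is handed to a worker thread, and the caller waits on it with a deadline, so a hung device produces ETIMEDOUT (which the caller could use to fault the device and keep the pool going) instead of an indefinite hang. This is purely illustrative, and is not how ZFS or the Solaris I/O stack actually issues I/O; all the names (io_req, read_with_timeout, and so on) are made up for the example.

/*
 * timeout_read.c — illustrative sketch only, NOT ZFS code.
 * Build: cc -o timeout_read timeout_read.c -lpthread
 */
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

struct io_req {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    int             done;       /* worker finished the read          */
    int             abandoned;  /* caller timed out and moved on     */
    int             fd;
    void           *buf;        /* owned by the request, not caller  */
    size_t          len;
    ssize_t         result;
    int             err;        /* errno from the worker's read()    */
};

static void io_req_free(struct io_req *req)
{
    pthread_mutex_destroy(&req->lock);
    pthread_cond_destroy(&req->done_cv);
    free(req->buf);
    free(req);
}

static void *io_worker(void *arg)
{
    struct io_req *req = arg;

    /* This is the call that can block forever on a hung device
     * or a buggy driver. */
    ssize_t n = read(req->fd, req->buf, req->len);
    int err = errno;

    pthread_mutex_lock(&req->lock);
    req->result = n;
    req->err = err;
    req->done = 1;
    if (req->abandoned) {
        /* The caller already timed out; we own the cleanup. */
        pthread_mutex_unlock(&req->lock);
        io_req_free(req);
        return NULL;
    }
    pthread_cond_signal(&req->done_cv);
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

/*
 * Read up to len bytes from fd into out, failing with ETIMEDOUT if
 * the device doesn't answer within timeout_sec.  On timeout the
 * caller can mark the device faulted and carry on, instead of
 * hanging along with it.
 */
ssize_t read_with_timeout(int fd, void *out, size_t len, int timeout_sec)
{
    struct io_req *req = calloc(1, sizeof (*req));
    if (req == NULL)
        return -1;
    req->buf = malloc(len);
    if (req->buf == NULL) {
        free(req);
        return -1;
    }
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->done_cv, NULL);
    req->fd = fd;
    req->len = len;
    req->result = -1;

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    /* Detached: on timeout, nobody will ever join this thread. */
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_t tid;
    if (pthread_create(&tid, &attr, io_worker, req) != 0) {
        io_req_free(req);
        return -1;
    }

    pthread_mutex_lock(&req->lock);
    int rc = 0;
    while (!req->done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&req->done_cv, &req->lock, &deadline);

    if (!req->done) {
        /* Deadline missed: hand cleanup to the worker, fail fast. */
        req->abandoned = 1;
        pthread_mutex_unlock(&req->lock);
        errno = ETIMEDOUT;
        return -1;
    }

    ssize_t n = req->result;
    if (n > 0)
        memcpy(out, req->buf, (size_t)n);
    else if (n < 0)
        errno = req->err;
    pthread_mutex_unlock(&req->lock);
    io_req_free(req);
    return n;
}

The important design point is the ownership hand-off: on timeout the caller abandons the request and the worker cleans it up whenever the driver finally returns. That kind of bookkeeping, rather than trusting the driver to return promptly, is exactly what I'm suggesting the pool code needs.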
