On 2011-10-24 12:49, Humberto N. Castejon Martinez wrote:
I would like to share my ZFS filesystem over the network and, in
addition, make it fault tolerant. I am after performance and fault
tolerance, but I do not want to miss the advantages of deduplication,
cloning and snapshotting offered by ZFS. I have read something about
Lustre being integrated with ZFS, so that could be an option, right?
Could I also use, for example, MooseFS? Thanks!
Well, I do hope someone will prove me wrong, but here's how I see the
situation:
A single ZFS pool (containing all the datasets, their snapshots and
clones) can only be "imported" on one server at a time. So whatever the
configuration (Lustre on top of ZFS datasets, or ZFS in Lustre volumes,
if either of these is even possible now), that single ZFS node would be
your bottleneck and single point of failure (SPOF).
For fault tolerance you might have two identical (equivalent) storage
nodes serving the same data and replicating changes to each other (at
double the storage cost), or you can build an HA system with shared
storage equally accessible by two servers (all storage, including the
cache SSDs you should have if you care about performance), with the ZFS
pool imported on no more than one of these servers at any time. Such an
HA system might be built with dual-pathed direct connections (i.e.
dual-port SAS enclosures, backplanes and, further down, disks, including
SSDs) connected to HBAs in two servers, or with SAN switch meshing.
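
As an illustration, a manual failover on such a shared-storage pair
would look roughly like this (the pool name "tank" is made up, and a
real setup would drive this from a cluster framework with proper
fencing):

    # On the standby node, once the active node is known to be down
    # (it never got to run "zpool export", hence the forced import;
    # doing this while the other node is still alive and using the
    # pool will corrupt it):
    zpool import -f tank

    # Then bring the services that depend on the pool back up, e.g.
    # re-share all the datasets:
    zfs share -a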
With replication, it can be tricky to determine the authoritative side
when conflicts arise. If only one side is guaranteed to use a certain
dataset in RW mode during any given time interval, you can replicate it
to the other side by sending snapshots. If access is file-based and a
certain file is only changed on one side, you might use an rsync loop
to replicate those changes continually.
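
A minimal sketch of the snapshot-based variant, assuming the dataset
name, standby hostname and interval below (all made up), and that the
standby was seeded once with a full send of tank/data@repl-base:

    #!/bin/sh
    # One-way replication loop; no error handling or snapshot pruning.
    DS=tank/data
    PREV=repl-base
    while :; do
        NOW=repl-$(date +%s)
        zfs snapshot "$DS@$NOW"
        # Ship only the delta since the previous snapshot; -F makes the
        # standby discard local changes so the stream applies cleanly.
        zfs send -i "@$PREV" "$DS@$NOW" | ssh standby zfs receive -F "$DS"
        PREV=$NOW
        sleep 60
    done

The rsync variant would be the same sort of loop around "rsync -a" from
the RW side to the other, just without the point-in-time consistency a
snapshot gives you.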
Either way, having several ZFS nodes with identical data is your best
shot at parallel performance and fault tolerance at once (an NFS client
can be configured to fail over between identical servers), as long as
you can figure out the RW mastership. I am not sure whether you can
designate a single NFS server as the write master and several others as
read-only slaves (like you can with LDAP servers, for example), but
even then your NFS clients would have to allow some time for writes to
propagate to the slave nodes. During that window, reads of recent
changes should also be handled by the write master.
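
For the client side, Linux autofs, for example, accepts a list of
replicated servers for one read-only mount and picks a reachable one; a
hypothetical map entry (hostnames and paths made up) would be:

    # /etc/auto.data -- read-only mount replicated across two servers:
    data  -ro,soft  server1,server2:/export/data

This only helps for reads, though; writes would still have to be
directed at the single RW master.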
All in all, this seems to me like a tricky quest (one I pondered for a
while and have abandoned for now). I would be happy to read that it is
indeed possible, and how it can be done ;)