On 2011-10-24 12:49, Humberto N. Castejon Martinez wrote:
Hi,

I would like to share my ZFS filesystem over the network and make it, in addition, fault tolerant. I am after performance and fault tolerance, but I do not want to miss the advantages of deduplication, cloning and snapshotting offered by ZFS. I have read something about Lustre being integrated with ZFS, so that could be an option, right? Could I also use, for example, MooseFS? Thanks!

Well, I do hope someone will prove me wrong, but here's how I see the situation now:

A single ZFS pool (containing all the datasets, their snapshots and clones) can only be imported on one server at a time. So whatever the configuration (Lustre on top of ZFS datasets, or ZFS in Lustre volumes, if either of these is possible now at all), that single ZFS node would be your bottleneck and single point of failure (SPOF).

For fault tolerance you might have two identical (equivalent) storage nodes serving the same data and replicating changes to each other (with double the storage requirements), or you can build an HA system with shared storage equally accessible by two servers (all storage, including the cache SSDs you should have if performance matters), with the ZFS pool served by no more than one of these servers at any time. Such an HA system might be built with dual-pathed direct connections (i.e. dual-port SAS enclosures, backplanes and so on down to the disks, including SSDs) attached to HBAs in two servers, or with SAN switch meshing. A rough sketch of the failover step is below.
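As a rough illustration of the shared-storage failover step (the pool name "tank" is just an example, and in practice a cluster framework with proper fencing would drive this rather than manual commands):

  # On the surviving server, after the failed node is confirmed down/fenced:
  # force-import the shared pool that was last imported on the dead node.
  zpool import -f tank

  # Then re-check that the datasets' NFS shares come back up,
  # e.g. via the sharenfs property already set on them.
  zfs get sharenfs tank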

With replication, it can be tricky to determine the authoritative side when conflicts arise. If only one side is guaranteed to use a given dataset in RW mode during any time interval, you could replicate it to the other side by sending snapshots (see the sketch below). If access is file-based and a given file is only changed on one side, you might use an rsync loop to replicate those changes continually.
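For illustration, here is a minimal sketch of the snapshot-sending approach; the dataset name (tank/share), the remote host (standby-host) and the state file are made-up examples, and a real setup would need error handling and snapshot pruning:

  #!/bin/sh
  # One-way replication: snapshot the active (RW) side and send the
  # changes to the standby over ssh.
  DS=tank/share
  REMOTE=standby-host
  STATE=/var/run/last-repl-snap

  NEW=repl-$(date +%Y%m%d%H%M%S)
  zfs snapshot $DS@$NEW

  if [ -f $STATE ]; then
      OLD=$(cat $STATE)
      # Incremental send: only changes since the previous snapshot travel.
      zfs send -i $DS@$OLD $DS@$NEW | ssh $REMOTE zfs receive -F $DS
  else
      # First run: full send to seed the standby.
      zfs send $DS@$NEW | ssh $REMOTE zfs receive -F $DS
  fi
  echo $NEW > $STATE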

Either way, having several ZFS nodes with identical data is your best shot at parallel performance and fault tolerance at once (an NFS client can be configured to fail over between identical servers, as sketched below), as long as you can figure out the RW mastership. I am not sure whether you can designate a single NFS server as the write master and several others as read-only slaves (like you can with LDAP servers, for example), but even then your NFS clients would have to allow some time for writes to propagate to the slave nodes. During that time, reads (of recent changes) should also be handled by the write master.
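For reference, a Solaris NFS client can mount a read-only replicated resource from a list of equivalent servers and fail over between them if one stops responding; the host and path names here are just examples, and all listed servers must export identical data:

  # Client-side failover between two servers exporting the same read-only data.
  mount -F nfs -o ro serverA:/export/share,serverB:/export/share /mnt/share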

All in all, this still seems to me like a tricky quest (one I pondered for a while and have abandoned for now). I would be happy to read that it is indeed possible, and to learn how it's doable ;)

//Jim


