Rob,

Thanks for taking a look at OrangeFS. I have responded to your questions inline.
On Fri, Oct 12, 2012 at 11:25 AM, <[email protected]> wrote:
> Hi,
>
> I'm evaluating OrangeFS here at RAL at the moment for possible future use
> as a distributed FS for our LHC disk storage system. I've got the FS
> working, and have run some performance tests. My configuration looks like
> this:
>
>   - 6 x 10TB storage nodes, each set up to hold both data and metadata.
>   - One client box that mounts the FS for external use.
>   - db4.8.30 on every box.
>
> I've got some questions:
>
> 1) Is there any way to make changes to the system without bringing
>    everything down first? We need to be able to add or remove nodes to
>    or from the OrangeFS instance on the fly, without bringing everything
>    down or losing data.
>
>    i. Can the system be made extensible by initially only allocating
>       part of the available namespace in orangefs-server.conf, and then
>       adding more servers into the unused regions of the namespace?

You can reserve handle ranges, so it is easy to add nodes later. Currently this requires a restart of all the server processes (so they become aware of the other servers and the additional ranges in each server's config file), but that can be done pretty quickly with a script that pushes the updated config to each node and restarts the server there. See the P.S. at the bottom of this mail for a rough idea of what a reserved handle range looks like. We are developing toward v3, which will replace the current handle model with 128-bit UUIDs for server/handle tuples. As part of that work, we will enable dynamic adding and removing of nodes (nodes will be able to discover, on the fly, new nodes that present the appropriate PKI credentials).

> 2) Is the loss of a node from the FS as catastrophic as it appears to
>    be? Turning off the daemon on one storage/metadata node brought down
>    the whole system. I'm aware of the documented solution using
>    Heartbeat, but that only appears to keep the system upright, rather
>    than helping us to preserve data in the event of a failure.

A metadata (MD) server failure generally does cause the whole file system to hang until that MD server comes back, because some part of the metadata needed to complete an operation ends up on the unavailable node. The file-data side is more graceful at this time: only that portion of the file system is down, though that generally requires the failed node to be a data-only (non-metadata) node. To work around this we are starting to test DRBD (http://www.drbd.org/) for both MD and I/O replication, with corosync/pacemaker handling failover while DRBD handles the data replication between nodes; a rough sketch of the DRBD side is below. If you can live with part of the file system being unavailable while a node is offline, as long as new files can still be written, then you can use DRBD for the MD only. You just have to decide how distributed your MD needs to be for your workload, i.e. how many MD nodes you need.

> 3) Is there a timescale for the results of the development work into
>    improving redundancy?

OrangeFS currently has file replication on immutable files, which helps some use cases but not real-time ones. We are actively working on real-time file data replication (split flow); it should be complete and is targeted for 2.9.1 in the 1st quarter of next year, and will include the ability to set replication at the directory level (or at the file level with the APIs). This will resolve the file availability problem without needing DRBD for the I/O nodes.
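To make the DRBD idea a bit more concrete, here is a very rough sketch of a DRBD resource for a metadata volume, in DRBD 8.x style. Treat it as an illustration only: the resource name, hostnames, devices and addresses below are placeholders I made up, and the backing disk would be whatever partition holds the OrangeFS storage space on that MD server. Pacemaker (using the standard ocf:linbit:drbd and Filesystem resource agents) then decides which node is primary, mounts the DRBD device over the storage space, and starts the server process there.

    # /etc/drbd.d/orangefs-md.res -- placeholder names throughout
    resource orangefs-md {
        protocol C;                    # synchronous replication for metadata
        on md1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;       # partition holding the OrangeFS storage space
            address   192.168.1.11:7788;
            meta-disk internal;
        }
        on md1-standby {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.1.12:7788;
            meta-disk internal;
        }
    }

The same pattern would apply to an I/O node if you decide you need that before the real-time replication work lands.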
MD redundancy, on the other hand, is not going to be available as part of the file system until v3 (there was too much reworking involved to make it worthwhile before then). So after Q1 next year, to get the redundancy you need, you will use DRBD for the MD and the file system will be able to handle the rest.

For releases, the current targets are:

  2.8.7 - sometime before year end, probably after the 2.9 beta
  2.9.0 - beta in the SC12 timeframe, release a month or so after;
          capability-based security and distributed MD for directory
          entries (we have been spending a lot of time working through
          the security aspects to make it easier to work with)
  2.9.1 - end of 1st quarter 2013; adds real-time replication
  3.0   - maybe SC13

thanks,
-boyd
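P.S. To give a concrete picture for your question 1.i: in the file system section of orangefs-server.conf, each server alias is assigned a slice of the handle space, and you can simply leave part of that space unassigned for servers you plan to add later. The sketch below is from memory and only shows the relevant parts; the aliases, port, ID and numeric ranges are placeholders, so generate a real config with pvfs2-genconfig for your version and compare the exact section and option names.

    <Aliases>
        Alias ofs1 tcp://ofs1:3334
        Alias ofs2 tcp://ofs2:3334
        # aliases for ofs3..ofs6 would follow the same pattern
    </Aliases>

    <FileSystem>
        Name orangefs
        ID 1095186942
        RootHandle 1048576
        <MetaHandleRanges>
            # only part of the meta handle space is handed out now
            Range ofs1 4-536870914
            Range ofs2 536870915-1073741825
            # the rest of the meta range is deliberately left unassigned;
            # when a new server (say ofs7) joins, add something like
            #     Range ofs7 1073741826-1610612736
            # to this section in every server's config and restart the servers
        </MetaHandleRanges>
        <DataHandleRanges>
            Range ofs1 2147483651-2684354561
            Range ofs2 2684354562-3221225472
            # likewise, leave headroom here for future data servers
        </DataHandleRanges>
    </FileSystem>

After editing, pushing the file to every node and restarting the server on each (a simple ssh loop) is all the "script" mentioned in my answer to question 1 amounts to.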
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
