Rob,

Thanks for taking a look at OrangeFS. I have responded to your questions inline.
On Fri, Oct 12, 2012 at 11:25 AM, <[email protected]> wrote:
> Hi,
>
> I'm evaluating OrangeFS here at RAL at the moment for possible future use
> as a distributed FS for our LHC disk storage system. I've got the FS
> working, and have run some performance tests. My configuration looks like
> this:
>
>   - 6 x 10TB storage nodes, each set up to hold both data and metadata.
>   - One client box that mounts the FS for external use.
>   - db4.8.30 on every box.
>
> I've got some questions:
>
> 1) Is there any way to make changes to the system without bringing
>    everything down first? We need to be able to add or remove nodes to
>    or from the OrangeFS instance on the fly, without bringing everything
>    down or losing data.
>
>    i. Can the system be made extensible by initially only allocating
>       part of the available namespace in orangefs-server.conf, and then
>       adding more servers into the unused regions of the namespace?

You can reserve handle ranges, so it is easy to add nodes later. Currently this requires a restart of all the server processes (so they become aware of the other servers and the additional ranges in each server's config file), but that can be done pretty quickly with a script that pushes the updated config to each node and restarts the server there. See the P.S. at the bottom of this mail for a rough idea of what a reserved handle range looks like. We are developing toward v3, which will replace the current handle model with 128-bit UUIDs for server/handle tuples. As part of that work, we will enable dynamic adding and removing of nodes (nodes will be able to discover, on the fly, new nodes that present the appropriate PKI credentials).

> 2) Is the loss of a node from the FS as catastrophic as it appears to
>    be? Turning off the daemon on one storage/metadata node brought down
>    the whole system. I'm aware of the documented solution using
>    Heartbeat, but that only appears to keep the system upright, rather
>    than helping us to preserve data in the event of a failure.

A metadata (MD) server failure generally does cause the whole file system to hang until that MD server comes back, because some part of the metadata needed to complete an operation ends up on the unavailable node. The file-data side is more graceful at this time: only that portion of the file system is down, though that generally requires the failed node to be a data-only (non-metadata) node. To work around this we are starting to test DRBD (http://www.drbd.org/) for both MD and I/O replication, with corosync/pacemaker handling failover while DRBD handles the data replication between nodes; a rough sketch of the DRBD side is below. If you can live with part of the file system being unavailable while a node is offline, as long as new files can still be written, then you can use DRBD for the MD only. You just have to decide how distributed your MD needs to be for your workload, i.e. how many MD nodes you need.

> 3) Is there a timescale for the results of the development work into
>    improving redundancy?

OrangeFS currently has file replication on immutable files, which helps some use cases but not real-time ones. We are actively working on real-time file data replication (split flow); it should be complete and is targeted for 2.9.1 in the 1st quarter of next year, and will include the ability to set replication at the directory level (or at the file level with the APIs). This will resolve the file availability problem without needing DRBD for the I/O nodes.
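To make the DRBD idea a bit more concrete, here is a very rough sketch of a DRBD resource for a metadata volume, in DRBD 8.x style. Treat it as an illustration only: the resource name, hostnames, devices and addresses below are placeholders I made up, and the backing disk would be whatever partition holds the OrangeFS storage space on that MD server. Pacemaker (using the standard ocf:linbit:drbd and Filesystem resource agents) then decides which node is primary, mounts the DRBD device over the storage space, and starts the server process there.

    # /etc/drbd.d/orangefs-md.res -- placeholder names throughout
    resource orangefs-md {
        protocol C;                    # synchronous replication for metadata
        on md1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;       # partition holding the OrangeFS storage space
            address   192.168.1.11:7788;
            meta-disk internal;
        }
        on md1-standby {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.1.12:7788;
            meta-disk internal;
        }
    }

The same pattern would apply to an I/O node if you decide you need that before the real-time replication work lands.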
MD redundancy, on the other hand, is not going to be available as part of the file system until v3 (there was too much reworking involved to make it worthwhile before then). So after Q1 next year, to get the redundancy you need, you will use DRBD for the MD and the file system will be able to handle the rest.

For releases, the current targets are:

  2.8.7 - sometime before year end, probably after the 2.9 beta
  2.9.0 - beta in the SC12 timeframe, release a month or so after;
          capability-based security and distributed MD for directory
          entries (we have been spending a lot of time working through
          the security aspects to make it easier to work with)
  2.9.1 - end of 1st quarter 2013; adds real-time replication
  3.0   - maybe SC13

thanks,
-boyd
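P.S. To give a concrete picture for your question 1.i: in the file system section of orangefs-server.conf, each server alias is assigned a slice of the handle space, and you can simply leave part of that space unassigned for servers you plan to add later. The sketch below is from memory and only shows the relevant parts; the aliases, port, ID and numeric ranges are placeholders, so generate a real config with pvfs2-genconfig for your version and compare the exact section and option names.

    <Aliases>
        Alias ofs1 tcp://ofs1:3334
        Alias ofs2 tcp://ofs2:3334
        # aliases for ofs3..ofs6 would follow the same pattern
    </Aliases>

    <FileSystem>
        Name orangefs
        ID 1095186942
        RootHandle 1048576
        <MetaHandleRanges>
            # only part of the meta handle space is handed out now
            Range ofs1 4-536870914
            Range ofs2 536870915-1073741825
            # the rest of the meta range is deliberately left unassigned;
            # when a new server (say ofs7) joins, add something like
            #     Range ofs7 1073741826-1610612736
            # to this section in every server's config and restart the servers
        </MetaHandleRanges>
        <DataHandleRanges>
            Range ofs1 2147483651-2684354561
            Range ofs2 2684354562-3221225472
            # likewise, leave headroom here for future data servers
        </DataHandleRanges>
    </FileSystem>

After editing, pushing the file to every node and restarting the server on each (a simple ssh loop) is all the "script" mentioned in my answer to question 1 amounts to.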
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
