> initial DB population (it's a fresh
> install of MariaDB10 on a 3rd host - dedicated LUN for /var/lib/mysql
> but without any SSD devices)

Interesting. I think I might set up a test instance of MariaDB and
compare performance (if there is any difference) between MySQL and
MariaDB as the database backend for RBH. If you have any insight into
performance, please share!

> I'm seeing an average of 15-20TB/day for the scan - is this normal?

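As Scott mentioned in his follow-up, TB/day is a worthwhile
measurement, but it does not give the complete picture. A scan is
bound by per-entry metadata operations rather than by file size, so in
the case of ZOT files (Zillions of Tiny Files) this unit is not
helpful; entries scanned per second tells you more.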
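To make that concrete, here is a minimal Python sketch that converts the
progress figures from Andrew's report quoted below (236,034,983 entries
and 148206163990949 bytes after "~7d") into entries per second and TiB
per day. It assumes the scan ran exactly 7 days and treats the report's
"TB" as binary TiB, which matches the byte total. The second function
shows that TB/day is simply the entry rate times the mean entry size, so
the same metadata rate gives very different TB/day depending on file
sizes; the 4 KiB and 1 MiB means are hypothetical.

```python
"""Convert a scan's progress into entries/s and TiB/day.

Figures come from the scan summary quoted in this thread; "~7d" is
assumed to be exactly 7 days for illustration.
"""
from typing import Tuple

SECONDS_PER_DAY = 86_400
TIB = 1024 ** 4  # the report's "TB" column matches binary TiB


def scan_rates(entries: int, total_bytes: int, days: float) -> Tuple[float, float]:
    """Return (entries per second, TiB per day) for a scan lasting `days` days."""
    seconds = days * SECONDS_PER_DAY
    return entries / seconds, total_bytes / TIB / days


def tib_per_day(entries_per_sec: float, mean_size_bytes: float) -> float:
    """TiB/day implied by a fixed metadata rate and a given mean entry size."""
    return entries_per_sec * SECONDS_PER_DAY * mean_size_bytes / TIB


if __name__ == "__main__":
    rate, tib_day = scan_rates(236_034_983, 148_206_163_990_949, days=7)
    print(f"{rate:.0f} entries/s, {tib_day:.2f} TiB/day")
    # Same metadata rate, hypothetical mean entry sizes (4 KiB and 1 MiB):
    for mean in (4 * 1024, 1024 ** 2):
        print(f"mean {mean} B -> {tib_per_day(rate, mean):.2f} TiB/day")
```

With these figures it works out to roughly 390 entries/s and about
19.3 TiB/day; at that same entry rate, a tree of 4 KiB files would show
only about 0.13 TiB/day, even though the scanner is doing the same work.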

Take a look at http://sourceforge.net/apps/trac/robinhood/wiki/TunePipeline
and read what Thomas wrote. This is the best way to understand where
the bottleneck is with scanning.


> Also, some of our users have huge directory structures with millions
> of directories and tiny (o240k) files within them *cough* openfoam --
> do other sites see this and how do they deal with the filetype mix?

The ZOT problem (Zillions of Tiny Files) is a common one, even on our
cluster. We are roughly 1/5 PB (about 200 TB) in size, but are quickly
expanding and will reach _over_ a PB in the next few months.

Right now we are moving away from GlusterFS (which uses a tree
structure for metadata distributed across all the storage nodes). This
approach performs very poorly as the tree grows with ZOT files. For
streaming reads/writes it is great, but for an HPC environment with
multiple use cases, it's not so great.

As a middle ground between GlusterFS and Lustre, we are trying
FraunhoferFS, which has a dedicated metadata server similar to
Lustre's, but is much, much easier to set up and manage.

Currently we flag users as "ZOT Offenders" if they have a directory on
the cluster containing over 2,000 files. As we switch away from
GlusterFS, we expect to raise that threshold several-fold.

RBH allows us to detect these ZOT Offenders within seconds, since we can
simply query the database and then email the users. Most of the time,
it's a matter of educating them on proper 'big data' techniques. Other
times, it's simply reminding them to clean up after themselves (SGE
output, checkpoint files, etc.).
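The detection step boils down to a group-by over file entries. The
following is a hypothetical Python sketch, not Robinhood's actual schema
or query: each entry is assumed to be a (parent_dir, owner, type) tuple,
and a user is flagged when they own more than 2,000 files in a single
directory. Against a real Robinhood database the same aggregation would
presumably be written as a query instead.

```python
"""Flag "ZOT offenders": users owning more than N files in one directory.

Hypothetical sketch: entries are plain (parent_dir, owner, type) tuples
standing in for whatever the Robinhood database actually stores.
"""
from collections import Counter, defaultdict

ZOT_THRESHOLD = 2_000  # files per directory, per the policy described above


def zot_offenders(entries, threshold=ZOT_THRESHOLD):
    """Return {owner: sorted [(directory, file_count), ...]} above `threshold`."""
    # Count regular files per (owner, parent directory); dirs/symlinks are ignored.
    counts = Counter(
        (owner, parent) for parent, owner, kind in entries if kind == "file"
    )
    offenders = defaultdict(list)
    for (owner, parent), n in counts.items():
        if n > threshold:  # strictly "over" the threshold
            offenders[owner].append((parent, n))
    return {owner: sorted(dirs) for owner, dirs in offenders.items()}


if __name__ == "__main__":
    # Hypothetical sample: alice has one huge directory, bob does not.
    sample = [("/scratch/alice/run1", "alice", "file")] * 2_500
    sample += [("/scratch/bob/out", "bob", "file")] * 150
    sample += [("/scratch/alice/run1", "alice", "dir")]
    for user, dirs in zot_offenders(sample).items():
        print(user, dirs)  # e.g. feed this into the reminder email
```

Keying on (owner, directory) rather than on the directory's owner is a
design choice: it attributes the files to whoever created them, which is
who the email should go to.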

-Adam

--
Adam Brenner
Computer Science, Undergraduate Student
Donald Bren School of Information and Computer Sciences

System Administrator, HPC Cluster
Office of Information Technology
http://hpc.oit.uci.edu/

University of California, Irvine
www.ics.uci.edu/~aebrenne/
[email protected]


On Sun, Apr 20, 2014 at 12:52 AM, Andrew Elwell <[email protected]> wrote:
> Hi folks,
>
> I suspect this is a "how long is a piece of string" question, but
> roughly what order of scan speed do other sites see on large systems?
>
> We have a 3PB /scratch hosted on sonnexion appliances (Cray) so I'm
> running 2 instances of robinhood (one on each of two esDM nodes) --
> one reading Lustre changelogs, and the other performing a --scan -O
> --no-gc -d to help with the initial DB population (it's a fresh
> install of MariaDB10 on a 3rd host - dedicated LUN for /var/lib/mysql
> but without any SSD devices)
>
> I'm seeing an average of 15-20TB/day for the scan - is this normal?
> Also, some of our users have huge directory structures with millions
> of directories and tiny (o240k) files within them *cough* openfoam --
> do other sites see this and how do they deal with the filetype mix?
>
>
> So far (~7 days in), I have:
> type    ,      count,     volume,   avg_size
> symlink ,     269149,   19.68 MB,         77
> dir     ,   41570192,  160.88 GB,    4.06 KB
> file    ,  194195639,  134.64 TB,  744.42 KB
> fifo    ,          3,          0,          0
>
> Total: 236034983 entries, 148206163990949 bytes (134.79 TB)
>
>
> Many thanks
>
> Andrew
>
> _______________________________________________
> robinhood-support mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/robinhood-support
