Hi Carmelo,

Check the robinhood logs to find which operation is the slowest in the robinhood pipeline (grep STATS ...), and where the operations are piling up (waiting status).

- If the limiting point is the filesystem access (GET_INFO_FS stage) and the slowdown is caused by the filesystem load, I don't know how I can help you.

- If the limiting point is the DB access (DB_APPLY stage), consider this:

The "accounting" feature is a performance killer for the DB access pattern.
If you can accept waiting a few minutes to get user/group info, I suggest you:
1) disable it (user_acct/group_acct = no)
2) then run "rbh-config reset_acct"
3) get the latest robinhood code, which will be included in 2.5.5.
In particular, it allows mixing parallel and batched DB access when accounting is disabled. This results in a 3x-4x speedup of DB operations.
    git clone https://github.com/cea-hpc/robinhood.git
    cd robinhood
    sh autogen.sh
    ./configure
    make rpm
4) remove your "max_batch_size = 1;" tuning once you have upgraded to this version.
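
For step 1, the change in your ListManager block would look roughly like this. This is only a sketch using the parameter names and the "no" value mentioned above; leave your other ListManager settings as they are:

```
ListManager
{
    # accounting disabled, as suggested in step 1
    user_acct  = no ;
    group_acct = no ;
}
```

Remember that "rbh-config reset_acct" (step 2) still has to be run after this change.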


Your my.cnf looks good, but I'm not a MySQL expert.

A few comments about your robinhood config below:


On 05/06/15 10:07, Carmelo Ponti (CSCS) wrote:
# List Manager configuration
ListManager
{
     # Method for committing information to database.
     commit_behavior = autocommit ;
Do you get better performance with "autocommit" than with "transaction"?

     user_acct  = enabled ;
     group_acct = enabled ;
See my previous recommendation about this.

     match_classes = TRUE;

If you don't care about fileclass reports (rbh-report --class-info) you can disable "match_classes".

     Ignore
     {
         type == directory
         and
         ( name == ".snapdir" or name == ".snapshot" )
     }
This is useless with Lustre.


# ChangeLog Reader configuration
# Parameters for processing MDT changelogs :
ChangeLog
{
...
     queue_max_size   = 1000 ;
     queue_max_age    = 5s ;
     queue_check_interval = 1s ;
}
You can try increasing the max size and max age (x2?) to have a better chance of eliminating redundant changelog records.
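
As an illustration, doubling both values from your current settings would give the following. These numbers are just a starting point to experiment with, not tuned values, and the other ChangeLog parameters stay unchanged:

```
ChangeLog
{
     # (other parameters unchanged)
     # doubled from 1000
     queue_max_size   = 2000 ;
     # doubled from 5s
     queue_max_age    = 10s ;
     queue_check_interval = 1s ;
}
```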

Purge_Trigger
{
     trigger_on         = global_usage ;
Triggering purge on OST_usage is more efficient, and safer for avoiding ENOSPC errors for users.

Regards,
Thomas
