2016/09/09 09:12:46 [21484/2] STATS | 1: GET_INFO_DB | 9990 |  0 | 0 | 1819630 | 0.20 |
2016/09/09 09:12:46 [21484/2] STATS | 2: GET_INFO_FS |    0 | 10 | 0 | 1819613 | 3.75 |

It shows that robinhood threads are busy running FS operations; that's why incoming records are waiting to pass the GET_INFO_DB step, but it doesn't mean there is a problem with DB performance. FS latency is a little high (3.75 ms), but not catastrophic if it runs 10 operations in parallel.

If not already done, you can tune the Lustre osc and mdc max_rpcs_in_flight settings:
/proc/fs/lustre/*/*/max_rpcs_in_flight
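For example, a sketch of how to check and raise these on the client with lctl (32 is just an illustrative value; the right number depends on your workload):

```shell
# Check current values on the Lustre client
lctl get_param osc.*.max_rpcs_in_flight mdc.*.max_rpcs_in_flight

# Increase them, e.g. to 32
lctl set_param osc.*.max_rpcs_in_flight=32
lctl set_param mdc.*.max_rpcs_in_flight=32
```

Note that lctl set_param is not persistent across remounts unless you use the -P option on a recent Lustre version.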

Also consider tuning the peer_credits option of the ko2iblnd module, but note that it must then be tuned on the servers and LNet routers as well.
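A sketch of how to inspect and persist this module option (the value 32 is only an example; it must match across clients, servers, and routers):

```shell
# Check the currently loaded value (requires the o2ib LND to be loaded)
cat /sys/module/ko2iblnd/parameters/peer_credits

# Persist a new value via modprobe options; takes effect after module reload
echo 'options ko2iblnd peer_credits=32' > /etc/modprobe.d/ko2iblnd.conf
```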

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.02    0.00   80.22    0.00    0.00   19.76
This indicates that most of the CPU load is consumed by the system (kernel).
It seems there is a lot of activity in the Lustre client.

A possible explanation is that it is managing a metadata lock cache that is too large.
This can be reduced by tuning lru_size (usually between 100 and 400):
/proc/fs/lustre/ldlm/namespaces/*/lru_size
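For example, a sketch using lctl (256 is an arbitrary value in the suggested 100-400 range):

```shell
# Inspect the current LDLM LRU sizes per namespace
lctl get_param ldlm.namespaces.*.lru_size

# Pin the lock cache to a fixed size, e.g. 256 locks per namespace
lctl set_param ldlm.namespaces.*.lru_size=256
```

Setting lru_size to a positive value fixes the cache size; setting it back to 0 restores dynamic LRU sizing.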

However, I think the situation will improve once robinhood has dequeued all the changelogs accumulated during your initial scan. It is currently processing records emitted about 1.5 days ago, but processes them 4 times faster than their accumulation speed, so it should eventually catch up:
2016/09/09 09:12:46 [21484/2] STATS |    last read record time = 2016/09/07 20:28:30.099225
2016/09/09 09:12:46 [21484/2] STATS |    processing speed ratio   = 4.00
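As a back-of-the-envelope check (a sketch based on the two STATS lines above, assuming the 4.00 ratio stays constant):

```python
# Catch-up estimate from the robinhood STATS output.
lag_days = 1.5        # records being read now were emitted ~1.5 days ago
speed_ratio = 4.0     # "processing speed ratio" reported in the log

# Each day of processing consumes speed_ratio days of changelog while
# one new day accumulates, so the backlog shrinks by (ratio - 1) per day:
catch_up_days = lag_days / (speed_ratio - 1.0)
print(f"estimated catch-up time: {catch_up_days:.2f} days")  # ~0.50 days
```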

Regards,
Thomas

On 09/09/16 09:24, Marcin Stolarek wrote:
OK, I've tried a few times. I did some MySQL tuning, and what I have now is:

2016/09/09 09:12:46 [21484/2] STATS | ==================== Dumping stats at 2016/09/09 09:12:46 =====================
2016/09/09 09:12:46 [21484/2] STATS | ======== General statistics =========
2016/09/09 09:12:46 [21484/2] STATS | Daemon start time: 2016/09/09 08:42:45
2016/09/09 09:12:46 [21484/2] STATS | Started modules: log_reader
2016/09/09 09:12:46 [21484/2] STATS | ChangeLog reader #0:
2016/09/09 09:12:46 [21484/2] STATS |    fs_name    = plkra
2016/09/09 09:12:46 [21484/2] STATS |    mdt_name   = MDT0000
2016/09/09 09:12:46 [21484/2] STATS |    reader_id  = cl1
2016/09/09 09:12:46 [21484/2] STATS |    records read    = 1830902
2016/09/09 09:12:46 [21484/2] STATS |    interesting records = 1830620
2016/09/09 09:12:46 [21484/2] STATS |    suppressed records  = 282
2016/09/09 09:12:46 [21484/2] STATS |    records pending     = 1000
2016/09/09 09:12:46 [21484/2] STATS |    last received            = 2016/09/09 09:04:13
2016/09/09 09:12:46 [21484/2] STATS |    last read record time    = 2016/09/07 20:28:30.099225
2016/09/09 09:12:46 [21484/2] STATS |    last read record id      = 18391116
2016/09/09 09:12:46 [21484/2] STATS |    last pushed record id    = 18390117
2016/09/09 09:12:46 [21484/2] STATS |    last committed record id = 18380083
2016/09/09 09:12:46 [21484/2] STATS |    last cleared record id   = 18379864
2016/09/09 09:12:46 [21484/2] STATS |    read speed               = 677.18 record/sec
2016/09/09 09:12:46 [21484/2] STATS |    processing speed ratio   = 4.00
2016/09/09 09:12:46 [21484/2] STATS |    status         = busy
2016/09/09 09:12:46 [21484/2] STATS |    ChangeLog stats:
2016/09/09 09:12:46 [21484/2] STATS |    MARK: 0, CREAT: 262, MKDIR: 9, HLINK: 0, SLINK: 0, MKNOD: 0, UNLNK: 7, RMDIR: 0, RENME: 0
2016/09/09 09:12:46 [21484/2] STATS |    RNMTO: 0, OPEN: 0, CLOSE: 0, LYOUT: 0, TRUNC: 48, SATTR: 1830576, XATTR: 0, HSM: 0
2016/09/09 09:12:46 [21484/2] STATS |    MTIME: 0, CTIME: 0, ATIME: 0
2016/09/09 09:12:46 [21484/2] STATS | ==== EntryProcessor Pipeline Stats ===
2016/09/09 09:12:46 [21484/2] STATS | Idle threads: 0
2016/09/09 09:12:46 [21484/2] STATS | Id constraints count: 10000 (hash min=0/max=6/avg=0.6)
2016/09/09 09:12:46 [21484/2] STATS | Name constraints count: 18 (hash min=0/max=1/avg=0.0)
2016/09/09 09:12:46 [21484/2] STATS | Stage             |  Wait | Curr | Done |   Total | ms/op |
2016/09/09 09:12:46 [21484/2] STATS | 0: GET_FID        |     0 |    0 |    0 |       0 |  0.00 |
2016/09/09 09:12:46 [21484/2] STATS | 1: GET_INFO_DB    |  9990 |    0 |    0 | 1819630 |  0.20 |
2016/09/09 09:12:46 [21484/2] STATS | 2: GET_INFO_FS    |     0 |   10 |    0 | 1819613 |  3.75 |
2016/09/09 09:12:46 [21484/2] STATS | 3: PRE_APPLY      |     0 |    0 |    0 | 1819606 |  0.00 |
2016/09/09 09:12:46 [21484/2] STATS | 4: DB_APPLY       |     0 |    0 |    0 | 1819606 |  0.17 | 99.76% batched (avg batch size: 17.0)
2016/09/09 09:12:46 [21484/2] STATS | 5: CHGLOG_CLR     |     0 |    0 |    0 | 1819620 |  0.01 |
2016/09/09 09:12:46 [21484/2] STATS | 6: RM_OLD_ENTRIES |     0 |    0 |    0 |       0 |  0.00 |
2016/09/09 09:12:46 [21484/2] STATS | DB ops: get=1819360/ins=246/upd=1819360/rm=0
2016/09/09 09:12:46 [21484/2] STATS | --- Pipeline stage details ---
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_DB : first: changelog record #18380115, fid=[0x2000291ab:0xdb7b:0x0], status=waiting
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_DB : last:  changelog record #18390116, fid=[0x200002eae:0x1b148:0x0], status=waiting
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_FS : first: changelog record #18380085, fid=[0x200025ab0:0x1a350:0x0], status=processing
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_FS : last:  changelog record #18380112, fid=[0x2000291ab:0xdb7a:0x0], status=processing


What can someone experienced with robinhood read from this? I see that I have a lot of waiting GET_INFO_DB operations; does it mean my database is responding slowly? Should I check for slow queries?

My server responsiveness is limited :)

What I have from iostat is:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.02    0.00   80.22    0.00    0.00   19.76

Device:            tps    MB_read/s    MB_wrtn/s  MB_read    MB_wrtn
sda               1.88         0.00         0.02  0          0


cheers,
Marcin

2016-09-07 13:31 GMT+02:00 LEIBOVICI Thomas <[email protected]>:

    On 09/07/16 13:03, Marcin Stolarek wrote:
    Hi,

    I installed robinhood 3.0-rc1 with lustre 2.5. The initial scan of
    the filesystem took around 2 days. During this time, the changelog
    was enabled.
    Today I started:
    #robinhood --readlog --once
     and after a few hours robinhood process started to use a lot of
    CPU (around 700% - everything available). Is this normal?
    Why not... if it's for doing useful things, such as processing
    bursts of changelogs :)
    But I'd say it may also be a sign of something wrong...
    You can check whether the changelog processing is going well and how
    it has evolved:
    grep STATS /var/log/robinhood.log

    Thomas

    cheers,
    Marcin


    
------------------------------------------------------------------------------

    _______________________________________________
    robinhood-support mailing list
    [email protected]
    <mailto:[email protected]>
    https://lists.sourceforge.net/lists/listinfo/robinhood-support
    <https://lists.sourceforge.net/lists/listinfo/robinhood-support>

