DB_APPLY is actually the slowest stage, at 0.40 ms/op.
The other waiting operations we see are stacked in the pipeline, waiting
for a worker thread to process them
(the queue length is controlled by the max_pending_operations you set in
your config).
It is normal for many of them to sit in the first stage when the
changelog is full.
The read speed is a good indicator of how fast it goes.
Did you get better results with the latest code and disabling accounting?
On 05/06/15 13:42, Carmelo Ponti (CSCS) wrote:
Hi Thomas
Thank you for your prompt answer.
On Wed, 2015-05-06 at 10:52 +0200, LEIBOVICI Thomas wrote:
Hi Carmelo,
Check in the robinhood logs which operation is the slowest in the robinhood
pipeline (grep STATS ...), and where the operations are stacked
(waiting status).
The slowest operation is GET_INFO_DB with 99999 waiting:
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | Stage             | Wait  | Curr | Done | Total    | ms/op |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 0: GET_FID        |     0 |    0 |    0 |        0 |  0.00 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 1: GET_INFO_DB    | 99998 |    0 |    0 | 28642026 |  0.31 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 2: GET_INFO_FS    |     0 |    0 |    0 | 10440925 |  0.24 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 3: REPORTING      |     0 |    0 |    0 |    53424 |  0.00 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 4: PRE_APPLY      |     0 |    0 |    0 |  7813312 |  0.00 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 5: DB_APPLY       |     0 |    0 |    0 |  7813312 |  0.40 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 6: CHGLOG_CLR     |     1 |    0 |    0 | 21389216 |  0.02 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 7: RM_OLD_ENTRIES |     0 |    0 |    0 |        0 |  0.00 |
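For anyone who wants to automate the "grep STATS" check, here is a minimal Python sketch that pulls the per-stage numbers out of the log and reports both the stage with the highest ms/op and the stage with the most waiting operations. It assumes each STATS entry sits on a single log line in the layout shown above; the names `parse_stages` and `STAGE_RE` are made up for this example and are not part of robinhood.

```python
import re

# Matches one pipeline stage row: "STATS | <idx>: <NAME> | Wait | Curr | Done | Total | ms/op |"
STAGE_RE = re.compile(
    r"STATS\s*\|\s*(\d+):\s*(\w+)\s*\|"          # stage index and name
    r"\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|"   # Wait, Curr, Done
    r"\s*(\d+)\s*\|\s*([\d.]+)\s*\|"             # Total, ms/op
)

def parse_stages(lines):
    """Return one dict per stage row; header and unrelated lines are skipped."""
    stages = []
    for line in lines:
        m = STAGE_RE.search(line)
        if m:
            _idx, name, wait, _curr, _done, total, ms = m.groups()
            stages.append({
                "name": name,
                "wait": int(wait),
                "total": int(total),
                "ms_per_op": float(ms),
            })
    return stages

# Sample rows copied from the log excerpt in this thread.
SAMPLE = """\
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 1: GET_INFO_DB    | 99998 |    0 |    0 | 28642026 |  0.31 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 2: GET_INFO_FS    |     0 |    0 |    0 | 10440925 |  0.24 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 5: DB_APPLY       |     0 |    0 |    0 |  7813312 |  0.40 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 6: CHGLOG_CLR     |     1 |    0 |    0 | 21389216 |  0.02 |
"""

stages = parse_stages(SAMPLE.splitlines())
slowest = max(stages, key=lambda s: s["ms_per_op"])    # highest time per operation
most_waiting = max(stages, key=lambda s: s["wait"])    # where operations pile up

print("slowest stage:", slowest["name"], slowest["ms_per_op"])
print("most waiting: ", most_waiting["name"], most_waiting["wait"])
```

The two answers can differ: the stage where operations wait is not necessarily the one with the highest ms/op.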
- If the limiting point is the DB access (DB_APPLY stage), consider this:
I compiled robinhood 2.5.5 and I applied the changes suggested.
Do you get better performance with "autocommit" compared to "transaction"?
I don't see much difference:
autocommit (2729.07 only after robinhood restart)
2015/05/06 13:15:17 robinhood@daintrbh01[11264/1] STATS | read speed = 2729.07 record/sec
2015/05/06 13:16:17 robinhood@daintrbh01[11264/1] STATS | read speed = 1917.30 record/sec
2015/05/06 13:17:17 robinhood@daintrbh01[11264/1] STATS | read speed = 1965.80 record/sec
2015/05/06 13:18:17 robinhood@daintrbh01[11264/1] STATS | read speed = 1295.02 record/sec
and then it continues at around 1200
transaction (3088.33 only after robinhood restart)
2015/05/06 13:28:26 robinhood@daintrbh01[11864/1] STATS | read speed = 3088.33 record/sec
2015/05/06 13:29:26 robinhood@daintrbh01[11864/1] STATS | read speed = 1866.95 record/sec
2015/05/06 13:30:29 robinhood@daintrbh01[11864/1] STATS | read speed = 1787.73 record/sec
2015/05/06 13:31:29 robinhood@daintrbh01[11864/1] STATS | read speed = 2105.38 record/sec
2015/05/06 13:32:29 robinhood@daintrbh01[11864/1] STATS | read speed = 1273.35 record/sec
and then it continues at around 1200
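To compare the two modes on more than eyeballing, a small Python sketch like the one below can extract the "read speed" values from the log and average them while skipping the first sample after each restart (which is inflated in both runs). The helper names `read_speeds` and `steady_mean` are made up for this example, and the values are the ones quoted above; with only a handful of samples per run the result is indicative at best.

```python
import re
from statistics import mean

# \s+ also matches a newline, so wrapped log lines are handled too.
SPEED_RE = re.compile(r"read\s+speed\s*=\s*([\d.]+)\s*record/sec")

def read_speeds(log_text):
    """Return every 'read speed' value found in the log text, in order."""
    return [float(v) for v in SPEED_RE.findall(log_text)]

def steady_mean(speeds, skip=1):
    """Average the samples, ignoring the first `skip` ones taken right after a restart."""
    return mean(speeds[skip:])

AUTOCOMMIT_LOG = """\
STATS | read speed = 2729.07 record/sec
STATS | read speed = 1917.30 record/sec
STATS | read speed = 1965.80 record/sec
STATS | read speed = 1295.02 record/sec
"""

TRANSACTION_LOG = """\
STATS | read speed = 3088.33 record/sec
STATS | read speed = 1866.95 record/sec
STATS | read speed = 1787.73 record/sec
STATS | read speed = 2105.38 record/sec
STATS | read speed = 1273.35 record/sec
"""

auto_mean = steady_mean(read_speeds(AUTOCOMMIT_LOG))
trans_mean = steady_mean(read_speeds(TRANSACTION_LOG))

print(f"autocommit:  {auto_mean:.2f} record/sec")
print(f"transaction: {trans_mean:.2f} record/sec")
```

On these samples the two averages land within a few percent of each other, which matches the impression that the commit mode makes little difference here.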
> match_classes = TRUE;
If you don't care about fileclass reports (rbh-report --class-info) you
can disable "match_classes".
I'm keeping it for the moment.
> Ignore
> {
> type == directory
> and
> ( name == ".snapdir" or name == ".snapshot" )
> }
This is useless with Lustre.
Removed
> # ChangeLog Reader configuration
> # Parameters for processing MDT changelogs :
> ChangeLog
> {
> ...
> queue_max_size = 1000 ;
> queue_max_age = 5s ;
> queue_check_interval = 1s ;
> }
You can try increasing max size and max age (x2?) to have a better chance of
eliminating redundant changelog records.
Done
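As a toy illustration of why a larger queue gives more chance to drop redundant records, the Python sketch below removes duplicates only among records that fall into the same window. This is not robinhood's actual algorithm: the `coalesce` function, the (fid, type) tuples and the synthetic record stream are assumptions made for the example, and queue_max_age is ignored entirely.

```python
def coalesce(records, window_size):
    """Drop records that repeat an earlier record within the same fixed-size window."""
    kept = []
    for start in range(0, len(records), window_size):
        seen = set()  # duplicates are only detected inside the current window
        for rec in records[start:start + window_size]:
            if rec not in seen:
                seen.add(rec)
                kept.append(rec)
    return kept

# Synthetic stream: 4000 records cycling over 1500 files, all of the same type,
# so the same (fid, type) pair comes back every 1500 records.
records = [(i % 1500, "MTIME") for i in range(4000)]

kept_small = coalesce(records, 1000)   # window shorter than the repeat distance
kept_large = coalesce(records, 2000)   # window long enough to see some repeats

print(len(kept_small), len(kept_large))
```

In this synthetic case the smaller window never sees a repeat, while the doubled window does. Whether doubling helps on a real system depends on how far apart redundant records actually are in the changelog.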
> Purge_Trigger
> {
> trigger_on = global_usage ;
Triggering purge on OST_usage is more efficient, and safer for avoiding
ENOSPC errors for users.
Done
GET_INFO_DB is still at 99998. This appears only when the changelog is
full. Usually it is 0, or 500 at most.
Carmelo
--
----------------------------------------------------------------------
Carmelo Ponti System Engineer
CSCS Swiss Center for Scientific Computing
Via Trevano 131 Email: [email protected]
CH-6900 Lugano http://www.cscs.ch
Phone: +41 91 610 82 15/Fax: +41 91 610 82 82
----------------------------------------------------------------------