Colin,

Thanks for your reply.  I look forward to hearing what "best practices" you
come up with for improving performance.  We have already used the
http://mysqltuner.pl/ script and matched the server memory to the DB size,
but have not done much beyond that.
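
(For anyone wanting to reproduce that check: a minimal invocation looks
roughly like the following; the credential flags are illustrative and may
differ by MySQLTuner version.)

    wget http://mysqltuner.pl/ -O mysqltuner.pl
    perl mysqltuner.pl --user root --pass <password>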

That said, I don't think the problem with the system is the database.  When
running a full scan, we get "average speed:   3018.87 entries/sec", which I
would have thought would be more stressful on the DB than changelog
processing, and yet it runs several times faster.

I also just tried another test: since our changelog backlog was so bad and
getting worse, I figured we could pick up the missed bits with a full scan
later, so I went ahead and cleared the changelogs.  That seemed to let
Robinhood progress much faster, at least for a little while.

[root@sherwood ~]# lfs changelog_clear lfs3-MDT0000 cl1 0
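
(An endrec of 0 means "up through the current last record", so this clears
all outstanding records for reader cl1.)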

Before the clear:

2016/02/04 19:49:21 [38629/1] STATS |    read speed               = 484.42 record/sec
2016/02/04 20:04:21 [38629/1] STATS |    read speed               = 448.04 record/sec
2016/02/04 20:19:21 [38629/1] STATS |    read speed               = 458.21 record/sec
2016/02/04 20:34:21 [38629/1] STATS |    read speed               = 464.67 record/sec
2016/02/04 20:49:21 [38629/1] STATS |    read speed               = 460.15 record/sec
2016/02/04 21:04:21 [38629/1] STATS |    read speed               = 473.90 record/sec

After the clear:

2016/02/04 21:19:21 [38629/1] STATS |    read speed               = 30047.39 record/sec (14756.61 incl. idle time)
2016/02/04 21:34:22 [38629/1] STATS |    read speed               = 16627.84 record/sec (942.24 incl. idle time)
2016/02/04 21:49:22 [38629/1] STATS |    read speed               = 1552.43 record/sec (1216.07 incl. idle time)
2016/02/04 22:04:22 [38629/1] STATS |    read speed               = 2077.57 record/sec (1041.09 incl. idle time)
2016/02/04 22:19:22 [38629/1] STATS |    read speed               = 3987.93 record/sec (908.36 incl. idle time)
2016/02/04 22:34:22 [38629/1] STATS |    read speed               = 2482.72 record/sec (918.61 incl. idle time)
2016/02/04 22:49:22 [38629/1] STATS |    read speed               = 1997.80 record/sec (854.61 incl. idle time)
2016/02/04 23:04:22 [38629/1] STATS |    read speed               = 2259.46 record/sec (1031.82 incl. idle time)
2016/02/04 23:19:22 [38629/1] STATS |    read speed               = 1521.73 record/sec (953.62 incl. idle time)
2016/02/04 23:34:22 [38629/1] STATS |    read speed               = 1321.58 record/sec (879.58 incl. idle time)

Current counters:

[root@lfs-mds-3-1 ~]# cat /proc/fs/lustre/mdd/*MDT*/changelog_users
current index: 591196749
ID    index
cl1   591164109

So, a backlog of only 32640 (591196749 - 591164109) after running for 2.5
hours, which is WAY better than it was before.

That sounds a lot like https://jira.hpdd.intel.com/browse/LU-5405.  So,
just to be sure, I went back and verified our lustre server source...

[root@lfs-mds-3-1 lustre-2.5.37.ddn1]# grep -n -m 1 -C 2 cfs_list_add lustre/obdclass/llog_cat.c
202-
203-    down_write(&cathandle->lgh_lock);
204:    cfs_list_add_tail(&loghandle->u.phd.phd_entry, &cathandle->u.chd.chd_head);
205-    up_write(&cathandle->lgh_lock);
206-

...and that looks like we DO have the fix.


So, is there perhaps another bug here in changelog processing?

Other tuning that I'm missing?

Thanks,
Nathan


On Thu, Feb 4, 2016 at 1:57 PM, Colin Faber <[email protected]> wrote:

> Hi Nathan,
>
> I'm actively working on a similar type of benchmarking activity. From what
> I've found so far, most of my slowdowns are within the database processing
> itself. This can be observed by utilizing DB benchmarking tools (sysbench
> with OLTP testing).
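>
> (For reference, a minimal OLTP run looks roughly like the following; this
> is sysbench 0.4.x syntax, and the table size, credentials, and thread
> count are just placeholders:)
>
>     sysbench --test=oltp --mysql-user=root --mysql-db=sbtest \
>         --oltp-table-size=1000000 prepare
>     sysbench --test=oltp --mysql-user=root --mysql-db=sbtest \
>         --oltp-table-size=1000000 --oltp-read-only=on \
>         --num-threads=8 --max-time=60 --max-requests=0 run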
>
> From what I've found, Robinhood usually sees double the best-case
> performance (transactions per second) on a complex, read-only workload over
> a million-record set.
>
> These rates are pretty bad, even on beefy hardware. Once you've solved your
> database performance problems, you get to move on to issues with FID
> lookup. There are a few tickets (which I don't have off hand) to improve
> this (multiple FID lookups per request, etc.), but it's still slow.
>
> In general, though, changelog reading shouldn't be a bottleneck; even on my
> single-processor E5-2609-based system I'm able to read almost 100k records
> per second.
>
> -cf
>
>
> On Thu, Feb 4, 2016 at 1:40 PM, Nathan Dauchy - NOAA Affiliate <
> [email protected]> wrote:
>
>> Greetings Robinhood Developers and Users,
>>
>> It seems as though Robinhood changelog processing is having trouble
>> keeping up on our system.  From reading mailing list threads, I believe we
>> should be expecting higher processing rates.  However, trials of various
>> suggestions have not yielded much better performance.  Does anyone have
>> guidance on how to identify exactly where the bottleneck is, or suggestions
>> on how to speed things up?
>>
>> Lustre servers are running 2.5.37-ddn1 (which *does* appear to have the
>> changelog ordering patch in LU-5405).
>>
>> Robinhood server is on CentOS-6.7, running lustre-2.5.41-ddn3, and using
>> robinhood-tmpfs-2.5.5-2.lustre2.5.el6.x86_64.  Backend disk is just
>> internal RAID10 SAS, but the DB pretty much fits in memory.
>>
>> Here is how we are measuring the "backlog":
>>
>> [root@lfs-mds-3-1 ~]# cat /proc/fs/lustre/mdd/*MDT*/changelog_users;
>> sleep 60; cat /proc/fs/lustre/mdd/*MDT*/changelog_users
>> current index: 576107893
>> ID    index
>> cl1   566324338
>> current index: 576145483
>> ID    index
>> cl1   566354338
>>
>> 576107893-566324338 = 9783555
>> 576145483-566354338 = 9791145
>> So you can see that in this case robinhood is falling behind, though
>> occasional periods of lower user activity let us start to catch up again
>> every once in a while.
>>
>> (566354338-566324338)/60 = 500
>> So it looks like we are processing just 500 records per second.
>> Strangely, this number is very consistent when re-tested!
>>
>>
>> Here is how the processing rates look from the server side:
>>
>> [root@sherwood ~]# tac /db/robinhood/log/robinhood_lfs3.log | grep -m 1
>> -B 35 -A 4 "STATS | ChangeLog reader" | tac
>> 2016/02/04 18:37:34 [3291/1] STATS | ==================== Dumping stats
>> at 2016/02/04 18:37:34 =====================
>> 2016/02/04 18:37:34 [3291/1] STATS | ======== General statistics =========
>> 2016/02/04 18:37:34 [3291/1] STATS | Daemon start time: 2016/02/01
>> 19:37:12
>> 2016/02/04 18:37:34 [3291/1] STATS | Started modules: log_reader
>> 2016/02/04 18:37:34 [3291/1] STATS | ChangeLog reader #0:
>> 2016/02/04 18:37:34 [3291/1] STATS |    fs_name    =   lfs3
>> 2016/02/04 18:37:34 [3291/1] STATS |    mdt_name   =   MDT0000
>> 2016/02/04 18:37:34 [3291/1] STATS |    reader_id  =   cl1
>> 2016/02/04 18:37:34 [3291/1] STATS |    records read        = 208328153
>> 2016/02/04 18:37:34 [3291/1] STATS |    interesting records = 129868456
>> 2016/02/04 18:37:34 [3291/1] STATS |    suppressed records  = 78459697
>> 2016/02/04 18:37:34 [3291/1] STATS |    records pending     = 999
>> 2016/02/04 18:37:34 [3291/1] STATS |    last received            =
>> 2016/02/04 18:37:34
>> 2016/02/04 18:37:34 [3291/1] STATS |    last read record time    =
>> 2016/02/04 17:34:23.559036
>> 2016/02/04 18:37:34 [3291/1] STATS |    last read record id      =
>> 564497302
>> 2016/02/04 18:37:34 [3291/1] STATS |    last pushed record id    =
>> 564496173
>> 2016/02/04 18:37:34 [3291/1] STATS |    last committed record id =
>> 564485779
>> 2016/02/04 18:37:34 [3291/1] STATS |    last cleared record id   =
>> 564481847
>> 2016/02/04 18:37:34 [3291/1] STATS |    read speed               = 492.57
>> record/sec
>> 2016/02/04 18:37:34 [3291/1] STATS |    processing speed ratio   = 0.33
>> 2016/02/04 18:37:34 [3291/1] STATS |    status                   = busy
>> 2016/02/04 18:37:34 [3291/1] STATS |    ChangeLog stats:
>> 2016/02/04 18:37:34 [3291/1] STATS |    MARK: 0, CREAT: 17776256, MKDIR:
>> 788677, HLINK: 3212, SLINK: 2422002, MKNOD: 0, UNLNK: 25982485
>> 2016/02/04 18:37:34 [3291/1] STATS |    RMDIR: 763162, RENME: 6482894,
>> RNMTO: 0, OPEN: 0, CLOSE: 0, LYOUT: 300, TRUNC: 78659450
>> 2016/02/04 18:37:34 [3291/1] STATS |    SATTR: 75397226, XATTR: 26767,
>> HSM: 0, MTIME: 25510, CTIME: 213, ATIME: 0
>> 2016/02/04 18:37:34 [3291/1] STATS | ==== EntryProcessor Pipeline Stats
>> ===
>> 2016/02/04 18:37:34 [3291/1] STATS | Idle threads: 8
>> 2016/02/04 18:37:34 [3291/1] STATS | Id constraints count: 9168 (hash
>> min=0/max=10/avg=0.6)
>> 2016/02/04 18:37:34 [3291/1] STATS | Name constraints count: 9056 (hash
>> min=0/max=41/avg=0.6)
>> 2016/02/04 18:37:34 [3291/1] STATS | Stage              | Wait | Curr |
>> Done |     Total | ms/op |
>> 2016/02/04 18:37:34 [3291/1] STATS |  0: GET_FID        |    0 |    0 |
>>  0 |         0 |  0.00 |
>> 2016/02/04 18:37:34 [3291/1] STATS |  1: GET_INFO_DB    |   81 |    0 |
>> 8130 | 136887093 |  1.63 |
>> 2016/02/04 18:37:34 [3291/1] STATS |  2: GET_INFO_FS    |    0 |    0 |
>>  0 | 103122533 |  3.70 |
>> 2016/02/04 18:37:34 [3291/1] STATS |  3: REPORTING      |    0 |    0 |
>>  0 |  90558068 |  0.01 |
>> 2016/02/04 18:37:34 [3291/1] STATS |  4: PRE_APPLY      |    0 |    0 |
>>  0 | 113372336 |  0.00 |
>> 2016/02/04 18:37:34 [3291/1] STATS |  5: DB_APPLY       |  952 |    1 |
>>  3 | 113371383 |  1.19 | 63.30% batched (avg batch size: 8.4)
>> 2016/02/04 18:37:34 [3291/1] STATS |  6: CHGLOG_CLR     |    0 |    0 |
>>  0 | 136878007 |  0.01 |
>> 2016/02/04 18:37:34 [3291/1] STATS |  7: RM_OLD_ENTRIES |    0 |    0 |
>>  0 |         0 |  0.00 |
>> 2016/02/04 18:37:34 [3291/1] STATS | DB ops:
>> get=103359170/ins=9377195/upd=81180648/rm=22813326
>> 2016/02/04 18:37:34 [3291/1] STATS | --- Pipeline stage details ---
>>
>>
<snip>

>> NOTE: we can *temporarily* improve the GET_INFO_FS timings by stopping,
>> unmounting, remounting, and restarting...
>>
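>> (i.e., roughly this sequence; the init-script name below is just an
>> example, and the mount source is as shown in the df output further down:)
>>
>>     service robinhood stop
>>     umount /mnt/lfs3
>>     mount -t lustre 10.179.16.132@o2ib:10.179.16.131@o2ib:/lfs3 /mnt/lfs3
>>     service robinhood start
>>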
>> [root@sherwood ~]# grep ": GET_INFO_FS "
>> /db/robinhood/log/robinhood_lfs3.log | tail -n 6
>> 2016/02/04 19:07:34 [3291/1] STATS |  2: GET_INFO_FS    |    0 |    0 |
>>  0 | 103164823 |  3.70 |
>> 2016/02/04 19:22:34 [3291/1] STATS |  2: GET_INFO_FS    |    0 |    0 |
>>  0 | 103185647 |  3.70 |
>> 2016/02/04 19:30:03 [3291/2] STATS |  2: GET_INFO_FS    |    0 |    0 |
>>  0 | 103197626 |  3.70 |
>> 2016/02/04 19:49:21 [38629/1] STATS |  2: GET_INFO_FS    |    0 |    0 |
>>    0 |     18409 |  1.87 |
>> 2016/02/04 20:04:21 [38629/1] STATS |  2: GET_INFO_FS    |    0 |    0 |
>>    0 |     34587 |  1.96 |
>> 2016/02/04 20:19:21 [38629/1] STATS |  2: GET_INFO_FS    |    0 |    0 |
>>    0 |     53473 |  2.19 |
>>
>>
>> On the other hand, when using an interactive scan, we get more than 3x
>> better processing rates:
>>
>> [root@sherwood ~]# rbh-report -a -f lfs3
>>     Last filesystem scan:
>>             entries scanned: 135773284
>>             average speed:   3018.87 entries/sec
>>
>> The filesystem in question is:
>>
>> [root@sherwood ~]# rbh-report -i -f lfs3
>> Using config file '/etc/robinhood.d/tmpfs/lfs3.conf'.
>> type    ,      count,     volume,   avg_size
>> symlink ,    4819982,  399.90 MB,         87
>> dir     ,    5413496,   29.48 GB,    5.71 KB
>> file    ,  125761404,    1.04 PB,    8.92 MB
>> sock    ,          2,          0,          0
>>
>> Total: 135994884 entries, 1175890771195111 bytes (1.04 PB)
>>
>> [root@sherwood ~]# df -h /mnt/lfs3
>> Filesystem            Size  Used Avail Use% Mounted on
>> 10.179.16.132@o2ib:10.179.16.131@o2ib:/lfs3
>>                       2.1P  1.1P  970T  53% /mnt/lfs3
>>
>>
>> Some of our current tuning attempts, based on previous recommendations on
>> this list, are:
>>
>> * sysctl vm.vfs_cache_pressure = 150
>>
>> * MySQL config (/etc/my.cnf):
>>     innodb_thread_concurrency=32
>>     innodb_max_dirty_pages_pct=15
>>     max_connections=256
>>     query_cache_size=8M
>>     thread_cache_size=4
>>     innodb_buffer_pool_size=240G (~85% of actual memory)
>>
>> * Robinhood config:
>>     batch_ack_count = 10000 ;
>>     force_polling    = ON ;
>>     polling_interval = 1s ;
>>     queue_max_size   = 1000 ;   # (could be 2x larger?)
>>     queue_max_age    = 5s ;       # (could be 2x larger?)
>>     queue_check_interval = 1s ;
>>
>> * Lustre parameters:
>>     llite.*.statahead_max=4
>>     mdc.*.max_rpcs_in_flight=64
>>     ldlm.namespaces.*osc*.lru_max_age=1200
>>     ldlm.namespaces.*osc*.lru_size=100
>>
>> * Dropping inode cache daily, with:
>>     echo 2 > /proc/sys/vm/drop_caches   # (could be hourly?)
>>     # (would it be better to run "lru_size=clear" hourly?)
>>
>> * Tuning the ko2iblnd module parameter "peer_credits" has NOT yet been
>> attempted (unclear on what a good value is); a syntax sketch follows the
>> output below.
>>     [root@lfs-mds-3-1 ~]# cat /proc/sys/lnet/peers | egrep
>> "^(nid|10.179.17.41@o2ib)"
>>     nid                      refs state  last   max   rtr   min    tx
>> min queue
>>     10.179.17.41@o2ib           1    NA    -1     8     8     8     8
>> -91 0
>>     [root@sherwood ~]# cat /proc/sys/lnet/peers | egrep
>> "^(nid|10.179.16.131@o2ib)"
>>     nid                      refs state  last   max   rtr   min    tx
>> min queue
>>     10.179.16.131@o2ib          1    NA    -1     8     8     8     8
>> -1188 0
>>
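>> (If we do try the peer_credits tuning above, my understanding is that it
>> is set as a module option on both the client and the MDS, e.g. in an
>> /etc/modprobe.d/ file; the values below are placeholders, not a
>> recommendation:)
>>
>>     options ko2iblnd peer_credits=128 credits=1024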
>>
>> Thanks for any suggestions!!!
>>
>> -Nathan
>>
>>