On Wed, Nov 26, 2014 at 12:48 PM, Thomas LEIBOVICI <[email protected]>
wrote:
> On 26/11/2014 18:19, Craig Tierney - NOAA Affiliate wrote:
>
> Thomas,
>
> We backported the patch. It was just a one-liner to put changelog
> entries at the tail, versus the head, of the list. After the last catchup
> of the changelogs completed, I created a bunch of new files while robinhood
> was not running. The processing rate is still about 400 entries per
> second. In particular, it looked like it was processing about 1024
> records every 2.5 seconds.
>
> So I looked in the configuration and saw that I had:
>
> # clear changelog every 1024 records:
> batch_ack_count = 1024 ;
>
> Craig,
>
> This is strange. The behavior you describe sounds exactly like the problem
> that the patch is supposed to fix:
> every changelog_clear() call to the MDS stalls changelog delivery for a
> while.
>
> Are there a lot of stacked records? You can see this on the MDS, as far as I
> can remember, in /proc/fs/lustre/*mdd*/changelog_user or something like that,
> where you have the last record id and the last cleared record.
>
>
What I have been doing to determine the changelog processing rate is to use
the changelog_users information. For example:
[root@lfs-mds-2-1 ~]# !cat
cat /proc/fs/lustre/mdd/lfs2-MDT0000/changelog_users ; sleep 30 ; cat
/proc/fs/lustre/mdd/lfs2-MDT0000/changelog_users
current index: 265951473
ID index
cl1 265796018
current index: 265951473
ID index
cl1 265816018
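
The sampling above can be turned into numbers with a short script. This is only a sketch: it assumes the changelog_users layout is exactly what the paste shows (a "current index:" line followed by one "clN index" line per reader), and the function names are made up for illustration.

```python
import re


def parse_changelog_users(text):
    """Return (current_index, {reader_id: index}) from changelog_users text.

    Assumes the layout pasted in this thread; other Lustre versions may differ.
    """
    current = None
    users = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"current index:\s*(\d+)$", line)
        if m:
            current = int(m.group(1))
            continue
        m = re.match(r"(cl\d+)\s+(\d+)$", line)
        if m:
            users[m.group(1)] = int(m.group(2))
    return current, users


def backlog_and_rate(before, after, seconds, user="cl1"):
    """Compare two samples taken `seconds` apart.

    Returns (backlog at second sample, records cleared per second,
    records produced per second).
    """
    cur0, users0 = parse_changelog_users(before)
    cur1, users1 = parse_changelog_users(after)
    backlog = cur1 - users1[user]
    cleared_rate = (users1[user] - users0[user]) / seconds
    produced_rate = (cur1 - cur0) / seconds
    return backlog, cleared_rate, produced_rate


# The two samples pasted above, taken 30 seconds apart.
SAMPLE_BEFORE = """current index: 265951473
ID    index
cl1   265796018
"""
SAMPLE_AFTER = """current index: 265951473
ID    index
cl1   265816018
"""

if __name__ == "__main__":
    backlog, cleared, produced = backlog_and_rate(SAMPLE_BEFORE, SAMPLE_AFTER, 30)
    print(f"backlog={backlog} cleared/s={cleared:.1f} produced/s={produced:.1f}")
```

In practice the two strings would be read from the changelog_users file on the MDS rather than hard-coded.
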
Even though the backlog is 155k changelogs, and the batch_ack_count is
10000, nothing changed over 30 seconds. In the stats, I see:
2014/11/26 20:59:19 [14613/2] STATS | ChangeLog reader #0:
2014/11/26 20:59:19 [14613/2] STATS | fs_name = lfs2
2014/11/26 20:59:19 [14613/2] STATS | mdt_name = MDT0000
2014/11/26 20:59:19 [14613/2] STATS | reader_id = cl1
2014/11/26 20:59:19 [14613/2] STATS | records read = 117135
2014/11/26 20:59:19 [14613/2] STATS | interesting records = 117135
2014/11/26 20:59:19 [14613/2] STATS | suppressed records = 0
2014/11/26 20:59:19 [14613/2] STATS | records pending = 0
2014/11/26 20:59:19 [14613/2] STATS | last received = 2014/11/26 20:59:17
2014/11/26 20:59:19 [14613/2] STATS | last read record time = 2014/11/26 20:55:06.685525
2014/11/26 20:59:19 [14613/2] STATS | last read record id = 265806017
2014/11/26 20:59:19 [14613/2] STATS | last pushed record id = 265806017
2014/11/26 20:59:19 [14613/2] STATS | last committed record id = 265796017
2014/11/26 20:59:19 [14613/2] STATS | last cleared record id = 265796017
2014/11/26 20:59:19 [14613/2] STATS | read speed = 672.94 record/sec (247.03 incl. idle time)
2014/11/26 20:59:19 [14613/2] STATS | processing speed ratio = 20.92
2014/11/26 20:59:19 [14613/2] STATS | status = terminating
2014/11/26 20:59:19 [14613/2] STATS | ChangeLog stats:
2014/11/26 20:59:19 [14613/2] STATS | MARK: 0, CREAT: 44, MKDIR: 0, HLINK: 0, SLINK: 0, MKNOD: 0, UNLNK: 110499, RMDIR: 6592
2014/11/26 20:59:19 [14613/2] STATS | RENME: 0, RNMTO: 0, OPEN: 0, CLOSE: 0, LYOUT: 0, TRUNC: 0, SATTR: 0, XATTR: 0, HSM: 0
2014/11/26 20:59:19 [14613/2] STATS | MTIME: 0, CTIME: 0, ATIME: 0
But right now it seems to be stuck.
Craig
> I don't know why this would slow things down, I thought it was just an
> update optimization. I ran some tests with a different changelog user and
> it seemed dumping the changelogs and updating the position should never be
> a limitation as I was able to grab over 100,000 entries and reset the count
> in a few seconds.
>
> OK.
>
>
> So I updated batch_ack_count to 10,000. Now the change log processing
> rate seemed to go up to 1666 logs/second (over 30 seconds). This is
> better. If the rate is limited by the database performance, then there
> probably isn't much more I can do (comparing to scan rates).
>
> Running "grep STAT" on the robinhood log would help identify the limitation
> you hit.
> If you want to sample stats over a shorter period than the default (which
> is 15 or 20 minutes), you can change "stats_interval" in the config.
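
Beyond a plain grep, the STATS lines can be reduced to a few key figures. The sketch below assumes each log line has the "date time [pid/thread] STATS | key = value" shape seen in the excerpt quoted in this thread, with wrapped lines joined back together; the helper names are hypothetical and not part of robinhood.

```python
import re

# Layout assumed from the log excerpt in this thread.
STATS_LINE = re.compile(r"^\S+ \S+ \[\d+/\d+\] STATS \|\s*(.*)$")


def changelog_reader_stats(log_text):
    """Collect 'key = value' pairs from the STATS lines of a log."""
    stats = {}
    for line in log_text.splitlines():
        m = STATS_LINE.match(line)
        if not m:
            continue  # not a STATS line
        key, sep, value = m.group(1).partition("=")
        if sep:  # skip headings and counter lines that have no '='
            stats[key.strip()] = value.strip()
    return stats


def uncleared_records(stats):
    """Records read by the reader but not yet cleared on the MDS."""
    return int(stats["last read record id"]) - int(stats["last cleared record id"])


def read_speed(stats):
    """Read speed in records per second (first number of the value)."""
    return float(stats["read speed"].split()[0])


# Lines taken from the stats dump in this thread, unwrapped.
SAMPLE_LOG = """\
2014/11/26 20:59:19 [14613/2] STATS | ChangeLog reader #0:
2014/11/26 20:59:19 [14613/2] STATS |    last read record id      = 265806017
2014/11/26 20:59:19 [14613/2] STATS |    last cleared record id   = 265796017
2014/11/26 20:59:19 [14613/2] STATS |    read speed               = 672.94 record/sec (247.03 incl. idle time)
2014/11/26 20:59:19 [14613/2] STATS |    status                   = terminating
"""

if __name__ == "__main__":
    stats = changelog_reader_stats(SAMPLE_LOG)
    print("uncleared:", uncleared_records(stats))
    print("read speed:", read_speed(stats))
```

If a log holds several stats dumps, later values overwrite earlier ones, so the result reflects the most recent dump.
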
>
>
> What do people use for a value of batch_ack_count on large, PB sized,
> filesystems?
>
> I think a good value is a few seconds of changelog processing. So 10k is a
> good value in your case.
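
The "few seconds of changelog processing" rule of thumb is simple arithmetic. Here is a rough sketch of it, using the rates reported earlier in the thread. The function names and the `seconds_per_ack` parameter are illustrative assumptions, not robinhood options.

```python
def seconds_between_acks(batch_ack_count, records_per_second):
    """How often the changelog is cleared at a given processing rate."""
    return batch_ack_count / records_per_second


def suggested_batch_ack_count(records_per_second, seconds_per_ack):
    """Batch size worth roughly `seconds_per_ack` seconds of processing."""
    return max(1, round(records_per_second * seconds_per_ack))


if __name__ == "__main__":
    # 1024 records per ack at about 400 records/s: an ack every ~2.5 s.
    print(seconds_between_acks(1024, 400))
    # 10000 records per ack at about 1666 records/s: an ack every ~6 s.
    print(seconds_between_acks(10000, 1666))
    # Batch size for about 6 s of processing at 1666 records/s.
    print(suggested_batch_ack_count(1666, 6))
```
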
>
>
> Regards
>
>
> Thanks,
> Craig
>
>
> On Tue, Nov 18, 2014 at 3:00 AM, LEIBOVICI Thomas <[email protected]
> > wrote:
>
>> Hi Craig,
>>
>> No, it is not expected to get such a slow processing speed.
>> Depending on the Lustre versions you run, this slow processing may be due
>> to the following Lustre bug:
>>
>> https://jira.hpdd.intel.com/browse/LU-5405
>>
>> It is an MDS fix. For now the fix has only landed in Lustre 2.5.4. I don't
>> know if it can be backported to Lustre 2.4...
>>
>> Regards,
>> Thomas
>>
>>
>> On 11/17/14 21:11, Craig Tierney - NOAA Affiliate wrote:
>>
>> Hi,
>>
>> I have just installed Robinhood 2.5.3 to monitor a Lustre 2.4.3
>> system. The client on the server is running the 2.5.3 version. When I did
>> an initial scan of another test system I saw scan rates of about 1000-2000
>> entries per second. While I had configured robinhood to monitor this new
>> system, the Robinhood server was not running when we started to copy data
>> to the new filesystem. From the changelog statistics, I am about 144 million
>> events behind. Processing the changelogs seems to be going at only 375
>> entries per second.
>>
>> Is this typical? I would have expected the processing of changelog
>> events to be much faster than this or at least as fast as a normal file
>> scan.
>>
>> Thanks,
>> Craig
>>
>>
>>
>> _______________________________________________
>> robinhood-support mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/robinhood-support