On Aug 2, 2014, at 6:55 AM, Chris Ferebee via smartos-discuss 
<[email protected]> wrote:

> Robert,
> 
> The script you suggested outputs the following, with the server mostly idle 
> and one OS X client spinning in the Finder, doing its thing to cause the 
> flood of getcwd() calls.
> 
> # dtrace -n 'fbt::dnlc_reverse_lookup:entry{ self->t = timestamp; }' -n 
> 'fbt::dnlc_reverse_lookup:return/self->t/{ @[arg1 == 0 ? 0 : 1] = 
> quantize(timestamp - self->t); self->t = 0; }'
> dtrace: description 'fbt::dnlc_reverse_lookup:entry' matched 1 probe
> dtrace: description 'fbt::dnlc_reverse_lookup:return' matched 1 probe
> 
> ^C
> 
>        1
>           value  ------------- Distribution ------------- count    
>         1048576 |                                         0        
>         2097152 |@@@@@@@@                                 240      
>         4194304 |                                         8        
>         8388608 |@@@@@@@@@@@@@@@@@@@@@                    636      
>        16777216 |@@@@@@@@@@@                              348      
>        33554432 |                                         0        
> 
>        0
>           value  ------------- Distribution ------------- count    
>         8388608 |                                         0        
>        16777216 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 971      
>        33554432 |                                         2        
>        67108864 |                                         0        
> 
> 
> I don't yet understand how this relates to the results from Ralph's afp.d 
> Dtrace script, which shows getcwd() taking either ~ 50 us or 250 ms, with a 
> 3:1 bimodal distribution. Does getcwd() call dnlc_reverse_lookup always, or 
> just sometimes?
> 
> How might the size of the dnlc affect things? I understand it can be adjusted 
> by putting
> 
>       set ncsize=nnn
> 
> into /etc/system, but I have also seen suggestions that it can be dynamically 
> tuned like this:
> 
> # echo dnlc_max_nentries/W0t10485760 | mdb -kw
> # echo ncsize/W0t5240000 | mdb -kw
> 
> (Is that safe or advisable? Does it work? I'm hesitant to try, but it would 
> be convenient not to reboot the server, not to mention the difficulty of 
> modifying /etc/system on SmartOS.)
> 
> Currently, ncsize is set to 444,719 (SmartOS default?), and dnlc_nentries 
> seems to slowly fluctuate between 200,000 and 300,000.
> 
> # echo ncsize/D | mdb -kw
> ncsize: 444719                # and similarly for
> dnlc_nentries: 272139
> dnlc_max_nentries: 889438                    
> dnlc_nentries_low_water: 440316
> 
> Dnlc cache hits seem to be around 50% per
> 
> # vmstat -s
> 5079608454 total name lookups (cache hits 50%)
> 
> and
> 
> # kstat -n dnlcstats
> module: unix                            instance: 0     
> name:   dnlcstats                       class:    misc
>        crtime                          8,600642349
>        dir_add_abort                   0
>        dir_add_max                     0
>        dir_add_no_memory               0
>        dir_cached_current              2
>        dir_cached_total                2
>        dir_entries_cached_current      1094
>        dir_fini_purge                  0
>        dir_hits                        0
>        dir_misses                      19498
>        dir_reclaim_any                 0
>        dir_reclaim_last                0
>        dir_remove_entry_fail           0
>        dir_remove_space_fail           0
>        dir_start_no_memory             0
>        dir_update_fail                 0
>        double_enters                   14
>        enters                          1858063
>        hits                            2546253549
>        misses                          2531297654
>        negative_cache_hits             22475516
>        pick_free                       0
>        pick_heuristic                  995275
>        pick_last                       3997

pick_heuristic and pick_last are the import indicators that your DLNC  could be
too small. Ideally, these are 0.

Somewhere around here I've got a dnlcstat... there is one in the dtrace toolkit,
but you can also do one using just kstats instead.
 -- richard


>        purge_all                       0
>        purge_fs1                       0
>        purge_total_entries             1786
>        purge_vfs                       44
>        purge_vp                        67
>        snaptime                        3016218,120526680
> 
> Thanks for insights so far!
> 
> Best,
> Chris
> 
> 
> Am 01.08.2014 um 17:07 schrieb Robert Mustacchi via smartos-discuss 
> <[email protected]>:
> 
>> On 07/31/2014 11:15 PM, Chris Ferebee via smartos-discuss wrote:
>>> Hi Robert,
>>> 
>>> The flame graph is here:
>>> 
>>>  <http://fere.be/p/netatalk-flamegraph.svg>
>>> 
>>> with corresponding output from Ralph's Dtrace script here:
>>> 
>>>  <http://fere.be/p/netatalk.afp.d.out>
>>> 
>>> The raw kernel Dtrace output of
>>> 
>>>  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } 
>>> tick-60s { exit(0); }' -o netatalk.kern_stacks-dtrace.out
>>> 
>>> is here:
>>> 
>>>  <http://fere.be/p/netatalk.kern_stacks-dtrace.out>
>>> 
>>> So if I understand things up to this point, the kernel is spending a lot of 
>>> time in dnlc_reverse_lookup. Does that tell you anything?
>> 
>> Hey Chris,
>> 
>> I'm not terribly familiar with how the dnlc is designed. So what'd be
>> interesting here is to correlate this with the number of hits versus
>> misses in the dnlc.
>> 
>> The questions I want to answer are something along the lines of:
>> 
>> o Are we spending a lot of time in the dnlc just to then miss?
>> o Are we spending a lot of time in the dnlc scanning it because it's
>> just really large and we're not hashing across it well?
>> o Is something else entirely going on?
>> 
>> Of course, the comment of the function has the following:
>> 
>> /*
>> * Perform a reverse lookup in the DNLC.  This will find the first
>> occurrence of
>> * the vnode.  If successful, it will return the vnode of the parent,
>> and the
>> * name of the entry in the given buffer.  If it cannot be found, or the
>> buffer
>> * is too small, then it will return NULL.  Note that this is a highly
>> * inefficient function, since the DNLC is constructed solely for forward
>> * lookups.
>> */
>> 
>> Which doesn't bode well. But perhaps, what I might go and do is
>> something like:
>> 
>> dtrace -n 'fbt::dnlc_reverse_lookup:entry{ self->t = timestamp; }' -n
>> 'fbt::dnlc_reverse_lookup:return/self->t/{ @[arg1 == 0 ? 0 : 1] =
>> quantize(timestamp - self->t); self->t = 0; }'
>> 
>> Robert
>> 
>>>> Am 31.07.2014 um 17:29 schrieb Robert Mustacchi <[email protected]>:
>>>> 
>>>>> On 07/31/2014 08:15 AM, Chris Ferebee via smartos-discuss wrote:
>>>>> I'm trying to debug performance issues on a SmartOS 
>>>>> joyent_20140613T024634Z server that is running netatalk in a zone.
>>>>> 
>>>>> After some investigation with the help of netatalk developer Ralph Böhme, 
>>>>> I find that the netatalk afpd is spending a very long time in the syscall
>>>>> 
>>>>>  getcwd()
>>>>> 
>>>>> in some cases. Ralph provided a Dtrace script which breaks down the time 
>>>>> spent in various syscalls, and getcwd() seems to be the major bottleneck 
>>>>> here. I've attached an excerpt of the Dtrace analysis below.
>>>>> 
>>>>> Is there any reason why getcwd() should take a long time? Disk I/O does 
>>>>> not appear to be an issue. For reference, I got similar results before 
>>>>> and after starting a zpool scrub on the zones pool, and the scrub is 
>>>>> proceeding at 940 MB/s.
>>>>> 
>>>>> Further background on the issue is discussed here:
>>>>> 
>>>>>  <http://sourceforge.net/p/netatalk/mailman/message/32660961/>
>>>>> 
>>>>> and the full output of the Dtrace script is here:
>>>>> 
>>>>>  <https://gist.github.com/ferebee/99016ccd293b3f616c77>
>>>>> 
>>>>> All sugestions are most welcome!
>>>> 
>>>> Hey Chris,
>>>> 
>>>> I'd suggest gathering a kernel flame graph while you're running the
>>>> test. It's not immediately obvious to me why getcwd() would be hot, but
>>>> what we're seeing is that with getcwd() you have a bimodal distribution.
>>>> So you have the ~100 entries that are taking about 250ms (I'm fairly
>>>> certain I read the script correctly and it saves everything in us).
>>>> 
>>>> It also appears that the kernel caches the value of the current working
>>>> directory, therefore it might be the case that these longer ones are
>>>> when you have that cache miss because the working directory changed.
>>>> 
>>>> What I might suggest as a next step is making a kernel flame graph to
>>>> correspond with this workload and run it concurrently with the script.
>>>> That might allow us to better understand what's going on. You can
>>>> generate a flame graph with https://github.com/brendangregg/flamegraph.
>>>> 
>>>> Robert
>>> 
>>> 
>>> 
>>> -------------------------------------------
>>> smartos-discuss
>>> Archives: https://www.listbox.com/member/archive/184463/=now
>>> RSS Feed: 
>>> https://www.listbox.com/member/archive/rss/184463/21483261-4b78dd38
>>> Modify Your Subscription: https://www.listbox.com/member/?&;
>>> Powered by Listbox: http://www.listbox.com
>>> 
>> 
>> 
>> 
>> -------------------------------------------
>> smartos-discuss
>> Archives: https://www.listbox.com/member/archive/184463/=now
>> RSS Feed: https://www.listbox.com/member/archive/rss/184463/24804823-eebbfb1e
>> Modify Your Subscription: https://www.listbox.com/member/?&;
>> Powered by Listbox: http://www.listbox.com
> 
> 
> 
> -------------------------------------------
> smartos-discuss
> Archives: https://www.listbox.com/member/archive/184463/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/184463/21953302-fd56db47
> Modify Your Subscription: https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com

Reply via email to