Re: [smartos-discuss] Problem with netatalk, very long response times from getcwd()

Chris Ferebee via smartos-discuss Sat, 02 Aug 2014 06:56:07 -0700

Robert,

The script you suggested outputs the following, with the server mostly idle and 
one OS X client spinning in the Finder, doing its thing to cause the flood of 
getcwd() calls.


# dtrace -n 'fbt::dnlc_reverse_lookup:entry{ self->t = timestamp; }' -n 
'fbt::dnlc_reverse_lookup:return/self->t/{ @[arg1 == 0 ? 0 : 1] = 
quantize(timestamp - self->t); self->t = 0; }'
dtrace: description 'fbt::dnlc_reverse_lookup:entry' matched 1 probe
dtrace: description 'fbt::dnlc_reverse_lookup:return' matched 1 probe

^C

        1
           value  ------------- Distribution ------------- count    
         1048576 |                                         0        
         2097152 |@@@@@@@@                                 240      
         4194304 |                                         8        
         8388608 |@@@@@@@@@@@@@@@@@@@@@                    636      
        16777216 |@@@@@@@@@@@                              348      
        33554432 |                                         0        

        0
           value  ------------- Distribution ------------- count    
         8388608 |                                         0        
        16777216 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 971      
        33554432 |                                         2        
        67108864 |                                         0        


I don't yet understand how this relates to the results from Ralph's afp.d 
Dtrace script, which shows getcwd() taking either ~ 50 us or 250 ms, with a 3:1 
bimodal distribution. Does getcwd() call dnlc_reverse_lookup always, or just 
sometimes?

How might the size of the dnlc affect things? I understand it can be adjusted 
by putting

        set ncsize=nnn

into /etc/system, but I have also seen suggestions that it can be dynamically 
tuned like this:

# echo dnlc_max_nentries/W0t10485760 | mdb -kw
# echo ncsize/W0t5240000 | mdb -kw

(Is that safe or advisable? Does it work? I'm hesitant to try, but it would be 
convenient not to reboot the server, not to mention the difficulty of modifying 
/etc/system on SmartOS.)
 
Currently, ncsize is set to 444,719 (SmartOS default?), and dnlc_nentries seems 
to slowly fluctuate between 200,000 and 300,000.

# echo ncsize/D | mdb -kw
ncsize: 444719          # and similarly for
dnlc_nentries: 272139
dnlc_max_nentries: 889438                    
dnlc_nentries_low_water: 440316

Dnlc cache hits seem to be around 50% per

# vmstat -s
5079608454 total name lookups (cache hits 50%)

and

# kstat -n dnlcstats
module: unix                            instance: 0     
name:   dnlcstats                       class:    misc
        crtime                          8,600642349
        dir_add_abort                   0
        dir_add_max                     0
        dir_add_no_memory               0
        dir_cached_current              2
        dir_cached_total                2
        dir_entries_cached_current      1094
        dir_fini_purge                  0
        dir_hits                        0
        dir_misses                      19498
        dir_reclaim_any                 0
        dir_reclaim_last                0
        dir_remove_entry_fail           0
        dir_remove_space_fail           0
        dir_start_no_memory             0
        dir_update_fail                 0
        double_enters                   14
        enters                          1858063
        hits                            2546253549
        misses                          2531297654
        negative_cache_hits             22475516
        pick_free                       0
        pick_heuristic                  995275
        pick_last                       3997
        purge_all                       0
        purge_fs1                       0
        purge_total_entries             1786
        purge_vfs                       44
        purge_vp                        67
        snaptime                        3016218,120526680

Thanks for insights so far!

Best,
Chris


Am 01.08.2014 um 17:07 schrieb Robert Mustacchi via smartos-discuss 
<[email protected]>:

> On 07/31/2014 11:15 PM, Chris Ferebee via smartos-discuss wrote:
>> Hi Robert,
>> 
>> The flame graph is here:
>> 
>>   <http://fere.be/p/netatalk-flamegraph.svg>
>> 
>> with corresponding output from Ralph's Dtrace script here:
>> 
>>   <http://fere.be/p/netatalk.afp.d.out>
>> 
>> The raw kernel Dtrace output of
>> 
>>   dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } 
>> tick-60s { exit(0); }' -o netatalk.kern_stacks-dtrace.out
>> 
>> is here:
>> 
>>   <http://fere.be/p/netatalk.kern_stacks-dtrace.out>
>> 
>> So if I understand things up to this point, the kernel is spending a lot of 
>> time in dnlc_reverse_lookup. Does that tell you anything?
> 
> Hey Chris,
> 
> I'm not terribly familiar with how the dnlc is designed. So what'd be
> interesting here is to correlate this with the number of hits versus
> misses in the dnlc.
> 
> The questions I want to answer are something along the lines of:
> 
> o Are we spending a lot of time in the dnlc just to then miss?
> o Are we spending a lot of time in the dnlc scanning it because it's
> just really large and we're not hashing across it well?
> o Is something else entirely going on?
> 
> Of course, the comment of the function has the following:
> 
> /*
> * Perform a reverse lookup in the DNLC.  This will find the first
> occurrence of
> * the vnode.  If successful, it will return the vnode of the parent,
> and the
> * name of the entry in the given buffer.  If it cannot be found, or the
> buffer
> * is too small, then it will return NULL.  Note that this is a highly
> * inefficient function, since the DNLC is constructed solely for forward
> * lookups.
> */
> 
> Which doesn't bode well. But perhaps, what I might go and do is
> something like:
> 
> dtrace -n 'fbt::dnlc_reverse_lookup:entry{ self->t = timestamp; }' -n
> 'fbt::dnlc_reverse_lookup:return/self->t/{ @[arg1 == 0 ? 0 : 1] =
> quantize(timestamp - self->t); self->t = 0; }'
> 
> Robert
> 
>>> Am 31.07.2014 um 17:29 schrieb Robert Mustacchi <[email protected]>:
>>> 
>>>> On 07/31/2014 08:15 AM, Chris Ferebee via smartos-discuss wrote:
>>>> I'm trying to debug performance issues on a SmartOS 
>>>> joyent_20140613T024634Z server that is running netatalk in a zone.
>>>> 
>>>> After some investigation with the help of netatalk developer Ralph Böhme, 
>>>> I find that the netatalk afpd is spending a very long time in the syscall
>>>> 
>>>>   getcwd()
>>>> 
>>>> in some cases. Ralph provided a Dtrace script which breaks down the time 
>>>> spent in various syscalls, and getcwd() seems to be the major bottleneck 
>>>> here. I've attached an excerpt of the Dtrace analysis below.
>>>> 
>>>> Is there any reason why getcwd() should take a long time? Disk I/O does 
>>>> not appear to be an issue. For reference, I got similar results before and 
>>>> after starting a zpool scrub on the zones pool, and the scrub is 
>>>> proceeding at 940 MB/s.
>>>> 
>>>> Further background on the issue is discussed here:
>>>> 
>>>>   <http://sourceforge.net/p/netatalk/mailman/message/32660961/>
>>>> 
>>>> and the full output of the Dtrace script is here:
>>>> 
>>>>   <https://gist.github.com/ferebee/99016ccd293b3f616c77>
>>>> 
>>>> All sugestions are most welcome!
>>> 
>>> Hey Chris,
>>> 
>>> I'd suggest gathering a kernel flame graph while you're running the
>>> test. It's not immediately obvious to me why getcwd() would be hot, but
>>> what we're seeing is that with getcwd() you have a bimodal distribution.
>>> So you have the ~100 entries that are taking about 250ms (I'm fairly
>>> certain I read the script correctly and it saves everything in us).
>>> 
>>> It also appears that the kernel caches the value of the current working
>>> directory, therefore it might be the case that these longer ones are
>>> when you have that cache miss because the working directory changed.
>>> 
>>> What I might suggest as a next step is making a kernel flame graph to
>>> correspond with this workload and run it concurrently with the script.
>>> That might allow us to better understand what's going on. You can
>>> generate a flame graph with https://github.com/brendangregg/flamegraph.
>>> 
>>> Robert
>> 
>> 
>> 
>> -------------------------------------------
>> smartos-discuss
>> Archives: https://www.listbox.com/member/archive/184463/=now
>> RSS Feed: https://www.listbox.com/member/archive/rss/184463/21483261-4b78dd38
>> Modify Your Subscription: https://www.listbox.com/member/?&;
>> Powered by Listbox: http://www.listbox.com
>> 
> 
> 
> 
> -------------------------------------------
> smartos-discuss
> Archives: https://www.listbox.com/member/archive/184463/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/184463/24804823-eebbfb1e
> Modify Your Subscription: https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com

Re: [smartos-discuss] Problem with netatalk, very long response times from getcwd()

Reply via email to