Hi Robert,

The flame graph is here:

   <http://fere.be/p/netatalk-flamegraph.svg>

with corresponding output from Ralph's Dtrace script here:

   <http://fere.be/p/netatalk.afp.d.out>

The raw kernel Dtrace output of

   dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } 
tick-60s { exit(0); }' -o netatalk.kern_stacks-dtrace.out

is here:

   <http://fere.be/p/netatalk.kern_stacks-dtrace.out>

So if I understand things up to this point, the kernel is spending a lot of 
time in dnlc_reverse_lookup. Does that tell you anything?

Thanks,
Chris


> Am 31.07.2014 um 17:29 schrieb Robert Mustacchi <[email protected]>:
> 
>> On 07/31/2014 08:15 AM, Chris Ferebee via smartos-discuss wrote:
>> I'm trying to debug performance issues on a SmartOS joyent_20140613T024634Z 
>> server that is running netatalk in a zone.
>> 
>> After some investigation with the help of netatalk developer Ralph Böhme, I 
>> find that the netatalk afpd is spending a very long time in the syscall
>> 
>>    getcwd()
>> 
>> in some cases. Ralph provided a Dtrace script which breaks down the time 
>> spent in various syscalls, and getcwd() seems to be the major bottleneck 
>> here. I've attached an excerpt of the Dtrace analysis below.
>> 
>> Is there any reason why getcwd() should take a long time? Disk I/O does not 
>> appear to be an issue. For reference, I got similar results before and after 
>> starting a zpool scrub on the zones pool, and the scrub is proceeding at 940 
>> MB/s.
>> 
>> Further background on the issue is discussed here:
>> 
>>    <http://sourceforge.net/p/netatalk/mailman/message/32660961/>
>> 
>> and the full output of the Dtrace script is here:
>> 
>>    <https://gist.github.com/ferebee/99016ccd293b3f616c77>
>> 
>> All sugestions are most welcome!
> 
> Hey Chris,
> 
> I'd suggest gathering a kernel flame graph while you're running the
> test. It's not immediately obvious to me why getcwd() would be hot, but
> what we're seeing is that with getcwd() you have a bimodal distribution.
> So you have the ~100 entries that are taking about 250ms (I'm fairly
> certain I read the script correctly and it saves everything in us).
> 
> It also appears that the kernel caches the value of the current working
> directory, therefore it might be the case that these longer ones are
> when you have that cache miss because the working directory changed.
> 
> What I might suggest as a next step is making a kernel flame graph to
> correspond with this workload and run it concurrently with the script.
> That might allow us to better understand what's going on. You can
> generate a flame graph with https://github.com/brendangregg/flamegraph.
> 
> Robert



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com

Reply via email to