On 07/31/2014 11:15 PM, Chris Ferebee via smartos-discuss wrote:
> Hi Robert,
>
> The flame graph is here:
>
> <http://fere.be/p/netatalk-flamegraph.svg>
>
> with corresponding output from Ralph's Dtrace script here:
>
> <http://fere.be/p/netatalk.afp.d.out>
>
> The raw kernel Dtrace output of
>
> dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); }
> tick-60s { exit(0); }' -o netatalk.kern_stacks-dtrace.out
>
> is here:
>
> <http://fere.be/p/netatalk.kern_stacks-dtrace.out>
>
> So if I understand things up to this point, the kernel is spending a lot of
> time in dnlc_reverse_lookup. Does that tell you anything?
Hey Chris,
I'm not terribly familiar with how the dnlc is designed. So what'd be
interesting here is to correlate this with the number of hits versus
misses in the dnlc.
The questions I want to answer are something along the lines of:
o Are we spending a lot of time in the dnlc just to then miss?
o Are we spending a lot of time in the dnlc scanning it because it's
just really large and we're not hashing across it well?
o Is something else entirely going on?
Of course, the comment of the function has the following:
/*
* Perform a reverse lookup in the DNLC. This will find the first
occurrence of
* the vnode. If successful, it will return the vnode of the parent,
and the
* name of the entry in the given buffer. If it cannot be found, or the
buffer
* is too small, then it will return NULL. Note that this is a highly
* inefficient function, since the DNLC is constructed solely for forward
* lookups.
*/
Which doesn't bode well. But perhaps, what I might go and do is
something like:
dtrace -n 'fbt::dnlc_reverse_lookup:entry{ self->t = timestamp; }' -n
'fbt::dnlc_reverse_lookup:return/self->t/{ @[arg1 == 0 ? 0 : 1] =
quantize(timestamp - self->t); self->t = 0; }'
Robert
>> Am 31.07.2014 um 17:29 schrieb Robert Mustacchi <[email protected]>:
>>
>>> On 07/31/2014 08:15 AM, Chris Ferebee via smartos-discuss wrote:
>>> I'm trying to debug performance issues on a SmartOS joyent_20140613T024634Z
>>> server that is running netatalk in a zone.
>>>
>>> After some investigation with the help of netatalk developer Ralph Böhme, I
>>> find that the netatalk afpd is spending a very long time in the syscall
>>>
>>> getcwd()
>>>
>>> in some cases. Ralph provided a Dtrace script which breaks down the time
>>> spent in various syscalls, and getcwd() seems to be the major bottleneck
>>> here. I've attached an excerpt of the Dtrace analysis below.
>>>
>>> Is there any reason why getcwd() should take a long time? Disk I/O does not
>>> appear to be an issue. For reference, I got similar results before and
>>> after starting a zpool scrub on the zones pool, and the scrub is proceeding
>>> at 940 MB/s.
>>>
>>> Further background on the issue is discussed here:
>>>
>>> <http://sourceforge.net/p/netatalk/mailman/message/32660961/>
>>>
>>> and the full output of the Dtrace script is here:
>>>
>>> <https://gist.github.com/ferebee/99016ccd293b3f616c77>
>>>
>>> All sugestions are most welcome!
>>
>> Hey Chris,
>>
>> I'd suggest gathering a kernel flame graph while you're running the
>> test. It's not immediately obvious to me why getcwd() would be hot, but
>> what we're seeing is that with getcwd() you have a bimodal distribution.
>> So you have the ~100 entries that are taking about 250ms (I'm fairly
>> certain I read the script correctly and it saves everything in us).
>>
>> It also appears that the kernel caches the value of the current working
>> directory, therefore it might be the case that these longer ones are
>> when you have that cache miss because the working directory changed.
>>
>> What I might suggest as a next step is making a kernel flame graph to
>> correspond with this workload and run it concurrently with the script.
>> That might allow us to better understand what's going on. You can
>> generate a flame graph with https://github.com/brendangregg/flamegraph.
>>
>> Robert
>
>
>
> -------------------------------------------
> smartos-discuss
> Archives: https://www.listbox.com/member/archive/184463/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/184463/21483261-4b78dd38
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
>
-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription:
https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com