On Oct 4, 2007, at 11:12 AM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Wed, 03 Oct 2007 16:31 -0500:
I've placed the big trace I have from the instrumented server at:

http://www.mcs.anl.gov/~slang/sbig.out

I get nothing useful out of this unfortunately.  In fact, I don't
see the big values (0.2 ish) you had in your "Iterations 1-10" plot:

$ sed 15242000q sbig.out | ./epoll-hist.pl | sed 1,2d | awk '{if ($2 > max) {max = $2}} END {print max}'
0.0167419999925187

I've attached a dump of the metadata server while doing 10 creates/
deletes from the VFS, sleeping 4 secs, and then 10 creates/deletes
from the admin tool, sleeping 10 secs, and then repeating that.  The
time differences between VFS and pvfs admin tools are apparent.
[..]
If it's helpful to you, Wireshark (TAFKA Ethereal) has PVFS dissection
out of the box, so if you just load the dump into Wireshark you can
see the PVFS requests and responses, along with the TCP packets.  This
turned out to be really useful to me.

Yeah, Wireshark is cute, having PVFS support.  But I see no correlation
between the timestamps on the network and those in the sbig file.

Those are from runs on completely different systems. Sorry for the confusion.

I was hoping to see a packet arrive in the TCP dump, then see
epoll_wait END within a few tens of microseconds after that, which
would prove that epoll_wait wasn't the problem here.  But the packet
arrivals are pretty far from when sbig says epoll_wait found
something, more than the 10 ms or so of delay you are seeing.

I'll have to look.  It's possible I posted the wrong trace.  :-|


Let me know if you see anything interesting.  I'm going to keep
trying to reproduce this with my synthetic test.

Some observations.  The libc wrapper does nothing special for epoll;
it is just a system call.  In the kernel, the timeout is converted to
ticks.  Seeing the base case of 12 ms in your plots, it is clear you
are using a HZ=250 kernel.  Running HZ=1000 would reduce this to
10 ms when nothing happens, but probably not fix the problem.
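The tick rounding is easy to see from user space.  A rough sketch
(Linux-only, using Python's select.epoll rather than the raw syscall;
the exact rounding depends on the kernel era, since modern kernels use
high-resolution timers for epoll timeouts while jiffy-based kernels
round the sleep up to whole ticks):

```python
import select
import time

# Measure how long epoll_wait actually sleeps for small timeouts on an
# epoll set with no registered fds.  On a jiffy-based HZ=250 kernel the
# sleep is quantized to 4 ms ticks, so small timeouts get rounded up;
# on HZ=1000 the granularity is 1 ms.  (Assumption for illustration:
# recent kernels with hrtimer-based epoll will show little rounding.)
ep = select.epoll()
for timeout_ms in (1, 5, 10):
    start = time.monotonic()
    ep.poll(timeout_ms / 1000.0)   # no fds registered, so this just sleeps
    elapsed_ms = (time.monotonic() - start) * 1000.0
    print(f"requested {timeout_ms} ms, slept {elapsed_ms:.2f} ms")
ep.close()
```

Comparing the requested and actual sleep times across a HZ=250 and a
HZ=1000 kernel should make the tick granularity visible directly.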

Each epoll fd is locked, but we only call epoll_wait from one thread,
so there will be no contention.  epoll_ctl operations are rare enough
that they are not a problem either.

When TCP sees an incoming packet, it invokes the epoll callback that
wakes up the waiting task.  Hard to imagine any delays in here.  Of
course, the polling task may not actually run until it can be
scheduled on a processor.

I'm guessing this is simple CPU contention.  Can you look at the
average run-queue length (load average) during some runs and see if
that seems to be the case?  Might not be fine-grained enough.  Are
there other scheduler tools available that can help with this
approach?  You might change your synthetic test to add some threads
with lots of CPU-hungry tasks and see if that provokes the behavior.

Yeah, that could be, but it doesn't explain why operations from a
newly connected socket don't see the contention, while identical
operations from a long-lived socket do.

Thanks for the pointers.
-sam


For timing tests here, we'll run with a non-threaded server and
disable POSIX locks.  You might experiment with that and see if you
at least get consistent results.

                -- Pete


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
