On Oct 3, 2007, at 10:17 AM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Wed, 03 Oct 2007 09:52 -0500:
Tried that.  :-)  It's more or less the same problem with poll.  The
behavior of poll timings seems a bit less erratic than with epoll,
but the performance degradation is identical.
[..]
It's a PITA to debug, because the servers have to remain running for a
long time (and the clients have to remain mounted) before the problem
becomes visible.  Rob suggested I use strace on the servers to see what
epoll was doing, and that showed some interesting results.
Basically, it looks like epoll_wait takes significantly longer when
clients are doing operations over the VFS rather than with the pvfs2
admin tools.  Also, strace reported epoll_ctl(...,
EPOLL_CTL_ADD, ...) getting called a few times, even for the VFS
ops, and in those cases it's returning EEXIST.

Really?  Poll also behaves the same?  Now I am intrigued.

Heh. That's what it takes huh? I'll have to start adding random poll comments in my emails to get through your filter. ;-)


You can't really tell how long epoll_wait is taking just using
strace, since it will wait until a packet arrives plus this
mysterious extra time.

Can you do something on the server like:

    tcpdump -ttt
    strace -tt -T

to distinguish the two cases of 1) epoll_wait is taking a long time
after the packet shows up at the host, vs 2) the client request
packet is taking a long time to show up.

I'm fairly sure it's case 1). I got dumps off the server while I was doing creates and deletes over the VFS on a system that had been running for a while and exhibited the performance degradation. The delay was between the receipt of the request and the send of the response, so something in the server's handling of the request was slowing it down. At that point Rob suggested I strace the server to see if it was system-call related, and we noticed the behavior with epoll.


If (2), try the same exercise at the client side.

This has been a problem we've been seeing on our BGL system at Argonne for over a year. It's taken me that long to dig into where the degradation occurs; it was tough to pin down, in part because it wasn't clear whether it was a client-side or a server-side problem. On a production system the admins usually just restart everything. After working with one of the admins here, we noticed that restarting the servers seemed to fix it, but then noticed that restarting the client daemon seemed to fix it too.

With BGL, the IO nodes mount the pvfs volume every time they reboot, which is essentially every time a new job runs. So the problem wasn't visible there, but it was possibly the cause (many connections coming and going over time), and it would explain the degradation on the login nodes, which remain up and connected for weeks and months. My guess is that it hasn't been a visible problem for many users because their workloads differ: either they always use the VFS and never the admin tools/MPI-IO, or vice versa. Mixed MPI-IO runs against a mounted pvfs volume should cause the slowdown on the mounted volume, though.


I'm sure some of us will look at the traces and dumps too, if you
send them out.

The traces are huge. :-) On the order of 500MB. I can probably put them on the web somewhere if you really want to sift through them. I also have zoomed-in versions of the plots I sent in the previous email, which I can send. I've attached an example, but I have lots more. :-)

<<inline: Picture 19.png>>


The dumps are not as large.  I'll try to dig them up.
-sam


                -- Pete


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
