On Oct 3, 2007, at 10:17 AM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Wed, 03 Oct 2007 09:52 -0500:
Tried that.  :-)  It's more or less the same problem with poll.  The
behavior of poll timings seems a bit less erratic than with epoll,
but the performance degradation is identical.
[..]
It's a PITA to debug, because the servers have to remain running for a
long time (and the clients have to remain mounted) before the problem
becomes visible.  Rob suggested I use strace on the servers to see what
epoll was doing, and that showed some interesting results.
Basically, it looks like epoll_wait takes significantly longer when
clients are doing operations over the VFS rather than with the pvfs2
admin tools.  Also, strace reported epoll_ctl(...,
EPOLL_CTL_ADD, ...) getting called a few times, even for the VFS
ops, and in those cases it's returning EEXIST.

Really?  Poll also behaves the same?  Now I am intrigued.

Heh. That's what it takes huh? I'll have to start adding random poll comments in my emails to get through your filter. ;-)


You can't really tell how long epoll_wait is taking just using
strace, since it will wait until a packet arrives plus this
mysterious extra time.

Can you do something on the server like:

    tcpdump -ttt
    strace -tt -T

to distinguish the two cases of 1) epoll_wait is taking a long time
after the packet shows up at the host, vs 2) the client request
packet is taking a long time to show up.

I'm fairly sure it's case 1). I got dumps off the server while I was doing creates and deletes over the VFS on a system that had been running for a while and exhibited the performance degradation. The delay was between the receipt of the request and the send of the response, so something in the server's handling of the request was slowing it down. At that point Rob suggested I strace the server to see if it was system-call related, and we noticed the behavior with epoll.


If (2), try the same exercise at the client side.

This has been a problem we've been seeing on our BGL system at Argonne for over a year. It's taken me that long to dig into where the degradation occurs; it was tough to pin down, in part because it wasn't clear whether it was a client-side or a server-side problem. On a production system the admins usually just restart everything. After working with one of the admins here, we noticed that restarting the servers seemed to fix it, but then noticed that restarting the client daemon seemed to fix it too.

With BGL, the IO nodes mount the pvfs volume every time they reboot, which is essentially every time a new job runs. So the problem wasn't visible there, but it was possibly the cause (many connections coming and going over time), and it would explain the degradation on the login nodes, which remain up and connected for weeks and months. My guess is that it hasn't been a visible problem for many users because their workloads differ: either they always use the VFS and never the admin tools/MPI-IO, or vice versa. Mixed MPI-IO runs against a mounted pvfs volume should cause the slowdown on the mounted volume, though.


I'm sure some of us will look at the traces and dumps too, if you
send them out.

The traces are huge. :-) On the order of 500MB. I can probably put them on the web somewhere if you really want to sift through them. I also have zoomed-in versions of the plots I sent in the previous email, which I can send. I've attached an example, but I have lots more. :-)

<<inline: Picture 19.png>>


The dumps are not as large.  I'll try to dig them up.
-sam


                -- Pete


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
