[Pvfs2-developers] Re: the halloween bug fixed

Sam Lang Fri, 05 Oct 2007 11:21:52 -0700


On Oct 5, 2007, at 12:14 PM, Sam Lang wrote:

On Oct 5, 2007, at 10:49 AM, Sam Lang wrote:
The obvious and easy fix is to have bmi-tcp return true from DROP_ADDR_QUERY for all address references. As far as I can tell, the only thing we save by keeping them around is a little memory allocation (the socket gets closed either way).
This suggested fix isn't right. The DEC_ADDR_REF, which decrements the refcount to zero, is invoked after sending the final response, but that's usually before the client (in the case of the admin tools) closes the connection. It looks like it's the tcp_forget_addr in the bmi method that needs to call back out to the bmi wrapper layer to remove the reference from the list. I can call BMI_set_info(addr, BMI_TCP_CLOSE_SOCKET) from tcp_forget_addr, but that seems a bit backwards...

Actually it looks like we just need a companion function for bmi_method_addr_reg_callback.
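
A minimal sketch of what such a companion might look like, assuming the wrapper layer keeps its address references in a simple linked list. The types and the names addr_reg_callback, addr_unreg_callback and ref_count are hypothetical stand-ins for illustration, not the real BMI interfaces:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real BMI types. */
struct method_addr { int sock; };

struct ref {
    struct method_addr *addr;
    struct ref *next;
};

static struct ref *ref_list;   /* wrapper-layer list of known addresses */

/* Wrapper layer: called by the method when it learns a new address. */
static int addr_reg_callback(struct method_addr *addr)
{
    struct ref *r = malloc(sizeof(*r));
    if (!r)
        return -1;
    r->addr = addr;
    r->next = ref_list;
    ref_list = r;
    return 0;
}

/* Hypothetical companion: called by the method when it forgets an
 * address, so the wrapper's list does not grow without bound. */
static int addr_unreg_callback(struct method_addr *addr)
{
    for (struct ref **pp = &ref_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->addr == addr) {
            struct ref *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 0;
        }
    }
    return -1;  /* address was never registered */
}

/* Number of references currently held by the wrapper layer. */
static size_t ref_count(void)
{
    size_t n = 0;
    for (struct ref *r = ref_list; r; r = r->next)
        n++;
    return n;
}
```

In this sketch the method's forget path would call addr_unreg_callback just before releasing its own state, which keeps the direction of the calls the same as for registration.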

-sam

-sam
In the changes I've been working on to get multiple address support in BMI, I've already replaced the linked list with a hashtable, which wouldn't have made the problem go away, but the degradation wouldn't have been quite as bad (may have made it harder to find, actually). Maybe it's time to add some profiling info (perf stats?) to our basic list, queue and hash structures that would tell us how big they're getting.
Anyway, thanks to all for contributing to the debugging process for this one.
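
As a rough illustration of the kind of size tracking meant here, a list can carry its current length and a high-water mark alongside the head pointer. This is only a sketch; struct tracked_list, tl_push and tl_remove are made-up names, not existing PVFS structures:

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; };

/* Hypothetical list head that tracks its current length and the
 * largest length it has ever reached. */
struct tracked_list {
    struct node *head;
    size_t len;
    size_t max_len;
};

/* Push n onto the front of the list and update the counters. */
static void tl_push(struct tracked_list *l, struct node *n)
{
    n->next = l->head;
    l->head = n;
    if (++l->len > l->max_len)
        l->max_len = l->len;
}

/* Unlink n if present; returns 1 if removed, 0 if not found. */
static int tl_remove(struct tracked_list *l, struct node *n)
{
    for (struct node **pp = &l->head; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            l->len--;
            return 1;
        }
    }
    return 0;
}
```

A len that keeps climbing on a long-running server, or a max_len far above the expected number of clients, would have pointed at the leaked references much sooner.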
-sam

On Sep 26, 2007, at 6:00 PM, Sam Lang wrote:
Hi All,
I've been trying to debug a problem with PVFS, where performance degrades slowly with a long-lived (weeks and months) PVFS volume. The degradation is significant -- simple metadata operations are an order of magnitude slower after a month or so. The behavior turns out to only occur with the VFS and pvfs2-client daemon: performance of the admin tools (pvfs2-touch, pvfs2-rm, etc.) to the same set of servers remains good. Restarting the client daemon also fixes the problem, suggesting that the long-lived open sockets are somehow the cause. The slowness also appears to be at the servers not the clients: the same kernel module and client daemon to a different filesystem and set of servers doesn't exhibit the performance degradation.
Also, I should mention that the system config is a little different than usual. We have IO nodes mounting and unmounting the PVFS volume (and stopping the client daemon) with each user's job, which is fairly frequent, while on the login nodes, the volume remains mounted for a long time (and where the performance degrades).
Our hunch here is that epoll or our use of epoll on the servers is somehow to blame. Maybe the file descriptors opened on the server for pvfs2-client-core are getting pushed down further and further into the epoll set, which for some reason is growing with new connections coming and going. This might be the case if we were failing to remove sockets from the set on disconnect, for example. It doesn't look like that's happening though, at least for normal disconnects.
It's a PITA to debug, because the servers have to remain running for a long time (and the clients have to remain mounted) for the problem to be visible. Rob suggested I use strace on the servers to see what epoll was doing, and that showed some interesting results. Basically, it looks like epoll_wait takes significantly longer when clients are doing operations over the VFS, rather than with the pvfs2 admin tools. Also, strace reported epoll_ctl(..., EPOLL_CTL_ADD, ...) getting called a few times, even for the VFS ops, and in those cases it's returning EEXIST.
I noticed that we add a socket to the epoll set whenever we get a new connection, or a read or write is posted (enqueue_operation), but we only remove the socket from the epoll set on errors or disconnects. So why are we adding it for reads and writes? Any connected socket should already be in the set, no? I think this may be why I'm seeing EEXIST with strace.
Also, is it safe to check the error from epoll_ctl in BMI_socket_collection_[add|remove]?
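
For illustration, here is one way the checks could look. It is a sketch under assumptions, not the actual BMI_socket_collection code: sc_add and sc_remove are hypothetical helpers that report epoll_ctl failures but treat a duplicate add (EEXIST) and a remove of an fd that is not in the set (ENOENT) as harmless:

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: add fd to the epoll set. A duplicate add
 * (EEXIST) is treated as success. Returns 0 or a negative errno. */
static int sc_add(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 || errno == EEXIST)
        return 0;
    return -errno;
}

/* Hypothetical helper: remove fd from the epoll set. An fd that is
 * not in the set (ENOENT) is treated as success. */
static int sc_remove(int epfd, int fd)
{
    /* Kernels before 2.6.9 require a non-NULL event pointer here,
     * even though the argument is ignored for EPOLL_CTL_DEL. */
    struct epoll_event ev = { 0 };
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &ev) == 0 || errno == ENOENT)
        return 0;
    return -errno;
}
```

With helpers like these, any other error (EBADF on an already-closed descriptor, for instance) would surface to the caller instead of being silently dropped.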
And finally, assuming PVFS is actually using epoll calls properly, does anyone know of epoll bugs on a SUSE 2.6.5 kernel that would cause epoll_ctl(..., EPOLL_CTL_DEL, ....) to not do what it's meant to? Googling epoll and SUSE 2.6.5 isn't turning up anything...
Thanks,
-sam


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
