Hi all: I'm (once again) experiencing system instability that appears to be traceable to pvfs2. Symptoms usually show up when one or more users start long SCP sessions, each transferring 5+GB of data and lasting several hours. I believe they usually have 1-3 sessions running in parallel. Symptoms include:
* High load averages (climbing slowly with additional use) without supporting CPU load. The ONLY way to recover from this is a reboot. My load average is currently 7.10 with 99.8% idle CPU.
* Hung SCP and other I/O processes.
* Large amounts of RAM "missing" (currently free -m reports 7552MB in use; adding up usage from all processes comes to about 1GB).
* Often (always?) some users' files become inaccessible (although users have stopped reporting those problems, as it's happened so frequently).

If I let this go a bit longer, there's a reasonable chance that the machine will just spontaneously reboot. Nothing is logged as to the cause: no OOM or other errors. One minute everything's fine; the next it's booting up.

Sometimes it takes a long time for these problems to build up (for example, right now the system load and memory issues are here after a couple of days of "building"); sometimes the system will spontaneously reboot several times in one day (with no sign of climbing loads or the like).

So far these problems have only happened on the head node (pvfs client); our compute nodes have not shown this problem.

System configuration:
Rocks 5.1 with manual pvfs setup (NOT using Rocks-supplied PVFS binaries or configurations)
pvfs 2.7.1 + patches from pcarns
3 CentOS 5 dedicated PVFS servers (each with ~10TB storage, Dell PERC 6/e + MD1000's)
PVFS servers are running over bonded dual-gig connections using the Linux kernel Ethernet bonding driver
Clients are single-gig connected
No off-site pvfs2 access (scp/ssh/sftp access only, via the head node)

Any suggestions? I'm getting fairly desperate for help, as pvfs2 has been the main destabilizing factor for the cluster since it went online, and causing spontaneous reboots is not a good thing....

Thanks!
--Jim

_______________________________________________
Pvfs2-users mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
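
For anyone who wants to reproduce the two measurements in the symptom list (load without CPU use, and memory that no process accounts for), here is a minimal sketch rather than a tested diagnostic. It relies on two general Linux facts: the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, and /proc/meminfo reports kernel slab memory separately from process memory. The function names are hypothetical, "in use" is assumed to mean MemTotal minus MemFree, Buffers and Cached, and Python 3 is assumed (stock CentOS 5 ships an older Python). Summed VmRSS double-counts shared pages, so the "unaccounted" figure is only a rough lower bound.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field_name: value}; most values are in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_status(text):
    """Return (state_letter, rss_kb) from the text of /proc/<pid>/status."""
    state, rss_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("State:"):
            state = line.split()[1]
        elif line.startswith("VmRSS:"):  # absent for kernel threads
            rss_kb = int(line.split()[1])
    return state, rss_kb


def summarize(meminfo_text, status_texts):
    """Compare memory in use with summed process RSS and count D-state tasks."""
    mem = parse_meminfo(meminfo_text)
    # Assumption: "in use" excludes buffers and page cache.
    used_kb = (mem["MemTotal"] - mem["MemFree"]
               - mem.get("Buffers", 0) - mem.get("Cached", 0))
    parsed = [parse_status(t) for t in status_texts]
    rss_sum_kb = sum(rss for _, rss in parsed)
    return {
        "used_kb": used_kb,
        "rss_sum_kb": rss_sum_kb,
        "unaccounted_kb": used_kb - rss_sum_kb,
        "slab_kb": mem.get("Slab", 0),
        "d_state_tasks": sum(1 for state, _ in parsed if state == "D"),
    }


def read_live():
    """Read meminfo and every per-process status file from /proc."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    status_texts = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open("/proc/%s/status" % entry) as f:
                    status_texts.append(f.read())
            except OSError:
                continue  # process exited while we were scanning
    return meminfo_text, status_texts


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    for key, value in summarize(*read_live()).items():
        print("%-16s %d" % (key, value))
```

If d_state_tasks roughly tracks the load average and slab_kb accounts for most of unaccounted_kb, that would suggest the stuck I/O and the memory growth are both on the kernel side rather than in any user process. It would not identify which kernel component is responsible.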
