This seems to happen on a 2.6 kernel also. I'm using a 2.6.9-22 kernel on a RHEL4 client.
I also attempted this with the network going away on a pvfs2 server node. I issued "ifdown eth0 && sleep 200 && ifup eth0" on a pvfs2 server node serving the /mnt/pvfs2 file system. I went through the same process: a "df" on /mnt/pvfs2 returned "connection timed out", and a following "df" on /mnt/pvfs2-tmp also returned "connection timed out". I watched the pvfs2 server node where eth0 was brought down (with ping), and immediately after eth0 came back up, I issued a "df" on /mnt/pvfs2-tmp again. It worked at this point.

> -----Original Message-----
> From: David Metheny [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 23, 2006 8:27 AM
> To: 'Sam Lang'
> Cc: '[email protected]'
> Subject: RE: [Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client
>
> I wasn't able to reproduce the problem by just killing the
> server process. I tried both killing the server process and
> powering off the server, and the client handled errors from
> the killing of the server process fine.
>
> I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see
> if I can reproduce on a 2.6 kernel.
>
> > -----Original Message-----
> > From: Sam Lang [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, February 22, 2006 4:48 PM
> > To: [EMAIL PROTECTED]
> > Cc: [email protected]
> > Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client
> >
> >
> > Hi David,
> >
> > I tried to reproduce your results with the 2.6 kernel, and wasn't able
> > to. Are you using 2.4? Also, I didn't actually pull the plug on one
> > of the nodes, I just killed the server, but that should be close
> > enough to your test case unless you're routing stuff through that node
> > ;-).
> >
> > -sam
> >
> > On Feb 22, 2006, at 12:16 PM, David Metheny wrote:
> >
> > > It appears the error described below will span across other mounted
> > > file systems on a client when encountered, until the client software
> > > is reloaded.
> > >
> > >
> > > I've got a client with 2 pvfs2 file systems mounted:
> > >
> > > /mnt/pvfs2
> > > /mnt/pvfs2-tmp
> > >
> > > Both PVFS2 file system configurations contained the following when
> > > mounted:
> > > ServerJobBMITimeoutSecs 30
> > > ServerJobFlowTimeoutSecs 30
> > > ClientJobBMITimeoutSecs 300
> > > ClientJobFlowTimeoutSecs 300
> > > ClientRetryLimit 5
> > > ClientRetryDelayMilliSecs 2000
> > >
> > > I've dynamically changed the client's timeout settings after the
> > > mounts:
> > > [EMAIL PROTECTED] root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
> > >
> > > A pvfs2 server node of the /mnt/pvfs2 file system lost power. After
> > > issuing a "df -h /mnt/pvfs2", the client received a "connection
> > > timed out" error.
> > >
> > > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2
> > > Filesystem            Size  Used Avail Use% Mounted on
> > > df: `/mnt/pvfs2': Connection timed out
> > >
> > > An immediate subsequent "df -h /mnt/pvfs2-tmp" also returned
> > > "connection timed out":
> > > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > > df: `/mnt/pvfs2-tmp': Connection timed out
> > >
> > > An unmount of the /mnt/pvfs2 share works fine.
> > > [EMAIL PROTECTED] root]# umount /mnt/pvfs2
> > >
> > > Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection
> > > timed out":
> > > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > > df: `/mnt/pvfs2-tmp': Connection timed out
> > >
> > > After unloading the userspace and kernel module, restarting the pvfs2
> > > software, and remounting the /mnt/pvfs2-tmp file system, a
> > > "df -h /mnt/pvfs2-tmp" successfully completed:
> > > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > > Filesystem            Size  Used Avail Use% Mounted on
> > > hostname:3334/pvfs2-fs
> > >                       1.9T  381G  1.6T  20% /mnt/pvfs2-tmp
> > >
> > >
> > > The pvfs2 client log contained:
> > > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
> > > [E 02/22 11:29] *** Out of retries.
> > > [E 02/22 11:29] Statfs failed: Connection refused
> > > [E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
> > > [E 02/22 11:39] msgpair failed, will retry:: Connection timed out
> > > [E 02/22 11:42] msgpair failed, will retry:: Connection timed out
> > >
> > > _______________________________________________
> > > Pvfs2-developers mailing list
> > > [email protected]
> > > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
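The "df" check repeated throughout this thread can be run against every PVFS2 mount in one pass, so a failure that spans mounts (as with /mnt/pvfs2 and /mnt/pvfs2-tmp here) shows up side by side. This is a minimal sketch: `check_mounts` is a hypothetical helper, not part of the PVFS2 distribution. It assumes a POSIX sh and does not bound a df that hangs; it only reports what df returns.

```shell
#!/bin/sh
# check_mounts: hypothetical helper, not part of the PVFS2 distribution.
# Runs "df -h" against each mount point passed as an argument and prints
# one OK/FAIL line per mount.
check_mounts() {
    status=0
    for m in "$@"; do
        # Capture only df's stderr (stdout is discarded); on failure this
        # holds df's message, e.g. "Connection timed out".
        if err=$(df -h "$m" 2>&1 >/dev/null); then
            echo "OK   $m"
        else
            echo "FAIL $m: $err"
            status=1
        fi
    done
    # Non-zero if any mount point failed
    return $status
}
```

Usage on the client from this report would be `check_mounts /mnt/pvfs2 /mnt/pvfs2-tmp`; while the problem persists, both lines would be expected to read FAIL even after only one file system's server went away.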
