I wasn't able to reproduce the problem by just killing the server process. I tried both killing the server process and powering off the server, and the client handled the errors from the killed server process fine.
I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see if I can reproduce on a 2.6 kernel.

> -----Original Message-----
> From: Sam Lang [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 22, 2006 4:48 PM
> To: [EMAIL PROTECTED]
> Cc: [email protected]
> Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2
> file systems mounted on a single client
>
>
> Hi David,
>
> I tried to reproduce your results with the 2.6 kernel, and
> wasn't able to. Are you using 2.4? Also, I didn't actually
> pull the plug on one of the nodes, I just killed the server,
> but that should be close enough to your test case unless
> you're routing stuff through that node ;-).
>
> -sam
>
> On Feb 22, 2006, at 12:16 PM, David Metheny wrote:
>
> > It appears the error described below will span across other mounted
> > file systems on a client when encountered, until the client software
> > is reloaded.
> >
> >
> > I've got a client with 2 pvfs2 file systems mounted:
> >
> > /mnt/pvfs2
> > /mnt/pvfs2-tmp
> >
> > Both PVFS2 file system configurations contained the following when
> > mounted:
> > ServerJobBMITimeoutSecs 30
> > ServerJobFlowTimeoutSecs 30
> > ClientJobBMITimeoutSecs 300
> > ClientJobFlowTimeoutSecs 300
> > ClientRetryLimit 5
> > ClientRetryDelayMilliSecs 2000
> >
> > I've dynamically changed the clients timeout settings after the
> > mounts:
> > [EMAIL PROTECTED] root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
> >
> > A pvfs2 server node lost power on the /mnt/pvfs2 file system. After
> > issuing a "df -h /mnt/pvfs2", the client received a "connection
> > timed-out" error.
> >
> > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2
> > Filesystem Size Used Avail Use% Mounted on
> > df: `/mnt/pvfs2': Connection timed out
> >
> > An immediate subsequent "df -h /mnt/pvfs2-tmp" also returned
> > "connection timed out"
> > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > df: `/mnt/pvfs2-tmp': Connection timed out
> >
> > An unmount of the /mnt/pvfs2 shared works fine.
> > [EMAIL PROTECTED] root]# umount /mnt/pvfs2
> >
> > Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection
> > timed out"
> > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > df: `/mnt/pvfs2-tmp': Connection timed out
> >
> > After unloading the userspace and kernel module, restarting pvfs2
> > software, and remounting the /mnt/pvfs2-tmp filesystem, a "df -h
> > /mnt/pvfs2-tmp" successfully completed
> > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > Filesystem Size Used Avail Use% Mounted on
> > hostname:3334/pvfs2-fs
> > 1.9T 381G 1.6T 20% /mnt/pvfs2-tmp
> >
> >
> > The pvfs2 client log contained:
> > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
> > [E 02/22 11:29] *** Out of retries.
> > [E 02/22 11:29] Statfs failed: Connection refused
> > [E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
> > [E 02/22 11:39] msgpair failed, will retry:: Connection timed out
> > [E 02/22 11:42] msgpair failed, will retry:: Connection timed out
> >
> > _______________________________________________
> > Pvfs2-developers mailing list
> > [email protected]
> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
