This seems to happen on a 2.6 kernel as well. I'm using 2.6.9-22 on a RHEL4
client.

I also tried this with the network going away on a pvfs2 server node. On a
server node for the /mnt/pvfs2 file system, I issued:

  ifdown eth0 && sleep 200 && ifup eth0

I then went through the same process as before: a "df" on /mnt/pvfs2 returned
"connection timed out", and a "df" on /mnt/pvfs2-tmp returned "connection
timed out" as well. I watched (with ping) the pvfs2 server node where eth0 was
brought down, and immediately after eth0 came back up I issued a "df" on
/mnt/pvfs2-tmp again. It worked at this point.
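
A minimal sketch of how the manual "df" checks above could be scripted, in
case it helps anyone repeat the test. The helper name probe_mount is
hypothetical, and the sketch assumes GNU coreutils "timeout" is available on
the client (it may not be on older installs); the 10-second limit is an
arbitrary choice, not a pvfs2 setting.

```shell
#!/bin/sh
# probe_mount MOUNTPOINT [SECONDS]
# Hypothetical helper: run "df -h" on one mount point, bounded by a time
# limit, and print either "<mount> ok" or "<mount> FAILED (rc=N)".
probe_mount() {
    mnt=$1
    limit=${2:-30}   # seconds to wait for df; arbitrary default
    rc=0
    # "|| rc=$?" records a non-zero exit status without aborting the script.
    timeout "$limit" df -h "$mnt" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "$mnt ok"
    else
        echo "$mnt FAILED (rc=$rc)"
    fi
}

# Probe every mount point given on the command line, for example:
#   sh probe.sh /mnt/pvfs2 /mnt/pvfs2-tmp
for m in "$@"; do
    probe_mount "$m" 10
done
```

Running it against both mounts while the server's eth0 is down, and again
after it comes back up, should show whether the second file system recovers
on its own.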

> -----Original Message-----
> From: David Metheny [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, February 23, 2006 8:27 AM
> To: 'Sam Lang'
> Cc: '[email protected]'
> Subject: RE: [Pvfs2-developers] Problem with multiple pvfs2 
> file systems mounted on a single client
> 
> I wasn't able to reproduce the problem by just killing the
> server process. I tried both killing the server process and
> powering off the server, and the client handled the errors from
> killing the server process fine.
> 
> I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see 
> if I can reproduce on a 2.6 kernel. 
> 
> > -----Original Message-----
> > From: Sam Lang [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, February 22, 2006 4:48 PM
> > To: [EMAIL PROTECTED]
> > Cc: [email protected]
> > Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 file 
> > systems mounted on a single client
> > 
> > 
> > Hi David,
> > 
> > I tried to reproduce your results with the 2.6 kernel, and wasn't able
> > to.  Are you using 2.4?  Also, I didn't actually pull the plug on one
> > of the nodes, I just killed the server, but that should be close
> > enough to your test case unless you're routing stuff through that node
> > ;-).
> > 
> > -sam
> > 
> > On Feb 22, 2006, at 12:16 PM, David Metheny wrote:
> > 
> > > It appears the error described below will span across other mounted
> > > file systems on a client when encountered, until the client software
> > > is reloaded.
> > >
> > >
> > > I've got a client with 2 pvfs2 file systems mounted:
> > >
> > >   /mnt/pvfs2
> > >   /mnt/pvfs2-tmp
> > >
> > > Both PVFS2 file system configurations contained the following when
> > > mounted:
> > >         ServerJobBMITimeoutSecs 30
> > >         ServerJobFlowTimeoutSecs 30
> > >         ClientJobBMITimeoutSecs 300
> > >         ClientJobFlowTimeoutSecs 300
> > >         ClientRetryLimit 5
> > >         ClientRetryDelayMilliSecs 2000
> > >
> > > I've dynamically changed the client's timeout settings after the
> > > mounts:
> > >   [EMAIL PROTECTED] root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
> > >
> > > A pvfs2 server node lost power on the /mnt/pvfs2 file system. After
> > > issuing a "df -h /mnt/pvfs2", the client received a "connection
> > > timed-out" error.
> > >
> > >   [EMAIL PROTECTED] root]# df -h /mnt/pvfs2
> > >   Filesystem            Size  Used Avail Use% Mounted on
> > >   df: `/mnt/pvfs2': Connection timed out
> > >
> > > An immediate subsequent "df -h /mnt/pvfs2-tmp" also returned 
> > > "connection timed out"
> > >   [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > >   df: `/mnt/pvfs2-tmp': Connection timed out
> > >
> > > An unmount of the /mnt/pvfs2 share works fine.
> > >   [EMAIL PROTECTED] root]# umount /mnt/pvfs2
> > >
> > > Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection
> > > timed out"
> > >   [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > >   df: `/mnt/pvfs2-tmp': Connection timed out
> > >
> > > After unloading the userspace and kernel module, restarting pvfs2
> > > software, and remounting the /mnt/pvfs2-tmp filesystem, a "df -h
> > > /mnt/pvfs2-tmp" successfully completed
> > > [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> > > Filesystem            Size  Used Avail Use% Mounted on
> > > hostname:3334/pvfs2-fs
> > >                       1.9T  381G  1.6T  20% /mnt/pvfs2-tmp
> > >
> > >
> > > The pvfs2 client log contained:
> > > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > > [E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
> > > [E 02/22 11:29] *** Out of retries.
> > > [E 02/22 11:29] Statfs failed: Connection refused
> > > [E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
> > > [E 02/22 11:39] msgpair failed, will retry:: Connection timed out
> > > [E 02/22 11:42] msgpair failed, will retry:: Connection timed out
> > >
> > > _______________________________________________
> > > Pvfs2-developers mailing list
> > > [email protected]
> > > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> > >
> > 
> 
