> -----Original Message-----
> From: Sam Lang [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 22, 2006 12:47 PM
> To: [EMAIL PROTECTED]
> Cc: [email protected]
> Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 
> file systems mounted on a single client
> 
> 
> Hi David,
> 
> Just to clarify your setup a bit, you are running two 
> separate sets of metadata and io servers for the two 
> different mountpoints, so that for /mnt/pvfs2 you have one 
> set of nodes running the servers, and for /mnt/pvfs2-tmp you 
> have a completely separate set of server nodes?  
Yes. The two file systems use completely separate sets of server nodes.

> And when one of the servers dies from one filesystem, the other
> filesystem is unresponsive as well?
Yes.
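
To check each mount independently without hanging the shell, something like the following may help. It is only a sketch: it makes a statvfs call (the same kind of request df makes) from a helper thread and gives up after a timeout. The name check_mount and the 10-second default are my own choices for illustration and are not part of PVFS2.

```python
import os
import threading


def check_mount(path, timeout_secs=10.0):
    """Return "ok", "hung", or "error: <reason>" for a statvfs on path.

    check_mount and timeout_secs are hypothetical names for this sketch.
    """
    result = {}

    def worker():
        try:
            result["stat"] = os.statvfs(path)
        except OSError as exc:
            result["error"] = exc.strerror or str(exc)

    # A daemon thread lets the script exit even if the call never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_secs)

    if thread.is_alive():
        return "hung"
    if "error" in result:
        return "error: " + result["error"]
    return "ok"


if __name__ == "__main__":
    for mount in ("/mnt/pvfs2", "/mnt/pvfs2-tmp"):
        print(mount, check_mount(mount))
```

Running it against both mount points shows whether the second file system fails on its own or only after the first one has timed out.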

> What does pvfs2-ping tell you about the status of the servers for
> /mnt/pvfs2-tmp?
pvfs2-ping reports that the file system is up and running:

[EMAIL PROTECTED] root]# pvfs2-ping -m /mnt/pvfs2-tmp

(1) Parsing tab file...

(2) Initializing system interface...

(3) Initializing each file system found in tab file: /etc/mtab...

   /mnt/pvfs2-tmp: Ok
   /mnt/pvfs2: Ok

(4) Searching for /mnt/pvfs2-tmp in pvfstab...

   PVFS2 servers: tcp://hvcwydev0380:3334
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2-tmp

   PVFS2 servers: tcp://hvcwydev0328:3334
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2

   meta servers:
   tcp://hvcwydev0380:3334
   tcp://hvcwydev0381:3334
   tcp://hvcwydev0382:3334
   tcp://hvcwydev0383:3334
   tcp://hvcwydev0384:3334
   tcp://hvcwydev0385:3334
   tcp://hvcwydev0386:3334
   tcp://hvcwydev0387:3334
   tcp://hvcwydev0388:3334
   tcp://hvcwydev0389:3334
   tcp://hvcwydev0390:3334
   tcp://hvcwydev0391:3334
   tcp://hvcwydev0392:3334
   tcp://hvcwydev0393:3334
   tcp://hvcwydev0394:3334

   data servers:
   tcp://hvcwydev0380:3334
   tcp://hvcwydev0381:3334
   tcp://hvcwydev0382:3334
   tcp://hvcwydev0383:3334
   tcp://hvcwydev0384:3334
   tcp://hvcwydev0385:3334
   tcp://hvcwydev0386:3334
   tcp://hvcwydev0387:3334
   tcp://hvcwydev0388:3334
   tcp://hvcwydev0389:3334
   tcp://hvcwydev0390:3334
   tcp://hvcwydev0391:3334
   tcp://hvcwydev0392:3334
   tcp://hvcwydev0393:3334
   tcp://hvcwydev0394:3334

(5) Verifying that all servers are responding...

   meta servers:
   tcp://hvcwydev0380:3334 Ok
   tcp://hvcwydev0381:3334 Ok
   tcp://hvcwydev0382:3334 Ok
   tcp://hvcwydev0383:3334 Ok
   tcp://hvcwydev0384:3334 Ok
   tcp://hvcwydev0385:3334 Ok
   tcp://hvcwydev0386:3334 Ok
   tcp://hvcwydev0387:3334 Ok
   tcp://hvcwydev0388:3334 Ok
   tcp://hvcwydev0389:3334 Ok
   tcp://hvcwydev0390:3334 Ok
   tcp://hvcwydev0391:3334 Ok
   tcp://hvcwydev0392:3334 Ok
   tcp://hvcwydev0393:3334 Ok
   tcp://hvcwydev0394:3334 Ok

   data servers:
   tcp://hvcwydev0380:3334 Ok
   tcp://hvcwydev0381:3334 Ok
   tcp://hvcwydev0382:3334 Ok
   tcp://hvcwydev0383:3334 Ok
   tcp://hvcwydev0384:3334 Ok
   tcp://hvcwydev0385:3334 Ok
   tcp://hvcwydev0386:3334 Ok
   tcp://hvcwydev0387:3334 Ok
   tcp://hvcwydev0388:3334 Ok
   tcp://hvcwydev0389:3334 Ok
   tcp://hvcwydev0390:3334 Ok
   tcp://hvcwydev0391:3334 Ok
   tcp://hvcwydev0392:3334 Ok
   tcp://hvcwydev0393:3334 Ok
   tcp://hvcwydev0394:3334 Ok

(6) Verifying that fsid 115831708 is acceptable to all servers...

   Ok; all servers understand fs_id 115831708

(7) Verifying that root handle is owned by one server...

   Root handle: 1048576
   Ok; root handle is owned by exactly one server.

=============================================================

The PVFS filesystem at /mnt/pvfs2-tmp appears to be correctly configured.
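
If it helps to repeat this check from a script, here is a minimal sketch that scans pvfs2-ping output for servers in step (5) that did not answer "Ok". It assumes the output layout shown above. I have not seen what a failing line looks like, so treating any status other than "Ok" as a failure is an assumption, and failed_servers is a hypothetical helper name.

```python
import re

# Matches step (5) lines such as "   tcp://hvcwydev0380:3334 Ok".
SERVER_STATUS = re.compile(r"^\s*(tcp://\S+)\s+(\S.*)$")


def failed_servers(ping_output):
    """Return sorted (server, status) pairs from step (5) whose status is not "Ok"."""
    in_step5 = False
    failed = set()
    for line in ping_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("(5)"):
            in_step5 = True
            continue
        if stripped.startswith("(6)"):
            break
        if in_step5:
            match = SERVER_STATUS.match(line)
            # Assumption: anything other than "Ok" means the server did not respond.
            if match and match.group(2).strip() != "Ok":
                failed.add((match.group(1), match.group(2).strip()))
    return sorted(failed)
```

An empty list means every meta and data server listed in step (5) responded, which matches the run above.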


> 
> -sam
> 
> On Feb 22, 2006, at 12:16 PM, David Metheny wrote:
> 
> > It appears that the error described below, once encountered, spreads
> > to the other mounted file systems on a client until the client
> > software is reloaded.
> >
> >
> > I've got a client with 2 pvfs2 file systems mounted:
> >
> >     /mnt/pvfs2
> >     /mnt/pvfs2-tmp
> >
> > Both PVFS2 file system configurations contained the following when
> > mounted:
> >         ServerJobBMITimeoutSecs 30
> >         ServerJobFlowTimeoutSecs 30
> >         ClientJobBMITimeoutSecs 300
> >         ClientJobFlowTimeoutSecs 300
> >         ClientRetryLimit 5
> >         ClientRetryDelayMilliSecs 2000
> >
> > I've dynamically changed the client's timeout settings after the
> > mounts:
> >     [EMAIL PROTECTED] root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
> >
> > A pvfs2 server node on the /mnt/pvfs2 file system lost power. When I
> > issued a "df -h /mnt/pvfs2", the client received a "connection
> > timed out" error.
> >
> >     [EMAIL PROTECTED] root]# df -h /mnt/pvfs2
> >     Filesystem            Size  Used Avail Use% Mounted on
> >     df: `/mnt/pvfs2': Connection timed out
> >
> > An immediately following "df -h /mnt/pvfs2-tmp" also returned
> > "connection timed out":
> >     [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> >     df: `/mnt/pvfs2-tmp': Connection timed out
> >
> > An unmount of /mnt/pvfs2 works fine:
> >     [EMAIL PROTECTED] root]# umount /mnt/pvfs2
> >
> > Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection
> > timed out":
> >     [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> >     df: `/mnt/pvfs2-tmp': Connection timed out
> >
> > After unloading the userspace and kernel module, restarting the pvfs2
> > software, and remounting the /mnt/pvfs2-tmp filesystem, a
> > "df -h /mnt/pvfs2-tmp" completed successfully:
> >     [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
> >     Filesystem            Size  Used Avail Use% Mounted on
> >     hostname:3334/pvfs2-fs
> >                           1.9T  381G  1.6T  20% /mnt/pvfs2-tmp
> >
> >
> > The pvfs2 client log contained:
> > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:28] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] msgpair failed, will retry:: Connection refused
> > [E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
> > [E 02/22 11:29] *** Out of retries.
> > [E 02/22 11:29] Statfs failed: Connection refused
> > [E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
> > [E 02/22 11:39] msgpair failed, will retry:: Connection timed out
> > [E 02/22 11:42] msgpair failed, will retry:: Connection timed out
> >
> > _______________________________________________
> > Pvfs2-developers mailing list
> > [email protected]
> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> >
> 
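
For anyone who wants to tally the client log quoted above, here is a rough sketch that undoes the mail wrapping and counts each distinct error message. It assumes every entry starts with a "[E MM/DD HH:MM]" stamp, as in the excerpt. The function name summarize_client_log is made up for this example.

```python
import re
from collections import Counter

# Entry stamp as it appears in the excerpt, e.g. "[E 02/22 11:28]".
STAMP = re.compile(r"\[E (\d\d/\d\d \d\d:\d\d)\]")


def summarize_client_log(log_text):
    """Count messages in a (possibly quoted and re-wrapped) client log excerpt."""
    # Drop leading "> " quote markers, then rejoin the wrapped lines.
    lines = [re.sub(r"^(?:>\s*)+", "", line).strip() for line in log_text.splitlines()]
    text = re.sub(r"\s+", " ", " ".join(lines))

    # re.split with one capture group yields [prefix, stamp, msg, stamp, msg, ...].
    parts = STAMP.split(text)
    messages = [msg.strip() for msg in parts[2::2]]
    return Counter(messages)
```

Calling summarize_client_log on the excerpt separates the "Connection refused" retries against the /mnt/pvfs2 server from the later "Connection timed out" entries.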
