On Feb 23, 2006, at 9:35 AM, David Metheny wrote:

This also seems to happen on a 2.6 kernel. I'm using 2.6.9-22 on a RHEL4
client.

I also attempted this with the network going away on a pvfs2 server node.
I issued "ifdown eth0 && sleep 200 && ifup eth0" on a pvfs2 server node
of the /mnt/pvfs2 file system. I went through the same process: a "df" on
/mnt/pvfs2 got a connection timed out, and then a "df" on /mnt/pvfs2-tmp
got a connection timed out also. I watched (via ping) the pvfs2 server
node where eth0 was brought down, and immediately after eth0 came back
up, I issued a "df" on /mnt/pvfs2-tmp again. It worked at this point.


Hi David,

I get slightly different behavior. If I create a network partition
between the client and server2 nodes and then do a df -h <mnt1>, I get an
operation timed out error on the first attempt, but repeated attempts are
successful. Also, when I do df -h <mnt2>, my error is a little different:
instead of connection timed out, I get an Invalid argument (EINVAL). Not
sure what's up with that. I'll keep looking into the initial connection
timed-out behavior; I just wanted to give you an update.

-sam

-----Original Message-----
From: David Metheny [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 23, 2006 8:27 AM
To: 'Sam Lang'
Cc: '[email protected]'
Subject: RE: [Pvfs2-developers] Problem with multiple pvfs2
file systems mounted on a single client

I wasn't able to reproduce the problem by just killing the server
process. I tried both killing the server process and powering off the
server; the client handled the errors from the killed server process
fine.

I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see
if I can reproduce on a 2.6 kernel.

-----Original Message-----
From: Sam Lang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 22, 2006 4:48 PM
To: [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 file
systems mounted on a single client


Hi David,

I tried to reproduce your results with the 2.6 kernel, and wasn't able
to.  Are you using 2.4?  Also, I didn't actually pull the plug on one of
the nodes, I just killed the server, but that should be close enough to
your test case unless you're routing stuff through that node ;-).

-sam

On Feb 22, 2006, at 12:16 PM, David Metheny wrote:

It appears the error described below will span across other mounted file
systems on a client when encountered, until the client software is
reloaded.


I've got a client with 2 pvfs2 file systems mounted:

        /mnt/pvfs2
        /mnt/pvfs2-tmp

Both PVFS2 file system configurations contained the following when mounted:
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000

I've dynamically changed the client's timeout setting after the mounts:
        [EMAIL PROTECTED] root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5

A pvfs2 server node on the /mnt/pvfs2 file system lost power. When I then
issued a "df -h /mnt/pvfs2", the client received a "connection timed out"
error:

        [EMAIL PROTECTED] root]# df -h /mnt/pvfs2
        Filesystem            Size  Used Avail Use% Mounted on
        df: `/mnt/pvfs2': Connection timed out

An immediate subsequent "df -h /mnt/pvfs2-tmp" also returned "connection
timed out":
        [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
        df: `/mnt/pvfs2-tmp': Connection timed out

An unmount of the /mnt/pvfs2 file system works fine:
        [EMAIL PROTECTED] root]# umount /mnt/pvfs2

Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection timed
out":
        [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
        df: `/mnt/pvfs2-tmp': Connection timed out

After unloading the userspace client and kernel module, restarting the
pvfs2 software, and remounting the /mnt/pvfs2-tmp file system, a "df -h
/mnt/pvfs2-tmp" completed successfully:
        [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
        Filesystem            Size  Used Avail Use% Mounted on
        hostname:3334/pvfs2-fs
                              1.9T  381G  1.6T  20% /mnt/pvfs2-tmp
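
The check sequence above (a "df" on each mount, repeated until it
recovers) can be scripted. What follows is only a minimal Python sketch
under assumptions: the names probe and timeout_secs are illustrative and
not part of PVFS2, and the mount points are passed on the command line.
It runs a command with a wall-clock limit and reports whether it
succeeded, failed, or hung. A "df" that is blocked inside the kernel may
not respond to the kill that follows a timeout, so treat the limit as
best effort.

```python
import subprocess
import sys
import time


def probe(cmd, timeout_secs):
    """Run cmd and return (status, elapsed_secs).

    status is "ok" (exit code 0), "failed" (non-zero exit code),
    or "hung" (no exit within timeout_secs).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_secs,
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "hung"
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Mount points come from the command line; nothing runs without them.
    for mount in sys.argv[1:]:
        status, elapsed = probe(["df", "-h", mount], timeout_secs=60)
        print(f"{mount}: {status} after {elapsed:.1f}s")
```

Saved under a hypothetical name such as probe_mounts.py, it could be run
as "python probe_mounts.py /mnt/pvfs2 /mnt/pvfs2-tmp" before and after
the server node goes away, to compare how each mount responds.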


The pvfs2 client log contained:
[E 02/22 11:28] msgpair failed, will retry:: Connection refused
[E 02/22 11:28] msgpair failed, will retry:: Connection refused
[E 02/22 11:28] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
[E 02/22 11:29] *** Out of retries.
[E 02/22 11:29] Statfs failed: Connection refused
[E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
[E 02/22 11:39] msgpair failed, will retry:: Connection timed out
[E 02/22 11:42] msgpair failed, will retry:: Connection timed out
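
For anyone triaging a client log like the one above, here is a minimal
Python sketch that splits a hard-wrapped or run-together log into entries
and counts the distinct messages. It assumes the "[E MM/DD HH:MM] message"
layout seen in this excerpt; the names split_entries and summarize are
illustrative and not part of PVFS2.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above: "[E MM/DD HH:MM] message".
ENTRY_START = re.compile(r"\[E (\d{2}/\d{2}) (\d{2}:\d{2})\]")


def split_entries(raw):
    """Split a wrapped or run-together log into (date, time, message) tuples."""
    # Undo hard wrapping and collapse runs of whitespace into single spaces.
    text = " ".join(raw.split())
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        # Each message runs up to the start of the next "[E ...]" marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[match.end():end].strip()
        entries.append((match.group(1), match.group(2), message))
    return entries


def summarize(entries):
    """Count how often each message occurs."""
    return Counter(message for _, _, message in entries)


if __name__ == "__main__":
    sample = """[E 02/22 11:28] msgpair failed, will retry:: Connection refused [E
02/22 11:29] *** Out of retries.
[E 02/22 11:29] Statfs failed: Connection refused"""
    for message, count in summarize(split_entries(sample)).most_common():
        print(count, message)
```

Grouping by message makes it easier to see where the errors change from
"Connection refused" to "Connection timed out".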

_______________________________________________
Pvfs2-developers mailing list
[email protected]

http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers