I was able to replicate it in a little bit simpler environment this afternoon. It looks like the problem is with the statfs and/or mount upcalls.

The problem with those two is that they are serviced in pvfs2-client-core using blocking functions, so if one of them hangs on a long network timeout, no other operations (even to other file systems) can be processed.
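To make the head-of-line blocking concrete, here is a toy model in C. It is not PVFS2 code: the `upcall` struct and `service_blocking` function are hypothetical, and the service times are simulated seconds, with 300 s assumed as a stand-in for a statfs stuck on a network timeout. A single loop services upcalls one at a time, so a stalled statfs to a dead server delays a request for a different, healthy mount:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an upcall waiting in the client daemon: the mount
 * it targets and how long servicing it takes, in seconds. */
struct upcall {
    const char *mount;
    int service_secs;
};

/* Serial, blocking service loop: each upcall starts only after the previous
 * one has returned, so one slow upcall delays every upcall queued behind it,
 * whichever file system those upcalls target.
 * Stores the completion time of upcall i in finish[i]. */
void service_blocking(const struct upcall *q, size_t n, int *finish)
{
    int now = 0;

    assert(n == 0 || (q != NULL && finish != NULL));
    for (size_t i = 0; i < n; i++) {
        now += q[i].service_secs; /* blocking call: nothing else runs */
        finish[i] = now;
    }
}
```

With a 300 s stall on /mnt/pvfs2 queued first and a 1 s request to /mnt/pvfs2-tmp behind it, the second request completes at t=301 s instead of t=1 s.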

The kernel module has an operation timeout value that is independent of the BMI timeout that pvfs2-client uses; therefore, even once the statfs command has timed out, the pvfs2-client daemon is probably still hung for a relatively long time (ClientJobBMITimeoutSecs * ClientRetryLimit seconds).
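Plugging in the values from the configuration quoted further down the thread (ClientJobBMITimeoutSecs 300, ClientRetryLimit 5, and pvfs2.op-timeout-secs set to 5) shows how large that gap can be. A small sketch of the arithmetic; the function names are hypothetical, and it follows the estimate above, so the ClientRetryDelayMilliSecs pauses between retries are not counted:

```c
#include <assert.h>

/* Rough upper bound on how long pvfs2-client can stay stuck on one blocking
 * upcall: one BMI timeout per attempt, for ClientRetryLimit attempts.
 * Retry delays (ClientRetryDelayMilliSecs) are deliberately ignored. */
long client_stall_bound_secs(long bmi_timeout_secs, long retry_limit)
{
    assert(bmi_timeout_secs >= 0 && retry_limit >= 0);
    return bmi_timeout_secs * retry_limit;
}

/* How long the daemon may remain stuck after the kernel module has already
 * given up on the operation and returned an error to the caller. */
long stall_after_kernel_timeout_secs(long bmi_timeout_secs, long retry_limit,
                                     long op_timeout_secs)
{
    long bound = client_stall_bound_secs(bmi_timeout_secs, retry_limit);

    return bound > op_timeout_secs ? bound - op_timeout_secs : 0;
}
```

With those settings the bound is 300 * 5 = 1500 s (25 minutes), so after df gets its error at 5 s the daemon may stay busy for roughly another 1495 s, during which upcalls for /mnt/pvfs2-tmp cannot be serviced.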

I will look into whether it is possible to make nonblocking versions of these service functions...
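One common shape for a nonblocking version is post-then-poll: start the operation, return to the main loop, and repeatedly test outstanding operations for completion. The toy model below sketches that idea with simulated one-second polling passes; the names are hypothetical and it does not use the actual pvfs2-client-core job interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UPCALLS 64

/* Hypothetical model of an upcall: the mount it targets and how long until
 * its network reply (or timeout) arrives, in seconds. */
struct upcall {
    const char *mount;
    int service_secs;
};

/* Post-then-poll loop: all upcalls are posted at t=0 without waiting, and
 * each pass tests every outstanding one for completion, so an upcall
 * finishes at its own service time rather than after the upcalls ahead of
 * it. Stores the completion time of upcall i in finish[i].
 * Returns 0, or -1 if there are more than MAX_UPCALLS upcalls. */
int service_nonblocking(const struct upcall *q, size_t n, int *finish)
{
    bool done[MAX_UPCALLS] = { false };
    size_t remaining = n;

    if (n > MAX_UPCALLS)
        return -1;
    assert(n == 0 || (q != NULL && finish != NULL));
    for (int now = 0; remaining > 0; now++) {
        for (size_t i = 0; i < n; i++) {
            /* test step: has this operation's reply arrived yet? */
            if (!done[i] && now >= q[i].service_secs) {
                done[i] = true;
                finish[i] = now;
                remaining--;
            }
        }
    }
    return 0;
}
```

With a 300 s stall on /mnt/pvfs2 and a 1 s request to /mnt/pvfs2-tmp, the second request now finishes at t=1 s instead of waiting behind the first.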

-Phil

Sam Lang wrote:

On Feb 23, 2006, at 9:35 AM, David Metheny wrote:

This seems to happen on a 2.6 kernel also. I'm using 2.6.9-22 on a RHEL4
client.

I also attempted this with the network going away on a pvfs2 server node. I
issued "ifdown eth0 && sleep 200 && ifup eth0" on a pvfs2 server node of the
/mnt/pvfs2 file system. I went through the same process of issuing a "df" on
/mnt/pvfs2, getting a connection timed out, then a "df" on /mnt/pvfs2-tmp,
and got a connection timed out also. I watched (ping) the pvfs2 server node
where eth0 was brought down, and immediately after eth0 came back up, I
issued a "df" on /mnt/pvfs2-tmp again. It worked at this point.


Hi David,

I get a little different behavior. If I create a network partition between the client and server2 nodes and then do a df -h <mnt1>, I get an operation timed out error on the first attempt, but repeated attempts are successful. Also, when I do df -h <mnt2>, my error is a little different: instead of connection timed out, I get Invalid argument (EINVAL). Not sure what's up with that. I'll keep looking into the initial connection timed out behavior; I just wanted to give you an update.

-sam

-----Original Message-----
From: David Metheny [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 23, 2006 8:27 AM
To: 'Sam Lang'
Cc: '[email protected]'
Subject: RE: [Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client

I wasn't able to reproduce the problem by just killing the
server process. I tried both killing the server process and
powering off the server, and the client handled the errors from
killing the server process fine.

I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see
if I can reproduce it on a 2.6 kernel.

-----Original Message-----
From: Sam Lang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 22, 2006 4:48 PM
To: [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client


Hi David,

I tried to reproduce your results with the 2.6 kernel, and wasn't able
to.  Are you using 2.4?  Also, I didn't actually pull the plug on one
of the nodes; I just killed the server, but that should be close
enough to your test case unless you're routing stuff through that
node ;-).

-sam

On Feb 22, 2006, at 12:16 PM, David Metheny wrote:

It appears that, once encountered, the error described below spreads
to the other mounted file systems on the client until the client
software is reloaded.


I've got a client with 2 pvfs2 file systems mounted:

    /mnt/pvfs2
    /mnt/pvfs2-tmp

Both PVFS2 file system configurations contained the following when mounted:
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000

I've dynamically changed the client's timeout setting after the mounts:
    [EMAIL PROTECTED] root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
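For reference, sysctl(8) maps a dotted name such as pvfs2.op-timeout-secs onto a file under /proc/sys, so while the pvfs2 module is loaded the same setting can be read or changed through /proc/sys/pvfs2/op-timeout-secs. A minimal C sketch of that mapping and of the write; the helper names are hypothetical, and writing requires root:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a dotted sysctl name to its /proc/sys path, the same translation
 * sysctl(8) performs: "pvfs2.op-timeout-secs" becomes
 * "/proc/sys/pvfs2/op-timeout-secs". Returns 0, or -1 if buf is too small. */
int sysctl_name_to_path(const char *name, char *buf, size_t len)
{
    const char *prefix = "/proc/sys/";
    size_t plen = strlen(prefix), nlen = strlen(name);

    assert(buf != NULL);
    if (plen + nlen + 1 > len)
        return -1;
    memcpy(buf, prefix, plen);
    for (size_t i = 0; i < nlen; i++)
        buf[plen + i] = (name[i] == '.') ? '/' : name[i];
    buf[plen + nlen] = '\0';
    return 0;
}

/* Rough equivalent of "sysctl -w NAME=VALUE": write the value to the proc
 * file. Needs root and, for pvfs2.* names, the pvfs2 kernel module loaded.
 * Returns 0 on success, -1 on any failure. */
int sysctl_write_int(const char *name, int value)
{
    char path[256];
    FILE *f;

    if (sysctl_name_to_path(name, path, sizeof path) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%d\n", value) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Note that this only changes the kernel module's op timeout; the ClientJobBMITimeoutSecs and ClientRetryLimit values used by pvfs2-client come from the file system configuration and are untouched.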

A pvfs2 server node of the /mnt/pvfs2 file system lost power. After
issuing a "df -h /mnt/pvfs2", the client received a "connection timed
out" error:

    [EMAIL PROTECTED] root]# df -h /mnt/pvfs2
    Filesystem            Size  Used Avail Use% Mounted on
    df: `/mnt/pvfs2': Connection timed out

An immediate subsequent "df -h /mnt/pvfs2-tmp" also returned
"connection timed out":
    [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
    df: `/mnt/pvfs2-tmp': Connection timed out

Unmounting /mnt/pvfs2 works fine:
    [EMAIL PROTECTED] root]# umount /mnt/pvfs2

Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection
timed out":
    [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
    df: `/mnt/pvfs2-tmp': Connection timed out

After unloading the userspace components and the kernel module, restarting
the pvfs2 software, and remounting the /mnt/pvfs2-tmp file system, a
"df -h /mnt/pvfs2-tmp" completed successfully:
    [EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
    Filesystem            Size  Used Avail Use% Mounted on
    hostname:3334/pvfs2-fs
                          1.9T  381G  1.6T  20% /mnt/pvfs2-tmp
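To take df out of the picture when reproducing this, the same kind of statistics call can be issued directly. A sketch in C using POSIX statvfs(3), which on Linux is implemented on top of statfs(2) and so reaches the PVFS2 statfs upcall for a PVFS2 mount; `report_fs` is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Query file system statistics for one mount point, similar to what df
 * does, and print sizes in MiB. Returns 0 on success or the errno value on
 * failure; strerror(ETIMEDOUT) is the "Connection timed out" text df shows. */
int report_fs(const char *mnt)
{
    struct statvfs sv;
    unsigned long long total, used, avail;

    assert(mnt != NULL);
    if (statvfs(mnt, &sv) != 0) {
        int err = errno;
        fprintf(stderr, "%s: %s\n", mnt, strerror(err));
        return err;
    }
    /* f_blocks, f_bfree and f_bavail are counted in units of f_frsize */
    total = (unsigned long long)sv.f_blocks * sv.f_frsize;
    used = (unsigned long long)(sv.f_blocks - sv.f_bfree) * sv.f_frsize;
    avail = (unsigned long long)sv.f_bavail * sv.f_frsize;
    printf("%s: size %llu MiB, used %llu MiB, avail %llu MiB\n",
           mnt, total >> 20, used >> 20, avail >> 20);
    return 0;
}
```

Calling report_fs("/mnt/pvfs2-tmp") while the /mnt/pvfs2 server is down should show whether the statfs upcall itself is what stalls, independent of anything df does.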


The pvfs2 client log contained:
    [E 02/22 11:28] msgpair failed, will retry:: Connection refused
    [E 02/22 11:28] msgpair failed, will retry:: Connection refused
    [E 02/22 11:28] msgpair failed, will retry:: Connection refused
    [E 02/22 11:29] msgpair failed, will retry:: Connection refused
    [E 02/22 11:29] msgpair failed, will retry:: Connection refused
    [E 02/22 11:29] msgpair failed, will retry:: Connection refused
    [E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
    [E 02/22 11:29] *** Out of retries.
    [E 02/22 11:29] Statfs failed: Connection refused
    [E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
    [E 02/22 11:39] msgpair failed, will retry:: Connection timed out
    [E 02/22 11:42] msgpair failed, will retry:: Connection timed out


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers