On Feb 23, 2006, at 9:35 AM, David Metheny wrote:
This seems to happen on a 2.6 kernel also. I'm using 2.6.9-22 on a RHEL4 client.

I also tried this with the network going down on a pvfs2 server node. I issued "ifdown eth0 && sleep 200 && ifup eth0" on a pvfs2 server node of the /mnt/pvfs2 file system. I went through the same process: a "df" on /mnt/pvfs2 returned "connection timed out", then a "df" on /mnt/pvfs2-tmp returned "connection timed out" as well. I watched (pinged) the pvfs2 server node where eth0 had been brought down, and immediately after eth0 came back up, I issued a "df" on /mnt/pvfs2-tmp again. It worked at that point.
Hi David,
I get slightly different behavior. If I create a network partition between the client and server2 nodes and then do a df -h <mnt1>, I get an "operation timed out" error on the first attempt, but repeated attempts succeed. Also, when I do df -h <mnt2>, my error is a little different: instead of "connection timed out", I get "Invalid argument" (EINVAL). Not sure what's up with that. I'll keep looking into the initial connection timed-out behavior; I just wanted to give you an update.
-sam
-----Original Message-----
From: David Metheny [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 23, 2006 8:27 AM
To: 'Sam Lang'
Cc: '[email protected]'
Subject: RE: [Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client
I wasn't able to reproduce the problem by just killing the server process. I tried both killing the server process and powering off the server, and the client handled the errors from killing the server process fine.
I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see if I can reproduce it on a 2.6 kernel.
-----Original Message-----
From: Sam Lang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 22, 2006 4:48 PM
To: [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client
Hi David,
I tried to reproduce your results with the 2.6 kernel and wasn't able to. Are you using 2.4? Also, I didn't actually pull the plug on one of the nodes; I just killed the server, but that should be close enough to your test case unless you're routing stuff through that node ;-).
-sam
On Feb 22, 2006, at 12:16 PM, David Metheny wrote:
It appears that the error described below, once encountered, spreads to the other pvfs2 file systems mounted on the client and persists until the client software is reloaded.
I've got a client with 2 pvfs2 file systems mounted:
/mnt/pvfs2
/mnt/pvfs2-tmp
Both PVFS2 file system configurations contained the following when mounted:
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
I dynamically changed the client's timeout settings after the mounts:
[EMAIL PROTECTED] root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
A pvfs2 server node of the /mnt/pvfs2 file system lost power. After I issued "df -h /mnt/pvfs2", the client received a "connection timed out" error.
[EMAIL PROTECTED] root]# df -h /mnt/pvfs2
Filesystem Size Used Avail Use% Mounted on
df: `/mnt/pvfs2': Connection timed out
An immediately following "df -h /mnt/pvfs2-tmp" also returned "connection timed out":
[EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
df: `/mnt/pvfs2-tmp': Connection timed out
Unmounting the /mnt/pvfs2 share works fine:
[EMAIL PROTECTED] root]# umount /mnt/pvfs2
A further "df -h /mnt/pvfs2-tmp" still returns "connection timed out":
[EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
df: `/mnt/pvfs2-tmp': Connection timed out
After unloading the userspace client and kernel module, restarting the pvfs2 software, and remounting the /mnt/pvfs2-tmp file system, "df -h /mnt/pvfs2-tmp" completed successfully:
[EMAIL PROTECTED] root]# df -h /mnt/pvfs2-tmp
Filesystem Size Used Avail Use% Mounted on
hostname:3334/pvfs2-fs
1.9T 381G 1.6T 20% /mnt/pvfs2-tmp
The pvfs2 client log contained:
[E 02/22 11:28] msgpair failed, will retry:: Connection refused
[E 02/22 11:28] msgpair failed, will retry:: Connection refused
[E 02/22 11:28] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] msgpair failed, will retry:: Connection refused
[E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
[E 02/22 11:29] *** Out of retries.
[E 02/22 11:29] Statfs failed: Connection refused
[E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
[E 02/22 11:39] msgpair failed, will retry:: Connection timed out
[E 02/22 11:42] msgpair failed, will retry:: Connection timed out
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers