hi -
I set up a simple two-node OrangeFS 2.8.5 cluster on an Ubuntu 11 kernel
and I'm seeing some strange behavior: a file whose "ls -l" output looks
like this:
? ?????????? ? ? ? ? ? prime.msg
I'm wondering if I've done something wrong?
DETAILS:
I've got two nodes: h0 and h1. I first initialize OrangeFS 2.8.5 with
a script I wrote called pvfs-setup that calls pvfs2-genconfig
non-interactively, and then runs:
/usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf -f
to create the storage space and finish the setup.
h0> sudo /users/chuck/usr/new/emulab-withpaths /users/chuck/usr/new/pvfs-setup
1905517811 h0
[S 05/29/2012 16:58:39] PVFS2 Server on node h0 version 2.8.5-orangefs
starting...
[D 05/29/2012 16:58:39] PVFS2 Server: storage space created. Exiting.
h0> ssh h1 sudo /users/chuck/usr/new/emulab-withpaths
/users/chuck/usr/new/pvfs-setup 1905517811 h0
[S 05/29/2012 16:58:45] PVFS2 Server on node h1 version 2.8.5-orangefs
starting...
[D 05/29/2012 16:58:45] PVFS2 Server: storage space created. Exiting.
h0>
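For reference, the core of my pvfs-setup script is roughly equivalent to
the sketch below. The exact pvfs2-genconfig options shown are assumptions
(my script fills its answers in non-interactively); the pvfs2-server line
is the real one from above.

```shell
#!/bin/sh
# Hypothetical sketch of what pvfs-setup does on each node.
# The genconfig option values here are assumptions, not my exact flags.
CONF=/etc/pvfs2-fs.conf

# generate the server config non-interactively
pvfs2-genconfig --quiet \
    --protocol tcp --tcpport 3334 \
    --ioservers h0,h1 --metaservers h0,h1 \
    "$CONF"

# create the storage space for this node, then exit
# ("storage space created. Exiting." in the logs above)
/usr/local/sbin/pvfs2-server "$CONF" -f
```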
next I start PVFS on both nodes and mount the newly created filesystem:
h0> df /m/pvfs
Filesystem 1K-blocks Used Available Use% Mounted on
tcp://h0:3334/pvfs2-fs
125399040K 106496K 125292544K 1% /m/pvfs
h0> ssh h1 df /m/pvfs
Filesystem 1K-blocks Used Available Use% Mounted on
tcp://h1:3334/pvfs2-fs
125399040K 106496K 125292544K 1% /m/pvfs
h0>
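"start PVFS and mount" on each node amounts to something like the
following; the install paths and module location are assumptions based on
a default 2.8.5 build, not output from my systems:

```shell
#!/bin/sh
# Hypothetical start-and-mount sequence - paths are assumptions.
# start the server daemon (no -f this time, so it keeps running)
/usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf

# load the kernel module and start the client daemon for VFS access
insmod /usr/local/lib/pvfs2.ko
/usr/local/sbin/pvfs2-client -p /usr/local/sbin/pvfs2-client-core

# mount the filesystem through the kernel module
# (h1 mounts tcp://h1:3334/pvfs2-fs instead, per the df output above)
mount -t pvfs2 tcp://h0:3334/pvfs2-fs /m/pvfs
```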
finally, I run my test script, which creates a file /m/pvfs/prime.msg
containing a random string (the current time) and then cats the file on
both nodes (the data should match, right?).
the script works the first time, but on later runs it fails, leaving a
garbage file in /m/pvfs.
Here's the script source:
#!/usr/bin/perl
use strict;
my($dir, $msg, @hosts);
$dir = "/m/pvfs"; # PVFS mount point
sleep(1);
$msg = time(); # "random" string: current unix time
unlink("$dir/prime.msg"); # remove any previous copy
sleep(1);
unlink("$dir/prime.msg");
sleep(1);
open(PM, ">$dir/prime.msg") || die "cannot open $dir/prime.msg ($!)";
# low-precedence "or" here: with "||" the die would bind to the string
# argument and never fire
print PM "$msg\n" or die "cannot print msg ($!)";
close(PM) || die "cannot close msg file ($!)";
sleep(1);
@hosts = ("h0", "h1");
print "EXPECT: $msg\n";
foreach (@hosts) {
    print "$_ ";
    system("ssh $_ cat $dir/prime.msg");
}
print "\n";
foreach (@hosts) {
    print "$_ ";
    system("ssh $_ ls -ls $dir");
}
exit(0);
and here is the output that shows where it fails after the first run:
h0> perl ptest
EXPECT: 1338325147
h0 1338325147
h1 1338325147
h0 total 8K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs 11 2012-05-29 16:59 prime.msg
h1 total 8K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs 11 2012-05-29 16:59 prime.msg
h0>
h0>
h0> perl ptest
EXPECT: 1338325160
h0 1338325160
h1 cat: /m/pvfs/prime.msg: No such file or directory
h0 total 8K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs 11 2012-05-29 16:59 prime.msg
h1 ls: cannot access /m/pvfs/prime.msg: Input/output error
total 4K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
? ?????????? ? ? ? ? ? prime.msg
h0>
h0> perl ptest
EXPECT: 1338325171
h0 1338325171
h1 cat: /m/pvfs/prime.msg: Input/output error
h0 total 8K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs 11 2012-05-29 16:59 prime.msg
h1 ls: cannot access /m/pvfs/prime.msg: Input/output error
total 4K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
? ?????????? ? ? ? ? ? prime.msg
h0>
note that the metadata is all "?"s when viewed on node h1,
but it looks fine on node h0.
I find that if I umount /m/pvfs, stop all PVFS processes, and
then restart them, then the test script will work once and
then fail, as above.
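concretely, the restart cycle that buys me one more good run looks
roughly like this (the kill/rmmod details are assumptions; the point is
a full client and server restart):

```shell
#!/bin/sh
# Hypothetical teardown/restart - process names and paths are assumptions.
umount /m/pvfs
killall pvfs2-client pvfs2-client-core pvfs2-server
rmmod pvfs2
# ...then bring everything back up and remount
/usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf
insmod /usr/local/lib/pvfs2.ko
/usr/local/sbin/pvfs2-client -p /usr/local/sbin/pvfs2-client-core
mount -t pvfs2 tcp://h0:3334/pvfs2-fs /m/pvfs
```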
Furthermore, I have an older Ubuntu 10 install with OrangeFS 2.8.4 on
it, and the test script runs just fine there, so I wonder if this is
something specific to the 2.8.5 release?
I also noticed the recent message from Andrew Savchenko <[email protected]>
dated Tue, 15 May 2012 08:20:00 +0400
("Problems with intense parallel i/o via kernel VFS") where he
also got files with lots of ????'s:
From: Andrew Savchenko <[email protected]>
To: [email protected]
Subject: Re: [Pvfs2-users] Problems with intense parallel i/o via kernel VFS
...
> Please clarify:
> AFTER the make process has completed, you can access the files using
> pvfs2-cp but not through the kernel module? If this is true, can you send
> me the output of an "ls -al" and "pvfs2-ls -al"?
Yes, this is true. With an exception after "make process has failed" :)
ls -al shows:
?????????? ? ? ? ? ? site_que_attr_def.ht
I wonder if my problem is related to Andrew's?
chuck
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users