hi-

    I set up a simple 2-node OrangeFS 2.8.5 cluster on an ubuntu11 kernel
and I'm seeing some strange behavior: one file shows up in "ls -l"
like this:

 ? ?????????? ? ?    ?       ?                ? prime.msg

I'm wondering if I've done something wrong?


DETAILS:

    I've got two nodes: h0 and h1.   I first init OrangeFS 2.8.5 with
a script I wrote called pvfs-setup that basically calls pvfs2-genconfig
non-interactively, and then does:

   /usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf -f

to finish the setup.

h0> sudo /users/chuck/usr/new/emulab-withpaths /users/chuck/usr/new/pvfs-setup 1905517811 h0
[S 05/29/2012 16:58:39] PVFS2 Server on node h0 version 2.8.5-orangefs starting...
[D 05/29/2012 16:58:39] PVFS2 Server: storage space created. Exiting.
h0> ssh h1 sudo /users/chuck/usr/new/emulab-withpaths /users/chuck/usr/new/pvfs-setup 1905517811 h0
[S 05/29/2012 16:58:45] PVFS2 Server on node h1 version 2.8.5-orangefs starting...
[D 05/29/2012 16:58:45] PVFS2 Server: storage space created. Exiting.
h0> 


next I start PVFS on both nodes and mount the newly created filesystem:

h0> df /m/pvfs
Filesystem           1K-blocks      Used Available Use% Mounted on
tcp://h0:3334/pvfs2-fs
                     125399040K   106496K 125292544K   1% /m/pvfs
h0> ssh h1 df /m/pvfs
Filesystem           1K-blocks      Used Available Use% Mounted on
tcp://h1:3334/pvfs2-fs
                     125399040K   106496K 125292544K   1% /m/pvfs
h0> 


finally, I run my test script, which creates a file /m/pvfs/prime.msg
containing a string that changes on every run (the current time) and
then cats the file on both nodes (the data should match, right?).

the script works the first time, but on later runs it fails and
leaves a garbage file in /m/pvfs.

Here's the script source:

#!/usr/bin/perl

use strict;

my($dir, $msg, @hosts);
$dir = "/m/pvfs";       # PVFS mount point

sleep(1);
$msg = time();
unlink("$dir/prime.msg");
sleep(1);
unlink("$dir/prime.msg");
sleep(1);
open(PM, ">$dir/prime.msg") || die "cannot open $dir/prime.msg ($!)";
print PM "$msg\n" or die "cannot print msg ($!)";
close(PM) || die "cannot close msg file ($!)";
sleep(1);

@hosts = ("h0", "h1");

print "EXPECT: $msg\n";
foreach (@hosts) {
   print "$_ ";
   system("ssh $_ cat $dir/prime.msg");
}
print "\n";
foreach (@hosts) {
   print "$_ ";
   system("ssh $_ ls -ls $dir");
}
exit(0);
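
A variant of the check in the script above, as a rough sketch only: rather
than comparing the output by eye, capture what each host returns and
compare it to the expected string. The helper name mismatched_hosts and
the script name pcheck are made up for illustration; the ssh/cat command
and the /m/pvfs/prime.msg path are the ones the test script already uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the expected string and a list of
# host => captured output pairs, return the hosts whose output differs.
sub mismatched_hosts {
    my ($expect, %got) = @_;
    my @bad;
    foreach my $host (sort keys %got) {
        # Treat a failed command (undef) the same as empty output
        my $out = defined $got{$host} ? $got{$host} : "";
        chomp($out);
        push(@bad, $host) if $out ne $expect;
    }
    return @bad;
}

# Usage sketch: runs only when an expected value and at least one host
# are given on the command line, e.g.
#   perl pcheck 1338325147 h0 h1
if (@ARGV >= 2) {
    my ($expect, @hosts) = @ARGV;
    my %got;
    foreach my $host (@hosts) {
        # Same ssh/cat invocation as the test script; stderr is discarded
        $got{$host} = `ssh $host cat /m/pvfs/prime.msg 2>/dev/null`;
    }
    my @bad = mismatched_hosts($expect, %got);
    print @bad ? "MISMATCH on: @bad\n" : "all hosts match\n";
    exit(@bad ? 1 : 0);
}
```

The non-zero exit status on a mismatch would make it easy to run the
check in a loop and stop at the first failing iteration.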


and here is the output that shows where it fails after the first run:


h0> perl ptest 
EXPECT: 1338325147
h0 1338325147
h1 1338325147

h0 total 8K
4K drwxrwxrwx 1 root  root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs   11 2012-05-29 16:59 prime.msg
h1 total 8K
4K drwxrwxrwx 1 root  root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs   11 2012-05-29 16:59 prime.msg
h0> 
h0> 
h0> perl ptest
EXPECT: 1338325160
h0 1338325160
h1 cat: /m/pvfs/prime.msg: No such file or directory

h0 total 8K
4K drwxrwxrwx 1 root  root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs   11 2012-05-29 16:59 prime.msg
h1 ls: cannot access /m/pvfs/prime.msg: Input/output error
total 4K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
 ? ?????????? ? ?    ?       ?                ? prime.msg
h0> 
h0> perl ptest
EXPECT: 1338325171
h0 1338325171
h1 cat: /m/pvfs/prime.msg: Input/output error

h0 total 8K
4K drwxrwxrwx 1 root  root 4096 2012-05-29 16:58 lost+found
4K -rw-r--r-- 1 chuck plfs   11 2012-05-29 16:59 prime.msg
h1 ls: cannot access /m/pvfs/prime.msg: Input/output error
total 4K
4K drwxrwxrwx 1 root root 4096 2012-05-29 16:58 lost+found
 ? ?????????? ? ?    ?       ?                ? prime.msg
h0> 


note that the metadata is all "????"'s when viewed on node h1,
but it looks fine on node h0.
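
For reference, ls prints "?" fields when it can read a name from the
directory listing but the stat call on that name fails (here with
"Input/output error"). A minimal Perl sketch that lists such entries is
below; the helper name find_unstatable is hypothetical and not part of
the test script, and /m/pvfs is just the mount point used in this report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original test script): return the
# names in $dir that readdir() reports but lstat() cannot describe,
# each followed by the error text from $!.
sub find_unstatable {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot opendir $dir ($!)";
    my @names = grep { $_ ne "." && $_ ne ".." } readdir($dh);
    closedir($dh);

    my @bad;
    foreach my $name (sort @names) {
        # lstat returns an empty list on failure and sets $!
        my @st = lstat("$dir/$name");
        push(@bad, "$name: $!") unless @st;
    }
    return @bad;
}

# Default to the mount point from this report when no argument is given;
# do nothing if the directory does not exist on this machine.
my $dir = @ARGV ? $ARGV[0] : "/m/pvfs";
if (-d $dir) {
    print "$_\n" foreach find_unstatable($dir);
}
```

Running this on both h0 and h1 after a failing run would show whether
only h1 sees the entry as unstatable.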

I find that if I umount /m/pvfs, stop all PVFS processes, and
then restart them, then the test script will work once and
then fail, as above.


Furthermore, I have an old ubuntu10 install with OrangeFS 2.8.4
on it, and the test script runs just fine there, so I
wonder if this has something to do with the 2.8.5 release?


I also noticed the recent message from Andrew Savchenko <[email protected]>
dated Tue, 15 May 2012 08:20:00 +0400 
("Problems with intense parallel i/o via kernel VFS") where he
also got files with lots of ????'s:

  From: Andrew Savchenko <[email protected]> 
  To: [email protected]
  Subject: Re: [Pvfs2-users] Problems with intense parallel i/o via kernel VFS

  ...

  > Please clarify:
  > AFTER the make process has completed, you can access the files using
  > pvfs2-cp but not through the kernel module?  If this is true, can you send
  > me the output of an "ls -al" and "pvfs2-ls -al"?
  
  Yes, this is true. With an exception after "make process has failed" :)
  
  ls -al shows:
  ?????????? ? ?       ?         ?            ? site_que_attr_def.ht



I wonder if my problem is related to Andrew's?



chuck
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
