The reboot of the I/O servers has now cleared up this failure. pvfs2-fsck still reports some inconsistencies in the filesystem, but I believe that I can clean these up.

Thanks,
Bill

Michael Moore wrote:
Hi Bill,

Sorry for the delay. The difference appears to be between what the management iterate-handles call returns and what statfs returns to fsck. I'm looking now to get a better understanding of how statfs and the trove ledger code get their counts versus how iterate handles counts them. In the meantime, have the server processes been restarted since this behavior started occurring? If not, is that a possibility?

Sorry again for the delay in getting back with you on this issue.

Thanks,
Michael

On Tue, Jan 04, 2011 at 01:48:20PM -0500, Bill Wichser wrote:
I've deleted those files with the native pvfs2-rm command, which told me to run pvfs2-fsck. Running pvfs2-validate turned up several more, which I removed. So there is nothing left to run pvfs2-viewdist on.

FWIW, I'm running a metadata server on the head node and the I/O servers on 16 compute nodes, version 2.8.2.

[root@della3 bill]# pvfs2-stat /scratch/pvfs2
-------------------------------------------------------
  File Name     : /scratch/pvfs2
  Relative Name : /
  fs ID         : 1922795883
  Handle        : 1048576
  Mask          : 504000177
  Permissions   : 777
  Type          : Directory
  Size          : 4096
  Owner         : 0 (root)
  Group         : 0 (root)
  atime         : 1294130281 (Tue Jan  4 03:38:01 2011)
  mtime         : 1293499466 (Mon Dec 27 20:24:26 2010)
  ctime         : 1293499462 (Mon Dec 27 20:24:22 2010)
  dir entries   : 6

[root@della3 bill]# pvfs2-validate -d /scratch/pvfs2/
pvfs2-validate starting validation at object [/scratch/pvfs2]
pvfs2-validate done validating object tree at [/scratch/pvfs2]

[root@della3 bill]# pvfs2-fsck -p -m /scratch/pvfs2
# Current FSID is 1922795883.
Ugh! Server 1, Received 64789 total handles instead of 64792

So the total handles have changed, as expected because of the removals, but the difference is the same. Now to be honest, when I made that filesystem, I didn't run an fsck, so it could be a remnant from last month. I don't know. But we have a bunch of Genomics people wreaking havoc with those strange files in kernel space. I was able to do a pvfs2-ls on them (user space) but didn't really pursue it, hoping instead to just make the problem go away!

Thanks,
Bill

Michael Moore wrote:
Hi Bill,

Can you provide the output of pvfs2-stat on the parent directory and affected files and 'pvfs2-viewdist -f <path>' on the affected files?

Do you see any complaints in the server logs related to accessing these files?

Michael

On Mon, Jan 03, 2011 at 08:04:02AM -0500, Bill Wichser wrote:
Having some trouble with my filesystem. There are a few files which did not get written correctly by one of the users and some corruption looks to be present.

# ls -lR
./3689_old:
total 0
?--------- ? ? ? ?            ? clusmax.out

./3764_old:
total 0
?--------- ? ? ? ?            ? traj.xtc

These cannot be removed. In the past, a run of pvfs2-fsck seemed to correct these types of problems but this time all I get is the following message and the fsck terminates. I'm not sure how to correct this. Googling leads me to the source code. Anyone have any suggestions?

# pvfs2-fsck -p -v -m /scratch/pvfs2
# Current FSID is 1922795883.
Ugh! Server 1, Received 64796 total handles instead of 64800


Thanks, and Happy New Year to all!
Bill

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
