I'm glad that cleaned up the problem. From what I saw, the statfs calls 
use handle counts that are maintained in memory. When the servers 
restart, the in-memory handle lists are regenerated using the same calls 
that the iterate-handles management calls use, so the two counts start 
out in agreement again.
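
To make that concrete, here is a toy Python sketch of the idea. It is not 
PVFS2 code: the class, the method names, and the "missed update" are all 
made up for illustration, and I still don't know what actually caused the 
drift on the real servers.

```python
# Toy model only: none of these names come from the PVFS2 source.
class ToyServer:
    def __init__(self, handles):
        # Stands in for the handles actually stored by the server.
        self._stored = set(handles)
        # Stands in for the in-memory count that a statfs-style call reports.
        self._cached_count = len(self._stored)

    def statfs_count(self):
        """Count as reported from memory, without walking the stored handles."""
        return self._cached_count

    def iterate_handles(self):
        """Handles found by actually walking what is stored."""
        return sorted(self._stored)

    def remove(self, handle, update_cache=True):
        """Remove a handle; update_cache=False simulates a missed update."""
        if handle in self._stored:
            self._stored.remove(handle)
            if update_cache:
                self._cached_count -= 1

    def restart(self):
        """Rebuild the in-memory count from the stored handles."""
        self._cached_count = len(self.iterate_handles())


def check(server):
    """Return (received, expected) in the order the fsck message prints them."""
    return len(server.iterate_handles()), server.statfs_count()


server = ToyServer(range(10))
server.remove(3, update_cache=False)  # hypothetical missed update
before = check(server)                # counts disagree: (9, 10)
server.restart()
after = check(server)                 # counts agree again: (9, 9)
print(before, after)
```

In this toy, the mismatch before the restart has the same shape as the 
"Received N total handles instead of M" message, and the restart clears 
it because the count is recomputed from what is actually stored.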

I don't have an answer for how the in-memory handle counts got out of 
sync. 

If you run into other fsck issues feel free to ping the list.

Thanks,
Michael

On Mon, Jan 17, 2011 at 09:24:48AM -0500, Bill Wichser wrote:
> The reboot of the I/O servers has now cleared up this failure.  There is 
> still some inconsistency in the filesystem when running pvfs2-fsck, but I 
> believe that I can clean these up.
> 
> Thanks,
> Bill
> 
> Michael Moore wrote:
> > Hi Bill,
> >
> > Sorry for the delay. The difference appears to be between what the 
> > management iterate handles call returns and what statfs returns to fsck. 
> > I'm looking now to get a better understanding of how statfs and the trove 
> > ledger stuff get their counts versus how iterate handles counts them. 
> >
> > In the meantime, have the server processes been restarted since this 
> > behavior started occurring? If not, is that a possibility?
> >
> > Sorry again for the delay in getting back with you on this issue.
> >
> > Thanks,
> > Michael
> >
> > On Tue, Jan 04, 2011 at 01:48:20PM -0500, Bill Wichser wrote:
> >   
> >> I've deleted those files with the native pvfs2-rm command which informed 
> >> me to run pvfs2-fsck.  Running pvfs2-validate turned up a number more 
> >> which I removed.  So there is nothing to pvfs2-viewdist on.
> >>
> >> FWIW I'm running a meta on the head and the I/O servers on 16 compute 
> >> nodes, version 2.8.2
> >>
> >> [root@della3 bill]# pvfs2-stat /scratch/pvfs2
> >> -------------------------------------------------------
> >>   File Name     : /scratch/pvfs2
> >>   Relative Name : /
> >>   fs ID         : 1922795883
> >>   Handle        : 1048576
> >>   Mask          : 504000177
> >>   Permissions   : 777
> >>   Type          : Directory
> >>   Size          : 4096
> >>   Owner         : 0 (root)
> >>   Group         : 0 (root)
> >>   atime         : 1294130281 (Tue Jan  4 03:38:01 2011)
> >>   mtime         : 1293499466 (Mon Dec 27 20:24:26 2010)
> >>   ctime         : 1293499462 (Mon Dec 27 20:24:22 2010)
> >>   dir entries   : 6
> >>
> >> [root@della3 bill]# pvfs2-validate -d /scratch/pvfs2/
> >> pvfs2-validate starting validation at object [/scratch/pvfs2]
> >> pvfs2-validate done validating object tree at [/scratch/pvfs2]
> >>
> >> [root@della3 bill]# pvfs2-fsck -p -m /scratch/pvfs2
> >> # Current FSID is 1922795883.
> >> Ugh! Server 1, Received 64789 total handles instead of 64792
> >>
> >> So the total handles have changed, as expected because of the removals, 
> >> but the difference is the same.  Now to be honest, when I made that 
> >> filesystem, I didn't run an fsck so it could be a remnant from last 
> >> month.  I don't know.  But we have a bunch of Genomics people wreaking 
> >> havoc with those strange files in kernel space.  I was able to do a 
> >> pvfs2-ls on them (user space) but didn't really pursue it, hoping instead 
> >> to just make the problem go away!
> >>
> >> Thanks,
> >> Bill
> >>
> >> Michael Moore wrote:
> >>     
> >>> Hi Bill,
> >>>
> >>> Can you provide the output of pvfs2-stat on the parent directory 
> >>> and affected files and 'pvfs2-viewdist -f <path>' on the affected files?
> >>>
> >>> Do you see any complaints in the server logs related to accessing these 
> >>> files?
> >>>
> >>> Michael
> >>>
> >>> On Mon, Jan 03, 2011 at 08:04:02AM -0500, Bill Wichser wrote:
> >>>> Having some trouble with my filesystem.  There are a few files which did 
> >>>> not get written correctly by one of the users and some corruption looks 
> >>>> to be present.
> >>>>
> >>>> # ls -lR
> >>>> ./3689_old:
> >>>> total 0
> >>>> ?--------- ? ? ? ?            ? clusmax.out
> >>>>
> >>>> ./3764_old:
> >>>> total 0
> >>>> ?--------- ? ? ? ?            ? traj.xtc
> >>>>
> >>>> These cannot be removed.  In the past, a run of pvfs2-fsck seemed to 
> >>>> correct these types of problems but this time all I get is the following 
> >>>> message and the fsck terminates.  I'm not sure how to correct this.  
> >>>> Googling leads me to the source code.  Anyone have any suggestions?
> >>>>
> >>>> # pvfs2-fsck -p -v -m /scratch/pvfs2
> >>>> # Current FSID is 1922795883.
> >>>> Ugh! Server 1, Received 64796 total handles instead of 64800
> >>>>
> >>>>
> >>>> Thanks, and Happy New Year to all!
> >>>> Bill
> >>>>
> >>>> _______________________________________________
> >>>> Pvfs2-users mailing list
> >>>> [email protected]
> >>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users