Hey Phil,

Yes, it is running 2.8.2. My setup was using 3 servers with 2.6.18-194.el5 kernels and High Availability. I have not had a chance yet to try it on another file system, so I do not know if it is specific to that setup. It has been triggered from more than one client, but the only one I know for certain was running a 2.6.9-89.ELsmp kernel.
Bart.

On Fri, Jun 18, 2010 at 7:39 AM, Phil Carns <[email protected]> wrote:
> Hi Bart,
>
> Is this on 2.8.2? Do you happen to know how many servers are needed to
> trigger the problem?
>
> thanks,
> -Phil
>
>
> On 06/17/2010 04:08 PM, Bart Taylor wrote:
>
> Hey guys,
>
> We have had some problems in the past on 2.6 with file creations leaving
> bad files that we cannot delete. Most utilities like ls and rm return
> "No such file or directory", and pvfs utilities like viewdist, pvfs2-ls,
> and pvfs2-rm return various errors. We have resorted to looking up the
> parent handle, the fsid, and filename and using pvfs2-remove-object to
> delete the entry. But we weren't ever able to intentionally recreate the
> problem.
>
> Recently while testing 2.8, I have been able to reliably trigger a similar
> scenario where a file creation fails and leaves a garbage entry that
> cannot be deleted in any of the normal ways requiring the
> pvfs2-remove-object approach to clean up. The file and various outputs
> for this case:
>
> [r...@client dir]# ls -l 2010.06.10.28050
> total 0
> ?--------- ? ? ? ? ? File17027
>
> [r...@client dir]# rm 2010.06.10.28050/File17027
> rm: cannot lstat `2010.06.10.28050/File17027': No such file or directory
>
> [r...@client dir]# rm -rf 2010.06.10.28050
> rm: cannot remove directory `2010.06.10.28050': Directory not empty
>
> [r...@client dir]# pvfs2-rm 2010.06.10.28050/File17027
> Error: An error occurred while removing 2010.06.10.28050/File17027
> PVFS_sys_remove: No such file or directory (error class: 0)
>
> [r...@client dir]# pvfs2-stat 2010.06.10.28050/File17027
> PVFS_sys_lookup: No such file or directory (error class: 0)
> Error stating [2010.06.10.28050/File17027]
>
> [r...@client dir]# pvfs2-viewdist -f 2010.06.10.28050/File17027
> PVFS_sys_lookup: No such file or directory (error class: 0)
> Could not open 2010.06.10.28050/File17027
>
> [r...@client dir]# ls -l 2010.06.10.28050
> total 0
> ?--------- ? ? ? ? ? File17027
>
>
> I have included a test script that will spawn off a number of processes,
> open a bunch of files, write to each of them, then close them. You can
> tweak the options as you want but using 5 processes and 50,000 files will
> usually create at least one of these files. Here is an example command:
>
> $> ulimit -n 1000000 && ./open-file-limit --num-files=50000 --sleep-time=1 --num-processes=5 --directory=/mnt/pvfs2/ --file-size=1
>
> You may have to do a long listing on any left-over directories to find the
> file(s).
>
> I will give any help I can to help recreate the bad file or find the cause.
>
> Until then, is there a better (simpler) way to remove these entries, maybe
> some sort of utility that doesn't require doing manual handle lookups
> before getting the file removed? It would ease some support pain if it were
> simpler to fix.
>
> Thanks for your help,
> Bart.
>
>
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
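The open-file-limit script itself was an attachment and is not reproduced here. As a rough, hypothetical stand-in, the Python sketch below follows the behaviour described in the quoted message: several processes each open many files, write to each of them, then close them. Afterwards it lists any entry that readdir returns but lstat rejects with "No such file or directory", which is what shows up as the `?---------` line in the ls output. The per-process directory layout, how the options map to behaviour, and the use of Python's "fork" start method (POSIX only) are all assumptions, not details of the real script.

```python
"""Hypothetical stand-in for the open-file-limit test (not the real script).

Assumptions: each process gets its own subdirectory, opens --num-files
files, writes --file-size bytes to each, sleeps --sleep-time seconds,
then closes them all. Uses the "fork" start method, so POSIX only.
"""
import argparse
import multiprocessing
import os
import time


def worker(directory: str, num_files: int, file_size: int,
           sleep_time: float) -> None:
    # Subdirectory named date.PID, a guess modelled on "2010.06.10.28050".
    subdir = os.path.join(directory,
                          f"{time.strftime('%Y.%m.%d')}.{os.getpid()}")
    os.makedirs(subdir, exist_ok=True)
    handles = []
    try:
        # Open every file first so all descriptors are held at once.
        for i in range(num_files):
            handles.append(open(os.path.join(subdir, f"File{i}"), "wb"))
        # Then write to each open file.
        for f in handles:
            f.write(b"x" * file_size)
        time.sleep(sleep_time)
    finally:
        # Close whatever was opened, even if an open() failed midway.
        for f in handles:
            f.close()


def run_workload(directory: str, num_processes: int, num_files: int,
                 file_size: int, sleep_time: float) -> None:
    ctx = multiprocessing.get_context("fork")
    procs = [ctx.Process(target=worker,
                         args=(directory, num_files, file_size, sleep_time))
             for _ in range(num_processes)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
        if p.exitcode != 0:
            raise RuntimeError(f"worker {p.pid} exited with {p.exitcode}")


def find_unstatable_entries(root: str) -> list[str]:
    """Return paths that readdir lists but lstat rejects with ENOENT."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except FileNotFoundError:
                bad.append(path)
    return bad


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--directory", required=True)
    parser.add_argument("--num-processes", type=int, default=5)
    parser.add_argument("--num-files", type=int, default=50000)
    parser.add_argument("--file-size", type=int, default=1)
    parser.add_argument("--sleep-time", type=float, default=1.0)
    args = parser.parse_args()
    run_workload(args.directory, args.num_processes, args.num_files,
                 args.file_size, args.sleep_time)
    for path in find_unstatable_entries(args.directory):
        print("unstatable entry:", path)
```

Something like `ulimit -n 1000000 && python3 open_file_sketch.py --directory=/mnt/pvfs2/ --num-processes=5 --num-files=50000` would roughly mirror the example command (the file name is hypothetical). The scan only locates the bad entries; removing them still needs the pvfs2-remove-object approach described in the report.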
