Hey Phil,

Yes, it is running 2.8.2.  My setup was using 3 servers with 2.6.18-194.el5
kernels and High Availability. I have not had a chance yet to try it on
another file system, so I do not know if it is specific to that setup. It
has been triggered from more than one client, but the only one I know for
certain was running a 2.6.9-89.ELsmp kernel.

Bart.


On Fri, Jun 18, 2010 at 7:39 AM, Phil Carns <[email protected]> wrote:

>  Hi Bart,
>
> Is this on 2.8.2?  Do you happen to know how many servers are needed to
> trigger the problem?
>
> thanks,
> -Phil
>
>
> On 06/17/2010 04:08 PM, Bart Taylor wrote:
>
>
> Hey guys,
>
> We have had some problems in the past on 2.6 with file creations leaving bad
> files that we cannot delete. Most utilities like ls and rm return "No such file
> or directory", and pvfs utilities like viewdist, pvfs2-ls, and pvfs2-rm return
> various errors. We have resorted to looking up the parent handle, the fsid, and
> filename and using pvfs2-remove-object to delete the entry. But we weren't ever
> able to intentionally recreate the problem.
>
> Recently while testing 2.8, I have been able to reliably trigger a similar
> scenario where a file creation fails and leaves a garbage entry that cannot be
> deleted in any of the normal ways, requiring the pvfs2-remove-object approach to
> clean up. The file and various outputs for this case:
>
> [r...@client dir]# ls -l 2010.06.10.28050
> total 0
> ?---------  ? ? ? ?           ? File17027
>
> [r...@client dir]# rm 2010.06.10.28050/File17027
> rm: cannot lstat `2010.06.10.28050/File17027': No such file or directory
>
> [r...@client dir]# rm -rf 2010.06.10.28050
> rm: cannot remove directory `2010.06.10.28050': Directory not empty
>
> [r...@client dir]# pvfs2-rm 2010.06.10.28050/File17027
> Error: An error occurred while removing 2010.06.10.28050/File17027
> PVFS_sys_remove: No such file or directory (error class: 0)
>
> [r...@client dir]# pvfs2-stat 2010.06.10.28050/File17027
> PVFS_sys_lookup: No such file or directory (error class: 0)
> Error stating [2010.06.10.28050/File17027]
>
> [r...@client dir]# pvfs2-viewdist -f 2010.06.10.28050/File17027
> PVFS_sys_lookup: No such file or directory (error class: 0)
> Could not open 2010.06.10.28050/File17027
>
> [r...@client dir]# ls -l 2010.06.10.28050
> total 0
> ?---------  ? ? ? ?           ? File17027
>
>
> I have included a test script that will spawn off a number of processes, open a
> bunch of files, write to each of them, then close them. You can tweak the
> options as you want but using 5 processes and 50,000 files will usually create
> at least one of these files. Here is an example command:
>
> $> ulimit -n 1000000 && ./open-file-limit --num-files=50000 --sleep-time=1 --num-processes=5 --directory=/mnt/pvfs2/ --file-size=1
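
For anyone who wants to try this without the attached script, the kind of load it generates can be sketched roughly as below. This is not the open-file-limit script itself, only a minimal Python approximation of the description above (several processes, each holding many files open, writing to each, then closing them). The names run_stress and worker and the "stress.<pid>" directory naming are made up for illustration, and it assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing
import os
import time


def worker(directory, num_files, file_size, sleep_time):
    """Hold num_files files open at once, write to each, pause, then close."""
    # Hypothetical naming scheme: one working directory per process.
    workdir = os.path.join(directory, "stress.%d" % os.getpid())
    os.makedirs(workdir)
    handles = []
    try:
        for i in range(num_files):
            handles.append(open(os.path.join(workdir, "File%d" % i), "wb"))
        for handle in handles:
            handle.write(b"x" * file_size)
        time.sleep(sleep_time)
    finally:
        for handle in handles:
            handle.close()


def run_stress(directory, num_processes, num_files, file_size=1, sleep_time=1.0):
    """Start the workers, wait for them, and return their exit codes."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    procs = [
        ctx.Process(target=worker,
                    args=(directory, num_files, file_size, sleep_time))
        for _ in range(num_processes)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]
```

Something like run_stress("/mnt/pvfs2", num_processes=5, num_files=50000) would mirror the example command, and it needs the same raised open-file limit. Unlike what the note about left-over directories suggests for the real script, this sketch leaves everything it creates in place.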
>
> You may have to do a long listing on any left-over directories to find the
> file(s).
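
Rather than eyeballing long listings, the check can be automated. The sketch below walks a tree and reports every name that the directory listing returns but lstat() rejects with ENOENT, which is the same symptom ls and rm show above. The function name find_stale_entries and its lstat parameter are illustrative only (the parameter exists so the check can be exercised without a broken file system). It uses only standard library calls and makes no attempt to remove anything.

```python
import errno
import os
import stat


def find_stale_entries(root, lstat=os.lstat):
    """Return paths that appear in a directory listing but fail lstat() with ENOENT."""
    stale = []
    pending = [root]
    while pending:
        directory = pending.pop()
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue  # directory vanished or is unreadable; skip it
        for name in names:
            path = os.path.join(directory, name)
            try:
                info = lstat(path)
            except OSError as exc:
                if exc.errno == errno.ENOENT:
                    stale.append(path)  # listed, but cannot be stat'ed
                continue
            if stat.S_ISDIR(info.st_mode):
                pending.append(path)  # descend into real subdirectories
    return stale
```

Calling find_stale_entries("/mnt/pvfs2") on a quiet file system should list only the bad entries. On a busy one, a file deleted between the listing and the lstat() call will also show up, so it is worth re-checking any hit before cleaning it up.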
>
> I will give any help I can to help recreate the bad file or find the cause.
>
> Until then, is there a better (simpler) way to remove these entries, maybe
> some sort of utility that doesn't require doing manual handle lookups before
> getting the file removed? It would ease some support pain if it were simpler to
> fix.
>
> Thanks for your help,
> Bart.
>
>
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
