Sorry, I gave you the wrong version; we have 4.8.30.

Becky
--
Becky Ligon
PVFS Developer
Clemson University
864-650-4065
> Are you using the latest version of Berkeley DB?
>
> We ran into a problem about a year ago with a user who was
> reading, writing, and deleting the same file over and over again and
> discovered that the older versions of Berkeley DB had threading issues.
> We upgraded our environment to use 4.3.29 and haven't seen any problems
> along those lines since.
>
> Becky
> --
> Becky Ligon
> PVFS Developer
> Clemson University
> 864-650-4065
>
>> Hey Phil,
>>
>> After a little wrangling and a discussion or two with you off-list, I
>> tested both of the patches you sent, the steps for getting a file into
>> this state, and the resulting behavior on 2.8.
>>
>> Both patches work as expected. Adding the ENOENT case to sys-remove does
>> indeed allow pvfs2-rm to remove the bad file entry; it does, however,
>> leave the datafile bstreams stranded. I am not sure if anything can be
>> done about that. The second set of changes now allows errors to
>> propagate to the client, so pvfs2-lsplus prints which file is
>> exhibiting a problem. Thanks for those! They will definitely help with
>> cleanup.
>>
>> Removing the metadata object for a file does indeed produce the same
>> symptoms we are seeing, and it has a similar effect on 2.8. I believe I
>> was working with Sam and possibly you on this a few weeks ago but had to
>> drop it for something more urgent. Our conversation can be found here:
>>
>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2010-June/004605.html
>>
>> In 2.8, pvfs2-rm does not currently remove these file entries. Sam had
>> the same ENOENT fix for 2.8; with it applied, if I create a file and
>> remove its metadata object, the file entry can then be removed. I
>> believe Jim Kusznir may have been experiencing similar issues when he
>> posted to the Users list here:
>>
>> http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-September/003186.html
>>
>> It appears that something is still triggering this issue in 2.8.
>> Based on my own anecdotal evidence with 2.6 and 2.8, it looks like it
>> can happen when a file system is getting hammered with creates and/or
>> deletes. My test case to reproduce it on 2.8 involved several threads
>> executing a script that creates, opens, writes, and then deletes a file
>> many thousands of times. The problems we have had on 2.6 also correlate
>> with heavy loads of file creations and deletions.
>>
>> Does anyone have thoughts on why some files are being left without their
>> metadata objects?
>>
>> Bart.
>>
>>
>>
>> On Mon, Oct 11, 2010 at 10:59 AM, Phil Carns <[email protected]> wrote:
>>
>>> On 10/11/2010 11:42 AM, Phil Carns wrote:
>>>
>>>>
>>>>
>>>>>> - how to make pvfs2-rm safely remove what it can (even if via a
>>>>>> "force" option)
>>>>>> - how to get pvfs2-lsplus (and probably other utilities and/or
>>>>>> kernel module as well) to report a sane error message instead of
>>>>>> the "Invalid object" message
>>>>>>
>>>>>
>>>> The attached patch fixes the first problem (assuming I'm looking at
>>>> the right scenario). For some
>>>>
>>>
>>> ... and this attached patch fixes the second problem. If I do an
>>> lsplus with it on a broken file I now see this:
>>>
>>>
>>> [pca...@pcarns-laptop admin]$ ./pvfs2-lsplus -alh /mnt/pvfs2/
>>> drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 11:34 .
>>> drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 11:34 .. (faked)
>>> a.dat: could not retrieve attributes: No such file or directory
>>>
>>> -rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 c.dat
>>> -rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 d.dat
>>> drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 10:52 lost+found
>>>
>>> That "<entry>: could not retrieve attributes: <error>" line of output
>>> isn't the prettiest thing in the world, but at least it shows the
>>> directory entry and an appropriate error message for it :) Feel free to
>>> adjust pvfs2-lsplus.c if we need a different format there that looks
>>> more like /bin/ls.
>>>
>>> The problems in this path were shared by both the client and the
>>> server, but to make a long story short, errors for individual
>>> attributes in a list-attr response weren't being propagated out
>>> correctly.
>>>
>>> I have no idea what subset of these problems is relevant to the
>>> current code base. The list-attr server state machine has since been
>>> rewritten using nested state machines, and pvfs2-lsplus has gone away
>>> (its logic was folded into pvfs2-ls instead). Can someone try out the
>>> example from earlier in the email thread on trunk or 2.8 to see what
>>> happens? We just need to create a file, remove the metadata object out
>>> from under it, and then try pvfs2-ls -alh on the directory and pvfs2-rm
>>> on the file to see what happens...
>>>
>>> thanks,
>>> -Phil
>>>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> [email protected]
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
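Bart's reproduction (several concurrent workers that each create, open, write, and then delete a file many thousands of times) can be sketched in C roughly as follows. This is not Bart's actual script: it assumes separate processes created with fork() stand in for the concurrently running script instances, and the names run_stress and stress_worker, the 4 KiB write size, and the per-worker file naming are hypothetical choices made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: create, open, write, and delete its own file
 * `iterations` times. Returns the number of failed operations. */
static int stress_worker(const char *dir, int id, int iterations)
{
    char path[4096];
    char buf[4096];
    int failures = 0;

    memset(buf, 'x', sizeof(buf));
    /* One file name per worker, so workers never race on the same name. */
    snprintf(path, sizeof(path), "%s/stress.%d.%ld", dir, id, (long)getpid());

    for (int i = 0; i < iterations; i++) {
        /* create (fails with EEXIST if an earlier iteration left it behind) */
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd < 0) {
            fprintf(stderr, "create %s: %s\n", path, strerror(errno));
            failures++;
            unlink(path); /* clear any leftover before the next iteration */
            continue;
        }
        close(fd);

        /* reopen the existing file and write to it */
        fd = open(path, O_WRONLY);
        if (fd < 0) {
            fprintf(stderr, "open %s: %s\n", path, strerror(errno));
            failures++;
            unlink(path);
            continue;
        }
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            failures++;
        if (close(fd) != 0)
            failures++;

        /* delete */
        if (unlink(path) != 0) {
            fprintf(stderr, "unlink %s: %s\n", path, strerror(errno));
            failures++;
        }
    }
    return failures;
}

/* Run `nworkers` concurrent worker processes against `dir`.
 * Returns the total failure count, or -1 if fork/waitpid/malloc failed. */
int run_stress(const char *dir, int nworkers, int iterations)
{
    pid_t *pids;
    int spawned = 0, total = 0, sys_error = 0;

    assert(dir != NULL && nworkers > 0 && iterations > 0);
    pids = malloc((size_t)nworkers * sizeof(*pids));
    if (pids == NULL)
        return -1;

    for (int w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            sys_error = 1;
            break;
        }
        if (pid == 0) {
            int f = stress_worker(dir, w, iterations);
            _exit(f > 255 ? 255 : f); /* an exit status holds at most 255 */
        }
        pids[spawned++] = pid;
    }

    /* Reap only the workers that were actually started. */
    for (int w = 0; w < spawned; w++) {
        int status;
        if (waitpid(pids[w], &status, 0) < 0)
            sys_error = 1;
        else if (WIFEXITED(status))
            total += WEXITSTATUS(status);
        else
            total += 1; /* worker was killed by a signal */
    }
    free(pids);
    return sys_error ? -1 : total;
}
```

To exercise PVFS2 rather than a local disk, dir would have to be a directory on a PVFS2 file system mounted through the kernel module (for example at /mnt/pvfs2, the path in Phil's output), e.g. run_stress("/mnt/pvfs2", 8, 10000). Afterwards, pvfs2-ls -alh on that directory shows whether any entries were left without attributes; failed creates, opens, and unlinks are also reported on stderr as they happen.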
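Phil's second patch deals with errors for individual entries in a list-attr response not being propagated, so that the listing can print "<entry>: could not retrieve attributes: <error>" for the broken entry alone. The idea of carrying a status per entry, rather than one status for the whole response, can be sketched as below. This is not PVFS2 code: struct entry_attr, format_entry, and print_listing are hypothetical names, plain errno values stand in for PVFS2's own error codes, and the successful line is reduced to size and name instead of a full ls -l line.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-entry result of a list-attr style request: each entry
 * carries its own status instead of one status for the whole response. */
struct entry_attr {
    const char *name; /* directory entry name */
    int error;        /* 0 on success, else a positive errno value */
    long long size;   /* meaningful only when error == 0 */
};

/* Format one listing line into buf. A failed entry still shows its name
 * next to its own error, using the format from Phil's patched lsplus. */
int format_entry(const struct entry_attr *e, char *buf, size_t len)
{
    assert(e != NULL && buf != NULL && len > 0);
    if (e->error != 0)
        return snprintf(buf, len, "%s: could not retrieve attributes: %s",
                        e->name, strerror(e->error));
    return snprintf(buf, len, "%lld %s", e->size, e->name);
}

/* Print every entry, bad ones included; return how many entries failed. */
int print_listing(const struct entry_attr *entries, size_t n, FILE *out)
{
    char line[512];
    int failed = 0;

    for (size_t i = 0; i < n; i++) {
        format_entry(&entries[i], line, sizeof(line));
        fprintf(out, "%s\n", line);
        if (entries[i].error != 0)
            failed++;
    }
    return failed;
}
```

Keeping the error next to each entry is what lets one bad directory entry (a.dat in Phil's output) be reported without hiding c.dat, d.dat, and the rest of the listing.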
