Sorry, I gave you the wrong version; we have 4.8.30.

Becky
-- 
Becky Ligon
PVFS Developer
Clemson University
864-650-4065

> Are you using the latest version of Berkeley DB?
>
> We ran into a problem about a year ago with a user who was
> reading, writing, and deleting the same file over and over again and
> discovered that the older versions of Berkeley DB had threading issues.
> We upgraded our environment to use 4.3.29 and haven't seen any problems
> along those lines since.
>
> Becky
> --
> Becky Ligon
> PVFS Developer
> Clemson University
> 864-650-4065
>
>> Hey Phil,
>>
>> After a little wrangling and a discussion or two with you off-list, I tested
>> both of the patches you sent, the steps to create one of the files in this
>> state, and the outcome on 2.8.
>>
>> Both patches work as expected. Adding the ENOENT case to sys-remove does
>> indeed allow pvfs2-rm to remove the bad file entry; it does however leave
>> the datafile bstreams stranded. I am not sure if anything can be done about
>> that.  The second set of changes now allows errors to propagate to the
>> client and pvfs2-lsplus to print which file is exhibiting a problem. Thanks
>> for those! It will definitely help with cleanup.
>>
>> Removing the metadata object for a file does indeed produce the same
>> symptoms we are seeing. It produces a similar effect on 2.8 as well. I
>> believe I was working with Sam and possibly you on this a few weeks ago but
>> had to drop it for something more urgent. Our conversation can be found
>> here:
>>
>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2010-June/004605.html
>>
>> In 2.8, pvfs2-rm does not currently remove the file entries. Sam had the
>> same ENOENT fix for 2.8. After creating a file and removing the metadata
>> object, it allows the file entry to be removed. I believe Jim Kusznir may
>> have been experiencing similar issues when he posted to the Users list
>> here:
>>
>> http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-September/003186.html
>>
>> It appears that something is still triggering this issue in 2.8. Based on my
>> own anecdotal evidence with 2.6 and 2.8, it looks like it can happen when a
>> file system is getting hammered with creates and/or deletes. My test case to
>> reproduce on 2.8 involved several threads executing a script that creates,
>> opens, writes, and then deletes a file many thousands of times. The problems
>> we have had on 2.6 also correlate to heavy loads of file creations and
>> deletions.
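
The create/open/write/delete load described above could be approximated with
something like the following. This is only a rough sketch, not the script used
in the test: the names churn and run_stress, the file naming, the payload size,
and the thread and iteration counts are all made up for illustration, and the
target directory is assumed to be a path on the file system under test (a
temporary local directory is used here so the sketch runs anywhere).

```python
import os
import tempfile
import threading


def churn(directory, worker_id, iterations, payload=b"x" * 4096):
    """Repeatedly create, write, and delete one file; return (iteration, errno) pairs."""
    errors = []
    # Hypothetical naming scheme: one file per worker.
    path = os.path.join(directory, f"stress-{worker_id}.dat")
    for i in range(iterations):
        try:
            with open(path, "wb") as f:  # create + open
                f.write(payload)         # write
            os.remove(path)              # delete
        except OSError as exc:
            errors.append((i, exc.errno))
    return errors


def run_stress(directory, threads=4, iterations=1000):
    """Run churn() in several threads at once and collect errors per worker."""
    results = {}

    def worker(wid):
        results[wid] = churn(directory, wid, iterations)

    workers = [threading.Thread(target=worker, args=(w,)) for w in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    # Assumption: replace the temporary directory with a directory on the
    # mounted file system being tested.
    with tempfile.TemporaryDirectory() as target:
        for wid, errs in sorted(run_stress(target, threads=4, iterations=100).items()):
            print(f"worker {wid}: {len(errs)} errors")
```

Any entries left behind in the target directory after a run, or errors reported
by a worker, would be the candidates to inspect for a missing metadata object.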
>>
>> Anyone have thoughts on why some files are getting left without their
>> metadata object?
>>
>> Bart.
>>
>>
>>
>> On Mon, Oct 11, 2010 at 10:59 AM, Phil Carns <[email protected]> wrote:
>>
>>> On 10/11/2010 11:42 AM, Phil Carns wrote:
>>>
>>>>
>>>>
>>>>>> - how to make pvfs2-rm safely remove what it can (even if via a "force"
>>>>>> option)
>>>>>> - how to get pvfs2-lsplus (and probably other utilities and/or kernel
>>>>>> module as well) to report a sane error message instead of the "Invalid
>>>>>> object" message
>>>>>>
>>>>>
>>>> The attached patch fixes the first problem (assuming I'm looking at the
>>>> right scenario).  For some
>>>>
>>>
>>> ... and this attached patch fixes the second problem.  If I do an lsplus
>>> with it on a broken file I now see this:
>>>
>>>
>>> [pca...@pcarns-laptop admin]$ ./pvfs2-lsplus -alh /mnt/pvfs2/
>>> drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 11:34 .
>>> drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 11:34 .. (faked)
>>> a.dat: could not retrieve attributes: No such file or directory
>>>
>>> -rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 c.dat
>>> -rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 d.dat
>>> drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 10:52 lost+found
>>>
>>> That "<entry>: could not retrieve attributes: <error>" line of output isn't
>>> the prettiest thing in the world, but at least it shows the directory entry
>>> and an appropriate error message for it :)  Feel free to adjust
>>> pvfs2-lsplus.c appropriately if we need a different format there that looks
>>> more like /bin/ls.
>>>
>>> The problems in this path were shared by both the client and the server,
>>> but to make a long story short it wasn't propagating errors out correctly
>>> for individual attributes in a list-attr response.
>>>
>>> I have no idea what subset of these problems are relevant to the current
>>> code base.  The list-attr server state machine has since been rewritten
>>> using nested state machines, and pvfs2-lsplus has gone away (and its logic
>>> folded into pvfs2-ls instead).  Can someone try out the example from earlier
>>> in the email thread on trunk or 2.8 to see what happens?  We just need to
>>> create a file, remove the metadata object out from under it, and then try
>>> pvfs2-ls -alh on the directory and pvfs2-rm on the file to see what
>>> happens...
>>>
>>> thanks,
>>> -Phil
>>>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> [email protected]
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
>>>
>>
>
>

