Re: [Pvfs2-developers] OrangeFS data corruption

Michael Moore Fri, 24 Jun 2011 07:43:36 -0700

On Fri, Jun 24, 2011 at 10:33 AM, Phil Carns <[email protected]> wrote:


> On 06/24/2011 10:15 AM, Michael Moore wrote:
>
>> With some additional offline information from Benjamin, the problem has
>> been tracked down to dbpf_bstream_direct_write_op_svc(). The issue is
>> that two write calls to different, contiguous sections of the file
>> occur without locking around retrieval of the current file size. The
>> flow is similar to this, assuming two writes, X -> Y and Y+1 -> Z:
>> both writes enter dbpf_bstream_direct_write_op_svc()
>> write X -> Y gets the current file size
>> write X -> Y makes the pwrite call
>> write Y+1 -> Z gets the current file size
>> write X -> Y updates the file size
>> write Y+1 -> Z makes the pwrite call (padding zeros from the previous
>> end of file)
>> write Y+1 -> Z updates the file size
>>
>> I can certainly add some locking to prevent this. Mostly for Phil or
>> Sam: was there something in place that should be preventing this,
>> before I reinvent the wheel?
>>
>
> I can't speak for Sam, but your analysis sounds correct to me.  I guess it
> is the "padding zeros" part that is corrupting the data, right? Thanks for
> tracking that down!
>
>
>> I did try moving the flocks from direct_locked_write() around the get
>> file size and update, but it looks like the fd is being closed, causing
>> the locks to be tossed.
>>
>
> I think it is an fcntl lock, right?  Either way that would probably be
> tricky to use to protect the file size query.  I think that hits the db
> rather than the underlying file so it won't be affected by the lock.
>
>
You are correct, it's an fcntl lock. While it protects the underlying file,
the range of the lock in these cases overlaps due to alignment, so it
should block on the second entry and, I was hoping, in effect provide a
per-bstream, byte-range-based lock. However, as I wrote that, I realized
that in a case where the write is aligned, the fcntl lock won't help.


> Kind of a separate topic, but if the fd is being closed then we might want
> to look into that too.  Trove has an fd open cache that is supposed to keep
> it from repeatedly opening and closing the same underlying file.
>

I didn't dig into it too far; I was seeing 0 as the fd for the unlock
fcntl() and just assumed that was the cause, since the fcntl lock wasn't
blocking. I'll double-check that behavior while I'm in there.

Thanks for the feedback!

Michael


>
> -Phil
>

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
