On 06/24/2011 10:43 AM, Michael Moore wrote:
On Fri, Jun 24, 2011 at 10:33 AM, Phil Carns <[email protected]> wrote:

    On 06/24/2011 10:15 AM, Michael Moore wrote:

        With some additional offline information from Benjamin, the
        problem has been tracked down to
        dbpf_bstream_direct_write_op_svc(). The issue is that two write
        calls to different, contiguous sections of the file occur
        without locking around retrieval of the current file size. The
        flow is similar to this, assuming two writes X -> Y and Y+1 -> Z:

        both writes enter dbpf_bstream_direct_write_op_svc()
        write X -> Y gets the current file size
        write X -> Y makes the pwrite call
        write Y+1 -> Z gets the current file size
        write X -> Y updates the file size
        write Y+1 -> Z makes the pwrite call (padding zeros from the
        previous end of file)
        write Y+1 -> Z updates the file size

        I can certainly add some locking to prevent this. Mostly to
        Phil or Sam: was there something already in place that should
        have been preventing this, before I reinvent the wheel?


    I can't speak for Sam, but your analysis sounds correct to me.  I
    guess it is the "padding zeros" part that is corrupting the data,
    right? Thanks for tracking that down!


        I did try moving the flocks from direct_locked_write() around
        the get-file-size and update calls, but it looks like the fd is
        being closed, causing the locks to be dropped.


    I think it is an fcntl lock, right?  Either way, that would
    probably be tricky to use to protect the file size query.  I think
    that query hits the db rather than the underlying file, so it
    won't be affected by the lock.


You are correct, it's an fcntl lock. While it protects the underlying file, the lock ranges in these cases overlap due to alignment, so the second writer should block on entry. I was hoping that would, in effect, provide a per-bstream, byte-range-based lock.

Oh, right.

However, as I wrote that I realized that in a case where the write is already aligned, the fcntl lock won't help.

I'm not sure. It might be possible with fcntl locking to protect the region beyond EOF, if that would help in that case.

    Kind of a separate topic, but if the fd is being closed then we
    might want to look into that too.  Trove has an fd open cache that
    is supposed to keep it from repeatedly opening and closing the
    same underlying file.


I didn't dig into it too far; I was seeing 0 as the fd for the unlock fcntl() and assumed that was the cause, since the fcntl lock wasn't blocking. I'll double-check that behavior while I'm in there.


0 is possible (since the server closes stdout etc.), but that does seem pretty suspicious...

-Phil

Thanks for the feedback!

Michael


    -Phil

    _______________________________________________
    Pvfs2-developers mailing list
    [email protected]
    http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


