Never mind! After looking at the request scheduler again I see that
concurrent write requests definitely *are* allowed.

 

Thanks,

Elaine

 

From: [email protected]
[mailto:[email protected]] On Behalf Of
Elaine Quarles
Sent: Friday, June 24, 2011 12:20 PM
To: [email protected]
Subject: RE: [Pvfs2-developers] OrangeFS data corruption

 

Does the job scheduler not prevent concurrent write requests on the same
handle?

 

Thanks,

Elaine

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Phil
Carns
Sent: Friday, June 24, 2011 11:04 AM
To: Michael Moore
Cc: [email protected]
Subject: Re: [Pvfs2-developers] OrangeFS data corruption

 

On 06/24/2011 10:43 AM, Michael Moore wrote: 

On Fri, Jun 24, 2011 at 10:33 AM, Phil Carns <[email protected]> wrote:

On 06/24/2011 10:15 AM, Michael Moore wrote:

With some additional offline information from Benjamin, the problem has
been tracked down to dbpf_bstream_direct_write_op_svc(). The issue is
that two write calls to different, contiguous sections of the file
occur without locking around retrieval of the current file size. The
flow is similar to this, assuming two writes X->Y and Y+1->Z (a code
sketch follows the list):

1. Both writes enter dbpf_bstream_direct_write_op_svc().
2. Write X->Y gets the current file size.
3. Write X->Y makes the pwrite call.
4. Write Y+1->Z gets the current file size.
5. Write X->Y updates the file size.
6. Write Y+1->Z makes the pwrite call (padding zeros from the previous
   end of file, overwriting the data X->Y just wrote).
7. Write Y+1->Z updates the file size.
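
In code, the unprotected path is roughly this (a minimal sketch with
made-up names, not the actual dbpf code: direct_write and zero_fill are
hypothetical, cached_size stands in for the size Trove keeps in the db,
and the mutex shows one way the race could be closed):

#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Write len zero bytes at off (pads the gap up to a new write). */
static int zero_fill(int fd, off_t off, off_t len)
{
    char zeros[4096] = {0};
    while (len > 0) {
        size_t n = len < (off_t)sizeof(zeros) ? (size_t)len : sizeof(zeros);
        ssize_t w = pwrite(fd, zeros, n, off);
        if (w < 0)
            return -errno;
        off += w;
        len -= w;
    }
    return 0;
}

static pthread_mutex_t size_mutex = PTHREAD_MUTEX_INITIALIZER;

static int direct_write(int fd, const void *buf, size_t count,
                        off_t offset, off_t *cached_size)
{
    int ret = 0;

    /* Holding the mutex from the size read through the size update is
     * what closes the race: without it, write Y+1->Z can read a stale
     * EOF and zero-fill over the bytes X->Y just wrote. */
    pthread_mutex_lock(&size_mutex);
    off_t eof = *cached_size;                 /* get the current file size */

    if (offset > eof)                         /* pad from the old EOF */
        ret = zero_fill(fd, eof, offset - eof);

    if (ret == 0 && pwrite(fd, buf, count, offset) < 0)
        ret = -errno;

    if (ret == 0 && offset + (off_t)count > eof)
        *cached_size = offset + (off_t)count; /* update the file size */

    pthread_mutex_unlock(&size_mutex);
    return ret;
}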

I can certainly add some locking to prevent this. This is mostly a
question for Phil or Sam: was there something already in place that
should be preventing this, before I reinvent the wheel?

 

I can't speak for Sam, but your analysis sounds correct to me.  I guess it
is the "padding zeros" part that is corrupting the data, right? Thanks for
tracking that down! 

 

I did try moving the flocks from direct_locked_write() to cover the
file-size get and update, but it looks like the fd is being closed,
which causes the locks to be tossed.

 

I think it is an fcntl lock, right?  Either way, that would probably be
tricky to use to protect the file size query.  I think that query hits
the db rather than the underlying file, so it won't be affected by the
lock.


You are correct, it's an fcntl lock. While it protects the underlying
file, the range of the lock in these cases is overlapping due to
alignment, so it should block on the second entry and, I was hoping, in
effect provide a per-bstream, byte-range-based lock.


Oh, right.



However, as I wrote that I realized that in a case where the write is
aligned, the fcntl lock won't help.


I'm not sure.  With fcntl locking it might be possible to lock a range
beyond EOF, if that would help in that case.
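
For what it's worth, POSIX does allow this: with l_len set to 0 an
fcntl record lock extends from l_start to the largest possible offset,
so it covers the region past EOF before anything is written there. A
minimal sketch (lock_beyond_eof is a made-up name, not anything in
Trove):

#include <fcntl.h>

/* Take a write lock from 'from' through the largest possible offset;
 * l_len == 0 means the lock has no upper bound, so it also covers the
 * as-yet-unwritten region beyond EOF. */
static int lock_beyond_eof(int fd, off_t from)
{
    struct flock fl;
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = from;
    fl.l_len = 0;
    return fcntl(fd, F_SETLKW, &fl);  /* block until the range is free */
}

One caveat either way: fcntl record locks are per-process, so they
won't serialize two threads inside the same server process, and closing
any fd for the file drops every lock the process holds on it -- which
may be what's biting the flock experiment above.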



 

Kind of a separate topic, but if the fd is being closed then we might want
to look into that too.  Trove has an fd open cache that is supposed to keep
it from repeatedly opening and closing the same underlying file.


I didn't dig into it too far; I was seeing 0 as the fd for the unlock
fcntl() and just assumed that was the cause, since the fcntl lock
wasn't blocking. I'll double-check that behavior while I'm in there.


0 is possible (since the server closes stdout etc.), but that does seem
pretty suspicious...

-Phil



Thanks for the feedback!

Michael
 


-Phil 



 

 

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
