Does the job scheduler not prevent concurrent write requests on the same handle?
Thanks,
Elaine

From: [email protected] [mailto:[email protected]] On Behalf Of Phil Carns
Sent: Friday, June 24, 2011 11:04 AM
To: Michael Moore
Cc: [email protected]
Subject: Re: [Pvfs2-developers] OrangeFS data corruption

On 06/24/2011 10:43 AM, Michael Moore wrote:
> On Fri, Jun 24, 2011 at 10:33 AM, Phil Carns <[email protected]> wrote:
>> On 06/24/2011 10:15 AM, Michael Moore wrote:
>>> With some additional offline information from Benjamin, the problem has
>>> been tracked down to dbpf_bstream_direct_write_op_svc(). The issue is
>>> that two write calls to different, contiguous sections of the file
>>> occur without locking around retrieval of the current file size. The
>>> flow is similar to this, assuming two writes X -> Y and Y+1 -> Z:
>>>
>>>   1. both writes enter dbpf_bstream_direct_write_op_svc()
>>>   2. write X -> Y gets the current file size
>>>   3. write X -> Y makes the pwrite call
>>>   4. write Y+1 -> Z gets the current file size
>>>   5. write X -> Y updates the file size
>>>   6. write Y+1 -> Z makes the pwrite call (padding zeros from the
>>>      previous end of file)
>>>   7. write Y+1 -> Z updates the file size
>>>
>>> I can certainly add some locking to prevent this. Mostly to Phil or
>>> Sam: was there something in place that should be preventing this
>>> before I reinvent the wheel?
>>
>> I can't speak for Sam, but your analysis sounds correct to me. I guess
>> it is the "padding zeros" part that is corrupting the data, right?
>> Thanks for tracking that down!
>>
>>> I did try moving the flocks from direct_locked_write() around the get
>>> file size and update, but it looks like the fd is being closed,
>>> causing the locks to be tossed.
>>
>> I think it is an fcntl lock, right? Either way, that would probably be
>> tricky to use to protect the file size query. I think that hits the db
>> rather than the underlying file, so it won't be affected by the lock.
>
> You are correct, it's an fcntl lock. While it protects the underlying
> file, the range of the lock in these cases is overlapping due to
> alignment, so it should block on the second entry and, I was hoping, in
> effect provide a per-bstream, byte-range-based lock.

Oh, right. However, as I wrote that I realized that in a case where the
write is aligned, the fcntl lock won't help. I'm not sure. It might be
possible with fcntl locking to protect beyond EOF, if that would help in
that case.

>> Kind of a separate topic, but if the fd is being closed then we might
>> want to look into that too. Trove has an fd open cache that is supposed
>> to keep it from repeatedly opening and closing the same underlying file.
>
> I didn't dig into it too far; I was seeing 0 as the fd for the unlock
> fcntl() and just assumed that as the cause, since the fcntl lock wasn't
> blocking. I'll double check that behavior while I'm in there.

0 is possible (since the server closes stdout etc.), but that does seem
pretty suspicious...

-Phil

> Thanks for the feedback!
> Michael
>
>> -Phil
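
For reference, a minimal C sketch of the race described in the thread,
along with the straightforward mutex fix of serializing the size query,
the zero padding, the pwrite, and the size update. The names here
(direct_write, cached_file_size, size_lock) are illustrative stand-ins,
not the actual Trove code:

#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t size_lock = PTHREAD_MUTEX_INITIALIZER;
static off_t cached_file_size = 0;  /* stand-in for the size kept in the db */

/* one call per incoming write; offset/count describe a contiguous region */
static void direct_write(int fd, const void *buf, size_t count, off_t offset)
{
    /* proposed fix: hold the lock across the size query, the zero
     * padding, the data pwrite, and the size update */
    pthread_mutex_lock(&size_lock);

    off_t old_size = cached_file_size;        /* "gets the current file size" */

    /* pad zeros from the previous EOF up to this write's offset; without
     * the lock, a racing writer can read a stale old_size and zero-pad
     * over data the other writer already wrote -- the corruption in the
     * flow above */
    if (offset > old_size)
    {
        static const char zeros[4096];
        off_t pos = old_size;
        while (pos < offset)
        {
            size_t n = (offset - pos) < (off_t)sizeof(zeros)
                       ? (size_t)(offset - pos) : sizeof(zeros);
            if (pwrite(fd, zeros, n, pos) != (ssize_t)n)
                break;  /* error handling elided for brevity */
            pos += n;
        }
    }

    pwrite(fd, buf, count, offset);           /* "makes the pwrite call" */

    if (offset + (off_t)count > cached_file_size)
        cached_file_size = offset + (off_t)count;  /* "updates the file size" */

    pthread_mutex_unlock(&size_lock);
}

A single global mutex is only the simplest illustration; a per-bstream
lock would avoid serializing writes to unrelated files.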
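And a sketch of the fcntl byte-range locking under discussion.
lock_range/unlock_range are hypothetical helpers rather than the actual
direct_locked_write() code; the relevant detail is that a record lock
with l_len = 0 extends from l_start to EOF and beyond, which is one way
fcntl locking could protect the region past the current end of file, as
suggested above:

#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Acquire an exclusive byte-range lock; len == 0 locks from start to
 * EOF *and beyond*, covering any later growth of the file. */
static int lock_range(int fd, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type   = F_WRLCK;    /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = start;
    fl.l_len    = len;
    return fcntl(fd, F_SETLKW, &fl);  /* F_SETLKW blocks until the range is free */
}

static int unlock_range(int fd, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type   = F_UNLCK;    /* release the previously acquired range */
    fl.l_whence = SEEK_SET;
    fl.l_start  = start;
    fl.l_len    = len;
    return fcntl(fd, F_SETLK, &fl);
}

One POSIX caveat that matches the thread: record locks are owned by the
process, and closing any descriptor for the file releases all of the
process's locks on it, which would explain the locks being tossed if the
fd cache closes and reopens the file. And, as noted above, the file size
query hits the db rather than the file, so the lock by itself does not
cover it.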
