Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10777 )
Change subject: KUDU-2260: Log block manager should handle null bytes in metadata on crash ...................................................................... Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/pb_util.cc File src/kudu/util/pb_util.cc: http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/pb_util.cc@318 PS2, Line 318: // This can happen e.g. on ext4 in the default data=ordered mode, when the : // filesize metadata is updated but the new data is not persisted. > how does this happen, though? data=ordered is supposed to be: The discussion Mike linked in his comment on KUDU-2260 explains it in a roundabout way (https://plus.google.com/+KentonVarda/posts/JDwHfAiLGNQ). If you read it make sure you expand all the comments as the page auto-folds the most important ones in the middle. Here's the closest thing to a "money quote" I could find: Kenton Kvarda: "However, I don't think anything you describe quite explains what I'm seeing. I am not seeing a "zero length" issue. I'm seeing the opposite: I'm seeing the length being ahead of the data. That is: - I opened a file containing M bytes of data. - I appended N bytes of data. - The system crashed before I could fsync(). - On restore, the file has size M + N, but the last N bytes are NUL, not the bytes I wrote. - Note that M + N in my case was less than the block size, which is 4k. So, no allocation should have occurred. Is this expected to be a possible outcome with ext4? It seems to me that with delayed allocation, the file size update should also be delayed, so that the size is updated immediately after the data is flushed. This expectation seems consistent with the documentation for data=ordered, which says:" ... Ted Ts'o: "We could delay the isize update until after the data block is written back, but in the case where the program is doing more than just an append, but modifying some of the bytes just before isize as well as appending to the file, it's important to note that there may be up to 5 seconds after the data block is written back and when the journal transaction containing the isize update is committed. So suppose the file contained the string "red blue" and the you seek back to the beginning and write "red white blue". There will be window where after a crash reading the file will return "red whit" since the isize field will not have been updated. Essentially, if you want both the data and isize field to be consistent after a crash, you have to use fsync. So data=ordered has only been engineered to fix the stale data security problem, and doesn't pretend to do anything else." Ts'o talks more about what data=ordered "really means" in that discussion as well. http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/slice.h File src/kudu/util/slice.h: http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/slice.h@302 PS2, Line 302: ATTRIBUTE_NO_SANITIZE_THREAD > is this attribute needed in the header? thought it would only get picked up Do you mean you thought it would only get picked up on the definition? From https://clang.llvm.org/docs/AttributeReference.html#no-sanitize-thread: "Use __attribute__((no_sanitize_thread)) on a function declaration to specify that checks for data races on plain (non-atomic) memory accesses should not be inserted by ThreadSanitizer." -- To view, visit http://gerrit.cloudera.org:8080/10777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0af5c9dbbe28afe7a179595ad20392b99cde2a1b Gerrit-Change-Number: 10777 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Thu, 21 Jun 2018 17:13:51 +0000 Gerrit-HasComments: Yes
