Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10777 )

Change subject: KUDU-2260: Log block manager should handle null bytes in 
metadata on crash
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/pb_util.cc
File src/kudu/util/pb_util.cc:

http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/pb_util.cc@318
PS2, Line 318:     // This can happen e.g. on ext4 in the default data=ordered 
mode, when the
             :     // filesize metadata is updated but the new data is not 
persisted.
> how does this happen, though? data=ordered is supposed to be:
The discussion Mike linked in his comment on KUDU-2260 explains it in a 
roundabout way (https://plus.google.com/+KentonVarda/posts/JDwHfAiLGNQ). If you 
read it make sure you expand all the comments as the page auto-folds the most 
important ones in the middle.

Here's the closest thing to a "money quote" I could find:

Kenton Kvarda: "However, I don't think anything you describe quite explains 
what I'm seeing.

I am not seeing a "zero length" issue. I'm seeing the opposite: I'm seeing the 
length being ahead of the data. That is:
- I opened a file containing M bytes of data.
- I appended N bytes of data.
- The system crashed before I could fsync().
- On restore, the file has size M + N, but the last N bytes are NUL, not the 
bytes I wrote.
- Note that M + N in my case was less than the block size, which is 4k. So, no 
allocation should have occurred.

Is this expected to be a possible outcome with ext4?

It seems to me that with delayed allocation, the file size update should also 
be delayed, so that the size is updated immediately after the data is flushed. 
This expectation seems consistent with the documentation for data=ordered, 
which says:" ...

Ted Ts'o: "We could delay the isize update until after the data block is 
written back, but in the case where the program is doing more than just an 
append, but modifying some of the bytes just before isize as well as appending 
to the file, it's important to note that there may be up to 5 seconds after the 
data block is written back and when the journal transaction containing the 
isize update is committed.   So suppose the file contained the string "red 
blue" and the you seek back to the beginning and write "red white blue".   
There will be window where after a crash reading the file will return "red 
whit" since the isize field will not have been updated.

Essentially, if you want both the data and isize field to be consistent after a 
crash, you have to use fsync.  So data=ordered has only been engineered to fix 
the stale data security problem, and doesn't pretend to do anything else."

Ts'o talks more about what data=ordered "really means" in that discussion as 
well.


http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/slice.h
File src/kudu/util/slice.h:

http://gerrit.cloudera.org:8080/#/c/10777/2/src/kudu/util/slice.h@302
PS2, Line 302: ATTRIBUTE_NO_SANITIZE_THREAD
> is this attribute needed in the header? thought it would only get picked up
Do you mean you thought it would only get picked up on the definition?

From https://clang.llvm.org/docs/AttributeReference.html#no-sanitize-thread:

"Use __attribute__((no_sanitize_thread)) on a function declaration to specify 
that checks for data races on plain (non-atomic) memory accesses should not be 
inserted by ThreadSanitizer."



--
To view, visit http://gerrit.cloudera.org:8080/10777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0af5c9dbbe28afe7a179595ad20392b99cde2a1b
Gerrit-Change-Number: 10777
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Thu, 21 Jun 2018 17:13:51 +0000
Gerrit-HasComments: Yes

Reply via email to