Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 13:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 244:     if (FLAGS_cfile_write_checksums) {
> Can you make TestCFile a friend of CFileReader (see the FRIEND_TEST macro) 
Done


Line 330:     source->Size(&file_size);
> RETURN_NOT_OK
Done


Line 333:     source->Read(0, &data);
> RETURN_NOT_OK
Done


Line 843:   source->Size(&file_size);
> Should ASSERT_OK on this too.
Done


Line 847:     Status s = CorruptAndReadBlock(id, i, 254);
> Just in case one of the file's bytes actually has the value 254, maybe you 
I had something similar to this at one point but I worried I was getting to 
clever in a test. I figured a simple static value that was known to corrupt all 
bytes in the given test would be a clear way to fail fast if something in the 
test changed. I will take a look again now that my corrupt code is more 
straightforward.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
> Should probably default to evolving.
Done. I am not sure I am clear on the difference between experimental and 
evolving.


Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) 
{
> Since this function is now just a block read, I think it's net more readabl
I only kept it for the sake of the TRACE_EVENT1. Can remove it.


Line 210:     if (header.size() != header_size || checksum.size() != 
checksum_size) {
> Do we need this check now that there are no short reads?
Done


Line 215:     if (FLAGS_cfile_verify_checksums) {
> If this is false, why bother reading the checksum out of the block at all?
Done


Line 226:     RETURN_NOT_OK(block_->Read(off, &header));
> Would be cleaner to call ReadV() regardless of whether checksums exist or n
Done


Line 227:     if (header.size() != header_size) {
> Do we need this check now that there are no short reads?
Done


Line 273:   // This is done to avoid the need for a follow up read call.
> Just to be clear, if there are no checksums, the four extra bytes we're rea
That is exactly the choice being made. An earlier version of the patch did this 
in 2 reads. However because we know the header alone has enough bytes to 
support the over read, it should be safe to do this. 

The draw back here is that cfiles without checksums read 4 bytes more than 
required.


Line 277:   if (footer.size() != footer_size || checksum.size() != 
checksum_size) {
> No short reads; do we still need this?
Done


Line 283:   // incompatible_features flag tells us if a checksum exists at all.
> What's the point of the footer checksum then? It seems extraordinarily rare
Though rare, I am viewing checksumming as an all or nothing feature.

It's possible the contents of the protobuf could be parsable but incorrect. In 
the case parsing fails, we know its corrupt. In the case parsing succeeds the 
checksum ensures we parsed the right thing.

Given the importance of the fields in the footer I think it's worth being sure.


Line 438:     }
> This effectively forbids zero-length block reads, right? Is that an invaria
This is mainly to prevent data_size -= checksum_size from becoming negative and 
resulting in a cryptic error.


Line 461:     if (block.size() != data_size || checksum.size() != 
checksum_size) {
> No short reads...
Done


Line 474:     }
> Maybe it would even be possible to incorporate the ReadV() too? Basically a
Good idea. The one place that is different is the footer since we 
"preemptively" read checksums data and the checksum comes before the data. This 
could be reused for the header and data blocks though.


Line 476:     RETURN_NOT_OK(block_->Read(ptr.offset(), &block));
> Let's use ReadV() for both cases, and vary whether we pass a one-slice vect
Done


Line 477:     if (block.size() != data_size) {
> No short reads...
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 188:   // Returns true if the file has checksums on the header, footer, 
and data blocks
> Nit: terminate with a period.
Done


Line 189:   bool hasChecksums() const;
> Nit: should be either has_checksums() or HasChecksums()
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 67: TAG_FLAG(cfile_write_checksums, experimental);
> 1. Why default to false?
I describe the defaults a bit in the patch summary. Think I should add a more 
detailed comment here?

Here is a quote from the patch summary:
   
   cfile_write_checksums is defaulted to false to ensure upgrades don't
   immediately result in performance degredation and incompatible data on
   downgrade. It can and likely should be defaulted to true in a later release.

Essentially I think they should be opt in for at least 1 release. This also 
ensure we support a downgrade of at least 1 release.


Line 263:   // Prepend a footer checksum.
> Any idea why we're using block_->Append() here and not WriteRawData() as in
I saw the same thing but don't have a great answer. I can change it if you 
think I should. I opted to follow the existing code since I was unsure.


Line 494:     RETURN_NOT_OK(WriteRawData(data));
> Hmm, a little surprised that Todd didn't suggest coalescing these into one 
As I was reading through the comments I started getting a similar feeling that 
WriteV would be a good idea. I can definitely look into adding it for use in 
this patch following the ReadV() patch.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/util/crc-test.cc
File src/kudu/util/crc-test.cc:

Line 58:   ASSERT_EQ(0xa9421b7, data_crc); // Known value from crcutil usage 
test program.
> Since this is used 3 times now, store it up front in a "const uint64_t kKno
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to