Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile ......................................................................
Patch Set 10: (21 comments) I didn't review the checksumming stuff yet, just looked at the rest and the plumbing. Since the ReadV() stuff turned out to be pretty substantial, could you move it into its own commit? Would be nice to finish reviewing it separately. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: Line 210: bool has_checksums_; This will be padded to 8 bytes, which is actually quite a bit given how many CFileReaders we instantiate. Perhaps we can calculate this on the fly instead, using a mask on footer_'s features? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS10, Line 262: gscoped_ptr Nit: use unique_ptr. Actually, these scratch spaces are tiny; can you just allocate them on the stack? Line 267: ReadResult { Surprised that this style of struct initialization works; isn't it C99 but not C++11 compatible? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 38: class Env; No need for this anymore. Though if you decide to pass the vector by pointer as I suggested in env.h, you can omit the env.h include, forward declare ReadResult, and continue to forward declare Env. PS10, Line 162: result You mean results? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/fs-test-util.h File src/kudu/fs/fs-test-util.h: PS10, Line 72: vector This is a header so we retain namespace qualifications (i.e. this should be std::vector). http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.cc File src/kudu/util/crc.cc: PS10, Line 50: Nit: extra space here. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h File src/kudu/util/crc.h: PS10, Line 38: crc32 Maybe previous_crc32 to be more clear? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env.h File src/kudu/util/env.h: Line 371: struct ReadResult { Should provide a short high-level struct document too. PS10, Line 403: results I'm finding this to be somewhat confusing. Part of the issue is with the name ReadResult: the majority of stuff in it aren't "results"; they're actually input to the function. Maybe ReadEntry or ReadVEntry would be better? Part of the issue is that the Slice* means an extra level of indirection in the setup that callers have to do. Perhaps they can be regular Slices, that are default-initialized by the caller and set by the callee? That means 'results' should be passed by pointer, but I think that's a clearer model anyway, since we pass by pointer things that the function is expected to modify. Also, could you reorganize ReadResult so that the pure IN parameters precede IN/OUT and OUT? Meaning, the order should probably be length, scratch, then result. Line 536: // TODO (GH): Document Yep. :) http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: PS10, Line 175: const struct iovec * Should be const struct iovec* Line 180: return read_bytes; Nit: if you're going to elide the braces, merge this line into the conditional line. This makes it harder to accidentally mess up the conditional if another line is added. Line 182: break; Don't we need to update total_read_bytes with our partial read results before breaking? Line 334: virtual Status ReadV(uint64_t offset, vector<ReadResult> results) const OVERRIDE { Could you add TRACE_EVENT1() calls to ReadV() and Read()? Not sure why they were missing in the first place. PS10, Line 340: ReadResult Can be cref? Line 341: return iovec { Again, surprised this kind of initialization works in C++11. PS10, Line 347: results.size() arraylength(iov) is more idiomatic. Line 353: // Adjust slice sizes based on bytes read for short reads As we discussed, let's make short reads an error, for Read() too. Do it as a separate patch and layer this one on top of it. PS10, Line 355: ReadResult Nit: auto is fine for short-scoped types like this one. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.h File src/kudu/util/env_util.h: Line 66: Status ReadVFully(RandomAccessFile* file, uint64_t offset, std::vector<ReadResult> results); If you end up solving the partial read stuff directly in RandomAccessFile, this won't be necessary. But if it is, please also doc it. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
