Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8860 )
Change subject: design-docs: improve cfile.md ...................................................................... Patch Set 1: (5 comments) Filled in some things http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md File docs/design-docs/cfile.md: http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@64 PS1, Line 64: TODO(dan): point out when/what version the v1 -> v2 transition was made. https://github.com/apache/kudu/commit/f82cf6918c00dff6aecad5a6b50836b7eb5db508 seems like 1.3? http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@76 PS1, Line 76: How big is a data block in bytes and row count, typically? : - How do we decide when a data block is full (by data size, by # of values, ...)? : - For Prefix encoding, how many restart points can we expect to be in a single : block? - cfile block size is determined by the cfile_default_block_size flag (min is 512B), seems to dictate the size of the buffer we allocate for it in the BlockBuilder implementations - looking at a couple encoding types (rle, binary_plain), data is appended to a specified-sized buffer, and once it's at the end of the buffer, it's considered full, at which point the cblock is finished - not sure, although this information might be better suited for binary_prefix_block.h? Seems like it's missing a few details... http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@113 PS1, Line 113: Group Varint Frame-Of-Reference Encoding apparently this is deprecated? See common.proto http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@133 PS1, Line 133: TODO(dan): No discussion of dictionary encoding, and the associated dictionary : block. Not sure if we need all of the encodings described in this doc. For a list of them, there's common.proto, and at least for dictionary encoding, binary_dict_block.h is pretty informative, unless there's something you'd like to see that's missing. Maybe a map between encodings and the .h's? http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@170 PS1, Line 170: ODO(dan): What's the branching factor of the B-Tree? : : TODO(dan): is the 'configured block size' the same for index and data blocks? : : TODO(dan): What does `Int` signify in the diagram below? There is some : explanation/background missing here. There should probably be some mention of concurrent_btree.h - Interesting. From concurrent_btree.h: keyptr_space = Traits::kInternalNodeSize - constant_overhead; kFanout = keyptr_space / (sizeof(KeyInlineSlice) + sizeof(NodePtr<Traits>)) - from cfile_writer.cc, index_block_size is 32KB - I think "Int" refers to an InternalNode on CBTree -- To view, visit http://gerrit.cloudera.org:8080/8860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I770028bba3f7a49c96f32893c285221c84be39ce Gerrit-Change-Number: 8860 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Burkert <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Tue, 19 Dec 2017 09:27:30 +0000 Gerrit-HasComments: Yes
