Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8860 )

Change subject: design-docs: improve cfile.md
......................................................................


Patch Set 1:

(5 comments)

Filled in some things

http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@64
PS1, Line 64: TODO(dan): point out when/what version the v1 -> v2 transition 
was made.
https://github.com/apache/kudu/commit/f82cf6918c00dff6aecad5a6b50836b7eb5db508 
seems like 1.3?


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@76
PS1, Line 76: How big is a data block in bytes and row count, typically?
            :   - How do we decide when a data block is full (by data size, by 
# of values, ...)?
            :   - For Prefix encoding, how many restart points can we expect to 
be in a single
            :     block?
- cfile block size is determined by the cfile_default_block_size flag (min is 
512B), seems to dictate the size of the buffer we allocate for it in the 
BlockBuilder implementations
- looking at a couple encoding types (rle, binary_plain), data is appended to a 
specified-sized buffer, and once it's at the end of the buffer, it's considered 
full, at which point the cblock is finished
- not sure, although this information might be better suited for 
binary_prefix_block.h? Seems like it's missing a few details...


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@113
PS1, Line 113: Group Varint Frame-Of-Reference Encoding
apparently this is deprecated? See common.proto


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@133
PS1, Line 133: TODO(dan): No discussion of dictionary encoding, and the 
associated dictionary
             :            block.
Not sure if we need all of the encodings described in this doc. For a list of 
them, there's common.proto, and at least for dictionary encoding, 
binary_dict_block.h is pretty informative, unless there's something you'd like 
to see that's missing.
Maybe a map between encodings and the .h's?


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@170
PS1, Line 170: ODO(dan): What's the branching factor of the B-Tree?
             :
             : TODO(dan): is the 'configured block size' the same for index and 
data blocks?
             :
             : TODO(dan): What does `Int` signify in the diagram below? There 
is some
             : explanation/background missing here.
There should probably be some mention of concurrent_btree.h
- Interesting. From concurrent_btree.h:

 keyptr_space = Traits::kInternalNodeSize - constant_overhead;
 kFanout = keyptr_space / (sizeof(KeyInlineSlice) + sizeof(NodePtr<Traits>))

- from cfile_writer.cc, index_block_size is 32KB
- I think "Int" refers to an InternalNode on CBTree



--
To view, visit http://gerrit.cloudera.org:8080/8860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I770028bba3f7a49c96f32893c285221c84be39ce
Gerrit-Change-Number: 8860
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 19 Dec 2017 09:27:30 +0000
Gerrit-HasComments: Yes

Reply via email to