Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8860 )

Change subject: design-docs: improve cfile.md
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8860/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8860/1//COMMIT_MSG@19
PS1, Line 19: attrocious
nit: atrocious


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@76
PS1, Line 76: How big is a data block in bytes and row count, typically?
            :   - How do we decide when a data block is full (by data size, by # of values, ...)?
            :   - For Prefix encoding, how many restart points can we expect to be in a single
            :     block?
> - cfile block size is determined by the cfile_default_block_size flag (min
It'd also be nice to clarify the behavior if a value overflows the buffer. Do 
we extend the buffer to fit it or truncate the buffer and put the value first 
in the next cblock? What happens if a single cell value is too big for a block?


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@86
PS1, Line 86: group-varint coded
> Not sure if group-varint encoding is also deprecated for this?
It's not.


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@100
PS1, Line 100: restart point" which is necessary for
             : faster binary searching.
> A bit more explanation on how it is related to faster binary searching?
Without restarts, the nth value in the block has to be computed from values 
1..(n - 1), so binary searching into the block is not possible without decoding 
all previous values. A restart point stores its value in full, so it can be 
read without decoding anything before it. With restart points, a binary search 
can find the last restart point whose value is <= the desired value, and then 
decode forward from there.
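
For illustration, here is a minimal sketch of that lookup in Python. This is not
Kudu's actual block format or code: the function names, the in-memory layout
(a list of (shared_prefix_len, suffix) pairs) and the restart interval are all
assumptions made for the example.

```python
from typing import List, Optional, Tuple

RESTART_INTERVAL = 4  # hypothetical value, not Kudu's actual setting


def encode(values: List[bytes],
           interval: int = RESTART_INTERVAL) -> Tuple[List[Tuple[int, bytes]], List[int]]:
    """Prefix-encode sorted values; every `interval`-th entry is stored in full."""
    entries = []   # (shared_prefix_len, suffix) per value
    restarts = []  # indexes into `entries` whose value is stored in full
    prev = b""
    for i, v in enumerate(values):
        shared = 0
        if i % interval == 0:
            restarts.append(i)  # restart point: share nothing with the previous value
        else:
            limit = min(len(prev), len(v))
            while shared < limit and prev[shared] == v[shared]:
                shared += 1
        entries.append((shared, v[shared:]))
        prev = v
    return entries, restarts


def seek(entries: List[Tuple[int, bytes]], restarts: List[int],
         target: bytes) -> Optional[int]:
    """Return the index of the first value >= target, or None if there is none."""
    # Binary search over restart points only. Their values are stored in full,
    # so each probe needs no decoding of earlier entries.
    lo, hi = 0, len(restarts)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[restarts[mid]][1] <= target:
            lo = mid + 1
        else:
            hi = mid
    # Last restart point whose value is <= target (or the block start).
    start = restarts[lo - 1] if lo > 0 else 0
    # Decode forward from that restart point.
    prev = b""
    for i in range(start, len(entries)):
        shared, suffix = entries[i]
        value = prev[:shared] + suffix
        if value >= target:
            return i
        prev = value
    return None
```

With this layout a seek decodes at most one restart interval's worth of values
after the binary search, instead of every value that precedes the target in
the block.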


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@133
PS1, Line 133: TODO(dan): No discussion of dictionary encoding, and the associated dictionary
             :            block.
> Yeah, I think it would be useful to link the more detail doc on .h
+1 to leaving out the details of encodings here and just referring elsewhere.


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@199
PS1, Line 199: my best guess is
fwiw, I agree with your guess


http://gerrit.cloudera.org:8080/#/c/8860/1/docs/design-docs/cfile.md@222
PS1, Line 222: queries like: "seek to the data block
             : containing the Nth entry in this CFile".
> Should we add some insight on from which layer these queries are issued?
+1. My guess: once the value index has been used to skip forward and has 
landed at index i, this is used to skip forward to the same index i in the 
CFiles for the non-primary key columns?
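
A rough sketch of what such a positional lookup could look like, in Python.
This is only an illustration: the function name is made up, and a flat sorted
list of per-block first ordinals stands in for whatever index structure the
CFile actually uses.

```python
import bisect
from typing import List


def block_for_ordinal(first_ordinals: List[int], n: int) -> int:
    """Return the index of the data block containing the entry with ordinal n.

    first_ordinals[i] is the ordinal of the first entry in block i, in
    ascending order (a hypothetical stand-in for the positional index).
    """
    if not first_ordinals or n < first_ordinals[0]:
        raise IndexError("ordinal precedes the first block")
    # Last block whose first ordinal is <= n.
    return bisect.bisect_right(first_ordinals, n) - 1
```

Because the list records only where each block starts, the caller still has to
check n against the total entry count to detect a seek past the end of the
file.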



--
To view, visit http://gerrit.cloudera.org:8080/8860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I770028bba3f7a49c96f32893c285221c84be39ce
Gerrit-Change-Number: 8860
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Wed, 20 Dec 2017 21:52:15 +0000
Gerrit-HasComments: Yes
