Andrew Wong has posted comments on this change.

Change subject: Blogpost describing predicate evaluation pushdown

Patch Set 4:

File _posts/

PS3, Line 16: predicate is
> Nit: "predicates are" (there can be more than one, after all)

PS3, Line 17: its
> What is "its" referring to here? Shouldn't this just be "Kudu's"?

PS3, Line 20: evaluation
> This early in the post it's not yet clear what "evaluation" means. Maybe us
Done. Hmm, "predicate evaluation" is fine I think, I wanted to keep the intro 
relatively devoid of more technical details.

PS3, Line 50: and the character
            : offsets to the start of each word are stored as a list of integers
> What does "word" mean in this context? Not literally "space-separated unit 
Right, I mean "string".

Line 121: ![png]({{ site.github.url 
}}/img/predicate-pushdown/pushdown-tpch.png){: .img-responsive}
> What does "Average Query Batch Time" mean in this graph? Isn't "Average" im
Right, "Average" is implied, although in a slightly different context to "Scan 
Time". The TPC-H runs used scanners, whereas the above tests were unit tests 
solely timing the scan time of a tablet, so the TPC-H encompassed all of the 
overhead associated with RPCs and the wire (versus just the tablet scans).
File _posts/

PS4, Line 47: Kudu will
            : automatically switch to _plain encoding_.
> Nit: consider using the present tense instead: 'When the vocabulary of a di

PS4, Line 75: then
> Nit: consider dropping 'then'.

PS4, Line 78: checks
> check

PS4, Line 83: ends
> end

PS4, Line 82: when there are too many unique values to fit in a dictionary, 
            : suffers
> I'm confused a bit.  From the above, I can see that if there are too many w
Dictionary blocks have two different blocks: a vocabulary block that contains 
all of the words in the dictionary, and a data block that stores the data 
When the vocabulary fills up, the whole dictionary block doesn't get swapped to 
a plain block. The data block that was an integer-storing (bshuf) block becomes 
a plain encoded block. This keeps the encoding type of each column consistent, 
since the data is technically still stored in a dictionary encoded format, just 
in "plain" mode. And therefore there still is a dictionary bitset that doesn't 
get used by blocks that to switch to "plain" mode. It should be noted that a 
dictionary is shared among an entire cfile, not just the block.
Will come up with a less verbose way of saying this.

PS4, Line 124: consists
> Nit: consider replacing 'consists' with 'includes'.  Then no need for 'of' 

PS4, Line 126: query
> Nit: 'query performance' or 'querying performance'?
Done, I think query performance is good.

PS4, Line 144:  to 1.0
> this looks like a strayed part.
version 1.0? 1.0 release?

PS4, Line 144:  Top
> a Top Level Apache project

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3b593959194a6ce9190f562339dc04a1d8fceba
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Andrew Wong <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: Alexey Serbin <>
Gerrit-Reviewer: Andrew Wong <>
Gerrit-Reviewer: Dan Burkert <>
Gerrit-Reviewer: Jean-Daniel Cryans <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to