Andrew Wong has posted comments on this change.

Change subject: Blogpost describing predicate evaluation pushdown
......................................................................


Patch Set 4:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/4384/3/_posts/predicate-pushdown.md
File _posts/predicate-pushdown.md:

PS3, Line 16: predicate is
> Nit: "predicates are" (there can be more than one, after all)
Done


PS3, Line 17: its
> What is "its" referring to here? Shouldn't this just be "Kudu's"?
Done


PS3, Line 20: evaluation
> This early in the post it's not yet clear what "evaluation" means. Maybe us
Done. Hmm, "predicate evaluation" is fine I think, I wanted to keep the intro 
relatively devoid of more technical details.


PS3, Line 50: and the character
            : offsets to the start of each word are stored as a list of integers
> What does "word" mean in this context? Not literally "space-separated unit 
Right, I mean "string".


Line 121: ![png]({{ site.github.url 
}}/img/predicate-pushdown/pushdown-tpch.png){: .img-responsive}
> What does "Average Query Batch Time" mean in this graph? Isn't "Average" im
Right, "Average" is implied, although in a slightly different context to "Scan 
Time". The TPC-H runs used scanners, whereas the above tests were unit tests 
solely timing the scan time of a tablet, so the TPC-H encompassed all of the 
overhead associated with RPCs and the wire (versus just the tablet scans).


http://gerrit.cloudera.org:8080/#/c/4384/4/_posts/predicate-pushdown.md
File _posts/predicate-pushdown.md:

PS4, Line 47: Kudu will
            : automatically switch to _plain encoding_.
> Nit: consider using the present tense instead: 'When the vocabulary of a di
Done


PS4, Line 75: then
> Nit: consider dropping 'then'.
Done


PS4, Line 78: checks
> check
Done


PS4, Line 83: ends
> end
Done


PS4, Line 82: when there are too many unique values to fit in a dictionary, 
performance
            : suffers
> I'm confused a bit.  From the above, I can see that if there are too many w
Dictionary blocks have two different blocks: a vocabulary block that contains 
all of the words in the dictionary, and a data block that stores the data 
numerically.
When the vocabulary fills up, the whole dictionary block doesn't get swapped to 
a plain block. The data block that was an integer-storing (bshuf) block becomes 
a plain encoded block. This keeps the encoding type of each column consistent, 
since the data is technically still stored in a dictionary encoded format, just 
in "plain" mode. And therefore there still is a dictionary bitset that doesn't 
get used by blocks that to switch to "plain" mode. It should be noted that a 
dictionary is shared among an entire cfile, not just the block.
Will come up with a less verbose way of saying this.


PS4, Line 124: consists
> Nit: consider replacing 'consists' with 'includes'.  Then no need for 'of' 
Done


PS4, Line 126: query
> Nit: 'query performance' or 'querying performance'?
Done, I think query performance is good.


PS4, Line 144:  to 1.0
> this looks like a strayed part.
version 1.0? 1.0 release?


PS4, Line 144:  Top
> a Top Level Apache project
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4384
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3b593959194a6ce9190f562339dc04a1d8fceba
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Andrew Wong <anjuw...@g.ucla.edu>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <anjuw...@g.ucla.edu>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to