John Russell has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page ......................................................................
Patch Set 8: (15 comments) Addressed some early comments from Dimitris and Todd that I skipped over before. (Some had already been fixed based on comments on subsequent patch sets.) http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_kudu.xml File docs/topics/impala_kudu.xml: PS4, Line 697: : : </conbody> : : </concept> : : </concept> > Remove this paragraph. You can use Kudu's white paper (section 3.2) for a b I'll hide the paragraph and link to the white paper. (The white paper uses the old DISTRIBUTE BY syntax so it's not perfect as background reading for users of Impala 2.8.) PS4, Line 706: : <title>Partitioning for Kudu Tables</title> : : <conbody> : : <p> : Kudu tables use special mechanisms to distribute data among the underlying : tablet servers. Although we refer to such tables as partitioned tables, they are : distinguished from traditional Impala partitioned tables by use of different clauses : on the <codeph>CREATE TABLE</codeph> statement. Kudu tables use : <code > I don't understand the point of this paragraph before even presenting the f Since now I'm referring people to the Kudu white paper which mentions the old syntax, let's leave this here for the moment. PS4, Line 727: > remove Done PS4, Line 731: rent ways to divide the data for each : column, or even for different value ranges within a column. This flexibility le > You need to mention the drawback of using hash partitioning as well. i.e. q Done PS4, Line 743: which used an experimental fork of the Impala code. For example, the : <codeph>DISTRIBUTE BY</codeph> clause is now <codeph>PA > That claim in not universally true. I would just remove it or make it case Done PS4, Line 751: > this is cluster-dependent, and based on a Kudu configuration. Would not doc Done PS4, Line 765: ibuted, instead of : clumping together all in the same bucket. Spreading new rows across the buckets this : way lets insertion > That's not necessarily the primary reason for using range partitioning. You Done PS4, Line 767: in parallel across multiple tablet servers. : Separating the hashed values can impose additional overhead on queries, where > Can we add a formal syntax here? For the moment I'm going to link over to the CREATE TABLE page. Later I'll see if I can extract just the pieces for the relevant clauses and reuse them here. PS4, Line 774: y 20,000 rows per partition. > You need to talk when VALUE and VALUES is used (single vs multi-column rang Isn't VALUE / VALUES independent of single or multiple columns in the range? I thought it was dependent on whether there was only 1 comparison operator or 2: Single column case: VALUE <= constant constant <= VALUES < constant Multiple column case: VALUE <= (constant,constant) (constant,constant) <= VALUES < (constant,constant) PS4, Line 788: The largest number of buckets that you can create with a <codeph>PARTITIONS</codeph> : clause varies depending on the number of tablet servers in the cluster, while the smallest is 2. : > might be worth a note in the text above that this is multiplicative with an Done PS4, Line 874: </p> : : <p> : Ra > I don't believe this is true (or at least it's not our intention that it is Done PS4, Line 884: partition value = 'C', partition value = 'D', partition value = 'F') : ]]> : </codeblock> > fill this out? For this high-level overview stuff, I will use simple toy tables just to illustrate the syntax without the trappings of a real enterprise-class table. PS4, Line 908: alter table year_ranges add range partition 1890 <= values < 1893; : ]]> : </codeblock> : > I don't think this will show per-tablet sizes currently Done PS4, Line 977: : <!-- To do: fill in example. --> : <codeblock><![CDATA[ : > not true anymore. now it's my_db::my_table Done PS4, Line 1007: : <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/> : > some missing words in this sentence Done -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell <[email protected]> Gerrit-Reviewer: Ambreen Kazi <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: John Russell <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
