[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

John Russell (Code Review) Tue, 24 Jan 2017 11:36:31 -0800

John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................



Patch Set 8:

(15 comments)

Addressed some early comments from Dimitris and Todd that I skipped over 
before. (Some had already been fixed based on comments on subsequent patch 
sets.)

http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS4, Line 697: 
             : 
             :         </conbody>
             : 
             :       </concept>
             : 
             :     </concept>
> Remove this paragraph. You can use Kudu's white paper (section 3.2) for a b
I'll hide the paragraph and link to the white paper. (The white paper uses the 
old DISTRIBUTE BY syntax, so it's not ideal as background reading for users of 
Impala 2.8.)


PS4, Line 706: 
             :       <title>Partitioning for Kudu Tables</title>
             : 
             :       <conbody>
             : 
             :         <p>
             :           Kudu tables use special mechanisms to distribute data 
among the underlying
             :           tablet servers. Although we refer to such tables as 
partitioned tables, they are
             :           distinguished from traditional Impala partitioned 
tables by use of different clauses
             :           on the <codeph>CREATE TABLE</codeph> statement. Kudu 
tables use
             :           <code
> I don't understand the point of this paragraph before even presenting the f
Since I'm now referring people to the Kudu white paper, which mentions the old 
syntax, let's leave this here for the moment.


PS4, Line 727: 
> remove
Done


PS4, Line 731: rent ways to divide the data for each
             :           column, or even for different value ranges within a 
column. This flexibility le
> You need to mention the drawback of using hash partitioning as well. i.e. q
Done


PS4, Line 743:             which used an experimental fork of the Impala code. 
For example, the
             :             <codeph>DISTRIBUTE BY</codeph> clause is now 
<codeph>PA
> That claim in not universally true. I would just remove it or make it case 
Done


PS4, Line 751: 
> this is cluster-dependent, and based on a Kudu configuration. Would not doc
Done


PS4, Line 765: ibuted, instead of
             :             clumping together all in the same bucket. Spreading 
new rows across the buckets this
             :             way lets insertion 
> That's not necessarily the primary reason for using range partitioning. You
Done


PS4, Line 767: in parallel across multiple tablet servers.
             :             Separating the hashed values can impose additional 
overhead on queries, where
> Can we add a formal syntax here?
For the moment I'm going to link over to the CREATE TABLE page. Later I'll see 
if I can extract just the pieces for the relevant clauses and reuse them here.


PS4, Line 774: y 20,000 rows per partition.
> You need to talk when VALUE and VALUES is used (single vs multi-column rang
Isn't VALUE / VALUES independent of whether the range covers one column or 
several? I thought it depended on whether there is 1 comparison operator or 2:

Single column case:
VALUE <= constant
constant <= VALUES < constant

Multiple column case:
VALUE <= (constant,constant)
(constant,constant) <= VALUES < (constant,constant)
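
To make the question concrete, here is a rough sketch of a single-column
range that uses both keywords in one statement. The table and column names
are hypothetical. The two partition forms are the ones already quoted
elsewhere in this patch (partition value = ..., and 1890 <= values < 1893).
The surrounding CREATE TABLE is assumed to follow the Impala 2.8 syntax. The
one-sided comparison case is deliberately left out, since that is the part
still in question.

```sql
-- Hypothetical toy table; names are illustrative only.
CREATE TABLE year_ranges_sketch (year INT PRIMARY KEY, s STRING)
  PARTITION BY RANGE (year)
  (
    -- Single value: VALUE with the = operator.
    PARTITION VALUE = 1889,
    -- Bounded range: VALUES with two comparison operators.
    PARTITION 1890 <= VALUES < 1893
  )
  STORED AS KUDU;
```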


PS4, Line 788:               The largest number of buckets that you can create 
with a <codeph>PARTITIONS</codeph>
             :               clause varies depending on the number of tablet 
servers in the cluster, while the smallest is 2.
             :  
> might be worth a note in the text above that this is multiplicative with an
Done


PS4, Line 874:           </p>
             : 
             :           <p>
             :             Ra
> I don't believe this is true (or at least it's not our intention that it is
Done


PS4, Line 884:   partition value = 'C', partition value = 'D', partition value 
= 'F')
             : ]]>
             : </codeblock>
> fill this out?
For this high-level overview stuff, I will use simple toy tables just to 
illustrate the syntax without the trappings of a real enterprise-class table.


PS4, Line 908: alter table year_ranges add range partition 1890 <= values < 
1893;
             : ]]>
             : </codeblock>
             : 
> I don't think this will show per-tablet sizes currently
Done


PS4, Line 977: 
             : <!-- To do: fill in example. -->
             : <codeblock><![CDATA[
             : 
> not true anymore. now it's my_db::my_table
Done


PS4, Line 1007: 
              :         <p 
conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
              :  
> some missing words in this sentence
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Ambreen Kazi <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
