Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

Line 269:       require any casting, can be pushed to Kudu.
Binary predicates and IN list predicates can be pushed to Kudu.


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS14, Line 147: The work is parallelized
              :               across units of computing called
'work' and 'computing' are kind of misleading given that Kudu is storage. How about:

Tablets are stored by tablet servers.


PS14, Line 150: You can colocate the tablet servers on the same hosts as the 
DataNodes, although that is not required.
Though our recommendation will be to colocate Impalads with tservers.


PS14, Line 220:               On the logical side, the uniqueness constraint 
allows you to avoid duplicate data in a table.
I like this positive spin on our consistency limitations


PS14, Line 220: duplicate
duplicating


PS14, Line 546:                   <codeph>BIT_SHUFFLE</codeph>: rearrange the 
bits of the values to efficiently
              :                   compress sequences of values that are 
identical or vary only slightly based
              :                   on primary key order.
According to the compression doc below, this is also compressed with LZ4 after 
the shuffle algorithm is applied.


PS14, Line 562: No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT 
and BIGINT.
Can you file a JIRA, please?


PS14, Line 1100: 
               :           See <xref keyref="kudu_tables"/>
This doesn't render in the PDF.


PS14, Line 1153: In particular, do not rely on an <codeph>INSERT ... 
SELECT</codeph> statement
               :         that selects from the same table into which it is 
inserting, unless you include extra
               :         conditions in the <codeph>WHERE</codeph> clause to 
avoid reading the newly inserted rows
               :         within the same statement
This is repeated almost verbatim in the next section. Not sure which section it's 
better suited for, but it looks like a duplication right now.


PS14, Line 1237: data that is read while a write
               :         operation is in progress
Kudu does have atomic per-row operations, so this needs to make clear that it refers 
to Impala statements that can read or write multiple rows in the same query.

How about: "or data that is read across multiple rows in a SELECT statement 
while a concurrent DML statement is modifying rows."


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS14, Line 409: Kudu tables default to the <codeph>NOT NULL</codeph> setting 
for each column.
This is not true; the default is nullable, except for primary key columns.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Ambreen Kazi <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
