Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

Line 269: require any casting, can be pushed to Kudu.
Binary predicates and IN list predicates can be pushed to Kudu.


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS14, Line 147: The work is parallelized
              : across units of computing called
'work' and 'computing' are somewhat misleading, given that Kudu is storage. How about:

Tablets are stored by tablet servers.


PS14, Line 150: You can colocate the tablet servers on the same hosts as the DataNodes, although that is not required.
Though our recommendation will be to colocate impalads with tservers.


PS14, Line 220: On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
I like this positive spin on our consistency limitations.


PS14, Line 220: duplicate
Suggest "duplicating".


PS14, Line 546: <codeph>BIT_SHUFFLE</codeph>: rearrange the bits of the values to efficiently
              : compress sequences of values that are identical or vary only slightly based
              : on primary key order.
According to the compression doc below, this is also compressed with LZ4 after the shuffle algorithm is applied.


PS14, Line 562: No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
Can you file a JIRA, please?


PS14, Line 1100: 
               : See <xref keyref="kudu_tables"/>
This doesn't render in the PDF.


PS14, Line 1153: In particular, do not rely on an <codeph>INSERT ... SELECT</codeph> statement
               : that selects from the same table into which it is inserting, unless you include extra
               : conditions in the <codeph>WHERE</codeph> clause to avoid reading the newly inserted rows
               : within the same statement
This is repeated almost verbatim in the next section. I'm not sure which section it is better suited for, but it looks like a duplication right now.


PS14, Line 1237: data that is read while a write
               : operation is in progress
Kudu does have atomic per-row operations, so this needs to make clear that it refers to Impala statements that can read or write multiple rows in the same query. How about:

or data that is read across multiple rows in a SELECT statement while a concurrent DML statement is modifying rows.


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS14, Line 409: Kudu tables default to the <codeph>NOT NULL</codeph> setting for each column.
This is not true. The default is nullable, except for primary key columns.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Ambreen Kazi <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
