John Russell has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page ......................................................................
Patch Set 7: (18 comments) Addressed all comments, esp. splitting CREATE TABLE syntax for HDFS vs. Kudu. http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_explain.xml File docs/topics/impala_explain.xml: PS7, Line 250: > "in a SCAN KUDU node" Done PS7, Line 251: , and might involve transmitting : non-matching rows that are filtered out on the Impala side. > remove Done http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_kudu.xml File docs/topics/impala_kudu.xml: PS7, Line 76: scan performance close to that of Parquet > Make sure you check with Mostafa about this claim. Done PS7, Line 147: The work is parallelized : across units of computing called <term>tablet servers</term>. > I believe the unit of computing is the tablet not the tablet server, unless The tablet server is the compute resource, the tablet holds the actual data. PS7, Line 149: between the tablets and tablet servers. > I think it would be nice to mention that the user is not responsible for ma Done PS7, Line 168: (CREATE TABLE and ALTER TABLE) > It is weird to mention CREATE without DROP. Please remove this and mention Since this section is only talking about the delta in syntax, I'll leave as-is. PS7, Line 397: <p> : For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph> : values, because it is not practical to enforce a <q>not null</q> constraint on HDFS : data files that could be prepared using external tools and ETL processes. : </p> : : <p conref="../shared/impala_common.xml#common/pk_implies_not_null"/> : : <p> : For example, a table containing geographic information might require the latitude : and longitude coordinates to always be specified. Other attributes might be allowed : to be <codeph>NULL</codeph>. For example, a location might not have a designated : place name, its altitude might be unimportant, and its population might be initially : unknown, to be filled in later. : </p> > You may want to swap these two paragraphs. Done PS7, Line 717: PARTITION BY > Impala still uses "PARTITIONED BY" for HDFS tables. Done PS7, Line 727: . By setting : up an effective partitioning scheme for a Kudu table, you can ensure that the work for : a query can be parallelized evenly across the hosts in a cluster. > Remove. Sometimes the goal is to scan as little as possible. You can say th Whole paragraph was hidden per comment on a previous patch set. (Instead, we link people to the Kudu white paper for those kinds of details.) PS7, Line 936: To see the distribution of data in a Kudu table across the underlying buckets and : partitions, use the <codeph>SHOW TABLE STATS</codeph> statement. > This is unfortunately not accurate. SHOW TABLE STATS will only show the dis Done. Reworded rather than deleting entirely. PS7, Line 1122: change > "changes" Done PS7, Line 1159: strong consistency for order of operations > I am not sure I know what that means. Done PS7, Line 1159: total : success or total failure of a multi-row statement > This is "atomicity". Maybe just mention atomic multi-row statements. Done PS7, Line 1160: or data that is read while a write : operation is in progress > Isolation. I added the technical terminology into my wording. PS7, Line 1288: <title>Memory Usage for Operations on Kudu Tables</title> : : <conbody> : : <p> : The Apache Kudu architecture, topology, and data storage techniques result in : different patterns of memory usage for Impala statements than with HDFS-backed tables. : </p> > I don't find this particularly informative and suggest we remove it unless Yes, audience="hidden" earlier in the <concept> tag will make it invisible. http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_literals.xml File docs/topics/impala_literals.xml: PS7, Line 408: most Impala tables > Impala tables backed by HDFS or S3? "most" is kind of vague Done http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_revoke.xml File docs/topics/impala_revoke.xml: PS7, Line 115: access to a Kudu table is <q>all or nothing</q>. > "only table-level permissions are enforced in Kudu tables. Column-level per Done http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_tables.xml File docs/topics/impala_tables.xml: PS7, Line 293: using the Apache Kudu storage system > "stored in Apache Kudu" Done -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell <[email protected]> Gerrit-Reviewer: Ambreen Kazi <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: John Russell <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
