Alex Behm has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................

Patch Set 1:

(8 comments)

Looks good, just minor comments

http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

Line 863: queries to understand the data distribution and plan a partitioning strategy,
I'd leave out the "to understand the data distribution and plan a partitioning strategy" because that already supposes a certain use case in the user's mind. I'd not make any assumptions about what the user wants to do with TABLESAMPLE.

Line 865: to only a percentage of data within the table. This technique reduces the overhead
Nice!

http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_select.xml
File docs/topics/impala_select.xml:

Line 175: clause immediately after a table reference, to specify that the query only processes an
a certain percentage of the table data? an "arbitrary portion" sounds strange and it's not really completely arbitrary

http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_tablesample.xml
File docs/topics/impala_tablesample.xml:

Line 57: The <codeph>TABLESAMPLE</codeph> clause comes immediately after a table name.
table name or alias, e.g. from mytable t tablesample ...

Line 69: processing a particular set of data files, the proportion of sampled data from the
suggest "selecting a random set of data files" instead of "processing a particular set of data files"

Line 77: sampling considers the same set of data files each time. <codeph>REPEATABLE</codeph>
suggest "selects" instead of "considers"

Line 172: by itself, because all phases of query execution use less data overall.
This is not necessarily true, depending on whether the small query optimization kicks in with limit.

Line 257: table metadata is not updated by a <codeph>REFRESH</codeph>
whitespace

--
To view, visit http://gerrit.cloudera.org:8080/7680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Greg Rahn <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes
