Alex Behm has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 1:

(8 comments)

Looks good, just minor comments.

http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

Line 863:   queries to understand the data distribution and plan a partitioning 
strategy,
I'd leave out "to understand the data distribution and plan a partitioning
strategy" because it presupposes a particular use case. I wouldn't make any
assumptions about what the user wants to do with TABLESAMPLE.


Line 865:   to only a percentage of data within the table. This technique 
reduces the overhead
Nice!


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_select.xml
File docs/topics/impala_select.xml:

Line 175:         clause immediately after a table reference, to specify that 
the query only processes an
Maybe "a certain percentage of the table data"? An "arbitrary portion" sounds
strange, and it's not really completely arbitrary.


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_tablesample.xml
File docs/topics/impala_tablesample.xml:

Line 57:       The <codeph>TABLESAMPLE</codeph> clause comes immediately after 
a table name.
table name or alias, e.g.

from mytable t tablesample ...


Line 69:       processing a particular set of data files, the proportion of 
sampled data from the
suggest "selecting a random set of data files" instead of "processing a 
particular set of data files"


Line 77:       sampling considers the same set of data files each time. 
<codeph>REPEATABLE</codeph>
suggest "selects" instead of "considers"
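
To illustrate the wording suggested in the two comments above, here is a
minimal Python sketch of what "selecting a random set of data files" with a
repeatable seed could look like. It is not Impala's implementation: the
function name sample_files, the example file sizes, and the rule of stopping
once the requested share of bytes is reached are all assumptions made for
illustration.

```python
import random


def sample_files(file_sizes, percent, seed=None):
    """Pick a random subset of files covering at least `percent` of all bytes.

    Hypothetical helper for illustration only, not Impala code.
    `file_sizes` maps a file name to its size in bytes.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    # A fixed seed makes the shuffle, and therefore the selection, repeatable.
    rng = random.Random(seed)

    # Start from a stable order so the same seed always yields the same shuffle.
    names = sorted(file_sizes)
    rng.shuffle(names)

    target = sum(file_sizes.values()) * percent / 100.0
    picked, picked_bytes = [], 0
    for name in names:
        if picked_bytes >= target:
            break
        # Whole files are selected, so the total can overshoot the target.
        picked.append(name)
        picked_bytes += file_sizes[name]
    return picked, picked_bytes


if __name__ == "__main__":
    files = {"f1": 100, "f2": 300, "f3": 50, "f4": 550}  # made-up sizes
    print(sample_files(files, 30, seed=42))
    print(sample_files(files, 30, seed=42))  # same seed, same files
```

In this sketch, two calls with the same seed select the same files, while
omitting the seed lets the selection vary between runs. Because whole files
are picked, the sampled share of bytes can exceed the requested percentage.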


Line 172:       by itself, because all phases of query execution use less data 
overall.
This is not necessarily true; it depends on whether the small-query
optimization kicks in with LIMIT.


Line 257:       table metadata is not updated by a <codeph>REFRESH</codeph> 
whitespace


-- 
To view, visit http://gerrit.cloudera.org:8080/7680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Greg Rahn <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes
