John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179:           <topicref rev="2.9.0 IMPALA-5381 IMPALA-5583" 
href="topics/impala_default_join_distribution_mode.xml"/>
> Why mention IMPALA-5583 also?
In the past I've referenced both the "code implementation" JIRA and the
"document the new feature" JIRA in this kind of context, for ease of future
maintenance and tracing if something is wrong or missing on the doc side. I
guess that's less important when the doc JIRA is a subtask of the code one.
I'll take it out.


http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:       This option determines the join strategy that Impala uses when 
any of the tables
> We deliberately did not use "join strategy" in the option name because stra
Can you elaborate a little on the meaning of "join distribution mode" then? 
That's not terminology we've used elsewhere in the docs.


Line 47:       Hive <codeph>ANALYZE TABLE</codeph> statement.
> Sure you want to keep the ANALYZE TABLE part? In most situations we cannot 
Done


Line 48:       By default, when a table involved in the join query does not 
have statistics,
> Accuracy could be improved. What if both tables do not have stats? Clarify 
What is the answer if both tables are missing stats? Does Impala deduce which
one is smaller and broadcast that one but not the other?


Line 58:       might be missing statistics due to the overhead involved in 
calculating them,
> I wouldn't suppose a particular reason for not having stats.
Done


Line 61:       of a table involved in a join query and only transmits a portion 
of the table
> Not very accurate, both tables are transferred across the network. Not sure
I'd prefer to prepare and fine-tune a brief explanation so that I can reuse
that wording wherever such terminology is introduced to a reader who might
not have seen it before. Anyone who needs detailed background info can follow
the "related info" links at the end of the page.


Line 67:       recommended when setting up and deploying new clusters. This 
setting is
> We should mention why we recommend this. SHUFFLE is generally a safer optio
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-HasComments: Yes
