John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................
Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179: <topicref rev="2.9.0 IMPALA-5381 IMPALA-5583" href="topics/impala_default_join_distribution_mode.xml"/>
> Why mention IMPALA-5583 also?

In the past I've referred both to the "code implementation" JIRA and "document the new feature" JIRA in this kind of context. Just for ease of future maintenance and tracing if something is wrong or missing on the doc side. I guess that's less important when the doc one is a subtask of the code one. I'll take it out.


http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40: This option determines the join strategy that Impala uses when any of the tables
> We deliberately did not use "join strategy" in the option name because stra

Can you elaborate a little on the meaning of "join distribution mode" then? That's not terminology we've used elsewhere in the docs.


Line 47: Hive <codeph>ANALYZE TABLE</codeph> statement.
> Sure you want to keep the ANALYZE TABLE part? In most situations we cannot

Done


Line 48: By default, when a table involved in the join query does not have statistics,
> Accuracy could be improved. What if both tables do not have stats? Clarify

What is the answer if both tables are missing stats? Does Impala make a deduction about which is smaller and that one gets broadcast while the other doesn't?


Line 58: might be missing statistics due to the overhead involved in calculating them,
> I wouldn't suppose a particular reason for not having stats.

Done


Line 61: of a table involved in a join query and only transmits a portion of the table
> Not very accurate, both tables are transferred across the network. Not sure

I'd prefer to prepare and fine-tune a brief explanation so I could reuse that wording in places where such terminology is mentioned to a reader that might not have seen it before.
Anyone who needs detailed background info can follow the "related info" links at the end of the page.


Line 67: recommended when setting up and deploying new clusters. This setting is
> We should mention why we recommend this. SHUFFLE is generally a safer optio

Done


--
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: John Russell
Gerrit-HasComments: Yes
