Alex Behm has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................

Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179: <topicref rev="2.9.0 IMPALA-5381 IMPALA-5583" href="topics/impala_default_join_distribution_mode.xml"/>
Why mention IMPALA-5583 also?

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40: This option determines the join strategy that Impala uses when any of the tables
We deliberately did not use "join strategy" in the option name because strategy is too generic.

Line 47: Hive <codeph>ANALYZE TABLE</codeph> statement.
Sure you want to keep the ANALYZE TABLE part? In most situations we cannot effectively use what Hive produces.

Line 48: By default, when a table involved in the join query does not have statistics,
Accuracy could be improved. What if both tables do not have stats? Clarify that one table is going to be broadcast. Might even be worth explicitly listing what happens if one table has stats and the other doesn't (the one without stats will be broadcast).

Line 58: might be missing statistics due to the overhead involved in calculating them,
I wouldn't suppose a particular reason for not having stats.

Line 61: of a table involved in a join query and only transmits a portion of the table
Not very accurate, both tables are transferred across the network. Not sure if we need to explain the differences between broadcast+shuffle here, maybe provide a link to their explanation/definition?

Line 67: recommended when setting up and deploying new clusters. This setting is
We should mention why we recommend this. SHUFFLE is generally a safer option because the join build will be less prone to spilling and/or OOM.

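To make the cases raised for Line 48 concrete, here is a rough sketch of the behavior as described in the comments above. It is only a model of those comments, not Impala planner code: the function name `describe_join_distribution` and its return strings are hypothetical, and the case where both tables have stats is assumed to fall back to the planner's usual cost-based choice.

```python
def describe_join_distribution(left_has_stats: bool,
                               right_has_stats: bool,
                               default_mode: str = "BROADCAST") -> str:
    """Summarize which join distribution applies, per the review comments.

    Hypothetical helper for illustration; it does not call into Impala.
    """
    mode = default_mode.upper()
    if mode not in ("BROADCAST", "SHUFFLE"):
        raise ValueError(f"unknown default_join_distribution_mode: {default_mode}")

    # Assumption: with stats on both sides the option does not come into play
    # and the planner picks a distribution from its cost estimates.
    if left_has_stats and right_has_stats:
        return "cost-based: planner chooses using statistics"

    # At least one table lacks stats, so the option's value decides.
    if mode == "SHUFFLE":
        # Both tables are partitioned and sent across the network.
        return "shuffle: both tables are partitioned across nodes"

    if left_has_stats != right_has_stats:
        # Per the Line 48 comment: the table without stats is broadcast.
        return "broadcast: the table without statistics is broadcast"

    # Neither table has stats: one table is still broadcast.
    return "broadcast: one of the tables is broadcast"


if __name__ == "__main__":
    for left, right in [(True, True), (True, False), (False, False)]:
        for mode in ("BROADCAST", "SHUFFLE"):
            print(left, right, mode, "->",
                  describe_join_distribution(left, right, mode))
```

Listing the cases this way in the topic (both with stats, one without, neither) would address the accuracy concern without needing a full explanation of broadcast versus shuffle.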
--
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-HasComments: Yes
