cloud-fan commented on a change in pull request #32960:
URL: https://github.com/apache/spark/pull/32960#discussion_r666908791



##########
File path: docs/sql-performance-tuning.md
##########
@@ -273,7 +273,32 @@ This feature coalesces the post shuffle partitions based on the map output stati
  </table>
  
 ### Converting sort-merge join to broadcast join
-AQE converts sort-merge join to broadcast hash join when the runtime statistics of any join side is smaller than the broadcast hash join threshold. This is not as efficient as planning a broadcast hash join in the first place, but it's better than keep doing the sort-merge join, as we can save the sorting of both the join sides, and read shuffle files locally to save network traffic(if `spark.sql.adaptive.localShuffleReader.enabled` is true)
+AQE converts sort-merge join to broadcast hash join when the runtime statistics of either join side are smaller than the adaptive broadcast hash join threshold. This is not as efficient as planning a broadcast hash join in the first place, but it's better than continuing the sort-merge join, as we can skip sorting both join sides and read shuffle files locally to save network traffic (if `spark.sql.adaptive.localShuffleReader.enabled` is true).
+  <table class="table">
+     <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+     <tr>
+       <td><code>spark.sql.adaptive.autoBroadcastJoinThreshold</code></td>
+       <td>(none)</td>
+       <td>
+         Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. Broadcasting can be disabled by setting this value to -1. The default value is the same as <code>spark.sql.autoBroadcastJoinThreshold</code>. Note that this config is used only in the adaptive framework.
+       </td>
+       <td>3.2.0</td>
+     </tr>
+  </table>
+
+### Converting sort-merge join to shuffled hash join
+AQE converts sort-merge join to shuffled hash join when all reduce partitions are small enough; the maximum threshold is controlled by the config `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold`.

Review comment:
       `... when all the post shuffle partitions are smaller than a threshold.`
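For illustration, the two thresholds touched by this diff can be set like any other Spark SQL conf. This is a hedged sketch: the property names come from the patch above, but the byte-size values shown are arbitrary examples, not recommended defaults.

```sql
-- Enable adaptive query execution so the thresholds below take effect.
SET spark.sql.adaptive.enabled=true;

-- Adaptive-only broadcast threshold; when unset it falls back to
-- spark.sql.autoBroadcastJoinThreshold, and -1 disables broadcasting.
-- (50MB is an arbitrary example value.)
SET spark.sql.adaptive.autoBroadcastJoinThreshold=50MB;

-- Convert sort-merge join to shuffled hash join when every post-shuffle
-- partition is below this size. (16MB is an arbitrary example value.)
SET spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold=16MB;
```

These statements can be run in `spark-sql` or via `spark.sql(...)` in a session; the same keys can also be passed as `--conf` options at submit time.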




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


