[ https://issues.apache.org/jira/browse/SPARK-43021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-43021:
-----------------------------------

    Assignee: zzzzming95

> Shuffle happens when Coalesce Buckets should occur
> --------------------------------------------------
>
>                 Key: SPARK-43021
>                 URL: https://issues.apache.org/jira/browse/SPARK-43021
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Nikita Eshkeev
>            Assignee: zzzzming95
>            Priority: Minor
>
> h1. What I did
> I define the following code:
> {{from pyspark.sql import SparkSession}}
> {{spark = (}}
> {{    SparkSession}}
> {{    .builder}}
> {{    .appName("Bucketing")}}
> {{    .master("local[4]")}}
> {{    .config("spark.sql.bucketing.coalesceBucketsInJoin.enabled", True)}}
> {{    .config("spark.sql.autoBroadcastJoinThreshold", "-1")}}
> {{    .getOrCreate()}}
> {{)}}
> {{df1 = spark.range(0, 100)}}
> {{df2 = spark.range(0, 100, 2)}}
> {{df1.write.bucketBy(4, "id").mode("overwrite").saveAsTable("t1")}}
> {{df2.write.bucketBy(2, "id").mode("overwrite").saveAsTable("t2")}}
> {{t1 = spark.table("t1")}}
> {{t2 = spark.table("t2")}}
> {{t2.join(t1, "id").explain()}}
> h1. What happened
> There is an Exchange node in the join plan
> h1. What is expected
> The plan should not contain any Exchange/Shuffle nodes, because {{t1}}'s
> number of buckets is 4 and {{t2}}'s number of buckets is 2, and their ratio
> is 2 which is less than 4
> ({{spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio}}) and
> [CoalesceBucketsInJoin|https://github.com/apache/spark/blob/c9878a212958bc54be529ef99f5e5d1ddf513ec8/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala]
> should be applied
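
For reference, the eligibility reasoning in the "What is expected" section can be sketched in plain Python. This is only an illustration of the arithmetic described in the report, not Spark's actual implementation: the function name {{can_coalesce_buckets}} is hypothetical, the default of 4 for the maximum ratio is taken from the report, and the "divisible" and "less than or equal" conditions are assumptions about how the rule compares bucket counts.

```python
def can_coalesce_buckets(num_buckets_a: int, num_buckets_b: int, max_bucket_ratio: int = 4) -> bool:
    """Hypothetical helper: decide whether two bucketed tables look eligible
    for bucket coalescing in a join, following the reasoning in the report.

    Assumptions (not confirmed by the report itself):
      * the bucket counts must differ,
      * the larger count must be divisible by the smaller one,
      * larger / smaller must not exceed max_bucket_ratio.
    """
    small = min(num_buckets_a, num_buckets_b)
    large = max(num_buckets_a, num_buckets_b)
    if small <= 0:
        # Non-bucketed or invalid input: nothing to coalesce.
        return False
    return (
        num_buckets_a != num_buckets_b
        and large % small == 0
        and large // small <= max_bucket_ratio
    )


# The case from the report: t1 has 4 buckets, t2 has 2 buckets, ratio is 2.
report_case_is_eligible = can_coalesce_buckets(4, 2)
```

Under these assumptions the reported tables (4 and 2 buckets) satisfy every condition, which is why the reporter expects a plan without an Exchange node.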