[ 
https://issues.apache.org/jira/browse/SPARK-43021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-43021:
-----------------------------------

    Assignee: zzzzming95

> Shuffle happens when Coalesce Buckets should occur
> --------------------------------------------------
>
>                 Key: SPARK-43021
>                 URL: https://issues.apache.org/jira/browse/SPARK-43021
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Nikita Eshkeev
>            Assignee: zzzzming95
>            Priority: Minor
>
> h1. What I did
> I run the following code:
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = (
>     SparkSession
>         .builder
>         .appName("Bucketing")
>         .master("local[4]")
>         .config("spark.sql.bucketing.coalesceBucketsInJoin.enabled", True)
>         .config("spark.sql.autoBroadcastJoinThreshold", "-1")
>         .getOrCreate()
> )
>
> df1 = spark.range(0, 100)
> df2 = spark.range(0, 100, 2)
>
> df1.write.bucketBy(4, "id").mode("overwrite").saveAsTable("t1")
> df2.write.bucketBy(2, "id").mode("overwrite").saveAsTable("t2")
>
> t1 = spark.table("t1")
> t2 = spark.table("t2")
>
> t2.join(t1, "id").explain()
> {code}
> h1. What happened
> The physical plan for the join contains an Exchange node.
> h1. What is expected
> The plan should not contain any Exchange/Shuffle nodes: {{t1}} has 4 buckets 
> and {{t2}} has 2, so their ratio is 2, which does not exceed the default of 4 
> set by {{spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio}}, and 
> [CoalesceBucketsInJoin|https://github.com/apache/spark/blob/c9878a212958bc54be529ef99f5e5d1ddf513ec8/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala]
>  should therefore be applied.
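
For reference, the bucket-ratio condition that CoalesceBucketsInJoin checks can be sketched in plain Python. This is a simplified illustration of the coalescing test described above, not the actual Scala rule, and the function name is hypothetical:

```python
# Simplified sketch of the CoalesceBucketsInJoin eligibility test
# (hypothetical helper; the real rule lives in Spark's Scala code).
def can_coalesce(left_buckets: int, right_buckets: int, max_ratio: int = 4) -> bool:
    big = max(left_buckets, right_buckets)
    small = min(left_buckets, right_buckets)
    # The larger bucket count must be a multiple of the smaller one,
    # and the ratio must not exceed coalesceBucketsInJoin.maxBucketRatio.
    return big % small == 0 and big // small <= max_ratio

print(can_coalesce(4, 2))  # True: ratio 2 <= 4, so no shuffle should be needed
```

With t1 at 4 buckets and t2 at 2, the check passes, which is why the reported Exchange node in the plan looks like a bug rather than expected behavior.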



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
