zhengruifeng edited a comment on pull request #34602:
URL: https://github.com/apache/spark/pull/34602#issuecomment-994264151


   @advancedxy  Sorry for the late reply and thanks for ping me.
   
   I did a quick test with https://github.com/apache/spark/pull/33893
   
   Unfortunately, https://github.com/apache/spark/pull/33893 failed to handle 
the case, since the whole plan including `Exchange` nodes (which were 
unexpected when I was implementing https://github.com/apache/spark/pull/33893) 
were passed.
   
   test code:
   ```
   spark.conf.set("spark.sql.adaptive.enabled", true)
   spark.conf.set("spark.sql.adaptive.skewJoin.enabled", true)
   spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
   spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", false)
   spark.conf.set("spark.sql.shuffle.partitions", 10)
   
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", 
"100")
   spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "100")
   
   spark.range(0, 1000, 1, 10).selectExpr("id % 3 as key1", "id % 3 as 
value1").createOrReplaceTempView("skewData1")
   spark.range(0, 1000, 1, 10).selectExpr("id % 1 as key2", "id as 
value2").createOrReplaceTempView("skewData2")
   spark.range(0, 1000, 1, 10).selectExpr("id % 1 as key3", "id as 
value3").createOrReplaceTempView("skewData3")
   
   
   spark.sql("SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2 JOIN 
skewData3 ON value2 = 
value3").write.mode("overwrite").parquet("/tmp/tmp1.parquet")
   ```
   
   
   related log:
   
   ```
   21/12/15 11:15:33 DEBUG SparkSqlParser: Parsing command: SELECT key1 FROM 
skewData1 JOIN skewData2 ON key1 = key2 JOIN skewData3 ON value2 = value3
   21/12/15 11:15:34 DEBUG OptimizeSkewedJoin: Optimizing Project #75: 
ShuffledJoins: [SortMergeJoin, SortMergeJoin]
   21/12/15 11:15:34 DEBUG OptimizeSkewedJoin: Optimizing Project #75: Do NOT 
support operators [Exchange, Exchange, Range, Exchange, Range, Exchange, Range]
   21/12/15 11:15:35 DEBUG OptimizeSkewedJoin: Optimizing Project #161: 
ShuffledJoins: [SortMergeJoin, SortMergeJoin]
   21/12/15 11:15:35 DEBUG OptimizeSkewedJoin: Optimizing Project #161: Do NOT 
support operators [Exchange]
   21/12/15 11:15:35 DEBUG OptimizeSkewedJoin: Optimizing Project #200: 
ShuffledJoins: [SortMergeJoin, SortMergeJoin]
   21/12/15 11:15:35 DEBUG OptimizeSkewedJoin: Optimizing Project #200: Do NOT 
support operators [Exchange]
   21/12/15 11:15:35 DEBUG OptimizeSkewedJoin: Optimizing Project #247: 
ShuffledJoins: [SortMergeJoin]
   21/12/15 11:15:36 DEBUG OptimizeSkewedJoin: Optimizing Project #261: 
ShuffledJoins: [SortMergeJoin]
   21/12/15 11:15:36 DEBUG OptimizeSkewedJoin: Optimizing Project #261: 
ShuffleQueryStages: [3, 2]
   21/12/15 11:15:36 DEBUG OptimizeSkewedJoin: Optimizing Project #261: 
Splittable ShuffleQueryStages: [3, 2]
   21/12/15 11:15:36 DEBUG OptimizeSkewedJoin: Optimizing Project #261: 
Optimizing ShuffleQueryStage #3 in skew join, size info: median size: 21184, 
max size: 26854, min size: 18341, avg size: 21544
   21/12/15 11:15:36 DEBUG OptimizeSkewedJoin: Optimizing Project #261: 
Optimizing ShuffleQueryStage #2 in skew join, size info: median size: 1227, max 
size: 1308, min size: 1142, avg size: 1230
   21/12/15 11:15:36 DEBUG OptimizeSkewedJoin: Optimizing Project #261: Totally 
0 skew partitions found
   ```
   
   ---------------------
   
   update:
   
   I try to change `if (shuffleStages.length == 2)` to `if 
(shuffleStages.length >= 2)` in master, than the two added cases can be 
successfully optimized.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to