[GitHub] [spark] wForget commented on pull request #41609: [SPARK-44065][SQL] Optimize BroadcastHashJoin skew when localShuffleReader is disabled

via GitHub Tue, 04 Jul 2023 02:32:50 -0700


wForget commented on PR #41609:
URL: https://github.com/apache/spark/pull/41609#issuecomment-1619894072


   > I think it makes sense to optimize skew with broadcast hash join. But it 
seems still useful when localShuffleReaderEnabled is enabled. Here is an 
attempt to resolve skewed partition with local shuffle reader, #40312. 
   
   I tried the test case in #40312. The plan optimized by 
OptimizeShuffleWithLocalRead is rolled back because a HashAggregate follows 
the BroadcastHashJoin, which causes ValidateRequirements.validate to fail.
   
   The implementation in #40312 uses 
`OptimizeShuffleWithLocalRead.getPartitionSpecs` and 
`spark.sql.adaptive.forceOptimizeSkewedJoin=true` to make the local shuffle 
read take effect, but this introduces an additional shuffle, which does not 
seem like a good approach.
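   
   For reference, the settings involved can be sketched as session 
configuration. This is a minimal sketch and is not taken from #40312: the 
local master and app name are placeholders, and enabling 
`spark.sql.adaptive.localShuffleReader.enabled` together with the forced 
skew-join flag is an assumption about how that test case is configured.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder session; master and appName are illustrative only.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("aqe-skew-config-sketch")
  // Adaptive query execution must be on for any of the rules below to run.
  .config("spark.sql.adaptive.enabled", "true")
  // Allow AQE to split skewed partitions in joins.
  .config("spark.sql.adaptive.skewJoin.enabled", "true")
  // Assumption: local shuffle reader left enabled, as in the quoted scenario.
  .config("spark.sql.adaptive.localShuffleReader.enabled", "true")
  // Apply skew optimization even if it requires an extra shuffle.
  .config("spark.sql.adaptive.forceOptimizeSkewedJoin", "true")
  .getOrCreate()
```

   With `forceOptimizeSkewedJoin` set, the skew optimization is kept even when 
it costs an extra shuffle, which is the trade-off in question here.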
   
   How about we remove the localShuffleReaderEnabled check from the current 
implementation?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


