wankunde commented on PR #38176:
URL: https://github.com/apache/spark/pull/38176#issuecomment-1273985832

   > > For example, if ADVISORY_PARTITION_SIZE_IN_BYTES is 64M and the two sides 
of the join are [30M, 30M, 30M, 30M, 30M] and [100M, 100M, 100M, 100M, 100M], 
CoalesceShufflePartitions will transform ShuffleQueryStageExec to 
CustomShuffleReaderExec(stage, ShufflePartitionSpec[130M, 130M, 130M, 130M, 
130M]). This query should be able to be converted to ShuffledHashJoin (not 
supported now).
   > 
   > I'm not sure what you mean: a join has two shuffle stages, so it should 
also have two shuffle reads. For your example, CoalesceShufflePartitions won't 
coalesce partitions, since the sum of any two contiguous partition sizes is 
bigger than 64MB. You can find some details in the test 
`ShufflePartitionsUtilSuite`. And this example can already be converted to SHJ 
if you set ADAPTIVE_MAX_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD to 64MB.
   > 
   
   But it cannot be converted to SHJ if 
ADAPTIVE_MAX_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD is less than 64MB, e.g. 
ADAPTIVE_MAX_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD = 40MB, can it?
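   The "won't coalesce" behavior quoted above can be sketched with a toy greedy 
merge. This is a simplified, hypothetical model for illustration only; Spark's 
actual logic in `ShufflePartitionsUtil` is more involved (skew handling, 
minimum partition count, etc.):

   ```python
   def coalesce_partitions(sizes, target):
       """Greedy merge, loosely modeled on Spark's coalescing: merge
       consecutive shuffle partitions while the running total stays at or
       below the advisory target size. Returns (start, end) index ranges."""
       specs, start, total = [], 0, 0
       for i, size in enumerate(sizes):
           if total > 0 and total + size > target:
               specs.append((start, i))  # close the current coalesced group
               start, total = i, 0
           total += size
       specs.append((start, len(sizes)))  # close the final group
       return specs

   MB = 1 << 20
   # Per-partition sizes summed across both join sides: 30M + 100M = 130M each.
   # Every pair of neighbors exceeds the 64MB advisory size, so nothing merges.
   print(coalesce_partitions([130 * MB] * 5, 64 * MB))
   ```

   With small inputs, e.g. five 10MB partitions, the same function collapses 
them into a single coalesced group, which is why the advisory size matters here.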
   
   Our production executors have 18 vCores, so we want to set 
ADAPTIVE_MAX_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD to a small value.
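   For reference, the scenario under discussion corresponds to configs along 
these lines (the keys exist in Spark 3.2+; the values are illustrative for the 
example above, not recommendations):

   ```
   # spark-defaults.conf sketch -- illustrative values only
   spark.sql.adaptive.enabled                               true
   spark.sql.adaptive.advisoryPartitionSizeInBytes          64MB
   spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold  40MB
   ```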


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

