huaxingao commented on PR #34785:
URL: https://github.com/apache/spark/pull/34785#issuecomment-1132363307
Thanks @aokolnychyi for the proposal. I agree that we should support both
strictly required distribution and best effort distribution. For best effort
distribution, if the user doesn't request a specific number of partitions, we
will split skewed partitions and coalesce small partitions. For strictly
required distribution, if the user doesn't request a specific number of
partitions, we will coalesce small partitions but we will NOT split skewed
partitions, since splitting changes the required distribution.
In interface `RequiresDistributionAndOrdering`, I will add
```
default boolean distributionStrictlyRequired() { return true; }
```
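As a rough, self-contained sketch of how a connector could opt out of the strict default (this is not the actual Spark interface; `WriteRequirements`, `StrictWrite`, and `BestEffortWrite` are hypothetical stand-ins that model only the proposed method):

```java
// Hypothetical stand-in for RequiresDistributionAndOrdering; only the
// proposed method is modeled here.
interface WriteRequirements {
    // Proposed default: the distribution is strictly required.
    default boolean distributionStrictlyRequired() { return true; }
}

// A write that keeps the default and therefore stays strict.
final class StrictWrite implements WriteRequirements {}

// A write that opts in to best effort distribution.
final class BestEffortWrite implements WriteRequirements {
    @Override
    public boolean distributionStrictlyRequired() { return false; }
}
```

Defaulting to `true` means existing implementations keep today's behavior unless they explicitly override the method.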
Then in `DistributionAndOrderingUtils.prepareQuery`, I will change the
code to something like this:
```
val queryWithDistribution = if (distribution.nonEmpty) {
  if (!write.distributionStrictlyRequired() && numPartitions == 0) {
    // Best effort, no explicit partition count: AQE may split and coalesce.
    RebalancePartitions(distribution, query)
  } else {
    if (numPartitions > 0) {
      RepartitionByExpression(distribution, query, numPartitions)
    } else {
      // Strictly required, no explicit partition count.
      RepartitionByExpression(distribution, query, None)
    }
  }
  ...
```
Basically, in the best effort case, if the requested numPartitions is 0, we
will use `RebalancePartitions` so both `OptimizeSkewInRebalancePartitions` and
`CoalesceShufflePartitions` will be applied. In the strictly required
distribution case, if the requested numPartitions is 0, we will use
`RepartitionByExpression(distribution, query, None)` so
`CoalesceShufflePartitions` will be applied.
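To make the cases easier to check, here is a minimal sketch of the same decision as a plain function. It assumes a non-empty distribution, and `DistributionPlanSketch`, `Node`, and `choose` are hypothetical names that only label which plan node the snippet above would build; nothing here is Spark code.

```java
final class DistributionPlanSketch {
    // Labels for the three plan nodes the proposed code can produce.
    enum Node {
        REBALANCE_PARTITIONS,            // RebalancePartitions(distribution, query)
        REPARTITION_WITH_NUM_PARTITIONS, // RepartitionByExpression(distribution, query, numPartitions)
        REPARTITION_WITHOUT_NUM          // RepartitionByExpression(distribution, query, None)
    }

    // Mirrors the branch structure of the proposed prepareQuery change,
    // assuming the distribution is non-empty.
    static Node choose(boolean strictlyRequired, int numPartitions) {
        if (!strictlyRequired && numPartitions == 0) {
            return Node.REBALANCE_PARTITIONS;
        }
        if (numPartitions > 0) {
            return Node.REPARTITION_WITH_NUM_PARTITIONS;
        }
        return Node.REPARTITION_WITHOUT_NUM;
    }
}
```

For example, `DistributionPlanSketch.choose(false, 0)` selects the rebalance node, `choose(true, 0)` selects the repartition without a partition count, and any positive `numPartitions` selects the repartition with that count regardless of strictness.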
Does this sound correct to everyone?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]