[GitHub] spark pull request #19828: [SPARK-22614] Dataset API: repartitionByRange(......

hvanhovell Mon, 27 Nov 2017 08:12:55 -0800

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19828#discussion_r153242566
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -448,8 +448,15 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
           case r: logical.Range =>
             execution.RangeExec(r) :: Nil
           case logical.RepartitionByExpression(expressions, child, numPartitions) =>
    -        exchange.ShuffleExchangeExec(HashPartitioning(
    -          expressions, numPartitions), planLater(child)) :: Nil
    +        // RepartitionByExpression's constructor verifies that either all expressions are
    +        // of type SortOrder, in which case we're doing RangePartitioning, or none of them are,
    +        // in which case we're doing HashPartitioning.
    +        val partitioning = if (expressions.forall(_.isInstanceOf[SortOrder])) {
    --- End diff --
    
    We have discussed this before, but to me it makes slightly more sense to add this logic to the `RepartitionByExpression` plan.
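
    For illustration, the suggestion might look roughly like the sketch below. It is not the code from this pull request: `Expression`, `Column`, `SortOrder`, `HashPartitioning`, `RangePartitioning` and `RepartitionByExpression` are simplified stand-ins defined here so the example is self-contained, and the `partitioning` member is an assumed name. The point is only that the logical node decides between range and hash partitioning, so the planner no longer needs the `isInstanceOf[SortOrder]` check.

```scala
object RepartitionSketch {
  // Simplified stand-ins for Catalyst types; these are NOT Spark's real classes.
  sealed trait Expression
  final case class Column(name: String) extends Expression
  final case class SortOrder(child: Expression, ascending: Boolean = true) extends Expression

  sealed trait Partitioning
  final case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int)
    extends Partitioning
  final case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int)
    extends Partitioning

  // Hypothetical logical node that owns the hash-vs-range decision.
  final case class RepartitionByExpression(
      partitionExpressions: Seq[Expression],
      numPartitions: Int) {
    require(numPartitions > 0, "numPartitions must be positive")

    // Assumed member name: the partitioning is derived once, inside the plan node.
    val partitioning: Partitioning = {
      val sortOrders = partitionExpressions.collect { case s: SortOrder => s }
      // Either every expression is a SortOrder, or none of them is.
      require(
        sortOrders.isEmpty || sortOrders.size == partitionExpressions.size,
        "partition expressions must be all SortOrder or contain none")
      // An empty expression list falls through to hash partitioning in this sketch.
      if (sortOrders.nonEmpty) RangePartitioning(sortOrders, numPartitions)
      else HashPartitioning(partitionExpressions, numPartitions)
    }
  }
}
```

    With something like this in place, the strategy case would only need to read `r.partitioning` and pass it to the shuffle exchange, keeping the validation and the choice of partitioning in one place.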


