Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22961#discussion_r232906123
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
 ---
    @@ -214,13 +214,22 @@ object ShuffleExchangeExec {
               override def getPartition(key: Any): Int = key.asInstanceOf[Int]
             }
           case RangePartitioning(sortingExpressions, numPartitions) =>
    -        // Internally, RangePartitioner runs a job on the RDD that samples 
keys to compute
    -        // partition bounds. To get accurate samples, we need to copy the 
mutable keys.
    +        // Extract only fields used for sorting to avoid collecting large 
fields that does not
    +        // affect sorting result when deciding partition bounds in 
RangePartitioner
             val rddForSampling = rdd.mapPartitionsInternal { iter =>
    +          val projection =
    +            UnsafeProjection.create(sortingExpressions.map(_.child), 
outputAttributes)
               val mutablePair = new MutablePair[InternalRow, Null]()
    -          iter.map(row => mutablePair.update(row.copy(), null))
    +          // Internally, RangePartitioner runs a job on the RDD that 
samples keys to compute
    +          // partition bounds. To get accurate samples, we need to copy 
the mutable keys.
    +          iter.map(row => mutablePair.update(projection(row).copy(), null))
             }
    -        implicit val ordering = new 
LazilyGeneratedOrdering(sortingExpressions, outputAttributes)
    +        // Construct ordering on extracted sort key.
    +        val orderingAttributes = sortingExpressions.zipWithIndex.map { 
case (ord, i) =>
    +          ord.copy(child = BoundReference(i, ord.dataType, ord.nullable))
    +        }
    +        implicit val ordering: Ordering[InternalRow] =
    +          new LazilyGeneratedOrdering(orderingAttributes)
    --- End diff --
    
    yea, let's follow the previous style: 
https://github.com/apache/spark/pull/22961/files#diff-3ceee31a3da1b7c71dddd32f666126fbL223


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to