Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209486199
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec {
           sorter.sort(iter.asInstanceOf[Iterator[UnsafeRow]])
         }
       } else {
-        rdd
+        part match {
+          case partitioner: RangePartitioner[InternalRow @unchecked, _]
--- End diff ---
Yes, but:
```
def parallelize[T: ClassTag](
    seq: Seq[T],
    numSlices: Int = defaultParallelism): RDD[T] = withScope {
  assertNotStopped()
  new ParallelCollectionRDD[T](this, seq, numSlices, Map[Int, Seq[String]]())
}
```
The `parallelize` function needs this ClassTag, so the pattern match here has to name a concrete key type. I tried matching `RangePartitioner[_, _]` instead, but it fails to compile:
```
Error:(302, 37) No ClassTag available for _
Error occurred in an application involving default arguments.
sparkContext.parallelize(partitioner.getSampledArray.toSeq, rdd.getNumPartitions)
```
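For reference, here is a minimal, self-contained sketch of the same ClassTag interaction, using hypothetical stand-ins (`needsClassTag` for `sparkContext.parallelize`, `Sampler` for `RangePartitioner`); it is not the PR's actual code:
```scala
import scala.reflect.ClassTag

object ClassTagMatchDemo {
  // Stand-in for sparkContext.parallelize: like the real method, it needs a
  // ClassTag for its element type (here, to build an Array[T]).
  def needsClassTag[T: ClassTag](seq: Seq[T]): Array[T] = seq.toArray

  // Stand-in for RangePartitioner, with an accessor for the sampled keys.
  class Sampler[K, V](val sampled: Array[K])

  def handle(part: Any): Unit = part match {
    // Naming a concrete key type in the pattern gives the compiler a
    // ClassTag[String], so the call compiles.
    case s: Sampler[String @unchecked, _] =>
      needsClassTag(s.sampled.toSeq)
    // With a fully wildcarded pattern the call does not compile:
    //   case s: Sampler[_, _] => needsClassTag(s.sampled.toSeq)
    //   error: No ClassTag available for _
    case _ => ()
  }

  def main(args: Array[String]): Unit =
    handle(new Sampler[String, Int](Array("a", "b", "c")))
}
```
The concrete key type in the pattern is what lets the compiler supply the implicit ClassTag; with a bare wildcard it only has an existential type to work with, hence the error above.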