[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-10-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95200/
Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #95200 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95200/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #95200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95200/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-24 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95070/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #95070 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95070/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #95070 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95070/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #95048 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95048/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).
 * This patch **fails from timeout after a configured wait of \`400m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95048/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #95048 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95048/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94990/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94990/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21859
  
If this optimization is applied more generally, will the implicitly cached 
data cause memory pressure on the driver? It seems we have no way to release 
it.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
'The ShuffleWriter should treat RangePartitioner specially and consume the 
sampled data in RangePartitioner instead of the input iterator.' This idea is 
good; maybe we can cache both K and V when sampling.
I will give this idea a try.
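A minimal sketch of the "cache both K and V when sampling" idea, in plain Scala with no Spark dependency: a reservoir sampler (Algorithm R) that keeps whole records and also counts them. When the count does not exceed the sample size, the sample holds the entire partition and could, in principle, stand in for a second pass over the input. The names `SampleWithValues`, `sampleAndCount`, and `fullyCaptured` are hypothetical; this is not Spark's internal sampling code.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical sketch (not Spark's API): reservoir sampling that keeps full (K, V) records.
object SampleWithValues {
  /** Samples up to `k` records from `iter` (Algorithm R) and returns them with the total count. */
  def sampleAndCount[T](iter: Iterator[T], k: Int, seed: Long): (Vector[T], Long) = {
    require(k > 0, "sample size must be positive")
    val rng = new Random(seed)
    val reservoir = ArrayBuffer.empty[T]
    var count = 0L
    while (iter.hasNext) {
      val item = iter.next()
      count += 1
      if (reservoir.size < k) {
        reservoir += item
      } else {
        // Keep the new item with probability k / count, replacing a uniformly chosen slot.
        val j = (rng.nextDouble() * count).toLong
        if (j < k) reservoir(j.toInt) = item
      }
    }
    (reservoir.toVector, count)
  }

  /** True when the sample contains every record seen, i.e. count <= k. */
  def fullyCaptured[T](sample: Vector[T], count: Long): Boolean =
    sample.size.toLong == count
}
```

For a small partition the sample is the partition itself; for a large one it is a bounded subset, and only the count reveals which case applies.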


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94990/testReport)**
 for PR 21859 at commit 
[`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
I read the source code again.
The RangePartitioner[K, V] in ShuffleExchangeExec is an instance of 
RangePartitioner[InternalRow, Null]. RangePartitioner only samples K to compute 
the rangeBounds, so we get the whole InternalRow while sampling.
After creating the RangePartitioner, ShuffleExchangeExec maps each 
InternalRow to (partitionId, InternalRow) for the shuffle (the RangePartitioner 
generates the partitionId).
The shuffle itself doesn't use the RangePartitioner; it uses 
PartitionIdPassthrough instead.
In other words, the ShuffleWriter doesn't know the RangePartitioner exists.

```
// ShuffleExchangeExec: tag each row with the partition id computed by `part`
// (the RangePartitioner).
val rddWithPartitionIds: RDD[Product2[Int, InternalRow]] =
  newRdd.mapPartitionsInternal { iter =>
    val getPartitionKey = getPartitionKeyExtractor()
    val mutablePair = new MutablePair[Int, InternalRow]()
    iter.map { row =>
      mutablePair.update(part.getPartition(getPartitionKey(row)), row)
    }
  }

// The shuffle dependency only sees the precomputed partition ids.
val dependency =
  new ShuffleDependency[Int, InternalRow, InternalRow](
    rddWithPartitionIds,
    new PartitionIdPassthrough(part.numPartitions),
    serializer)

// The key is already the partition id, so it is passed through unchanged.
private class PartitionIdPassthrough(override val numPartitions: Int)
  extends Partitioner {
  override def getPartition(key: Any): Int = key.asInstanceOf[Int]
}
```
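The two-step scheme in the excerpt above (compute partition ids with the range partitioner, then shuffle with an identity partitioner on those ids) can be modeled in plain Scala. This is a simplified sketch, not Spark code: `PartitionIdModel`, `rangePartition`, `tagRows`, and `passthrough` are hypothetical names, and it assumes ascending order with upper-inclusive range bounds.

```scala
// Hypothetical model (not Spark code) of "tag with partition id, then pass the id through".
object PartitionIdModel {
  /** Index of the first bound >= key (binary search), or bounds.length if key exceeds all bounds. */
  def rangePartition[K](key: K, bounds: IndexedSeq[K])(implicit ord: Ordering[K]): Int = {
    var lo = 0
    var hi = bounds.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (ord.lt(bounds(mid), key)) lo = mid + 1 else hi = mid
    }
    lo
  }

  /** Step 1: pair each row with its partition id, like the mapPartitionsInternal step. */
  def tagRows[R, K](rows: Iterator[R], keyOf: R => K, bounds: IndexedSeq[K])
                   (implicit ord: Ordering[K]): Iterator[(Int, R)] =
    rows.map(r => (rangePartition(keyOf(r), bounds), r))

  /** Step 2: the shuffle only needs an identity mapping over the precomputed id. */
  def passthrough(key: Any): Int = key.asInstanceOf[Int]
}
```

Because the ids are fixed before the shuffle, the shuffle side never needs the range bounds.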

The optimization parallelizes the cached InternalRows into newRdd instead of 
computing them again.

But elsewhere, e.g. in RDD's sortByKey:

```
def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length)
    : RDD[(K, V)] = self.withScope {
  val part = new RangePartitioner(numPartitions, self, ascending)
  new ShuffledRDD[K, V, V](self, part)
    .setKeyOrdering(if (ascending) ordering else ordering.reverse)
}

// getDependencies function in ShuffledRDD
override def getDependencies: Seq[Dependency[_]] = {
  val serializer = userSpecifiedSerializer.getOrElse {
    val serializerManager = SparkEnv.get.serializerManager
    if (mapSideCombine) {
      serializerManager.getSerializer(implicitly[ClassTag[K]], implicitly[ClassTag[C]])
    } else {
      serializerManager.getSerializer(implicitly[ClassTag[K]], implicitly[ClassTag[V]])
    }
  }
  List(new ShuffleDependency(prev, part, serializer, keyOrdering, aggregator, mapSideCombine))
}
```
Here the RDD is RDD[(K, V)] and the shuffle uses the RangePartitioner directly. 
But sampling only gives us K, so we can't restore the RDD from the cache.

They work in two different ways.

So, for now, the optimization only works in Spark SQL's ShuffleExchangeExec.

'The ShuffleWriter should treat RangePartitioner specially and consume the 
sampled data in RangePartitioner instead of the input iterator.' This idea is 
good; maybe we can cache both K and V when sampling. I will give it a try.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21859
  
I don't think this optimization should be done at SQL layer. The 
`ShuffleWriter` should treat `RangePartitioner` specially and consume the 
sampled data in `RangePartitioner` instead of the input iterator.

That way the SQL layer (as well as all other components) can benefit 
from it.
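A rough sketch of the writer-side proposal, assuming a hypothetical `PartitionSample` that the range partitioner would hand to the writer: the writer consumes the retained records when they cover the whole partition and falls back to the input iterator otherwise. Neither `PartitionSample` nor `recordsToWrite` exists in Spark's `ShuffleWriter` API; the sketch only illustrates the control flow.

```scala
// Hypothetical sketch (not Spark's ShuffleWriter API).
object WriterSourceSketch {
  /** Records retained while sampling one map partition, plus the total number seen. */
  final case class PartitionSample[K, V](records: Vector[(K, V)], count: Long) {
    def isComplete: Boolean = records.size.toLong == count
  }

  /** What the writer would consume: the retained records if they cover the whole
    * partition, otherwise a fresh pass over the input (only evaluated in that case). */
  def recordsToWrite[K, V](
      sample: Option[PartitionSample[K, V]],
      input: () => Iterator[(K, V)]): Iterator[(K, V)] =
    sample match {
      case Some(s) if s.isComplete => s.records.iterator
      case _                       => input()
    }
}
```

Passing `input` as a thunk keeps the fallback lazy, so the upstream computation only runs when the sample is incomplete.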


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94958/
Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94958 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94958/testReport)**
 for PR 21859 at commit 
[`ea7f317`](https://github.com/apache/spark/commit/ea7f3178fc6c6ea08f69b67796ff4b9333194a49).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94958/testReport)**
 for PR 21859 at commit 
[`ea7f317`](https://github.com/apache/spark/commit/ea7f3178fc6c6ea08f69b67796ff4b9333194a49).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94947/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94947/testReport)**
 for PR 21859 at commit 
[`dd6c09c`](https://github.com/apache/spark/commit/dd6c09c868c812c3ff4493f3989be9e012786956).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94947/testReport)**
 for PR 21859 at commit 
[`dd6c09c`](https://github.com/apache/spark/commit/dd6c09c868c812c3ff4493f3989be9e012786956).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94939/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94939 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94939/testReport)**
 for PR 21859 at commit 
[`dd6c09c`](https://github.com/apache/spark/commit/dd6c09c868c812c3ff4493f3989be9e012786956).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-19 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
We may not know in advance how large a query's result will be. The input data 
may be large, yet very small after filtering.
I ran into this problem while running queries through the Thrift server. A 
program queries the sorted abnormal orders from the order table over JDBC every 
day, using the SQL 'select * from order where order_status = 4 and 
order_time = 20180820 order by order_id'. The number of abnormal orders is 
small, and caching the table through JDBC would be very inconvenient.



---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21859
  
For small queries, can we just do:
```
val df = table.filter(...).cache()
df.sort()
```

We should carefully weigh the trade-off between SQL engine complexity and 
user benefit. How useful is this feature?


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94939 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94939/testReport)**
 for PR 21859 at commit 
[`dd6c09c`](https://github.com/apache/spark/commit/dd6c09c868c812c3ff4493f3989be9e012786956).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
LGTM cc @viirya @cloud-fan @gatorsmile


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94931/
Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94931 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94931/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94931 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94931/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94688/
Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94688/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94688/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-13 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
I think I need another retest. Please help, @viirya.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94662/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94662 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94662/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94662 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94662/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
Please help retest it, @kiszk @viirya.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94603/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94603 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94603/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94603 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94603/testReport)**
 for PR 21859 at commit 
[`46bab16`](https://github.com/apache/spark/commit/46bab165af68c1ef2dd1fc57e7f27f5d27c72015).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94496/
Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94496 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94496/testReport)**
 for PR 21859 at commit 
[`7cc6e5a`](https://github.com/apache/spark/commit/7cc6e5a286a527669497f880fc10946bdcfe0cfe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94496/testReport)**
 for PR 21859 at commit 
[`7cc6e5a`](https://github.com/apache/spark/commit/7cc6e5a286a527669497f880fc10946bdcfe0cfe).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94485/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94485/testReport)**
 for PR 21859 at commit 
[`7cc6e5a`](https://github.com/apache/spark/commit/7cc6e5a286a527669497f880fc10946bdcfe0cfe).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94485/testReport)**
 for PR 21859 at commit 
[`7cc6e5a`](https://github.com/apache/spark/commit/7cc6e5a286a527669497f880fc10946bdcfe0cfe).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21859
  
Jenkins, retest this please.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
@ueshin
Please retest it; an unknown error occurred.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this, please


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94468/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94468/testReport)**
 for PR 21859 at commit 
[`7cc6e5a`](https://github.com/apache/spark/commit/7cc6e5a286a527669497f880fc10946bdcfe0cfe).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
> This optimization is only for SQL, but other places also use `RangePartitioner`. How can it affect other places?

The failed UTs are caused by this code:

```
else if (sampleCacheEnabled && numItems == numSampled) {
  // get the sampled data
  sampledArray = sketched.foldLeft(Array.empty[K])((total, sample) => {
    total ++ sample._3
  })
  Array.empty
}
```
With this code, the `RangePartitioner`'s `rangeBounds` will be empty. I thought `rangeBounds` would be unused when the optimization applies, so an empty array was returned. But that can cause errors, so I changed the code to always compute `rangeBounds`. This way, the only difference in the new `RangePartitioner` is that it stores an extra `sampledArray`, which won't affect other places.
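To make the fix described above concrete, here is a small, self-contained Scala sketch. It is a simplified model under stated assumptions, not Spark's implementation: `SampleCacheSketch`, `build`, and the unweighted `rangeBounds` helper are hypothetical names, keys are plain `Int`s, and each tuple is assumed to mimic the `(partition id, item count, sampled keys)` shape that `sample._3` indexes into. It flattens the samples with the same `foldLeft` as the quoted code, always computes range bounds, and keeps the flattened sample only when `sampleCacheEnabled` is set and the sample covers every row.

```scala
// Hypothetical, simplified model of the discussion above; not Spark's code.
object SampleCacheSketch {
  // Assumed shape: (partition id, number of items in the partition, sampled keys).
  type PartitionSample = (Int, Long, Array[Int])

  // Concatenate every partition's sampled keys, as the quoted foldLeft does.
  def flattenSamples(sketched: Array[PartitionSample]): Array[Int] =
    sketched.foldLeft(Array.empty[Int])((total, sample) => total ++ sample._3)

  // Simplified, unweighted bound selection: pick evenly spaced keys from the
  // sorted sample. Spark's real bound computation also weights each candidate.
  def rangeBounds(keys: Array[Int], partitions: Int): Array[Int] = {
    val sorted = keys.sorted
    if (partitions <= 1 || sorted.isEmpty) Array.empty[Int]
    else
      (1 until partitions)
        .map(i => sorted((i.toLong * sorted.length / partitions).toInt))
        .distinct
        .toArray
  }

  // Bounds are computed in every case; the full sample is cached only when
  // the flag is on and the sample contains every item.
  def build(
      sketched: Array[PartitionSample],
      partitions: Int,
      sampleCacheEnabled: Boolean): (Array[Int], Option[Array[Int]]) = {
    val numItems = sketched.map(_._2).sum
    val sampled = flattenSamples(sketched)
    val bounds = rangeBounds(sampled, partitions)
    val cached =
      if (sampleCacheEnabled && numItems == sampled.length.toLong) Some(sampled)
      else None
    (bounds, cached)
  }

  def main(args: Array[String]): Unit = {
    // Two partitions whose samples contain every row (a "small" dataset).
    val sketched: Array[PartitionSample] =
      Array((0, 5L, Array(9, 3, 7, 1, 5)), (1, 5L, Array(2, 10, 4, 8, 6)))
    val (bounds, cached) = build(sketched, partitions = 3, sampleCacheEnabled = true)
    println(bounds.mkString(","))              // 4,7
    println(cached.map(_.length).getOrElse(0)) // 10
  }
}
```

Because the bounds are computed in both branches, anything that reads them sees the same result whether or not the cache is enabled; the flag only adds the extra cached array.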
What's more, look at the constructor:
```
class RangePartitioner[K : Ordering : ClassTag, V](
    partitions: Int,
    rdd: RDD[_ <: Product2[K, V]],
    private var ascending: Boolean = true,
    val samplePointsPerPartitionHint: Int = 20,
    val sampleCacheEnabled: Boolean = false)
```
The default value of `sampleCacheEnabled` is `false`; it is only set (from a SQL config) in this one place:
```
// ShuffleExchangeExec.scala
new RangePartitioner(
  numPartitions,
  rddForSampling,
  ascending = true,
  samplePointsPerPartitionHint = SQLConf.get.rangeExchangeSampleSizePerPartition,
  sampleCacheEnabled = SQLConf.get.rangeExchangeSampleCacheEnabled)
```
So it should be safe.
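The safety argument rests on Scala default arguments: a caller that does not name `sampleCacheEnabled` gets `false`. The sketch below uses a hypothetical `ToyRangePartitioner` that copies only the default parameters of the quoted constructor (the `rdd` argument and all partitioning logic are omitted) to show that an unchanged call site keeps the old setting, while only the opted-in call site turns the flag on.

```scala
// Hypothetical stand-in for RangePartitioner's constructor; rdd and all logic omitted.
class ToyRangePartitioner(
    val partitions: Int,
    val ascending: Boolean = true,
    val samplePointsPerPartitionHint: Int = 20,
    val sampleCacheEnabled: Boolean = false)

object DefaultFlagDemo {
  def main(args: Array[String]): Unit = {
    // An existing call site that predates the new parameter.
    val legacy = new ToyRangePartitioner(4)
    // The single opted-in call site (ShuffleExchangeExec in the discussion).
    val optedIn = new ToyRangePartitioner(4, sampleCacheEnabled = true)
    println(legacy.sampleCacheEnabled)  // false
    println(optedIn.sampleCacheEnabled) // true
  }
}
```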


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94468 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94468/testReport)**
 for PR 21859 at commit 
[`7cc6e5a`](https://github.com/apache/spark/commit/7cc6e5a286a527669497f880fc10946bdcfe0cfe).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
Good point. [These 
failures](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94431/testReport/)
 may show that it affects other places.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21859
  
This optimization is only for SQL, but other places also use 
`RangePartitioner`. How can it affect other places?


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94431/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94431/testReport)**
 for PR 21859 at commit 
[`58361ee`](https://github.com/apache/spark/commit/58361ee2a585fc6a3cf452b1e1ccb4e2718f53f6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94431/testReport)**
 for PR 21859 at commit 
[`58361ee`](https://github.com/apache/spark/commit/58361ee2a585fc6a3cf452b1e1ccb4e2718f53f6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94424 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94424/testReport)**
 for PR 21859 at commit 
[`bf31370`](https://github.com/apache/spark/commit/bf31370776100df1890b0cfab4a66f066f0d2e5b).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94424/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94424 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94424/testReport)**
 for PR 21859 at commit 
[`bf31370`](https://github.com/apache/spark/commit/bf31370776100df1890b0cfab4a66f066f0d2e5b).


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on the issue:

https://github.com/apache/spark/pull/21859
  
@ueshin Please test again.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94417/
Test FAILed.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94417 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94417/testReport)**
 for PR 21859 at commit 
[`e4bd2e3`](https://github.com/apache/spark/commit/e4bd2e3b3e18890d3aa015d85755a3880a0f002d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21859
  
**[Test build #94417 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94417/testReport)**
 for PR 21859 at commit 
[`e4bd2e3`](https://github.com/apache/spark/commit/e4bd2e3b3e18890d3aa015d85755a3880a0f002d).


---



