[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-10-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22010
  
If this is not yet in 2.4 it shouldn’t be merged now.

On Wed, Oct 10, 2018 at 10:57 AM Holden Karau 
wrote:

> Open question: is this suitable for branch-2.4 since it predates the
> branch cut or not? (I know we've gone back and forth on how we do that).



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-10-10 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
Open question: is this suitable for branch-2.4 since it predates the branch 
cut or not? (I know we've gone back and forth on how we do that).





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22010
  
thanks, merging to master!





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96680/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96680 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96680/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22010
  
LGTM





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3529/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96680/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22010
  
retest this please





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96669 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96669/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96669/
Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3519/
Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96669 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96669/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22010
  
retest this please





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96652/
Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96652 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96652/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3505/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96652 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96652/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22010
  
retest this please





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96641/
Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96641 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96641/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3496/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96641 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96641/testReport)**
 for PR 22010 at commit 
[`95357cf`](https://github.com/apache/spark/commit/95357cff3da95c962c575f1b8efe155841ed78a5).





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
I'll leave this open until Friday morning (Pacific) in case anyone has
last-minute comments. cc @rxin / @HyukjinKwon / @mgaido91





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
Running `sc.parallelize(1.to(1000)).map(x => (x % 10, x)).sortByKey().distinct().count()`
on 2.3.0 and on my PR shows the difference:
![240_proposed_distinct_screenshot from 2018-09-26 11-41-13](https://user-images.githubusercontent.com/59893/46101578-317cbb00-c181-11e8-8fa0-6f6b90383aa5.png)
![230_distinct_screenshot from 2018-09-26 11-40-51](https://user-images.githubusercontent.com/59893/46101583-33df1500-c181-11e8-9142-a83e8be65ee4.png)
With the PR there is one fewer shuffle.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96583/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96583 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96583/testReport)**
 for PR 22010 at commit 
[`849f67b`](https://github.com/apache/spark/commit/849f67bf6c9a54007fec63a0b97cecfc7137e0be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-25 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
Did another quick micro benchmark on a small cluster:

```scala
import org.apache.spark.util.collection.ExternalAppendOnlyMap

def removeDuplicatesInPartition(partition: Iterator[(Int, Int)]): Iterator[(Int, Int)] = {
  // Create an instance of external append only map which ignores values.
  val map = new ExternalAppendOnlyMap[(Int, Int), Null, Null](
    createCombiner = value => null,
    mergeValue = (a, b) => a,
    mergeCombiners = (a, b) => a)
  map.insertAll(partition.map(_ -> null))
  map.iterator.map(_._1)
}

def time[R](block: => R): (Long, R) = {
  val t0 = System.nanoTime()
  val result = block // call-by-name
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
  // Note: the first tuple element is the end timestamp t1, not the elapsed time.
  (t1, result)
}

val count = 1000
val inputData = sc.parallelize(1.to(count))
val keyed = inputData.map(x => (x % 100, x))
val shuffled = keyed.repartition(50).cache()
shuffled.count()

// o* = existing distinct(), n* = per-partition de-duplication
val o1 = time(shuffled.distinct().count())
val n1 = time(shuffled.mapPartitions(removeDuplicatesInPartition).count())
val n2 = time(shuffled.mapPartitions(removeDuplicatesInPartition).count())
val o2 = time(shuffled.distinct().count())
val n3 = time(shuffled.mapPartitions(removeDuplicatesInPartition).count())
```

And the result is:

> Elapsed time: 1790932239ns
> Elapsed time: 381450402ns
> Elapsed time: 340449179ns
> Elapsed time: 1524955492ns
> Elapsed time: 291948041ns
> import org.apache.spark.util.collection.ExternalAppendOnlyMap
> removeDuplicatesInPartition: (partition: Iterator[(Int, Int)])Iterator[(Int, Int)]
> time: [R](block: => R)(Long, R)
> count: Int = 1000
> inputData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[10] at parallelize at :52
> keyed: org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[11] at map at :53
> shuffled: org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[15] at repartition at :54
> o1: (Long, Long) = (2943493642271881,1000)
> n1: (Long, Long) = (2943494027399482,1000)
> n2: (Long, Long) = (2943494371228656,1000)
> o2: (Long, Long) = (2943495899580372,1000)
> n3: (Long, Long) = (2943496195569891,1000)

Increasing count by a factor of 10 we get:
> Elapsed time: 21679193176ns
> Elapsed time: 3114223737ns
> Elapsed time: 3348141004ns
> Elapsed time: 51267597984ns
> Elapsed time: 3931899963ns
> import org.apache.spark.util.collection.ExternalAppendOnlyMap
> removeDuplicatesInPartition: (partition: Iterator[(Int, Int)])Iterator[(Int, Int)]
> time: [R](block: => R)(Long, R)
> count: Int = 1
> inputData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[25] at parallelize at :56
> keyed: org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[26] at map at :57
> shuffled: org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[30] at repartition at :58
> o1: (Long, Long) = (2943648438919959,1)
> n1: (Long, Long) = (2943651557292201,1)
> n2: (Long, Long) = (2943654909392808,1)
> o2: (Long, Long) = (2943706180722021,1)
> n3: (Long, Long) = (2943710116461734,1)

So that looks like close to an order of magnitude improvement.
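
One note on reading the REPL output: the `time` helper above prints the elapsed time but returns the end timestamp `t1` as the first tuple element, which is why `o1`, `n1`, etc. show large raw `nanoTime` values. If the returned value is meant to be compared directly, a variant along these lines could be used instead. This is only a sketch; the `Timing` object and the `timed` name are made up for illustration and are not part of the PR.

```scala
object Timing {
  // Runs `block` exactly once and returns (elapsed nanoseconds, result).
  def timed[R](block: => R): (Long, R) = {
    val t0 = System.nanoTime()
    val result = block // call-by-name: evaluated here
    val elapsed = System.nanoTime() - t0
    (elapsed, result)
  }
}
```

With that, `val (ns, n) = Timing.timed(shuffled.distinct().count())` would give the duration in `ns` without having to parse the printed line.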






[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3454/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #96583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96583/testReport)**
 for PR 22010 at commit 
[`849f67b`](https://github.com/apache/spark/commit/849f67bf6c9a54007fec63a0b97cecfc7137e0be).





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-10 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
@cloud-fan yeah that's totally an option. Since @rxin asked for it to use 
`reduceByKey` I went with that approach, but I'd be happy to use the 
`ExternalAppendOnlyMap` if that's ok with folks.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22010
  
I think this works, can we post some Spark web UI screenshots to confirm 
the shuffle is indeed eliminated?

BTW one idea to simplify the implementation:
```
def distinct(numPartitions) = {
  ...
  mapPartitions(iter => {
    val map = new ExternalAppendOnlyMap[T, Null, Null](
      createCombiner = identity,
      mergeValue = (a, b) => a,
      mergeCombiners = (a, b) => a)
    map.insertAll(iter.map(_ -> null))
    map.iterator.map(_._1)
  }, preservesPartitioning = true)
}
```
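
To make the per-partition step concrete without a Spark dependency, here is a rough plain-Scala sketch of the same idea. It swaps `ExternalAppendOnlyMap` for an in-memory `mutable.HashSet`, so unlike the suggestion above it cannot spill to disk and is only meant to illustrate the logic. The `InMemoryDedupe` object and `removeDuplicates` name are hypothetical.

```scala
import scala.collection.mutable

object InMemoryDedupe {
  // Lazily drops elements that were already seen earlier in this iterator.
  // Everything seen so far is held in memory (no spilling).
  def removeDuplicates[T](iter: Iterator[T]): Iterator[T] = {
    val seen = mutable.HashSet.empty[T]
    // HashSet.add returns true only the first time an element is added.
    iter.filter(x => seen.add(x))
  }
}
```

A function like this could be passed to `mapPartitions` in the same position as the closure above; whether the in-memory trade-off is acceptable depends on partition sizes.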





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-09 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
@rxin That RDD could not exist with a known partitioner. Whether range-based
or hash-based, the partitioner must be deterministic, so two elements with the
same key must go to the same partition, and two elements with different keys
cannot be duplicates of each other. Distinct treats the whole input k/v pair as
one element, not just v (e.g. an RDD of `[(1, 2), (2, 2), (2, 2)].distinct()`
should produce `[(1, 2), (2, 2)]`).
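
A small plain-Scala sketch of that argument, with partitions modelled as ordinary collections rather than an RDD. The `PartitionedDistinct` object, `partitionOf`, and `partitionByKey` are hypothetical names, and the modulo partitioner is just one example of a deterministic partitioner.

```scala
object PartitionedDistinct {
  // Hypothetical deterministic partitioner: the same key always maps
  // to the same partition index.
  def partitionOf(key: Int, numPartitions: Int): Int =
    java.lang.Math.floorMod(key.hashCode, numPartitions)

  // Places each key/value pair in the partition chosen by its key.
  def partitionByKey(data: Seq[(Int, Int)], numPartitions: Int): Seq[Seq[(Int, Int)]] =
    (0 until numPartitions).map { p =>
      data.filter { case (k, _) => partitionOf(k, numPartitions) == p }
    }

  val data: Seq[(Int, Int)] = Seq((1, 2), (2, 2), (2, 2))
  val partitions: Seq[Seq[(Int, Int)]] = partitionByKey(data, 2)

  // De-duplicate inside each partition only; no data moves between partitions.
  val perPartition: Seq[(Int, Int)] = partitions.flatMap(_.distinct)
}
```

Because both copies of `(2, 2)` share a key, they land in the same partition, so the per-partition result has the same elements as a global distinct.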






[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22010
  
Actually @holdenk is this change even correct? RDD.distinct is not key
based. It is based on the value of the elements in RDD. Even if
`numPartitions == partitions.length`, it doesn't mean the RDD is hash
partitioned this way.

Consider this RDD:

Partition 1: 1, 2, 3
Partition 2: 1, 2, 3

rdd.distinct() should return 1, 2, 3

with your change it'd still return 1, 2, 3, 1, 2, 3.
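
For illustration, the scenario above in plain Scala, with each inner `Seq` standing in for one partition of an RDD that has no known partitioner. The `UnpartitionedDistinct` object is a made-up name and this is not Spark code.

```scala
object UnpartitionedDistinct {
  // Two partitions holding the same values, as in the example above.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(1, 2, 3))

  // De-duplicating each partition on its own leaves cross-partition duplicates.
  val perPartition: Seq[Int] = partitions.flatMap(_.distinct)

  // De-duplicating across all partitions is what distinct() has to return.
  val global: Seq[Int] = partitions.flatten.distinct
}
```

Here `perPartition` is `1, 2, 3, 1, 2, 3` while `global` is `1, 2, 3`, so skipping the shuffle is only safe when something guarantees that equal elements share a partition.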






[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-08 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
Hey @rxin & @cloud-fan I'd really appreciate your input on the tricks I did
to keep the partitioner information present -- is this the right approach?





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95768/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #95768 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95768/testReport)**
 for PR 22010 at commit 
[`4c89653`](https://github.com/apache/spark/commit/4c8965345fca51ee3accc7707b98c846eea4e01b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95767/
Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Build finished. Test FAILed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #95767 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95767/testReport)**
 for PR 22010 at commit 
[`7ed7589`](https://github.com/apache/spark/commit/7ed7589bcba9273aa14ba207bfaf5bb67b57e6c8).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2911/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2910/
Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Build finished. Test PASSed.





[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #95768 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95768/testReport)**
 for PR 22010 at commit 
[`4c89653`](https://github.com/apache/spark/commit/4c8965345fca51ee3accc7707b98c846eea4e01b).


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
Updated to use `reduceByKey`. I'd really appreciate feedback on whether adding 
the param to `MapPartitionsRDD` was the way to go or whether I should subclass 
it instead.
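
For readers unfamiliar with the pattern, a distinct built on `reduceByKey` pairs each element with a placeholder value and then keeps one entry per key. The sketch below mimics that on a local iterator only. `reduceByKeyLocal` and `distinctLocal` are hypothetical stand-ins rather than Spark APIs, and the sketch says nothing about how the PR wires this into `MapPartitionsRDD`.

```scala
// Hypothetical local stand-in for reduceByKey: merges values that share a key.
def reduceByKeyLocal[K, V](pairs: Iterator[(K, V)])(f: (V, V) => V): Map[K, V] =
  pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
    // Combine with the existing value for this key, or store the first one seen.
    acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v))
  }

// Distinct: pair each element with a dummy value, then keep one entry per key.
def distinctLocal[T](xs: Iterator[T]): Set[T] =
  reduceByKeyLocal(xs.map(x => (x, ())))((a, _) => a).keySet

val result = distinctLocal(Iterator(3, 1, 3, 2, 1))
```

The dummy value is never inspected; only the key survives, which is why the merge function can simply return its first argument.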


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #95767 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95767/testReport)**
 for PR 22010 at commit 
[`7ed7589`](https://github.com/apache/spark/commit/7ed7589bcba9273aa14ba207bfaf5bb67b57e6c8).


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-31 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22010
  
thanks for checking @rxin @cloud-fan 


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22010
  
I am sorry guys. I rushed to take a look.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22010
  
While this saves a shuffle, the algorithm becomes different. Previously 
we used the shuffle aggregator, which stores data in an `ExternalAppendOnlyMap`. 
Now we use a Scala set, which may OOM.
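
The memory concern can be illustrated without Spark. The following is only a sketch on plain Scala iterators, not the code in the PR: `dedupeInMemory` and `peakSetSize` are hypothetical names. It shows that a set-based per-partition dedupe must keep every distinct element it has seen on the heap, whereas a spillable structure such as `ExternalAppendOnlyMap` is designed to write to disk when memory runs short.

```scala
import scala.collection.mutable

// Hypothetical helper: dedupes lazily, but `seen` holds every distinct
// element encountered so far, so memory grows with the distinct count.
def dedupeInMemory[T](itr: Iterator[T], seen: mutable.HashSet[T]): Iterator[T] =
  itr.filter(seen.add(_))

val seen = mutable.HashSet.empty[Int]
val partition = Iterator(1, 2, 2, 3, 1, 4)
val deduped = dedupeInMemory(partition, seen).toList

// After draining the iterator, the set contains all distinct elements.
val peakSetSize = seen.size
```

For a partition with few repeats, `peakSetSize` approaches the partition size, which is the case where an unbounded in-memory set is most at risk.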


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22010
  
Thanks for pinging. Please don't merge this until you've addressed the OOM 
issue. The aggregators were created to handle incoming data larger than the 
size of memory. We should never use a Scala or Java hash set to put all the 
data in.



---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-30 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
I think this is pretty clearly a win, but since it's been a while since I 
did anything in core I'll leave this until Friday morning (Pacific) in case any 
of the committers who've been working there have something to say (e.g. 
@jkbradley / @rxin).


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-27 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
I did a quick micro-benchmark on this and got:

> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> 
> import scala.collection.{mutable, Map}
> def removeDuplicatesInPartition(itr: Iterator[Int]): Iterator[Int] = {
>   val set = new mutable.HashSet[Int]()
>   itr.filter(set.add(_))
> }
> 
> def time[R](block: => R): (Long, R) = {
>   val t0 = System.nanoTime()
>   val result = block // call-by-name
>   val t1 = System.nanoTime()
>   println("Elapsed time: " + (t1 - t0) + "ns")
>   (t1, result)
> }
> 
> val count = 100
> val inputData = sc.parallelize(1.to(count)).cache()
> inputData.count()
> 
> val o1 = time(inputData.distinct().count())
> val n1 = time(inputData.mapPartitions(removeDuplicatesInPartition).count())
> val n2 = time(inputData.mapPartitions(removeDuplicatesInPartition).count())
> val o2 = time(inputData.distinct().count())
> val n3 = time(inputData.mapPartitions(removeDuplicatesInPartition).count())
> 
> 
> // Exiting paste mode, now interpreting.
> 
> Elapsed time: 2464151504ns
> Elapsed time: 219130154ns
> Elapsed time: 133545428ns
> Elapsed time: 927133584ns
> Elapsed time: 242432642ns
> import scala.collection.{mutable, Map}
> removeDuplicatesInPartition: (itr: Iterator[Int])Iterator[Int]
> time: [R](block: => R)(Long, R)
> count: Int = 100
> inputData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[19] at parallelize at :47
> o1: (Long, Long) = (437102431151279,100)
> n1: (Long, Long) = (437102654798968,100)
> n2: (Long, Long) = (437102792389328,100)
> o2: (Long, Long) = (437103724196085,100)
> n3: (Long, Long) = (437103971061275,100)
> 
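
One detail worth noting when reading the tuples above: the `time` helper returns the end timestamp `t1`, so the first element of `o1` through `n3` is a raw `System.nanoTime()` reading, and only the printed "Elapsed time" lines carry the durations. A possible variant that returns the elapsed nanoseconds instead is sketched below. `timeElapsed` is a hypothetical name, not part of the original benchmark.

```scala
// Hypothetical variant of the `time` helper: returns the elapsed
// nanoseconds alongside the result instead of the end timestamp.
def timeElapsed[R](block: => R): (Long, R) = {
  val t0 = System.nanoTime()
  val result = block // call-by-name: evaluated here, between the two readings
  val elapsed = System.nanoTime() - t0
  (elapsed, result)
}

val (elapsedNs, sum) = timeElapsed((1 to 1000).sum)
```

With this variant the durations can be compared directly from the returned tuples rather than from the console output.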


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-13 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
@HyukjinKwon sure, I'll do a micro benchmark sometime this week.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22010
  
Logically this looks right, but would you mind if I ask for a simple benchmark, 
@holdenk, just to make everything clear?


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94634/
Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94634 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94634/testReport)**
 for PR 22010 at commit 
[`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94634/testReport)**
 for PR 22010 at commit 
[`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2091/
Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
The test failure is a streaming timeout, likely unrelated. Jenkins retest this 
please.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94607/
Test FAILed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94607 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94607/testReport)**
 for PR 22010 at commit 
[`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94607 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94607/testReport)**
 for PR 22010 at commit 
[`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2076/
Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-11 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22010
  
retest this please


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94577/
Test FAILed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94577/testReport)**
 for PR 22010 at commit 
[`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94577/testReport)**
 for PR 22010 at commit 
[`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2053/
Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94303/
Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94303 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94303/testReport)**
 for PR 22010 at commit 
[`a7fbc74`](https://github.com/apache/spark/commit/a7fbc74335c2df27002e8158f8e83a919195eed7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94303 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94303/testReport)**
 for PR 22010 at commit 
[`a7fbc74`](https://github.com/apache/spark/commit/a7fbc74335c2df27002e8158f8e83a919195eed7).


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1861/
Test PASSed.


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22010
  
Merged build finished. Test PASSed.


---
