[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-13 Thread colorant
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45981871 @dorx Do you think this works for an extremely large data set with a really small sample size, e.g. n = 1.0x10^11 while sample = 1? In that case, the final adjusted fraction
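
For readers following the question above, here is a minimal sketch of how an oversampling fraction behaves when n is huge and the sample is tiny. It assumes Bernoulli sampling without replacement and a Chernoff-style lower-tail bound; the function name `oversampling_fraction` and the default `delta` are hypothetical and are not taken from the pull request.

```python
from math import log, sqrt


def oversampling_fraction(sample_size, total, delta=1e-4):
    """Hypothetical helper, not the PR's code.

    Returns a per-element keep probability p such that one Bernoulli pass
    over `total` elements yields at least `sample_size` elements with
    probability at least 1 - delta (Chernoff lower-tail bound).
    """
    fraction = sample_size / total      # naive fraction s / n
    gamma = -log(delta) / total         # slack term ln(1/delta) / n
    # Solve (mu - s)^2 = 2 * mu * ln(1/delta) for mu = p * n, then divide by n.
    p = fraction + gamma + sqrt(gamma * gamma + 2.0 * gamma * fraction)
    return min(1.0, p)


# The extreme case raised in the comment: n = 1.0x10^11, sample = 1.
p = oversampling_fraction(1, 10**11)
expected_rows = p * 10**11
print(p, expected_rows)
```

Under these assumptions the expected number of rows kept for n = 1.0x10^11 and sample = 1 stays on the order of 20, so the adjusted fraction remains tiny even though it is many times the naive 1/n.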

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45992111 @colorant Tried the following with the new implementation: ~~~ val rdd = sc.parallelize(0 until 10, 1).flatMap(i => Iterator.fill(10)(0)) // 10^10

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-13 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-46059095 @colorant Thanks for taking a look at this! First of all let me just say that I ran Xiangrui's code but with .fill(1000) (so 100x in RDD size), and it was still

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691081 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -394,20 +402,22 @@ abstract class RDD[T: ClassTag]( return new Array[T](0)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691100 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -394,20 +402,22 @@ abstract class RDD[T: ClassTag]( return new Array[T](0)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691144 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -394,20 +401,22 @@ abstract class RDD[T: ClassTag]( return new Array[T](0)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691153 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -394,20 +401,22 @@ abstract class RDD[T: ClassTag]( return new Array[T](0)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691305 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691375 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -498,55 +501,56 @@ class RDDSuite extends FunSuite with SharedSparkContext { }

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691382 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -498,55 +501,56 @@ class RDDSuite extends FunSuite with SharedSparkContext { }

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691388 --- Diff: core/src/test/scala/org/apache/spark/util/random/SamplingUtilsSuite.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691398 --- Diff: core/src/test/scala/org/apache/spark/util/random/SamplingUtilsSuite.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691458 --- Diff: python/pyspark/rdd.py --- @@ -365,27 +366,25 @@ def takeSample(self, withReplacement, num, seed=None): fraction = 0.0

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691473 --- Diff: python/pyspark/rdd.py --- @@ -365,27 +366,25 @@ def takeSample(self, withReplacement, num, seed=None): fraction = 0.0

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691499 --- Diff: python/pyspark/rdd.py --- @@ -365,27 +366,25 @@ def takeSample(self, withReplacement, num, seed=None): fraction = 0.0

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691512 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691490 --- Diff: python/pyspark/rdd.py --- @@ -365,27 +366,25 @@ def takeSample(self, withReplacement, num, seed=None): fraction = 0.0

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691520 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691539 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691536 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691546 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13720492 --- Diff: python/pyspark/rdd.py --- @@ -365,27 +366,25 @@ def takeSample(self, withReplacement, num, seed=None): fraction = 0.0

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13720693 --- Diff: python/pyspark/rdd.py --- @@ -365,27 +366,25 @@ def takeSample(self, withReplacement, num, seed=None): fraction = 0.0

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45947595 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45947583 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45951844 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45951862 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45952768 Merged build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45952770 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15723/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45953817 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45953826 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45955115 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45955130 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45957667 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15726/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45957665 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15727/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45957662 Merged build finished. All automated tests passed.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45957664 Merged build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45958592 Merged build finished. All automated tests passed.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45958593 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15730/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45960389 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45960381 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13732786 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -22,6 +22,9 @@ import scala.reflect.ClassTag import org.scalatest.FunSuite

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45961652 Build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45961661 Build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45963020 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15737/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45963019 Merged build finished. All automated tests passed.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45964543 Build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45964549 Build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45964768 Build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45964776 Build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45965544 LGTM. Thanks! Waiting for Jenkins ...

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45966208 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15738/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45966207 Build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45966881 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45968210 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45969008 Build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45969011 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15743/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45969009 Build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45969013 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15742/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45970156 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15747/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45970154 Merged build finished. All automated tests passed.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45970681 Merged. Thanks!

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/916

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45657547 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45657561 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13613085 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -394,20 +401,22 @@ abstract class RDD[T: ClassTag]( return new Array[T](0)

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13613361 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13613522 --- Diff: pom.xml --- @@ -257,6 +257,11 @@ <version>1.5</version> </dependency> <dependency> +

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45662406 Merged build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45662408 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15632/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45666850 @falaki please feel free to take a look

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45666931 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45667095 Merged build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45666943 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45667096 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15643/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45668065 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45668085 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45675122 Merged build finished. All automated tests passed.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45675125 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15644/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45549588 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15575/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45549587 Merged build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45550410 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15576/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45550409 Merged build finished.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45542040 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45542629 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45545694 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45546660 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44861886 @dorx Please click the link above and check test failures. It is related to python imports.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13294532 --- Diff: python/pyspark/rdd.py --- @@ -31,6 +31,7 @@ import warnings import heapq from random import Random +from math import sqrt, log, min

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread concretevitamin
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44869183 Hey @dorx -- how much faster is this new sampling method? I'd love to port this to SparkR once this is merged in.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44876657 @concretevitamin probably not faster on individual runs (in fact there's slightly more computation/example). What this gains us is the ability to guarantee that we get

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44878013 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44877997 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44882426 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15352/

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44885870 @concretevitamin This is certainly faster for large sample size. The size of the candidate set is about `s + O(sqrt(s))` with this implementation, while the previous
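
To make the `s + O(sqrt(s))` figure concrete, here is a minimal sketch of how an oversampling rate for Bernoulli sampling can be derived from a Chernoff lower-tail bound. It is an illustration only, not the code under review in this pull request: the function names `oversampling_fraction` and `expected_candidates` and the default failure probability `delta=1e-4` are assumptions made for the example.

```python
import math


def oversampling_fraction(sample_size, total, delta=1e-4):
    """Per-item sampling probability q for Bernoulli sampling.

    Hypothetical helper: q is chosen so that keeping each of `total` items
    independently with probability q yields at least `sample_size` items
    with probability at least 1 - delta. The bound comes from the Chernoff
    inequality P(X <= (1 - eps) * mu) <= exp(-eps^2 * mu / 2), mu = q * total.
    """
    if total <= 0 or sample_size < 0:
        raise ValueError("total must be positive and sample_size non-negative")
    p = sample_size / total            # naive rate s / n
    gamma = -math.log(delta) / total   # slack term ln(1 / delta) / n
    q = p + gamma + math.sqrt(gamma * gamma + 2.0 * gamma * p)
    return min(1.0, q)                 # a probability can never exceed 1


def expected_candidates(sample_size, total, delta=1e-4):
    """Expected size of the candidate set, q * n."""
    return oversampling_fraction(sample_size, total, delta) * total


if __name__ == "__main__":
    # Overshoot beyond s grows like sqrt(s), not like s.
    for s in (100, 10_000, 1_000_000):
        extra = expected_candidates(s, 10**9) - s
        print(s, round(extra, 1))
```

Writing L = ln(1/delta), the expected candidate count is s + L + sqrt(L^2 + 2*L*s), which is where the square-root overshoot comes from. The same formula stays small when the requested sample is tiny relative to a very large data set, because the slack depends on s and delta rather than on n.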

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13319638 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -402,10 +411,11 @@ abstract class RDD[T: ClassTag]( } if (num

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44802433 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44802546 Merged build triggered.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44802549 Merged build started.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-44803842 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15343/
