[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57896544 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21287/consoleFull) for PR 2622 at commit [`eca3dfd`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-03 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57896484 Parameters are now configurable. Added approximation error reporting. Added JIRA. --- If your project is set up for it, you can reply to this email and have you
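The comment mentions approximation error reporting for the cosine-similarity example. Below is a local, non-distributed Python sketch of that idea. It computes exact pairwise column cosine similarities, then the mean absolute error against an approximate estimate. It is not the PR's Scala code. The function names `column_cosine_similarities` and `mean_absolute_error` are hypothetical. The approximate similarities, which would in practice come from a sampling-based estimator, are assumed here to be supplied as a plain dict. Treating a pair missing from the approximation as an estimate of 0 is also an assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def column_cosine_similarities(rows: List[List[float]]) -> Dict[Pair, float]:
    """Exact (brute-force) cosine similarity for every column pair (i, j), i < j."""
    n_cols = len(rows[0])
    # Column norms: square root of the sum of squares down each column.
    norms = [math.sqrt(sum(r[c] ** 2 for r in rows)) for c in range(n_cols)]
    sims: Dict[Pair, float] = {}
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # similarity is undefined for an all-zero column; skip the pair
            dot = sum(r[i] * r[j] for r in rows)
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


def mean_absolute_error(exact: Dict[Pair, float], approx: Dict[Pair, float]) -> float:
    """Average |exact - approx| over the exact pairs.

    Assumption of this sketch: a pair absent from `approx` counts as an estimate of 0.0.
    """
    if not exact:
        return 0.0
    errors = [abs(u - approx.get(pair, 0.0)) for pair, u in exact.items()]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rows = [[1.0, 1.0, 0.0],
            [1.0, 1.0, 1.0]]
    exact = column_cosine_similarities(rows)
    # A made-up "approximate" result that misestimates (0, 1) and drops (0, 2).
    approx = {(0, 1): 0.9, (1, 2): exact[(1, 2)]}
    print("Average absolute error:", mean_absolute_error(exact, approx))
```

The brute-force pass costs O(rows × columns²). That cost is exactly why an approximate estimator, plus a report of its error against the exact answer, is worth having on wide matrices.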

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57896055 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21286/consoleFull) for PR 2647 at commit [`5fc1259`](https://github.com/a

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57896058 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57894903 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21286/consoleFull) for PR 2647 at commit [`5fc1259`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2647#discussion_r18427632 --- Diff: project/SparkBuild.scala --- @@ -99,6 +99,30 @@ object SparkBuild extends PomBuild { v.split("(\\s+|,)").filterNot(_.isEmpty).map(_.trim.r

[GitHub] spark pull request: [SPARK-2098] All Spark processes should suppor...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2379#issuecomment-57894238 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-2098] All Spark processes should suppor...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2379#issuecomment-57894236 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21285/consoleFull) for PR 2379 at commit [`80b0b12`](https://github.com/a

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-03 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-57894117 I ran the style tests. They pass. Is there something else in the style guide that is not captured in the tests? I have expended much effort to avoid serializ

[GitHub] spark pull request: [SPARK-2098] All Spark processes should suppor...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2379#issuecomment-57893391 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21285/consoleFull) for PR 2379 at commit [`80b0b12`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3772] Allow `ipython` to be used by Pys...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2651#issuecomment-57893128 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21284/consoleFull) for PR 2651 at commit [`c4f5778`](https://github.com/a

[GitHub] spark pull request: [SPARK-3772] Allow `ipython` to be used by Pys...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2651#issuecomment-57893129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-03 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2624#issuecomment-57892709 @JoshRosen the simple fix is to delete the threadlocal variable completely. Then any access to the threadlocal variable from any thread (even threadpool in Py4J) is going to

[GitHub] spark pull request: [SPARK-3772] Allow `ipython` to be used by Pys...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2651#issuecomment-57891695 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21284/consoleFull) for PR 2651 at commit [`c4f5778`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3772] Allow `ipython` to be used by Pys...

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2651#issuecomment-57891571 /cc @davies @cocoatomo @robbles for reviews / feedback.

[GitHub] spark pull request: [SPARK-3772] Allow `ipython` to be used by Pys...

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2651#issuecomment-57891560 /cc @davies @cocotomo @robbles for reviews / feedback.

[GitHub] spark pull request: [SPARK-3772] Allow `ipython` to be used by Pys...

2014-10-03 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/2651 [SPARK-3772] Allow `ipython` to be used by Pyspark workers; IPython fixes: This pull request addresses a few issues related to PySpark's IPython support: - Fix the remaining uses of the '

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread sarutak
GitHub user sarutak reopened a pull request: https://github.com/apache/spark/pull/2647 [SPARK-3787] Assembly jar name is wrong when we build with sbt omitting -Dhadoop.version This PR is another solution for When we build with sbt with profile for hadoop and without property for ha

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2650#issuecomment-57891486 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread sarutak
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/2647

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2650#issuecomment-57891483 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21282/consoleFull) for PR 2650 at commit [`0e36be7`](https://github.com/a

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-57891421 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21283/consoleFull) for PR 2649 at commit [`c938845`](https://github.com/a

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-57891422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2646#discussion_r18426767 --- Diff: python/pyspark/shuffle.py --- @@ -428,7 +427,7 @@ def _recursive_merged_items(self, start): subdirs = [os.path.join(d, "parts", st

[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2646#discussion_r18426749 --- Diff: python/pyspark/tests.py --- @@ -152,7 +152,7 @@ def test_external_sort(self): self.assertGreater(shuffle.DiskBytesSpilled, last)

[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2646#discussion_r18426755 --- Diff: python/pyspark/tests.py --- @@ -152,7 +152,7 @@ def test_external_sort(self): self.assertGreater(shuffle.DiskBytesSpilled, last)

[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2646#issuecomment-57890710 This is a great set of refactorings! Thanks for improving the consistency of the test suite names.

[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2646#discussion_r18426715 --- Diff: python/pyspark/tests.py --- @@ -754,27 +756,19 @@ def test_serialize_nested_array_and_map(self): self.assertEqual("2", row.d)

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2650#issuecomment-57890533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2650#issuecomment-57890530 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21281/consoleFull) for PR 2650 at commit [`772aead`](https://github.com/a

[GitHub] spark pull request: Event proration based on event timestamps.

2014-10-03 Thread bijaybisht
Github user bijaybisht commented on the pull request: https://github.com/apache/spark/pull/2633#issuecomment-57890383 Don't understand why it failed this time. Can test be re-fired?

[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-5790 Hey, I will check this patch very soon. I have an impression that these changes to SparkBuild are not needed. Even if they are needed then something needs fixed in po

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-57888369 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21280/consoleFull) for PR 2649 at commit [`9f7b571`](https://github.com/a

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-57888370 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2356#issuecomment-57888189 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21279/consoleFull) for PR 2356 at commit [`a73fa19`](https://github.com/a

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2356#issuecomment-57888194 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-57888024 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21283/consoleFull) for PR 2649 at commit [`c938845`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2650#issuecomment-57888034 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21282/consoleFull) for PR 2650 at commit [`0e36be7`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-57883154 @mengxr I'll be occupied next week but I'll try to respond asap to your feedback the week after

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2650#issuecomment-57882214 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21281/consoleFull) for PR 2650 at commit [`772aead`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread vanzin
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/2650 [SPARK-3788] [yarn] Fix compareFs to do the right thing for HA, federati... ...on (1.1 version). HA and federation use namespaces instead of host names, so you can't resolve them since th

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-57881327 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21280/consoleFull) for PR 2649 at commit [`9f7b571`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2356#issuecomment-57881048 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21279/consoleFull) for PR 2356 at commit [`a73fa19`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2646#issuecomment-57881070 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/264/consoleFull) for PR 2646 at commit [`6a2a4b0`](https://github.com/

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-57880944 I tested this on both regular and federated HDFS, verified that the "Upload foo..." message in the logs does not show up in either while it would show up for federated HDF

[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-03 Thread vanzin
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/2649 [SPARK-3788] [yarn] Fix compareFs to do the right thing for HA, federati... ...on. HA and federation use namespaces instead of host names, so you can't resolve them since that will fail.
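The PR description explains why host resolution breaks with HDFS HA and federation. In those setups the authority part of the filesystem URI is a logical nameservice rather than a real host, so a DNS lookup on it fails. Below is a minimal Python sketch of a comparison that only attempts resolution when the two names actually differ. It is not the Scala `compareFs` change in the PR, and it assumes filesystems are identified by URI strings. The name `same_filesystem` is hypothetical. Comparing resolved IPv4 addresses via `socket.gethostbyname` is a simplification chosen for this sketch.

```python
import socket
from urllib.parse import urlsplit


def same_filesystem(src_uri: str, dst_uri: str) -> bool:
    """Return True if two filesystem URIs appear to name the same filesystem."""
    src, dst = urlsplit(src_uri), urlsplit(dst_uri)
    # A missing or different scheme can never be the same filesystem.
    if not src.scheme or src.scheme != dst.scheme:
        return False

    src_host, dst_host = src.hostname, dst.hostname
    # With HA or federation the "host" may be a nameservice such as "nameservice1"
    # that DNS cannot resolve, so identical names are compared without any lookup.
    if src_host is not None and dst_host is not None and src_host != dst_host:
        try:
            # Simplification: compare resolved IPv4 addresses of the two names.
            src_host = socket.gethostbyname(src_host)
            dst_host = socket.gethostbyname(dst_host)
        except OSError:
            # Unresolvable and different: treat them as different filesystems.
            return False

    return src_host == dst_host and src.port == dst.port
```

For example, `hdfs://nameservice1/user/a` and `hdfs://nameservice1/tmp/b` compare equal without any DNS lookup. A naive implementation that always resolves both names would fail on the nameservice.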

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57880751 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21278/consoleFull) for PR 2647 at commit [`eebbb7d`](https://github.com/a

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57880772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2497#issuecomment-57880702 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21277/consoleFull) for PR 2497 at commit [`75cde8c`](https://github.com/a

[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2497#issuecomment-57880723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-57874204 @erikerlandson I didn't check the test code. I will try to find another time to make a pass on the test. The implementation looks good to me except minor inline comments.

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread jkleckner
Github user jkleckner commented on a diff in the pull request: https://github.com/apache/spark/pull/2647#discussion_r18423582 --- Diff: project/SparkBuild.scala --- @@ -99,6 +99,30 @@ object SparkBuild extends PomBuild { v.split("(\\s+|,)").filterNot(_.isEmpty).map(_.trim

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423491 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423498 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423493 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423500 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423504 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423499 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423485 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423495 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423492 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423489 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423437 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom with Clo

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423464 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423484 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423470 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423457 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423473 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423487 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423468 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423448 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423443 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom with Clo

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423433 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom with Clo

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423478 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423454 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423461 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423474 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423463 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423479 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423453 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423459 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423475 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423477 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423429 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom with Clo

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423444 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom with Clo

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423440 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom with Clo

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423426 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -43,7 +43,8 @@ import org.apache.spark.partial.PartialResult import org.apache.spark.st

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423449 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423438 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom with Clo

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423427 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -375,7 +376,9 @@ abstract class RDD[T: ClassTag]( val sum = weights.sum va

[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-57873138 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-57873123 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21276/consoleFull) for PR 2612 at commit [`91fb0fd`](https://github.com/a

[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57872445 Can one of the admins verify this patch?

[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-03 Thread yingjieMiao
GitHub user yingjieMiao opened a pull request: https://github.com/apache/spark/pull/2648 [Spark] RDD take() method: overestimate too much In the comment (Line 1083), it says: "Otherwise, interpolate the number of partitions we need to try, but overestimate it by 50%." `(1.5
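The quoted comment describes an interpolation heuristic in `RDD.take()`. After some partitions have been scanned, it estimates how many partitions are needed to reach the requested row count and pads that estimate by 50%. The following is a minimal sketch of that idea only. The exact Spark expression is truncated in this message and is not reproduced here. `TakeEstimateSketch`, `estimateTotalParts`, and rounding up with `math.ceil` are hypothetical choices made for illustration, not Spark's code.

```scala
// Hypothetical illustration of "interpolate the number of partitions we need
// to try, but overestimate it by 50%". This is NOT Spark's implementation;
// names and the rounding choice are assumptions.
object TakeEstimateSketch {
  /** Estimate the total partitions needed to collect `num` rows, given that
    * `partsScanned` partitions have yielded `found` rows so far, padded by 50%. */
  def estimateTotalParts(num: Int, partsScanned: Int, found: Int): Int = {
    require(partsScanned > 0 && found > 0,
      "interpolation only applies once some rows have been found")
    // Observed density is found / partsScanned rows per partition, so reaching
    // num rows needs about num * partsScanned / found partitions; multiply by 1.5.
    math.ceil(1.5 * num * partsScanned / found).toInt
  }

  def main(args: Array[String]): Unit = {
    // Example: 4 partitions yielded 10 rows and 100 rows were requested.
    println(estimateTotalParts(num = 100, partsScanned = 4, found = 10))
  }
}
```

With 4 partitions yielding 10 rows, the observed density is 2.5 rows per partition, so 100 rows need about 40 partitions, padded to 60. Whether such an estimate should count the partitions already scanned is the kind of detail the PR title raises. The snippet above does not take a position on it.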

[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2646#issuecomment-57871751 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/264/consoleFull) for PR 2646 at commit [`6a2a4b0`](https://github.com/a

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57870821 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21278/consoleFull) for PR 2647 at commit [`eebbb7d`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-03 Thread sarutak
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/2647 [SPARK-3787] Assembly jar name is wrong when we build with sbt omitting -Dhadoop.version This PR is another solution for When we build with sbt with profile for hadoop and without property for hado

[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2497#issuecomment-57869998 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21277/consoleFull) for PR 2497 at commit [`75cde8c`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-10-03 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/2497#discussion_r18421944 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -148,14 +146,19 @@ private[spark] object JettyUtils extends Logging { h

[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-10-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2497#discussion_r18421854 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -148,14 +146,19 @@ private[spark] object JettyUtils extends Logging {
