[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55957810 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20488/consoleFull) for PR 2436 at commit

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-09-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-55958078 Summarizing some of our in-person discussion (@davies, let me know if I've made any mistakes here!): `GroupByKey` and `SameKey` work together to address the

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17693196 --- Diff: python/pyspark/mllib/recommendation.py --- @@ -54,34 +64,51 @@ def __del__(self): def predict(self, user, product): return

[GitHub] spark pull request: [SQL][DOCS] Improve table caching section

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2434#issuecomment-55958250 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20479/consoleFull) for PR 2434 at commit

[GitHub] spark pull request: [SPARK-1455] Fix expansion of testing argument...

2014-09-17 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/2437 [SPARK-1455] Fix expansion of testing arguments to sbt Testing arguments to `sbt` need to be passed as an array, not a single, long string. You can merge this pull request into a Git repository
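
The snippet above is truncated and does not show the script being fixed, so the following is only a minimal sketch of the general problem it describes, written in Python rather than in whatever language the build script uses. The helper `count_received_args` and the argument values are hypothetical and are not taken from the pull request. The sketch shows that a single long string reaches the child process as one argument, while an array reaches it as separate arguments.

```python
import subprocess
import sys


def count_received_args(args):
    """Hypothetical helper: start a child process and return how many
    command-line arguments it actually received."""
    child_code = "import sys; print(len(sys.argv) - 1)"
    result = subprocess.run(
        [sys.executable, "-c", child_code] + list(args),
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Illustrative values only (assumption, not from the pull request).
test_args_string = "-Dexample.flag=true module/test"
test_args_array = ["-Dexample.flag=true", "module/test"]

# The whole string is delivered as ONE argument.
as_one_string = count_received_args([test_args_string])
# Each array element is delivered as its own argument.
as_array = count_received_args(test_args_array)

print(as_one_string, as_array)
```

A tool that expects separate arguments would treat the single-string form as one unrecognised token, which is consistent with the motivation stated in the pull request description.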

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-09-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-55958996 This looks like a good patch. The code here is fairly complicated and had some complex control flow, although after discussion I believe that it works correctly. It

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17693786 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -491,14 +491,13 @@ private[spark] class Master( val

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-09-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-55959414 There's a bit of code duplication between ExternalGroupBy and ExternalMerger, but maybe this is unavoidable. It would be nice to add a short comment to

[GitHub] spark pull request: [SPARK-3534] Fix expansion of testing argument...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2437#issuecomment-55960009 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20489/consoleFull) for PR 2437 at commit

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17694320 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -40,11 +43,11 @@ import org.apache.spark.mllib.util.MLUtils

[GitHub] spark pull request: [SPARK-3534] Fix expansion of testing argument...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2437#issuecomment-55960641 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20490/consoleFull) for PR 2437 at commit

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17694410 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -491,14 +491,13 @@ private[spark] class Master( val

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17694510 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -490,22 +490,27 @@ private[spark] class Master( // Randomization

[GitHub] spark pull request: [SPARK-3534] Fix expansion of testing argument...

2014-09-17 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2437#issuecomment-55961238 cc @marmbrus After merging, I suggest trying this out with a dummy SQL PR. I have to head out now, but I'll be online later tonight. --- If your project is

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17695015 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -491,14 +491,13 @@ private[spark] class Master( val

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17695194 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -490,22 +490,27 @@ private[spark] class Master( // Randomization

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17695418 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -491,14 +491,13 @@ private[spark] class Master( val

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55962852 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20484/consoleFull) for PR 2436 at commit

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2350#issuecomment-55963416 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20485/consoleFull) for PR 2350 at commit

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17695766 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -491,14 +491,13 @@ private[spark] class Master( val

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2350#issuecomment-55963814 @vanzin @tgravescs I think I have addressed all of your comments. Let me know if I missed something. Can you look at the latest changes? I have tested the basic

[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-55963948 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20486/consoleFull) for PR 2432 at commit

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55964387 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/128/consoleFull) for PR 2435 at commit

[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2433#issuecomment-55964616 Is the idea here for this to be called by spark users or by spark internal components? If the former, this won't be visible because it's in a private object. It could

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2436#discussion_r17696251 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -491,14 +491,13 @@ private[spark] class Master( val

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55964720 Some GraphX test failure

[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2433#issuecomment-55964698 One case where this could be useful is in the Spark shell: ``` scala sc.setLoggingLevel(WARN) ```

[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-17 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/2433#issuecomment-55964974 Sounds good, how about I have the spark context do the conversion and call the utils method?

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55965053 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20491/consoleFull) for PR 2436 at commit

[GitHub] spark pull request: [SPARK-2098] All Spark processes should suppor...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2379#discussion_r17696773 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/WorkerArguments.scala --- @@ -47,14 +48,15 @@ private[spark] class WorkerArguments(args:

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55966108 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20487/consoleFull) for PR 1486 at commit

[GitHub] spark pull request: [SPARK-3319] [SPARK-3338] Resolve Spark submit...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2232#issuecomment-55966480 Yes, this changes the behavior if the user sets `spark.yarn.dist.*`. Note that the whole point of this PR is to fix the divergence in path resolution behavior if the

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17697102 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...

2014-09-17 Thread sarutak
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-55966735 test this please.

[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2433#issuecomment-55966808 yeah, that works

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55966901 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20492/consoleFull) for PR 2436 at commit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55966988 The unit test failure mentioned here seems to be coming from the binary compatibility checker. The text of the error is: [error] * class

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55967262 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20488/consoleFull) for PR 2436 at commit

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17697397 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread codedeft
Github user codedeft commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55967377 Hi Joseph, I'll take a look when I can, but this is a massive PR, so I'm not sure if I'll have time to go through this thoroughly. * I suppose that

[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-55967426 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20493/consoleFull) for PR 2333 at commit

[GitHub] spark pull request: add spark.driver.memory to config docs

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2410#issuecomment-55967503 Hey @nartz, can you also add `spark.driver.extraClassPath`, `spark.driver.extraLibraryPath` and `spark.driver.extraJavaOptions`?

[GitHub] spark pull request: Docs: move HA subsections to a deeper indentat...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2402#issuecomment-55967752 Thanks @ash211, merging this into master and 1.1.

[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-17 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-55967788 test this please

[GitHub] spark pull request: [SPARK-3547]Using a special exit code instead ...

2014-09-17 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2421#issuecomment-55967900 test this please

[GitHub] spark pull request: Docs: move HA subsections to a deeper indentat...

2014-09-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2402

[GitHub] spark pull request: [SPARK-3051] Support looking-up named accumula...

2014-09-17 Thread nfergu
GitHub user nfergu opened a pull request: https://github.com/apache/spark/pull/2438 [SPARK-3051] Support looking-up named accumulators in a registry This proposal builds on SPARK-2380 (Support displaying accumulator values in the web UI) to allow named accumulables to be

[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...

2014-09-17 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2390#issuecomment-55968181 Would you mind closing this PR since #2397 was opened as a replacement?

[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...

2014-09-17 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2381#issuecomment-55968221 Mind closing this PR?

[GitHub] spark pull request: [SPARK-3051] Support looking-up named accumula...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2438#issuecomment-55968236 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55968376 @codedeft No problem; I apologize for how large the PR is. I agree this should be merged before further optimizations are made. This does not include node caching;

[GitHub] spark pull request: [SPARK-3534] Fix expansion of testing argument...

2014-09-17 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2437#issuecomment-55968474 Merged, thanks!

[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-55968540 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20495/consoleFull) for PR 2344 at commit

[GitHub] spark pull request: [SPARK-3534] Fix expansion of testing argument...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2437#issuecomment-55968592 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20490/consoleFull) for PR 2437 at commit

[GitHub] spark pull request: [SPARK-3534] Fix expansion of testing argument...

2014-09-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2437

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55968915 @srowen, I moved most of the line comments into code comments that I have committed. Thx!

[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-55969213 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20495/consoleFull) for PR 2344 at commit

[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...

2014-09-17 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-55969364 +1 for the `deleteOnExit`/`deleteRecursively` pattern. @mattf According to its

[GitHub] spark pull request: [SPARK-3551] Remove redundant putting FetchRes...

2014-09-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2413#issuecomment-55969817 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17698519 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3266] [Java] Change JavaRDDLike trait t...

2014-09-17 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2186#issuecomment-55970035 test this please

[GitHub] spark pull request: [SPARK-3266] [Java] Change JavaRDDLike trait t...

2014-09-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2186#issuecomment-55970507 This almost certainly breaks binary compatibility; sorry for letting this PR sit for so long. I'll try to update it today or tomorrow.

[GitHub] spark pull request: [SPARK-3551] Remove redundant putting FetchRes...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2413#issuecomment-55970591 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20496/consoleFull) for PR 2413 at commit

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55971243 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20491/consoleFull) for PR 2436 at commit

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55971460 To understand and evaluate this pull request, I would suggest that a reviewer do the following: 1) Look at the `PointOps` trait and its `FastEuclideanOps`

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread codedeft
Github user codedeft commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55971575 @jkbradley I don't quite get what different columns in result numbers mean. Do you mean that you are still training exactly the same single tree (to depth 6) on

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread sarutak
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55971553 retest this please.

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55972221 Each row is a single (random) dataset. The 2 different sets of result columns are for 2 different RF implementations: * (numTrees): This is from an earlier commit,

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55972212 @mengxr per your request, here is a pull request that addresses many of the outstanding issues with the 1.1.0 Spark K-Means clusterer.

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55973870 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3547]Using a special exit code instead ...

2014-09-17 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2421#discussion_r17699917 --- Diff: sbin/start-thriftserver.sh --- @@ -27,7 +27,7 @@ set -o posix FWDIR=$(cd `dirname $0`/..; pwd)

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55974300 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20499/consoleFull) for PR 2435 at commit

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55974434 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20492/consoleFull) for PR 2436 at commit

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread codedeft
Github user codedeft commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55974486 @jkbradley Thanks Joseph. It makes sense. It looks good upon very rough browsing. Some minor things: * Would be nice to have support for

[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-55974879 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20493/consoleFull) for PR 2333 at commit

[GitHub] spark pull request: [SPARK-3547]Using a special exit code instead ...

2014-09-17 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2421#discussion_r17700240 --- Diff: sbin/start-thriftserver.sh --- @@ -27,7 +27,7 @@ set -o posix FWDIR=$(cd `dirname $0`/..; pwd)

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17700232 --- Diff: python/pyspark/mllib/linalg.py --- @@ -23,14 +23,148 @@ SciPy is available in their environment. -import numpy -from

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55975101 @codedeft For w/o replacement bagging, I definitely agree, and I'll make a JIRA for that after this PR is merged. For manual feature subset size, what sounds best to

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17700424 --- Diff: python/pyspark/mllib/linalg.py --- @@ -23,14 +23,148 @@ SciPy is available in their environment. -import numpy -from

[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...

2014-09-17 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2226#discussion_r17700457 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SparkHadoopWriter.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17700471 --- Diff: python/pyspark/mllib/linalg.py --- @@ -23,14 +23,148 @@ SciPy is available in their environment. -import numpy -from

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread codedeft
Github user codedeft commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55975519 @jkbradley I guess that I don't have a particular preference, (either fraction or the actual number). The actual number seems a bit better to me since you are not going

[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...

2014-09-17 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2226#discussion_r17700552 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SparkHadoopWriter.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3578] Fix upper bound in GraphGenerator...

2014-09-17 Thread ankurdave
GitHub user ankurdave opened a pull request: https://github.com/apache/spark/pull/2439 [SPARK-3578] Fix upper bound in GraphGenerators.sampleLogNormal GraphGenerators.sampleLogNormal is supposed to return an integer strictly less than maxVal. However, it violates this guarantee. It
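
The description above is truncated, so the actual patch is not visible here. As an illustration of the guarantee it states (an integer strictly less than `maxVal`), the following is a minimal Python sketch that uses rejection sampling. The function name `sample_log_normal`, its parameters, and the rejection approach are assumptions made for this example and are not claimed to match the Spark implementation.

```python
import math
import random


def sample_log_normal(mu, sigma, max_val, rng):
    """Hypothetical sketch: draw from a log-normal distribution and
    return an integer in the range [0, max_val).

    Draws at or above max_val are rejected and redrawn, so the
    returned value is always strictly less than max_val.
    """
    if max_val <= 0:
        raise ValueError("max_val must be positive")
    while True:
        x = rng.lognormvariate(mu, sigma)  # x is always > 0
        if x < max_val:
            # floor(x) <= x < max_val, so the bound is preserved.
            return int(math.floor(x))


# Illustrative parameters only (assumption, not from the pull request).
rng = random.Random(42)
samples = [sample_log_normal(4.0, 1.3, 50, rng) for _ in range(1000)]
print(max(samples) < 50)
```

Checking the bound before flooring is what keeps the result strictly below `max_val`; accepting a draw equal to `max_val` would break the guarantee.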

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55975705 retest this please

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2436#issuecomment-55975876 Ok, LGTM... I tested this locally. Merging into master.

[GitHub] spark pull request: [SPARK-3547]Using a special exit code instead ...

2014-09-17 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/2421#discussion_r17700696 --- Diff: sbin/start-thriftserver.sh --- @@ -27,7 +27,7 @@ set -o posix FWDIR=$(cd `dirname $0`/..; pwd)

[GitHub] spark pull request: [SPARK-3578] Fix upper bound in GraphGenerator...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2439#issuecomment-55975949 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20500/consoleFull) for PR 2439 at commit

[GitHub] spark pull request: [SPARK-3571] Spark standalone cluster mode doe...

2014-09-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2436

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread codedeft
Github user codedeft commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55976071 Additionally, I suppose allowing the actual size for feature subset as an input would be useful in model-search later on.

[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-55976178 @sarutak, I believe @JoshRosen is working on a more general framework for extracting the info displayed on the UI as JSON, so there's a chance that we won't go with

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55976264 I'll make a JIRA for supporting hand-picked numbers of features; we can discuss fraction vs. integer there. I like the functional options (sqrt, log2) supported by

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-55976349 For naming, scikit-learn uses max_features instead of featureSubsetStrategy. Both of those are a little vague. I'm wondering if the name should be changed to
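
As a rough illustration of the options discussed in this thread (functional choices such as sqrt and log2, versus an explicit integer count or a fraction), the mapping from a strategy to a per-node feature count might look like the Python sketch below. The function name `features_per_node`, the accepted strategy values, and the rounding-up behaviour are assumptions for the example; they are not taken from the pull request or from scikit-learn.

```python
import math


def features_per_node(num_features, strategy):
    """Return how many features to consider at each tree node.

    Hypothetical helper: strategy may be "all", "sqrt", "log2",
    an int (explicit count), or a float in (0, 1] (fraction).
    """
    if num_features < 1:
        raise ValueError("num_features must be at least 1")
    if isinstance(strategy, bool):
        raise ValueError("strategy must not be a bool")
    if strategy == "all":
        count = num_features
    elif strategy == "sqrt":
        count = math.ceil(math.sqrt(num_features))
    elif strategy == "log2":
        count = math.ceil(math.log2(num_features))
    elif isinstance(strategy, int):
        count = strategy  # explicit number of features
    elif isinstance(strategy, float) and 0.0 < strategy <= 1.0:
        count = math.ceil(strategy * num_features)  # fraction of features
    else:
        raise ValueError("unsupported strategy: %r" % (strategy,))
    # Clamp so that at least one and at most num_features are used.
    return max(1, min(num_features, int(count)))
```

Accepting both an integer and a fraction in one parameter mirrors the "fraction vs. integer" question raised above; a real API might well pick only one of the two.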

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17700987 --- Diff: python/pyspark/mllib/linalg.py --- @@ -23,14 +23,148 @@ SciPy is available in their environment. -import numpy -from

[GitHub] spark pull request: [SPARK-3564][WebUI] Display App ID on HistoryP...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2424#issuecomment-55976595 Thanks I'm merging this into master and 1.1.

[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...

2014-09-17 Thread sarutak
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-55976627 @andrewor14 Thank you for notification! Actually, I need JSON representation for #2342 . I'm planning to parse JSON to use D3.

[GitHub] spark pull request: [SPARK-3564][WebUI] Display App ID on HistoryP...

2014-09-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2424

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2378#discussion_r17701086 --- Diff: python/pyspark/mllib/linalg.py --- @@ -61,16 +195,19 @@ def __init__(self, size, *args): if type(pairs) == dict:

[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...

2014-09-17 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2226#discussion_r17701144 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala --- @@ -522,6 +523,52 @@ class HiveQuerySuite extends

[GitHub] spark pull request: [SPARK-3567] appId field in SparkDeploySchedul...

2014-09-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2428#issuecomment-55976928 Looks good. Have you looked at whether we need to do the same for other scheduler backends? (e.g. yarn, mesos)
