[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67811183 [Test build #24699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24699/consoleFull) for PR 3758 at commit

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67811187 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4916][SQL][DOCS]Update SQL programming ...

2014-12-22 Thread luogankun
GitHub user luogankun opened a pull request: https://github.com/apache/spark/pull/3759 [SPARK-4916][SQL][DOCS]Update SQL programming guide about cache section `SchemaRDD.cache()` now uses in-memory columnar storage. You can merge this pull request into a Git repository by running:

[GitHub] spark pull request: [SPARK-4916][SQL][DOCS]Update SQL programming ...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3759#issuecomment-67811391 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-4917] Add a function to convert into a ...

2014-12-22 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/3760 [SPARK-4917] Add a function to convert into a graph with canonical edges in GraphOps Convert bi-directional edges into uni-directional ones instead of 'canonicalOrientation' in
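
The idea of collapsing bi-directional edges into uni-directional ones can be sketched outside of Spark as follows. This is only an illustration, not the GraphX code from the pull request: the function name `to_canonical_edges`, the tuple-based edge representation, and the "smaller vertex id first" orientation are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical edge representation: (source vertex id, destination vertex id).
Edge = Tuple[int, int]


def to_canonical_edges(edges: Iterable[Edge]) -> List[Edge]:
    """Collapse bi-directional edges into one edge per vertex pair.

    Each edge is re-oriented so that the smaller vertex id comes first,
    and duplicates produced by that re-orientation are dropped.
    """
    canonical = {(min(src, dst), max(src, dst)) for src, dst in edges}
    return sorted(canonical)


edges = [(1, 2), (2, 1), (3, 1), (2, 3), (3, 2)]
print(to_canonical_edges(edges))  # [(1, 2), (1, 3), (2, 3)]
```

Edge attributes are ignored here; a real implementation would also need a rule for merging the attributes of the two directions.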

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

2014-12-22 Thread loachli
Github user loachli commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-67812105 @avulanov: *Could you write a brief description to the ANN test called Gradient of ANN to let the reader understand more clearly what we are testing?* The test

[GitHub] spark pull request: [SPARK-4917] Add a function to convert into a ...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3760#issuecomment-67812097 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67814128 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-4692] [SQL] Support ! boolean logic ope...

2014-12-22 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3555#issuecomment-67814308 Ah, sorry, forgot that the golden answer file name is generated by the MD5 of the query string. Then let's revert the last space change. I think this minor issue
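
Why a whitespace-only change to a query matters when the golden answer file name is derived from the MD5 of the query string can be shown with a small sketch. The naming scheme (`<test name>-<md5 hex digest>`), the helper `golden_file_name`, and the two example queries are assumptions for illustration, not the actual test harness code.

```python
import hashlib


def golden_file_name(test_name: str, query: str) -> str:
    # Hypothetical scheme: test name plus the MD5 hex digest of the exact query text.
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return f"{test_name}-{digest}"


# Made-up queries that differ only by a single space.
before = golden_file_name("not_operator", "SELECT key FROM src WHERE !(key > 10)")
after = golden_file_name("not_operator", "SELECT key FROM src WHERE ! (key > 10)")

# The digests differ, so the existing golden file would no longer be found.
print(before == after)
```

Under such a scheme, any edit to the query text, including whitespace, points the test at a different golden file, which is why reverting the space change avoids regenerating the answer file.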

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67814511 [Test build #24700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24700/consoleFull) for PR 3758 at commit

[GitHub] spark pull request: [SQL] spark-sql aborted if passed in a wrong s...

2014-12-22 Thread scwf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/3761 [SQL] spark-sql aborted if passed in a wrong sql If we pass in an invalid SQL statement such as ```abdcdfsfs```, the spark-sql script aborts.

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread FlytxtRnD
Github user FlytxtRnD commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67816287 Sorry for the late reply. predictLabels() and predictMembership() look fine. But what about moving the computeSoftAssignments() to GaussianMixtureModelEM class (in KMeans,

[GitHub] spark pull request: [SQL] spark-sql aborted if passed in a wrong s...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3761#issuecomment-67816416 [Test build #24701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24701/consoleFull) for PR 3761 at commit

[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...

2014-12-22 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3607#issuecomment-67816606 @andrewor14 OK, I see what you mean. I think I misunderstood before. To solve this problem, should we just delete `(--driver-memory,

[GitHub] spark pull request: [SPARK-4692] [SQL] Support ! boolean logic ope...

2014-12-22 Thread YanTangZhai
Github user YanTangZhai commented on the pull request: https://github.com/apache/spark/pull/3555#issuecomment-67816709 @liancheng I will revert the last space change. Thanks for your comment.

[GitHub] spark pull request: [SPARK-4692] [SQL] Support ! boolean logic ope...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3555#issuecomment-67817199 [Test build #24702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24702/consoleFull) for PR 3555 at commit

[GitHub] spark pull request: [SQL] spark-sql aborted if passed in a wrong s...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3761#issuecomment-67818470 [Test build #24703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24703/consoleFull) for PR 3761 at commit

[GitHub] spark pull request: [SPARK-4912][SQL] Persistent tables for the Sp...

2014-12-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3752#discussion_r22159720 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -55,8 +56,60 @@ private[hive] class

[GitHub] spark pull request: [SPARK-4912][SQL] Persistent tables for the Sp...

2014-12-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3752#discussion_r22160548 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -55,8 +56,60 @@ private[hive] class

[GitHub] spark pull request: [SQL] spark-sql aborted if passed in a wrong s...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3761#issuecomment-67822551 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SQL] spark-sql aborted if passed in a wrong s...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3761#issuecomment-67822544 [Test build #24701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24701/consoleFull) for PR 3761 at commit

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67822642 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67822635 [Test build #24700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24700/consoleFull) for PR 3758 at commit

[GitHub] spark pull request: [SPARK-4912][SQL] Persistent tables for the Sp...

2014-12-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3752#discussion_r22161200 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -55,8 +56,60 @@ private[hive] class

[GitHub] spark pull request: [SPARK-4692] [SQL] Support ! boolean logic ope...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3555#issuecomment-67823381 [Test build #24702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24702/consoleFull) for PR 3555 at commit

[GitHub] spark pull request: [SPARK-4692] [SQL] Support ! boolean logic ope...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3555#issuecomment-67823389 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SQL] spark-sql aborted if passed in a wrong s...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3761#issuecomment-67824715 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SQL] spark-sql aborted if passed in a wrong s...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3761#issuecomment-67824711 [Test build #24703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24703/consoleFull) for PR 3761 at commit

[GitHub] spark pull request: [SPARK-4907][MLlib] Inconsistent loss and grad...

2014-12-22 Thread bryanyang0528
Github user bryanyang0528 commented on the pull request: https://github.com/apache/spark/pull/3746#issuecomment-67825123 In my opinion, whether the factor in the cost function is 1/m or 1/2m is not the critical difference. Across the cost function L = alpha * 1/2n ||A

[GitHub] spark pull request: [SPARK-4907][MLlib] Inconsistent loss and grad...

2014-12-22 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3746#issuecomment-67826137 @bryanyang0528 I don't think anyone's suggesting that the extra factor of 1/2 is more or less correct or desirable per se. The solution doesn't depend on the absolute

[GitHub] spark pull request: [SPARK-4692] [SQL] Support ! boolean logic ope...

2014-12-22 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3555#issuecomment-67827365 Thanks for the update, this now LGTM.

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread FlytxtRnD
Github user FlytxtRnD commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22163213 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread FlytxtRnD
Github user FlytxtRnD commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22163250 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Reuse Text in saveAsTextFile

2014-12-22 Thread zsxwing
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/3762 Reuse Text in saveAsTextFile Reuse Text in saveAsTextFile to reduce GC. /cc @rxin

[GitHub] spark pull request: [SPARK-4918][Core] Reuse Text in saveAsTextFil...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3762#issuecomment-67832465 [Test build #24704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24704/consoleFull) for PR 3762 at commit

[GitHub] spark pull request: [SPARK-4918][Core] Reuse Text in saveAsTextFil...

2014-12-22 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3762#issuecomment-67832813 I think it's a small but OK optimization. Hadoop won't save the `Text` object itself, so it's safe, here in the 'save' method.
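
The safety argument can be illustrated with a minimal sketch: reusing one mutable record object is fine when the consumer writes its contents out immediately, and breaks only if the consumer keeps a reference to the object. The `Text` class below is a hypothetical stand-in, not Hadoop's `org.apache.hadoop.io.Text`, and `save_lines` / `collect_refs` are invented names; Python's allocation behaviour also differs from the JVM's, so this shows the aliasing argument rather than the GC saving.

```python
import io


class Text:
    """Hypothetical stand-in for a mutable, reusable record object."""

    def __init__(self) -> None:
        self.data = b""

    def set(self, value: str) -> None:
        self.data = value.encode("utf-8")


def save_lines(lines, out) -> None:
    text = Text()  # allocated once and reused for every record
    for line in lines:
        text.set(line)
        out.write(text.data + b"\n")  # contents are written out immediately


def collect_refs(lines):
    text = Text()
    kept = []
    for line in lines:
        text.set(line)
        kept.append(text)  # unsafe: every entry is the same object
    return [t.data for t in kept]


buf = io.BytesIO()
save_lines(["a", "b", "c"], buf)
print(buf.getvalue())                 # b'a\nb\nc\n'
print(collect_refs(["a", "b", "c"]))  # [b'c', b'c', b'c']
```

The first function corresponds to the "save" case described in the comment; the second shows the kind of consumer for which reuse would not be safe.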

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67834037 Looks like there is some issue in HiveThriftServer2 in branch-1.2? @liancheng ``` Exception in thread "main" java.lang.RuntimeException:

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67834393 This is probably caused by SPARK-4914, which is a bug in `dev/run-tests` and doesn't affect production code. PR #3756 was opened to fix this.

[GitHub] spark pull request: [SPARK-4914][Build] Cleans lib_managed before ...

2014-12-22 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3756#issuecomment-67834656 @pwendell @JoshRosen Would you please take a look at this? This issue is causing random PR build failures. Thanks!

[GitHub] spark pull request: [SPARK-4907][MLlib] Inconsistent loss and grad...

2014-12-22 Thread bryanyang0528
Github user bryanyang0528 commented on the pull request: https://github.com/apache/spark/pull/3746#issuecomment-67836176 @srowen I agree that we need an absolute value that can be compared with other software. Maybe we could add a parameter to control the extra factor?

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2014-12-22 Thread helena
Github user helena commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-67836412 @JoshRosen Ticket name updated :) Sorry for the delay, I was away.

[GitHub] spark pull request: [SPARK-4918][Core] Reuse Text in saveAsTextFil...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3762#issuecomment-67839700 [Test build #24704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24704/consoleFull) for PR 3762 at commit

[GitHub] spark pull request: [SPARK-4918][Core] Reuse Text in saveAsTextFil...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3762#issuecomment-67839708 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4907][MLlib] Inconsistent loss and grad...

2014-12-22 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3746#issuecomment-67842962 @bryanyang0528 The learning rate issue here is a different story. With modern optimization algorithms like LBFGS and OWLQN, the learning rate is not required. The

[GitHub] spark pull request: [SPARK-4907][MLlib] Inconsistent loss and grad...

2014-12-22 Thread bryanyang0528
Github user bryanyang0528 commented on the pull request: https://github.com/apache/spark/pull/3746#issuecomment-67847818 @dbtsai Thank you for your clear explanation, which helps me a lot!

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-12-22 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1518#discussion_r22171070 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Regularizer.scala --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4920][UI]:current spark version in UI i...

2014-12-22 Thread uncleGen
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/3763 [SPARK-4920][UI]:current spark version in UI is not striking. It is not easy to spot the Spark version in the UI. We can use the same style as the Spark website.

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-12-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1518#discussion_r22173571 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Regularizer.scala --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4920][UI]:current spark version in UI i...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3763#issuecomment-67852386 [Test build #24705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24705/consoleFull) for PR 3763 at commit

[GitHub] spark pull request: [SPARK-4860][pyspark][sql] speeding up `sample...

2014-12-22 Thread jbencook
GitHub user jbencook opened a pull request: https://github.com/apache/spark/pull/3764 [SPARK-4860][pyspark][sql] speeding up `sample()` and `takeSample()` This PR modifies the python `SchemaRDD` to use `sample()` and `takeSample()` from Scala instead of the slower python

[GitHub] spark pull request: [SPARK-4860][pyspark][sql] speeding up `sample...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3764#issuecomment-67860230 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-4918][Core] Reuse Text in saveAsTextFil...

2014-12-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3762#issuecomment-67860552 Great idea, LGTM

[GitHub] spark pull request: [SPARK-4920][UI]:current spark version in UI i...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3763#issuecomment-67863235 [Test build #24705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24705/consoleFull) for PR 3763 at commit

[GitHub] spark pull request: [SPARK-4920][UI]:current spark version in UI i...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3763#issuecomment-67863240 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67868189 Hmm, guess I missed this in my testing. Anyway, I think this is the wrong place for the fix. The right fix in my view should be in `SparkDeploySchedulerBackend`,

[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/1222#issuecomment-67868333 Thanks Josh!

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67869210 It says it's standalone mode only because it's never been implemented anywhere else. You're now implementing it for Yarn, I don't see a reason why you wouldn't just reuse

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67870864 Here's what I think is a better approach, feel free to use / adapt it: https://gist.github.com/vanzin/e1910b11ce00630fe9d4

[GitHub] spark pull request: [Minor] Fix scala doc

2014-12-22 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/3751#issuecomment-67871664 This is a very minor change -- do we need a Jira ticket for it?

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-22 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67872100 @dbtsai I ran a local experiment on MNIST and your new implementation seems to be more than 2x faster than the previous one! I am going to perform bigger experiments. In

[GitHub] spark pull request: [SPARK-4860][pyspark][sql] speeding up `sample...

2014-12-22 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3764#issuecomment-67875459 ok to test

[GitHub] spark pull request: [SPARK-4912][SQL] Persistent tables for the Sp...

2014-12-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3752#discussion_r22183385 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -55,8 +56,60 @@ private[hive] class HiveMetastoreCatalog(hive:

[GitHub] spark pull request: [SPARK-4860][pyspark][sql] speeding up `sample...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3764#issuecomment-67875679 [Test build #24706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24706/consoleFull) for PR 3764 at commit

[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183573 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -28,9 +28,23 @@ import

[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183579 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala --- @@ -124,4 +124,36 @@ class

[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183575 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -103,7 +117,37 @@ class

[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183580 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala --- @@ -124,4 +124,36 @@ class

[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67876179 @srowen The logic test look fine; I just added a couple of comments.

[GitHub] spark pull request: [SPARK-4912][SQL] Persistent tables for the Sp...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3752#issuecomment-67876256 [Test build #24707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24707/consoleFull) for PR 3752 at commit

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67876245 So, after actually reading the code :-), the current implementation uses `spark.yarn.am.cores` for both client and cluster mode. I think that's bad, because if

[GitHub] spark pull request: [SPARK-4912][SQL] Persistent tables for the Sp...

2014-12-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3752#discussion_r22183673 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -55,8 +56,60 @@ private[hive] class HiveMetastoreCatalog(hive:

[GitHub] spark pull request: [SPARK-4912][SQL] Persistent tables for the Sp...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3752#issuecomment-67876806 [Test build #24708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24708/consoleFull) for PR 3752 at commit

[GitHub] spark pull request: [SPARK-4917] Add a function to convert into a ...

2014-12-22 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3760#issuecomment-67876851 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67876912 I'm merging this since I really only wanted to check for compilation. Thanks.

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67877038 retest this please @vanzin I believe we separated the definition of `isEventLogEnabled` from that of `eventLogger` because of the following initialization

[GitHub] spark pull request: [SPARK-2075][Core] backport for branch-1.2

2014-12-22 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3758#issuecomment-67877226 Alright I've merged this. Do you mind closing the PR? Github doesn't close it unless the commit is merged into master.

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67877360 I see. Hmm. That sucks. :-/ A comment there would help at least, but even better would be to avoid this tight coupling altogether.

[GitHub] spark pull request: [SPARK-4917] Add a function to convert into a ...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3760#issuecomment-67877352 [Test build #24709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24709/consoleFull) for PR 3760 at commit

[GitHub] spark pull request: [SPARK-4918][Core] Reuse Text in saveAsTextFil...

2014-12-22 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3762#issuecomment-67877459 LGTM. Merging in master. Thanks.

[GitHub] spark pull request: [SPARK-4918][Core] Reuse Text in saveAsTextFil...

2014-12-22 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3762

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67877780 Hey @viirya I believe the right fix here is to change the `eventLogFile` field back to an `eventLogDir` (because it refers to the base logging directory, not the

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r22184450 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -77,6 +80,17 @@ class GradientDescent private[mllib]

[GitHub] spark pull request: [SPARK-4749] [mllib]: Allow initializing KMean...

2014-12-22 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3610#issuecomment-67877997 failure in a streaming test...retesting

[GitHub] spark pull request: [SPARK-4749] [mllib]: Allow initializing KMean...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3610#issuecomment-67878052 [Test build #551 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/551/consoleFull) for PR 3610 at commit

[GitHub] spark pull request: [SPARK-4749] [mllib]: Allow initializing KMean...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3610#discussion_r22184615 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -43,13 +43,14 @@ class KMeans private ( private var runs:

[GitHub] spark pull request: [SPARK-4920][UI]:current spark version in UI i...

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3763#issuecomment-67878388 +1. I also thought the bottom greyed out text is too obscure. @JoshRosen any thoughts?

[GitHub] spark pull request: [SPARK-4915][YARN] Fix classname to be specifi...

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3757#issuecomment-67878534 I'm merging this since this is just docs

[GitHub] spark pull request: [SPARK-4915][YARN] Fix classname to be specifi...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3757#issuecomment-67878503 [Test build #24710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24710/consoleFull) for PR 3757 at commit

[GitHub] spark pull request: [SPARK-4915][YARN] Fix classname to be specifi...

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3757#issuecomment-67878474 LGTM

[GitHub] spark pull request: [SPARK-4915][YARN] Fix classname to be specifi...

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3757#issuecomment-67878448 ok to test

[GitHub] spark pull request: [SPARK-4915][YARN] Fix classname to be specifi...

2014-12-22 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3757

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22184915 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67878956 Andrew's suggestion sounds good. Long term, I think it would be better to send this log path later (as some sort of application stopping message maybe?), instead of

[GitHub] spark pull request: [SPARK-4881] Use SparkConf#getBoolean instead ...

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3733#issuecomment-67878993 No worries. Once you make those changes I will merge this. By the way for issues as minor as this one I don't think filing a JIRA is necessary. I would just put

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22185023 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4870] Add spark version to driver log

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3717#issuecomment-67879567 This looks fine. I'm going to tweak the log format a little bit when I merge it. Thanks

[GitHub] spark pull request: [SPARK-4870] Add spark version to driver log

2014-12-22 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3717

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22185641 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,242 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67880399 @tgaloppo MLUtils.EPSILON is actually private[util]. I think it would be fine to change it to be private[mllib]. CC: @mengxr @tgaloppo I strongly recommend

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67880836 For instance... https://github.com/andrewor14/spark/compare/fix-event-log-suggestion
