[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Sorry, I gave a wrong answer at the beginning. Next time, I will review it more carefully before leaving the comment. Thank you for your work! --- If your project is set up for it, you can

[GitHub] spark issue #14660: [SPARK-17071][SQL] Fetch Parquet schema without another ...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14660 **[Test build #63828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63828/consoleFull)** for PR 14660 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 :) I thought about this issue again. At this stage, could you make a PR for this? I think you're the best person to do that. You made this optimizer and found the correct fix.

[GitHub] spark pull request #14660: [SPARK-17071][SQL] Fetch Parquet schema without a...

2016-08-15 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14660 [SPARK-17071][SQL] Fetch Parquet schema without another Spark job when it is a single file to touch ## What changes were proposed in this pull request? It seems Spark executes

[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-15 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/13796 @sethah Thank you for the great work. I'll make another pass tomorrow.

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 One more try: ```Scala val splitConjunctiveConditions: Seq[Expression] = splitConjunctivePredicates(filter.condition) val conditions = splitConjunctiveConditions ++

[GitHub] spark issue #14616: [SPARK-17034][SQL] adds expression UnresolvedOrdinal to ...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14616 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63821/

[GitHub] spark issue #14616: [SPARK-17034][SQL] adds expression UnresolvedOrdinal to ...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14616 Merged build finished. Test PASSed.

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Another version. : ) ```Scala val splitConjunctiveConditions: Seq[Expression] = splitConjunctivePredicates(filter.condition) val conditions =

[GitHub] spark issue #14616: [SPARK-17034][SQL] adds expression UnresolvedOrdinal to ...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14616 **[Test build #63821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63821/consoleFull)** for PR 14616 at commit

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14392 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63827/

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14392 Merged build finished. Test PASSed.

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14392 **[Test build #63827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63827/consoleFull)** for PR 14392 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 How about another version? ``` val leftConditions = (splitConjunctiveConditions ++ filter.constraints.filter(_.isInstanceOf[IsNotNull]))

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Oh, that would be the perfect fix.

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14182 Merged build finished. Test PASSed.

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63826/

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14182 **[Test build #63826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63826/consoleFull)** for PR 14182 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 How about this fix? ``` val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull) || filter.constraints.filter(_.isInstanceOf[IsNotNull]) .exists(expr

[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-15 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/13796 @dbtsai Thanks for taking the time to review this! Major items right now: * Adding the derivation to the aggregator doc (this is mostly finished, just fighting Scaladoc with LaTeX) *

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14392 **[Test build #63825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63825/consoleFull)** for PR 14392 at commit

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14392 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63825/

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14392 Merged build finished. Test PASSed.

[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...

2016-08-15 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r74876946 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala --- @@ -0,0 +1,626 @@ +/* + * Licensed to the

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14182 Merged build finished. Test PASSed.

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63824/

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14182 **[Test build #63824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63824/consoleFull)** for PR 14182 at commit

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63822/

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #63822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63822/consoleFull)** for PR 14359 at commit

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14392 **[Test build #63827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63827/consoleFull)** for PR 14392 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Another, better fix is to use `nullable` in `Expression` for `IsNotNull` constraints. `filter.constraints.filter(_.isInstanceOf[IsNotNull])`

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 `canFilterOutNull` will cover almost all the cases. Sorry, I did not read the plan until you asked me to write a test case. Then, I realized the implementation of natural/using join is just

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14182 **[Test build #63826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63826/consoleFull)** for PR 14182 at commit

[GitHub] spark pull request #14506: [SPARK-16916][SQL] serde/storage properties shoul...

2016-08-15 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14506

[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14659 Can one of the admins verify this patch?

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Please let me think more about this issue.

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-15 Thread Sherry302
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14659 [SPARK-16757] Set up Spark caller context to HDFS ## What changes were proposed in this pull request? 1. Pass `jobId` to Task. 2. Invoke Hadoop APIs. A new function

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Yep. I agree. `Expr` could be anything. However, this will reduce the scope of this optimization greatly. Is that okay with you?

[GitHub] spark issue #14506: [SPARK-16916][SQL] serde/storage properties should not h...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14506 Merged build finished. Test PASSed.

[GitHub] spark issue #14506: [SPARK-16916][SQL] serde/storage properties should not h...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14506 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63818/

[GitHub] spark issue #14506: [SPARK-16916][SQL] serde/storage properties should not h...

2016-08-15 Thread yhuai
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14506 Thanks. Merging to master.

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 If that is not applicable, I agree with @gatorsmile.

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 That just resolves a specific case. The expressions could be much more complex; `Coalesce` can be nested deep inside an expression.

[GitHub] spark issue #14506: [SPARK-16916][SQL] serde/storage properties should not h...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14506 **[Test build #63818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63818/consoleFull)** for PR 14506 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 What about this if we could exclude those functions? ```scala val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull) ||

[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63820/

[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14447 Merged build finished. Test PASSed.

[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14447 **[Test build #63820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63820/consoleFull)** for PR 14447 at commit

[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74874929 --- Diff: R/pkg/R/functions.R --- @@ -1143,7 +1139,7 @@ setMethod("minute", #' @export #' @examples \dontrun{select(df,

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 The right fix is to change the following statements ```Scala val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull) ||

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Sorry, my description above was not clear. `isnotnull(coalesce(b#227, c#238))` does not filter out `NULL` values of `b#227` and `c#238` individually. Only when both `b#227` and `c#238` are `NULL`,
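The null semantics can be checked with a small model. The sketch below is an illustration only: it represents SQL `NULL` as Scala `None` and defines its own `coalesce` and `isNotNull` helpers (hypothetical, not Catalyst's `Coalesce`/`IsNotNull`). Enumerating the four null/non-null combinations shows that `isnotnull(coalesce(b, c))` is false only when both inputs are `NULL`; when `b` is `NULL` and `c` is not, the row survives, so the predicate guarantees nothing about `b` alone.

```scala
object CoalesceNullModel {
  // Illustrative model only: SQL NULL is represented as None.
  // coalesce(x, y) returns the first non-NULL argument, or NULL if both are NULL.
  def coalesce(x: Option[Int], y: Option[Int]): Option[Int] = x.orElse(y)

  // isnotnull(e) is true iff e is not NULL; it never returns NULL itself.
  def isNotNull(e: Option[Int]): Boolean = e.isDefined

  // Every null/non-null combination of (b, c) with the value of isnotnull(coalesce(b, c)).
  val truthTable: Seq[(Option[Int], Option[Int], Boolean)] =
    for {
      b <- Seq(Some(1), None)
      c <- Seq(Some(2), None)
    } yield (b, c, isNotNull(coalesce(b, c)))

  def main(args: Array[String]): Unit =
    truthTable.foreach { case (b, c, keep) =>
      def show(v: Option[Int]): String = v.fold("NULL")(_.toString)
      println(s"b=${show(b)}, c=${show(c)} -> isnotnull(coalesce(b, c)) = $keep")
    }
}
```

Only the last combination (`b=NULL, c=NULL`) evaluates to `false`, which is why an `IsNotNull` constraint over `coalesce` cannot be read as null-rejecting for either input on its own.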

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14392 **[Test build #63825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63825/consoleFull)** for PR 14392 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread yhuai
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14580 Can you explain `isnotnull(coalesce(b#227, c#238)) does not filter out NULL!!!`?

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Btw, to give back-of-the-envelope estimates, we can look at 2 numbers: (1) How many nodes will be split on each iteration? (2) How big is the forest which is serialized and sent to workers

[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14182 **[Test build #63824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63824/consoleFull)** for PR 14182 at commit

[GitHub] spark issue #14658: [WIP][SPARK-5928] Remote Shuffle Blocks cannot be more t...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14658 **[Test build #63823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63823/consoleFull)** for PR 14658 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 ```Scala val df12 = df1.join(df2, $"df1.a" === $"df2.a", "fullouter") .select(coalesce($"df1.b", $"df2.c").as("a"), $"df1.b", $"df2.c") df12.join(df3, "a").explain(true)

[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74874081 --- Diff: R/pkg/R/SQLContext.R --- @@ -181,7 +181,7 @@ getDefaultSqlSource <- function() { #' @method createDataFrame default #' @note

[GitHub] spark pull request #14658: [WIP][SPARK-5928] Remote Shuffle Blocks cannot be...

2016-08-15 Thread witgo
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/14658 [WIP][SPARK-5928] Remote Shuffle Blocks cannot be more than 2 GB ## What changes were proposed in this pull request? Add class `ChunkFetchInputStream`, which has the following effects:

[GitHub] spark pull request #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Mode...

2016-08-15 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14392#discussion_r74873932 --- Diff: R/pkg/R/generics.R --- @@ -1279,6 +1279,13 @@ setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("s #' @export

[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74873867 --- Diff: R/pkg/R/mllib.R --- @@ -298,14 +304,15 @@ setMethod("summary", signature(object = "NaiveBayesModel"), #' Users can call \code{summary} to

[GitHub] spark issue #14628: [SPARK-17050][ML][MLLib] Improve kmean rdd.aggregate to ...

2016-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14628 @holdenk I think a depth of 2 is enough to handle large RDDs, and a larger depth may add cost. I'll append test results later. Thanks!
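For a sense of what `depth` controls, the sketch below replays in plain Scala the level-planning arithmetic that, to our reading of the Spark source, `RDD.treeAggregate` performs (its `depth` parameter defaults to 2): a `scale` of roughly the `depth`-th root of the partition count, and an extra level only while it is expected to reduce wall-clock time. The helper `treeLevels` is hypothetical and runs no Spark code; each additional level in its output corresponds to one more shuffle stage.

```scala
import scala.collection.mutable.ListBuffer

object TreeAggregateLevels {
  // Returns the partition count at each level, starting with the input partitions.
  // The last entry is how many partial results the final reduce sends to the driver.
  def treeLevels(numInputPartitions: Int, depth: Int): List[Int] = {
    require(depth >= 1, s"depth must be >= 1 but got $depth")
    // Fan-in per level: about the depth-th root of the partition count, at least 2.
    val scale = math.max(math.ceil(math.pow(numInputPartitions, 1.0 / depth)).toInt, 2)
    var numPartitions = numInputPartitions
    val levels = ListBuffer(numPartitions)
    // Add a level only while it is expected to cut wall-clock time.
    while (numPartitions > scale + math.ceil(numPartitions.toDouble / scale)) {
      numPartitions /= scale
      levels += numPartitions
    }
    levels.toList
  }

  def main(args: Array[String]): Unit =
    for (d <- 1 to 3) println(s"depth=$d -> ${treeLevels(100000, d)}")
}
```

For 100,000 input partitions, `treeLevels(100000, 2)` gives `List(100000, 315)` and `treeLevels(100000, 3)` gives `List(100000, 2127, 45)`: the deeper tree sends fewer partial results to the driver but pays for an extra shuffle.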

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #63822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63822/consoleFull)** for PR 14359 at commit

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 None of us is right. : ( ```isnotnull(coalesce(b#227, c#238))``` does not filter out `NULL`!!! Thus, the right fix is to remove the second condition. ```Scala
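The archive truncates the code of the removed condition, so the sketch below does not reproduce it. Instead, it contrasts two hypothetical checks on a toy expression tree (all names are illustrative assumptions, not Spark's API). `canFilterOutNull` follows the approach of Spark's `EliminateOuterJoin.canFilterOutNull` as we understand it: bind every referenced attribute to `NULL` and treat a `NULL` or `false` result as null-rejecting. `rejectsNullsFromLeft` applies that test only to predicates whose references all come from the left side, while `overlapsLeft` accepts any predicate that merely touches a left attribute.

```scala
object NullRejectionSketch {
  // Toy expression tree (hypothetical; loosely mirrors Catalyst concepts, not its API).
  sealed trait Expr { def references: Set[String] }
  final case class Attr(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }
  final case class Coalesce(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  sealed trait Pred { def references: Set[String] }
  final case class IsNotNull(child: Expr) extends Pred {
    def references: Set[String] = child.references
  }
  final case class GreaterThan(child: Expr, bound: Int) extends Pred {
    def references: Set[String] = child.references
  }

  type Row = Map[String, Option[Int]] // None models SQL NULL

  def eval(e: Expr, row: Row): Option[Int] = e match {
    case Attr(n)        => row(n)
    case Coalesce(l, r) => eval(l, row).orElse(eval(r, row))
  }

  // Three-valued result: None models a NULL predicate value.
  def evalPred(p: Pred, row: Row): Option[Boolean] = p match {
    case IsNotNull(c)          => Some(eval(c, row).isDefined)
    case GreaterThan(c, bound) => eval(c, row).map(_ > bound) // NULL > x is NULL
  }

  // Bind every referenced attribute to NULL; a NULL or false result means the
  // predicate rejects a row in which ALL of those attributes are NULL.
  def canFilterOutNull(p: Pred): Boolean = {
    val allNull: Row = p.references.map(_ -> Option.empty[Int]).toMap
    !evalPred(p, allNull).contains(true)
  }

  // Sound: only predicates that reference the left side exclusively count.
  def rejectsNullsFromLeft(p: Pred, leftOutput: Set[String]): Boolean =
    p.references.subsetOf(leftOutput) && canFilterOutNull(p)

  // Unsound: touching any left attribute is treated as enough.
  def overlapsLeft(p: Pred, leftOutput: Set[String]): Boolean =
    p.references.intersect(leftOutput).nonEmpty

  def main(args: Array[String]): Unit = {
    val left = Set("b") // c comes from the right side of the full outer join
    val p = IsNotNull(Coalesce(Attr("b"), Attr("c")))
    println(s"row b=NULL, c=2 passes: ${evalPred(p, Map("b" -> None, "c" -> Some(2)))}")
    println(s"overlapsLeft: ${overlapsLeft(p, left)}")
    println(s"rejectsNullsFromLeft: ${rejectsNullsFromLeft(p, left)}")
  }
}
```

For `isnotnull(coalesce(b, c))`, `overlapsLeft` returns `true` even though a row with `b = NULL` and `c = 2` passes the filter, so it would wrongly let the outer join be narrowed. Note that `canFilterOutNull` alone is also `true` here (the all-`NULL` row is rejected), which is why the reference-subset guard matters.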

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Sorry for the long delay; I've been swamped by other things for a while. Re-emerging... I switched to Stack and then realized Stack has been deprecated in Scala 2.11, so I reverted to

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 I found the root cause. None of us is right. : (

[GitHub] spark issue #14647: [WIP][Test only][DEMO][SPARK-6235]Address various 2G lim...

2016-08-15 Thread witgo
Github user witgo commented on the issue: https://github.com/apache/spark/pull/14647 @hvanhovell I will submit some small PRs and provide a higher-level description of them.

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-08-15 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 You are right. I missed that `UnsafeArrayData` is a subclass of `ArrayData`. We can pass `UnsafeArrayData` to a projection. I have one question. When we directly generate `UnsafeArrayData`

[GitHub] spark issue #14616: [SPARK-17034][SQL] adds expression UnresolvedOrdinal to ...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14616 **[Test build #63821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63821/consoleFull)** for PR 14616 at commit

[GitHub] spark issue #14649: [SPARK-17059][SQL] Allow FileFormat to specify partition...

2016-08-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14649 Also, if my understanding is correct, we are picking up only a single file to read the footer (see

[GitHub] spark pull request #14649: [SPARK-17059][SQL] Allow FileFormat to specify pa...

2016-08-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14649#discussion_r74872775 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -423,6 +425,54 @@ class

[GitHub] spark pull request #14649: [SPARK-17059][SQL] Allow FileFormat to specify pa...

2016-08-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14649#discussion_r74872795 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -423,6 +425,54 @@ class

[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14447 **[Test build #63820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63820/consoleFull)** for PR 14447 at commit

[GitHub] spark issue #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics that cr...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14626 Merged build finished. Test PASSed.

[GitHub] spark issue #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics that cr...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63819/

[GitHub] spark issue #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics that cr...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14626 **[Test build #63819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63819/consoleFull)** for PR 14626 at commit

[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...

2016-08-15 Thread keypointt
Github user keypointt commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74871845 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,94 @@ setMethod("predict", signature(object = "KMeansModel"),

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Thank you, @nsyca!

[GitHub] spark issue #13428: [SPARK-12666][CORE] SparkSubmit packages fix for when 'd...

2016-08-15 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/13428 Thanks for the review @JoshRosen, I made the requested changes and tested it out once more. I think it is low risk because it is pretty well isolated to this particular issue and only improves

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Hmm.

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Yep. Here is the output. ```scala scala> val a = Seq((1,2),(2,3)).toDF("a","b").createOrReplaceTempView("A") scala> val b =

[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74870955 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,94 @@ setMethod("predict", signature(object = "KMeansModel"),

[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74870995 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,94 @@ setMethod("predict", signature(object = "KMeansModel"),

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14580 @dongjoon-hyun, could you please try this on your PR? val a = Seq((1,2),(2,3)).toDF("a","b").createOrReplaceTempView("A") val b =

[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74870693 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,94 @@ setMethod("predict", signature(object = "KMeansModel"),

[GitHub] spark issue #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics that cr...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14626 **[Test build #63819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63819/consoleFull)** for PR 14626 at commit

[GitHub] spark issue #14641: [Minor] [SparkR] spark.glm weightCol should in the signa...

2016-08-15 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14641 I think the tests only pass a string, but we should coerce this to be safe. LGTM

[GitHub] spark pull request #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Mode...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14392#discussion_r74870049 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +659,110 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),

[GitHub] spark pull request #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Mode...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14392#discussion_r74869983 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +659,110 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),

[GitHub] spark pull request #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Mode...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14392#discussion_r74869872 --- Diff: R/pkg/R/mllib.R --- @@ -526,6 +533,24 @@ setMethod("write.ml", signature(object = "KMeansModel", path = "character"),

[GitHub] spark pull request #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Mode...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14392#discussion_r74869843 --- Diff: R/pkg/R/generics.R --- @@ -1279,6 +1279,13 @@ setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("s #' @export

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74869802 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #14506: [SPARK-16916][SQL] serde/storage properties should not h...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14506 **[Test build #63818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63818/consoleFull)** for PR 14506 at commit

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-08-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13758 You can take a look at `GenerateUnsafeProjection`: if the `ArrayData` is already an unsafe array, we copy it directly, so no iteration is needed.

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74869681 --- Diff: R/pkg/R/mllib.R --- @@ -605,6 +701,69 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula

[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63817/ Test FAILed.

[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #63817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63817/consoleFull)** for PR 8880 at commit

[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Merged build finished. Test FAILed.

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74869578 --- Diff: R/pkg/R/mllib.R --- @@ -605,6 +701,69 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
