[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11428


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11428#issuecomment-190601335
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-13232][YARN] Fix executor node label

2016-02-29 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11129#issuecomment-190599752
  
Any further updates on it? CC @sryza about this.





[GitHub] spark pull request: [SPARK-13548][BUILD] Move tags and unsafe modu...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11426#issuecomment-190597216
  
**[Test build #2596 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2596/consoleFull)**
 for PR 11426 at commit 
[`0c967cc`](https://github.com/apache/spark/commit/0c967cc95ce8b709a7c06ec442f9991fe40d9b4e).





[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/10940#issuecomment-190593882
  
Had an offline discussion with @dbtsai and @coderxiang. We agreed to keep 
the current behavior and have it well documented. I will mark this JIRA as 
"Won't Fix" and have created SPARK-13590 for documentation and logging 
improvements.

@coderxiang Do you mind closing this PR?





[GitHub] spark pull request: [SPARK-6735][YARN] Add window based executor f...

2016-02-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10241#discussion_r54530945
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -152,7 +164,17 @@ private[yarn] class YarnAllocator(
 
   def getNumExecutorsRunning: Int = numExecutorsRunning
 
-  def getNumExecutorsFailed: Int = numExecutorsFailed
+  def getNumExecutorsFailed: Int = synchronized {
+    val endTime = clock.getTimeMillis()
+
+    while (executorFailuresValidityInterval > 0
+      && failedExecutorsTimeStamps.nonEmpty
--- End diff --

Sure, I will add it.
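
For readers following the diff, the window-based failure count it introduces can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the quoted diff is cut off after the second loop condition, so the expiry comparison, the class name `WindowedFailureCounter`, and the explicit `nowMs` parameter are all assumptions made here for the example.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the allocator's failure tracking; not Spark's actual class.
class WindowedFailureCounter(validityIntervalMs: Long) {
  // Timestamps of recorded failures, oldest first.
  private val failureTimestamps = mutable.Queue[Long]()

  def recordFailure(nowMs: Long): Unit = synchronized {
    failureTimestamps.enqueue(nowMs)
  }

  // Drops failures older than the validity interval, then returns how many remain.
  // An interval <= 0 disables expiry, so every recorded failure is counted.
  // The third loop condition is assumed; the quoted diff is truncated before it.
  def numFailures(nowMs: Long): Int = synchronized {
    while (validityIntervalMs > 0
      && failureTimestamps.nonEmpty
      && failureTimestamps.head < nowMs - validityIntervalMs) {
      failureTimestamps.dequeue()
    }
    failureTimestamps.size
  }
}
```

Passing the current time in explicitly plays the role that the injected `clock` appears to play in the diff: it keeps the expiry logic testable without sleeping.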





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190591444
  
**[Test build #52227 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52227/consoleFull)**
 for PR 11136 at commit 
[`007a4ec`](https://github.com/apache/spark/commit/007a4ec324db273c048ed65fe8942daba0c9d844).





[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11411#issuecomment-190587137
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52221/
Test PASSed.





[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11411#issuecomment-190587133
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11411#issuecomment-190586830
  
**[Test build #52221 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52221/consoleFull)**
 for PR 11411 at commit 
[`9c3a8c3`](https://github.com/apache/spark/commit/9c3a8c34117f081600f54bd774e58e3ca93aa4ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54529927
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
         p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
     assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+    val df = sqlContext.range(1).limit(100).sort(col("id"))
+    val plan = df.queryExecution.executedPlan
+
+    assert(plan.find(p =>
+      p.isInstanceOf[WholeStageCodegen] &&
+        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

Agreed. Let me remove this later.





[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11436#issuecomment-190586185
  
**[Test build #52226 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52226/consoleFull)**
 for PR 11436 at commit 
[`f8cccea`](https://github.com/apache/spark/commit/f8cccea3641e17ad656892c42642678f1a86af5b).





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54529803
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
         p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
     assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+    val df = sqlContext.range(1).limit(100).sort(col("id"))
+    val plan = df.queryExecution.executedPlan
+
+    assert(plan.find(p =>
+      p.isInstanceOf[WholeStageCodegen] &&
+        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

These kinds of tests are easy to break; we may not need this.





[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-190585130
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-190585131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52218/
Test PASSed.





[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-190585033
  
**[Test build #52218 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52218/consoleFull)**
 for PR 11419 at commit 
[`bbf9432`](https://github.com/apache/spark/commit/bbf9432646c4d573606ce9b21d88bd04069ca802).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11048#issuecomment-190584968
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52217/
Test PASSed.





[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11048#issuecomment-190584967
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54529510
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
         p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
     assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+    val df = sqlContext.range(1).limit(100).sort(col("id"))
+    val plan = df.queryExecution.executedPlan
+
+    assert(plan.find(p =>
+      p.isInstanceOf[WholeStageCodegen] &&
+        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

Yeah, we can't leave the limit as the last operator; otherwise it will be 
transformed into a collect limit, so I added a sort here. I will remove it 
once I am back at my laptop (in a few hours).





[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11048#issuecomment-190584474
  
**[Test build #52217 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52217/consoleFull)**
 for PR 11048 at commit 
[`6032268`](https://github.com/apache/spark/commit/603226830dc8aee52ca957c60f15cb164f10fb90).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10506#issuecomment-190579869
  
**[Test build #52225 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52225/consoleFull)**
 for PR 10506 at commit 
[`7cec07c`](https://github.com/apache/spark/commit/7cec07c59ffb73261c743c5dffd5ea262ca9c0dc).





[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11274#issuecomment-190577409
  
**[Test build #52224 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52224/consoleFull)**
 for PR 11274 at commit 
[`1a1452e`](https://github.com/apache/spark/commit/1a1452e8fbcf15314da30b3342dec1bafca012a6).





[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11274#issuecomment-190576044
  
**[Test build #2595 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2595/consoleFull)**
 for PR 11274 at commit 
[`ca8fe0f`](https://github.com/apache/spark/commit/ca8fe0f5f55cabb1bb5903c3e85c150b31eaa7c7).





[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10506#issuecomment-190575725
  
**[Test build #52223 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52223/consoleFull)**
 for PR 10506 at commit 
[`a117dcd`](https://github.com/apache/spark/commit/a117dcdcfc4ebffb5aa338ead75fbc03515a2db5).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10506#issuecomment-190575732
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52223/
Test FAILed.





[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10506#issuecomment-190575730
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11428#issuecomment-190575477
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

2016-02-29 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/10506#issuecomment-190575489
  
@andrewor14, would you please review this patch again? It has been pending 
here for a long time, and I think this is actually a bug. Thanks a lot.





[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11428#issuecomment-190575220
  
**[Test build #52220 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52220/consoleFull)**
 for PR 11428 at commit 
[`4f0e3b9`](https://github.com/apache/spark/commit/4f0e3b92549d832936407cd7f2b3d334b087e5a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaBisectingKMeansExample `





[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11428#issuecomment-190575482
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52220/
Test PASSed.





[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11274#issuecomment-190575463
  
**[Test build #2594 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2594/consoleFull)**
 for PR 11274 at commit 
[`ca8fe0f`](https://github.com/apache/spark/commit/ca8fe0f5f55cabb1bb5903c3e85c150b31eaa7c7).





[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10506#issuecomment-190575051
  
**[Test build #52223 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52223/consoleFull)**
 for PR 10506 at commit 
[`a117dcd`](https://github.com/apache/spark/commit/a117dcdcfc4ebffb5aa338ead75fbc03515a2db5).





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190575061
  
**[Test build #5 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)**
 for PR 11391 at commit 
[`b64e52d`](https://github.com/apache/spark/commit/b64e52d189e5041cc1af2ebb0d656c5f5c12c82d).





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54528196
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
         p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
     assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+    val df = sqlContext.range(1).limit(100).sort(col("id"))
+    val plan = df.queryExecution.executedPlan
+
+    assert(plan.find(p =>
+      p.isInstanceOf[WholeStageCodegen] &&
+        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

The sort is not related to the limit. Could you remove it from this PR? (We 
may revert the commit for sort.)





[GitHub] spark pull request: [SPARK-13444] [MLlib] QuantileDiscretizer choo...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11402#issuecomment-190571986
  
@oliverpierson I haven't seen this test fail in the master build. If I'm 
correct, we control the random seed in the master branch, which results in 
deterministic behavior, but we don't have that in branch-1.6. If that is the 
case, we can either backport the commit that implements `setSeed` 
(https://github.com/apache/spark/commit/574571c87098795a2206a113ee9ed4bafba8f00f) 
as is, or backport it but hide the public APIs and fix the seed on branch-1.6 (so we 
don't expose new APIs). @srowen Which one do you prefer?
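
As an illustration of why a fixed seed makes such a test deterministic, here is a minimal sketch in plain Scala. It is not the `QuantileDiscretizer` code from the PR; `SeededSampling` and `sampleIndices` are hypothetical names, and the sampling scheme is only an assumption chosen to keep the example small.

```scala
import scala.util.Random

object SeededSampling {
  // Draw `n` indices in [0, size) from a generator created with an explicit seed.
  // Hypothetical helper: it only demonstrates that a fixed seed fixes the draw.
  def sampleIndices(size: Int, n: Int, seed: Long): Vector[Int] = {
    val rng = new Random(seed)
    Vector.fill(n)(rng.nextInt(size))
  }
}

// Two runs with the same seed pick exactly the same indices,
// so a test built on the sample cannot flip between runs.
val first = SeededSampling.sampleIndices(1000, 10, seed = 42L)
val second = SeededSampling.sampleIndices(1000, 10, seed = 42L)
println(first == second)
```

Without an explicit seed, each run would start from a different generator state, which is the kind of flakiness the comment attributes to branch-1.6.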





[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...

2016-02-29 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11440#issuecomment-190570144
  
For example, suppose your sliding duration is 1, window duration is 4, batch 
duration is 1, and the down time is 3. If you skip these 3 batches, IIUC 
the result will be wrong.
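
The numbers in this example can be played through with a small plain-Scala sketch. It does not use Spark Streaming; `WindowSkipSketch` and `windowSum` are hypothetical names, and it assumes that a skipped batch simply contributes no data to any window that covers it.

```scala
object WindowSkipSketch {
  // Sum of the batch values in the window of `windowLen` batches ending at time `t`.
  // Batches that are absent from `batches` (for example skipped ones) contribute 0.
  def windowSum(batches: Map[Int, Int], t: Int, windowLen: Int): Int =
    ((t - windowLen + 1) to t).map(i => batches.getOrElse(i, 0)).sum

  // Batch duration 1: every batch time 1..8 carries the value 10.
  val allBatches: Map[Int, Int] = (1 to 8).map(t => t -> 10).toMap

  // Down time of 3 batches (times 4, 5 and 6), assumed to be skipped entirely.
  val withSkips: Map[Int, Int] = allBatches.filter { case (t, _) => t < 4 || t > 6 }
}

// Window duration 4, evaluated at batch time 7 (covers times 4..7).
println(WindowSkipSketch.windowSum(WindowSkipSketch.allBatches, 7, 4))
println(WindowSkipSketch.windowSum(WindowSkipSketch.withSkips, 7, 4))
```

Under that assumption the window ending at time 7 sees only one of its four batches, which is the discrepancy the comment is pointing at.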





[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11428#issuecomment-190569078
  
**[Test build #52220 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52220/consoleFull)**
 for PR 11428 at commit 
[`4f0e3b9`](https://github.com/apache/spark/commit/4f0e3b92549d832936407cd7f2b3d334b087e5a3).





[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11411#issuecomment-190569087
  
**[Test build #52221 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52221/consoleFull)**
 for PR 11411 at commit 
[`9c3a8c3`](https://github.com/apache/spark/commit/9c3a8c34117f081600f54bd774e58e3ca93aa4ba).





[GitHub] spark pull request: [SPARK-13551] [MLLib] Fix wrong comment and re...

2016-02-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11429





[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...

2016-02-29 Thread jeanlyn
Github user jeanlyn commented on the pull request:

https://github.com/apache/spark/pull/11440#issuecomment-190568465
  
Thanks @jerryshao for the suggestion!
> Jobs generated in the down time can be used for WAL replay, did you test 
when these down jobs are removed, the behavior of WAL replay is still correct?

It seems that `pendingTimes` is used for WAL replay; I do not skip these 
batches.

> Also for some windowing operations, I think this removal of down time 
jobs may possibly lead to the inconsistent result of windowing aggregation.

Does inconsistent result mean wrong result?

Also, I will run the unit tests on my local machine with the config set to 
true by default.








[GitHub] spark pull request: [SPARK-13551] [MLLib] Fix wrong comment and re...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11429#issuecomment-190568451
  
Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11411#issuecomment-190568059
  
ok to test





[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11428#issuecomment-190567826
  
@srowen It would be nice to have example code in the user guide for every 
algorithm. And this PR helps.





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190567560
  
**[Test build #52219 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52219/consoleFull)**
 for PR 11391 at commit 
[`8d254d2`](https://github.com/apache/spark/commit/8d254d206686dd9c6edd053d4abcd184799fcc2a).





[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11428#issuecomment-190567858
  
ok to test





[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-190567555
  
**[Test build #52218 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52218/consoleFull)**
 for PR 11419 at commit 
[`bbf9432`](https://github.com/apache/spark/commit/bbf9432646c4d573606ce9b21d88bd04069ca802).





[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-190566838
  
ok to test





[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-190566812
  
add to whitelist





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54526775
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java ---
@@ -35,6 +35,8 @@
   // used when there is no column in output
   protected UnsafeRow unsafeRow = new UnsafeRow(0);
 
+  protected boolean stopEarly = false;
--- End diff --

yeah. I am updating it.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190563231
  
**[Test build #2593 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2593/consoleFull)**
 for PR 11437 at commit 
[`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).





[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...

2016-02-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11438#issuecomment-190562653
  
Rebased to trigger the Jenkins test.





[GitHub] spark pull request: [SPARK-12977][Streaming][WIP] Support Streamin...

2016-02-29 Thread jerryshao
Github user jerryshao closed the pull request at:

https://github.com/apache/spark/pull/10966





[GitHub] spark pull request: [SPARK-12941][SQL][BRANCH-1.4] Spark-SQL JDBC ...

2016-02-29 Thread thomastechs
Github user thomastechs commented on a diff in the pull request:

https://github.com/apache/spark/pull/10912#discussion_r54525699
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -445,4 +445,9 @@ class JDBCSuite extends SparkFunSuite with BeforeAndAfter {
     assert(agg.getCatalystType(1, "", 1, null) == Some(StringType))
   }
 
+  test("OracleDialect type mapping") {
+    val oracleDialect = JdbcDialects.get("jdbc:oracle://127.0.0.1/db")
+    assert(oracleDialect.getJDBCType(StringType).
+      map(_.databaseTypeDefinition).get == "VARCHAR2(255)")
+  }
--- End diff --

Okay @yhuai. So I shall submit another PR for the same JIRA, with the 
updates to JDBCSuite.scala in the master branch. I shall also close this PR.





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54525690
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java ---
@@ -35,6 +35,8 @@
   // used when there is no column in output
   protected UnsafeRow unsafeRow = new UnsafeRow(0);
 
+  protected boolean stopEarly = false;
--- End diff --

We could use `addMutableState`





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54525659
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java ---
@@ -35,6 +35,8 @@
   // used when there is no column in output
   protected UnsafeRow unsafeRow = new UnsafeRow(0);
 
+  protected boolean stopEarly = false;
--- End diff --

Since `stopEarly` is only accessed by generated functions, we don't need this 
anymore.





[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-02-29 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10935#issuecomment-190554648
  
Using these two functionally equivalent code snippets:

Scala
```
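import org.apache.spark.sql.functions.col
import sqlContext.implicits._  // provides toDF; already in scope in spark-shell

val data = Seq((1, "1"), (2, "2"), (3, "2"), (1, "3")).toDF("a", "b")
val my_filter = sqlContext.udf.register("my_filter", (a: Int) => a == 1)
data.select(col("a")).distinct().filter(my_filter(col("a")))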
```

Python
```
from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

data = sqlContext.createDataFrame([(1, "1"), (2, "2"), (3, "2"), (1, "3")], ["a", "b"])
my_filter = udf(lambda a: a == 1, BooleanType())
data.select(col("a")).distinct().filter(my_filter(col("a")))
```

The logical plan that comes out of `execute(aggregateCondition)` here is as 
below:


https://github.com/apache/spark/blob/916fc34f98dd731f607d9b3ed657bad6cc30df2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L801

Scala
```
Aggregate [a#8], [UDF(a#8) AS havingCondition#11]
+- Project [a#8]
   +- Project [_1#6 AS a#8,_2#7 AS b#9]
      +- LocalRelation [_1#6,_2#7], [[1,1],[2,2],[3,2],[1,3]]
```

Python
```
Project [havingCondition#2]
+- Aggregate [a#0L], [pythonUDF#3 AS havingCondition#2]
   +- EvaluatePython PythonUDF#(a#0L), pythonUDF#3: boolean
      +- Project [a#0L]
         +- LogicalRDD [a#0L,b#1], MapPartitionsRDD[4] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
```
We can see that in Python's case an extra Project is injected when 
`execute(aggregateCondition)` goes through ExtractPythonUDFs, but 
ResolveAggregateFunctions expects an Aggregate here:


https://github.com/apache/spark/blob/916fc34f98dd731f607d9b3ed657bad6cc30df2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L801-L805


With this fix, the logical plan generated for Python UDFs does not 
construct a Project if it is an Aggregate, making it consistent with its Scala 
counterpart, which gives correct results for ResolveAggregateFunctions to 
consume:

After fix, Python:
```
Aggregate [a#0L], [pythonUDF#3 AS havingCondition#2]
+- EvaluatePython PythonUDF#(a#0L), pythonUDF#3: boolean
   +- Project [a#0L]
      +- LogicalRDD [a#0L,b#1], MapPartitionsRDD[4] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
```
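
The before/after difference can be modelled with a toy plan tree in plain Scala. This is only a sketch of the idea described above, not Catalyst or the actual `ExtractPythonUDFs` rule; every name in `PlanSketch` is hypothetical.

```scala
object PlanSketch {
  // A toy logical-plan tree; these case classes are illustrative, not Catalyst's.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan
  case class Aggregate(grouping: Seq[String], output: Seq[String], child: Plan) extends Plan
  case class EvaluateUdf(udf: String, child: Plan) extends Plan

  // Behaviour described before the fix: the rewritten Aggregate ends up under a Project.
  def extractAlwaysProject(plan: Plan, udf: String): Plan = plan match {
    case Aggregate(g, o, child) => Project(o, Aggregate(g, o, EvaluateUdf(udf, child)))
    case other => EvaluateUdf(udf, other)
  }

  // Behaviour described after the fix: an Aggregate stays at the root of the plan.
  def extractKeepAggregate(plan: Plan, udf: String): Plan = plan match {
    case Aggregate(g, o, child) => Aggregate(g, o, EvaluateUdf(udf, child))
    case other => EvaluateUdf(udf, other)
  }

  // Stand-in for the check a downstream rule would make on the plan root.
  def rootIsAggregate(plan: Plan): Boolean = plan.isInstanceOf[Aggregate]
}

import PlanSketch._
val agg = Aggregate(Seq("a"), Seq("havingCondition"), Project(Seq("a"), Relation("data")))
println(rootIsAggregate(extractAlwaysProject(agg, "my_filter")))
println(rootIsAggregate(extractKeepAggregate(agg, "my_filter")))
```

A consumer that pattern-matches on an Aggregate at the root only succeeds with the second variant, which mirrors the mismatch between the Python and Scala plans shown above.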





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54524989
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -0,0 +1,577 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import breeze.stats.distributions.{Gaussian => GD}
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.PredictorParams
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.optim._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.linalg.{BLAS, Vector}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+
+/**
+ * Params for Generalized Linear Regression.
+ */
+private[regression] trait GeneralizedLinearRegressionBase extends PredictorParams
+  with HasFitIntercept with HasMaxIter with HasTol with HasRegParam with HasWeightCol
+  with HasSolver with Logging {
+
+  /**
+   * Param for the name of family which is a description of the error distribution
+   * to be used in the model.
+   * Supported options: "gaussian", "binomial", "poisson" and "gamma".
+   * Default is "gaussian".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val family: Param[String] = new Param(this, "family",
+    "The name of family which is a description of the error distribution to be used in the " +
+      "model. Supported options: gaussian(default), binomial, poisson and gamma.",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedFamilyNames.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getFamily: String = $(family)
+
+  /**
+   * Param for the name of link function which provides the relationship
+   * between the linear predictor and the mean of the distribution function.
+   * Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val link: Param[String] = new Param(this, "link", "The name of link function " +
+    "which provides the relationship between the linear predictor and the mean of the " +
+    "distribution function. Supported options: identity, log, inverse, logit, probit, " +
+    "cloglog and sqrt.",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedLinkNames.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getLink: String = $(link)
+
+  import GeneralizedLinearRegression._
+
+  @Since("2.0.0")
+  override def validateParams(): Unit = {
+    if ($(solver) == "irls") {
+      setDefault(maxIter -> 25)
+    }
+    if (isDefined(link)) {
+      require(supportedFamilyAndLinkPairs.contains(
+        Family.fromName($(family)) -> Link.fromName($(link))), "Generalized Linear Regression " +
+        s"with ${$(family)} family does not support ${$(link)} link function.")
+    }
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * Fit a Generalized Linear Model ([[https://en.wikipedia.org/wiki/Generalized_linear_model]])
+ * specified by giving a symbolic description of the linear predictor (link function) and
+ * a description of the error distribution (family).
+ * It supports "gaussian", "binomial", "poisson" and "gamma" as family.
+ * Valid link functions for each family is listed below. The first link function of each family
+ * is the default one.
+ *  - "gaussian" -> "identity", "log", "inverse"
+ *  - "binomial" -> "logit", "probit", "cloglog"
+ *  - "poisson"  -> "log", "identity", "sqrt"
+ *  - "gamma"    -> "inverse", "identity", "log"
+ */
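
The family/link table in the Scaladoc above, together with the check in `validateParams`, can be sketched in plain Scala as follows. This is not the PR's implementation (which works with `Family` and `Link` objects); `GlmLinkSketch`, `defaultLink` and `isSupported` are hypothetical names used only to show the lookup.

```scala
object GlmLinkSketch {
  // Valid links per family as listed in the Scaladoc; the first entry is the default.
  val supportedLinks: Map[String, Seq[String]] = Map(
    "gaussian" -> Seq("identity", "log", "inverse"),
    "binomial" -> Seq("logit", "probit", "cloglog"),
    "poisson"  -> Seq("log", "identity", "sqrt"),
    "gamma"    -> Seq("inverse", "identity", "log"))

  // Default link of a family, or None for an unknown family name.
  def defaultLink(family: String): Option[String] =
    supportedLinks.get(family).flatMap(_.headOption)

  // True when the (family, link) pair appears in the table above.
  def isSupported(family: String, link: String): Boolean =
    supportedLinks.get(family).exists(_.contains(link))
}

println(GlmLinkSketch.defaultLink("gamma"))
println(GlmLinkSketch.isSupported("gaussian", "logit"))
```

An unsupported pair such as gaussian with logit is the case for which the diff's `require` call would raise an error.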

[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-02-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/10940#issuecomment-190553302
  
@coderxiang @dbtsai Sorry for the late response! I actually thought this PR 
had already been merged ... Anyway, I tested `glmnet` and found that `glmnet` 
outputs zero coefficients for constant columns regardless of intercept, 
regularization, and standardization settings. I thought about it today and I 
feel it actually makes sense. If we have a constant column in our training 
data, do we expect it to change or stay constant in test data? If its value 
might change, we should set its coefficient to zero because we cannot estimate 
how big the change would be. If its value stays constant (or maybe users 
created this column to add bias manually), it shouldn't be regularized and 
users should really turn on `fitIntercept` instead. So my suggestion is to 
follow glmnet and set the coefficients of constant columns to zero regardless 
of other settings. If there are constant columns and `fitIntercept` is false, 
we should output a warning message. Does it sound good to you?
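
The suggestion can be sketched in plain Scala as below. This is only an illustration of the proposed behavior, not Spark's logistic regression code or glmnet; `ConstantColumnSketch` and its methods are hypothetical names, and exact equality on doubles is an assumption made for brevity.

```scala
object ConstantColumnSketch {
  // Indices of the columns whose value is identical in every row.
  def constantColumns(rows: Seq[Array[Double]]): Set[Int] =
    if (rows.isEmpty) Set.empty
    else rows.head.indices.filter(j => rows.forall(r => r(j) == rows.head(j))).toSet

  // Force the coefficients of constant columns to zero and keep the others unchanged.
  def zeroConstant(coefficients: Array[Double], rows: Seq[Array[Double]]): Array[Double] = {
    val constant = constantColumns(rows)
    coefficients.zipWithIndex.map { case (c, j) => if (constant(j)) 0.0 else c }
  }

  // Warning text for constant columns combined with fitIntercept = false; None otherwise.
  def warning(rows: Seq[Array[Double]], fitIntercept: Boolean): Option[String] = {
    val constant = constantColumns(rows)
    if (constant.nonEmpty && !fitIntercept)
      Some(s"Constant columns ${constant.toSeq.sorted.mkString(", ")} found while fitIntercept is false")
    else None
  }
}

// Column 0 is constant, column 1 varies.
val rows = Seq(Array(1.0, 5.0), Array(1.0, 7.0))
println(ConstantColumnSketch.zeroConstant(Array(2.0, 3.0), rows).mkString(", "))
println(ConstantColumnSketch.warning(rows, fitIntercept = false))
```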





[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11436#issuecomment-190552987
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52212/
Test FAILed.





[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11436#issuecomment-190552985
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11436#issuecomment-190552868
  
**[Test build #52212 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52212/consoleFull)**
 for PR 11436 at commit 
[`50f66d1`](https://github.com/apache/spark/commit/50f66d18a5b836a8012e171a1ece8bea83c60e19).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13457][SQL] Removes DataFrame RDD opera...

2016-02-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11388#discussion_r54524553
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -1427,30 +1427,6 @@ class DataFrame private[sql](
   def transform[U](t: DataFrame => DataFrame): DataFrame = t(this)
 
   /**
-   * Returns a new RDD by applying a function to all rows of this DataFrame.
-   * @group rdd
-   * @since 1.3.0
-   */
-  def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
--- End diff --

Good question... Checked the Jenkins MiMA section of the build log of this 
PR, didn't see any lines related to DataFrame.





[GitHub] spark pull request: [SPARK-13457][SQL] Removes DataFrame RDD opera...

2016-02-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11388#discussion_r54524448
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala ---
@@ -85,7 +85,8 @@ final class RegressionEvaluator @Since("1.4.0") (@Since("1.4.0") override val ui
 
     val predictionAndLabels = dataset
       .select(col($(predictionCol)).cast(DoubleType), col($(labelCol)).cast(DoubleType))
-      .map { case Row(prediction: Double, label: Double) =>
+      .rdd.
--- End diff --

Thanks, will fix this in future PRs.





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190551990
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190551991
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52213/
Test PASSed.





[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190551827
  
**[Test build #52213 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52213/consoleFull)**
 for PR 11391 at commit 
[`c887cf4`](https://github.com/apache/spark/commit/c887cf47a36da8d34e33afeca273f415df629fbb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12893][YARN] Fix history URL redirect e...

2016-02-29 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/10821#issuecomment-190551333
  
@steveloughran, here "1" is the number of attempts 
[here](https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/historypage.js#L126),
 and it is used to generate a URL 
[here](https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/historypage-template.html#L67).
 Also, in the YARN code, this "1" or "2" is taken from the 
[attempt id](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L276).

The URL built by appending this "1" or "2" as the attempt id is not accessible 
in my local test.





[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11048#issuecomment-190548517
  
**[Test build #52217 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52217/consoleFull)**
 for PR 11048 at commit 
[`6032268`](https://github.com/apache/spark/commit/603226830dc8aee52ca957c60f15cb164f10fb90).





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190538874
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52216/
Test FAILed.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190538867
  
**[Test build #52216 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52216/consoleFull)**
 for PR 11136 at commit 
[`31a912c`](https://github.com/apache/spark/commit/31a912cd74cf3dffbf8cc0af8c57b777d49579eb).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11359#discussion_r54522189
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -93,4 +97,74 @@ case class Sort(
   sortedIterator
 }
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  // Name of sorter variable used in codegen.
+  private var sorterVariable: String = _
+
+  override protected def doProduce(ctx: CodegenContext): String = {
+val needToSort = ctx.freshName("needToSort")
+ctx.addMutableState("boolean", needToSort, s"$needToSort = true;")
+
+
+// Initialize the class member variables. This includes the instance of the Sorter and
+// the iterator to return sorted rows.
+val thisPlan = ctx.addReferenceObj("plan", this)
+sorterVariable = ctx.freshName("sorter")
+ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
+  s"$sorterVariable = $thisPlan.createSorter();")
+val metrics = ctx.freshName("metrics")
+ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
+  s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
+val sortedIterator = ctx.freshName("sortedIter")
+ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+
+val addToSorter = ctx.freshName("addToSorter")
+ctx.addNewFunction(addToSorter,
+  s"""
+| private void $addToSorter() throws java.io.IOException {
+|   ${child.asInstanceOf[CodegenSupport].produce(ctx, this)}
+| }
+  """.stripMargin.trim)
+
+val outputRow = ctx.freshName("outputRow")
+val dataSize = metricTerm(ctx, "dataSize")
+val spillSize = metricTerm(ctx, "spillSize")
+val spillSizeBefore = ctx.freshName("spillSizeBefore")
+s"""
+   | if ($needToSort) {
+   |   $addToSorter();
+   |   Long $spillSizeBefore = $metrics.memoryBytesSpilled();
+   |   $sortedIterator = $sorterVariable.sort();
+   |   $dataSize.add($sorterVariable.getPeakMemoryUsage());
+   |   $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore);
+   |   $metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage());
+   |   $needToSort = false;
+   | }
+   |
+   | while ($sortedIterator.hasNext()) {
+   |   UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next();
+   |   ${consume(ctx, null, outputRow)}
+   |   if (shouldStop()) return;
+   | }
+ """.stripMargin.trim
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
+val colExprs = child.output.zipWithIndex.map { case (attr, i) =>
+  BoundReference(i, attr.dataType, attr.nullable)
+}
+
+ctx.currentVars = input
+val code = GenerateUnsafeProjection.createCode(ctx, colExprs)
+
+s"""
+   | // Convert the input attributes to an UnsafeRow and add it to the sorter
+   | ${code.code}
--- End diff --

This may cause a performance regression: when Sort is on top of Exchange (or 
another operator that produces UnsafeRow), we will create variables from the 
UnsafeRow, then create another UnsafeRow from those variables.

See https://github.com/apache/spark/pull/11008#discussion_r53856345

@yhuai Should we revert this patch or fix this in a follow-up PR?
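
To illustrate the double-conversion concern raised in the comment above, here is a minimal sketch in plain Scala. It assumes toy row types (`ToyRow`, `GenericToyRow`, `BinaryToyRow`) and a hypothetical `SortInputSketch` object; none of these are Spark classes, and this is not necessarily how the issue was or should be fixed in the PR. It only shows the difference between always re-encoding a row and passing through a row that is already in the target format.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's InternalRow or UnsafeRow.
sealed trait ToyRow { def values: Vector[Any] }
final case class GenericToyRow(values: Vector[Any]) extends ToyRow
final case class BinaryToyRow(values: Vector[Any]) extends ToyRow

object SortInputSketch {
  // Unpacks the row into its values and re-encodes them unconditionally.
  // When the input is already binary, this is the redundant round trip
  // (row -> variables -> row) that the comment describes.
  def alwaysProject(row: ToyRow): BinaryToyRow = BinaryToyRow(row.values)

  // Hypothetical alternative: pass an already-binary row through untouched
  // and only encode rows that arrive in another format.
  def projectIfNeeded(row: ToyRow): BinaryToyRow = row match {
    case binary: BinaryToyRow => binary
    case other                => BinaryToyRow(other.values)
  }
}
```

With this sketch, `projectIfNeeded` returns the same object it was given when the input is a `BinaryToyRow`, while `alwaysProject` allocates an equal but distinct copy every time.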





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190538871
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...

2016-02-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11438#issuecomment-190537588
  
It seems that Jenkins is failing for unrelated reasons, such as the following.
```
Error instrumenting class:org.apache.spark.mllib.regression.IsotonicRegressionModel$SaveLoadV1_0$
...
```
Other PRs' tests fail with similar logs. Should we wait a while and 
re-trigger the tests?





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190537367
  
**[Test build #52216 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52216/consoleFull)**
 for PR 11136 at commit 
[`31a912c`](https://github.com/apache/spark/commit/31a912cd74cf3dffbf8cc0af8c57b777d49579eb).





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54521794
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
@@ -0,0 +1,499 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import scala.util.Random
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.MLTestingUtils
+import org.apache.spark.mllib.classification.LogisticRegressionSuite._
+import org.apache.spark.mllib.linalg.{BLAS, DenseVector, Vectors}
+import org.apache.spark.mllib.random._
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
+import org.apache.spark.sql.{DataFrame, Row}
+
+class GeneralizedLinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  private val seed: Int = 42
+  @transient var datasetGaussianIdentity: DataFrame = _
+  @transient var datasetGaussianLog: DataFrame = _
+  @transient var datasetGaussianInverse: DataFrame = _
+  @transient var datasetBinomial: DataFrame = _
+  @transient var datasetPoissonLog: DataFrame = _
+  @transient var datasetPoissonIdentity: DataFrame = _
+  @transient var datasetPoissonSqrt: DataFrame = _
+  @transient var datasetGammaInverse: DataFrame = _
+  @transient var datasetGammaIdentity: DataFrame = _
+  @transient var datasetGammaLog: DataFrame = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+
+import GeneralizedLinearRegressionSuite._
+
+datasetGaussianIdentity = sqlContext.createDataFrame(
+  sc.parallelize(generateGeneralizedLinearRegressionInput(
+intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+family = "gaussian", link = "identity"), 2))
+
+datasetGaussianLog = sqlContext.createDataFrame(
+  sc.parallelize(generateGeneralizedLinearRegressionInput(
+intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+family = "gaussian", link = "log"), 2))
+
+datasetGaussianInverse = sqlContext.createDataFrame(
+  sc.parallelize(generateGeneralizedLinearRegressionInput(
+intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+family = "gaussian", link = "inverse"), 2))
+
+datasetBinomial = {
+  val nPoints = 1
+  val coefficients = Array(-0.57997, 0.912083, -0.371077, -0.819866, 2.688191)
+  val xMean = Array(5.843, 3.057, 3.758, 1.199)
+  val xVariance = Array(0.6856, 0.1899, 3.116, 0.581)
+
+  val testData =
+generateMultinomialLogisticInput(coefficients, xMean, xVariance, true, nPoints, seed)
+
+  sqlContext.createDataFrame(sc.parallelize(testData, 4))
+}
+
+datasetPoissonLog = sqlContext.createDataFrame(
+  sc.parallelize(generateGeneralizedLinearRegressionInput(
+intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+family = "poisson", link = "log"), 2))
+
+datasetPoissonIdentity = sqlContext.createDataFrame(
+  sc.parallelize(generateGeneralizedLinearRegressionInput(
+intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+family = "poisson", link = "identity"), 2))
+
+datasetPoissonSqrt = sqlContext.createDataFrame(
+  

[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...

2016-02-29 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11440#issuecomment-190531231
  
Also, for some windowing operations, I think removing the jobs generated 
during the down time may lead to inconsistent windowed aggregation results.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190530641
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190530637
  
**[Test build #52215 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52215/consoleFull)**
 for PR 11136 at commit 
[`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190530642
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52215/
Test FAILed.





[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...

2016-02-29 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11440#issuecomment-190530543
  
Jobs generated during the down time can be used for WAL replay. Did you test 
whether WAL replay still behaves correctly when these down-time jobs are removed?





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190529580
  
**[Test build #52215 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52215/consoleFull)**
 for PR 11136 at commit 
[`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac).





[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11438#issuecomment-190529756
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11438#issuecomment-190529757
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52203/
Test FAILed.





[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11438#issuecomment-190529610
  
**[Test build #52203 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52203/consoleFull)**
 for PR 11438 at commit 
[`5e82490`](https://github.com/apache/spark/commit/5e82490c007146393c5d326ad23ca53ea41c4208).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190528993
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190527518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52210/
Test PASSed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190527517
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190527389
  
**[Test build #52210 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52210/consoleFull)**
 for PR 9483 at commit 
[`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190527136
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52214/
Test FAILed.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190527127
  
**[Test build #52214 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52214/consoleFull)**
 for PR 11136 at commit 
[`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190527134
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9325][SPARK-R] collect() head() and sho...

2016-02-29 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/11336#issuecomment-190526619
  
@olarayej, I am not sure it is conceptually correct to associate a 
Column with only one DF. Conceptually, a Column could depend on 0, 1, 2, or 
more DataFrames. For example:
c1 <- df1$c1
c2 <- df2$c2
c3 <- c1 + c2
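
The point above can be illustrated with a toy model. This is only a sketch in plain Scala, not SparkR or Spark SQL internals: the names `Col`, `Lit`, `Ref`, `Add`, and `sources` are hypothetical and exist only for this example. Each column expression records the set of frames it was derived from, so a literal depends on none, a plain reference on one, and a combined expression on several.

```scala
object ColumnSourcesSketch {
  // Hypothetical column expression: `sources` is the set of frame names it depends on.
  sealed trait Col { def sources: Set[String] }

  // A literal depends on no frame at all.
  final case class Lit(value: Double) extends Col {
    val sources: Set[String] = Set.empty
  }

  // A direct reference depends on exactly one frame.
  final case class Ref(frame: String, name: String) extends Col {
    val sources: Set[String] = Set(frame)
  }

  // A derived column depends on every frame its operands depend on.
  final case class Add(left: Col, right: Col) extends Col {
    val sources: Set[String] = left.sources ++ right.sources
  }

  // Mirrors the R snippet: c3 is built from columns of two different frames.
  val c1: Col = Ref("df1", "c1")
  val c2: Col = Ref("df2", "c2")
  val c3: Col = Add(c1, c2)
}
```

In this model `c3.sources` contains both `df1` and `df2`, which is why a single owning frame cannot be attached to every column.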






[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190526392
  
**[Test build #52213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52213/consoleFull)** for PR 11391 at commit [`c887cf4`](https://github.com/apache/spark/commit/c887cf47a36da8d34e33afeca273f415df629fbb).





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54519291
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -0,0 +1,499 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import scala.util.Random
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.MLTestingUtils
+import org.apache.spark.mllib.classification.LogisticRegressionSuite._
+import org.apache.spark.mllib.linalg.{BLAS, DenseVector, Vectors}
+import org.apache.spark.mllib.random._
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
+import org.apache.spark.sql.{DataFrame, Row}
+
+class GeneralizedLinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  private val seed: Int = 42
+  @transient var datasetGaussianIdentity: DataFrame = _
+  @transient var datasetGaussianLog: DataFrame = _
+  @transient var datasetGaussianInverse: DataFrame = _
+  @transient var datasetBinomial: DataFrame = _
+  @transient var datasetPoissonLog: DataFrame = _
+  @transient var datasetPoissonIdentity: DataFrame = _
+  @transient var datasetPoissonSqrt: DataFrame = _
+  @transient var datasetGammaInverse: DataFrame = _
+  @transient var datasetGammaIdentity: DataFrame = _
+  @transient var datasetGammaLog: DataFrame = _
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    import GeneralizedLinearRegressionSuite._
+
+    datasetGaussianIdentity = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "identity"), 2))
+
+    datasetGaussianLog = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "log"), 2))
+
+    datasetGaussianInverse = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "inverse"), 2))
+
+    datasetBinomial = {
+      val nPoints = 1
+      val coefficients = Array(-0.57997, 0.912083, -0.371077, -0.819866, 2.688191)
+      val xMean = Array(5.843, 3.057, 3.758, 1.199)
+      val xVariance = Array(0.6856, 0.1899, 3.116, 0.581)
+
+      val testData =
+        generateMultinomialLogisticInput(coefficients, xMean, xVariance, true, nPoints, seed)
+
+      sqlContext.createDataFrame(sc.parallelize(testData, 4))
+    }
+
+    datasetPoissonLog = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "poisson", link = "log"), 2))
+
+    datasetPoissonIdentity = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "poisson", link = "identity"), 2))
+
+    datasetPoissonSqrt = sqlContext.createDataFrame(
+  

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190526388
  
**[Test build #52214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52214/consoleFull)** for PR 11136 at commit [`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac).





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-190526311
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54519070
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---

[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...

2016-02-29 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11436#discussion_r54518895
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -852,18 +878,20 @@ private[spark] class BlockManager(
 Await.ready(replicationFuture, Duration.Inf)
   }
 case _ =>
-  val remoteStartTime = System.currentTimeMillis
-  // Serialize the block if not already done
-  if (bytesAfterPut == null) {
-if (valuesAfterPut == null) {
-  throw new SparkException(
-"Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
+  if (blockWasSuccessfullyStored) {
--- End diff --

/cc @tdas, the goal of this change is to avoid attempting to replicate 
deserialized, memory-only blocks if their initial cache / persist fails due to 
a lack of memory.

For the purposes of this patch, we need to do this to prevent the iterator 
from being consumed so that it can be passed back to the caller. More 
generally, though, I think that we should have this change to avoid OOMs by 
trying to serialize an entire partition which was too large to be stored.
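
The guard described above can be sketched roughly as follows. This is a simplified illustration in plain Scala, not the actual `BlockManager` code: `TinyStore`, `tryStore`, `sizeHint`, `put`, and the `replicated` counter are all hypothetical names, and the store here decides up front from a size hint, which is an assumption made only to keep the example short.

```scala
object ReplicationGuardSketch {
  // Hypothetical memory store whose capacity is measured in number of elements.
  final class TinyStore(capacity: Int) {
    // Right(values) if the block fits and was stored;
    // Left(iterator) if it does not fit, leaving the iterator unconsumed.
    def tryStore(values: Iterator[Int], sizeHint: Int): Either[Iterator[Int], Vector[Int]] =
      if (sizeHint <= capacity) Right(values.toVector) else Left(values)
  }

  // Counts how many blocks were handed to the (imaginary) replication step.
  var replicated: Int = 0

  // Returns None when the block was stored, or Some(iterator) so the caller
  // can still compute from the values when storing failed.
  def put(store: TinyStore, values: Iterator[Int], sizeHint: Int): Option[Iterator[Int]] =
    store.tryStore(values, sizeHint) match {
      case Right(_) =>
        // Replicate only blocks that were successfully stored locally.
        replicated += 1
        None
      case Left(untouched) =>
        // Skip replication: serializing here would consume the iterator
        // and could require as much memory as the failed store did.
        Some(untouched)
    }
}
```

With this shape, a failed store returns the original iterator to the caller without any replication attempt, which is the behaviour the comment argues for.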





[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11436#issuecomment-190522557
  
**[Test build #52212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52212/consoleFull)** for PR 11436 at commit [`50f66d1`](https://github.com/apache/spark/commit/50f66d18a5b836a8012e171a1ece8bea83c60e19).





[GitHub] spark pull request: [SPARK-13586]add config to skip generate down ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11440#issuecomment-190522437
  
Can one of the admins verify this patch?




