[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11428 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11428#issuecomment-190601335 LGTM. Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-13232][YARN] Fix executor node label
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/11129#issuecomment-190599752 Any further updates on it? CC @sryza about this.
[GitHub] spark pull request: [SPARK-13548][BUILD] Move tags and unsafe modu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11426#issuecomment-190597216 **[Test build #2596 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2596/consoleFull)** for PR 11426 at commit [`0c967cc`](https://github.com/apache/spark/commit/0c967cc95ce8b709a7c06ec442f9991fe40d9b4e).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-190593882 Had an offline discussion with @dbtsai and @coderxiang. We agreed to keep the current behavior and document it well. I will mark this JIRA as "Won't Fix" and have created SPARK-13590 for documentation and logging improvements. @coderxiang Do you mind closing this PR?
[GitHub] spark pull request: [SPARK-6735][YARN] Add window based executor f...
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10241#discussion_r54530945

    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -152,7 +164,17 @@ private[yarn] class YarnAllocator(
       def getNumExecutorsRunning: Int = numExecutorsRunning

    -  def getNumExecutorsFailed: Int = numExecutorsFailed
    +  def getNumExecutorsFailed: Int = synchronized {
    +    val endTime = clock.getTimeMillis()
    +
    +    while (executorFailuresValidityInterval > 0
    +      && failedExecutorsTimeStamps.nonEmpty
    --- End diff --

    Sure, I will add it.
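The quoted hunk is cut off before the loop condition ends, so the following is only a minimal sketch of the window-based failure counting it appears to implement, not the actual `YarnAllocator` code. The class name `FailureWindow`, the injected `now` clock, and the third loop condition (comparing the oldest timestamp against `endTime - validityIntervalMs`) are assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the allocator's failure bookkeeping (assumed, not Spark code).
class FailureWindow(validityIntervalMs: Long, now: () => Long) {
  // Timestamps of recorded failures, oldest first.
  private val failedTimestamps = mutable.Queue.empty[Long]

  def recordFailure(): Unit = synchronized {
    failedTimestamps.enqueue(now())
  }

  // Number of failures still inside the validity window.
  def numFailed: Int = synchronized {
    val endTime = now()
    // Drop failures older than the interval; an interval <= 0 disables expiry.
    // The third condition is an assumption: the quoted diff ends before it.
    while (validityIntervalMs > 0
      && failedTimestamps.nonEmpty
      && failedTimestamps.head < endTime - validityIntervalMs) {
      failedTimestamps.dequeue()
    }
    failedTimestamps.size
  }
}

object FailureWindowDemo {
  def main(args: Array[String]): Unit = {
    var t = 0L
    val window = new FailureWindow(100L, () => t)
    window.recordFailure()      // failure at t = 0
    t = 50L
    window.recordFailure()      // failure at t = 50
    t = 120L
    println(window.numFailed)   // the failure at t = 0 has expired, so this prints 1
  }
}
```

Passing the clock in as a function keeps the sketch testable without sleeping; the `synchronized` blocks mirror the `synchronized` added to `getNumExecutorsFailed` in the diff.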
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190591444 **[Test build #52227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52227/consoleFull)** for PR 11136 at commit [`007a4ec`](https://github.com/apache/spark/commit/007a4ec324db273c048ed65fe8942daba0c9d844).
[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11411#issuecomment-190587137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52221/ Test PASSed.
[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11411#issuecomment-190587133 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11411#issuecomment-190586830 **[Test build #52221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52221/consoleFull)** for PR 11411 at commit [`9c3a8c3`](https://github.com/apache/spark/commit/9c3a8c34117f081600f54bd774e58e3ca93aa4ba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11391#discussion_r54529927

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
             p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
         assert(df.collect() === Array(Row(1), Row(2), Row(3)))
       }
    +
    +  test("Limit should be included in WholeStageCodegen") {
    +    val df = sqlContext.range(1).limit(100).sort(col("id"))
    +    val plan = df.queryExecution.executedPlan
    +
    +    assert(plan.find(p =>
    +      p.isInstanceOf[WholeStageCodegen] &&
    +        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
    --- End diff --

    Agreed. Let me remove this later.
[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11436#issuecomment-190586185 **[Test build #52226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52226/consoleFull)** for PR 11436 at commit [`f8cccea`](https://github.com/apache/spark/commit/f8cccea3641e17ad656892c42642678f1a86af5b).
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11391#discussion_r54529803

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
             p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
         assert(df.collect() === Array(Row(1), Row(2), Row(3)))
       }
    +
    +  test("Limit should be included in WholeStageCodegen") {
    +    val df = sqlContext.range(1).limit(100).sort(col("id"))
    +    val plan = df.queryExecution.executedPlan
    +
    +    assert(plan.find(p =>
    +      p.isInstanceOf[WholeStageCodegen] &&
    +        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
    --- End diff --

    These kinds of tests are easy to break; we may not need this.
[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11419#issuecomment-190585130 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11419#issuecomment-190585131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52218/ Test PASSed.
[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11419#issuecomment-190585033 **[Test build #52218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52218/consoleFull)** for PR 11419 at commit [`bbf9432`](https://github.com/apache/spark/commit/bbf9432646c4d573606ce9b21d88bd04069ca802). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11048#issuecomment-190584968 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52217/ Test PASSed.
[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11048#issuecomment-190584967 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11391#discussion_r54529510

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
             p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
         assert(df.collect() === Array(Row(1), Row(2), Row(3)))
       }
    +
    +  test("Limit should be included in WholeStageCodegen") {
    +    val df = sqlContext.range(1).limit(100).sort(col("id"))
    +    val plan = df.queryExecution.executedPlan
    +
    +    assert(plan.find(p =>
    +      p.isInstanceOf[WholeStageCodegen] &&
    +        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
    --- End diff --

    Yeah. We can't leave the limit as the last operator, otherwise it is transformed into a collect limit, so I added a sort here. I will remove it once I am back at my laptop (in a few hours).
[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11048#issuecomment-190584474 **[Test build #52217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52217/consoleFull)** for PR 11048 at commit [`6032268`](https://github.com/apache/spark/commit/603226830dc8aee52ca957c60f15cb164f10fb90). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-190579869 **[Test build #52225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52225/consoleFull)** for PR 10506 at commit [`7cec07c`](https://github.com/apache/spark/commit/7cec07c59ffb73261c743c5dffd5ea262ca9c0dc).
[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11274#issuecomment-190577409 **[Test build #52224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52224/consoleFull)** for PR 11274 at commit [`1a1452e`](https://github.com/apache/spark/commit/1a1452e8fbcf15314da30b3342dec1bafca012a6).
[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11274#issuecomment-190576044 **[Test build #2595 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2595/consoleFull)** for PR 11274 at commit [`ca8fe0f`](https://github.com/apache/spark/commit/ca8fe0f5f55cabb1bb5903c3e85c150b31eaa7c7).
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-190575725 **[Test build #52223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52223/consoleFull)** for PR 10506 at commit [`a117dcd`](https://github.com/apache/spark/commit/a117dcdcfc4ebffb5aa338ead75fbc03515a2db5). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-190575732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52223/ Test FAILed.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-190575730 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11428#issuecomment-190575477 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-190575489 @andrewor14, would you please review this patch again? It has been pending for a long time, and I think it fixes an actual bug. Thanks a lot.
[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11428#issuecomment-190575220 **[Test build #52220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52220/consoleFull)** for PR 11428 at commit [`4f0e3b9`](https://github.com/apache/spark/commit/4f0e3b92549d832936407cd7f2b3d334b087e5a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaBisectingKMeansExample `
[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11428#issuecomment-190575482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52220/ Test PASSed.
[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11274#issuecomment-190575463 **[Test build #2594 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2594/consoleFull)** for PR 11274 at commit [`ca8fe0f`](https://github.com/apache/spark/commit/ca8fe0f5f55cabb1bb5903c3e85c150b31eaa7c7).
[GitHub] spark pull request: [SPARK-12552][Core]Correctly count the driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10506#issuecomment-190575051 **[Test build #52223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52223/consoleFull)** for PR 10506 at commit [`a117dcd`](https://github.com/apache/spark/commit/a117dcdcfc4ebffb5aa338ead75fbc03515a2db5).
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11391#issuecomment-190575061 **[Test build #5 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)** for PR 11391 at commit [`b64e52d`](https://github.com/apache/spark/commit/b64e52d189e5041cc1af2ebb0d656c5f5c12c82d).
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11391#discussion_r54528196

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
             p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
         assert(df.collect() === Array(Row(1), Row(2), Row(3)))
       }
    +
    +  test("Limit should be included in WholeStageCodegen") {
    +    val df = sqlContext.range(1).limit(100).sort(col("id"))
    +    val plan = df.queryExecution.executedPlan
    +
    +    assert(plan.find(p =>
    +      p.isInstanceOf[WholeStageCodegen] &&
    +        p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
    --- End diff --

    The sort is not related to limit, could you remove it from this PR? (we may revert the commit for sort)
[GitHub] spark pull request: [SPARK-13444] [MLlib] QuantileDiscretizer choo...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11402#issuecomment-190571986 @oliverpierson I haven't seen this test fail in the master build. If I'm correct, we control the random seed in the master branch, resulting in deterministic behavior, but we don't do that in branch-1.6. If that is the case, we can either backport the commit that implements `setSeed` (https://github.com/apache/spark/commit/574571c87098795a2206a113ee9ed4bafba8f00f), or backport it but hide the public APIs and fix the seed on branch-1.6 (so we don't expose new APIs). @srowen Which one do you prefer?
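As a rough illustration of why controlling the seed makes such a test deterministic, here is a minimal sketch using only `scala.util.Random`. The `sample` helper is hypothetical and is not the `QuantileDiscretizer` implementation or its `setSeed` API; it only shows that the same seed reproduces the same draws.

```scala
import scala.util.Random

object SeedDemo {
  // Hypothetical sampler: draws n values in [0, bound) from an explicitly seeded generator.
  def sample(seed: Long, n: Int, bound: Int): Seq[Int] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextInt(bound))
  }

  def main(args: Array[String]): Unit = {
    // Two runs with the same seed produce identical samples.
    println(sample(42L, 5, 100) == sample(42L, 5, 100))
  }
}
```

A test that derives its split points from an unseeded generator can pass or fail depending on the draw, which would be consistent with the flakiness described for branch-1.6.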
[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/11440#issuecomment-190570144 For example, suppose your sliding duration is 1, window duration is 4, batch duration is 1, and the down time is 3. If you skip these 3 batches, IIUC the result will be wrong.
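The arithmetic behind this example can be sketched in plain Scala, without Spark Streaming. This is an illustration under assumed data (one record per batch, batch times 1 to 8, batches 5 to 7 lost during the down time); `windowSum` is a hypothetical helper, not a Spark API.

```scala
object WindowSkipDemo {
  // Sum of the batches in the window ending at time t; missing batches contribute 0.
  def windowSum(batches: Map[Int, Int], t: Int, windowLen: Int): Int =
    ((t - windowLen + 1) to t).map(i => batches.getOrElse(i, 0)).sum

  def main(args: Array[String]): Unit = {
    // Assumed input: batch duration 1, one record in each of batches 1..8.
    val all = (1 to 8).map(i => i -> 1).toMap
    // Batches 5, 6 and 7 are skipped (down time of 3).
    val skipped = all - 5 - 6 - 7

    println(windowSum(all, 8, 4))      // window covers batches 5..8: prints 4
    println(windowSum(skipped, 8, 4))  // only batch 8 remains: prints 1
  }
}
```

With a window duration of 4 and a slide of 1, the window ending at time 8 depends on all three skipped batches, so dropping them changes the windowed result.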
[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11428#issuecomment-190569078 **[Test build #52220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52220/consoleFull)** for PR 11428 at commit [`4f0e3b9`](https://github.com/apache/spark/commit/4f0e3b92549d832936407cd7f2b3d334b087e5a3).
[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11411#issuecomment-190569087 **[Test build #52221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52221/consoleFull)** for PR 11411 at commit [`9c3a8c3`](https://github.com/apache/spark/commit/9c3a8c34117f081600f54bd774e58e3ca93aa4ba).
[GitHub] spark pull request: [SPARK-13551] [MLLib] Fix wrong comment and re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11429
[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...
Github user jeanlyn commented on the pull request: https://github.com/apache/spark/pull/11440#issuecomment-190568465

Thanks @jerryshao for the suggestion!

> Jobs generated in the down time can be used for WAL replay, did you test when these down jobs are removed, the behavior of WAL replay is still correct?

It seems that `pendingTimes` is used for WAL replay, and I do not skip those batches.

> Also for some windowing operations, I think this removal of down time jobs may possibly lead to the inconsistent result of windowing aggregation.

Does an inconsistent result mean a wrong result? Also, I will run the unit tests on my local machine with the config set to true by default.
[GitHub] spark pull request: [SPARK-13551] [MLLib] Fix wrong comment and re...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11429#issuecomment-190568451 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-13385][MLlib] Enable AssociationRules t...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11411#issuecomment-190568059 ok to test
[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11428#issuecomment-190567826 @srowen It would be nice to have example code in the user guide for every algorithm. And this PR helps.
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11391#issuecomment-190567560 **[Test build #52219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52219/consoleFull)** for PR 11391 at commit [`8d254d2`](https://github.com/apache/spark/commit/8d254d206686dd9c6edd053d4abcd184799fcc2a).
[GitHub] spark pull request: [SPARK-13550] [ML] Add java example for ml.clu...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11428#issuecomment-190567858 ok to test
[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11419#issuecomment-190567555 **[Test build #52218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52218/consoleFull)** for PR 11419 at commit [`bbf9432`](https://github.com/apache/spark/commit/bbf9432646c4d573606ce9b21d88bd04069ca802).
[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11419#issuecomment-190566838 ok to test
[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11419#issuecomment-190566812 add to whitelist
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11391#discussion_r54526775 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -35,6 +35,8 @@ // used when there is no column in output protected UnsafeRow unsafeRow = new UnsafeRow(0); + protected boolean stopEarly = false; --- End diff -- yeah. I am updating it.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190563231 **[Test build #2593 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2593/consoleFull)** for PR 11437 at commit [`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).
[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/11438#issuecomment-190562653 Rebased to trigger the Jenkins test.
[GitHub] spark pull request: [SPARK-12977][Streaming][WIP] Support Streamin...
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/10966
[GitHub] spark pull request: [SPARK-12941][SQL][BRANCH-1.4] Spark-SQL JDBC ...
Github user thomastechs commented on a diff in the pull request: https://github.com/apache/spark/pull/10912#discussion_r54525699 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -445,4 +445,9 @@ class JDBCSuite extends SparkFunSuite with BeforeAndAfter { assert(agg.getCatalystType(1, "", 1, null) == Some(StringType)) } + test("OracleDialect type mapping") { +val oracleDialect = JdbcDialects.get("jdbc:oracle://127.0.0.1/db") +assert(oracleDialect.getJDBCType(StringType). + map(_.databaseTypeDefinition).get == "VARCHAR2(255)") + } --- End diff -- Okay @yhuai. So I shall submit another PR for the same JIRA, with the updates to JDBCSuite.scala in the master branch. I shall also close this PR.
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11391#discussion_r54525690 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -35,6 +35,8 @@ // used when there is no column in output protected UnsafeRow unsafeRow = new UnsafeRow(0); + protected boolean stopEarly = false; --- End diff -- We could use `addMutableState`
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11391#discussion_r54525659 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -35,6 +35,8 @@ // used when there is no column in output protected UnsafeRow unsafeRow = new UnsafeRow(0); + protected boolean stopEarly = false; --- End diff -- Since `stopEarly` is only accessed by generated functions, we don't need this anymore.
[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...
Github user xguo27 commented on the pull request: https://github.com/apache/spark/pull/10935#issuecomment-190554648

Using these two functionally equivalent code snippets:

Scala
```
val data = Seq((1, "1"), (2, "2"), (3, "2"), (1, "3")).toDF("a","b")
val my_filter = sqlContext.udf.register("my_filter", (a:Int) => a==1)
data.select(col("a")).distinct().filter(my_filter(col("a")))
```

Python
```
data = sqlContext.createDataFrame([(1, "1"), (2, "2"), (3, "2"), (1, "3")], ["a", "b"])
my_filter = udf(lambda a: a == 1, BooleanType())
data.select(col("a")).distinct().filter(my_filter(col("a")))
```

The logical plan that comes out of `execute(aggregateCondition)` here is as below: https://github.com/apache/spark/blob/916fc34f98dd731f607d9b3ed657bad6cc30df2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L801

Scala
```
Aggregate [a#8], [UDF(a#8) AS havingCondition#11]
+- Project [a#8]
   +- Project [_1#6 AS a#8,_2#7 AS b#9]
      +- LocalRelation [_1#6,_2#7], [[1,1],[2,2],[3,2],[1,3]]
```

Python
```
Project [havingCondition#2]
+- Aggregate [a#0L], [pythonUDF#3 AS havingCondition#2]
   +- EvaluatePython PythonUDF#(a#0L), pythonUDF#3: boolean
      +- Project [a#0L]
         +- LogicalRDD [a#0L,b#1], MapPartitionsRDD[4] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
```

We can see that in Python's case an extra Project is injected when `execute(aggregateCondition)` goes through ExtractPythonUDFs, but ResolveAggregateFunctions expects an Aggregate here: https://github.com/apache/spark/blob/916fc34f98dd731f607d9b3ed657bad6cc30df2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L801-L805

With this fix, the logical plan generated for Python UDFs does not construct a Project if the plan is an Aggregate. This makes it consistent with its Scala counterpart and gives ResolveAggregateFunctions a correct plan to consume.

After fix, Python:
```
Aggregate [a#0L], [pythonUDF#3 AS havingCondition#2]
+- EvaluatePython PythonUDF#(a#0L), pythonUDF#3: boolean
   +- Project [a#0L]
      +- LogicalRDD [a#0L,b#1], MapPartitionsRDD[4] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
```
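The shape change described above can be illustrated with a toy plan tree. This is only a sketch: the case classes and `extractUdf` below are hypothetical stand-ins, not Catalyst's `LogicalPlan` classes or the actual ExtractPythonUDFs rule, and the flag `keepAggregateOnTop` exists purely to contrast the two behaviours.

```scala
object PlanSketch {
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan
  case class EvaluatePython(udf: String, child: Plan) extends Plan
  case class Aggregate(grouping: Seq[String], aggregates: Seq[String], child: Plan) extends Plan

  // Move the UDF evaluation into an EvaluatePython node below the Aggregate.
  // Flag off: the result is wrapped in an extra Project (the shape reported before the fix).
  // Flag on: the Aggregate stays at the root (the shape reported after the fix).
  def extractUdf(plan: Plan, keepAggregateOnTop: Boolean): Plan = plan match {
    case Aggregate(grouping, aggregates, child) =>
      val rewritten = Aggregate(grouping, aggregates, EvaluatePython("pythonUDF", child))
      if (keepAggregateOnTop) rewritten else Project(aggregates, rewritten)
    case other => other
  }
}
```

A consumer that pattern-matches on `Aggregate` at the root, as ResolveAggregateFunctions is described to do, only succeeds on the second shape.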
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/11136#discussion_r54524989 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -0,0 +1,577 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.regression + +import breeze.stats.distributions.{Gaussian => GD} + +import org.apache.spark.{Logging, SparkException} +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.ml.PredictorParams +import org.apache.spark.ml.feature.Instance +import org.apache.spark.ml.optim._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.param.shared._ +import org.apache.spark.ml.util.Identifiable +import org.apache.spark.mllib.linalg.{BLAS, Vector} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.sql.functions._ + +/** + * Params for Generalized Linear Regression. 
+ */ +private[regression] trait GeneralizedLinearRegressionBase extends PredictorParams + with HasFitIntercept with HasMaxIter with HasTol with HasRegParam with HasWeightCol + with HasSolver with Logging { + + /** + * Param for the name of family which is a description of the error distribution + * to be used in the model. + * Supported options: "gaussian", "binomial", "poisson" and "gamma". + * Default is "gaussian". + * @group param + */ + @Since("2.0.0") + final val family: Param[String] = new Param(this, "family", +"The name of family which is a description of the error distribution to be used in the " + + "model. Supported options: gaussian(default), binomial, poisson and gamma.", + ParamValidators.inArray[String](GeneralizedLinearRegression.supportedFamilyNames.toArray)) + + /** @group getParam */ + @Since("2.0.0") + def getFamily: String = $(family) + + /** + * Param for the name of link function which provides the relationship + * between the linear predictor and the mean of the distribution function. + * Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt". + * @group param + */ + @Since("2.0.0") + final val link: Param[String] = new Param(this, "link", "The name of link function " + +"which provides the relationship between the linear predictor and the mean of the " + +"distribution function. 
Supported options: identity, log, inverse, logit, probit, " + +"cloglog and sqrt.", + ParamValidators.inArray[String](GeneralizedLinearRegression.supportedLinkNames.toArray)) + + /** @group getParam */ + @Since("2.0.0") + def getLink: String = $(link) + + import GeneralizedLinearRegression._ + + @Since("2.0.0") + override def validateParams(): Unit = { +if ($(solver) == "irls") { + setDefault(maxIter -> 25) +} +if (isDefined(link)) { + require(supportedFamilyAndLinkPairs.contains( +Family.fromName($(family)) -> Link.fromName($(link))), "Generalized Linear Regression " + +s"with ${$(family)} family does not support ${$(link)} link function.") +} + } +} + +/** + * :: Experimental :: + * + * Fit a Generalized Linear Model ([[https://en.wikipedia.org/wiki/Generalized_linear_model]]) + * specified by giving a symbolic description of the linear predictor (link function) and + * a description of the error distribution (family). + * It supports "gaussian", "binomial", "poisson" and "gamma" as family. + * Valid link functions for each family is listed below. The first link function of each family + * is the default one. + * - "gaussian" -> "identity", "log", "inverse" + * - "binomial" -> "logit", "probit", "cloglog" + * - "poisson" -> "log", "identity", "sqrt" + * - "gamma"-> "inverse", "identity", "log" + */
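The family and link table in the Scaladoc above can be sketched as a simple lookup. This is a standalone illustration in plain Scala, not the code under review: `FamilyLinkSketch`, `defaultLink` and `isSupported` are hypothetical names, and the real patch validates pairs through `supportedFamilyAndLinkPairs`.

```scala
object FamilyLinkSketch {
  // Links listed per family in the Scaladoc; the first entry is the documented default.
  val supportedLinks: Map[String, Seq[String]] = Map(
    "gaussian" -> Seq("identity", "log", "inverse"),
    "binomial" -> Seq("logit", "probit", "cloglog"),
    "poisson"  -> Seq("log", "identity", "sqrt"),
    "gamma"    -> Seq("inverse", "identity", "log"))

  // Default link for a family, or None if the family is not supported.
  def defaultLink(family: String): Option[String] =
    supportedLinks.get(family).flatMap(_.headOption)

  // True only when the family is known and lists the given link.
  def isSupported(family: String, link: String): Boolean =
    supportedLinks.get(family).exists(_.contains(link))
}
```

For instance, `isSupported("gaussian", "logit")` is false, which corresponds to the case where `validateParams` in the diff would reject the combination.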
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-190553302 @coderxiang @dbtsai Sorry for the late response! I actually thought this PR had already been merged ... Anyway, I tested `glmnet` and found that it outputs zero coefficients for constant columns regardless of the intercept, regularization, and standardization settings. I thought about it today and I feel it actually makes sense. If we have a constant column in our training data, do we expect it to change or stay constant in test data? If its value might change, we should set its coefficient to zero because we cannot estimate how big the change would be. If its value stays constant (or maybe users created this column to add bias manually), it shouldn't be regularized and users should really turn on `fitIntercept` instead. So my suggestion is to follow glmnet and set the coefficients of constant columns to zero regardless of other settings. If there are constant columns and `fitIntercept` is false, we should output a warning message. Does that sound good to you?
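A minimal sketch of the suggested behaviour, assuming dense rows held in memory. It is plain Scala rather than the LogisticRegression implementation, and `ConstantColumnSketch` with its methods is a hypothetical illustration.

```scala
object ConstantColumnSketch {
  // Indices of columns whose value is identical in every row.
  def constantColumns(rows: Seq[Array[Double]]): Set[Int] =
    if (rows.isEmpty) Set.empty
    else rows.head.indices.filter(j => rows.forall(_(j) == rows.head(j))).toSet

  // Force the coefficients of constant columns to zero and keep the others unchanged.
  def zeroConstant(coefficients: Array[Double], constant: Set[Int]): Array[Double] =
    coefficients.zipWithIndex.map { case (c, j) => if (constant.contains(j)) 0.0 else c }

  // The proposal warns when constant columns exist and no intercept is fitted.
  def shouldWarn(constant: Set[Int], fitIntercept: Boolean): Boolean =
    constant.nonEmpty && !fitIntercept
}
```

In a distributed setting the constant-column check would more likely come from per-column variance statistics than from comparing rows directly; the direct comparison here just keeps the sketch short.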
[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11436#issuecomment-190552987 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52212/ Test FAILed.
[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11436#issuecomment-190552985 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11436#issuecomment-190552868 **[Test build #52212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52212/consoleFull)** for PR 11436 at commit [`50f66d1`](https://github.com/apache/spark/commit/50f66d18a5b836a8012e171a1ece8bea83c60e19). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13457][SQL] Removes DataFrame RDD opera...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/11388#discussion_r54524553 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -1427,30 +1427,6 @@ class DataFrame private[sql]( def transform[U](t: DataFrame => DataFrame): DataFrame = t(this) /** - * Returns a new RDD by applying a function to all rows of this DataFrame. - * @group rdd - * @since 1.3.0 - */ - def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f) --- End diff -- Good question... I checked the Jenkins MiMA section of the build log of this PR and didn't see any lines related to DataFrame.
[GitHub] spark pull request: [SPARK-13457][SQL] Removes DataFrame RDD opera...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/11388#discussion_r54524448 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala --- @@ -85,7 +85,8 @@ final class RegressionEvaluator @Since("1.4.0") (@Since("1.4.0") override val ui val predictionAndLabels = dataset .select(col($(predictionCol)).cast(DoubleType), col($(labelCol)).cast(DoubleType)) - .map { case Row(prediction: Double, label: Double) => + .rdd. --- End diff -- Thanks, will fix this in future PRs.
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11391#issuecomment-190551990 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11391#issuecomment-190551991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52213/ Test PASSed.
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11391#issuecomment-190551827 **[Test build #52213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52213/consoleFull)** for PR 11391 at commit [`c887cf4`](https://github.com/apache/spark/commit/c887cf47a36da8d34e33afeca273f415df629fbb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12893][YARN] Fix history URL redirect e...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10821#issuecomment-190551333 @steveloughran , here "1" is the number of attempts [here](https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/historypage.js#L126), and it is used to generate a URL [here](https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/historypage-template.html#L67). Also in the yarn code, this "1" or "2" is obtained from the [attempt id](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L276). The URL built by concatenating this "1" or "2" as the attempt id is not accessible in my local test.
[GitHub] spark pull request: [SPARK-13139][SQL] Create native DDL commands
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11048#issuecomment-190548517 **[Test build #52217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52217/consoleFull)** for PR 11048 at commit [`6032268`](https://github.com/apache/spark/commit/603226830dc8aee52ca957c60f15cb164f10fb90).
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190538874 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52216/ Test FAILed.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190538867 **[Test build #52216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52216/consoleFull)** for PR 11136 at commit [`31a912c`](https://github.com/apache/spark/commit/31a912cd74cf3dffbf8cc0af8c57b777d49579eb). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54522189 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala --- @@ -93,4 +97,74 @@ case class Sort( sortedIterator } } + + override def upstreams(): Seq[RDD[InternalRow]] = { +child.asInstanceOf[CodegenSupport].upstreams() + } + + // Name of sorter variable used in codegen. + private var sorterVariable: String = _ + + override protected def doProduce(ctx: CodegenContext): String = { +val needToSort = ctx.freshName("needToSort") +ctx.addMutableState("boolean", needToSort, s"$needToSort = true;") + + +// Initialize the class member variables. This includes the instance of the Sorter and +// the iterator to return sorted rows. +val thisPlan = ctx.addReferenceObj("plan", this) +sorterVariable = ctx.freshName("sorter") +ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable, + s"$sorterVariable = $thisPlan.createSorter();") +val metrics = ctx.freshName("metrics") +ctx.addMutableState(classOf[TaskMetrics].getName, metrics, + s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();") +val sortedIterator = ctx.freshName("sortedIter") +ctx.addMutableState("scala.collection.Iterator", sortedIterator, "") + +val addToSorter = ctx.freshName("addToSorter") +ctx.addNewFunction(addToSorter, + s""" +| private void $addToSorter() throws java.io.IOException { +| ${child.asInstanceOf[CodegenSupport].produce(ctx, this)} +| } + """.stripMargin.trim) + +val outputRow = ctx.freshName("outputRow") +val dataSize = metricTerm(ctx, "dataSize") +val spillSize = metricTerm(ctx, "spillSize") +val spillSizeBefore = ctx.freshName("spillSizeBefore") +s""" + | if ($needToSort) { + | $addToSorter(); + | Long $spillSizeBefore = $metrics.memoryBytesSpilled(); + | $sortedIterator = $sorterVariable.sort(); + | $dataSize.add($sorterVariable.getPeakMemoryUsage()); + | $spillSize.add($metrics.memoryBytesSpilled() - $spillSizeBefore); + | 
$metrics.incPeakExecutionMemory($sorterVariable.getPeakMemoryUsage()); + | $needToSort = false; + | } + | + | while ($sortedIterator.hasNext()) { + | UnsafeRow $outputRow = (UnsafeRow)$sortedIterator.next(); + | ${consume(ctx, null, outputRow)} + | if (shouldStop()) return; + | } + """.stripMargin.trim + } + + override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = { +val colExprs = child.output.zipWithIndex.map { case (attr, i) => + BoundReference(i, attr.dataType, attr.nullable) +} + +ctx.currentVars = input +val code = GenerateUnsafeProjection.createCode(ctx, colExprs) + +s""" + | // Convert the input attributes to an UnsafeRow and add it to the sorter + | ${code.code} --- End diff -- This may cause a performance regression: when Sort is on top of Exchange (or another operator that produces UnsafeRow), we will create variables from the UnsafeRow, then create another UnsafeRow from these variables. See https://github.com/apache/spark/pull/11008#discussion_r53856345 @yhuai Should we revert this patch or fix this in a follow-up PR?
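The concern in the comment above can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes: `Row`, `unpack`, `project`, `addViaProjection`, and `addDirectly` are hypothetical stand-ins, assumed only for illustration, for an UnsafeRow, the per-column variables created by codegen, the projection that builds a new row, and the two ways of feeding the sorter. It also ignores any copying a real sorter might need. Under those assumptions, it shows that when the child already emits rows in the target format, unpacking and re-projecting builds a second, equal row that the pass-through path avoids.

```scala
import scala.collection.mutable.ArrayBuffer

object SortInputSketch {
  // Hypothetical stand-in for an UnsafeRow: an immutable sequence of column values.
  final case class Row(values: Vector[Any])

  // Counts how many rows the projection step has built.
  var rowsBuilt: Int = 0

  // Stand-in for "create variables from UnsafeRow": one value per column.
  def unpack(row: Row): Vector[Any] = row.values

  // Stand-in for the projection: builds a new row from the column variables.
  def project(vars: Vector[Any]): Row = {
    rowsBuilt += 1
    Row(vars)
  }

  // Path described in the comment: unpack the child's row, then rebuild it.
  def addViaProjection(sorter: ArrayBuffer[Row], row: Row): Unit =
    sorter += project(unpack(row))

  // Alternative: the child's row is already in the right format, so reuse it.
  def addDirectly(sorter: ArrayBuffer[Row], row: Row): Unit =
    sorter += row

  def main(args: Array[String]): Unit = {
    val row = Row(Vector(1, "a"))
    val viaProjection = ArrayBuffer.empty[Row]
    val direct = ArrayBuffer.empty[Row]
    addViaProjection(viaProjection, row)
    addDirectly(direct, row)
    // Both sorters hold equal rows, but only the first path built a new one.
    println(s"equal contents: ${viaProjection == direct}, rows built: $rowsBuilt")
  }
}
```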
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190538871 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/11438#issuecomment-190537588 It seems that Jenkins is failing because of problems unrelated to this PR, such as the following.
```
Error instrumenting class:org.apache.spark.mllib.regression.IsotonicRegressionModel$SaveLoadV1_0$ ...
```
Other PRs' tests fail with similar logs. Should we wait a while and then re-trigger the tests?
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190537367 **[Test build #52216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52216/consoleFull)** for PR 11136 at commit [`31a912c`](https://github.com/apache/spark/commit/31a912cd74cf3dffbf8cc0af8c57b777d49579eb).
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/11136#discussion_r54521794 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -0,0 +1,499 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.regression + +import scala.util.Random + +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.param.ParamsSuite +import org.apache.spark.ml.util.MLTestingUtils +import org.apache.spark.mllib.classification.LogisticRegressionSuite._ +import org.apache.spark.mllib.linalg.{BLAS, DenseVector, Vectors} +import org.apache.spark.mllib.random._ +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.mllib.util.TestingUtils._ +import org.apache.spark.sql.{DataFrame, Row} + +class GeneralizedLinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { + + private val seed: Int = 42 + @transient var datasetGaussianIdentity: DataFrame = _ + @transient var datasetGaussianLog: DataFrame = _ + @transient var datasetGaussianInverse: DataFrame = _ + @transient var datasetBinomial: DataFrame = _ + @transient var datasetPoissonLog: DataFrame = _ + @transient var datasetPoissonIdentity: DataFrame = _ + @transient var datasetPoissonSqrt: DataFrame = _ + @transient var datasetGammaInverse: DataFrame = _ + @transient var datasetGammaIdentity: DataFrame = _ + @transient var datasetGammaLog: DataFrame = _ + + override def beforeAll(): Unit = { +super.beforeAll() + +import GeneralizedLinearRegressionSuite._ + +datasetGaussianIdentity = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "gaussian", link = "identity"), 2)) + +datasetGaussianLog = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "gaussian", link = "log"), 2)) + +datasetGaussianInverse = sqlContext.createDataFrame( + 
sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "gaussian", link = "inverse"), 2)) + +datasetBinomial = { + val nPoints = 1 + val coefficients = Array(-0.57997, 0.912083, -0.371077, -0.819866, 2.688191) + val xMean = Array(5.843, 3.057, 3.758, 1.199) + val xVariance = Array(0.6856, 0.1899, 3.116, 0.581) + + val testData = +generateMultinomialLogisticInput(coefficients, xMean, xVariance, true, nPoints, seed) + + sqlContext.createDataFrame(sc.parallelize(testData, 4)) +} + +datasetPoissonLog = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "poisson", link = "log"), 2)) + +datasetPoissonIdentity = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "poisson", link = "identity"), 2)) + +datasetPoissonSqrt = sqlContext.createDataFrame( +
[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/11440#issuecomment-190531231 Also, for some windowing operations, I think removing the jobs generated during the down time may lead to inconsistent results for windowed aggregations.
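As a rough illustration of the windowing concern above, here is a toy sketch that does not use Spark Streaming at all. The `windowSums` helper, the batch values, and the window size are all assumptions made up for this example; a skipped batch is modeled as `None`. Under those assumptions, dropping one batch changes every window that would have covered it.

```scala
object WindowGapSketch {
  // Sums each sliding window of `size` consecutive batches.
  // A skipped batch is modeled as None and contributes nothing to its windows.
  def windowSums(batches: Seq[Option[Int]], size: Int): List[Int] =
    batches.sliding(size).map(window => window.flatten.sum).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical per-batch counts; the second batch falls in the "down time".
    val allBatches = Seq(Some(1), Some(2), Some(3), Some(4))
    val withGap = Seq(Some(1), None, Some(3), Some(4))
    println(windowSums(allBatches, 2)) // every batch processed
    println(windowSums(withGap, 2))    // the down-time batch was skipped
  }
}
```

Only the windows that include the skipped batch differ, which is the kind of inconsistency the comment warns about; whether it matters depends on the application.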
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190530641 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190530637 **[Test build #52215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52215/consoleFull)** for PR 11136 at commit [`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190530642 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52215/ Test FAILed.
[GitHub] spark pull request: [SPARK-13586][STREAMING]add config to skip gen...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/11440#issuecomment-190530543 Jobs generated during the down time can be used for WAL replay. Did you test whether WAL replay still behaves correctly when these down-time jobs are removed?
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190529580 **[Test build #52215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52215/consoleFull)** for PR 11136 at commit [`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac).
[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11438#issuecomment-190529756 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11438#issuecomment-190529757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52203/ Test FAILed.
[GitHub] spark pull request: [SPARK-13583][BUILD] Enforce `UnusedImports` J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11438#issuecomment-190529610 **[Test build #52203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52203/consoleFull)** for PR 11438 at commit [`5e82490`](https://github.com/apache/spark/commit/5e82490c007146393c5d326ad23ca53ea41c4208). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190528993 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190527518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52210/ Test PASSed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190527517 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190527389 **[Test build #52210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52210/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190527136 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52214/ Test FAILed.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190527127 **[Test build #52214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52214/consoleFull)** for PR 11136 at commit [`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190527134 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9325][SPARK-R] collect() head() and sho...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/11336#issuecomment-190526619 @olarayej, I am not sure it is conceptually correct to associate a Column with only one DataFrame. Conceptually, a Column could depend on 0, 1, 2 or more DataFrames. For example:
c1 <- df1$c1
c2 <- df2$c2
c3 <- c1 + c2
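The point above can be made concrete with a small, self-contained sketch. This is not SparkR's or Spark SQL's `Column` class: the `Col` expression tree and the `sources` helper below are hypothetical, assumed only for illustration. They model a column as an expression and collect the DataFrames it refers to, which may be none, one, or several.

```scala
object ColumnDependencySketch {
  // Hypothetical column expression tree, for illustration only.
  sealed trait Col
  final case class Ref(df: String, name: String) extends Col // a column of a named DataFrame
  final case class Lit(value: Double) extends Col            // a literal; depends on no DataFrame
  final case class Add(left: Col, right: Col) extends Col    // sum of two column expressions

  // Collects the names of all DataFrames that an expression refers to.
  def sources(col: Col): Set[String] = col match {
    case Ref(df, _)       => Set(df)
    case Lit(_)           => Set.empty
    case Add(left, right) => sources(left) ++ sources(right)
  }

  def main(args: Array[String]): Unit = {
    val c1 = Ref("df1", "c1")
    val c2 = Ref("df2", "c2")
    val c3 = Add(c1, c2) // mirrors c3 <- c1 + c2 in the comment above
    println(sources(Lit(1.0)))
    println(sources(c1))
    println(sources(c3))
  }
}
```

In this model, `c3` refers to both `df1` and `df2`, so a single owning DataFrame cannot be assigned to it.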
[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11391#issuecomment-190526392 **[Test build #52213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52213/consoleFull)** for PR 11391 at commit [`c887cf4`](https://github.com/apache/spark/commit/c887cf47a36da8d34e33afeca273f415df629fbb).
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/11136#discussion_r54519291 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -0,0 +1,499 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.regression + +import scala.util.Random + +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.param.ParamsSuite +import org.apache.spark.ml.util.MLTestingUtils +import org.apache.spark.mllib.classification.LogisticRegressionSuite._ +import org.apache.spark.mllib.linalg.{BLAS, DenseVector, Vectors} +import org.apache.spark.mllib.random._ +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.mllib.util.TestingUtils._ +import org.apache.spark.sql.{DataFrame, Row} + +class GeneralizedLinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { + + private val seed: Int = 42 + @transient var datasetGaussianIdentity: DataFrame = _ + @transient var datasetGaussianLog: DataFrame = _ + @transient var datasetGaussianInverse: DataFrame = _ + @transient var datasetBinomial: DataFrame = _ + @transient var datasetPoissonLog: DataFrame = _ + @transient var datasetPoissonIdentity: DataFrame = _ + @transient var datasetPoissonSqrt: DataFrame = _ + @transient var datasetGammaInverse: DataFrame = _ + @transient var datasetGammaIdentity: DataFrame = _ + @transient var datasetGammaLog: DataFrame = _ + + override def beforeAll(): Unit = { +super.beforeAll() + +import GeneralizedLinearRegressionSuite._ + +datasetGaussianIdentity = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "gaussian", link = "identity"), 2)) + +datasetGaussianLog = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "gaussian", link = "log"), 2)) + +datasetGaussianInverse = sqlContext.createDataFrame( + 
sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "gaussian", link = "inverse"), 2)) + +datasetBinomial = { + val nPoints = 1 + val coefficients = Array(-0.57997, 0.912083, -0.371077, -0.819866, 2.688191) + val xMean = Array(5.843, 3.057, 3.758, 1.199) + val xVariance = Array(0.6856, 0.1899, 3.116, 0.581) + + val testData = +generateMultinomialLogisticInput(coefficients, xMean, xVariance, true, nPoints, seed) + + sqlContext.createDataFrame(sc.parallelize(testData, 4)) +} + +datasetPoissonLog = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "poisson", link = "log"), 2)) + +datasetPoissonIdentity = sqlContext.createDataFrame( + sc.parallelize(generateGeneralizedLinearRegressionInput( +intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5), +xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01, +family = "poisson", link = "identity"), 2)) + +datasetPoissonSqrt = sqlContext.createDataFrame( +
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190526388 **[Test build #52214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52214/consoleFull)** for PR 11136 at commit [`314b562`](https://github.com/apache/spark/commit/314b562f315723a7117851289c8f5b6e1b16a6ac).
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-190526311 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/11136#discussion_r54519070
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
@@ -0,0 +1,499 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import scala.util.Random
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.MLTestingUtils
+import org.apache.spark.mllib.classification.LogisticRegressionSuite._
+import org.apache.spark.mllib.linalg.{BLAS, DenseVector, Vectors}
+import org.apache.spark.mllib.random._
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
+import org.apache.spark.sql.{DataFrame, Row}
+
+class GeneralizedLinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  private val seed: Int = 42
+  @transient var datasetGaussianIdentity: DataFrame = _
+  @transient var datasetGaussianLog: DataFrame = _
+  @transient var datasetGaussianInverse: DataFrame = _
+  @transient var datasetBinomial: DataFrame = _
+  @transient var datasetPoissonLog: DataFrame = _
+  @transient var datasetPoissonIdentity: DataFrame = _
+  @transient var datasetPoissonSqrt: DataFrame = _
+  @transient var datasetGammaInverse: DataFrame = _
+  @transient var datasetGammaIdentity: DataFrame = _
+  @transient var datasetGammaLog: DataFrame = _
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    import GeneralizedLinearRegressionSuite._
+
+    datasetGaussianIdentity = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "identity"), 2))
+
+    datasetGaussianLog = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "log"), 2))
+
+    datasetGaussianInverse = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "inverse"), 2))
+
+    datasetBinomial = {
+      val nPoints = 1
+      val coefficients = Array(-0.57997, 0.912083, -0.371077, -0.819866, 2.688191)
+      val xMean = Array(5.843, 3.057, 3.758, 1.199)
+      val xVariance = Array(0.6856, 0.1899, 3.116, 0.581)
+
+      val testData =
+        generateMultinomialLogisticInput(coefficients, xMean, xVariance, true, nPoints, seed)
+
+      sqlContext.createDataFrame(sc.parallelize(testData, 4))
+    }
+
+    datasetPoissonLog = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "poisson", link = "log"), 2))
+
+    datasetPoissonIdentity = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "poisson", link = "identity"), 2))
+
+    datasetPoissonSqrt = sqlContext.createDataFrame(
+
[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/11436#discussion_r54518895
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -852,18 +878,20 @@ private[spark] class BlockManager(
             Await.ready(replicationFuture, Duration.Inf)
           }
         case _ =>
-          val remoteStartTime = System.currentTimeMillis
-          // Serialize the block if not already done
-          if (bytesAfterPut == null) {
-            if (valuesAfterPut == null) {
-              throw new SparkException(
-                "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
+          if (blockWasSuccessfullyStored) {
--- End diff --
/cc @tdas, the goal of this change is to avoid attempting to replicate deserialized, memory-only blocks if their initial cache / persist fails due to a lack of memory. For the purposes of this patch, we need to do this to prevent the iterator from being consumed, so that it can be passed back to the caller. More generally, though, I think we should have this change to avoid the OOMs caused by trying to serialize an entire partition that was too large to be stored.
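The guard described in this comment can be illustrated with a small, self-contained Scala sketch. It is not the BlockManager code from the patch: `ToyMemoryStore`, `putIterator`, `putAndMaybeReplicate`, and `replicationAttempts` are hypothetical names invented for illustration, and only the `blockWasSuccessfullyStored` flag is taken from the quoted diff. The sketch assumes a store that rejects blocks above a fixed element count and, on rejection, hands back an iterator that still yields every element, so the caller can keep computing with it.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the guard discussed above.
// None of these names are Spark's real BlockManager API.
object ReplicationGuardSketch {

  /** Toy memory store that accepts a block only if it has at most `capacity` elements. */
  final class ToyMemoryStore(capacity: Int) {
    private val blocks = mutable.Map.empty[String, Vector[Int]]

    /**
     * Tries to cache `values`. At most `capacity + 1` elements are unrolled.
     * On success returns Right(number of stored elements); on failure returns
     * Left(an iterator that still yields every element exactly once).
     */
    def putIterator(id: String, values: Iterator[Int]): Either[Iterator[Int], Int] = {
      val unrolled = Vector.newBuilder[Int]
      var count = 0
      while (values.hasNext && count <= capacity) {
        unrolled += values.next()
        count += 1
      }
      val prefix = unrolled.result()
      if (count <= capacity) {
        blocks(id) = prefix
        Right(count)
      } else {
        // Too large: stitch the unrolled prefix back onto the untouched remainder.
        Left(prefix.iterator ++ values)
      }
    }

    def contains(id: String): Boolean = blocks.contains(id)
  }

  /** Counts replication attempts so the effect of the guard is observable. */
  var replicationAttempts = 0

  // Stand-in for shipping the block to a peer; here it only bumps the counter.
  private def replicate(id: String): Unit = replicationAttempts += 1

  /** Replicates only when the local put succeeded, leaving a failed put's iterator intact. */
  def putAndMaybeReplicate(
      store: ToyMemoryStore,
      id: String,
      values: Iterator[Int],
      replication: Int): Either[Iterator[Int], Int] = {
    val result = store.putIterator(id, values)
    val blockWasSuccessfullyStored = result.isRight
    if (replication > 1 && blockWasSuccessfullyStored) {
      replicate(id)
    }
    result
  }
}
```

With a store of capacity 3, putting a five-element iterator returns a `Left` whose iterator can still be drained by the caller, and no replication is attempted; a three-element iterator is stored and replicated once. Without the guard, a replication step that serialized the values would have to consume the iterator first, which is the situation the comment wants to avoid.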
[GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11436#issuecomment-190522557 **[Test build #52212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52212/consoleFull)** for PR 11436 at commit [`50f66d1`](https://github.com/apache/spark/commit/50f66d18a5b836a8012e171a1ece8bea83c60e19).
[GitHub] spark pull request: [SPARK-13586]add config to skip generate down ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11440#issuecomment-190522437 Can one of the admins verify this patch?