[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15659 @JoshRosen - yes, since we ship the jars with them, we want people to be able to install the correct package for the Hadoop distribution they are running with/against. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15881: [SPARK-18434][ML] Add missing ParamValidations for ML al...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15881 Thanks for taking this on :) I did a quick skim and these all look pretty reasonable (in some cases we are directly calling an old mllib algorithm which would do the validation at training time and the values seem to line up).
[GitHub] spark pull request #15740: [SPARK-18232] [Mesos] Support CNI
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15740
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11122 **[Test build #3426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3426/consoleFull)** for PR 11122 at commit [`fe68b6c`](https://github.com/apache/spark/commit/fe68b6c03300a37799ccaad2ee554bde005c8f6f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15887: [SPARK-18442][SQL] Fix nullability of WrapOption.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15887 LGTM
[GitHub] spark issue #15740: [SPARK-18232] [Mesos] Support CNI
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15740 Merging in master! Thanks.
[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14612#discussion_r87958491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -89,6 +89,22 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPl } c +case c @ CreateTable(tableDesc, mode, Some(query)) +if mode == SaveMode.Append && isHiveSerdeTable(tableDesc.identifier) => --- End diff -- Let me try another way to fix it. Will submit a new PR.
[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14612#discussion_r87958997 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -89,6 +89,22 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPl } c +case c @ CreateTable(tableDesc, mode, Some(query)) +if mode == SaveMode.Append && isHiveSerdeTable(tableDesc.identifier) => --- End diff -- uh... Actually, I found a bug in our write path.
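For readers unfamiliar with catalyst-style analysis rules, the shape of the pattern in the diff above can be sketched as follows. All class and object names here (`AppendRuleSketch`, the simplified `CreateTable`, `TableDesc`, `SaveMode`) are hypothetical stand-ins for Spark's real plan nodes, kept minimal so the guard condition is the focus:

```scala
// Simplified, hypothetical stand-ins for Spark's plan nodes, showing the
// shape of the rule in the diff: match a CreateTable-with-query node and
// guard on Append mode plus a Hive serde table.
sealed trait SaveMode
case object Append extends SaveMode
case object ErrorIfExists extends SaveMode

case class TableDesc(identifier: String, isHiveSerde: Boolean)
case class CreateTable(tableDesc: TableDesc, mode: SaveMode, query: Option[String])

object AppendRuleSketch {
  // True only for CTAS-style plans (a query is present) that append
  // to a Hive serde table; everything else falls through.
  def appliesTo(plan: CreateTable): Boolean = plan match {
    case CreateTable(desc, mode, Some(_)) if mode == Append && desc.isHiveSerde => true
    case _ => false
  }
}
```

The guard (`if mode == ... && ...`) keeps the match from firing on non-append modes or non-Hive tables, which is the same mechanism the diff uses to scope the new rule.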
[GitHub] spark issue #15803: [SPARK-18298][Web UI]change gmt time to local zone time ...
Github user WangTaoTheTonic commented on the issue: https://github.com/apache/spark/pull/15803 I'll post later how the UI works and what changes made it behave differently from before :)
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11122 **[Test build #3426 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3426/consoleFull)** for PR 11122 at commit [`fe68b6c`](https://github.com/apache/spark/commit/fe68b6c03300a37799ccaad2ee554bde005c8f6f).
[GitHub] spark issue #15884: [WIP][SPARK-18433][SQL] Improve DataSource option keys t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15884 **[Test build #68655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68655/consoleFull)** for PR 15884 at commit [`e69735c`](https://github.com/apache/spark/commit/e69735ca13024b6ecd245c0c1823524c50e21c95).
[GitHub] spark issue #15884: [WIP][SPARK-18433][SQL] Improve DataSource option keys t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15884 **[Test build #68654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68654/consoleFull)** for PR 15884 at commit [`937db99`](https://github.com/apache/spark/commit/937db990c30a2592fbe9192858611e1843a9a982).
[GitHub] spark issue #15878: [SPARK-18430] [SQL] Fixed Exception Messages when Hittin...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15878 Sure, will do it. Thanks!
[GitHub] spark issue #15887: [SPARK-18442][SQL] Fix nullability of WrapOption.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15887 **[Test build #68653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68653/consoleFull)** for PR 15887 at commit [`a7140cd`](https://github.com/apache/spark/commit/a7140cd62c7293ebab9683344b14910b9e3c6ff5).
[GitHub] spark pull request #15887: [SPARK-18442][SQL] Fix nullability of WrapOption.
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/15887 [SPARK-18442][SQL] Fix nullability of WrapOption. ## What changes were proposed in this pull request? The nullability of `WrapOption` should be `false`. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-18442 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15887.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15887 commit a7140cd62c7293ebab9683344b14910b9e3c6ff5 Author: Takuya UESHIN Date: 2016-11-15T05:37:58Z Fix nullability of `WrapOption`.
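For context, the intuition behind "the nullability of `WrapOption` should be `false`" can be sketched in plain Scala. `WrapOptionSketch` and `wrapOption` below are hypothetical names, not Spark's actual expression classes; the point is only that wrapping a value into an `Option` always yields a non-null reference, even when the value being wrapped is null:

```scala
// A minimal sketch (not Spark's actual Catalyst expression API) of why an
// option-wrapping expression can report nullable = false: the result of
// wrapping is always a non-null Option reference, regardless of the input.
object WrapOptionSketch {
  // Wrap a possibly-null value into an Option; the result is never null.
  // Option(null) evaluates to None, Option(v) to Some(v).
  def wrapOption(value: Any): Option[Any] = Option(value)
}
```

Because the wrapped result can never itself be a null reference, an expression producing it does not need the nullable flag set.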
[GitHub] spark pull request #15878: [SPARK-18430] [SQL] Fixed Exception Messages when...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15878
[GitHub] spark issue #15878: [SPARK-18430] [SQL] Fixed Exception Messages when Hittin...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15878 Can you submit a backport for branch-2.0?
[GitHub] spark issue #15878: [SPARK-18430] [SQL] Fixed Exception Messages when Hittin...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15878 Merging in master/branch-2.1. Thanks.
[GitHub] spark pull request #15875: [SPARK-18428][DOC] Update docs for GraphX
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15875
[GitHub] spark pull request #15885: [SPARK-18440][Structured Streaming] Pass correct ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15885#discussion_r87949452 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -142,19 +145,80 @@ class FileStreamSinkSuite extends StreamTest { } } - test("FileStreamSink - parquet") { + test("parquet") { testFormat(None) // should not throw error as default format parquet when not specified testFormat(Some("parquet")) } - test("FileStreamSink - text") { + test("text") { testFormat(Some("text")) } - test("FileStreamSink - json") { + test("json") { testFormat(Some("json")) } + test("aggregation + watermark + append mode") { --- End diff -- maybe write some comment explaining what this is testing?
[GitHub] spark pull request #15885: [SPARK-18440][Structured Streaming] Pass correct ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15885#discussion_r87949429 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -86,6 +86,7 @@ object FileFormatWriter extends Logging { def write( sparkSession: SparkSession, plan: LogicalPlan, --- End diff -- we shouldn't need the plan here do we?
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for GraphX
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15875 Thanks - merging in master.
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/11122 retest this please. LGTM. Let's run the test again. Will merge after test passes.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15880 +1 on the postgres approach
[GitHub] spark issue #15886: [MINOR][DOC] Fix typos in the 'configuration', 'monitori...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15886 Can one of the admins verify this patch?
[GitHub] spark pull request #15886: [MINOR][DOC] Fix typos in the 'configuration', 'm...
GitHub user weiqingy opened a pull request: https://github.com/apache/spark/pull/15886 [MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation ## What changes were proposed in this pull request? Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation. ## How was this patch tested? Manually. You can merge this pull request into a Git repository by running: $ git pull https://github.com/weiqingy/spark fixTypo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15886.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15886 commit 2e509ce8e3beaa294500e8e00394e2840b3866d4 Author: Weiqing Yang Date: 2016-11-15T04:38:42Z [MINOR][DOC] Fix typo in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15843 I'd prefer option 2 for safety, since the model summaries could be an issue for GC. Also, it looks like the Java model summaries don't have a copy method.
[GitHub] spark issue #15883: [SPARK-18438][SPARKR][ML] spark.mlp should support RForm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15883 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68650/ Test FAILed.
[GitHub] spark issue #15883: [SPARK-18438][SPARKR][ML] spark.mlp should support RForm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15883 Merged build finished. Test FAILed.
[GitHub] spark issue #15883: [SPARK-18438][SPARKR][ML] spark.mlp should support RForm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15883 **[Test build #68650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68650/consoleFull)** for PR 15883 at commit [`56a58fa`](https://github.com/apache/spark/commit/56a58fa22fd2243c536f71bca78ec65a15a44ecc). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15869: [YARN][DOC] Update Yarn configuration doc
Github user weiqingy commented on the issue: https://github.com/apache/spark/pull/15869 Hi, @srowen I have replied to your comments and updated the PR. Could you please review it again? Thanks.
[GitHub] spark pull request #15869: [YARN][DOC] Update Yarn configuration doc
Github user weiqingy commented on a diff in the pull request: https://github.com/apache/spark/pull/15869#discussion_r87945871 --- Diff: docs/running-on-yarn.md --- @@ -495,6 +468,20 @@ To use a custom metrics.properties for the application master and executors, upd name matches both the include and the exclude pattern, this file will be excluded eventually. + + spark.yarn.report.interval + 1s + + Interval between reports of the current app status in Yarn cluster mode. --- End diff -- Done.
[GitHub] spark pull request #15869: [YARN][DOC] Update Yarn configuration doc
Github user weiqingy commented on a diff in the pull request: https://github.com/apache/spark/pull/15869#discussion_r87945875 --- Diff: docs/running-on-yarn.md --- @@ -495,6 +468,20 @@ To use a custom metrics.properties for the application master and executors, upd name matches both the include and the exclude pattern, this file will be excluded eventually. + + spark.yarn.report.interval + 1s + + Interval between reports of the current app status in Yarn cluster mode. + + + + spark.yarn.services + Nil --- End diff -- Done.
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for GraphX
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68652/ Test PASSed.
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for GraphX
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15875 Merged build finished. Test PASSed.
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for GraphX
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15875 **[Test build #68652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68652/consoleFull)** for PR 15875 at commit [`6e03ccc`](https://github.com/apache/spark/commit/6e03ccc7c7173a3074c2dd9c4e739340201dbc6d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15835: [SPARK-17059][SQL] Allow FileFormat to specify partition...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15835 I guess we should ping @liancheng as he was reviewing the previous one.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Merged build finished. Test PASSed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68651/consoleFull)** for PR 15849 at commit [`e0c8b4e`](https://github.com/apache/spark/commit/e0c8b4eaa17b98cb0832e980c20fae8a6844fa96). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68651/ Test PASSed.
[GitHub] spark pull request #15869: [YARN][DOC] Update Yarn configuration doc
Github user weiqingy commented on a diff in the pull request: https://github.com/apache/spark/pull/15869#discussion_r87945049 --- Diff: docs/running-on-yarn.md --- @@ -118,19 +118,6 @@ To use a custom metrics.properties for the application master and executors, upd - spark.driver.memory --- End diff -- Yes, in YARN mode the properties `spark.driver.cores`, `spark.driver.memory`, and `spark.executor.memory` are often used. But if we search for them in the Spark code (see the links below), we find they are not YARN-specific. - https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L448 - https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L477 - https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L469 - https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L469 - https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L135 - https://github.com/apache/spark/blob/master/mesos/src/main/scala/org/apache/spark/deploy/rest/mesos/MesosRestServer.scala#L92 - ...
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68647/ Test PASSed.
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15868 Merged build finished. Test PASSed.
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15868 **[Test build #68647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68647/consoleFull)** for PR 15868 at commit [`3378b5e`](https://github.com/apache/spark/commit/3378b5e040041f1af1159d07e3d3b1ef47c6c8c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for GraphX
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15875 @rxin I updated this PR to note in `Vertex and Edge RDDs` that not all methods are listed, and added links to `VertexRDD` and `EdgeRDD`. BTW, it uses `VertexId` instead of `VertexID`.
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for Graph.op
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15875 **[Test build #68652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68652/consoleFull)** for PR 15875 at commit [`6e03ccc`](https://github.com/apache/spark/commit/6e03ccc7c7173a3074c2dd9c4e739340201dbc6d).
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68651/consoleFull)** for PR 15849 at commit [`e0c8b4e`](https://github.com/apache/spark/commit/e0c8b4eaa17b98cb0832e980c20fae8a6844fa96).
[GitHub] spark issue #15885: [SPARK-18440][Structured Streaming] Pass correct query e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15885 Merged build finished. Test PASSed.
[GitHub] spark issue #15885: [SPARK-18440][Structured Streaming] Pass correct query e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15885 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68643/ Test PASSed.
[GitHub] spark issue #15885: [SPARK-18440][Structured Streaming] Pass correct query e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15885 **[Test build #68643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68643/consoleFull)** for PR 15885 at commit [`337ef01`](https://github.com/apache/spark/commit/337ef01d06237b613d04011795b73c564b4b3e54). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15881: [SPARK-18434][ML] Add missing ParamValidations for ML al...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68648/ Test PASSed.
[GitHub] spark issue #15881: [SPARK-18434][ML] Add missing ParamValidations for ML al...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15881 Merged build finished. Test PASSed.
[GitHub] spark issue #15881: [SPARK-18434][ML] Add missing ParamValidations for ML al...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15881 **[Test build #68648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68648/consoleFull)** for PR 15881 at commit [`587fb9c`](https://github.com/apache/spark/commit/587fb9c8d233ec8d83750d4e1d39996da20b34e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/15852 LGTM overall. If this is accepted, then I will close #15827.
[GitHub] spark pull request #15852: Spark-18187 [SQL] CompactibleFileStreamLog should...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15852#discussion_r87941243 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala --- @@ -63,7 +63,60 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
   protected def isDeletingExpiredLog: Boolean
-  protected def compactInterval: Int
+  protected def defaultCompactInterval: Int
+
+  protected final lazy val compactInterval: Int = {
+    // SPARK-18187: "compactInterval" can be set by the user via defaultCompactInterval.
+    // If there are existing log entries, then we should ensure a compatible compactInterval
+    // is used, irrespective of the defaultCompactInterval. There are three cases:
+    //
+    // 1. If there is no '.compact' file, we can use the default setting directly.
+    // 2. If there are two or more '.compact' files, we use the interval of the batch id suffix
+    //    with '.compact' as compactInterval. It is unclear whether this case will ever happen
+    //    in the current code, since only the latest '.compact' file is retained, i.e., others
+    //    are garbage collected.
+    // 3. If there is only one '.compact' file, then we must find a compact interval
+    //    that is compatible with (i.e., a divisor of) the previous compact file, and that
+    //    faithfully tries to represent the revised default compact interval, i.e., is at
+    //    least as large if possible.
+    //    e.g., if defaultCompactInterval is 5 (and the previous compact interval could have
+    //    been any of 2, 3, 4, 6, 12), then the log could be: 11.compact, 12, 13, in which
+    //    case we will ensure that the new compactInterval = 6 > 5 and (11 + 1) % 6 == 0
+    val compactibleBatchIds = fileManager.list(metadataPath, batchFilesFilter)
+      .filter(f => f.getPath.toString.endsWith(CompactibleFileStreamLog.COMPACT_FILE_SUFFIX))
+      .map(f => pathToBatchId(f.getPath))
+      .sorted
+      .reverse
+
+    // Case 1
+    var interval = defaultCompactInterval
+    if (compactibleBatchIds.length >= 2) {
+      // Case 2
+      val latestCompactBatchId = compactibleBatchIds(0)
+      val previousCompactBatchId = compactibleBatchIds(1)
+      interval = (latestCompactBatchId - previousCompactBatchId).toInt
+      logInfo(s"Compact interval case 2 = $interval")
+    } else if (compactibleBatchIds.length == 1) {
+      // Case 3
+      val latestCompactBatchId = compactibleBatchIds(0).toInt
+      if (latestCompactBatchId + 1 <= defaultCompactInterval) {
+        interval = latestCompactBatchId + 1
+      } else if (defaultCompactInterval < (latestCompactBatchId + 1) / 2) {
+        // Find the first divisor >= default compact interval
+        def properDivisors(n: Int, min: Int) =
+          (min to n / 2).filter(i => n % i == 0) :+ n
+
--- End diff -- to => until ?
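The interval-selection logic discussed in the diff above can be sketched as follows. This is a hypothetical Python translation for illustration only, not the actual Spark implementation; the function names and the simplified case-3 handling are assumptions:

```python
def proper_divisors(n, minimum):
    # All divisors of n that are >= minimum, plus n itself.
    return [i for i in range(minimum, n // 2 + 1) if n % i == 0] + [n]

def choose_compact_interval(compact_batch_ids, default_interval):
    """Pick a compact interval compatible with existing '.compact' files.

    compact_batch_ids: batch ids of existing compact files, newest first.
    """
    if len(compact_batch_ids) >= 2:
        # Case 2: infer the interval from the two newest compact files.
        return compact_batch_ids[0] - compact_batch_ids[1]
    if len(compact_batch_ids) == 1:
        # Case 3: find an interval compatible with the single compact file.
        latest = compact_batch_ids[0]
        if latest + 1 <= default_interval:
            return latest + 1
        # First divisor of (latest + 1) that is >= default_interval,
        # so (latest + 1) % interval == 0 still holds.
        return proper_divisors(latest + 1, default_interval)[0]
    # Case 1: no compact file yet, use the default directly.
    return default_interval
```

For the example in the diff comment (defaultCompactInterval = 5, log contains `11.compact`), this picks 6, since 6 is the first divisor of 12 that is at least 5.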
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68644/ Test PASSed.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15852 Merged build finished. Test PASSed.
[GitHub] spark issue #15883: [SPARK-18438][SPARKR][ML] spark.mlp should support RForm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15883 **[Test build #68650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68650/consoleFull)** for PR 15883 at commit [`56a58fa`](https://github.com/apache/spark/commit/56a58fa22fd2243c536f71bca78ec65a15a44ecc).
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15852 **[Test build #68644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68644/consoleFull)** for PR 15852 at commit [`24e3617`](https://github.com/apache/spark/commit/24e36177e1eb24e7b250cb5356b47c0507e96d68). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15817 **[Test build #68649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68649/consoleFull)** for PR 15817 at commit [`6687d3c`](https://github.com/apache/spark/commit/6687d3c74123349c55dc516db48679680a622e1a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15817 Merged build finished. Test PASSed.
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15817 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68649/ Test PASSed.
[GitHub] spark issue #15883: [SPARK-18438][SPARKR][ML] spark.mlp should support RForm...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15883 Jenkins, test this please.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68645/ Test FAILed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Merged build finished. Test FAILed.
[GitHub] spark issue #15835: [SPARK-17059][SQL] Allow FileFormat to specify partition...
Github user pwoody commented on the issue: https://github.com/apache/spark/pull/15835 @rxin @HyukjinKwon ready for more review on my end.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #68645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68645/consoleFull)** for PR 15880 at commit [`1506d40`](https://github.com/apache/spark/commit/1506d406b5596a557a5c86f16b180239850901ad). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15881: [SPARK-18434][ML] Add missing ParamValidations fo...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15881#discussion_r87939529 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -73,11 +73,13 @@ private[ml] trait DecisionTreeParams extends PredictorParams /** * Minimum information gain for a split to be considered at a tree node. + * Should be >= 0.0. * (default = 0.0) * @group param */ final val minInfoGain: DoubleParam = new DoubleParam(this, "minInfoGain", -"Minimum information gain for a split to be considered at a tree node.") +"Minimum information gain for a split to be considered at a tree node.", --- End diff -- Oh, I think it's better not to add `>=0` here, because none of the params in `treeParams.scala` add it.
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15817 **[Test build #68649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68649/consoleFull)** for PR 15817 at commit [`6687d3c`](https://github.com/apache/spark/commit/6687d3c74123349c55dc516db48679680a622e1a).
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15817 @jkbradley done 👍
[GitHub] spark pull request #15881: [SPARK-18434][ML] Add missing ParamValidations fo...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15881#discussion_r87939306 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -73,11 +73,13 @@ private[ml] trait DecisionTreeParams extends PredictorParams /** * Minimum information gain for a split to be considered at a tree node. + * Should be >= 0.0. * (default = 0.0) * @group param */ final val minInfoGain: DoubleParam = new DoubleParam(this, "minInfoGain", -"Minimum information gain for a split to be considered at a tree node.") +"Minimum information gain for a split to be considered at a tree node.", --- End diff -- 👍, I will add it
[GitHub] spark issue #15884: [WIP][SPARK-18433][SQL] Improve DataSource option keys t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15884 Merged build finished. Test PASSed.
[GitHub] spark issue #15884: [WIP][SPARK-18433][SQL] Improve DataSource option keys t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15884 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68642/ Test PASSed.
[GitHub] spark issue #15884: [WIP][SPARK-18433][SQL] Improve DataSource option keys t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15884 **[Test build #68642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68642/consoleFull)** for PR 15884 at commit [`30eff08`](https://github.com/apache/spark/commit/30eff086159dabc8db7a46f6d4021c187d7fa4ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]`
[GitHub] spark pull request #15881: [SPARK-18434][ML] Add missing ParamValidations fo...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15881#discussion_r87939017

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
```diff
@@ -73,11 +73,13 @@ private[ml] trait DecisionTreeParams extends PredictorParams
   /**
    * Minimum information gain for a split to be considered at a tree node.
+   * Should be >= 0.0.
    * (default = 0.0)
    * @group param
    */
   final val minInfoGain: DoubleParam = new DoubleParam(this, "minInfoGain",
-    "Minimum information gain for a split to be considered at a tree node.")
+    "Minimum information gain for a split to be considered at a tree node.",
```
--- End diff --

Sorry to have missed this. Perhaps add ">= 0" at the end to be consistent with the others.
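The kind of set-time validation being added in this PR can be sketched without any Spark dependency. The names below (`ToyDoubleParam`, `ToyValidators`) are hypothetical stand-ins for `DoubleParam` and `ParamValidators.gtEq`, not Spark's API:

```scala
// Toy stand-in for an ML param with a set-time validator, in the spirit of
// DoubleParam + ParamValidators.gtEq. Hypothetical API, not Spark's.
case class ToyDoubleParam(name: String, doc: String, isValid: Double => Boolean) {
  def set(value: Double): Double = {
    require(isValid(value), s"Parameter $name was given invalid value $value. $doc")
    value
  }
}

object ToyValidators {
  // Accepts values greater than or equal to the lower bound.
  def gtEq(lowerBound: Double): Double => Boolean = _ >= lowerBound
}

val minInfoGain = ToyDoubleParam("minInfoGain",
  "Minimum information gain for a split to be considered at a tree node. Should be >= 0.0.",
  ToyValidators.gtEq(0.0))
```

The point of validating at set time (rather than at training time inside an old mllib algorithm) is that a bad value fails immediately, with the parameter's own doc string in the error message.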
[GitHub] spark issue #15881: [SPARK-18434][ML] Add missing ParamValidations for ML al...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15881 **[Test build #68648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68648/consoleFull)** for PR 15881 at commit [`587fb9c`](https://github.com/apache/spark/commit/587fb9c8d233ec8d83750d4e1d39996da20b34e4).
[GitHub] spark issue #15881: [SPARK-18434][ML] Add missing ParamValidations for ML al...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15881 Thanks for reviewing. I updated the check in LiR's `solver`: Array -> Set, and the supported options are now listed in the error message.
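The Array -> Set change described above can be sketched as follows; the solver names and the message format are assumptions for illustration, not the PR's exact code:

```scala
// Sketch of validating a `solver` option against a Set and listing the
// supported options in the error message. Option names are assumed here.
val supportedSolvers = Set("auto", "normal", "l-bfgs")

def checkSolver(value: String): String = {
  require(supportedSolvers.contains(value),
    s"Solver was given invalid value '$value'. " +
      s"Supported options: ${supportedSolvers.mkString(", ")}.")
  value
}
```

A `Set` makes membership checks O(1) and reads as intent ("these are the valid options"), and keeping the option list in the message means the user sees the fix in the failure itself.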
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15817 Can you please implement the Param directly in Bucketizer and QuantileDiscretizer? Just like in Scala, HasHandleInvalid has built-in Param doc which applies to existing use cases but not to Bucketizer and QuantileDiscretizer. It would be better to copy the Param, setter, and getter into Bucketizer and QuantileDiscretizer so that we can specialize the built-in Param doc.
[GitHub] spark pull request #15852: Spark-18187 [SQL] CompactibleFileStreamLog should...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15852#discussion_r87936909

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala ---
```diff
@@ -63,7 +63,60 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](

   protected def isDeletingExpiredLog: Boolean

-  protected def compactInterval: Int
+  protected def defaultCompactInterval: Int
+
+  protected final lazy val compactInterval: Int = {
+    // SPARK-18187: "compactInterval" can be set by user via defaultCompactInterval.
+    // If there are existing log entries, then we should ensure a compatible compactInterval
+    // is used, irrespective of the defaultCompactInterval. There are three cases:
+    //
+    // 1. If there is no '.compact' file, we can use the default setting directly.
+    // 2. If there are two or more '.compact' files, we use the interval of the batch id suffix
+    //    with '.compact' as compactInterval. It is unclear whether this case will ever happen
+    //    in the current code, since only the latest '.compact' file is retained, i.e., others
+    //    are garbage collected.
```
--- End diff --

Log garbage collection is controlled by `spark.sql.streaming.fileSource.log.deletion`. When it is `false`, there may be two or more '.compact' files.
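The compatibility rule quoted in the diff can be condensed into a standalone sketch; the function name, inputs, and the handling of the single-file case (which is not quoted above) are assumptions:

```scala
// Toy version of the compatibility rule for the effective compact interval:
// derive it from the batch ids of existing '.compact' files when possible.
def deriveCompactInterval(compactBatchIds: Seq[Long], defaultInterval: Int): Int =
  compactBatchIds.sorted match {
    case Seq() =>
      defaultInterval                       // case 1: no '.compact' file, use the default
    case ids if ids.size >= 2 =>
      (ids.last - ids(ids.size - 2)).toInt  // case 2: gap between the latest two compactions
    case _ =>
      defaultInterval                       // single-file case: not quoted in the diff, assumed
  }
```

The idea is that once a log has been compacted at some interval, later runs must keep using a compatible interval regardless of the configured default, or lookups of earlier '.compact' batches would miss.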
[GitHub] spark pull request #15877: [SPARK-18429] [SQL] implement a new Aggregate for...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15877#discussion_r87936572

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountMinSketchAggSuite.scala ---
```scala
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.io.ByteArrayInputStream
+
+import scala.reflect.ClassTag
+import scala.util.Random
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.TypeCheckFailure
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, BoundReference, Cast, GenericInternalRow, Literal}
+import org.apache.spark.sql.catalyst.util.ArrayData
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.sketch.CountMinSketch
+
+class CountMinSketchAggSuite extends SparkFunSuite {
+  private val childExpression = BoundReference(0, IntegerType, nullable = true)
+  private val epsOfTotalCount = 0.0001
+  private val confidence = 0.99
+  private val seed = 42
+
+  test("serialize and de-serialize") {
+    // Check empty serialize and de-serialize
+    val agg = new CountMinSketchAgg(childExpression, Literal(epsOfTotalCount), Literal(confidence),
+      Literal(seed))
+    val buffer = CountMinSketch.create(epsOfTotalCount, confidence, seed)
+    assert(buffer.equals(agg.deserialize(agg.serialize(buffer))))
+
+    // Check non-empty serialize and de-serialize
+    val random = new Random(31)
+    (0 until 1).map(_ => random.nextInt(100)).foreach { value =>
+      buffer.add(value)
+    }
+    assert(buffer.equals(agg.deserialize(agg.serialize(buffer))))
+  }
+
+  def testHighLevelInterface[T: ClassTag](
+      dataType: DataType,
+      sampledItemIndices: Array[Int],
+      allItems: Array[T],
+      exactFreq: Map[T, Long]): Any = {
+    test(s"high level interface, update, merge, eval... - $dataType") {
+      val agg = new CountMinSketchAgg(BoundReference(0, dataType, nullable = true),
+        Literal(epsOfTotalCount), Literal(confidence), Literal(seed))
+      assert(!agg.nullable)
+
+      val group1 = 0 until sampledItemIndices.length / 2
+      val group1Buffer = agg.createAggregationBuffer()
+      group1.foreach { index =>
+        val input = InternalRow(allItems(sampledItemIndices(index)))
+        agg.update(group1Buffer, input)
+      }
+
+      val group2 = sampledItemIndices.length / 2 until sampledItemIndices.length
+      val group2Buffer = agg.createAggregationBuffer()
+      group2.foreach { index =>
+        val input = InternalRow(allItems(sampledItemIndices(index)))
+        agg.update(group2Buffer, input)
+      }
+
+      val mergeBuffer = agg.createAggregationBuffer()
+      agg.merge(mergeBuffer, group1Buffer)
+      agg.merge(mergeBuffer, group2Buffer)
+      checkResult(agg.eval(mergeBuffer), allItems, exactFreq)
+    }
```
--- End diff --

Ok, I'll also test these.
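For readers unfamiliar with the data structure under test, here is a toy count-min sketch showing the update/merge/estimate cycle the suite exercises. It is deliberately minimal and is not `org.apache.spark.util.sketch.CountMinSketch`:

```scala
import scala.util.Random

// Toy count-min sketch: a depth x width table of counters. Each row hashes the
// item to one bucket; the estimate is the minimum over rows, so it can only
// over-count (hash collisions), never under-count.
class ToyCountMinSketch(depth: Int, width: Int, seed: Int) {
  private val table = Array.ofDim[Long](depth, width)
  private val rowSeeds = {
    val r = new Random(seed)
    Array.fill(depth)(r.nextInt())
  }

  private def bucket(row: Int, item: Any): Int =
    Math.floorMod(item.hashCode() ^ rowSeeds(row), width)

  def add(item: Any): Unit =
    (0 until depth).foreach(row => table(row)(bucket(row, item)) += 1)

  def estimateCount(item: Any): Long =
    (0 until depth).map(row => table(row)(bucket(row, item))).min

  // Merging sketches with identical dimensions and seeds is element-wise addition,
  // which is what makes per-group buffers combinable in an aggregate.
  def mergeInPlace(other: ToyCountMinSketch): Unit =
    for (r <- 0 until depth; c <- 0 until width) table(r)(c) += other.table(r)(c)
}
```

This mirrors the shape of the test above: update two group buffers independently, merge them, then evaluate; the merged estimate is always at least the true count.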
[GitHub] spark pull request #15884: [WIP][SPARK-18433][SQL] Improve DataSource option...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15884#discussion_r87936113

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
```scala
@@ -34,35 +34,42 @@ private[sql] class JSONOptions(
    @transient private val parameters: Map[String, String])
```
--- End diff --

I will update this PR in that way.
[GitHub] spark pull request #15884: [WIP][SPARK-18433][SQL] Improve DataSource option...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15884#discussion_r87936135

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
```scala
@@ -34,35 +34,42 @@ private[sql] class JSONOptions(
    @transient private val parameters: Map[String, String])
```
--- End diff --

`JSONOptions` is spark private, we can just declare the type as `CaseInsensitiveMap`, i.e. `parameters: CaseInsensitiveMap`.
[GitHub] spark pull request #15884: [WIP][SPARK-18433][SQL] Improve DataSource option...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15884#discussion_r87936058

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
```scala
@@ -34,35 +34,42 @@ private[sql] class JSONOptions(
    @transient private val parameters: Map[String, String])
```
--- End diff --

Ah, just changing the function signature. `parameters: CaseInsensitiveMap[String, String]`.
[GitHub] spark issue #15865: [SPARK-18420][BUILD] Fix the errors caused by lint check...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15865 For sure, I ran

```bash
$ ./dev/lint-java
Using `mvn` from path: .../mvn
Checkstyle checks passed.
```

It seems fine. It looks good to me apart from a few minor comments above.
[GitHub] spark pull request #15865: [SPARK-18420][BUILD] Fix the errors caused by lin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15865#discussion_r87935757

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java ---
```diff
@@ -48,8 +47,7 @@ public static void main(String[] args) {
   RowFactory.create(5, 9, 2, 7, 10, 7, 3),
   RowFactory.create(6, 1, 1, 4, 2, 8, 4)
 );
-
```
--- End diff --

If you are going to push more commits, let's revert this extra newline removal. Some reviewers don't like such unrelated changes; I was told so at least twice before.
[GitHub] spark pull request #15884: [WIP][SPARK-18433][SQL] Improve DataSource option...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15884#discussion_r87935620 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -34,35 +34,42 @@ private[sql] class JSONOptions( @transient private val parameters: Map[String, String]) --- End diff -- Oh, sure! You mean `require(parameters.isInstanceOf[CaseInsensitiveMap])`, right?
[GitHub] spark pull request #15865: [SPARK-18420][BUILD] Fix the errors caused by lin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15865#discussion_r87935645

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
```diff
@@ -109,7 +109,8 @@ public void pointTo(Object baseObject, long baseOffset, int sizeInBytes) {
   // Read the number of elements from the first 8 bytes.
   final long numElements = Platform.getLong(baseObject, baseOffset);
   assert numElements >= 0 : "numElements (" + numElements + ") should >= 0";
-  assert numElements <= Integer.MAX_VALUE : "numElements (" + numElements + ") should <= Integer.MAX_VALUE";
+  assert numElements <= Integer.MAX_VALUE :
+    "numElements (" + numElements + ") should <= Integer.MAX_VALUE";
```
--- End diff --

Hi @ConeyLiu, it is super minor, but it seems we should make the indentation for this (and the same instances here) double-spaced. I could at least find some examples of this, such as in `UnsafeMemoryAllocator`:

```java
assert (memory.obj == null) :
  "baseObject not null; are you trying to use the off-heap allocator to free on-heap memory?";
```

or in `ShuffleSortDataFormat`:

```java
assert (length <= buffer.size()) :
  "the buffer is smaller than required: " + buffer.size() + " < " + length;
```
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15868 Thank you for retriggering, @gatorsmile .
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15880 +1 on the postgres approach
[GitHub] spark pull request #15884: [WIP][SPARK-18433][SQL] Improve DataSource option...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15884#discussion_r87935015

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
```scala
@@ -34,35 +34,42 @@ private[sql] class JSONOptions(
    @transient private val parameters: Map[String, String])
```
--- End diff --

can we just force users to pass a `CaseInsensitiveMap`?
[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15857#discussion_r87934920

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala ---
```diff
@@ -118,14 +118,30 @@ class FoldablePropagationSuite extends PlanTest {
       Seq(
         testRelation.select(Literal(1).as('x), 'a).select('x + 'a),
         testRelation.select(Literal(2).as('x), 'a).select('x + 'a)))
-      .select('x)
     val optimized = Optimize.execute(query.analyze)
     val correctAnswer = Union(
       Seq(
         testRelation.select(Literal(1).as('x), 'a).select((Literal(1).as('x) + 'a).as("(x + a)")),
         testRelation.select(Literal(2).as('x), 'a).select((Literal(2).as('x) + 'a).as("(x + a)"))))
-      .select('x).analyze
```
--- End diff --

We are not checking whether it is resolvable or not. See the PR https://github.com/apache/spark/pull/15417
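As background for the test being discussed, foldable propagation replaces a reference to an alias of a literal with the literal itself. A toy expression tree (not Catalyst, which additionally tracks aliases, resolution, and plan nodes such as Union) makes the idea concrete:

```scala
// Minimal illustration of foldable propagation on a toy expression tree.
// NOT Catalyst; node and function names here are illustrative only.
sealed trait Expr
case class Lit(value: Int) extends Expr
case class Ref(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

def propagate(e: Expr, foldables: Map[String, Lit]): Expr = e match {
  case Ref(n) if foldables.contains(n) => foldables(n)  // substitute the known literal
  case Add(l, r) => Add(propagate(l, foldables), propagate(r, foldables))
  case other => other                                   // literals and unknown refs unchanged
}
```

In the suite's terms: since `'x` is an alias of `Literal(1)` (or `Literal(2)`) on each side of the Union, `'x + 'a` folds to `Literal(1) + 'a` per branch, which is exactly what the expected plan encodes.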
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15868 **[Test build #68647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68647/consoleFull)** for PR 15868 at commit [`3378b5e`](https://github.com/apache/spark/commit/3378b5e040041f1af1159d07e3d3b1ef47c6c8c1).
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15868 retest this please
[GitHub] spark pull request #15849: [SPARK-18410][STREAMING] Add structured kafka exa...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15849#discussion_r87934733

--- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredKafkaWordCount.java ---
```java
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.sql.streaming;
+
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.streaming.StreamingQuery;
+
+import java.util.Arrays;
+import java.util.Iterator;
+
+/**
+ * Consumes messages from one or more topics in Kafka and does wordcount.
+ * Usage: JavaStructuredKafkaWordCount <bootstrap-servers> <subscribe-type> <topics>
+ *   <bootstrap-servers> The Kafka "bootstrap.servers" configuration. A
+ *   comma-separated list of host:port.
+ *   <subscribe-type> There are three kinds of type, i.e. 'assign', 'subscribe',
+ *   'subscribePattern'.
+ *   |- <assign> Specific TopicPartitions to consume. Json string
+ *   |  {"topicA":[0,1],"topicB":[2,4]}.
+ *   |- <subscribe> The topic list to subscribe. A comma-separated list of
+ *   |  topics.
+ *   |- <subscribePattern> The pattern used to subscribe to topic(s).
+ *   |  Only one of "assign", "subscribe" or "subscribePattern" options can be
+ *   |  specified for Kafka source.
+ *   <topics> A list of one or more kafka topics to consume from. Please refer
```
--- End diff --

Got it.
[GitHub] spark pull request #15877: [SPARK-18429] [SQL] implement a new Aggregate for...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15877#discussion_r87934606

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountMinSketchAgg.scala ---
```scala
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription}
+import org.apache.spark.sql.catalyst.util.GenericArrayData
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.sketch.CountMinSketch
+
+/**
+ * This function returns a count-min sketch of a column with the given eps, confidence and seed.
+ * A count-min sketch is a probabilistic data structure used for summarizing streams of data in
+ * sub-linear space, which is useful for equality predicates and join size estimation.
+ *
+ * @param child child expression that can produce column value with `child.eval(inputRow)`
+ * @param epsExpression relative error, must be positive
+ * @param confidenceExpression confidence, must be positive and less than 1.0
+ * @param seedExpression random seed
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given eps,
+      confidence and seed. The result is an array of bytes, which should be deserialized to a
+      `CountMinSketch` before usage. `CountMinSketch` is useful for equality predicates and join
+      size estimation.
+  """)
+case class CountMinSketchAgg(
+    child: Expression,
+    epsExpression: Expression,
+    confidenceExpression: Expression,
+    seedExpression: Expression,
+    override val mutableAggBufferOffset: Int,
+    override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[CountMinSketch] {
+
+  def this(
+      child: Expression,
+      epsExpression: Expression,
+      confidenceExpression: Expression,
+      seedExpression: Expression) = {
+    this(child, epsExpression, confidenceExpression, seedExpression, 0, 0)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val defaultCheck = super.checkInputDataTypes()
+    if (defaultCheck.isFailure) {
+      defaultCheck
+    } else if (!epsExpression.foldable || !confidenceExpression.foldable ||
+        !seedExpression.foldable) {
+      TypeCheckFailure(
+        "The eps, confidence or seed provided must be a literal or constant foldable")
+    } else if (epsExpression.eval() == null || confidenceExpression.eval() == null ||
+        seedExpression.eval() == null) {
+      TypeCheckFailure("The eps, confidence or seed provided should not be null")
+    } else {
+      // parameter validity will be checked in CountMinSketchImpl
+      TypeCheckSuccess
+    }
+  }
+
+  override def createAggregationBuffer(): CountMinSketch = {
+    val eps: Double = epsExpression.eval().asInstanceOf[Double]
+    val confidence: Double = confidenceExpression.eval().asInstanceOf[Double]
+    val seed: Int = seedExpression.eval().asInstanceOf[Int]
+    CountMinSketch.create(eps, confidence, seed)
+  }
+
+  override def update(buffer: CountMinSketch, input: InternalRow): Unit = {
+    val value = child.eval(input)
+    // ignore empty rows
+    if (value != null) {
+      // UTF8String is a spark sql type, while CountMinSketch accepts String type
+      buffer.add(if (value.isInstanceOf[UTF8String]) value.toString else value)
+    }
+  }
+
+  override
```
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Merged build finished. Test PASSed.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68640/
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #68640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68640/consoleFull)** for PR 15704 at commit [`fab5682`](https://github.com/apache/spark/commit/fab5682ab4c78fc23f0d2db40ae6338e2d5dbab3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15868 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68641/
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/11122 It sounds good. @zsxwing could you take a look at this PR? Is it planned to be merged into branch-2.1?