[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/18067#discussion_r118423872 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -776,6 +778,20 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2))) head(predict(isoregModel, newDF)) ``` + Decision Tree + +`spark.decisionTree` fits a [decision tree](https://en.wikipedia.org/wiki/Decision_tree_learning) classification or regression model on a `SparkDataFrame`. +Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models. + +We use the `longley` dataset to train a decision tree and make predictions: + +```{r} +df <- createDataFrame(longley) --- End diff -- Option 2: do you mean using `{r, warning=FALSE}` like the other examples? I think both are OK. Which do you prefer? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Sorry about test failures. Will fix tomorrow.
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 @felixcheung All comments are addressed now and I think this is ready for review.
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 - New commit now resolves the Name issue. `@title` does not work, which is the header in the second line `\title{Aggregate functions for Column operations}`. The solution is to use `@name NULL` for the generics. Now we have: ![image](https://cloud.githubusercontent.com/assets/11082368/26437454/3780b8d4-40d2-11e7-83e9-80eec206f000.png) - Also added several more practical examples. But most of these functions are very straightforward to use. ![image](https://cloud.githubusercontent.com/assets/11082368/26437488/5be621be-40d2-11e7-8df8-0e5c99fb6ef6.png)
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18025 **[Test build #77341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77341/testReport)** for PR 18025 at commit [`038eac3`](https://github.com/apache/spark/commit/038eac3a60b330a29fc7099c31913175f6593e3c).
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18104 Merged build finished. Test PASSed.
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77339/ Test PASSed.
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18104 **[Test build #77339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77339/testReport)** for PR 18104 at commit [`a72ab8c`](https://github.com/apache/spark/commit/a72ab8c3153743ee5b5d0fe4ba797023aac6e88c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18097: [Spark-20873][SQL] Improve the error message for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18097#discussion_r118421574 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala --- @@ -684,7 +684,7 @@ private[columnar] object ColumnType { case struct: StructType => STRUCT(struct) case udt: UserDefinedType[_] => apply(udt.sqlType) case other => -throw new Exception(s"Unsupported type: $other") +throw new Exception(s"Unsupported type: ${other.typeName}") --- End diff -- `typeName` -> `simpleString`
[GitHub] spark pull request #18094: [Spark-20775][SQL] Added scala support from_json
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18094#discussion_r118421269 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3072,6 +3072,22 @@ object functions { * @since 2.1.0 */ def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = { +from_json(e, schema, options.asScala.toMap) + } + + /** +* (Scala-specific) Parses a column containing a JSON string into a `StructType` or `ArrayType` of `StructType`s --- End diff -- ditto.
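The diff above has the Java-facing overload convert its `java.util.Map` with `options.asScala.toMap` and then delegate to the Scala-facing one. A minimal sketch of that delegation pattern using only the Scala standard library follows. `OptionsBridge` and `describe` are hypothetical names used for illustration and are not part of Spark, and the option keys in the usage note are only sample values. The sketch assumes `scala.collection.JavaConverters`, which matches Scala 2.11/2.12 (on 2.13 it still compiles but is deprecated in favour of `scala.jdk.CollectionConverters`).

```scala
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

// Hypothetical helper (not part of Spark) showing the overload-and-delegate pattern.
object OptionsBridge {
  // Scala-specific overload: takes an immutable Scala Map and renders it
  // as "key=value" pairs sorted by key.
  def describe(options: Map[String, String]): String =
    options.toSeq.sorted.map { case (k, v) => s"$k=$v" }.mkString(", ")

  // Java-specific overload: copy the java.util.Map into an immutable
  // Scala Map once, then delegate to the Scala-specific overload.
  def describe(options: JMap[String, String]): String =
    describe(options.asScala.toMap)
}
```

Calling `OptionsBridge.describe` with a `java.util.HashMap` holding `mode -> PERMISSIVE` and `allowComments -> true` resolves to the Java-specific overload, which hands an immutable copy to the Scala one. Converting once at the boundary keeps all real logic in a single overload.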
[GitHub] spark pull request #18094: [Spark-20775][SQL] Added scala support from_json
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18094#discussion_r118421254 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3060,7 +3060,7 @@ object functions { from_json(e, schema, Map.empty[String, String]) /** - * Parses a column containing a JSON string into a `StructType` or `ArrayType` of `StructType`s + * (Java-specific) Parses a column containing a JSON string into a `StructType` or `ArrayType` of `StructType`s --- End diff -- nit: ScalaStyle check will fail saying `File line length exceeds 100 characters`.
[GitHub] spark pull request #18097: [Spark-20873][SQL] Improve the error message for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18097#discussion_r118421460 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/ColumnTypeSuite.scala --- @@ -144,4 +144,18 @@ class ColumnTypeSuite extends SparkFunSuite with Logging { ColumnType(DecimalType(19, 0)) } } + + test("show type name in type mismatch error") { +val invalidType = new DataType { +override def defaultSize: Int = 1 +override private[spark] def asNullable: DataType = null --- End diff -- `null` -> `this`
[GitHub] spark issue #18097: [Spark-20873][SQL] Improve the error message for unsuppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18097 **[Test build #77340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77340/testReport)** for PR 18097 at commit [`f53de3e`](https://github.com/apache/spark/commit/f53de3ea2606ecf5073d2577d0f82feb0671b8a0).
[GitHub] spark issue #18097: [Spark-20873][SQL] Improve the error message for unsuppo...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18097 ok to test
[GitHub] spark issue #13067: [SPARK-4131] [SQL] Support INSERT OVERWRITE [LOCAL] DIRE...
Github user santhavathi commented on the issue: https://github.com/apache/spark/pull/13067 Is this feature available yet?
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18104 **[Test build #77339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77339/testReport)** for PR 18104 at commit [`a72ab8c`](https://github.com/apache/spark/commit/a72ab8c3153743ee5b5d0fe4ba797023aac6e88c).
[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18091 **[Test build #77338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77338/testReport)** for PR 18091 at commit [`c79de07`](https://github.com/apache/spark/commit/c79de072fd4c0e32f5a62d15f8d921095d4e3bf0).
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77337/testReport)** for PR 18064 at commit [`eec0946`](https://github.com/apache/spark/commit/eec0946842657539d69deab43641d32d247f67ec). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77337/ Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Merged build finished. Test FAILed.
[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18091 retest this please
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77337/testReport)** for PR 18064 at commit [`eec0946`](https://github.com/apache/spark/commit/eec0946842657539d69deab43641d32d247f67ec).
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18104 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77336/ Test FAILed.
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18104 **[Test build #77336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77336/testReport)** for PR 18104 at commit [`313dcbc`](https://github.com/apache/spark/commit/313dcbc99c408c81d6bd5e5395bb373e1d0f418a). * This patch **fails R style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18104 Merged build finished. Test FAILed.
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18104 **[Test build #77336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77336/testReport)** for PR 18104 at commit [`313dcbc`](https://github.com/apache/spark/commit/313dcbc99c408c81d6bd5e5395bb373e1d0f418a).
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18058 Merged build finished. Test PASSed.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18058 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77334/ Test PASSed.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18058 **[Test build #77334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77334/testReport)** for PR 18058 at commit [`44267cb`](https://github.com/apache/spark/commit/44267cb56dafd59fb9a43cd72b18d5c1c2cf0c6b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16985 @cloud-fan : ping
[GitHub] spark issue #17181: [SPARK-19824][Core] Standalone master JSON not showing c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17181 **[Test build #77335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77335/testReport)** for PR 17181 at commit [`f8b5eaf`](https://github.com/apache/spark/commit/f8b5eaf37547a77e03a63de7a6b44e3886b38aec).
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Merged build finished. Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77328/ Test FAILed.
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77328/testReport)** for PR 18064 at commit [`57f9dde`](https://github.com/apache/spark/commit/57f9dde7d4469bbd7f1e04a04fac2041a2d743e6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17181: [SPARK-19824][Core] Standalone master JSON not showing c...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17181 Could you please check whether there exists any other inconsistent value between the UI and JSON API?
[GitHub] spark issue #12252: [SPARK-14460] [SQL] properly handling of column name con...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/12252 seems it's fixed in https://github.com/apache/spark/pull/15662 ?
[GitHub] spark issue #17181: [SPARK-19824][Core] Standalone master JSON not showing c...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17181 ok to test
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18025 re: title, would explicitly adding `@title` help? re: multiple class - agreed, a link or `@seealso` should be good. wouldn't `?coalesce` show the overloads though
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18104 Merged build finished. Test PASSed.
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77330/ Test PASSed.
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18104 **[Test build #77330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77330/testReport)** for PR 18104 at commit [`dab72a6`](https://github.com/apache/spark/commit/dab72a60441e337e9143c4144795a723d5cc0867). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18058 Hi, I'm not familiar with pyspark. I just wonder whether a unit test is needed for verification. If so, how should it be checked? Thanks.
[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18058 **[Test build #77334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77334/testReport)** for PR 18058 at commit [`44267cb`](https://github.com/apache/spark/commit/44267cb56dafd59fb9a43cd72b18d5c1c2cf0c6b).
[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18058#discussion_r118416434
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
         return self.getOrDefault(self.minSupport)
+class HasNumPartitions(Params):
+    """
+    Mixin for param support.
+    """
+
+    numPartitions = Param(
+        Params._dummy(),
+        "numPartitions",
+        """Number of partitions (at least 1) used by parallel FP-growth.
+        By default the param is not set,
+        and partition number of the input dataset is used.""",
+        typeConverter=TypeConverters.toInt)
+
+    def setNumPartitions(self, value):
+        """
+        Sets the value of :py:attr:`numPartitions`.
+        """
+        return self._set(numPartitions=value)
+
+    def getNumPartitions(self):
+        """
+        Gets the value of numPartitions or its default value.
--- End diff --
added.
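The setter/getter pattern in the diff above, where the param has no default and the input's partition count is used when it is unset, can be sketched without pyspark. This is only an illustration: `MiniParams`, `HasNumPartitionsSketch`, `get_or_default`, `is_set`, and `effective_partitions` are hypothetical names, and the `ValueError` check for "at least 1" is an assumption drawn from the param description, not code from the PR.

```python
class MiniParams:
    """Hypothetical stand-in for a params container; not the pyspark class."""

    def __init__(self):
        self._param_map = {}    # values set explicitly by the user
        self._default_map = {}  # defaults declared by the class (none here)

    def _set(self, **kwargs):
        for name, value in kwargs.items():
            self._param_map[name] = value
        return self  # returning self allows chained setter calls

    def get_or_default(self, name):
        if name in self._param_map:
            return self._param_map[name]
        return self._default_map[name]  # KeyError when no default exists

    def is_set(self, name):
        return name in self._param_map


class HasNumPartitionsSketch(MiniParams):
    """Mixin-style class exposing numPartitions with no default value."""

    def setNumPartitions(self, value):
        value = int(value)
        if value < 1:  # assumed validation, based on "(at least 1)"
            raise ValueError("numPartitions must be at least 1")
        return self._set(numPartitions=value)

    def getNumPartitions(self):
        return self.get_or_default("numPartitions")


def effective_partitions(params, input_partitions):
    """Use the explicit setting if present, else the input's partition count."""
    if params.is_set("numPartitions"):
        return params.getNumPartitions()
    return input_partitions
```

In this sketch, calling `getNumPartitions()` before any value is set raises `KeyError`, which is why the caller checks `is_set` first and falls back to the input's partition count.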
[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18058#discussion_r118416400
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
         return self.getOrDefault(self.minSupport)
+class HasNumPartitions(Params):
+    """
+    Mixin for param support.
+    """
+
+    numPartitions = Param(
+        Params._dummy(),
+        "numPartitions",
+        """Number of partitions (at least 1) used by parallel FP-growth.
--- End diff --
replaced.
[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18058#discussion_r118416391
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
         return self.getOrDefault(self.minSupport)
+class HasNumPartitions(Params):
+    """
+    Mixin for param support.
--- End diff --
modified.
[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 yes I'd hold this for a day.
[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18057 **[Test build #77333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77333/testReport)** for PR 18057 at commit [`4c68688`](https://github.com/apache/spark/commit/4c68688d3c970a0ca95c5afb6f1a60fb02b14421).
[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18090 LGTM
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #77332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77332/testReport)** for PR 17770 at commit [`8314cc3`](https://github.com/apache/spark/commit/8314cc310d9cf5d807a7e9b9de3c962dc37bf3e8).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #77331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77331/testReport)** for PR 17770 at commit [`b82b018`](https://github.com/apache/spark/commit/b82b0181c16b64968feaf560eb1422193746efde).
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 @felixcheung
- The links to `stddev_samp` etc. are already removed in the latest commit.
- About collecting all the examples into one, I think that'll work for this particular case, but I'm not sure about it in general. These methods are still spread out in the `.R` file, and if we decide to change the grouping of these functions later on, it will be very difficult if we don't have examples in those methods.
- For a method that is defined for multiple classes but whose meanings are drastically different, I agree that it's best to document by class. One downside is that a generic `?coalesce` can only go to one help page, e.g., the help for SparkDataFrame, not the other classes. However, we can add links to the `coalesce` methods for the other classes in the `SeeAlso` section.
[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77323/ Test FAILed.
[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18091 Merged build finished. Test FAILed.
[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18057 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77327/ Test FAILed.
[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18091 **[Test build #77323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77323/testReport)** for PR 18091 at commit [`c79de07`](https://github.com/apache/spark/commit/c79de072fd4c0e32f5a62d15f8d921095d4e3bf0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18057 Merged build finished. Test FAILed.
[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18057 **[Test build #77327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77327/testReport)** for PR 18057 at commit [`eaf236a`](https://github.com/apache/spark/commit/eaf236af538d4f3454d598dc5ba5a254e12647d6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17972: [SPARK-20723][ML]Add intermediate storage level to tree ...
Github user phatak-dev commented on the issue: https://github.com/apache/spark/pull/17972 @MLnick can you start a jenkins build?
[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18104 **[Test build #77330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77330/testReport)** for PR 18104 at commit [`dab72a6`](https://github.com/apache/spark/commit/dab72a60441e337e9143c4144795a723d5cc0867).
[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @felixcheung @yanboliang I'm fine with either the ascii table or the html table. It's your call. Hope to get over this minor doc issue and get this PR in soon. I can update the doc later if we find a better way. Thanks much.
[GitHub] spark pull request #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/18104
[SPARK-20877][SPARKR][WIP] add timestamps to test runs
## What changes were proposed in this pull request?
to investigate how long they run
## How was this patch tested?
Jenkins, AppVeyor
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/felixcheung/spark rtimetest
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18104.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #18104
commit dab72a60441e337e9143c4144795a723d5cc0867
Author: Felix Cheung
Date: 2017-05-25T03:53:24Z
timestamp tests
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test FAILed.
[GitHub] spark issue #18079: [SPARK-20841][SQL] Support column aliases for catalog ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18079 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77322/ Test PASSed.
[GitHub] spark issue #18079: [SPARK-20841][SQL] Support column aliases for catalog ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18079 Merged build finished. Test PASSed.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77326/ Test FAILed.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #77326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77326/testReport)** for PR 16578 at commit [`9f2f340`](https://github.com/apache/spark/commit/9f2f3409172ba09d15494f9faf861bb6ad683911). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18079: [SPARK-20841][SQL] Support column aliases for catalog ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18079 **[Test build #77322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77322/testReport)** for PR 18079 at commit [`b0e5805`](https://github.com/apache/spark/commit/b0e5805951471bb6bb8da98af75e99ac3057bc63). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77321/ Test PASSed.
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Merged build finished. Test PASSed.
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16989 **[Test build #77321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77321/testReport)** for PR 16989 at commit [`b07a3b6`](https://github.com/apache/spark/commit/b07a3b61ba483989b2c205e88cf9fdc73a4205df). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class FileSegmentManagedBuffer extends ManagedBuffer`
[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18083#discussion_r118413353
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -124,11 +136,13 @@ private[spark] class LiveListenerBus(val sparkContext: SparkContext) extends Spa
       logError(s"$name has already stopped! Dropping event $event")
       return
     }
+    metrics.numEventsReceived.inc()
     val eventAdded = eventQueue.offer(event)
     if (eventAdded) {
       eventLock.release()
     } else {
       onDropEvent(event)
+      metrics.numDroppedEvents.inc()
--- End diff --
is it better to move this to `onDropEvent`?
[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18083#discussion_r118413314
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -124,11 +136,13 @@ private[spark] class LiveListenerBus(val sparkContext: SparkContext) extends Spa
       logError(s"$name has already stopped! Dropping event $event")
       return
     }
+    metrics.numEventsReceived.inc()
--- End diff --
here we also count dropped events?
[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18083#discussion_r118413299
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -226,3 +240,34 @@ private[spark] object LiveListenerBus {
   val name = "SparkListenerBus"
 }
+private[spark] class LiveListenerBusMetrics(queue: LinkedBlockingQueue[_]) extends Source {
+  override val sourceName: String = "LiveListenerBus"
+  override val metricRegistry: MetricRegistry = new MetricRegistry
+
+  /**
+   * The total number of events posted to the LiveListenerBus. This counts the number of times
+   * that `post()` is called, which might be less than the total number of events processed in
+   * case events are dropped.
--- End diff --
according to the code, we also count dropped events, don't we?
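The question raised in this comment can be illustrated with a small standalone model. It is a sketch under stated assumptions, not the Spark implementation: `ListenerBusSketch`, its attribute names, and the use of Python's `queue.Queue` in place of `LinkedBlockingQueue` are all hypothetical. It only mirrors the order of operations in the diff, where the received counter is incremented before the non-blocking offer.

```python
import queue


class ListenerBusSketch:
    """Hypothetical model of a bounded event queue with two counters."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.num_events_received = 0  # incremented on every post() call
        self.num_dropped_events = 0   # incremented only when the queue is full

    def post(self, event):
        # Counted before the offer, so dropped events are included here too.
        self.num_events_received += 1
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.num_dropped_events += 1

    def queue_size(self):
        # Current number of queued events, which differs from the capacity.
        return self._queue.qsize()


bus = ListenerBusSketch(capacity=2)
for i in range(5):
    bus.post(i)

# Events that actually made it into the queue.
accepted = bus.num_events_received - bus.num_dropped_events
```

With nothing draining the queue, this model ends with five events received, three dropped, and two accepted, so the received count is never smaller than the number of events that reach listeners. That is consistent with the reviewer's reading that the counter includes dropped events.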
[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18083#discussion_r118413115
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
    @@ -226,3 +240,34 @@ private[spark] object LiveListenerBus {
       val name = "SparkListenerBus"
     }
    +private[spark] class LiveListenerBusMetrics(queue: LinkedBlockingQueue[_]) extends Source {
    +  override val sourceName: String = "LiveListenerBus"
    +  override val metricRegistry: MetricRegistry = new MetricRegistry
    +
    +  /**
    +   * The total number of events posted to the LiveListenerBus. This counts the number of times
    +   * that `post()` is called, which might be less than the total number of events processed in
    +   * case events are dropped.
    +   */
    +  val numEventsReceived: Counter = metricRegistry.counter(MetricRegistry.name("numEventsReceived"))
    +
    +  /**
    +   * The total number of events that were dropped without being delivered to listeners.
    +   */
    +  val numDroppedEvents: Counter = metricRegistry.counter(MetricRegistry.name("numEventsDropped"))
    +
    +  /**
    +   * The amount of time taken to post a single event to all listeners.
    +   */
    +  val eventProcessingTime: Timer = metricRegistry.timer(MetricRegistry.name("eventProcessingTime"))
    +
    +  /**
    +   * The number of of messages waiting in the queue.
    +   */
    +  val queueSize: Gauge[Int] = {
    --- End diff --
do we need this metric? Users can easily get it by looking at the `spark.scheduler.listenerbus.eventqueue.size` config.
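One distinction that may bear on this question: the config value fixes the queue's capacity, whereas a gauge over the queue reports how many events are waiting at the moment it is read. A small Python sketch with hypothetical names (not Spark code, and the capacity of 4 is an assumed value):

```python
import queue

CAPACITY = 4  # stands in for the configured event queue size (assumed value)

events = queue.Queue(maxsize=CAPACITY)


def queue_size_gauge():
    # A gauge is read on demand and reports the current depth, not the capacity.
    return events.qsize()


readings = [queue_size_gauge()]
events.put_nowait("a")
events.put_nowait("b")
readings.append(queue_size_gauge())
events.get_nowait()
readings.append(queue_size_gauge())
print(readings, events.maxsize)
```

The readings change as events are enqueued and drained, while `maxsize` stays fixed.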
[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18083#discussion_r118413024
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
    @@ -226,3 +240,34 @@ private[spark] object LiveListenerBus {
       val name = "SparkListenerBus"
     }
    +private[spark] class LiveListenerBusMetrics(queue: LinkedBlockingQueue[_]) extends Source {
    +  override val sourceName: String = "LiveListenerBus"
    +  override val metricRegistry: MetricRegistry = new MetricRegistry
    +
    +  /**
    +   * The total number of events posted to the LiveListenerBus. This counts the number of times
    +   * that `post()` is called, which might be less than the total number of events processed in
    +   * case events are dropped.
    +   */
    +  val numEventsReceived: Counter = metricRegistry.counter(MetricRegistry.name("numEventsReceived"))
    +
    +  /**
    +   * The total number of events that were dropped without being delivered to listeners.
    +   */
    +  val numDroppedEvents: Counter = metricRegistry.counter(MetricRegistry.name("numEventsDropped"))
    +
    +  /**
    +   * The amount of time taken to post a single event to all listeners.
    +   */
    +  val eventProcessingTime: Timer = metricRegistry.timer(MetricRegistry.name("eventProcessingTime"))
    +
    +  /**
    +   * The number of of messages waiting in the queue.
    --- End diff --
nit: double `of` here
[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18083#discussion_r118412901
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
    @@ -33,25 +37,24 @@ import org.apache.spark.util.Utils
      * has started will events be actually propagated to all attached listeners. This listener bus
      * is stopped when `stop()` is called, and it will drop further events after stopping.
      */
    -private[spark] class LiveListenerBus(val sparkContext: SparkContext) extends SparkListenerBus {
    +private[spark] class LiveListenerBus(conf: SparkConf) extends SparkListenerBus {
       self =>
       import LiveListenerBus._
    +  private var sparkContext: SparkContext = _
    +
       // Cap the capacity of the event queue so we get an explicit error (rather than
       // an OOM exception) if it's perpetually being added to more quickly than it's being drained.
    -  private lazy val EVENT_QUEUE_CAPACITY = validateAndGetQueueSize()
    -  private lazy val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](EVENT_QUEUE_CAPACITY)
    -
    -  private def validateAndGetQueueSize(): Int = {
    -    val queueSize = sparkContext.conf.get(LISTENER_BUS_EVENT_QUEUE_SIZE)
    -    if (queueSize <= 0) {
    -      throw new SparkException("spark.scheduler.listenerbus.eventqueue.size must be > 0!")
    -    }
    -    queueSize
    +  private val eventQueue = {
    +    val capacity = conf.get(LISTENER_BUS_EVENT_QUEUE_SIZE)
    +    require(capacity > 0, s"${LISTENER_BUS_EVENT_QUEUE_SIZE.key} must be > 0!")
    --- End diff --
this constraint can be put in `LISTENER_BUS_EVENT_QUEUE_SIZE` with `TypedConfigBuilder.checkValue`
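The suggestion above is to attach the range check to the config definition itself rather than to each caller. A hedged Python sketch of that idea follows; `ConfigEntry` and `check_value` are hypothetical illustrations, not Spark's `TypedConfigBuilder`, and the default of 10000 is an assumed value.

```python
class ConfigEntry:
    """Hypothetical config entry that validates values where it is defined."""

    def __init__(self, key, default):
        self.key = key
        self.default = default
        self._checks = []

    def check_value(self, predicate, error_message):
        # Register a validator and return self so calls can be chained.
        self._checks.append((predicate, error_message))
        return self

    def read(self, settings):
        # Every reader goes through the same validators, so callers need no checks.
        value = int(settings.get(self.key, self.default))
        for predicate, error_message in self._checks:
            if not predicate(value):
                raise ValueError(f"{self.key}: {error_message} (got {value})")
        return value


EVENT_QUEUE_SIZE = ConfigEntry(
    "spark.scheduler.listenerbus.eventqueue.size", default=10000  # default assumed
).check_value(lambda v: v > 0, "must be > 0")

print(EVENT_QUEUE_SIZE.read({}))
try:
    EVENT_QUEUE_SIZE.read({EVENT_QUEUE_SIZE.key: "0"})
except ValueError as err:
    print(err)
```

Placing the check on the entry means a `require` at each call site, as in the diff, would no longer be needed.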
[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18058#discussion_r118412802
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -49,6 +49,32 @@ def getMinSupport(self):
             return self.getOrDefault(self.minSupport)
    +class HasNumPartitions(Params):
    +    """
    +    Mixin for param support.
    +    """
    +
    +    numPartitions = Param(
    +        Params._dummy(),
    +        "numPartitions",
    +        """Number of partitions (at least 1) used by parallel FP-growth.
    --- End diff --
does this need to be scrubbed? I think we have `"""` everywhere
[GitHub] spark issue #18101: [SPARK-20874][Examples]Add Structured Streaming Kafka So...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18101 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77320/ Test PASSed.
[GitHub] spark issue #18101: [SPARK-20874][Examples]Add Structured Streaming Kafka So...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18101 Merged build finished. Test PASSed.
[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 Given that, I think I'm ok with an ASCII table as a one-time thing. Thoughts?
[GitHub] spark issue #18101: [SPARK-20874][Examples]Add Structured Streaming Kafka So...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18101
**[Test build #77320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77320/testReport)** for PR 18101 at commit [`e0c758d`](https://github.com/apache/spark/commit/e0c758d05452076ab96177e81e88e0974ef85846).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18025
Also, since we have an Rd now, what do you think about collecting all the examples into one? That should eliminate the `Not run` on every other line. I think this would also be a great opportunity to do more than a simple `head(select(...))`: something expanded and more practical. What do you think?
Also this: https://github.com/apache/spark/pull/18025#issuecomment-303838880
I like this approach. These are my comments from your screenshot; I'll review more closely after more changes, thanks!
[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18090 **[Test build #77329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77329/testReport)** for PR 18090 at commit [`1a45ff5`](https://github.com/apache/spark/commit/1a45ff5ced6cffd3d8ed41574df3bdd8e463bc21).
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18025 I guess we don't need a link to stddev_samp since it's on the same page. Shouldn't std_dev and var_samp also be on this page?
[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18090 LGTM
[GitHub] spark pull request #18090: [SPARK-20250][Core]Improper OOM error when a task...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18090#discussion_r118412069
    --- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
    @@ -184,6 +185,10 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
                   break;
                 }
               }
    +        } catch (ClosedByInterruptException e) {
    --- End diff --
surprisingly this is also `IOException`... good catch!
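The remark above concerns the exception hierarchy: in Java, `java.nio.channels.ClosedByInterruptException` is a subclass of `IOException`, so a dedicated handler only takes effect if it comes before any general `IOException` handler. A Python analogue using stand-in classes (the names mirror the Java types for illustration only and are not part of any library):

```python
class IOExceptionSketch(Exception):
    """Stand-in for a general I/O failure."""


class ClosedByInterruptSketch(IOExceptionSketch):
    """Stand-in for an I/O failure caused by the thread being interrupted."""


def classify(error):
    try:
        raise error
    except ClosedByInterruptSketch:
        # The specific handler comes first; otherwise the general one would catch it.
        return "interrupted"
    except IOExceptionSketch:
        return "io-failure"


print(classify(ClosedByInterruptSketch()), classify(IOExceptionSketch()))
```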
[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18025 I think we need to give it a title explicitly - see the header/first line of https://cloud.githubusercontent.com/assets/11082368/26429381/64dd117e-409b-11e7-9661-659b5fbe8206.png
[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18090 ok to test
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18064 **[Test build #77328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77328/testReport)** for PR 18064 at commit [`57f9dde`](https://github.com/apache/spark/commit/57f9dde7d4469bbd7f1e04a04fac2041a2d743e6).
[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18067#discussion_r118411586
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -430,7 +430,7 @@ We use `svm` in package `e1071` as an example. We use all default settings excep
     costs <- exp(seq(from = log(1), to = log(1000), length.out = 5))
     train <- function(cost) {
       stopifnot(requireNamespace("e1071", quietly = TRUE))
    -  model <- e1071::svm(Species ~ ., data = iris, cost = cost)
    +  model <- e1071::svm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris, cost = cost)
    --- End diff --
this isn't reverted?
[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18067#discussion_r118411791
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -776,6 +778,20 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
     head(predict(isoregModel, newDF))
     ```
    + Decision Tree
    +
    +`spark.decisionTree` fits a [decision tree](https://en.wikipedia.org/wiki/Decision_tree_learning) classification or regression model on a `SparkDataFrame`.
    +Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
    +
    +We use the `longley` dataset to train a decision tree and make predictions:
    +
    +```{r}
    +df <- createDataFrame(longley)
    --- End diff --
As commented before, please check. I'm pretty sure `createDataFrame(longley)` will cause a warning:
```
longley
     GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
1948         88.5 259.426      232.5        145.6    108.632 1948   61.122
```
so our options are:
- don't use longley (my earlier suggestion)
- use longley but keep `warning=FALSE`
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77319/ Test PASSed.
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Merged build finished. Test PASSed.
[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16989
**[Test build #77319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77319/testReport)** for PR 16989 at commit [`188862e`](https://github.com/apache/spark/commit/188862e1a8f80c5147f504ff931ce427ba7c9084).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public final class FileSegmentManagedBuffer extends ManagedBuffer `
[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18064#discussion_r118410964
    --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala ---
    @@ -86,12 +86,10 @@ private[kafka010] object KafkaWriter extends Logging {
           topic: Option[String] = None): Unit = {
         val schema = queryExecution.analyzed.output
         validateQuery(queryExecution, kafkaParameters, topic)
    -    SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
    --- End diff --
If you mean `KafkaSourceProvider`, is it the same code path as `KafkaSink`? In `KafkaSink.addBatch`, `KafkaWriter.write` is also called to write into Kafka.
[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17113 Thanks @tgravescs, I will update the code soon.
[GitHub] spark issue #18065: [SPARK-20844] Remove experimental from Structured Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18065 Merged build finished. Test PASSed.
[GitHub] spark issue #18065: [SPARK-20844] Remove experimental from Structured Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18065 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77317/ Test PASSed.