[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @mridulm You are right. This patch is mainly for jobs that have multiple stages, which is very common in production pipelines. As you mentioned, if there is a shuffle involved, getLocationsWithLargestOutputs in MapOutputTracker typically returns None for ShuffledRowRDD and ShuffledRDD because of the threshold REDUCER_PREF_LOCS_FRACTION (20%). A ShuffledRowRDD/ShuffledRDD can easily have more than 10 partitions (even hundreds) in a real production pipeline, so the patch helps a lot with CPU reservation time. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
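The fraction check being discussed can be sketched in plain Python. This is a simplified model, not Spark's actual implementation: the idea in `MapOutputTracker.getLocationsWithLargestOutputs` is that a reducer only gets preferred locations when some executor holds at least `REDUCER_PREF_LOCS_FRACTION` (0.2) of that reducer's total map output; when output is spread evenly over many partitions, nothing qualifies and no preference is reported.

```python
# Simplified sketch (not Spark's actual code) of the REDUCER_PREF_LOCS_FRACTION
# threshold: return only the locations holding >= 20% of a reduce partition's
# total map output, or None when no location qualifies.

REDUCER_PREF_LOCS_FRACTION = 0.2

def preferred_locations(bytes_by_location):
    """bytes_by_location: dict mapping host -> map output bytes destined
    for one reduce partition. Returns qualifying hosts, or None."""
    total = sum(bytes_by_location.values())
    if total == 0:
        return None
    prefs = [loc for loc, b in bytes_by_location.items()
             if b / total >= REDUCER_PREF_LOCS_FRACTION]
    return prefs or None

# Output spread evenly over 10 hosts: each holds 10% < 20%, so no preference.
print(preferred_locations({f"host{i}": 100 for i in range(10)}))  # None
# Skewed output: both hosts pass the threshold.
print(preferred_locations({"a": 80, "b": 20}))  # ['a', 'b']
```

This illustrates why, with many reduce partitions, locality preferences often come back empty and a packing-aware scheduler still has room to act.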
[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15389 @holdenk @dusenberrymw @HyukjinKwon Thanks for review!
[GitHub] spark issue #15178: [SPARK-17556][SQL] Executor side broadcast for broadcast...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15178 ping @rxin @JoshRosen Can you review this? Thanks!
[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15389 @holdenk Thank you for cc'ing me. It looks okay to me as targeted, but I feel we need a sign-off.
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15388 ping @hvanhovell @rxin Can you take a look again? Thanks!
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Merged build finished. Test FAILed.
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66595/ Test FAILed.
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #66595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66595/consoleFull)** for PR 15314 at commit [`fabe3c6`](https://github.com/apache/spark/commit/fabe3c65838a2b4c7e5ff227c8d585ad2f05ccee). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319 **[Test build #66597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66597/consoleFull)** for PR 15319 at commit [`1558d4c`](https://github.com/apache/spark/commit/1558d4c2f9190691239e9b27e9517714c2af2bcc).
[GitHub] spark pull request #10307: [SPARK-12334][SQL][PYSPARK] Support read from mul...
GitHub user zjffdu reopened a pull request: https://github.com/apache/spark/pull/10307 [SPARK-12334][SQL][PYSPARK] Support read from multiple input paths for orc file in DataFrameReader.orc
Besides the issue in the Spark API, this also fixes two minor issues in PySpark:
* support reading from multiple input paths for orc
* support reading from multiple input paths for text
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zjffdu/spark SPARK-12334
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10307.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10307
commit 3dd3452236156bb7ef36e9d290217e23556f6b6e Author: Jeff Zhang Date: 2015-12-15T10:00:47Z [SPARK-12334][SQL][PYSPARK] Support read from multiple input paths for orc file in DataFrameReader.orc
commit b6a26e946fcf4331fb62382537ce2b0964a5b90e Author: Jeff Zhang Date: 2015-12-16T01:41:29Z Update doc
commit 24a8f4f70cb9da2d83a39836e8517b55a9238e70 Author: Jeff Zhang Date: 2016-04-19T10:36:24Z address code style
commit 6ac05805391f13dcd0530f1ecedbd837befcfb20 Author: Jeff Zhang Date: 2016-10-09T03:53:41Z resolve conflicts
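The Python-side shape of such a change can be sketched without Spark. The class and behavior below are hypothetical stand-ins, not the actual PySpark `DataFrameReader` code: the point is simply a reader method that accepts several input paths (as varargs or a single list) instead of one string.

```python
# Hypothetical sketch of a reader accepting multiple input paths.
# FakeOrcReader is a stub; the real DataFrameReader forwards the paths
# to the JVM, while this stub just returns what it would load.

class FakeOrcReader:
    def orc(self, *paths):
        # Accept either orc("p1", "p2") or a single list: orc(["p1", "p2"]).
        if len(paths) == 1 and isinstance(paths[0], (list, tuple)):
            paths = tuple(paths[0])
        if not paths:
            raise ValueError("at least one path is required")
        return list(paths)

reader = FakeOrcReader()
print(reader.orc("a.orc", "b.orc"))    # ['a.orc', 'b.orc']
print(reader.orc(["a.orc", "b.orc"]))  # ['a.orc', 'b.orc']
```

Normalizing varargs and list inputs in the Python wrapper keeps both call styles working against a single underlying entry point.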
[GitHub] spark pull request #10307: [SPARK-12334][SQL][PYSPARK] Support read from mul...
Github user zjffdu closed the pull request at: https://github.com/apache/spark/pull/10307
[GitHub] spark pull request #15319: [SPARK-17733][SQL] InferFiltersFromConstraints ru...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/15319#discussion_r82515583
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -74,14 +74,26 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
 * additional constraint of the form `b = 5`
 */
 private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = {
+    // Collect aliases from expressions to avoid producing a non-converging set of
+    // constraints for recursive functions.
+    //
+    // Don't apply transform on constraints if the attribute used to replace is an alias,
+    // because then both `QueryPlan.inferAdditionalConstraints` and
+    // `UnaryNode.getAliasedConstraints` apply and may produce a non-converging set of
+    // constraints.
+    // For more details, refer to https://issues.apache.org/jira/browse/SPARK-17733
+    val aliasMap = AttributeMap((expressions ++ children.flatMap(_.expressions)).collect {
--- End diff --
Yes, `AttributeSet` is a better choice here.
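The idea behind `inferAdditionalConstraints` and the alias guard can be modeled in a few lines of plain Python. This is a toy model, not Catalyst: constraints are `(attribute, value)` facts, equalities are attribute pairs, and attributes flagged as aliases are skipped during substitution, mirroring the fix under discussion where substituting through aliases can make the constraint set grow without converging.

```python
# Toy model of constraint inference: from an equality a = b and a fact
# a = 5, derive b = 5. Aliased attributes are excluded from substitution.

def infer_additional(constraints, equalities, aliases=frozenset()):
    """constraints: set of (attr, value) facts, e.g. ('a', 5).
    equalities: set of (attr, attr) pairs, e.g. ('a', 'b').
    aliases: attributes to skip. Returns the enlarged constraint set."""
    inferred = set(constraints)
    for left, right in equalities:
        if left in aliases or right in aliases:
            continue  # don't substitute through aliased attributes
        for attr, value in constraints:
            if attr == left:
                inferred.add((right, value))
            elif attr == right:
                inferred.add((left, value))
    return inferred

facts = {("a", 5)}
print(infer_additional(facts, {("a", "b")}))         # adds ('b', 5)
print(infer_additional(facts, {("a", "b")}, {"b"}))  # unchanged: b is an alias
```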
[GitHub] spark issue #15403: [SPARK-17832][SQL] TableIdentifier.quotedString creates ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/15403 @hvanhovell nvm about the `catalog.getTable` issue, it turns out to be my mistake. Sorry about that...
[GitHub] spark issue #15403: [SPARK-17832][SQL] TableIdentifier.quotedString creates ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15403 **[Test build #66596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66596/consoleFull)** for PR 15403 at commit [`59a1f0e`](https://github.com/apache/spark/commit/59a1f0e42bde93975b3a13447e626dcfbebc0d80).
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15292 LGTM except 2 minor comments, thanks for working on it!
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #66595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66595/consoleFull)** for PR 15314 at commit [`fabe3c6`](https://github.com/apache/spark/commit/fabe3c65838a2b4c7e5ff227c8d585ad2f05ccee).
[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11211 Merged build finished. Test PASSed.
[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66590/ Test PASSed.
[GitHub] spark issue #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspark
Github user mateiz commented on the issue: https://github.com/apache/spark/pull/8318 Cool, good to know that there's another ASF project that does it. We should go for it then.
[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11211 **[Test build #66590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66590/consoleFull)** for PR 11211 at commit [`b761b85`](https://github.com/apache/spark/commit/b761b858391fd96e18b074b16763cfa46284917a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15292#discussion_r82515285
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -229,13 +229,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
 table: String,
 parts: Array[Partition],
 connectionProperties: Properties): DataFrame = {
-    val props = new Properties()
-    extraOptions.foreach { case (key, value) =>
-      props.put(key, value)
-    }
-    // connectionProperties should override settings in extraOptions
--- End diff --
should we still keep this comment?
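The override order that the deleted comment documented can be shown with a small sketch. This is an illustration only, using plain Python dicts in place of `java.util.Properties`: per-call `connectionProperties` entries win over the reader-level `extraOptions` because they are applied last.

```python
# Sketch of the merge order the removed comment documented:
# connectionProperties should override settings in extraOptions.

def effective_jdbc_props(extra_options, connection_properties):
    props = dict(extra_options)          # start from reader-level options
    props.update(connection_properties)  # per-call properties take precedence
    return props

merged = effective_jdbc_props(
    {"fetchsize": "100", "user": "reader_default"},
    {"user": "alice", "password": "secret"},
)
print(merged["user"])  # alice: connectionProperties overrode extraOptions
```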
[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...
Github user jpiper closed the pull request at: https://github.com/apache/spark/pull/14861
[GitHub] spark issue #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Python Sp...
Github user jpiper commented on the issue: https://github.com/apache/spark/pull/14861 Looks like this was actually added in #15140, so we can close this :)
[GitHub] spark issue #15159: [SPARK-17605][SPARK_SUBMIT] Add option spark.usePython a...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/15159 @holdenk that's correct.
[GitHub] spark issue #15159: [SPARK-17605][SPARK_SUBMIT] Add option spark.usePython a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15159 **[Test build #66594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66594/consoleFull)** for PR 15159 at commit [`522e3e8`](https://github.com/apache/spark/commit/522e3e85143235a57c94cdc618133e66715264de).
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 @zhzhan I am curious why this is the case for the jobs being mentioned. This PR should only have an impact if the locality preference of the taskset being run is fairly suboptimal to begin with, no? If the tasks have a PROCESS_LOCAL or NODE_LOCAL locality preference, that will take precedence, and attempts to spread the load or reduce the spread across nodes as envisioned here will not work. So the target here seems to be RACK_LOCAL or ANY locality preference, which should be fairly uncommon; unless I am missing something here w.r.t. the jobs being run.
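The precedence the comment relies on can be sketched as a simple ranking. This toy model uses the level names from Spark's `TaskLocality` but simplifies the selection logic: the scheduler prefers the most local level on offer, so a packing strategy only gets a say when tasks fall through to RACK_LOCAL or ANY.

```python
# Toy ranking of task locality levels (names from Spark's TaskLocality;
# the selection logic here is simplified): lower rank = more local = preferred.

from enum import IntEnum

class TaskLocality(IntEnum):
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    NO_PREF = 2
    RACK_LOCAL = 3
    ANY = 4

def best_level(available):
    """Pick the most local (lowest-ranked) level among those offered."""
    return min(available)

# NODE_LOCAL beats ANY, so node-local placement takes precedence:
print(best_level([TaskLocality.ANY, TaskLocality.NODE_LOCAL]).name)  # NODE_LOCAL
```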
[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14215 @wgtmac I hope this one gets merged into 2.1, but I believe I am not supposed to decide that. In any case, I will take out the vectorized one described in the PR.
[GitHub] spark issue #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Python Sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14861 **[Test build #66593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66593/consoleFull)** for PR 14861 at commit [`190c63b`](https://github.com/apache/spark/commit/190c63b6fad4588533237fdc83d7e9e8d7b8de7f).
[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13493 PR is updated, @holdenk @jkbradley
[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13493 Merged build finished. Test PASSed.
[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13493 **[Test build #66589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66589/consoleFull)** for PR 13493 at commit [`e8fefc0`](https://github.com/apache/spark/commit/e8fefc05e0125974ed224f1de3acadbbbf3d98c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13493 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66589/ Test PASSed.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10307 **[Test build #66592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66592/consoleFull)** for PR 10307 at commit [`6ac0580`](https://github.com/apache/spark/commit/6ac05805391f13dcd0530f1ecedbd837befcfb20). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66592/ Test FAILed.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10307 Merged build finished. Test FAILed.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10307 **[Test build #66592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66592/consoleFull)** for PR 10307 at commit [`6ac0580`](https://github.com/apache/spark/commit/6ac05805391f13dcd0530f1ecedbd837befcfb20).
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10307 Merged build finished. Test FAILed.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66591/ Test FAILed.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10307 **[Test build #66591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66591/consoleFull)** for PR 10307 at commit [`727b35a`](https://github.com/apache/spark/commit/727b35a6024adad61d89d2d515c3a1561df51cd2). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10307 **[Test build #66591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66591/consoleFull)** for PR 10307 at commit [`727b35a`](https://github.com/apache/spark/commit/727b35a6024adad61d89d2d515c3a1561df51cd2).
[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15389 Maybe @hyukjinkwon could also do a review pass while we wait for @davies or someone with commit privileges to come by and do a final review.
[GitHub] spark issue #15404: Branch 2.0
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15404 Can you close this pull request? If it's an attempt at backporting, you can just make a new PR once you get it sorted out.
[GitHub] spark issue #14650: [SPARK-17062][MESOS] add conf option to mesos dispatcher
Github user skonto commented on the issue: https://github.com/apache/spark/pull/14650 WIP
[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11211 **[Test build #66590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66590/consoleFull)** for PR 11211 at commit [`b761b85`](https://github.com/apache/spark/commit/b761b858391fd96e18b074b16763cfa46284917a).
[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/11211 Conflicts are resolved.
[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13493 **[Test build #66589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66589/consoleFull)** for PR 13493 at commit [`e8fefc0`](https://github.com/apache/spark/commit/e8fefc05e0125974ed224f1de3acadbbbf3d98c8).
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/14959 Awesome, thanks for updating. I'm at PyData this weekend so will be a bit slow on my end.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/14959 @vanzin @holdenk @BryanCutler The PR is updated; please help review.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399 Hi, @gatorsmile. Could you review this PR when you have some time?
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14959 Merged build finished. Test PASSed.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66588/ Test PASSed.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14959 **[Test build #66588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66588/consoleFull)** for PR 14959 at commit [`1972714`](https://github.com/apache/spark/commit/19727142d633f19a348658cf1c45993f45867fa4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14375: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...
Github user praveendareddy21 commented on the issue: https://github.com/apache/spark/pull/14375 @MechCoder Can you review and merge this PR? Refer to https://github.com/apache/spark/pull/13248 for discussions.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14959 **[Test build #66588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66588/consoleFull)** for PR 14959 at commit [`1972714`](https://github.com/apache/spark/commit/19727142d633f19a348658cf1c45993f45867fa4).
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14959 Merged build finished. Test FAILed.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66587/ Test FAILed.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14959 **[Test build #66587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66587/consoleFull)** for PR 14959 at commit [`dbc4bb4`](https://github.com/apache/spark/commit/dbc4bb49a44569fd55e3ede895ab06d2e959da82). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14959 **[Test build #66587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66587/consoleFull)** for PR 14959 at commit [`dbc4bb4`](https://github.com/apache/spark/commit/dbc4bb49a44569fd55e3ede895ab06d2e959da82).
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82513101 --- Diff: R/pkg/R/mllib.R --- @@ -1427,3 +1447,185 @@ print.summary.KSTest <- function(x, ...) { cat(summaryStr, "\n") invisible(x) } + +#' Decision Tree Model for Regression and Classification +#' +#' \code{spark.decisionTree} fits a Decision Tree Regression model or Classification model on +#' a SparkDataFrame. Users can call \code{summary} to get a summary of the fitted Decision Tree +#' model, \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to +#' save/load fitted models. +#' For more details, see \href{https://en.wikipedia.org/wiki/Decision_tree_learning}{Decision Tree} +#' +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', ':', '+', and '-'. +#' @param type type of model to fit +#' @param maxDepth Maximum depth of the tree (>= 0). +#' @param maxBins Maximum number of bins used for discretizing continuous features and for choosing +#'how to split on features at each node. More bins give higher granularity. Must be +#'>= 2 and >= number of categories in any categorical feature. (default = 32) +#' @param ... additional arguments passed to the method. --- End diff -- Or a future PR?
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512650 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -791,4 +791,59 @@ test_that("spark.kstest", { expect_match(capture.output(stats)[1], "Kolmogorov-Smirnov test summary:") }) +test_that("spark.decisionTree Regression", { + data <- suppressWarnings(createDataFrame(longley)) --- End diff -- please add a test for print (see spark.glm)
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512708 --- Diff: R/pkg/R/mllib.R --- @@ -1427,3 +1447,185 @@ print.summary.KSTest <- function(x, ...) { cat(summaryStr, "\n") invisible(x) } + +#' Decision Tree Model for Regression and Classification +#' +#' \code{spark.decisionTree} fits a Decision Tree Regression model or Classification model on +#' a SparkDataFrame. Users can call \code{summary} to get a summary of the fitted Decision Tree +#' model, \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to +#' save/load fitted models. +#' For more details, see \href{https://en.wikipedia.org/wiki/Decision_tree_learning}{Decision Tree} +#' +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', ':', '+', and '-'. +#' @param type type of model to fit --- End diff -- please add the types supported, e.g. `one of "regression" or "classification" as the type of model`
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512794 --- Diff: R/pkg/R/mllib.R --- @@ -1427,3 +1447,185 @@ print.summary.KSTest <- function(x, ...) { cat(summaryStr, "\n") invisible(x) } + +#' Decision Tree Model for Regression and Classification +#' +#' \code{spark.decisionTree} fits a Decision Tree Regression model or Classification model on +#' a SparkDataFrame. Users can call \code{summary} to get a summary of the fitted Decision Tree +#' model, \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to +#' save/load fitted models. +#' For more details, see \href{https://en.wikipedia.org/wiki/Decision_tree_learning}{Decision Tree} +#' +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', ':', '+', and '-'. +#' @param type type of model to fit +#' @param maxDepth Maximum depth of the tree (>= 0). +#' @param maxBins Maximum number of bins used for discretizing continuous features and for choosing +#'how to split on features at each node. More bins give higher granularity. Must be +#'>= 2 and >= number of categories in any categorical feature. (default = 32) +#' @param ... additional arguments passed to the method. +#' @aliases spark.decisionTree,SparkDataFrame,formula-method +#' @return \code{spark.decisionTree} returns a fitted Decision Tree model. 
+#' @rdname spark.decisionTree +#' @name spark.decisionTree +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(longley) +#' +#' # fit a Decision Tree Regression Model +#' model <- spark.decisionTree(data, Employed ~ ., type = "regression", maxDepth = 5, maxBins = 16) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.decisionTree since 2.1.0 +setMethod("spark.decisionTree", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, type = c("regression", "classification"), + maxDepth = 5, maxBins = 32 ) { +type <- match.arg(type) +formula <- paste(deparse(formula), collapse = "") +switch(type, + regression = { + jobj <- callJStatic("org.apache.spark.ml.r.DecisionTreeRegressorWrapper", + "fit", data@sdf, formula, as.integer(maxDepth), + as.integer(maxBins)) + new("DecisionTreeRegressionModel", jobj = jobj) + }, + classification = { + jobj <- callJStatic("org.apache.spark.ml.r.DecisionTreeClassifierWrapper", + "fit", data@sdf, formula, as.integer(maxDepth), + as.integer(maxBins)) + new("DecisionTreeClassificationModel", jobj = jobj) + } +) + }) + +# Makes predictions from a Decision Tree Regression model or +# a model produced by spark.decisionTree() + +#' @param newData a SparkDataFrame for testing. +#' @return \code{predict} returns a SparkDataFrame containing predicted labeled in a column named +#' "prediction" +#' @rdname spark.decisionTree +#' @export --- End diff -- add @aliases
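The `spark.decisionTree` wrapper quoted in the diff above validates the `type` argument with R's `match.arg` and then routes to a per-type JVM wrapper via `switch`. A minimal Python sketch of that validate-then-dispatch pattern, for illustration only (the function and the returned tuples are hypothetical, not Spark's API):

```python
def fit_decision_tree(data, formula, model_type="regression",
                      max_depth=5, max_bins=32):
    """Sketch of the R wrapper's dispatch: validate `model_type` against the
    allowed values (as match.arg does), then dispatch to a per-type fit
    (as the R switch() does over the JVM wrapper classes)."""
    allowed = ("regression", "classification")
    if model_type not in allowed:
        # match.arg raises an error for values outside the allowed set
        raise ValueError(f"type must be one of {allowed}, got {model_type!r}")
    # Dispatch table standing in for the switch() over wrapper classes;
    # each branch would call the matching JVM-side fit in the real wrapper.
    wrappers = {
        "regression": lambda: ("DecisionTreeRegressionModel", max_depth, max_bins),
        "classification": lambda: ("DecisionTreeClassificationModel", max_depth, max_bins),
    }
    return wrappers[model_type]()
```

The dispatch-table shape mirrors the R code: adding a new model type means adding one entry, with validation kept separate from construction.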
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82513082 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/DecisionTreeRegressorWrapper.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.attribute.AttributeGroup +import org.apache.spark.ml.feature.RFormula +import org.apache.spark.ml.regression.{DecisionTreeRegressionModel, DecisionTreeRegressor} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class DecisionTreeRegressorWrapper private ( + val pipeline: PipelineModel, + val features: Array[String], + val maxDepth: Int, + val maxBins: Int) extends MLWritable { + + private val DTModel: DecisionTreeRegressionModel = +pipeline.stages(1).asInstanceOf[DecisionTreeRegressionModel] + + lazy val depth: Int = DTModel.depth + lazy val numNodes: Int = DTModel.numNodes + + def summary: String = DTModel.toDebugString + + def transform(dataset: Dataset[_]): DataFrame = { +pipeline.transform(dataset) + .drop(DTModel.getFeaturesCol) + } + + override def write: MLWriter = new + DecisionTreeRegressorWrapper.DecisionTreeRegressorWrapperWriter(this) +} + +private[r] object DecisionTreeRegressorWrapper extends MLReadable[DecisionTreeRegressorWrapper] { + def fit(data: DataFrame, + formula: String, + maxDepth: Int, + maxBins: Int): DecisionTreeRegressorWrapper = { + +val rFormula = new RFormula() + .setFormula(formula) + .setFeaturesCol("features") --- End diff -- ditto
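The Scala wrapper in the diff above follows a common pattern: store a fitted two-stage pipeline (RFormula feature assembly, then the tree model) and, on `transform`, drop the intermediate features column so callers never see it. A toy Python sketch of that wrapper shape, under the assumption that stages are plain row-transforming functions (nothing here is Spark's actual API):

```python
class DecisionTreeWrapperSketch:
    """Toy analogue of DecisionTreeRegressorWrapper: hold a fitted
    pipeline's stages and hide the intermediate 'features' column
    from the transformed output, as the Scala wrapper's drop() does."""

    def __init__(self, stages, features_col="features"):
        self.stages = stages          # e.g. [assemble_features, predict]
        self.features_col = features_col

    def transform(self, rows):
        # Run every stage in order, like PipelineModel.transform
        for stage in self.stages:
            rows = [stage(r) for r in rows]
        # Drop the intermediate column before returning to the caller
        return [{k: v for k, v in r.items() if k != self.features_col}
                for r in rows]


# Illustrative stages: assemble a features tuple, then "predict" from it
assemble = lambda r: {**r, "features": (r["x1"], r["x2"])}
predict = lambda r: {**r, "prediction": r["features"][0] + r["features"][1]}
model = DecisionTreeWrapperSketch([assemble, predict])
out = model.transform([{"x1": 1.0, "x2": 2.0}])
```

The point of the pattern is encapsulation: the R caller gets back only its original columns plus `prediction`, never the internal feature vector.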
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15334 Saw @viirya submitted a PR for the same issue: https://github.com/apache/spark/pull/7793
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512809 --- Diff: R/pkg/R/mllib.R --- @@ -1427,3 +1447,185 @@ print.summary.KSTest <- function(x, ...) { cat(summaryStr, "\n") invisible(x) } + +#' Decision Tree Model for Regression and Classification +#' +#' \code{spark.decisionTree} fits a Decision Tree Regression model or Classification model on +#' a SparkDataFrame. Users can call \code{summary} to get a summary of the fitted Decision Tree +#' model, \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to +#' save/load fitted models. +#' For more details, see \href{https://en.wikipedia.org/wiki/Decision_tree_learning}{Decision Tree} +#' +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', ':', '+', and '-'. +#' @param type type of model to fit +#' @param maxDepth Maximum depth of the tree (>= 0). +#' @param maxBins Maximum number of bins used for discretizing continuous features and for choosing +#'how to split on features at each node. More bins give higher granularity. Must be +#'>= 2 and >= number of categories in any categorical feature. (default = 32) +#' @param ... additional arguments passed to the method. +#' @aliases spark.decisionTree,SparkDataFrame,formula-method +#' @return \code{spark.decisionTree} returns a fitted Decision Tree model. 
+#' @rdname spark.decisionTree +#' @name spark.decisionTree +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(longley) +#' +#' # fit a Decision Tree Regression Model +#' model <- spark.decisionTree(data, Employed ~ ., type = "regression", maxDepth = 5, maxBins = 16) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.decisionTree since 2.1.0 +setMethod("spark.decisionTree", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, type = c("regression", "classification"), + maxDepth = 5, maxBins = 32 ) { +type <- match.arg(type) +formula <- paste(deparse(formula), collapse = "") +switch(type, + regression = { + jobj <- callJStatic("org.apache.spark.ml.r.DecisionTreeRegressorWrapper", + "fit", data@sdf, formula, as.integer(maxDepth), + as.integer(maxBins)) + new("DecisionTreeRegressionModel", jobj = jobj) + }, + classification = { + jobj <- callJStatic("org.apache.spark.ml.r.DecisionTreeClassifierWrapper", + "fit", data@sdf, formula, as.integer(maxDepth), + as.integer(maxBins)) + new("DecisionTreeClassificationModel", jobj = jobj) + } +) + }) + +# Makes predictions from a Decision Tree Regression model or +# a model produced by spark.decisionTree() + +#' @param newData a SparkDataFrame for testing. 
+#' @return \code{predict} returns a SparkDataFrame containing predicted labeled in a column named +#' "prediction" +#' @rdname spark.decisionTree +#' @export +#' @note predict(decisionTreeRegressionModel) since 2.1.0 +setMethod("predict", signature(object = "DecisionTreeRegressionModel"), + function(object, newData) { +predict_internal(object, newData) + }) + +#' @rdname spark.decisionTree +#' @export +#' @note predict(decisionTreeClassificationModel) since 2.1.0 +setMethod("predict", signature(object = "DecisionTreeClassificationModel"), + function(object, newData) { +predict_internal(object, newData) + }) + +#' Save the Decision Tree Regression model to the input path. +#' +#' @param object A fitted Decision tree regression model +#' @param path The directory where the model is saved +#' @param overwrite Overwrites or not if the output path already exists. Default is FALSE +#' which means throw exception if the output path exists. +#' +#' @aliases write.ml,DecisionTreeRegressionModel,character-method +#' @rdname spark.deci
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512689 --- Diff: R/pkg/R/mllib.R --- @@ -1427,3 +1447,185 @@ print.summary.KSTest <- function(x, ...) { cat(summaryStr, "\n") invisible(x) } + +#' Decision Tree Model for Regression and Classification +#' +#' \code{spark.decisionTree} fits a Decision Tree Regression model or Classification model on +#' a SparkDataFrame. Users can call \code{summary} to get a summary of the fitted Decision Tree +#' model, \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to +#' save/load fitted models. +#' For more details, see \href{https://en.wikipedia.org/wiki/Decision_tree_learning}{Decision Tree} --- End diff -- could you point this url to the Spark programming guide, like http://spark.apache.org/docs/latest/ml-classification-regression.html --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
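The `match.arg` + `switch` dispatch in the quoted `setMethod` can be sketched in plain Python. The two model classes below are hypothetical stand-ins for the JVM-side wrappers reached via `callJStatic`; this is an illustration of the dispatch pattern, not SparkR's actual implementation:

```python
class DecisionTreeRegressionModel:
    """Stand-in for the S4 DecisionTreeRegressionModel wrapper (hypothetical)."""
    def __init__(self, formula, max_depth, max_bins):
        self.kind = "regression"
        self.formula, self.max_depth, self.max_bins = formula, max_depth, max_bins

class DecisionTreeClassificationModel:
    """Stand-in for the S4 DecisionTreeClassificationModel wrapper (hypothetical)."""
    def __init__(self, formula, max_depth, max_bins):
        self.kind = "classification"
        self.formula, self.max_depth, self.max_bins = formula, max_depth, max_bins

def spark_decision_tree(formula, type="regression", max_depth=5, max_bins=32):
    # match.arg: only the two listed type values are accepted
    wrappers = {
        "regression": DecisionTreeRegressionModel,
        "classification": DecisionTreeClassificationModel,
    }
    if type not in wrappers:
        raise ValueError("type must be 'regression' or 'classification'")
    # mirror as.integer(maxDepth) / as.integer(maxBins) before handing off
    return wrappers[type](formula, int(max_depth), int(max_bins))
```

Each branch constructs a different wrapper, exactly as the R `switch` picks `DecisionTreeRegressorWrapper` or `DecisionTreeClassifierWrapper`.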
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512723 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment, ending at:] +#' # fit a Decision Tree Regression Model +#' model <- spark.decisionTree(data, Employed ~ ., type = "regression", maxDepth = 5, maxBins = 16) +#' --- End diff -- Could we add an example for "classification" too?
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512914 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment; the remainder of this message was truncated]
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512937 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment; the remainder of this message was truncated]
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512799 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment, ending at:] +#' @rdname spark.decisionTree +#' @export +#' @note predict(decisionTreeClassificationModel) since 2.1.0 --- End diff -- add `@aliases`
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512787 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment, ending at:] +# Makes predictions from a Decision Tree Regression model or +# a model produced by spark.decisionTree() --- End diff -- Isn't the `Decision Tree Regression model` produced by `spark.decisionTree()`? Could you clarify?
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512727 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment, ending at:] +setMethod("spark.decisionTree", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, type = c("regression", "classification"), + maxDepth = 5, maxBins = 32 ) { --- End diff -- nit: extra space before the closing parenthesis in `32 )`
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512770 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment, ending at:] +#' @param ... additional arguments passed to the method. --- End diff -- should it support other parameters, like numClasses, features, impurity?
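For reference, the constraints the roxygen block documents (maxDepth >= 0, maxBins >= 2) plus Spark ML's impurity options ("variance" for the regressor; "gini" or "entropy" for the classifier) could be validated roughly as below. This is an illustrative sketch of how extra parameters might be checked before forwarding, not code from the PR:

```python
def validate_tree_params(max_depth=5, max_bins=32, impurity=None, type="regression"):
    """Validate decision-tree hyperparameters per the documented constraints.

    Hypothetical helper: the numeric bounds come from the @param docs in the
    quoted roxygen block; the impurity choices mirror Spark ML's
    DecisionTreeRegressor/DecisionTreeClassifier options.
    """
    if max_depth < 0:
        raise ValueError("maxDepth must be >= 0")
    if max_bins < 2:
        raise ValueError("maxBins must be >= 2")
    if impurity is None:
        # Spark ML defaults: "variance" for regression, "gini" for classification
        impurity = "variance" if type == "regression" else "gini"
    allowed = {"regression": {"variance"}, "classification": {"gini", "entropy"}}
    if impurity not in allowed[type]:
        raise ValueError("unsupported impurity %r for %s" % (impurity, type))
    return {"maxDepth": int(max_depth), "maxBins": int(max_bins), "impurity": impurity}
```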
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82513063 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/DecisionTreeClassifierWrapper.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.attribute.AttributeGroup +import org.apache.spark.ml.classification.{DecisionTreeClassificationModel, DecisionTreeClassifier} +import org.apache.spark.ml.feature.RFormula +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class DecisionTreeClassifierWrapper private ( + val pipeline: PipelineModel, + val features: Array[String], + val maxDepth: Int, + val maxBins: Int) extends MLWritable { + + private val DTModel: DecisionTreeClassificationModel = +pipeline.stages(1).asInstanceOf[DecisionTreeClassificationModel] + + lazy val depth: Int = DTModel.depth + lazy val numNodes: Int = DTModel.numNodes + lazy val numClasses: Int = DTModel.numClasses + + def summary: String = DTModel.toDebugString + + def transform(dataset: Dataset[_]): DataFrame = { +pipeline.transform(dataset) + .drop(DTModel.getFeaturesCol) + } + + override def write: MLWriter = new + DecisionTreeClassifierWrapper.DecisionTreeClassifierWrapperWriter(this) +} + +private[r] object DecisionTreeClassifierWrapper extends MLReadable[DecisionTreeClassifierWrapper] { + def fit(data: DataFrame, + formula: String, + maxDepth: Int, + maxBins: Int): DecisionTreeClassifierWrapper = { + +val rFormula = new RFormula() + .setFormula(formula) + .setFeaturesCol("features") --- End diff -- could you take a look at another model wrapper (like NaiveBayesWrapper) and `RWrapperUtils` on how to handle DataFrame column name - this shouldn't be hardcoded here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
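The idea behind not hardcoding `setFeaturesCol("features")` is to pick a features column name that cannot collide with the input DataFrame's own columns. A rough Python sketch of that idea follows; the helper name is made up and the real `RWrapperUtils` logic may differ:

```python
import uuid

def safe_column_name(base, existing_columns):
    """Return a column name that does not collide with the input's columns.

    Sketch only: illustrates why a hardcoded "features" column is fragile
    when the training DataFrame already has a column of that name.
    """
    if base not in existing_columns:
        return base
    # Append a random suffix until the name is free
    name = base
    while name in existing_columns:
        name = "%s_%s" % (base, uuid.uuid4().hex[:8])
    return name
```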
[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13690#discussion_r82512912 --- Diff: R/pkg/R/mllib.R --- [quoted hunk identical to the previous comment; the remainder of this message was truncated]
[GitHub] spark pull request #14959: [SPARK-17387][PYSPARK] Creating SparkContext() fr...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14959#discussion_r82512997

--- Diff: python/pyspark/conf.py ---
@@ -101,13 +101,25 @@ def __init__(self, loadDefaults=True, _jvm=None, _jconf=None):
             self._jconf = _jconf
         else:
             from pyspark.context import SparkContext
-            SparkContext._ensure_initialized()
             _jvm = _jvm or SparkContext._jvm
-            self._jconf = _jvm.SparkConf(loadDefaults)
+
+            if _jvm:
+                # JVM is created, so create self._jconf directly through the JVM
+                self._jconf = _jvm.SparkConf(loadDefaults)
+                self._conf = None
+            else:
+                # JVM is not created, so store data in self._conf first
+                self._jconf = None
+                self._conf = {}

     def set(self, key, value):
         """Set a configuration property."""
-        self._jconf.set(key, unicode(value))
+        # Set self._jconf first if the JVM is created; set self._conf if the JVM is not created yet.
+        if self._jconf:
+            self._jconf.set(key, unicode(value))
+        else:
+            # Don't use unicode for self._conf, otherwise we will get an exception when launching the JVM.
+            self._conf[key] = value
--- End diff --

Fixed. It might be an issue from my previous commits.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14959: [SPARK-17387][PYSPARK] Creating SparkContext() fr...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14959#discussion_r82512981

--- Diff: python/pyspark/conf.py ---
@@ -118,28 +130,28 @@ def setIfMissing(self, key, value):

     def setMaster(self, value):
         """Set master URL to connect to."""
-        self._jconf.setMaster(value)
+        self.set("spark.master", value)
         return self

     def setAppName(self, value):
         """Set application name."""
-        self._jconf.setAppName(value)
+        self.set("spark.app.name", value)
         return self

     def setSparkHome(self, value):
         """Set path where Spark is installed on worker nodes."""
-        self._jconf.setSparkHome(value)
+        self.set("spark.home", value)
         return self

     def setExecutorEnv(self, key=None, value=None, pairs=None):
         """Set an environment variable to be passed to executors."""
         if (key is not None and pairs is not None) or (key is None and pairs is None):
             raise Exception("Either pass one key-value pair or a list of pairs")
         elif key is not None:
-            self._jconf.setExecutorEnv(key, value)
+            self.set("spark.executorEnv." + key, value)
--- End diff --

Fixed.
[GitHub] spark issue #15404: Branch 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15404 Can one of the admins verify this patch?
[GitHub] spark pull request #15404: Branch 2.0
GitHub user yintengfei opened a pull request: https://github.com/apache/spark/pull/15404 Branch 2.0 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15404.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15404 commit 5735b8bd769c64e2b0e0fae75bad794cde3edc99 Author: Reynold Xin Date: 2016-08-18T08:37:25Z [SPARK-16391][SQL] Support partial aggregation for reduceGroups ## What changes were proposed in this pull request? This patch introduces a new private ReduceAggregator interface that is a subclass of Aggregator. ReduceAggregator only requires a single associative and commutative reduce function. ReduceAggregator is also used to implement KeyValueGroupedDataset.reduceGroups in order to support partial aggregation. Note that the pull request was initially done by viirya. ## How was this patch tested? Covered by original tests for reduceGroups, as well as a new test suite for ReduceAggregator. Author: Reynold Xin Author: Liang-Chi Hsieh Closes #14576 from rxin/reduceAggregator. (cherry picked from commit 1748f824101870b845dbbd118763c6885744f98a) Signed-off-by: Wenchen Fan commit ec5f157a32f0c65b5f93bdde7a6334e982b3b83c Author: petermaxlee Date: 2016-08-18T11:44:13Z [SPARK-17117][SQL] 1 / NULL should not fail analysis ## What changes were proposed in this pull request? This patch fixes the problem described in SPARK-17117, i.e. 
"SELECT 1 / NULL" throws an analysis exception: ``` org.apache.spark.sql.AnalysisException: cannot resolve '(1 / NULL)' due to data type mismatch: differing types in '(1 / NULL)' (int and null). ``` The problem is that division type coercion did not take null type into account. ## How was this patch tested? A unit test for the type coercion, and a few end-to-end test cases using SQLQueryTestSuite. Author: petermaxlee Closes #14695 from petermaxlee/SPARK-17117. (cherry picked from commit 68f5087d2107d6afec5d5745f0cb0e9e3bdd6a0b) Signed-off-by: Herman van Hovell commit 176af17a7213a4c2847a04f715137257657f2961 Author: Xin Ren Date: 2016-08-10T07:49:06Z [MINOR][SPARKR] R API documentation for "coltypes" is confusing ## What changes were proposed in this pull request? R API documentation for "coltypes" is confusing, found when working on another ticket. Current version http://spark.apache.org/docs/2.0.0/api/R/coltypes.html, where parameters have 2 "x" which is a duplicate, and also the example is not very clear ![current](https://cloud.githubusercontent.com/assets/3925641/17386808/effb98ce-59a2-11e6-9657-d477d258a80c.png) ![screen shot 2016-08-03 at 5 56 00 pm](https://cloud.githubusercontent.com/assets/3925641/17386884/91831096-59a3-11e6-84af-39890b3d45d8.png) ## How was this patch tested? Tested manually on local machine. And the screenshots are like below: ![screen shot 2016-08-07 at 11 29 20 pm](https://cloud.githubusercontent.com/assets/3925641/17471144/df36633c-5cf6-11e6-8238-4e32ead0e529.png) ![screen shot 2016-08-03 at 5 56 22 pm](https://cloud.githubusercontent.com/assets/3925641/17386896/9d36cb26-59a3-11e6-9619-6dae29f7ab17.png) Author: Xin Ren Closes #14489 from keypointt/rExample. 
(cherry picked from commit 1203c8415cd11540f79a235e66a2f241ca6c71e4) Signed-off-by: Shivaram Venkataraman commit ea684b69cd6934bc093f4a5a8b0d8470e92157cd Author: Eric Liang Date: 2016-08-18T11:33:55Z [SPARK-17069] Expose spark.range() as table-valued function in SQL This adds analyzer rules for resolving table-valued functions, and adds one builtin implementation for range(). The arguments for range() are the same as those of `spark.range()`. Unit tests. cc hvanhovell Author: Eric Liang Closes #14656 from ericl/sc-4309. (cherry picked from commit 412dba63b511474a6db3c43c8618d803e604bc6b) Signed-off-by: Reynold Xin commit c180d637a3caca0d4e46f4980c10d1005eb453bc Author: petermaxlee Date: 2016-08-19T01:19:47Z [SPARK-16947][SQL] Support type coercion and foldable expression for inline tables This patch improves inline table support with the follow
[GitHub] spark pull request #14959: [SPARK-17387][PYSPARK] Creating SparkContext() fr...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14959#discussion_r82512714

--- Diff: python/pyspark/conf.py ---
@@ -149,35 +161,53 @@ def setAll(self, pairs):
         :param pairs: list of key-value pairs to set
         """
         for (k, v) in pairs:
-            self._jconf.set(k, v)
+            self.set(k, v)
         return self

     def get(self, key, defaultValue=None):
         """Get the configured value for some key, or return a default otherwise."""
         if defaultValue is None:  # Py4J doesn't call the right get() if we pass None
-            if not self._jconf.contains(key):
-                return None
-            return self._jconf.get(key)
+            if self._jconf:
+                if not self._jconf.contains(key):
+                    return None
+                return self._jconf.get(key)
+            else:
+                if key not in self._conf:
+                    return None
+                return self._conf[key]
         else:
-            return self._jconf.get(key, defaultValue)
+            if self._jconf:
+                return self._jconf.get(key, defaultValue)
+            else:
+                return self._conf.get(key, defaultValue)

     def getAll(self):
         """Get all values as a list of key-value pairs."""
         pairs = []
-        for elem in self._jconf.getAll():
-            pairs.append((elem._1(), elem._2()))
+        if self._jconf:
+            for elem in self._jconf.getAll():
+                pairs.append((elem._1(), elem._2()))
+        else:
+            for k, v in self._conf.items():
+                pairs.append((k, v))
         return pairs

     def contains(self, key):
         """Does this configuration contain a given key?"""
-        return self._jconf.contains(key)
+        if self._jconf:
+            return self._jconf.contains(key)
+        else:
+            return key in self._conf

     def toDebugString(self):
         """
         Returns a printable version of the configuration,
         as a list of key=value pairs, one per line.
         """
-        return self._jconf.toDebugString()
+        if self._jconf:
+            return self._jconf.toDebugString()
+        else:
+            return '\n'.join('%s=%s' % (k, v) for k, v in self._conf.items())
--- End diff --

They may be different, because _jconf has extra configuration from the JVM side (like spark-defaults.conf), while self._conf only has the configuration set on the Python side.
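The two-path pattern discussed in this diff can be illustrated with a small plain-Python sketch. This is not PySpark's actual code: `LazyConf` is a hypothetical stand-in that buffers settings in a dict until a JVM-backed conf object exists, mirroring the `self._jconf` / `self._conf` branching above.

```python
# Illustrative sketch (not PySpark's real implementation) of buffering
# configuration on the Python side until a JVM-backed conf is available.

class LazyConf:
    def __init__(self, jconf=None):
        # jconf stands in for the Py4J-backed SparkConf; None means the JVM is not up yet.
        self._jconf = jconf
        self._conf = {} if jconf is None else None

    def set(self, key, value):
        if self._jconf is not None:
            self._jconf.set(key, str(value))
        else:
            # Keep the raw value on the Python side; it would be pushed to the JVM later.
            self._conf[key] = value
        return self

    def get(self, key, defaultValue=None):
        if self._jconf is not None:
            return self._jconf.get(key, defaultValue)
        return self._conf.get(key, defaultValue)

    def contains(self, key):
        if self._jconf is not None:
            return self._jconf.contains(key)
        return key in self._conf

    def toDebugString(self):
        if self._jconf is not None:
            return self._jconf.toDebugString()
        return "\n".join("%s=%s" % (k, v) for k, v in self._conf.items())


conf = LazyConf()
conf.set("spark.master", "local[2]").set("spark.app.name", "demo")
print(conf.get("spark.app.name"))    # → demo
print(conf.contains("spark.home"))   # → False
print(conf.toDebugString())
```

As the review comment above notes, the two paths can diverge: the JVM-side conf also picks up defaults such as spark-defaults.conf, while the Python-side dict only holds what was set explicitly.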
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15360 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66586/ Test PASSed.
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15360 Merged build finished. Test PASSed.
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15360 **[Test build #66586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66586/consoleFull)** for PR 15360 at commit [`2ee4252`](https://github.com/apache/spark/commit/2ee4252c785848873fa422ec49b697154b703133). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9287: SPARK-11326: Split networking in standalone mode
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/9287 +1 for closing.
[GitHub] spark issue #9287: SPARK-11326: Split networking in standalone mode
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/9287 This has been stale for a while; we should close it if there is no update here.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/12933 @hellertime Are you able to rebase?
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/13077 @devaraj-kavali Are you still able to update this patch?
[GitHub] spark issue #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch cache i...
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/13713 @drcrallen Are you still planning to update this? It's quite a useful feature, so I'm hoping it can get in. Also, since fine-grained mode is deprecated, I don't think we need to update it for that mode.
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15334 Merged build finished. Test PASSed.
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15334 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66585/ Test PASSed.
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15334 **[Test build #66585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66585/consoleFull)** for PR 15334 at commit [`72bc930`](https://github.com/apache/spark/commit/72bc93033b47266fd9661d72dcadbbd8ba906b4b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12691: [Spark-14761][SQL][WIP] Reject invalid join methods when...
Github user bkpathak commented on the issue: https://github.com/apache/spark/pull/12691 Hi @holdenk, I am still interested in working on this, but it looks like I pulled and merged the master branch instead of rebasing. Should I close this and open another pull request, or how should I proceed?
[GitHub] spark issue #15044: [SQL][SPARK-17490] Optimize SerializeFromObject() for a ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15044 @hvanhovell, after my investigation, I have added code to generate `UnsafeArrayData` at two code paths. Could you please review this again?
[GitHub] spark pull request #12691: [Spark-14761][SQL][WIP] Reject invalid join metho...
GitHub user bkpathak reopened a pull request: https://github.com/apache/spark/pull/12691 [Spark-14761][SQL][WIP] Reject invalid join methods when join columns are not specified in PySpark DataFrame join. ## What changes were proposed in this pull request? In PySpark, the invalid join type will not throw error for the following join: ```df1.join(df2, how='not-a-valid-join-type')``` The signature of the join is: ```def join(self, other, on=None, how=None):``` The existing code completely ignores the `how` parameter when `on` is `None`. This patch will process the arguments passed to join and pass in to JVM Spark SQL Analyzer, which will validate the join type passed. ## How was this patch tested? Used manual and existing test suites. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bkpathak/spark spark-14761 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12691.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12691 commit c76baff0cc4775c2191d075cc9a8176e4915fec8 Author: Bryan Cutler Date: 2016-09-11T09:19:39Z [SPARK-17336][PYSPARK] Fix appending multiple times to PYTHONPATH from spark-config.sh ## What changes were proposed in this pull request? During startup of Spark standalone, the script file spark-config.sh appends to the PYTHONPATH and can be sourced many times, causing duplicates in the path. This change adds a env flag that is set when the PYTHONPATH is appended so it will happen only one time. ## How was this patch tested? Manually started standalone master/worker and verified PYTHONPATH has no duplicate entries. Author: Bryan Cutler Closes #15028 from BryanCutler/fix-duplicate-pythonpath-SPARK-17336. 
commit 883c7631847a95684534222c1b6cfed8e62710c8 Author: Yanbo Liang Date: 2016-09-11T12:47:13Z [SPARK-17389][FOLLOW-UP][ML] Change KMeans k-means|| default init steps from 5 to 2. ## What changes were proposed in this pull request? #14956 reduced default k-means|| init steps to 2 from 5 only for spark.mllib package, we should also do same change for spark.ml and PySpark. ## How was this patch tested? Existing tests. Author: Yanbo Liang Closes #15050 from yanboliang/spark-17389. commit 767d48076971f6f1e2c93ee540a9b2e5e465631b Author: Sameer Agarwal Date: 2016-09-11T15:35:27Z [SPARK-17415][SQL] Better error message for driver-side broadcast join OOMs ## What changes were proposed in this pull request? This is a trivial patch that catches all `OutOfMemoryError` while building the broadcast hash relation and rethrows it by wrapping it in a nice error message. ## How was this patch tested? Existing Tests Author: Sameer Agarwal Closes #14979 from sameeragarwal/broadcast-join-error. commit 72eec70bdbf6fb67c977463db5d8d95dd3040ae8 Author: Josh Rosen Date: 2016-09-12T04:51:22Z [SPARK-17486] Remove unused TaskMetricsUIData.updatedBlockStatuses field The `TaskMetricsUIData.updatedBlockStatuses` field is assigned to but never read, increasing the memory consumption of the web UI. We should remove this field. Author: Josh Rosen Closes #15038 from JoshRosen/remove-updated-block-statuses-from-TaskMetricsUIData. commit cc87280fcd065b01667ca7a59a1a32c7ab757355 Author: cenyuhai Date: 2016-09-12T10:52:56Z [SPARK-17171][WEB UI] DAG will list all partitions in the graph ## What changes were proposed in this pull request? DAG will list all partitions in the graph, it is too slow and hard to see all graph. Always we don't want to see all partitions, we just want to see the relations of DAG graph. So I just show 2 root nodes for Rdds.
Before this PR, the DAG graph looks like [dag1.png](https://issues.apache.org/jira/secure/attachment/12824702/dag1.png), [dag3.png](https://issues.apache.org/jira/secure/attachment/12825456/dag3.png), after this PR, the DAG graph looks like [dag2.png](https://issues.apache.org/jira/secure/attachment/12824703/dag2.png), [dag4.png](https://issues.apache.org/jira/secure/attachment/12825457/dag4.png) Author: cenyuhai Author: 岑玉海 <261810...@qq.com> Closes #14737 from cenyuhai/SPARK-17171. commit 4efcdb7feae24e41d8120b59430f8b77cc2106a6 Author: codlife <1004910...@qq.com> Date: 2016-09-12T11:10:46Z [SPARK-17447] Performance improvement in Partitioner.defaultPartitioner without sortBy ## What changes were proposed in this pull request? if there are many rdds in some situations, the sort will lose the performance severely; actually we needn't sort the rdds, we can just scan the rdds one time to gai
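The validation this PR calls for can be sketched in plain Python. The `validate_join_type` helper and the set of accepted names below are illustrative stand-ins, not Spark's actual internals: the idea is simply to reject an unknown `how` eagerly instead of silently ignoring it when `on` is None.

```python
# Hypothetical sketch of eager join-type validation on the Python side.

VALID_JOIN_TYPES = {
    "inner", "outer", "full", "fullouter", "leftouter", "left",
    "rightouter", "right", "leftsemi", "leftanti", "cross",
}

def validate_join_type(how):
    """Return a normalized join type, or raise ValueError for an unknown one."""
    if how is None:
        return "inner"  # default join type when `how` is not given
    normalized = how.lower().replace("_", "")
    if normalized not in VALID_JOIN_TYPES:
        raise ValueError(
            "Unsupported join type %r; supported types: %s"
            % (how, ", ".join(sorted(VALID_JOIN_TYPES))))
    return normalized

print(validate_join_type("LEFT_OUTER"))  # → leftouter
```

With a check like this, `df1.join(df2, how='not-a-valid-join-type')` would fail immediately with a clear message rather than being treated as an inner join.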
[GitHub] spark pull request #12691: [Spark-14761][SQL][WIP] Reject invalid join metho...
Github user bkpathak closed the pull request at: https://github.com/apache/spark/pull/12691
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15360 **[Test build #66586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66586/consoleFull)** for PR 15360 at commit [`2ee4252`](https://github.com/apache/spark/commit/2ee4252c785848873fa422ec49b697154b703133).
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15360 retest this please
[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15371 Merged build finished. Test PASSed.