[GitHub] spark pull request: Mytest0
Github user semad closed the pull request at: https://github.com/apache/spark/pull/8472 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8337#issuecomment-135446848 Merged build started.
[GitHub] spark pull request: [SPARK-10316][SQL] respect nondeterministic ex...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/8486 [SPARK-10316][SQL] respect nondeterministic expressions in PhysicalOperation We did a lot of special handling for non-deterministic expressions in `Optimizer`. However, `PhysicalOperation` just collects all Projects and Filters and messes them up. We should respect the operator order caused by non-deterministic expressions in `PhysicalOperation`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8486.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8486 commit 16ae7e2394caf6a3925cb3c69692b4b14c7811cb Author: Wenchen Fan cloud0...@outlook.com Date: 2015-08-27T14:37:46Z respect nondeterministic expressions in PhysicalOperation
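To illustrate the ordering concern described above, here is a minimal sketch in plain Python. It is a toy model, not Spark's `PhysicalOperation`: the `Expr`, `Relation`, `Project`, `Filter` classes and the `collect` function are hypothetical names invented for this example. The collector walks down through Projects and Filters but stops at the first operator that carries a non-deterministic expression, so nothing is reordered across it.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Expr:
    """Toy expression: a label plus a determinism flag (hypothetical)."""
    name: str
    deterministic: bool = True


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Project:
    exprs: Tuple[Expr, ...]
    child: "Plan"


@dataclass(frozen=True)
class Filter:
    condition: Expr
    child: "Plan"


Plan = Union[Relation, Project, Filter]


def collect(plan):
    """Collect project lists and filter conditions from the top of a plan.

    Collection stops at the first operator containing a non-deterministic
    expression, which is returned untouched as the remaining child plan.
    """
    projects, filters = [], []
    node = plan
    while True:
        if isinstance(node, Project) and all(e.deterministic for e in node.exprs):
            projects.append(node.exprs)
            node = node.child
        elif isinstance(node, Filter) and node.condition.deterministic:
            filters.append(node.condition)
            node = node.child
        else:
            return projects, filters, node


# A deterministic filter sitting on top of a projection that calls rand().
inner = Project((Expr("rand() AS r", deterministic=False),), Relation("t"))
plan = Filter(Expr("a > 1"), inner)
projects, filters, rest = collect(plan)
```

In this sketch the filter `a > 1` is collected, while the projection containing `rand()` stays in place as the child plan, so the filter is never moved below it.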
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135445272 [Test build #41687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41687/console) for PR 8484 at commit [`9f88786`](https://github.com/apache/spark/commit/9f88786b4efa89a7eddd81e6bba58700630a4429). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135445314 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135436303 [Test build #41687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41687/consoleFull) for PR 8484 at commit [`9f88786`](https://github.com/apache/spark/commit/9f88786b4efa89a7eddd81e6bba58700630a4429).
[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135407119 just look at the branch-1.5 code, this parameter is not used (I guess this was used in the years when we used death watch in Spark's implementation)
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/8441#discussion_r38089726 --- Diff: docs/sql-programming-guide.md --- @@ -1696,12 +1711,16 @@ version specified by users. An isolated classloader is used here to avoid depend property can be one of three options: ol licodebuiltin/code/li -Use Hive 0.13.1, which is bundled with the Spark assembly jar when code-Phive/code is +Use Hive 1.2.1, which is bundled with the Spark assembly jar when code-Phive/code is enabled. When this option is chosen, codespark.sql.hive.metastore.version/code must be -either code0.13.1/code or not defined. +either code1.2.1/code or not defined. licodemaven/code/li -Use Hive jars of specified version downloaded from Maven repositories. -liA classpath in the standard format for both Hive and Hadoop./li +Use Hive jars of specified version downloaded from Maven repositories. This configuration +is not generally recommended for production deployments. +liA classpath in the standard format for the JVM. This classpath must include all of Hive +and its dependencies, including the correct version of Hadoop. These jars only need to be +present on the driver, but if you are running in yarn client mode then you must ensure --- End diff -- These jars aren't needed by the executors at all? If that is the case the only time they need to be shipped is in yarn cluster mode.
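The rule quoted in the diff above (with the `builtin` option, `spark.sql.hive.metastore.version` must be `1.2.1` or undefined) can be sketched as a small check in plain Python. This is only an illustration: `check_metastore_conf` is a hypothetical helper, not part of Spark, and the assumptions that the three-option property is `spark.sql.hive.metastore.jars` and that it defaults to `builtin` are mine rather than something stated in the diff.

```python
# Version quoted in the updated documentation text under review.
BUILTIN_HIVE_VERSION = "1.2.1"


def check_metastore_conf(conf):
    """Return a list of problems found in a dict of Spark SQL settings.

    Hypothetical helper: it only encodes the 'builtin' rule from the diff.
    """
    problems = []
    # Assumption: the property defaults to "builtin" when it is not set.
    jars = conf.get("spark.sql.hive.metastore.jars", "builtin")
    version = conf.get("spark.sql.hive.metastore.version")
    if jars == "builtin" and version not in (None, BUILTIN_HIVE_VERSION):
        problems.append(
            "builtin jars require metastore version %s or no version, got %s"
            % (BUILTIN_HIVE_VERSION, version)
        )
    return problems


ok = check_metastore_conf({"spark.sql.hive.metastore.jars": "builtin"})
bad = check_metastore_conf({
    "spark.sql.hive.metastore.jars": "builtin",
    "spark.sql.hive.metastore.version": "0.13.1",
})
```

The `maven` and classpath options are deliberately left unchecked here, since the diff places no version restriction on them.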
[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135423916 Merged build started.
[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135423867 Merged build triggered.
[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8485#issuecomment-135436880 Merged build triggered.
[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8485#issuecomment-135436933 Merged build started.
[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8485#issuecomment-135445963 [Test build #41688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41688/console) for PR 8485 at commit [`565a831`](https://github.com/apache/spark/commit/565a83142eae29b23ad1bdae3239df375cc47001). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DCT(JavaTransformer, HasInputCol, HasOutputCol):`
[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8337#issuecomment-135449083 [Test build #41689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41689/consoleFull) for PR 8337 at commit [`573a37c`](https://github.com/apache/spark/commit/573a37c6a541d6993d6a45a2f7977056e936b05d).
[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135426805 [Test build #41686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41686/consoleFull) for PR 7520 at commit [`dc8bd26`](https://github.com/apache/spark/commit/dc8bd26b21b67b9bc8d4021965a10bc29ce3b379).
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135434765 Merged build started.
[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8337#issuecomment-135446822 Merged build triggered.
[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/8337#issuecomment-135446690 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135409799 I think it would be nice if we update the docs to tell users * is supported. Can you update docs/configuration.md? Perhaps under each description of modify.acls, view.acls, admin.acls add something that says: Special value of * means anyone.
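The "special value of * means anyone" behaviour requested above can be sketched in plain Python. This is not Spark's implementation: `parse_acl` and `has_access` are hypothetical helpers, and the comma-separated list format is an assumption made for the example.

```python
WILDCARD = "*"


def parse_acl(value):
    """Split a comma-separated ACL string into a set of user names.

    Assumption: entries are comma separated and surrounding spaces are ignored.
    """
    return {entry.strip() for entry in value.split(",") if entry.strip()}


def has_access(user, acl_value):
    """Return True if `user` is listed, or if the ACL contains the wildcard."""
    acl = parse_acl(acl_value)
    return WILDCARD in acl or user in acl


anyone = has_access("alice", "*")
listed = has_access("bob", "bob, carol")
denied = has_access("alice", "bob, carol")
```

With these inputs the wildcard ACL admits any user, while the explicit list admits only the names it contains.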
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135434696 Merged build triggered.
[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/8485 [SPARK-8472] [ML] [PySpark] Python API for DCT Add Python API for ml.feature.DCT. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-8472 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8485.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8485 commit 565a83142eae29b23ad1bdae3239df375cc47001 Author: Yanbo Liang yblia...@gmail.com Date: 2015-08-27T13:42:04Z Python API for DCT
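For readers unfamiliar with the transform behind `ml.feature.DCT`, here is a minimal plain-Python sketch of a scaled (orthonormal) DCT-II, which is the variant the Spark documentation describes for this transformer. It does not use PySpark and is not the code added by the pull request; `scaled_dct2` is a hypothetical name chosen for the example.

```python
import math


def scaled_dct2(x):
    """Scaled DCT-II of a real sequence, with an orthonormal transform matrix.

    Output has the same length as the input; no zero padding is applied.
    """
    n = len(x)
    out = []
    for k in range(n):
        # Unscaled DCT-II coefficient for frequency index k.
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Scaling that makes the transform matrix orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


coeffs = scaled_dct2([5.0, 8.0, 6.0])
```

Because the transform is orthonormal, the sum of squares of the output equals the sum of squares of the input, which is a convenient sanity check.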
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135445315 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41687/ Test FAILed.
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user zhuoliu commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135452755 Sure. Docs updated.
[GitHub] spark pull request: [SPARK-10316][SQL] respect nondeterministic ex...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/8486#discussion_r38103673 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -26,83 +24,28 @@ import org.apache.spark.sql.catalyst.plans._ import org.apache.spark.sql.catalyst.plans.logical._ /** - * A pattern that matches any number of filter operations on top of another relational operator. - * Adjacent filter operators are collected and their conditions are broken up and returned as a - * sequence of conjunctive predicates. - * - * @return A tuple containing a sequence of conjunctive predicates that should be used to filter the - * output and a relational operator. + * A pattern that matches at most one Filter and one Project on top of another relational operator. + * Filter condition is broken up to conjunctive parts. */ -object FilteredOperation extends PredicateHelper { --- End diff -- The `FilteredOperation` is not used anywhere.
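The phrase "Filter condition is broken up to conjunctive parts" in the doc comment above can be illustrated with a small plain-Python sketch. It is a toy model rather than Catalyst's `PredicateHelper`: the `Pred` and `And` classes and the `split_conjuncts` function are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pred:
    """Toy leaf predicate, identified by its text (hypothetical)."""
    text: str


@dataclass(frozen=True)
class And:
    left: "Condition"
    right: "Condition"


Condition = Union[Pred, And]


def split_conjuncts(cond):
    """Flatten nested AND nodes into a left-to-right list of predicates."""
    if isinstance(cond, And):
        return split_conjuncts(cond.left) + split_conjuncts(cond.right)
    return [cond]


condition = And(And(Pred("a > 1"), Pred("b < 2")), Pred("c = 3"))
parts = split_conjuncts(condition)
```

Here the nested condition `(a > 1 AND b < 2) AND c = 3` is flattened into three separate predicates, each of which could then be handled on its own.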
[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135405791 @CodingCat yes, looks unused. Is it unused as of 1.5 too?
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135407868 [Test build #41681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41681/console) for PR 8464 at commit [`22e7bc0`](https://github.com/apache/spark/commit/22e7bc0b9882b637bb06ee39a66d3ece789042fa). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode ` * `case class UnionNode(children: Seq[LocalNode]) extends LocalNode `
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135407996 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135407998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41681/ Test PASSed.
[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135423114 @liancheng Thanks for the clear investigation and explanation. If I understand it correctly, it means that the original direction of this PR is correct.
[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135437020 @viirya Yeah, I agree with you.
[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8485#issuecomment-135439908 [Test build #41688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41688/consoleFull) for PR 8485 at commit [`565a831`](https://github.com/apache/spark/commit/565a83142eae29b23ad1bdae3239df375cc47001).
[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8485#issuecomment-135446096 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8485#issuecomment-135446099 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41688/ Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135491197 Merged build started.
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135493649 [Test build #41699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41699/console) for PR 8398 at commit [`b1d49b3`](https://github.com/apache/spark/commit/b1d49b32a7b1e85c265a8cee8930d0138fd3bd8d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135493701 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41699/ Test FAILed.
[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7884#discussion_r38122528 --- Diff: project/MimaExcludes.scala --- @@ -60,6 +60,10 @@ object MimaExcludes { org.apache.spark.ml.regression.LeastSquaresCostFun.this), ProblemFilters.exclude[MissingMethodProblem]( org.apache.spark.ml.classification.LogisticCostFun.this), +ProblemFilters.exclude[MissingMethodProblem]( --- End diff -- Good point, this is OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135502190 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135502194 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/ Test PASSed.
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/8441#issuecomment-135508637 thanks LGTM
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8441#issuecomment-135509928 [Test build #41704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41704/consoleFull) for PR 8441 at commit [`f3fdf62`](https://github.com/apache/spark/commit/f3fdf625b0b092984d8d5f0e733a130ff9ff92b4).
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38126845 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala --- @@ -0,0 +1,45 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute + + +case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode { + + private[this] var count = 0 + + override def output: Seq[Attribute] = child.output --- End diff -- Why do iterators need to know their `output`?
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38126899 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala --- @@ -0,0 +1,45 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute + + +case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode { --- End diff -- Is there a need to distinguish `Unary` operators from others?
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38126926 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala --- @@ -0,0 +1,45 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute + + +case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode { + + private[this] var count = 0 + + override def output: Seq[Attribute] = child.output + + override def open(): Unit = child.open() --- End diff -- Should this also reset the count?
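To make the question above concrete, here is a minimal sketch of a limit operator whose `open()` resets its counter, so that the node can be opened a second time and still emit rows. `SimpleNode`, `SeqNode`, `SimpleLimit` and `LimitDemo` are hypothetical stand-ins written for this note, not the `LocalNode` API from the pull request; only the open/next/get/close shape is assumed.

```scala
// Hypothetical, simplified stand-in for an iterator-style operator interface.
trait SimpleNode[T] {
  def open(): Unit
  def next(): Boolean
  def get: T
  def close(): Unit
}

// Scans an in-memory sequence; used here only to drive the example.
final class SeqNode[T](data: Seq[T]) extends SimpleNode[T] {
  private[this] var idx = -1
  override def open(): Unit = { idx = -1 }
  override def next(): Boolean = { idx += 1; idx < data.length }
  override def get: T = data(idx)
  override def close(): Unit = ()
}

// Emits at most `limit` rows from `child`.
final class SimpleLimit[T](limit: Int, child: SimpleNode[T]) extends SimpleNode[T] {
  private[this] var count = 0
  override def open(): Unit = {
    count = 0 // reset, so that a second open() starts a fresh scan
    child.open()
  }
  override def next(): Boolean =
    if (count < limit) { count += 1; child.next() } else false
  override def get: T = child.get
  override def close(): Unit = child.close()
}

object LimitDemo {
  // Drains a node from open() to close() and returns everything it produced.
  def collect[T](node: SimpleNode[T]): Seq[T] = {
    node.open()
    val out = scala.collection.mutable.ArrayBuffer.empty[T]
    while (node.next()) out += node.get
    node.close()
    out.toList
  }
}
```

Without the `count = 0` line, a second `LimitDemo.collect` on the same `SimpleLimit` would return nothing, because the counter would still sit at the limit from the first pass.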
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38128387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala --- @@ -0,0 +1,45 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute + + +case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode { + + private[this] var count = 0 + + override def output: Seq[Attribute] = child.output --- End diff -- Hmm I guess this is useful for `collect` which is nice for debugging.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38128313 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala --- @@ -0,0 +1,189 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import scala.util.control.NonFatal + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.sql.types.StructType + +class LocalNodeTest extends SparkFunSuite { + + /** + * Runs the LocalNode and makes sure the answer matches the expected result. + * @param input the input data to be used. + * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate + * the local physical operator that's being tested. + * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s. + * @param sortAnswers if true, the answers will be sorted by their toString representations prior + *to being compared. 
+ */ + protected def checkAnswer( + input: DataFrame, + nodeFunction: LocalNode => LocalNode, + expectedAnswer: Seq[Row], + sortAnswers: Boolean = true): Unit = { +doCheckAnswer( + input :: Nil, + nodes => nodeFunction(nodes.head), + expectedAnswer, + sortAnswers) + } + + /** + * Runs the LocalNode and makes sure the answer matches the expected result. + * @param left the left input data to be used. + * @param right the right input data to be used. + * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate + * the local physical operator that's being tested. + * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s. + * @param sortAnswers if true, the answers will be sorted by their toString representations prior + *to being compared. + */ + protected def checkAnswer2( + left: DataFrame, + right: DataFrame, + nodeFunction: (LocalNode, LocalNode) => LocalNode, + expectedAnswer: Seq[Row], + sortAnswers: Boolean = true): Unit = { +doCheckAnswer( + left :: right :: Nil, + nodes => nodeFunction(nodes(0), nodes(1)), + expectedAnswer, + sortAnswers) + } + + /** + * Runs the `LocalNode`s and makes sure the answer matches the expected result. + * @param input the input data to be used. + * @param nodeFunction a function which accepts a sequence of input `LocalNode`s and uses them to + * instantiate the local physical operator that's being tested. + * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s. + * @param sortAnswers if true, the answers will be sorted by their toString representations prior + *to being compared. 
+ */ + protected def doCheckAnswer( +input: Seq[DataFrame], +nodeFunction: Seq[LocalNode] => LocalNode, +expectedAnswer: Seq[Row], +sortAnswers: Boolean = true): Unit = { +LocalNodeTest.checkAnswer( + input.map(dataFrameToSeqScanNode), nodeFunction, expectedAnswer, sortAnswers) match { + case Some(errorMessage) => fail(errorMessage) + case None => +} + } + + protected def dataFrameToSeqScanNode(df: DataFrame): SeqScanNode = { +new SeqScanNode( + df.queryExecution.sparkPlan.output, + df.queryExecution.toRdd.map(_.copy()).collect()) + } + +} + +/** + * Helper methods for writing tests of individual local physical operators. + */ +object LocalNodeTest { + + /** + * Runs the `LocalNode`s and makes sure the
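The `sortAnswers` parameter documented in the quoted test helper can be illustrated with a small standalone sketch. The message above is cut off before the body of `LocalNodeTest.checkAnswer`, so the `AnswerCheck.compare` helper below is a hypothetical reconstruction of the idea only: sort both sides by their `toString` representation when requested, then return `None` on a match or `Some(message)` on a mismatch.

```scala
object AnswerCheck {
  /** Returns None when the answers match, or Some(message) describing the mismatch. */
  def compare[A](actual: Seq[A], expected: Seq[A], sortAnswers: Boolean): Option[String] = {
    // Sorting by toString gives a stable order without requiring an Ordering[A].
    def prepare(rows: Seq[A]): Seq[A] =
      if (sortAnswers) rows.sortBy(_.toString) else rows

    val a = prepare(actual)
    val e = prepare(expected)
    if (a == e) None
    else Some(s"Results do not match.\nExpected: ${e.mkString(", ")}\nActual:   ${a.mkString(", ")}")
  }
}
```

Returning an `Option[String]` rather than throwing keeps the comparison reusable outside a test framework; the caller decides whether a mismatch becomes a `fail(...)`.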
[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8249#issuecomment-135513807 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8249#issuecomment-135513810 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41693/ Test FAILed.
[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8249#issuecomment-135513716 [Test build #41693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41693/console) for PR 8249 at commit [`6dd471f`](https://github.com/apache/spark/commit/6dd471fef77092f2a0406f82dada49f7fb176757). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38128741 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala --- @@ -0,0 +1,45 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute + + +case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode { + + private[this] var count = 0 + + override def output: Seq[Attribute] = child.output + + override def open(): Unit = child.open() + + override def close(): Unit = child.close() + + override def get(): InternalRow = child.get() --- End diff -- `get` should probably not have `()`?
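The remark about `()` follows the common Scala convention that side-effect-free accessors are declared without parentheses, while methods that change state keep them. A minimal sketch of that convention, using a hypothetical `Cursor` trait and `ArrayCursor` class that are not part of the pull request:

```scala
// Hypothetical interface, used only to illustrate the parentheses convention.
trait Cursor[T] {
  def advance(): Boolean // changes state, so it keeps its parentheses
  def get: T             // pure accessor, declared without parentheses
}

final class ArrayCursor[T](data: IndexedSeq[T]) extends Cursor[T] {
  private[this] var pos = -1
  override def advance(): Boolean = { pos += 1; pos < data.length }
  override def get: T = data(pos)
}
```

At the call site this reads as `cursor.advance()` followed by `cursor.get`, which signals to the reader which call moves the cursor and which one only reads from it.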
[GitHub] spark pull request: [SPARK-10018][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8247#issuecomment-135469131 Build started.
[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8249#issuecomment-135469107 Build started.
[GitHub] spark pull request: [SPARK-10019][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8248#issuecomment-135469126 Merged build started.
[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8249#issuecomment-135468749 ok to test
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135469174 [Test build #41691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41691/console) for PR 8484 at commit [`9f88786`](https://github.com/apache/spark/commit/9f88786b4efa89a7eddd81e6bba58700630a4429). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class LogisticRegressionModel @Since(1.3.0) (` * `class SVMModel @Since(1.1.0) (` * `class GaussianMixtureModel @Since(1.3.0) (` * `class KMeansModel @Since(1.1.0) (@Since(1.0.0) val clusterCenters: Array[Vector])` * `class PowerIterationClusteringModel @Since(1.3.0) (` * `class StreamingKMeansModel @Since(1.2.0) (` * `class StreamingKMeans @Since(1.2.0) (` * `class BinaryClassificationMetrics @Since(1.3.0) (` * `class MulticlassMetrics @Since(1.1.0) (predictionAndLabels: RDD[(Double, Double)]) ` * `class MultilabelMetrics @Since(1.2.0) (predictionAndLabels: RDD[(Array[Double], Array[Double])]) ` * `class RegressionMetrics @Since(1.2.0) (` * `class ChiSqSelectorModel @Since(1.3.0) (` * `class ChiSqSelector @Since(1.3.0) (` * `class ElementwiseProduct @Since(1.4.0) (` * `class IDF @Since(1.2.0) (@Since(1.2.0) val minDocFreq: Int) ` * `class Normalizer @Since(1.1.0) (p: Double) extends VectorTransformer ` * `class PCA @Since(1.4.0) (@Since(1.4.0) val k: Int) ` * `class StandardScaler @Since(1.1.0) (withMean: Boolean, withStd: Boolean) extends Logging ` * `class StandardScalerModel @Since(1.3.0) (` * `class FPGrowthModel[Item: ClassTag] @Since(1.3.0) (` * ` class FreqItemset[Item] @Since(1.3.0) (` * ` class FreqSequence[Item] @Since(1.5.0) (` * `class PrefixSpanModel[Item] @Since(1.5.0) (` * `class DenseMatrix @Since(1.3.0) (` * `class SparseMatrix @Since(1.3.0) (` * `class DenseVector @Since(1.0.0) (` * `class SparseVector @Since(1.0.0) (` * `class BlockMatrix @Since(1.3.0) (` * `class 
CoordinateMatrix @Since(1.0.0) (` * `class IndexedRowMatrix @Since(1.0.0) (` * `class RowMatrix @Since(1.0.0) (` * `class PoissonGenerator @Since(1.1.0) (` * `class ExponentialGenerator @Since(1.3.0) (` * `class GammaGenerator @Since(1.3.0) (` * `class LogNormalGenerator @Since(1.3.0) (` * `case class Rating @Since(0.8.0) (` * `class MatrixFactorizationModel @Since(0.8.0) (` * `abstract class GeneralizedLinearModel @Since(1.0.0) (` * `class IsotonicRegressionModel @Since(1.3.0) (` * `case class LabeledPoint @Since(1.0.0) (` * `class LassoModel @Since(1.1.0) (` * `class LinearRegressionModel @Since(1.1.0) (` * `class RidgeRegressionModel @Since(1.1.0) (` * `class MultivariateGaussian @Since(1.3.0) (` * `case class BoostingStrategy @Since(1.4.0) (` * `class Strategy @Since(1.3.0) (` * `class DecisionTreeModel @Since(1.0.0) (` * `class Node @Since(1.2.0) (` * `class Predict @Since(1.2.0) (` * `class RandomForestModel @Since(1.2.0) (` * `class GradientBoostedTreesModel @Since(1.2.0) (`
[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135469129 Merged build triggered.
[GitHub] spark pull request: [SPARK-10015][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8243#issuecomment-135469145 Merged build started.
[GitHub] spark pull request: [SPARK-10018][MLlib]: ML model broadcasts shou...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8247#issuecomment-135468873 ok to test
[GitHub] spark pull request: [SPARK-10015][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8243#issuecomment-135469113 Merged build triggered.
[GitHub] spark pull request: [SPARK-10019][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8248#issuecomment-135469085 Merged build triggered.
[GitHub] spark pull request: [SPARK-10018][MLlib]: ML model broadcasts shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8247#issuecomment-135469097 Build triggered.
[GitHub] spark pull request: [SPARK-9890] [Doc] [ML] User guide for CountVe...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/8487#discussion_r38115529 --- Diff: docs/ml-features.md --- @@ -211,6 +211,87 @@ for feature in result.select("result").take(3): </div> </div> +## CountVectorizer + +As a transformer, `CountVectorizerModel` converts a collection of text documents to vectors of token counts. +It takes parameter `vocabulary: Array[String]` and produces sparse representations for the documents over the vocabulary, which can then be passed to other algorithms like LDA. + +When an a-priori dictionary is not available, `CountVectorizer` can be used as an Estimator to extract the vocabulary and generates a `CountVectorizerModel`. +It will select the top `vocabSize` words ordered by term frequency across the corpus. +An optional parameter minDF also affect the fitting process by specifying the minimum number (or fraction if < 1.0) of documents a term must appear in to be included in the vocabulary. + +<div class="codetabs"> +<div data-lang="scala" markdown="1"> +More details can be found in the API docs for +[CountVectorizer](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizer) and +[CountVectorizerModel](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizerModel). 
+{% highlight scala %} +import org.apache.spark.ml.feature.CountVectorizer +import org.apache.spark.mllib.util.CountVectorizerModel + +val df = sqlContext.createDataFrame(Seq( + (0, Array("a", "b", "c")), + (1, Array("a", "b", "b", "c", "a")) +)).toDF("id", "words") + +// define CountVectorizerModel with a-priori vocabulary +val cv = new CountVectorizerModel(Array("a", "b", "c")) + .setInputCol("words") + .setOutputCol("features") + +// alternatively, fit a CountVectorizerModel from the corpus +val cv2: CountVectorizerModel = new CountVectorizer() + .setInputCol("words") + .setOutputCol("features") + .setVocabSize(3) + .setMinDF(2) // a term must appear in more than 2 documents to be included in the vocabulary + .fit(df) + +cv.transform(df).select("features").collect() +{% endhighlight %} +</div> + +<div data-lang="java" markdown="1"> +More details can be found in the API docs for +[CountVectorizer](api/java/org/apache/spark/ml/feature/CountVectorizer.html) and +[CountVectorizerModel](api/java/org/apache/spark/ml/feature/CountVectorizerModel.html). +{% highlight java %} +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.ml.feature.CountVectorizer; +import org.apache.spark.ml.feature.CountVectorizerModel; +import org.apache.spark.sql.DataFrame; + +// Input data: Each row is a bag of words from a sentence or document. 
+JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList( + RowFactory.create(Arrays.asList("a b c".split(" "))), + RowFactory.create(Arrays.asList("a b b c a".split(" "))) +)); +StructType schema = new StructType(new StructField[]{ + new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty()) +}); +DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema); + +// define CountVectorizerModel with a-priori vocabulary +CountVectorizerModel cv = new CountVectorizerModel(new String[]{"a", "b", "c"}) + .setInputCol("text") + .setOutputCol("feature"); + +// alternatively, fit a CountVectorizerModel from the corpus +CountVectorizerModel cv2 = new CountVectorizer() + .setInputCol("text") + .setOutputCol("feature") + .setVocabSize(3) + .setMinDF(2) // a term must appear in more than 2 documents to be included in the vocabulary + .fit(documentDF); + +DataFrame result = cv.transform(documentDF); --- End diff -- use `cv.transform(documentDF).show()`
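The behaviour described in the quoted documentation (token counts over a fixed vocabulary, or a vocabulary fitted from the corpus by term frequency with a document-frequency floor) can be sketched without Spark. `CountVectorizerSketch` below is a hypothetical plain-Scala illustration, not Spark's implementation: it returns dense arrays rather than sparse vectors, treats `minDF` as an absolute count meaning "appears in at least `minDF` documents", and breaks frequency ties alphabetically, all of which are assumptions made for this note.

```scala
object CountVectorizerSketch {
  /** Counts occurrences of each vocabulary term; tokens outside the vocabulary are ignored. */
  def transform(vocabulary: Array[String], doc: Seq[String]): Array[Int] = {
    val index = vocabulary.zipWithIndex.toMap
    val counts = new Array[Int](vocabulary.length)
    doc.foreach(token => index.get(token).foreach(i => counts(i) += 1))
    counts
  }

  /** Picks up to vocabSize terms that occur in at least minDF documents, most frequent first. */
  def fit(docs: Seq[Seq[String]], vocabSize: Int, minDF: Int): Array[String] = {
    // Total occurrences of each term across the corpus.
    val termFreq = docs.flatten.groupBy(identity).map { case (t, occ) => t -> occ.size }
    // Number of distinct documents each term appears in.
    val docFreq = docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    termFreq.toSeq
      .filter { case (t, _) => docFreq(t) >= minDF }
      .sortBy { case (t, n) => (-n, t) } // ties broken alphabetically for determinism
      .take(vocabSize)
      .map(_._1)
      .toArray
  }
}
```

With the two documents from the quoted example, `fit(docs, 3, 2)` keeps `a`, `b` and `c`, since each appears in both documents, and `transform` on the second document yields the counts 2, 2, 1 for that vocabulary.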
[GitHub] spark pull request: [SPARK-9890] [Doc] [ML] User guide for CountVe...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/8487#discussion_r38115521 --- Diff: docs/ml-features.md --- @@ -211,6 +211,87 @@ for feature in result.select("result").take(3): </div> </div> +## CountVectorizer + +As a transformer, `CountVectorizerModel` converts a collection of text documents to vectors of token counts. +It takes parameter `vocabulary: Array[String]` and produces sparse representations for the documents over the vocabulary, which can then be passed to other algorithms like LDA. + +When an a-priori dictionary is not available, `CountVectorizer` can be used as an Estimator to extract the vocabulary and generates a `CountVectorizerModel`. +It will select the top `vocabSize` words ordered by term frequency across the corpus. +An optional parameter minDF also affect the fitting process by specifying the minimum number (or fraction if < 1.0) of documents a term must appear in to be included in the vocabulary. + +<div class="codetabs"> +<div data-lang="scala" markdown="1"> +More details can be found in the API docs for +[CountVectorizer](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizer) and +[CountVectorizerModel](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizerModel).
+{% highlight scala %} +import org.apache.spark.ml.feature.CountVectorizer +import org.apache.spark.mllib.util.CountVectorizerModel + +val df = sqlContext.createDataFrame(Seq( + (0, Array("a", "b", "c")), + (1, Array("a", "b", "b", "c", "a")) +)).toDF("id", "words") + +// define CountVectorizerModel with a-priori vocabulary +val cv = new CountVectorizerModel(Array("a", "b", "c")) + .setInputCol("words") + .setOutputCol("features") + +// alternatively, fit a CountVectorizerModel from the corpus +val cv2: CountVectorizerModel = new CountVectorizer() + .setInputCol("words") + .setOutputCol("features") + .setVocabSize(3) + .setMinDF(2) // a term must appear in more than 2 documents to be included in the vocabulary + .fit(df) + +cv.transform(df).select("features").collect() +{% endhighlight %} +</div> + +<div data-lang="java" markdown="1"> +More details can be found in the API docs for +[CountVectorizer](api/java/org/apache/spark/ml/feature/CountVectorizer.html) and +[CountVectorizerModel](api/java/org/apache/spark/ml/feature/CountVectorizerModel.html). +{% highlight java %} +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.ml.feature.CountVectorizer; +import org.apache.spark.ml.feature.CountVectorizerModel; +import org.apache.spark.sql.DataFrame; + +// Input data: Each row is a bag of words from a sentence or document. +JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList( + RowFactory.create(Arrays.asList("a b c".split(" "))), + RowFactory.create(Arrays.asList("a b b c a".split(" "))) +)); +StructType schema = new StructType(new StructField[]{ + new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty()) +}); +DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema); --- End diff -- `documentDF` -> `df` to be consistent with Scala code
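The fitting behaviour described in the quoted docs (keep terms that reach `minDF`, then take the top `vocabSize` by corpus-wide term frequency, then count tokens per document) can be sketched in plain Java without Spark. This is only an illustration of the idea, not Spark's implementation: the class `VocabularySketch` and its methods are hypothetical names, `minDF` is treated as an inclusive minimum document count as the prose above states, and the alphabetical tie-break is an assumption made here so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, not Spark's CountVectorizer.
final class VocabularySketch {

    // Keep terms that occur in at least minDF documents (assumed inclusive),
    // then return the vocabSize most frequent terms across the whole corpus.
    static List<String> fitVocabulary(List<List<String>> docs, int vocabSize, int minDF) {
        Map<String, Integer> termFreq = new HashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                termFreq.merge(term, 1, Integer::sum);
            }
            // Each distinct term counts once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() >= minDF) {
                candidates.add(entry.getKey());
            }
        }
        // Most frequent first; ties broken alphabetically (an assumption of this sketch).
        candidates.sort((a, b) -> {
            int byFreq = Integer.compare(termFreq.get(b), termFreq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(vocabSize, candidates.size())));
    }

    // Dense token counts of one document over a fixed vocabulary;
    // terms outside the vocabulary are ignored.
    static int[] transform(List<String> doc, List<String> vocabulary) {
        int[] counts = new int[vocabulary.size()];
        for (String term : doc) {
            int index = vocabulary.indexOf(term);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }
}
```

With the two documents from the quoted example, `fitVocabulary(docs, 3, 2)` keeps all three terms, and `transform` on the second document counts two occurrences each of the first two vocabulary terms and one of the third. Spark itself emits sparse vectors; the dense `int[]` here just keeps the sketch short.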
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135486169 Whoops forgot to push the last commit, the Strings and default list size should be there now
[GitHub] spark pull request: Test pr
GitHub user semad opened a pull request: https://github.com/apache/spark/pull/8488 Test pr Pull Req 1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/semad/spark test_pr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8488.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8488 commit 8c2e18f1f3b09b982ea75f95a22b489f4924a9de Author: shahram emadi shahram@shahrams-macbook-pro.local Date: 2015-08-26T22:30:54Z Add GCP stuff) commit 31095671fde7b70de9f498ff048790872f1157c7 Author: semad se...@users.noreply.github.com Date: 2015-08-26T23:05:18Z Update build_remote.sh commit a091f158be9ae3189a37670866434a575bd0d968 Author: semad se...@users.noreply.github.com Date: 2015-08-26T23:22:05Z Update build_remote.sh commit 03e7d292a795b1d16f5c426b989d18cd9f86cf28 Author: semad se...@users.noreply.github.com Date: 2015-08-26T23:29:21Z Update README.md Test commits commit 9e582a065207db0b13aa146fef987bf1e52754fa Author: semad se...@users.noreply.github.com Date: 2015-08-26T23:54:27Z Update README.md commit e39a30b1166d16fabff797781fcfce4eb732ae93 Author: semad se...@users.noreply.github.com Date: 2015-08-27T16:27:59Z Update README.md
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135487161 Merged build started.
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135487130 Merged build triggered.
[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135482010 Merged build finished. Test PASSed.
[GitHub] spark pull request: Test pr
Github user semad closed the pull request at: https://github.com/apache/spark/pull/8488
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135487780 [Test build #41701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41701/consoleFull) for PR 8451 at commit [`0695e51`](https://github.com/apache/spark/commit/0695e5157ff8bd76d49769d787200f6b4799a294).
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135495613 Merged build started.
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135495581 Merged build triggered.
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135498467 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135498471 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41701/ Test PASSed.
[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7884#discussion_r38122369 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -218,31 +217,59 @@ class LogisticRegression(override val uid: String) override def getThreshold: Double = super.getThreshold + /** + * Whether to over-/undersamples each of training sample according to the given + * weight in `weightCol`. If empty, all samples are supposed to have weights as 1.0. + * Default is empty, so all samples have weight one. + * @group setParam + */ + def setWeightCol(value: String): this.type = set(weightCol, value) + setDefault(weightCol -> "") + override def setThresholds(value: Array[Double]): this.type = super.setThresholds(value) override def getThresholds: Array[Double] = super.getThresholds override protected def train(dataset: DataFrame): LogisticRegressionModel = { // Extract columns from data. If dataset is persisted, do not persist oldDataset.
-val instances = extractLabeledPoints(dataset).map { - case LabeledPoint(label: Double, features: Vector) => (label, features) -} +val instances: Either[RDD[(Double, Vector)], RDD[(Double, Double, Vector)]] = + if ($(weightCol).isEmpty) { +Left(dataset.select($(labelCol), $(featuresCol)).map { + case Row(label: Double, features: Vector) => (label, features) +}) + } else { +Right(dataset.select($(labelCol), $(weightCol), $(featuresCol)).map { + case Row(label: Double, weight: Double, features: Vector) => +(label, weight, features) +}) + } + val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE -if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK) - -val (summarizer, labelSummarizer) = instances.treeAggregate( - (new MultivariateOnlineSummarizer, new MultiClassSummarizer))( -seqOp = (c, v) => (c, v) match { - case ((summarizer: MultivariateOnlineSummarizer, labelSummarizer: MultiClassSummarizer), - (label: Double, features: Vector)) => -(summarizer.add(features), labelSummarizer.add(label)) -}, -combOp = (c1, c2) => (c1, c2) match { - case ((summarizer1: MultivariateOnlineSummarizer, - classSummarizer1: MultiClassSummarizer), (summarizer2: MultivariateOnlineSummarizer, - classSummarizer2: MultiClassSummarizer)) => -(summarizer1.merge(summarizer2), classSummarizer1.merge(classSummarizer2)) - }) +if (handlePersistence) instances.fold(identity, identity).persist(StorageLevel.MEMORY_AND_DISK) + +val (summarizer, labelSummarizer) = { + val combOp = (c1: (MultivariateOnlineSummarizer, MultiClassSummarizer), +c2: (MultivariateOnlineSummarizer, MultiClassSummarizer)) => + (c1._1.merge(c2._1), c1._2.merge(c2._2)) + + instances match { --- End diff -- OK I see what's going on; `fold` on the either expects two functions into the same type so type inference is inferring an upper bound for `RDD[(Double, Vector)]` and `RDD[(Double, Double, Vector)]` whereas in the earlier code `instances` was bound by the concrete types within the `Either`.
We can leave as is or remove the `Either`s and use `RDD[(Double, 1.0, Vector)]` for the unweighted instances; I am a fan of removing the `Either`s since that will reduce pattern matching code but both approaches are acceptable to me.
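The alternative suggested in this comment, giving every unweighted sample an explicit weight of 1.0 so that a single code path handles both cases, can be sketched in plain Java. This is a hedged illustration, not the patch under review: `WeightedInstancesSketch`, `Instance`, `toInstances` and `weightedMeanLabel` are hypothetical names, and passing `null` for the weights stands in for an empty `weightCol`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; hypothetical names, not the code in the pull request.
final class WeightedInstancesSketch {

    // Stand-in for the (label, weight, features) tuple discussed in the review.
    static final class Instance {
        final double label;
        final double weight;
        final double[] features;

        Instance(double label, double weight, double[] features) {
            this.label = label;
            this.weight = weight;
            this.features = features;
        }
    }

    // A null weights array models "no weight column": every sample gets weight 1.0,
    // so downstream code never has to branch on weighted vs. unweighted input.
    static List<Instance> toInstances(double[] labels, double[] weights, double[][] features) {
        List<Instance> instances = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            double weight = (weights == null) ? 1.0 : weights[i];
            instances.add(new Instance(labels[i], weight, features[i]));
        }
        return instances;
    }

    // One aggregation path for both cases: the weighted mean of the labels.
    static double weightedMeanLabel(List<Instance> instances) {
        double weightSum = 0.0;
        double labelSum = 0.0;
        for (Instance instance : instances) {
            weightSum += instance.weight;
            labelSum += instance.weight * instance.label;
        }
        return labelSum / weightSum;
    }
}
```

The trade-off is the one named in the comment: the uniform representation stores one extra `double` per unweighted sample, and in exchange the aggregation code needs no pattern matching over two input shapes.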
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38127111 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala --- @@ -0,0 +1,45 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute + + +case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode { --- End diff -- Maybe more generally, if we are never going to do transformations of these iterator trees, do they need to inherit from `TreeNode`?
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135514282 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135493699 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10316][SQL] respect nondeterministic ex...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8486#issuecomment-135496815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41690/ Test PASSed.
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135497162 [Test build #41703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41703/consoleFull) for PR 8398 at commit [`b1d49b3`](https://github.com/apache/spark/commit/b1d49b32a7b1e85c265a8cee8930d0138fd3bd8d).
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135498118 [Test build #41701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41701/console) for PR 8451 at commit [`0695e51`](https://github.com/apache/spark/commit/0695e5157ff8bd76d49769d787200f6b4799a294). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/7884#issuecomment-135501418 LGTM, I slightly prefer the `RDD[(Double, 1.0, Vector)]` approach but it's your call
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135502050 [Test build #41702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/console) for PR 8436 at commit [`074583e`](https://github.com/apache/spark/commit/074583e2fb5b31275f94af5d35f58fa0f2737c50). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaStopWordsRemoverSuite `
[GitHub] spark pull request: [SPARK-8510] [CORE] [PYSPARK] NumPy matrices a...
Github user paberline-rms commented on the pull request: https://github.com/apache/spark/pull/8384#issuecomment-135505761 JIRA: https://issues.apache.org/jira/browse/SPARK-8510
[GitHub] spark pull request: [SPARK-10182] [MLlib] GeneralizedLinearModel d...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8395
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8441#discussion_r38126088 --- Diff: docs/sql-programming-guide.md --- @@ -1696,12 +1711,16 @@ version specified by users. An isolated classloader is used here to avoid depend property can be one of three options: <ol> <li><code>builtin</code></li> -Use Hive 0.13.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is +Use Hive 1.2.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is enabled. When this option is chosen, <code>spark.sql.hive.metastore.version</code> must be -either <code>0.13.1</code> or not defined. +either <code>1.2.1</code> or not defined. <li><code>maven</code></li> -Use Hive jars of specified version downloaded from Maven repositories. -<li>A classpath in the standard format for both Hive and Hadoop.</li> +Use Hive jars of specified version downloaded from Maven repositories. This configuration +is not generally recommended for production deployments. +<li>A classpath in the standard format for the JVM. This classpath must include all of Hive +and its dependencies, including the correct version of Hadoop. These jars only need to be +present on the driver, but if you are running in yarn client mode then you must ensure --- End diff -- Correct, they are only used by the driver to get metadata. Thanks for the clarification on cluster vs client.
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8441#issuecomment-135508406 Merged build started.
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8441#issuecomment-135513991 [Test build #41704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41704/console) for PR 8441 at commit [`f3fdf62`](https://github.com/apache/spark/commit/f3fdf625b0b092984d8d5f0e733a130ff9ff92b4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135514287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41700/ Test PASSed.
[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/7884#issuecomment-135503606 I know Xiangrui is using `RDD[(Double, 1.0, Vector)]` in isotonic regression, so I don't mind as well as long as everyone is on the same page.
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8451
[GitHub] spark pull request: [SPARK-8510] [CORE] [PYSPARK] NumPy matrices a...
Github user paberline commented on the pull request: https://github.com/apache/spark/pull/8384#issuecomment-135506108 JIRA: https://issues.apache.org/jira/browse/SPARK-8510
[GitHub] spark pull request: [SPARK-10182] [MLlib] GeneralizedLinearModel d...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/8395#issuecomment-135507172 (PS not sure why it doesn't seem to show up, but the tests passed again after the last commit: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1685/console )
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8441#issuecomment-135508384 Merged build triggered.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/8464#discussion_r38126751
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala ---
@@ -0,0 +1,189 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.types.StructType
+
+class LocalNodeTest extends SparkFunSuite {
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer(
+      input: DataFrame,
+      nodeFunction: LocalNode => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      input :: Nil,
+      nodes => nodeFunction(nodes.head),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param left the left input data to be used.
+   * @param right the right input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer2(
--- End diff --
Just name this `checkAnswer`.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/8464#discussion_r38127948
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/UnionNode.scala ---
@@ -0,0 +1,75 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+case class UnionNode(children: Seq[LocalNode]) extends LocalNode {
--- End diff --
Consider making this an `Array[LocalNode]`. In general, we should probably only be using `Array` at this level of execution.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135491108 Merged build triggered.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135492754 [Test build #41702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/consoleFull) for PR 8436 at commit [`074583e`](https://github.com/apache/spark/commit/074583e2fb5b31275f94af5d35f58fa0f2737c50).
[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/8398#issuecomment-135495019 retest this please
[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8337#issuecomment-135495054 [Test build #41689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41689/console) for PR 8337 at commit [`573a37c`](https://github.com/apache/spark/commit/573a37c6a541d6993d6a45a2f7977056e936b05d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.