[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60008476

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -110,6 +110,31 @@ trait CheckAnalysis {
                 s"filter expression '${f.condition.sql}' " +
                 s"of type ${f.condition.dataType.simpleString} is not a boolean.")

+          case f @ Filter(condition, child) =>
+            // Make sure no correlated predicate is in an OUTER join, because this could change the
+            // semantics of the join.
+            lazy val attributes: Set[Expression] = child.output.toSet
+            def checkCorrelatedPredicates(p: PredicateSubquery): Unit = p.query.foreach {
+              case j @ Join(left, right, jt, _) if jt != Inner =>
+                j.transformAllExpressions {
+                  case e if attributes.contains(e) =>
+                    failAnalysis(s"Accessing outer query column is not allowed in outer joins: $e")
+                }
+              case _ =>
+            }
+            splitConjunctivePredicates(condition).foreach {
+              case p: PredicateSubquery =>
+                checkCorrelatedPredicates(p)
+              case Not(InSubQuery(_, query)) if query.output.exists(_.nullable) =>
+                failAnalysis("NOT IN with nullable subquery is not supported. " +
--- End diff --

This is something we should support eventually, but not now. So throwing a NotImplementedException is not worse than throwing an AnalysisError.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60008271

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -110,6 +110,31 @@ trait CheckAnalysis {
                 s"filter expression '${f.condition.sql}' " +
                 s"of type ${f.condition.dataType.simpleString} is not a boolean.")

+          case f @ Filter(condition, child) =>
+            // Make sure no correlated predicate is in an OUTER join, because this could change the
+            // semantics of the join.
+            lazy val attributes: Set[Expression] = child.output.toSet
+            def checkCorrelatedPredicates(p: PredicateSubquery): Unit = p.query.foreach {
+              case j @ Join(left, right, jt, _) if jt != Inner =>
+                j.transformAllExpressions {
+                  case e if attributes.contains(e) =>
+                    failAnalysis(s"Accessing outer query column is not allowed in outer joins: $e")
+                }
+              case _ =>
+            }
+            splitConjunctivePredicates(condition).foreach {
+              case p: PredicateSubquery =>
+                checkCorrelatedPredicates(p)
+              case Not(InSubQuery(_, query)) if query.output.exists(_.nullable) =>
+                failAnalysis("NOT IN with nullable subquery is not supported. " +
--- End diff --

Yeah, this is possible. It would require us to move the entire rule to the SparkStrategy (it will be messy to pattern match a rewritten NAAJ). The downside is that the SparkStrategy could then throw an analysis error that should have been thrown earlier.
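For readers following the NOT IN discussion, here is a minimal sketch of why a nullable subquery output matters. It models SQL's three-valued logic with plain Scala `Option` values and does not use any Spark API; the names `NotInSemantics`, `notIn`, and `keep` are made up for illustration.

```scala
object NotInSemantics {
  // SQL three-valued logic: Some(true), Some(false), or None for UNKNOWN.
  type Tri = Option[Boolean]

  // Evaluates `x NOT IN (values)`, where None models a SQL NULL.
  def notIn(x: Option[Int], values: Seq[Option[Int]]): Tri = x match {
    case _ if values.isEmpty => Some(true) // NOT IN over an empty set is TRUE, even for a NULL x
    case None                => None       // comparing NULL with anything is UNKNOWN
    case Some(v) =>
      if (values.contains(Some(v))) Some(false) // a definite match
      else if (values.contains(None)) None      // no match, but a NULL might have matched
      else Some(true)
  }

  // A WHERE clause keeps a row only when the predicate is TRUE.
  def keep(t: Tri): Boolean = t.contains(true)

  def main(args: Array[String]): Unit = {
    val outerRows = Seq(Option(1), Option(2), Option(3))
    // Subquery without NULLs: rows 2 and 3 pass the filter.
    println(outerRows.filter(x => keep(notIn(x, Seq(Some(1))))))
    // Subquery containing a NULL: no row passes the filter.
    println(outerRows.filter(x => keep(notIn(x, Seq(Some(1), None)))))
  }
}
```

A single NULL in the subquery output turns every non-matching comparison into UNKNOWN, so a plain anti join would return rows that NOT IN must reject. That is the case a null-aware anti join has to handle.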
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60008194

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -110,6 +110,31 @@ trait CheckAnalysis {
                 s"filter expression '${f.condition.sql}' " +
                 s"of type ${f.condition.dataType.simpleString} is not a boolean.")

+          case f @ Filter(condition, child) =>
+            // Make sure no correlated predicate is in an OUTER join, because this could change the
+            // semantics of the join.
+            lazy val attributes: Set[Expression] = child.output.toSet
+            def checkCorrelatedPredicates(p: PredicateSubquery): Unit = p.query.foreach {
+              case j @ Join(left, right, jt, _) if jt != Inner =>
+                j.transformAllExpressions {
+                  case e if attributes.contains(e) =>
+                    failAnalysis(s"Accessing outer query column is not allowed in outer joins: $e")
--- End diff --

Then `l.id` should be part of the right child of the join, not part of the join condition, right?
[GitHub] spark pull request: [SPARK-14453]Remove the support of SPARK_JAVA_...
GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/12465

    [SPARK-14453]Remove the support of SPARK_JAVA_OPTS env variable

    ## What changes were proposed in this pull request?

    `SPARK_JAVA_OPTS` has been deprecated since Spark 1.0, and `SparkConf` is the recommended replacement for the environment variable, so this PR proposes removing support for it.

    ## How was this patch tested?

    Unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-14453

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12465.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12465

commit dd6e3ca465adeec5a8ae8dcb858311b49614bc0f
Author: jerryshao
Date: 2016-04-18T05:28:11Z

    Remove SPARK_JAVA_OPTS env variable
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60007877

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -110,6 +110,31 @@ trait CheckAnalysis {
                 s"filter expression '${f.condition.sql}' " +
                 s"of type ${f.condition.dataType.simpleString} is not a boolean.")

+          case f @ Filter(condition, child) =>
+            // Make sure no correlated predicate is in an OUTER join, because this could change the
+            // semantics of the join.
+            lazy val attributes: Set[Expression] = child.output.toSet
+            def checkCorrelatedPredicates(p: PredicateSubquery): Unit = p.query.foreach {
+              case j @ Join(left, right, jt, _) if jt != Inner =>
+                j.transformAllExpressions {
+                  case e if attributes.contains(e) =>
+                    failAnalysis(s"Accessing outer query column is not allowed in outer joins: $e")
--- End diff --

A filter can sit below a join, for example:

    SELECT * FROM l
    WHERE EXISTS(
      SELECT * FROM r
      LEFT JOIN (SELECT * FROM s WHERE s.id = l.id) t ON t.id = r.id)

But sure, I'll add a test.
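To make the concern in the code comment concrete, the sketch below mimics the query above with plain Scala collections rather than Spark plans. It compares the subquery as written, with the correlated predicate `s.id = l.id` below the LEFT JOIN, against a naive rewrite that evaluates the same predicate above the join. The tables `r` and `s`, their contents, and all helper names are assumptions made up for illustration.

```scala
object OuterJoinCorrelation {
  // LEFT OUTER JOIN on equal ids: every left row survives; unmatched rows get None.
  def leftJoin(left: Seq[Int], right: Seq[Int]): Seq[(Int, Option[Int])] =
    left.flatMap { id =>
      val matches = right.filter(_ == id)
      if (matches.isEmpty) Seq((id, Option.empty[Int]))
      else matches.map(m => (id, Option(m)))
    }

  // Hypothetical contents of tables r and s (id column only).
  val r: Seq[Int] = Seq(1, 2)
  val s: Seq[Int] = Seq(1, 2)

  // As written: s is filtered by the outer value before the join.
  def asWritten(lId: Int): Seq[(Int, Option[Int])] =
    leftJoin(r, s.filter(_ == lId))

  // Naive rewrite: the same predicate evaluated after the join.
  def pulledUp(lId: Int): Seq[(Int, Option[Int])] =
    leftJoin(r, s).filter(_._2.contains(lId))

  def main(args: Array[String]): Unit = {
    val lId = 3 // an outer row with no counterpart in s
    println(s"EXISTS as written: ${asWritten(lId).nonEmpty}")
    println(s"EXISTS pulled up:  ${pulledUp(lId).nonEmpty}")
  }
}
```

For an outer row with `l.id = 3`, the query as written still yields both rows of `r` padded with NULLs, so EXISTS holds, while the rewrite filters every row away. Moving a correlated predicate across an outer join therefore changes the result, which is presumably why the check rejects it.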
[GitHub] spark pull request: [SPARK-13363][SQL] support Aggregator in Relat...
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12451#issuecomment-211213074

Merging this failed on my laptop. Can you try merging this yourself? I think we are good to go since the failed tests are unrelated.
[GitHub] spark pull request: [SPARK-13363][SQL] support Aggregator in Relat...
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12451#issuecomment-211212763

Merging in master.
[GitHub] spark pull request: [SPARK-14398] [SQL] Audit non-reserved keyword...
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/12191#issuecomment-211212703

@bomeng sorry for not getting back to you sooner. Is sorting the list only for aesthetics and ease of searching? If so, it does not seem worth the effort. What do you think?

It might have a little merit in terms of performance to group all `nonReserved` keywords together. The parser has to check whether a token is on the nonReserved list, and it does this with switch statements. Having a contiguous range of nonReserved tokens might allow a JIT/compiler to optimize this.
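As a rough illustration of the grouping idea, the sketch below contrasts a membership test over scattered token ids with one over a contiguous block. The token ids and names are hypothetical and are not taken from the generated parser; whether a JIT actually benefits in practice is not something this sketch demonstrates.

```scala
object NonReservedLookup {
  // Hypothetical scattered token ids: membership needs one case per id.
  def isNonReservedScattered(token: Int): Boolean = token match {
    case 7 | 19 | 42 | 88 => true
    case _                => false
  }

  // Hypothetical contiguous block of ids: membership is two comparisons.
  val FirstNonReserved = 100
  val LastNonReserved = 103

  def isNonReservedGrouped(token: Int): Boolean =
    token >= FirstNonReserved && token <= LastNonReserved

  def main(args: Array[String]): Unit = {
    println(isNonReservedScattered(42))
    println(isNonReservedGrouped(101))
  }
}
```

With a contiguous range, the check no longer depends on how many keywords are in the list, which is the kind of simplification the comment speculates a compiler could exploit.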
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60007659

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -110,6 +110,31 @@ trait CheckAnalysis {
                 s"filter expression '${f.condition.sql}' " +
                 s"of type ${f.condition.dataType.simpleString} is not a boolean.")

+          case f @ Filter(condition, child) =>
+            // Make sure no correlated predicate is in an OUTER join, because this could change the
+            // semantics of the join.
+            lazy val attributes: Set[Expression] = child.output.toSet
+            def checkCorrelatedPredicates(p: PredicateSubquery): Unit = p.query.foreach {
+              case j @ Join(left, right, jt, _) if jt != Inner =>
+                j.transformAllExpressions {
+                  case e if attributes.contains(e) =>
+                    failAnalysis(s"Accessing outer query column is not allowed in outer joins: $e")
--- End diff --

Since we only resolve the subquery inside a Filter, how is this possible? Could you add a SQL test for this case?
[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...
Github user vectorijk commented on the pull request:

    https://github.com/apache/spark/pull/12464#issuecomment-211210916

Jenkins, test this please.
[GitHub] spark pull request: [Spark-14687][Core][SQL][MLlib] Call path.getF...
Github user lw-lin commented on the pull request:

    https://github.com/apache/spark/pull/12450#issuecomment-211209966

Jenkins retest this please
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60007367

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -110,6 +110,31 @@ trait CheckAnalysis {
                 s"filter expression '${f.condition.sql}' " +
                 s"of type ${f.condition.dataType.simpleString} is not a boolean.")

+          case f @ Filter(condition, child) =>
+            // Make sure no correlated predicate is in an OUTER join, because this could change the
+            // semantics of the join.
+            lazy val attributes: Set[Expression] = child.output.toSet
--- End diff --

AttributeSet ?
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60007268

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -110,6 +110,31 @@ trait CheckAnalysis {
                 s"filter expression '${f.condition.sql}' " +
                 s"of type ${f.condition.dataType.simpleString} is not a boolean.")

+          case f @ Filter(condition, child) =>
+            // Make sure no correlated predicate is in an OUTER join, because this could change the
+            // semantics of the join.
+            lazy val attributes: Set[Expression] = child.output.toSet
+            def checkCorrelatedPredicates(p: PredicateSubquery): Unit = p.query.foreach {
+              case j @ Join(left, right, jt, _) if jt != Inner =>
+                j.transformAllExpressions {
+                  case e if attributes.contains(e) =>
+                    failAnalysis(s"Accessing outer query column is not allowed in outer joins: $e")
+                }
+              case _ =>
+            }
+            splitConjunctivePredicates(condition).foreach {
+              case p: PredicateSubquery =>
+                checkCorrelatedPredicates(p)
+              case Not(InSubQuery(_, query)) if query.output.exists(_.nullable) =>
+                failAnalysis("NOT IN with nullable subquery is not supported. " +
--- End diff --

Sometimes the nullability of columns can be propagated in the optimizer. Could we move this check into SparkStrategy (when picking a physical plan)?
[GitHub] spark pull request: [SPARK-14642][SQL] import org.apache.spark.sql...
Github user sbcd90 commented on the pull request:

    https://github.com/apache/spark/pull/12458#issuecomment-211207173

Hello @yhuai, I tested the scenario you mentioned and it works fine for me without any errors. However, as you rightly mentioned, the following statement need not be removed.

```
import implicits._
```

Hence, I reverted that particular change. Please comment.
[GitHub] spark pull request: [SPARK-14647][SQL] Group SQLContext/HiveContex...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12463#issuecomment-211205660

**[Test build #2804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2804/consoleFull)** for PR 12463 at commit [`fe35d41`](https://github.com/apache/spark/commit/fe35d4126daef2d5de04d982b62d744d3350b1dd).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...
Github user vectorijk commented on the pull request:

    https://github.com/apache/spark/pull/12464#issuecomment-211205576

cc @feynmanliang @jkbradley @mengxr
[GitHub] spark pull request: [SPARK-14647][SQL] Group SQLContext/HiveContex...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12463#issuecomment-211205635

**[Test build #2805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2805/consoleFull)** for PR 12463 at commit [`fe35d41`](https://github.com/apache/spark/commit/fe35d4126daef2d5de04d982b62d744d3350b1dd).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...
GitHub user vectorijk opened a pull request:

    https://github.com/apache/spark/pull/12464

    [SPARK-12810][PySpark] PySpark CrossValidatorModel should support avgMetrics

    ## What changes were proposed in this pull request?

    Support `avgMetrics` in CrossValidatorModel with Python.

    ## How was this patch tested?

    Doctest and `test_save_load` in `pyspark/ml/test.py`

    [JIRA](https://issues.apache.org/jira/browse/SPARK-12810)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vectorijk/spark spark-12810

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12464.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12464

commit 93a43bc5007acbc9c55a3eaf591f2b16df614c68
Author: Kai Jiang
Date: 2016-04-16T19:27:44Z

    supporting avgMetrics in CrossValidatorModel with Python
[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12460#discussion_r60006341

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -254,6 +251,21 @@ class SparkSqlAstBuilder extends AstBuilder {
     }
   }

+  /**
+   * A column path can be specified as an parameter to describe command. It is a dot separated
+   * elements where the last element can be a String.
+   * TODO - check with Herman
--- End diff --

cc @hvanhovell
[GitHub] spark pull request: [SPARK-14628][CORE][folllow-up] Always trackin...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12462#discussion_r60006203

--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -172,10 +172,10 @@ class TaskMetrics private[spark](
     val resultSerializationTime: Long,
     val memoryBytesSpilled: Long,
     val diskBytesSpilled: Long,
-    val inputMetrics: Option[InputMetrics],
-    val outputMetrics: Option[OutputMetrics],
-    val shuffleReadMetrics: Option[ShuffleReadMetrics],
-    val shuffleWriteMetrics: Option[ShuffleWriteMetrics])
+    val inputMetrics: InputMetrics,
--- End diff --

One thing: can you verify that this doesn't change the JSON output?
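The question above is worth a concrete picture. The sketch below uses a hand-rolled renderer, not Spark's actual JSON serialization, and assumes a serializer that omits absent optional fields; that assumption has not been checked against Spark's configuration. The `InputMetrics` case class and its two fields are simplified stand-ins.

```scala
object MetricsJsonShape {
  // Simplified stand-in for the real metrics class.
  final case class InputMetrics(bytesRead: Long, recordsRead: Long)

  private def field(name: String, value: String): String =
    "\"" + name + "\":" + value

  private def renderInput(m: InputMetrics): String =
    Seq(field("bytesRead", m.bytesRead.toString),
        field("recordsRead", m.recordsRead.toString)).mkString("{", ",", "}")

  // Before the change: an optional field, assumed to be omitted when absent.
  def renderOptional(memoryBytesSpilled: Long, input: Option[InputMetrics]): String = {
    val fields = Seq(field("memoryBytesSpilled", memoryBytesSpilled.toString)) ++
      input.map(m => field("inputMetrics", renderInput(m))).toSeq
    fields.mkString("{", ",", "}")
  }

  // After the change: the field is always present, possibly with zero values.
  def renderAlways(memoryBytesSpilled: Long, input: InputMetrics): String =
    Seq(field("memoryBytesSpilled", memoryBytesSpilled.toString),
        field("inputMetrics", renderInput(input))).mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    println(renderOptional(0L, None))
    println(renderAlways(0L, InputMetrics(0L, 0L)))
  }
}
```

Under that assumption, a task with no input would gain an `inputMetrics` object full of zeros in its JSON, which is the kind of difference an API consumer could notice.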
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/12306#issuecomment-211199893

Retest this please
[GitHub] spark pull request: [SPARK-14603] [SQL] [WIP] Verification of Meta...
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12385#issuecomment-211195724

retest this please
[GitHub] spark pull request: [SPARK-14691] [SQL] Simplify and Unify Error G...
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12459#issuecomment-211195477

test this please
[GitHub] spark pull request: [SPARK-14647][SQL] Group SQLContext/HiveContex...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12463#issuecomment-211188821

**[Test build #2805 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2805/consoleFull)** for PR 12463 at commit [`fe35d41`](https://github.com/apache/spark/commit/fe35d4126daef2d5de04d982b62d744d3350b1dd).
[GitHub] spark pull request: [SPARK-14647][SQL] Group SQLContext/HiveContex...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12463#issuecomment-211189391

**[Test build #2807 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2807/consoleFull)** for PR 12463 at commit [`fe35d41`](https://github.com/apache/spark/commit/fe35d4126daef2d5de04d982b62d744d3350b1dd).
[GitHub] spark pull request: [SPARK-14647][SQL] Group SQLContext/HiveContex...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12463#issuecomment-211189182

**[Test build #2806 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2806/consoleFull)** for PR 12463 at commit [`fe35d41`](https://github.com/apache/spark/commit/fe35d4126daef2d5de04d982b62d744d3350b1dd).
[GitHub] spark pull request: [SPARK-14647][SQL] Group SQLContext/HiveContex...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12463#issuecomment-211188665

**[Test build #2804 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2804/consoleFull)** for PR 12463 at commit [`fe35d41`](https://github.com/apache/spark/commit/fe35d4126daef2d5de04d982b62d744d3350b1dd).
[GitHub] spark pull request: [SPARK-14647][SQL] Group SQLContext/HiveContex...
GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/12463

    [SPARK-14647][SQL] Group SQLContext/HiveContext state into SharedState

    ## What changes were proposed in this pull request?

    This patch adds a SharedState that groups state shared across multiple SQLContexts. This is analogous to the SessionState added in SPARK-13526 that groups session-specific state. This cleanup makes the constructors of the contexts simpler and ultimately allows us to remove HiveContext in the near future.

    ## How was this patch tested?

    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark sharedState

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12463.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12463

commit fe35d4126daef2d5de04d982b62d744d3350b1dd
Author: Yin Huai
Date: 2016-04-18T04:02:23Z

    Revert "Revert "[SPARK-14647][SQL] Group SQLContext/HiveContext state into SharedState""

    This reverts commit 7de06a646dff7ede520d2e982ac0996d8c184650.
[GitHub] spark pull request: [SPARK-14642][SQL] import org.apache.spark.sql...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12458#discussion_r60003678

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/typed.scala ---
@@ -46,8 +46,6 @@ object typed {
     override protected def _sqlContext: SQLContext = null
   }

-  import implicits._
--- End diff --

Do we need to remove this?
[GitHub] spark pull request: [SPARK-14642][SQL] import org.apache.spark.sql...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12458#issuecomment-211181637

Can you try

```
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.types._
udf((v: String) => v.stripSuffix("-abc"))
```

and see if it throws something like the following exception?

```
error: object lang is not a member of package org.apache.spark.sql.expressions.java
val normalizeVersion = udf((v: java.lang.String) => v.stripSuffix("-abc"))
```
[GitHub] spark pull request: [SPARK-14139][SQL] RowEncoder should preserve ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12364#issuecomment-211179912 OK, please label it as a TODO for a future PR; otherwise it is difficult to tell whether it is meant to be done in the current one.
[GitHub] spark pull request: [SPARK-14139][SQL] RowEncoder should preserve ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12364#issuecomment-211178558 Yea, it's still valid, but I'd like to do it in follow-ups. As this PR was taken over from other people, I don't want to enlarge the scope too much.
[GitHub] spark pull request: [SPARK-14628][CORE][folllow-up] Always trackin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12462#issuecomment-211176510 **[Test build #2803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2803/consoleFull)** for PR 12462 at commit [`08874a0`](https://github.com/apache/spark/commit/08874a0bf6e6d4512a4832f7564349a67c6257d3). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14557][SQL] Reading textfile (created t...
Github user kasjain commented on the pull request: https://github.com/apache/spark/pull/12356#issuecomment-211176212 Can any of the admins verify the above fix?
[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-211175800 @srowen - As per andrew's comment, I thought it was fine to make the change given that HadoopRDD is marked as DeveloperAPI. Please let me know if any additional changes are needed. Additional info: There are a huge number of changes in SPARK-13664 for FileSourceStrategy, which is marked as the default codepath. So ideally, OrcRelation would no longer go via this codepath by default. Given that, this PR would have an impact if someone is trying to directly invoke HadoopRDD and has done closure clearing upfront.
[GitHub] spark pull request: [SPARK-14628][CORE][folllow-up] Always trackin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12462#issuecomment-211173829 **[Test build #2803 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2803/consoleFull)** for PR 12462 at commit [`08874a0`](https://github.com/apache/spark/commit/08874a0bf6e6d4512a4832f7564349a67c6257d3).
[GitHub] spark pull request: [SPARK-14628][CORE][folllow-up] Always trackin...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12462#issuecomment-211173368 LGTM pending tests.
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211169780 retest this please.
[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table
Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/12460#issuecomment-211169665 @gatorsmile Thank you. I have resolved the conflicts.
[GitHub] spark pull request: [SPARK-14628][CORE][folllow-up] Always trackin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12462#discussion_r60001921 --- Diff: core/src/test/resources/HistoryServerExpectations/application_list_json_expectation.json --- @@ -2,108 +2,108 @@ "id" : "local-1430917381534", "name" : "Spark shell", "attempts" : [ { -"startTimeEpoch" : 1430917380893, -"endTimeEpoch" : 1430917391398, -"lastUpdatedEpoch" : 0, "startTime" : "2015-05-06T13:03:00.893GMT", "endTime" : "2015-05-06T13:03:11.398GMT", "lastUpdated" : "", "duration" : 10505, "sparkUser" : "irashid", -"completed" : true +"completed" : true, +"startTimeEpoch" : 1430917380893, +"endTimeEpoch" : 1430917391398, +"lastUpdatedEpoch" : 0 } ] }, { "id" : "local-1430917381535", "name" : "Spark shell", "attempts" : [ { "attemptId" : "2", -"startTimeEpoch" : 1430917380893, -"endTimeEpoch" : 1430917380950, -"lastUpdatedEpoch" : 0, "startTime" : "2015-05-06T13:03:00.893GMT", "endTime" : "2015-05-06T13:03:00.950GMT", "lastUpdated" : "", "duration" : 57, "sparkUser" : "irashid", -"completed" : true +"completed" : true, +"startTimeEpoch" : 1430917380893, +"endTimeEpoch" : 1430917380950, +"lastUpdatedEpoch" : 0 }, { "attemptId" : "1", -"startTimeEpoch" : 1430917380880, -"endTimeEpoch" : 1430917380890, -"lastUpdatedEpoch" : 0, --- End diff -- When I regenerated the gold answer, these 3 fields were moved to the end. It seems that when we introduced these 3 fields in https://github.com/apache/spark/pull/11326, we didn't regenerate the gold answer but updated the answer files manually.
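The comment above describes a diff that only reorders keys inside each JSON object. As a minimal sketch (not taken from the Spark test suite; the helper name `same_json_content` is hypothetical, and the field values are copied from the first attempt in the diff), parsing both versions shows that the old and regenerated expectations carry the same content even though their text differs:

```python
import json

# Old layout: the three epoch fields come first (values copied from the diff above).
OLD_ATTEMPT = """{
  "startTimeEpoch" : 1430917380893,
  "endTimeEpoch" : 1430917391398,
  "lastUpdatedEpoch" : 0,
  "startTime" : "2015-05-06T13:03:00.893GMT",
  "endTime" : "2015-05-06T13:03:11.398GMT",
  "lastUpdated" : "",
  "duration" : 10505,
  "sparkUser" : "irashid",
  "completed" : true
}"""

# Regenerated layout: the same three fields now come last.
NEW_ATTEMPT = """{
  "startTime" : "2015-05-06T13:03:00.893GMT",
  "endTime" : "2015-05-06T13:03:11.398GMT",
  "lastUpdated" : "",
  "duration" : 10505,
  "sparkUser" : "irashid",
  "completed" : true,
  "startTimeEpoch" : 1430917380893,
  "endTimeEpoch" : 1430917391398,
  "lastUpdatedEpoch" : 0
}"""


def same_json_content(left: str, right: str) -> bool:
    """Hypothetical helper: compare two JSON documents by parsed value.

    Python dict equality ignores key order, so a pure reordering compares equal.
    """
    return json.loads(left) == json.loads(right)


# The raw text differs, but the parsed objects are equal.
text_differs = OLD_ATTEMPT != NEW_ATTEMPT
content_matches = same_json_content(OLD_ATTEMPT, NEW_ATTEMPT)
```

This says nothing about how `HistoryServerSuite` itself compares expectations; it only illustrates why a key-order change produces a large textual diff without changing the data.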
[GitHub] spark pull request: [SPARK-14628][CORE][folllow-up] Always trackin...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/12462 [SPARK-14628][CORE][folllow-up] Always tracking read/write metrics ## What changes were proposed in this pull request? This PR is a follow-up to https://github.com/apache/spark/pull/12417: we now always track input/output/shuffle metrics in the Spark JSON protocol and status API. Most of the line changes come from regenerating the gold answer for `HistoryServerSuite`, which adds a lot of 0 values for read/write metrics. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark follow Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12462.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12462 commit 5d979c8dd2aa1ce36c7d5fae65b3ab0e0021ab7a Author: Wenchen Fan Date: 2016-04-18T02:24:29Z cleanup commit 08874a0bf6e6d4512a4832f7564349a67c6257d3 Author: Wenchen Fan Date: 2016-04-18T02:57:13Z regenerate gold answer for HistoryServerSuite
[GitHub] spark pull request: [SPARK-14628][CORE][folllow-up] Always trackin...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12462#issuecomment-211167735 cc @rxin
[GitHub] spark pull request: [SPARK-14409][ML][WIP] Adding a RankingEvaluat...
GitHub user yongtang opened a pull request: https://github.com/apache/spark/pull/12461 [SPARK-14409][ML][WIP] Adding a RankingEvaluator to ML ## What changes were proposed in this pull request? This patch tries to add the implementation of Mean Reciprocal Rank (MRR) in mllib.evaluation, as a first step toward adding a RankingEvaluator to ML. ## How was this patch tested? An additional test case has been added to cover Mean Reciprocal Rank (MRR). ## NOTE: This patch is a work in progress. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yongtang/spark SPARK-14409 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12461.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12461 commit 2838c240252e7a0578377d360f1a62e00bb48507 Author: Yong Tang Date: 2016-04-18T02:55:09Z [SPARK-14409][ML][WIP] Adding a RankingEvaluator to ML This patch tries to add the implementation of Mean Reciprocal Rank (MRR) in mllib.evaluation, as a first step toward adding a RankingEvaluator to ML. An additional test case has been added to cover Mean Reciprocal Rank (MRR). This patch is a work in progress.
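For readers unfamiliar with the metric named in the pull request above, here is a minimal sketch of the textbook definition of Mean Reciprocal Rank. It is not the code from the patch: the function name `mean_reciprocal_rank` and the input shape (pairs of a ranked prediction list and a set of relevant items) are assumptions chosen for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def mean_reciprocal_rank(
    queries: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average, over all queries, of 1 / rank of the first relevant prediction.

    A query with no relevant item among its predictions contributes 0.
    Returns 0.0 for an empty input (an assumption of this sketch).
    """
    total = 0.0
    count = 0
    for predicted, relevant in queries:
        count += 1
        for rank, item in enumerate(predicted, start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / count if count else 0.0


# Three hypothetical queries: first hit at rank 2, at rank 1, and no hit at all.
example_queries = [
    ([1, 2, 3], {2}),
    ([4, 5], {4}),
    ([6, 7], {9}),
]
example_mrr = mean_reciprocal_rank(example_queries)  # (1/2 + 1 + 0) / 3
```

The third query shows the usual convention that a query without any relevant prediction lowers the mean rather than being skipped.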
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/12105#discussion_r60001514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -368,7 +369,10 @@ abstract class UnixTime extends BinaryExpression with ExpectsInputTypes { t.asInstanceOf[Long] / 100L case StringType if right.foldable => if (constFormat != null) { -Try(new SimpleDateFormat(constFormat.toString).parse( +if (formatter == null) { + formatter = Try(new SimpleDateFormat(constFormat.toString)).getOrElse(null) --- End diff -- I didn't want to throw the error back, as that would break the earlier functionality. Earlier it returned null when any exception was thrown (e.g., constFormat being null, or a parsing error). The recent commit creates the formatter upfront and handles null early, to keep the changes minimal.
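The pattern discussed above (build the formatter once, and return null instead of raising on any failure) can be sketched as follows. This is an illustration in Python, not the Scala code from the patch: the class `CachedUnixTime`, its `builds` counter, and the use of `datetime.strptime` format codes in place of `SimpleDateFormat` patterns are all assumptions of this sketch.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


class CachedUnixTime:
    """Hypothetical stand-in for an expression with a constant format string."""

    def __init__(self, const_format: Optional[str]) -> None:
        self._const_format = const_format
        self._formatter: Optional[Callable[[str], datetime]] = None
        self.builds = 0  # counts formatter constructions, for illustration only

    def _get_formatter(self) -> Optional[Callable[[str], datetime]]:
        # Build the formatter at most once; a missing format yields no formatter.
        if self._formatter is None and self._const_format is not None:
            fmt = self._const_format
            self.builds += 1
            self._formatter = lambda text: datetime.strptime(text, fmt)
        return self._formatter

    def eval(self, value: Optional[str]) -> Optional[int]:
        """Return seconds since the epoch (UTC), or None on any failure."""
        formatter = self._get_formatter()
        if formatter is None:
            return None
        try:
            parsed = formatter(value)
        except (ValueError, TypeError):
            # Keep the earlier behavior: parsing problems give None, not an error.
            return None
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())


unix_time = CachedUnixTime("%Y-%m-%d")
first = unix_time.eval("1970-01-02")
second = unix_time.eval("not a date")
third = unix_time.eval("1970-01-03")
```

After these three calls `unix_time.builds` is 1, which is the point of the change: the per-row cost of constructing the formatter is paid once, while bad input still maps to a null result.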
[GitHub] spark pull request: [Spark-14685] [CORE] Document heritability of ...
Github user marcintustin commented on the pull request: https://github.com/apache/spark/pull/12455#issuecomment-211164186 The test failures are probably bogus, as per http://mail-archives.apache.org/mod_mbox/spark-dev/201604.mbox/%3CCAMFhwAYRbN0yJGwzvrY8atzS9CCudzioF%3DbcGogCwPq3gPC6Uw%40mail.gmail.com%3E
[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/12460#issuecomment-211163664 Please resolve the conflicts. : )
[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table
Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/12460#issuecomment-211163482 @andrewor14 I'm looking for some early feedback on this, as I was thinking of doing the same for show table extended. I did have a brief discussion with @gatorsmile on this.
[GitHub] spark pull request: [SPARK-14127] Describe table
GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/12460 [SPARK-14127] Describe table ## What changes were proposed in this pull request? This PR adds support for describing partitions and columns. Support for describing tables was already in place. The PR moves the code to SessionCatalog/HiveSessionCatalog. Command Syntax: ``` SQL DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name [column_name] [PARTITION partition_spec] ``` ## How was this patch tested? Added test cases to DDLCommandSuite to verify the plan. Added some error tests to HiveCommandSuite. The rest of the coverage should be from existing test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dilipbiswal/spark dkb_desc_tbl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12460.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12460 commit 5b349dae9f15e9c0e08c00d77d5baac6ae0d85a2 Author: Dilip Biswal Date: 2016-04-14T05:33:08Z [SPARK-14127] Describe table
[GitHub] spark pull request: [SPARK-13382][DOCS][PYSPARK] Update pyspark te...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/11278#issuecomment-211162111 re-ping @JoshRosen thoughts?
[GitHub] spark pull request: [SPARK-14635] [ML] Documentation and Examples ...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12454#discussion_r6837 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/TfIdfExample.scala --- @@ -20,7 +20,7 @@ package org.apache.spark.examples.ml import org.apache.spark.{SparkConf, SparkContext} // $example on$ -import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer} --- End diff -- Thanks. I'll send an update.
[GitHub] spark pull request: [SPARK-14635] [ML] Documentation and Examples ...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12454#discussion_r6631 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/TfIdfExample.scala --- @@ -20,7 +20,7 @@ package org.apache.spark.examples.ml import org.apache.spark.{SparkConf, SparkContext} // $example on$ -import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer} --- End diff -- Why did we add CountVectorizer as an import here and not in JavaTfIdfExample? Since we aren't referencing it in code in either I'd probably leave it out of both personally (but either way consistency would be best).
[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12428#issuecomment-211156441 Thanks for taking the initiative to do this. A few minor comments from the first pass through, but in the meantime maybe one of the admins (possibly @jkbradley) could either say ok to jenkins to test or add to the whitelist?
[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12428#discussion_r5886 --- Diff: python/pyspark/ml/feature.py --- @@ -2245,6 +2248,21 @@ def getMinCount(self): """ return self.getOrDefault(self.minCount) +@since("2.0.0") +def setWindowSize(self, value): +""" +Sets the value of :py:attr:`windowSize`. +""" +self._paramMap[self.windowSize] = value --- End diff -- Rather than directly accessing the param map, use the `_set` function (see https://github.com/apache/spark/pull/11939 / SPARK-14104)
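The review comment above asks for the setter to go through `_set` rather than writing to the param map directly. The sketch below uses toy stand-in classes (`ToyParam`, `ToyWord2Vec`, and `to_int` are hypothetical and do not import pyspark) and assumes that the point of `_set` is to apply the param's type converter before storing the value; it is meant only to show why a single choke point for writes is useful, not to reproduce the pyspark implementation.

```python
from typing import Any, Callable, Dict


def to_int(value: Any) -> int:
    """Toy converter: accept ints and integral floats, reject everything else."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise TypeError(f"Could not convert {value!r} to int")


class ToyParam:
    """Hypothetical stand-in for a param with a name and a type converter."""

    def __init__(self, name: str, type_converter: Callable[[Any], Any]) -> None:
        self.name = name
        self.type_converter = type_converter


class ToyWord2Vec:
    """Hypothetical estimator with a single windowSize param."""

    windowSize = ToyParam("windowSize", to_int)

    def __init__(self) -> None:
        self._paramMap: Dict[ToyParam, Any] = {}

    def _set(self, **kwargs: Any) -> "ToyWord2Vec":
        # Every write passes through here, so conversion and validation
        # cannot be skipped by an individual setter.
        for name, value in kwargs.items():
            param = getattr(self, name)
            self._paramMap[param] = param.type_converter(value)
        return self

    def setWindowSize(self, value: Any) -> "ToyWord2Vec":
        return self._set(windowSize=value)

    def getWindowSize(self) -> int:
        return self._paramMap[self.windowSize]


converted = ToyWord2Vec().setWindowSize(5.0).getWindowSize()

try:
    ToyWord2Vec().setWindowSize("five")
    rejected = False
except TypeError:
    rejected = True
```

With a direct `self._paramMap[self.windowSize] = value` assignment, the float `5.0` and the string `"five"` would both have been stored unchanged, and any mismatch would surface only later.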
[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12428#discussion_r5797 --- Diff: python/pyspark/ml/feature.py --- @@ -2173,28 +2173,31 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, Has minCount = Param(Params._dummy(), "minCount", "the minimum number of times a token must appear to be included in the " + "word2vec model's vocabulary", typeConverter=TypeConverters.toInt) +windowSize = Param(Params._dummy(), "windowSize", + "the window size (context words from [-window, window])", --- End diff -- Should mention the default value.
[GitHub] spark pull request: [SPARK-14483][WEBUI] Display user name for eac...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/12257#issuecomment-211148808 retest this please.
[GitHub] spark pull request: [SPARK-13363][SQL] support Aggregator in Relat...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12451#issuecomment-211148786 retest this please
[GitHub] spark pull request: [SPARK-14691] [SQL] Simplify and Unify Error G...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/12459 [SPARK-14691] [SQL] Simplify and Unify Error Generation for Unsupported Alter Table DDL What changes were proposed in this pull request? So far, we are capturing each unsupported Alter Table in separate visit functions. They should be unified and issue the same ParseException instead. This PR refactors the existing implementation and makes the error messages consistent for Alter Table DDL. How was this patch tested? Updated the existing test cases and also added new test cases to ensure all the unsupported statements are covered. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark cleanAlterTable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12459.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12459 commit 07a07795baa5ff8ffd4bb72237e872d5b684159e Author: gatorsmile Date: 2016-04-18T01:12:09Z Simplify and unify the exceptions for unsupported alter table DDL statements.
[GitHub] spark pull request: [SPARK-14423][YARN] Avoid same name files adde...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/12203#issuecomment-211141690 @vanzin , please help to review again, thanks a lot.
[GitHub] spark pull request: [SPARK-14642][SQL] import org.apache.spark.sql...
Github user sbcd90 commented on the pull request: https://github.com/apache/spark/pull/12458#issuecomment-211140805 Hi @rxin , I updated the title & description. Please have a look.
[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12412#issuecomment-211140325 cc @andrewor14 @yhuai
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211140342 retest this please.
[GitHub] spark pull request: [MINOR] Revert removing explicit typing (chang...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12452#issuecomment-211139508 (The tests `org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-8020: set sql conf in spark conf ` and `org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist Parquet relation with decimal column` pass in my local environment.)
[GitHub] spark pull request: [MINOR] Revert removing explicit typing (chang...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12452#issuecomment-211139260 test this please
[GitHub] spark pull request: [SPARK-14642][SQL] import org.apache.spark.sql...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12458#issuecomment-211138452 Can you update the pull request to have the full title and also fix the description to elaborate on the problem? Thanks.
[GitHub] spark pull request: [DO_NOT_MERGE][Test] Investigate MemorySinkSui...
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/12443
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12306#issuecomment-211136525 **[Test build #2802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2802/consoleFull)** for PR 12306 at commit [`b473240`](https://github.com/apache/spark/commit/b473240ddf6ee8e4867f923184048d52afe0498a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-14685] [CORE] Document heritability of ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12455#issuecomment-211135488 **[Test build #2801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2801/consoleFull)** for PR 12455 at commit [`b96cde1`](https://github.com/apache/spark/commit/b96cde118c1265bf37ac7036581b8bb1bef80ee0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14642][SQL] import org.apache.spark.sql...
GitHub user sbcd90 opened a pull request: https://github.com/apache/spark/pull/12458 [SPARK-14642][SQL] import org.apache.spark.sql.expressions._ breaks u… ## What changes were proposed in this pull request? PR fixes the import issue which breaks udf functions. ## How was this patch tested? patch tested with unit tests. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) …df under functions You can merge this pull request into a Git repository by running: $ git pull https://github.com/sbcd90/spark udfFuncBreak Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12458.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12458 commit e3bc75f73b322812f24e58c86655f7bb2b304a4b Author: Subhobrata Dey Date: 2016-04-17T23:33:12Z [SPARK-14642][SQL] import org.apache.spark.sql.expressions._ breaks udf under functions
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211127633 retest this please.
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12306#issuecomment-211126406 **[Test build #2802 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2802/consoleFull)** for PR 12306 at commit [`b473240`](https://github.com/apache/spark/commit/b473240ddf6ee8e4867f923184048d52afe0498a).
[GitHub] spark pull request: [MINOR] Revert removing explicit typing (chang...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12452#issuecomment-211125032 retest this please
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211124797 retest this please.
[GitHub] spark pull request: [Spark-14685] [CORE] Document heritability of ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12455#issuecomment-211124248 **[Test build #2801 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2801/consoleFull)** for PR 12455 at commit [`b96cde1`](https://github.com/apache/spark/commit/b96cde118c1265bf37ac7036581b8bb1bef80ee0).
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12438
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211123492 @sbcd90 can you also create a backport for branch-1.6?
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211123166 Thanks - merging in master.
[GitHub] spark pull request: [SPARK-4226][SQL] Support Correlated Sub-queri...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/12306#issuecomment-211122925 @rxin / @davies this one is ready for review.
[GitHub] spark pull request: [SPARK-13904] Add exit code parameter to exitE...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/12457#issuecomment-211121052 Jenkins, test this please.
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211120299 **[Test build #2800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2800/consoleFull)** for PR 12438 at commit [`76c0047`](https://github.com/apache/spark/commit/76c004773f0bb7827e460b966d3408387e49254f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211107493 **[Test build #2800 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2800/consoleFull)** for PR 12438 at commit [`76c0047`](https://github.com/apache/spark/commit/76c004773f0bb7827e460b966d3408387e49254f).
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r59991684 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -140,6 +140,13 @@ private[spark] class CoarseGrainedExecutorBackend( case None => logWarning(s"Drop $msg because has not yet connected to driver") } } + + /** + * This function can be overloaded by other child classes to handle + * executor exits differently. For e.g. when an executor goes down, + * back-end may not want to take the parent process down. + */ + protected def exitExecutor(): Unit = System.exit(1) --- End diff -- Created https://github.com/apache/spark/pull/12457
[GitHub] spark pull request: [SPARK-13904] Add exit code parameter to exitE...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/12457 [SPARK-13904] Add exit code parameter to exitExecutor() ## What changes were proposed in this pull request? This PR adds an exit code parameter to exitExecutor() so that the caller can specify a different exit code. ## How was this patch tested? Existing test @rxin @hbhanawat You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12457.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12457 commit df565e4b4cfdcf1b86458bc9c89b2f023ab7222c Author: tedyu Date: 2016-04-17T20:15:13Z [SPARK-13904] Add exit code parameter to exitExecutor()
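The change described in this PR (an overridable exit hook that takes the exit code as a parameter) can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `Backend`, `EmbeddedBackend`, `onDriverDisconnected`, and `lastExitCode` are hypothetical names, and only `exitExecutor` comes from the thread.

```scala
// Hypothetical stand-in for an executor backend; not Spark's actual class.
class Backend {
  // Overridable hook: by default, terminate the JVM with the given code.
  protected def exitExecutor(code: Int): Unit = System.exit(code)

  // Hypothetical caller that chooses the exit code to report.
  def onDriverDisconnected(): Unit = exitExecutor(1)
}

// A child class that records the code instead of taking the process down,
// e.g. when the executor is embedded in a longer-lived parent process.
class EmbeddedBackend extends Backend {
  @volatile var lastExitCode: Option[Int] = None

  override protected def exitExecutor(code: Int): Unit = {
    lastExitCode = Some(code)
  }
}
```

Passing the code through the hook lets a subclass distinguish failure modes without parsing log output; the specific codes Spark uses are not shown in this thread.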
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211098368 **[Test build #2798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2798/consoleFull)** for PR 12438 at commit [`76c0047`](https://github.com/apache/spark/commit/76c004773f0bb7827e460b966d3408387e49254f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211098676 **[Test build #2799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2799/consoleFull)** for PR 12438 at commit [`76c0047`](https://github.com/apache/spark/commit/76c004773f0bb7827e460b966d3408387e49254f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11834][ML] Ignore Thresholds in Logisti...
Github user JeremyNixon commented on a diff in the pull request: https://github.com/apache/spark/pull/12400#discussion_r59990906 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -104,8 +107,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas * @group setParam */ def setThresholds(value: Array[Double]): this.type = { -if (isSet(threshold)) clear(threshold) -set(thresholds, value) +logWarning("Ignoring setThresholds(), use setThreshold() for binary Logistic Regression.") --- End diff -- I updated the top line of the note to reiterate that this functionality will be re-enabled, and maintained the second line as thresholds can still be set via a param map with {model name}.thresholds -> {Array(values)}. Do you think it makes more sense to remove the top line instead?
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user lresende commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211084045 Jenkins, retest this
[GitHub] spark pull request: [Spark-14283][SQL] Add a option to avoid local...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12453#issuecomment-211083868 Shouldn't the user-facing documentation explain the consequences of omitting this sort, especially its impact on correctness?
[GitHub] spark pull request: [Spark-14686][CORE,SQL,STREAMING]
Github user marcintustin commented on the pull request: https://github.com/apache/spark/pull/12456#issuecomment-211080503 @rxin Derp on my part. Of course this needs a better title.
[GitHub] spark pull request: [Spark-14686][CORE,SQL,STREAMING]
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12456#issuecomment-211079561 The title should probably just be "[SPARK-14686] Allow setting local properties that are not inheritable"
[GitHub] spark pull request: [Spark-14686][CORE,SQL,STREAMING]
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12456#issuecomment-211079376 Can you add a more descriptive title?
[GitHub] spark pull request: [Spark-14686][CORE,SQL,STREAMING]
Github user marcintustin commented on the pull request: https://github.com/apache/spark/pull/12456#issuecomment-211078297 I should add that this will also need new tests. I haven't added any, again pending overall agreement on design.
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211078003 **[Test build #2798 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2798/consoleFull)** for PR 12438 at commit [`76c0047`](https://github.com/apache/spark/commit/76c004773f0bb7827e460b966d3408387e49254f).
[GitHub] spark pull request: [Spark-14686][CORE,SQL,STREAMING]
GitHub user marcintustin opened a pull request: https://github.com/apache/spark/pull/12456 [Spark-14686][CORE,SQL,STREAMING] ## What changes were proposed in this pull request? This PR adds an uninheritableLocalPropertyFacility and ports sql.execution.id to be set with that facility. If this is to go forward, the changes should probably be folded into a Properties type which accommodates hierarchical access rather than a tuple. ## How was this patch tested? Running tests. @rxin @JoshRosen PR opened for comments. As noted above, this should probably have a little more engineering done, but I'd like to (a) get feedback on the overall approach; and (b) see which tests fail in jenkins, as I have some tests failing locally which may or may not be bogus. You can merge this pull request into a Git repository by running: $ git pull https://github.com/marcintustin/spark SPARK-14686 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12456.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12456 commit aecb305577a7d16065738afcf4bbeee6397b4f53 Author: Marcin Tustin Date: 2016-04-16T23:41:22Z [SPARK-14685] Document inheritability of localProperties commit 9964e2e937718194521c6705ba345deba11f1f3d Author: Marcin Tustin Date: 2016-04-17T14:49:51Z Add test for heritability of local properties commit b96cde118c1265bf37ac7036581b8bb1bef80ee0 Author: Marcin Tustin Date: 2016-04-17T15:09:38Z SPARK-14685 add test to ensure no crosstalk between threads on localProperties Work with me in NYC: https://www.handy.com/careers/73115?gh_jid=73115_src=o5qcxn commit bc302e014f280cca9d9b7f104c14cdf7de6d5df4 Author: Marcin Tustin Date: 2016-04-17T18:35:36Z [SPARK-14686] First cut of non-inheritable localProperties
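The distinction between inheritable and non-inheritable local properties can be illustrated with plain JVM thread-locals. This is a sketch under the assumption that the inheritance being discussed behaves like `java.lang.InheritableThreadLocal`; it is not the facility proposed in the PR, and `LocalPropsDemo`, `readInChild`, and the property values are hypothetical.

```scala
object LocalPropsDemo {
  // Copied into threads that are created by the thread that set the value.
  val inheritable = new InheritableThreadLocal[String]
  // Visible only on the thread that set the value.
  val nonInheritable = new ThreadLocal[String]

  // Start a child thread and report what it sees for both properties.
  def readInChild(): (String, String) = {
    var seen: (String, String) = null
    val child = new Thread(new Runnable {
      override def run(): Unit = {
        seen = (inheritable.get(), nonInheritable.get())
      }
    })
    child.start()
    child.join() // join() makes the child's write to `seen` visible here
    seen
  }

  def main(args: Array[String]): Unit = {
    inheritable.set("job-group-a")
    nonInheritable.set("execution-42")
    println(readInChild()) // prints "(job-group-a,null)"
  }
}
```

A per-execution identifier such as `sql.execution.id` is the kind of value that should not leak into threads spawned later, which is the motivation the PR description gives for a non-inheritable variant.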
[GitHub] spark pull request: SPARK-14632 randomSplit method fails on datafr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12438#issuecomment-211077993 **[Test build #2799 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2799/consoleFull)** for PR 12438 at commit [`76c0047`](https://github.com/apache/spark/commit/76c004773f0bb7827e460b966d3408387e49254f).
[GitHub] spark pull request: [SPARK-14505] [Core] Fix bug : creating two Sp...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12273#issuecomment-211075866 OK, so the changes are to pull the check for the active context out of the block checking for context being created (which looks like a simple bug), and you also put the check for active context first. I suppose it really doesn't matter which is checked first in this block. One generates a warning, one generates the exception, so the behavior is the same. LGTM. I think the slight code simplification you added is also warranted. I think the tests are failing now for unrelated reasons but we can retest soon.
[GitHub] spark pull request: [SPARK-14679] [UI] Fix UI DAG visualization OO...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/12437#discussion_r59989555 --- Diff: core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala --- @@ -72,6 +72,22 @@ private[ui] class RDDOperationCluster(val id: String, private var _name: String) def getCachedNodes: Seq[RDDOperationNode] = { _childNodes.filter(_.cached) ++ _childClusters.flatMap(_.getCachedNodes) } + + def canEqual(other: Any): Boolean = other.isInstanceOf[RDDOperationCluster] + + override def equals(other: Any): Boolean = other match { +case that: RDDOperationCluster => + (that canEqual this) && + _childClusters == that._childClusters && + id == that.id && + _name == that._name +case _ => false + } + + override def hashCode(): Int = { +val state = Seq(_childClusters, id, _name) +state.map(_.hashCode()).foldLeft(0)((a, b) => 31 * a + b) --- End diff -- Yeah, we had a similar discussion at https://github.com/apache/spark/pull/12157 just now. This is a decent way to write it when there are a lot of fields; for 3 fields, I'm neutral about it. The alternative is `31 * (31 * Objects.hashCode(_childClusters) + Objects.hashCode(id)) + Objects.hashCode(_name)`
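The two ways of writing the hash that are compared in this review comment can be put side by side. The sketch below uses a hypothetical `Cluster` class modelling only the three fields named in the diff; it is not `RDDOperationCluster` itself. For non-null fields the fold and the written-out expression compute the same value, because the fold expands to `31 * (31 * (31 * 0 + h1) + h2) + h3`.

```scala
import java.util.Objects

// Hypothetical stand-in with the same three fields as in the diff.
final class Cluster(val id: String, val name: String, val children: Seq[Cluster]) {
  // Style used in the diff: fold over the field hash codes.
  def foldHash: Int =
    Seq(children, id, name).map(_.hashCode()).foldLeft(0)((a, b) => 31 * a + b)

  // Style suggested as the alternative: written out explicitly.
  def explicitHash: Int =
    31 * (31 * Objects.hashCode(children) + Objects.hashCode(id)) + Objects.hashCode(name)
}

object HashDemo {
  def main(args: Array[String]): Unit = {
    val leaf = new Cluster("c1", "map", Nil)
    val root = new Cluster("c0", "stage", Seq(leaf))
    println(root.foldHash == root.explicitHash) // prints "true"
  }
}
```

One difference worth keeping in mind: `Objects.hashCode` returns 0 for a `null` argument, whereas calling `.hashCode()` directly on a `null` field, as the fold version does, throws a `NullPointerException`.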
[GitHub] spark pull request: [SPARK-14684] [SQL] Verification of Partition ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12446#issuecomment-211074915 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56063/ Test PASSed.
[GitHub] spark pull request: [SPARK-14684] [SQL] Verification of Partition ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12446#issuecomment-211074913 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14684] [SQL] Verification of Partition ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12446#issuecomment-211074843 **[Test build #56063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56063/consoleFull)** for PR 12446 at commit [`f550bf8`](https://github.com/apache/spark/commit/f550bf852ca7c088266cbc8c8a24a068d65d1766). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211074807 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56062/ Test FAILed.