[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #77082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77082/testReport)** for PR 17770 at commit [`6a7204c`](https://github.com/apache/spark/commit/6a7204c0fc00dbe2e43d6d65e722b3b13c3b35d0). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17770 It seems to me that we don't want to show `AnalysisBarrier` in the analyzed plan, unlike `SubqueryAlias`.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77078/ Test FAILed.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404393 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -187,6 +187,9 @@ class Dataset[T] private[sql]( } } + // Wrap analyzed logical plan with an analysis barrier so we won't traverse/resolve it again. + @transient private val planWithBarrier: LogicalPlan = AnalysisBarrier(logicalPlan) --- End diff -- `CacheManager` uses `Dataset.logicalPlan` as the key to look up identical plans already cached. If we always wrap `logicalPlan` with a barrier, we need to strip it when looking up caches.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77078/testReport)** for PR 18016 at commit [`6d51c07`](https://github.com/apache/spark/commit/6d51c07e464c81f7d0337d7f632d3d9552a50cec). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404229 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1741,7 +1744,7 @@ class Dataset[T] private[sql]( def union(other: Dataset[T]): Dataset[T] = withSetOperator { // This breaks caching, but it's usually ok because it addresses a very specific use case: // using union to union many files or partitions. -CombineUnions(Union(logicalPlan, other.logicalPlan)) +CombineUnions(Union(logicalPlan, other.logicalPlan)).mapChildren(AnalysisBarrier(_)) --- End diff -- Sure, and also all the ones above.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404204 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2470,6 +2480,13 @@ object CleanupAliases extends Rule[LogicalPlan] { } } +/** Remove the barrier nodes of analysis */ +object CleanupBarriers extends Rule[LogicalPlan] { --- End diff -- Sure.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404192 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -166,14 +166,15 @@ class Analyzer( Batch("Subquery", Once, UpdateOuterReferences), Batch("Cleanup", fixedPoint, - CleanupAliases) + CleanupAliases, + CleanupBarriers) --- End diff -- We clean up the barriers at the end of analysis because we don't want to show them in the analyzed plan. If we move this to the "Finish Analysis" batch, they will show up.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117403987 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -912,3 +913,10 @@ case class Deduplicate( override def output: Seq[Attribute] = child.output } + +/** A logical plan for setting a barrier of analysis */ +case class AnalysisBarrier(child: LogicalPlan) extends LeafNode { + override def output: Seq[Attribute] = child.output + override def analyzed: Boolean = true + override def isStreaming: Boolean = child.isStreaming --- End diff -- It should be fine to use the default `canonicalized`.
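The barrier in the diff above works because it extends `LeafNode`: analyzer rules descend through `children`, and a leaf reports none, so the wrapped subtree is never re-traversed. A minimal self-contained sketch of that idea (`Node`, `Barrier`, and `countVisited` are hypothetical stand-ins, not Spark's actual `TreeNode`/`AnalysisBarrier` classes):

```scala
// Hypothetical stand-ins illustrating why a barrier defined as a leaf node
// shields its child from re-traversal: traversals walk `children`, and a
// leaf reports an empty Seq, so the hidden subtree is never visited again.
sealed trait Node { def children: Seq[Node] }
case class Leaf(name: String) extends Node { val children = Seq.empty[Node] }
case class Branch(children: Seq[Node]) extends Node
case class Barrier(hidden: Node) extends Node { val children = Seq.empty[Node] }

// A simplified top-down rule runner that counts how many nodes it visits.
def countVisited(n: Node): Int = 1 + n.children.map(countVisited).sum
```

Wrapping an already-analyzed subtree in `Barrier` means the runner sees a single node: `countVisited(Barrier(Branch(Seq(Leaf("a"), Leaf("b")))))` is 1, versus 3 for the unwrapped tree.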
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77077/ Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77077/testReport)** for PR 18016 at commit [`8b346e6`](https://github.com/apache/spark/commit/8b346e6f6e211a8945e9d3fc9db489ce4c27ba87). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18011 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77076/ Test PASSed.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18011 Merged build finished. Test PASSed.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117403590 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala --- @@ -177,6 +177,18 @@ object ParserUtils { sb.toString() } + val escapedIdentifier = "`(.+)`".r --- End diff -- added.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18011 **[Test build #77076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77076/testReport)** for PR 18011 at commit [`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117403303 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") --- End diff -- renamed.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117403331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") +.internal() +.doc("When true, identifiers specified by regex patterns will be expanded.") --- End diff -- Yes, this only applies to column names. Updated the doc.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402527 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala --- @@ -177,6 +177,18 @@ object ParserUtils { sb.toString() } + val escapedIdentifier = "`(.+)`".r --- End diff -- Please add a comment for this.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402461 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") +.internal() +.doc("When true, identifiers specified by regex patterns will be expanded.") --- End diff -- It must be quoted. Thus, we also need to mention it in the description.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402094 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") +.internal() +.doc("When true, identifiers specified by regex patterns will be expanded.") --- End diff -- We only do it for the column names, right?
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402025 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") --- End diff -- How about renaming it to `spark.sql.parser.regexColumnNames`?
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18023 Like what we did for `*` in `Column.scala`, we also need to handle the Dataset APIs. You can follow the way we handle star there. ```Scala df.select(df("(a|b)?+.+")) ```
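The Dataset-side handling being requested can be pictured as expanding the quoted pattern against the schema's column names, the way `*` expands to all columns. A hedged sketch, not the PR's actual code path through `Column.scala` (the helper name `expandRegexColumns` and the plain-`String` schema are illustrative assumptions):

```scala
// Illustrative sketch: resolve a regex column specification by filtering
// the schema's column names, analogous to how star expansion selects all
// columns. Uses Java regex semantics via String.matches (full match).
def expandRegexColumns(pattern: String, columnNames: Seq[String]): Seq[String] =
  columnNames.filter(_.matches(pattern))
```

For a schema with columns `a1`, `ab`, `b2`, the pattern `a.*` would select `a1` and `ab`, preserving schema order.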
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18023 **[Test build #77081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77081/testReport)** for PR 18023 at commit [`6e37517`](https://github.com/apache/spark/commit/6e375177e68a216cdd53de1e5d600d898b2b59d5).
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77080/testReport)** for PR 18029 at commit [`9944da8`](https://github.com/apache/spark/commit/9944da82b0b07642f0489c597d9b63176a361f0e).
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117399885 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression.
+ * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.getStart).map(_.getText match { --- End diff -- removed the option
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117399877 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- Add API in ParserUtils.
[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 @felixcheung Thanks for your feedback. - This does not affect discoverability: the name of the method is still on the index list - No problem with help either, e.g., one can use `?avg`. ![image](https://cloud.githubusercontent.com/assets/11082368/26232656/945b3afe-3c0c-11e7-8c17-fa8df5e4ee2e.png) Another benefit is that we can get rid of most warnings on no examples since we now document all the tiny functions together. I think it is important and the change is straightforward. However, this is a pretty manual (and big) change. I would like to get a `Yes` from you for doing this. Thanks.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117399811 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- Add API in ParserUtils. I think in the parser, it can still get ``; after that, the `` are stripped off.
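The `escapedIdentifier` value being moved into `ParserUtils` is a regex extractor: used in a pattern match it both checks for surrounding backticks and strips them in one step. A small self-contained sketch (the `stripBackticks` wrapper is a hypothetical name, not part of the PR):

```scala
// The escaped-identifier pattern under discussion: one capture group,
// used as an extractor so a successful match binds the text between the
// backticks. Scala's Regex extractor requires a full-string match.
val escapedIdentifier = "`(.+)`".r

// Hypothetical helper: returns the inner text of a backtick-quoted
// identifier, or None when the input is not quoted.
def stripBackticks(ident: String): Option[String] = ident match {
  case escapedIdentifier(inner) => Some(inner)
  case _                        => None
}
```

For an input like `` `(a|b)?+.+` `` the match binds the inner regex text, which the analyzer can then treat as a column pattern; an unquoted name does not match and falls back to an ordinary `UnresolvedAttribute`.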
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117399718

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }

   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
  override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
    val attr = ctx.fieldName.getText
    expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        if (conf.supportQuotedIdentifiers) {
+          val escapedIdentifier = "`(.+)`".r
+          val ret = Option(ctx.fieldName.getStart).map(_.getText match {
--- End diff --

Removed the option.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117399399

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }

   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
  override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
    val attr = ctx.fieldName.getText
    expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
--- End diff --

Updated.
[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test PASSed.
[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77072/ Test PASSed.
[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18031

**[Test build #77072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77072/testReport)** for PR 18031 at commit [`bfea9f5`](https://github.com/apache/spark/commit/bfea9f59fd7587b87de0ddb4601f76786671f38a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117398452

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+    val expandedAttributes: Seq[Attribute] = table match {
+      // If there is no table specified, use all input attributes that match expr
+      case None => input.output.filter(_.name.matches(expr))
+      // If there is a table, pick out attributes that are part of this table that match expr
+      case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty)
+        .filter(_.name.matches(expr))
+    }
+
+    expandedAttributes.zip(input.output).map {
--- End diff --

You are right, we don't need it any more. Removed.
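Stripped of Catalyst types, the `expand` logic in the quoted diff — keep every input attribute whose name matches the pattern, optionally restricted to one table — amounts to the following sketch. `Attribute` here is a simplified stand-in (a name plus an optional qualifier, not Spark's class), and a case-insensitive string comparison stands in for Spark's `Resolver`:

```scala
// Simplified stand-in for Catalyst's Attribute: a column name plus an optional table qualifier.
case class Attribute(name: String, qualifier: Option[String])

def expandRegex(input: Seq[Attribute], pattern: String, table: Option[String]): Seq[Attribute] =
  table match {
    // No table specified: use all input attributes whose name matches the pattern.
    case None => input.filter(_.name.matches(pattern))
    // Table specified: keep attributes qualified by that table, then match on name.
    case Some(t) =>
      input
        .filter(_.qualifier.exists(_.equalsIgnoreCase(t)))
        .filter(_.name.matches(pattern))
  }
```

For example, against columns `id` and `name` of table `t`, the Hive-style pattern `(id)?+.+` selects `name` but not `id`: the possessive `(id)?+` consumes `id` without backtracking, and `.+` then demands at least one more character.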
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #77079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77079/testReport)** for PR 16677 at commit [`55ee6b0`](https://github.com/apache/spark/commit/55ee6b0fb3bc9e6998b4098a369c54a15824e414).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77078/testReport)** for PR 18016 at commit [`6d51c07`](https://github.com/apache/spark/commit/6d51c07e464c81f7d0337d7f632d3d9552a50cec).
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117398110

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+    val expandedAttributes: Seq[Attribute] = table match {
+      // If there is no table specified, use all input attributes that match expr
+      case None => input.output.filter(_.name.matches(expr))
+      // If there is a table, pick out attributes that are part of this table that match expr
+      case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty)
--- End diff --

Updated.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117397712

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable {
--- End diff --

Renamed to `pattern`.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77074/ Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18016

**[Test build #77074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77074/testReport)** for PR 18016 at commit [`1f771bd`](https://github.com/apache/spark/commit/1f771bd9bdee15b4a2c2d829f5f60404044ba9af).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77077/testReport)** for PR 18016 at commit [`8b346e6`](https://github.com/apache/spark/commit/8b346e6f6e211a8945e9d3fc9db489ce4c27ba87).
[GitHub] spark pull request #18016: [SPARK-20786][SQL]Improve ceil and floor handle t...
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18016#discussion_r117397093

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala ---
@@ -173,6 +173,14 @@ class MathFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(
       sql("SELECT ceiling(0), ceiling(1), ceiling(1.5)"),
       Row(0L, 1L, 2L))
+
+    checkAnswer(
+      sql("SELECT ceil(1234567890123456), ceil(12345678901234567)"),
+      Row(1234567890123456L, 12345678901234567L))
+
+    checkAnswer(
+      sql("SELECT ceiling(1234567890123456), ceiling(12345678901234567)"),
+      Row(1234567890123456L, 12345678901234567L))
--- End diff --

OK, added new tests to the end of operators.sql. Please review it again. Thanks.
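For context on why these large constants are interesting: 12345678901234567 exceeds 2^53, so it is not exactly representable as a Double, and a ceil/floor implementation that round-trips BIGINT values through Double silently changes it. A quick plain-Scala check (an illustration of the underlying IEEE-754 issue, not Spark code):

```scala
val big = 12345678901234567L            // > 2^53: not exactly representable as a Double
val viaDouble = math.ceil(big.toDouble).toLong

// The Double round-trip snaps to the nearest representable value instead of the input.
println(viaDouble == big)               // false: precision was lost
println(viaDouble)                      // 12345678901234568

// A value below 2^53 survives the round-trip unchanged.
val small = 1234567890123456L
println(math.ceil(small.toDouble).toLong == small)  // true
```

Hence the tests expect `ceil(12345678901234567)` to return the value itself, which requires handling integral inputs without converting them to Double.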
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77073/ Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18016

**[Test build #77073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77073/testReport)** for PR 18016 at commit [`68ecf5e`](https://github.com/apache/spark/commit/68ecf5e129eaba5830c439e1196bd4f1ee22ae42).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide jump link...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18015 Thank you, I will keep working to improve the Spark web UI. Jenkins, test this please.
[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/17936

@srowen Sorry for the late reply. I updated the code. Because we want to reduce the number of remote fetches, the second partition should be cached locally. There are two ways: first, cache it in the `TaskConsumer`, which is controlled by the execution memory (this approach seems similar to #9969); second, cache it in the `BlockManager`, which is controlled by the storage memory. Experiments showed that the first way has a serious GC problem. `CartesianRDD` is only used by `ALS` and `UnsafeCartesianRDD`. However, the latter implements its own compute, as you can see:

```
class UnsafeCartesianRDD(
    left: RDD[UnsafeRow],
    right: RDD[UnsafeRow],
    numFieldsOfRight: Int,
    spillThreshold: Int)
  extends CartesianRDD[UnsafeRow, UnsafeRow](left.sparkContext, left, right) {

  override def compute(split: Partition, context: TaskContext): Iterator[(UnsafeRow, UnsafeRow)] = {
    val rowArray = new ExternalAppendOnlyUnsafeRowArray(spillThreshold)
    val partition = split.asInstanceOf[CartesianPartition]
    rdd2.iterator(partition.s2, context).foreach(rowArray.add)

    // Create an iterator from rowArray
    def createIter(): Iterator[UnsafeRow] = rowArray.generateIterator()

    val resultIter =
      for (x <- rdd1.iterator(partition.s1, context);
           y <- createIter()) yield (x, y)
    CompletionIterator[(UnsafeRow, UnsafeRow), Iterator[(UnsafeRow, UnsafeRow)]](
      resultIter, rowArray.clear())
  }
}
```

So I think there should be no other impact.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18011 **[Test build #77076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77076/testReport)** for PR 18011 at commit [`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b).
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77071/ Test PASSed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test PASSed.
[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Map collection typ...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16986#discussion_r117394141

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -329,35 +329,19 @@ object ScalaReflection extends ScalaReflection {
         }
         UnresolvedMapObjects(mapFunction, getPath, Some(cls))

-      case t if t <:< localTypeOf[Map[_, _]] =>
+      case t if t <:< localTypeOf[Map[_, _]] || t <:< localTypeOf[java.util.Map[_, _]] =>
--- End diff --

Let's remove them and the related Java map tests in this PR, and add them in the next PR.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971

**[Test build #77071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77071/testReport)** for PR 14971 at commit [`1e4182d`](https://github.com/apache/spark/commit/1e4182d1e03622cdcc84f6cd951b2c534289e78f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18011 retest this please
[GitHub] spark pull request #17936: [SPARK-20638][Core]Optimize the CartesianRDD to r...
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17936#discussion_r117393923

--- Diff: core/src/test/scala/org/apache/spark/metrics/InputOutputMetricsSuite.scala ---
@@ -198,8 +198,12 @@ class InputOutputMetricsSuite extends SparkFunSuite with SharedSparkContext
     // write files to disk so we can read them later.
     sc.parallelize(cartVector).saveAsTextFile(cartFilePath)
     val aRdd = sc.textFile(cartFilePath, numPartitions)
+    aRdd.cache()
+    aRdd.count()
--- End diff --

There is a very strange failure here. If we cache both `aRdd` and `tmpRdd`, both this PR and the master branch pass the test. But if we cache only `tmpRdd`, both branches fail. So I am temporarily caching both here. I will look into the details; it may be a bug. If I have misunderstood something, please point it out.
[GitHub] spark pull request #17936: [SPARK-20638][Core]Optimize the CartesianRDD to r...
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17936#discussion_r117393634

--- Diff: core/src/test/scala/org/apache/spark/metrics/InputOutputMetricsSuite.scala ---
@@ -198,8 +198,12 @@ class InputOutputMetricsSuite extends SparkFunSuite with SharedSparkContext
     // write files to disk so we can read them later.
     sc.parallelize(cartVector).saveAsTextFile(cartFilePath)
     val aRdd = sc.textFile(cartFilePath, numPartitions)
+    aRdd.cache()
+    aRdd.count()
     val tmpRdd = sc.textFile(tmpFilePath, numPartitions)
+    tmpRdd.cache()
+    tmpRdd.count()
--- End diff --

Because we cache the RDD in the `CartesianRDD` compute method, we should count the bytes read from memory here.
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14971#discussion_r117393090

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -215,6 +218,215 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleton
     }
   }

+  private def createNonPartitionedTable(
+      tabName: String,
+      analyzedBySpark: Boolean = true,
+      analyzedByHive: Boolean = true): Unit = {
+    val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+    sql(
+      s"""
+         |CREATE TABLE $tabName (key STRING, value STRING)
+         |STORED AS TEXTFILE
+         |TBLPROPERTIES ('prop1' = 'val1', 'prop2' = 'val2')
+       """.stripMargin)
+    sql(s"INSERT INTO TABLE $tabName SELECT * FROM src")
+    if (analyzedBySpark) sql(s"ANALYZE TABLE $tabName COMPUTE STATISTICS")
+    // This is to mimic the scenario in which Hive generates statistics before we read them
+    if (analyzedByHive) hiveClient.runSqlHive(s"ANALYZE TABLE $tabName COMPUTE STATISTICS")
+    val describeResult1 = hiveClient.runSqlHive(s"DESCRIBE FORMATTED $tabName")
+
+    val tableMetadata =
+      spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName)).properties
+    // Statistics info is not contained in the metadata of the original table
+    assert(Seq(StatsSetupConst.COLUMN_STATS_ACCURATE,
+      StatsSetupConst.NUM_FILES,
+      StatsSetupConst.NUM_PARTITIONS,
+      StatsSetupConst.ROW_COUNT,
+      StatsSetupConst.RAW_DATA_SIZE,
+      StatsSetupConst.TOTAL_SIZE).forall(!tableMetadata.contains(_)))
+
+    if (analyzedByHive) {
+      assert(StringUtils.filterPattern(describeResult1, "*numRows\\s+500*").nonEmpty)
+    } else {
+      assert(StringUtils.filterPattern(describeResult1, "*numRows\\s+500*").isEmpty)
+    }
+  }
+
+  private def extractStatsPropValues(
+      descOutput: Seq[String],
+      propKey: String): Option[BigInt] = {
+    val str = descOutput
+      .filterNot(_.contains(HiveExternalCatalog.STATISTICS_PREFIX))
+      .filter(_.contains(propKey))
+    if (str.isEmpty) {
+      None
+    } else {
+      assert(str.length == 1, "found more than one match")
+      val pattern = new Regex(s"""$propKey\\s+(-?\\d+)""")
+      val pattern(value) = str.head.trim
+      Option(BigInt(value))
+    }
+  }
+
+  test("get statistics when not analyzed in both Hive and Spark") {
+    val tabName = "tab1"
+    withTable(tabName) {
+      createNonPartitionedTable(tabName, analyzedByHive = false, analyzedBySpark = false)
+      checkTableStats(
+        tabName, hasSizeInBytes = true, expectedRowCounts = None)
+
+      // ALTER TABLE SET TBLPROPERTIES invalidates some contents of Hive-specific statistics.
+      // This is triggered by the Hive alterTable API.
+      val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+      val describeResult = hiveClient.runSqlHive(s"DESCRIBE FORMATTED $tabName")
+
+      val rawDataSize = extractStatsPropValues(describeResult, "rawDataSize")
+      val numRows = extractStatsPropValues(describeResult, "numRows")
+      val totalSize = extractStatsPropValues(describeResult, "totalSize")
+      assert(rawDataSize.isEmpty, "rawDataSize should not be shown without table analysis")
+      assert(numRows.isEmpty, "numRows should not be shown without table analysis")
+      assert(totalSize.isDefined && totalSize.get > 0, "totalSize is lost")
+    }
+  }
+
+  test("alter table rename after analyze table") {
+    Seq(true, false).foreach { analyzedBySpark =>
+      val oldName = "tab1"
+      val newName = "tab2"
+      withTable(oldName, newName) {
+        createNonPartitionedTable(oldName, analyzedByHive = true, analyzedBySpark = analyzedBySpark)
+        val fetchedStats1 = checkTableStats(
+          oldName, hasSizeInBytes = true, expectedRowCounts = Some(500))
+        sql(s"ALTER TABLE $oldName RENAME TO $newName")
+        val fetchedStats2 = checkTableStats(
+          newName, hasSizeInBytes = true, expectedRowCounts = Some(500))
+        assert(fetchedStats1 == fetchedStats2)
+
+        // ALTER TABLE RENAME does not affect the contents of Hive-specific statistics
+        val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+        val describeResult = hiveClient.runSqlHive(s"DESCRIBE FORMATTED $newName")
+
+        val rawDataSize = extractStatsPropValues(describeResult, "rawDataSize")
+        val numRows = extractStatsPropValues(describeResult, "numRows")
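The extraction idiom used by `extractStatsPropValues` in the diff above can be sketched standalone. The helper name and sample input below are illustrative, not part of the PR; `findFirstMatchIn` is used instead of the extractor pattern so the sketch tolerates surrounding whitespace without trimming:

```scala
import scala.util.matching.Regex

// Pull a (possibly negative) integer statistic such as "numRows 500" out of
// DESCRIBE FORMATTED output lines. Name and inputs are hypothetical.
def extractProp(descOutput: Seq[String], propKey: String): Option[BigInt] = {
  val matching = descOutput.filter(_.contains(propKey))
  matching.headOption.flatMap { line =>
    val pattern = new Regex(s"""$propKey\\s+(-?\\d+)""")
    // findFirstMatchIn tolerates the tab-separated DESCRIBE column layout
    pattern.findFirstMatchIn(line).map(m => BigInt(m.group(1)))
  }
}
```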
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14971#discussion_r117392683

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -414,6 +415,50 @@ private[hive] class HiveClientImpl(
     val properties = Option(h.getParameters).map(_.asScala.toMap).orNull

+    // Hive-generated statistics are also recorded in ignoredProperties
+    val ignoredProperties = scala.collection.mutable.Map.empty[String, String]
+    for (key <- HiveStatisticsProperties; value <- properties.get(key)) {
+      ignoredProperties += key -> value
+    }
+
+    val excludedTableProperties = HiveStatisticsProperties ++ Set(
+      // The property value of "comment" is moved to the dedicated field "comment"
+      "comment",
+      // For EXTERNAL_TABLE, the table properties have a particular field "EXTERNAL". This is
+      // added in the function toHiveTable.
+      "EXTERNAL"
+    )
+
+    val filteredProperties = properties.filterNot {
+      case (key, _) => excludedTableProperties.contains(key)
+    }
+    val comment = properties.get("comment")
+
+    val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
+    val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
+    lazy val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)) match {

--- End diff --

1. I think we can just use `val`; no need to worry about performance here.
2. This can be simplified to `xxx.filter(_ >= 0)`.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
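The `filter(_ >= 0)` simplification cloud-fan suggests can be sketched standalone. The property map below is an illustrative stand-in for the Hive table properties, not the actual HiveClientImpl code:

```scala
// Hypothetical stand-in for the Hive table property map
val properties = Map("numRows" -> "500", "rawDataSize" -> "-1")

// Shape of the original code: a pattern match that drops negative counts
val rowCountViaMatch = properties.get("numRows").map(BigInt(_)) match {
  case Some(c) if c >= 0 => Some(c)
  case _ => None
}

// Suggested shape: the same semantics in one combinator
val rowCountViaFilter = properties.get("numRows").map(BigInt(_)).filter(_ >= 0)

// A negative placeholder value (e.g. -1 for "unknown") becomes None either way
val rawDataSize = properties.get("rawDataSize").map(BigInt(_)).filter(_ >= 0)
```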
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77075/ Test FAILed.
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77075/testReport)** for PR 18029 at commit [`75d8523`](https://github.com/apache/spark/commit/75d852384f12554c3171513f11d31604ff206dac). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test FAILed.
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77075/testReport)** for PR 18029 at commit [`75d8523`](https://github.com/apache/spark/commit/75d852384f12554c3171513f11d31604ff206dac).
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/18029 @budde @brkyvz would love to hear your thoughts if this is the best way to add this functionality
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77074/testReport)** for PR 18016 at commit [`1f771bd`](https://github.com/apache/spark/commit/1f771bd9bdee15b4a2c2d829f5f60404044ba9af).
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117388593

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala ---
@@ -42,13 +41,12 @@ private[spark] class ShuffleMapStage(
     parents: List[Stage],
     firstJobId: Int,
     callSite: CallSite,
-    val shuffleDep: ShuffleDependency[_, _, _])
+    val shuffleDep: ShuffleDependency[_, _, _],

--- End diff --

Good catch. I agree, but with the caveat that we can only clean this up if this isn't functioning as the last strong reference which keeps the dependency from being garbage-collected.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77073/testReport)** for PR 18016 at commit [`68ecf5e`](https://github.com/apache/spark/commit/68ecf5e129eaba5830c439e1196bd4f1ee22ae42).
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117388447

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1233,17 +1223,6 @@ class DAGScheduler(
     logInfo("waiting: " + waitingStages)
     logInfo("failed: " + failedStages)

-    // We supply true to increment the epoch number here in case this is a
-    // recomputation of the map outputs. In that case, some nodes may have cached
-    // locations with holes (from when we detected the error) and will need the
-    // epoch incremented to refetch them.
-    // TODO: Only increment the epoch number if this is not the first time
-    // we registered these map outputs.
-    mapOutputTracker.registerMapOutputs(
-      shuffleStage.shuffleDep.shuffleId,
-      shuffleStage.outputLocInMapOutputTrackerFormat(),
-      changeEpoch = true)

--- End diff --

I need to think about this carefully and maybe make a matrix of possible cases to be sure. My original thought process was something like this:

- The old code comment says `TODO: Only increment the epoch number if this is not the first time we registered these map outputs`, which implies that at least some of the epoch increments here were unnecessary.
- If we assume that a new, never-before-computed map output won't be requested by executors before it is complete, then we don't need to worry about executors caching incomplete map outputs.
- I believe that any FetchFailure should end up incrementing the epoch.

That said, the increment here is only occurring once per stage completion. It probably doesn't _hurt_ to bump the epoch here because in a single-stage-at-a-time case we'd only be invalidating map outputs which we'll never fetch again anyways. Even if we were unnecessarily invalidating the map output statuses of other concurrent stages, I think the impact of this is going to be relatively small (if we did find that this had an impact, then a sane approach would be to implement an e-tag-like mechanism where bumping the epoch doesn't purge the executor-side caches but instead has them verify a per-stage epoch / counter). Finally, the existing code might be giving us nice eager cleanup of map statuses after stages complete (vs. the cleanup which occurs later when stages or shuffles are fully cleaned up). I think you're right that this change carries unnecessary / not-fully-understood risks for now, so let me go ahead and put in an explicit increment here (with an updated comment / ref. to this discussion) in my next push to this PR.
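The epoch mechanism being debated can be illustrated with a minimal, self-contained sketch. These classes are hypothetical stand-ins, not Spark's actual `MapOutputTracker`; the point is only that a driver-side epoch bump forces executor caches to discard possibly stale map output locations and refetch:

```scala
// Driver-side counter: bumped whenever cached locations may be stale
class EpochTracker {
  private var epoch: Long = 0L
  def incrementEpoch(): Unit = synchronized { epoch += 1 }
  def getEpoch: Long = synchronized { epoch }
}

// Executor-side cache: compares its recorded epoch against the tracker's
// before trusting anything it holds
class ExecutorSideCache(tracker: EpochTracker, fetch: Int => String) {
  private var cachedEpoch: Long = tracker.getEpoch
  private var cache: Map[Int, String] = Map.empty // shuffleId -> statuses
  var fetchCount: Int = 0

  def get(shuffleId: Int): String = {
    val current = tracker.getEpoch
    if (current > cachedEpoch) { // an epoch bump invalidates everything we hold
      cache = Map.empty
      cachedEpoch = current
    }
    cache.getOrElse(shuffleId, {
      fetchCount += 1
      val statuses = fetch(shuffleId)
      cache += shuffleId -> statuses
      statuses
    })
  }
}
```

This also shows why an unnecessary bump is cheap but not free: it costs one extra round of refetches for statuses that were still valid, which is the "relatively small impact" referred to above.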
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18023 **[Test build #77068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77068/testReport)** for PR 18023 at commit [`7699e87`](https://github.com/apache/spark/commit/7699e871a31e37755b35c88b893faf9df8f7664f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18023 Merged build finished. Test PASSed.
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18023 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77068/ Test PASSed.
[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17999 Merged build finished. Test PASSed.
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117385925

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala ---
@@ -42,13 +41,12 @@ private[spark] class ShuffleMapStage(
     parents: List[Stage],
     firstJobId: Int,
     callSite: CallSite,
-    val shuffleDep: ShuffleDependency[_, _, _])
+    val shuffleDep: ShuffleDependency[_, _, _],

--- End diff --

Seems we can pass the `shuffleId` instead of the `ShuffleDependency` here.
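The two constructor shapes under discussion can be sketched with simplified stand-ins (these are not the real `ShuffleMapStage` or `ShuffleDependency` signatures). The point of the suggestion is that a stage holding only the id does not pin the full dependency object in memory, with the caveat raised elsewhere in the thread that this is only safe if the stage is not the last strong reference keeping the dependency alive:

```scala
// Hypothetical stand-in for a shuffle dependency carrying heavyweight state
class ShuffleDepSketch(val shuffleId: Int, val payload: Array[Byte])

// Current shape: the stage holds a strong reference to the full dependency
class StageHoldingDep(val shuffleDep: ShuffleDepSketch) {
  def shuffleId: Int = shuffleDep.shuffleId
}

// Suggested shape: the stage records only the id it actually needs
class StageHoldingId(val shuffleId: Int)
```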
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117385673

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1233,17 +1223,6 @@ class DAGScheduler(
     logInfo("waiting: " + waitingStages)
     logInfo("failed: " + failedStages)

-    // We supply true to increment the epoch number here in case this is a
-    // recomputation of the map outputs. In that case, some nodes may have cached
-    // locations with holes (from when we detected the error) and will need the
-    // epoch incremented to refetch them.
-    // TODO: Only increment the epoch number if this is not the first time
-    // we registered these map outputs.
-    mapOutputTracker.registerMapOutputs(
-      shuffleStage.shuffleDep.shuffleId,
-      shuffleStage.outputLocInMapOutputTrackerFormat(),
-      changeEpoch = true)

--- End diff --

Is it safer if we increment the epoch number here?
[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17999 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77067/ Test PASSed.
[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17999 **[Test build #77067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77067/testReport)** for PR 17999 at commit [`ea10dee`](https://github.com/apache/spark/commit/ea10dee343671e3d9c79eb0bcddc55a2ee3d1d71). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17985: Add "full_outer" name to join types
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17985 @BartekH Yes, I think we can add that to the exception message. Please also add a test case for checking supported join types.
[GitHub] spark issue #17992: [SPARK-20759] SCALA_VERSION in _config.yml should be con...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/17992 @srowen Hello, do you know how to finish the test?
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77072/testReport)** for PR 18031 at commit [`bfea9f5`](https://github.com/apache/spark/commit/bfea9f59fd7587b87de0ddb4601f76786671f38a).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 @HyukjinKwon Thank you so much! Really helpful.
[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117385321

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -121,48 +126,69 @@ private[spark] class CompressedMapStatus(
 }

 /**
- * A [[MapStatus]] implementation that only stores the average size of non-empty blocks,
- * plus a bitmap for tracking which blocks are empty.
+ * A [[MapStatus]] implementation that stores the accurate size of huge blocks, which are larger
+ * than both [[config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD]] and
+ * [[config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE]] * averageSize. It stores the

--- End diff --

It looks like the documentation generation for Javadoc 8 is failing due to these links:

```
[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:4: error: reference not found
[error]  * than both {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD} and
[error]  ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:5: error: reference not found
[error]  * {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE} * averageSize. It stores the
[error]  ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/target/java/org/apache/spark/sql/functions.java:2996: error: invalid uri: "http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html Customizing Formats"
[error]    * @see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html Customizing Formats"/>
[error]  ^
```

Probably, we should wrap it in `` `...` `` as I did before - https://github.com/apache/spark/pull/16013 - or find a way to make this link work properly. The other errors seem spurious. Please refer to my observation - https://github.com/apache/spark/pull/17389#issuecomment-288438704 (I think we should fix it or document ^ somewhere at least).
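The fix HyukjinKwon suggests can be sketched as a doc comment. The class and wording below are illustrative, not the actual MapStatus.scala text; the idea is that scaladoc `[[...]]` links to members genjavadoc cannot resolve become broken `{@link ...}` references in the generated Java sources, while backtick-quoted spans are rendered as plain inline code and produce no link at all:

```scala
/**
 * Stores the accurate size of blocks larger than both
 * `config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD` and
 * `config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE` * averageSize.
 * (Hypothetical doc-comment sketch: backticks instead of [[...]] links.)
 */
class HighlyCompressedMapStatusDocSketch(val averageSize: Long)
```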
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77071/testReport)** for PR 14971 at commit [`1e4182d`](https://github.com/apache/spark/commit/1e4182d1e03622cdcc84f6cd951b2c534289e78f).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/testReport)** for PR 18031 at commit [`970421b`](https://github.com/apache/spark/commit/970421b2a5cb2278d60403f72dc165418e4faf87). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77070/ Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77066/ Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77066/testReport)** for PR 14971 at commit [`aa9a36e`](https://github.com/apache/spark/commit/aa9a36e1c5bff881a053a139f49344be0ad62452). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367232 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { + override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = { +val expandedAttributes: Seq[Attribute] = table match { + // If there is no table specified, use all input attributes that match expr + case None => input.output.filter(_.name.matches(expr)) + // If there is a table, pick out attributes that are part of this table that match expr + case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty) --- End diff -- `input.output.filter(_.qualifier.exists(resolver(_, t)))` is a bit more concise.
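For illustration only, a minimal sketch of the reviewer's point that `exists` replaces `filter(...).nonEmpty` on an `Option`; the `resolver` and `qualifier` values below are hypothetical stand-ins for Spark's `Resolver` and attribute qualifier, not the actual code:

```scala
// Hypothetical stand-ins: a case-insensitive resolver and an Option qualifier.
val resolver: (String, String) => Boolean = _.equalsIgnoreCase(_)
val qualifier: Option[String] = Some("T1")
val t = "t1"

// Form in the diff: builds an intermediate Option, then tests emptiness.
val verbose = qualifier.filter(resolver(_, t)).nonEmpty
// Suggested form: tests the predicate directly, no intermediate value.
val concise = qualifier.exists(resolver(_, t))

assert(verbose == concise)
```

Both forms are semantically identical on `Option`; `exists` simply states the intent (is there an element satisfying the predicate?) in one step.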
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117379878 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- We don't need to compile the same regex over and over. Can you move this to the ParserUtils... 
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367155 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { --- End diff -- `expr` is the pattern right? Maybe we should give it a better name.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117380037 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { --- End diff -- Using an option here does not add a thing.
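A small sketch of the reviewer's point, with hypothetical names rather than the actual parser code: wrapping a value in `Option` only to `map` over it and then unconditionally call `.get` is a detour that a direct call avoids:

```scala
// Hypothetical classifier standing in for the match on the token text.
def classify(token: String): String =
  if (token.startsWith("`")) "regex" else "attribute"

val token = "`abc`"

// Shape used in the diff: wrap, map, then unconditionally unwrap.
val viaOption = Option(token).map(classify).get
// Equivalent direct call; clearer and no risk of .get on None.
val direct = classify(token)

assert(viaOption == direct)
```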
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117366828 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { + override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = { +val expandedAttributes: Seq[Attribute] = table match { + // If there is no table specified, use all input attributes that match expr + case None => input.output.filter(_.name.matches(expr)) + // If there is a table, pick out attributes that are part of this table that match expr + case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty) +.filter(_.name.matches(expr)) +} + +expandedAttributes.zip(input.output).map { --- End diff -- An `Attribute` is always a `NamedExpression`, why do we need this?
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117368022 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => --- End diff -- Please use a guard, e.g.: `case unresolved_attr @ UnresolvedAttribute(nameParts) if conf.supportQuotedIdentifiers => `. That makes the logic down the line much simpler.
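To illustrate the guard suggestion in isolation (the case classes below are simplified stand-ins, not the actual Catalyst types): moving the config check into the pattern lets each case body stay linear, with no `if`/`else` or early `return`:

```scala
// Simplified, hypothetical stand-ins for the parser types involved.
case class UnresolvedAttribute(nameParts: Seq[String])
case class Conf(supportQuotedIdentifiers: Boolean)

def describe(e: Any, conf: Conf): String = e match {
  // The guard restricts this case to when quoted identifiers are enabled,
  // so the body needs no nested conditional.
  case UnresolvedAttribute(parts) if conf.supportQuotedIdentifiers =>
    s"maybe-regex attribute: ${parts.mkString(".")}"
  case UnresolvedAttribute(parts) =>
    s"plain attribute: ${parts.mkString(".")}"
  case other =>
    s"extract value from $other"
}

assert(describe(UnresolvedAttribute(Seq("a", "b")), Conf(true)).startsWith("maybe-regex"))
assert(describe(UnresolvedAttribute(Seq("a", "b")), Conf(false)).startsWith("plain"))
```

Guards are evaluated after the pattern matches; when a guard fails, matching falls through to the next case, which is exactly the disabled-config path here.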
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117380055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.getStart).map(_.getText match { --- End diff -- Using an option here does not add a thing. 
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- We don't need to compile the same regex over and over. Can you move this to the ParserUtils... I am also wondering if we shouldn't do the match in the parser itself.
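A sketch of the hoisting the reviewer suggests, under the assumption that a shared utility object is a suitable home (the `ParserUtilsSketch` object and `unquote` helper below are hypothetical illustrations, not Spark's actual ParserUtils):

```scala
// Hypothetical utility object: the pattern is compiled once at
// initialization and reused on every call, instead of recompiling
// "`(.+)`".r inside each visitDereference/visitColumnReference.
object ParserUtilsSketch {
  val EscapedIdentifier = "`(.+)`".r
}

// Hypothetical helper: strip surrounding backticks if present.
def unquote(token: String): Option[String] = token match {
  // Regex extractors require a whole-string match, so plain tokens fall through.
  case ParserUtilsSketch.EscapedIdentifier(inner) => Some(inner)
  case _ => None
}

assert(unquote("`(id)?+.+`") == Some("(id)?+.+"))
assert(unquote("plain") == None)
```

Because `scala.util.matching.Regex` compiles its `java.util.regex.Pattern` eagerly, a `val` in a shared object pays the compilation cost exactly once rather than on every parsed column reference.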
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/testReport)** for PR 18031 at commit [`970421b`](https://github.com/apache/spark/commit/970421b2a5cb2278d60403f72dc165418e4faf87).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.