[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/14435

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334242

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -739,6 +931,15 @@ case class Sample(
 case class Distinct(child: LogicalPlan) extends UnaryNode {
   override def maxRows: Option[Long] = child.maxRows
   override def output: Seq[Attribute] = child.output
+
+  override def sql: String = child match {
+    case Union(children) =>
+      val childrenSql = children.map(c => s"(${c.sql})")
+      childrenSql.mkString(" UNION DISTINCT ")
+
+    case _: Project =>
```

--- End diff --

Replace `_` with `p` and use it instead.
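The suggested change, binding the matched `Project` to a name instead of discarding it with `_`, might look roughly like this; the plan classes below are minimal stand-ins for illustration, not Spark's actual ones:

```scala
// Minimal stand-ins for Spark's plan classes, for illustration only.
sealed trait Plan { def sql: String }
case class Project(selectSql: String) extends Plan { def sql: String = selectSql }
case class Union(children: Seq[Plan]) extends Plan {
  def sql: String = children.map(c => s"(${c.sql})").mkString(" UNION ALL ")
}

def distinctSql(child: Plan): String = child match {
  case Union(children) =>
    children.map(c => s"(${c.sql})").mkString(" UNION DISTINCT ")
  case p: Project =>
    // `p` replaces `_`, so the branch no longer has to reach back to `child`:
    s"SELECT DISTINCT ${p.sql}"
}
```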
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334167

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -731,6 +909,20 @@ case class Sample(
   }

   override protected def otherCopyArgs: Seq[AnyRef] = isTableSample :: Nil
+
+  override def sql: String = child match {
+    case SubqueryAlias(alias, _: NonSQLPlan) =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+      s"$alias TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
+
+    case SubqueryAlias(alias, grandChild) =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+      s"${grandChild.sql} TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable $alias"
+
+    case _ =>
+      val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
```

--- End diff --

`repeatable` is repeated three times (and I don't think it's on purpose, despite the name :))
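The duplication flagged here could be removed by computing `repeatable` once before the match. A rough sketch, using a simplified stand-in for `Sample`'s fields rather than Spark's real class:

```scala
// Simplified stand-in for Sample's relevant fields, for illustration only.
case class SampleSketch(withReplacement: Boolean, upperBound: Double, seed: Long) {
  def sampleSql(childSql: String, alias: Option[String] = None): String = {
    // Hoisted out of the match: computed once instead of in all three branches.
    val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
    val base = s"$childSql TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
    alias.fold(base)(a => s"$base $a")
  }
}
```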
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73334011

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -692,11 +864,17 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNo
     }
     child.statistics.copy(sizeInBytes = sizeInBytes)
   }
+
+  override def sql: String = child.sql
 }

 case class SubqueryAlias(alias: String, child: LogicalPlan) extends UnaryNode {

   override def output: Seq[Attribute] = child.output.map(_.withQualifier(Some(alias)))
+
+  override def sql: String = child match {
+    case _ => if (child.sql.equals(alias)) child.sql else s"(${child.sql}) AS $alias"
```

--- End diff --

Why do you pattern match here?
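Since the match has only a single `case _` branch, it can be collapsed into a plain conditional. A minimal sketch of that simplification:

```scala
// Sketch of SubqueryAlias.sql without the single-branch pattern match.
def subqueryAliasSql(childSql: String, alias: String): String =
  if (childSql == alias) childSql else s"($childSql) AS $alias"
```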
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7845

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+    output1.size == output2.size &&
+      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
+
+  private def isGroupingSet(e: Expand, p: Project) = {
```

--- End diff --

Ah, so you are using `{` to wrap boolean one-liners :)
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7741

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+    output1.size == output2.size &&
+      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
```

--- End diff --

I think `forall { case (left, right) => left semanticEquals right }` (or with dots) could be more readable. WDYT?
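The suggested `forall { case (left, right) => ... }` form, sketched here with a hypothetical minimal attribute class in place of Spark's `Attribute`:

```scala
// Hypothetical minimal attribute, standing in for Spark's Attribute.
case class Attr(name: String) {
  def semanticEquals(other: Attr): Boolean = name == other.name
}

def sameOutput(output1: Seq[Attr], output2: Seq[Attr]): Boolean = {
  output1.size == output2.size &&
    // Destructure each pair instead of using pair._1 / pair._2:
    output1.zip(output2).forall { case (left, right) => left.semanticEquals(right) }
}
```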
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7585

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -495,6 +573,92 @@ case class Aggregate(
       super.statistics
     }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
```

--- End diff --

Wrap it in `{` and `}`.
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7461

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -167,6 +212,8 @@ case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation
     Statistics(sizeInBytes = sizeInBytes, isBroadcastable = isBroadcastable)
   }
+
+  override def sql: String = s"(${left.sql}) INTERSECT (${right.sql})"
```

--- End diff --

I think `INTERSECT` et al. should be a value, to limit typos later (I think there are tests that test that the output contains `INTERSET`, right?)
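Keeping the keyword in a value, as suggested, means a typo like `INTERSET` could only be made in one place. A rough sketch under that assumption; the object and value names are hypothetical:

```scala
// Hypothetical keyword constants; a typo can now only occur in one place.
object SetOpSQL {
  val Intersect = "INTERSECT"
  val Except = "EXCEPT"
}

def intersectSql(leftSql: String, rightSql: String): String =
  s"($leftSql) ${SetOpSQL.Intersect} ($rightSql)"
```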
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r7286

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -112,6 +152,11 @@ case class Filter(condition: Expression, child: LogicalPlan)
       .filterNot(SubqueryExpression.hasCorrelatedSubquery)
     child.constraints.union(predicates.toSet)
   }
+
+  override def sql: String = child match {
+    case _: Aggregate => s"${child.sql} HAVING ${condition.sql}"
+    case _ => s"${child.sql} WHERE ${condition.sql}"
```

--- End diff --

The only difference is `HAVING` vs `WHERE`, right? Mind extracting the small difference out and doing the following instead:

```
val havingOrWhere = ???
s"${child.sql} $havingOrWhere ${condition.sql}"
```
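Filling in the reviewer's `???` placeholder, the extraction might look like this, with minimal stand-ins for the plan classes rather than Spark's real ones:

```scala
// Minimal stand-ins for Spark's plan classes, for illustration only.
sealed trait Plan { def sql: String }
case class Aggregate(sql: String) extends Plan
case class Relation(sql: String) extends Plan

def filterSql(child: Plan, conditionSql: String): String = {
  // Only the keyword differs between the two branches, so extract it:
  val havingOrWhere = child match {
    case _: Aggregate => "HAVING"
    case _            => "WHERE"
  }
  s"${child.sql} $havingOrWhere $conditionSql"
}
```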
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332988

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -99,6 +133,12 @@ case class Generate(
     if (join) child.output ++ qualified else qualified
   }
+
+  override def sql: String = {
+    val columnAliases = generatorOutput.map(_.sql).mkString(", ")
+    s"${child.sql} LATERAL VIEW ${if (outer) "OUTER" else ""} " +
```

--- End diff --

Could you move `${if...` outside the interpolated string?
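Moving the conditional out of the interpolation, as requested, could read like this. A sketch only; the surrounding generator SQL is simplified, and the hoisted value also avoids the double space the inline `${if ...} ` form produces when `outer` is false:

```scala
// Sketch: hoist the `if` into a named value before interpolating.
def lateralViewSql(childSql: String, outer: Boolean, generatorSql: String,
                   columnAliases: String): String = {
  val outerKeyword = if (outer) "OUTER " else ""
  s"$childSql LATERAL VIEW $outerKeyword$generatorSql AS $columnAliases"
}
```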
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332573

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
   override def validConstraints: Set[Expression] =
     child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+    if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
+      throw new UnsupportedOperationException("NonSQLExpression")
```

--- End diff --

Would `assert` be applicable here?
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14435#discussion_r73332522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---

```
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
   override def validConstraints: Set[Expression] =
     child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+    if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
```

--- End diff --

This `if` is overly complex and forces a reader to read the expression inside out (not left to right). I can't propose anything better than to beg for a boolean val with the entire condition on a separate line. WDYT?
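Naming the condition, as requested, lets the check read left to right. A sketch with a hypothetical minimal expression tree in place of Spark's `Expression`:

```scala
// Hypothetical minimal expression tree standing in for Spark's Expression.
case class Expr(isNonSQL: Boolean, children: Seq[Expr] = Nil) {
  // Depth-first search for the first node satisfying the predicate.
  def find(p: Expr => Boolean): Option[Expr] =
    if (p(this)) Some(this)
    else children.iterator.flatMap(_.find(p)).nextOption()
}

def projectSqlGuard(projectList: Seq[Expr]): Unit = {
  // The whole condition gets a name, so the `if` itself reads left to right.
  val containsNonSQLExpression =
    projectList.exists(expr => expr.find(_.isNonSQL).isDefined)
  if (containsNonSQLExpression) {
    throw new UnsupportedOperationException("NonSQLExpression")
  }
}
```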
[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/14435

[SPARK-16756][SQL][WIP] Add `sql` function to LogicalPlan and `NonSQLPlan` trait

## What changes were proposed in this pull request?

This PR is part of [SPARK-16576](https://issues.apache.org/jira/browse/SPARK-16576), which moves logical-plan SQL generation code from `SQLBuilder` into the logical operators. Like `Expression`, this PR adds a `sql` function to `LogicalPlan` and a `NonSQLPlan` trait. The method is `abstract`; every logical plan should either implement it or mix in the `NonSQLPlan` trait explicitly.

```scala
/**
 * Returns the SQL representation of this plan. For plans extending [[NonSQLPlan]],
 * this method may return an arbitrary user-facing string.
 */
def sql: String
```

This PR updates test suites including `LogicalPlanToSQLSuite` and `ExpressionToSQLSuite` to exercise the new `sql` function, but does not remove `SQLBuilder` or its usage in `views.scala` of `sql/core`.

## How was this patch tested?

Pass the Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-16756

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14435.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14435

commit 1392c8a62e57c0a9b66555d4ac676eb0269533a3
Author: Dongjoon Hyun
Date: 2016-08-01T07:00:11Z

    [SPARK-16756][SQL] Add `sql` function to LogicalPlan and `NonSQLPlan` trait