[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-11-01 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/14435


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r73334242
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -739,6 +931,15 @@ case class Sample(
 case class Distinct(child: LogicalPlan) extends UnaryNode {
   override def maxRows: Option[Long] = child.maxRows
   override def output: Seq[Attribute] = child.output
+
+  override def sql: String = child match {
+case Union(children) =>
+  val childrenSql = children.map(c => s"(${c.sql})")
+  childrenSql.mkString(" UNION DISTINCT ")
+
+case _: Project =>
--- End diff --

Replace `_` with `p` and use it instead.





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r73334167
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -731,6 +909,20 @@ case class Sample(
   }
 
   override protected def otherCopyArgs: Seq[AnyRef] = isTableSample :: Nil
+
+  override def sql: String = child match {
+case SubqueryAlias(alias, _: NonSQLPlan) =>
+  val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+  s"$alias TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
+
+case SubqueryAlias(alias, grandChild) =>
+  val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
+  s"${grandChild.sql} TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable $alias"
+
+case _ =>
+  val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
--- End diff --

`repeatable` is repeated three times (and I don't think it's on purpose, despite the name :))
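The suggested refactoring could look like the following sketch. It is a hypothetical standalone helper, not the actual `Sample.sql` implementation: the shared `REPEATABLE` clause is hoisted out so it is built only once, and each case of the match would then just interpolate the result.

```scala
// Hedged sketch: compute the shared TABLESAMPLE/REPEATABLE clause once,
// before the pattern match, instead of repeating it in every case.
def tableSampleClause(withReplacement: Boolean, seed: Long, upperBound: Double): String = {
  val repeatable = if (withReplacement) s" REPEATABLE ($seed)" else ""
  s"TABLESAMPLE(${upperBound * 100} PERCENT)$repeatable"
}
```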





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r73334011
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -692,11 +864,17 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNo
 }
 child.statistics.copy(sizeInBytes = sizeInBytes)
   }
+
+  override def sql: String = child.sql
 }
 
case class SubqueryAlias(alias: String, child: LogicalPlan) extends UnaryNode {
 
  override def output: Seq[Attribute] = child.output.map(_.withQualifier(Some(alias)))
+
+  override def sql: String = child match {
+case _ => if (child.sql.equals(alias)) child.sql else s"(${child.sql}) AS $alias"
--- End diff --

Why do you pattern match here?
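Since the match has only a single wildcard case, a plain if-expression expresses the same logic. A hedged, standalone sketch of that simplification (with `String` parameters standing in for the real `SubqueryAlias` fields, purely for illustration):

```scala
// Hypothetical simplification of SubqueryAlias.sql: no pattern match needed
// when every case falls through to the same branch.
def subqueryAliasSql(childSql: String, alias: String): String =
  if (childSql == alias) childSql else s"($childSql) AS $alias"
```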





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r7845
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -495,6 +573,92 @@ case class Aggregate(
   super.statistics
 }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+output1.size == output2.size &&
+  output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
+
+  private def isGroupingSet(e: Expand, p: Project) = {
--- End diff --

Ah, so you are using `{` to wrap boolean one-liners :)





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r7741
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -495,6 +573,92 @@ case class Aggregate(
   super.statistics
 }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
+output1.size == output2.size &&
+  output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
--- End diff --

I think `forall { case (left, right) => left semanticEquals right }` (or with dots) could be more readable. WDYT?
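The suggested destructuring could look like the sketch below. Here `String` stands in for `Attribute` and `==` for `semanticEquals`, purely for illustration; the point is only the `case (left, right)` pattern replacing the `pair._1`/`pair._2` accessors.

```scala
// Hedged sketch of the reviewer's suggestion: destructure the zipped pair
// instead of using positional _1/_2 accessors.
def sameOutput(output1: Seq[String], output2: Seq[String]): Boolean =
  output1.size == output2.size &&
    output1.zip(output2).forall { case (left, right) => left == right }
```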





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r7585
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -495,6 +573,92 @@ case class Aggregate(
   super.statistics
 }
   }
+
+  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
--- End diff --

Wrap it around `{` and `}`.





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r7461
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -167,6 +212,8 @@ case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation
 
 Statistics(sizeInBytes = sizeInBytes, isBroadcastable = isBroadcastable)
   }
+
+  override def sql: String = s"(${left.sql}) INTERSECT (${right.sql})"
--- End diff --

I think `INTERSECT` et al. should be extracted to a value to limit typos later (I think there are tests that check that the output contains `INTERSECT`, right?)
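A hedged sketch of the suggestion: keep the keyword in one place so a typo cannot diverge between the operator and its tests. The `SqlKeywords` object and the helper name are illustrative, not actual Spark code.

```scala
// Hedged sketch: a single constant for the set-operation keyword, shared by
// the SQL generator (and, presumably, the tests asserting on the output).
object SqlKeywords {
  val Intersect = "INTERSECT"
}

def intersectSql(leftSql: String, rightSql: String): String =
  s"($leftSql) ${SqlKeywords.Intersect} ($rightSql)"
```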





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r7286
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -112,6 +152,11 @@ case class Filter(condition: Expression, child: LogicalPlan)
   .filterNot(SubqueryExpression.hasCorrelatedSubquery)
 child.constraints.union(predicates.toSet)
   }
+
+  override def sql: String = child match {
+case _: Aggregate => s"${child.sql} HAVING ${condition.sql}"
+case _ => s"${child.sql} WHERE ${condition.sql}"
--- End diff --

The only difference is `HAVING` vs `WHERE`, right?

Mind extracting the small difference out and doing the following instead:

```
val havingOrWhere = ???
s"${child.sql} $havingOrWhere ${condition.sql}"
```
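One way to fill in the reviewer's `???`, as a hedged standalone sketch: a boolean parameter `childIsAggregate` stands in for the `child match { case _: Aggregate => ... }` test, since the real `Filter.sql` would branch on the child plan's type.

```scala
// Hedged sketch of the extracted keyword choice: the only difference
// between the two branches is the HAVING/WHERE keyword.
def filterSql(childSql: String, conditionSql: String, childIsAggregate: Boolean): String = {
  val havingOrWhere = if (childIsAggregate) "HAVING" else "WHERE"
  s"$childSql $havingOrWhere $conditionSql"
}
```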





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r73332988
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -99,6 +133,12 @@ case class Generate(
 
 if (join) child.output ++ qualified else qualified
   }
+
+  override def sql: String = {
+val columnAliases = generatorOutput.map(_.sql).mkString(", ")
+s"${child.sql} LATERAL VIEW ${if (outer) "OUTER" else ""} " +
--- End diff --

Could you move `${if...` outside the interpolated string?
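The move could look like the sketch below. The helper and its parameters are illustrative stand-ins, not the real `Generate.sql`; the point is naming the conditional keyword in a `val` so the interpolated string stays flat.

```scala
// Hedged sketch: hoist the `if (outer) ...` conditional out of the
// interpolated string into a named value.
def lateralViewPrefix(childSql: String, outer: Boolean): String = {
  val outerKeyword = if (outer) "OUTER " else ""
  s"$childSql LATERAL VIEW $outerKeyword"
}
```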





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r73332573
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
 
   override def validConstraints: Set[Expression] =
 child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
+  throw new UnsupportedOperationException("NonSQLExpression")
--- End diff --

Would `assert` be applicable here?
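Context for the question: Scala's `assert` throws `AssertionError` and can be compiled away (e.g. with the `-Xdisable-assertions` scalac flag), so it marks an internal invariant, while `UnsupportedOperationException` signals a caller-visible limitation. A hedged sketch of the two styles, with a boolean stand-in for the `NonSQLExpression` check:

```scala
// Internal-invariant style: may be elided by the compiler.
def sqlWithAssert(hasNonSql: Boolean): String = {
  assert(!hasNonSql, "NonSQLExpression")
  "SELECT ..."
}

// Caller-facing style: always throws for unsupported plans.
def sqlWithException(hasNonSql: Boolean): String = {
  if (hasNonSql) throw new UnsupportedOperationException("NonSQLExpression")
  "SELECT ..."
}
```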





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14435#discussion_r73332522
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -53,6 +53,40 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
 
   override def validConstraints: Set[Expression] =
 child.constraints.union(getAliasedConstraints(projectList))
+
+  override def sql: String = {
+if (projectList.exists(expr => expr.find(e => e.isInstanceOf[NonSQLExpression]).isDefined)) {
--- End diff --

This `if` is overly complex and forces a reader to read the expression inside-out (not left to right).

I can't propose anything better than to beg for a boolean val, with the entire boolean condition on a separate line.

WDYT?
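That extraction could look like the sketch below. It is standalone and hedged: a `String => Boolean` predicate stands in for the real `NonSQLExpression` check on `Expression` trees, since the point is only naming the condition before branching on it.

```scala
// Hedged sketch: give the nested exists/find condition a name so the `if`
// reads left to right.
def checkSqlConvertible(projectList: Seq[String], containsNonSql: String => Boolean): Unit = {
  val hasNonSqlExpression = projectList.exists(containsNonSql)
  if (hasNonSqlExpression) {
    throw new UnsupportedOperationException("NonSQLExpression")
  }
}
```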





[GitHub] spark pull request #14435: [SPARK-16756][SQL][WIP] Add `sql` function to Log...

2016-08-01 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/14435

[SPARK-16756][SQL][WIP] Add `sql` function to LogicalPlan and `NonSQLPlan` trait

## What changes were proposed in this pull request?
This PR is a part of [SPARK-16576](https://issues.apache.org/jira/browse/SPARK-16576), which moves logical plan SQL generation code from `SQLBuilder` into the logical operators.

Like `Expression`, this PR adds a `sql` function to `LogicalPlan` and a `NonSQLPlan` trait. The method is declared `abstract`; every logical plan should either implement it or mix in the `NonSQLPlan` trait explicitly.

```scala
/**
 * Returns a SQL representation of this plan. For plans extending [[NonSQLPlan]],
 * this method may return an arbitrary user-facing string.
 */
def sql: String
```

This PR updates test suites, including `LogicalPlanToSQLSuite` and `ExpressionToSQLSuite`, to exercise the new `sql` function, but does not remove `SQLBuilder` or its usage in `views.scala` of `sql/core`.

## How was this patch tested?

Pass the Jenkins tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-16756

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14435.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14435


commit 1392c8a62e57c0a9b66555d4ac676eb0269533a3
Author: Dongjoon Hyun 
Date:   2016-08-01T07:00:11Z

[SPARK-16756][SQL] Add `sql` function to LogicalPlan and `NonSQLPlan` trait



