[GitHub] spark pull request #15856: [SPARK-17982][SQL][BACKPORT-2.0] SQLBuilder shoul...

2016-11-11 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/15856


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15856: [SPARK-17982][SQL][BACKPORT-2.0] SQLBuilder shoul...

2016-11-11 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/15856

[SPARK-17982][SQL][BACKPORT-2.0] SQLBuilder should wrap the generated SQL 
with parenthesis for LIMIT

## What changes were proposed in this pull request?

Currently, `SQLBuilder` handles `LIMIT` by always appending `LIMIT` to the end 
of the generated sub-SQL without parentheses. This produces invalid SQL and 
causes `RuntimeException`s like the following. This PR always wraps the 
generated SQL in parentheses for `LIMIT`, except when `LIMIT` is applied 
directly to a `SubqueryAlias`.

**Before**

``` scala
scala> sql("CREATE TABLE tbl(id INT)")
scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ...
```

**After**

``` scala
scala> sql("CREATE TABLE tbl(id INT)")
scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
scala> sql("SELECT id2 FROM v1")
res4: org.apache.spark.sql.DataFrame = [id2: int]
```

**Fixed cases in this PR**

The following two cases show the detailed query plans for which SQL 
generation is problematic.

1. `SELECT * FROM (SELECT id FROM tbl LIMIT 2)`

Please note the **FROM SELECT** part of the generated SQL below. 
When we don't use '()' for limit, this fails.

```scala
# Original logical plan:
Project [id#1]
+- GlobalLimit 2
   +- LocalLimit 2
      +- Project [id#1]
         +- MetastoreRelation default, tbl

# Canonicalized logical plan:
Project [gen_attr_0#1 AS id#4]
+- SubqueryAlias tbl
   +- Project [gen_attr_0#1]
      +- GlobalLimit 2
         +- LocalLimit 2
            +- Project [gen_attr_0#1]
               +- SubqueryAlias gen_subquery_0
                  +- Project [id#1 AS gen_attr_0#1]
                     +- SQLTable default, tbl, [id#1]

# Generated SQL:
SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM SELECT 
`gen_attr_0` FROM (SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS 
gen_subquery_0 LIMIT 2) AS tbl
```

2. `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))`

Please note **((~~~) AS gen_subquery_0 LIMIT 2)** in the generated SQL below. 
When we use '()' for limit on `SubqueryAlias`, this fails.

```scala
# Original logical plan:
Project [id#1]
+- Project [id#1]
   +- GlobalLimit 2
      +- LocalLimit 2
         +- MetastoreRelation default, tbl

# Canonicalized logical plan:
Project [gen_attr_0#1 AS id#4]
+- SubqueryAlias tbl
   +- Project [gen_attr_0#1]
      +- GlobalLimit 2
         +- LocalLimit 2
            +- SubqueryAlias gen_subquery_0
               +- Project [id#1 AS gen_attr_0#1]
                  +- SQLTable default, tbl, [id#1]

# Generated SQL:
SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM ((SELECT `id` AS 
`gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2)) AS tbl
```
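
The parenthesization rule behind both cases can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's actual `SQLBuilder`: the `Plan` hierarchy, `ToySqlBuilder`, and `Examples` are hypothetical names made up for illustration, `GlobalLimit`/`LocalLimit` are collapsed into a single `Limit` node, and backticks and expression IDs are omitted.

```scala
// Toy plan nodes (hypothetical; these are NOT Spark's Catalyst classes).
sealed trait Plan
case class Table(name: String) extends Plan
case class Project(cols: Seq[String], child: Plan) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan
case class Limit(n: Int, child: Plan) extends Plan

object ToySqlBuilder {
  def toSQL(plan: Plan): String = plan match {
    case Table(name) =>
      name
    case Project(cols, child) =>
      s"SELECT ${cols.mkString(", ")} FROM ${toSQL(child)}"
    case SubqueryAlias(alias, child) =>
      s"(${toSQL(child)}) AS $alias"
    // LIMIT directly on a SubqueryAlias: no extra parentheses (case 2).
    case Limit(n, child: SubqueryAlias) =>
      s"${toSQL(child)} LIMIT $n"
    // Any other child: wrap the whole sub-SQL in parentheses (case 1).
    case Limit(n, child) =>
      s"(${toSQL(child)} LIMIT $n)"
  }
}

object Examples {
  // Shared innermost subquery from both canonicalized plans.
  val inner: Plan =
    SubqueryAlias("gen_subquery_0",
      Project(Seq("id AS gen_attr_0"), Table("default.tbl")))

  // Case 1: Limit sits on top of a Project.
  val case1: Plan =
    Project(Seq("gen_attr_0"), Limit(2, Project(Seq("gen_attr_0"), inner)))

  // Case 2: Limit sits directly on top of a SubqueryAlias.
  val case2: Plan =
    Project(Seq("gen_attr_0"), Limit(2, inner))
}
```

In this toy model, `ToySqlBuilder.toSQL(Examples.case1)` yields `... FROM (SELECT gen_attr_0 FROM (...) AS gen_subquery_0 LIMIT 2)`, which avoids the bare `FROM SELECT`, while `ToySqlBuilder.toSQL(Examples.case2)` yields `... FROM (...) AS gen_subquery_0 LIMIT 2` with no extra parentheses around the aliased subquery.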

## How was this patch tested?

Passes the Jenkins tests with a newly added test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-17982-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15856


commit 92d901b13be0a60cfda7cd8fba4ec8bb3c0610f6
Author: Dongjoon Hyun 
Date:   2016-11-11T22:36:44Z

[SPARK-17982][SQL][BACKPORT-2.0] SQLBuilder should wrap the generated SQL 
with parenthesis for LIMIT



