spark git commit: [SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT
Repository: spark Updated Branches: refs/heads/master a531fe1a8 -> d42bb7cc4 [SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT ## What changes were proposed in this pull request? Currently, `SQLBuilder` handles `LIMIT` by always adding `LIMIT` at the end of the generated subSQL. It makes `RuntimeException`s like the following. This PR adds a parenthesis always except `SubqueryAlias` is used together with `LIMIT`. **Before** ``` scala scala> sql("CREATE TABLE tbl(id INT)") scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2") java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ... ``` **After** ``` scala scala> sql("CREATE TABLE tbl(id INT)") scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2") scala> sql("SELECT id2 FROM v1") res4: org.apache.spark.sql.DataFrame = [id2: int] ``` **Fixed cases in this PR** The following two cases are the detail query plans having problematic SQL generations. 1. `SELECT * FROM (SELECT id FROM tbl LIMIT 2)` Please note that **FROM SELECT** part of the generated SQL in the below. When we don't use '()' for limit, this fails. ```scala # Original logical plan: Project [id#1] +- GlobalLimit 2 +- LocalLimit 2 +- Project [id#1] +- MetastoreRelation default, tbl # Canonicalized logical plan: Project [gen_attr_0#1 AS id#4] +- SubqueryAlias tbl +- Project [gen_attr_0#1] +- GlobalLimit 2 +- LocalLimit 2 +- Project [gen_attr_0#1] +- SubqueryAlias gen_subquery_0 +- Project [id#1 AS gen_attr_0#1] +- SQLTable default, tbl, [id#1] # Generated SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM SELECT `gen_attr_0` FROM (SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2) AS tbl ``` 2. `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))` Please note that **((~~~) AS gen_subquery_0 LIMIT 2)** in the below. When we use '()' for limit on `SubqueryAlias`, this fails. ```scala # Original logical plan: Project [id#1] +- Project [id#1] +- GlobalLimit 2 +- LocalLimit 2 +- MetastoreRelation default, tbl # Canonicalized logical plan: Project [gen_attr_0#1 AS id#4] +- SubqueryAlias tbl +- Project [gen_attr_0#1] +- GlobalLimit 2 +- LocalLimit 2 +- SubqueryAlias gen_subquery_0 +- Project [id#1 AS gen_attr_0#1] +- SQLTable default, tbl, [id#1] # Generated SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM ((SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2)) AS tbl ``` ## How was this patch tested? Pass the Jenkins test with a newly added test case. Author: Dongjoon HyunCloses #15546 from dongjoon-hyun/SPARK-17982. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d42bb7cc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d42bb7cc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d42bb7cc Branch: refs/heads/master Commit: d42bb7cc4e32c173769bd7da5b9b5eafb510860c Parents: a531fe1 Author: Dongjoon Hyun Authored: Fri Nov 11 13:28:18 2016 -0800 Committer: gatorsmile Committed: Fri Nov 11 13:28:18 2016 -0800 -- .../scala/org/apache/spark/sql/catalyst/SQLBuilder.scala | 7 ++- .../src/test/resources/sqlgen/generate_with_other_1.sql | 2 +- .../src/test/resources/sqlgen/generate_with_other_2.sql | 2 +- sql/hive/src/test/resources/sqlgen/limit.sql | 4 .../apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala | 10 ++ 5 files changed, 22 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d42bb7cc/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala index 6f821f8..3804542 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala @@ -138,9 +138,14 @@ class SQLBuilder private ( case g: Generate => generateToSQL(g) -case Limit(limitExpr, child) => +// This prevents a pattern of `((...) AS gen_subquery_0 LIMIT 1)` which does not work. +// For example, `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))` makes this plan. +case Limit(limitExpr, child: SubqueryAlias) => s"${toSQL(child)} LIMIT ${limitExpr.sql}" +case Limit(limitExpr, child) => + s"(${toSQL(child)} LIMIT ${limitExpr.sql})" +
spark git commit: [SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT
Repository: spark Updated Branches: refs/heads/branch-2.1 00c9c7d96 -> 465e4b40b [SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT ## What changes were proposed in this pull request? Currently, `SQLBuilder` handles `LIMIT` by always adding `LIMIT` at the end of the generated subSQL. It makes `RuntimeException`s like the following. This PR adds a parenthesis always except `SubqueryAlias` is used together with `LIMIT`. **Before** ``` scala scala> sql("CREATE TABLE tbl(id INT)") scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2") java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ... ``` **After** ``` scala scala> sql("CREATE TABLE tbl(id INT)") scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2") scala> sql("SELECT id2 FROM v1") res4: org.apache.spark.sql.DataFrame = [id2: int] ``` **Fixed cases in this PR** The following two cases are the detail query plans having problematic SQL generations. 1. `SELECT * FROM (SELECT id FROM tbl LIMIT 2)` Please note that **FROM SELECT** part of the generated SQL in the below. When we don't use '()' for limit, this fails. ```scala # Original logical plan: Project [id#1] +- GlobalLimit 2 +- LocalLimit 2 +- Project [id#1] +- MetastoreRelation default, tbl # Canonicalized logical plan: Project [gen_attr_0#1 AS id#4] +- SubqueryAlias tbl +- Project [gen_attr_0#1] +- GlobalLimit 2 +- LocalLimit 2 +- Project [gen_attr_0#1] +- SubqueryAlias gen_subquery_0 +- Project [id#1 AS gen_attr_0#1] +- SQLTable default, tbl, [id#1] # Generated SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM SELECT `gen_attr_0` FROM (SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2) AS tbl ``` 2. `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))` Please note that **((~~~) AS gen_subquery_0 LIMIT 2)** in the below. When we use '()' for limit on `SubqueryAlias`, this fails. ```scala # Original logical plan: Project [id#1] +- Project [id#1] +- GlobalLimit 2 +- LocalLimit 2 +- MetastoreRelation default, tbl # Canonicalized logical plan: Project [gen_attr_0#1 AS id#4] +- SubqueryAlias tbl +- Project [gen_attr_0#1] +- GlobalLimit 2 +- LocalLimit 2 +- SubqueryAlias gen_subquery_0 +- Project [id#1 AS gen_attr_0#1] +- SQLTable default, tbl, [id#1] # Generated SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM ((SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2)) AS tbl ``` ## How was this patch tested? Pass the Jenkins test with a newly added test case. Author: Dongjoon HyunCloses #15546 from dongjoon-hyun/SPARK-17982. (cherry picked from commit d42bb7cc4e32c173769bd7da5b9b5eafb510860c) Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/465e4b40 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/465e4b40 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/465e4b40 Branch: refs/heads/branch-2.1 Commit: 465e4b40b3b7760bfcd0f03a14b805029ed599f1 Parents: 00c9c7d Author: Dongjoon Hyun Authored: Fri Nov 11 13:28:18 2016 -0800 Committer: gatorsmile Committed: Fri Nov 11 13:28:34 2016 -0800 -- .../scala/org/apache/spark/sql/catalyst/SQLBuilder.scala | 7 ++- .../src/test/resources/sqlgen/generate_with_other_1.sql | 2 +- .../src/test/resources/sqlgen/generate_with_other_2.sql | 2 +- sql/hive/src/test/resources/sqlgen/limit.sql | 4 .../apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala | 10 ++ 5 files changed, 22 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/465e4b40/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala index 6f821f8..3804542 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala @@ -138,9 +138,14 @@ class SQLBuilder private ( case g: Generate => generateToSQL(g) -case Limit(limitExpr, child) => +// This prevents a pattern of `((...) AS gen_subquery_0 LIMIT 1)` which does not work. +// For example, `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))` makes this plan. +case Limit(limitExpr, child: SubqueryAlias) =>