yaooqinn opened a new pull request #30332:
URL: https://github.com/apache/spark/pull/30332
### What changes were proposed in this pull request?
`SparkSession.sql` converts a string value to a DataFrame. The string value
should be a single SQL statement, optionally terminated by one or more
semicolons, e.g.
```sql
scala> spark.sql(" select 2").show
+---+
| 2|
+---+
| 2|
+---+
scala> spark.sql(" select 2;").show
+---+
| 2|
+---+
| 2|
+---+
scala> spark.sql(" select 2;;;;").show
+---+
| 2|
+---+
| 2|
+---+
```
If we pass two or more statements, the parser fails as expected, e.g.
```sql
scala> spark.sql(" select 2; select 1;").show
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 11)

== SQL ==
 select 2; select 1;
-----------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
```
In a very common scenario, users want to change some settings before they
execute their queries. They may pass a string like `set spark.sql.abc=2; select
1;` into this API, which creates a confusing gap between the actual effect and
the user's expectation: the user wants the query executed with
`spark.sql.abc=2`, but Spark actually treats everything after the `=`, namely
`2; select 1;`, as the value of the property `spark.sql.abc`, e.g.
```
scala> spark.sql("set spark.sql.abc=2; select 1;").show
+-------------+------------+
| key| value|
+-------------+------------+
|spark.sql.abc|2; select 1;|
+-------------+------------+
```
What's more, the `SET` command consumes everything that follows it, which makes
its behavior unstable from version to version, e.g.
#### 3.1
```sql
scala> spark.sql("set;").show
org.apache.spark.sql.catalyst.parser.ParseException:
Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `key`=value.(line 1, pos 0)

== SQL ==
set;
^^^

  at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitSetConfiguration$1(SparkSqlParser.scala:83)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:72)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:58)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$SetConfigurationContext.accept(SqlBaseParser.java:2161)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:82)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:113)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
scala> spark.sql("set a;").show
org.apache.spark.sql.catalyst.parser.ParseException:
Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `key`=value.(line 1, pos 0)

== SQL ==
set a;
^^^

  at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitSetConfiguration$1(SparkSqlParser.scala:83)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:72)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:58)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$SetConfigurationContext.accept(SqlBaseParser.java:2161)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:82)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:113)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
```
#### 2.4
```sql
scala> spark.sql("set;").show
+---+-----------+
|key| value|
+---+-----------+
| ;|<undefined>|
+---+-----------+
scala> spark.sql("set a;").show
+---+-----------+
|key| value|
+---+-----------+
| a;|<undefined>|
+---+-----------+
```
In this PR,
1. make `set spark.sql.abc=2; select 1;` fail directly in `SparkSession.sql`; users should call `.sql` once per statement.
2. make the semicolon the separator of statements; if users want a semicolon as part of a property value, they must quote it.
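To illustrate the separator semantics described above (this is a standalone sketch, not the PR's actual parser change, and the helper name `splitStatements` is hypothetical), splitting a multi-statement string on top-level semicolons while leaving quoted semicolons intact could look like:

```scala
// Hypothetical helper illustrating semicolon-as-separator semantics:
// split on ';' only when it is outside backquotes and single quotes,
// so a quoted semicolon stays part of a property value.
object StatementSplitter {
  def splitStatements(sql: String): Seq[String] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    val cur = new StringBuilder
    var inBackquote = false
    var inSingleQuote = false
    for (c <- sql) c match {
      case '`' if !inSingleQuote => inBackquote = !inBackquote; cur += c
      case '\'' if !inBackquote => inSingleQuote = !inSingleQuote; cur += c
      case ';' if !inBackquote && !inSingleQuote =>
        val stmt = cur.toString.trim
        if (stmt.nonEmpty) out += stmt
        cur.clear()
      case other => cur += other
    }
    val last = cur.toString.trim
    if (last.nonEmpty) out += last
    out.toSeq
  }
}
```

With a helper like this, a user could call `.sql` once per returned statement instead of passing the whole string in one call.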
### Why are the changes needed?
1. It removes the ambiguity in `SparkSession.sql`.
2. It makes the semicolon behave the same with `SET` as with other statements.
### Does this PR introduce _any_ user-facing change?
Yes. The semicolon now works as a statement separator: it is trimmed when it
appears at the end of a statement, and the statement fails when it appears in
the middle. Users need to use quotes if they want a semicolon to be part of a
property value.
### How was this patch tested?
New tests added.