yaooqinn opened a new pull request #30332:
URL: https://github.com/apache/spark/pull/30332
### What changes were proposed in this pull request?
`SparkSession.sql` converts a string value to a DataFrame. The string value
should be a single SQL statement, optionally terminated by one or more
semicolons, e.g.
```sql
scala> spark.sql(" select 2").show
+---+
| 2|
+---+
| 2|
+---+
scala> spark.sql(" select 2;").show
+---+
| 2|
+---+
| 2|
+---+
scala> spark.sql(" select 2;;;;").show
+---+
| 2|
+---+
| 2|
+---+
```
If we pass two or more statements, the parser fails as expected, e.g.
```sql
scala> spark.sql(" select 2; select 1;").show
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 11)

== SQL ==
 select 2; select 1;
-----------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
```
In a very common scenario, users want to change some settings before they
execute their queries. They may pass a string like `set spark.sql.abc=2; select
1;` into this API, which creates a confusing gap between the actual effect and
the user's expectation: the user wants the query executed with
`spark.sql.abc=2`, but Spark actually treats everything after the `=`, namely
`2; select 1;`, as the value of the property `spark.sql.abc`, e.g.
```
scala> spark.sql("set spark.sql.abc=2; select 1;").show
+-------------+------------+
| key| value|
+-------------+------------+
|spark.sql.abc|2; select 1;|
+-------------+------------+
```
What's more, the `SET` command consumes everything that follows it, which makes
its behavior unstable from version to version, e.g.
#### 3.1
```sql
scala> spark.sql("set;").show
org.apache.spark.sql.catalyst.parser.ParseException:
Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `key`=value.(line 1, pos 0)

== SQL ==
set;
^^^

  at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitSetConfiguration$1(SparkSqlParser.scala:83)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:72)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:58)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$SetConfigurationContext.accept(SqlBaseParser.java:2161)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:82)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:113)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
scala> spark.sql("set a;").show
org.apache.spark.sql.catalyst.parser.ParseException:
Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `key`=value.(line 1, pos 0)

== SQL ==
set a;
^^^

  at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitSetConfiguration$1(SparkSqlParser.scala:83)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:72)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:58)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$SetConfigurationContext.accept(SqlBaseParser.java:2161)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:82)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:113)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
```
#### 2.4
```sql
scala> spark.sql("set;").show
+---+-----------+
|key| value|
+---+-----------+
| ;|<undefined>|
+---+-----------+
scala> spark.sql("set a;").show
+---+-----------+
|key| value|
+---+-----------+
| a;|<undefined>|
+---+-----------+
```
In this PR,
1. make `set spark.sql.abc=2; select 1;` fail directly in `SparkSession.sql`; users should call `.sql` once per statement.
2. make the semicolon the separator of statements; if users want a semicolon as part of a property value, they must quote it.
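To illustrate the separator semantics described above (this is a standalone sketch, not the PR's actual parser change, and the helper name `splitStatements` is hypothetical), splitting a multi-statement string on top-level semicolons while leaving quoted semicolons intact could look like:

```scala
// Hypothetical helper illustrating semicolon-as-separator semantics:
// split on ';' only when it is outside backquotes and single quotes,
// so a quoted semicolon stays part of a property value.
object StatementSplitter {
  def splitStatements(sql: String): Seq[String] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    val cur = new StringBuilder
    var inBackquote = false
    var inSingleQuote = false
    for (c <- sql) c match {
      case '`' if !inSingleQuote => inBackquote = !inBackquote; cur += c
      case '\'' if !inBackquote => inSingleQuote = !inSingleQuote; cur += c
      case ';' if !inBackquote && !inSingleQuote =>
        val stmt = cur.toString.trim
        if (stmt.nonEmpty) out += stmt
        cur.clear()
      case other => cur += other
    }
    val last = cur.toString.trim
    if (last.nonEmpty) out += last
    out.toSeq
  }
}
```

With a helper like this, a user could call `.sql` once per returned statement instead of passing the whole string in one call.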
### Why are the changes needed?
1. It removes the ambiguity in `SparkSession.sql`.
2. It makes the semicolon behave the same with `SET` as with other statements.
### Does this PR introduce _any_ user-facing change?
Yes. The semicolon now works as a statement separator: it is trimmed when it
appears at the end of a statement, and the statement fails when it appears in
the middle. Users need to use quotes if they want a semicolon to be part of a
property value.
### How was this patch tested?
New tests added.