javierivanov commented on a change in pull request #27920: [SPARK-31102][SQL] Spark-sql fails to parse when contains comment.
URL: https://github.com/apache/spark/pull/27920#discussion_r393070691
##########
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
##########
@@ -402,24 +402,14 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
}
test("SPARK-30049 Should not complain for quotes in commented lines") {
- runCliWithin(1.minute)(
+ runCliWithin(3.minute)(
"""SELECT concat('test', 'comment') -- someone's comment here
|;""".stripMargin -> "testcomment"
)
- }
-
- test("SPARK-30049 Should not complain for quotes in commented with multi-lines") {
- runCliWithin(1.minute)(
- """SELECT concat('test', 'comment') -- someone's comment here \\
- | comment continues here with single ' quote \\
- | extra ' \\
- |;""".stripMargin -> "testcomment"
- )
- runCliWithin(1.minute)(
- """SELECT concat('test', 'comment') -- someone's comment here \\
- | comment continues here with single ' quote \\
- | extra ' \\
- | ;""".stripMargin -> "testcomment"
Review comment:
Hey @maropu!
The SQL parser does not recognize line-continuity per se.
```
scala> sql(s"""SELECT concat('test', 'comment') -- someone's comment here \\\ncomment continues here with single ' quote \\\nextra ' \\""")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'continues' expecting {<EOF>, ',', 'CLUSTER', 'DISTRIBUTE', 'EXCEPT', 'FROM', 'GROUP', 'HAVING', 'INTERSECT', 'LATERAL', 'LIMIT', 'ORDER', 'MINUS', 'SORT', 'UNION', 'WHERE', 'WINDOW', '-'}(line 2, pos 8)
== SQL ==
SELECT concat('test', 'comment') -- someone's comment here \
comment continues here with single ' quote \
--------^^^
extra ' \
```
It works just fine when the backslash is inside the inline comment:
```
scala> sql(s"""SELECT concat('test', 'comment') -- someone's comment here \\\n,2""") show
+---------------------+---+
|concat(test, comment)| 2|
+---------------------+---+
| testcomment| 2|
+---------------------+---+
```
But it does not work when the backslash is outside the inline comment:
```
sql(s"""SELECT concat('test', 'comment') -- someone's comment here \n,2\\\n""")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '\' expecting <EOF>(line 2, pos 2)
== SQL ==
SELECT concat('test', 'comment') -- someone's comment here
,2\
--^^^
```
Previously this worked fine because of this very bug: the insideComment flag ignored everything until the end of the string. But the Spark SQL parser does not recognize the backslashes. Line continuation could be added to the CLI, but I think that feature should be added directly to the SQL parser to avoid confusion.
Let me know your thoughts 👍
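For reference, here is a minimal, hypothetical sketch of the kind of quote/comment tracking the CLI's statement splitting needs; this is not the actual Spark CLI code, and the names and structure are my own. The key point is that `insideComment` has to be cleared at every newline, because a `--` comment only runs to the end of its own line:
```scala
// Hypothetical, simplified statement splitter; illustrative only, not Spark's code.
// It ignores escaped quotes and bracketed comments to keep the sketch short.
object SplitSketch {
  def splitSemiColon(input: String): Seq[String] = {
    val statements = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var insideSingleQuote = false
    var insideComment = false
    var i = 0
    while (i < input.length) {
      val c = input.charAt(i)
      c match {
        case '\'' if !insideComment =>
          insideSingleQuote = !insideSingleQuote
          current.append(c)
        case '-' if !insideSingleQuote && !insideComment &&
            i + 1 < input.length && input.charAt(i + 1) == '-' =>
          insideComment = true            // a "--" comment starts here
          current.append(c)
        case '\n' =>
          insideComment = false           // the fix: a comment ends with its line,
          current.append(c)               // not with the end of the whole input
        case ';' if !insideSingleQuote && !insideComment =>
          statements += current.toString  // statement boundary
          current.clear()
        case _ =>
          current.append(c)
      }
      i += 1
    }
    if (current.nonEmpty) statements += current.toString
    statements.toSeq
  }

  def main(args: Array[String]): Unit = {
    val sql =
      """SELECT concat('test', 'comment') -- someone's comment here
        |;""".stripMargin
    // One statement; the quote inside the comment does not confuse the splitter.
    splitSemiColon(sql).foreach(println)
  }
}
```
With the buggy behavior (never resetting `insideComment`), the `;` on the second line would still be treated as part of the comment and the CLI would keep waiting for more input.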