Xinyi Yu created SPARK-38384:
--------------------------------

             Summary: Improve error messages of ParseException from ANTLR
                 Key: SPARK-38384
                 URL: https://issues.apache.org/jira/browse/SPARK-38384
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Xinyi Yu


This task aims to improve the ParseException error messages that come 
directly from ANTLR.
h2. Bad Error Messages

Many error messages defined in ANTLR are not user-friendly. For example,
{code:java}
spark.sql("sel 1")
 
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 
'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
 
== SQL ==
sel 1
^^^ {code}
Measured against the [Spark Error Message 
Guidelines|https://spark.apache.org/error-message-guidelines.html], this 
message is vague and hard to follow. It states the ‘What’ but is unclear on 
the ‘Why’ and the ‘How’.

Or,
{code:java}
spark.sql("") // empty query

ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
Instead of simply telling users that the query is empty, it prints a long list 
of expected tokens and even exposes the jargon '<EOF>'.
h2. Where do these error messages come from?

There has been much work on improving ParseException in general (see 
[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala]
 for example). However, many messages like the ones above are defined in ANTLR 
and pass through Spark unmodified.

When ANTLR encounters such an error, it notifies the exception listener with a 
message like ‘mismatched input {} expecting {}’. The Spark exception listener 
_appends_ the line and position to the message, as well as the problematic SQL 
and a ‘^^^’ marker at the error position. It then throws a ParseException 
carrying the augmented message. Spark doesn’t modify the message text produced 
by ANTLR.
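
The flow described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not Spark's actual listener: the names ParseErrorSketch, SketchParseException, format and syntaxError are hypothetical, and padding the marker line with '-' up to the error column is an assumption of this sketch. The point it shows is that the ANTLR message is passed through verbatim and only location information is added around it.

```java
// Hypothetical sketch: it mimics the behaviour described above without
// depending on ANTLR or Spark. All names here are illustrative assumptions.
public class ParseErrorSketch {

    /** Stand-in for Spark's ParseException. */
    public static class SketchParseException extends RuntimeException {
        public SketchParseException(String message) {
            super(message);
        }
    }

    /**
     * Builds the final message: the ANTLR text unchanged, followed by
     * "(line L, pos P)", the SQL text, and a "^^^" marker under the
     * offending line. The '-' padding is an assumption of this sketch.
     */
    public static String format(String antlrMessage, int line, int pos, String sql) {
        StringBuilder sb = new StringBuilder();
        sb.append('\n').append(antlrMessage)
          .append("(line ").append(line).append(", pos ").append(pos).append(")\n\n")
          .append("== SQL ==\n");
        String[] sqlLines = sql.split("\n", -1);
        for (int i = 0; i < sqlLines.length; i++) {
            sb.append(sqlLines[i]).append('\n');
            if (i == line - 1) {
                // Pad to the error column, then mark it.
                for (int p = 0; p < pos; p++) {
                    sb.append('-');
                }
                sb.append("^^^\n");
            }
        }
        return sb.toString();
    }

    /** Plays the role of the listener callback: wrap the message and throw. */
    public static void syntaxError(String antlrMessage, int line, int pos, String sql) {
        throw new SketchParseException(format(antlrMessage, line, pos, sql));
    }

    public static void main(String[] args) {
        try {
            syntaxError("mismatched input 'sel' expecting {'SELECT'}", 1, 0, "sel 1");
        } catch (SketchParseException e) {
            System.out.println("ParseException: " + e.getMessage());
        }
    }
}
```

Because the listener only wraps the text it receives, any improvement to the wording has to happen where the ANTLR message is intercepted, before the position and SQL are appended.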

This task focuses on those error messages from ANTLR.
h2. Goals
 # Improve the ParseException error messages that come from ANTLR, and modify 
all affected test cases accordingly.
 # Make sure the new error message framework is applied in this change.

h2. Proposed Error Messages Change

The proposed changes are described in each sub-task, with concrete before and 
after cases. See the description of each sub-task for more details.

 


