Xinyi Yu created SPARK-38384:
--------------------------------

Summary: Improve error messages of ParseException from ANTLR
Key: SPARK-38384
URL: https://issues.apache.org/jira/browse/SPARK-38384
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.0
Reporter: Xinyi Yu

This task is intended to improve the error messages of ParseException that come directly from ANTLR.

h2. Bad Error Messages

Many error messages defined in ANTLR are not user-friendly. For example,

{code:java}
spark.sql("sel 1")

ParseException:
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
sel 1
^^^
{code}

Measured against the [Spark Error Message Guidelines|https://spark.apache.org/error-message-guidelines.html], this message is vague and hard to follow: it states the ‘What’, but is unclear on the ‘Why’ and ‘How’.

Or,

{code:java}
spark.sql("") // empty query

ParseException:
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^
{code}

Instead of simply telling users that the query is empty, it outputs a long message and even exposes the parser jargon '<EOF>'.

h2. Where do these error messages come from?

There has been much work on improving ParseException in general (see [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala] for example).
However, many of the error messages above are defined in ANTLR and remain unmodified in Spark. When ANTLR encounters such an error, it notifies the exception listener with a message like ‘mismatched input {} expecting {}’. The Spark exception listener _appends_ the line and position to the message, along with the problematic SQL and a ‘^^^’ marker at the error position. It then throws a ParseException with the resulting message. Spark does not modify the message text that ANTLR provides. This task focuses on those error messages from ANTLR.

h2. Goals

# Improve the ParseException error messages that come from ANTLR, and modify all affected test cases accordingly.
# Make sure the new error message framework is applied in this change.

h2. Proposed Error Message Changes

Each sub-task should include concrete before and after cases. See the description of each sub-task for more details.
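
To make the flow described under "Where do these error messages come from?" more concrete, here is a minimal, self-contained sketch of the two steps involved: taking the raw ANTLR text and then appending the position, the SQL, and the ‘^^^’ marker. It does not use the ANTLR or Spark APIs. The class name {{ParseErrorSketch}}, the methods {{rewrite}} and {{withContext}}, and the replacement wording are all assumptions made for illustration; the actual wording is left to the sub-tasks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names are Spark or ANTLR APIs.
class ParseErrorSketch {
    // Shape of the ANTLR message quoted in this issue:
    //   mismatched input 'X' expecting {...}
    private static final Pattern MISMATCHED =
        Pattern.compile("mismatched input '(.*?)' expecting .*", Pattern.DOTALL);

    // Rewrites the raw ANTLR text into shorter wording (illustrative wording only).
    static String rewrite(String antlrMessage) {
        Matcher m = MISMATCHED.matcher(antlrMessage);
        if (!m.matches()) {
            return antlrMessage; // unknown shape: pass through unchanged
        }
        String token = m.group(1);
        if (token.equals("<EOF>")) {
            return "Syntax error: unexpected end of input";
        }
        return "Syntax error at or near '" + token + "'";
    }

    // Mimics the append step: position, then the SQL text with a ^^^ marker
    // under the offending line (line is 1-based, pos is 0-based).
    static String withContext(String message, int line, int pos, String sql) {
        String[] sqlLines = sql.split("\n", -1);
        StringBuilder sb = new StringBuilder();
        sb.append(message)
          .append(" (line ").append(line).append(", pos ").append(pos).append(")\n\n");
        sb.append("== SQL ==\n");
        for (int i = 0; i < sqlLines.length; i++) {
            sb.append(sqlLines[i]).append('\n');
            if (i == line - 1) {
                sb.append(" ".repeat(pos)).append("^^^\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shortened expecting-list; the real one is quoted in the examples above.
        String raw = "mismatched input 'sel' expecting {'(', 'SELECT', 'WITH'}";
        System.out.println(withContext(rewrite(raw), 1, 0, "sel 1"));
    }
}
```

In this sketch the rewrite happens before the position and SQL are appended, so the existing append step would not need to change. Whether the real change belongs in the listener or elsewhere is not specified in this issue.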