Re: [PR] [SPARK-53573][SQL] IDENTIFIER everywhere [spark]

via GitHub Wed, 12 Nov 2025 20:53:35 -0800


dongjoon-hyun commented on code in PR #52765:
URL: https://github.com/apache/spark/pull/52765#discussion_r2521451826



##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala:
##########
@@ -60,12 +60,52 @@ import org.apache.spark.sql.types.{ArrayType, BinaryType, BooleanType, ByteType,
  *
  * @see
  *   [[org.apache.spark.sql.catalyst.parser.AstBuilder]] for the full SQL statement parser
+ *
+ * ==CRITICAL: Extracting Identifier Names==
+ *
+ * When extracting identifier names from parser contexts, you MUST use the helper methods provided
+ * by this class instead of calling ctx.getText() directly:
+ *
+ *   - '''getIdentifierText(ctx)''': For single identifiers (column names, aliases, window names)
+ *   - '''getIdentifierParts(ctx)''': For qualified identifiers (table names, schema.table)
+ *
+ * '''DO NOT use ctx.getText() or ctx.identifier.getText()''' directly! These methods do not
+ * handle the IDENTIFIER('literal') syntax and will cause incorrect behavior.
+ *
+ * The IDENTIFIER('literal') syntax allows string literals to be used as identifiers at parse time
+ * (e.g., IDENTIFIER('my_col') resolves to the identifier my_col). If you use getText(), you'll
+ * get the raw text "IDENTIFIER('my_col')" instead of "my_col", breaking the feature.
+ *
+ * Example:
+ * {{{
+ *   // WRONG - does not handle IDENTIFIER('literal'):
+ *   val name = ctx.identifier.getText

Review Comment:
   For my understanding, is this always wrong? What about the existing code in `AstBuilder.scala` and `SparkSqlParser.scala`, such as the following?
   
   ```
   $ git grep ctx.identifier.getText
   sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: * '''DO NOT use ctx.getText() or ctx.identifier.getText()''' directly! These methods do not
   sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: *   val name = ctx.identifier.getText
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:    val collationName = ctx.identifier.getText
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:      lazy val name: String = ctx.identifier.getText
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:      lazy val name: String = ctx.identifier.getText
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:        ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") {
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:      ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") {
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:    val indexName = ctx.identifier.getText
   sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala:    ctx.identifier.getText.toLowerCase(Locale.ROOT) match {
   sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala:    ctx.identifier.getText.toLowerCase(Locale.ROOT) match {
   ```
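
To make the concern concrete, here is a minimal standalone sketch of the difference the new Scaladoc describes. It is not the PR's implementation: `FakeIdentifierCtx` and `identifierText` are hypothetical names, there is no ANTLR dependency, and the regex is only an assumed stand-in for whatever `getIdentifierText(ctx)` actually does.

```scala
object IdentifierTextSketch {
  // Hypothetical stand-in for a parser context; it only carries the raw source text.
  final case class FakeIdentifierCtx(rawText: String) {
    def getText: String = rawText
  }

  // Matches IDENTIFIER('literal') case-insensitively.
  // Assumption: the literal contains no embedded single quotes.
  private val IdentifierLiteral = """(?i)IDENTIFIER\(\s*'([^']*)'\s*\)""".r

  // Hypothetical helper: unwrap IDENTIFIER('literal'), otherwise return the text unchanged.
  def identifierText(ctx: FakeIdentifierCtx): String = ctx.getText match {
    case IdentifierLiteral(literal) => literal
    case plain => plain
  }

  def main(args: Array[String]): Unit = {
    val wrapped = FakeIdentifierCtx("IDENTIFIER('my_col')")
    val plain = FakeIdentifierCtx("noscan")

    println(wrapped.getText)         // IDENTIFIER('my_col')
    println(identifierText(wrapped)) // my_col
    println(identifierText(plain))   // noscan
  }
}
```

Under this model, `getText` and the helper only disagree when the identifier is written in the `IDENTIFIER('...')` form, so the question for each call site above is whether the grammar can produce that form there.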



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

