dongjoon-hyun commented on code in PR #52765:
URL: https://github.com/apache/spark/pull/52765#discussion_r2521451826
##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala:
##########
@@ -60,12 +60,52 @@ import org.apache.spark.sql.types.{ArrayType, BinaryType, BooleanType, ByteType,
*
* @see
* [[org.apache.spark.sql.catalyst.parser.AstBuilder]] for the full SQL statement parser
+ *
+ * ==CRITICAL: Extracting Identifier Names==
+ *
+ * When extracting identifier names from parser contexts, you MUST use the helper methods provided
+ * by this class instead of calling ctx.getText() directly:
+ *
+ * - '''getIdentifierText(ctx)''': For single identifiers (column names, aliases, window names)
+ * - '''getIdentifierParts(ctx)''': For qualified identifiers (table names, schema.table)
+ *
+ * '''DO NOT use ctx.getText() or ctx.identifier.getText()''' directly! These methods do not
+ * handle the IDENTIFIER('literal') syntax and will cause incorrect behavior.
+ *
+ * The IDENTIFIER('literal') syntax allows string literals to be used as identifiers at parse time
+ * (e.g., IDENTIFIER('my_col') resolves to the identifier my_col). If you use getText(), you'll
+ * get the raw text "IDENTIFIER('my_col')" instead of "my_col", breaking the feature.
+ *
+ * Example:
+ * {{{
+ * // WRONG - does not handle IDENTIFIER('literal'):
+ * val name = ctx.identifier.getText
Review Comment:
For my understanding, is this always wrong? What about the existing call
sites in `AstBuilder.scala` and `SparkSqlParser.scala`, such as the
following?
```
$ git grep ctx.identifier.getText
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: * '''DO NOT use ctx.getText() or ctx.identifier.getText()''' directly! These methods do not
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: * val name = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: val collationName = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: lazy val name: String = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: lazy val name: String = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") {
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") {
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: val indexName = ctx.identifier.getText
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) match {
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) match {
```
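To make the question concrete, here is a minimal sketch of the difference the
Scaladoc describes. Everything in it is a hypothetical stand-in: `IdentCtx`,
`PlainIdent`, `IdentifierLiteral`, and `identifierText` are illustration-only
names, not Spark's ANTLR-generated contexts or the real `getIdentifierText`.
It only shows that the two paths agree for a plain identifier and diverge for
IDENTIFIER('literal'). Whether the call sites listed above can receive the
IDENTIFIER('literal') form at all depends on the grammar rule behind each
`ctx.identifier`, which this sketch does not model.

```scala
object IdentifierTextSketch {
  // Hypothetical stand-ins for parser contexts (assumption, not Spark classes).
  sealed trait IdentCtx { def getText: String }

  // A plain identifier: the raw source text is the name itself.
  final case class PlainIdent(name: String) extends IdentCtx {
    def getText: String = name
  }

  // IDENTIFIER('literal'): the raw source text still carries the wrapper.
  final case class IdentifierLiteral(literal: String) extends IdentCtx {
    def getText: String = s"IDENTIFIER('$literal')"
  }

  // Hypothetical helper playing the role the Scaladoc assigns to
  // getIdentifierText: unwrap the literal form, pass plain names through.
  def identifierText(ctx: IdentCtx): String = ctx match {
    case PlainIdent(name)           => name
    case IdentifierLiteral(literal) => literal
  }

  def main(args: Array[String]): Unit = {
    val plain = PlainIdent("noscan")
    val viaLiteral = IdentifierLiteral("my_col")

    // Plain identifier: getText and the helper return the same string.
    assert(plain.getText == identifierText(plain))

    // IDENTIFIER('literal'): getText keeps the wrapper, the helper does not.
    assert(viaLiteral.getText == "IDENTIFIER('my_col')")
    assert(identifierText(viaLiteral) == "my_col")
  }
}
```

Under these assumptions, `getText` is only wrong where the grammar lets the
IDENTIFIER('literal') form reach that context, which is the distinction the
question above is asking about.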
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]