Re: [PR] [SPARK-53573][SQL] IDENTIFIER everywhere [spark]

via GitHub Mon, 10 Nov 2025 13:50:47 -0800


dtenedor commented on code in PR #52765:
URL: https://github.com/apache/spark/pull/52765#discussion_r2512015107



##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -833,8 +840,8 @@ hint
     ;
 
 hintStatement
-    : hintName=identifier
-    | hintName=identifier LEFT_PAREN parameters+=primaryExpression (COMMA 
parameters+=primaryExpression)* RIGHT_PAREN
+    : hintName=simpleIdentifier

Review Comment:
   Unless I am missing something, this change could break existing SQL using 
identifiers with session variables, etc. Is this intended?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -3393,13 +3419,18 @@ class AstBuilder extends DataTypeAstBuilder
    * quoted in ``
    */
   override def visitColumnReference(ctx: ColumnReferenceContext): Expression = 
withOrigin(ctx) {
-    ctx.getStart.getText match {
+    // For regex check, we need the original text before identifier-lite 
resolution
+    val originalText = ctx.getStart.getText
+    originalText match {

Review Comment:
   same here



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SubstituteParamsParser.scala:
##########
@@ -186,16 +186,26 @@ class SubstituteParamsParser extends Logging {
 
   /**
    * Apply a list of substitutions to the SQL text.
+   * Inserts a space separator when a parameter is immediately preceded by a 
quote
+   * to avoid back-to-back quotes after substitution.
    */
   private def applySubstitutions(sqlText: String, substitutions: 
List[Substitution]): String = {
     // Sort substitutions by start position in reverse order to avoid offset 
issues
     val sortedSubstitutions = substitutions.sortBy(-_.start)
 
     var result = sqlText
     sortedSubstitutions.foreach { substitution =>
-      result = result.substring(0, substitution.start) +
-        substitution.replacement +
-        result.substring(substitution.end)
+      val prefix = result.substring(0, substitution.start)
+      val replacement = substitution.replacement
+      val suffix = result.substring(substitution.end)
+
+      // Check if replacement is immediately preceded by a quote and doesn't 
already
+      // start with whitespace
+      val needsSpace = substitution.start > 0 &&

Review Comment:
   This is complex, could we break it up into separate `val`s to clarify what 
it is doing?



##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala:
##########
@@ -161,11 +201,110 @@ class DataTypeAstBuilder extends 
SqlBaseParserBaseVisitor[AnyRef] {
   }
 
   /**
-   * Create a multi-part identifier.
+   * Parse a string into a multi-part identifier. This method is intended to 
be overridden by
+   * subclasses that have access to a full SQL parser. The base implementation 
simply returns the
+   * input as a single-part identifier.
+   *
+   * For example, in AstBuilder, this would parse "`catalog`.`schema`.`table`" 
into Seq("catalog",
+   * "schema", "table").
+   *
+   * @param identifier
+   *   The identifier string to parse, potentially containing dots and 
backticks.
+   * @return
+   *   Sequence of identifier parts.
+   */
+  protected def parseMultipartIdentifier(identifier: String): Seq[String] = {
+    // Base implementation: just return the string as a single part.
+    // Subclasses with access to a full parser should override this.

Review Comment:
   Would this method be better left abstract to force subclasses to provide a 
working implementation rather than providing an implementation here that does 
not work as we want it to?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SubstituteParamsParser.scala:
##########
@@ -186,16 +186,26 @@ class SubstituteParamsParser extends Logging {
 
   /**
    * Apply a list of substitutions to the SQL text.
+   * Inserts a space separator when a parameter is immediately preceded by a 
quote
+   * to avoid back-to-back quotes after substitution.
    */
   private def applySubstitutions(sqlText: String, substitutions: 
List[Substitution]): String = {
     // Sort substitutions by start position in reverse order to avoid offset 
issues
     val sortedSubstitutions = substitutions.sortBy(-_.start)
 
     var result = sqlText
     sortedSubstitutions.foreach { substitution =>
-      result = result.substring(0, substitution.start) +
-        substitution.replacement +
-        result.substring(substitution.end)
+      val prefix = result.substring(0, substitution.start)
+      val replacement = substitution.replacement
+      val suffix = result.substring(substitution.end)
+
+      // Check if replacement is immediately preceded by a quote and doesn't 
already
+      // start with whitespace
+      val needsSpace = substitution.start > 0 &&
+        (result(substitution.start - 1) == '\'' || result(substitution.start - 
1) == '"') &&
+        replacement.nonEmpty && !replacement(0).isWhitespace
+
+      result = prefix + (if (needsSpace) " " else "") + replacement + suffix

Review Comment:
   let's use a Scala s-string for this



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -3353,10 +3376,13 @@ class AstBuilder extends DataTypeAstBuilder
    * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
-    val attr = ctx.fieldName.getText
+    // Use getIdentifierText to handle both regular identifiers and 
IDENTIFIER('literal')
+    val attr = getIdentifierText(ctx.fieldName)
     expression(ctx.base) match {
       case unresolved_attr @ UnresolvedAttribute(nameParts) =>
-        ctx.fieldName.getStart.getText match {
+        // For regex check, we need the original text before identifier-lite 
resolution
+        val originalText = ctx.fieldName.getStart.getText
+        originalText match {

Review Comment:
   Is there any point to making this a separate `val`, could we do
   
   ```
   ctx.fieldName.getStart.getText match {
     ...
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-53573][SQL] IDENTIFIER everywhere [spark]

Reply via email to