cloud-fan commented on code in PR #52638:
URL: https://github.com/apache/spark/pull/52638#discussion_r2472902586


##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala:
##########
@@ -45,40 +74,94 @@ class DataTypeAstBuilder extends SqlBaseParserBaseVisitor[AnyRef] {
     withOrigin(ctx)(StructType(visitColTypeList(ctx.colTypeList)))
   }
 
-  override def visitStringLiteralValue(ctx: StringLiteralValueContext): Token =
+  /**
+   * Visits a stringLit context and returns a single token from the first singleStringLit child.
+   *
+   * Note: This base implementation does not coalesce multiple string literals. Coalescing is
+   * handled in AstBuilder where SQL configuration is available to determine the correct escape
+   * processing mode.
+   */
+  override def visitStringLit(ctx: StringLitContext): Token = {
+    if (ctx == null) {
+      return null
+    }
+
+    import scala.jdk.CollectionConverters._
+
+    // Just return the first token. Coalescing happens in AstBuilder.
+    val singleStringLits = ctx.singleStringLit().asScala
+    if (singleStringLits.isEmpty) {
+      null
+    } else {
+      visit(singleStringLits.head).asInstanceOf[Token]

Review Comment:
   Most places simply call `string(singleStringLits(...))` to parse the string 
literal, without looking at any config. Comment parsing is one such place. A 
few places parse string literals differently depending on configs, by calling 
`createString`:
   ```
     private def createString(ctx: StringLiteralContext): String = {
       if (conf.escapedStringLiterals) {
         ctx.stringLit.asScala.map(x => stringWithoutUnescape(visitStringLit(x))).mkString
       } else if (conf.getConf(LEGACY_CONSECUTIVE_STRING_LITERALS)) {
         ctx.stringLit.asScala.map(x => stringIgnoreQuoteQuote(visitStringLit(x))).mkString
       } else {
         ctx.stringLit.asScala.map(x => string(visitStringLit(x))).mkString
       }
     }
   ```
   
   I think we can make `visitStringLit` return `Array[Token]`, and change 
`def string` to take `Array[Token]`. It shouldn't be a massive change, and it 
avoids dropping any of the string literals.
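
   A rough sketch of the shape I have in mind, kept self-contained so it does 
not depend on ANTLR or Spark. `Tok`, `StringLitSketch`, and `unquote` are 
hypothetical stand-ins (for `Token` and the existing unquoting logic), the 
`visitStringLit` here takes the already-extracted tokens rather than a 
`StringLitContext`, and escape processing is left out entirely:
   ```scala
   // Hypothetical stand-in for org.antlr.v4.runtime.Token; only the text matters here.
   final case class Tok(text: String)

   object StringLitSketch {
     // Strip the surrounding quote characters from one literal token.
     // Assumption: no escape handling, unlike the real helpers.
     def unquote(token: Tok): String = {
       val t = token.text
       require(t.length >= 2, s"not a quoted literal: $t")
       t.substring(1, t.length - 1)
     }

     // Stand-in for a visitStringLit that returns every token instead of only the head.
     def visitStringLit(singleStringLits: Seq[Tok]): Array[Tok] =
       singleStringLits.toArray

     // Stand-in for a `string` helper that accepts all tokens and coalesces them.
     def string(tokens: Array[Tok]): String =
       tokens.map(unquote).mkString
   }
   ```
   With this shape, a caller that does not care about configs still writes 
`string(visitStringLit(...))`, but consecutive literals are concatenated 
instead of everything after the first one being dropped.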



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

