srowen commented on a change in pull request #26027: [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read
URL: https://github.com/apache/spark/pull/26027#discussion_r332035147
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala
##########
@@ -79,4 +82,48 @@ object CSVExprUtils {
throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
}
}
+
+ /**
+ * Helper method that converts string representation of a character sequence to actual
+ * delimiter characters. The input is processed in "chunks", and each chunk is converted
+ * by calling [[CSVExprUtils.toChar()]]. A chunk is either:
+ * <ul>
+ * <li>a backslash followed by another character</li>
+ * <li>a non-backslash character by itself</li>
+ * </ul>
+ * , in that order of precedence. The result of the converting all chunks is returned as
+ * a [[String]].
+ *
+ * <br/><br/>Examples:
+ * <ul><li>`\t` will result in a single tab character as the separator (same as before)
Review comment:
Yeah, it's a fair point, @jeff303; won't this already be unescaped by virtue
of being a string literal? If I pass `"\\"`, the string is a single backslash
by the time it gets here. That would yield an error rather than specify a
single backslash as the delimiter.
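
To make the concern concrete, here is a minimal sketch, assuming a chunk parser along the lines of the Scaladoc quoted above. `DelimiterSketch.toDelimiterSketch` is a hypothetical stand-in, not the method from this pull request, and it only knows the `\t` and `\\` escapes. It shows that the Scala literal `"\\"` is one character at runtime, so a parser that requires a character after every backslash would reject it.

```scala
object DelimiterSketch {
  // Hypothetical illustration only: NOT the implementation under review.
  // Walks the input in "chunks": a backslash plus the next character,
  // or a single non-backslash character.
  def toDelimiterSketch(str: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < str.length) {
      if (str.charAt(i) == '\\') {
        // A backslash needs a following character to form a chunk.
        if (i + 1 >= str.length) {
          throw new IllegalArgumentException("Dangling backslash in delimiter: " + str)
        }
        val unescaped: Char = str.charAt(i + 1) match {
          case 't'   => '\t'
          case '\\'  => '\\'
          case other =>
            throw new IllegalArgumentException("Unsupported escape: \\" + other)
        }
        out.append(unescaped)
        i += 2
      } else {
        out.append(str.charAt(i))
        i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // The compiler has already unescaped the literal: one backslash at runtime.
    assert("\\".length == 1)
    // Two runtime characters (backslash, 't') become a tab.
    assert(toDelimiterSketch("\\t") == "\t")
    // Two runtime backslashes become one backslash.
    assert(toDelimiterSketch("\\\\") == "\\")
    // A lone runtime backslash has nothing to pair with, so it is rejected.
    assert(scala.util.Try(toDelimiterSketch("\\")).isFailure)
  }
}
```

Under these assumptions, a caller who wants a single backslash as the delimiter would have to write `"\\\\"` in Scala source, which is the double-unescaping the comment is asking about.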