[GitHub] [spark] srowen commented on a change in pull request #26027: [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read

GitBox Mon, 07 Oct 2019 08:48:55 -0700

srowen commented on a change in pull request #26027: [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read
URL: https://github.com/apache/spark/pull/26027#discussion_r332096818


 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala
 ##########
 @@ -79,4 +82,48 @@ object CSVExprUtils {
         throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
     }
   }
+
+  /**
+   * Helper method that converts string representation of a character sequence to actual
+   * delimiter characters. The input is processed in "chunks", and each chunk is converted
+   * by calling [[CSVExprUtils.toChar()]].  A chunk is either:
+   * <ul>
+   *   <li>a backslash followed by another character</li>
+   *   <li>a non-backslash character by itself</li>
+   * </ul>
+   * , in that order of precedence. The result of converting all chunks is returned as
+   * a [[String]].
+   *
+   * <br/><br/>Examples:
+   * <ul><li>`\t` will result in a single tab character as the separator (same as before)
 
 Review comment:
   Yeah, you're right. I'm also questioning why this method is there to parse the delimiter at all. It seems the result of passing `"\\"` would be surprising. It existed even before that change. @MaxGekk, do you recall anything about this part? Why does the string need further unescaping?
   
   Well, we could say that's a separate question and just leave this issue aside here entirely.
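To make the chunking rule in the Scaladoc concrete, here is a minimal Scala sketch of how such a helper could behave. It is not Spark's code: the object `DelimiterSketch`, its methods `toChar` and `toDelimiterStr`, and the particular set of escapes it accepts are hypothetical, chosen only for illustration. Under these assumptions a two-character chunk such as backslash-t becomes a tab, any other single character passes through unchanged, and a lone trailing backslash is rejected, which is the kind of edge case the question about `"\\"` points at.

```scala
// Hypothetical sketch of the chunk-based delimiter unescaping described in the
// Scaladoc above. Names and the supported escapes are assumptions for
// illustration only, not Spark's actual CSVExprUtils implementation.
object DelimiterSketch {

  /** Converts one chunk (a lone char, or a backslash plus one char) to a Char. */
  def toChar(chunk: String): Char = chunk match {
    case "\\t"  => '\t'
    case "\\r"  => '\r'
    case "\\b"  => '\b'
    case "\\f"  => '\f'
    case "\\\"" => '"'
    case "\\'"  => '\''
    case "\\\\" => '\\'
    // A backslash on its own is treated as an incomplete escape sequence.
    case "\\" =>
      throw new IllegalArgumentException("Single backslash is not a valid delimiter chunk")
    // Any other single character is used as-is.
    case one if one.length == 1 => one.charAt(0)
    // Backslash followed by a character this sketch does not support.
    case other =>
      throw new IllegalArgumentException(s"Unsupported escape sequence: $other")
  }

  /** Splits `str` into chunks (backslash+char first, else a single char) and converts each. */
  def toDelimiterStr(str: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < str.length) {
      // A backslash followed by another character forms a two-char chunk;
      // anything else (including a trailing lone backslash) is a one-char chunk.
      val chunkLen = if (str.charAt(i) == '\\' && i + 1 < str.length) 2 else 1
      sb.append(toChar(str.substring(i, i + chunkLen)))
      i += chunkLen
    }
    sb.toString
  }
}
```

For example, `DelimiterSketch.toDelimiterStr("\\t\\t")` (the Scala literal for backslash-t written twice) yields two tab characters, while `"||"` passes through unchanged as a two-character delimiter.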

