srowen commented on a change in pull request #26027: [SPARK-24540][SQL] Support for multiple delimiter in Spark CSV read
URL: https://github.com/apache/spark/pull/26027#discussion_r331660691
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala
 ##########
 @@ -79,4 +81,39 @@ object CSVExprUtils {
         throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
     }
   }
+
+  /**
+   * Helper method that converts string representation of a character sequence to actual
+   * delimiter characters. The input is processed in "chunks", and each chunk is converted
+   * by calling [[CSVExprUtils.toChar()]].  A chunk is either:
+   * <ul>
+   *   <li>a backslash followed by another character</li>
+   *   <li>a non-backslash character by itself</li>
+   * </ul>
+   * , in that order of precedence. The result of the converting all chunks is returned as
 
 Review comment:
   You might throw in an example here. The idea of a chunk is just to account 
for expressing a tab as `\t`, right? That kind of example could clarify this. 
Really, the delimiter is just a string whose backslash escapes have been 
resolved the same way a Scala / Java string literal resolves them (right?).
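
   For illustration, here is a minimal sketch of the chunk rule as I read it 
from the doc comment. The names `DelimiterSketch`, `chunkToChar`, and 
`toDelimiterStr` are hypothetical, and only a few escapes are handled; this is 
not the code in this PR, nor the real `CSVExprUtils.toChar`.

```scala
object DelimiterSketch {
  // Hypothetical stand-in for a per-chunk converter; it covers only a few
  // escapes and is an assumption, not the behavior of CSVExprUtils.toChar.
  def chunkToChar(chunk: String): Char = chunk match {
    case "\\t" => '\t'   // backslash followed by 't' resolves to a tab
    case "\\r" => '\r'   // backslash followed by 'r' resolves to a carriage return
    case "\\\\" => '\\'  // two backslashes resolve to one literal backslash
    case s if s.length == 1 && s != "\\" => s.head // plain character, kept as-is
    case other =>
      throw new IllegalArgumentException(s"Unsupported chunk: $other")
  }

  // Walks the input left to right. A backslash is chunked together with the
  // character after it; any other character forms a chunk on its own.
  def toDelimiterStr(str: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < str.length) {
      val len = if (str.charAt(i) == '\\' && i + 1 < str.length) 2 else 1
      out.append(chunkToChar(str.substring(i, i + len)))
      i += len
    }
    out.toString
  }
}
```

   Under these assumptions, the three-character input `|\t` (pipe, backslash, 
`t`) resolves to a two-character delimiter, a pipe followed by a tab, while 
`||` passes through unchanged.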
   
   This documentation is fine here but is even more important in the docs for 
what the CSV reader accepts.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.