srowen commented on a change in pull request #29516:
URL: https://github.com/apache/spark/pull/29516#discussion_r476151826
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala
##########
@@ -25,16 +25,21 @@ object CSVExprUtils {
* This is currently being used in CSV reading path and CSV schema inference.
*/
def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions):
Iterator[String] = {
- iter.filter { line =>
- line.trim.nonEmpty && !line.startsWith(options.comment.toString)
+ if (options.isCommentSet) {
+ val commentPrefix = options.comment.toString
+ iter.filter { line =>
+ line.trim.nonEmpty && !line.startsWith(commentPrefix)
+ }
+ } else {
+ iter.filter(_.trim.nonEmpty)
}
}
def skipComments(iter: Iterator[String], options: CSVOptions):
Iterator[String] = {
if (options.isCommentSet) {
val commentPrefix = options.comment.toString
iter.dropWhile { line =>
- line.trim.isEmpty || line.trim.startsWith(commentPrefix)
+ line.trim.isEmpty || line.startsWith(commentPrefix)
Review comment:
I think the existing logic matches the logic of
https://github.com/apache/spark/pull/29516/files#diff-7faa93f00223527237747227998e30f1R27
? Maybe I'm missing your point. The logic has always been to drop lines that
are empty after trimming, regardless of the comment char. Right or wrong,
that's a separate question.
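A minimal standalone sketch of the behavior being discussed (object and sample names are illustrative, not from the PR): lines that are empty after trimming are always dropped, while the comment check uses `startsWith` on the untrimmed line, so an indented comment is not filtered out:

```scala
// Sketch of the filter logic under discussion; not the actual Spark code.
object FilterSketch {
  // commentPrefix = None models the "comment char not set" case.
  def filterCommentAndEmpty(
      iter: Iterator[String],
      commentPrefix: Option[String]): Iterator[String] =
    commentPrefix match {
      case Some(prefix) =>
        // Empty-after-trim lines are dropped unconditionally; the comment
        // check applies startsWith to the raw, untrimmed line.
        iter.filter(line => line.trim.nonEmpty && !line.startsWith(prefix))
      case None =>
        iter.filter(_.trim.nonEmpty)
    }

  def main(args: Array[String]): Unit = {
    val lines = Iterator("# comment", "   ", "a,b,c", "  # indented")
    // "  # indented" survives because startsWith sees the leading spaces.
    println(filterCommentAndEmpty(lines, Some("#")).toList)
  }
}
```

Note that under this reading, an indented comment line passes the filter, which is the ambiguity the thread is circling: whether the prefix check should run before or after trimming.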
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]