srowen commented on a change in pull request #29516:
URL: https://github.com/apache/spark/pull/29516#discussion_r476151826
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala
##########
@@ -25,16 +25,21 @@ object CSVExprUtils {
* This is currently being used in CSV reading path and CSV schema inference.
*/
def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions):
Iterator[String] = {
- iter.filter { line =>
- line.trim.nonEmpty && !line.startsWith(options.comment.toString)
+ if (options.isCommentSet) {
+ val commentPrefix = options.comment.toString
+ iter.filter { line =>
+ line.trim.nonEmpty && !line.startsWith(commentPrefix)
+ }
+ } else {
+ iter.filter(_.trim.nonEmpty)
}
}
def skipComments(iter: Iterator[String], options: CSVOptions):
Iterator[String] = {
if (options.isCommentSet) {
val commentPrefix = options.comment.toString
iter.dropWhile { line =>
- line.trim.isEmpty || line.trim.startsWith(commentPrefix)
+ line.trim.isEmpty || line.startsWith(commentPrefix)
Review comment:
I think the existing logic matches the logic of
https://github.com/apache/spark/pull/29516/files#diff-7faa93f00223527237747227998e30f1R27
? Maybe I'm missing your point. The logic has always been to drop lines that
are empty after trimming, regardless of the comment char. Right or wrong,
that's a separate question.
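A minimal standalone sketch of the behavior being discussed (object and sample names are illustrative, not from the PR): lines that are empty after trimming are always dropped, while the comment check uses `startsWith` on the untrimmed line, so an indented comment is not filtered out:

```scala
// Sketch of the filter logic under discussion; not the actual Spark code.
object FilterSketch {
  // commentPrefix = None models the "comment char not set" case.
  def filterCommentAndEmpty(
      iter: Iterator[String],
      commentPrefix: Option[String]): Iterator[String] =
    commentPrefix match {
      case Some(prefix) =>
        // Empty-after-trim lines are dropped unconditionally; the comment
        // check applies startsWith to the raw, untrimmed line.
        iter.filter(line => line.trim.nonEmpty && !line.startsWith(prefix))
      case None =>
        iter.filter(_.trim.nonEmpty)
    }

  def main(args: Array[String]): Unit = {
    val lines = Iterator("# comment", "   ", "a,b,c", "  # indented")
    // "  # indented" survives because startsWith sees the leading spaces.
    println(filterCommentAndEmpty(lines, Some("#")).toList)
  }
}
```

Note that under this reading, an indented comment line passes the filter, which is the ambiguity the thread is circling: whether the prefix check should run before or after trimming.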
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]