Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21380#discussion_r190023523
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -300,14 +302,11 @@ private[csv] object UnivocityParser {
lines
}
- val filteredLines: Iterator[String] =
- CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options)
--- End diff --
Actually, the test from #21394 shows a case where this PR behaves
differently: when a line consists only of whitespace and
`ignoreLeadingWhiteSpace` is `false` (the default), this PR produces `null`s.
The uniVocity parser can skip whitespace-only lines only when
`ignoreLeadingWhiteSpace` (or `ignoreTrailingWhiteSpace`) is set to `true`.
So no combination of CSV options reproduces the default behavior of the
current implementation. I would like to propose closing this PR and adding
the test from #21394 to CSVSuite, to make sure we do not break the behavior
described above.
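
To illustrate the difference, here is a minimal sketch in plain Scala with no Spark or uniVocity dependency. The object and method names are hypothetical and are not Spark's actual code. The first filter assumes the removed pre-parse step drops any line that is empty after trimming. The second assumes that, without whitespace trimming enabled, only truly empty lines are skipped.

```scala
// Hypothetical sketch; names and logic are assumptions, not Spark internals.
object WhitespaceLineSketch {
  // Assumed behavior of the pre-parse filter: drop comment lines and
  // any line that is empty once surrounding whitespace is trimmed.
  def filterCommentAndBlank(lines: Iterator[String], comment: Char): Iterator[String] =
    lines.filter(line => line.trim.nonEmpty && !line.startsWith(comment.toString))

  // Assumed behavior without trimming: only zero-length lines are dropped,
  // so a line such as "   " survives and still reaches the parser.
  def filterStrictlyEmpty(lines: Iterator[String]): Iterator[String] =
    lines.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val input = Seq("a,b", "   ", "", "c,d")
    println(filterCommentAndBlank(input.iterator, '#').toList)
    println(filterStrictlyEmpty(input.iterator).toList)
  }
}
```

Under these assumptions the whitespace-only line passes the second filter but not the first, which is the kind of line the test from #21394 exercises.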
---