Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21380#discussion_r189823948
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -300,14 +302,11 @@ private[csv] object UnivocityParser {
lines
}
- val filteredLines: Iterator[String] =
- CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options)
--- End diff --
Guys, this seems to me to be speculation about a bug. If the code was
added to work around a bug in a previous version of the uniVocity parser, where
are the tests for that? As far as I understand, @HyukjinKwon, for now we filter
CSV rows in addition to uniVocity only because we don't trust the library.
Following that logic, we should duplicate other functionality of uniVocity in
the CSV datasource as well. That looks pretty crazy.
---