Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21380#discussion_r189794437
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -300,14 +302,11 @@ private[csv] object UnivocityParser {
lines
}
- val filteredLines: Iterator[String] =
- CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options)
--- End diff --
Probably, you observed issues in old versions of the uniVocity parser, as
@maropu wrote above. I would propose removing the filtering until we hit cases
where uniVocity's filtering doesn't work as expected. If that happens, we can
submit an issue to uniVocity and revert this change.
> I think we already do such things in Spark side redundantly to make sure
in few places.
I looked at other places where we do similar filtering, but this is the only
place where we filter directly before handing lines to uniVocity.
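For context, here is a minimal sketch of the kind of Spark-side pre-filtering
being removed in this diff (a hypothetical standalone helper, not the actual
`CSVUtils.filterCommentAndEmpty` implementation). uniVocity can perform the
same skipping itself, e.g. via `CsvParserSettings.setSkipEmptyLines` and the
format's comment character, which is why this pre-filter may be redundant:

```scala
// Hedged sketch: drop blank lines and lines starting with the comment
// character before parsing. The comment char defaults to '#' here only
// for illustration; Spark takes it from CSVOptions.
def filterCommentAndEmpty(lines: Iterator[String], comment: Char = '#'): Iterator[String] =
  lines.filter { line =>
    val trimmed = line.trim
    // Keep a line only if it is non-blank and not a comment line.
    trimmed.nonEmpty && trimmed.head != comment
  }
```

Since `Iterator.filter` is lazy, the sketch streams over the input without
materializing it, matching how the parser consumes lines.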
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]