GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21380
[SPARK-24329][SQL] Remove comments filtering before parsing of CSV files
## What changes were proposed in this pull request?
Filtering of comments and whitespace is already performed by the uniVocity parser
according to the parser settings:
https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L178-L180
There is no need to repeat the same filtering before the uniVocity parser runs. In this
PR, I propose to remove the filtering of whitespace and comments (the call of
filterCommentAndEmpty) in the parseIterator method of the UnivocityParser
object.
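For reference, a minimal standalone sketch of how the uniVocity parser itself drops
comment and blank lines when configured the way Spark's CSVOptions configures it. The
'#' comment character, the sample input, and the demo object name are illustrative, not
taken from the PR:

```scala
import java.io.StringReader

import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

object UnivocityCommentFilteringDemo {
  def main(args: Array[String]): Unit = {
    val settings = new CsvParserSettings()
    // Comment character and empty-line handling, analogous to what
    // CSVOptions sets up for Spark's CSV reader.
    settings.getFormat.setComment('#')
    settings.setSkipEmptyLines(true)

    val parser = new CsvParser(settings)
    val input =
      """# a comment line that uniVocity drops on its own
        |
        |1,foo
        |2,bar""".stripMargin

    // Only the two data rows come back; the comment and the blank line
    // never reach the caller, so pre-filtering them in Spark is redundant.
    val rows = parser.parseAll(new StringReader(input))
    rows.forEach(r => println(r.mkString(",")))
  }
}
```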
## How was this patch tested?
The changes are covered by the existing tests in CSVSuite.
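As a quick way to exercise the same code path through the public API, one could read a
CSV file that contains comment lines; the file path and comment character below are
illustrative, not part of the test suite:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-comment-filtering-check")
  .master("local[*]")
  .getOrCreate()

// The "comment" option tells the CSV reader which lines to skip;
// after this PR the skipping is done only by the uniVocity parser.
val df = spark.read
  .option("comment", "#")
  .option("header", "true")
  .csv("/path/to/data.csv")

df.show()
```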
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 delete-comment-filtering
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21380.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21380
----
commit 014cfe9542f8ccdc634097fbebda21b9eb99ab7b
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-20T21:16:02Z
Removing of unnecessary comments filtering.
commit 36522689f9579ec05e7d69d1d7bd1f507f6bdbc0
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-20T21:18:20Z
Cleanup remove of comments
----
---