GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21969
[SPARK-24945][SQL] Switching to uniVocity 2.7.3
## What changes were proposed in this pull request?
In this PR, I propose upgrading the uniVocity parser from **2.6.3** to
**2.7.3**. The newer version includes a fix for the SPARK-24645 issue and is
slightly faster in the wide-row CSV benchmarks below.
Before changes:
```
Parsing quoted values:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
One quoted string                            33336 / 34122          0.0      666727.0       1.0X

Wide rows with 1000 columns:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                          90287 / 91713          0.0       90286.9       1.0X
Select 100 columns                           31826 / 36589          0.0       31826.4       2.8X
Select one column                            25738 / 25872          0.0       25737.9       3.5X
count()                                        6931 / 7269          0.1        6931.5      13.0X
```
After changes:
```
Parsing quoted values:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
One quoted string                            33411 / 33510          0.0      668211.4       1.0X

Wide rows with 1000 columns:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                          88028 / 89311          0.0       88028.1       1.0X
Select 100 columns                           29010 / 32755          0.0       29010.1       3.0X
Select one column                            22936 / 22953          0.0       22936.5       3.8X
count()                                        6657 / 6740          0.2        6656.6      13.5X
```
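For a quick read of the two tables, the relative change in best time per case can be computed as in the sketch below. It is only an illustration and not part of the patch: the helper name `percent_change` is hypothetical, and the numbers are the "Best" times copied from the tables above. These are single benchmark runs, so small differences may be noise.

```python
# Best times in ms, copied from the "Before changes" and "After changes" tables.
before = {
    "One quoted string": 33336,
    "Select 1000 columns": 90287,
    "Select 100 columns": 31826,
    "Select one column": 25738,
    "count()": 6931,
}
after = {
    "One quoted string": 33411,
    "Select 1000 columns": 88028,
    "Select 100 columns": 29010,
    "Select one column": 22936,
    "count()": 6657,
}


def percent_change(old_ms, new_ms):
    """Relative change in percent; a negative value means the new run is faster."""
    return (new_ms - old_ms) / old_ms * 100.0


changes = {name: percent_change(before[name], after[name]) for name in before}

for name, pct in changes.items():
    print(f"{name:<22}{pct:+6.1f}%")
```

The wide-row cases all come out negative (faster), while the quoted-string case is essentially unchanged.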
Closes #21892
## How was this patch tested?
It was tested with `CSVSuite` and `CSVBenchmarks`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 univocity-2_7_3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21969.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21969
----
commit 7b569ae1318316129d4b0d46969b02324b18b0aa
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-27T11:59:39Z
Bumping version of uniVocity parser up to 2.7.2
commit b116987d9a0adb887201177d41c1b94e6f5aeb63
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-27T13:25:11Z
Call uniVocity even the set of selected columns is empty
commit 3fb9cf76df65abe14dd39d233d18242e72e0a729
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-02T09:14:27Z
Bumping version to 2.7.3
commit a053994bcc6027668f64c9e55d09dfaa45cb97cf
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-02T09:14:48Z
Revert "Call uniVocity even the set of selected columns is empty"
This reverts commit b116987d9a0adb887201177d41c1b94e6f5aeb63.
----