Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21892
@jbax It became much faster:
```
Parsing quoted values:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
One quoted string                   33411 / 33510          0.0      668211.4       1.0X

Wide rows with 1000 columns:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                 88028 / 89311          0.0       88028.1       1.0X
Select 100 columns                  29010 / 32755          0.0       29010.1       3.0X
Select one column                   22936 / 22953          0.0       22936.5       3.8X
count()                             22790 / 23143          0.0       22789.6       3.9X
```
The `count()` benchmark is still slower because I reverted the optimization
for an empty schema. Previously, we didn't call `uniVocity`'s `parseLine` if the
set of selected indexes was empty. In this PR, I call `parseLine` for the empty
set as well, since the bug (https://github.com/uniVocity/univocity-parsers/issues/250)
has been fixed. It seems to perform similarly to the case where only one column
is selected. So, the overhead per line is around 15.5 milliseconds on my CPU.
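
As a quick sanity check on the numbers above, here is a minimal Python sketch that recomputes the `Relative` column from the best times and measures how close `count()` is to the one-column case. The best times are copied from the benchmark output; the helper name `relative_speedups` and the choice of "Select 1000 columns" as the baseline are assumptions made for illustration and are not part of the Spark benchmark code.

```python
# Best times in milliseconds, copied from the "Wide rows" benchmark output.
best_ms = {
    "Select 1000 columns": 88028,
    "Select 100 columns": 29010,
    "Select one column": 22936,
    "count()": 22790,
}


def relative_speedups(times, baseline):
    """Hypothetical helper: speedup of each case relative to the baseline case."""
    base = times[baseline]
    return {name: round(base / t, 1) for name, t in times.items()}


# Assumption: the Relative column is measured against the first row.
speedups = relative_speedups(best_ms, "Select 1000 columns")

# How much faster count() is than selecting a single column, in percent.
one_col = best_ms["Select one column"]
gap_pct = 100 * (one_col - best_ms["count()"]) / one_col

for name, s in speedups.items():
    print(f"{name}: {s}X")
print(f"count() vs. one column: {gap_pct:.1f}% faster")
```

The recomputed ratios match the `Relative` column (1.0X, 3.0X, 3.8X, 3.9X), and the gap between `count()` and a single selected column comes out below 1%, which is consistent with the observation that the two cases perform similarly.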