Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20937
@HyukjinKwon Actually, the performance degrades because of `InputStreamReader`.
The cost of `ByteArrayInputStream` is small by comparison. As you can see in the
screenshot below, `InputStreamReader` allocates memory on the heap and
decodes each row separately, which takes a significant amount of time:
<img width="966" alt="screen shot 2018-04-15 at 2 33 25 pm"
src="https://user-images.githubusercontent.com/1580697/38778900-b83894c6-40c0-11e8-84f4-b015c5e279fe.png">
I added benchmarks that measure the per-line overhead. You can see the
numbers in the comments.
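
To illustrate where the per-row cost comes from, here is a minimal standalone sketch. It is not the code from this PR or from Spark; the class name `PerRowDecoding` and both helper methods are hypothetical. It contrasts creating a fresh `InputStreamReader` over a `ByteArrayInputStream` for every row with decoding the bytes directly. The timing loop is only a rough indication and is no substitute for a proper benchmark with warm-up:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from Spark.
final class PerRowDecoding {

    // Decodes one row through a fresh InputStreamReader.
    // Every call allocates a new reader together with its decoder state.
    public static String decodeViaReader(byte[] row) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(row), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Decodes one row directly, with no intermediate reader.
    public static String decodeDirect(byte[] row) {
        return new String(row, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = "{\"a\": 1, \"b\": \"text\"}".getBytes(StandardCharsets.UTF_8);
        int rows = 1_000_000;
        long checksum = 0; // keeps the JIT from discarding the loops

        long t0 = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            checksum += decodeViaReader(row).length();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            checksum += decodeDirect(row).length();
        }
        long t2 = System.nanoTime();

        System.out.println("via reader: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("direct:     " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum:   " + checksum);
    }
}
```

Both methods return the same string for a given row; the difference is only in how much work and allocation happens per call.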