GitHub user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20937

@HyukjinKwon Actually, performance degrades because of `InputStreamReader`; the cost of `ByteArrayInputStream` is relatively small. As you can see in the screenshot below, `InputStreamReader` allocates memory on the heap and performs decoding for each row, which takes a significant amount of time:

<img width="966" alt="screen shot 2018-04-15 at 2 33 25 pm" src="https://user-images.githubusercontent.com/1580697/38778900-b83894c6-40c0-11e8-84f4-b015c5e279fe.png">

I added benchmarks that measure the per-line overhead. You can see the numbers in the comments.
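
To make the per-row cost concrete, here is a minimal standalone sketch. It is not the benchmark added in the PR, and the class and method names (`PerRowDecoding`, `decodeViaReader`, `decodeDirect`) are hypothetical. It assumes UTF-8 input and compares creating a fresh `InputStreamReader` over a `ByteArrayInputStream` for every row against decoding the same bytes with a single `new String(bytes, charset)` call. The direct call is only one possible point of comparison, not necessarily what the PR does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class PerRowDecoding {

    // Decodes one row by wrapping its bytes in a new InputStreamReader.
    // Each call allocates a reader together with its internal decoder and buffers.
    public static String decodeViaReader(byte[] row) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(row), StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Decodes one row directly from the byte array, without a reader.
    public static String decodeDirect(byte[] row) {
        return new String(row, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample row and row count, chosen only for illustration.
        byte[] row = "{\"a\": 1, \"b\": \"text\"}".getBytes(StandardCharsets.UTF_8);
        int rows = 1_000_000;
        long sink = 0; // accumulate lengths so the JIT cannot discard the work

        long t0 = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            sink += decodeViaReader(row).length();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            sink += decodeDirect(row).length();
        }
        long t2 = System.nanoTime();

        System.out.println("InputStreamReader per row: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("new String per row:        " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("total chars decoded:       " + sink);
    }
}
```

This is a rough timing loop without JIT warm-up or a harness such as JMH, so the absolute numbers are indicative at best; a profiler or a proper benchmark is needed to confirm where the time goes.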