[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

MaxGekk Sun, 15 Apr 2018 06:24:38 -0700

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/20937
  
    @HyukjinKwon Actually, performance degrades because of `InputStreamReader`.
The cost of `ByteArrayInputStream` is relatively small. As you can see in the
screenshot below, `InputStreamReader` allocates memory on the heap and decodes
each row separately, which takes a significant amount of time:
    <img width="966" alt="screen shot 2018-04-15 at 2 33 25 pm" 
src="https://user-images.githubusercontent.com/1580697/38778900-b83894c6-40c0-11e8-84f4-b015c5e279fe.png">
    
    I added benchmarks that measure the per-line overhead. You can see the
numbers in the comments.
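
    To make the point about per-row cost concrete, here is a minimal Java
sketch. It is not the benchmark added in the PR: the class and method names
(`PerRowDecoding`, `decodeViaReader`, `sumViaStream`) and the sample row are
hypothetical, and the timing loop is naive (no warm-up, no JMH), so the printed
numbers are only a rough indication. It contrasts wrapping each row in a fresh
`InputStreamReader` with reading the same bytes through a bare
`ByteArrayInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not taken from the Spark code base.
class PerRowDecoding {

    // Decodes one row by wrapping its bytes in a fresh InputStreamReader.
    // Each call creates a new reader with its own decoder and buffer.
    static String decodeViaReader(byte[] row) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(row), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder(row.length);
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Baseline: reads the same bytes through ByteArrayInputStream only,
    // without any charset decoding.
    static int sumViaStream(byte[] row) {
        ByteArrayInputStream in = new ByteArrayInputStream(row);
        int sum = 0;
        int b;
        while ((b = in.read()) != -1) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed sample row; real input rows will differ.
        byte[] row = "{\"a\": 1, \"b\": \"text\"}".getBytes(StandardCharsets.UTF_8);
        int rows = 1_000_000;

        long t0 = System.nanoTime();
        long chars = 0;
        for (int i = 0; i < rows; i++) {
            chars += decodeViaReader(row).length();
        }
        long t1 = System.nanoTime();

        long bytes = 0;
        for (int i = 0; i < rows; i++) {
            bytes += sumViaStream(row);
        }
        long t2 = System.nanoTime();

        System.out.printf("reader per row: %d ms (%d chars)%n",
                (t1 - t0) / 1_000_000, chars);
        System.out.printf("stream only:    %d ms (checksum %d)%n",
                (t2 - t1) / 1_000_000, bytes);
    }
}
```

    For measurements you intend to rely on, a harness such as JMH would be a
better fit than this hand-rolled loop.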


