[ https://issues.apache.org/jira/browse/SPARK-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-7389:
-----------------------------------
    Assignee: shimingfei

> Tachyon integration improvement
> -------------------------------
>
>                 Key: SPARK-7389
>                 URL: https://issues.apache.org/jira/browse/SPARK-7389
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>            Reporter: shimingfei
>            Assignee: shimingfei
>             Fix For: 1.5.0
>
>
> Two main changes:
> 1. Add two functions to ExternalBlockManager, putValues and getValues,
> because an implementation may not rely on putBytes and getBytes.
> 2. Improve the Tachyon integration.
> Currently, when putting data into Tachyon, Spark first serializes all data in
> one partition into a ByteBuffer and then writes it into Tachyon. This uses
> a lot of memory and increases GC overhead.
> When getting data from Tachyon, getValues depends on getBytes, which also
> reads all data into an on-heap byte array and results in high memory usage.
> This PR changes the approach of the two functions so that they read and write
> data as a stream, which reduces memory usage.
> In our testing, when the data size is huge, this patch reduces GC time by
> about 30% and full GC time by about 70%, and total execution time drops by
> about 10%.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
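
The difference between the buffered and the streamed approach described in the issue can be sketched roughly as follows. This is only an illustration in plain Java, not the actual Spark or Tachyon code: the class name StreamingBlockSketch, the use of String values, and the DataOutputStream encoding are all assumptions made for the example. The method names putValues and getValues are borrowed from the issue text, but their signatures here are hypothetical.

```java
import java.io.*;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the ExternalBlockManager API from Spark.
public class StreamingBlockSketch {

    // Buffered approach: the whole partition is serialized into one
    // on-heap byte array before anything is written to the store.
    static byte[] serializeAll(Iterator<String> values) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            while (values.hasNext()) {
                out.writeUTF(values.next());
            }
        }
        return buf.toByteArray();
    }

    // Streamed approach: each value is written to the destination stream
    // as it is produced, so only a small buffer lives on the heap.
    static long putValues(Iterator<String> values, OutputStream dest) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(dest));
        long count = 0;
        while (values.hasNext()) {
            out.writeUTF(values.next());
            count++;
        }
        out.flush();
        return count;
    }

    // Streamed read: values are decoded lazily, one per call to next(),
    // instead of first loading the whole block into a byte array.
    static Iterator<String> getValues(InputStream src) {
        final DataInputStream in = new DataInputStream(new BufferedInputStream(src));
        return new Iterator<String>() {
            private String pending = advance();

            private String advance() {
                try {
                    return in.readUTF();
                } catch (EOFException e) {
                    return null; // end of block
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public String next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String value = pending;
                pending = advance();
                return value;
            }
        };
    }
}
```

In this sketch, putValues would be handed the output stream of the external store (for example a file stream), and getValues would wrap the matching input stream. Peak heap use in the streamed path is bounded by the stream buffers rather than by the partition size, which is the effect the issue attributes to the patch.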