[ https://issues.apache.org/jira/browse/SPARK-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen resolved SPARK-2496.
-------------------------------
    Resolution: Incomplete

Resolving as "Incomplete"; if we still want to do this, we should wait until we have a specific, concrete use case and a list of the things that need to be changed.

> Compression streams should write its codec info to the stream
> -------------------------------------------------------------
>
>                 Key: SPARK-2496
>                 URL: https://issues.apache.org/jira/browse/SPARK-2496
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Priority: Critical
>
> Spark sometimes stores compressed data outside of Spark (e.g. event logs,
> blocks in Tachyon), and that data is read back directly using the codec
> configured by the user. When the codec differs between runs, Spark wouldn't
> be able to read the data back.
> I'm not sure what the best strategy here is yet. If we write the codec
> identifier for all streams, then we will be writing a lot of identifiers for
> shuffle blocks. One possibility is to only write it for blocks that will be
> shared across different Spark instances (i.e. managed outside of Spark),
> which includes Tachyon blocks and event log blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
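The proposal in the issue description, writing a codec identifier into the stream so a reader does not depend on the writer's configured codec, can be sketched generically. The following is a hypothetical illustration in Python, not Spark's actual wire format or codec registry: the codec table, the one-byte header, and the function names are all assumptions made for the example, with zlib standing in for Spark's compression codecs.

```python
import zlib

# Hypothetical codec registry: id -> (name, compress fn, decompress fn).
# Spark's real codec set (LZ4, Snappy, etc.) and any on-disk format are
# not specified in this issue; this table exists only for illustration.
CODECS = {
    0: ("none", lambda b: b, lambda b: b),
    1: ("zlib", zlib.compress, zlib.decompress),
}

def write_block(payload: bytes, codec_id: int) -> bytes:
    """Prefix the compressed payload with a one-byte codec identifier."""
    _, compress, _ = CODECS[codec_id]
    return bytes([codec_id]) + compress(payload)

def read_block(block: bytes) -> bytes:
    """Read the codec id from the header, then decode with that codec,
    ignoring whatever codec the current run happens to be configured with."""
    codec_id, payload = block[0], block[1:]
    _, _, decompress = CODECS[codec_id]
    return decompress(payload)

# Round trip: the reader recovers the data regardless of which codec
# the writer used, which is the property the issue asks for.
data = b"event log entry " * 10
assert read_block(write_block(data, 1)) == data
assert read_block(write_block(data, 0)) == data
```

The per-block overhead here is a single byte, which is what the "writing a lot of identifiers for shuffle blocks" concern refers to; restricting the header to externally managed blocks (event logs, Tachyon) as suggested would confine that cost to long-lived data.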