[ https://issues.apache.org/jira/browse/SPARK-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200984#comment-15200984 ]
Apache Spark commented on SPARK-13997:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/11806

> Use Hadoop 2.0 default value for compression in data sources
> ------------------------------------------------------------
>
>                 Key: SPARK-13997
>                 URL: https://issues.apache.org/jira/browse/SPARK-13997
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Trivial
>
> Currently, the JSON, TEXT and CSV data sources use the {{CompressionCodecs}} class to
> set compression configurations via {{option("compress", "codec")}}.
> I made this use the Hadoop 1.x default value (block-level compression). However,
> the default value in Hadoop 2.x is record-level compression, as described in
> [mapred-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml].
> Since Spark 2.0 drops Hadoop 1.x support, it makes sense to use the Hadoop 2.x default
> values.
> According to [Hadoop: The Definitive Guide, 3rd
> edition|https://www.safaribooksonline.com/library/view/hadoop-the-definitive/9781449328917/ch04.html],
> these appear to be configurations for the unit of compression (record or block).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
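For reference, the Hadoop 2.x record-level default mentioned above corresponds to the following entry in mapred-default.xml (as shipped with Hadoop 2.7.1; the value `RECORD` makes each record compressed individually, while `BLOCK` compresses groups of records together):

```xml
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
  <description>If the job outputs are to be compressed as SequenceFiles,
  how should they be compressed? Should be one of NONE, RECORD or BLOCK.
  </description>
</property>
```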