This is a little confusing, but that code path actually goes through Hive, so the Spark SQL configuration does not help.
Perhaps try: set parquet.compression=GZIP;
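For example, in the spark-shell (an untested sketch; it reuses the table and query from your message, only swapping the Spark SQL setConf for the Hive-side property):

    import org.apache.spark.sql.hive.HiveContext
    val hiveContext = new HiveContext(sc)

    // The Hive write path honors parquet.compression rather than
    // spark.sql.parquet.compression.codec.
    hiveContext.sql("SET parquet.compression=GZIP")
    hiveContext.sql("SET hive.exec.dynamic.partition = true")
    hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    hiveContext.sql("insert into table foo partition(year, month, day) select *, year(from_unixtime(ts)) as year, month(from_unixtime(ts)) as month, day(from_unixtime(ts)) as day from raw_foo")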
On Fri, Jan 9, 2015 at 2:41 AM, Ayoub <benali.ayoub.i...@gmail.com> wrote:
> Hello,
>
> I tried to save a table created via the hive context as a parquet file,
> but whatever compression codec (uncompressed, snappy, gzip or lzo) I set
> via setConf, like:
>
> setConf("spark.sql.parquet.compression.codec", "gzip")
>
> the size of the generated files is always the same, so it seems like the
> spark context ignores the compression codec that I set.
>
> Here is a code sample applied via the spark shell:
>
> import org.apache.spark.sql.hive.HiveContext
> val hiveContext = new HiveContext(sc)
>
> hiveContext.sql("SET hive.exec.dynamic.partition = true")
> hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
> hiveContext.setConf("spark.sql.parquet.binaryAsString", "true") // required to make data compatible with impala
> hiveContext.setConf("spark.sql.parquet.compression.codec", "gzip")
>
> hiveContext.sql("create external table if not exists foo (bar STRING, ts INT) Partitioned by (year INT, month INT, day INT) STORED AS PARQUET Location 'hdfs://path/data/foo'")
>
> hiveContext.sql("insert into table foo partition(year, month, day) select *, year(from_unixtime(ts)) as year, month(from_unixtime(ts)) as month, day(from_unixtime(ts)) as day from raw_foo")
>
> I tried that with Spark 1.2 and a 1.3 snapshot against Hive 0.13, and I
> also tried it with Impala on the same cluster, which applied the
> compression codecs correctly.
>
> Does anyone know what could be the problem?
>
> Thanks,
> Ayoub