Hi,

After a bit more thought, I'm no longer sure whether saveAsObjectFile picks up the spark.hadoop.* properties at all.

Also, I made a mistake earlier. Whether you use the *.mapred.* or *.mapreduce.* properties does not depend on the Hadoop version you run, but on which Hadoop API you call.

So, I can assure you that if you use saveAsNewAPIHadoopFile with the spark.hadoop.mapreduce.* properties, the compression will be applied.

If you use saveAsHadoopFile (the old API), the corresponding properties are spark.hadoop.mapred.*.
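To make that concrete, here is a sketch (untested, assuming the standard Hadoop output-compression property names) of setting both flavours through SparkConf; Spark strips the spark.hadoop. prefix and forwards the rest to the Hadoop Configuration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("compressed-output")
  // New (mapreduce) API: picked up by saveAsNewAPIHadoopFile
  .set("spark.hadoop.mapreduce.output.fileoutputformat.compress", "true")
  .set("spark.hadoop.mapreduce.output.fileoutputformat.compress.codec",
       "org.apache.hadoop.io.compress.GzipCodec")
  // Old (mapred) API: picked up by saveAsHadoopFile
  .set("spark.hadoop.mapred.output.compress", "true")
  .set("spark.hadoop.mapred.output.compression.codec",
       "org.apache.hadoop.io.compress.GzipCodec")

val sc = new SparkContext(conf)
```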

If you use saveAsObjectFile to an HDFS path, I'm not sure whether the output is compressed.

Anyway, in my opinion saveAsObjectFile should only be used for small objects.

Guillaume
Even 

someMap.saveAsTextFile("out", classOf[GzipCodec])

has no effect.

Also, I noticed that saving sequence files has no compression option (my original question was about compressing binary output).

Having said this, I still do not understand why Kryo cannot be helpful when saving binary output. Binary output uses Java serialization, which has a pretty hefty overhead.

How can Kryo be applied to T when calling RDD[T]#saveAsObjectFile()?
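One workaround is to skip saveAsObjectFile entirely: serialize each element with Kryo yourself and write the raw bytes as a sequence file. A hedged sketch (kryoSave is a hypothetical helper, not a Spark API; a Kryo instance is created per partition because Kryo itself is not serializable):

```scala
import java.io.ByteArrayOutputStream

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Output
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Serialize each element with Kryo and store it as a
// (NullWritable, BytesWritable) pair in a Hadoop sequence file.
def kryoSave[T](rdd: RDD[T], path: String): Unit = {
  rdd.mapPartitions { iter =>
    val kryo = new Kryo()  // one instance per partition, not per element
    iter.map { elem =>
      val baos = new ByteArrayOutputStream()
      val out  = new Output(baos)
      kryo.writeClassAndObject(out, elem)
      out.close()
      (NullWritable.get(), new BytesWritable(baos.toByteArray))
    }
  }.saveAsSequenceFile(path)
}
```

Reading it back is the mirror image: load the sequence file and deserialize each BytesWritable with kryo.readClassAndObject. Registering your classes with Kryo shrinks the output further, since writeClassAndObject otherwise embeds the class name in every record.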

--
eXenSa
Guillaume PITEL, Président
+33(0)6 25 48 86 80 / +33(0)9 70 44 67 53

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
