RDD itself only defines saveAsTextFile and saveAsObjectFile. As far as I can tell, saveAsHadoopFile and saveAsNewAPIHadoopFile live on PairRDDFunctions, so they are only available on key-value RDDs.
saveAsObjectFile definitely outputs Hadoop format. I'm not trying to save big objects with saveAsObjectFile; I'm just trying to minimize the Java serialization overhead when saving to a binary file. I can see Spark could benefit from something like https://github.com/twitter/chill in this matter.

On Fri, Jan 3, 2014 at 6:42 PM, Guillaume Pitel <[email protected]> wrote:

> Hi,
>
> After a little bit of thinking, I'm not sure anymore whether saveAsObjectFile
> uses the spark.hadoop.* properties.
>
> Also, I made a mistake: the choice between *.mapred.* and *.mapreduce.*
> properties does not depend on the Hadoop version you use, but on the API
> version you use.
>
> So, I can assure you that if you use saveAsNewAPIHadoopFile with the
> spark.hadoop.mapreduce.* properties, the compression will be used.
>
> If you use saveAsHadoopFile, it should be used with the mapred.* properties.
>
> If you use saveAsObjectFile to an HDFS path, I'm not sure whether the output
> is compressed.
>
> Anyway, saveAsObjectFile should be used for small objects, in my opinion.
>
> Guillaume
>
> Even
>
> someMap.saveAsTextFile("out", classOf[GzipCodec])
>
> has no effect.
>
> Also, I noticed that saving sequence files has no compression option (my
> original question was about compressing binary output).
>
> Having said this, I still do not understand why Kryo cannot be helpful when
> saving binary output. Binary output uses Java serialization, which has a
> pretty hefty overhead.
>
> How can Kryo be applied to T when calling RDD[T]#saveAsObjectFile()?
>
> --
> Guillaume PITEL, Président
> +33(0)6 25 48 86 80 / +33(0)9 70 44 67 53
>
> eXenSa S.A.S. <http://www.exensa.com/>
> 41, rue Périer - 92120 Montrouge - FRANCE
> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
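To make the overhead point concrete, here is a small stand-alone Scala sketch (the object and method names are mine, not from the thread) that measures how many bytes ObjectOutputStream, the plain Java serializer that saveAsObjectFile relies on, emits for a tiny object:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Sketch: measure the fixed per-object cost of Java serialization.
// ObjectOutputStream writes a stream header and a full class descriptor
// in addition to the payload, which is the overhead a Kryo-based path
// (e.g. via twitter/chill) avoids.
object JavaSerializationOverhead {
  def serializedSize(obj: java.io.Serializable): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj) // header + class descriptor + payload
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    // A boxed Int carries 4 bytes of payload, but its serialized form is
    // an order of magnitude larger because of the class metadata.
    val n = java.lang.Integer.valueOf(42)
    println(s"Integer payload: 4 bytes, serialized: ${serializedSize(n)} bytes")
  }
}
```

Kryo (and chill, which pre-registers serializers for common Scala types) skips most of that per-object metadata, which is exactly why it would help when writing many small records to a binary file.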
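For reference, the mapred.*/mapreduce.* distinction Guillaume describes can be written out as SparkConf settings (a sketch; the property names come from Hadoop's old and new APIs, passed through Spark's spark.hadoop.* prefix as he suggests):

```scala
val conf = new org.apache.spark.SparkConf()

// New API (mapreduce.*), picked up by saveAsNewAPIHadoopFile:
conf.set("spark.hadoop.mapreduce.output.fileoutputformat.compress", "true")
conf.set("spark.hadoop.mapreduce.output.fileoutputformat.compress.codec",
  "org.apache.hadoop.io.compress.GzipCodec")

// Old API (mapred.*), picked up by saveAsHadoopFile:
conf.set("spark.hadoop.mapred.output.compress", "true")
conf.set("spark.hadoop.mapred.output.compression.codec",
  "org.apache.hadoop.io.compress.GzipCodec")
```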
