saveAsHadoopFile and saveAsNewAPIHadoopFile are on PairRDDFunctions, which uses some Scala magic (an implicit conversion) to become available when you have an RDD[(Key, Value)].
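For anyone unfamiliar with the "magic": here is a rough, Spark-free sketch of the mechanism. Everything in it (MiniRDD, MiniPairFunctions, MiniContext, saveAsKeyValueLines) is a made-up stand-in for illustration, not Spark's actual code. In Spark releases of this period the real conversion is, as far as I know, brought into scope with `import org.apache.spark.SparkContext._`, which the `import MiniContext._` below is meant to mimic.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for RDD[T]; NOT a Spark class.
class MiniRDD[T](val items: Seq[T]) {
  // Available for every element type, like saveAsTextFile on RDD.
  def saveAsTextLines(): Seq[String] = items.map(_.toString)
}

// Hypothetical stand-in for PairRDDFunctions: methods that only
// make sense when the elements are (key, value) pairs.
class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def saveAsKeyValueLines(): Seq[String] =
    self.items.map { case (k, v) => s"$k\t$v" }
}

// Hypothetical stand-in for the SparkContext companion object,
// which is where the implicit conversion is assumed to live.
object MiniContext {
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

object Demo {
  import MiniContext._ // without this import the pair-only method is not found

  def main(args: Array[String]): Unit = {
    val pairs = new MiniRDD(Seq(("a", 1), ("b", 2)))
    // The compiler rewrites this to toPairFunctions(pairs).saveAsKeyValueLines()
    println(pairs.saveAsKeyValueLines())

    val plain = new MiniRDD(Seq(1, 2, 3))
    println(plain.saveAsTextLines())
    // plain.saveAsKeyValueLines() // would not compile: elements are not pairs
  }
}
```

So the methods never appear in the RDD class itself, which is likely why they are easy to miss when reading only the RDD documentation.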
https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L648

Agreed, something like Chill would make this much easier for the default cases.

On Fri, Jan 3, 2014 at 2:04 PM, Aureliano Buendia <[email protected]> wrote:

> RDD only defines saveAsTextFile and saveAsObjectFile. I think
> saveAsHadoopFile and saveAsNewAPIHadoopFile belong to the older versions.
>
> saveAsObjectFile definitely outputs hadoop format.
>
> I'm not trying to save big objects by saveAsObjectFile, I'm just trying to
> minimize the java serialization overhead when saving to a binary file.
>
> I can see spark can benefit from something like
> https://github.com/twitter/chill in this matter.
>
>
> On Fri, Jan 3, 2014 at 6:42 PM, Guillaume Pitel <
> [email protected]> wrote:
>
>> Hi,
>>
>> After a little bit of thinking, I'm not sure anymore if saveAsObjectFile
>> uses the spark.hadoop.*
>>
>> Also, I made a mistake. The use of *.mapred.* or *.mapreduce.* does
>> not depend on the hadoop version you use, but on the API version you use
>>
>> So, I can assure you that if you use the saveAsNewAPIHadoopFile, with the
>> spark.hadoop.mapreduce.* properties, the compression will be used.
>>
>> If you use the saveAsHadoopFile, it should be used with mapred.*
>>
>> If you use the saveAsObjectFile to a hdfs path, I'm not sure if the
>> output is compressed.
>>
>> Anyway, saveAsObjectFile should be used for small objects, in my opinion.
>>
>> Guillaume
>>
>> Even
>>
>> someMap.saveAsTextFile("out", classOf[GzipCodec])
>>
>> has no effect.
>>
>> Also, I noticed that saving sequence files has no compression option (my
>> original question was about compressing binary output).
>>
>> Having said this, I still do not understand why kryo cannot be helpful
>> when saving binary output. Binary output uses java serialization, which has
>> a pretty hefty overhead.
>>
>> How can kryo be applied to T when calling RDD[T]#saveAsObjectFile()?
>>
>>
>> --
>> *Guillaume PITEL, Président*
>> +33(0)6 25 48 86 80 / +33(0)9 70 44 67 53
>>
>> eXenSa S.A.S. <http://www.exensa.com/>
>> 41, rue Périer - 92120 Montrouge - FRANCE
>> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
>>
>
>
