Even someMap.saveAsTextFile("out", classOf[GzipCodec]) has no effect. Also, I noticed that saving sequence files has no compression option (my original question was about compressing binary output).

Having said this, I still do not understand why Kryo cannot be helpful when saving binary output. Binary output uses Java serialization, which has a pretty hefty overhead. How can Kryo be applied to T when calling RDD[T]#saveAsObjectFile()?
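What I was hoping for is a codec parameter on saveAsSequenceFile itself, something like the sketch below (hypothetical here, since my version does not seem to offer it; pairRdd stands in for any RDD of key/value pairs):

    import org.apache.hadoop.io.compress.GzipCodec
    import org.apache.spark.SparkContext._ // implicit conversions for saveAsSequenceFile

    // Write a gzip-compressed sequence file; the codec is passed explicitly,
    // so no Hadoop compression properties would be needed.
    pairRdd.saveAsSequenceFile("out-seq", Some(classOf[GzipCodec]))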
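As for applying Kryo to T myself, the only route I can see is to serialize each record by hand and save the raw bytes, bypassing saveAsObjectFile() altogether. A rough, untested sketch of that idea (the buffer size is arbitrary, and a real version would reuse the configuration from my registrator instead of a bare new Kryo()):

    import com.esotericsoftware.kryo.Kryo
    import com.esotericsoftware.kryo.io.Output
    import org.apache.hadoop.io.{BytesWritable, NullWritable}
    import org.apache.spark.SparkContext._ // implicit conversions for saveAsSequenceFile
    import org.apache.spark.rdd.RDD

    // Kryo-serialize each record and store the bytes in a sequence file,
    // instead of going through the Java serialization that saveAsObjectFile() uses.
    def saveWithKryo(rdd: RDD[(Int, Int, Double, Double)], path: String) {
      rdd.mapPartitions { iter =>
        val kryo = new Kryo() // one instance per partition; Kryo is not thread-safe
        iter.map { record =>
          val out = new Output(4096, -1) // 4 KB initial buffer, grows as needed
          kryo.writeObject(out, record)
          (NullWritable.get(), new BytesWritable(out.toBytes))
        }
      }.saveAsSequenceFile(path) // compression could then be layered on as above
    }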
On Fri, Jan 3, 2014 at 5:58 PM, Guillaume Pitel <[email protected]> wrote:

> That's the right place. Maybe try the HDP1 properties:
>
> http://stackoverflow.com/questions/17241185/spark-standalone-mode-how-to-compress-spark-output-written-to-hdfs
>
> About your Kryo error: you can use this if you want coverage of Scala
> types: https://github.com/romix/scala-kryo-serialization
>
> Guillaume
>
> Thanks for clarifying this.
>
> I tried setting the Hadoop properties before constructing the
> SparkContext, but it had no effect.
>
> Where is the right place to set these properties?
>
> On Fri, Jan 3, 2014 at 4:56 PM, Guillaume Pitel <[email protected]> wrote:
>
>> Hi,
>>
>> I believe Kryo is only used during RDD serialization (i.e. communication
>> between nodes), not for saving. If you want to compress the output, you
>> can use the gzip or Snappy codec like this:
>>
>> val codec = "org.apache.hadoop.io.compress.SnappyCodec" // for Snappy
>> val codec = "org.apache.hadoop.io.compress.GzipCodec"   // for gzip
>>
>> System.setProperty("spark.hadoop.mapreduce.output.fileoutputformat.compress", "true")
>> System.setProperty("spark.hadoop.mapreduce.output.fileoutputformat.compress.codec", codec)
>> System.setProperty("spark.hadoop.mapreduce.output.fileoutputformat.compress.type", "BLOCK")
>>
>> (Those are the HDP2 keys; for HDP1 the keys are different.)
>>
>> Regards
>> Guillaume
>>
>> Hi,
>>
>> I'm trying to call saveAsObjectFile() on an RDD[(Int, Int, Double, Double)],
>> expecting the output binary to be smaller, but it is exactly the same size
>> as when Kryo is not on.
>>
>> I've checked the log, and there is no trace of Kryo-related errors.
>>
>> The code looks something like:
>>
>> class MyRegistrator extends KryoRegistrator {
>>   override def registerClasses(kryo: Kryo) {
>>     kryo.setRegistrationRequired(true)
>>     kryo.register(classOf[(Int, Int, Double, Double)])
>>   }
>> }
>>
>> System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>> System.setProperty("spark.kryo.registrator", "MyRegistrator")
>>
>> At the end, I tried to call:
>>
>> kryo.setRegistrationRequired(true)
>>
>> to make sure my class gets registered, but I found errors like:
>>
>> Exception in thread "DAGScheduler" com.esotericsoftware.kryo.KryoException:
>> java.lang.IllegalArgumentException: Class is not registered:
>> scala.math.Numeric$IntIsIntegral$
>> Note: To register this class use: kryo.register(scala.math.Numeric$IntIsIntegral$.class);
>>
>> It appears that many Scala internal types have to be registered in order
>> to have full Kryo support.
>>
>> Any idea why my simple tuple type should not get the Kryo benefits?
>
> --
> Guillaume PITEL, Président
> +33(0)6 25 48 86 80 / +33(0)9 70 44 67 53
>
> eXenSa S.A.S. <http://www.exensa.com/>
> 41, rue Périer - 92120 Montrouge - FRANCE
> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
