I can do that in my application, but I really want to know how I can do it in spark-shell because I usually prototype in spark-shell before I put the code into an application.
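For context, the application-side setup I mean is roughly the following (a minimal sketch; the app name and the unqualified registrator name are just placeholders for whatever is compiled into the application jar):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoRegistrator
import com.esotericsoftware.kryo.Kryo
import au.com.bytecode.opencsv.CSVParser

// Registrator compiled into the application jar, so both the driver and
// the executors can load it by its plain class name.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[CSVParser])
  }
}

// In a packaged application this works because the class name is stable
// and on the classpath everywhere.
val conf = new SparkConf()
  .setAppName("csv-parse")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyKryoRegistrator")
val sc = new SparkContext(conf)
```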
On Wed, Aug 20, 2014 at 12:47 PM, Sameer Tilak <ssti...@live.com> wrote:
> Hi Wang,
> Have you tried doing this in your application?
>
> conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> conf.set("spark.kryo.registrator", "yourpackage.MyKryoRegistrator")
>
> You then don't need to specify it via the command line.
>
> ------------------------------
> Date: Wed, 20 Aug 2014 12:25:14 -0700
> Subject: How to set KryoRegistrator class in spark-shell
> From: bewang.t...@gmail.com
> To: user@spark.apache.org
>
> I want to use opencsv's CSVParser to parse CSV lines with a script like
> the one below in spark-shell:
>
> import au.com.bytecode.opencsv.CSVParser
> import com.esotericsoftware.kryo.Kryo
> import org.apache.spark.serializer.KryoRegistrator
> import org.apache.hadoop.fs.{Path, FileSystem}
>
> class MyKryoRegistrator extends KryoRegistrator {
>   override def registerClasses(kryo: Kryo) {
>     kryo.register(classOf[CSVParser])
>   }
> }
>
> val outDir = "/tmp/dmc-out"
>
> val fs = FileSystem.get(sc.hadoopConfiguration)
> fs.delete(new Path(outDir), true)
>
> val largeLines = sc.textFile("/tmp/dmc-03-08/*.gz")
> val parser = new CSVParser('|', '"')
> largeLines.map(parser.parseLine(_).toList).saveAsTextFile(outDir,
>   classOf[org.apache.hadoop.io.compress.GzipCodec])
>
> If I start spark-shell with spark.kryo.registrator set like this:
>
> SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer
> -Dspark.kryo.registrator=MyKryoRegistrator" spark-shell
>
> it complains that MyKryoRegistrator is not found when I run ":load my_script"
> in spark-shell:
>
> 14/08/20 12:14:01 ERROR KryoSerializer: Failed to run spark.kryo.registrator
> java.lang.ClassNotFoundException: MyKryoRegistrator
>
> What's wrong?
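One thing I might try (a sketch only, not verified against this Spark version): classes defined inside the REPL get wrapped in synthetic outer classes, so Kryo cannot look them up by the plain name. Compiling the registrator outside the shell and putting the jar on the shell's classpath might avoid that. The file and jar names below are hypothetical, and the exact flag/env-var names vary between Spark versions:

```shell
# Compile the registrator outside the REPL so it has a stable class name,
# then hand the jar to spark-shell (ADD_JARS on older versions, --jars on
# newer ones).
scalac -classpath "$SPARK_HOME/lib/*" MyKryoRegistrator.scala
jar cf my-registrator.jar MyKryoRegistrator.class

SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer \
  -Dspark.kryo.registrator=MyKryoRegistrator" \
ADD_JARS=my-registrator.jar spark-shell
```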