I can do that in my application, but I really want to know how I can do it
in spark-shell because I usually prototype in spark-shell before I put the
code into an application.



On Wed, Aug 20, 2014 at 12:47 PM, Sameer Tilak <ssti...@live.com> wrote:

> Hi Wang,
> Have you tried doing this in your application?
>
>        conf.set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")
>        conf.set("spark.kryo.registrator", "yourpackage.MyKryoRegistrator")
>
> You then don't need to specify it via commandline.
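>
> For reference, a minimal sketch of the application-side setup (the
> app name and "yourpackage.MyKryoRegistrator" are placeholders for
> your own values; the conf must be set before the SparkContext is
> created):
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Configure Kryo serialization up front, before creating the context.
> val conf = new SparkConf()
>   .setAppName("KryoExample")  // placeholder app name
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>   .set("spark.kryo.registrator", "yourpackage.MyKryoRegistrator")
> val sc = new SparkContext(conf)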
>
>
> ------------------------------
> Date: Wed, 20 Aug 2014 12:25:14 -0700
> Subject: How to set KryoRegistrator class in spark-shell
> From: bewang.t...@gmail.com
> To: user@spark.apache.org
>
>
> I want to use opencsv's CSVParser to parse csv lines using a script like
> below in spark-shell:
>
> import au.com.bytecode.opencsv.CSVParser;
> import com.esotericsoftware.kryo.Kryo
> import org.apache.spark.serializer.KryoRegistrator
> import org.apache.hadoop.fs.{Path, FileSystem}
>
> class MyKryoRegistrator extends KryoRegistrator {
>   override def registerClasses(kryo:Kryo) {
>     kryo.register(classOf[CSVParser])
>   }
> }
>
> val outDir="/tmp/dmc-out"
>
> val fs = FileSystem.get(sc.hadoopConfiguration)
> fs.delete(new Path(outDir), true);
>
> val largeLines = sc.textFile("/tmp/dmc-03-08/*.gz")
> val parser = new CSVParser('|', '"')
> largeLines.map(parser.parseLine(_).toList).saveAsTextFile(outDir,
> classOf[org.apache.hadoop.io.compress.GzipCodec])
>
> If I start spark-shell with spark.kryo.registrator like this
>
> SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer
> -Dspark.kryo.registrator=MyKryoRegistrator" spark-shell
>
> it complains that MyKryoRegistrator is not found when I run ":load
> my_script" in spark-shell.
>
> 14/08/20 12:14:01 ERROR KryoSerializer: Failed to run
> spark.kryo.registrator
> java.lang.ClassNotFoundException: MyKryoRegistrator
>
> What's wrong?
>