I want to use opencsv's CSVParser to parse CSV lines in spark-shell, using
a script like the one below:

import au.com.bytecode.opencsv.CSVParser
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import org.apache.hadoop.fs.{Path, FileSystem}

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[CSVParser])
  }
}

val outDir="/tmp/dmc-out"

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.delete(new Path(outDir), true)

val largeLines = sc.textFile("/tmp/dmc-03-08/*.gz")
val parser = new CSVParser('|', '"')
largeLines.map(parser.parseLine(_).toList).saveAsTextFile(outDir,
  classOf[org.apache.hadoop.io.compress.GzipCodec])

If I start spark-shell with spark.kryo.registrator set like this:

SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer
-Dspark.kryo.registrator=MyKryoRegistrator" spark-shell

it complains that MyKryoRegistrator cannot be found when I run
":load my_script" in spark-shell:

14/08/20 12:14:01 ERROR KryoSerializer: Failed to run spark.kryo.registrator
java.lang.ClassNotFoundException: MyKryoRegistrator

What's wrong?
