I want to use opencsv's CSVParser to parse csv lines using a script like below in spark-shell:

    import au.com.bytecode.opencsv.CSVParser
    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator
    import org.apache.hadoop.fs.{Path, FileSystem}

    class MyKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[CSVParser])
      }
    }

    val outDir = "/tmp/dmc-out"
    val fs = FileSystem.get(sc.hadoopConfiguration)
    fs.delete(new Path(outDir), true)
    val largeLines = sc.textFile("/tmp/dmc-03-08/*.gz")
    val parser = new CSVParser('|', '"')
    largeLines.map(parser.parseLine(_).toList).saveAsTextFile(outDir, classOf[org.apache.hadoop.io.compress.GzipCodec])

If I start spark-shell with spark.kryo.registrator set like this:

    SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer -Dspark.kryo.registrator=MyKryoRegistrator" spark-shell

it complains that MyKryoRegistrator cannot be found when I run ":load my_script" in spark-shell:

    14/08/20 12:14:01 ERROR KryoSerializer: Failed to run spark.kryo.registrator
    java.lang.ClassNotFoundException: MyKryoRegistrator

What's wrong?
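
For comparison, here is a variant I am considering that might sidestep the registrator altogether. It is only a sketch and I have not confirmed it on this cluster. It assumes the parser only needs registering because it is captured by the closure passed to map, so it builds the parser inside mapPartitions on the executor instead. The name `parsed` is just a placeholder. The paths are the same ones as above.

```scala
// Sketch for spark-shell, where `sc` is already defined by the shell.
import au.com.bytecode.opencsv.CSVParser
import org.apache.hadoop.io.compress.GzipCodec

val largeLines = sc.textFile("/tmp/dmc-03-08/*.gz")

val parsed = largeLines.mapPartitions { lines =>
  // Constructed on the executor, once per partition, so the parser
  // is never captured by a closure that has to be shipped from the driver.
  val parser = new CSVParser('|', '"')
  lines.map(line => parser.parseLine(line).toList)
}

parsed.saveAsTextFile("/tmp/dmc-out", classOf[GzipCodec])
```

If this works, it would suggest the registrator is not needed for this job. It would not explain why a class defined through ":load" is not visible under the name MyKryoRegistrator.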