Hi,

I'm using the Spark code below. Currently I have a file of size 25 MB, and
I'm trying to do a comparative study of Kryo and Java serialization.

I have a couple of questions:

1. How do you know which classes to register with Kryo? [ the
kryo.register(...) calls in MyRegistrator below ]
2. When the data is small, I'm seeing better performance from Java
serialization than from Kryo, so I was wondering whether the code below
represents correct usage of Kryo?

import org.apache.spark._
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.spark.storage.StorageLevel

// Registers the classes that appear in the RDD so Kryo does not have to
// write out full class names with every record.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[LongWritable])
    kryo.register(classOf[Text])
    kryo.register(classOf[Integer])
    kryo.register(classOf[Array[String]])
  }
}

object HTest {

  def main(args: Array[String]) {
    // Switch the serializer to Kryo and point it at the registrator above.
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "MyRegistrator")
    val sc = new SparkContext("local[4]", "Test")
    // Cache the file as serialized bytes so the serializer is actually exercised.
    val input = sc.textFile("/home/Test/DataSet/cd7a58dc-2053-4811-8463-b144781352ac_000004.csv").persist(StorageLevel.MEMORY_ONLY_SER)
    println(input.count())
    Thread.sleep(30000L)
    println(input.count())
    Thread.sleep(30000L)
  }
}
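
For question 1, my understanding (please correct me if this is wrong) is that
the SparkConf-based setup with spark.kryo.registrationRequired can help find
the classes to register, since Kryo then fails on anything that is serialized
but not registered. This is just a sketch of that variant, not what I am
running above; HTestConf is only a placeholder name:

// Sketch only: SparkConf-based Kryo setup, assuming
// spark.kryo.registrationRequired reports unregistered classes.
import org.apache.spark.{SparkConf, SparkContext}

object HTestConf {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("Test")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator")
      // With this set, serializing an unregistered class should throw an
      // exception instead of silently falling back to writing class names,
      // which would reveal exactly which kryo.register(...) calls are needed.
      .set("spark.kryo.registrationRequired", "true")
    val sc = new SparkContext(conf)
    // ... same textFile / persist / count logic as above ...
  }
}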

Your help is highly appreciated.

Regards,
SB
