Hi,
I'm using the Spark code below. Currently I have a file of size 25 MB, and
I'm trying to do a comparative study of Kryo and Java serialization.
I have a couple of questions:
1. How do you know which classes to register in Kryo? [the kryo.register
lines, highlighted in yellow]
2. When the data is small, I'm seeing that Java serialization performs better
than Kryo, so I was wondering whether the code below represents the correct
usage of Kryo?

import org.apache.spark._
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.spark.storage.StorageLevel

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[LongWritable])
    kryo.register(classOf[Text])
    kryo.register(classOf[Integer])
    kryo.register(classOf[Array[String]])
  }
}

object HTest {
  def main(args: Array[String]) {
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "MyRegistrator")
    val sc = new SparkContext("local[4]", "Test")
    val input = sc.textFile("/home/Test/DataSet/cd7a58dc-2053-4811-8463-b144781352ac_000004.csv")
      .persist(StorageLevel.MEMORY_ONLY_SER)
    println(input.count())
    Thread.sleep(30000L)
    println(input.count())
    Thread.sleep(30000L)
  }
}

Your help is highly appreciated.
Regards,
SB