You should register each class you plan to use within your RDDs. In your case 
you only have an RDD of Strings, so you don’t even really need a registrator 
(strings are registered by default). But if you were storing custom objects, you 
would use one.
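For custom objects, a registrator could look like the minimal sketch below. It assumes Spark 0.9 or later (for `SparkConf`); the `Purchase` record, `PurchaseRegistrator`, and `RegistratorExample` are hypothetical names for illustration. The commented-out `spark.kryo.registrationRequired` setting only exists in newer Spark releases. When it is enabled, Kryo throws on any class that was not registered, which is one practical way to find out what still needs registering.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoRegistrator
import org.apache.spark.storage.StorageLevel

// Hypothetical custom record type stored in an RDD.
case class Purchase(customerId: Long, item: String, amount: Double)

// Registers every custom class that will be serialized, including arrays of it,
// since Kryo treats Array[Purchase] as a separate class from Purchase.
class PurchaseRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Purchase])
    kryo.register(classOf[Array[Purchase]])
  }
}

object RegistratorExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("RegistratorExample")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[PurchaseRegistrator].getName)
      // Newer Spark releases only: fail fast on any class that was not registered.
      // .set("spark.kryo.registrationRequired", "true")
    val sc = new SparkContext(conf)
    val purchases = sc.parallelize(Seq(Purchase(1L, "book", 12.5), Purchase(2L, "pen", 1.25)))
      .persist(StorageLevel.MEMORY_ONLY_SER) // cached partitions are written with Kryo
    println(purchases.count())
    sc.stop()
  }
}
```

Registered classes are written as small integer IDs instead of full class names, which is where much of Kryo's size advantage comes from.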

To speed up Kryo, you can also call kryo.setReferences(false) in your 
registrator or set spark.kryo.referenceTracking to false. This disables tracking 
of repeated and circular references, so only do it if your objects don’t contain 
cycles. In general, though, when benchmarking on this small amount of data, 
you’ll probably see noise from the JVM starting up.
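Turning off reference tracking, together with a crude guard against JVM warm-up noise, can be sketched as follows. This assumes Spark 0.9 or later; `SerializerBenchmark` and `timeAfterWarmup` are hypothetical names, and the input path is taken from the command line. Note that the warm-up runs also populate the serialized cache, so the timed runs mostly measure reading and deserializing cached data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object SerializerBenchmark {
  // Runs `body` `warmups` times without timing it (absorbing class loading,
  // JIT compilation, and cache population), then returns the wall-clock time
  // in milliseconds of each of the following `runs` executions.
  def timeAfterWarmup(warmups: Int, runs: Int)(body: => Unit): Seq[Long] = {
    (1 to warmups).foreach(_ => body)
    (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1000000L
    }
  }

  def main(args: Array[String]): Unit = {
    val path = args(0) // input file, e.g. the 25 MB CSV
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("SerializerBenchmark")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Same effect as calling kryo.setReferences(false) inside a KryoRegistrator.
      // Only safe if none of the serialized objects contain reference cycles.
      .set("spark.kryo.referenceTracking", "false")
    val sc = new SparkContext(conf)
    val input = sc.textFile(path).persist(StorageLevel.MEMORY_ONLY_SER)
    val timings = timeAfterWarmup(warmups = 2, runs = 5) { input.count() }
    println(s"count() times (ms): ${timings.mkString(", ")}")
    sc.stop()
  }
}
```

To compare against Java serialization, set `spark.serializer` to `org.apache.spark.serializer.JavaSerializer` (dropping the Kryo-specific setting) and compare both the timings and the cached size shown on the web UI's Storage page.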

Matei

On Jan 22, 2014, at 12:27 PM, suman bharadwaj wrote:

> Hi,
> 
> I'm using the Spark code below. Currently I have a 25 MB file, and I'm 
> trying to do a comparative study of Kryo and Java serialization.
> 
> I have a couple of questions:
> 
> 1. How do you know which classes to register in Kryo? (the register calls 
>    in MyRegistrator below)
> 2. When the data is small, I'm seeing Java serialization perform better 
>    than Kryo, so I was wondering whether the code below uses Kryo 
>    correctly.
> 
> import org.apache.spark._
> import com.esotericsoftware.kryo.Kryo
> import org.apache.spark.serializer.KryoRegistrator
> import org.apache.hadoop.io.LongWritable
> import org.apache.hadoop.io.Text
> import org.apache.spark.storage.StorageLevel
> 
> // Registers the classes Kryo should write as compact numeric IDs
> // instead of full class names.
> class MyRegistrator extends KryoRegistrator {
>   override def registerClasses(kryo: Kryo): Unit = {
>     kryo.register(classOf[LongWritable])
>     kryo.register(classOf[Text])
>     kryo.register(classOf[Integer])
>     kryo.register(classOf[Array[String]])
>   }
> }
> 
> object HTest {
> 
>   def main(args: Array[String]): Unit = {
>     // These properties must be set before the SparkContext is created.
>     System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>     System.setProperty("spark.kryo.registrator", "MyRegistrator")
>     val sc = new SparkContext("local[4]", "Test")
>     // Cache the lines in serialized form so the configured serializer is used.
>     val input = sc.textFile("/home/Test/DataSet/cd7a58dc-2053-4811-8463-b144781352ac_000004.csv")
>       .persist(StorageLevel.MEMORY_ONLY_SER)
>     println(input.count()) // first count reads the file and fills the cache
>     Thread.sleep(30000L)   // pause for 30 seconds
>     println(input.count()) // second count reads from the serialized cache
>     Thread.sleep(30000L)
>     sc.stop()
>   }
> }
> 
> Your help is highly appreciated.
> 
> Regards,
> SB
