Brilliant. Thanks !! Regards, SB
On Thu, Jan 23, 2014 at 3:42 AM, Matei Zaharia <[email protected]>wrote: > You should register each class you plan to use within your RDDs. In your > case you only have an RDD of Strings, so you don’t even really need a > registrator (strings are registered by default). But if you made custom > objects you would use one. > > To speed up Kryo you can also add kryo.setReferences(false) or set > spark.kryo.referenceTracking > = false. This disables tracking of circular references. But in general > benchmarking on this small amount of data, you’ll probably have noise from > the JVM starting up. > > Matei > > On Jan 22, 2014, at 12:27 PM, suman bharadwaj <[email protected]> wrote: > > Hi, > > I'm using the below SPARK Code. Currently i have a file of size 25 MB. And > I'm trying to do a comparative study on Kryo and Java serialization. > > I had couple of questions: > > 1. How do you know which classes to register in Kryo ? [ highlighted in > yellow ] > 2. When data is small, I'm seeing Java Serialization has better > performance than Kryo. so was wondering whether the below code represents > the correct usage of Kryo ? > > *import org.apache.spark._* > *import com.esotericsoftware.kryo.Kryo* > *import org.apache.spark.serializer.KryoRegistrator* > *import org.apache.hadoop.io.LongWritable* > *import org.apache.hadoop.io.Text* > *import org.apache.spark.storage.StorageLevel* > > *class MyRegistrator extends KryoRegistrator {* > * override def registerClasses(kryo: Kryo) {* > * kryo.register(classOf[LongWritable])* > * kryo.register(classOf[Text])* > * kryo.register(classOf[Integer])* > * kryo.register(classOf[Array[String]])* > * }* > *}* > > *object HTest {* > > * def main(args: Array[String]) {* > * System.setProperty("spark.serializer", > "org.apache.spark.serializer.KryoSerializer")* > * System.setProperty("spark.kryo.registrator", "MyRegistrator")* > * val sc = new SparkContext("local[4]","Test")* > * val input = > sc.textFile("/home/Test/DataSet/cd7a58dc-2053-4811-8463-b144781352ac_000004.csv").persist(StorageLevel.MEMORY_ONLY_SER)* > * println(input.count())* > * Thread.sleep(30000L)* > * println(input.count())* > * Thread.sleep(30000L)* > * }* > *}* > > Your Help is Highly appreciated. > > Regards, > SB > > >
