I have a rather simple avro schema to serialize Tweets (message, username, timestamp). Kryo and twitter chill are used to do so.
For my dev environment the Spark context is configured as below val conf: SparkConf = new SparkConf() conf.setAppName("kryo_test") conf.setMaster(“local[4]") conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") conf.set("spark.kryo.registrator", "co.feeb.TweetRegistrator”) Serialization is setup with override def registerClasses(kryo: Kryo): Unit = { kryo.register(classOf[Tweet], AvroSerializer.SpecificRecordBinarySerializer[Tweet]) } (This method gets called) Using this configuration to persist some object fails with java.io.NotSerializableException: co.feeb.avro.Tweet (which seems to be ok as this class is not Serializable) I used the following code: val ctx: SparkContext = new SparkContext(conf) val tweets: RDD[Tweet] = ctx.parallelize(List( new Tweet("a", "b", 1L), new Tweet("c", "d", 2L), new Tweet("e", "f", 3L) ) ) tweets.saveAsObjectFile("file:///tmp/spark”) Using saveAsTextFile works, but persisted files are not binary but JSON cat /tmp/spark/part-00000 {"username": "a", "text": "b", "timestamp": 1} {"username": "c", "text": "d", "timestamp": 2} {"username": "e", "text": "f", "timestamp": 3} Is this intended behaviour, a configuration issue, avro serialisation not working in local mode or something else? --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org