Is MyType serializable? Everything inside the foreachRDD closure has to be serializable.
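You can check this outside Spark by trying to Java-serialize an instance. Spark uses Java serialization for the closures it ships to executors. The sketch below is a minimal illustration: the fields of `MyType` and the class `NotSerializableType` are hypothetical. A case class extends `Serializable` automatically, but serialization still fails if any of its fields holds a non-serializable value.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the poster's MyType (real fields unknown).
// Scala case classes extend Serializable automatically.
case class MyType(id: String, value: Double)

// Hypothetical plain class that does NOT extend Serializable, for contrast.
class NotSerializableType(val id: String)

object SerializationCheck {
  // Returns true if `obj` can be written with Java serialization, the
  // mechanism Spark uses for closures sent to executors.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(isJavaSerializable(MyType("a", 1.0)))             // true
    println(isJavaSerializable(new NotSerializableType("a"))) // false
  }
}
```

If the check fails for your real type, make it (and everything it references) extend `Serializable`, or avoid capturing it in closures that run on the workers.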
2014-07-09 14:24 GMT+01:00 RodrigoB <rodrigo.boav...@aspect.com>:
> Hi all,
>
> I am currently trying to save to Cassandra after some Spark Streaming
> computation.
>
> I call myDStream.foreachRDD so that I can collect each RDD in the driver
> app runtime, and inside it I do something like this:
>
> myDStream.foreachRDD(rdd => {
>   var someCol = Seq[MyType]()
>   foreach(kv =>{
>     someCol :+ rdd._2 //I only want the RDD value and not the key
>   }
>   val collectionRDD = sc.parallelize(someCol) //THIS IS WHY IT FAILS TRYING TO RUN THE WORKER
>   collectionRDD.saveToCassandra(...)
> }
>
> I get a NotSerializableException while trying to run the Node (I also tried
> someCol as a shared variable).
> I believe this happens because myDStream doesn't exist yet when the code
> is pushed to the Node, so parallelize doesn't have any structure to
> relate to. Inside this foreachRDD I should only make RDD calls that are
> related to other RDDs. I guess this was just a desperate attempt....
>
> So I have a question:
> Using the Cassandra Spark driver, can we only write to Cassandra from an
> RDD? In my case I only want to write once all the computation in a single
> batch has finished on the driver app.
>
> Thanks in advance.
>
> Rod
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Cassandra-driver-Spark-question-tp9177.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
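Separately from serialization, the quoted loop would not collect anything even if it ran. `someCol :+ x` returns a new immutable `Seq` and does not modify `someCol`. In addition, a driver-side `var` captured by an `rdd.foreach` closure is only updated in the executors' copies. The plain-Scala sketch below uses no Spark and a hypothetical `MyType` to show the discarded append. It also shows the transformation-based alternative. In Spark, the corresponding step would be `rdd.map(_._2)` (or `rdd.values` on a pair RDD). With the DataStax spark-cassandra-connector imported, that resulting RDD has `saveToCassandra(keyspace, table)` available directly inside `foreachRDD`, so no `parallelize` round-trip is needed. Keyspace and table names are placeholders.

```scala
// Hypothetical stand-in for the poster's MyType.
case class MyType(id: String, value: Double)

object AppendDemo {
  // (key, value) pairs standing in for the contents of one batch RDD.
  val pairs: Seq[(String, MyType)] =
    Seq("k1" -> MyType("a", 1.0), "k2" -> MyType("b", 2.0))

  // Mirrors the quoted loop: `:+` builds a new Seq and the result is
  // discarded, so someCol is still empty afterwards.
  def appendLikeQuotedCode(): Seq[MyType] = {
    var someCol = Seq[MyType]()
    pairs.foreach(kv => someCol :+ kv._2)
    someCol
  }

  // Keeps only the values with a transformation instead of a mutable var.
  def valuesOnly(): Seq[MyType] = pairs.map(_._2)

  def main(args: Array[String]): Unit = {
    println(appendLikeQuotedCode().size) // 0
    println(valuesOnly().size)           // 2
  }
}
```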