Hi Aureliano,

I followed this thread to create a custom saveAsObjectFile. 
The following is the code.
/new org.apache.spark.rdd.SequenceFileRDDFunctions[NullWritable,
BytesWritable](saveRDD.mapPartitions(iter =>
iter.grouped(10).map(_.toArray)).map(x => (NullWritable.get(), new
BytesWritable(serialize(x))))).saveAsSequenceFile("objFiles") /

But, I get the following error when executed.
/
org.apache.spark.SparkException: Job aborted: Task not serializable:
java.io.NotSerializableException: org.apache.hadoop.mapred.JobConf 
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
 
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
 
        at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
        at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
        at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
 
        at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:794)
 
        at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:737)
 
        at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:569) 
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
 
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) 
        at akka.actor.ActorCell.invoke(ActorCell.scala:456) 
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) 
        at akka.dispatch.Mailbox.run(Mailbox.scala:219) 
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 
        at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 
        at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
/

Any idea about this error?
or 
Is there anything wrong in the line of code?

Thanks,
Pradeep




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SequenceFileRDDFunctions-cannot-be-used-output-of-spark-package-tp250p3442.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Reply via email to