Hi,

I am applying a transformation that calls a Scala method, and this method returns a MutableList[AvroObject]:
    def processRecords(id: String, list1: Iterable[(String, GenericRecord)]): scala.collection.mutable.MutableList[AvroObject]

So the output of the transformation is RDD[MutableList[AvroObject]], but I want the output to be RDD[AvroObject].

To convert RDD[MutableList[AvroObject]] to RDD[AvroObject], I tried collecting all the lists into an accumulator with foreach and then parallelizing the result:

    // Accumulate every AvroObject from every list into one collection on the driver
    val uA = sparkContext.accumulableCollection[MutableList[AvroObject], AvroObject](MutableList[AvroObject]())
    rdd_list_avroObj.foreach(u => { uA ++= u })
    // Redistribute the collected objects as a new RDD
    val uRDD = sparkContext.parallelize(uA.value)

It fails on a large dataset with the following error:

    java.io.IOException: java.lang.StackOverflowError
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
        at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:45)
        at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:226)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.StackOverflowError
        at java.io.ObjectOutputStream$HandleTable.hash(ObjectOutputStream.java:2359)
        at java.io.ObjectOutputStream$HandleTable.lookup(ObjectOutputStream.java:2292)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1115)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at java.util.ArrayList.writeObject(ArrayList.java:742)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)

I have two questions about this issue:

1. Is there a replacement for the accumulator?
2. Instead of returning a List[AvroObject] from the Scala method, can I emit multiple AvroObject instances, so that I get an RDD[AvroObject] directly?

Note:
- Spark version: 1.3.0
- Input data size: 200 GB
- Cluster: 3 machines (2 cores, 8 GB each)
- Spark running in YARN mode

Thanks & Regards,
Shweta Jadhav
Tata Consultancy Services Limited
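A minimal sketch of what Option 2 could look like, run on plain Scala collections instead of a cluster so that it is self-contained. AvroObject here is a hypothetical case class standing in for the real Avro type, and processRecords is a hypothetical simplified version that takes Int values instead of GenericRecord. The sketch assumes that RDD.flatMap, which accepts a function returning a TraversableOnce, flattens results the same way as the Seq.flatMap used below:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the real Avro-generated AvroObject class.
case class AvroObject(id: String, value: Int)

object FlattenSketch {
  // Hypothetical simplified processRecords: same shape as in the post, but with
  // Int values instead of GenericRecord, and ListBuffer instead of MutableList
  // (MutableList is no longer available in Scala 2.13).
  def processRecords(id: String, records: Iterable[(String, Int)]): ListBuffer[AvroObject] = {
    val out = ListBuffer[AvroObject]()
    records.foreach { case (_, v) => out += AvroObject(id, v) }
    out
  }

  // Local stand-in for the grouped input RDD of (id, records) pairs.
  val grouped: Seq[(String, Iterable[(String, Int)])] = Seq(
    "a" -> Seq("x" -> 1, "y" -> 2),
    "b" -> Seq("z" -> 3)
  )

  // map keeps one list per key: the RDD[MutableList[AvroObject]] shape.
  val nested: Seq[ListBuffer[AvroObject]] =
    grouped.map { case (id, recs) => processRecords(id, recs) }

  // flatMap flattens every returned list into individual elements: the RDD[AvroObject] shape.
  val flat: Seq[AvroObject] =
    grouped.flatMap { case (id, recs) => processRecords(id, recs) }

  def main(args: Array[String]): Unit = {
    println(nested.size) // 2
    println(flat)        // List(AvroObject(a,1), AvroObject(a,2), AvroObject(b,3))
  }
}
```

On the actual RDD, the equivalent would be something like groupedRdd.flatMap { case (id, recs) => processRecords(id, recs) }, where groupedRdd is a hypothetical name for the RDD of (String, Iterable[(String, GenericRecord)]) pairs. This yields an RDD[AvroObject] while keeping the data distributed, instead of routing every object through an accumulator on the driver.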