My mapPartitions code below outputs one record for each input record, so the output has the same number of records as the input. I am collecting the output into a ListBuffer, which grows too large for memory and causes an OutOfMemoryError.
To be clearer, my per-partition logic is:

    Iterator (iter1) -> processing -> ListBuffer (list1)
    iter1.size == list1.size
    list1 goes out of memory

I cannot change the partitioning: it is based on the input key, and all records for a given key must go to the same partition. Is there a workaround?

import scala.collection.mutable.ListBuffer

tempRDD = iterateRDD.mapPartitions { p =>
  val outputList = ListBuffer[String]()
  var minVal = 0L
  var prevKey = Long.MinValue  // sentinel; assumes no real key takes this value
  while (p.hasNext) {
    val tpl = p.next()
    val key = tpl._1
    val value = tpl._2
    if (key != prevKey) {
      if (value < key) {
        minVal = value
        outputList += minVal.toString + "\t" + key.toString
      } else {
        minVal = key
        outputList += minVal.toString + "\t" + value.toString
      }
    } else {
      outputList += minVal.toString + "\t" + value.toString
    }
    prevKey = key
  }
  outputList.iterator
}
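For illustration, here is a minimal sketch of the kind of workaround I have in mind (assuming the same (Long, Long) input tuples as above): transform the partition iterator lazily with p.map instead of buffering into a ListBuffer. Spark consumes the returned iterator one element at a time, so nothing is materialized per partition; the mutable state works because the iterator is traversed once, in order.

// Sketch only: lazy per-partition transform, no intermediate buffer.
// Assumes keys and values are Long, as suggested by the code above.
val tempRDD = iterateRDD.mapPartitions { p =>
  var prevKey = Long.MinValue  // sentinel; assumes no real key equals this
  var minVal = 0L
  p.map { case (key, value) =>
    val out =
      if (key != prevKey) {
        if (value < key) { minVal = value; minVal.toString + "\t" + key.toString }
        else             { minVal = key;   minVal.toString + "\t" + value.toString }
      } else {
        minVal.toString + "\t" + value.toString
      }
    prevKey = key
    out
  }
}

Whether this is safe depends on each output record being computable from the current record plus a small amount of running state, which appears to be the case here.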