Hi Ghousia,

You can try the following:
1. Increase the heap size:
   <https://spark.apache.org/docs/0.9.0/configuration.html>
2. Increase the number of partitions, so each task processes less data:
   <http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine>
3. Persist the RDD with DISK_ONLY, so intermediate blocks spill to disk:
   <http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence>

I've put a rough sketch of each below the quoted message.

Thanks
Best Regards

On Mon, Aug 18, 2014 at 10:40 AM, Ghousia Taj <ghousia.ath...@gmail.com> wrote:

> Hi,
>
> I am trying to implement machine learning algorithms on Spark. I am
> working on a 3-node cluster, with each node having 5GB of memory.
> Whenever I work with a slightly larger number of records, I end up with
> an OutOfMemory error. The problem is that even when the record count is
> only slightly higher, the intermediate result of a transformation is
> huge, and this causes the OutOfMemory error. To work around this, we are
> partitioning the data so that each partition holds only a few records.
>
> Is there a better way to fix this issue, something like spilling the
> intermediate data to local disk?
>
> Thanks,
> Ghousia.
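Sketch for 1 (heap size) — a minimal example assuming you build your own
SparkConf; the app name, the 4g, and the 0.4 are placeholders you'd tune
for what your 5GB nodes can actually spare:

  import org.apache.spark.{SparkConf, SparkContext}

  // Placeholder values: adjust to your cluster's actual free memory.
  val conf = new SparkConf()
    .setAppName("MLJob")                         // hypothetical app name
    .set("spark.executor.memory", "4g")          // heap per executor
    .set("spark.storage.memoryFraction", "0.4")  // leave more heap for computation
  val sc = new SparkContext(conf)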
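Sketch for 2 (more partitions) — more partitions means smaller partitions,
so each task holds less data in memory at once. The path and the 200 are
arbitrary examples:

  // Ask for more input splits up front ("hdfs://path/to/input" is a placeholder):
  val records = sc.textFile("hdfs://path/to/input", 200)

  // Or shuffle an existing RDD into more, smaller partitions:
  val repartitioned = records.repartition(200)

Note that repartition() triggers a full shuffle, so use it on the RDD just
before the memory-heavy transformation.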
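Sketch for 3 (DISK_ONLY) — persisting an intermediate RDD so its blocks are
written to local disk instead of held on the heap; the map() here is just a
stand-in for your actual transformation:

  import org.apache.spark.storage.StorageLevel

  val intermediate = records.map(line => line.split(","))  // hypothetical transformation
  intermediate.persist(StorageLevel.DISK_ONLY)  // blocks go to local disk, not the heap
  intermediate.count()  // first action materializes it; later actions re-read from disk

If pure DISK_ONLY turns out too slow, MEMORY_AND_DISK_SER is a middle
ground: serialized in memory, spilling to disk only when it doesn't fit.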