Hi Ghousia,

You can try the following (rough Scala sketches of each are below):

1. Increase the executor heap size
<https://spark.apache.org/docs/0.9.0/configuration.html>
2. Increase the number of partitions, so each task processes a smaller
slice of the data
<http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine>
3. Persist the RDD with the DISK_ONLY storage level
<http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence>
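
For (1), a minimal sketch, assuming you create the SparkContext yourself;
the app name and the "4g" value are only placeholders, size the heap to
what your 5GB nodes can actually spare:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MLJob")                     // hypothetical app name
      .set("spark.executor.memory", "4g")      // heap available to each executor
    val sc = new SparkContext(conf)

The same setting can also be passed on the command line with
spark-submit --executor-memory 4g.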

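For (2), something along these lines; the input path and the partition
count are hypothetical, the point is simply that more partitions mean
less data held in memory per task:

    // assumes the SparkContext "sc" from the snippet above
    val records = sc.textFile("hdfs:///data/input")  // hypothetical input path
    val repartitioned = records.repartition(200)     // well above the default count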

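For (3), persist the RDD holding the large intermediate result with
DISK_ONLY so its partitions are written to local disk rather than kept in
memory (MEMORY_AND_DISK is also worth a try, it keeps what fits in memory
and spills the rest). The map below is only a stand-in for your real
transformation:

    import org.apache.spark.storage.StorageLevel

    // "repartitioned" is the RDD from the previous snippet
    val intermediate = repartitioned.map(line => line.split(","))  // your heavy transformation here
    intermediate.persist(StorageLevel.DISK_ONLY)
    println(intermediate.count())  // any action materializes the persisted RDD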

Thanks
Best Regards


On Mon, Aug 18, 2014 at 10:40 AM, Ghousia Taj <ghousia.ath...@gmail.com>
wrote:

> Hi,
>
> I am trying to implement machine learning algorithms on Spark. I am working
> on a 3-node cluster, with each node having 5GB of memory. Whenever I work
> with a slightly larger number of records, I end up with an OutOfMemory
> error. The problem is that even when the number of records is only slightly
> higher, the intermediate result of a transformation is huge, and this causes
> the OutOfMemory error. To work around this, we are partitioning the data so
> that each partition holds only a few records.
>
> Is there any better way to fix this issue? Something like spilling the
> intermediate data to local disk?
>
> Thanks,
> Ghousia.
>
