When you do groupBy(), Spark tries to hold all the data for each partition in
memory for best performance, so you should choose the number of partitions
carefully (more partitions means less data per partition).
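
For example, a minimal sketch of passing an explicit numPartitions to
groupByKey() (the RDD contents and the count of 200 are made up for
illustration):

    from pyspark import SparkContext

    sc = SparkContext(appName="groupby-partitions-sketch")

    # One million (key, value) pairs spread over 10,000 keys.
    pairs = sc.parallelize([(i % 10000, i) for i in range(1000000)])

    # The default partition count can be as low as the number of local
    # cores (e.g. 4); asking for more partitions keeps each partition's
    # share of the grouped data small enough to fit in memory.
    grouped = pairs.groupByKey(numPartitions=200)

    # Count values per key without materializing them as lists.
    print(grouped.mapValues(lambda vs: sum(1 for _ in vs)).take(5))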

In Spark master and the upcoming 1.1 release, PySpark can do an external
groupBy(): it spills the data to disk when there is not enough memory to hold
it all. That should also help in this case.
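
If you want to try the external groupBy(), the configuration is roughly the
sketch below; the setting names (spark.python.worker.memory as the spill
threshold for the Python worker, spark.shuffle.spill to allow spilling) are
from memory, so please double-check them against the 1.1 docs:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("external-groupby-sketch")
            # Allow spilling shuffle/aggregation data to disk.
            .set("spark.shuffle.spill", "true")
            # Soft memory limit per Python worker before it spills.
            .set("spark.python.worker.memory", "512m"))

    sc = SparkContext(conf=conf)

    # groupByKey() now spills to disk instead of failing when the grouped
    # data does not fit in memory.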

On Fri, Jul 18, 2014 at 1:56 AM, Roch Denis <rde...@exostatic.com> wrote:
> Well, for what it's worth, I found the issue after spending the whole night
> running experiments;).
>
> Basically, I needed to give a higher number of partitions to groupByKey.
> I was simply using the default, which generated only 4 partitions, and so
> the whole thing blew up.
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Last-step-of-processing-is-using-too-much-memory-tp10134p10147.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
