What does your application do?

Best Regards,
Sonal
Founder, Nube Technologies <http://www.nubetech.co>
Reifier at Strata Hadoop World <https://www.youtube.com/watch?v=eD3LkpPQIgM>
Reifier at Spark Summit 2015
<https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>

<http://in.linkedin.com/in/sonalgoyal>



On Wed, Jun 22, 2016 at 9:57 PM, Raghava Mutharaju <
m.vijayaragh...@gmail.com> wrote:

> Hello All,
>
> We have a Spark cluster where the driver and master run on the same node,
> using the Spark Standalone cluster manager. When the number of nodes (and
> partitions) is increased, a dataset that used to run to completion on fewer
> nodes now fails with an out-of-memory error on the driver.
>
> For example, a dataset that completes on 32 nodes with 256 partitions gives
> an OOM on the driver side when the same dataset is run on 64 nodes with 512
> partitions.
>
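> To give an idea of the setup, the job is wired up roughly like this (app
> name, host and path are placeholders, not our exact code):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val numPartitions = 256   // 256 on the 32-node run, 512 on the 64-node run
>     val conf = new SparkConf()
>       .setAppName("our-job")                    // placeholder name
>       .setMaster("spark://master-host:7077")    // standalone master, same node as the driver
>     val sc = new SparkContext(conf)
>     val input = sc.textFile("hdfs:///path/to/dataset", numPartitions)
>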
> From what I read in the Spark documentation and other articles, the
> following are the responsibilities of the driver/master (a rough code-level
> sketch follows the list).
>
> 1) create spark context
> 2) build DAG of operations
> 3) schedule tasks
>
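> My understanding is that those three steps map onto code roughly as follows,
> continuing the sketch above and reusing conf and numPartitions from it (again
> only illustrative, not our actual job):
>
>     // 1) create spark context
>     val sc = new SparkContext(conf)
>
>     // 2) transformations are lazy; they only build up the DAG/lineage on the driver
>     val processed = sc.textFile("hdfs:///path/to/dataset", numPartitions)
>                       .map(line => line.length)
>
>     // 3) an action makes the driver split the job into stages and schedule
>     //    one task per partition on the executors
>     val total = processed.reduce(_ + _)
>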
> I am guessing that 1) and 2) should not change with the number of
> nodes/partitions. So is it because the driver has to keep track of a lot
> more tasks that it gives an OOM?
>
> What could be the possible reasons behind the driver-side OOM when the
> number of partitions is increased?
>
> Regards,
> Raghava.
>
