What does your application do?

Best Regards,
Sonal
Founder, Nube Technologies <http://www.nubetech.co>
Reifier at Strata Hadoop World <https://www.youtube.com/watch?v=eD3LkpPQIgM>
Reifier at Spark Summit 2015 <https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>
<http://in.linkedin.com/in/sonalgoyal>

On Wed, Jun 22, 2016 at 9:57 PM, Raghava Mutharaju <m.vijayaragh...@gmail.com> wrote:

> Hello All,
>
> We have a Spark cluster where the driver and master are running on the same
> node. We are using the Spark Standalone cluster manager. If the number of
> nodes (and the partitions) is increased, the same dataset that used to run
> to completion on a smaller number of nodes now gives an out-of-memory error
> on the driver.
>
> For example, a dataset that runs on 32 nodes with the number of partitions
> set to 256 completes, whereas the same dataset run on 64 nodes with the
> number of partitions set to 512 gives an OOM on the driver side.
>
> From what I read in the Spark documentation and other articles, the
> following are the responsibilities of the driver/master:
>
> 1) create the Spark context
> 2) build the DAG of operations
> 3) schedule tasks
>
> I am guessing that 1) and 2) should not change with the number of
> nodes/partitions. So is it that since the driver has to keep track of a lot
> more tasks, it gives an OOM?
>
> What could be the possible reasons behind the driver-side OOM when the
> number of partitions is increased?
>
> Regards,
> Raghava.
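For reference, the driver does hold per-task and per-partition bookkeeping (task metadata, accumulator updates, and any results it collects), so its memory footprint can grow with the partition count. A common first step in this situation is to raise the driver-side memory settings. The sketch below is illustrative only: the memory values are assumptions rather than tuned recommendations, and the application class and jar name are hypothetical placeholders.

```shell
# Sketch only: driver-side knobs commonly raised when the driver OOMs as the
# number of partitions (and hence tracked tasks/results) grows.
# spark.driver.memory sets the driver JVM heap; spark.driver.maxResultSize
# caps the total size of serialized task results the driver will accept.
# Values, class name, and jar name below are illustrative assumptions.
spark-submit \
  --master spark://master-host:7077 \
  --conf spark.driver.memory=8g \
  --conf spark.driver.maxResultSize=2g \
  --class com.example.MyApp \
  my-app.jar
```

If the OOM persists at higher memory settings, a heap dump from the driver JVM would show whether task bookkeeping or collected results dominate.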