Depending on how your C++ program is designed, you may be able to feed the data from multiple partitions into a single process. Getting the results back might be tricky, but that may be the only way to guarantee one invocation per node.
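For example, something like this might work - just a rough sketch, assuming your binary streams text records over stdin/stdout (the paths, binary name, and node count below are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("OneProcessPerNode"))

    val numNodes = 8  // hypothetical cluster size

    // shuffle = true forces an even redistribution into numNodes partitions;
    // plain coalesce may stack several partitions' tasks on one node.
    val onePerNode = sc.textFile("hdfs:///data/input")
      .coalesce(numNodes, shuffle = true)

    // pipe() launches one external process per partition (here, one per
    // node), writes the partition's records to its stdin, and returns its
    // stdout lines as an RDD[String] - which covers getting results back.
    val results = onePerNode.pipe("/opt/tools/mytool")

    results.saveAsTextFile("hdfs:///data/output")

If the tool needs binary input or a long-running server model, pipe's line-oriented protocol won't fit and you'd need something more custom (e.g. mapPartitions talking to a local daemon).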
On Mon, Jul 14, 2014 at 5:12 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> I think coalesce with shuffle=true will force it to have one task per
> node. Without that, it might be that due to data locality it decides to
> launch multiple ones on the same node even though the total # of tasks is
> equal to the # of nodes.
>
> If this is the *only* thing you run on the cluster, you could also
> configure the Workers to only report one core by manually launching the
> spark.deploy.worker.Worker process with that flag (see
> http://spark.apache.org/docs/latest/spark-standalone.html).
>
> Matei
>
> On Jul 14, 2014, at 1:59 PM, Daniel Siegmann <daniel.siegm...@velos.io> wrote:
>
> I don't have a solution for you (sorry), but do note that
> rdd.coalesce(numNodes) keeps data on the same nodes where it was. If you
> set shuffle=true then it should repartition and redistribute the data.
> But it uses the hash partitioner according to the ScalaDoc - I don't know
> of any way to supply a custom partitioner.
>
> On Mon, Jul 14, 2014 at 4:09 PM, Ravi Pandya <r...@iecommerce.com> wrote:
>
>> I'm trying to run a job that includes an invocation of a memory &
>> compute-intensive multithreaded C++ program, and so I'd like to run one
>> task per physical node. Using rdd.coalesce(# nodes) seems to just
>> allocate one task per core, and so runs out of memory on the node. Is
>> there any way to give the scheduler a hint that the task uses lots of
>> memory and cores so it spreads it out more evenly?
>>
>> Thanks,
>>
>> Ravi Pandya
>> Microsoft Research

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io
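P.S. On the custom partitioner question - coalesce() won't take one, but you can key the records and go through partitionBy instead. Rough sketch, with a made-up keying/routing scheme (on pre-1.3 Spark you need the SparkContext._ import for the pair-RDD implicits):

    import org.apache.spark.{Partitioner, SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits (pre-1.3)

    // Routes each key to a partition; replace the hash with whatever
    // mapping you actually need.
    class NodePartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int = {
        val h = key.hashCode % parts
        if (h < 0) h + parts else h  // keep the index non-negative
      }
    }

    val sc = new SparkContext(new SparkConf().setAppName("CustomPartition"))

    // partitionBy is only defined on pair RDDs, so key each record first.
    val spread = sc.textFile("hdfs:///data/input")
      .map(line => (line, ()))
      .partitionBy(new NodePartitioner(8))
      .map(_._1)  // drop the dummy value again

Whether that actually lands one partition per physical node is still up to the scheduler, of course - it only controls which partition each record ends up in.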