Hi Andrew - thanks, that's a good thought. Unfortunately, I have those set in the same pre-context-creation place as all the other variables I have been using happily for months, and those seem to affect Spark as expected. I have it set to Int.MaxValue.toString, which I am guessing is large enough.
It very occasionally will use all data-local nodes, and sometimes a mix, but mostly all process-local...

On Tue, Nov 26, 2013 at 2:45 PM, Andrew Ash <[email protected]> wrote:

> Hi Erik,
>
> I would guess that if you set spark.locality.wait to an absurdly large
> value then you would have essentially that effect.
>
> Maybe you aren't setting the system property before creating your Spark
> context?
>
> http://spark.incubator.apache.org/docs/latest/configuration.html
>
> Andrew
>
>
> On Tue, Nov 26, 2013 at 2:40 PM, Erik Freed <[email protected]> wrote:
>
>> Hi All,
>> After switching to 0.8 and reducing the number of partitions/tasks for a
>> large-scale computation, I have been unable to force Spark to use only
>> executors on nodes where HBase data is local. I have not been able to find
>> a setting for spark.locality.wait that makes any difference. It is not an
>> option for us to let Spark choose non-data-local nodes. Is there some
>> example code showing how to get this to work the way we want? We have our
>> own input RDD that mimics the NewHadoopRDD, and it seems to be doing the
>> correct thing in all regards with respect to preferred locations.
>>
>> Do I have to write my own compute tasks and schedule them myself?
>>
>> Anyone have any suggestions? I am stumped.
>>
>> cheers,
>> Erik

--
Erik James Freed
CoDecision Software
510.859.3360
[email protected]

1480 Olympus Avenue
Berkeley, CA 94708

179 Maria Lane
Orcas, WA 98245
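For reference, the pattern Andrew describes - setting spark.locality.wait as a Java system property before the SparkContext is created - can be sketched roughly as below. This is a minimal sketch, assuming the Spark 0.8-era convention of reading configuration from system properties; the per-level property names (spark.locality.wait.process/.node/.rack) are an assumption based on the linked configuration page, not something verified against this exact version:

```scala
object LocalityWaitSketch {
  def main(args: Array[String]): Unit = {
    // Must run BEFORE `new SparkContext(...)`: in 0.8, Spark reads these
    // settings from Java system properties when the context is constructed.
    // An effectively infinite wait keeps the scheduler from falling back
    // to non-local executors when a data-local slot is busy.
    System.setProperty("spark.locality.wait", Int.MaxValue.toString)

    // Assumed per-level overrides (names taken from the configuration docs);
    // if unset, each level falls back to spark.locality.wait.
    System.setProperty("spark.locality.wait.process", Int.MaxValue.toString)
    System.setProperty("spark.locality.wait.node", Int.MaxValue.toString)
    System.setProperty("spark.locality.wait.rack", Int.MaxValue.toString)

    // Only now create the context (sketch - master/app name are placeholders):
    // val sc = new SparkContext("spark://master:7077", "locality-test")
    println(System.getProperty("spark.locality.wait"))
  }
}
```

Note this only controls how long the scheduler waits for a local slot; it cannot force locality if the RDD's preferredLocations do not match any live executor's hostnames exactly, which is a common cause of the mixed process-local/data-local behavior described above.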
