Hi Andrew - thanks, that's a good thought. Unfortunately, I have those set
in the same place - before context creation - as all the other variables
that I have been using happily for months and that seem to impact Spark
nicely. I have it set to Int.MaxValue.toString, which I am guessing is
large enough.

It very occasionally will use all data-local nodes, and sometimes a mix,
but mostly all process-local...
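Andrew's suggestion below boils down to an ordering constraint: in Spark 0.8 the locality wait is read from a system property when the context is created, so it has to be set first. A minimal sketch of that ordering (the master URL and app name here are placeholders, not taken from this thread):

```scala
import org.apache.spark.SparkContext

object LocalitySetup {
  def main(args: Array[String]): Unit = {
    // Must happen BEFORE the SparkContext is constructed; properties set
    // afterwards are not picked up by the scheduler.
    System.setProperty("spark.locality.wait", Int.MaxValue.toString)

    // Placeholder master URL and app name.
    val sc = new SparkContext("spark://master:7077", "locality-test")

    // ... run jobs; with an effectively infinite wait, the scheduler should
    // hold each task for its preferred (data-local) executor.
  }
}
```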


On Tue, Nov 26, 2013 at 2:45 PM, Andrew Ash <[email protected]> wrote:

> Hi Erik,
>
> I would guess that if you set spark.locality.wait to an absurdly large
> value then you would have essentially that effect.
>
> Maybe you aren't setting the system property before creating your Spark
> context?
>
> http://spark.incubator.apache.org/docs/latest/configuration.html
>
> Andrew
>
>
> On Tue, Nov 26, 2013 at 2:40 PM, Erik Freed <[email protected]> wrote:
>
>> Hi All,
>> After switching to 0.8 and reducing the number of partitions/tasks for a
>> large-scale computation, I have been unable to force Spark to use only
>> executors on nodes where the HBase data is local. I have not been able to
>> find a setting for spark.locality.wait that makes any difference. It is
>> not an option for us to let Spark choose non-data-local nodes. Is there
>> some example code showing how to get this to work the way we want? We
>> have our own input RDD that mimics NewHadoopRDD, and it seems to be doing
>> the correct thing in all regards with respect to preferred locations.
>>
>> Do I have to write my own compute tasks and schedule them myself?
>>
>> Anyone have any suggestions? I am stumped.
>>
>> cheers,
>> Erik
>>
>>
>>
>
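For a custom input RDD like the one described above, the locality hint comes from overriding getPreferredLocations. A rough sketch against the 0.8-era RDD API (the class name and the up-front region-host data are made up for illustration; a real version would derive the hosts from HBase region metadata):

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical: one partition per HBase region, with the hostnames that
// serve each region supplied up front.
class RegionLocalRDD(sc: SparkContext, regionHosts: Array[Seq[String]])
  extends RDD[String](sc, Nil) {

  override def getPartitions: Array[Partition] =
    regionHosts.indices.map { i =>
      new Partition { override def index: Int = i }
    }.toArray

  // The scheduler matches these hostnames against executor hosts to decide
  // which tasks count as node-local.
  override def getPreferredLocations(split: Partition): Seq[String] =
    regionHosts(split.index)

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    Iterator.empty // a real implementation would scan the region here
}
```

One thing worth checking: the hostnames returned here must exactly match the names the executors register with, or the scheduler will never see a locality match.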


-- 
Erik James Freed
CoDecision Software
510.859.3360
[email protected]

1480 Olympus Avenue
Berkeley, CA
94708

179 Maria Lane
Orcas, WA
98245
