Hi all,

After switching to 0.8 and reducing the number of partitions/tasks for a large-scale computation, I have been unable to force Spark to use only executors on nodes where the HBase data is local. I have not found any setting of spark.locality.wait that makes a difference, and it is not an option for us to let Spark choose non-data-local nodes. Is there some example code showing how to get this to work the way we want? We have our own input RDD that mimics NewHadoopRDD, and it seems to be doing the correct thing in all regards with respect to preferred locations.
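For reference, the preferred-locations part of our RDD looks roughly like this. This is a simplified sketch, not our actual code; names like HBaseScanRDD and regionHosts are placeholders:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Simplified sketch (not our real code): an input RDD that reports the
// HBase region server hosting each partition as its preferred location,
// in the same spirit as NewHadoopRDD. "regionHosts" stands in for however
// the RDD discovers each region's server hostname.
class HBaseScanRDD(sc: SparkContext, regionHosts: Seq[String])
  extends RDD[String](sc, Nil) {

  override def getPartitions: Array[Partition] =
    regionHosts.indices.map { i =>
      new Partition { override val index: Int = i }
    }.toArray

  // The hook the scheduler consults for locality. As far as I can tell
  // it only expresses a preference: once spark.locality.wait expires,
  // the task can still be launched on a non-local executor.
  override def getPreferredLocations(split: Partition): Seq[String] =
    Seq(regionHosts(split.index))

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    Iterator.empty // the real implementation scans the region here
}
```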
Do I have to write my own compute tasks and schedule them myself? Does anyone have suggestions? I am stumped.

Cheers,
Erik
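P.S. For completeness, this is how we have been setting the locality wait. Spark 0.8 reads its configuration from JVM system properties; the 100000 ms value below is illustrative only, not a recommendation:

```shell
# Spark 0.8 style: configuration passed as a JVM system property,
# e.g. through SPARK_JAVA_OPTS. The value (in ms) is an example only.
export SPARK_JAVA_OPTS="-Dspark.locality.wait=100000"
```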
