If you have multiple executors running on a single node, you can have data that is on the same server as a task but in a different JVM. Data that is only on the same server is NODE_LOCAL; data that is in the same JVM (the same executor process) is PROCESS_LOCAL.
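To make the distinction concrete, here is a minimal Python sketch of how a task could be classified against Spark's four locality levels (PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY). This is not Spark's scheduler code: the `Placement` class, the `locality` function, and the host/rack names are all made up for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Locality(IntEnum):
    # Lower value means the task and its data are closer together.
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    RACK_LOCAL = 2
    ANY = 3


@dataclass(frozen=True)
class Placement:
    """Hypothetical description of where an executor or a data block lives."""
    executor_id: str  # one executor corresponds to one JVM
    host: str
    rack: str


def locality(task: Placement, data: Placement) -> Locality:
    """Classify how close a task's executor is to the data it reads."""
    if task.host == data.host and task.executor_id == data.executor_id:
        return Locality.PROCESS_LOCAL  # same JVM
    if task.host == data.host:
        return Locality.NODE_LOCAL     # same server, different JVM
    if task.rack == data.rack:
        return Locality.RACK_LOCAL     # same rack, different server
    return Locality.ANY                # anywhere else in the cluster


# Data cached in executor 0 on node1 (illustrative values).
data = Placement(executor_id="0", host="node1", rack="rackA")

print(locality(Placement("0", "node1", "rackA"), data).name)  # PROCESS_LOCAL
print(locality(Placement("1", "node1", "rackA"), data).name)  # NODE_LOCAL
print(locality(Placement("2", "node2", "rackA"), data).name)  # RACK_LOCAL
print(locality(Placement("3", "node9", "rackB"), data).name)  # ANY
```

The second call is the case described above: two executors on node1, so the data is on the same server but in another JVM.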
Yes it was changed to be more specific than just preferred/non-preferred. The new options are PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY in decreasing order of co-location.

Andrew

On Wed, Feb 5, 2014 at 10:16 PM, Tsai Li Ming <[email protected]> wrote:

> Hi,
>
> In older posts on Google Groups, there was mention of checking the logs on
> “preferred/non-preferred” for data locality.
>
> But I can’t seem to find this on 0.9.0 anymore? Has this been changed to
> “PROCESS_LOCAL” , like this:
> 14/02/06 13:51:45 INFO TaskSetManager: Starting task 9.0:50 as TID 568 on
> executor 0: xxx (PROCESS_LOCAL)
>
> What is the difference between process-local and node-local?
>
> Thanks,
> Liming
