I guess you extended an InputFormat to provide locality information. Can you share a code snippet?
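For reference, here is roughly the shape I would expect -- just a minimal sketch against the new Hadoop API, where the class name and the file-to-host lookup are made up for illustration:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{InputSplit, JobContext}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, TextInputFormat}
import java.util.{List => JList}
import scala.collection.JavaConverters._

// Hypothetical names, for illustration only.
class LocalityAwareInputFormat extends TextInputFormat {

  // Placeholder for however you map a file to the worker that holds it.
  private def hostFor(file: Path): String = "worker-1" // assumption

  override def getSplits(context: JobContext): JList[InputSplit] = {
    // FileInputFormat only produces FileSplits, so the cast is safe.
    super.getSplits(context).asScala.map { s =>
      val fs = s.asInstanceOf[FileSplit]
      // Rebuild each split with the host the scheduler should prefer;
      // FileSplit.getLocations() returns the hosts passed to this constructor.
      new FileSplit(fs.getPath, fs.getStart, fs.getLength,
        Array(hostFor(fs.getPath))): InputSplit
    }.asJava
  }
}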
Also, which non-distributed file system are you using? Thanks

On Fri, Jul 1, 2016 at 2:46 PM, Raajen <raa...@gmail.com> wrote:
> I would like to use Spark on a non-distributed file system but am having
> trouble getting the driver to assign tasks to the workers that are local
> to the files. I have extended InputSplit to create my own version of
> FileSplit, so that each worker gets a bit more information than the
> default FileSplit provides. I thought the driver would assign splits
> based on their locality, but I have found that it sends these splits to
> workers seemingly at random -- even the very first split will go to a
> node with a different IP than the one the split specifies. I can see
> that I am providing the right node address via getLocations. I also set
> spark.locality.wait to a high value, but the same misassignment keeps
> happening.
>
> I am using newAPIHadoopFile to create my RDD. My InputFormat creates the
> required splits, but not all splits refer to the same file or the same
> worker IP.
>
> What else can I check, or change, to force the driver to send these
> tasks to the right workers?
>
> Thanks!
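And for completeness, roughly how I would wire it in on the Spark side, using newAPIHadoopFile and spark.locality.wait as you described -- again only a sketch, where the input path and the format class are placeholders:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("locality-demo")
  // Wait longer for a node-local slot before falling back to any node.
  .set("spark.locality.wait", "30s")
val sc = new SparkContext(conf)

// "/data/input" and LocalityAwareInputFormat are placeholders.
val rdd = sc.newAPIHadoopFile(
  "/data/input",
  classOf[LocalityAwareInputFormat],
  classOf[LongWritable],
  classOf[Text])

println(rdd.count())  // tasks should now prefer each split's reported host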