I believe what I want is exactly the functionality provided by SparkContext.makeRDD in Scala: for each element in the RDD, I want to specify a list of preferred hosts for processing that element.
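To make the question concrete, here is a minimal sketch of the Scala call I have in mind. The host names and sample records are placeholders, and the local master is only there to keep the snippet self-contained; on a real cluster the hosts would be actual worker hostnames.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PreferredHostsSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run standalone.
    val conf = new SparkConf()
      .setAppName("preferred-hosts-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Each element is paired with the hosts it would prefer to be processed on.
    // "host1", "host2", "host3" are placeholder names, not real machines.
    val data: Seq[(String, Seq[String])] = Seq(
      ("record-a", Seq("host1", "host2")),
      ("record-b", Seq("host3"))
    )

    // The makeRDD overload taking Seq[(T, Seq[String])] creates one partition
    // per element and records the hosts as that partition's location preferences.
    val rdd = sc.makeRDD[String](data)

    // Inspect the preferences Spark has recorded for each partition.
    rdd.partitions.foreach { p =>
      println(s"partition ${p.index}: ${rdd.preferredLocations(p)}")
    }

    sc.stop()
  }
}
```

The preferences are hints to the scheduler rather than hard placement constraints, which is all I need.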
It looks like this method only exists in Scala, and as far as I can tell there is no similar functionality available in Python. Is this true?

- Philip