Hi,

The Scaladoc for SparkContext.makeRDD lists this overload:

  def makeRDD[T](seq: Seq[(T, Seq[String])])(implicit arg0: ClassTag[T]): RDD[T]

where seq is described as a "list of tuples of data and location preferences (hostnames of Spark nodes)".
Is the list of location preferences a set of acceptable choices, from which the scheduler picks one? Or is it ordered by preference? I'm trying to work out how well the work will be distributed when many partitions share the same candidate nodes. In my particular case, the input to makeRDD is a Seq of (filePath, hosts) tuples, where hosts are the nodes on which the file's blocks are local.

--C
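
For concreteness, here is a minimal sketch of the setup described above. The file paths and hostnames are hypothetical placeholders, and the local master is only there so the snippet runs standalone; on a real cluster the hostnames would presumably need to match the names under which the executors register. It builds the RDD with the location-preference overload and prints what each partition reports as its preferred locations. It does not answer how the scheduler chooses among them.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MakeRddLocalitySketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local master, used only so the sketch is runnable on its own.
    val conf = new SparkConf()
      .setAppName("makeRDD-locality-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    try {
      // Hypothetical (filePath, hosts) pairs; note the overlap between host lists.
      val filesWithHosts: Seq[(String, Seq[String])] = Seq(
        ("/data/file-a", Seq("node1", "node2")),
        ("/data/file-b", Seq("node2", "node3")),
        ("/data/file-c", Seq("node1", "node3"))
      )

      // The explicit type parameter selects the overload that takes
      // Seq[(T, Seq[String])]; the result is an RDD[String] of file paths.
      val rdd = sc.makeRDD[String](filesWithHosts)

      // Show the location preferences recorded for each partition.
      rdd.partitions.foreach { p =>
        val prefs = rdd.preferredLocations(p)
        println(s"partition ${p.index}: ${prefs.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

With this overload Spark is documented to create one partition per collection item, so the loop above should list one line per file.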