Hello everybody, please help me with this. The preferredLocations(p) method of an RDD returns the nodes from which partition p of that RDD can be accessed faster. How does Spark implement this internally? Does it store and use any history of access times or network bandwidth for the various partitions across nodes? Or, when there are multiple copies of an RDD, are the preferred locations determined only by which nodes the jobs were allocated to? Or is the intelligence derived from the underlying framework, say HDFS?
--
Sai Prasanna. AN
II M.Tech (CS), SSSIHL
Entire water in the ocean can never sink a ship, unless it gets inside. All the pressures of life can never hurt you, unless you let them in.
