Thanks, Rustagi. Yes, the global data is read-only and lives for the whole duration of the Spark job. In fact it is not just identical within a single map/reduce task — it is shared by many of my map/reduce tasks. That is why I intend to place a copy of the data on every node of my cluster, and I would like to know whether a Spark map/reduce program can have all the nodes read it simultaneously from their own local disks, rather than have one node read it and broadcast it to the others. Any suggestions?
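The usual way to get this behavior is to load the file lazily inside each worker process, so every node reads its own local copy exactly once instead of receiving a broadcast. Below is a minimal Python sketch of that pattern, assuming the file has already been placed at the same local path on every node; the path, the function names, and the membership-check logic are all hypothetical, and the Spark call that would drive it (e.g. passing `process_partition` to `rdd.mapPartitions`) is omitted so the caching pattern itself is visible.

```python
# Per-process cache for the shared read-only data. Each executor
# process fills this at most once, from its node-local disk, instead
# of receiving the data via a broadcast from the driver.
_shared_data = None

def get_shared_data(path="/local/data/huge_file.txt"):
    """Lazily load the node-local copy of the shared file.

    The first call in a worker process reads the file from local disk;
    subsequent calls reuse the cached copy. (Path is hypothetical.)
    """
    global _shared_data
    if _shared_data is None:
        with open(path) as f:
            _shared_data = f.read().splitlines()
    return _shared_data

def process_partition(records, path="/local/data/huge_file.txt"):
    """Sketch of a function for rdd.mapPartitions: looks each record up
    in the node-local shared data rather than in a broadcast variable."""
    data = get_shared_data(path)
    for r in records:
        yield (r, r in data)
```

Because `mapPartitions` runs once per partition inside a long-lived executor process, the module-level cache means the big file is read from disk once per node, no matter how many partitions or tasks that node handles.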
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-handle-this-situation-Huge-File-Shared-by-All-maps-and-Each-Computer-Has-one-copy-tp5139p5192.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.