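At 482 MB, the data should be small enough to distribute as a set of broadcast variables. Each worker then holds its own read-only copy, so you can process the data locally on every node instead of depending on where the cached partitions happen to live.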
-----Original Message-----
From: "shahab" <[email protected]>
Sent: 4/30/2015 9:42 AM
To: "[email protected]" <[email protected]>
Subject: is there anyway to enforce Spark to cache data in all worker nodes(almost equally) ?

Hi,

I load data from Cassandra into Spark. The entire data set is around 482 MB, and it is cached as temp tables (7 tables in total). How can I force Spark to cache the data on both worker nodes, not only on ONE worker (as in my case)?

I am using Spark "2.1.1" with spark-connector "1.2.0-rc3". I have a small standalone cluster with two nodes, A and B: node A hosts Cassandra, the Spark master and a worker, and node B hosts the second Spark worker.

best,
/Shahab
