Hi, I asked a similar question a while ago but didn't get any answers.
I'd like to share a 10 GB array of doubles across 50 to 100 workers. Each worker has over 40 GB of physical memory, so the array fits comfortably on every machine. The reason for sharing the array is that a Cartesian operation is applied to it, and I want to avoid shuffling it over the network.

1. Is Spark broadcast built for pushing variables of GB size? Does it need any special configuration (e.g., Akka settings) to work under these conditions?
2. (Not directly related to Spark) Is there an upper limit for Scala/Java arrays other than physical memory? Do they stop working once the element count exceeds a certain number? For reference, 10 GB of doubles is roughly 1.3 billion elements.
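For context, here is a minimal sketch of the pattern I have in mind. The array size, partition count, and the pairwise computation are stand-ins (the real array is the 10 GB one); `sc.broadcast` and `.value` are the actual Spark broadcast API I'm asking about.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("large-broadcast-sketch")
    val sc = new SparkContext(conf)

    // Stand-in for the real 10 GB array, sized down so the sketch runs.
    val bigArray: Array[Double] = Array.fill(1000000)(scala.util.Random.nextDouble())

    // Ship one read-only copy to each executor, instead of serializing
    // the array with every task closure.
    val bcArray = sc.broadcast(bigArray)

    // Each partition works against its local broadcast copy, so the
    // Cartesian-style pairwise computation needs no shuffle of the array.
    val indices = sc.parallelize(0 until 1000000, numSlices = 100)
    val result = indices.map { i =>
      val arr = bcArray.value
      // placeholder pairwise computation against the shared array
      arr(i) * arr((i + 1) % arr.length)
    }

    println(result.sum())
    sc.stop()
  }
}
```

The question is whether this pattern is sound when `bigArray` is actually 10 GB, or whether broadcast was only designed for small lookup tables.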