Hi,

I asked a similar question a while ago but didn't get any answers.

I'd like to share a 10 GB double array among 50 to 100 workers. Each worker
has over 40 GB of physical memory, so the array fits comfortably on every
node. The reason I want to share this array is that a cartesian operation is
applied to it, and I want to avoid shuffling it over the network.
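
To make it concrete, here is roughly what I have in mind (a simplified
sketch; loadMyArray and the sizes are made up):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("cartesian-with-broadcast"))

// Build the big lookup array once on the driver (~10 GB of doubles).
val bigArray: Array[Double] = loadMyArray() // hypothetical loader

// Broadcast it so each worker keeps one local copy,
// instead of the array being shuffled across the network per task.
val bcast = sc.broadcast(bigArray)

val rdd = sc.parallelize(0 until 1000000)
val result = rdd.map { i =>
  val arr = bcast.value       // local read on the worker, no network transfer
  arr(i % arr.length) * 2.0   // stand-in for the real cartesian-style computation
}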

1. Is Spark's broadcast mechanism built for pushing variables of GB size?
Does it need any special configuration (e.g., Akka settings) to work at this
scale?
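
For example, I'm guessing at settings like these, but I don't know whether
they are the right knobs or what values make sense at this size:

val conf = new SparkConf()
  .setAppName("cartesian-with-broadcast")
  .set("spark.akka.frameSize", "128")   // in MB; does broadcast even go through Akka?
  .set("spark.executor.memory", "30g")  // each worker needs room for a full copy
// Note: "spark.driver.memory" apparently has to be set at submit time,
// since the driver JVM already exists by the time this code runs.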

2. (Not directly related to Spark) Is there an upper limit for Scala/Java
arrays other than physical memory? Do they stop working once the element
count exceeds a certain number?
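
For context, here is my back-of-the-envelope count; I believe JVM arrays are
indexed by Int, which is where the worry comes from:

val bytes = 10L * 1024 * 1024 * 1024  // 10 GiB
val elements = bytes / 8              // one Double is 8 bytes
println(elements)                     // 1342177280
println(Int.MaxValue)                 // 2147483647 -- fits, if Int indexing is the only limit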
