You should try Torrent for this one; it will be faster. It's still experimental, 
but I believe it works pretty well, and it just needs more testing to become the 
default.
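
For example, a minimal sketch of switching factories (assuming a SparkConf-based 
setup; the relevant property is spark.broadcast.factory):

    import org.apache.spark.{SparkConf, SparkContext}

    // Use the torrent-based broadcast instead of the default HTTP broadcast.
    val conf = new SparkConf()
      .setAppName("large-broadcast")
      .set("spark.broadcast.factory",
           "org.apache.spark.broadcast.TorrentBroadcastFactory")

    val sc = new SparkContext(conf)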

Matei

On Mar 12, 2014, at 1:12 PM, Aureliano Buendia <buendia...@gmail.com> wrote:

> Is TorrentBroadcastFactory out of beta? Is it preferred over 
> HttpBroadcastFactory for large broadcasts?
> 
> What are the benefits of HttpBroadcastFactory as the default factory?
> 
> 
> On Wed, Mar 12, 2014 at 7:09 PM, Stephen Boesch <java...@gmail.com> wrote:
> Hi Josh,
>   So then 2^31 (~2.1 billion) elements * 8 bytes (the size of a double) = 16 GB 
> would be the max array byte length with Doubles?
> 
> 
> 2014-03-12 11:30 GMT-07:00 Josh Marcus <jmar...@meetup.com>:
> 
> Aureliano,
> 
> Just to answer your second question (unrelated to Spark): arrays in Java and 
> Scala can't be larger than the maximum value of an Integer 
> (Integer.MAX_VALUE), which means that arrays are limited to about 2.1 billion 
> elements.
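> 
> For illustration, a rough back-of-the-envelope sketch in Scala (plain JVM 
> limits, nothing Spark-specific):
> 
>     // Array indices are Ints, so the element count is capped at Int.MaxValue.
>     val maxElements = Int.MaxValue            // 2147483647, about 2.1 billion
>     val bytesPerDouble = 8                    // a double is 64 bits
>     val maxBytes = maxElements.toLong * bytesPerDouble
>     println(f"max Array[Double] size: ${maxBytes.toDouble / (1L << 30)}%.0f GB")  // ~16 GB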
> 
> --j
> 
> 
> 
> On Wed, Mar 12, 2014 at 1:08 PM, Aureliano Buendia <buendia...@gmail.com> 
> wrote:
> Hi,
> 
> I asked a similar question a while ago but didn't get any answers.
> 
> I'd like to share a 10 GB double array among 50 to 100 workers. The physical 
> memory of each worker is over 40 GB, so the array can fit in each worker's 
> memory. The reason I'm sharing this array is that a cartesian operation is 
> applied to it, and I want to avoid network shuffling.
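> 
> Roughly, the pattern is something like this sketch (rdd, loadBigArray and f are 
> placeholders):
> 
>     // Broadcast the big array once, then pair each RDD element with it locally
>     // instead of calling RDD.cartesian, so the array itself is never shuffled.
>     val bigArray: Array[Double] = loadBigArray()   // placeholder: ~10 GB of doubles
>     val bcast = sc.broadcast(bigArray)
> 
>     val result = rdd.mapPartitions { iter =>
>       val arr = bcast.value                        // one local copy per worker
>       iter.flatMap(x => arr.iterator.map(y => f(x, y)))   // cartesian-style pairing
>     }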
> 
> 1. Is Spark broadcast built for pushing variables of GB size? Does it need 
> special configuration (e.g. Akka settings) to work under this condition?
> 
> 2. (Not directly related to Spark) Is there an upper limit for Scala/Java 
> arrays other than the physical memory? Do they stop working when the array's 
> element count exceeds a certain number?
> 
> 
> 
