Yes, for the usual tests that is fine, but the broadcast only seems to happen
when we are not in local mode.

In SparkContext we have:

  def broadcast[T](value: T) =
    env.broadcastManager.newBroadcast[T](value, isLocal)

where isLocal is computed from the master name ("local" or "local[...").
Now, if we look into HttpBroadcast, we see:

  if (!isLocal) {
    HttpBroadcast.write(id, value_)
  }

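To make the consequence concrete, here is a minimal, self-contained Scala model of the two snippets above. It is not Spark's source: `BroadcastModel`, `isLocalMaster` and the `write` callback (standing in for `HttpBroadcast.write`) are hypothetical names, and the master-name rule is assumed from the description ("local" or "local[...").

```scala
// Hypothetical model of the behaviour described above; not Spark's actual code.
object BroadcastModel {
  // Assumed rule: a master of "local" or "local[N]" means local mode.
  def isLocalMaster(master: String): Boolean =
    master == "local" || master.startsWith("local[")

  // Mimics the HttpBroadcast check: the value is only written out when not local,
  // so in local mode tasks see the original in-memory object, never a copy.
  def newBroadcast[T](value: T, master: String)(write: T => Unit): T = {
    if (!isLocalMaster(master)) write(value)
    value
  }

  def main(args: Array[String]): Unit = {
    var writes = 0
    newBroadcast(Map("a" -> 1), "local[2]")(_ => writes += 1)
    newBroadcast(Map("a" -> 1), "spark://host:7077")(_ => writes += 1)
    println(writes) // prints 1: only the non-local master triggered a write
  }
}
```

In this model, any bug that lives in the write/read path can never show up with a local master, which matches the symptom.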
So in local mode the broadcast is never actually performed. I guess this is
an optimization for the case where multiple threads share the same broadcast
variable. But perhaps I am missing something?
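One way to probe this without a cluster, assuming the difference comes from the serialize/deserialize round trip that a non-local broadcast has to go through, is to push the map through such a round trip by hand and compare lookups. The sketch uses plain Java serialization; `LowerCaseComparator`, `RoundTripCheck` and `roundTrip` are hypothetical names, and if the job is configured with a different serializer (e.g. Kryo), that serializer would be the one to test instead.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.{Comparator, TreeMap}

// Hypothetical stand-in for the custom "lowercase" ordering; it must be
// Serializable, otherwise the TreeMap cannot be serialized at all.
class LowerCaseComparator extends Comparator[String] with Serializable {
  override def compare(a: String, b: String): Int =
    a.toLowerCase.compareTo(b.toLowerCase)
}

object RoundTripCheck {
  // Serialize and deserialize a value in memory with plain Java serialization,
  // roughly what a value must survive before a remote task can use it.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = new TreeMap[String, String](new LowerCaseComparator)
    original.put("Foo", "first")
    original.put("bar", "second")
    val copy = roundTrip(original)

    // Lookups that depend on the custom ordering should agree after the round trip;
    // a mismatch would point at serialization rather than the broadcast transport.
    for (key <- Seq("foo", "BAR", "baz"))
      println(s"$key: ${original.containsKey(key)} / ${copy.containsKey(key)}")
  }
}
```

If the two columns disagree for the real map and comparator, the serializer is the likely suspect; if they agree, the problem more likely lies elsewhere in the broadcast path.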


2013/11/19 Sriram Ramachandrasekaran

> Try local[m], where m is the number of workers. For tests, local[2]
> should be ideal. This is generally how to write tests for Spark code.
>
>
> On Tue, Nov 19, 2013 at 10:03 PM, Eugen Cepoi wrote:
>
>> Maybe this is a bug in HttpBroadcast, or maybe it is my fault, but I
>> can't find where :)
>>
>> The problem:
>>   At the beginning, a job computes a TreeMap(String, SomeObject) with a
>> custom ordering (a simple lowercase comparison), and this TreeMap is
>> broadcast.
>>   Then I use this map to match against an input RDD (dropping records
>> that don't exist in the map).
>>   What happens? In local mode (where the broadcast is not actually used)
>> or when passing the whole TreeMap without a broadcast, I get more than 3M
>> matches; with the broadcast it falls to 20K.
>>
>> Replacing HttpBroadcastFactory with TreeBroadcastFactory solves the
>> problem (I get the expected results). I am trying to write a test case
>> to reproduce it, but that is quite tricky here...
>>
>> BTW, is there a way to reproduce the broadcast mechanism in local mode?
>> (I see that the SparkEnv instance is shared statically, so I guess there
>> is no easy way.)
>>
>> Thanks,
>> Eugen
>>
>
