To answer my own question, that does seem to be the right way. I was
concerned about whether the data held by a broadcast variable would end up
getting serialized if I stored the Broadcast as an instance variable of the
function. I realized that doesn't happen, because the broadcast variable's
value is marked as transient in both implementations:

1. HttpBroadcast:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
2. TorrentBroadcast:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
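A minimal sketch of why this works, using plain Java serialization instead of Spark. FakeBroadcast below is a hypothetical stand-in for Spark's Broadcast, not Spark's actual code: it keeps an id plus a transient value, so serializing a function object that holds it as an instance variable writes only the id, and the value comes back as null on the other side. The real implementations roughly re-fetch the value by id on the executor; that step is left out here.

```java
import java.io.*;
import java.util.*;

public class TransientBroadcastDemo {

    // Hypothetical stand-in for Spark's Broadcast handle: only the id is
    // serialized, the payload is transient (illustrates the idea, not Spark's code).
    static class FakeBroadcast<T> implements Serializable {
        private final long id;
        private transient T value;

        FakeBroadcast(long id, T value) {
            this.id = id;
            this.value = value;
        }

        long id() { return id; }
        T value() { return value; }
    }

    // Function-like class that keeps the handle as an instance variable,
    // like SomeFunction in the question.
    static class SomeFunction implements Serializable {
        private final FakeBroadcast<Map<String, String>> var;

        SomeFunction(FakeBroadcast<Map<String, String>> var) { this.var = var; }

        FakeBroadcast<Map<String, String>> broadcast() { return var; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a reasonably large map to "broadcast".
        Map<String, String> big = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            big.put("key" + i, "value" + i);
        }

        byte[] mapBytes = serialize(big);
        byte[] fnBytes = serialize(new SomeFunction(new FakeBroadcast<>(42L, big)));

        // Round-trip the function: the id survives, the transient value does not.
        SomeFunction copy = (SomeFunction) deserialize(fnBytes);
        System.out.println("map bytes:      " + mapBytes.length);
        System.out.println("function bytes: " + fnBytes.length);
        System.out.println("id after copy:  " + copy.broadcast().id());
        System.out.println("value is null:  " + (copy.broadcast().value() == null));
    }
}
```

Running main should print a serialized function size far smaller than the map's and show that value() is null after the round trip, which is the behaviour that makes keeping the Broadcast as a field cheap.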


On Thu, May 22, 2014 at 6:58 PM, Puneet Lakhina <puneet.lakh...@gmail.com> wrote:

> Hi,
>
> I'm confused about the right way to use broadcast variables from Java.
>
> My code looks something like this:
>
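> Map<String, String> val = ...; // build the Map to be broadcast (types illustrative)
> final Broadcast<Map<String, String>> broadcastVar = sc.broadcast(val);
>
> sc.textFile(...).map(new Function<String, String>() {
>   public String call(String line) {
>     // Do something here using broadcastVar.value()
>     return line; // placeholder
>   }
> });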
>
> My question is: should I pass broadcastVar to SomeFunction as a
> constructor parameter that it keeps as an instance variable, i.e.
>
> sc.textFile(...).map(new SomeFunction(broadcastVar));
>
> class SomeFunction implements Function<String, String> {
>   private final Broadcast<Map<String, String>> var;
>
>   public SomeFunction(Broadcast<Map<String, String>> var) {
>     this.var = var;
>   }
>
>   public String call(String line) {
>     // Do something with line and var.value()
>     return line; // placeholder
>   }
> }
>
> Is the above the right way to use broadcast variables when not using
> anonymous inner classes as functions?
> --
> Regards,
> Puneet
>
>


-- 
Regards,
Puneet
