Hi Imran,

If I understood you correctly, you are suggesting that I simply call broadcast
again from the driver program. That is exactly what I am hoping will work:
I have the broadcast data wrapped in a container object, and I am indeed
re-broadcasting the wrapper whenever the underlying data changes. However, the
documentation seems to suggest that one cannot re-broadcast. Is my
understanding accurate?
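
For concreteness, this is roughly the pattern I have in mind (a simplified
sketch assuming the usual SparkContext sc; loadLookupData() is just a
placeholder, not my actual code):

var lookupBroadcast = sc.broadcast(loadLookupData())  // initial broadcast

// later, in the driver, whenever the underlying data changes:
lookupBroadcast.unpersist()                           // let executors drop the stale cached copy
lookupBroadcast = sc.broadcast(loadLookupData())      // broadcast a fresh value under a new id

// jobs submitted after this point read lookupBroadcast.value and see the new data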

Thanks
NB


On Mon, May 18, 2015 at 6:24 PM, Imran Rashid <iras...@cloudera.com> wrote:

> Rather than "updating" the broadcast variable, can't you simply create a
> new one?  When the old one can be gc'ed in your program, it will also get
> gc'ed from spark's cache (and all executors).
>
> I think this will make your code *slightly* more complicated, as you need
> to add another layer of indirection to choose which broadcast variable to
> use, but it's not too bad. E.g., instead of:
>
> var myBroadcast = sc.broadcast(...)
> (0 to 20).foreach { iteration =>
>   // ... some rdd operations that involve myBroadcast ...
>   myBroadcast.update(...) // wrong! don't update a broadcast variable
> }
>
> instead do something like:
>
> def oneIteration(myRDD: RDD[...], myBroadcastVar: Broadcast[...]): Unit = {
>  ...
> }
>
> var myBroadcast = sc.broadcast(...)
> (0 to 20).foreach { iteration =>
>   oneIteration(myRDD, myBroadcast)
>   // create a NEW broadcast here with whatever updated data you need.
>   // Note: reassign the existing var rather than declaring a new "var",
>   // which would only create a local variable shadowing the outer one.
>   myBroadcast = sc.broadcast(...)
> }
>
> On Sat, May 16, 2015 at 2:01 AM, N B <nb.nos...@gmail.com> wrote:
>
>> Thanks Ayan. Can we rebroadcast after updating in the driver?
>>
>> Thanks
>> NB.
>>
>>
>> On Fri, May 15, 2015 at 6:40 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> A broadcast variable is shipped to the executors used by a transformation
>>> the first time it is accessed in that transformation. It will NOT be
>>> updated subsequently, even if the value has changed. However, the new value
>>> will be shipped to any new executor that comes into play after the value
>>> has changed. For this reason, changing the value of a broadcast variable is
>>> not a good idea, as it can create inconsistency within the cluster. From
>>> the documentation:
>>>
>>>  In addition, the object v should not be modified after it is broadcast
>>> in order to ensure that all nodes get the same value of the broadcast
>>> variable
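>>>
>>> To illustrate (a rough sketch, not code from the docs; sc and rdd are
>>> assumed to already exist):
>>>
>>> import scala.collection.mutable
>>>
>>> val lookup = mutable.Map("a" -> 1)
>>> val b = sc.broadcast(lookup)
>>> rdd.map(x => b.value.getOrElse(x, 0)).count() // tasks read the broadcast value
>>> lookup("a") = 2 // modifying the object after it is broadcast: there is no
>>>                 // guarantee that all nodes will ever see this change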
>>>
>>>
>>> On Sat, May 16, 2015 at 10:39 AM, N B <nb.nos...@gmail.com> wrote:
>>>
>>>> Thanks Ilya. Does one have to call broadcast again once the underlying
>>>> data is updated in order to make the changes visible on all nodes?
>>>>
>>>> Thanks
>>>> NB
>>>>
>>>>
>>>> On Fri, May 15, 2015 at 5:29 PM, Ilya Ganelin <ilgan...@gmail.com>
>>>> wrote:
>>>>
>>>>> The broadcast variable is like a pointer. If the underlying data
>>>>> changes then the changes will be visible throughout the cluster.
>>>>> On Fri, May 15, 2015 at 5:18 PM NB <nb.nos...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Once a broadcast variable is created using sparkContext.broadcast(),
>>>>>> can it ever be updated again? The use case is for something like the
>>>>>> underlying lookup data changing over time.
>>>>>>
>>>>>> Thanks
>>>>>> NB
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>
