I think it is good practice not to hold a reference to SparkContext in the
object that defines mapFunction.
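
As a rough illustration, the failure can be reproduced without Spark, using plain Java serialization (which is what Spark uses for closures). This is only a sketch: FakeContext is a hypothetical stand-in for SparkContext, and BadMapper, PureMapper, TransientMapper and SerializationCheck are made-up names, not anything from Daniel's code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: a class that is NOT Serializable.
class FakeContext

// Holds the context as an ordinary field, so serializing an instance fails.
class BadMapper extends Serializable {
  val ctx = new FakeContext
  def mapFunction(x: Int): Int = x * 2
}

// Holds no context at all, which is the practice suggested above.
class PureMapper extends Serializable {
  def mapFunction(x: Int): Int = x * 2
}

// Keeps the field but marks it @transient, so serialization skips it.
class TransientMapper extends Serializable {
  @transient val ctx = new FakeContext
  def mapFunction(x: Int): Int = x * 2
}

object SerializationCheck {
  // Returns true if Java serialization of `obj` succeeds,
  // false if it hits a non-serializable field.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

With these definitions, SerializationCheck.canSerialize(new BadMapper) fails while the other two succeed. Note that a @transient field comes back as null after deserialization, so it must not be used inside the function that runs on the executors; keeping the context out of the object entirely avoids that trap.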

On Sun, Jun 19, 2016 at 7:10 AM, Takeshi Yamamuro <linguin....@gmail.com>
wrote:

> How about marking the field with the `@transient` annotation?
>
> // maropu
>
> On Sun, Jun 19, 2016 at 10:51 PM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> Just updating on my findings for future reference.
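>> The problem was that after refactoring my code I ended up with a Scala
>> object which held SparkContext as a member, e.g.:
>> import org.apache.spark.SparkContext
>> object A {
>>   // Non-serializable member held by the object
>>   val sc: SparkContext = new SparkContext()
>>   // Body elided in the original; this signature is only a placeholder
>>   def mapFunction(x: Int): Int = x
>> }
>>
>> and when I called rdd.map(A.mapFunction) it failed because A.sc is not
>> serializable.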
>>
>> Thanks,
>> Daniel
>>
>> On Tue, Jun 7, 2016 at 10:13 AM, Takeshi Yamamuro <linguin....@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> `HttpBroadcastFactory` has already been removed in master, so you will
>>> not be able to use that broadcast mechanism in future releases.
>>>
>>> Anyway, I couldn't find the root cause from the stack traces alone...
>>>
>>> // maropu
>>>
>>>
>>>
>>>
>>> On Mon, Jun 6, 2016 at 2:14 AM, Daniel Haviv <
>>> daniel.ha...@veracity-group.com> wrote:
>>>
>>>> Hi,
>>>> I've set spark.broadcast.factory to
>>>> org.apache.spark.broadcast.HttpBroadcastFactory and it indeed resolved my
>>>> issue.
>>>>
>>>> I'm creating a dataframe which creates a broadcast variable internally
>>>> and then fails due to the torrent broadcast with the following stacktrace:
>>>> Caused by: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3
>>>>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
>>>>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:137)
>>>>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
>>>>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
>>>>         at scala.collection.immutable.List.foreach(List.scala:318)
>>>>         at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120)
>>>>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:175)
>>>>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220)
>>>>
>>>> I'm using Spark 1.6.0 on CDH 5.7.
>>>>
>>>> Thanks,
>>>> Daniel
>>>>
>>>>
>>>> On Wed, Jun 1, 2016 at 5:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> I found spark.broadcast.blockSize but no parameter to switch the
>>>>> broadcast method.
>>>>>
>>>>> Can you describe the issues with torrent broadcast in more detail?
>>>>>
>>>>> Which version of Spark are you using?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Jun 1, 2016 at 7:48 AM, Daniel Haviv <
>>>>> daniel.ha...@veracity-group.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Our application is failing due to issues with the torrent broadcast.
>>>>>> Is there a way to switch to another broadcast method?
>>>>>>
>>>>>> Thank you.
>>>>>> Daniel
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
