I think it is good practice not to hold on to the SparkContext in mapFunction.

On Sun, Jun 19, 2016 at 7:10 AM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
> How about using `transient` annotations?
>
> // maropu
>
> On Sun, Jun 19, 2016 at 10:51 PM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> Just updating on my findings for future reference.
>> The problem was that after refactoring my code I ended up with a Scala
>> object which held the SparkContext as a member, e.g.:
>> object A {
>>   val sc: SparkContext = new SparkContext
>>   def mapFunction {}
>> }
>>
>> and when I called rdd.map(A.mapFunction) it failed, as A.sc is not
>> serializable.
>>
>> Thanks,
>> Daniel
>>
>> On Tue, Jun 7, 2016 at 10:13 AM, Takeshi Yamamuro <linguin....@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Since `HttpBroadcastFactory` has already been removed in master,
>>> you cannot use that broadcast mechanism in future releases.
>>>
>>> Anyway, I couldn't find a root cause from the stack traces alone...
>>>
>>> // maropu
>>>
>>> On Mon, Jun 6, 2016 at 2:14 AM, Daniel Haviv <
>>> daniel.ha...@veracity-group.com> wrote:
>>>
>>>> Hi,
>>>> I've set spark.broadcast.factory to
>>>> org.apache.spark.broadcast.HttpBroadcastFactory and it indeed resolved my
>>>> issue.
>>>>
>>>> I'm creating a dataframe which creates a broadcast variable internally
>>>> and then fails due to the torrent broadcast with the following stacktrace:
>>>> Caused by: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3
>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:137)
>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>     at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120)
>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:175)
>>>>     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220)
>>>>
>>>> I'm using spark 1.6.0 on CDH 5.7
>>>>
>>>> Thanks,
>>>> Daniel
>>>>
>>>> On Wed, Jun 1, 2016 at 5:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> I found spark.broadcast.blockSize but no parameter to switch broadcast
>>>>> method.
>>>>>
>>>>> Can you describe the issues with torrent broadcast in more detail?
>>>>>
>>>>> Which version of Spark are you using?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Jun 1, 2016 at 7:48 AM, Daniel Haviv <
>>>>> daniel.ha...@veracity-group.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Our application is failing due to issues with the torrent broadcast,
>>>>>> is there a way to switch to another broadcast method?
>>>>>>
>>>>>> Thank you.
>>>>>> Daniel
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>
> --
> ---
> Takeshi Yamamuro
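
For future readers, here is a minimal sketch of the serialization problem discussed above and of the two suggestions in this thread (`transient`, and not holding on to the context in the map function). It is only an illustration under assumptions: `FakeContext`, `BadJob`, `TransientJob`, `PureFunctions` and `SerializationCheck` are hypothetical names, `FakeContext` is a stand-in for SparkContext so the snippet runs without Spark, and the holder is a class rather than an object so that the closure visibly captures it. It checks the closures with plain Java serialization, which as far as I know is what Spark uses to ship closures to executors.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class FakeContext

// Holds the context as an ordinary field. The function reads `offset`,
// so the closure captures `this`, and `ctx` gets dragged along with it.
class BadJob extends Serializable {
  val ctx = new FakeContext
  val offset = 1
  def mapFunction: Int => Int = x => x + offset
}

// Same shape, but the context is marked @transient, so Java serialization
// skips that field. Note it will be null after deserialization, so the
// function itself must never touch it.
class TransientJob extends Serializable {
  @transient val ctx = new FakeContext
  val offset = 1
  def mapFunction: Int => Int = x => x + offset
}

// The function lives apart from anything that holds a context.
object PureFunctions {
  val addOne: Int => Int = x => x + 1
}

object SerializationCheck {
  // Returns true if `value` survives Java serialization, false if some
  // object reachable from it is not Serializable.
  def isSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

With this sketch, `SerializationCheck.isSerializable(new BadJob().mapFunction)` should be false, while the same check on `new TransientJob().mapFunction` and on `PureFunctions.addOne` should be true. The third variant is probably the most robust one, since nothing that knows about the context is reachable from the function at all.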