Either you have to do rdd.collect and then broadcast the result, or you can do a join.

On 22 Jul 2015 07:54, "Dan Dong" <dongda...@gmail.com> wrote:
> Hi, All,
>
> I am trying to access a Map from RDDs that are on different compute nodes,
> but without success. The Map is like:
>
> val map1 = Map("aa"->1,"bb"->2,"cc"->3,...)
>
> All RDDs will have to check against it to see if the key is in the Map or
> not, so it seems I have to make the Map itself global. The problem is that if
> the Map is stored as RDDs and spread across the different nodes, each node
> will only see a piece of the Map and the info will not be complete to check
> against the Map (and then replace the key with the corresponding value). E.g.:
>
> val matchs= Vecs.map(term=>term.map{case (a,b)=>(map1(a),b)})
>
> But if the Map is not an RDD, how to share it, like sc.broadcast(map1)?
>
> Any idea about this? Thanks!
>
> Cheers,
> Dan
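
For reference, here is a rough sketch of the two options named at the top of this reply (collect then broadcast, or join). It is only an illustration under assumptions: the object name, the local master setting, and the sample contents of vecs are made up, and vecs is guessed to be an RDD of sequences of (key, value) pairs based on the map call in the question. Keys missing from the Map are dropped here rather than throwing, which is a choice of this sketch and not something the question specifies.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On Spark versions before 1.3 the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

object SharedMapSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local master, only so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("shared-map-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // The lookup table from the question, here held as a pair RDD.
    val mapRdd = sc.parallelize(Seq("aa" -> 1, "bb" -> 2, "cc" -> 3))

    // Hypothetical stand-in for Vecs: each element is a sequence of (key, weight).
    val vecs = sc.parallelize(Seq(
      Seq(("aa", 0.5), ("bb", 1.5)),
      Seq(("cc", 2.0), ("zz", 9.0)) // "zz" is not in the Map
    ))

    // Option 1: collect the Map to the driver, then broadcast it to every executor.
    // Reasonable only when the Map fits comfortably in memory.
    val bMap = sc.broadcast(mapRdd.collectAsMap())
    val viaBroadcast = vecs.map { term =>
      // get returns None for unknown keys, so those pairs are dropped.
      term.flatMap { case (a, b) => bMap.value.get(a).map(id => (id, b)) }
    }
    viaBroadcast.collect().foreach(println)

    // Option 2: keep the Map as an RDD and join on the key.
    // Note: this flattens the per-vector grouping; carry an index along
    // if the grouping has to be rebuilt afterwards.
    val viaJoin = vecs
      .flatMap(term => term)               // RDD[(String, Double)]
      .join(mapRdd)                        // RDD[(String, (Double, Int))]
      .map { case (_, (b, id)) => (id, b) }
    viaJoin.collect().foreach(println)

    sc.stop()
  }
}
```

If map1 is already a plain Scala Map on the driver, as in the question, the collect step is unnecessary and sc.broadcast(map1) can be called directly; inside the closure the Map is then read through the broadcast handle's .value.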