Re: RDD.broadcast

2016-04-28 Thread Reynold Xin
));
Location l = locationMap.value.get(u.getLocationId());
Object result = method(f, u1, u2, l); // method implementation not important, but requires all 3 objects
return result;
});

*From:* Marcin Tustin [mailto:mtus...

RE: RDD.broadcast

2016-04-28 Thread Ioannis.Deligiannis
From: Marcin Tustin [mailto:mtus...@handybook.com] Sent: 28 April 2016 12:27 To: Deligiannis, Ioannis (UK) Cc: dev@spark.apache.org Subject: Re: RDD.broadcast I don't know what your notation really means. I'm very much unclear on why you can't use the filter method for 1. If you're talking abo

Re: RDD.broadcast

2016-04-28 Thread Marcin Tustin
oin” > method. > > > > > > *From:* Marcin Tustin [mailto:mtus...@handybook.com] > *Sent:* 28 April 2016 12:08 > *To:* Deligiannis, Ioannis (UK) > *Cc:* dev@spark.apache.org

Re: RDD.broadcast

2016-04-28 Thread Mike Hynes
I second knowing the use case for interest. I can imagine a case where knowledge of the RDD key distribution would help local computations, for relatively few keys, but would be interested to hear your motive. Essentially, are you trying to achieve what would be an all-reduce type operation in

Re: RDD.broadcast

2016-04-28 Thread Marcin Tustin
Why would you ever need to do this? I'm genuinely curious. I view collects as being solely for interactive work. On Thursday, April 28, 2016, wrote: > Hi, > > > > It is a common pattern to process an RDD, collect (typically a subset) to > the driver and then

RDD.broadcast

2016-04-28 Thread Ioannis.Deligiannis
Hi, It is a common pattern to process an RDD, collect (typically a subset) to the driver and then broadcast back. Adding an RDD method that can do that using the torrent broadcast mechanics would be much more efficient. In addition, it would not require the Driver to also utilize its Heap
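
The collect-then-broadcast pattern described in this message can be sketched without Spark. The following is a plain-Python simulation under stated assumptions, not Spark code: the "RDD" is a list of partitions, and `collect_subset`, `broadcast`, and `run` are hypothetical names invented for illustration. In a real job the first two steps would roughly correspond to calling `collect()` on an RDD and `SparkContext.broadcast` on the result.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an RDD: a list of partitions,
# each partition being a list of (key, value) records.
Record = Tuple[str, int]
Partition = List[Record]


def collect_subset(partitions: List[Partition],
                   keep: Callable[[Record], bool]) -> List[Record]:
    """Simulated driver-side collect: gather matching records from all partitions."""
    return [rec for part in partitions for rec in part if keep(rec)]


def broadcast(value: Dict[str, int], num_tasks: int) -> List[Dict[str, int]]:
    """Simulated broadcast: hand every task a reference to the same read-only lookup."""
    return [value for _ in range(num_tasks)]


def run(partitions: List[Partition]) -> List[List[Tuple[str, int, bool]]]:
    # Step 1: collect a (typically small) subset to the driver.
    # The driver holds the whole subset in memory at this point.
    subset = collect_subset(partitions, lambda rec: rec[1] >= 10)
    lookup = dict(subset)

    # Step 2: broadcast the collected data back to the tasks.
    copies = broadcast(lookup, len(partitions))

    # Step 3: each task tags its own records using the broadcast lookup.
    return [[(k, v, k in bc) for k, v in part]
            for part, bc in zip(partitions, copies)]


if __name__ == "__main__":
    data = [[("a", 1), ("b", 12)], [("c", 30), ("a", 5)]]
    print(run(data))
```

The sketch makes the round trip visible: the subset passes through the driver before it reaches the tasks, which is the step the proposed RDD method would avoid.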