I’d be happy to see this added to the core API.

Matei

On Jan 23, 2014, at 5:39 PM, Andrew Ash <[email protected]> wrote:

> Ah right of course -- perils of typing code without running it!
> 
> It feels like this is a pretty core operation that should be added to the 
> main RDD API.  Do other people not run into this often?
> 
> When I'm validating a foreign key join in my cluster I often check to make 
> sure that the foreign keys land on valid values on the referenced table, and 
> the way I do that is checking to see what percentage of the references 
> actually land.
> 
> 
> On Thu, Jan 23, 2014 at 6:36 PM, Evan R. Sparks <[email protected]> wrote:
> Yup (well, with _._1 at the end!)
> 
> 
> On Thu, Jan 23, 2014 at 5:28 PM, Andrew Ash <[email protected]> wrote:
> You're thinking like this?
> 
> A.map(v => (v,None)).join(B.map(v => (v,None))).map(_._2)
> 
> 
> On Thu, Jan 23, 2014 at 6:26 PM, Evan R. Sparks <[email protected]> wrote:
> You could map each to an RDD[(String,None)] and do a join.
> 
> 
> On Thu, Jan 23, 2014 at 5:18 PM, Andrew Ash <[email protected]> wrote:
> Hi spark users,
> 
> I recently wanted to calculate the set intersection of two RDDs of Strings.  
> I couldn't find a .intersection() method in the autocomplete or in the Scala 
> API docs, so used a little set theory to end up with this:
> 
> lazy val A = ...
> lazy val B = ...
> A.union(B).subtract(A.subtract(B)).subtract(B.subtract(A))
> 
> Which feels very cumbersome.
> 
> Does anyone have a more idiomatic way to calculate intersection?
> 
> Thanks!
> Andrew
> 
> 
> 
> 

Reply via email to