I’d be happy to see this added to the core API.

Matei
On Jan 23, 2014, at 5:39 PM, Andrew Ash <[email protected]> wrote:

> Ah right of course -- perils of typing code without running it!
>
> It feels like this is a pretty core operation that should be added to the
> main RDD API. Do other people not run into this often?
>
> When I'm validating a foreign key join in my cluster I often check to make
> sure that the foreign keys land on valid values on the referenced table, and
> the way I do that is checking to see what percentage of the references
> actually land.
>
>
> On Thu, Jan 23, 2014 at 6:36 PM, Evan R. Sparks <[email protected]> wrote:
> Yup (well, with _._1 at the end!)
>
>
> On Thu, Jan 23, 2014 at 5:28 PM, Andrew Ash <[email protected]> wrote:
> You're thinking like this?
>
> A.map(v => (v,None)).join(B.map(v => (v,None))).map(_._2)
>
>
> On Thu, Jan 23, 2014 at 6:26 PM, Evan R. Sparks <[email protected]> wrote:
> You could map each to an RDD[(String,None)] and do a join.
>
>
> On Thu, Jan 23, 2014 at 5:18 PM, Andrew Ash <[email protected]> wrote:
> Hi spark users,
>
> I recently wanted to calculate the set intersection of two RDDs of Strings.
> I couldn't find a .intersection() method in the autocomplete or in the Scala
> API docs, so used a little set theory to end up with this:
>
> lazy val A = ...
> lazy val B = ...
> A.union(B).subtract(A.subtract(B)).subtract(B.subtract(A))
>
> Which feels very cumbersome.
>
> Does anyone have a more idiomatic way to calculate intersection?
>
> Thanks!
> Andrew
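For reference, the join approach from the thread, with Evan's `_._1` correction applied, might look roughly like the sketch below. The object name `IntersectionSketch`, the helper `intersect`, the sample data, and the `local[2]` master are all assumptions made for illustration and are not from the thread. The trailing `distinct()` is also an addition: a join emits one row per matching pair, so duplicates in either input would otherwise show up more than once in the result.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // brings in pair-RDD implicits on older Spark releases
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, for illustration only.
object IntersectionSketch {

  // Intersection via join: key each element by itself, join on the key,
  // then keep only the key.
  def intersect(a: RDD[String], b: RDD[String]): RDD[String] =
    a.map(v => (v, None))
      .join(b.map(v => (v, None)))
      .map(_._1)   // keep the key, not the (None, None) value pair
      .distinct()  // assumption: set semantics wanted, so drop repeated keys

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick standalone run.
    val conf = new SparkConf().setAppName("intersection-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Seq("x", "y", "z"))
      val b = sc.parallelize(Seq("y", "z", "w"))
      println(intersect(a, b).collect().sorted.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Compared with the union/subtract expression, this needs one shuffle-based join rather than three subtracts, though which is faster in practice would depend on the data and cluster.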
