Is there any separation in the API between functions that can be built
solely on the existing exposed public API and ones which require access to
internals?

Just to maybe avoid bloat for composite functions like this that are for
user convenience?

(Ala something like lua's aux api vs core api?)


On Thu, Jan 23, 2014 at 8:33 PM, Matei Zaharia <[email protected]>wrote:

> I’d be happy to see this added to the core API.
>
> Matei
>
> On Jan 23, 2014, at 5:39 PM, Andrew Ash <[email protected]> wrote:
>
> Ah right of course -- perils of typing code without running it!
>
> It feels like this is a pretty core operation that should be added to the
> main RDD API.  Do other people not run into this often?
>
> When I'm validating a foreign key join in my cluster I often check to make
> sure that the foreign keys land on valid values on the referenced table,
> and the way I do that is checking to see what percentage of the references
> actually land.
>
>
> On Thu, Jan 23, 2014 at 6:36 PM, Evan R. Sparks <[email protected]>wrote:
>
>> Yup (well, with _._1 at the end!)
>>
>>
>> On Thu, Jan 23, 2014 at 5:28 PM, Andrew Ash <[email protected]> wrote:
>>
>>> You're thinking like this?
>>>
>>> A.map(v => (v,None)).join(B.map(v => (v,None))).map(_._2)
>>>
>>>
>>> On Thu, Jan 23, 2014 at 6:26 PM, Evan R. Sparks 
>>> <[email protected]>wrote:
>>>
>>>> You could map each to an RDD[(String,None)] and do a join.
>>>>
>>>>
>>>> On Thu, Jan 23, 2014 at 5:18 PM, Andrew Ash <[email protected]>wrote:
>>>>
>>>>> Hi spark users,
>>>>>
>>>>> I recently wanted to calculate the set intersection of two RDDs of
>>>>> Strings.  I couldn't find a .intersection() method in the autocomplete or
>>>>> in the Scala API docs, so used a little set theory to end up with this:
>>>>>
>>>>> lazy val A = ...
>>>>> lazy val B = ...
>>>>> A.union(B).subtract(A.subtract(B)).subtract(B.subtract(A))
>>>>>
>>>>> Which feels very cumbersome.
>>>>>
>>>>> Does anyone have a more idiomatic way to calculate intersection?
>>>>>
>>>>> Thanks!
>>>>> Andrew
>>>>>
>>>>
>>>>
>>>
>>
>
>

Reply via email to