Re: Why does sortByKey launch cluster job?

Ryan Prenger Tue, 10 Dec 2013 08:20:11 -0800

Thanks for the responses!  I agree that b seems like it would be better.  I
could imagine optimizations, for example when a filter call comes after the
sortByKey, that would make partitioning computed up front sub-optimal.  Plus,
with the current eager behavior, it's a pain to use in the REPL.
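
To make the difference between the current behavior and option b concrete,
here is a toy sketch in plain Python.  It is not Spark's code: ToyDataset,
compute_bounds, EagerRangePartitioner and LazyRangePartitioner are all
hypothetical names, and "jobs" are just counted passes over an in-memory
list.  It only assumes what Jason described, namely that building the range
partitioner needs a count and a sample of the keys.

```python
import bisect
import random


class ToyDataset:
    """Stand-in for an RDD that counts how many full passes ("jobs") it runs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)
        self.jobs_run = 0

    def count(self):
        self.jobs_run += 1  # each action costs one pass over the data
        return len(self._pairs)

    def sample_keys(self, n, seed=0):
        self.jobs_run += 1
        rng = random.Random(seed)
        keys = [k for k, _ in self._pairs]
        return rng.sample(keys, min(n, len(keys)))


def compute_bounds(dataset, num_partitions, sample_size=20):
    """Pick num_partitions - 1 split keys from a sorted sample of the keys."""
    total = dataset.count()
    if total == 0 or num_partitions <= 1:
        return []
    sample = sorted(dataset.sample_keys(min(sample_size, total)))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


class EagerRangePartitioner:
    """Runs count and sample as soon as it is constructed (behavior a)."""

    def __init__(self, dataset, num_partitions):
        self.bounds = compute_bounds(dataset, num_partitions)

    def partition_for(self, key):
        return bisect.bisect_left(self.bounds, key)


class LazyRangePartitioner:
    """Defers count and sample until a key is first looked up (option b)."""

    def __init__(self, dataset, num_partitions):
        self._dataset = dataset
        self._num_partitions = num_partitions
        self._bounds = None

    @property
    def bounds(self):
        if self._bounds is None:  # computed once, on first use
            self._bounds = compute_bounds(self._dataset, self._num_partitions)
        return self._bounds

    def partition_for(self, key):
        return bisect.bisect_left(self.bounds, key)
```

In this toy model, constructing EagerRangePartitioner bumps jobs_run to 2
right away, while LazyRangePartitioner leaves it at 0 until the first
partition_for call.  That deferral is what would keep a sortByKey call from
launching anything when typed at the REPL.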


Cheers,

Ryan


On Tue, Dec 10, 2013 at 7:06 AM, Andrew Ash <[email protected]> wrote:

> Since sortByKey() invokes those right now, we should either a) change the
> documentation to note that it kicks off actions or b) change the
> method to execute those things lazily.
>
> Personally I'd prefer b but don't know how difficult that would be.
>
>
> On Tue, Dec 10, 2013 at 1:52 AM, Jason Lenderman <[email protected]> wrote:
>
>> Hey Ryan,
>>
>> The *sortByKey* method creates a *RangePartitioner* (see
>> Partitioner.scala), and the initialization code of the
>> *RangePartitioner* invokes actions *count* and *sample*.
>>
>>
>> Jason
>>
>>
>>
>>
>> On Mon, Dec 9, 2013 at 7:01 PM, Ryan Prenger <[email protected]> wrote:
>>
>>> sortByKey is listed as a data transformation, not an action, yet it
>>> launches a job.  This doesn't seem to square with the documentation.
>>>
>>> Ryan
>>>
>>
>>
>
