Thanks for the responses! I agree that option b seems better. I can imagine optimizations that become possible when, for example, a filter call follows the sortByKey, in which case the partitioning computed up front would be sub-optimal. Plus, with the current eager behavior, it's a pain to use in the REPL.
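To make the difference between the current behavior and option b concrete, here is a toy sketch in plain Python. It is not Spark code: ToyDataset, EagerRangePartitioner, LazyRangePartitioner and range_bounds are made-up names, and the "job" is just a counter that goes up whenever the data is scanned. It only illustrates deferring the bounds computation until the partitioner is first used; whether the real RangePartitioner can be changed this way is a separate question.

```python
from bisect import bisect_right
from functools import cached_property


def range_bounds(keys, num_partitions):
    """Pick num_partitions - 1 split points from the sorted keys."""
    ordered = sorted(keys)
    if not ordered:
        return []
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


class ToyDataset:
    """Hypothetical stand-in for an RDD that counts full passes over the data."""

    def __init__(self, keys):
        self._keys = list(keys)
        self.jobs_launched = 0

    def collect_keys(self):
        # Plays the role of an action such as count or sample.
        self.jobs_launched += 1
        return list(self._keys)


class EagerRangePartitioner:
    """Computes the bounds in the constructor, so creating it scans the data."""

    def __init__(self, dataset, num_partitions):
        self.bounds = range_bounds(dataset.collect_keys(), num_partitions)

    def partition_for(self, key):
        return bisect_right(self.bounds, key)


class LazyRangePartitioner:
    """Defers the scan until the bounds are first needed (option b)."""

    def __init__(self, dataset, num_partitions):
        self._dataset = dataset
        self._num_partitions = num_partitions

    @cached_property
    def bounds(self):
        # Runs once, on first access; the result is cached on the instance.
        return range_bounds(self._dataset.collect_keys(), self._num_partitions)

    def partition_for(self, key):
        return bisect_right(self.bounds, key)


if __name__ == "__main__":
    keys = [5, 3, 8, 1, 7, 2, 6, 4]

    eager_data = ToyDataset(keys)
    EagerRangePartitioner(eager_data, 4)
    print("eager, after construction:", eager_data.jobs_launched)

    lazy_data = ToyDataset(keys)
    lazy = LazyRangePartitioner(lazy_data, 4)
    print("lazy, after construction:", lazy_data.jobs_launched)
    lazy.partition_for(4)
    print("lazy, after first use:", lazy_data.jobs_launched)
```

In the lazy version nothing touches the data until partition_for is called, which is the behavior you would expect from something documented as a transformation.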
Cheers,
Ryan

On Tue, Dec 10, 2013 at 7:06 AM, Andrew Ash <[email protected]> wrote:

> Since sortByKey() invokes those right now, we should either a) change the
> documentation to note that it kicks off actions or b) change the
> method to execute those things lazily.
>
> Personally I'd prefer b but don't know how difficult that would be.
>
>
> On Tue, Dec 10, 2013 at 1:52 AM, Jason Lenderman <[email protected]> wrote:
>
>> Hey Ryan,
>>
>> The *sortByKey* method creates a *RangePartitioner* (see
>> Partitioner.scala), and the initialization code of the
>> *RangePartitioner* invokes actions *count* and *sample*.
>>
>> Jason
>>
>> On Mon, Dec 9, 2013 at 7:01 PM, Ryan Prenger <[email protected]> wrote:
>>
>>> sortByKey is listed as a data transformation, not an action, yet it
>>> launches a job. This doesn't seem to square with the documentation.
>>>
>>> Ryan
