Re: taking top k values of rdd

2014-07-05 Thread Nick Pentreath
On Sat, Jul 5, 2014 at 10:17 AM, Koert Kuipers ko...@tresata.com wrote: my initial approach to taking top k values of a rdd was using a priority-queue monoid. along these lines: rdd.mapPartitions({ items => Iterator.single(new PriorityQueue(...)) }, false).reduce(monoid.plus) this works fine

Re: taking top k values of rdd

2014-07-05 Thread Koert Kuipers
On the driver you can just take the top k of the combined top-k lists from each partition (assuming you have (object, count) for each top k list). On Sat, Jul 5, 2014 at 10:17 AM, Koert Kuipers ko...@tresata.com wrote: my initial approach to taking top k values
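
A rough sketch of the idea in the snippet above: compute a top k per partition, then take the top k of the combined lists on the driver. This is not code from the thread. Plain Scala collections stand in for the RDD's partitions so the example runs without Spark, and the names TopKSketch and topK are made up for illustration. It assumes each object appears in only one partition, i.e. the counts are already aggregated.

```scala
import scala.collection.mutable

// Sketch only: plain Scala collections stand in for an RDD's partitions.
object TopKSketch {

  // Top k (object, count) pairs from one iterator, highest count first.
  def topK[A](items: Iterator[(A, Long)], k: Int): List[(A, Long)] = {
    // Reversed ordering turns the queue into a min-heap on count,
    // so dequeue() removes the smallest count currently retained.
    val heap = mutable.PriorityQueue.empty[(A, Long)](
      Ordering.by[(A, Long), Long](_._2).reverse)
    items.foreach { item =>
      heap.enqueue(item)
      if (heap.size > k) heap.dequeue() // keep at most k elements
    }
    heap.toList.sortBy(-_._2)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: three "partitions" of (object, count) pairs.
    val partitions = Seq(
      Seq("a" -> 5L, "b" -> 1L, "c" -> 9L),
      Seq("d" -> 7L, "e" -> 3L),
      Seq("f" -> 8L))
    val k = 2

    // Step 1: per-partition top k (what mapPartitions would compute).
    val perPartition = partitions.map(p => topK(p.iterator, k))

    // Step 2: on the "driver", top k of the combined per-partition lists.
    val result = topK(perPartition.flatten.iterator, k)
    println(result)
  }
}
```

With an actual RDD, step 1 would presumably be a mapPartitions call that emits one list per partition, and step 2 would run on the collected lists. Only at most k elements per partition travel to the driver, which is the point of the approach.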

Re: taking top k values of rdd

2014-07-05 Thread Nick Pentreath
, 2014 at 10:17 AM, Koert Kuipers ko...@tresata.com wrote: my initial approach to taking top k values of a rdd was using a priority-queue monoid. along these lines: rdd.mapPartitions({ items => Iterator.single(new PriorityQueue(...)) }, false).reduce(monoid.plus) this works fine, but looking