On Sat, Jul 5, 2014 at 10:17 AM, Koert Kuipers ko...@tresata.com wrote:
My initial approach to taking the top k values of an RDD was to use a
priority-queue monoid, along these lines:
rdd.mapPartitions({ items => Iterator.single(new PriorityQueue(...)) },
  false).reduce(monoid.plus)
This works fine.
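The priority-queue monoid idea can be tried out without a cluster. The sketch below is a stand-in under stated assumptions, not Spark or Algebird code: plain Scala `Seq`s play the role of RDD partitions, and `TopK`, `plus`, `zero` and `fromPartition` are hypothetical names. Each partition is collapsed into one bounded top-k value using a min-heap (`scala.collection.mutable.PriorityQueue` with a reversed ordering), and `plus` keeps the k largest of two such values. Because that merge is associative with `zero` as its identity, it is safe to use with `reduce`.

```scala
import scala.collection.mutable

// Hypothetical bounded top-k value; plain Scala collections stand in for
// RDD partitions (no Spark dependency).
final case class TopK(k: Int, items: List[Long]) {
  // Monoid "plus": union both candidate lists, keep only the k largest.
  def plus(that: TopK): TopK =
    TopK(k, (items ++ that.items).sorted(Ordering[Long].reverse).take(k))
}

object TopK {
  // Identity element: an empty top-k list.
  def zero(k: Int): TopK = TopK(k, Nil)

  // Per-partition step, playing the role of the mapPartitions closure.
  def fromPartition(k: Int, partition: Iterator[Long]): TopK = {
    // Reversed ordering makes this a min-heap: dequeue() evicts the
    // smallest retained value whenever the heap grows past k elements.
    val heap = mutable.PriorityQueue.empty[Long](Ordering[Long].reverse)
    partition.foreach { x =>
      heap.enqueue(x)
      if (heap.size > k) heap.dequeue()
    }
    TopK(k, heap.toList.sorted(Ordering[Long].reverse))
  }
}

object TopKDemo {
  def main(args: Array[String]): Unit = {
    val k = 3
    // Three "partitions" of an imagined RDD[Long].
    val partitions: Seq[Seq[Long]] =
      Seq(Seq(5L, 1L, 9L), Seq(7L, 3L), Seq(8L, 2L, 6L, 4L))
    val result = partitions
      .map(p => TopK.fromPartition(k, p.iterator)) // like mapPartitions(..., false)
      .reduce(_ plus _)                             // like reduce(monoid.plus)
    println(result.items) // List(9, 8, 7)
  }
}
```

Sorting inside `plus` keeps the merge simple; since each operand holds at most k items, its cost stays small regardless of partition size.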
On the driver, you can then just take the top k of the combined
per-partition top-k lists (assuming each top-k list holds (object, count)
pairs).
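The driver-side step can be illustrated the same way. This is a minimal sketch assuming each object's count is already final, i.e. every object appears in only one partition (otherwise a per-partition top k can drop an object whose partial counts are spread out). Partitions are again simulated with plain `Seq`s, and `DriverTopK.topK` is a hypothetical helper, not a Spark API.

```scala
// Sketch: merge per-partition top-k lists of (object, count) pairs on the
// driver. Assumes each object's count is complete within one partition.
object DriverTopK {
  // Top k pairs by count, largest first; ties broken by object name so the
  // result is deterministic.
  def topK(k: Int, pairs: Seq[(String, Long)]): Seq[(String, Long)] =
    pairs.sortBy { case (obj, count) => (-count, obj) }.take(k)

  def main(args: Array[String]): Unit = {
    val k = 2
    // Hypothetical per-partition (object, count) data.
    val partitions: Seq[Seq[(String, Long)]] = Seq(
      Seq("a" -> 10L, "b" -> 3L, "c" -> 7L),
      Seq("d" -> 8L, "e" -> 1L),
      Seq("f" -> 12L, "g" -> 5L)
    )
    // Step 1 (on the executors in a real job): top k within each partition.
    val perPartition: Seq[Seq[(String, Long)]] = partitions.map(p => topK(k, p))
    // Step 2 (on the driver): top k of the combined per-partition lists.
    val combined = topK(k, perPartition.flatten)
    println(combined) // List((f,12), (a,10))
  }
}
```

Only k pairs per partition ever reach the driver, so the final sort works on at most k times the number of partitions elements.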
this works fine, but looking