There are two meanings of top floating around in this thread.

I think that the OP was asking for "find the indexes of the largest elements
of a vector".  That is definitely what Sean answered.

I think that Dmitriy mean "find the most common elements of a collection".
 That is definitely harder and needs slight cleverness.

On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <[email protected]> wrote:

> if you "forget" some of the items you saw along with the counters,
> how'd you add them back to the heap?
>
>
> On Mon, Apr 25, 2011 at 1:08 PM, Jake Mannix <[email protected]>
> wrote:
> > On Mon, Apr 25, 2011 at 1:03 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> >
> >> There's nothing in Mahout to do this (afaik).
> >>
> >> There's a statistical method for this doing that in linear time, which
> >> is very easy to implement (and btw would make an excellent addition to
> >> Mahout), google up 'countsketch'.
> >>
> >
> > Taking top k items in a list of length n, when n >> k, is already O(n),
> if
> > you check the bottom of heap first before inserting.  Not sure if
> > countsketch
> > is necessary for the usual use case.
> >
> >  -jake
> >
>

Reply via email to