There are two meanings of top floating around in this thread. I think that the OP was asking for "find the indexes of the largest elements of a vector". That is definitely what Sean answered.
I think that Dmitriy mean "find the most common elements of a collection". That is definitely harder and needs slight cleverness. On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <[email protected]> wrote: > if you "forget" some of the items you saw along with the counters, > how'd you add them back to the heap? > > > On Mon, Apr 25, 2011 at 1:08 PM, Jake Mannix <[email protected]> > wrote: > > On Mon, Apr 25, 2011 at 1:03 PM, Dmitriy Lyubimov <[email protected]> > wrote: > > > >> There's nothing in Mahout to do this (afaik). > >> > >> There's a statistical method for this doing that in linear time, which > >> is very easy to implement (and btw would make an excellent addition to > >> Mahout), google up 'countsketch'. > >> > > > > Taking top k items in a list of length n, when n >> k, is already O(n), > if > > you check the bottom of heap first before inserting. Not sure if > > countsketch > > is necessary for the usual use case. > > > > -jake > > >
