On Mon, Apr 25, 2011 at 2:33 PM, Ted Dunning <[email protected]> wrote:
> There are two meanings of top floating around in this thread. > > I think that the OP was asking for "find the indexes of the largest > elements > of a vector". That is definitely what Sean answered. > That's definitely what I was talking about as well. -jake > > I think that Dmitriy mean "find the most common elements of a collection". > That is definitely harder and needs slight cleverness. > > On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <[email protected]> > wrote: > > > if you "forget" some of the items you saw along with the counters, > > how'd you add them back to the heap? > > > > > > On Mon, Apr 25, 2011 at 1:08 PM, Jake Mannix <[email protected]> > > wrote: > > > On Mon, Apr 25, 2011 at 1:03 PM, Dmitriy Lyubimov <[email protected]> > > wrote: > > > > > >> There's nothing in Mahout to do this (afaik). > > >> > > >> There's a statistical method for this doing that in linear time, which > > >> is very easy to implement (and btw would make an excellent addition to > > >> Mahout), google up 'countsketch'. > > >> > > > > > > Taking top k items in a list of length n, when n >> k, is already O(n), > > if > > > you check the bottom of heap first before inserting. Not sure if > > > countsketch > > > is necessary for the usual use case. > > > > > > -jake > > > > > >
