assuming of course that you don't have enough space to put all distinct items into the heap?
On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <[email protected]> wrote: > if you "forget" some of the items you saw along with the counters, > how'd you add them back to the heap? > > > On Mon, Apr 25, 2011 at 1:08 PM, Jake Mannix <[email protected]> wrote: >> On Mon, Apr 25, 2011 at 1:03 PM, Dmitriy Lyubimov <[email protected]> wrote: >> >>> There's nothing in Mahout to do this (afaik). >>> >>> There's a statistical method for this doing that in linear time, which >>> is very easy to implement (and btw would make an excellent addition to >>> Mahout), google up 'countsketch'. >>> >> >> Taking top k items in a list of length n, when n >> k, is already O(n), if >> you check the bottom of heap first before inserting. Not sure if >> countsketch >> is necessary for the usual use case. >> >> -jake >> >
