On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <[email protected]> wrote:
> if you "forget" some of the items you saw along with the counters, > how'd you add them back to the heap? > What do you mean? This is the common search problem: you're iterating over a list of int, double pairs (int is "docId", double is the "score"), and you want to keep the topK out of all N of them. Keep a size-K heap of docId/score pairs, and as you iterate, check to see if your current one beats your current K'th best result - if not, you know it will *never* be in the overall topK, and it can be discarded (O(1) operation, this happens most of the time if K << n, and the list is randomly ordered). If it beats the bottom of the heap, insert it (O(log(K) operation), dropping the old bottom of the heap. This is O(n + k*log(k)) for a randomly sorted list. -jake
