On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <[email protected]> wrote:

> if you "forget" some of the items you saw along with the counters,
> how'd you add them back to the heap?
>

What do you mean?  This is the common search problem: you're iterating over
a list of int, double pairs (int is "docId", double is the "score"), and you
want to
keep the topK out of all N of them.  Keep a size-K heap of docId/score
pairs,
and as you iterate, check to see if your current one beats your current K'th
best
result - if not, you know it will *never* be in the overall topK, and it can
be discarded
(O(1) operation, this happens most of the time if K << n, and the list is
randomly ordered).  If it beats the bottom of the heap, insert it (O(log(K)
operation),
dropping the old bottom of the heap.

This is O(n + k*log(k)) for a randomly sorted list.

  -jake

Reply via email to