Re: Re[2]: Compute the top 100 million in the total 10 billion data efficiently.

Ted Dunning Wed, 22 Jan 2014 01:36:54 -0800

On Tue, Jan 21, 2014 at 7:31 AM, <[email protected]> wrote:

> You mentioned an approximate algorithm. That's great! I will check it out
> later. But is there a way to calculate it precisely?



If you want to select the 1% largest numbers, then you have a few choices.

If you have memory for the full set, you can sort.
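As a minimal sketch of the sort option, assuming the whole data set fits in a Python list: the function name `top_by_sort` and the parameter `one_in` are made up for illustration (`one_in=100` means "keep the largest 1%").

```python
def top_by_sort(values, one_in=100):
    """Return the largest len(values) // one_in items, largest first.

    Assumes `values` fits entirely in memory (illustrative helper,
    not something from the original thread).
    """
    k = len(values) // one_in          # how many items make up the top fraction
    # Sort descending and keep the first k; costs O(n log n) time, O(n) memory.
    return sorted(values, reverse=True)[:k]


if __name__ == "__main__":
    print(top_by_sort([5, 1, 9, 3], one_in=2))  # prints [9, 5]
```

This is exact, but it only works when the full set fits in memory.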

If you have room to keep 1% of the samples in memory, you need to do 100
passes.

If you are willing to accept small errors, then you can do it in a single
pass.
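One way to illustrate the single-pass, small-error option is to estimate the cutoff value from a fixed-size random sample. The sketch below uses reservoir sampling, which is my choice for illustration and not necessarily the approximate algorithm mentioned earlier in the thread. The names `estimate_cutoff`, `one_in`, `sample_size`, and `seed` are hypothetical.

```python
import random


def estimate_cutoff(stream, one_in=100, sample_size=10_000, seed=0):
    """Estimate the value above which roughly 1/one_in of the stream lies.

    Makes one pass over `stream` and keeps at most `sample_size` items in
    memory (reservoir sampling, Algorithm R). The result is an estimate,
    not the exact cutoff.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if i < sample_size:
            reservoir.append(x)            # fill the reservoir first
        else:
            j = rng.randrange(i + 1)       # uniform index in [0, i]
            if j < sample_size:
                reservoir[j] = x           # replace with probability sample_size/(i+1)
    if not reservoir:
        raise ValueError("empty stream")
    reservoir.sort(reverse=True)
    k = max(1, len(reservoir) // one_in)   # rank of the cutoff within the sample
    return reservoir[k - 1]


if __name__ == "__main__":
    # For the values 0..99999 the true 1% cutoff is about 99000.
    print(estimate_cutoff(range(100_000)))
```

Items greater than or equal to the returned value approximate the top 1%. A larger `sample_size` shrinks the error at the cost of more memory.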

These trade-offs are not optional; they are theorems.
