On Dec 29 2009, 6:38 pm, Cédric Krier <[email protected]> wrote:
> Hi,
>
> I would like to add a new type of sequence in Tryton that will generate random
> but unique number.
> A simple solution will be to generate random number and compare it with
> already generated number (stored in a table). But there is the birthday
> paradox [1] that reduces the speed of this method.
>
> An other way will be to store in a table every possible value (if the length
> of the number is fixed) and pick/remove one randomly. But here the generation
> of all values can be slow.
>
> Is there anybody that knows a better way?
>
> [1]http://en.wikipedia.org/wiki/Birthday_paradox
>
> --
> Cédric Krier

About the scalability of this approach :

Let's say we pick an integer. On my 32bit laptop, sys.maxint gives me:
2147483647, which is roughly 2*10**9.
So this means that the odds of generating a number already in the
sequence is 1 in 10 when 2*10**8 (200 million) numbers have been
generated. One in ten is not bad this means that the the creation of
the next item in the sequence is on average 10% slower than if there
was no "collision".

To give an idea of how much "time" is needed to generate a sequence
with 2*10**8 numbers: It takes 5479 years if 100 numbers are generated
each day (2*10**8 / (100*365) gives 5479).

So scalability may become a problem if one want to generate more than
10,000 number per day over 50 years (10,000 number a day reach the 10%
slowdown after 54 years) or more than 100,000 per day over 5 years.

The only use case I would see for such numbers is a big cable/water/
electricity/phone company that want to generate statements for their
customers.


But if we want to scale more at the expense of being a bit less random
would be to use a variant of the "store in a table every possible
value" solution:

- Generate a randomized table with the number from 1 to 10k.
- Consume table
- Generate a table with number from 10k+1 to 20k.
- ...

Another variant is to use two such "possible values tables":  one that
gives which intervals are still available (1to 10k, 10k+1 to 20k, etc)
and the other giving the available values in the currently active
interval.


Bertrand

-- 
[email protected] mailing list

Reply via email to