Please don't top post!

See below for my comments.

On 09/07/14 07:04, Kynn Jones wrote:
Thanks to Gavin and Martijn for their suggestions. They're both simple good-ol' LCGs, and both avoid the need to check for collisions.

I ultimately went with a multiplicative LCG, like Martijn's, mostly because I understand better how it avoids collisions, so it was easier for me to tweak it in various ways.

In particular, I changed the prime number from 899981 to the very lucky prime 900001. This happens to work *perfectly*, because the range of such a generator is p-1, not p. (BTW, Martijn's choice of the "random" 2345 for the multiplier was a somewhat lucky one, since such generators are not full for arbitrary multipliers; for example, the one with modulus 899981 is not full for a multiplier of 3456, say.)

I also followed Martijn's pointer regarding the 3-argument form of python's pow function, and implemented a 3-argument pow for PL/PgSQL. I include all the code below, including a snippet borrowed from Gavin's post, and modified here and there. (I'm not very experienced with PL/PgSQL, so please feel free to point out ways in which my PL/PgSQL code can be improved.)

First the functions:

CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m bigint) returns bigint AS $$
    DECLARE
    x  bigint;
    xx bigint;
    BEGIN
      IF n = 0 THEN RETURN 1; END IF;

      x := bigx % m;
      xx := (x * x) % m;

      IF n % 2 = 0 THEN
        RETURN pow_mod(xx, n/2, m);
      ELSE
        RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
      END IF;

    END;
    $$ LANGUAGE plpgsql strict immutable;


    -- "mcg" = "multiplicative congruential generator"
    CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
    BEGIN
      -- CHECK (0 < i AND i < 900001)
      RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
    END;
    $$ LANGUAGE plpgsql strict immutable;


And here's a small demo:

    DROP TABLE IF EXISTS rtab;
    DROP SEQUENCE IF EXISTS rseq;

    CREATE SEQUENCE rseq;

    CREATE TABLE rtab
    (
        id       int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
        payload  int NOT NULL
    );

\timing on \\ INSERT INTO rtab (payload) VALUES (generate_series(1, 900000)); \timing off
    -- Timing is on.
    -- INSERT 0 900000
    -- Time: 201450.781 ms
    -- Timing is off.

    SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
    --    id   | payload
    -- --------+---------
    --  539815 |  449991
    --  901731 |  449992
    --  878336 |  449993
    --  564275 |  449994
    --  863664 |  449995
    --  720159 |  449996
    --  987833 |  449997
    --  999471 |  449998
    --  999977 |  449999
    --  999999 |  450000
    --  921739 |  450001
    --  722684 |  450002
    --  596638 |  450003
    --  121592 |  450004
    --  687895 |  450005
    --  477734 |  450006
    --  585988 |  450007
    --  942869 |  450008
    --  175776 |  450009
    --  377207 |  450010
    -- (20 rows)

kj



On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout <klep...@svana.org <mailto:klep...@svana.org>> wrote:

    On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:
    > I'm looking for a way to implement pseudorandom primary keys in
    the range
    > 100000..999999.
    >
    > The randomization scheme does not need to be cryptographically
    strong.  As
    > long as it is not easy to figure out in a few minutes it's good
    enough.

    Well, a trick that produces a not too easy to guess sequence is:

    X(n) = p^n mod q

    where q is prime. Pick the largest prime that will fit, in this case
    899981 (I beleive) and some random p, say 2345.

    Then 100000 + (2345^n) mod 899981

    should be a sequence fitting your purpose. Unfortunatly, the pow()
    function in Postgres can't be used here (too slow and it overflows),
    but python has a helpful function:

    In [113]: len( set( pow(2345, n, 899981) for n in range(899981)  ) )
    Out[113]: 899980

    You could probably write an equivalent function in Postgres if
    necessary.

    Hope this helps,
    --
    Martijn van Oosterhout   <klep...@svana.org
    <mailto:klep...@svana.org>> http://svana.org/kleptog/
    > He who writes carelessly confesses thereby at the very outset
    that he does
    > not attach much importance to his own thoughts.
       -- Arthur Schopenhauer

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.10 (GNU/Linux)

    iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
    gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
    56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
    pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
    k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
    KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
    wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
    efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
    axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
    atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
    aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
    dwQY8zHxuOQ=
    =IJFu
    -----END PGP SIGNATURE-----


Didn't I mention that my code & variable names are copyrighted and my methods patented? :-)

More seriously, its great that you picked out what you want from multiple replies! Understanding what is given to you is far better than blindly cutting & pasting.

Actually I just realized that having an additive constant is essentially meaningless in this Use Case, as we are feeding in a sequence of consecutive integers.


Cheers,
Gavin

Reply via email to