On Wed, 26 Apr 2006, Rob wrote:

On Thu, Apr 27, 2006 at 12:48:13AM +0000, Alexey Toptygin wrote:
Random shuffle in O(n)? How?

This algorithm is provably random, assuming you use a completely
random pseudo-random number generator (PRNG)

Yes, sorry, I've even coded that before; I guess I was having a senior moment. So, this is good provided I'm willing to use O(n) space and O(m) time, but it's not very nice space-wise if m << n.

set = set_init(k)
for i= 0 to m-1
        do
                j = array[PRNG[n]]
        } while set_exists(set,j)
        set_enter(set,j)

def set_init(k)
        malloc array of size k
        memset array to 0
        return array
def set_exists(set,j)
        i = hash(j)
        return array[i]==1
def set_enter(set,j)
        i = hash(j)
        array[i]=1

hash just needs to map j to something in [0,k-1] One way to do this
is to pick k such that it is a large prime, and pick some other prime
p smaller than k.  Then your hash can just be hash(x) = p^x(mod k).
Note, if this needs to be really fast, I can post some algs on quick
modular exponentiation.

Thank you for the pseudocode, I see what you're saying now, but it still doesn't make sense to me. ISTM the algorithm implies m < k < n, because if m > k, this will never terminate, and if k > n then shuffling is deterministic and more efficient in both space and time.

However, if k < n, then there exists at least one pair of unsigned integers a < n and b < n such that a^x(mod k) = b^x(mod k), so this algorith will never produce a set with both a and b in it - not truly random.

Plus, what about the overhead of picking a k and x? IIUC finding large primes is non-trivial.

There's one thing about this algorithm I really like though - O(m) space. The best thing I've thought of so far that works in O(m) space looks like:

/*
 * an array of size m is passed in via x; it's where we keep results, in
 * ascending order.
 */
void pickmfromn(unsigned n, unsigned m, unsigned *x){
        unsigned i, j, r, tmp;

        for (i = 0; i < m; i++) {
                r = RAND(n - i);
                /* this is the "heart" of the algorithm */
                for (j = 0; j < i && r >= x[j]; j++, r++);
                /* now shove r into x before the j'th element */
                for (; j < i; j++) {
                        tmp = x[j];
                        x[j] = r;
                        r = tmp;
                }
                x[i] = r;
        }
}

Assuming RAND is random, this gives any possible set with equal probablility, and the results are sorted as an added bonus, but unfortunately it is O(m^2) time. You can make the "heart" faster by doing a binary search keyed on r, adding the number of elements less than r to r, and stepping from there. That still gives you O(m^2) due to inserting r into the array. If you use a linked list instead, you can insert really fast, but now you're stuck with stepping in the "heart", so it's still O(m^2)...

                        Alexey

Reply via email to