On Wed, 26 Apr 2006, Rob wrote:
On Thu, Apr 27, 2006 at 12:48:13AM +0000, Alexey Toptygin wrote:
Random shuffle in O(n)? How?
This algorithm is provably random, assuming you use a completely
random pseudo-random number generator (PRNG)
Yes, sorry, I've even coded that before; I guess I was having a senior
moment. So, this is good provided I'm willing to use O(n) space and O(m)
time, but it's not very nice space-wise if m << n.
set = set_init(k)
for i= 0 to m-1
do
j = array[PRNG[n]]
} while set_exists(set,j)
set_enter(set,j)
def set_init(k)
malloc array of size k
memset array to 0
return array
def set_exists(set,j)
i = hash(j)
return array[i]==1
def set_enter(set,j)
i = hash(j)
array[i]=1
hash just needs to map j to something in [0,k-1] One way to do this
is to pick k such that it is a large prime, and pick some other prime
p smaller than k. Then your hash can just be hash(x) = p^x(mod k).
Note, if this needs to be really fast, I can post some algs on quick
modular exponentiation.
Thank you for the pseudocode, I see what you're saying now, but it still
doesn't make sense to me. ISTM the algorithm implies m < k < n, because if
m > k, this will never terminate, and if k > n then shuffling is
deterministic and more efficient in both space and time.
However, if k < n, then there exists at least one pair of unsigned
integers a < n and b < n such that a^x(mod k) = b^x(mod k), so this
algorith will never produce a set with both a and b in it - not truly
random.
Plus, what about the overhead of picking a k and x? IIUC finding large
primes is non-trivial.
There's one thing about this algorithm I really like though - O(m) space.
The best thing I've thought of so far that works in O(m) space looks like:
/*
* an array of size m is passed in via x; it's where we keep results, in
* ascending order.
*/
void pickmfromn(unsigned n, unsigned m, unsigned *x){
unsigned i, j, r, tmp;
for (i = 0; i < m; i++) {
r = RAND(n - i);
/* this is the "heart" of the algorithm */
for (j = 0; j < i && r >= x[j]; j++, r++);
/* now shove r into x before the j'th element */
for (; j < i; j++) {
tmp = x[j];
x[j] = r;
r = tmp;
}
x[i] = r;
}
}
Assuming RAND is random, this gives any possible set with equal
probablility, and the results are sorted as an added bonus, but
unfortunately it is O(m^2) time. You can make the "heart" faster by doing
a binary search keyed on r, adding the number of elements less than r to
r, and stepping from there. That still gives you O(m^2) due to inserting r
into the array. If you use a linked list instead, you can insert really
fast, but now you're stuck with stepping in the "heart", so it's still
O(m^2)...
Alexey