Damn. This is a text book type of problem... Thanks for reminding :) -- Yasuo Ohgaki
Tim Converse wrote: > --- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote: > >>Tim Converse wrote: >> >>>--- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote: >>> >>> >>>>Dan Kalowsky wrote: >>>> >>>> >>>>>Anyone able to confirm or deny the validity of this >>>> >>>>patch? >>>> >>>> >>>>>---------- Forwarded message ---------- >>>>> >>>>>[2002-07-18 01:03:51] [EMAIL PROTECTED] >>>>> >>>>>The shuffle() function in the CVS is (still) broken. >>>> >>It >> >>>>does not >>>> >>>> >>>>>correctly generate all possible combinations of the >>>> >>>>array. (I know this >>>> >>>> >>>>>function was recently updated, this test is with new >>>> >>>>version.) >>>> >>>> >>>>>Here is a diff to fix the problem: >>>>> >>>>>--- array.c Thu Jul 18 00:50:54 2002 >>>>>+++ array.new.c Thu Jul 18 00:49:35 2002 >>>>>@@ -1441,7 +1441,7 @@ >>>>>{ >>>>> Bucket **elems, *temp; >>>>> HashTable *hash; >>>>>- int j, n_elems, cur_elem = 0, rnd_idx, n_left; >>>>>+ int j, n_elems, rnd_idx, n_left; >>>>> >>>>> n_elems = zend_hash_num_elements(Z_ARRVAL_P(array)); >>>>> >>>>>@@ -1457,13 +1457,12 @@ >>>>> elems[j++] = temp; >>>>> while (--n_left) { >>>>> rnd_idx = php_rand(TSRMLS_C); >>>>>- RAND_RANGE(rnd_idx, cur_elem, n_left, >>>> >>PHP_RAND_MAX); >> >>>>>- if (rnd_idx != cur_elem) { >>>>>- temp = elems[cur_elem]; >>>>>- elems[cur_elem] = elems[rnd_idx]; >>>>>+ RAND_RANGE(rnd_idx, 0, n_left, PHP_RAND_MAX); >>>>>+ if (rnd_idx != n_left) { >>>>>+ temp = elems[n_left]; >>>>>+ elems[n_left] = elems[rnd_idx]; >>>>> elems[rnd_idx] = temp; >>>>> } >>>>>- cur_elem++; >>>>> } >>>>> >>>>> HANDLE_BLOCK_INTERRUPTIONS(); >>>>> >>>> >>>>I've send the message to Dan, already. >>>>Just posting php-dev again. >>>> >>>>Top elements have more chance than end elements. >>>>All elements should have equal chance to be shuffled. >>> >>> >>>Actually, I think the algorithm of the patched version >> >>is >> >>>correct. It's counterintuitive, but it doesn't matter >> >>how >> >>>many times any slot has an element shuffled out of it. >> >>It may not be intuitive that the algorithm is not >>perfectly >>correct. It may be enough for most purposes, though. >> >> >>>What matters is the invariant that whenever >> >>elems[n_left] >> >>>is set, it's being set to an element randomly chosen >> >>from >> >>>the subset of elements that are left (including >>>elems[n_left]), and also that elems[n_left] is not >> >>messed >> >>>with thereafter. >> >>For example, if the last element isn't shuffled, it's >>stays >>in as the last element since the last element has only 1 >>chance >>to be shuffled. >> >>Current code has tendency that last elements stay as >>last. >> >>i.e. Probability of the last element is shuffled is >> (total_num-1)/total_num. >>while n th element to be shuffled is obviously has much >>more chance than last elements. It may be acceptable >>for most purposes. > > > It doesn't matter how likely a particular slot is to be > swapped. Some slots will be subject to (temporary) swap > more often than others. What matters is whether the end > result includes all possible permutations with exactly > equal probability. > > >>>You would think that the right thing to do is to set >>>elems[n_left] to a random choice from _all_ the >> >>elements in >> >>>the array, but this leads to some orderings being >> >>favored >> >>>over others. (Check out Intro. to Algo., Cormen, 2nd >> >>ed., >> >>>p 103) >> >>elemes[n_left] to a random choice does not make sense, of >>course. >> >>To give equal chances to be shuffled for all elements, we >>can write something like >> >>for (current=0; current < num_of_elems; current++) { >> random_index = rand(); >> RAND_RANGE(random_index, 0, num_of_elems-1, >>PHP_RAND_MAX); >> if (array[current] != array[random_index]) >> swap(array[current], array[random_index]); >>} >> >>Current code is almost ok, but this way is correct. >>This does not require any additional overhead at all, >>also. >> >>-- >>Yasuo Ohgaki > > > No, this is not correct. You are choosing the element to > swap with array[current] from the entire range, from > array[0] to array[num_of_elems - 1]. This introduces a > subtle bias in favor of elements being moved one position > back. > > In particular, for a three-element array there are 27 > possible execution traces (three choices each time), and in > 10 of those traces the second element becomes the first > element, when it should be 9 out of 27 (= 1/3) instead. > Note also that all 27 execution traces are of equal > probability, yet somehow they must map to exactly 6 > equal-probability orderings of three elements. Since 27 is > not divisible by 6, this can't happen. > > Instead, you need to loop through the array elements and > choose the next swap candidate for array[i] from the set of > elements between array[i] and array[n-1] (inclusive). This > will have the result that when each array[i] is set (for > the final time) it is an unbiased choice from all the > elements that are still eligible (that have not been set > for the final time). That's the loop invariant. Notice > also that on a 3-element array there are exactly 6 distinct > execution traces (3 x 2 x 1), each of which maps to a > different one of the 6 possible orderings of three > elements. > > If you don't believe me, take your algorithm and run it > many times, tracking how often element i in the input moves > to become element j in the output, and print the resulting > counts out as an n-by-n table. You'll see a bias towards > elements moving one position back in the array. > > --tim > > > > __________________________________________________ > Do You Yahoo!? > HotJobs - Search Thousands of New Jobs > http://www.hotjobs.com > > -- PHP Development Mailing List <http://www.php.net/> To unsubscribe, visit: http://www.php.net/unsub.php