--- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote: > Tim Converse wrote: > > --- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote: > > > >>Dan Kalowsky wrote: > >> > >>>Anyone able to confirm or deny the validity of this > >> > >>patch? > >> > >>> > >>>---------- Forwarded message ---------- > >>> > >>>[2002-07-18 01:03:51] [EMAIL PROTECTED] > >>> > >>>The shuffle() function in the CVS is (still) broken. > It > >> > >>does not > >> > >>>correctly generate all possible combinations of the > >> > >>array. (I know this > >> > >>>function was recently updated, this test is with new > >> > >>version.) > >> > >>>Here is a diff to fix the problem: > >>> > >>>--- array.c Thu Jul 18 00:50:54 2002 > >>>+++ array.new.c Thu Jul 18 00:49:35 2002 > >>>@@ -1441,7 +1441,7 @@ > >>> { > >>> Bucket **elems, *temp; > >>> HashTable *hash; > >>>- int j, n_elems, cur_elem = 0, rnd_idx, n_left; > >>>+ int j, n_elems, rnd_idx, n_left; > >>> > >>> n_elems = zend_hash_num_elements(Z_ARRVAL_P(array)); > >>> > >>>@@ -1457,13 +1457,12 @@ > >>> elems[j++] = temp; > >>> while (--n_left) { > >>> rnd_idx = php_rand(TSRMLS_C); > >>>- RAND_RANGE(rnd_idx, cur_elem, n_left, > PHP_RAND_MAX); > >>>- if (rnd_idx != cur_elem) { > >>>- temp = elems[cur_elem]; > >>>- elems[cur_elem] = elems[rnd_idx]; > >>>+ RAND_RANGE(rnd_idx, 0, n_left, PHP_RAND_MAX); > >>>+ if (rnd_idx != n_left) { > >>>+ temp = elems[n_left]; > >>>+ elems[n_left] = elems[rnd_idx]; > >>> elems[rnd_idx] = temp; > >>> } > >>>- cur_elem++; > >>> } > >>> > >>> HANDLE_BLOCK_INTERRUPTIONS(); > >>> > >> > >>I've send the message to Dan, already. > >>Just posting php-dev again. > >> > >>Top elements have more chance than end elements. > >>All elements should have equal chance to be shuffled. > > > > > > Actually, I think the algorithm of the patched version > is > > correct. It's counterintuitive, but it doesn't matter > how > > many times any slot has an element shuffled out of it. > > It may not be intuitive that the algorithm is not > perfectly > correct. It may be enough for most purposes, though. > > > What matters is the invariant that whenever > elems[n_left] > > is set, it's being set to an element randomly chosen > from > > the subset of elements that are left (including > > elems[n_left]), and also that elems[n_left] is not > messed > > with thereafter. > > For example, if the last element isn't shuffled, it's > stays > in as the last element since the last element has only 1 > chance > to be shuffled. > > Current code has tendency that last elements stay as > last. > > i.e. Probability of the last element is shuffled is > (total_num-1)/total_num. > while n th element to be shuffled is obviously has much > more chance than last elements. It may be acceptable > for most purposes.
It doesn't matter how likely a particular slot is to be swapped. Some slots will be subject to (temporary) swap more often than others. What matters is whether the end result includes all possible permutations with exactly equal probability. > > > > > You would think that the right thing to do is to set > > elems[n_left] to a random choice from _all_ the > elements in > > the array, but this leads to some orderings being > favored > > over others. (Check out Intro. to Algo., Cormen, 2nd > ed., > > p 103) > > elemes[n_left] to a random choice does not make sense, of > course. > > To give equal chances to be shuffled for all elements, we > can write something like > > for (current=0; current < num_of_elems; current++) { > random_index = rand(); > RAND_RANGE(random_index, 0, num_of_elems-1, > PHP_RAND_MAX); > if (array[current] != array[random_index]) > swap(array[current], array[random_index]); > } > > Current code is almost ok, but this way is correct. > This does not require any additional overhead at all, > also. > > -- > Yasuo Ohgaki No, this is not correct. You are choosing the element to swap with array[current] from the entire range, from array[0] to array[num_of_elems - 1]. This introduces a subtle bias in favor of elements being moved one position back. In particular, for a three-element array there are 27 possible execution traces (three choices each time), and in 10 of those traces the second element becomes the first element, when it should be 9 out of 27 (= 1/3) instead. Note also that all 27 execution traces are of equal probability, yet somehow they must map to exactly 6 equal-probability orderings of three elements. Since 27 is not divisible by 6, this can't happen. Instead, you need to loop through the array elements and choose the next swap candidate for array[i] from the set of elements between array[i] and array[n-1] (inclusive). This will have the result that when each array[i] is set (for the final time) it is an unbiased choice from all the elements that are still eligible (that have not been set for the final time). That's the loop invariant. Notice also that on a 3-element array there are exactly 6 distinct execution traces (3 x 2 x 1), each of which maps to a different one of the 6 possible orderings of three elements. If you don't believe me, take your algorithm and run it many times, tracking how often element i in the input moves to become element j in the output, and print the resulting counts out as an n-by-n table. You'll see a bias towards elements moving one position back in the array. --tim __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com -- PHP Development Mailing List <http://www.php.net/> To unsubscribe, visit: http://www.php.net/unsub.php