--- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote:
> Tim Converse wrote:
> > --- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote:
> > 
> >>Dan Kalowsky wrote:
> >>
> >>>Anyone able to confirm or deny the validity of this
> >>
> >>patch?
> >>
> >>>
> >>>---------- Forwarded message ----------
> >>>
> >>>[2002-07-18 01:03:51] [EMAIL PROTECTED]
> >>>
> >>>The shuffle() function in the CVS is (still) broken.
> It
> >>
> >>does not
> >>
> >>>correctly generate all possible combinations of the
> >>
> >>array. (I know this
> >>
> >>>function was recently updated, this test is with new
> >>
> >>version.)
> >>
> >>>Here is a diff to fix the problem:
> >>>
> >>>--- array.c        Thu Jul 18 00:50:54 2002
> >>>+++ array.new.c    Thu Jul 18 00:49:35 2002
> >>>@@ -1441,7 +1441,7 @@
> >>> {
> >>>   Bucket **elems, *temp;
> >>>   HashTable *hash;
> >>>-  int j, n_elems, cur_elem = 0, rnd_idx, n_left;
> >>>+  int j, n_elems, rnd_idx, n_left;
> >>>
> >>>   n_elems = zend_hash_num_elements(Z_ARRVAL_P(array));
> >>>
> >>>@@ -1457,13 +1457,12 @@
> >>>           elems[j++] = temp;
> >>>   while (--n_left) {
> >>>           rnd_idx = php_rand(TSRMLS_C);
> >>>-          RAND_RANGE(rnd_idx, cur_elem, n_left,
> PHP_RAND_MAX);
> >>>-          if (rnd_idx != cur_elem) {
> >>>-                  temp = elems[cur_elem];
> >>>-                  elems[cur_elem] = elems[rnd_idx];
> >>>+          RAND_RANGE(rnd_idx, 0, n_left, PHP_RAND_MAX);
> >>>+          if (rnd_idx != n_left) {
> >>>+                  temp = elems[n_left];
> >>>+                  elems[n_left] = elems[rnd_idx];
> >>>                   elems[rnd_idx] = temp;
> >>>           }
> >>>-          cur_elem++;
> >>>   }
> >>>
> >>>   HANDLE_BLOCK_INTERRUPTIONS();
> >>>
> >>
> >>I've send the message to Dan, already.
> >>Just posting php-dev again.
> >>
> >>Top elements have more chance than end elements.
> >>All elements should have equal chance to be shuffled.
> > 
> > 
> > Actually, I think the algorithm of the patched version
> is
> > correct.  It's counterintuitive, but it doesn't matter
> how
> > many times any slot has an element shuffled out of it. 
> 
> It may not be intuitive that the algorithm is not
> perfectly
> correct. It may be enough for most purposes, though.
> 
> > What matters is the invariant that whenever
> elems[n_left]
> > is set, it's being set to an element randomly chosen
> from
> > the subset of elements that are left (including
> > elems[n_left]), and also that elems[n_left] is not
> messed
> > with thereafter. 
> 
> For example, if the last element isn't shuffled, it's
> stays
> in as the last element since the last element has only 1
> chance
> to be shuffled.
> 
> Current code has tendency that last elements stay as
> last.
> 
> i.e. Probability of the last element is shuffled is
>    (total_num-1)/total_num.
> while n th element to be shuffled is obviously has much
> more chance than last elements. It may be acceptable
> for most purposes.

It doesn't matter how likely a particular slot is to be
swapped. Some slots will be subject to (temporary) swap
more often than others.  What matters is whether the end
result includes all possible permutations with exactly
equal probability.

> 
> > 
> > You would think that the right thing to do is to set
> > elems[n_left] to a random choice from _all_ the
> elements in
> > the array, but this leads to some orderings being
> favored
> > over others.  (Check out Intro. to Algo., Cormen, 2nd
> ed.,
> > p 103)
> 
> elemes[n_left] to a random choice does not make sense, of
> course.
> 
> To give equal chances to be shuffled for all elements, we
> can write something like
> 
> for (current=0; current < num_of_elems; current++) {
>    random_index = rand();
>    RAND_RANGE(random_index, 0, num_of_elems-1,
> PHP_RAND_MAX);
>    if (array[current] != array[random_index])
>      swap(array[current], array[random_index]);
> }
> 
> Current code is almost ok, but this way is correct.
> This does not require any additional overhead at all,
> also.
> 
> --
> Yasuo Ohgaki

No, this is not correct.  You are choosing the element to
swap with array[current] from the entire range, from
array[0] to array[num_of_elems - 1].  This introduces a
subtle bias in favor of elements being moved one position
back.

In particular, for a three-element array there are 27
possible execution traces (three choices each time), and in
10 of those traces the second element becomes the first
element, when it should be 9 out of 27 (= 1/3) instead. 
Note also that all 27 execution traces are of equal
probability, yet somehow they must map to exactly 6
equal-probability orderings of three elements.  Since 27 is
not divisible by 6, this can't happen.

Instead, you need to loop through the array elements and
choose the next swap candidate for array[i] from the set of
elements between array[i] and array[n-1] (inclusive).  This
will have the result that when each array[i] is set (for
the final time) it is an unbiased choice from all the
elements that are still eligible (that have not been set
for the final time).  That's the loop invariant.  Notice
also that on a 3-element array there are exactly 6 distinct
execution traces (3 x 2 x 1), each of which maps to a
different one of the 6 possible orderings of three
elements.

If you don't believe me, take your algorithm and run it
many times, tracking how often element i in the input moves
to become element j in the output, and print the resulting
counts out as an n-by-n table.  You'll see a bias towards
elements moving one position back in the array.

--tim



__________________________________________________
Do You Yahoo!?
HotJobs - Search Thousands of New Jobs
http://www.hotjobs.com

-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to