Damn. This is a text book type of problem...
Thanks for reminding :)

--
Yasuo Ohgaki


Tim Converse wrote:
> --- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote:
> 
>>Tim Converse wrote:
>>
>>>--- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>>Dan Kalowsky wrote:
>>>>
>>>>
>>>>>Anyone able to confirm or deny the validity of this
>>>>
>>>>patch?
>>>>
>>>>
>>>>>---------- Forwarded message ----------
>>>>>
>>>>>[2002-07-18 01:03:51] [EMAIL PROTECTED]
>>>>>
>>>>>The shuffle() function in the CVS is (still) broken.
>>>>
>>It
>>
>>>>does not
>>>>
>>>>
>>>>>correctly generate all possible combinations of the
>>>>
>>>>array. (I know this
>>>>
>>>>
>>>>>function was recently updated, this test is with new
>>>>
>>>>version.)
>>>>
>>>>
>>>>>Here is a diff to fix the problem:
>>>>>
>>>>>--- array.c        Thu Jul 18 00:50:54 2002
>>>>>+++ array.new.c    Thu Jul 18 00:49:35 2002
>>>>>@@ -1441,7 +1441,7 @@
>>>>>{
>>>>>   Bucket **elems, *temp;
>>>>>   HashTable *hash;
>>>>>-  int j, n_elems, cur_elem = 0, rnd_idx, n_left;
>>>>>+  int j, n_elems, rnd_idx, n_left;
>>>>>
>>>>>   n_elems = zend_hash_num_elements(Z_ARRVAL_P(array));
>>>>>
>>>>>@@ -1457,13 +1457,12 @@
>>>>>           elems[j++] = temp;
>>>>>   while (--n_left) {
>>>>>           rnd_idx = php_rand(TSRMLS_C);
>>>>>-          RAND_RANGE(rnd_idx, cur_elem, n_left,
>>>>
>>PHP_RAND_MAX);
>>
>>>>>-          if (rnd_idx != cur_elem) {
>>>>>-                  temp = elems[cur_elem];
>>>>>-                  elems[cur_elem] = elems[rnd_idx];
>>>>>+          RAND_RANGE(rnd_idx, 0, n_left, PHP_RAND_MAX);
>>>>>+          if (rnd_idx != n_left) {
>>>>>+                  temp = elems[n_left];
>>>>>+                  elems[n_left] = elems[rnd_idx];
>>>>>                   elems[rnd_idx] = temp;
>>>>>           }
>>>>>-          cur_elem++;
>>>>>   }
>>>>>
>>>>>   HANDLE_BLOCK_INTERRUPTIONS();
>>>>>
>>>>
>>>>I've send the message to Dan, already.
>>>>Just posting php-dev again.
>>>>
>>>>Top elements have more chance than end elements.
>>>>All elements should have equal chance to be shuffled.
>>>
>>>
>>>Actually, I think the algorithm of the patched version
>>
>>is
>>
>>>correct.  It's counterintuitive, but it doesn't matter
>>
>>how
>>
>>>many times any slot has an element shuffled out of it. 
>>
>>It may not be intuitive that the algorithm is not
>>perfectly
>>correct. It may be enough for most purposes, though.
>>
>>
>>>What matters is the invariant that whenever
>>
>>elems[n_left]
>>
>>>is set, it's being set to an element randomly chosen
>>
>>from
>>
>>>the subset of elements that are left (including
>>>elems[n_left]), and also that elems[n_left] is not
>>
>>messed
>>
>>>with thereafter. 
>>
>>For example, if the last element isn't shuffled, it's
>>stays
>>in as the last element since the last element has only 1
>>chance
>>to be shuffled.
>>
>>Current code has tendency that last elements stay as
>>last.
>>
>>i.e. Probability of the last element is shuffled is
>>   (total_num-1)/total_num.
>>while n th element to be shuffled is obviously has much
>>more chance than last elements. It may be acceptable
>>for most purposes.
> 
> 
> It doesn't matter how likely a particular slot is to be
> swapped. Some slots will be subject to (temporary) swap
> more often than others.  What matters is whether the end
> result includes all possible permutations with exactly
> equal probability.
> 
> 
>>>You would think that the right thing to do is to set
>>>elems[n_left] to a random choice from _all_ the
>>
>>elements in
>>
>>>the array, but this leads to some orderings being
>>
>>favored
>>
>>>over others.  (Check out Intro. to Algo., Cormen, 2nd
>>
>>ed.,
>>
>>>p 103)
>>
>>elemes[n_left] to a random choice does not make sense, of
>>course.
>>
>>To give equal chances to be shuffled for all elements, we
>>can write something like
>>
>>for (current=0; current < num_of_elems; current++) {
>>   random_index = rand();
>>   RAND_RANGE(random_index, 0, num_of_elems-1,
>>PHP_RAND_MAX);
>>   if (array[current] != array[random_index])
>>     swap(array[current], array[random_index]);
>>}
>>
>>Current code is almost ok, but this way is correct.
>>This does not require any additional overhead at all,
>>also.
>>
>>--
>>Yasuo Ohgaki
> 
> 
> No, this is not correct.  You are choosing the element to
> swap with array[current] from the entire range, from
> array[0] to array[num_of_elems - 1].  This introduces a
> subtle bias in favor of elements being moved one position
> back.
> 
> In particular, for a three-element array there are 27
> possible execution traces (three choices each time), and in
> 10 of those traces the second element becomes the first
> element, when it should be 9 out of 27 (= 1/3) instead. 
> Note also that all 27 execution traces are of equal
> probability, yet somehow they must map to exactly 6
> equal-probability orderings of three elements.  Since 27 is
> not divisible by 6, this can't happen.
> 
> Instead, you need to loop through the array elements and
> choose the next swap candidate for array[i] from the set of
> elements between array[i] and array[n-1] (inclusive).  This
> will have the result that when each array[i] is set (for
> the final time) it is an unbiased choice from all the
> elements that are still eligible (that have not been set
> for the final time).  That's the loop invariant.  Notice
> also that on a 3-element array there are exactly 6 distinct
> execution traces (3 x 2 x 1), each of which maps to a
> different one of the 6 possible orderings of three
> elements.
> 
> If you don't believe me, take your algorithm and run it
> many times, tracking how often element i in the input moves
> to become element j in the output, and print the resulting
> counts out as an n-by-n table.  You'll see a bias towards
> elements moving one position back in the array.
> 
> --tim
> 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> HotJobs - Search Thousands of New Jobs
> http://www.hotjobs.com
> 
> 




-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to