Re: [PATCH] Use a more efficient algorithm for list shuffling

2015-05-11 Thread Salvador Fandino

On 05/11/2015 04:53 PM, Pierre Thierry wrote:

Scribit salvador fandino dies 11/05/2015 hora 16:00:

Considering those results I would choose 200 as the best cut point because
it is where SBCL and CCL perform best without incurring in a great penalty
for ECL and GCL (~40%). Note also that performance degrades quite fast for
SBCL and CCL once the optimal cut point is surpassed.


Why not use a different cut point for implementations where we know an
optimal one?


IMO, those optimal values for the cut-point I have found in GCL and ECL 
(and unless some hidden issue is discovered in my code) are just 
revealing some bottleneck on that implementations. There is no reason, 
for the divide-and-conquer side to be so relatively slow.


Actually, I have found that compiling the function on GCL just brings 
the cut-point around the 200 mark too and on ECL to around 1000.







Re: [PATCH] Use a more efficient algorithm for list shuffling

2015-05-11 Thread Pierre Thierry
Scribit salvador fandino dies 11/05/2015 hora 16:00:
> Considering those results I would choose 200 as the best cut point because
> it is where SBCL and CCL perform best without incurring in a great penalty
> for ECL and GCL (~40%). Note also that performance degrades quite fast for
> SBCL and CCL once the optimal cut point is surpassed.

Why not use a different cut point for implementations where we know an
optimal one?

Curiously,
Pierre Thierry
-- 
pie...@nothos.net
OpenPGP 0xD9D50D8A


signature.asc
Description: Digital signature


Re: [PATCH] Use a more efficient algorithm for list shuffling

2015-05-11 Thread salvador fandino
On Sun, May 10, 2015 at 2:02 AM, Svante v. Erichsen <
svante.v.erich...@web.de> wrote:

>
> How did you determine the threshold to switch to Fisher-Yates?
>
>
Mostly, I inferred it from my previous experience with similar
algorithms... and it is completely wrong!

I have been benchmarking an instrumentalized version of the shuffle-sublist
function (code attached) with several FLOSS common-lisp implementations and
found the following results:

For SBCL and Clozure CL, the optimal cut-point is around 200 and 150
respectively. I have run it with different list sizes and in a couple of
machines, and the results seem consistent.

For ECL, the optimal cut-point seems to be around 1. In any case, it
runs much slower than in SBCL or in CCL.

For GCL, the optimal cut-point seems to be around 5000 and it is also much
slower than SBCL or CCL.

Considering those results I would choose 200 as the best cut point because
it is where SBCL and CCL perform best without incurring in a great penalty
for ECL and GCL (~40%). Note also that performance degrades quite fast for
SBCL and CCL once the optimal cut point is surpassed.

I would also be interested to see how commercial Lisp implementations
behave.


list-shuffling.lisp
Description: Binary data


list-shuffling.sbcl.out
Description: Binary data


Re: [PATCH] Use a more efficient algorithm for list shuffling

2015-05-09 Thread Svante v. Erichsen
Hi!

OK, I have convinced myself that the algorithm is sound.  My train of thought:
starting from an empty result and an unshuffled starting list, you want each
picked element be a uniformly random element of the unshuffled part.
Fisher-Yates does this directly.  The merging steps instead divide the
unshuffled part into two parts, then randomly decide which of those to pick from
(weighted by their size), then pick an element randomly from that part.  The
trick to performance is to do that inner randomization by recursive shuffling.

Since each element gets picked (log n 2) times, the overall asymptotic running
time of (* n (log n)) also looks right.

How did you determine the threshold to switch to Fisher-Yates?


Yours aye

Svante


On 2015-05-08 12:21:37+0200, salvador fandino wrote:
> On Fri, May 8, 2015 at 12:06 AM, Svante v. Erichsen <
> svante.v.erich...@web.de> wrote:
> 
> > Hi!
> >
> > When I read “sort with random comparator”, I get a bad feeling.  I have not
> > examined the code deeply yet, but do you have some reference for an
> > explanation
> > or perhaps even a proof that this is not biased?
> >
> 
> A pseudo-formal justification of the algorithm follows:
> 
> How can you define a proper shuffling algorithm for a list L of n elements?
> 
> The first idea is that for any element E the probability of it ending at
> some position P is 1/N. But this is not enough, because there could be
> correlations between the assigned positions (in example, rotating the list
> (random n) positions honors that condition but is a quite bad shuffling
> algorithm).
> 
> A better definition of a shuffling algorithm is as follows:
> 
> Given...
> 
> 1) a set S = {s0, s1, ..., sm} containing m elements of L such that m < n
> 2) a set Q = {q0, q1, ..., qm} containing m positions (integers in the
> range [0, n)).
> 3) an element E of S such that E doesn't belong to S
> 4) a position P in [0, n) such that P doesn't belong to Q
> 
> We say that and algorithm A is a proper shuffling algorithm iff the
> probability of the assigned position to the element E being P conditioned
> by the position of the elements of S being Q (i.e.: pos(s0) = q0, pos(s1) =
> q1,..., pos(sm) = qm) is 1/(n-m) for every S, Q, E, P abiding the
> conditions above.
> 
> In simpler words, once we have fixed the final positions of m elements of
> the list, the rest of the elements are distributed among the remaining free
> positions with equal probability 1/(n-m).
> 
> Now, from the relation above, we can easily derive another one, that when
> applied constructively results on the algorithm I have implemented in
> shuffle-sublist:
> 
> Given (1), (2), (3), (4) and ...
> 
> 5) a partition of [0, n) formed by the two intervals A = [0, o) and B = [o,
> n) such that 0 < o < n.
> 6) QA = the intersection of Q and A. and size(QA) its number of elements.
> 
> The probability of E getting assigned any position in A is (o - size(QA)) /
> (n - m).
> 
> Or again in simpler words, the probability of some element E assigned to
> any position on A is equal to the number of free positions in A divided by
> the total number of free positions.
> 
> And that's what shuffle-sublist does, it selects o as (floor n 2) creating
> two partitions a and b of [0, n) and then distributes the elements from
> list among the two using the probability above (the only tricky thing, is
> that it does it destructively reusing the cons cells on list).
> 
> Then recursively, it calls itself until the sublist are short enough to
> make Fisher-Yates efficient.

-- 
Svante von Erichsen

GPG fingerprint: A78A D4FB 762F A922 A495  57E8 2649 9081 6E61 20DE


signature.asc
Description: Digital signature


Re: [PATCH] Use a more efficient algorithm for list shuffling

2015-05-08 Thread salvador fandino
On Fri, May 8, 2015 at 12:06 AM, Svante v. Erichsen <
svante.v.erich...@web.de> wrote:

> Hi!
>
> When I read “sort with random comparator”, I get a bad feeling.  I have not
> examined the code deeply yet, but do you have some reference for an
> explanation
> or perhaps even a proof that this is not biased?
>

A pseudo-formal justification of the algorithm follows:

How can you define a proper shuffling algorithm for a list L of n elements?

The first idea is that for any element E the probability of it ending at
some position P is 1/N. But this is not enough, because there could be
correlations between the assigned positions (in example, rotating the list
(random n) positions honors that condition but is a quite bad shuffling
algorithm).

A better definition of a shuffling algorithm is as follows:

Given...

1) a set S = {s0, s1, ..., sm} containing m elements of L such that m < n
2) a set Q = {q0, q1, ..., qm} containing m positions (integers in the
range [0, n)).
3) an element E of S such that E doesn't belong to S
4) a position P in [0, n) such that P doesn't belong to Q

We say that and algorithm A is a proper shuffling algorithm iff the
probability of the assigned position to the element E being P conditioned
by the position of the elements of S being Q (i.e.: pos(s0) = q0, pos(s1) =
q1,..., pos(sm) = qm) is 1/(n-m) for every S, Q, E, P abiding the
conditions above.

In simpler words, once we have fixed the final positions of m elements of
the list, the rest of the elements are distributed among the remaining free
positions with equal probability 1/(n-m).

Now, from the relation above, we can easily derive another one, that when
applied constructively results on the algorithm I have implemented in
shuffle-sublist:

Given (1), (2), (3), (4) and ...

5) a partition of [0, n) formed by the two intervals A = [0, o) and B = [o,
n) such that 0 < o < n.
6) QA = the intersection of Q and A. and size(QA) its number of elements.

The probability of E getting assigned any position in A is (o - size(QA)) /
(n - m).

Or again in simpler words, the probability of some element E assigned to
any position on A is equal to the number of free positions in A divided by
the total number of free positions.

And that's what shuffle-sublist does, it selects o as (floor n 2) creating
two partitions a and b of [0, n) and then distributes the elements from
list among the two using the probability above (the only tricky thing, is
that it does it destructively reusing the cons cells on list).

Then recursively, it calls itself until the sublist are short enough to
make Fisher-Yates efficient.


Re: [PATCH] Use a more efficient algorithm for list shuffling

2015-05-07 Thread Svante v. Erichsen
Hi!

When I read “sort with random comparator”, I get a bad feeling.  I have not
examined the code deeply yet, but do you have some reference for an explanation
or perhaps even a proof that this is not biased?

Yours aye

Svante

On 2015-05-07 17:51:01+0200, salvador fandino wrote:
> It is quite similar to quicksort with a random comparator but ensuring
> than the list is always divided in two parts of (almost) equal size,
> and so avoiding the O(N*N) worst case of quicksort.

-- 
Svante von Erichsen

GPG fingerprint: A78A D4FB 762F A922 A495  57E8 2649 9081 6E61 20DE


signature.asc
Description: Digital signature