On 23/03/2012 20:47, Jonathan Street wrote:
I've just put some code together to test the advantage seen with
skipping combinations which aren't sorted.  It's significantly slower
compared to pass-ing everything but putting a quick method together to
do some actual work shows the improvement.  Unfortunately doing some
actual work pushes up the time taken.  My quick check (running this code
https://gist.github.com/2174656 ) suggests skipping unsorted iterations
brings down the total running time from 7000 to 4000 days.

My computer shows a similar speedup from 4500 days to 2000 days. Still too slow. Then I discovered, once again that itertools is awesome (in this case in 2.7 (and probably 3.x) only). Replacing sorted generator with

def sorted_exhaustive(num):
    for h in itertools.combinations_with_replacement(range(1,7), num):
        stub_comparison(h)

Cuts the time down to about a day.

There is some bad news however. The way you're using this to generate dice sets, building one list for the whole set and splitting it, doesn't work with this optimization. You'll get a 'low dice', 'higher dice' and 'highest dice', not what's needed at all.


I suspect to see 'reasonable' run times we would need to fundamentally
rethink the approach we're using.  At the moment I don't think we can
even brute force it by 'bursting' to the cloud.  I don't know how the
other providers compare but taking aws
http://aws.amazon.com/ec2/#instance as an example the individual cores
aren't anything special.  The advantage comes when you can use lots of
them, either on a instance with lots of cores or on multiple instances.

Keeping the fast die generation technique and generating sets of dice with

def split_dice(num, high=7):
    pool =  itertools.combinations_with_replacement(range(1,high), num)
    for a, b in itertools.combinations(pool, 2):
        dice.simple(a, b)

should let us trade some ram for speed.

On my computer running this with num = 9 (which would be sets of two nine sided dice with numbers from 1-6 on each side) takes about 35 seconds.

Trying something similar with sets of three dice:

def full_split_dice(num, high=7, size=3):
    pool =  itertools.combinations_with_replacement(range(1,high), num)
    for selection in itertools.combinations(pool, size):
        for a, b in itertools.combinations(selection, 2)
            dice.simple(a, b)

on my computer takes:
num    time
3      0.3sec
4      5sec
5      50 sec

and running it with num=3 and high=10 (which the example dice effectively had as each number from 1 to 9 was used twice the same die) takes 7 seconds.

Fast enough for you? :)

https://gist.github.com/2176168

It still needs a way to identify valid sets. Also scaling it up to all sets of six sided dice (num=6 and high=19) or sets of more than three dice is probably going to take more work. If I can find the time those are problems for Sunday.


Given the discussion here I'm tempted by the tutorials on high
performance python but at 3 hours they might be a little long.
http://pyvideo.org/video/614/high-performance-python-i
http://pyvideo.org/video/620/high-performance-python-ii
http://pyvideo.org/video/618/optimize-performance-and-scalability-with-paralle

That depends, it's only two of us so far. Are other people following along interestedly?

Ed

--
To post: [email protected]
To unsubscribe: [email protected]
Feeds: http://groups.google.com/group/python-north-west/feeds
More options: http://groups.google.com/group/python-north-west

Reply via email to