[Haskell-cafe] Re: Euler 201 performance mystery

2008-07-16 Thread apfelmus

[EMAIL PROTECTED] wrote:

Dear Group,

I've spend the last few days figuring out the solution to Euler Problem 201 in
haskell.  I first tried a relatively elegant approach based on Data.Map but
the performance was horrible.  I never actually arrived at the answer.  I then
rewrote the same algorithm using STUArrays and it was lightning.  I have
posted both versions of the code at:
http://www.maztravel.com/haskell/euler_problem_201.html
and would appreciate any insights that you master haskellers can provide on
why the speed difference is so huge.  Thanks in advance.
Henry Laxen


First, you may want to change the map type to

  type SumMap = Map (Int,Int) Int

since you're working with pairs (length, sum), not lists. I mean, you're doing 
the same with  STUArray (Int,Int) Int .



Did you try to estimate the running time of both data structures? Calculating 
the number of big-O operations on the back of an envelope is a very good 
guideline. So,  Data.Map.insert  takes O(log (size of map)) operations and so 
on. A rule of thumb is that a computer can perform 10 million operations per 
second (maybe 100, that was five years ago :)). Granted, this rule works best 
for C programs whereas Haskell is quite sensitive to constant factors, in 
particular concerning memory and cache effects. So, the rule is pretty accurate 
for an STUArray but you may have to multiply with 10 to get the right order of 
magnitude for Data.Map.



As you have noted, the choice of data structure (Map, STUArray, something else) 
is important (Map only touches existing sums, but STUArray has O(1) access and 
uses a tight representation in memory). But in the following, I want to discuss 
something what you did implicitly, namely how to *calculate* the general 
algorithm in a mechanical fashion. This follows the lines of Richard Bird's 
work, of which the book Algebra of Programming


http://web.comlab.ox.ac.uk/oucl/research/pdt/ap/pubs.html#Bird-deMoor96:Algebra

is one of the cornerstones. The systematic derivation of dynamic programming 
algorithms has been rediscovered in a more direct but less general fashion in


   http://bibiserv.techfak.uni-bielefeld.de/adp/


Euler problem 201 asks to calculate the possible sums you can form with 50 
elements from the set of square numbers from 1^2 to 100^2. Hence, given a function


  subsets [] = [[]]
  subsets (x:xs) = map (x:) (subsets xs) ++ subsets xs

that returns all subsets of a set, we can implement a solution as follows:

  squares   = map (^2) [1..100]
  euler201  = map sum . filter ((==50) . length) . subsets $ squares

While hopelessly inefficient, this solution is obviously correct! In fact, we 
did barely more than write down the task.


Ok ok, the solution is *not correct* because  map sum  may generate 
*duplicates*. In other words,  subsets  generates a lot of sets that have the 
same sum. But that's the key point for creating a better algorithm: we could be 
a lot faster if merging subsets with the same sum and generating these subsets 
could be interleaved.


To that end, we first have to move the length filter to after the summation:

   map sum . filter ((==50) . length)
 = map snd . filter ((==50) . fst) . map (length  sum)

The function () is very useful and defined as

  (length  sum) xs = (length xs, sum xs)

You can import (a generalization of) of it from Control.Arrow. In other words, 
our solution now reads


  euler201 = map snd . filter ((==50) . fst) . subsums $ squares
 where
 subsums = map (length  sum) . subsets

and our task is to find a definition of  subsums  that fuses summation and 
subset generation.


But this is a straightforward calculation! Let's assume that we have an 
implementation of Sets that we can use for merging duplicates. In other words, 
we assume operations


  singleton :: a - Set a
  union :: Set a - Set a - Set a
  map   :: (a - b) - Set a - Set b

so that  subsets  becomes

  subsets [] = singleton []
  subsets (x:xs) = map (x:) (subsets xs) `union` subsets xs

Now, let's calculate:

  subsums []
  =  { definition }
map (length  sum) (subsets   [])
  =  { subsets }
map (length  sum) (singleton [])
  =  { map }
singleton ((length  sum) [])
  =  { length  sum }
singleton (0,0)


  subsums (x:xs)
  =  { definition }
map (length  sum) (subsets (x:xs))
  =  { subsets }
map (length  sum) (map (x:) (subsets xs) `union` subsets xs)
  =  { map preserves unions }
 map (length  sum) (map (x:) subsets xs)
`union`  map (length  sum) (subsets xs)
  =  { map fusion }
 map (length  sum . (x:)) (subsets xs)
`union`  map (length  sum)(subsets xs)
  =  { move (length  sum) to the front, see footnote }
 map ((\(n,s) - (n+1,s+x)) . (length  sum)) (subsets xs)
`union`  map (length  sum) (subsets xs)
  =  { reverse map fusion }
 map (\(n,s) - (n+1,s+x)) (map (length  sum) (subsets xs))
`union`  map (length  sum) (subsets xs)
  =  { 

[Haskell-cafe] Re: Euler 201 performance mystery

2008-07-16 Thread apfelmus

apfelmus wrote:

In other words, we have now calculated the more efficient program

  euler201 = map snd . filter ((==50) . fst) . subsums $ squares
 where
 subsums [] = singleton (0,0)
 subsums (x:xs) = map (\(n,s) - (n+1,s+x)) (subsums xs) `union` subsums xs


I forgot something very important, namely that the common subexpression 
subsums xs  has to be shared


  euler201 = map snd . filter ((==50) . fst) . subsums $ squares
 where
 subsums [] = singleton (0,0)
 subsums (x:xs) = let s = subsums xs in map (\(n,s) - (n+1,s+x)) s `union`s

Otherwise, this exercise would be pointless and the runtime exponential ... :O


Regards,
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe