Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2011-01-01 Thread Will Ness
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

 
 Will Ness wrote:
  Heinrich Apfelmus writes:
  Here an example where the VIP merge would give a different result
 
   bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
 error "bad"
  
  The reason to *not* have the lazy patterns in foldTree for primes, as 
Daniel 
  Fischer discovered back then, is that they give it a space leak. Case in 
point -
   http://ideone.com/DLHp2 : 
   
  [...]
   
 tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)
   where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t
   
 hfold  xs = serve . foldTree  mergeP . map vip $ xs
 hfold' xs = serve . foldTree' mergeP . map vip $ xs
   
 foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
   where pairs ~(x: ~(y:ys)) = f x y : pairs ys
   
 foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)
   where pairs (x: (y:ys)) = f x y : pairs ys
 
  [...] 
   
  so hfold' too appears to be over-eager to-the-right, although still more 
  productive than tfold.
 
 Ah, the lazy patterns in  foldTree  are a different issue, sorry for my 
 bad choice of example.
 
 While I still don't understand the trade-off that turns the lazy 
 patterns into a space leak, there is no harm in allowing  foldTree  to 
 see the complete spine of the list. What we do want to forbid is looking 
 at the elements of that list too early. 

This turns out to be too optimistic a demand on data, in general. 

 In other words, the example 
 should read
 
  bad = tfold $
  (1:10:undefined) : (2:3:5:undefined) : (4:undefined)
  : repeat (error "bad" : undefined)
 
 i.e. the previously unknown tail is replaced with an infinite list of 
 undefined elements. This example can properly distinguish between the 
 not-so-correct  tfold  and proper VIP implementations (or other 
 implementations that don't do unnecessary comparisons).
 


will have to think this over, but in the mean time, they *both* turn out to be 
*completely* and utterly *wrong* :) in a general case (although probably for 
different reasons).

Here's where:


 *Main> mapM_ print $ take 5 $ map (take 10)
[concatMap (replicate 3) [n,n+1..] | n <- [1..]]
 [1,1,1,2,2,2,3,3,3,4]
 [2,2,2,3,3,3,4,4,4,5]
 [3,3,3,4,4,4,5,5,5,6]
 [4,4,4,5,5,5,6,6,6,7]
 [5,5,5,6,6,6,7,7,7,8]

 *Main> take 20 $ hfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
 [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7]

 *Main> take 20 $ tfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
 [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7]


when it should'a been 


 [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]


Cheers,

:)


 Regards,
 Heinrich Apfelmus
 
 --
 http://apfelmus.nfshost.com
 







Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2011-01-01 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 
 ... they *both* turn out to be 
 *completely* and utterly *wrong* :) in a general case (although probably for 
 different reasons).


Sorry, my bad. I was thinking in terms of merge, but the definition used in the VIP code 
was really a union.

When the definition was changed to a real merge, one that does not remove duplicates, 
everything was as expected in that case, for both versions.


  *Main> take 20 $ hfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
  *Main> take 20 $ tfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]

  [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]
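
For reference, a minimal sketch of the distinction meant here (my own illustration,
not the exact code used in the thread): `union' collapses equal elements of its
ordered inputs, while a real `merge' keeps every occurrence.

  union :: Ord a => [a] -> [a] -> [a]
  union (x:xs) (y:ys) = case compare x y of
    LT -> x : union xs (y:ys)
    EQ -> x : union xs ys          -- equal heads collapsed into one
    GT -> y : union (x:xs) ys
  union xs ys = xs ++ ys

  merge :: Ord a => [a] -> [a] -> [a]
  merge (x:xs) (y:ys)
    | x <= y    = x : merge xs (y:ys)   -- every occurrence kept
    | otherwise = y : merge (x:xs) ys
  merge xs ys = xs ++ ys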




Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2010-12-30 Thread Will Ness
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

 
 Leon Smith wrote:
 
  [1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
  [2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
  [3] 
 http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-
List-Ordered.html#v:mergeAll
 
 
 Will Ness
  
  primes = 2: primes' 
 where
  primes' = 3: 5: [7,9..] `minus` tfold
[ [p*p,p*p+2*p..] | p <- primes' ]   
  tfold ((x:xs):t)= x : xs `union` tfold (pairs t)
  pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t
 
 Unfortunately, it turns out that this program is clear, shorter ... and 
 subtly wrong. :)
 
 Here an example where the VIP merge would give a different result
 
   bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
 error "bad"
 
 We have
 
   ghci> bad
  [1,2*** Exception: bad
 
 but the VIP version would give at least
 
   ghci> bad
  [1,2,3,4*** Exception: bad / Prelude: undefined
 



The reason to *not* have the lazy patterns in foldTree for primes, as Daniel 
Fischer discovered back then, is that they give it a space leak. Case in point -
 http://ideone.com/DLHp2 : 
 

                  1M primes:       2M primes:       3M primes:       ideone #:
 no-VIPs:
   smart fold:    1.90s- 4.8MB     4.42s- 4.8MB     7.40s- 4.8MB     r3bdL
 VIPs:
   smart fold:    1.95s- 4.8MB     4.53s- 4.8MB     7.45s- 4.8MB     4ACpe
   simple fold:   2.04s- 4.8MB     4.76s- 4.8MB     7.86s- 4.8MB     av9XR
   lazy pats:     2.44s-20.1MB     5.70s-21.1MB     9.85s-42.6MB     DLHp2
 
Also, having 
 
   tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)
 where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t
 
   hfold  xs = serve . foldTree  mergeP . map vip $ xs
   hfold' xs = serve . foldTree' mergeP . map vip $ xs
 
   foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
 where pairs ~(x: ~(y:ys)) = f x y : pairs ys
 
   foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)
 where pairs (x: (y:ys)) = f x y : pairs ys
 
and
 
   bad = (1:10:error "1") : (2:3:5:error "2") : (4:error "4") 
   : error "bad"
   bad2 = (1:10:error "1") : (2:3:5:error "2") : (4:error "4")
   : (5:error "5") : (6:error "6")
   : (7:error "7") 
   : error "bad2"

we get
 
   *Main> hfold bad
   [1,2,3,4*** Exception: bad
   *Main> hfold' bad
   [1,2,3,4*** Exception: bad
   *Main> tfold bad
   [1,2*** Exception: bad
 
   *Main> hfold bad2
   [1,2,3,4*** Exception: 4
   *Main> hfold' bad2
   [1,2,3,4*** Exception: bad2
   *Main> tfold bad2
   [1,2*** Exception: bad2
 
so hfold' too appears to be over-eager to-the-right, although still more 
productive than tfold.



 Regards,
 Heinrich Apfelmus
 
 --
 http://apfelmus.nfshost.com
 







[Haskell-cafe] New simplified primes; no VIP necessary.

2010-12-23 Thread Will Ness
Hi,

For those who remember the discussion about this about a year ago, it turns out 
there was a simpler version after all lurking somewhere in there (or is it 
_out_?).

I've just posted it to the haskellwiki's Prime Numbers page:

primes = 2: primes' 
   where
primes' = 3: 5: [7,9..] `minus` tfold
  [ [p*p,p*p+2*p..] | p <- primes' ]   
tfold ((x:xs):t)= x : xs `union` tfold (pairs t)
pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t

The full code with double-feed, wheel and better folding structure is also 
there. Speed and complexity weren't changed by this. But it is rather 
_short_. :)
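
The one-liner above assumes the usual ordered-list helpers. A minimal sketch of
the `minus' it relies on (list difference of two increasing lists; `union' is
the duplicate-discarding merge sketched earlier in this digest) - an illustration
only, not necessarily the exact definitions on the wiki page:

  minus :: Ord a => [a] -> [a] -> [a]
  minus (x:xs) (y:ys) = case compare x y of
    LT -> x : minus xs (y:ys)   -- keep x, it is not in the second list
    EQ ->     minus xs ys       -- drop x, it is being subtracted
    GT ->     minus (x:xs) ys   -- skip past a smaller y
  minus xs _ = xs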

Cheers,





[Haskell-cafe] Re: Why no merge and listDiff?

2010-01-26 Thread Will Ness
Derek Elkins derek.a.elkins at gmail.com writes:

 
 On Wed, Jan 20, 2010 at 9:42 AM, Will Ness will_n48 at yahoo.com wrote:
  Derek Elkins derek.a.elkins at gmail.com writes:
  On Sun, Jan 17, 2010 at 2:22 PM, Will Ness will_n48 at yahoo.com wrote:
   Hello cafe,
  
   I wonder, if we have List.insert and List.union, why 
   no List.merge (:: Ord a => [a] -> [a] -> [a])
   and no List.minus ? These seem to be pretty general
   operations.
 
  You
  probably also want to look at the package data-ordlist on hackage
  (http://hackage.haskell.org/packages/archive/data-
ordlist/0.0.1/doc/html/Data-
  OrdList.html)
  which represents sets and bags as ordered lists and has all of the
  operations you mention.
 
 
  I did, thanks again! Although, that package deals with non-decreasing lists,
  i.e. lists with multiples possibly. As such, its operations produce non-
  decreasing lists, i.e. possibly having multiples too.
 
 It is clear that some of the operations are guaranteed to produce sets
 given sets.  The documentation could be better in this regard though.
 
 The 'union' and 'minus' functions of ordlist meet this requirement if
 you satisfy the preconditions.


Yes, thanks, it's exactly what I was looking for. I've recognized from the code 
that `minus' was OK, but `merge' was different. As it turns out, OrdList.union 
is exactly what I have under `merge'. Better (or any at all really) 
documentation for Data.OrdList would be a big help. 

I don't know if it's at all easy to separate Sets and Bags, though it may seem 
desirable. I seem to have read something about Circle/Ellipse problem, i.e. the 
Sets/Bags problem which are not easily detachable from one another? Although I 
don't know the details of that. 

The background for this is my attempts to classify the various primes-
generating code variants. Apparently, the essence of the sieve is the composites 
removal, and both composites and natural numbers are naturally represented as 
strictly increasing lists. Same with merging the lists of multiples of each 
prime to construct the composites. I had to provide the `minus' and `merge' 
definitions along with the actual code and searched for something standard.

You can check it out on the Haskellwiki Prime Numbers page (work still in 
progress, the comparison tables are missing). We had also a recent thread here 
in cafe under FASTER primes. The original idea of Heinrich Apfelmus of 
treefold merging the composites really panned out. I found a little bit better 
structure for the folding tree, and Daniel Fischer was a great help in fixing 
the space leaks there (two of them) so that now the resulting code, with wheel 
optimization, runs very close to the PQ-based O'Neill's sieve (actually faster 
than it if interpreted in GHCi). More importantly (?) there's a natural 
progression of code now, straight from the classic Turner's sieve, so it's not 
an ad-hoc thing anymore.

It also became apparent that the essence of prime wheels is Euler's sieve. And 
vice versa. :)

Thanks a lot for all the help from all the posters!

Cheers,




[Haskell-cafe] Re: Why no merge and listDiff?

2010-01-26 Thread Will Ness
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

 
 Will Ness wrote:
  You can check it out on the Haskellwiki Prime Numbers page (work still in 
  progress, the comparison tables are missing). We had also a recent thread 
here 
  in cafe under FASTER primes. The original idea of Heinrich Apfelmus of 
  treefold merging the composites really panned out.
 
 (Just for historical reference, credit for the data structure that works
 with infinite merges goes to Dave Bayer, I merely contributed the
 mnemonic aid of interpreting it in terms of VIPs.)
 
 Regards,
 Heinrich Apfelmus
 
 --
 http://apfelmus.nfshost.com
 


yes, yes, my bad. GMANE is very unreliable at presenting the discussion threads 
in full. :| I saw it first on the Haskellwiki page though, and it was your code 
there, that's the reason for my mistake. :) 





[Haskell-cafe] Re: Why no merge and listDiff?

2010-01-20 Thread Will Ness
Derek Elkins derek.a.elkins at gmail.com writes:

 
 On Sun, Jan 17, 2010 at 2:22 PM, Will Ness will_n48 at yahoo.com wrote:
  Hello cafe,
 
  I wonder, if we have List.insert and List.union, why no List.merge (:: Ord 
a =>
  [a] -> [a] -> [a]) and no List.minus ? These seem to be pretty general
  operations.
 
 Presumably by List.minus you mean the (\\) function in Data.List. 

No, it has to search its second list over and over from the start, to be able 
to deal with unordered lists, so its performance can't be good. 


 You
 probably also want to look at the package data-ordlist on hackage
 (http://hackage.haskell.org/packages/archive/data-ordlist/0.0.1/doc/html/Data-
OrdList.html)
 which represents sets and bags as ordered lists and has all of the
 operations you mention.


I did, thanks again! Although, that package deals with non-decreasing lists, 
i.e. lists with multiples possibly. As such, its operations produce non-
decreasing lists, i.e. possibly having multiples too. 

I meant strictly increasing ordered lists, without multiples, for which the two 
operations, 'merge' and 'minus', would also have to produce like lists, i.e 
strictly increasing, without multiples.

I guess the first variety is more appropriate for bags, and the second one - 
for sets. The two would have to be de-conflated for that. (?)





[Haskell-cafe] Re: Why no merge and listDiff?

2010-01-20 Thread Will Ness
Christian Maeder Christian.Maeder at dfki.de writes:

 Will Ness schrieb:
  I meant strictly increasing ordered lists, without multiples, for which the 
two 
  operations, 'merge' and 'minus', would also have to produce like lists, i.e 
  strictly increasing, without multiples.
 
 Why don't you use directly Data.Set?

It says it's based on size balanced Trees? I initially wondered why there are no such 
fundamental operations as merge and minus for _lists_ in the standard libraries.

Also, its to/from list conversions are O(n), so probably won't work for 
infinite lists. 

I was told the trend is to move specifics to hackage packages, but I wonder why 
shouldn't such fundamental operations be just included in standard Data.List?

 
  I guess the first variety is more appropriate for bags, and the second one
  - for sets. The two would have to be de-conflated for that. (?)
 
 There are also bags aka multisets:
 http://hackage.haskell.org/package/multiset

It too seems to be based on trees.

Data.Ordlist seems to be a good match, except for its conflation of 
ascending/non-decreasing lists under one ordered category (i.e. sets/bags 
distinction).






[Haskell-cafe] Why no merge and listDiff?

2010-01-17 Thread Will Ness
Hello cafe,

I wonder, if we have List.insert and List.union, why no List.merge (:: Ord a => 
[a] -> [a] -> [a]) and no List.minus ? These seem to be pretty general 
operations.

Brief look into haskell-prime-report/list.html reveals nothing.

Could we please have them?

On the wider perspective, is there a way to declare an /ordered/ list on the 
type level (e.g. [1,2,3] would be one, but not [2,3,1])? Non-decreasing lists? 
Cyclical, or of certain length? What are such types called?

TIA!




[Haskell-cafe] Re: FASTER primes

2010-01-16 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 Am Donnerstag 14 Januar 2010 08:25:48 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
   Am Mittwoch 13 Januar 2010 10:43:42 schrieb Heinrich Apfelmus:
I wonder whether it's really the liveness of  pair  in
   
  mergeSP (a,b) pair
     = let sm = spMerge b (fst pair)
       in (a ++ fst sm, merge (snd sm) (snd pair))
   
that is responsible for the space leak, ...
  
   I think that is responsible. At least that's how I understand the
   core:
  
   mergeSP (a,b) ~(c,d) = (a ++ bc, merge b' d)
  where
 (bc, b') = spMerge b c
 spMerge ...
 
  That is equivalent to
 
first (a++) . second (`merge`d) $ spMerge b c
 
  and Daniel's fix is equivalent to
 
first (a++) $ spMerge b c d
 


That should've been

  mergeSP (a,b) p = first(a++) . second(`merge`snd p) $ spMerge b (fst p)

and

  mergeSP (a,b) p = first(a++) $ spMerge b (fst p) (snd p)


The code fragments you've posted are essentially


  mergeSP (a,b) p = let res = case p of (c,_) ->
case spMerge b c of (# x,y #) ->
  (x,y)
in
   (# (++) a (case res of (bc,_) -> bc) ,
  case res of (_,b') -> 
case p of (_,d) -> merge b' d  #)

and

  mergeSP (a,b) p = let res = case p of (c,d) ->
case spMerge b c d of (# x,y #) ->
  (x,y)
in
   (# (++) a (case res of (bc,_) -> bc) ,
  case res of (_,b') -> b' #)


This looks like Haskell to me, with many details explicitly written out, 
probably serving as immediate input to the compiler - not its output. So it 
can't tell us much about how this is actually implemented at the lower level. 
(?)

Your theory would certainly hold if the translation was done one-to-one without 
any further code rearrangements. But it could've been further re-arranged by 
the compiler at some later stage (is there one?) into an equivalent of, e.g. 


  mergeSP (a,b) p = let (bc,b') = case p of (c,_) ->
case spMerge b c of (x,y) ->
  (x,y)
in
   (# (++) a bc ,
  case p of (_,d) -> merge b' d  #)


and further,


  mergeSP (a,b) p = let (c,d)   = case p of (x,y) -> (x,y)
(bc,b') = case spMerge b c of (x,y) ->
  (x,y)
in
   (# (++) a bc , merge b' d  #)


could it? This would take hold of /d/ and /c/ at the same time, right? 

What is that code that you've shown exactly? At which stage is it produced and 
is it subject to any further manipulation? I apologise if these are obvious 
questions, I don't know anything about GHC. I also don't know what (# x,y #) 
means?

One thing seems certain - we should not hold explicit references to the same 
entities in different parts of our code, to avoid space leaks with more 
confidence. To make the code look as tail-recursive as possible, so to speak.

Does that make sense?

Anyway that was a very educational (for me) and fruitful discussion, and I 
greatly appreciate your help, and fixing and improving of the code.

Thanks!




[Haskell-cafe] Re: FASTER primes

2010-01-13 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 Am Mittwoch 13 Januar 2010 10:43:42 schrieb Heinrich Apfelmus:
  I wonder whether it's really the liveness of  pair  in
 
    mergeSP (a,b) pair
       = let sm = spMerge b (fst pair)
         in (a ++ fst sm, merge (snd sm) (snd pair))
 
  that is responsible for the space leak, for chances are that Sparud's
  technique applies and  pair  is properly disposed of. Rather, it could
  be that we need the stronger property that forcing the second component
  will evaluate the first to NF.
 
 I think that is responsible. At least that's how I understand the core:
 
 mergeSP (a,b) ~(c,d) = (a ++ bc, merge b' d)
where
   (bc, b') = spMerge b c
   spMerge ...



That is equivalent to

  first (a++) . second (`merge`d) $ spMerge b c

and Daniel's fix is equivalent to

  first (a++) $ spMerge b c d


Now, when the compiler sees the first variant, it probably treats spMerge as 
opaque. I.e. although in reality spMerge only contributes to the 
first channel while it is progressively instantiated, and (`merge`d) will 
only be called upon when spMerge's final clause is reached, that is (most 
likely) not known to the compiler at this stage. When looking at just the first 
expression itself, it has to assume that spMerge may contribute to both 
channels (parts of a pair) while working, and so can't know _when_ /d/ will get 
called upon to contribute to the data, as it is finally consumed at access.

So /d/ is gotten hold of prematurely, _before_ going into spMerge.

The second variant passes the responsibility for actually accessing its inputs 
to spMerge itself, and _it_ is clear about needing /d/ only in the very end.

Just a theory. :) 

Does that make sense? 






[Haskell-cafe] Re: FASTER primes

2010-01-10 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 Am Freitag 08 Januar 2010 19:45:47 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
 
  
   mergeSP :: Integral a => People a -> People a -> People a
   mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
 where
   mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
   spMerge l1 [] l3 = P [] (merge l1 l3)
   spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
   LT -> celebrate x (spMerge xs l2 l3)
   EQ -> celebrate x (spMerge xs ys l3)
   GT -> celebrate y (spMerge l1 ys l3)
  
   --


Actually, the minimal edit that does the trick (of eliminating the space leak 
that you've identified) for my original code is just


  mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
 in (a ++ bc, bd) 
   where 
spMerge b [] d = ([] ,merge b d)
spMerge b@(x:xs) c@(y:ys) d = case compare x y of
LT ->  (x:u,v)  where (u,v) = spMerge xs c  d
EQ ->  (x:u,v)  where (u,v) = spMerge xs ys d
GT ->  (y:u,v)  where (u,v) = spMerge b  ys d
spMerge [] c d = ([] ,merge c d)


which hardly looks at all different at the first glance. Just for reference, it 
was 

 {-
  mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
 in (a ++ bc, merge b' d) 
   where 
spMerge b@(x:xs) c@(y:ys) = case compare x y of
LT ->  (x:u,v)  where (u,v) = spMerge xs c  
EQ ->  (x:u,v)  where (u,v) = spMerge xs ys 
GT ->  (y:u,v)  where (u,v) = spMerge b  ys 
spMerge b [] = ([] ,b)
spMerge [] c = ([] ,c)
 -}

spMerge of course is not tail recursive here in either version, if seen through 
imperative eyes. But lazy evaluation makes it effectively so. The important 
thing is, when the final point is reached, there's no outstanding context - 
everything is present. There should be a name for such a concept. This is very 
similar to late instantiation in Prolog (programming with holes), and I think 
this *would* pass as a tail-recursive function /there/.
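
A toy illustration of that effect (my own, not the thread's code): with a lazily
built pair, a consumer can start taking elements of the first component before
the recursion has bottomed out, so no outstanding context piles up along the way.

  -- split a list into (odd elements, even elements), lazily, in one pass;
  -- e.g.  take 3 (fst (splitOdds [1..]))  yields [1,3,5] without forcing the rest
  splitOdds :: [Integer] -> ([Integer],[Integer])
  splitOdds [] = ([],[])
  splitOdds (x:xs)
    | odd x     = (x:os, es)
    | otherwise = (os, x:es)
    where (os,es) = splitOdds xs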

Even in the new code the compiler could've internally held on to the original 
pair and only deconstructed the 'd' out of it at the final call to merge, 
recreating the space leak. It could just as well have recognized that 'd' isn't 
changed inside spMerge (we're pure in Haskell after all) and deconstructed the 
pair in the original code. Something is missing here.

 
 As it turns out, the important things are
 
 1. a feeder and separate lists of multiples for the feeder and the runner, 
 for the reasons detailed earlier (two-step feeding and larger wheel are 
 pleasing but minor optimisations).
 
 2. a strict pattern in tfold
 
 3. moving the merge inside spMerge 


  Is this the state of our _best_ Haskell compiler
 
 
 Yes. It's still a do what I tell you to compiler, even if a pretty slick 
 one, not a do what I mean compiler. Sometimes, what you tell the compiler 
 isn't what you wanted.
 It's easier to predict when you give detailed step by step instructions.
 






[Haskell-cafe] Re: FASTER primes

2010-01-09 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Donnerstag 07 Januar 2010 11:43:44 schrieb Heinrich Apfelmus:
  Will Ness wrote:
   Heinrich Apfelmus writes:
 
 The below code is now a well-behaved memory citizen (3MB for the 100 
millionth prime, about the same as the PQ code). It still is considerably 
slower than the PQ code.
 In terms of MUT times as reported by +RTS -sstderr - as well as (MUT+GC) 
times - (measured once for prime No. 5*10^5, 10^6, 2*10^6, 4*10^6, 10^7 to get 
a rough tendency), it seems to scale a wee bit better than any of the other 
tfold versions I created, but a little worse than the PQ versions.
 The relation of MUT times isn't too bad, but the GC times are rather abysmal 
(30-40%).
 
 --
 
 data People a = P { vips :: [a], dorks :: [a] }
 
 celebrate x p = P (x:vips p) (dorks p)
 
 mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
   where
 mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
 spMerge l1 [] l3 = P [] (merge l1 l3)
 spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
 LT -> celebrate x (spMerge xs l2 l3)
 EQ -> celebrate x (spMerge xs ys l3)
 GT -> celebrate y (spMerge l1 ys l3)
 


I forgot to say something *very* important. :) Here it is.


Yippee-hurray!


:)




[Haskell-cafe] Re: FASTER primes

2010-01-09 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 Am Samstag 09 Januar 2010 08:04:20 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
   Am Freitag 08 Januar 2010 19:45:47 schrieb Will Ness:
Daniel Fischer daniel.is.fischer at web.de writes:
  
   It's not tail-recursive, the recursive call is inside a celebrate.
 
  It is (spMerge that is).
 
 No.
 In computer science, tail recursion (or tail-end recursion) is a special 
 case of recursion in which the last operation of the function, the tail 
 call, is a recursive call.


As far as I understand it, when a function makes a tail call to a tail 
recursive function (be it itself or some other function) it is itself tail 
recursive. I.e. that call may be replaced with a direct jump, with no new 
context to be created. That is what your version accomplishes, too. Mine really 
held to that pair ~(c,d) when it wanted to call (merge _ d) _after_ the call to 
spMerge. By passing the pre-decomposed part of a pair, 'd', into the process 
environment of spMerge, you've made it tail recursive - it carried along all 
the context it needed. That's what eliminated the space leak, so I'd say 
tail recursion does play a role under lazy evaluation - when a compiler isn't 
smart enough to do _that_ for us by itself. _Were_ it reliably smart, even non-
recursive functions like my initial variant would work just as well. 



 The last operation of spMerge is a call to celebrate or the pair 
 constructor (be that P or (,)). Doesn't matter, though, as for lazy 
 languages, tail recursion isn't very important.
 
  It calls tail-recursive celebrate in a tail
  position. What you've done, is to eliminate the outstanding context, by
  moving it inward. Your detailed explanation is more clear than that. :)
 
  BTW when I run VIP code it is consistently slower than using just pairs,
 
 I can't reproduce that. Ceteris paribus, I get the exact same allocation 
 and GC figures whether I use People or (,), running times identical enough 
 (difference between People and (,) is smaller than the difference between 
 runs of the same; the difference between the fastest and the slowest run of 
 the two is less than 0.5%). I think it must be the other changes you made.

I just take the VIP code as it is on a web page, and my initial variant without 
the wheel, and compare. Then I add the wheel in the same fashion, and then the 
feeder, and compare again. When I tested that Monoid instance code I too 
compared it to the straight pairs, and it was slower. Don't know why. 


  modified with wheel and feeder and all. So what's needed is to
  re-implement your approach for pairs:
 
   mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
 in (a ++ bc, bd)
   where
spMerge b [] d = ([], merge b d)
spMerge b@(x:xs) c@(y:ys) d = case compare x y of
  LT -> consSP x $ spMerge xs c  d
  EQ -> consSP x $ spMerge xs ys d
  GT -> consSP y $ spMerge b  ys d
 
   consSP x ~(bc,bd) = (x:bc,bd) -- don't forget that magic `~` !!!
 
 I called that (:).
 
 
  BTW I'm able to eliminate sharing without a compiler switch by using
 
 
 Yes, I can too. But it's easy to make a false step and trigger sharing.

yes indeed. It seems unpredictable. Fortunately GHC couldn't tell that (12-1) 
was 11 by the looks of it. :) Your idea certainly seems right, that there ought 
to be some control over sharing on a per-function basis somehow without these 
ridiculous code tricks.

 I can get a nice speedup (~15%, mostly due to much less garbage collecting) 
by doing the final merge in a function without unnecessarily wrapping the 
result in a pair

Will have to wrap my head around _that_. But that would be fighting with the 
compiler again. I don't like that; I'd much rather fight with the problem at 
hand. There shouldn't be any pairs in the compiled code in the first place. 
They just guide the staging of (++) and (merge) intertwined between the 
producer streams. At each finite step, when the second part of a pair comes 
into play, it is only after its first part was completely consumed. I guess the 
next thing to try would be to actually create a data type MergeNode and arrange 
_those_ in a tree and see if that helps. That would be the next half-step 
towards the PQ itself.

 This uses a different folding structure again,

which I am yet to decipher. :)

  How about them wheels? :)
 
 Well, what about them?

I dunno, it makes for a real easy wheel derivation, and coming out of our 
discussion of Euler's sieve it's a nice cross-pollination. :) Having yet 
another list representation suddenly cleared up the whole issue (two of them). 
I'll repost it one last time as I've made some corrections to it:

   euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
   primes = euler [2..]

   primes  = euler $ listFrom [2] 1
= 2:euler ( listFrom [3] 1 `minus` map (2*) (listFrom [2] 1) )
listFrom [3,4] 2

[Haskell-cafe] Re: FASTER primes

2010-01-08 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Donnerstag 07 Januar 2010 11:43:44 schrieb Heinrich Apfelmus:
  Will Ness wrote:
 
  Hm? In my world view, there is only reduction to normal form and I don't
  see how allocate its own storage fits in there. Okasaki having shown
  something to that end would be new to me?
 
 Perhaps what was meant was storage must be allocated for each filter
 (which, however, seems trivial).

I still contend that in the case of nested filters, where each filter has only one 
consumer, even that isn't ultimately necessary (a chain of pointers could be 
formed). That's because there are no peeks, only pulls there. If the filters 
are used in a merge situation, then there will be some peeks, so the current value 
must be somehow stored, though it's better to be made explicit and thus static 
(the place for it, I mean, instead of the rolling cons cell). Such an 
implementation technique would prevent some extra gc.


   Then under merging these split pairs form a monoid, so can be rearranged
   in a tree. If you haven't seen it yet, it uses a different folding
   structure to your code, with a lower total cost of multiples production
   (estimated as Sum (1/p)*depth):

correction:

tfold f (x:y:z:xs) = (x `f` (y `f` z)) `f` tfold f (pairwise f xs)
comps = tfold mergeSP $ pairwise mergeSP multips
 
  The idea being that each prime produces  1/p  composites each turn and
  it takes time  depth  to trickle it to the top? Sounds reasonable, but
  remember that a prime  p  does not start to generate composites until
  the turn count has reached p^2, so it might occupy a place of low
  depth in the tree that is better spent on the previous primes. 


That might be why Daniel's structure is better: it plunges down faster than 
mine.


treefold structure was:
(2+4) + ( (4+8) + ( (8+16) + ( (16+32) + ( (32+64) + ... 
dpths:   3 4   4 5   5 6   6 7   7 8


daniel's:
(2+(4+6)) + ( (8+(10+12)) + ( (14+(16+18)) + ( (20+(22+24)) +  ))
 3  5 5.4  6  7.8 7.9  8   9  9.5 9.6 10.7 10.8



 primes () = 2:3:5:7:11:13:primes'
    where
     primes'   = roll 17 wheel13 `minus` compos primes'''
 primes''  = 17:19:23:29:31:37:rollFrom 41 `minus` compos primes''
 primes''' = 17:19:23:29:31:37:rollFrom 41 `minus` compos primes''


Haven't read through the whole thing yet. :) :) I thought there was a typo 
there. There isn't.

BTW using the no-share switch isn't necessary if we just write it down twice 
with some variations, like 

  primes''' = let (h,t) = span (< 17^2) (roll 17 wheel13)
  in h ++ t `minus` compos primes''

As we've found out, compilers aren't  _that_ smart yet.




[Haskell-cafe] Re: FASTER primes

2010-01-08 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 The below code is now a well-behaved memory citizen (3MB for the 100 
millionth prime, about the same as the PQ code). It still is considerably 
slower than the PQ code.
 In terms of MUT times as reported by +RTS -sstderr - as well as (MUT+GC) 
times - (measured once for prime No. 5*10^5, 10^6, 2*10^6, 4*10^6, 10^7 to get 
a rough tendency), it seems to scale a wee bit better than any of the other 
tfold versions I created, but a little worse than the PQ versions.
 The relation of MUT times isn't too bad, but the GC times are rather abysmal 
(30-40%).
 
 --
 data People a = P { vips :: [a], dorks :: [a] }
 
 celebrate :: a -> People a -> People a
 celebrate x p = P (x:vips p) (dorks p)
 
 primes :: forall a. Integral a => () -> [a]
 primes () = 2:3:5:7:11:13:primes'
    where
     primes'   = roll 17 wheel13 `minus` compos primes'''
  primes''  = 17:19:23:29:31:rollFrom 37 `minus` compos primes''
 primes''' = 17:19:23:29:31:37:rollFrom 41 `minus` compos primes''
 
 pmults :: a -> People a
 pmults p = case map (*p) (rollFrom p) of
 (x:xs) -> P [x] xs
 
 multip :: [a] -> [People a]
 multip ps = map pmults ps
 
 compos :: [a] -> [a]
  compos = vips . smartfold mergeSP . multip
 
 
 smartfold f = tfold f . pairwise f
 
 tfold f (a:b:c:xs) = (a `f` (b `f` c)) `f` smartfold f xs
 
 pairwise f (x:y:ys)  = f x y : pairwise f ys
 
 mergeSP :: Integral a => People a -> People a -> People a
 mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
   where
 mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
 spMerge l1 [] l3 = P [] (merge l1 l3)
 spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
 LT -> celebrate x (spMerge xs l2 l3)
 EQ -> celebrate x (spMerge xs ys l3)
 GT -> celebrate y (spMerge l1 ys l3)
 
 --
 


Hi Daniel,

Is it so that you went back to my fold structure? Was it better for really big 
numbers of primes? 

I had the following for ages (well, at least two weeks) but I thought it was 
slower and took more memory (that was _before_ the 'no-share' and 'feeder' 
stuff). I can see the only difference in that you've re-written spMerge in a 
tail-recursive style with explicitly deconstructed parts; mine was relying on 
the compiler (8-L) to de-couple the two pipes and recognize that the second 
just passes along the final result, unchanged.

The two versions seem to me to be _exactly_ operationally equivalent. All this 
searching for the code better understood by the compiler is _*very*_ 
frustrating, as it doesn't reflect on the semantics of the code, or even the 
operational semantics of the code.  :-[

Weren't the P's supposed to disappear completely in the compiled code? Aren't 
types just _behavioral_ definitions??? Aren't we supposed to be able to 
substitute equals for equals, dammit??

Is this the state of our _best_ Haskell compiler



 module Primes8 where

 import Data.Monoid

  data (Ord a) => SplitList a = P [a] [a]

  instance (Ord a) => Monoid (SplitList a) where 
mempty = P [] []  
-- {x | x::SplitList a} form a monoid under mappend
mappend (P a b) ~(P c d) = let P bc b' = spMerge b c
   in P (a ++ bc) (merge b' d)
 where 
  spMerge :: (Ord a) => [a] -> [a] -> SplitList a 
  spMerge u@(x:xs) w@(y:ys) = case compare x y of
   LT -> P (x:c) d  where (P c d) = spMerge xs w
   EQ -> P (x:c) d  where (P c d) = spMerge xs ys
   GT -> P (y:c) d  where (P c d) = spMerge u  ys
  spMerge u [] = P  []   u 
  spMerge [] w = P  []   w 
mconcat ms = fold mappend (pairwise mappend ms)
 where
  fold f (a: ~(b: ~(c:xs))) 
  = (a `f` (b `f` c)) `f` fold f (pairwise f xs)
  pairwise f (x:y:ys) = f x y:pairwise f ys

  primes :: Integral a => () -> [a]
 primes () = 2:3:5:7:primes'
   where
primes'= [11,13] ++ drop 2 (rollFrom 11) `minus` comps
 mults  = map (\p -> P [p*p] [p*n | n <- tail $ rollFrom p]) $ primes'
P comps _  = mconcat mults





[Haskell-cafe] Re: FASTER primes

2010-01-08 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:


 roll   = scanl (+)
 wheel  = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
    4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
 wheel11 = res
   where
 snms = scanl (+) 11 (take 47 wheel)
 nums = tail $ scanl (+) 11 (take 529 wheel)
 cops = nums `minus` map (*11) snms
 diffs = zipWith (-) (tail cops) cops
 res = foldr (:) res diffs
 wheel13 = res
   where
 snms = take 480 $ scanl (+) 13 wheel11
 nums = take (480*13+1) . tail $ scanl (+) 13 wheel11
 cops = nums `minus` map (*13) snms
 diffs = zipWith (-) (tail cops) cops
 res = foldr (:) res diffs
 


BTW have you seen my take on the faithful Euler's sieve? It shows another way 
to look at the wheels, which for me was really the first time I really 
understood what's going on there.

It also makes for easier wheel extension (IMO):


   euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
   primes = euler [2..]


The essence of Euler's sieve is the wheel: after each step we're left with 
lists of non-multiples of the preceding primes:


primes  = euler $ listFrom [2] 1
 = 2:euler ( listFrom [3] 1 `minus` map (2*) (listFrom [2] 1) )
 listFrom [3,4] 2 `minus` listFrom [4] 2
   -- listFrom [3] 2 --
 = 2:3:euler (listFrom [5] 2 `minus` map(3*) (listFrom [3] 2))
  listFrom [5,7,9] 6 `minus` listFrom [9] 6
   -- listFrom [5,7] 6 --
 = 2:3:5:euler (listFrom [7,11] 6 `minus` listFrom [25,35] 30)
  [7,11, 13,17, 19,23, 25,29, 31,35] 30
-- listFrom [7,11,13,17,19,23,29,31] 30 --
 = .

where

  listFrom xs by = concat $ iterate (map (+ by)) xs

so

  -- startRoll = ([2],1)

  nextRoll r@(xs@(x:xt),b) 
 = ( (x,r') , r')
 where
   ys@(y:_) = xt ++ [x + b]
   r' = (xs',b')
   b' = x*b
   xs' = takeWhile (< y + b') (listFrom ys b) 
 `minus` map (x*) xs

  rolls = unfoldr (Just . nextRoll) ([2],1)

  nthWheel n = let (ps,rs) = unzip $ take n rolls
   (x:xs,b) = last rs
   in ((ps, x), zipWith (-) (xs++[x+b]) (x:xs))

{-
 *Main> mapM print $ take 4 rolls
 (2,([3],2))
 (3,([5,7],6))
 (5,([7,11,13,17,19,23,29,31],30))
 (7,([11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,
  101,103,107,109,113,121,127,131,137,139,143,149,151,157,163,167,169,
  173,179,181,187,191,193,197,199,209,211],210))

 *Main> nthWheel 3
 (([2,3,5],7),[4,2,4,2,4,6,2,6])

 *Main> nthWheel 4
 (([2,3,5,7],11),[2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,
  4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10])
-}


Coincidentally, the function specified  by


 eulerPrimes n = let (ps,rs) = unzip $ take n rolls
 (qs@(q:_),b) = last rs
 in ps ++ takeWhile (< q^2) qs


can be used to write the specialized nthEulerPrime etc., whose complexity seems 
to be about O(n^1.5).
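
One naive way to write such a wrapper (my own sketch, assuming the  rolls  and
eulerPrimes  definitions above; the widening strategy here is only illustrative,
not necessarily what gives the O(n^1.5) behaviour):

  -- 0-based, like (primes !! n): widen the roll until enough primes appear
  nthEulerPrime n = go 1
    where
      go k | length ps > n = ps !! n
           | otherwise     = go (k+1)
        where ps = eulerPrimes k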


Maybe this reinvents some spiraling wheels or somethn'. :)





[Haskell-cafe] Re: FASTER primes

2010-01-08 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 
 
 That might be why Daniel's structure is better: it plunges down faster than 
 mine.
 
 treefold structure was:
 (2+4) + ( (4+8) + ( (8+16) + ( (16+32) + ( (32+64) + ... 
  dpths:   3 4   4 5   5 6   6 7   7 8

this should of course have been


  dpths:   3 4   5 6   7 8   9 10   11 12


 
 daniel's:
 (2+(4+6)) + ( (8+(10+12)) + ( (14+(16+18)) + ( (20+(22+24)) +  ))
  3  5 5.4  6  7.8 7.9  8   9  9.5 9.6 10.7 10.8
 


hmm. :|



[Haskell-cafe] Re: FASTER primes

2010-01-08 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 Am Freitag 08 Januar 2010 19:45:47 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
 
 
 It's not tail-recursive, the recursive call is inside a celebrate.

It is (spMerge that is). It calls tail-recursive celebrate in a tail position. 
What you've done, is to eliminate the outstanding context, by moving it 
inward. Your detailed explanation is more clear than that. :)

BTW when I run VIP code it is consistently slower than using just pairs, 
modified with wheel and feeder and all. So what's needed is to re-implement 
your approach for pairs:

 mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d 
   in (a ++ bc, bd)
 where 
  spMerge u [] d = ([], merge u d)   
  spMerge u@(x:xs) w@(y:ys) d = case compare x y of
   LT -> consSP x $ spMerge xs w  d
   EQ -> consSP x $ spMerge xs ys d
   GT -> consSP y $ spMerge u  ys d

 consSP x ~(a,b) = (x:a,b)   -- don't forget that magic `~` !!!


BTW I'm able to eliminate sharing without a compiler switch by using


 mtwprimes () = 2:3:5:7:primes 
   where
primes = doPrimes 121 primes

 doPrimes n prs = let (h,t) = span (< n) $ rollFrom 11 
  in h ++ t `diff` comps prs
 doPrimes2 n prs = let (h,t) = span (< n) $ rollFrom (12-1)
   in h ++ t `diff` comps prs

 mtw2primes () = 2:3:5:7:primes
   where
primes  = doPrimes 26 primes2
primes2 = doPrimes2 121 primes2


Using 'splitAt 26' in place of 'span (< 121)' didn't work though.


How about them wheels? :)



 
 Yes. It's still a do what I tell you to compiler, even if a pretty slick 
 one, not a do what I mean compiler. Sometimes, what you tell the compiler 
 isn't what you wanted.
 It's easier to predict when you give detailed step by step instructions.
 






[Haskell-cafe] Re: FASTER primes

2010-01-08 Thread Will Ness
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

 
 Will Ness wrote:
  But I get the impression that GHC isn't working through equational
  reasoning?.. 
  I see all this talk about thunks etc.
 
 Sure it does. Concerning the thunks, they're part of the implementation
 of the reduction model (call-by-need  aka  lazy evaluation).

At run-time? I meant to eliminate as much calculation as possible, pre-run-time.
I would expect the best efforts of the best minds to go into searching for ways 
how to eliminate computations altogether, instead of how to perform them better.


 
  Concerning the sieves, there is a fundamental difference between the
  imperative sieve and the functional sieves, regardless of whether the
  latter start at p or p^2 or use a priority queue. [...]
  
  We can directly jump to the next multiple too, it is called (+). :) :)
 
  But seriously, the real issue is that we have to merge the produced
  streams of multiples, while the mutable-storage code works on same one,
  so its merging cost is zero. 
 
 Not quite, I think there are two things going on:
 
 1. In contrast to the standard imperative version, we are implementing
 an on-line algorithm that produces arbitrarily many primes on demand.
 Any imperative on-line version will need to think about storage for
 pending prime filters as well.

True.


 2. Even the imperative algorithm can be interpreted as merging arrays,
 just that the composites are magically merged at the right place (at
 distance p from each other) because arrays offer O(1) jumps. 

i.e. with a merging cost of zero. :)

 In contrast, the functional version needs O(log something) time to
 calculate where to put the next composite.

when thinking in terms of the finally produced sequence, yes. We have to 
produce numbers one by one and take care of their ordering ourselves; _they_ 
just /throw/ the numbers at the shared canvas and let _it_ take care of 
ordering them automagically, _later_, on the final sweep through. ISWYM.



 
  If you could take a look at the tree-merging primes and why it uses
  too much memory, it would be great.
 
 Fortunately, Daniel already took a detailed look. :) 


Yes he really came through! He finally found, and fixed, the space leak. It was 
hiding in 

 mergeSP_BAD (a,b) ~(c,d) = let (bc,b') = spMerge b c
in (a ++ bc, merge b' d)
   where 
spMerge :: (Ord a) => [a] -> [a] -> ([a],[a]) 
spMerge a@(x:xs) b@(y:ys) = case compare x y of
LT ->  (x:c,d)  where (c,d) = spMerge xs b
EQ ->  (x:c,d)  where (c,d) = spMerge xs ys
GT ->  (y:c,d)  where (c,d) = spMerge a  ys
spMerge a [] = ([] ,a)
spMerge [] b = ([] ,b)


which really ought to have been


 mergeSP (a,b) ~(c,d) = let ~(bc,bd) = spMerge b c d
in (a ++ bc, bd)
 where 
  spMerge u [] d = ([], merge u d)
  spMerge u@(x:xs) w@(y:ys) d = case compare x y of
   LT -> spCons x $ spMerge xs w  d
   EQ -> spCons x $ spMerge xs ys d
   GT -> spCons y $ spMerge u  ys d
  spCons x ~(a,b) = (x:a,b)


Can you spot the difference? :) :)


 Aww, why remove the cutesy name? The VIPs will be angry for being ignored!


It runs faster on plain pairs, and on less memory, equals for equals. For some 
reason. :)






[Haskell-cafe] Re: FASTER primes

2010-01-06 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 
 Daniel Fischer daniel.is.fischer at web.de writes:
 
  Am Dienstag 05 Januar 2010 14:49:58 schrieb Will Ness:
  
 euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
 primes = 2:euler [3,5..]
  
  
 
 Re-write:
 
  primes  = euler $ rollFrom [2] 1
   = 2:euler ( rollFrom [3] 1 `minus` map (2*) (rollFrom [2] 1) )
   rollFrom [3,4] 2 `minus` rollFrom [4] 2
 -- rollFrom [3] 2 --
   = 2:3:euler (rollFrom [5] 2 `minus` map(3*) (rollFrom [3] 2))
rollFrom [5,7,9] 6 `minus` rollFrom [9] 6
 -- rollFrom [5,7] 6 --
   = 2:3:5:euler (rollFrom [7,11] 6 `minus` rollFrom [25,35] 30)
[7,11, 13,17, 19,23, 25,29, 31,35] 30
  -- rollFrom [7,11,13,17,19,23,29,31] 30 --
   = .
 

correction:

where
  rollOnce (x:xs) by = (x, xs ++ [x+by])
  rollFrom xs by = concat $ iterate (map (+ by)) (xs)
  multRoll xs@(x:_) by p = takeWhile (< (x+p*by)) $ rollFrom xs by

 
 so, reifying, we get
 
  data Roll a = Roll [a] a
 
  rollOnce (Roll (x:xs) by) = (x,Roll (xs ++ [x+by]) by)
  rollFrom (Roll xs by) = concat $ iterate (map (+ by)) (xs)
  multRoll r@(Roll (x:_) by) p 
   = Roll (takeWhile (< (x+p*by)) $ rollFrom r) (by*p)
 
  primes  = euler $ Roll [2] 1
  euler r@(Roll xs _)
 = x:euler (Roll (mxs `minus` map (x*) xs)  mby)
   where  
(x,r') = rollOnce r
(Roll mxs mby) = multRoll r' x
 

There are many extra primes pre-calculated inside the Roll, of course.

For any (Roll xs@(x:_) _),  (takeWhile (< x*x) xs) are all primes too.

When these are used, the code's complexity is around O(n^1.5), and it runs 
about 1.8x slower than Postponed Filters.

The faithful sieve's empirical complexity is above 2.10..2.25 and rising. So 
it might not be exponential, but it is worse than any fixed power, it seems, anyway.







[Haskell-cafe] Re: FASTER primes

2010-01-06 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Mittwoch 06 Januar 2010 00:09:07 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
   Am Montag 04 Januar 2010 22:25:28 schrieb Daniel Fischer:
   Fix rfold:
  
   rfold f [x] = x
   rfold f xs = rfold f (pairwise f xs)
  
   and it's faster also for those.
 
 
 The memory is almost completely due to the tree-merging of the multiples for
 the fastest runner. While it produces faster than flat merging, the
 exponential growth of the trees makes it a bad memory citizen.


Isn't the number of nodes the same in any two trees with the same number of 
leafs?

BTW using

 compos ps = fst $ tfold mergeSP $ nwise 1 mergeSP $ map pmults ps

instead of 

 compos ps = fst $ tfold mergeSP $ nwise 1 mergeSP 
 $ pairwise mergeSP $ map pmults ps

brings down memory consumption further by 25%..45% on 1..8 mln primes produced, 
while slowing it down by about 0%..2% (that's after eliminating the lazy 
pattern in tfold as per your advice).


  'pairwise' puts odd leafs higher on the right. It might be better if it was
  so on the left, for the frequency of production is higher.
 
 Maybe. But how would you do it? I tried passing the length to rfold, so when
 there was an odd number of trees in the list, it would move the first out of
 the recursion. Any possible gains in production have been more than eaten up
 by the control code (not a big difference, but it was there).


yes I've seen this too now. BTW, at a price of further slowing down, memory can 
be lowered yet more with


 compos ps = fst $ tfold mergeSP $ nwise 1 0.4 mergeSP $ map pmults ps
 nwise k d f xs = let (ys,zs) = splitAt (round k) xs 
  in rfold f ys : nwise (k+d) d f zs


It really looks like the nearer the structure is to a linear list, the lower the 
memory consumption becomes. Of course using 0.0 in place of 0.4 would make it 
into a plain list.







[Haskell-cafe] Re: FASTER primes

2010-01-05 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Montag 04 Januar 2010 16:30:18 schrieb Will Ness:
 
  For me, a real smart compiler is one that would take in e.g. (sum $ take n
  $ cycle $ [1..m]) and spill out a straight up math formula, inside a few
  ifs maybe (just an aside).
 
 Such things just aren't common enough. If you start letting the compiler
 look for patterns such as this which can be transformed into a simple
 formula, compile times would explode.

I was thinking more along the lines of an inferencing compiler, proving new 
theorems about the types and applying that knowledge in simplifying the 
expressions. This would take time, so it should be a part of some interactive 
system, maybe kind of like Lisp has. 

In such a setting, the underlying compiler could first produce quick-n-dirty 
version, and would continue working in the background whenever the system is 
not busy, trying to improve the executable. Such a system would probably have 
to distinguish, at the type level, between [1..m] ; cycle [1..m] ; take n 
[1..m] ; etc. These would all be not just functions, but parts of a type's (here 
list) behaviour with automatically deduced semantics.

What would such a type system be called? 
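
(For concreteness, the kind of simplification meant in the quote above - this is
only my own illustration of the target, not anything GHC does: the whole
 sum $ take n $ cycle [1..m]  collapses into arithmetic on n and m.)

  -- q full cycles contribute q*m*(m+1)/2, the partial run 1..r adds r*(r+1)/2
  sumTakeCycle :: Integer -> Integer -> Integer
  sumTakeCycle n m = q * (m*(m+1) `div` 2) + r*(r+1) `div` 2
    where (q, r) = n `divMod` m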


 The -fno-cse turns off Common Subexpression Elimination (rather sharing
 than elimination).

 That is, if you have
 
 f x = g (expensive x) y z (h (expensive x))
 
 the compiler can see the common subexpression (expensive x) on the RHS and 
 decide to share it, i.e. calculate it only once:
 
 f x = let e = expensive x in g e y z (h e)
 
 

thanks for the in-depth explanation! :)

 
 Now if you have a list producer and a consumer, without fusion, it goes like
 Consumer: Gimme
 Producer: creates cons cell, puts value, next cons cell (how to produce more)
 Consumer: takes value, does something with it, gimme more.
 
 Stream fusion is about eliminating the cons cell creating and value
 passing, to fuse production and consumption into a nice loop. That is of
 course impossible if the produced list is shared between two (or more)
 consumers.


I would imagine so. Do I get this fusion on lists for free from the compiler, 
or do I have to recode for that? (haven't yet had time to look into the article 
mentioned).


 
 
 
 






[Haskell-cafe] Re: FASTER primes

2010-01-05 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Montag 04 Januar 2010 19:16:32 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
   Am Montag 04 Januar 2010 13:25:47 schrieb Will Ness:
Euler's sieve is
   
 sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
 where (h,t) = span (< p*p) xs
  
   Not quite. That's a variant of the postponed filters, it crosses off e.g.
   45 twice, once as 3*15 and once as 5*9 (okay, it has already been removed
   by the first, so let's say 45 appears in two lists of numbers to be
   removed if present).
 
  there won't be any such. whatever is removed first, is just skipped second
  (and third, etc). 
 
 ((45:(offer 47 when demanded)) `minus` (45:(next will be 51 when demanded)))
  `minus` (45:(next will be 55 when demanded))
 
 So there are two attempts to tell the generator to not output 45. To the
 second, it answers I already knew that, but the request is made
 nevertheless.

yes, of course. 

 ... There are two attempts to eliminate 45.

I would say there are two requests to not have 45 in the output.

  I don't see any problem here. As Melissa (and yourself, I think) have
  shown, double hits on multiples are few and far between.
 
 It's not a problem, it just means it's not Euler's sieve, because that
 attempts to eliminate each composite exactly once.

yes I see now. My bad. Wasn't reading that wikipedia page with enough attention 
to detail. It uses the modified (culled, so far) numbers to find out the next 
multiples to be eliminated, not just the prime itself like my code does.

Your solution is indeed exponential:

  euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
  primes = 2:euler [3,5..]

  
  primes 
  = 2:euler (as@[3,5..])
   3:euler (bs@(tail as `minus` map (3*) as))
 5:euler (cs@(tail bs `minus` map (5*) bs))


There are two separate look-back pointers to /as/ in /bs/, and there are two 
such in /cs/, to /bs/. The book-keeping machinery explodes.



  Also, it uses no filters, i.e. no individual number divisibility testing.
  The filters refers first of all to testing an individual number to decide
  whether to keep it in or not.
 
 Umm, the postponed filters says keep everything until p^2, then eliminate
 (filter out, remove) multiples of p in the remainder, after that, pick next
 prime.
 That's precisely what the above does. It doesn't do the filtering out by
 divisibility testing but by minus (hence more efficiently). I would say
 that's a variant of the postponed filters.
 

A filter usually (as in Haskell's 'filter') is about testing individual 
elements with a predicate function. There is of course a filtering effect in the 
two lists' element comparison that 'minus' performs, so that's debatable. Even 
the PQ code performs filtering in this wider sense.


   Euler's sieve is never attempting to remove a number more than once,
   that's
 
  How's that possible?
 
 http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Euler.27s_Sieve
 
 C) The number after the previous prime is also a prime. *Multiply each 
 number /that's left/
 in the list starting from this prime by this prime and discard the products*.


yes. Wasn't paying attention to that, more to the intent of it.

There's of course enormous vagueness in how exactly it is to be performed, in 
the unbounded case, which you uncovered here.

 
  It can't have foresight, right?
 
 
  But it has :) By only trying to eliminate products of the prime p 
 currently under consideration *with numbers (= p) /which have not/
 /yet been eliminated/ from the list*, it is known in advance that all these
 products are still in the list.


missed that.


 When p is under consideration, the list contains (apart from the primes < p) 
precisely the numbers whose smallest prime factor is >= p.
 
   the outstanding feature of it. Unfortunately, that also makes it hard to
   implement efficiently. The problem is that you can't forget the primes
   between p and p^2, you must remember them for the removal of multiples of
   p later on.


not just primes - all the composites not yet removed, too. So it can't even be 
implemented on shared mutable storage if we want to delay the removal (as we 
must, in the unbounded case) - the composites will get removed so their multiples 
must actually all be produced first! 


  The more pressing problem with that code is its linear structure of course
  which gets addressed by the tree-folding merge etc.
 
 
 Which unfortunately introduces its own space problem :(



 Try a different minus:
 
 xxs@(x:xs) `minus` yys@(y:ys)
= case compare x y of
LT -> x : xs `minus` yys
EQ -> xs `minus` ys
GT -> error ("trying to remove " ++ show y ++ " a second time")
 
 Your code is not. It is, however, much faster.

I understand now. Thanks!


 
   Its performance is really horrible though.


exponential, empirically as well.



 It just occured to me that the accumulation list is just

[Haskell-cafe] Re: FASTER primes

2010-01-05 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Dienstag 05 Januar 2010 14:49:58 schrieb Will Ness:
   ... There are two attempts to eliminate 45.
 
  I would say there are two requests to not have 45 in the output.
 
 Thers are many possible ways to phrase it.
 
 
   Your solution is indeed exponential:
 
euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
primes = 2:euler [3,5..]
 
 
primes
 = 2:euler (as@[3,5..])
     3:euler (bs@(tail as `minus` map (3*) as))
       5:euler (cs@(tail bs `minus` map (5*) bs))
 
 

Re-write:

 euler s = head s:euler (tail s `minus` map(head s*) s)
 primes  = euler [2..]

 primes  = euler $ rollFrom [2] 1
  = 2:euler (rollFrom [3] 1 `minus` map (2*) (rollFrom [2] 1))
  rollFrom [3,4] 2 `minus` rollFrom [4] 2
-- rollFrom [3] 2 --
  = 2:3:euler (rollFrom [5] 2 `minus` map(3*) (rollFrom [3] 2))
   rollFrom [5,7,9] 6 `minus` rollFrom [9] 6
-- rollFrom [5,7] 6 --
  = 2:3:5:euler (rollFrom [7,11] 6 `minus` rollFrom [25,35] 30)
   [7,11, 13,17, 19,23, 25,29, 31,35] 30
 -- rollFrom [7,11,13,17,19,23,29,31] 30 --
  = ...

   where
 rollOnce (x:xs) by = (x, tail xs ++ [x+by])
 rollFrom xs by = concat $ iterate (map (+ by)) (xs)
 multRoll xs@(x:_) by p = takeWhile (< (x+p*by)) $ rollFrom xs by
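
A quick sanity check that rollFrom really unrolls the wheel (assuming just the
definitions above):

  -- take 9 (rollFrom [5,7] 6)  ==  [5,7,11,13,17,19,23,25,29]
  -- take 6 (rollFrom [2]   1)  ==  [2,3,4,5,6,7]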


so, reifying, we get


 data Roll a = Roll [a] a

 rollOnce (Roll (x:xs) by) 
  = (x,Roll (xs ++ [x+by]) by)

 rollFrom (Roll xs by) 
  = concat $ iterate (map (+ by)) (xs)

 multRoll r@(Roll (x:_) by) p 
  = Roll (takeWhile (< (x+p*by)) $ rollFrom r) (by*p)

 primes  = euler $ Roll [2] 1
 euler r@(Roll xs _)
= x:euler (Roll (mxs `minus` map (x*) xs)  mby)
  where  
   (x,r') = rollOnce r
   (Roll mxs mby) = multRoll r' x
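
A quick check, with minus/merge as usual and the code above (the expected
result is of course just the standard primes prefix):

  -- take 10 primes  ==  [2,3,5,7,11,13,17,19,23,29]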


There are many extra primes pre-calculated inside the Roll, of course (up to p^2 in
fact, for p = primes!!n ), so this needs to be somehow tapped into, by writing a
specialized

 nthPrime n = 

to be called instead of (!! n), and also

 primesUpTo n = 


This calculated primes !! 1000 == 7927 in 3 seconds for me, interpreted, on my
old slowish laptop. It's supposed to have all the primes up to 7927^2 = 62837329
inside the Roll (if I'm not mistaken - or is it?). That's about the 3.72 millionth
prime, according to WolframAlpha. (nah, it can't be that much). But it is surely
not horribly slow.

Is this, in fact, the wheels' spiral?


 
 
  not just primes - all the composites not yet removed, too.
 
 Between p and p^2, there are only primes left, fortunately.

but (map (*p) ks) needs access to the original, non-culled numbers - primes,
composites and all. (?)



 
  So it can't even be implemented on shared mutable storage if we
  want to delay the removal (as we must, in unbounded case) -
 
 Yes. And it's not nice in the bounded case either.
 
 
 
 






[Haskell-cafe] Re: FASTER primes

2010-01-05 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:


 So we must make sure that the list of composites that primes' consumes is
 not the same as that which primes'' consumes.


yes that is what I had done too. Duplicated everything. Turns out, it works
exactly as you told it would when using the compiler switch, -fno-cse, thanks!


  I used the switch; it didn't help at all. The only thing I can see is

wrong. I didn't. When I did, it worked.


  Unfortunately it grows, as you've said - 23MB for 2 mln. :|
 
 And I've found out why. Change the definition of tfold to
 
 tfold f (a: ~(b: ~(c:xs)))
  = (a `f` (b `f` c)) `f` tfold f xs
 
 and memory stays low (things are going much slower, though).


(forced by gmane poster to delete unusually many of your comments today...)

Interesting... As for the structure, I chose it trying to minimize the 
estimated average cost of a composite production, Sum (1/p)*depth. 


 You can make a compromise by using the above tfold (which is no longer a
 tree-fold) and grouping (and merging) the multiples in a slower-growing
 manner,
 ...
 
 memory still grows, but much slower, in my tests, due to the much smaller
 GC time, it's a bit faster than the version with the original tfold.


Great! :)
 








[Haskell-cafe] Re: FASTER primes

2010-01-05 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Montag 04 Januar 2010 22:25:28 schrieb Daniel Fischer:

  memory still grows, but much slower, in my tests, due to the much smaller
  GC time, it's a bit faster than the version with the original tfold.
 
 Not for larger inputs (but not so large that the tree-fold dies OOM).
 Fix rfold:
 
 rfold f [x] = x
 rfold f xs = rfold f (pairwise f xs)
 
 and it's faster also for those.


Niiice. This is just great!  :)

I tried a two-step feed BTW (that's three separate sets of lists), with the
original structure. It ran with the same speed as your new version (10..20% faster)
but with the memory of the previous one :) (80M for 8 mil primes vs the new
one's 10M). But your new structure is just great! I hoped there is something
better, that's why I posted it here in the first place. 

'pairwise' puts odd leaves higher on the right. It might be better if it were so
on the left, for the frequency of production is higher. 


Thanks a lot for your comments!


 
 






[Haskell-cafe] Re: FASTER primes

2010-01-04 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Sonntag 03 Januar 2010 09:54:37 schrieb Will Ness:
 
  Daniel Fischer daniel.is.fischer at web.de writes:
  
   But there's a lot of list constructuion and deconstruction necessary for
   the Euler sieve.
 
  yes. Unless, of course, a smart compiler recognizes there's only one
  consumer for the values each multiples-list is producing, and keeps it
  stripped down to a generator function, and its current value. I keep
  thinking a smart compiler could eliminate all these span calls and replace
  them with just some pointer manipulation...
 
 
 Of course I'm no smart compiler, but I don't see how it could be even
 possible to replace the span calls with pointer manipulation when dealing
 with lazily generated (infinite, if we're really mean) lists. Even when
 you're dealing only with strict finite lists, it's not trivial to do
 efficiently.

I keep thinking that storage duplication with span, filter etc. is not really 
necessary, and can be replaced with some pointer chasing - especially when 
there's only one consumer present for the generated values. 

What I mean is thinking of lists in terms of produce/consumer paradigm, as 
objects supporting the { pull, peek } interface, keeping the generator inside 
that would produce the next value on 'pull' request and keep it ready for 
any 'peek's.
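
A minimal sketch of that { pull, peek } idea in plain Haskell (the names
Producer, fromStep and the closure-based intsFromBy below are mine, just to
make the interface concrete; they are not from any library):

  data Producer a = Producer { peek :: a, pull :: Producer a }

  -- build a producer from a step function over some hidden state
  fromStep :: (s -> (a, s)) -> s -> Producer a
  fromStep step s = let (x, s') = step s in Producer x (fromStep step s')

  -- e.g. an arithmetic-progression producer: holds just currentVal and step
  intsFromBy :: Integer -> Integer -> Producer Integer
  intsFromBy n by = fromStep (\k -> (k, k + by)) n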

Euler's sieve is

 sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
  where (h,t) = span (< p*p) xs

Everything lives only through access, so (sieve (tail primes) [5,7]) would 
create an object with the generator which has the 'span' logic inlined:

 sieve ps xs = make producer such that
 p := pull ps -- alter ps as well (actually pull a value from it)
 q := p*p
  peek = x := peek xs
         if x < q then x else peek (remake self)
  pull = x := peek xs
         if x < q then pull xs else pull (remake self)
 remake = ys := minus xs (intsFromBy q (2*p))
  self := sieve ps ys

Here the only thing that gets created are the 'minus' nodes which essentially 
maintain pointers into the two streams that they consume. 'intsFromBy' only has 
to maintain two integers inside it (currentVal and step) as there's no need for 
it to maintain any storage for its results, as they are immediately consumed. A 
persistent list would be represented by a different kind of producer which 
would be given a storage to operate on, upon creation (as would the top level 
variable like 'primes').

The real difference here is between those producers whose values will only be 
consumed once, by one specific consumer, and those which values may be needed 
more than once, so need really to be maintained in some storage. If not - span, 
filter, map, whatever - they all are just little modifiers on top of the real 
producers, which may or may not also have an actual storage maintained by them.


  again, what I mean is, not _where_ I start crossing them off in a PQ, but
  _when_. The article's code starts crossing them off _at_ p^2 - by adding
  p^2+2p into the PQ - _as_ _soon_ as p itself is reached. It won't surface
  until p^2 will be considered for a prime; it'll lay dormant deep inside the
  queue's guts. When reaching 7919, the thousand (i.e. pi(7919) ) entries
  will hang out inside the PQ - instead of just 24. A memory blowup. (this is
  of course fixed in Melissa's ZIP package). Of course due to the nature of
  PQ it might actually not hurt the performance for a while, depending on
  partcular PQ implementation. Complexity _and_ constant factors.
 
 
 It will start to have an impact pretty soon. Assuming at least one of the
 relevant PQ operations to be Theta(log size), each composite between ~400
 and ~40 (rough estimates) will take something like twice as long to
 handle. It will start to hurt really badly only a while later, though, as
 a guesstimate, with more than a million primes in the PQ, memory will have
 a hard time.


Exactly!


 
   If, on the other hand, you start crossung off at 2*p, when the main sieve
   is at 10^7, the size of the PQ is  65, at 10^8, the size is more
   than 5.5 million. That starts to become a memory problem rather soon.
 
  here you don't have a choice or when to add it - you have to add it at p
  itself - so the problem is clear. But even when you cross at p^2, the
  question remains, of when you add the p^2 entry into the PQ. That was my
  point.
 
  Postponed Filters code makes this clear, and thus hard to err on.
  Unfortunately, it wasn't present  _in_  the article.


   I think that remark was meant to apply to composite removal, not Turner's
   sieve.
 
  It is right there on page 2, right when the Turner's sieve is presented and
  discussed. The only explanation that I see is that she thought of it in
  regards to the imperative code, just as her analysis concentrates only on
  calculation aspects of the imperative code itself.
 
 
 To be fair, she writes:
 
 Let us

[Haskell-cafe] Re: FASTER primes

2010-01-04 Thread Will Ness
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

 
 Will Ness wrote:
 
  I keep thinking that storage duplication with span, filter etc. is not
  really 
  necessary, and can be replaced with some pointer chasing - especially when 
  there's only one consumer present for the generated values. 
  
  What I mean is thinking of lists in terms of produce/consumer paradigm, as 
  objects supporting the { pull, peek } interface, keeping the generator
  inside that would produce the next value on 'pull' request and keep it
  ready for any 'peek's.
  
  Euler's sieve is
  
   sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
    where (h,t) = span (< p*p) xs
 
  [...]
 
  The real difference here is between those producers whose values will
  only be consumed once, by one specific consumer, and those which values
  may be needed more than once, so need really to be maintained in some
  storage. If not - span, filter, map, whatever - they all are just little
  modifiers on top of the real producers, which may or may not also have 
  an actual storage maintained by them.
 
 (I haven't followed the whole thread, but hopefully I have enough grasp
 of it to make a useful remark. :))
 
 Concerning lists as producer/consumer, I think that's exactly what lazy
 evaluation is doing. Neither  filter ,  map  or  span  evaluate and
 store more list elements that strictly necessary.


I always suspected as much, but was once told that Chris Okasaki has shown that 
any filter etc must allocate its own storage. With the peek/pull they don't 
have to, if they are nested, and the deepest one from the real storage gets 
pulled through some pointer chasing eventually. Span isn't so easily compiled 
out too or is it? But that might be a minor point.

For me, a real smart compiler is one that would take in e.g. (sum $ take n $ 
cycle $ [1..m]) and spill out a straight up math formula, inside a few ifs 
maybe (just an aside). 

Such a smart compiler might even be able to derive a well performing code right 
from the Turner's sieve. :)


 Sure, creating a list head only to immediately consume it is somewhat
 inefficient -- and the target of stream fusion[1] -- but this is an
 overhead of how list elements are stored, not how many.

it might be equivalent to the (imagined) producer's storing its 'current' value 
inside its frame.

How much can we rely on the run-time to actually destroy all the passed-over 
elements and not hang on to them for some time? Is this that compiler switch 
that Daniel mentioned? Is it reliable?


 
 You can try to implement the Euler sieve with producers by using a type like
 
data Producer a = forall s. Producer {
   state :: !s, next :: s -> s, value :: s -> a }
 
 but I think this will be quite difficult; it's not clear what and thus
 how big the state will be. (See [1] for choosing a good type.)


I did that once in Scheme, as per SICP, with 'next' hanging in a stream's tail. 
Put filters and maps on top of that (inside the new 'next' actually). But that 
used Scheme's lists as storage. Another one was actual producers/modifiers 
with {pull,peek} interface. It even produced some primes, and some Hamming 
numbers. Then I saw Haskell, and thought I'd get all that for free with its 
equational reasoning.

But I get the impression that GHC isn't working through equational reasoning?.. 
I see all this talk about thunks etc.


 Concerning the sieves, there is a fundamental difference between the
 imperative sieve and the functional sieves, regardless of whether the
 latter start at p or p^2 or use a priority queue. Namely, the imperative
 sieve makes essential use of *pointer arithmetic*. The key point is that
 in order to cross off the multiples
 
 p, 2*p, 3*p, ...
 
 of a prime, the algorithm can directly jump from the (k*p)-th to the
 (k*p+p)-th array element by adding  p  to the index. The functional
 versions can never beat that because they can't just jump over  p
 constructors of a data structure in O(1) time.


We can directly jump to the next multiple too, it is called (+). :) :)

But seriously, the real issue is that we have to merge the produced streams of 
multiples, while the mutable-storage code works on same one, so its merging 
cost is zero. And even if we are smart to merge them in a tree-like fashion, 
we still have no (or little) control over the compiler's representation of 
lists and retention of their content and whether it performs stream fusion or 
not (if we use lists).
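
For reference, that imperative inner loop really is just an index bumped by p
each time - a throwaway sketch with a mutable unboxed array (my own code, not
from the article or anyone's library; assumes n >= 2):

  import Control.Monad (forM_, when)
  import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
  import Data.Array.Unboxed (UArray, assocs)

  sieveTo :: Int -> [Int]            -- all primes up to n
  sieveTo n = [i | (i, True) <- assocs a]
   where
    a :: UArray Int Bool
    a = runSTUArray $ do
          arr <- newArray (2, n) True
          forM_ (takeWhile (\p -> p*p <= n) [2 ..]) $ \p -> do
            isP <- readArray arr p
            when isP $
              forM_ [p*p, p*p + p .. n] $ \i ->   -- the O(1) jump: index += p
                writeArray arr i False
          return arr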

If you could take a look at the tree-merging primes and why it uses too much 
memory, it would be great. The code is in Daniel's post to which I replied, or 
on the haskellwiki Prime_numbers page (there in its rudimentary form). It's a tangent 
to your VIP code, where instead of People structure an ordered list is just 
maintained as a split pair, of its known (so far, finite) prefix and the rest 
of it. Then under merging these split pairs form a monoid, s can

[Haskell-cafe] Re: FASTER primes

2010-01-04 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 Am Sonntag 03 Januar 2010 09:54:37 schrieb Will Ness:
 
  The quesion of a memory blowup with the treefolding merge still remains.
  For some reason using its second copy for a feeder doesn't reduce the
  memory (as reported by standalone compiled program, GHCi reported values
  are useless) - it causes it to increase twice.
 
 
 I have a partial solution. The big problem is that the feeder holds on to 
 the beginning of comps while the main runner holds on to a later part. Thus
 the entire segment between these two points must be in memory. 

 So have two lists of composites (yeah, you know that, but it didn't work
 so far).

 But you have to force the compiler not to share them: enter -fno-cse.
 The attached code does that (I've also expanded the wheel), it reduces the
 memory requirements much (a small part is due to the larger wheel, a factor
 of ~5 due to the non-sharing).


I don't understand. What is there to be shared? Each multiples list is consumed 
only at one point; there's nothing to be shared. Do you mean the compiler still 
hangs on to them? If so, why?? 

I used the switch; it didn't help at all. The only thing I can see is different 
is that all my interim data which I named with inner vars you moved out to the 
top level as functions. Is that what did the trick? What would be the reason to 
hang on to the already consumed data that is inaccessible to any active 
consumer? Why not make the forgetful behaviour the norm - especially where 
remembering is pointless??




 It still uses much more memory than the PQ, and while the PQ's memory
 requirements grow very slowly, the tree-fold merge's still grow rather fast
 (don't go much beyond the 10,000,000th prime), I'm not sure why.


You did it! It's now 7M for 1,000,000th prime, instead of 52M before. Making 
the pattern lazy in mergeSP was probably an important fix too. :)

Unfortunately it grows, as you've said - 23MB for 2 mln. :|  

PQ stays at just 2MB. 



 
 Attachment (V13Primes.hs): text/x-haskell, 3621 bytes
 
 






[Haskell-cafe] Re: FASTER primes

2010-01-04 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Montag 04 Januar 2010 13:25:47 schrieb Will Ness:
 
  Euler's sieve is
 
   sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
     where (h,t) = span (< p*p) xs
 
 Not quite. That's a variant of the postponed filters, it crosses off e.g.
 45 twice, once as 3*15 and once as 5*9 (okay, it has already been removed by
 the first, so let's say 45 appears in two lists of numbers to be removed if
 present).

there won't be any such. Whatever is removed first, is just skipped second (and 
third, etc). 45 does appear twice on the two multiples lists (3- and 5-). But it 
is removed by the first, and skipped by the second. And there's no removal here in 
the first place. There is no pre-filled storage. All there is, is some lists 
comparison, and lists are just generators (I so keep hoping).

I don't see any problem here. As Melissa (and yourself, I think) have shown, 
double hits on multiples are few and far between.

Also, it uses no filters, i.e. no individual number divisibility testing. 
The filters refers first of all to testing an individual number to decide 
whether to keep it in or not. Euler's sieve removes multiples in advance, so 
there's no testing and no filtering, only comparison. It follows the 
_Postponed_ Filter's framework in postponing the action until the right moment; 
the action itself is two lists comparison and skipping of the equals (i.e. 
the minus action).


 Euler's sieve is never attempting to remove a number more than once, that's


How's that possible? On Wikipedia it says, it removes multiples of 3; then 
multiples of 5; etc. So it just looks for each number it is removing, if it is 
there, and if so, removes it. I would argue that looking whether to remove or 
not, is part of attempting to remove. It can't have foresight, right?

 the outstanding feature of it. Unfortunately, that also makes it hard to
 implement efficiently. The problem is that you can't forget the primes
 between p and p^2, you must remember them for the removal of multiples of p
 later on.

you're right, every formulation of these algorithms is always done in the 
imperative, mutable-storage setting. They always speak of removing numbers 
etc. 

The more pressing problem with that code is its linear structure of course 
which gets addressed by the tree-folding merge etc.


  An (inefficient but faithful) implementation would be
 
 euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)


I think it does exactly the same thing, computationally, except it does so, 
again,  _prematurely_  exactly in Turner's sieve's fashion - for each prime 
found, _as_ _soon_ as it is found. If it is faithful, so is my code. :)


 Its performance is really horrible though. 

makes sense, as for Turner's. See, the _when_ thing really helps to see what's 
going on here. 


 A much better implementation is
 
 primes = 2:3:euler hprs [] (scanl (+) 5 (cycle [2,4]))
   where
 hprs = 5:drop 3 primes
 euler (p:ps) acc cs = h ++ euler ps (tail (acc ++ h)) (t `minus` comps)
   where
     (h,t) = span (< p*p) cs
 comps = map (*p) (acc ++ cs)


this looks like the {2,3} wheel reimplemented and inlined. No point in improving 
anything until the linear structure is turned into a tree. 


  
   To be fair, she writes:
  
   ... (This optimization does not affect the time complexity of the sieve,
   however, so its absence from _the_ _code_ in Section 1
   is _not_ our cause for worry.)
 
  A-HA!
 
  But its absence from _*that*_ _*code*_ WAS the *major* cause for worry, as
  dealing with it worked wonders on its complexity and constant factors.
 
 I think you're overinterpreting it. 


I might have. I don't mean to nitpick, honest. It's just how it looked to 
me: Turner's? - bad. Squares? - won't help, it's a _minor_ thing; don't even 
go there! etc. 

As I've said, geniuses leap, and fly. I just wanted to _walk_ down that road, 
not skipping any steps. And it turned out, the very first step _really_ pays up 
_big_. So big in fact, it might be argued that any subsequent improvement is 
just an advanced optimization, in terms of presenting an introductory code 
which isn't horribly inefficient and yet is short and clear.



 






[Haskell-cafe] Re: FASTER primes

2010-01-04 Thread Will Ness
Emil Axelsson emax at chalmers.se writes:

 
  For me, a real smart compiler is one that would take in e.g. (sum $ 
  take n $ 
  cycle $ [1..m]) and spill out a straight up math formula, inside a few ifs 
  maybe (just an aside). 
 
 (Also an aside, I couldn't resist...)
 
 Then I'm sure you'd say that Feldspar [1] has a smart compiler :)


but it didn't produce

f n m = if n <= m then n*(n+1)/2 else
let (q,r)=quotRem n m
in q*(m*(m+1)/2) + r*(r+1)/2

:)
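
(For the record, that closed form does check out against the definition, once
the integer division is spelled out; qq as in my earlier post:)

  qq n m = sum $ take n $ cycle [1..m]

  f n m | n <= m    = n*(n+1) `div` 2
        | otherwise = let (q,r) = quotRem n m
                      in  q*(m*(m+1) `div` 2) + r*(r+1) `div` 2

  -- and [ f n m == qq n m | n <- [1..60], m <- [1..15] ]  evaluates to True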

 
 The above expression written in Feldspar and the resulting C code can be 
 found here:
 
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=15592#a15593
 
 
 / Emil
 
 [1] http://feldspar.sourceforge.net/
 






[Haskell-cafe] Re: FASTER primes

2010-01-03 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Samstag 02 Januar 2010 14:13:29 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
   Am Mittwoch 30 Dezember 2009 20:46:57 schrieb Will Ness:
Daniel Fischer daniel.is.fischer at web.de writes:
 Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer:
   especially the claim that going by primes squares
   is a pleasing but minor optimization,
 
  Which it is not. It is a major optimisation. It reduces the
  algorithmic complexity *and* reduces the constant factors
  significantly.

 D'oh! Thinko while computing sum (takeWhile (<= n) primes) without
 paper. It doesn't change the complexity, and the constant factors are
 reduced far less than I thought.
   
I do not understand. Turner's sieve is
   
  primes = sieve [2..]
   where
     sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
   
and the Postponed Filters is
   
  primes = 2: 3: sieve (tail primes) [5,7..]
   where
     sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
       where (h,~(_:t)) = span (< p*p) xs
   
 Are you saying they both exhibit the same complexity?
  
   No. They don't.
   But if you're looking at an imperative (mutable array) sieve (that's
   simpler to analyse because you don't have to take the book-keeping costs
   of your priority queue, heap or whatever into account), if you start
   crossing out
 
  The key question then, is _*WHEN*_ and not _*WHAT*_. As is clearly
  demonstrated by the case of Turner/Postponed filters, the work that is done
  (of crossing numbers off) is the same
 
 Crossing off is only part of the work. Most of the work is checking whether 
 to cross off the number in this round. And Turner does a lot more of that
 than the postponed filters.

Exactly the point I tried to make. :) 

  - _when_ it is actually done - but
  Turner's starts out so _prematurely_ that it is busy doing nothing most of
  the time.
 
 It's not doing nothing. It's just doing a lot of superfluous work. It is
 _achieveing_ nothing most of the time, though.

again, yes. :)

 Take 7919, the thousandth prime. The postponed filters decide to keep it when
 filtering out the multiples of 89, the twenty-fourth prime. Turner also 
 divides it by all 975 primes in between. That is a lot of real but futile 
 work.

yes.

  Thus its function call overhead costs pile up enormously,
  overstaging the actual calculation.
 
 It's not only the function calls. Division is expensive, too.

yes, that's what I meant - the cost of calling all the functions that - we know 
in advance will - have nothing to do eventually.


  So analyzing that calculation in the premature execution setting is missing
  the point, although helpful after we fix this, with the Postponed Filters.
  _Only then_ the finer points of this algorithm's analysis can be applied -
  namely, of avoiding testing primes divisibility altogether. And _if_ a fast
  cheap primality test were to have existed, the filtering versions would win
 
 Sorry, I can't follow. What's the point of a primality test here? Every
 number whose multiples we want to remove is prime, what's left to test?

sorry for being obtuse. I meant in a filtering sieve (i.e. postponed filters) 
vs the multiples-removing sieves: if only a cheap primality test had existed 
(by some _magic_) which _could_ be run on _every_ number (unlike the costly 
divisibility test), then the filtering sieves would win. Hypothetically.

  over, because they progressively cull the input sequence so there would be
  no double hits as we have when merging the multiples (whether from lists or
  inside the PQ).
 
 But there's a lot of list constructuion and deconstruction necessary for the
 Euler sieve. 

yes. Unless, of course, a smart compiler recognizes there's only one consumer 
for the values each multiples-list is producing, and keeps it stripped down to 
a generator function, and its current value. I keep thinking a smart compiler 
could eliminate all these span calls and replace them with just some pointer 
manipulation...

 That may be more work than the multiple hits cause.


so that too would make filters win; only _if_ the cheap primality test 
existed!.. 

 
   the multiples of p with 2*p, you have
   ...
 
  There are two questions here - where to start crossing off numbers, and
  when. If you'd start at 2*p maybe the overall complexity would remain the
  same but it'll add enormous overhead with all those duplicate multiples.
 
 The additional duplicate multiples aren't the problem. Sure, the numbers
 having a prime divisor larger than the square root would be crossed off one
 additional time,  but that isn't so much per se. The additional crossings off
 are O(bound), so they don't harm the time complexity. But they can have
 effects which multiply the running time by a large constant.

yes, exactly what I wanted to say

[Haskell-cafe] Re: FASTER primes

2010-01-03 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 
 ... It was a big STOP sign on the way to 
 Postponed Filters - Euler's - Bird's merged multiples - tree-merging (with 
 wheel) road of little steps, and used as a justification for her to make a
 big leap across the chasm towards the PQ code. 

correction: across the /supposed/ chasm. 

There is no chasm. There is a nice straight freeway, with rest stops and 
gas stations, and exits to local roads going across the county in every 
which way. :) 


 .. and that is of course a matter of personal preference. A genius is
 perfectly capable of making big leaps across chasms. Heck they might even
 be able to fly :)





[Haskell-cafe] Re: FASTER primes

2010-01-02 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Mittwoch 30 Dezember 2009 20:46:57 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
   Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer:
 especially the claim that going by primes squares
 is a pleasing but minor optimization,
   
Which it is not. It is a major optimisation. It reduces the algorithmic
complexity *and* reduces the constant factors significantly.
  
   D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper.
   It doesn't change the complexity, and the constant factors are reduced
   far less than I thought.
 
  I do not understand. Turner's sieve is
 
primes = sieve [2..]
 where
  sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
 
  and the Postponed Filters is
 
primes = 2: 3: sieve (tail primes) [5,7..]
 where
  sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
    where (h,~(_:t)) = span (< p*p) xs
 
  Are you saying they both exhibit the same complexity?
 
 No. They don't.
 But if you're looking at an imperative (mutable array) sieve (that's simpler
 to analyse because you don't have to take the book-keeping costs of your
 priority queue, heap or whatever into account), if you start crossing out 


The key question then, is _*WHEN*_ and not _*WHAT*_. As is clearly demonstrated
by the case of Turner/Postponed filters, the work that is done (of crossing
numbers off) is the same - _when_ it is actually done - but Turner's starts out
so _prematurely_ that it is busy doing nothing most of the time. Thus its
function call overhead costs pile up enormously, overstaging the actual
calculation. 

So analyzing that calculation in the premature execution setting is missing the
point, although helpful after we fix this, with the Postponed Filters. _Only
then_ the finer points of this algorithm's analysis can be applied - namely, of
avoiding testing primes divisibility altogether. And _if_ a fast cheap primality
test were to have existed, the filtering versions would win over, because they
progressively cull the input sequence so there would be no double hits as we
have when merging the multiples (whether from lists or inside the PQ).


 the multiples of p with 2*p, you have
 
 sum [bound `div` p - 1 | p <- takeWhile (<= sqrt bound) primes]
 
 crossings-out, that is Theta(bound*log (log bound)). If you eliminate 
 multiples of some small primes a priori (wheel), you can reduce the constant 
 factor significantly, but the complexity remains the same (you drop a few
 terms from the front of the sum and multiply the remaining terms with 
 phi(n)/n, where n is the product of the excluded primes).
 
 If you start crossing out at p^2, the number is
 
 sum [bound `div` p - (p-1) | p <- takeWhile (<= sqrt bound) primes].
 
 The difference is basically sum (takeWhile (<= sqrt bound) primes), which I 
 stupidly - I don't remember how - believed to cancel out the main term. 
 It doesn't, it's O(bound/log bound), so the complexity is the same.

 Now if you take a stream of numbers from which you remove composites, having
 a priority queue of multiples of primes, things are a little different.
  If you start crossing out at 2*p, when you are looking at n, you have more 
 multiples in your PQ than if you start crossing out at p^2 (about pi(n/2) 
 vs. pi(sqrt n)), so updating the PQ will be more expensive. But updating the 
 PQ is O(log size), I believe, and log pi(n) is O(log pi(sqrt n)), so I think 
 it shouldn't change the complexity here either. I think this would have 
 complexity O(bound*log bound*log (log bound)).

There are two questions here - where to start crossing off numbers, and when. If
you'd start at 2*p maybe the overall complexity would remain the same but it'll
add enormous overhead with all those duplicate multiples. No, the question is
not where to start, but when. PQ might hide the problem until the memory blows
up. Anything that we add that won't have any chance of contributing to the final
result, is added for nothing and only drives the total cost up needlessly. 
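
For reference, the two counts quoted above can be written down directly (with
any correct primes list in scope, and staying in Int to avoid the sqrt):

  crossingsFrom2p bound = sum [ bound `div` p - 1
                              | p <- takeWhile (\q -> q*q <= bound) primes ]
  crossingsFromP2 bound = sum [ bound `div` p - (p-1)
                              | p <- takeWhile (\q -> q*q <= bound) primes ]

  -- their difference is exactly  sum (takeWhile (\q -> q*q <= bound) primes)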

 
  I was under the impression that the first shows O(n^2) approx., and the second 
  one O(n^1.5) (for n primes produced).
 
 In Turner/postponed filters, things are really different. Actually, Ms. 
 O'Neill is right, it is a different algorithm. In the above, we match each 


what _is_ different is divisibility testing vs composites removal, which follows
from her in-depth analysis although it is never quite formulated in such words in
her article. But nothing matters until the premature starting up is eliminated,
and that key observation is missing from the article too - worse, it is
brushed off with the casual remark that it is a pleasing but minor
optimization. Which remark, as you show here, is true in the imperative,
mutable-storage setting, but is made in an article about functional code, in
regard to the functional code of Turner's sieve. So the key

[Haskell-cafe] Re: FASTER primes

2009-12-30 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Mittwoch 30 Dezember 2009 01:04:34 schrieb Will Ness:
 
   While I haven't detected that with the primes code, I find that in my
   ghci your code is approximately 2.5 times faster than ONeill or Bayer
   when interpreted (no difference in scaling observed), while when 
   compiled with -O2, ONeill is approximately three times as fast as 
   your code
  
  that was what I was getting at first too, before I've put into my code the
  _type_signatures_ and the specialize _pragmas_ as per her file. Then it
  was only 1.3x slower, when compiled (with about same asymptotics and 
  memory usage).
 
 
 Specialising your code to Int makes it half as fast as ONeill here 
 (as an executable).
 That is largely due to the fact that your code uses much more memory here 
 (54MB vs. 2MB for the millionth prime), though, the MUT times have a ratio 
 of about 1.5.

I'm an unsophisticated tester. I just use

 GHC -O2 -c filename.hs
 GHCi filename

and then it says ( for primes()!!100 )

(8.24 secs, 1906813836 bytes) for my code, and   
(6.09 secs, 1800873864 bytes) for O'Neill's  

But now when I've looked at system resources I see this too. Well, it means
we've found where the PQ code is better. OK.


 Now an interesting question is, why does it use so much memory here?
 Can you send me your exact code so I can see how that behaves here?

will do. It's probably doing a lot more bookkeeping. Or it might be some impl
issue with scanl or span etc., and it'll go away if we'd recode it directly, 
who knows. We can only guess if we don't know the compiler in and out. That's
exactly what kept me off using the compiler. Guessing. Sheesh.

(and I did see a 2.0x speedup once when replacing one simple code snippet for 
its operationally equivalent twin).


 
  and twice as fast as Bayer as an executable, about twice as fast as your
   code and slightly slower than Bayer in ghci.
 
  see, this kind of inconsistency is exactly why I was concentrating only
  on one platform in measuring the speed - the interp'/GHCi combination.
 
 The problem with that is that one is primarily interested in speed for
 library functions, which are mostly used as compiled code.

right, and I used it as a measure of the code's fitness to the problem, so it was
only comparative to me.

 
  Especially when developing and trying out several approaches, to test with
  compiler just takes too long. :) And why should it give (sometimes) wildly
  different readings when running inside GHCi or standalone ??
 
 Good question.
 
 
   And I have huge memory problems in ghci with your code.
   That may be due to my implementation of merge and minus, though. You
   wrote 'standard' and I coded the straightforward methods.
 
  Here's what I'm using ...
 
 More or less the same that I wrote.

The rest is exactly as I've posted it, I've only added type signatures and
specialize pragmas, as per Melissa's code,


{-# SPECIALIZE primes :: () -> [Int] #-}
{-# SPECIALIZE primes :: () -> [Integer] #-}
primes :: Integral a => () -> [a]
primes () = 2:3:5:7:primes'
   where
    primes'   = [11,13] ++ drop 2 (rollFrom 11) `minus` comps
    (comps,_) = tfold mergeSP (pairwise mergeSP mults)
    mults     = map (\p-> fromList $ map (p*) $ rollFrom p) $ primes'


maybe it's about memoization of primes'. She writes something about it in 
her code. 


 
  It's just that the mutating code tends to be convoluted, like in the
  example I mentioned of quicksort. One has to read the C code with good
  attention to understand it.
 
 Convoluted is (often) an exaggeration. But I agree that the specification 
 of 'what' is usually easier to understand than that of 'how'.

well put.

 
  Normal Haskell is much more visually apparent, like
 
    primes = 2: 3: sieve (tail primes) [5,7..]
     where
      sieve (p:ps) xs = h ++ sieve ps (t `minus` tail [q,q+2*p..])
        where (h,~(_:t)) = span (< q) xs
              q          = p*p
 
 
 Yes.
 
  or
 
    primes = 2: 3: sieve [] (tail primes) 5
     where
      sieve fs (p:ps) x = [i | i <- [x,x+2..q-2], a!i]
                          ++ sieve ((2*p,q):fs') ps (q+2)
         where
          q     = p*p
          mults = [ [y+s,y+2*s..q] | (s,y) <- fs]
          fs'   = [ (s,last ms)    | ((s,_),ms) <- zip fs mults]
          a     = accumArray (\a b -> False) True (x,q-2)
                             [(i,()) | ms <- mults, i <- ms]
 
 
 Umm, really?
 I'd think if you see what that does, you won't have difficulties with a
 mutable array sieve.

You're right, bad example. :)


 






[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

2009-12-30 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:


 

No, it's my own code. Nothing elaborate, just sieving numbers 6k±1, twice as
fast as the haskellwiki code (here) and uses only 1/3 the memory. For the 
record:

.


thanks! will need to sift through it thoroughly... :) :)



 
  BTW I think a really smart compiler should just get a specification, like
  Turner's sieve, and just derive a good code from that by itself.
 
 Go ahead and write one. I would love such a beast.
 
 
  Another example would be
 
qq n m = sum $ take n $ cycle [1..m]
 
  which should really get compiled into just a math formula, IMO. Now _that_
  I would call a good compiler.
 
 Dream on, dream on, with hope in your heart.


Those who can't do, dream. And rant. :)


 
 Maybe you'd like 
 http://en.wikipedia.org/wiki/Shakespeare_(programming_language) ?


niice. 

:)


 
 
 
 






[Haskell-cafe] Re: FASTER primes

2009-12-30 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer:
   especially the claim that going by primes squares
   is a pleasing but minor optimization,
 
  Which it is not. It is a major optimisation. It reduces the algorithmic
  complexity *and* reduces the constant factors significantly.
 
  D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper.
 It doesn't change the complexity, and the constant factors are reduced 
 far less than I thought. 


I do not understand. Turner's sieve is

  primes = sieve [2..]
   where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

and the Postponed Filters is

  primes = 2: 3: sieve (tail primes) [5,7..]
   where 
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
      where (h,~(_:t)) = span (< p*p) xs

Are you saying they both exhibit the same complexity? I was under the impression that 
the first shows O(n^2) approx., and the second one O(n^1.5) (for n primes 
produced).





[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

2009-12-29 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness:
  Now _this_, when tested as interpreted code in GHCi, runs about 2.5x times
  faster than Priority Queue based code from Melissa O'Neill's ZIP package
  mentioned at the haskellwiki/Prime_Numbers page, with
  about half used memory reported, in producing 10,000 to 300,000 primes.
 
  It is faster than BayerPrimes.hs from the ZIP package too, in the tested
  range, at about 35 lines of code in total.
 
 That's nice. However, the important criterion is how compiled code (-O2)
fares. Do the relations continue to hold? How does it compare to a bitsieve?


Haven't gotten to that part yet. :)

But why is it more important? Would that not tell us more about the compiler
performance than the code itself? 

This code is just an endpoint (so far) in a short procession of natural stepwise
development of the famous classic Turner's sieve, through the postponed
filters, through to Euler's sieve, the merging sieve (i.e. Richard Bird's) and
on to the tree-fold merging, with wheel. I just wanted to see where the simple
normal (i.e. _beginner_-friendly) functional code can get, in a natural way.

It's not about writing the fastest code in _advanced_ Haskell. It's about having
clear and simple code that can be understood at a glance - i.e. contributes to
our understanding of a problem - faithfully reflecting its essential elements,
and because of _that_, fast. It's kind of like _not_ using mutable arrays in a
quicksort.

Seeing claims that it's _either_ Turner's _or_ the PQ-based code didn't feel
right to me somehow, especially the claim that going by primes squares is a
pleasing but minor optimization, what with the postponed filters (which serves
as the framework for all the other variants) achieving the orders of magnitude
speedup and cutting the Turner's O(n^2) right down to O(n^1.5) just by doing
that squares optimization (with the final version hovering around 1.24..1.17 in
the tested range). The Euler's sieve being a special case of Eratosthenes's,
too, doesn't lend credence to claims that only the PQ version is somehow uniquely
authentic and faithful to it.

Turner's sieve should have been always looked at as just a specification, not a
code, anyway, and actually running it is ridiculous. Postponed filters version,
is the one to be used as a reference point of the basic _code_, precisely
because it _does_ use the primes squares optimization, which _is_ essential to
any basic sieve.


 




[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

2009-12-29 Thread Will Ness
Eugene Kirpichov ekirpichov at gmail.com writes:

 
 2009/12/29 Will Ness will_n48 at yahoo.com:
  Daniel Fischer daniel.is.fischer at web.de writes:
 
 
 
  Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness:
   Now _this_, when tested as interpreted code in GHCi, runs about 2.5x 
   times
   faster than Priority Queue based code from Melissa O'Neill's ZIP package
   mentioned at the haskellwiki/Prime_Numbers page, with
   about half used memory reported, in producing 10,000 to 300,000 primes.
  
   It is faster than BayerPrimes.hs from the ZIP package too, in the tested
   range, at about 35 lines of code in total.
 
  That's nice. However, the important criterion is how compiled code (-O2)
  fares. Do the relations continue to hold? How does it compare to a bitsieve?
 
 
  Haven't gotten to that part yet. :)
 
  But why is it more important? Would that not tell us more about the compiler
  performance than the code itself?
 
 
 If you mean algorithmic complexity, you shouldn't care about a
 difference of 2.5x.

It's not just at one point; the asymptotics are _the_same_ across the range that
I've tested (admittedly, somewhat narrow). I measure local behavior simply as
the logBase, in the base of the ratio of problem sizes, of the ratio of run times.
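
Spelled out (a throwaway helper, the name is mine):

  localOrder n1 t1 n2 t2 = logBase (n2/n1) (t2/t1)
  -- e.g. a value near 1.2 for a doubled problem size reads as O(n^1.2), locally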

 If you mean actual performance for a particular task, you should
 measure the performance in realistic conditions. Namely, if you're
 implementing a program that needs efficient generation of primes,
 won't you compile it with -O2?

If I realistically needed primes generated in a real-life setting, I'd probably
have to use some C for that. If OTOH we're talking about a tutorial code that is
to be as efficient as possible without losing its clarity, being a reflection of
essentials of the problem, then any overly complicated advanced Haskell wouldn't
be my choice either. And seeing that this overly-complicated (IMO),
steps-jumping PQ-based code was sold to us as the only faithful rendering of
the sieve, I wanted to see for myself whether this really holds water.





[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

2009-12-29 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness:
  Now _this_, when tested as interpreted code in GHCi, runs about 2.5x times
  faster than Priority Queue based code from Melissa O'Neill's ZIP package
  mentioned at the haskellwiki/Prime_Numbers page, with
  about half used memory reported, in producing 10,000 to 300,000 primes.
 
  It is faster than BayerPrimes.hs from the ZIP package too, in the tested
  range, at about 35 lines of code in total.
 
 That's nice. However, the important criterion is how compiled code (-O2)
fares. Do the relations continue to hold? How does it compare to a bitsieve?
 

OK, I've tested it now. For some reason it runs about 1.3x times slower than
Melissa O'Neill's code when compiled, while taking about the same amount of
memory (going slightly worse actually, 0.97..1.05..1.08), in the range of
100,000..1,000,000..2,000,000 primes produced. The local asymptotic behavior was
about the same, again, in both versions, about O(n^1.20..1.25) - worsening
slightly for the merging version (the ratio of run times going 
1.25..1.29..1.32).

I guess that makes the two versions (almost) operationally equivalent in
producing of up to a million primes or two.




 
 






[Haskell-cafe] Re: FASTER primes

2009-12-29 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Am Dienstag 29 Dezember 2009 14:34:03 schrieb Will Ness:
  Daniel Fischer daniel.is.fischer at web.de writes:
   Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness:
Now _this_, when tested as interpreted code in GHCi, runs about 2.5x
times faster than Priority Queue based code from Melissa O'Neill's ZIP
package mentioned at the haskellwiki/Prime_Numbers page, with
about half used memory reported, in producing 10,000 to 300,000 primes.
   
It is faster than BayerPrimes.hs from the ZIP package too, in the
tested range, at about 35 lines of code in total.
  
   That's nice. However, the important criterion is how compiled code (-O2)
 
  fares. Do the relations continue to hold? How does it compare to a
  bitsieve?
 
 
  Haven't gotten to that part yet. :)
 
  But why is it more important?
 
 I thought the uppercase FASTER in the subject meant you were really 
 interested in speed.
 If you're only interested in asymptotics, interpreted may be appropriate.

 However, it is possible that optimisation can change the perceived 
 asymptotics of an algorithm (determined strictness eliminating thunks for
example).
 
 While I haven't detected that with the primes code, I find that in my ghci
 your code is approximately 2.5 times faster than ONeill or Bayer when
 interpreted (no difference in scaling observed), while when compiled 
 with -O2, ONeill is approximately three times as fast as your code 


that was what I was getting at first too, before I've put into my code the
_type_signatures_ and the specialize _pragmas_ as per her file. Then it was
only 1.3x slower, when compiled (with about same asymptotics and memory usage).


and twice as fast as Bayer as an executable, about twice as fast as your code
 and slightly slower than Bayer in ghci.


see, this kind of inconsistency is exactly why I was concentrating only on 
one platform in measuring the speed - the interp'/GHCi combination. Especially 
when developing and trying out several approaches, to test with the compiler just 
takes too long. :) And why should it give (sometimes) wildly different readings 
when running inside GHCi or standalone ??


 And I have huge memory problems in ghci with your code.
 That may be due to my implementation of merge and minus, though. You wrote
 'standard' and I coded the straightforward methods.


Here's what I'm using (BTW I've put it on the primes haskellwiki page too). The
memory reported for interpreted is about half of PQ's (IIRC), and compiled - the
same:

 minus a@(x:xs) b@(y:ys) = case compare x y of 
        LT -> x: xs `minus` b
        GT ->    a  `minus` ys
        EQ ->    xs `minus` ys
 minus a b  = a

 merge a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x: merge xs b
        EQ -> x: merge xs ys
        GT -> y: merge a  ys
 merge a b  = if null b then a else b
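
For a quick check of the two (in GHCi):

  -- take 8 ([2,4..] `merge` [3,6..])  ==  [2,3,4,6,8,9,10,12]
  -- take 8 ([1..]   `minus` [2,4..])  ==  [1,3,5,7,9,11,13,15]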




 
  Would that not tell us more about the compiler performance than the code 
  itself?
 
 Unless you write machine code or assembly, don't all performance tests tell us
more about the compiler/interpreter performance than the code itself?
 That is, of course, with respect to algorithms with the same scaling 
 behaviour.
 
 
  This code is just an endpoint (so far) in a short procession of natural
  stepwise development of the famous classic Turner's sieve,
 
 That was
 
 sieve (x:xs) = x:sieve (filter ((/= 0) . (`mod` x)) xs)
 
 , was it?

right


 
  through the
  postponed filters, through to Euler's sieve, the merging sieve (i.e.
  Richard Bird's) and on to the tree-fold merging, with wheel. I just wanted
  to see where the simple normal (i.e. _beginner_-friendly) functional code
  can get, in a natural way.
 
 Good.
 
 
  It's not about writing the fastest code in _advanced_ Haskell. It's about
  having clear and simple code that can be understood at a glance - i.e.
  contributes to our understanding of a problem - faithfully reflecting its
  essential elements, and because of _that_, fast. It's kind of like _not_
  using mutable arrays in a quicksort.
 
 What's wrong with mutable arrays? There are a lot of algorithms which can be
 easily and efficiently implemented using mutable unboxed arrays while a
 comparably efficient implementation without mutable arrays is hard. For 
 those, I consider STUArrays the natural choice. Sieving primes falls into 
 that category.


It's just that the mutating code tends to be convoluted, like in the example I
mentioned of quicksort. One has to read the C code with good attention to
understand it. Normal Haskell is much more visually apparent, like

  primes = 2: 3: sieve (tail primes) [5,7..]  
   where
    sieve (p:ps) xs = h ++ sieve ps (t `minus` tail [q,q+2*p..])
      where (h,~(_:t)) = span (< q) xs 
            q          = p*p

or

  primes = 2: 3: sieve [] (tail primes) 5  
   where 
    sieve fs (p:ps) x = [i | i <- [x,x+2..q-2], a!i] 
  ++ sieve ((2*p,q):fs') ps

[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

2009-12-29 Thread Will Ness
Daniel Fischer daniel.is.fischer at web.de writes:

 
 
 Gee, seems my mail server reads your posts very thoroughly today :)

I hope it's not a bad thing. :)

 
 Am Dienstag 29 Dezember 2009 14:58:10 schrieb Will Ness:
 
  If I realistically needed primes generated in a real life setting, I'd
  probably had to use some C for that.
 
 If you need the utmost speed, then probably yes. If you can do with a little
 less, my STUArray bitsieves take about 35% longer than the equivalent C code 
and are roughly eight times faster than ONeillPrimes. I can usually live well 
with that.


Wow! That's fast! (that's the code from haskellwiki's primes page, right?)


 
  If OTOH we're talking about a tutorial
  code that is to be as efficient as possible without losing its clarity,
  being a reflection of essentials of the problem, then any overly
  complicated advanced Haskell wouldn't be my choice either.
 
 +1
 Though perhaps we view mutable array code differently. In my view, it's
 neither advanced nor complicated.


convoluted, then. Not using higher level concepts, like map and foldr, :) or
head.until isSingleton (pairwise merge).map wrap , that kind of thing. :)

BTW I think a really smart compiler should just get a specification, like
Turner's sieve, and just derive a good code from that by itself.

Another example would be

  qq n m = sum $ take n $ cycle [1..m]

which should really get compiled into just a math formula, IMO. Now _that_ I
would call a good compiler. That way I really won't have to learn how to use
STUArrays, you see.

I've seen this question asked a lot, what should be a good programming language?

IMO, English (plus math where needed, and maybe some sketches by hand). :)


 
  And seeing that
  this overly-complicated (IMO), steps-jumping PQ-based code was sold to us
  as the only faithful rendering of the sieve, I wanted to see for myself
  whether this really holds water.
 
 I can understand that very well.

:)





[Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

2009-12-28 Thread Will Ness

apfelmus apfelmus at quantentunnel.de writes:

 
 Dave Bayer wrote:
  What I'm calling a venturi
 
 venturi :: Ord a => [[a]] -> [a]
 
  merges an infinite list of infinite lists into one list, under the
  assumption that each list, and the heads of the lists, are in
  increasing order.
 
  I wrote this as an experiment in coding data structures in Haskell's
  lazy evaluation model, rather than as explicit data. The majority of the
  work done by this code is done by merge; the multiples of each prime
  percolate up through a tournament consisting of a balanced tree of
  suspended merge function calls. In order to create an infinite lazy
  balanced tree of lists, the datatype
  
 data List a = A a (List a) | B [a]
  
  is used as scaffolding. One thinks of the root of the infinite tree as
  starting at the leftmost child, and crawling up the left spine as
  necessary.
 
 After some pondering, the  List a  data structure for merging is really
 ingenious! :) Here's a try to explain how it works:
 
 The task is to merge a number of sorted and infinite lists when it's
 known that their heads are in increasing order. In particular, we want
 to write
 
   primes = (2:) $ diff [3..] $ venturi $ map multiple primes
 
 Thus, we have a (maybe infinite) list
 
   xss = [xs1, xs2, xs3, ...]
 
 of infinite lists with the following  properties
 
   all sorted xss
   sorted (map head xss)
 
 where  sorted  is a function that returns  True  if the argument is a
 sorted list. A first try to implement the merging function is
 
   venturi xss = foldr1 merge xss
 = xs1 `merge` (xs2 `merge` (xs3 `merge` ...
 
 where  merge  is the standard function to merge to sorted lists.
 
 However, there are two problems. The first problem is that this doesn't
 work for infinite lists since  merge  is strict in both arguments. But
 the property  head xs1 < head xs2 < head xs3 < ...  we missed to exploit
 yet can now be used in the following way
 
   venturi xss = foldr1 merge' xss
 
   merge' (x:xt) ys = x : merge xt ys
 
 In other words,  merge'  is biased towards the left element
 
   merge' (x:_|_) _|_ = x : _|_
 
 which is correct since we know that (head xs < head ys).
 
 The second problem is that we want the calls to  merge  to be arranged
 as a balanced binary tree since that gives an efficient heap. It's not
 so difficult to find a good shape for the infinite tree, the real
 problem is to adapt  merge' to this situation since it's not associative:

 ..
 
 The problem is that the second form doesn't realize that y is also
 smaller than the third argument. In other words, the second form has to
 treat more than one element as privileged, namely  x1,x2,... and y.
 This can be done with the aforementioned list data structure
 
   data People a = VIP a (People a) | Crowd [a]
 
 The people (VIPs and crowd) are assumed to be _sorted_. Now, we can
 start to implement
 
   merge' :: Ord a => People a -> People a -> People a

Hi,

... replying to a two-year-old post here, :) :) and after consulting the
full VIP version in haskellwiki/Prime_numbers#Implicit_Heap ...

It is indeed the major problem with the merged-multiples-removing code (similar
to Richard Bird's code from Melissa O'Neill's JFP article) - the linear
nature of foldr, requiring a (:: a->b->b) merge function. To make it freely
composable, so the list can be rearranged into a tree of arbitrary shape, it
must indeed be type-uniform (:: a->a->a) first, and associative second.

The structure of the folded tree should be chosen to better suit the production
of the primes' multiples. I guesstimate the total cost as Sum (1/p)*d over the
leaves, where p is the generating prime at a leaf, and d is the leaf's depth,
i.e. the number of merge nodes its produced multiples must pass on their way to
the top.

The structure used in your VIP code, 1+(2+(4+(8+...))), can actually be 
improved upon with another, (2+4)+( (4+8)+( (8+16)+...)), for which the 
estimated cost is about 10%-12% lower.

This can be expressed concisely as the following:

  primes :: () -> [Integer]
  primes () = 2:primes'
   where
primes'= [3,5] ++ drop 2 [3,5..] `minus` comps
    mults  = map (\p-> fromList [p*p,p*p+2*p..]) $ primes'
(comps,_)  = tfold mergeSP (pairwise mergeSP mults)
fromList (x:xs) = ([x],xs)

  tfold f (a: ~(b: ~(c:xs))) 
   = (a `f` (b `f` c)) `f` tfold f (pairwise f xs)
  pairwise f (x:y:ys)  = f x y : pairwise f ys

  mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
 in (a ++ bc, merge b' d)
 where 
  spMerge u@(x:xs) w@(y:ys) = case compare x y of
   LT -> (x:c,d) where (c,d) = spMerge xs w
   EQ -> (x:c,d) where (c,d) = spMerge xs ys
   GT -> (y:c,d) where (c,d) = spMerge u  ys
  spMerge u [] = ([], u)
  spMerge [] w = ([], w)


with ''merge'' and ''minus'' defined in the usual way. Its run times are indeed 
improved 10%-12% over the VIP code from the haskellwiki page. Testing was done 
by running the code, 
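
For reference (not part of the original post), a sketch of one usual way to
define those two ordered-list helpers; whether ''merge'' keeps or collapses
equal heads is a choice - for streams of composites, collapsing them (i.e. a
union) is the typical one:

  merge :: Ord a => [a] -> [a] -> [a]    -- ordered lists in, ordered list out
  merge xs@(x:xt) ys@(y:yt) = case compare x y of
     LT -> x : merge xt ys
     EQ -> x : merge xt yt               -- equal heads collapsed to one
     GT -> y : merge xs yt
  merge xs [] = xs
  merge [] ys = ys

  minus :: Ord a => [a] -> [a] -> [a]    -- ordered difference: xs without the ys
  minus xs@(x:xt) ys@(y:yt) = case compare x y of
     LT -> x : minus xt ys
     EQ ->     minus xt yt
     GT ->     minus xs yt
  minus xs _ = xs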

[Haskell-cafe] FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

2009-12-28 Thread Will Ness

apfelmus apfelmus at quantentunnel.de writes:

 


~~ This is a repost, with apologies to anyone who sees this twice (I've replied 
to a two-year-old thread, and it doesn't show up in GMANE as I thought it 
would). ~~



 Dave Bayer wrote:
  What I'm calling a venturi
 
 venturi :: Ord a => [[a]] -> [a]
 
  merges an infinite list of infinite lists into one list, under the
  assumption that each list, and the heads of the lists, are in
  increasing order.
 
  I wrote this as an experiment in coding data structures in Haskell's
  lazy evaluation model, rather than as explicit data. The majority of the
  work done by this code is done by merge; the multiples of each prime
  percolate up through a tournament consisting of a balanced tree of
  suspended merge function calls. In order to create an infinite lazy
  balanced tree of lists, the datatype
  
 data List a = A a (List a) | B [a]
  
  is used as scaffolding. One thinks of the root of the infinite tree as
  starting at the leftmost child, and crawling up the left spine as
  necessary.
 
 After some pondering, the  List a  data structure for merging is really
 ingenious! :) Here's a try to explain how it works:
 
 The task is to merge a number of sorted and infinite lists when it's
 known that their heads are in increasing order. In particular, we want
 to write
 
   primes = (2:) $ diff [3..] $ venturi $ map multiple primes
 
 Thus, we have a (maybe infinite) list
 
   xss = [xs1, xs2, xs3, ...]
 
 of infinite lists with the following  properties
 
   all sorted xss
   sorted (map head xss)
 
 where  sorted  is a function that returns  True  if the argument is a
 sorted list. A first try to implement the merging function is
 
   venturi xss = foldr1 merge xss
 = xs1 `merge` (xs2 `merge` (xs3 `merge` ...
 
 where  merge  is the standard function to merge two sorted lists.
 
 However, there are two problems. The first problem is that this doesn't
 work for infinite lists since  merge  is strict in both arguments. But
 the property  head xs1 < head xs2 < head xs3 < ...  we missed to exploit
 yet can now be used in the following way
 
   venturi xss = foldr1 merge' xss
 
   merge' (x:xt) ys = x : merge xt ys
 
 In other words,  merge'  is biased towards the left element
 
   merge' (x:_|_) _|_ = x : _|_
 
 which is correct since we know that (head xs < head ys).
 
 The second problem is that we want the calls to  merge  to be arranged
 as a balanced binary tree since that gives an efficient heap. It's not
 so difficult to find a good shape for the infinite tree, the real
 problem is to adapt  merge' to this situation since it's not associative:

 ..
 
 The problem is that the second form doesn't realize that y is also
 smaller than the third argument. In other words, the second form has to
 treat more than one element as privileged, namely  x1,x2,... and y.
 This can be done with the aforementioned list data structure
 
   data People a = VIP a (People a) | Crowd [a]
 
 The people (VIPs and crowd) are assumed to be _sorted_. Now, we can
 start to implement
 
   merge' :: Ord a => People a -> People a -> People a

Hi,

... replying to a two-year-old post here, :) :) and after consulting the
full VIP version in haskellwiki/Prime_numbers#Implicit_Heap ...

It is indeed the major problem with the merged-multiples-removing code (similar
to Richard Bird's code from Melissa O'Neill's JFP article) - the linear
nature of foldr, requiring a (:: a->b->b) merge function. To make it freely
composable, so the list can be rearranged into a tree of arbitrary shape, it
must indeed be type-uniform (:: a->a->a) first, and associative second.

The structure of the folded tree should be chosen to better suit the production
of the primes' multiples. I guesstimate the total cost as Sum (1/p)*d over the
leaves, where p is the generating prime at a leaf, and d is the leaf's depth,
i.e. the number of merge nodes its produced multiples must pass on their way to
the top.

The structure used in your VIP code, 1+(2+(4+(8+...))), can actually be 
improved upon with another, (2+4)+( (4+8)+( (8+16)+...)), for which the 
estimated cost is about 10%-12% lower.

This can be expressed concisely as the following:

  primes :: () -> [Integer]
  primes () = 2:primes'
   where
primes'= [3,5] ++ drop 2 [3,5..] `minus` comps
    mults  = map (\p-> fromList [p*p,p*p+2*p..]) $ primes'
(comps,_)  = tfold mergeSP (pairwise mergeSP mults)
fromList (x:xs) = ([x],xs)

  tfold f (a: ~(b: ~(c:xs))) 
   = (a `f` (b `f` c)) `f` tfold f (pairwise f xs)
  pairwise f (x:y:ys)  = f x y : pairwise f ys

  mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
 in (a ++ bc, merge b' d)
 where 
  spMerge u@(x:xs) w@(y:ys) = case compare x y of
   LT -> (x:c,d) where (c,d) = spMerge xs w
   EQ -> (x:c,d) where (c,d) = spMerge xs ys
   GT -> (y:c,d) where (c,d) = spMerge u  ys
  spMerge u [] = ([], u)
  spMerge [] w = ([], w)


with ''merge'' 

[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

2009-12-28 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 
 
   wheelSums  = roll 0 wdiffs
   roll   = scanl (+)
   wheel  = wdiffs ++ wheel
   wdiffs = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wdiffs
 

Apparently that works too, but I meant it to be:

   wdiffs = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:[]
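
As an aside (my own illustration, assuming the wheel is meant to be rolled from
11 - the surrounding definitions aren't shown in the quote): summing up the
differences enumerates exactly the numbers coprime to 2*3*5*7,

  candidates = scanl (+) 11 wheel
  -- take 10 candidates  ==  [11,13,17,19,23,29,31,37,41,43]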


:)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-05 Thread Will Ness
Steve stevech1097 at yahoo.com.au writes:

 
 On Tue, 2009-11-03 at 10:52 -0500, Will wrote:
  I've just tried it and it was twice slower than mine. (?) I didn't use
  the [Int]
  signature in both. [...] It runs twice as fast with it).
  Although your
  code has an advantage that it is very easy to add the wheel
  optimization to it.
 
 I used [Int] for both. If you didn't use [Int] then it defaults to
 [Integer] and you are probably testing the speed of GMP integer
 operations (rather than the speed of Haskell Int operations) which could
 give differing conclusions.

When both are declared [Int], the ratios are [1.50, 1.40, 1.35, 1.26, 1.26] for
the speed of your version vs mine in producing 100,000, 300,000, 1, 2 and 3 million
primes, on my laptop, with memory consistently at 120% (as reported by GHCi
with the :s +s switch). Haven't tried anything above that, as it's old and slow. But
clearly the two versions are on par with each other. The reason I prefer my
version is that it's in a form that is easy to tweak further. :)


 I had looked into using a wheel before. It's nice in theory, but not so
 useful in practice. At least that's my experience where using a wheel
 made the primes function slower.

interesting.


 Adding your code and conclusions to Prime numbers on the Haskell wiki,
 could be useful.
 http://www.haskell.org/haskellwiki/Prime_numbers

Thank you for pointing me to that page. I've just done that. :) The short 
description that I've arrived at in the end, is that my version explicitly uses 
successive prefixes of the primes list to test batches of odds between 
successive squares of primes. Now it's short and clear.

Thanks!


 I've just updated the page to split it into Finding Primes and
 Testing Primality.
 
 Steve
 




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-03 Thread Will Ness
Jason Dagit dagit at codersbase.com writes:

 By the way, do you understand where the speedup with Int is coming from?  As I
understand it, there are two main places.  One is that the type class dictionary
passing can be removed (GHC might figure this out already, I'd need to check the
core to be sure).  The other is that GHC is likely unboxing to its primitive
Int# type.
 [...]
 Good luck!
 Jason


Thanks!

Writing the super-fast sieve wasn't my objective here though. It rather was
writing the fastest possible simple functional lazy code true to the sieve's
definition, and understanding it better that way (that's the added bonus).

As it stands now, this code seems a rather faithful description of what the sieve
_is_, except that it tests each number in isolation instead of counting over a
bunch of them at once (skipping over primes, getting them for free). THAT's the
crucial difference, which the article seems to be trying to explain but never quite
puts in such simple terms. All the extra activity is kept to an absolute minimum
here, and _now_ the main thing can be dealt with further, if so desired - like
turning to using the PQ thing, etc.

Then if we were to compare them, it wouldn't be like comparing apples with
orange juice. :)  That's what it felt like, seeing the PQ code compared with the
classic naive version in that article. I'm reasonably sure that PQ-augmented,
this code will be even faster, not slower, even for the first million primes. 

This whole experience proves that the clearest code can also be the fastest
(and maybe necessarily so). Seeing it described in that article as if clarity
must be paid for with efficiency (and vice versa) didn't seem right to me.


Cheers,
 


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe at haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe
 




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-03 Thread Will Ness

Hi Steve,

Steve stevech1097 at yahoo.com.au writes:

 
 Hi Will,
 
 I had previously tested the Melissa O'Neil prime function (using the
 primes 0.1.1 package) and found it to be fast (for a functional
 approach), but with high memory use.
 
 To fully test a prime function, I think you should:
 1. Test for more than the first 10^6 primes.
 Generating all primes to 10^6 is quite easy for modern computers. Most
 prime generating functions will run in less than 1 sec and look fast. 
 Try generating all primes to 10^7 and 10^8 then you will see how 'fast'
 these lazy functional methods really are.
 2. Measure memory use.
 As you move above 10^6 and test primes up to 10^7, 10^8, and 10^9,
 memory use becomes very important. A prime function with excessive
 memory use will soon consume all of the system's memory.
 
 Here's another primes function which performs quite well.
 
 primes :: [Int]
 primes = 2 : filter (isPrime primes) [3,5..]  where
   isPrime (p:ps) n
 | mod n p == 0 = False
 | p*p > n      = True
 | otherwise    = isPrime ps n
   isPrime [] _ = False -- never used, but avoids compiler warning
 
 Here's some results from my PC, for generating primes to 10^8.
 
          10**6          10**7          10**8
        secs   MiB     secs   MiB     secs   MiB
  ----------------------------------------------
 1      0.01     0      0.1     2        2    14
 2      0.56     7     11.1    43      270   306
 3      0.61     7     11.8    44      260   342
 4      0.43    36      5.4   345      900   not finished
 
 1 using a Haskell ST Array
 2 your primes function
 3 my primes function, listed above
 4 Melissa O'Neil method from primes 0.1.1 package
 
 To summarise the results from the tests I've run:
 - if you want fast functional primes, forget it, it's not possible.
 - if you just want fast primes, then use C, or a Haskell array.
 - if performance does not matter and slow primes are acceptable, then
 use the purely functional approach.
 
 Regards,
 Steve
 


You just have a fast PC, that's all, :) so a million is not enough for you. My
old laptop is 50 times slower. :)

Seriously though, your conclusions seem entirely plausible to me. My goal here
was to have a Haskell code for primes that is decent. Nothing more.

The reason your code is slightly slower is of course that in effect it
recalculates the needed primes prefix for each new candidate. If you could
somehow thread its length through, it might speed it up some more.

I've just tried it and it was twice as slow as mine. (?) I didn't use the [Int]
signature in both. Maybe it's not real issues we're dealing with here, but
compiler/OS/CPU issues? (Or have you forgotten to put the [Int] signature
into my code too when testing? It runs twice as fast with it.) Although your
code has the advantage that it is very easy to add the wheel optimization to it.

BTW I don't know about the code in the package, but the one in the article makes
the terrible faux pas of adding each prime into the queue as soon as it is produced;
this could easily account for the memory blow-up you're seeing. What's really
needed is to plug a decent PQ implementation into my framework, which, it seems,
does the absolute minimum of repeated calculations.

What I have now, is this:

 qprimes   = 2: 3: sieve emptyQ primes' 5  where
  primes'  = tail qprimes
  sieve q (p:ps) x
   = h ++ sieve (addQ q' (2*p) (p*p+2*p)) ps (p*p+2)
     where
      (h,q') = noComps q [x,x+2..p*p-2]
  ..
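
The emptyQ/addQ/noComps above are elided; purely to illustrate the kind of
interface being assumed here (a hypothetical sketch of my own, using a Data.Map
keyed by the next composite as a stand-in for a real priority queue; the actual
implementations are left out above):

  import qualified Data.Map as M

  -- each stream's next composite maps to the list of steps (2*p) due at it
  type Q = M.Map Integer [Integer]

  emptyQ :: Q
  emptyQ = M.empty

  addQ :: Q -> Integer -> Integer -> Q            -- addQ q step start
  addQ q step start = M.insertWith (++) start [step] q

  -- advance the queue over the odd span xs; return the xs hit by no stream,
  -- together with the advanced queue (assumes every queued composite is >=
  -- head xs, which the calling pattern above maintains)
  noComps :: Q -> [Integer] -> ([Integer], Q)
  noComps q []     = ([], q)
  noComps q (x:xs) = case M.lookup x q of
     Nothing    -> let (h,q') = noComps q xs in (x:h, q')
     Just steps -> noComps (foldr bump (M.delete x q) steps) xs
       where bump step = M.insertWith (++) (x+step) [step]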

The main deficiency of list-based sieves, as I've now come to realize and
formulate in simple enough terms for myself to understand, is that they work
with each number in isolation, and thus must test primes as well as composites.
Testing composites on average is cheap, because most of them will have small
factors; testing primes is costly.

Imperative sieves avoid that by working over spans of numbers at once, so they
get their primes for free, when they see the gaps in the produced/marked composites
(I repeat myself here, but I'm not sure if you've read this explanation of mine in
other posts). What counts here is the direct access to random memory - or to the
numbers written out on a blackboard. Melissa's PQ approach tries to emulate that:
not crossing them off, but seeing the gaps in between.

One is tempted to treat Haskell as a high-level executable definition, hoping for
a compiler to turn it into the fastest possible low-level code. I know I can
translate it into C by hand fairly well; why wouldn't the compiler? It seems
Haskell compilers have much room for improvement. :)





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-02 Thread Will Ness
Sjoerd Visscher sjoerd at w3future.com writes:

 [...] 2 doesn't have to be in the list of smaller primes, as  
 we're only generating odd numbers:
 
 primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
 sieve qs@(q:_) (p:ps)
 = [x | x <- [q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p <- qs]]
   ++ sieve (p:qs) ps
 
 Sjoerd

Thanks!

I haven't tested your code's speed yet, but have few points:

I wouldn't expect eliminating a parameter to have any effect on performance in
compiled code (I haven't a clue whether -O2 or -O3 should be used BTW) if it
doesn't eliminate any superfluous computations.

Second, you prepend bigger primes to the front of the list (building the prefix in
reverse order); this will cause them to be tested against first. I think (1)
we're supposed to test composites against smaller factors first, for the tests
to stop sooner. IMO it'll slow the code down. I think (2) you can safely append
them at the end instead, for a compiler should be able to just maintain a tail
pointer there. But then you lose immediate access to the last found prime, so you
will need to bring the 'x' parameter back. Then you end up with my code exactly
:) (only replicating the prefix):

primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes) 9
sieve pfx (p:ps) x
   = [x | x <- [x+2,x+4..p*p-2], and [(x`rem`p)/=0 | p <- pfx]]
 ++ sieve (pfx++[p]) ps (p*p)

It'll be interesting to test my hypotheses (two of them :) ) and see if this in
fact runs faster than your code.

Thirdly, (take k) could be compiled out completely into a tail pointer too,
maintained and advanced after each step. I would hope a smart compiler does
just that. Again, some more testing is in order. Although, when I tested the two
approaches on some previous incarnation of this code, (take k) vs
(pfx++[p]) had exactly the same run time (or better).

What I'm after mostly here is for the code to be clear, simple, and
reasonably efficient. Right now it corresponds very nicely to the description of
the sieve as filtering all the odds testable by a known prefix of primes, and
then proceeding to the next batch of them with the prefix grown by one more
prime. And so on.

But more importantly I want it to be known that there's a lot that can be done
here, in a natural functional lazy kind of way, before resorting to priority
queues and mutable arrays. We could always just use C too. ;)

Thanks for the feedback! :)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-02 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 But more importantly I want it to be known that there's a lot that can be done
 here, in a natural functional lazy kind of way, before resorting to priority
 queues and mutable arrays. We could always just use C too. ;)

I mean it as introductory code that's nevertheless good for producing the
first million primes or so - not the best sieve ever, but the best (is it?)
simple, clear, functional introductory sieve instead. That's another reason for
using (take k primes') - it practically says it in English: the first k odd
primes.

:)



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-02 Thread Will Ness
Sjoerd Visscher sjoerd at w3future.com writes:

 
 Excuse me, 2 doesn't have to be in the list of smaller primes, as  
 we're only generating odd numbers:
 
 primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
 sieve qs@(q:_) (p:ps)
 = [x | x <- [q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p <- qs]]
   ++ sieve (p:qs) ps
 
 Sjoerd
 

Hi,

I've run it now. In producing 100,000 primes, your above code takes 3.5x more
time than the one I posted. The code modified as I suggested with (qs++[p])
takes exactly the same time as mine. :)

Cheers,


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Base classes can be _ELIMINATED_ with interfaces

2009-11-02 Thread Will Ness
Shelby Moore shelby at coolpage.com writes:

 
 * 1856 Thermo Law: entire universe (a closed system, i.e. everything)
 trends to maximum disorder.
 


On the very, *very*, VERY long timescale.

In the meantime, chaos creates clashes of matter, which cause local energy
outbursts (i.e. galaxies), which pump their immediate surroundings, where
natural selection in presence of energy influx leads to increasing complexity.

To persist for a long, *long*, LONG time.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-02 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 primes   = 2: 3: sieve 0 primes' 5
 primes'  = tail primes
 sieve k (p:ps) x
  = [x | x <- [x,x+2..p*p-2], 
  and [(x`rem`p)/=0 | p <- take k primes']]
++ sieve (k+1) ps (p*p+2)
 
 (thanks to Leon P.Smith for his brilliant idea of directly generating
 the spans of odds instead of calling 'span' on the infinite odds list).
 
 The relative performance vs the PQ-version is:
 
   100,000   300,000   1,000,000  primes produced
0.6   0.75  0.97
 

One _crucial_ tidbit I've left out: _type_signature_.

Adding (:: [Int]) speeds this code up more than TWICE!

:) :)

'sieve' can also be used in e.g.

primesFrom m = sieve (length h) ps m where
  (h,(_:ps)) = span (<= (floor.sqrt.fromIntegral) m) primes

to get a few big primes even faster.
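
A quick usage check (my own, with the primes/sieve definitions above in scope;
note the candidate span steps by 2 from m, so m is taken to be odd here):

  firstFew = take 5 (primesFrom 101)    -- == [101,103,107,109,113]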

:)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Simple FAST lazy functional primes

2009-11-02 Thread Will Ness
Jason Dagit dagit at codersbase.com writes:

 On Mon, Nov 2, 2009 at 1:59 PM, Will Ness will_n48 at yahoo.com wrote:
 Will Ness will_n48 at yahoo.com writes:
 
 One _crucial_ tidbit I've left out: _type_signature_.
 Adding (:: [Int]) speeds this code up more than TWICE!
 :) :)
 
 
 If you are okay with Int, then maybe you're also happy with Int32 or Word32.
 If so, why don't you use template haskell to build the list at compile time?
 If you do that, then getting the kth prime at run-time is O(k).  Take that
 AKS!  :)
 


O(k) - I've removed it since the post, actually. Wasn't thinking clearly for a
moment, having seen the double speedup!

I've found Matthew Brecknell's fast code in the old Melissa O'Neill article thread
here, from 2007-02-19 18:14:23 GMT. Without the type signature, it's twice as slow
as my code.

I think it is a fairly faithful translation now of what the sieve is all about,
except that it tests its candidate numbers whereas the sieve counts over them (and
thus is able to skip over primes without testing them). The usual functional
approach has it working with each number in isolation, so it tests each one (to
recreate the counting state, in effect), and thus overworks a lot on primes.

The next logical step is to start counting!


:)



 Jason
 
 
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe at haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe
 




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Simple FAST lazy functional primes

2009-11-01 Thread Will Ness

First, here it is:

primes   = 2: 3: sieve 0 primes' 5
primes'  = tail primes
sieve k (p:ps) x
 = [x | x <- [x,x+2..p*p-2], and [(x`rem`p)/=0 | p <- take k primes']]
   ++ sieve (k+1) ps (p*p+2)

(thanks to Leon P. Smith for his brilliant idea of directly generating
the spans of odds instead of calling 'span' on the infinite odds list).

This code is faster than the PriorityQueue-based sieve (granted, using my own
ad-hoc implementation, so YMMV) in producing the first million primes
(testing was done by running the ghc -O3 compiled code inside GHCi).
The relative performance vs the PQ-version is:

  100,000   300,000   1,000,000  primes produced
   0.6   0.75  0.97

I've recently come to Melissa O'Neill's article (I know, I know) and
she makes the incredible claim there that the square-of-prime optimization
doesn't count if we want to improve the old and lazy

uprimes = 2: sieve [3,5..] where
  sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]

Her article gave me the strong impression that she claims that the only way
to fix this code is by using Priority Queue based optimization, and then
goes on to present astronomical gains in speed by implementing it.

Well, I find this claim incredible. First of all, the naive code fires up
its nested filters much too early, when in fact each must be started only
when the prime's square is reached (not merely have its work start there -
be started there itself!), so that the filters are delayed:

dprimes  = 2: 3: sieve (tail dprimes) [5,7..]  where
  sieve (p:ps) xs
 = h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t))
    where (h,t)=span (< p*p) xs

This code right there is on par with the PQ-based code, only 3-4x slower at
generating the first million primes.

Second, the implicit control structure of linearly nested filters, piling up
in front of the supply stream, each with its prime-number-it-filters-by
hidden inside it, needs to be explicated into a data structure of
primes-to-filter-by (which is in fact the prefix of the primes list
we're building itself), so the filtering could be done by one function call
instead of many nested calls:

xprimes  = 2: 3: sieve 0 primes' [5,7..]  where
  primes' = tail xprimes
  sieve k (p:ps) xs
   = noDivs k h ++ sieve (k+1) ps (tail t)
     where (h,t) = span (< p*p) xs
  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))

From here, using Leon P. Smith's brilliant idea of fusing the span
and the infinite odds supply, we arrive at the final version,

kprimes  = 2: 3: sieve 0 primes' 5  where
  primes' = tail kprimes
  sieve k (p:ps) x
   = noDivs k h ++ sieve (k+1) ps (t+2)
     where (h,t) = ([x,x+2..t-2], p*p)
  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))

Using the list comprehension syntax, it turns out, gives it another 5-10%
speedup by itself when compiled with ghc -O3.

So I take it to disprove the central premise of the article, and to show
that simple lazy functional FAST primes code does in fact exist, and
that the PQ optimization - whose value of course no-one can dispute - is
a far-end optimization.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [Haskell-beginners] map question

2009-10-20 Thread Will Ness
Jason Dagit dagit at codersbase.com writes:

 On Mon, Oct 19, 2009 at 5:53 PM, Will Ness will_n48 at yahoo.com wrote:
 
 You think of functions, where domain matters (for purists?). In syntax 
  only the result matters: does it read? Does it have an intended meaning?
 How is it a mistake if it expresses what I intended?
 Both 3 `-` 2 and curry fst `foldl` 0 are exactly the same - expressions with
 infix operator, read in the same way, interpreted in the same way. In 
  the first case the backticks are made superfluous by the Haskell reader for 
  our convenience; but they shouldn't be made illegal. Why should they be? 
 

 Don't you mean 3 `(-)` 2?  I'm pretty sure -, without the parens is infix and 
 (-) is prefix.  So it seems to me that you need the brackets for this to be 
 consistent.  Jason

You're absolutely right - in the current syntax that alone would be consistent, and
yet it is illegal too.

But I propose to augment the syntax by allowing symbolic ops in backticks to 
stand for themselves.

When I see `op`, for me, it says: infix op. So `+` would also say: infix +.
(`-` 2) would finally become possible. It would read: treat - as infix binary and
make a flip section out of it. Just as it does for an alphanumeric identifier
in (`op` 2).

Without backticks, symbolic ops are also treated as infix by default, but 
that's just a convenience.

Anyway I guess all the points in this discussion have been made, and it's just 
a matter of taste. 


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [Haskell-beginners] map question

2009-10-19 Thread Will Ness
wren ng thornton wren at freegeek.org writes:

 
 Will Ness wrote:
 
  (`foldl`2) works.
  
  (`-`2) should too.
 
 The `` syntax is for converting lexical identifiers into infix 
 operators. Symbolic identifiers are already infix, which is why `` 


So it would be a no-op then. Why make it illegal? Just because it makes writing 
the scanner easier is no answer.

 doesn't work for them. If we introduced this then those striving for 
 consistency would be right in requesting that this pattern be allowed 
 for all symbolic operators. I for one am opposed to introducing 
 superfluous syntax for duplicating the current ability to write things 
 in the same ways.


This syntax already exists. The '`' symbol is non-collating already, so using 
it for symbol chars doesn't change anything (it's not that it can be a part of 
some name, right?). To turn an infix op into an infix op is an id operation, 
made illegal artificially at the scan phase after a successful lex (or 
whatever).

Finally enabling the missing functionality which is a common stumbling block 
for every beginner is hardly duplicating.

 Attack the underlying problem, don't introduce hacks to cover up broken 
 hacks. This isn't C++.


The underlying problem is a broken scanner that can't distinguish between a
binary op and a number's read syntax. Op names are collated symbol chars, and one
of the symbols, -, is also a valid number prefix. So, allow for a clue from the
programmer to detach it from the number: backticks separate it from the
following numeric chars, preventing it from sticking to them. And by itself,
it forms an op - a binary one.

Not a hack, a solution. A consistent one. Look:

  (`foldl` 0)
  (`-` 2)

Don't they look exactly the same?

Why wouldn't it be made legal? Show me one inconsistency it introduces.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [Haskell-beginners] map question

2009-10-19 Thread Will Ness
Tom Tobin korpios at korpios.com writes:

 On Mon, Oct 19, 2009 at 5:34 PM, Will Ness will_n48 at yahoo.com wrote:
  This syntax already exists. The '`' symbol is non-collating already, so 
  using it for symbol chars doesn't change anything (it's not that it 
  can be a part of some name, right?). To turn an infix op into an infix op 
  is an id operation, made illegal artificially at the scan phase after a 
  successful lex (or whatever).
 
 If I've accidentally applied syntax meant for a prefix operator to an
 infix operator, *I want the compiler to tell me*, and not to silently
 accept my mistake.

You don't apply syntax, you write it.

You think of functions, where domain matters (for purists?). In syntax only the
result matters: does it read? Does it have an intended meaning?

How is it a mistake if it expresses what I intended?

Both 3 `-` 2 and curry fst `foldl` 0 are exactly the same - expressions with an
infix operator, read in the same way, interpreted in the same way. In the first
case the backticks are made superfluous by the Haskell reader for our convenience;
but they shouldn't be made illegal. Why should they be? I truly don't
understand the resistance to this idea. :)


  Why wouldn't it be made legal? Show me one inconsistency it introduces.
 
 You've said that you want to be able to do this for the sole case of
 the - (minus-sign) operator:

This is not an inconsistency.

Plus, if we were to take this idea of using backticks as name delimiters to
the extreme, it could even allow us to use such identifiers as `left-fold` or
`right-fold` in infix position, and (`left-fold`) by itself. Although that
seems not such a good idea.


  Operators are great because they make our intent visible, immediately
  apparent. Long words' meaning, like subtract's, is not immediately apparent,
  and they break consistency. Not everyone's first language in life was 
  English, you see.
 
 I don't buy this rationale.  Haskell has plenty of English words as
 function names all over the place; if you can't handle subtract, how
 are you handling Haskell at all?  Sure, the minus-sign issue is a
 wart, but it's less awkward than the solution you propose for a
 problem I doubt you really have.  

When I see `++` I don't need to think _at_all_. When I see `concatenate` or
some such, I do - even if for the briefest of moments. It is _less_ convenient
both to read and _write_, don't you agree?

I don't see my proposal as awkward at all. On the contrary, to me it looks 
natural and consistent with the other uses of this device in the language. It 
is this asymmetry that bothers me with the (-) issue; I just want the balance 
restored. But it is a matter of taste of course. Or obsessing over minutiae. :)

Oh well.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [Haskell-beginners] map question

2009-10-18 Thread Will Ness
Gregory Propf gregorypropf at yahoo.com writes:

 
 
 I actually meant it as sort of a joke but maybe it's not after all.  

Seriously though, using anything non-ASCII in source code is a bad idea, 
because there are lots of fonts and editors in the world.

It seems natural to me to have (`-`2) stand for (flip (-) 2), if only that 
would be made legal syntax, just as (`foldl`0) stands for (flip (foldl) 0).

Supposedly there is no reason to write (`:`[]) since : is already an infix
operator, but making it a no-op wouldn't hurt, and would give us the benefit of
finally being able to write the binary-minus flip-section in a visually
apparent way.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [Haskell-beginners] map question

2009-10-18 Thread Will Ness
Luke Palmer lrpalmer at gmail.com writes:

 
 Or you could use the subtract function.
 
map (subtract 2) [3,4,5]
   [1,2,3]

I don't want to.

 
 I don't think syntax sugar is worth it in this case.


I do. Operators are great because they make our intent visible, immediately 
apparent. Long words' meaning, like subtract's, is not immediately apparent, 
and they break consistency. Not everyone's first language in life was English, 
you see.

(`foldl`2) works.

(`-`2) should too.

I'll settle for (+(-2)) for now, but it ain't that pretty.
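
For reference, the spellings that are legal today, next to the proposed one
(nothing new here, just a side-by-side):

  xs = [3,4,5] :: [Int]

  a = map (subtract 2) xs       -- [1,2,3]
  b = map (flip (-) 2) xs       -- [1,2,3]
  c = map (+ (-2)) xs           -- [1,2,3]
  -- map (`-` 2) xs             -- the proposed form; not legal Haskell today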


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe