Re: [Haskell-cafe] V.I.P.s and the associativity of merge'
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

> Will Ness wrote:
>> Heinrich Apfelmus writes:
>>> Here an example where the VIP merge would give a different result
>>>
>>>   bad = tfold $ (1:10:undefined) : (2:3:5:undefined)
>>>               : (4:undefined) : error "bad"
>>
>> The reason to *not* have the lazy patterns in foldTree for primes, as
>> Daniel Fischer discovered back then, is that they give it a space leak.
>> Case in point - http://ideone.com/DLHp2 :  [...]
>>
>>   tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)
>>     where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t
>>
>>   hfold  xs = serve . foldTree  mergeP . map vip $ xs
>>   hfold' xs = serve . foldTree' mergeP . map vip $ xs
>>
>>   foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
>>     where pairs ~(x: ~(y:ys)) = f x y : pairs ys
>>
>>   foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)
>>     where pairs (x:(y:ys)) = f x y : pairs ys
>>
>> [...] so hfold' too appears to be over-eager to-the-right, although
>> still more productive than tfold.
>
> Ah, the lazy patterns in foldTree are a different issue, sorry for my
> bad choice of example. While I still don't understand the trade-off that
> turns the lazy patterns into a space leak, there is no harm in allowing
> foldTree to see the complete spine of the list. What we do want to
> forbid is looking at the elements of that list too early.

This turns out to be too optimistic a demand on data, in general.

> In other words, the example should read
>
>   bad = tfold $ (1:10:undefined) : (2:3:5:undefined)
>               : (4:undefined) : repeat (error "bad" : undefined)
>
> i.e. the previously unknown tail is replaced with an infinite list of
> undefined elements. This example can properly distinguish between the
> not-so-correct tfold and proper VIP implementations (or other
> implementations that don't do unnecessary comparisons).

will have to think this over, but in the meantime, they *both* turn out
to be *completely* and utterly *wrong* :) in a general case (although
probably for different reasons). Here's where:

  *Main> mapM_ print $ take 5 $ map (take 10)
           [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
  [1,1,1,2,2,2,3,3,3,4]
  [2,2,2,3,3,3,4,4,4,5]
  [3,3,3,4,4,4,5,5,5,6]
  [4,4,4,5,5,5,6,6,6,7]
  [5,5,5,6,6,6,7,7,7,8]

  *Main> take 20 $ hfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
  [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7]

  *Main> take 20 $ tfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
  [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7]

when it should'a been

  [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]

Cheers, :)
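For reference, Data.List.Ordered.mergeAll from the data-ordlist package
(linked as [3] further down this thread) is built for exactly this kind of
infinite merge of infinite ordered lists, and gives the expected result on
this test:

  import Data.List.Ordered (mergeAll)

  -- duplicates are kept: three 1s, then six 2s, then nine 3s, ...
  test :: [Int]
  test = take 20 $ mergeAll [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
  -- [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]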
Re: [Haskell-cafe] V.I.P.s and the associativity of merge'
Will Ness will_n48 at yahoo.com writes:

> ... they *both* turn out to be *completely* and utterly *wrong* :) in a
> general case (although probably for different reasons).

Sorry, my bad. I thought in terms of merge, but the definition used in the
VIP code was really a union. When the definition was changed to a real
merge, one not removing duplicates, everything was as expected in that
case, for both versions:

  *Main> take 20 $ hfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
  [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]
  *Main> take 20 $ tfold [concatMap (replicate 3) [n,n+1..] | n <- [1..]]
  [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]
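The distinction at issue, spelled out with the standard ordered-list
definitions (as in data-ordlist): union discards duplicates across its two
arguments, while a real merge keeps everything:

  merge :: Ord a => [a] -> [a] -> [a]
  merge xs@(x:xt) ys@(y:yt)
      | x <= y    = x : merge xt ys
      | otherwise = y : merge xs yt
  merge xs ys = xs ++ ys

  union :: Ord a => [a] -> [a] -> [a]
  union xs@(x:xt) ys@(y:yt) = case compare x y of
      LT -> x : union xt ys
      EQ -> x : union xt yt    -- duplicate across the two lists: dropped
      GT -> y : union xs yt
  union xs ys = xs ++ ys

  -- union [1,1,2] [1,2,3] == [1,1,2,3]
  -- merge [1,1,2] [1,2,3] == [1,1,1,2,2,3]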
Re: [Haskell-cafe] V.I.P.s and the associativity of merge'
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

> Leon Smith wrote:
>> [1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
>> [2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
>> [3] http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html#v:mergeAll
>
> Will Ness wrote:
>>   primes = 2: primes'
>>     where
>>       primes' = 3: 5: [7,9..] `minus`
>>                    tfold [ [p*p, p*p+2*p..] | p <- primes' ]
>>       tfold ((x:xs):t)    = x : xs `union` tfold (pairs t)
>>       pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t
>
> Unfortunately, it turns out that this program is clear, shorter ... and
> subtly wrong. :)  Here an example where the VIP merge would give a
> different result
>
>   bad = tfold $ (1:10:undefined) : (2:3:5:undefined)
>               : (4:undefined) : error "bad"
>
> We have
>
>   ghci> bad
>   [1,2*** Exception: bad
>
> but the VIP version would give at least
>
>   ghci> bad
>   [1,2,3,4*** Exception: bad / Prelude.undefined

The reason to *not* have the lazy patterns in foldTree for primes, as
Daniel Fischer discovered back then, is that they give it a space leak.
Case in point - http://ideone.com/DLHp2 :

                 1M primes:     2M primes:     3M primes:     ideone #:
  no-VIPs:
   smart fold:   1.90s- 4.8MB   4.42s- 4.8MB   7.40s- 4.8MB   r3bdL
  VIPs:
   smart fold:   1.95s- 4.8MB   4.53s- 4.8MB   7.45s- 4.8MB   4ACpe
   simple fold:  2.04s- 4.8MB   4.76s- 4.8MB   7.86s- 4.8MB   av9XR
   lazy pats:    2.44s-20.1MB   5.70s-21.1MB   9.85s-42.6MB   DLHp2

Also, having

  tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)
    where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t

  hfold  xs = serve . foldTree  mergeP . map vip $ xs
  hfold' xs = serve . foldTree' mergeP . map vip $ xs

  foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
    where pairs ~(x: ~(y:ys)) = f x y : pairs ys

  foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)
    where pairs (x:(y:ys)) = f x y : pairs ys

and

  bad  = (1:10:error "1") : (2:3:5:error "2") : (4:error "4") : error "bad"
  bad2 = (1:10:error "1") : (2:3:5:error "2") : (4:error "4")
       : (5:error "5") : (6:error "6") : (7:error "7") : error "bad2"

we get

  *Main> hfold bad
  [1,2,3,4*** Exception: bad
  *Main> hfold' bad
  [1,2,3,4*** Exception: bad
  *Main> tfold bad
  [1,2*** Exception: bad

  *Main> hfold bad2
  [1,2,3,4*** Exception: 4
  *Main> hfold' bad2
  [1,2,3,4*** Exception: bad2
  *Main> tfold bad2
  [1,2*** Exception: bad2

so hfold' too appears to be over-eager to-the-right, although still more
productive than tfold.
[Haskell-cafe] New simplified primes; no VIP necessary.
Hi,

For those who remember the discussion about this about a year ago, it
turns out there was a simpler version after all lurking somewhere in
there (or is it _out_?). I've just posted it to the haskellwiki's Prime
Numbers page:

  primes = 2: primes'
    where
      primes' = 3: 5: [7,9..] `minus`
                   tfold [ [p*p, p*p+2*p..] | p <- primes' ]
      tfold ((x:xs):t)    = x : xs `union` tfold (pairs t)
      pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

The full code with double-feed, wheel and better folding structure is
also there. Speed and complexity weren't changed by this. But it is
rather _short_. :)

Cheers,
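The code above assumes minus and union on strictly increasing lists; a
minimal self-contained rendering, with the two helpers written out in
their standard ordered-list form:

  primes :: [Integer]
  primes = 2 : primes'
    where
      primes' = 3: 5: [7,9..] `minus`
                   tfold [ [p*p, p*p+2*p..] | p <- primes' ]
      tfold ((x:xs):t)    = x : xs `union` tfold (pairs t)
      pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

  minus xs@(x:xt) ys@(y:yt) = case compare x y of
      LT -> x : minus xt ys
      EQ ->     minus xt yt
      GT ->     minus xs yt
  minus xs _ = xs

  union xs@(x:xt) ys@(y:yt) = case compare x y of
      LT -> x : union xt ys
      EQ -> x : union xt yt
      GT -> y : union xs yt
  union xs ys = xs ++ ys

  -- take 10 primes == [2,3,5,7,11,13,17,19,23,29]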
[Haskell-cafe] Re: Why no merge and listDiff?
Derek Elkins derek.a.elkins at gmail.com writes:

> On Wed, Jan 20, 2010 at 9:42 AM, Will Ness will_n48 at yahoo.com wrote:
>> Derek Elkins derek.a.elkins at gmail.com writes:
>>> On Sun, Jan 17, 2010 at 2:22 PM, Will Ness will_n48 at yahoo.com wrote:
>>>> Hello cafe,
>>>>
>>>> I wonder, if we have List.insert and List.union, why no List.merge
>>>> (:: Ord a => [a] -> [a] -> [a]) and no List.minus ? These seem to be
>>>> pretty general operations.
>>>
>>> You probably also want to look at the package data-ordlist on hackage
>>> (http://hackage.haskell.org/packages/archive/data-ordlist/0.0.1/doc/html/Data-OrdList.html)
>>> which represents sets and bags as ordered lists and has all of the
>>> operations you mention.
>>
>> I did, thanks again! Although, that package deals with non-decreasing
>> lists, i.e. lists possibly with multiples. As such, its operations
>> produce non-decreasing lists, i.e. possibly having multiples too.
>
> It is clear that some of the operations are guaranteed to produce sets
> given sets. The documentation could be better in this regard though.
> The 'union' and 'minus' functions of ordlist meet this requirement if
> you satisfy the preconditions.

Yes, thanks, it's exactly what I was looking for. I had recognized from
the code that `minus' was OK, but `merge' was different. As it turns out,
OrdList.union is exactly what I have under `merge'. Better (or any at
all, really) documentation for Data.OrdList would be a big help.

I don't know if it's at all easy to separate Sets and Bags, though it may
seem desirable. I seem to have read something about the Circle/Ellipse
problem, i.e. that Sets and Bags are not easily detachable from one
another? Although I don't know the details of that.

The background for this is my attempts to classify the various
primes-generating code variants. Apparently, the essence of the sieve is
the composites removal, and both composites and natural numbers are
naturally represented as strictly increasing lists. Same with merging the
lists of multiples of each prime to construct the composites. I had to
provide the `minus' and `merge' definitions along with the actual code,
and searched for something standard.

You can check it out on the Haskellwiki Prime Numbers page (work still in
progress, the comparison tables are missing). We had also a recent thread
here in cafe under FASTER primes. The original idea of Heinrich Apfelmus
of treefold merging the composites really panned out. I found a little
bit better structure for the folding tree, and Daniel Fischer was a great
help in fixing the space leaks there (two of them), so that now the
resulting code, with wheel optimization, runs very close to the PQ-based
O'Neill's sieve (actually faster than it if interpreted, in GHCi). More
importantly (?) there's a natural progression of code now, straight from
the classic Turner's sieve, so it's not an ad-hoc thing anymore. It also
became apparent that the essence of prime wheels is Euler's sieve. And
vice versa. :)

Thanks a lot for all the help from all the posters!

Cheers,
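A small usage sketch of those data-ordlist operations (using the
package's later module name, Data.List.Ordered; the 0.0.1 release linked
above had them in Data.OrdList):

  import Data.List.Ordered (merge, minus, union)

  odds, evens :: [Int]
  odds  = [1,3..]
  evens = [2,4..]

  naturals   = odds `union` evens            -- set-style:  [1,2,3,4,...]
  withDupes  = odds `merge` odds             -- bag-style:  [1,1,3,3,5,5,...]
  nonSquares :: [Integer]
  nonSquares = [1..] `minus` map (^2) [1..]  -- [2,3,5,6,7,8,10,...]

All three work lazily on infinite ordered lists, which is what the sieve
code needs.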
[Haskell-cafe] Re: Why no merge and listDiff?
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

> Will Ness wrote:
>> You can check it out on the Haskellwiki Prime Numbers page (work still
>> in progress, the comparison tables are missing). We had also a recent
>> thread here in cafe under FASTER primes. The original idea of Heinrich
>> Apfelmus of treefold merging the composites really panned out.
>
> (Just for historical reference, credit for the data structure that works
> with infinite merges goes to Dave Bayer, I merely contributed the
> mnemonic aid of interpreting it in terms of VIPs.)

yes, yes, my bad. GMANE is very unreliable at presenting the discussion
threads in full. :| I saw it first on the Haskellwiki page though, and it
was your code there; that's the reason for my mistake. :)
[Haskell-cafe] Re: Why no merge and listDiff?
Derek Elkins derek.a.elkins at gmail.com writes:

> On Sun, Jan 17, 2010 at 2:22 PM, Will Ness will_n48 at yahoo.com wrote:
>> Hello cafe,
>>
>> I wonder, if we have List.insert and List.union, why no List.merge
>> (:: Ord a => [a] -> [a] -> [a]) and no List.minus ? These seem to be
>> pretty general operations.
>
> Presumably by List.minus you mean the (\\) function in Data.List.

No, it has to search its second list over and over from the start, to be
able to deal with unordered lists, so its performance can't be good.

> You probably also want to look at the package data-ordlist on hackage
> (http://hackage.haskell.org/packages/archive/data-ordlist/0.0.1/doc/html/Data-OrdList.html)
> which represents sets and bags as ordered lists and has all of the
> operations you mention.

I did, thanks again! Although, that package deals with non-decreasing
lists, i.e. lists possibly with multiples. As such, its operations
produce non-decreasing lists, i.e. possibly having multiples too.

I meant strictly increasing ordered lists, without multiples, for which
the two operations, 'merge' and 'minus', would also have to produce like
lists, i.e. strictly increasing, without multiples.

I guess the first variety is more appropriate for bags, and the second
one - for sets. The two would have to be de-conflated for that. (?)
[Haskell-cafe] Re: Why no merge and listDiff?
Christian Maeder Christian.Maeder at dfki.de writes:

> Will Ness wrote:
>> I meant strictly increasing ordered lists, without multiples, for which
>> the two operations, 'merge' and 'minus', would also have to produce
>> like lists, i.e. strictly increasing, without multiples.
>
> Why don't you use directly Data.Set?

It says it's based on size balanced trees? I initially wondered why there
are no such fundamental operations as merge and minus for _lists_ in the
standard libraries. Also, its to/from list conversions are O(n), so it
probably won't work for infinite lists.

I was told the trend is to move specifics to hackage packages, but I
wonder why shouldn't such fundamental operations be just included in the
standard Data.List?

>> I guess the first variety is more appropriate for bags, and the second
>> one - for sets. The two would have to be de-conflated for that. (?)
>
> There are also bags aka multisets:
> http://hackage.haskell.org/package/multiset

It too seems to be based on trees. Data.OrdList seems to be a good match,
except for its conflation of ascending/non-decreasing lists under the one
ordered category (i.e. the sets/bags distinction).
[Haskell-cafe] Why no merge and listDiff?
Hello cafe,

I wonder, if we have List.insert and List.union, why no List.merge
(:: Ord a => [a] -> [a] -> [a]) and no List.minus ? These seem to be
pretty general operations. A brief look into
haskell-prime-report/list.html reveals nothing. Could we please have
them?

On the wider perspective, is there a way to declare an /ordered/ list on
the type level (e.g. [1,2,3] would be one, but not [2,3,1])?
Non-decreasing lists? Cyclical, or of certain length? What are such types
called?

TIA!
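In lieu of an answer, the usual value-level approximation is an abstract
type whose exported constructor checks the invariant, so every value is
ordered by construction; this is only a sketch with made-up names, and
pushing the property into the type itself needs GADTs / type-level
naturals or a dependently typed language:

  newtype Ascending a = Ascending [a]   -- constructor kept unexported

  toAscending :: Ord a => [a] -> Maybe (Ascending a)
  toAscending xs
      | and (zipWith (<) xs (drop 1 xs)) = Just (Ascending xs)
      | otherwise                        = Nothing
  -- note: the check only terminates for finite lists

  fromAscending :: Ascending a -> [a]
  fromAscending (Ascending xs) = xs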
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Thursday 14 January 2010 08:25:48, Will Ness wrote:
>> Daniel Fischer daniel.is.fischer at web.de writes:
>>> On Wednesday 13 January 2010 10:43:42, Heinrich Apfelmus wrote:
>>>> I wonder whether it's really the liveness of pair in
>>>>
>>>>   mergeSP (a,b) pair =
>>>>     let sm = spMerge b (fst pair)
>>>>     in (a ++ fst sm, merge (snd sm) (snd pair))
>>>>
>>>> that is responsible for the space leak, ...
>>>
>>> I think that is responsible. At least that's how I understand the
>>> core:
>>>
>>>   mergeSP (a,b) ~(c,d) = (a ++ bc, merge b' d)
>>>     where
>>>       (bc, b') = spMerge b c
>>>       spMerge ...
>>
>> That is equivalent to
>>
>>   first (a++) . second (`merge`d) $ spMerge b c
>>
>> and Daniel's fix is equivalent to
>>
>>   first (a++) $ spMerge b c d

That should've been

  mergeSP (a,b) p = first (a++) . second (`merge` snd p) $ spMerge b (fst p)

and

  mergeSP (a,b) p = first (a++) $ spMerge b (fst p) (snd p)

> The code fragments you've posted are essentially
>
>   mergeSP (a,b) p =
>     let res = case p of (c,_) ->
>                 case spMerge b c of (# x,y #) -> (x,y)
>     in (# (++) a (case res of (bc,_) -> bc)
>         , case res of (_,b') ->
>             case p of (_,d) -> merge b' d #)
>
> and
>
>   mergeSP (a,b) p =
>     let res = case p of (c,d) ->
>                 case spMerge b c d of (# x,y #) -> (x,y)
>     in (# (++) a (case res of (bc,_) -> bc)
>         , case res of (_,b') -> b' #)

This looks like Haskell to me, with many details explicitly written out,
probably serving as immediate input to the compiler - not its output. So
it can't tell us much about how this is actually implemented on the lower
level. (?)

Your theory would certainly hold if the translation was done one-to-one
without any further code rearrangements. But it could've been further
re-arranged by the compiler at some later stage (is there one?) into an
equivalent of, e.g.

  mergeSP (a,b) p =
    let (bc,b') = case p of (c,_) ->
                    case spMerge b c of (x,y) -> (x,y)
    in (# (++) a bc
        , case p of (_,d) -> merge b' d #)

and further,

  mergeSP (a,b) p =
    let (c,d)   = case p of (x,y) -> (x,y)
        (bc,b') = case spMerge b c of (x,y) -> (x,y)
    in (# (++) a bc, merge b' d #)

could it? This would take hold of /d/ and /c/ at the same time, right?

What is that code that you've shown exactly? At which stage is it
produced, and is it subject to any further manipulation? I apologise if
these are obvious questions, I don't know anything about GHC. I also
don't know what (# x,y #) means.

One thing seems certain - we should not hold explicit references to the
same entities in different parts of our code, to avoid space leaks with
more confidence. To make code look as much tail-recursive as possible, so
to speak. Does that make sense?

Anyway, that was a very educational (for me) and fruitful discussion, and
I greatly appreciate your help, and fixing and improving of the code.

Thanks!
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Wednesday 13 January 2010 10:43:42, Heinrich Apfelmus wrote:
>> I wonder whether it's really the liveness of pair in
>>
>>   mergeSP (a,b) pair =
>>     let sm = spMerge b (fst pair)
>>     in (a ++ fst sm, merge (snd sm) (snd pair))
>>
>> that is responsible for the space leak, for chances are that Sparud's
>> technique applies and pair is properly disposed of. Rather, it could be
>> that we need the stronger property that forcing the second component
>> will evaluate the first to NF.
>
> I think that is responsible. At least that's how I understand the core:
>
>   mergeSP (a,b) ~(c,d) = (a ++ bc, merge b' d)
>     where
>       (bc, b') = spMerge b c
>       spMerge ...

That is equivalent to

  first (a++) . second (`merge`d) $ spMerge b c

and Daniel's fix is equivalent to

  first (a++) $ spMerge b c d

Now, when the compiler sees the first variant, it probably treats spMerge
as opaque. I.e. although in reality spMerge only contributes to the first
channel while it is progressively instantiated, and (`merge`d) will only
be called upon when spMerge's final clause is reached, that is (most
likely) not known to the compiler at this stage. When looking at just the
first expression itself, it has to assume that spMerge may contribute to
both channels (parts of a pair) while working, and so can't know _when_
/d/ will get called upon to contribute to the data, as it is consumed
finally at access. So /d/ is gotten hold of prematurely, _before_ going
into spMerge.

The second variant passes the responsibility for actually accessing its
inputs to spMerge itself, and _it_ is clear about needing /d/ only at the
very end.

Just a theory. :) Does that make sense?
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Friday 08 January 2010 19:45:47, Will Ness wrote:
>> Daniel Fischer daniel.is.fischer at web.de writes:
>>>   mergeSP :: Integral a => People a -> People a -> People a
>>>   mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
>>>     where
>>>       mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
>>>       spMerge l1 [] l3 = P [] (merge l1 l3)
>>>       spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
>>>           LT -> celebrate x (spMerge xs l2 l3)
>>>           EQ -> celebrate x (spMerge xs ys l3)
>>>           GT -> celebrate y (spMerge l1 ys l3)

Actually, the minimal edit that does the trick (of eliminating the space
leak that you've identified) for my original code is just

  mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
                         in (a ++ bc, bd)
    where
      spMerge b [] d = ([], merge b d)
      spMerge b@(x:xs) c@(y:ys) d = case compare x y of
          LT -> (x:u,v)  where (u,v) = spMerge xs c  d
          EQ -> (x:u,v)  where (u,v) = spMerge xs ys d
          GT -> (y:u,v)  where (u,v) = spMerge b  ys d
      spMerge [] c d = ([], merge c d)

which hardly looks at all different at first glance. Just for reference,
it was

  {-
  mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
                         in (a ++ bc, merge b' d)
    where
      spMerge b@(x:xs) c@(y:ys) = case compare x y of
          LT -> (x:u,v)  where (u,v) = spMerge xs c
          EQ -> (x:u,v)  where (u,v) = spMerge xs ys
          GT -> (y:u,v)  where (u,v) = spMerge b  ys
      spMerge b [] = ([], b)
      spMerge [] c = ([], c)
  -}

spMerge of course is not tail-recursive here in both versions, if seen
through imperative eyes. But lazy evaluation makes it effectively so. The
important thing is, when the final point is reached, there's no
outstanding context - everything is present. There should be a name for
such a concept. This is very similar to late instantiation in Prolog
(programming with holes), and I think this *would* pass as a
tail-recursive function /there/.

Even in the new code the compiler could've internally held on to the
original pair and only deconstructed the 'd' out of it at the final call
to merge, recreating the space leak. It could just as well have
recognized that 'd' isn't changed inside spMerge (we're pure in Haskell,
after all) and deconstructed the pair in the original code. Something is
missing here.

> As it turns out, the important things are
>
> 1. a feeder and separate lists of multiples for the feeder and the
>    runner, for the reasons detailed earlier (two-step feeding and larger
>    wheel are pleasing but minor optimisations).
> 2. a strict pattern in tfold
> 3. moving the merge inside spMerge

>> Is this the state of our _best_ Haskell compiler?
>
> Yes. It's still a "do what I tell you to" compiler, even if a pretty
> slick one, not a "do what I mean" compiler. Sometimes, what you tell the
> compiler isn't what you wanted. It's easier to predict when you give
> detailed step by step instructions.
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> The below code is now a well-behaved memory citizen (3MB for the 100
> millionth prime, about the same as the PQ code). It still is
> considerably slower than the PQ code.
>
> In terms of MUT times as reported by +RTS -sstderr - as well as (MUT+GC)
> times - (measured once for prime No. 5*10^5, 10^6, 2*10^6, 4*10^6, 10^7
> to get a rough tendency), it seems to scale a wee bit better than any of
> the other tfold versions I created, but a little worse than the PQ
> versions. The relation of MUT times isn't too bad, but the GC times are
> rather abysmal (30-40%).
>
>   ------------------------------------------------------------------
>   data People a = P { vips :: [a], dorks :: [a] }
>
>   celebrate x p = P (x:vips p) (dorks p)
>
>   mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
>     where
>       mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
>       spMerge l1 [] l3 = P [] (merge l1 l3)
>       spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
>           LT -> celebrate x (spMerge xs l2 l3)
>           EQ -> celebrate x (spMerge xs ys l3)
>           GT -> celebrate y (spMerge l1 ys l3)
>   ------------------------------------------------------------------

I forgot to say something *very* important. :) Here it is.

Yippee-hurray! :)
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Saturday 09 January 2010 08:04:20, Will Ness wrote:
>> Daniel Fischer daniel.is.fischer at web.de writes:
>>> On Friday 08 January 2010 19:45:47, Will Ness wrote:
>>>> Daniel Fischer daniel.is.fischer at web.de writes:
>>>>> It's not tail-recursive, the recursive call is inside a celebrate.
>>>>
>>>> It is (spMerge that is).
>>>
>>> No. "In computer science, tail recursion (or tail-end recursion) is a
>>> special case of recursion in which the last operation of the function,
>>> the tail call, is a recursive call."

As far as I understand it, when a function makes a tail call to a
tail-recursive function (be it itself or some other function) it is
itself tail-recursive. I.e. that call may be replaced with a direct jump,
with no new context to be created. That is what your version
accomplishes, too. Mine really held on to that pair ~(c,d) when it wanted
to call (merge _ d) _after_ the call to spMerge. By passing the
pre-decomposed part of a pair, 'd', into the process environment of
spMerge, you've made it tail-recursive - it carries along all the context
it needs. That's what eliminated the space leak, so I'd say tail
recursion does play a role under lazy evaluation - when a compiler isn't
smart enough to do _that_ for us by itself. _Were_ it reliably smart,
even non-recursive functions like my initial variant would work just as
well.

> The last operation of spMerge is a call to celebrate or the pair
> constructor (be that P or (,)). Doesn't matter, though, as for lazy
> languages, tail recursion isn't very important.

It calls the tail-recursive celebrate in a tail position. What you've
done is to eliminate the outstanding context, by moving it inward. Your
detailed explanation is more clear than that. :)

>> BTW when I run the VIP code it is consistently slower than using just
>> pairs, modified with wheel and feeder and all.
>
> I can't reproduce that. Ceteris paribus, I get the exact same allocation
> and GC figures whether I use People or (,), running times identical
> enough (difference between People and (,) is smaller than the difference
> between runs of the same; the difference between the fastest and the
> slowest run of the two is less than 0.5%). I think it must be the other
> changes you made.

I just take the VIP code as it is on the web page, and my initial variant
without the wheel, and compare. Then I add the wheel in the same fashion,
and then the feeder, and compare again. When I tested that Monoid
instance code I too compared it to the straight pairs, and it was slower.
Don't know why.

> So what's needed is to re-implement your approach for pairs:
>
>   mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
>                          in (a ++ bc, bd)
>     where
>       spMerge b [] d = ([], merge b d)
>       spMerge b@(x:xs) c@(y:ys) d = case compare x y of
>           LT -> consSP x $ spMerge xs c  d
>           EQ -> consSP x $ spMerge xs ys d
>           GT -> consSP y $ spMerge b  ys d
>
>       consSP x ~(bc,bd) = (x:bc,bd)   -- don't forget that magic `~` !!!

I called that (:).

>> BTW I'm able to eliminate sharing without a compiler switch by using
>
> Yes, I can too. But it's easy to make a false step and trigger sharing.

yes indeed. It seems unpredictable. Fortunately GHC couldn't tell that
(12-1) was 11, by the looks of it. :)

Your idea certainly seems right, that there ought to be some control over
sharing on a per-function basis somehow, without these ridiculous code
tricks.

> I can get a nice speedup (~15%, mostly due to much less garbage
> collecting) by doing the final merge in a function without unnecessarily
> wrapping the result in a pair

Will have to wrap my head around _that_.
But that would be fighting with the compiler again. I don't like that, I
much rather fight with the problem at hand. There shouldn't be any pairs
in the compiled code in the first place. They just guide the staging of
(++) and (merge), intertwined between the producer streams. At each
finite step, when the second part of a pair comes into play, it is only
after its first part was completely consumed.

I guess the next thing to try would be to actually create a data type
MergeNode and arrange _those_ in a tree, and see if that helps. That
would be the next half-step towards the PQ itself.

This uses a different folding structure again, which I am yet to
decipher. :)

>> How about them wheels? :)
>
> Well, what about them?

I dunno, it makes for a real easy wheel derivation, and coming out of our
discussion of Euler's sieve it's a nice cross-pollination. :) Having yet
another list representation suddenly cleared up the whole issue (two of
them). I'll repost it one last time, as I've made some corrections to it:

  euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
  primes = euler [2..]

  primes = euler $ listFrom [2] 1
         = 2:euler (listFrom [3] 1 `minus` map (2*) (listFrom [2] 1))
               --  listFrom [3,4] 2 `minus` listFrom [4] 2
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Thursday 07 January 2010 11:43:44, Heinrich Apfelmus wrote:
>> Hm? In my world view, there is only reduction to normal form and I
>> don't see how "allocate its own storage" fits in there. Okasaki having
>> shown something to that end would be new to me?
>
> Perhaps what was meant was "storage must be allocated for each filter"
> (which, however, seems trivial).

I still contend that in case of nested filters, where each filter has
only one consumer, even that isn't ultimately necessary (a chain of
pointers could be formed). That's because there are no peeks, only pulls
there. If the filters are used in a merge situation, then there will be
some peeks, so the current value must be somehow stored, though it's
better to be made explicit and thus static (the place for it, I mean,
instead of the rolling cons cell). Such an implementation technique would
prevent some extra gc.

Then under merging these split pairs form a monoid, so can be rearranged
in a tree. If you haven't seen it yet, it uses a different folding
structure to your code, with a lower total cost of multiples production
(estimated as Sum (1/p)*depth); correction:

  tfold f (x:y:z:xs) = (x `f` (y `f` z)) `f` tfold f (pairwise f xs)
  comps = tfold mergeSP $ pairwise mergeSP multips

> The idea being that each prime produces 1/p composites each turn and it
> takes time depth to trickle it to the top? Sounds reasonable, but
> remember that a prime p does not start to generate composites until the
> turn count has reached p^2, so it might occupy a place of low depth in
> the tree that is better spent on the previous primes.

That might be why Daniel's structure is better: it plunges down faster
than mine. The treefold structure was:

  (2+4) + ( (4+8) + ( (8+16) + ( (16+32) + ( (32+64) + ...
  dpths:  3 4   4 5    5 6      6 7         7 8

Daniel's:

  (2+(4+6)) + ( (8+(10+12)) + ( (14+(16+18)) + ( (20+(22+24)) + ... ))
  3 5 5.4       6 7.8 7.9       8 9 9.5 9.6      10.7 10.8

>   primes () = 2:3:5:7:11:13:primes'
>     where
>       primes'   = roll 17 wheel13 `minus` compos primes'''
>       primes''  = 17:19:23:29:31:rollFrom 37 `minus` compos primes''
>       primes''' = 17:19:23:29:31:37:rollFrom 41 `minus` compos primes''
>
> Haven't read through the whole thing yet. :)

:)

I thought there was a typo there. There isn't. BTW using the no-share
switch isn't necessary if we just write it down twice with some
variations, like

  primes''' = let (h,t) = span (< 17^2) $ roll 17 wheel13
              in h ++ t `minus` compos primes''

As we've found out, compilers aren't _that_ smart yet.
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: The below code is now a well-behaved memory citizen (3MB for the 100 millionth prime, about the same as the PQ code). It still is considerably slower than the PQ code. In terms of MUT times as reported by +RTS -sstderr - as well as (MUT+GC) times - (measured once for prime No. 5*10^5, 10^6, 2*10^6, 4*10^6, 10^7 to get a rough tendency), it seems to scale a wee bit better than any of the other tfold versions I created, but a little worse than the PQ versions. The relation of MUT times isn't too bad, but the GC times are rather abysmal (30-40%). -- data People a = P { vips :: [a], dorks :: [a] } celebrate :: a - People a - People a celebrate x p = P (x:vips p) (dorks p) primes :: forall a. Integral a = () - [a] primes () = 2:3:5:7:11:13:primes' where primes' = roll 17 wheel13 `minus` compos primes''' primes'' = 17:19:23:29:31:rollFrom 37 `minus` compos primes'' primes''' = 17:19:23:29:31:37:rollFrom 41 `minus` compos primes'' pmults :: a - People a pmults p = case map (*p) (rollFrom p) of (x:xs) - P [x] xs multip :: [a] - [People a] multip ps = map pmults ps compos :: [a] - [a] compos = vips . smartfold mergeSP . multip smartfold f = tfold f . pairwise f tfold f (a:b:c:xs) = (a `f` (b `f` c)) `f` smartfold f xs pairwise f (x:y:ys) = f x y : pairwise f ys mergeSP :: Integral a = People a - People a - People a mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd) where mrgd = spMerge (dorks p1) (vips p2) (dorks p2) spMerge l1 [] l3 = P [] (merge l1 l3) spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of LT - celebrate x (spMerge xs l2 l3) EQ - celebrate x (spMerge xs ys l3) GT - celebrate y (spMerge l1 ys l3) -- Hi Daniel, Is it so that you went back to my fold structure? Was it better for really big numbers of primes? I had the following for ages (well, at least two weeks) but I thought it was slower and took more memory (that was _before_ the 'no-share' and 'feeder' stuff). I can see the only difference in that you've re-written spMerge in a tail-recursive style with explicitly deconstructed parts; mine was relying on the compiler (8-L) to de-couple the two pipes and recognize that the second just passes along the final result, unchanged. The two versions seem to me to be _exactly_ operationally equivalent. All this searching for the code better understood by the compiler is _*very*_ frustrating, as it doesn't reflect on the semantics of the code, or even the operational semantics of the code. :-[ Weren't the P's supposed to disappear completely in the compiled code? Aren't types just a _behavioral_ definitions??? Aren't we supposed to be able to substitute equals for equals dammit?? 
Is this the state of our _best_ Haskell compiler module Primes8 where import Data.Monoid data (Ord a) = SplitList a = P [a] [a] instance (Ord a) = Monoid (SplitList a) where mempty = P [] [] -- {x | x::SplitList a} form a monoid under mappend mappend (P a b) ~(P c d) = let P bc b' = spMerge b c in P (a ++ bc) (merge b' d) where spMerge :: (Ord a) = [a] - [a] - SplitList a spMerge u@(x:xs) w@(y:ys) = case compare x y of LT - P (x:c) d where (P c d) = spMerge xs w EQ - P (x:c) d where (P c d) = spMerge xs ys GT - P (y:c) d where (P c d) = spMerge u ys spMerge u [] = P [] u spMerge [] w = P [] w mconcat ms = fold mappend (pairwise mappend ms) where fold f (a: ~(b: ~(c:xs))) = (a `f` (b `f` c)) `f` fold f (pairwise f xs) pairwise f (x:y:ys) = f x y:pairwise f ys primes :: Integral a = () - [a] primes () = 2:3:5:7:primes' where primes'= [11,13] ++ drop 2 (rollFrom 11) `minus` comps mults = map (\p- P [p*p] [p*n | n- tail $ rollFrom p]) $ primes' P comps _ = mconcat mults ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

>   roll  = scanl (+)
>   wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
>           4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
>
>   wheel11 = res
>     where
>       snms  = scanl (+) 11 (take 47 wheel)
>       nums  = tail $ scanl (+) 11 (take 529 wheel)
>       cops  = nums `minus` map (*11) snms
>       diffs = zipWith (-) (tail cops) cops
>       res   = foldr (:) res diffs
>
>   wheel13 = res
>     where
>       snms  = take 480 $ scanl (+) 13 wheel11
>       nums  = take (480*13+1) . tail $ scanl (+) 13 wheel11
>       cops  = nums `minus` map (*13) snms
>       diffs = zipWith (-) (tail cops) cops
>       res   = foldr (:) res diffs

BTW have you seen my take on the faithful Euler's sieve? It shows another
way to look at the wheels, which for me was really the first time I
understood what's going on there. It also makes for easier wheel
extension (IMO):

  euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
  primes = euler [2..]

The essence of Euler's sieve is the wheel: after each step we're left
with lists of non-multiples of the preceding primes:

  primes = euler $ listFrom [2] 1
         = 2:euler (listFrom [3] 1 `minus` map (2*) (listFrom [2] 1))
               --  listFrom [3,4] 2 `minus` listFrom [4] 2
               --  == listFrom [3] 2
         = 2:3:euler (listFrom [5] 2 `minus` map (3*) (listFrom [3] 2))
               --  listFrom [5,7,9] 6 `minus` listFrom [9] 6
               --  == listFrom [5,7] 6
         = 2:3:5:euler (listFrom [7,11] 6 `minus` listFrom [25,35] 30)
               --  [7,11, 13,17, 19,23, 25,29, 31,35] 30
               --  == listFrom [7,11,13,17,19,23,29,31] 30
         = ....
    where
      listFrom xs by = concat $ iterate (map (+ by)) xs

so

  -- startRoll = ([2],1)

  nextRoll r@(xs@(x:xt), b) = ((x,r'), r')
    where
      ys@(y:_) = xt ++ [x + b]
      r'  = (xs', b')
      b'  = x*b
      xs' = takeWhile (< y + b') (listFrom ys b) `minus` map (x*) xs

  rolls = unfoldr (Just . nextRoll) ([2],1)

  nthWheel n = let (ps, rs)  = unzip $ take n rolls
                   (x:xs, b) = last rs
               in ((ps, x), zipWith (-) (xs ++ [x+b]) (x:xs))

  {-
  *Main> mapM print $ take 4 rolls
  (2,([3],2))
  (3,([5,7],6))
  (5,([7,11,13,17,19,23,29,31],30))
  (7,([11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,
       101,103,107,109,113,121,127,131,137,139,143,149,151,157,163,167,
       169,173,179,181,187,191,193,197,199,209,211],210))

  *Main> nthWheel 3
  (([2,3,5],7),[4,2,4,2,4,6,2,6])

  *Main> nthWheel 4
  (([2,3,5,7],11),[2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,
                   4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10])
  -}

Coincidentally, the function specified by

  eulerPrimes n = let (ps, rs)      = unzip $ take n rolls
                      (qs@(q:_), b) = last rs
                  in ps ++ takeWhile (< q^2) qs

can be used to write the specialized nthEulerPrime etc., whose complexity
seems to be about O(n^1.5).

Maybe this reinvents some spiraling wheels or somethn'. :)
[Haskell-cafe] Re: FASTER primes
Will Ness will_n48 at yahoo.com writes:

> That might be why Daniel's structure is better: it plunges down faster
> than mine. The treefold structure was:
>
>   (2+4) + ( (4+8) + ( (8+16) + ( (16+32) + ( (32+64) + ...
>   dpths:  3 4   4 5    5 6      6 7         7 8

this should of course have been

  dpths:  3 4   5 6    7 8      9 10        11 12

> Daniel's:
>
>   (2+(4+6)) + ( (8+(10+12)) + ( (14+(16+18)) + ( (20+(22+24)) + ... ))
>   3 5 5.4       6 7.8 7.9       8 9 9.5 9.6      10.7 10.8

hmm. :|
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Friday 08 January 2010 19:45:47, Will Ness wrote:
>> Daniel Fischer daniel.is.fischer at web.de writes:
>>> It's not tail-recursive, the recursive call is inside a celebrate.
>>
>> It is (spMerge that is). It calls the tail-recursive celebrate in a
>> tail position. What you've done is to eliminate the outstanding
>> context, by moving it inward. Your detailed explanation is more clear
>> than that. :)
>>
>> BTW when I run the VIP code it is consistently slower than using just
>> pairs, modified with wheel and feeder and all.
>
> So what's needed is to re-implement your approach for pairs:
>
>   mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
>                          in (a ++ bc, bd)
>     where
>       spMerge u [] d = ([], merge u d)
>       spMerge u@(x:xs) w@(y:ys) d = case compare x y of
>           LT -> consSP x $ spMerge xs w  d
>           EQ -> consSP x $ spMerge xs ys d
>           GT -> consSP y $ spMerge u  ys d
>
>       consSP x ~(a,b) = (x:a,b)   -- don't forget that magic `~` !!!

BTW I'm able to eliminate sharing without a compiler switch by using

  mtwprimes () = 2:3:5:7:primes
    where primes = doPrimes 121 primes

  doPrimes n prs = let (h,t) = span (< n) $ rollFrom 11
                   in h ++ t `diff` comps prs
  doPrimes2 n prs = let (h,t) = span (< n) $ rollFrom (12-1)
                    in h ++ t `diff` comps prs

  mtw2primes () = 2:3:5:7:primes
    where
      primes  = doPrimes 26 primes2
      primes2 = doPrimes2 121 primes2

Using 'splitAt 26' in place of 'span (< 121)' didn't work though.

How about them wheels? :)

> Yes. It's still a "do what I tell you to" compiler, even if a pretty
> slick one, not a "do what I mean" compiler. Sometimes, what you tell the
> compiler isn't what you wanted. It's easier to predict when you give
> detailed step by step instructions.
[Haskell-cafe] Re: FASTER primes
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

> Will Ness wrote:
>> But I get the impression that GHC isn't working through equational
>> reasoning?.. I see all this talk about thunks etc.
>
> Sure it does. Concerning the thunks, they're part of the implementation
> of the reduction model (call-by-need aka lazy evaluation).

At run-time? I meant to eliminate as much calculation as possible,
pre-run-time. I would expect the best efforts of the best minds to go
into searching for ways to eliminate computations altogether, instead of
ways to perform them better.

> Concerning the sieves, there is a fundamental difference between the
> imperative sieve and the functional sieves, regardless of whether the
> latter start at p or p^2 or use a priority queue. [...]

We can directly jump to the next multiple too, it is called (+). :) :)

But seriously, the real issue is that we have to merge the produced
streams of multiples, while the mutable-storage code works on the same
one, so its merging cost is zero.

> Not quite, I think there are two things going on:
>
> 1. In contrast to the standard imperative version, we are implementing
>    an on-line algorithm that produces arbitrarily many primes on demand.
>    Any imperative on-line version will need to think about storage for
>    pending prime filters as well.

True.

> 2. Even the imperative algorithm can be interpreted as merging arrays,
>    just that the composites are magically merged at the right place (at
>    distance p from each other) because arrays offer O(1) jumps.

i.e. with a merging cost of zero. :)

> In contrast, the functional version needs O(log something) time to
> calculate where to put the next composite.

when thinking in terms of the finally produced sequence, yes. We have to
produce numbers one by one and take care of their ordering ourselves;
_they_ just /throw/ the numbers at the shared canvas and let _it_ take
care of ordering them automagically, _later_, on the final sweep through.
ISWYM.

>> If you could take a look at the tree-merging primes and why it uses too
>> much memory, it would be great.
>
> Fortunately, Daniel already took a detailed look. :)

Yes, he really came through! He finally found, and fixed, the space leak.
It was hiding in

  mergeSP_BAD (a,b) ~(c,d) = let (bc,b') = spMerge b c
                             in (a ++ bc, merge b' d)
    where
      spMerge :: (Ord a) => [a] -> [a] -> ([a],[a])
      spMerge a@(x:xs) b@(y:ys) = case compare x y of
          LT -> (x:c,d)  where (c,d) = spMerge xs b
          EQ -> (x:c,d)  where (c,d) = spMerge xs ys
          GT -> (y:c,d)  where (c,d) = spMerge a  ys
      spMerge a [] = ([], a)
      spMerge [] b = ([], b)

which really ought to have been

  mergeSP (a,b) ~(c,d) = let ~(bc,bd) = spMerge b c d
                         in (a ++ bc, bd)
    where
      spMerge u [] d = ([], merge u d)
      spMerge u@(x:xs) w@(y:ys) d = case compare x y of
          LT -> spCons x $ spMerge xs w  d
          EQ -> spCons x $ spMerge xs ys d
          GT -> spCons y $ spMerge u  ys d

      spCons x ~(a,b) = (x:a, b)

Can you spot the difference? :) :)

> Aww, why remove the cutesy name? The VIPs will be angry for being
> ignored!

It runs faster on plain pairs, and on less memory, equals for equals. For
some reason. :)
[Haskell-cafe] Re: FASTER primes
Will Ness will_n48 at yahoo.com writes:

> Daniel Fischer daniel.is.fischer at web.de writes:
>> On Tuesday 05 January 2010 14:49:58, Will Ness wrote:
>>>   euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
>>>   primes = 2:euler [3,5..]
>
> Re-write:
>
>   primes = euler $ rollFrom [2] 1
>          = 2:euler (rollFrom [3] 1 `minus` map (2*) (rollFrom [2] 1))
>                --  rollFrom [3,4] 2 `minus` rollFrom [4] 2
>                --  == rollFrom [3] 2
>          = 2:3:euler (rollFrom [5] 2 `minus` map (3*) (rollFrom [3] 2))
>                --  rollFrom [5,7,9] 6 `minus` rollFrom [9] 6
>                --  == rollFrom [5,7] 6
>          = 2:3:5:euler (rollFrom [7,11] 6 `minus` rollFrom [25,35] 30)
>                --  [7,11, 13,17, 19,23, 25,29, 31,35] 30
>                --  == rollFrom [7,11,13,17,19,23,29,31] 30
>          = ....

correction:

    where
      rollOnce (x:xs) by = (x, xs ++ [x+by])
      rollFrom xs by     = concat $ iterate (map (+ by)) xs
      multRoll xs@(x:_) by p = takeWhile (< x + p*by) $ rollFrom xs by

so, reifying, we get

  data Roll a = Roll [a] a

  rollOnce (Roll (x:xs) by) = (x, Roll (xs ++ [x+by]) by)
  rollFrom (Roll xs by)     = concat $ iterate (map (+ by)) xs
  multRoll r@(Roll (x:_) by) p =
      Roll (takeWhile (< x + p*by) $ rollFrom r) (by*p)

  primes = euler $ Roll [2] 1
  euler r@(Roll xs _) = x : euler (Roll (mxs `minus` map (x*) xs) mby)
    where
      (x, r')      = rollOnce r
      Roll mxs mby = multRoll r' x

There's much extra primes pre-calculated inside the Roll, of course. For
any (Roll xs@(x:_) _), (takeWhile (< x*x) xs) are all primes too. When
these are used, the code's complexity is around O(n^1.5), and it runs
about 1.8x slower than Postponed Filters.

The faithful sieve's empirical complexity is above 2.10..2.25 and rising.
So it might not be exponential, but it seems to be worse than any fixed
power anyway.
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Wednesday 06 January 2010 00:09:07, Will Ness wrote:
>> Daniel Fischer daniel.is.fischer at web.de writes:
>>> On Monday 04 January 2010 22:25:28, Daniel Fischer wrote:
>
> Fix rfold:
>
>   rfold f [x] = x
>   rfold f xs  = rfold f (pairwise f xs)
>
> and it's faster also for those.
>
> The memory is almost completely due to the tree-merging of the multiples
> for the fastest runner. While it produces faster than flat merging, the
> exponential growth of the trees makes for a bad memory citizen.

Isn't the number of nodes the same in any two trees with the same number
of leaves?

BTW using

  compos ps = fst $ tfold mergeSP $ nwise 1 mergeSP $ map pmults ps

instead of

  compos ps = fst $ tfold mergeSP $ nwise 1 mergeSP
                  $ pairwise mergeSP $ map pmults ps

brings down memory consumption further by 25%..45% on 1..8 mln primes
produced, while slowing it down by about 0%..2% (that's after eliminating
the lazy pattern in tfold as per your advice).

>> 'pairwise' puts odd leafs higher on the right. It might be better if it
>> was so on the left, for the frequency of production is higher.
>
> Maybe. But how would you do it? I tried passing the length to rfold, so
> when there was an odd number of trees in the list, it would move the
> first out of the recursion. Any possible gains in production have been
> more than eaten up by the control code (not a big difference, but it was
> there).

yes, I've seen this too now. BTW, at a price of further slowing down,
memory can be lowered yet more with

  compos ps = fst $ tfold mergeSP $ nwise 1 0.4 mergeSP $ map pmults ps

  nwise k d f xs = let (ys,zs) = splitAt (round k) xs
                   in rfold f ys : nwise (k+d) d f zs

It really looks like the nearer the structure is to a linear list, the
lower the memory consumption becomes. Of course, using 0.0 in place of
0.4 would make it into a plain list.
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Monday 04 January 2010 16:30:18, Will Ness wrote:
>> For me, a real smart compiler is one that would take in e.g.
>> (sum $ take n $ cycle $ [1..m]) and spill out a straight up math
>> formula, inside a few ifs maybe (just an aside).
>
> Such things just aren't common enough. If you start letting the compiler
> look for patterns such as this which can be transformed into a simple
> formula, compile times would explode.

I was thinking more along the lines of an inferencing compiler, proving
new theorems about the types and applying that knowledge in simplifying
the expressions. This would take time, so it should be a part of some
interactive system, maybe kind of like Lisp has. In such a setting, the
underlying compiler could first produce a quick-n-dirty version, and
would continue working in the background whenever the system is not busy,
trying to improve the executable.

Such a system would probably have to distinguish, at the type level,
between [1..m] ; cycle [1..m] ; take n [1..m] ; etc. These would all be
not just functions, but parts of a type's (here, list) behaviour, with
automatically deduced semantics. What would such a type system be called?

> The -fno-cse turns off Common Subexpression Elimination (rather, sharing
> than elimination). That is, if you have
>
>   f x = g (expensive x) y z (h (expensive x))
>
> the compiler can see the common subexpression (expensive x) on the RHS
> and decide to share it, i.e. calculate it only once:
>
>   f x = let e = expensive x in g e y z (h e)

thanks for the in-depth explanation! :)

> Now if you have a list producer and a consumer, without fusion, it goes
> like
>
>   Consumer: Gimme
>   Producer: creates cons cell, puts value, next cons cell
>             (how to produce more)
>   Consumer: takes value, does something with it, gimme more.
>
> Stream fusion is about eliminating the cons cell creating and value
> passing, to fuse production and consumption into a nice loop. That is of
> course impossible if the produced list is shared between two (or more)
> consumers.

I would imagine so. Do I get this fusion on lists for free from the
compiler, or do I have to recode for that? (I haven't yet had time to
look into the article mentioned.)
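For what it's worth, the closed form such a compiler would have to
discover is simple enough to state by hand (a sketch, with a made-up
name): take n (cycle [1..m]) is q whole blocks of [1..m] plus a partial
block [1..r], where (q,r) = n `divMod` m:

  sumTakeCycle :: Integer -> Integer -> Integer
  sumTakeCycle n m = q * tri m + tri r
    where
      (q, r) = n `divMod` m
      tri k  = k * (k + 1) `div` 2   -- triangular number: sum [1..k]

  -- sanity check:
  -- sumTakeCycle 10 3 == sum (take 10 (cycle [1..3]))  ==  19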
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Monday 04 January 2010 19:16:32, Will Ness wrote:
>> Daniel Fischer daniel.is.fischer at web.de writes:
>>> On Monday 04 January 2010 13:25:47, Will Ness wrote:
>>>> Euler's sieve is
>>>>
>>>>   sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
>>>>     where (h,t) = span (< p*p) xs
>>>
>>> Not quite. That's a variant of the postponed filters, it crosses off
>>> e.g. 45 twice, once as 3*15 and once as 5*9 (okay, it has already been
>>> removed by the first, so let's say 45 appears in two lists of numbers
>>> to be removed if present).
>>
>> there won't be any such. whatever is removed first, is just skipped
>> second (and third, etc).
>>
>>   ((45:(offer 47 when demanded)) `minus` (45:(next will be 51 when
>>     demanded))) `minus` (45:(next will be 55 when demanded))
>
> So there are two attempts to tell the generator to not output 45. To the
> second, it answers "I already knew that", but the request is made
> nevertheless.

yes, of course.

> ... There are two attempts to eliminate 45.

I would say there are two requests to not have 45 in the output. I don't
see any problem here. As Melissa (and yourself, I think) have shown,
double hits on multiples are few and far between.

> It's not a problem, it just means it's not Euler's sieve, because that
> attempts to eliminate each composite exactly once.

yes, I see now. My bad. Wasn't reading that wikipedia page with enough
attention to detail. It uses the modified (culled, so far) numbers to
find out the next multiples to be eliminated, not just the prime itself
like my code does.

> Your solution is indeed exponential:
>
>   euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
>   primes = 2:euler [3,5..]
>
>   primes = 2:euler (as@[3,5..])
>              3:euler (bs@(tail as `minus` map (3*) as))
>                5:euler (cs@(tail bs `minus` map (5*) bs))
>
> There are two separate look-back pointers to /as/ in /bs/, and there are
> two such in /cs/, to /bs/. The book-keeping machinery explodes.

>> Also, it uses no filters, i.e. no individual number divisibility
>> testing.
>
> The "filters" refers first of all to testing an individual number to
> decide whether to keep it in or not. Umm, the postponed filters says
> "keep everything until p^2, then eliminate (filter out, remove) the
> multiples of p in the remainder; after that, pick the next prime". That
> is precisely what the above does. It doesn't do the filtering out by
> divisibility testing but by minus (hence more efficiently). I would say
> that's a variant of the postponed filters.

"Filter" usually (as in Haskell's 'filter') is about testing individual
elements by a predicate function. There is of course a filtering effect
in the two lists' element comparison that 'minus' performs, so that's
debatable. Even the PQ code performs filtering in this wider sense.

> Euler's sieve is never attempting to remove a number more than once,
> that's ...

How's that possible?

http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Euler.27s_Sieve

"C) The number after the previous prime is also a prime. *Multiply each
number /that's left/ in the list starting from this prime by this prime
and discard the products*."

yes. Wasn't paying attention to that, more to the intent of it. There's
of course enormous vagueness in how exactly it is to be performed, in the
unbounded case, which you uncovered here. It can't have foresight, right?

> But it has :) By only trying to eliminate products of the prime p
> currently under consideration *with numbers (>= p) /which have not yet
> been eliminated/ from the list*, it is known in advance that all these
> products are still in the list.

missed that.
> When p is under consideration, the list contains (apart from the primes
> < p) precisely the numbers whose smallest prime factor is >= p.

the outstanding feature of it.

> Unfortunately, that also makes it hard to implement efficiently. The
> problem is that you can't forget the primes between p and p^2, you must
> remember them for the removal of multiples of p later on.

not just primes - all the composites not yet removed, too. So it can't
even be implemented on shared mutable storage if we want to delay the
removal (as we must, in the unbounded case) - the composites will get
removed, so their multiples must actually all be produced first!

The more pressing problem with that code is its linear structure, of
course, which gets addressed by the tree-folding merge etc. Which
unfortunately introduces its own space problem. :(

> Try a different minus:
>
>   xxs@(x:xs) `minus` yys@(y:ys) = case compare x y of
>       LT -> x : xs `minus` yys
>       EQ -> xs `minus` ys
>       GT -> error ("trying to remove " ++ show y ++ " a second time")

> Your code is not. It is, however, much faster.

I understand now. Thanks! Its performance is really horrible though.
Exponential, empirically as well. It just occurred to me that the
accumulation list is just [...]
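That diagnostic minus makes for a quick test harness (a sketch of my own;
the names minusStrict, eulerPrimes and badPrimes are invented): Euler's
sieve never trips the error, while the span-based postponed-filters
variant from the start of this exchange does, at 45:

  minusStrict :: (Ord a, Show a) => [a] -> [a] -> [a]
  minusStrict xxs@(x:xs) yys@(y:ys) = case compare x y of
      LT -> x : minusStrict xs yys
      EQ ->     minusStrict xs ys
      GT -> error ("trying to remove " ++ show y ++ " a second time")
  minusStrict xs _ = xs

  -- each composite is scheduled for removal exactly once: no error, ever
  eulerPrimes :: [Integer]
  eulerPrimes = 2 : euler [3,5..]
    where euler ks@(p:rs) = p : euler (rs `minusStrict` map (*p) ks)

  -- 45 gets scheduled twice (as 3*15 and as 5*9): errors past 43
  badPrimes :: [Integer]
  badPrimes = 2 : 3 : sieve (tail badPrimes) [5,7..]
    where
      sieve (p:ps) xs = h ++ sieve ps (t `minusStrict` map (p*) [p,p+2..])
        where (h,t) = span (< p*p) xs

  -- take 20 eulerPrimes  --  fine
  -- take 20 badPrimes    --  *** Exception: trying to remove 45 a second time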
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> On Tuesday 05 January 2010 14:49:58, Will Ness wrote:
>>> ... There are two attempts to eliminate 45.
>>
>> I would say there are two requests to not have 45 in the output.
>
> There are many possible ways to phrase it.
>
>> Your solution is indeed exponential:
>>
>>   euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
>>   primes = 2:euler [3,5..]
>>
>>   primes = 2:euler (as@[3,5..])
>>              3:euler (bs@(tail as `minus` map (3*) as))
>>                5:euler (cs@(tail bs `minus` map (5*) bs))

Re-write:

  euler s = head s : euler (tail s `minus` map (head s *) s)
  primes  = euler [2..]

  primes = euler $ rollFrom [2] 1
         = 2:euler (rollFrom [3] 1 `minus` map (2*) (rollFrom [2] 1))
               --  rollFrom [3,4] 2 `minus` rollFrom [4] 2
               --  == rollFrom [3] 2
         = 2:3:euler (rollFrom [5] 2 `minus` map (3*) (rollFrom [3] 2))
               --  rollFrom [5,7,9] 6 `minus` rollFrom [9] 6
               --  == rollFrom [5,7] 6
         = 2:3:5:euler (rollFrom [7,11] 6 `minus` rollFrom [25,35] 30)
               --  [7,11, 13,17, 19,23, 25,29, 31,35] 30
               --  == rollFrom [7,11,13,17,19,23,29,31] 30
         = ....
    where
      rollOnce (x:xs) by = (x, tail xs ++ [x+by])
      rollFrom xs by     = concat $ iterate (map (+ by)) xs
      multRoll xs@(x:_) by p = takeWhile (< x + p*by) $ rollFrom xs by

so, reifying, we get

  data Roll a = Roll [a] a

  rollOnce (Roll (x:xs) by) = (x, Roll (xs ++ [x+by]) by)
  rollFrom (Roll xs by)     = concat $ iterate (map (+ by)) xs
  multRoll r@(Roll (x:_) by) p =
      Roll (takeWhile (< x + p*by) $ rollFrom r) (by*p)

  primes = euler $ Roll [2] 1
  euler r@(Roll xs _) = x : euler (Roll (mxs `minus` map (x*) xs) mby)
    where
      (x, r')      = rollOnce r
      Roll mxs mby = multRoll r' x

There's much extra primes pre-calculated inside the Roll, of course (up
to p^2 in fact, for p = primes!!n), so this needs to be somehow tapped
into, by writing a specialized

  nthPrime n   = ...

to be called instead of (!! n), and also

  primesUpTo n = ...

This calculated primes !! 1000 == 7927 in 3 seconds for me, interpreted,
on my old slowish laptop. It's supposed to have all the primes up to
7927^2 = 62837329 inside the Roll (if I'm not mistaken - or is it?).
That's about the 3.72 millionth prime, according to WolframAlpha. (nah,
it can't be that much). But it is surely not horribly slow. Is this, in
fact, the wheels' spiral?

>> not just primes - all the composites not yet removed, too.
>
> Between p and p^2, there are only primes left, fortunately.

but (map (*p) ks) needs access to the original, non-culled numbers -
primes, composites and all. (?)

>> So it can't even be implemented on shared mutable storage if we want to
>> delay the removal (as we must, in the unbounded case) -
>
> Yes. And it's not nice in the bounded case either.
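One possible completion of that primesUpTo (a sketch of my own, reusing
the reified Roll machinery verbatim): once p*p exceeds n, every element
still in the coprime stream up to n is prime, so the Roll can be dumped
wholesale:

  data Roll a = Roll [a] a

  rollOnce (Roll (x:xs) by) = (x, Roll (xs ++ [x+by]) by)
  rollFrom (Roll xs by)     = concat $ iterate (map (+ by)) xs
  multRoll r@(Roll (x:_) by) p =
      Roll (takeWhile (< x + p*by) $ rollFrom r) (by*p)

  minus xs@(x:xt) ys@(y:yt) = case compare x y of
      LT -> x : minus xt ys
      EQ ->     minus xt yt
      GT ->     minus xs yt
  minus xs _ = xs

  primesUpTo :: Integer -> [Integer]
  primesUpTo n = go (Roll [2] 1)
    where
      go r@(Roll xs _)
          | p*p > n   = takeWhile (<= n) (rollFrom r)  -- all prime below p^2
          | otherwise = p : go (Roll (mxs `minus` map (p*) xs) mby)
        where
          (p, r')      = rollOnce r
          Roll mxs mby = multRoll r' p

  -- primesUpTo 30 == [2,3,5,7,11,13,17,19,23,29]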
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes:

> So we must make sure that the list of composites that primes' consumes
> is not the same as that which primes'' consumes.

yes, that is what I had done too. Duplicated everything. Turns out, it
works exactly as you told it would when using the compiler switch,
-fno-cse, thanks!

>> I used the switch; it didn't help at all.
>
> The only thing I can see is wrong.

I didn't. When I did, it worked. Unfortunately it grows, as you've said -
23MB for 2 mln. :| And I've found out why. Change the definition of tfold
to

  tfold f (a: ~(b: ~(c:xs))) = (a `f` (b `f` c)) `f` tfold f xs

and memory stays low (things are going much slower, though).

(forced by gmane poster to delete unusually many of your comments
today...)

As for the structure, I chose it trying to minimize the estimated average
cost of a composite production, Sum (1/p)*depth.

> You can make a compromise by using the above tfold (which is no longer a
> tree-fold) and grouping (and merging) the multiples in a slower-growing
> manner, ... memory still grows, but much slower; in my tests, due to the
> much smaller GC time, it's a bit faster than the version with the
> original tfold.

Great! :)
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Montag 04 Januar 2010 22:25:28 schrieb Daniel Fischer: memory still grows, but much slower; in my tests, due to the much smaller GC time, it's a bit faster than the version with the original tfold. Not for larger inputs (but not so large that the tree-fold dies OOM). Fix rfold:

    rfold f [x] = x
    rfold f xs  = rfold f (pairwise f xs)

and it's faster also for those. Niiice. This is just great! :) I tried a two-step feed BTW (that's three separate sets of lists), with the original structure. It ran with the same speed as your new version (10..20% faster) but with the memory of the previous one :) (80M for 8 mil primes vs the new one's 10M). But your new structure is just great! I hoped there was something better, that's why I posted it here in the first place. 'pairwise' puts odd leaves higher on the right. It might be better if it were so on the left, for the frequency of production is higher. Thanks a lot for your comments! ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
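For completeness, the `pairwise` that rfold relies on, with a base case added so the fold can actually bottom out (as written, rfold terminates only on a finite list of streams, so it fits the bounded or staged organization being discussed):

    pairwise :: (a -> a -> a) -> [a] -> [a]
    pairwise f (x:y:ys) = f x y : pairwise f ys
    pairwise _ ys       = ys   -- base case added; the in-thread version handles infinite lists only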
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Sonntag 03 Januar 2010 09:54:37 schrieb Will Ness: Daniel Fischer daniel.is.fischer at web.de writes: But there's a lot of list construction and deconstruction necessary for the Euler sieve. yes. Unless, of course, a smart compiler recognizes there's only one consumer for the values each multiples-list is producing, and keeps it stripped down to a generator function, and its current value. I keep thinking a smart compiler could eliminate all these span calls and replace them with just some pointer manipulation... Of course I'm no smart compiler, but I don't see how it could be even possible to replace the span calls with pointer manipulation when dealing with lazily generated (infinite, if we're really mean) lists. Even when you're dealing only with strict finite lists, it's not trivial to do efficiently. I keep thinking that storage duplication with span, filter etc. is not really necessary, and can be replaced with some pointer chasing - especially when there's only one consumer present for the generated values. What I mean is thinking of lists in terms of a producer/consumer paradigm, as objects supporting the { pull, peek } interface, keeping the generator inside that would produce the next value on 'pull' request and keep it ready for any 'peek's. Euler's sieve is

    sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
      where (h,t) = span (< p*p) xs

Everything lives only through access, so (sieve (tail primes) [5,7..]) would create an object with the generator which has the 'span' logic inlined:

    sieve ps xs = make a producer such that
      p := pull ps   -- alters ps as well (actually pulls a value from it)
      q := p*p
      peek   = x := peek xs
               if x < q then x else peek (remake self)
      pull   = x := peek xs
               if x < q then pull xs else pull (remake self)
      remake = ys   := minus xs (intsFromBy q (2*p))
               self := sieve ps ys

Here the only things that get created are the 'minus' nodes, which essentially maintain pointers into the two streams that they consume. 'intsFromBy' only has to maintain two integers inside it (currentVal and step), as there's no need for it to maintain any storage for its results, as they are immediately consumed. A persistent list would be represented by a different kind of producer, which would be given storage to operate on upon creation (as would a top-level variable like 'primes'). The real difference here is between those producers whose values will only be consumed once, by one specific consumer, and those whose values may be needed more than once, so really need to be maintained in some storage. If not - span, filter, map, whatever - they all are just little modifiers on top of the real producers, which may or may not also have an actual storage maintained by them. again, what I mean is, not _where_ I start crossing them off in a PQ, but _when_. The article's code starts crossing them off _at_ p^2 - by adding p^2+2p into the PQ - _as_ _soon_ as p itself is reached. It won't surface until p^2 is considered for a prime; it'll lay dormant deep inside the queue's guts. When reaching 7919, a thousand (i.e. pi(7919)) entries will hang out inside the PQ - instead of just 24. A memory blowup. (this is of course fixed in Melissa's ZIP package). Of course due to the nature of PQ it might actually not hurt the performance for a while, depending on the particular PQ implementation. Complexity _and_ constant factors. It will start to have an impact pretty soon. 
Assuming at least one of the relevant PQ operations to be Theta(log size), each composite between ~400 and ~40 (rough estimates) will take something like twice as long to handle. It will start to hurt really badly only a while later, though; as a guesstimate, with more than a million primes in the PQ, memory will have a hard time. Exactly! If, on the other hand, you start crossing off at 2*p, when the main sieve is at 10^7, the size of the PQ is about 665 thousand; at 10^8, the size is more than 5.5 million. That starts to become a memory problem rather soon. here you don't have a choice of when to add it - you have to add it at p itself - so the problem is clear. But even when you cross off at p^2, the question remains of when you add the p^2 entry into the PQ. That was my point. The Postponed Filters code makes this clear, and thus hard to err on. Unfortunately, it wasn't present _in_ the article. I think that remark was meant to apply to composite removal, not Turner's sieve. It is right there on page 2, right when Turner's sieve is presented and discussed. The only explanation that I see is that she thought of it in regards to the imperative code, just as her analysis concentrates only on calculation aspects of the imperative code itself. To be fair, she writes: Let us
[Haskell-cafe] Re: FASTER primes
Heinrich Apfelmus apfelmus at quantentunnel.de writes: Will Ness wrote: I keep thinking that storage duplication with span, filter etc. is not really necessary, and can be replaced with some pointer chasing - especially when there's only one consumer present for the generated values. What I mean is thinking of lists in terms of a producer/consumer paradigm, as objects supporting the { pull, peek } interface, keeping the generator inside that would produce the next value on 'pull' request and keep it ready for any 'peek's. Euler's sieve is

    sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
      where (h,t) = span (< p*p) xs

[...] The real difference here is between those producers whose values will only be consumed once, by one specific consumer, and those whose values may be needed more than once, so really need to be maintained in some storage. If not - span, filter, map, whatever - they all are just little modifiers on top of the real producers, which may or may not also have an actual storage maintained by them. (I haven't followed the whole thread, but hopefully I have enough grasp of it to make a useful remark. :)) Concerning lists as producer/consumer, I think that's exactly what lazy evaluation is doing. Neither filter, map nor span evaluates and stores more list elements than strictly necessary. I always suspected as much, but was once told that Chris Okasaki has shown that any filter etc. must allocate its own storage. With the peek/pull they don't have to, if they are nested, and the deepest one from the real storage gets pulled through some pointer chasing eventually. Span isn't so easily compiled out too, or is it? But that might be a minor point. For me, a real smart compiler is one that would take in e.g. (sum $ take n $ cycle $ [1..m]) and spill out a straight-up math formula, inside a few ifs maybe (just an aside). Such a smart compiler might even be able to derive a well-performing code right from Turner's sieve. :) Sure, creating a list head only to immediately consume it is somewhat inefficient -- and the target of stream fusion[1] -- but this is an overhead of how list elements are stored, not how many. it might be equivalent to the (imagined) producer's storing its 'current' value inside its frame. How much can we rely on the run-time to actually destroy all the passed-over elements and not hang on to them for some time? Is this that compiler switch that Daniel mentioned? Is it reliable? You can try to implement the Euler sieve with producers by using a type like

    data Producer a = forall s. Producer { state :: !s, next :: s -> s, value :: s -> a }

but I think this will be quite difficult; it's not clear what and thus how big the state will be. (See [1] for choosing a good type.) I did that once in Scheme, as per SICP, with 'next' hanging in a stream's tail. Put filters and maps on top of that (inside the new 'next', actually). But that used Scheme's lists as storage. Another one was actual producers/modifiers with the {pull,peek} interface. It even produced some primes, and some Hamming numbers. Then I saw Haskell, and thought I'd get all that for free with its equational reasoning. But I get the impression that GHC isn't working through equational reasoning?.. I see all this talk about thunks etc. Concerning the sieves, there is a fundamental difference between the imperative sieve and the functional sieves, regardless of whether the latter start at p or p^2 or use a priority queue. Namely, the imperative sieve makes essential use of *pointer arithmetic*. 
The key point is that in order to cross off the multiples p, 2*p, 3*p, ... of a prime, the algorithm can directly jump from the (k*p)-th to the (k*p+p)-th array element by adding p to the index. The functional versions can never beat that because they can't just jump over p constructors of a data structure in O(1) time. We can directly jump to the next multiple too, it is called (+). :) :) But seriously, the real issue is that we have to merge the produced streams of multiples, while the mutable-storage code works on the same one, so its merging cost is zero. And even if we are smart enough to merge them in a tree-like fashion, we still have no (or little) control over the compiler's representation of lists and retention of their content, and whether it performs stream fusion or not (if we use lists). If you could take a look at the tree-merging primes and why it uses too much memory, it would be great. The code is in Daniel's post to which I replied, or on the haskellwiki Prime_numbers page (there in its rudimentary form). It's a tangent to your VIP code, where instead of the People structure an ordered list is just maintained as a split pair, of its known (so far, finite) prefix and the rest of it. Then under merging these split pairs form a monoid, as can
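As an aside, the Producer type quoted in the message above is enough to make the { pull, peek } idea concrete. A minimal working sketch follows; peek, pull, toList and the intsFromBy instance are my own illustration, and the record selectors are dropped since they can't be used with an existential state anyway:

    {-# LANGUAGE ExistentialQuantification #-}

    data Producer a = forall s. Producer !s (s -> s) (s -> a)

    peek :: Producer a -> a
    peek (Producer s _ v) = v s

    pull :: Producer a -> (a, Producer a)
    pull (Producer s n v) = (v s, Producer (n s) n v)

    -- the 'intsFromBy' generator from the earlier message: two Ints of state,
    -- no storage for the produced values
    intsFromBy :: Int -> Int -> Producer Int
    intsFromBy from step = Producer from (+ step) id

    toList :: Producer a -> [a]
    toList p = case pull p of (x, p') -> x : toList p'

    -- take 4 (toList (intsFromBy 9 6))  ==>  [9,15,21,27]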
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Sonntag 03 Januar 2010 09:54:37 schrieb Will Ness: The question of a memory blowup with the tree-folding merge still remains. For some reason using its second copy for a feeder doesn't reduce the memory (as reported by a standalone compiled program; GHCi-reported values are useless) - it causes it to increase twice. I have a partial solution. The big problem is that the feeder holds on to the beginning of comps while the main runner holds on to a later part. Thus the entire segment between these two points must be in memory. So have two lists of composites (yeah, you know that, but it didn't work so far). But you have to force the compiler not to share them: enter -fno-cse. The attached code does that (I've also expanded the wheel); it reduces the memory requirements considerably (a small part is due to the larger wheel, a factor of ~5 due to the non-sharing). I don't understand. What is there to be shared? Each multiples list is consumed only at one point; there's nothing to be shared. Do you mean the compiler still hangs on to them? If so, why?? I used the switch; it didn't help at all. The only thing I can see that is different is that all my interim data which I named with inner vars you moved out to the top level as functions. Is that what did the trick? What would be the reason to hang on to the already consumed data that is inaccessible to any active consumer? Why not make the forgetful behaviour the norm - especially where remembering is pointless?? It still uses much more memory than the PQ, and while the PQ's memory requirements grow very slowly, the tree-fold merge's still grow rather fast (don't go much beyond the 10,000,000th prime); I'm not sure why. You did it! It's now 7M for the 1,000,000th prime, instead of 52M before. Making the pattern lazy in mergeSP was probably an important fix too. :) Unfortunately it grows, as you've said - 23MB for 2 mln. :| PQ stays at just 2MB. Attachment (V13Primes.hs): text/x-haskell, 3621 bytes ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
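A sketch of the "two separate lists of composites" idea that does not rely on -fno-cse: produce the primes in two explicitly distinct stages, so the feeder and the main runner each consume their own copy. This is my own minimal rendering with a linear, Bird-style merge of multiples, not the attached V13Primes.hs:

    primes :: [Int]
    primes = 2 : _Y ((3 :) . sieve)
      where
        -- deliberately NOT g (fix g): the inner feeder is a separate copy
        -- of the stream, so no one holds on to one long shared list
        _Y g = g (g (_Y g))
        sieve ps = [5,7..] `minus`
                   foldr (\p r -> let q = p*p in q : merge [q+2*p, q+4*p ..] r) [] ps

    minus, merge :: Ord a => [a] -> [a] -> [a]
    minus a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x : minus xs b
        EQ ->     minus xs ys
        GT ->     minus a  ys
    minus a _ = a
    merge a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x : merge xs b
        EQ -> x : merge xs ys
        GT -> y : merge a  ys
    merge a b = a ++ b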
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Montag 04 Januar 2010 13:25:47 schrieb Will Ness: Euler's sieve is

    sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
      where (h,t) = span (< p*p) xs

Not quite. That's a variant of the postponed filters; it crosses off e.g. 45 twice, once as 3*15 and once as 5*9 (okay, it has already been removed by the first, so let's say 45 appears in two lists of numbers to be removed if present). there won't be any such. whatever is removed first is just skipped second (and third, etc). 45 does appear twice, on the two multiples lists (3- and 5-). But it is removed by the first, and skipped by the second. And there's no removal here in the first place. There is no pre-filled storage. All there is, is some lists comparison, and lists are just generators (I so keep hoping). I don't see any problem here. As Melissa (and yourself, I think) have shown, double hits on multiples are few and far between. Also, it uses no filters, i.e. no individual number divisibility testing. The "filters" refers first of all to testing an individual number to decide whether to keep it in or not. Euler's sieve removes multiples in advance, so there's no testing and no filtering, only comparison. It follows the _Postponed_ Filters framework in postponing the action until the right moment; the action itself is two lists' comparison and skipping of the equals (i.e. the minus action). Euler's sieve is never attempting to remove a number more than once, that's How's that possible? On Wikipedia it says, it removes multiples of 3; then multiples of 5; etc. So it just looks, for each number it is removing, if it is there, and if so, removes it. I would argue that looking whether to remove or not is part of attempting to remove. It can't have foresight, right? the outstanding feature of it. Unfortunately, that also makes it hard to implement efficiently. The problem is that you can't forget the primes between p and p^2, you must remember them for the removal of multiples of p later on. you're right, every formulation of these algorithms is always done in the imperative, mutable-storage setting. They always speak of removing numbers etc. The more pressing problem with that code is its linear structure of course, which gets addressed by the tree-folding merge etc. An (inefficient but faithful) implementation would be

    euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)

I think it does exactly the same thing, computationally, except it does so, again, _prematurely_ - exactly in Turner's sieve's fashion - for each prime found, _as_ _soon_ as it is found. If it is faithful, so is my code. :) Its performance is really horrible though. makes sense, as for Turner's. See, the _when_ thing really helps to see what's going on here. A much better implementation is

    primes = 2:3:euler hprs [] (scanl (+) 5 (cycle [2,4]))
      where
        hprs = 5:drop 3 primes
        euler (p:ps) acc cs = h ++ euler ps (tail (acc ++ h)) (t `minus` comps)
          where (h,t) = span (< p*p) cs
                comps = map (*p) (acc ++ cs)

this looks like a {2,3} wheel reimplemented and inlined. No point in improving anything until the linear structure is turned into a tree. To be fair, she writes: ... (This optimization does not affect the time complexity of the sieve, however, so its absence from _the_ _code_ in Section 1 is _not_ our cause for worry.) A-HA! But its absence from _*that*_ _*code*_ WAS the *major* cause for worry, as dealing with it worked wonders on its complexity and constant factors. I think you're overinterpreting it. I might have. 
I don't mean to nitpick, honest. It's just how it looked to me: Turner's? - bad. Squares? - won't help, it's a _minor_ thing; don't even go there! etc. As I've said, geniuses leap, and fly. I just wanted to _walk_ down that road, not skipping any steps. And it turned out, the very first step _really_ pays up _big_. So big in fact, it might be argued that any subsequent improvement is just an advanced optimization, in terms of presenting an introductory code which isn't horribly inefficient and yet is short and clear. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
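For reference, the faithful one-liner euler from the message above runs as-is given a `minus`; with the same minus used elsewhere in the thread it makes a tiny, self-contained (if very slow) prime generator:

    minus :: Ord a => [a] -> [a] -> [a]
    minus a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x : minus xs b
        EQ ->     minus xs ys
        GT ->     minus a  ys
    minus a _ = a

    euler :: [Int] -> [Int]
    euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)

    primes :: [Int]
    primes = 2 : euler [3,5..]

    -- take 10 primes  ==>  [2,3,5,7,11,13,17,19,23,29]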
[Haskell-cafe] Re: FASTER primes
Emil Axelsson emax at chalmers.se writes: For me, a real smart compiler is one that would take in e.g. (sum $ take n $ cycle $ [1..m]) and spill out a straight-up math formula, inside a few ifs maybe (just an aside). (Also an aside, I couldn't resist...) Then I'm sure you'd say that Feldspar [1] has a smart compiler :) but it didn't produce

    f n m = if n <= m then n*(n+1)/2
            else let (q,r) = quotRem n m
                 in  q*(m*(m+1)/2) + r*(r+1)/2

:) The above expression written in Feldspar and the resulting C code can be found here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=15592#a15593 / Emil [1] http://feldspar.sourceforge.net/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
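And a quick plain-Haskell sanity check of that closed form against the direct definition (my own throwaway test, using integer division since both branches only ever divide even products):

    qq, f :: Int -> Int -> Int
    qq n m = sum $ take n $ cycle [1..m]
    f  n m = if n <= m then n*(n+1) `div` 2
             else let (q,r) = quotRem n m
                  in  q * (m*(m+1) `div` 2) + r*(r+1) `div` 2

    -- and [ qq n m == f n m | n <- [1..60], m <- [1..25] ]  ==>  True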
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Samstag 02 Januar 2010 14:13:29 schrieb Will Ness: Daniel Fischer daniel.is.fischer at web.de writes: Am Mittwoch 30 Dezember 2009 20:46:57 schrieb Will Ness: Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer: especially the claim that going by primes squares is a pleasing but minor optimization, Which it is not. It is a major optimisation. It reduces the algorithmic complexity *and* reduces the constant factors significantly. D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper. It doesn't change the complexity, and the constant factors are reduced far less than I thought. I do not understand. Turner's sieve is

    primes = sieve [2..]
      where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

and the Postponed Filters is

    primes = 2: 3: sieve (tail primes) [5,7..]
      where sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
              where (h,~(_:t)) = span (< p*p) xs

Are you saying they both exhibit the same complexity? No. They don't. But if you're looking at an imperative (mutable array) sieve (that's simpler to analyse because you don't have to take the book-keeping costs of your priority queue, heap or whatever into account), if you start crossing out The key question then, is _*WHEN*_ and not _*WHAT*_. As is clearly demonstrated by the case of Turner/Postponed filters, the work that is done (of crossing numbers off) is the same Crossing off is only part of the work. Most of the work is checking whether to cross off the number in this round. And Turner does a lot more of that than the postponed filters. Exactly the point I tried to make. :) - _when_ it is actually done - but Turner's starts out so _prematurely_ that it is busy doing nothing most of the time. It's not doing nothing. It's just doing a lot of superfluous work. It is _achieving_ nothing most of the time, though. again, yes. :) Take 7919, the thousandth prime. The postponed filters decide to keep it when filtering out the multiples of 89, the twenty-fourth prime. Turner also divides it by all 975 primes in between. That is a lot of real but futile work. yes. Thus its function call overhead costs pile up enormously, overstaging the actual calculation. It's not only the function calls. Division is expensive, too. yes, that's what I meant - the cost of calling all the functions that - we know in advance will - have nothing to do eventually. So analyzing that calculation in the premature execution setting is missing the point, although helpful after we fix this, with the Postponed Filters. _Only then_ can the finer points of this algorithm's analysis be applied - namely, of avoiding testing primes' divisibility altogether. And _if_ a fast cheap primality test were to have existed, the filtering versions would win Sorry, I can't follow. What's the point of a primality test here? Every number whose multiples we want to remove is prime, what's left to test? sorry for being obtuse. I meant, in a filtering sieve (i.e. postponed filters) vs. the multiples-removing sieves: if only a cheap primality test had existed (by some _magic_) which _could_ be run on _every_ number (unlike the costly divisibility test), then the filtering sieves would win. Hypothetically. over, because they progressively cull the input sequence, so there would be no double hits as we have when merging the multiples (whether from lists or inside the PQ). But there's a lot of list construction and deconstruction necessary for the Euler sieve. yes. 
Unless, of course, a smart compiler recognizes there's only one consumer for the values each multiples-list is producing, and keeps it stripped down to a generator function, and its current value. I keep thinking a smart compiler could eliminate all these span calls and replace them with just some pointer manipulation... That may be more work than the multiple hits cause. so that too would make filters win; only _if_ the cheap primality test existed!.. the multiples of p with 2*p, you have ... There are two questions here - where to start crossing off numbers, and when. If you'd start at 2*p, maybe the overall complexity would remain the same, but it'll add enormous overhead with all those duplicate multiples. The additional duplicate multiples aren't the problem. Sure, the numbers having a prime divisor larger than the square root would be crossed off one additional time, but that isn't so much per se. The additional crossings off are O(bound), so they don't harm the time complexity. But they can have effects which multiply the running time by a large constant. yes, exactly what I wanted to say
[Haskell-cafe] Re: FASTER primes
Will Ness will_n48 at yahoo.com writes: ... It was a big STOP sign on the way to Postponed Filters - Euler's - Bird's merged multiples - tree-merging (with wheel) road of little steps, and used as a justification for her to make a big leap across the chasm towards the PQ code. correction: across the /supposed/ chasm. There is no chasm. There is a nice straight freeway, with rest stops and gas stations, and exits to local roads going across the county in every which way. :) .. and that is of course a matter of personal preference. A genius is perfectly capable of making big leaps across chasms. Heck they might even be able to fly :) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Mittwoch 30 Dezember 2009 20:46:57 schrieb Will Ness: Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer: especially the claim that going by primes squares is a pleasing but minor optimization, Which it is not. It is a major optimisation. It reduces the algorithmic complexity *and* reduces the constant factors significantly. D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper. It doesn't change the complexity, and the constant factors are reduced far less than I thought. I do not understand. Turner's sieve is

    primes = sieve [2..]
      where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

and the Postponed Filters is

    primes = 2: 3: sieve (tail primes) [5,7..]
      where sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
              where (h,~(_:t)) = span (< p*p) xs

Are you saying they both exhibit same complexity? No. They don't. But if you're looking at an imperative (mutable array) sieve (that's simpler to analyse because you don't have to take the book-keeping costs of your priority queue, heap or whatever into account), if you start crossing out The key question then, is _*WHEN*_ and not _*WHAT*_. As is clearly demonstrated by the case of Turner/Postponed filters, the work that is done (of crossing numbers off) is the same - _when_ it is actually done - but Turner's starts out so _prematurely_ that it is busy doing nothing most of the time. Thus its function call overhead costs pile up enormously, overstaging the actual calculation. So analyzing that calculation in the premature execution setting is missing the point, although helpful after we fix this, with the Postponed Filters. _Only then_ the finer points of this algorithm's analysis can be applied - namely, of avoiding testing primes divisibility altogether. And _if_ a fast cheap primality test were to have existed, the filtering versions would win over, because they progressively cull the input sequence so there would be no double hits as we have when merging the multiples (whether from lists or inside the PQ). the multiples of p with 2*p, you have

    sum [bound `div` p - 1 | p <- takeWhile (<= sqrt bound) primes]

crossings-out, that is Theta(bound*log (log bound)). If you eliminate multiples of some small primes a priori (wheel), you can reduce the constant factor significantly, but the complexity remains the same (you drop a few terms from the front of the sum and multiply the remaining terms with phi(n)/n, where n is the product of the excluded primes). If you start crossing out at p^2, the number is

    sum [bound `div` p - (p-1) | p <- takeWhile (<= sqrt bound) primes]

The difference is basically sum (takeWhile (<= sqrt bound) primes), which I stupidly - I don't remember how - believed to cancel out the main term. It doesn't, it's O(bound/log bound), so the complexity is the same. Now if you take a stream of numbers from which you remove composites, having a priority queue of multiples of primes, things are a little different. If you start crossing out at 2*p, when you are looking at n, you have more multiples in your PQ than if you start crossing out at p^2 (about pi(n/2) vs. pi(sqrt n)), so updating the PQ will be more expensive. But updating the PQ is O(log size), I believe, and log pi(n) is O(log pi(sqrt n)), so I think it shouldn't change the complexity here either. I think this would have complexity O(bound*log bound*log (log bound)). 
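The two crossing-out sums above are easy to play with directly; a small empirical helper (my own sketch - the takeWhile bound is rewritten as p*p <= bound to stay within Int, and a throwaway Turner sieve stands in for any primes list):

    crossFrom2p, crossFromP2 :: Int -> Int
    crossFrom2p bound =
        sum [ bound `div` p - 1     | p <- takeWhile (\p -> p*p <= bound) primesList ]
    crossFromP2 bound =
        sum [ bound `div` p - (p-1) | p <- takeWhile (\p -> p*p <= bound) primesList ]

    primesList :: [Int]
    primesList = sieve [2..]
      where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

    -- per the algebra above, the difference is exactly
    --   crossFrom2p b - crossFromP2 b == sum [ p - 2 | p <- takeWhile (\p -> p*p <= b) primesList ]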
There are two questions here - where to start crossing off numbers, and when. If you'd start at 2*p, maybe the overall complexity would remain the same, but it'll add enormous overhead with all those duplicate multiples. No, the question is not where to start, but when. PQ might hide the problem until the memory blows up. Anything that we add that won't have any chance of contributing to the final result is added for nothing and only drives the total cost up needlessly. I was under the impression that the first shows O(n^2) approx., and the second one O(n^1.5) (for n primes produced). In Turner/postponed filters, things are really different. Actually, Ms. O'Neill is right, it is a different algorithm. In the above, we match each what _is_ different is divisibility testing vs. composites removal, which follows from her in-depth analysis, although it is never quite formulated in such words in her article. But nothing matters until the premature starting up is eliminated, and that key observation is missing from the article too - worse, it is brushed off with the casual remark that it is a pleasing but minor optimization. Which remark, as you show here, is true in the imperative, mutable-storage setting, but is made in an article about functional code, in regard to the functional code of Turner's sieve. So the key
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Mittwoch 30 Dezember 2009 01:04:34 schrieb Will Ness: While I haven't detected that with the primes code, I find that in my ghci your code is approximately 2.5 times faster than ONeill or Bayer when interpreted (no difference in scaling observed), while when compiled with -O2, ONeill is approximately three times as fast as your code that was what I was getting at first too, before I put into my code the _type_signatures_ and the specialize _pragmas_ as per her file. Then it was only 1.3x slower, when compiled (with about the same asymptotics and memory usage). Specialising your code to Int makes it half as fast as ONeill here (as an executable). That is largely due to the fact that your code uses much more memory here (54MB vs. 2MB for the millionth prime), though; the MUT times have a ratio of about 1.5. I'm an unsophisticated tester. I just use

    GHC -O2 -c filename.hs
    GHCi filename

and then it says ( for primes()!!100 ) (8.24 secs, 1906813836 bytes) for my code, and (6.09 secs, 1800873864 bytes) for O'Neill's But now when I've looked at system resources I see this too. Well, it means we've found where the PQ code is better. OK. Now an interesting question is, why does it use so much memory here? Can you send me your exact code so I can see how that behaves here? will do. It's probably doing a lot more bookkeeping. Or it might be some impl issue with scanl or span etc., and it'll go away if we'd recode it directly, who knows. We can only guess if we don't know the compiler in and out. That's exactly what kept me off using the compiler. Guessing. Sheesh. (and I did see a 2.0x speedup once when replacing one simple code snippet for its operationally equivalent twin). and twice as fast as Bayer as an executable, about twice as fast as your code and slightly slower than Bayer in ghci. see, this kind of inconsistency is exactly why I was concentrating only on one platform in measuring the speed - the interp'/GHCi combination. The problem with that is that one is primarily interested in speed for library functions, which are mostly used as compiled code. right, and I used it as a measure of the code's fitness to the problem, so it was only comparative to me. Especially when developing and trying out several approaches, to test with the compiler just takes too long. :) And why should it give (sometimes) wildly different readings when running inside GHCi or standalone?? Good question. And I have huge memory problems in ghci with your code. That may be due to my implementation of merge and minus, though. You wrote 'standard' and I coded the straightforward methods. Here's what I'm using ... More or less the same that I wrote. The rest is exactly as I've posted it, I've only added type signatures and specialize pragmas, as per Melissa's code,

    {-# SPECIALIZE primes :: () -> [Int] #-}
    {-# SPECIALIZE primes :: () -> [Integer] #-}
    primes :: Integral a => () -> [a]
    primes () = 2:3:5:7:primes'
      where
        primes'   = [11,13] ++ drop 2 (rollFrom 11) `minus` comps
        (comps,_) = tfold mergeSP (pairwise mergeSP mults)
        mults     = map (\p -> fromList $ map (p*) $ rollFrom p) $ primes'

maybe it's about memoization of primes'. She writes something about it in her code. It's just that the mutating code tends to be convoluted, like in the example I mentioned of quicksort. One has to read the C code with good attention to understand it. Convoluted is (often) an exaggeration. But I agree that the specification of 'what' is usually easier to understand than that of 'how'. well put. 
Normal Haskell is much more visually apparent, like

    primes = 2: 3: sieve (tail primes) [5,7..]
      where
        sieve (p:ps) xs = h ++ sieve ps (t `minus` tail [q,q+2*p..])
          where (h,~(_:t)) = span (< q) xs
                q          = p*p

Yes. or

    primes = 2: 3: sieve [] (tail primes) 5
      where
        sieve fs (p:ps) x = [i | i <- [x,x+2..q-2], a!i]
                            ++ sieve ((2*p,q):fs') ps (q+2)
          where
            q     = p*p
            mults = [ [y+s,y+2*s..q] | (s,y) <- fs]
            fs'   = [ (s,last ms) | ((s,_),ms) <- zip fs mults]
            a     = accumArray (\a b -> False) True (x,q-2)
                      [(i,()) | ms <- mults, i <- ms]

Umm, really? I'd think if you see what that does, you won't have difficulties with a mutable array sieve. You're right, bad example. :) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))
Daniel Fischer daniel.is.fischer at web.de writes: No, it's my own code. Nothing elaborate, just sieving numbers 6k±1, twice as fast as the haskellwiki code (here) and uses only 1/3 the memory. For the record: . thanks! will need to sift through it thoroughly... :) :) BTW I think a really smart compiler should just get a specification, like Turner's sieve, and just derive a good code from that by itself. Go ahead and write one. I would love such a beast. Another example would be qq n m = sum $ take n $ cycle [1..m] which should really get compiled into just a math formula, IMO. Now _that_ I would call a good compiler. Dream on, dream on, with hope in your heart. Those who can't do, dream. And rant. :) Maybe you'd like http://en.wikipedia.org/wiki/Shakespeare_(programming_language) ? niice. :) ___ Haskell-Cafe mailing list Haskell-Cafe at haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer: especially the claim that going by primes squares is a pleasing but minor optimization, Which it is not. It is a major optimisation. It reduces the algorithmic complexity *and* reduces the constant factors significantly. D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper. It doesn't change the complexity, and the constant factors are reduced far less than I thought. I do not understand. Turner's sieve is

    primes = sieve [2..]
      where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

and the Postponed Filters is

    primes = 2: 3: sieve (tail primes) [5,7..]
      where sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
              where (h,~(_:t)) = span (< p*p) xs

Are you saying they both exhibit the same complexity? I was under the impression that the first shows O(n^2) approx., and the second one O(n^1.5) (for n primes produced). ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))
Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness: Now _this_, when tested as interpreted code in GHCi, runs about 2.5 times faster than the Priority Queue based code from Melissa O'Neill's ZIP package mentioned at the haskellwiki/Prime_Numbers page, with about half the used memory reported, in producing 10,000 to 300,000 primes. It is faster than BayerPrimes.hs from the ZIP package too, in the tested range, at about 35 lines of code in total. That's nice. However, the important criterion is how compiled code (-O2) fares. Do the relations continue to hold? How does it compare to a bitsieve? Haven't gotten to that part yet. :) But why is it more important? Would that not tell us more about the compiler performance than the code itself? This code is just an endpoint (so far) in a short procession of natural stepwise development of the famous classic Turner's sieve, through the postponed filters, through to Euler's sieve, the merging sieve (i.e. Richard Bird's) and on to the tree-fold merging, with wheel. I just wanted to see where the simple normal (i.e. _beginner_-friendly) functional code can get, in a natural way. It's not about writing the fastest code in _advanced_ Haskell. It's about having clear and simple code that can be understood at a glance - i.e. contributes to our understanding of a problem - faithfully reflecting its essential elements, and because of _that_, fast. It's kind of like _not_ using mutable arrays in a quicksort. Seeing claims that it's _either_ Turner's _or_ the PQ-based code didn't feel right to me somehow, especially the claim that going by primes squares is a pleasing but minor optimization, what with the postponed filters (which serve as the framework for all the other variants) achieving the orders-of-magnitude speedup and cutting Turner's O(n^2) right down to O(n^1.5) just by doing that squares optimization (with the final version hovering around 1.24..1.17 in the tested range). The Euler's sieve being a special case of Eratosthenes's, too, doesn't lend credence to claims that only the PQ version is somehow uniquely authentic and faithful to it. Turner's sieve should always have been looked at as just a specification, not code, anyway, and actually running it is ridiculous. The Postponed Filters version is the one to be used as a reference point of the basic _code_, precisely because it _does_ use the primes squares optimization, which _is_ essential to any basic sieve. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))
Eugene Kirpichov ekirpichov at gmail.com writes: 2009/12/29 Will Ness will_n48 at yahoo.com: Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness: Now _this_, when tested as interpreted code in GHCi, runs about 2.5 times faster than the Priority Queue based code from Melissa O'Neill's ZIP package mentioned at the haskellwiki/Prime_Numbers page, with about half the used memory reported, in producing 10,000 to 300,000 primes. It is faster than BayerPrimes.hs from the ZIP package too, in the tested range, at about 35 lines of code in total. That's nice. However, the important criterion is how compiled code (-O2) fares. Do the relations continue to hold? How does it compare to a bitsieve? Haven't gotten to that part yet. :) But why is it more important? Would that not tell us more about the compiler performance than the code itself? If you mean algorithmic complexity, you shouldn't care about a difference of 2.5x. It's not just at one point; the asymptotics are _the_same_ across the range that I've tested (admittedly, somewhat narrow). I measure local behavior simply as the logarithm of the ratio of run times, taken in the base of the ratio of problem sizes. If you mean actual performance for a particular task, you should measure the performance in realistic conditions. Namely, if you're implementing a program that needs efficient generation of primes, won't you compile it with -O2? If I realistically needed primes generated in a real-life setting, I'd probably have to use some C for that. If OTOH we're talking about a tutorial code that is to be as efficient as possible without losing its clarity, being a reflection of the essentials of the problem, then any overly complicated advanced Haskell wouldn't be my choice either. And seeing that this overly-complicated (IMO), steps-jumping PQ-based code was sold to us as the only faithful rendering of the sieve, I wanted to see for myself whether this really holds water. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
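That "logarithm of the ratios" measure of local behavior, as a one-line helper (my own formulation of what is described above):

    -- empirical local order of growth: the g for which t2/t1 ~= (n2/n1)^g
    orderOfGrowth :: (Double, Double) -> (Double, Double) -> Double
    orderOfGrowth (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)

    -- e.g. orderOfGrowth (100000, 1.0) (300000, 3.9)  ==>  ~1.24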
[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))
Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness: Now _this_, when tested as interpreted code in GHCi, runs about 2.5 times faster than the Priority Queue based code from Melissa O'Neill's ZIP package mentioned at the haskellwiki/Prime_Numbers page, with about half the used memory reported, in producing 10,000 to 300,000 primes. It is faster than BayerPrimes.hs from the ZIP package too, in the tested range, at about 35 lines of code in total. That's nice. However, the important criterion is how compiled code (-O2) fares. Do the relations continue to hold? How does it compare to a bitsieve? OK, I've tested it now. For some reason it runs about 1.3 times slower than Melissa O'Neill's code when compiled, while taking about the same amount of memory (going slightly worse actually, 0.97..1.05..1.08), in the range of 100,000..1,000,000..2,000,000 primes produced. The local asymptotic behavior was about the same, again, in both versions, about O(n^1.20..1.25) - worsening slightly for the merging version (the ratio of run times going 1.25..1.29..1.32). I guess that makes the two versions (almost) operationally equivalent in producing up to a million primes or two. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: FASTER primes
Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 14:34:03 schrieb Will Ness: Daniel Fischer daniel.is.fischer at web.de writes: Am Dienstag 29 Dezember 2009 04:38:21 schrieb Will Ness: Now _this_, when tested as interpreted code in GHCi, runs about 2.5 times faster than the Priority Queue based code from Melissa O'Neill's ZIP package mentioned at the haskellwiki/Prime_Numbers page, with about half the used memory reported, in producing 10,000 to 300,000 primes. It is faster than BayerPrimes.hs from the ZIP package too, in the tested range, at about 35 lines of code in total. That's nice. However, the important criterion is how compiled code (-O2) fares. Do the relations continue to hold? How does it compare to a bitsieve? Haven't gotten to that part yet. :) But why is it more important? I thought the uppercase FASTER in the subject meant you were really interested in speed. If you're only interested in asymptotics, interpreted may be appropriate. However, it is possible that optimisation can change the perceived asymptotics of an algorithm (determined strictness eliminating thunks, for example). While I haven't detected that with the primes code, I find that in my ghci your code is approximately 2.5 times faster than ONeill or Bayer when interpreted (no difference in scaling observed), while when compiled with -O2, ONeill is approximately three times as fast as your code that was what I was getting at first too, before I put into my code the _type_signatures_ and the specialize _pragmas_ as per her file. Then it was only 1.3x slower, when compiled (with about the same asymptotics and memory usage). and twice as fast as Bayer as an executable, about twice as fast as your code and slightly slower than Bayer in ghci. see, this kind of inconsistency is exactly why I was concentrating only on one platform in measuring the speed - the interp'/GHCi combination. The problem with that is that one is primarily interested in speed for library functions, which are mostly used as compiled code. right, and I used it as a measure of the code's fitness to the problem, so it was only comparative to me. Especially when developing and trying out several approaches, to test with the compiler just takes too long. :) And why should it give (sometimes) wildly different readings when running inside GHCi or standalone?? Good question. And I have huge memory problems in ghci with your code. That may be due to my implementation of merge and minus, though. You wrote 'standard' and I coded the straightforward methods. Here's what I'm using (BTW I've put it on the primes haskellwiki page too). The memory reported for interpreted is about half of PQ's (IIRC), and compiled - the same:

    minus a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x : xs `minus` b
        GT ->     a  `minus` ys
        EQ ->     xs `minus` ys
    minus a b = a

    merge a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x : merge xs b
        EQ -> x : merge xs ys
        GT -> y : merge a ys
    merge a b = if null b then a else b

Would that not tell us more about the compiler performance than the code itself? Unless you write machine code or assembly, don't all performance tests tell us more about the compiler/interpreter performance than the code itself? That is, of course, with respect to algorithms with the same scaling behaviour. This code is just an endpoint (so far) in a short procession of natural stepwise development of the famous classic Turner's sieve, That was sieve (x:xs) = x : sieve (filter ((/= 0) . (`mod` x)) xs) , was it? right through the postponed filters, through to Euler's sieve, the merging sieve (i.e. Richard Bird's) and on to the tree-fold merging, with wheel. I just wanted to see where the simple normal (i.e. _beginner_-friendly) functional code can get, in a natural way. Good. 
It's not about writing the fastest code in _advanced_ Haskell. It's about having clear and simple code that can be understood at a glance - i.e. contributes to our understanding of a problem - faithfully reflecting its essential elements, and because of _that_, fast. It's kind of like _not_ using mutable arrays in a quicksort. What's wrong with mutable arrays? There are a lot of algorithms which can be easily and efficiently implemented using mutable unboxed arrays while a comparably efficient implementation without mutable arrays is hard. For those, I consider STUArrays the natural choice. Sieving primes falls into that category. It's just that the mutating code tends to be convoluted, like in the example I mentioned of quicksort. One has to read the C code with good attention to understand it. Convoluted is (often) an exaggeration. But I agree that the specification of 'what' is usually easier to understand than that of 'how'. well put. Normal Haskell is much more visually apparent, like

    primes = 2: 3: sieve (tail primes) [5,7..]
      where
        sieve (p:ps) xs = h ++ sieve ps (t `minus` tail [q,q+2*p..])
          where (h,~(_:t)) = span (< q) xs
                q          = p*p

Yes. or

    primes = 2: 3: sieve [] (tail primes) 5
      where
        sieve fs (p:ps) x = [i | i <- [x,x+2..q-2], a!i] ++ sieve ((2*p,q):fs') ps
[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))
Daniel Fischer daniel.is.fischer at web.de writes: Gee, seems my mail server reads your posts very thoroughly today :) I hope it's not a bad thing. :) Am Dienstag 29 Dezember 2009 14:58:10 schrieb Will Ness: If I realistically needed primes generated in a real-life setting, I'd probably have to use some C for that. If you need the utmost speed, then probably yes. If you can do with a little less, my STUArray bitsieves take about 35% longer than the equivalent C code and are roughly eight times faster than ONeillPrimes. I can usually live well with that. Wow! That's fast! (that's the code from haskellwiki's primes page, right?) If OTOH we're talking about a tutorial code that is to be as efficient as possible without losing its clarity, being a reflection of the essentials of the problem, then any overly complicated advanced Haskell wouldn't be my choice either. +1 Though perhaps we view mutable array code differently. In my view, it's neither advanced nor complicated. convoluted, then. Not using higher level concepts, like map and foldr, :) or head.until isSingleton (pairwise merge).map wrap , that kind of thing. :) BTW I think a really smart compiler should just get a specification, like Turner's sieve, and just derive a good code from that by itself. Another example would be qq n m = sum $ take n $ cycle [1..m] which should really get compiled into just a math formula, IMO. Now _that_ I would call a good compiler. That way I really won't have to learn how to use STUArrays, you see. I've seen this question asked a lot, what should be a good programming language? IMO, English (plus math where needed, and maybe some sketches by hand). :) And seeing that this overly-complicated (IMO), steps-jumping PQ-based code was sold to us as the only faithful rendering of the sieve, I wanted to see for myself whether this really holds water. I can understand that very well. :) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
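The "head.until isSingleton (pairwise merge).map wrap" quip above unpacks, for finite lists, into an ordinary bottom-up mergesort; a self-contained sketch with the implied helpers spelled out (my own reading of the one-liner):

    msort :: Ord a => [a] -> [a]
    msort [] = []
    msort xs = head . until isSingleton (pairwise merge) . map wrap $ xs
      where
        wrap x = [x]
        isSingleton [_] = True
        isSingleton _   = False
        pairwise f (a:b:t) = f a b : pairwise f t
        pairwise _ t       = t
        merge a@(x:xt) b@(y:yt)
          | x <= y    = x : merge xt b
          | otherwise = y : merge a yt
        merge a b = a ++ b

    -- msort [3,1,2]  ==>  [1,2,3]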
[Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)
apfelmus apfelmus at quantentunnel.de writes: Dave Bayer wrote: What I'm calling a venturi

    venturi :: Ord a => [[a]] -> [a]

merges an infinite list of infinite lists into one list, under the assumption that each list, and the heads of the lists, are in increasing order. I wrote this as an experiment in coding data structures in Haskell's lazy evaluation model, rather than as explicit data. The majority of the work done by this code is done by merge; the multiples of each prime percolate up through a tournament consisting of a balanced tree of suspended merge function calls. In order to create an infinite lazy balanced tree of lists, the datatype

    data List a = A a (List a) | B [a]

is used as scaffolding. One thinks of the root of the infinite tree as starting at the leftmost child, and crawling up the left spine as necessary. After some pondering, the List a data structure for merging is really ingenious! :) Here's a try to explain how it works: The task is to merge a number of sorted and infinite lists when it's known that their heads are in increasing order. In particular, we want to write

    primes = (2:) $ diff [3..] $ venturi $ map multiple primes

Thus, we have a (maybe infinite) list xss = [xs1, xs2, xs3, ...] of infinite lists with the following properties

    all sorted xss
    sorted (map head xss)

where sorted is a function that returns True if the argument is a sorted list. A first try to implement the merging function is

    venturi xss = foldr1 merge xss
                = xs1 `merge` (xs2 `merge` (xs3 `merge` ...

where merge is the standard function to merge two sorted lists. However, there are two problems. The first problem is that this doesn't work for infinite lists since merge is strict in both arguments. But the property

    head xs1 < head xs2 < head xs3 < ...

which we haven't exploited yet can now be used in the following way

    venturi xss = foldr1 merge' xss
    merge' (x:xt) ys = x : merge xt ys

In other words, merge' is biased towards the left element

    merge' (x:_|_) _|_ = x : _|_

which is correct since we know that (head xs < head ys). The second problem is that we want the calls to merge to be arranged as a balanced binary tree since that gives an efficient heap. It's not so difficult to find a good shape for the infinite tree; the real problem is to adapt merge' to this situation since it's not associative: .. The problem is that the second form doesn't realize that y is also smaller than the third argument. In other words, the second form has to treat more than one element as privileged, namely x1,x2,... and y. This can be done with the aforementioned list data structure

    data People a = VIP a (People a) | Crowd [a]

The people (VIPs and crowd) are assumed to be _sorted_. Now, we can start to implement

    merge' :: Ord a => People a -> People a -> People a

Hi, ... replying to a two-years-old post here, :) :) and after consulting the full VIP version in haskellwiki/Prime_Numbers#Implicit_Heap ... It is indeed the major problem with the merged multiples removing code (similar to Richard Bird's code from Melissa O'Neill's JFP article) - the linear nature of foldr, requiring an (:: a->b->b) merge function. To make it freely composable, to rearrange the list into an arbitrary tree form, it must indeed be type-uniform (:: a->a->a) first, and associative second. The structure of the folded tree should be chosen to better suit the primes multiples production. I guesstimate the total cost as Sum (1/p)*d, where p is a generating prime at the leaf, and d the leaf's depth, i.e. 
the amount of merge nodes its produced multiple must pass on its way to the top. The structure used in your VIP code, 1+(2+(4+(8+...))), can actually be improved upon with another, (2+4)+( (4+8)+( (8+16)+...)), for which the estimated cost is about 10%-12% lower. This can be expressed concisely as the following:

    primes :: () -> [Integer]
    primes () = 2:primes'
      where
        primes'   = [3,5] ++ drop 2 [3,5..] `minus` comps
        mults     = map (\p -> fromList [p*p, p*p+2*p..]) $ primes'
        (comps,_) = tfold mergeSP (pairwise mergeSP mults)
        fromList (x:xs) = ([x],xs)

    tfold f (a: ~(b: ~(c:xs))) = (a `f` (b `f` c)) `f` tfold f (pairwise f xs)
    pairwise f (x:y:ys) = f x y : pairwise f ys

    mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
                           in (a ++ bc, merge b' d)
      where
        spMerge u@(x:xs) w@(y:ys) = case compare x y of
            LT -> (x:c,d) where (c,d) = spMerge xs w
            EQ -> (x:c,d) where (c,d) = spMerge xs ys
            GT -> (y:c,d) where (c,d) = spMerge u ys
        spMerge u [] = ([], u)
        spMerge [] w = ([], w)

with ''merge'' and ''minus'' defined in the usual way. Its run times are indeed improved 10%-12% over the VIP code from the haskellwiki page. Testing was done by running the code,
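For reference, the merge' that the quoted message stops short of implementing goes along these lines in the full wiki version (reconstructed from memory of haskellwiki/Prime_Numbers#Implicit_Heap - treat the details as approximate), together with helpers to inject plain lists and extract the result:

    data People a = VIP a (People a) | Crowd [a]

    mergeP :: Ord a => People a -> People a -> People a
    mergeP (VIP x xt) ys         = VIP x (mergeP xt ys)   -- left-biased, as required
    mergeP (Crowd xs) (Crowd ys) = Crowd (merge xs ys)
    mergeP xs@(Crowd (x:xt)) ys@(VIP y yt)   -- assumes non-empty (infinite) crowds
        | x <= y    = VIP x (mergeP (Crowd xt) ys)
        | otherwise = VIP y (mergeP xs yt)

    vip :: [a] -> People a          -- give a list's head the VIP treatment
    vip (x:xs) = VIP x (Crowd xs)

    serve :: People a -> [a]        -- flatten back to an ordinary list
    serve (VIP x xt) = x : serve xt
    serve (Crowd xs) = xs

    merge :: Ord a => [a] -> [a] -> [a]   -- the plain sorted-lists merge
    merge a@(x:xt) b@(y:yt)
      | x <= y    = x : merge xt b
      | otherwise = y : merge a yt
    merge a b = a ++ b

Note how the first equation of mergeP emits its VIP without inspecting the second argument at all - exactly the left bias (merge' (x:_|_) _|_ = x:_|_) described above.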
[Haskell-cafe] FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))
apfelmus apfelmus at quantentunnel.de writes: ~~ This is a repost, with apologies to anyone who sees this twice (I've replied to a two years old thread, and it doesn't show up in GMANE as I thought it would). ~~ Dave Bayer wrote: What I'm calling a venturi venturi :: Ord a = [[a]] - [a] merges an infinite list of infinite lists into one list, under the assumption that each list, and the heads of the lists, are in increasing order. I wrote this as an experiment in coding data structures in Haskell's lazy evaluation model, rather than as explicit data. The majority of the work done by this code is done by merge; the multiples of each prime percolate up through a tournament consisting of a balanced tree of suspended merge function calls. In order to create an infinite lazy balanced tree of lists, the datatype data List a = A a (List a) | B [a] is used as scaffolding. One thinks of the root of the infinite tree as starting at the leftmost child, and crawling up the left spine as necessary. After some pondering, the List a data structure for merging is really ingenious! :) Here's a try to explain how it works: The task is to merge a number of sorted and infinite lists when it's known that their heads are in increasing order. In particular, we want to write primes = (2:) $ diff [3..] $ venturi $ map multiple primes Thus, we have a (maybe infinite) list xss = [xs1, xs2, xs3, ...] of infinite lists with the following properties all sorted xss sorted (map head xss) where sorted is a function that returns True if the argument is a sorted list. A first try to implement the merging function is venturi xss = foldr1 merge xss = xs1 `merge` (xs2 `merge` (xs3 `merge` ... where merge is the standard function to merge to sorted lists. However, there are two problems. The first problem is that this doesn't work for infinite lists since merge is strict in both arguments. But the property head xs1 head xs2 head xs3 ... we missed to exploit yet can now be used in the following way venturi xss = foldr1 merge' xss merge' (x:xt) ys = x : merge xt ys In other words, merge' is biased towards the left element merge' (x:_|_) _|_ = x : _|_ which is correct since we know that (head xs head ys). The second problem is that we want the calls to merge to be arranged as a balanced binary tree since that gives an efficient heap. It's not so difficult to find a good shape for the infinite tree, the real problem is to adapt merge' to this situation since it's not associative: .. The problem is that the second form doesn't realize that y is also smaller than the third argument. In other words, the second form has to treat more than one element as privileged, namely x1,x2,... and y. This can be done with the aforementioned list data structure data People a = VIP a (People a) | Crowd [a] The people (VIPs and crowd) are assumed to be _sorted_. Now, we can start to implement merge' :: Ord a = People a - People a - People a Hi, ... replying to a two-years-old post here, :) :) and after consulting the full VIP version in haskellwiki/Prime_Numers#Implicit_Heap ... It is indeed the major problem with the merged multiples removing code (similar one to Richard Bird's code from Melissa O'Neill's JFP article) - the linear nature of foldr, requiring an (:: a-b-b) merge function. To make it freely composable to rearrange the list into arbitrary form tree it must indeed be type uniform (:: a-a-a) first, and associative second. The structure of the folded tree should be chosen to better suit the primes multiples production. 
I guesstimate the total cost as Sum (1/p)*d, where p is a generating prime at a leaf, and d is that leaf's depth, i.e. the number of merge nodes its produced multiples must pass through on their way to the top. The structure used in your VIP code, 1+(2+(4+(8+...))), can actually be improved upon with another one, (2+4)+((4+8)+((8+16)+...)), for which the estimated cost is about 10%-12% lower.

This can be expressed concisely as the following:

  primes :: () -> [Integer]
  primes () = 2 : primes'
    where
      primes'   = [3,5] ++ drop 2 [3,5..] `minus` comps
      mults     = map (\p -> fromList [p*p, p*p+2*p..]) $ primes'
      (comps,_) = tfold mergeSP (pairwise mergeSP mults)
      fromList (x:xs) = ([x],xs)

  tfold f (a: ~(b: ~(c:xs))) =
      (a `f` (b `f` c)) `f` tfold f (pairwise f xs)
  pairwise f (x:y:ys) = f x y : pairwise f ys

  mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
                         in (a ++ bc, merge b' d)
    where
      spMerge u@(x:xs) w@(y:ys) = case compare x y of
          LT -> (x:c,d) where (c,d) = spMerge xs w
          EQ -> (x:c,d) where (c,d) = spMerge xs ys
          GT -> (y:c,d) where (c,d) = spMerge u ys
      spMerge u [] = ([], u)
      spMerge [] w = ([], w)

with merge (see the sketch below).
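For reference, here is a minimal sketch of the pieces the code above leans on: a duplicate-collapsing merge matching spMerge's EQ case, plus the People/VIP helpers (vip, serve, mergeP) that hfold referred to earlier in the thread. It is reconstructed along the lines of the haskellwiki version, so treat the details as illustrative rather than authoritative; all lists are assumed infinite here (as streams of prime multiples are), so the Crowd [] cases are omitted:

  data People a = VIP a (People a) | Crowd [a]

  -- wrap a list, promoting its head to VIP status
  vip :: [a] -> People a
  vip (x:xs) = VIP x (Crowd xs)

  -- flatten back into an ordinary list
  serve :: People a -> [a]
  serve (VIP x xt) = x : serve xt
  serve (Crowd xs) = xs

  mergeP :: Ord a => People a -> People a -> People a
  mergeP (VIP x xt) ys = VIP x (mergeP xt ys)        -- the left VIP goes first
  mergeP (Crowd xs) (Crowd ys) = Crowd (merge xs ys)
  mergeP xs@(Crowd (x:xt)) ys@(VIP y yt)
      | x <= y    = VIP x (mergeP (Crowd xt) ys)
      | otherwise = VIP y (mergeP xs yt)

  -- union-like merge, collapsing duplicates as spMerge's EQ case does
  merge :: Ord a => [a] -> [a] -> [a]
  merge xs@(x:xt) ys@(y:yt) = case compare x y of
      LT -> x : merge xt ys
      EQ -> x : merge xt yt
      GT -> y : merge xs yt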
[Haskell-cafe] Re: FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))
Will Ness will_n48 at yahoo.com writes:

  wheelSums = roll 0 wdiffs
  roll      = scanl (+)
  wheel     = wdiffs ++ wheel
  wdiffs    = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
              4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wdiffs

Apparently that works too, but I meant it to be:

  wdiffs = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
           4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:[]

:)
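For context, here is how such a wheel is typically put to use (my illustration, not from the post; the offset 11 is the first number coprime to 2,3,5,7 past those primes themselves). The running sums of the differences enumerate exactly the numbers coprime to 2, 3, 5 and 7:

  -- one full turn of the wheel, covering [11..221] (221 = 11+210):
  oneTurn :: [Integer]
  oneTurn = map (11 +) wheelSums       -- 11,13,17,19,23,...,211,221

  -- with wdiffs ending in [] as corrected above, wheel = wdiffs ++ wheel
  -- is the cyclic difference list, and the enumeration goes on forever:
  candidates :: [Integer]
  candidates = scanl (+) 11 wheel      -- 11,13,17,19,23,29,31,37,41,...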
[Haskell-cafe] Re: Simple FAST lazy functional primes
Steve stevech1097 at yahoo.com.au writes:

  On Tue, 2009-11-03 at 10:52 -0500, Will wrote:

    I've just tried it and it was twice slower than mine. (?) I didn't use the [Int] signature in both. [...] It runs twice as fast with it). Although your code has an advantage that it is very easy to add the wheel optimization to it.

  I used [Int] for both. If you didn't use [Int] then it defaults to [Integer], and you are probably testing the speed of GMP integer operations (rather than the speed of Haskell Int operations), which could give differing conclusions.

When both are declared [Int], the ratios are [1.50, 1.40, 1.35, 1.26, 1.26] for the speed of your version vs mine, in producing 100,000, 300,000, 1, 2 and 3 million primes, on my laptop, with the memory ratio consistently at 120% (as reported by GHCi with the :s +s switch). I haven't tried anything above that, as it's old and slow. But clearly the two versions are on par with each other. The reason I prefer my version is that it's in a form that is easy to tweak further. :)

  I had looked into using a wheel before. It's nice in theory, but not so useful in practice. At least that's my experience, where using a wheel made the primes function slower.

Interesting.

  Adding your code and conclusions to the Prime numbers page on the Haskell wiki could be useful. http://www.haskell.org/haskellwiki/Prime_numbers

Thank you for pointing me to that page. I've just done that. :) The short description that I've arrived at in the end is that my version explicitly uses successive prefixes of the primes list to test batches of odds between successive squares of primes. Now it's short and clear. Thanks!

  I've just updated the page to split it into Finding Primes and Testing Primality.

  Steve
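A minimal harness for the kind of test Steve describes might look like this (my sketch; the module name and cutoff are arbitrary, and primes :: [Int] is whichever definition from this thread is under test):

  -- Main.hs
  main :: IO ()
  main = print (last (takeWhile (< 10^8) primes))

  -- compile and run with RTS statistics to get both time and memory:
  --   $ ghc -O2 Main.hs
  --   $ ./Main +RTS -s    -- -s reports allocation and max residency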
[Haskell-cafe] Re: Simple FAST lazy functional primes
Jason Dagit dagit at codersbase.com writes:

  By the way, do you understand where the speedup with Int is coming from? As I understand it, there are two main places. One is that the type class dictionary passing can be removed (GHC might figure this out already, I'd need to check the core to be sure). The other is that GHC is likely unboxing to its primitive Int# type. [...] Good luck! Jason

Thanks! Writing the super-fast sieve wasn't my objective here, though. It rather was writing the fastest possible simple functional lazy code true to the sieve's definition, and understanding it better that way (that's the added bonus).

As it stands now, this code seems a rather faithful description of what the sieve _is_, except that it tests each number in isolation instead of counting over a bunch of them at once (skipping over primes, getting them for free). THAT's the crucial difference, which the article seems to be trying to explain but never quite puts in such simple terms. All the extra activity is kept to an absolute minimum here, and _now_ the main thing can be dealt with further, if so desired - like turning to using the PQ thing, etc. Then if we were to compare them, it wouldn't be like comparing apples with orange juice. :) That's what it felt like, seeing the PQ code compared with the classic naive version in that article. I'm reasonably sure that, PQ-augmented, this code will be even faster, not slower, even for the first million primes.

This whole experience proves that the clearest code can also be the fastest (and may be necessarily so). Seeing it described in that article as if clarity must be paid for with efficiency (and vice versa) didn't seem right to me.

Cheers,
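Both of Jason's guesses can be checked directly by dumping GHC's optimized Core for the module (a sketch; the file name is illustrative):

  -- $ ghc -O2 -ddump-simpl Primes.hs > Primes.core
  --
  -- with the [Int] signature one expects to see unboxed GHC.Prim.Int#
  -- arithmetic (e.g. remInt#) in the dump; without it, Integer
  -- operations reached through the Integral class dictionary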
[Haskell-cafe] Re: Simple FAST lazy functional primes
Hi Steve,

Steve stevech1097 at yahoo.com.au writes:

  Hi Will,

  I had previously tested the Melissa O'Neill prime function (using the primes 0.1.1 package) and found it to be fast (for a functional approach), but with high memory use. To fully test a prime function, I think you should:

  1. Test for more than the first 10^6 primes. Generating all primes to 10^6 is quite easy for modern computers. Most prime generating functions will run in less than 1 sec and look fast. Try generating all primes to 10^7 and 10^8, then you will see how 'fast' these lazy functional methods really are.

  2. Measure memory use. As you move above 10^6 and test primes up to 10^7, 10^8, and 10^9, memory use becomes very important. A prime function with excessive memory use will soon consume all of the system's memory.

  Here's another primes function which performs quite well:

    primes :: [Int]
    primes = 2 : filter (isPrime primes) [3,5..]
      where
        isPrime (p:ps) n
          | mod n p == 0 = False
          | p*p > n      = True
          | otherwise    = isPrime ps n
        isPrime [] _ = False  -- never used, but avoids compiler warning

  Here are some results from my PC, for generating primes up to 10^8:

                10^6           10^7           10^8
               secs  MiB      secs  MiB      secs  MiB
      1        0.01    0       0.1    2         2   14
      2        0.56    7      11.1   43       270  306
      3        0.61    7      11.8   44       260  342
      4        0.43    3      65.4  345       not finished (900+)

      1 using a Haskell ST Array
      2 your primes function
      3 my primes function, listed above
      4 Melissa O'Neill method from primes 0.1.1 package

  To summarise the results from the tests I've run:
  - if you want fast functional primes, forget it, it's not possible.
  - if you just want fast primes, then use C, or a Haskell array.
  - if performance does not matter and slow primes are acceptable, then use the purely functional approach.

  Regards, Steve

You just have a fast PC, that's all, :) so a million is not enough for you. My old laptop is 50 times slower. :) Seriously though, your conclusions seem entirely plausible to me. My goal here was to have Haskell code for primes that is decent. Nothing more.

The reason your code is slightly slower is of course that in effect it recalculates the needed primes prefix for each new candidate. If you could somehow thread its length through, it might speed it up some more.

  I've just tried it and it was twice slower than mine. (?) I didn't use the [Int] signature in both.

Maybe it's not real issues we're dealing with here, but compiler/OS/CPU issues? (Or have you forgotten to put the [Int] signature into my code too, when tested? It runs twice as fast with it.) Although your code has an advantage in that it is very easy to add the wheel optimization to it.

BTW I don't know about the code in the package, but the one in the article makes the terrible faux pas of adding each prime into the queue as soon as it is produced; this could easily account for the memory blow-up you're seeing. What's really needed is to plug a decent PQ implementation into my framework, which does the absolute minimum of repeated calculations, it seems. What I have now is this:

  qprimes = 2: 3: sieve emptyQ primes' 5
    where
      primes' = tail qprimes
      sieve q (p:ps) x = h ++ sieve (addQ q' (2*p) (p*p+2*p)) ps (p*p+2)
        where
          (h,q') = noComps q [x,x+2..p*p-2]
  ..

The main deficiency of list-based sieves, as I've now come to realize and formulate in simple enough terms for myself to understand, is that they work with each number in isolation, and thus must test primes as well as composites. Testing composites on average is cheap, because most of them will have small factors; testing primes is costly.
Imperative sieves avoid that by working over spans of numbers at once, so they get their primes for free, when they see gaps in the produced/marked composites (I repeat myself here, but am not sure if you've read this explanation of mine in other posts). What counts here is the direct access to random memory - or the numbers written out on a blackboard. Melissa's PQ approach tries to emulate that: not crossing them off, but seeing the gaps in between.

One is tempted to treat Haskell as a high-level executable definition, hoping for a compiler to turn it into the fastest possible low-level code. I know I can translate it into C by hand fairly well; why wouldn't the compiler? It seems Haskell compilers have much room for improvement. :)
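The emptyQ / addQ / noComps interface in the qprimes fragment above is left unspecified; here is a minimal stand-in for it - entirely my sketch of what the names suggest, using a sorted list of (next composite, step) pairs instead of a real priority queue, just to make the fragment self-contained and runnable:

  type Q = [(Integer, Integer)]      -- (next composite, step), sorted

  emptyQ :: Q
  emptyQ = []

  -- addQ q step nxt: register a multiples stream starting at nxt,
  -- advancing by step (matching the call  addQ q' (2*p) (p*p+2*p))
  addQ :: Q -> Integer -> Integer -> Q
  addQ q step nxt = insertQ (nxt, step) q

  insertQ :: (Integer, Integer) -> Q -> Q
  insertQ e [] = [e]
  insertQ e@(x,_) q@(e'@(y,_):t)
      | x <= y    = e : q
      | otherwise = e' : insertQ e t

  -- advance the minimal entry to its next multiple
  bumpQ :: Q -> Q
  bumpQ ((c,s):t) = insertQ (c+s, s) t
  bumpQ []        = []

  -- noComps q xs: candidates from xs that aren't composites recorded
  -- in q, paired with the queue advanced past the end of the span
  noComps :: Q -> [Integer] -> ([Integer], Q)
  noComps q [] = ([], q)
  noComps q xs@(x:xt)
      | (c,_):_ <- q, c <  x = noComps (bumpQ q) xs    -- stale entry
      | (c,_):_ <- q, c == x = noComps (bumpQ q) xt    -- x is composite
      | otherwise = let (h,q') = noComps q xt in (x:h, q')

With these definitions, take 20 qprimes gives [2,3,5,...,71], as it should; a real PQ would change only the complexity, not the interface.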
[Haskell-cafe] Re: Simple FAST lazy functional primes
Sjoerd Visscher sjoerd at w3future.com writes:

  [...] 2 doesn't have to be in the list of smaller primes, as we're only generating odd numbers:

    primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
    sieve qs@(q:_) (p:ps) =
      [x | x <- [q*q+2,q*q+4..p*p-2], and [(x `rem` p) /= 0 | p <- qs]]
      ++ sieve (p:qs) ps

  Sjoerd

Thanks! I haven't tested your code's speed yet, but I have a few points.

First, I wouldn't expect eliminating a parameter to have any effect on performance in compiled code (I haven't a clue whether -O2 or -O3 should be used, BTW) if it doesn't eliminate any superfluous computations.

Second, you prepend bigger primes to the front of the list (building the prefix in reverse order); this will cause them to be tested against first. I think (1) we're supposed to test composites against smaller factors first, for the tests to stop sooner; IMO it'll slow the code down. I think (2) you can safely append them at the end instead, as a compiler should be able to just maintain a tail pointer there. But then you lose immediate access to the last found prime, so you will need to bring the 'x' parameter back. Then you end up with my code exactly :) (only replicating the prefix):

  primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes) 9
  sieve pfx (p:ps) x =
    [x | x <- [x+2,x+4..p*p-2], and [(x `rem` p) /= 0 | p <- pfx]]
    ++ sieve (pfx++[p]) ps (p*p)

It'll be interesting to test my hypotheses (two of them :) ) and see if this in fact has better time than your code.

Thirdly, (take k) could be compiled out completely into a tail pointer too, maintained and advanced after each step. I would hope a smart compiler does just that. Again, some more testing is in order. Although I tested the two approaches on some previous incarnation of this code, and (take k) vs (pfx++[p]) had exactly the same run time (or was better).

What I'm after mostly here is for the code to be clear and simple, and reasonably efficient. Right now it corresponds very nicely to the description of the sieve as filtering all the odds testable by a known prefix of primes, and then going on to proceed with the next batch of them, with a prefix grown by one more prime. And so on.

But more importantly, I want it to be known that there's a lot that can be done here, in a natural functional lazy kind of way, before resorting to priority queues and mutable arrays. We could always just use C too. ;)

Thanks for the feedback! :)
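Will's point (1) is just `and` short-circuiting: it stops at the first False, and most composites have a small factor, so with the prefix in increasing order the test usually stops after a division or two. A quick GHCi illustration (my example):

  *Main> and [ (9 `rem` p) /= 0 | p <- [3,5,7,11,13] ]
  False                -- settled by the very first division

  *Main> and [ (9 `rem` p) /= 0 | p <- [13,11,7,5,3] ]
  False                -- same answer, after four useless divisions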
[Haskell-cafe] Re: Simple FAST lazy functional primes
Will Ness will_n48 at yahoo.com writes:

  But more importantly I want it to be known that there's a lot that can be done here, in a natural functional lazy kind of way, before resorting to priority queues and mutable arrays. We could always just use C too. ;)

I mean it as introductory code that's nevertheless good for producing the first million primes or so - not the best sieve ever, but the best (is it?) simple, clear, functional introductory sieve instead. That's another reason for using (take k primes') - it practically says it in English: the first k odd primes. :)
[Haskell-cafe] Re: Simple FAST lazy functional primes
Sjoerd Visscher sjoerd at w3future.com writes:

  Excuse me, 2 doesn't have to be in the list of smaller primes, as we're only generating odd numbers:

    primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
    sieve qs@(q:_) (p:ps) =
      [x | x <- [q*q+2,q*q+4..p*p-2], and [(x `rem` p) /= 0 | p <- qs]]
      ++ sieve (p:qs) ps

  Sjoerd

Hi, I've run it now. In producing 100,000 primes, your above code takes 3.5x more time than the one I posted. The code modified as I suggested, with (qs++[p]), takes exactly the same time as mine. :)

Cheers,
[Haskell-cafe] Re: Base classes can be _ELIMINATED_ with interfaces
Shelby Moore shelby at coolpage.com writes:

  * 1856 Thermo Law: entire universe (a closed system, i.e. everything) trends to maximum disorder.

On the very, *very*, VERY long timescale. In the meantime, chaos creates clashes of matter, which cause local energy outbursts (i.e. galaxies), which pump their immediate surroundings, where natural selection in the presence of an energy influx leads to increasing complexity. To persist for a long, *long*, LONG time.
[Haskell-cafe] Re: Simple FAST lazy functional primes
Will Ness will_n48 at yahoo.com writes:

  primes = 2: 3: sieve 0 primes' 5
  primes' = tail primes
  sieve k (p:ps) x = [x | x <- [x,x+2..p*p-2],
                          and [(x `rem` p) /= 0 | p <- take k primes']]
                     ++ sieve (k+1) ps (p*p+2)

(thanks to Leon P. Smith for his brilliant idea of directly generating the spans of odds instead of calling 'span' on the infinite odds list). The relative performance vs the PQ-version is:

  100,000   300,000   1,000,000   primes produced
  0.6       0.75      0.97

One _crucial_ tidbit I've left out: _type_signature_. Adding (:: [Int]) speeds this code up more than TWICE! :) :)

'sieve' can also be used in e.g.

  primesFrom m = sieve (length h) ps m
    where
      (h,(_:ps)) = span (<= (floor.sqrt.fromIntegral) m) primes

to get a few big primes even faster. :)
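A quick check of primesFrom (my example; note that m is taken odd here, as the [m,m+2..] span arithmetic assumes):

  *Main> take 5 $ primesFrom 1000001
  [1000003,1000033,1000037,1000039,1000081]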
[Haskell-cafe] Re: Simple FAST lazy functional primes
Jason Dagit dagit at codersbase.com writes:

  On Mon, Nov 2, 2009 at 1:59 PM, Will Ness will_n48 at yahoo.com wrote:

    One _crucial_ tidbit I've left out: _type_signature_. Adding (:: [Int]) speeds this code up more than TWICE! :) :)

  If you are okay with Int, then maybe you're also happy with Int32 or Word32. If so, why don't you use Template Haskell to build the list at compile time? If you do that, then getting the kth prime at run-time is O(k). Take that, AKS! :) Jason

O(k) - I've removed that bit since the post, actually; wasn't thinking clearly for a moment, having seen the double speedup!

I've found Matthew Brecknell's fast code in the old Melissa O'Neill article thread here, from 2007-02-19 18:14:23 GMT. Without the type signature, it's twice slower than my code.

I think mine is a fairly faithful translation now of what the sieve is all about, except that it tests its candidate numbers whereas the sieve counts over them (and thus is able to skip over primes without testing them). The usual functional approach has it working with each number in isolation, so it tests each one (to recreate the counting state, in effect), and thus overworks much on primes. The next logical step is to start counting! :)
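Jason's compile-time suggestion, sketched out (the module split is required by Template Haskell's staging rules; the module and function names are my own illustration, with primes :: [Int] as defined in this thread assumed to live in module Primes):

  {-# LANGUAGE TemplateHaskell #-}
  module PrimesTH (embedPrimes) where

  import Language.Haskell.TH.Syntax (Q, Exp, lift)
  import Primes (primes)

  -- force the first k primes at compile time and splice them in
  -- as a literal list; (!!) on it at run time is O(k), as Jason says
  embedPrimes :: Int -> Q Exp
  embedPrimes k = lift (take k primes)

used elsewhere as, e.g., firstK = $(embedPrimes 100000) :: [Int].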
[Haskell-cafe] Simple FAST lazy functional primes
First, here it is:

  primes = 2: 3: sieve 0 primes' 5
  primes' = tail primes
  sieve k (p:ps) x = [x | x <- [x,x+2..p*p-2],
                          and [(x `rem` p) /= 0 | p <- take k primes']]
                     ++ sieve (k+1) ps (p*p+2)

(thanks to Leon P. Smith for his brilliant idea of directly generating the spans of odds instead of calling 'span' on the infinite odds list).

This code is faster than the PriorityQueue-based sieve (granted, using my own ad-hoc implementation, so YMMV) in producing the first million primes (testing was done running the ghc -O3 compiled code inside GHCi). The relative performance vs the PQ-version is:

  100,000   300,000   1,000,000   primes produced
  0.6       0.75      0.97

I've recently come to Melissa O'Neill's article (I know, I know) and she makes the incredible claim there that the square-of-prime optimization doesn't count if we want to improve the old and lazy

  uprimes = 2: sieve [3,5..]
    where
      sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]

Her article gave me the strong impression that she claims that the only way to fix this code is with the Priority Queue based optimization, and then goes on to present astronomical gains in speed by implementing it.

Well, I find this claim incredible. First of all, the naive code fires up its nested filters much too early, when in fact each must be started only when the prime's square is reached (not only have its work started there - be started there itself!), so that the filters are delayed:

  dprimes = 2: 3: sieve (tail dprimes) [5,7..]
    where
      sieve (p:ps) xs = h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t))
        where
          (h,t) = span (< p*p) xs

This code right here is on par with the PQ-based code, only 3-4x slower at generating the first million primes.

Second, the implicit control structure of linearly nested filters, piling up in front of the supply stream - each with its prime-number-it-filters-by hidden inside it - needs to be explicated into a data structure of primes-to-filter-by (which is in fact a prefix of the very primes list we're building), so the filtering can be done by one function call instead of many nested calls:

  xprimes = 2: 3: sieve 0 primes' [5,7..]
    where
      primes' = tail xprimes
      sieve k (p:ps) xs = noDivs k h ++ sieve (k+1) ps (tail t)
        where
          (h,t) = span (< p*p) xs
      noDivs k = filter (\x -> all ((/=0).(x `rem`)) (take k primes'))

From here, using Leon P. Smith's brilliant idea of fusing the span and the infinite odds supply, we arrive at the final version,

  kprimes = 2: 3: sieve 0 primes' 5
    where
      primes' = tail kprimes
      sieve k (p:ps) x = noDivs k h ++ sieve (k+1) ps (t+2)
        where
          (h,t) = ([x,x+2..t-2], p*p)
      noDivs k = filter (\x -> all ((/=0).(x `rem`)) (take k primes'))

Using the list comprehension syntax, it turns out, gives it another 5-10% speedup by itself when compiled with ghc -O3.

So I take this to disprove the central premise of the article, and to show that simple, lazy, functional, FAST primes code does in fact exist, and that the PQ optimization - whose value of course no-one can dispute - is a far-end optimization.
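A quick sanity check that the variants above agree with each other (a GHCi session; the counts are arbitrary):

  *Main> take 10 kprimes
  [2,3,5,7,11,13,17,19,23,29]
  *Main> all (\ps -> take 1000 ps == take 1000 kprimes)
             [uprimes, dprimes, xprimes]
  True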
[Haskell-cafe] Re: [Haskell-beginners] map question
Jason Dagit dagit at codersbase.com writes:

  On Mon, Oct 19, 2009 at 5:53 PM, Will Ness will_n48 at yahoo.com wrote:

    You think of functions, where domain matters (for purists?). In syntax, only the result matters: does it read? Does it have an intended meaning? How is it a mistake if it expresses what I intended? Both 3 `-` 2 and curry fst `foldl` 0 are exactly the same - expressions with an infix operator, read in the same way, interpreted in the same way. In the first case the backticks are made superfluous by the Haskell reader for our convenience; but they shouldn't be made illegal. Why should they be?

  Don't you mean 3 `(-)` 2? I'm pretty sure -, without the parens, is infix, and (-) is prefix. So it seems to me that you need the brackets for this to be consistent. Jason

You're absolutely right: in the current syntax that also would only be consistent, yet it is illegal too. But I propose to augment the syntax by allowing symbolic ops in backticks to stand for themselves. When I see `op`, for me it says: infix op. So `+` would also say: infix +. (`-` 2) would finally become possible. It would read: treat - as infix binary, and make a flip section out of it. Just as it does for an alphanumeric identifier in (`op` 2). Without backticks, symbolic ops are also treated as infix by default, but that's just convenience.

Anyway, I guess all the points in this discussion have been made, and it's just a matter of taste.
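For reference, the flipped minus section can be spelled today only indirectly; all of these are currently legal and equivalent to the proposed (`-` 2) (a GHCi session):

  *Main> map (subtract 2) [3,4,5]
  [1,2,3]
  *Main> map (flip (-) 2) [3,4,5]
  [1,2,3]
  *Main> map (+ (-2)) [3,4,5]
  [1,2,3]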
[Haskell-cafe] Re: [Haskell-beginners] map question
wren ng thornton wren at freegeek.org writes:

  Will Ness wrote:
    (`foldl`2) works. (`-`2) should too.

  The `` syntax is for converting lexical identifiers into infix operators. Symbolic identifiers are already infix, which is why `` doesn't work for them.

So it would be a no-op then. Why make it illegal? Just because it makes writing the scanner easier is no answer.

  If we introduced this then those striving for consistency would be right in requesting that this pattern be allowed for all symbolic operators. I for one am opposed to introducing superfluous syntax for duplicating the current ability to write things in the same ways.

This syntax already exists. The '`' symbol is non-collating already, so using it for symbol chars doesn't change anything (it's not like it can be a part of some name, right?). To turn an infix op into an infix op is an id operation, made illegal artificially at the scan phase after a successful lex (or whatever). Finally enabling the missing functionality, which is a common stumbling block for every beginner, is hardly duplicating.

  Attack the underlying problem, don't introduce hacks to cover up broken hacks. This isn't C++.

The underlying problem is a broken scanner that can't distinguish between a binary op and a number-read syntax. Op names are collated symbol chars, and one of the symbols, -, is also a valid number prefix. So, allow a clue from the programmer to detach it from the number: backticks separate it from the following numeric chars, preventing it from sticking to them. And by itself, it forms an op, a binary one. Not a hack - a solution. A consistent one. Look:

  (`foldl` 0)
  (`-` 2)

Don't they look exactly the same? Why wouldn't it be made legal? Show me one inconsistency it introduces.
[Haskell-cafe] Re: [Haskell-beginners] map question
Tom Tobin korpios at korpios.com writes:

  On Mon, Oct 19, 2009 at 5:34 PM, Will Ness will_n48 at yahoo.com wrote:

    This syntax already exists. The '`' symbol is non-collating already, so using it for symbol chars doesn't change anything (it's not like it can be a part of some name, right?). To turn an infix op into an infix op is an id operation, made illegal artificially at the scan phase after a successful lex (or whatever).

  If I've accidentally applied syntax meant for a prefix operator to an infix operator, *I want the compiler to tell me*, and not to silently accept my mistake.

You don't apply syntax, you write it. You think of functions, where domain matters (for purists?). In syntax, only the result matters: does it read? Does it have an intended meaning? How is it a mistake if it expresses what I intended? Both 3 `-` 2 and curry fst `foldl` 0 are exactly the same - expressions with an infix operator, read in the same way, interpreted in the same way. In the first case the backticks are made superfluous by the Haskell reader for our convenience; but they shouldn't be made illegal. Why should they be? I truly don't understand the resistance to this idea. :)

    Why wouldn't it be made legal? Show me one inconsistency it introduces.

  You've said that you want to be able to do this for the sole case of the - (minus-sign) operator:

This is not an inconsistency. Plus, if we were to take this idea of using backticks as name delimiters to the extreme, it could even allow us to use identifiers such as `left-fold` or `right-fold` in infix position, and (`left-fold`) by itself. Although that seems not such a good idea.

    Operators are great because they make our intent visible, immediately apparent. Long words' meaning, like subtract's, is not immediately apparent, and they break consistency. Not everyone's first language in life was English, you see.

  I don't buy this rationale. Haskell has plenty of English words as function names all over the place; if you can't handle subtract, how are you handling Haskell at all? Sure, the minus-sign issue is a wart, but it's less awkward than the solution you propose for a problem I doubt you really have.

When I see `++` I don't need to think _at_all_. When I see `concatenate` or some such, I do - even if for the briefest of moments. It is _less_ convenient both to read and to _write_, don't you agree?

I don't see my proposal as awkward at all. On the contrary, to me it looks natural and consistent with the other uses of this device in the language. It is this asymmetry that bothers me with the (-) issue; I just want the balance restored. But it is a matter of taste, of course. Or obsessing over minutiae. :) Oh well.
[Haskell-cafe] Re: [Haskell-beginners] map question
Gregory Propf gregorypropf at yahoo.com writes:

  I actually meant it as sort of a joke but maybe it's not after all.

Seriously though, using anything non-ASCII in source code is a bad idea, because there are lots of fonts and editors in the world.

It seems natural to me to have (`-`2) stand for (flip (-) 2), if only that were made legal syntax, just as (`foldl`2) stands for (flip foldl 2). Supposedly there is no reason to write (`:`[]) since : is already an infix operator, but making it a no-op wouldn't hurt, and would give us the benefit of finally being able to write the binary-minus flip-section in a visually apparent way.
[Haskell-cafe] Re: [Haskell-beginners] map question
Luke Palmer lrpalmer at gmail.com writes:

  Or you could use the subtract function.

    map (subtract 2) [3,4,5]
    [1,2,3]

I don't want to.

  I don't think syntax sugar is worth it in this case.

I do. Operators are great because they make our intent visible, immediately apparent. Long words' meaning, like subtract's, is not immediately apparent, and they break consistency. Not everyone's first language in life was English, you see.

(`foldl`2) works. (`-`2) should too. I'll settle for (+(-2)) for now, but it ain't that pretty.