RE: optimization and rewrite rules questions
| II is where I'd like to be able to distinguish variables, constants, | and complex expressions in the left-hand sides of RULES, and | I and III are where I'd like control over the rewrite strategy, as | in strategy combinators. I'm deep in icfp submissions, so no time to reply properly. You can distinguish between literals, variables etc, in GHC's BuiltinRules. These are not hard to write: see prelude/PrelRules. But they are built in, not part of the source program. Maybe that's ok, since you are messing with built-in arithmetic. Another avenue is to elaborate the language of rules somehow to let you say what you want. But I don't know a good *spec* for such a feature, let alone an impl. Another possibility is to write a Core-to-Core optimiser aimed at your target area. Max B's plugins would let you dynamically link that to a distributed GHC. But it's not in the HEAD yet... Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: optimization and rewrite rules questions
2009/2/24 Claus Reinke claus.rei...@talk21.com: In the recently burried haskell-cafe thread speed: ghc vs gcc, Bulat pointed out some of the optimizations that GHC doesn't do, such as loop unrolling. I suggested a way of experimenting with loop unrolling, using template haskell to bypass GHC's blindspot (it usually doesn't unfold recursive definitions http://www.haskell.org/pipermail/glasgow-haskell-users/2007-July/012936.html , but if we unfold a loop combinator at compile time, GHC's normal optimizations can take over from there): http://www.haskell.org/pipermail/haskell-cafe/2009-February/056241.html Just a note - there is a solution that doesn't require Template Haskell which I use in my own code. Here is a sketch: fact = fix4 fact_worker {-# INLINE fact_worker #-} fact_worker recurse n | n = 0 = 1 | otherwise = n * recurse (n - 1) {-# INLINE fix4 #-} fix4 f = f1 where f1 = f f2 f2 = f f3 f3 = f f4 f4 = f f1 There is probably a way to generalise this to arbitrary levels of unrolling by using instances of a typeclass on type level numerals. Cheers, Max ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: optimization and rewrite rules questions
| II is where I'd like to be able to distinguish variables, constants, | and complex expressions in the left-hand sides of RULES, and | I and III are where I'd like control over the rewrite strategy, as | in strategy combinators. I'm deep in icfp submissions, so no time to reply properly. Okay, not urgent, just general concerns about GHC performance, can wait a few days. You can distinguish between literals, variables etc, in GHC's BuiltinRules. Not really sufficient. If the loop body was about Maps instead of Ints, different rules would apply. And the same limitations apply for all uses of RULES, which were meant to provide for user-extensible library-specific optimization by transformation. Another avenue is to elaborate the language of rules somehow to let you say what you want. But I don't know a good *spec* for such a feature, let alone an impl. Another possibility is to write a Core-to-Core optimiser aimed at your target area. Max B's plugins would let you dynamically link that to a distributed GHC. But it's not in the HEAD yet... My guess would be to start from standard Strafunski/SYB-style traversals over Core as the semantic API, utilizing compiler plugins. Then look for better syntax, closer to the existing RULES (for a start, something like quasi-quoting, so that left- and right-hand sides of rules can still be written in concrete syntax, even if they are just Haskell function definitions). Then move all RULES to the new framework (to reduce duplication inside GHC, and to provide a single programmer API) and start adding further optimizations to libraries (I wonder: surely others have encountered these limitations when adding RULES for their libraries?). But that is just a guess - as you imply, someone would have to sit down, design, implement, and test. That is why I asked whether it would make a good GSoC topic for GHC HQ (lots of potential, and building on one of last year's topics). Claus ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: optimization and rewrite rules questions
but if we unfold a loop combinator at compile time, GHC's normal optimizations can take over from there): http://www.haskell.org/pipermail/haskell-cafe/2009-February/056241.html Just a note - there is a solution that doesn't require Template Haskell which I use in my own code. Here is a sketch: That is in fact the same solution!-) Just that I stayed close to the example in the original thread, hence a fixpoint-combinator with implicit tail-recursion and built-in counter rather than one with explicit general recursion. fact = fix4 fact_worker {-# INLINE fact_worker #-} fact_worker recurse n | n = 0 = 1 | otherwise = n * recurse (n - 1) {-# INLINE fix4 #-} fix4 f = f1 where f1 = f f2 f2 = f f3 f3 = f f4 f4 = f f1 There is probably a way to generalise this to arbitrary levels of unrolling by using instances of a typeclass on type level numerals. Semantically, one could compute the nested application without meta-level help, but that involves another recursive definition, which GHC won't unfold during compilation. So I used TH, just to generate the equivalent to the 'fixN' definition. Since only the fixpoint/loop combinators need to be unfolded statically, one could indeed do it by hand, for a suitable range of unfolding depths, and provide them as a library. Claus ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Type (class) recursion + families = exponential compile time?
Hi the attached module is a much reduced version of some type-level assurance stuff (inspired by the Lightweight Monadic Regions paper) I am trying to do. I am almost certain that it could be reduced further but it is late and I want to get this off my desk. Note the 4 test functions, test11 .. test14. The following are timings for compiling the module only with all test functions commented out, except respectively, test11, test12, test13, and test14: b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 1,79s user 0,04s system 99% cpu 1,836 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 5,87s user 0,14s system 99% cpu 6,028 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 23,52s user 0,36s system 99% cpu 23,899 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 102,20s user 1,32s system 97% cpu 1:45,89 total It seems something is scaling very badly. You really don't want to wait for a version with 20 levels of nesting to compile... If anyone has a good explanation for this, I'd be grateful. BTW, I am not at all certain that this is ghc's fault, it may well be my program, i.e. the constraints are too complex, whatever. I have no idea how hard it is for the compiler to do all the unification. Also, the problem is not of much practical relevance, as no sensible program will use more than a handfull levels of nesting. Cheers Ben {-# LANGUAGE Rank2Types, TypeSynonymInstances #-} {-# LANGUAGE TypeFamilies, GeneralizedNewtypeDeriving #-} module Bug2 where import Control.Monad.Reader newtype ResourceT r s m v = ResourceT { unResourceT :: ReaderT r m v } deriving (Monad) data Ctx = Ctx data Ch = Ch type CAT s c = ResourceT [Ch] (s,c) type CtxM c = ResourceT Ctx c IO newtype CA s c v = CA { unCA :: CAT s c (CtxM c) v } deriving (Monad) class (Monad m) = MonadCA m where type CtxLabel m instance MonadCA (CA s c) where type CtxLabel (CA s c) = c instance (Monad m, MonadCA m, c ~ CtxLabel m) = MonadCA (CAT s c m) where type CtxLabel (CAT s c m) = c runCAT :: (forall s. CAT s c m v) - m v runCAT action = runReaderT (unResourceT action) [] newRgn :: MonadCA m = (forall s. CAT s (CtxLabel m) m v) - m v newRgn = runCAT runCA :: (forall s c. CA s c v) - IO v runCA action = runCtxM (runCAT (unCA action)) runCtxM :: (forall c. CtxM c v) - IO v runCtxM action = runReaderT (unResourceT action) Ctx test11 :: IO () test11 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn( newRgn(newRgn(newRgn(newRgn(return())) -- test12 :: IO () -- test12 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn( -- newRgn(newRgn(newRgn(newRgn(return( -- test13 :: IO () -- test13 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn( -- newRgn(newRgn(newRgn(newRgn(return() -- test14 :: IO () -- test14 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn( -- newRgn(newRgn(newRgn(newRgn(return()) ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Int vs Word performance?
Looking at prelude/PrelRules.hs has reminded me of an old conundrum: if I switch from Int to Word, should I expect any performance differences? A while ago, I needed lots of fairly small positive numbers, together with a small number of flags for each, so I thought I'd switch from Int to Word, and map the flags to bits. But the performance dropped so drastically that I went back to Int, slightly complicating the bitmaps. I didn't expect that, and I can't see any builtin rules for Int that would have no Word counterpart. Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here): main = print $ foldl' (+) 0 [1..1::T] What am I missing here? Also, in the real code I ended up seeing things like this in the -ddump-simpl output for the bit-fiddling code: GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.int2Word# wild13_XbE) (GHC.Prim.int2Word# y#_a4EZ)) Is that likely to cost me a lot or are these conversions cheap? Claus ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Type (class) recursion + families = exponential compile time?
Just Hindley-Milner type inference is exponential, making the type system more complex isn't going to make things better. 2009/2/26 Ben Franksen ben.frank...@online.de: Hi the attached module is a much reduced version of some type-level assurance stuff (inspired by the Lightweight Monadic Regions paper) I am trying to do. I am almost certain that it could be reduced further but it is late and I want to get this off my desk. Note the 4 test functions, test11 .. test14. The following are timings for compiling the module only with all test functions commented out, except respectively, test11, test12, test13, and test14: b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 1,79s user 0,04s system 99% cpu 1,836 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 5,87s user 0,14s system 99% cpu 6,028 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 23,52s user 0,36s system 99% cpu 23,899 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 102,20s user 1,32s system 97% cpu 1:45,89 total It seems something is scaling very badly. You really don't want to wait for a version with 20 levels of nesting to compile... If anyone has a good explanation for this, I'd be grateful. BTW, I am not at all certain that this is ghc's fault, it may well be my program, i.e. the constraints are too complex, whatever. I have no idea how hard it is for the compiler to do all the unification. Also, the problem is not of much practical relevance, as no sensible program will use more than a handfull levels of nesting. Cheers Ben ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Int vs Word performance?
claus.reinke: Looking at prelude/PrelRules.hs has reminded me of an old conundrum: if I switch from Int to Word, should I expect any performance differences? A while ago, I needed lots of fairly small positive numbers, together with a small number of flags for each, so I thought I'd switch from Int to Word, and map the flags to bits. But the performance dropped so drastically that I went back to Int, slightly complicating the bitmaps. I didn't expect that, and I can't see any builtin rules for Int that would have no Word counterpart. Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here): main = print $ foldl' (+) 0 [1..1::T] What am I missing here? Also, in the real code I ended up seeing things like this in the -ddump-simpl output for the bit-fiddling code: GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.int2Word# wild13_XbE) (GHC.Prim.int2Word# y#_a4EZ)) Is that likely to cost me a lot or are these conversions cheap? Those guys are no-ops, and in general you should never see a performance difference. If you do, it is a bug! There are some known cases where rules are missing however: * Conversions to Int from Double for ceil, floor, round are missing rules. http://hackage.haskell.org/trac/ghc/ticket/2271 * gcd only has a specialised implementation for Int, http://hackage.haskell.org/trac/ghc/ticket/2270 Some others I'm aware of are product/sum/maximum/minimum on lists have specialisations for some atomic types (Int, Integer) but not all (needs a ticket for this too). I'm not aware of any remaining theoretically noop conversions that aren't in fact implemented noops now. If you find them, please open a ticket. -- Don ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Type (class) recursion + families = exponential compile time?
Here's the reference http://portal.acm.org/citation.cfm?id=96748 Deciding ML typability is complete for deterministic exponential time -- Harry G. Mairson. Ben. On 27/02/2009, at 10:12 AM, Ben Franksen wrote: Hi the attached module is a much reduced version of some type-level assurance stuff (inspired by the Lightweight Monadic Regions paper) I am trying to do. I am almost certain that it could be reduced further but it is late and I want to get this off my desk. Note the 4 test functions, test11 .. test14. The following are timings for compiling the module only with all test functions commented out, except respectively, test11, test12, test13, and test14: b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 1,79s user 0,04s system 99% cpu 1,836 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 5,87s user 0,14s system 99% cpu 6,028 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 23,52s user 0,36s system 99% cpu 23,899 total b...@sarun[1] time ghc -c Bug2.hs ghc -c Bug2.hs 102,20s user 1,32s system 97% cpu 1:45,89 total It seems something is scaling very badly. You really don't want to wait for a version with 20 levels of nesting to compile... If anyone has a good explanation for this, I'd be grateful. BTW, I am not at all certain that this is ghc's fault, it may well be my program, i.e. the constraints are too complex, whatever. I have no idea how hard it is for the compiler to do all the unification. Also, the problem is not of much practical relevance, as no sensible program will use more than a handfull levels of nesting. Cheers Ben Bug2.hs___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Suggestion for bang patterns documentation
I got confused by the GHC documentation recently, I was wondering how it could be improved. From: http://www.haskell.org/ghc/docs/latest/html/users_guide/bang-patterns.html A bang only really has an effect if it precedes a variable or wild-card pattern: f3 !(x,y) = [x,y] f4 (x,y) = [x,y] Here, f3 and f4 are identical; putting a bang before a pattern that forces evaluation anyway does nothing. The first sentence is true, but only in settings where the pattern is being evaluated eagerly -- the bang in: f3 a = let !(x,y) = a in [1,x,y] f4 a = let (x,y) = a in [1,x,y] has an effect. The first time I read this, I took the first sentence to be a unqualified truth and ended up thinking that !(x,y) was equivalent to (x,y) everywhere. Stuff that comes later actually clarifies this, but I missed it. What about making the distinction clear upfront? Something like: A bang in an eager pattern match only really has an effect if it precedes a variable or wild-card pattern: f3 !(x,y) = [x,y] f4 (x,y) = [x,y] Because f4 _|_ will force the evaluation of the pattern match anyway, f3 and f4 are identical; the bang does nothing. It also might be a good idea to immediately follow this with the let/where usage: A bang can also preceed a let/where binding to make the pattern match strict. For example: let ![x,y] = e in b is a strict pattern... (in the existing docs, let comes a bit later): Just a thought. Hopefully someone can come up with a better way of wording what I'm getting at. Thanks, -Brian _ Windows Live™ Hotmail®…more than just e-mail. http://windowslive.com/howitworks?ocid=TXT_TAGLM_WL_t2_hm_justgotbetter_howitworks_022009___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users