Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
Here's a transcript from a conversation I had with Conal on IRC. tl;dr: conal cross-module inlining is only possible because ghc stashes a definition in a .hi, iuuc. i'm suggesting that the stashed definition either (a) never include further inlinings, or (b) be augmented by such a definition. Full transcript: 11:23:10 conal jmcarthur: i'm wondering what to do about INLINE pragmas for vector-space and other libraries. i love optimizability and clean/elegant/terse code. and i don't know how to resolve that tension. 11:23:21 jmcarthur conal: yeah, me either. it's annoying 11:23:41 jmcarthur conal: a compiler feature to do it more succinctly would be nice, if we can come up with one 11:23:52 conal jmcarthur: i'm thinking exactly the same 11:24:04 conal jmcarthur: a ghc flag that does what you did manually 11:24:41 jmcarthur conal: yeah, but we still need to do better than inlining *all* functions. we need to be able to tell it we want it to inline all functions satisfying some predicate or something 11:25:07 jmcarthur like, there's no point in forcing to inline functions having absolutely nothing to do with vector, for example 11:25:18 conal jmcarthur: i wonder. ghc already has some heuristics. do we really want anything different/unusual? 11:25:26 jmcarthur then again, combinators that don't inline and get used in a vector function later might still be annoying 11:26:08 conal jmcarthur: maybe some kind of demand-driven mechanism 11:26:21 jmcarthur conal: that's what i was thinking would be best 11:26:28 conal jmcarthur: ie pull inlining rather than push them. or some combo. 11:27:21 conal jmcarthur: i don't think this issue is specific to either vector fusion or to the vector-space package. 11:27:28 jmcarthur conal: actually, this is about rewrite rules more than inlining 11:27:40 jmcarthur conal: maybe if we focus on the rewrite rules we can think of something nicer 11:27:46 conal jmcarthur: ah, yeah. 11:28:32 conal jmcarthur: have you talked with they ghc guys about this issue? i wonder what practice they'd advise for use with the current ghc 11:28:54 jmcarthur i have not 11:29:47 conal jmcarthur: how did the inlining/rewriting concern arise for vector fusion and the vector-space package? 11:30:16 jmcarthur conal: i assume you read the email i linked to? 11:30:27 jmcarthur this one: http://www.haskell.org/pipermail/haskell-cafe/2010-March/074153.html 11:30:34 conal jmcarthur: yes. i'l reread now. 11:31:03 jmcarthur conal: in general, you have to add INLINE pragmas in such cases if you want to be sure your code fuses. A general-purpose mechanism for handling situations like this automatically would be great but we haven't found a good one so far. 11:31:14 jmcarthur i think the most relevant line 11:31:49 conal jmcarthur: thx. and the difficulty (with current ghc) is specifically cross-module, right? 11:32:00 jmcarthur conal: that is my understanding 11:32:10 jmcarthur but perhaps it is more complex 11:32:49 conal jmcarthur: if so, i wonder if ghc could be fixed to inline between modules according to the same heuristics as within a module. 11:34:36 jmcarthur conal: maybe. 11:35:34 conal jmcarthur: part of my discomfort is that i don't know whether the INLINE directives are more helpful or more harmful under all uses. if they were generally helpful, i imagine ghc would do it. 11:56:56 jmcarthur me too 11:57:46 conal jmcarthur: i just found that haskell-cafe thread and added a reply. 11:58:31 conal jmcarthur: hoping that don, roman, etc will have some ideas in addressing that discomfort. 12:09:58 jmcarthur conal: apparently the real trick is that GHC will not inline functions in a function that is annotated INLINE, meaning that rewrite rules can fire on the outermost rule before firing on inner ones 12:10:30 jmcarthur conal: i think it would be nice if we could come up with a way for rewrite rules to affect GHC's inliner 12:10:44 conal jmcarthur: yeah. maybe INLINE ought to be decomposed into two sub-meanings. 12:10:45 jmcarthur then it would only happen when necessary 12:11:17 jmcarthur well, the fact that it forces that function to be inlined is also good though 12:11:33 jmcarthur which is apparently important across module boundaries 12:11:45 conal jmcarthur: maybe ghc could *never* inline functions into an inline body. and then do some caching to avoid redundant work. 12:12:00 jmcarthur perhaps. still leaves the cross-module inlining issue though 12:12:32 jmcarthur i suspect this is also architectural 12:12:44 jmcarthur ghc doesn't know if it will inline a function across a module boundary in advance 12:12:52 jmcarthur therefore it goes ahead and inlines into it 12:13:04 jmcarthur *inlines other functions into it 12:13:31 conal jmcarthur: i don't understand how module boundaries come into play 12:14:40 jmcarthur conal: my suspicion is that because ghc builds modules separately it can't know whether a function will be inlined in another module, so if it's not marked INLINE it feels free to
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
This is my understanding: Old story (GHC 6.12.1 (?) and below): 1) Function bodies are only optimised if they are not marked INLINE. The assumption is that INLINE bodies will be inlined and then optimised at the point where they are inlined. 2) Unfoldings are exported in .hi files for functions if they are nonrecursive and small enough or marked INLINE New story (GHC HEAD): 1) Function bodies are optimised regardless of whether they are marked INLINE or not. 2) Unfoldings are exported in .hi files for functions if they are nonrecursive and small enough or marked INLINE, but the unfolding that is exported is the _unoptimised_ version! - It is important to export the unoptimised version because the optimisation may destroy information we would have relied upon in rule matching. E.g. the optimised version of a list producer will not use build, but we need to be able to spot the build in rules to do foldr/build fusion. The new story is much more robust - adding INLINE pragmas should be a much safer thing to do now, because even if the function is not inlined you will still get an optimised definition for that function. Other relevant points: a) There is a new flag, -fexpose-all-unfoldings, in GHC HEAD. This does what you want - compile vector-space with this flag on and unfoldings for everything will show up in the .hi file, with no INLINE annotations necessary. This was intended for supercompilers (such as the one I'm working on :-) but you can probably use for this as well. b) GHC already treats RULES differently for the purposes of inlining. It is very eager to inline into the arguments of a function you have RULES for, because doing so may let it spot extra chances for the rule to fire. Hope that helps, Max On 10 March 2010 16:23, Jake McArthur jake.mcart...@gmail.com wrote: Here's a transcript from a conversation I had with Conal on IRC. tl;dr: conal cross-module inlining is only possible because ghc stashes a definition in a .hi, iuuc. i'm suggesting that the stashed definition either (a) never include further inlinings, or (b) be augmented by such a definition. Full transcript: 11:23:10 conal jmcarthur: i'm wondering what to do about INLINE pragmas for vector-space and other libraries. i love optimizability and clean/elegant/terse code. and i don't know how to resolve that tension. 11:23:21 jmcarthur conal: yeah, me either. it's annoying 11:23:41 jmcarthur conal: a compiler feature to do it more succinctly would be nice, if we can come up with one 11:23:52 conal jmcarthur: i'm thinking exactly the same 11:24:04 conal jmcarthur: a ghc flag that does what you did manually 11:24:41 jmcarthur conal: yeah, but we still need to do better than inlining *all* functions. we need to be able to tell it we want it to inline all functions satisfying some predicate or something 11:25:07 jmcarthur like, there's no point in forcing to inline functions having absolutely nothing to do with vector, for example 11:25:18 conal jmcarthur: i wonder. ghc already has some heuristics. do we really want anything different/unusual? 11:25:26 jmcarthur then again, combinators that don't inline and get used in a vector function later might still be annoying 11:26:08 conal jmcarthur: maybe some kind of demand-driven mechanism 11:26:21 jmcarthur conal: that's what i was thinking would be best 11:26:28 conal jmcarthur: ie pull inlining rather than push them. or some combo. 11:27:21 conal jmcarthur: i don't think this issue is specific to either vector fusion or to the vector-space package. 11:27:28 jmcarthur conal: actually, this is about rewrite rules more than inlining 11:27:40 jmcarthur conal: maybe if we focus on the rewrite rules we can think of something nicer 11:27:46 conal jmcarthur: ah, yeah. 11:28:32 conal jmcarthur: have you talked with they ghc guys about this issue? i wonder what practice they'd advise for use with the current ghc 11:28:54 jmcarthur i have not 11:29:47 conal jmcarthur: how did the inlining/rewriting concern arise for vector fusion and the vector-space package? 11:30:16 jmcarthur conal: i assume you read the email i linked to? 11:30:27 jmcarthur this one: http://www.haskell.org/pipermail/haskell-cafe/2010-March/074153.html 11:30:34 conal jmcarthur: yes. i'l reread now. 11:31:03 jmcarthur conal: in general, you have to add INLINE pragmas in such cases if you want to be sure your code fuses. A general-purpose mechanism for handling situations like this automatically would be great but we haven't found a good one so far. 11:31:14 jmcarthur i think the most relevant line 11:31:49 conal jmcarthur: thx. and the difficulty (with current ghc) is specifically cross-module, right? 11:32:00 jmcarthur conal: that is my understanding 11:32:10 jmcarthur but perhaps it is more complex 11:32:49 conal jmcarthur: if so, i wonder if ghc could be fixed to inline between modules according to the same heuristics as within a module. 11:34:36 jmcarthur
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
Hi Max. Thanks much for passing on this info. Very encouraging news! - Conal On Wed, Mar 10, 2010 at 8:41 AM, Max Bolingbroke batterseapo...@hotmail.com wrote: This is my understanding: Old story (GHC 6.12.1 (?) and below): 1) Function bodies are only optimised if they are not marked INLINE. The assumption is that INLINE bodies will be inlined and then optimised at the point where they are inlined. 2) Unfoldings are exported in .hi files for functions if they are nonrecursive and small enough or marked INLINE New story (GHC HEAD): 1) Function bodies are optimised regardless of whether they are marked INLINE or not. 2) Unfoldings are exported in .hi files for functions if they are nonrecursive and small enough or marked INLINE, but the unfolding that is exported is the _unoptimised_ version! - It is important to export the unoptimised version because the optimisation may destroy information we would have relied upon in rule matching. E.g. the optimised version of a list producer will not use build, but we need to be able to spot the build in rules to do foldr/build fusion. The new story is much more robust - adding INLINE pragmas should be a much safer thing to do now, because even if the function is not inlined you will still get an optimised definition for that function. Other relevant points: a) There is a new flag, -fexpose-all-unfoldings, in GHC HEAD. This does what you want - compile vector-space with this flag on and unfoldings for everything will show up in the .hi file, with no INLINE annotations necessary. This was intended for supercompilers (such as the one I'm working on :-) but you can probably use for this as well. b) GHC already treats RULES differently for the purposes of inlining. It is very eager to inline into the arguments of a function you have RULES for, because doing so may let it spot extra chances for the rule to fire. Hope that helps, Max On 10 March 2010 16:23, Jake McArthur jake.mcart...@gmail.com wrote: Here's a transcript from a conversation I had with Conal on IRC. tl;dr: conal cross-module inlining is only possible because ghc stashes a definition in a .hi, iuuc. i'm suggesting that the stashed definition either (a) never include further inlinings, or (b) be augmented by such a definition. Full transcript: 11:23:10 conal jmcarthur: i'm wondering what to do about INLINE pragmas for vector-space and other libraries. i love optimizability and clean/elegant/terse code. and i don't know how to resolve that tension. 11:23:21 jmcarthur conal: yeah, me either. it's annoying 11:23:41 jmcarthur conal: a compiler feature to do it more succinctly would be nice, if we can come up with one 11:23:52 conal jmcarthur: i'm thinking exactly the same 11:24:04 conal jmcarthur: a ghc flag that does what you did manually 11:24:41 jmcarthur conal: yeah, but we still need to do better than inlining *all* functions. we need to be able to tell it we want it to inline all functions satisfying some predicate or something 11:25:07 jmcarthur like, there's no point in forcing to inline functions having absolutely nothing to do with vector, for example 11:25:18 conal jmcarthur: i wonder. ghc already has some heuristics. do we really want anything different/unusual? 11:25:26 jmcarthur then again, combinators that don't inline and get used in a vector function later might still be annoying 11:26:08 conal jmcarthur: maybe some kind of demand-driven mechanism 11:26:21 jmcarthur conal: that's what i was thinking would be best 11:26:28 conal jmcarthur: ie pull inlining rather than push them. or some combo. 11:27:21 conal jmcarthur: i don't think this issue is specific to either vector fusion or to the vector-space package. 11:27:28 jmcarthur conal: actually, this is about rewrite rules more than inlining 11:27:40 jmcarthur conal: maybe if we focus on the rewrite rules we can think of something nicer 11:27:46 conal jmcarthur: ah, yeah. 11:28:32 conal jmcarthur: have you talked with they ghc guys about this issue? i wonder what practice they'd advise for use with the current ghc 11:28:54 jmcarthur i have not 11:29:47 conal jmcarthur: how did the inlining/rewriting concern arise for vector fusion and the vector-space package? 11:30:16 jmcarthur conal: i assume you read the email i linked to? 11:30:27 jmcarthur this one: http://www.haskell.org/pipermail/haskell-cafe/2010-March/074153.html 11:30:34 conal jmcarthur: yes. i'l reread now. 11:31:03 jmcarthur conal: in general, you have to add INLINE pragmas in such cases if you want to be sure your code fuses. A general-purpose mechanism for handling situations like this automatically would be great but we haven't found a good one so far. 11:31:14 jmcarthur i think the most relevant line 11:31:49 conal jmcarthur: thx. and the difficulty (with current ghc) is specifically cross-module, right? 11:32:00
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
I think Jake is referring to my vector-space package. He did the work of writing 171 INLINE pragmas, covering lots of methods and standalone function defs. I'm simultaneously grateful for the effort and repelled by the added syntactic noise. Also concerned about the impact of all these directives on other uses of vector-space. If all this inlining is a uniform win, I'd rather ghc did it for me. If the ghc implementers have reasons not to do so in general, then I'd expect that some of those reasons apply to vector-space (and many other libraries) as well. It's not like I'm applying some kind of domain-specific understanding that ghc doesn't have access to. I'm raising my concerns here in the hopes of stimulating some creative thinking and suggestions about addressing this sort of situation more generally. - Conal On Sun, Mar 7, 2010 at 9:23 PM, Don Stewart d...@galois.com wrote: jake.mcarthur: I've run into an issue with inlining that I'm not sure how to work around. I am instantiating some pre-existing type classes with Vector-based types. There already exist generic functions in modules I do not control that use this type class, and they are not tagged with the INLINE pragma. I am doubtful, but I figure it is worth at least asking: is there some practical workaround for this kind of situation that anybody knows about? I don't know of a way, other than patching the library code. If it makes a difference to you, it probably makes a difference to lots of people. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
On Tue, Mar 9, 2010 at 11:53 AM, Conal Elliott co...@conal.net wrote: I think Jake is referring to my vector-space package. He did the work of writing 171 INLINE pragmas, covering lots of methods and standalone function defs. I'm simultaneously grateful for the effort and repelled by the added syntactic noise. Also concerned about the impact of all these directives on other uses of vector-space. If all this inlining is a uniform win, I'd rather ghc did it for me. Alas, it very much is not easy to predict. The unfortunate thing about inline directives is that each individual one really can have a substantial, but not necessarily predictable, effect on the performance of an application. I have seen large improvements in performance, large drops in performance, nothing at all, and everything in between, and I have yet to develop a consistently successful intuition about what will work well, and when. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
On Mar 4, 2010, at 9:05 PM, Roman Leshchinskiy wrote: On 05/03/2010, at 04:34, stefan kersten wrote: i've been hunting down some performance problems in DSP code using vector and the single most important transformation seems to be throwing in INLINE pragmas for any function that uses vector combinators and is to be called from higher-level code. failing to do so seems to prevent vector operations from being fused and results in big performance hits (the good news is that the optimized code is quite competitive) the downside after adding the INLINE pragmas is that now some of my modules take _really_ long to compile (up to a couple of minutes); any ideas where i can start looking to bring the compilation times down again? Alas, stream fusion (and fusion in general, I guess) requires what I would call whole loop compilation - you need to inline everything into loops. That tends to be slow. I don't know what your code looks like but you could try to control inlining a bit more. For instance, if you have something like this: foo ... = ... map f xs ... where f x = ... I can confirm that this is a general problem with libraries based on fusion / deforestation (having done the independent implementation of fusion in pH back in the day). No INLINE pragma? No fusion for you! That said, as Roman points out it'd be nice if when GHC discovers something is inlinable, it would inline the original definition (or perhaps the inlined, simplified, no-rules-firing version of same). The problem is that this duplicates a lot of the work of the optimizer a lot of the time. you could tell GHC not to inline f until fairly late in the game by adding {-# INLINE [0] f #-} to the where clause. This helps sometimes. Hands up if you can remember what things are legal in the braces, and what they mean. :-) I suspect I'm not the only one for whom remembering this stuff is fairly hard, in part because it doesn't come up too often. -Jan-Willem Maessen___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
On 06/03/2010, at 03:10, stefan kersten wrote: i'm still curious, though, why my three versions of direct convolution perform so differently (see attached file). in particular, i somehow expected conv_3 to be the slowest and conv_2 to perform similar to conv_1. any ideas? i haven't had a look at the core yet, mainly because i'm lacking the expertise ... Hmm, one problem is that the current definition of reverse is suboptimal to say the least. I'll fix that. Could you perhaps send me your complete benchmark? Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
I've run into an issue with inlining that I'm not sure how to work around. I am instantiating some pre-existing type classes with Vector-based types. There already exist generic functions in modules I do not control that use this type class, and they are not tagged with the INLINE pragma. I am doubtful, but I figure it is worth at least asking: is there some practical workaround for this kind of situation that anybody knows about? - Jake ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
jake.mcarthur: I've run into an issue with inlining that I'm not sure how to work around. I am instantiating some pre-existing type classes with Vector-based types. There already exist generic functions in modules I do not control that use this type class, and they are not tagged with the INLINE pragma. I am doubtful, but I figure it is worth at least asking: is there some practical workaround for this kind of situation that anybody knows about? I don't know of a way, other than patching the library code. If it makes a difference to you, it probably makes a difference to lots of people. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
This is a general problem when working with RULES-based optimisations. Here is an example of what happens: suppose we have foo :: Vector Int - Vector Int foo xs = map (+1) xs Now, GHC will generate a nice tight loop for this but if in a different module, we have something like this: bar xs = foo (foo xs) then this won't fuse because (a) foo won't be inlined and (b) even if GHC did inline here, it would inline the nice tight loop which can't possibly fuse instead of the original map which can. By slapping an INLINE pragma on foo, you're telling GHC to (almost) always inline the function and to use the original definition for inlining, thus giving it a chance to fuse. thanks for the insight, roman! the downside after adding the INLINE pragmas is that now some of my modules take _really_ long to compile (up to a couple of minutes); any ideas where i can start looking to bring the compilation times down again? Alas, stream fusion (and fusion in general, I guess) requires what I would call whole loop compilation - you need to inline everything into loops. That tends to be slow. I don't know what your code looks like but you could try to control inlining a bit more. For instance, if you have something like this: foo ... = ... map f xs ... where f x = ... you could tell GHC not to inline f until fairly late in the game by adding {-# INLINE [0] f #-} to the where clause. This helps sometimes. thanks, i'll check it out. I'm surprised -Odph doesn't produce faster code than -O2. In any case, you could try turning these flags on individually (esp. -fno-method-sharing and the spec-constr flags) to see how they affect performance and compilation times. in the end it turned out that i had forgotten another INLINE pragma and in my crude benchmarks -O2 and -Odph give basically the same results, -O2 being a little faster. i hope i'll have time next week to do proper benchmarks, and i also want to try ghc HEAD with the llvm patches. conv_1 conv_2 conv_3 -Odph 1.004 2.715 1.096 -O2 1.000 2.710 1.097 i'm still curious, though, why my three versions of direct convolution perform so differently (see attached file). in particular, i somehow expected conv_3 to be the slowest and conv_2 to perform similar to conv_1. any ideas? i haven't had a look at the core yet, mainly because i'm lacking the expertise ... sk import Data.Vector.Generic (Vector, (!)) import qualified Data.Vector.Generic as V conv_1, conv_2, conv_3 :: (Num a, Vector v a) = v a - v a - v a {-# INLINE conv_1 #-} conv_1 h x = V.generate (l+m) f where m = V.length h - 1 l = V.length x {-# INLINE f #-} f n = g 0 n (max 0 (n-l+1)) (min n m) g y n m k = if m = k then let y' = y + (h ! m) * (x ! (n-m)) in y' `seq` g y' n (m+1) k else y {-# INLINE conv_2 #-} conv_2 h x = V.generate (l+m) f where l = V.length x m = V.length h - 1 {-# INLINE f #-} f n = let j = max 0 (n-l+1) k = (min n m) - j + 1 in V.sum (V.zipWith (*) (V.slice j k h) (V.reverse (V.slice (n - j - k + 1) k x))) {-# INLINE conv_3 #-} conv_3 h x = V.generate (l+m-1) f where m = V.length h l = V.length x p = V.replicate (m-1) 0 x' = p ++ x ++ p {-# INLINE f #-} f i = V.sum (V.zipWith (*) (V.reverse h) (V.slice i m x')) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
sk: hi, two questions in one post: i've been hunting down some performance problems in DSP code using vector and the single most important transformation seems to be throwing in INLINE pragmas for any function that uses vector combinators and is to be called from higher-level code. failing to do so seems to prevent vector operations from being fused and results in big performance hits (the good news is that the optimized code is quite competitive). does anybody have some more info about the do's and don'ts when programming with vector? Always inline any combination of things that are expressed in terms of vector combinators, so that the combination of your code can hope to fuse as well. the downside after adding the INLINE pragmas is that now some of my modules take _really_ long to compile (up to a couple of minutes); any ideas where i can start looking to bring the compilation times down again? I'm not sure there's much we can do there. i'm compiling with -O2 -funbox-strict-fields instead of -Odph (with ghc 6.10.4 on OSX 10.4), because it's faster for some of my code, but -O2 vs. -Odph doesn't make a noticable difference in compilation time. -Odph should make it easier for some things to fuse -- and get better code. But Roman can say more. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] vector stream fusion, inlining and compilation time
On 05/03/2010, at 04:34, stefan kersten wrote: i've been hunting down some performance problems in DSP code using vector and the single most important transformation seems to be throwing in INLINE pragmas for any function that uses vector combinators and is to be called from higher-level code. failing to do so seems to prevent vector operations from being fused and results in big performance hits (the good news is that the optimized code is quite competitive). does anybody have some more info about the do's and don'ts when programming with vector? This is a general problem when working with RULES-based optimisations. Here is an example of what happens: suppose we have foo :: Vector Int - Vector Int foo xs = map (+1) xs Now, GHC will generate a nice tight loop for this but if in a different module, we have something like this: bar xs = foo (foo xs) then this won't fuse because (a) foo won't be inlined and (b) even if GHC did inline here, it would inline the nice tight loop which can't possibly fuse instead of the original map which can. By slapping an INLINE pragma on foo, you're telling GHC to (almost) always inline the function and to use the original definition for inlining, thus giving it a chance to fuse. GHC could be a bit cleverer here (perhaps by noticing that the original definition is small enough to inline and keeping it) but in general, you have to add INLINE pragmas in such cases if you want to be sure your code fuses. A general-purpose mechanism for handling situations like this automatically would be great but we haven't found a good one so far. the downside after adding the INLINE pragmas is that now some of my modules take _really_ long to compile (up to a couple of minutes); any ideas where i can start looking to bring the compilation times down again? Alas, stream fusion (and fusion in general, I guess) requires what I would call whole loop compilation - you need to inline everything into loops. That tends to be slow. I don't know what your code looks like but you could try to control inlining a bit more. For instance, if you have something like this: foo ... = ... map f xs ... where f x = ... you could tell GHC not to inline f until fairly late in the game by adding {-# INLINE [0] f #-} to the where clause. This helps sometimes. i'm compiling with -O2 -funbox-strict-fields instead of -Odph (with ghc 6.10.4 on OSX 10.4), because it's faster for some of my code, but -O2 vs. -Odph doesn't make a noticable difference in compilation time. If you're *really* interested in performance, I would suggest using GHC head. It really is much better for this kind of code (although not necessarily faster wrt to compilation times). This is what -Odph does: -- -Odph is equivalent to -- ---O2 optimise as much as possible ---fno-method-sharing sharing specialisation defeats fusion -- sometimes ---fdicts-cheap always inline dictionaries ---fmax-simplifier-iterations20 this is necessary sometimes ---fsimplifier-phases=3 we use an additional simplifier phase -- for fusion ---fno-spec-constr-thresholdrun SpecConstr even for big loops ---fno-spec-constr-countSpecConstr as much as possible I'm surprised -Odph doesn't produce faster code than -O2. In any case, you could try turning these flags on individually (esp. -fno-method-sharing and the spec-constr flags) to see how they affect performance and compilation times. Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe