Re: Compiler optimizations questions for ghc 6.10...
I think that what happens is that this function: a `quot` b | b == 0 = divZeroError | a == minBound b == (-1) = overflowError | otherwise = a `quotInt` b is expanded to: a `quot` b = if b == 0 then divZeroError else if a == minBound then if b == (-1) then overflowError else a `quotInt` b else a `quotInt` b Then the compiler sees that b is a constant and computes that b == 0 is False and b == (-1) is also False so it could eliminate two If statements. The result is: a `quot` b = if a == minBound then a `quotInt` b else a `quotInt` b and this is exactly what we get. I bet that if the original function was: a `quot` b | b == 0 = divZeroError | b == (-1) a == minBound = overflowError -- Note the changed order here | otherwise = a `quotInt` b then we would get what we want. I think that it is much more often to have division where the divisor is known so we will get the best code in this case. Regards, Krasimir On Fri, Feb 20, 2009 at 4:00 AM, Tyson Whitehead twhiteh...@gmail.com wrote: On February 19, 2009 18:20:33 Krasimir Angelov wrote: Oh. I looked at the primops.txt.pp for something suspicious but I didn't checked the actual implementation of quot. I thought that quot calls quotInt# directly. When I use quotInt in the code I can get the real idiv assembly instruction. Still the code generated by GHC is strange it doesn't throw any exception actually. It just evaluates the same expression but with the constant maxBound. On Thu, Feb 19, 2009 at 11:19 PM, Max Bolingbroke a `quot` b | b == 0 = divZeroError | a == minBound b == (-1) = overflowError | otherwise = a `quotInt` b I checked the quot one like you said. Kind of strange all right. It looks like it is performing the a == minBound check (where a = maxBound-x) despite the b == (-1) check (where b = 10) being optimized away. Cheers! -Tyson ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
I was surprised to see this case expression: case GHC.Prim.-# 9223372036854775807 ipv_s1bD of wild2_a1xi [ALWAYS Just L] { What is the purpose to compare the value with maxBound before the division? The case expression doesn't disappear even if I use quot instead of div. Krasimir 2009/2/18 Tyson Whitehead twhiteh...@gmail.com: digit_s1lk = \ (x_aqR [ALWAYS Just U(L)] :: GHC.Types.Int) - case x_aqR of x_XsQ [ALWAYS Just A] { GHC.Types.I# ipv_s1bD [ALWAYS Just L] - let { lvl_s1my [ALWAYS Just D(T)] :: GHC.Types.Int [Str: DmdType] lvl_s1my = case GHC.Prim.-# 9223372036854775807 ipv_s1bD of wild2_a1xi [ALWAYS Just L] { __DEFAULT - case GHC.Base.divInt# wild2_a1xi 10 of wild21_a1xj [ALWAYS Just L] { __DEFAULT - GHC.Types.I# wild21_a1xj }; (-9223372036854775808) - case GHC.Base.divInt# (-9223372036854775808) 10 of wild21_a1xl [ALWAYS Just L] { __DEFAULT - GHC.Types.I# wild21_a1xl } } } in (\ (eta_X1sU [ALWAYS Just L] :: GHC.Types.Int) (eta_s1G7 [ALWAYS Just U(L)] :: GHC.Types.Int) - ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
2009/2/19 Krasimir Angelov kr.ange...@gmail.com: I was surprised to see this case expression: case GHC.Prim.-# 9223372036854775807 ipv_s1bD of wild2_a1xi [ALWAYS Just L] { What is the purpose to compare the value with maxBound before the division? The case expression doesn't disappear even if I use quot instead of div. Have a look at this snippet of the base library, file GHC/Real.lhs: a `quot` b | b == 0 = divZeroError | a == minBound b == (-1) = overflowError | otherwise = a `quotInt` b quotInt is defined in GHC/Base.lhs as: (I# x) `quotInt` (I# y) = I# (x `quotInt#` y) And quotInt# is a primitive, which I guess is implemented via machine division (though I don't work on the codegen stuff at all) - so your offending case must come from those tests in GHC/Real.lhs. In general, if you want to answer questions like this you can usually find the answer by looking at the base code: it's all in Haskell, so very readable! You can get it online at http://darcs.haskell.org/libraries/base/GHC/ Cheers, Max ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
Oh. I looked at the primops.txt.pp for something suspicious but I didn't checked the actual implementation of quot. I thought that quot calls quotInt# directly. When I use quotInt in the code I can get the real idiv assembly instruction. Still the code generated by GHC is strange it doesn't throw any exception actually. It just evaluates the same expression but with the constant maxBound. On Thu, Feb 19, 2009 at 11:19 PM, Max Bolingbroke batterseapo...@hotmail.com wrote: 2009/2/19 Krasimir Angelov kr.ange...@gmail.com: I was surprised to see this case expression: case GHC.Prim.-# 9223372036854775807 ipv_s1bD of wild2_a1xi [ALWAYS Just L] { What is the purpose to compare the value with maxBound before the division? The case expression doesn't disappear even if I use quot instead of div. Have a look at this snippet of the base library, file GHC/Real.lhs: a `quot` b | b == 0 = divZeroError | a == minBound b == (-1) = overflowError | otherwise = a `quotInt` b quotInt is defined in GHC/Base.lhs as: (I# x) `quotInt` (I# y) = I# (x `quotInt#` y) And quotInt# is a primitive, which I guess is implemented via machine division (though I don't work on the codegen stuff at all) - so your offending case must come from those tests in GHC/Real.lhs. In general, if you want to answer questions like this you can usually find the answer by looking at the base code: it's all in Haskell, so very readable! You can get it online at http://darcs.haskell.org/libraries/base/GHC/ Cheers, Max ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
On February 19, 2009 18:20:33 Krasimir Angelov wrote: Oh. I looked at the primops.txt.pp for something suspicious but I didn't checked the actual implementation of quot. I thought that quot calls quotInt# directly. When I use quotInt in the code I can get the real idiv assembly instruction. Still the code generated by GHC is strange it doesn't throw any exception actually. It just evaluates the same expression but with the constant maxBound. On Thu, Feb 19, 2009 at 11:19 PM, Max Bolingbroke a `quot` b | b == 0 = divZeroError | a == minBound b == (-1) = overflowError | otherwise = a `quotInt` b I checked the quot one like you said. Kind of strange all right. It looks like it is performing the a == minBound check (where a = maxBound-x) despite the b == (-1) check (where b = 10) being optimized away. Cheers! -Tyson signature.asc Description: This is a digitally signed message part. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
2009/2/18 Tyson Whitehead twhiteh...@gmail.com: On February 17, 2009 19:24:44 Max Bolingbroke wrote: 2009/2/17 Tyson Whitehead twhiteh...@gmail.com: (compiled with ghc 6.10 with options -O2 -ddump-simpl) That should have been -ddump-stranal instead of -ddump-simpl. Right. Mystery solved. In case you were wondering, the reason the other two lambdas in digit are not worker/wrappered is because GHC only performs strictness analysis on lambdas that occur /immediately/ in the binding. So increasing the arity of digit is CRUCIAL to getting your program to compile well, as that will cause: * Strictness analysis on the inner lambdas * Inlining of lvl_s1mF * Fewer updateable thunks to be created by callers (as we won't try to cache the results of a partial application) I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core at the bottom of this email as that is the only place it is ever referenced. The relevant GHC code is SimplUtils.preInlineUnconditionally. It looks like it dosen't get inlined for two reasons: 1) It's not a manifest lambda (it's an application) so inlining inside another lambda would change the number of times the FVs of lvl_s1mF might occur I have to confess my ignorance here as my google fu failed and so I still don't know what a manifest lambda is (other than not a application). : ) Sorry :-). Manifest lambdas are just lambdas that occur manifestly! So e.g.: f = \x - g x HAS a manifest lambda, whereas: f' = g Does NOT have one. No lambda immediately starts the binding. Lambdas, manifest or otherwise, are important in this context because any value with a leading lambda is certainly very cheap and so can be inlined inside other lambdas - which is what you are trying to achieve with lvl_s1mF. (snip code and explanation) I just finished adding the Parse q Int type to help with email line wrapping. As I alluded to in my original email, if I don't have the Int overflow check in digit, it is not chosen as the loop breaker, all the StateT stuff is compiled away, and you get a really nice efficient assembler loop (which is important because the final FSM has to actually chew through GBs of data). The part of the code under the first lambda in digit is as follows (I didn't keep the original dump, so the uniques have changed here). It's the second part of the Int overflow bounds check (i.e., y = (maxBound-x)`div`10), and, indeed, something you don't want to compute unless the easy check fails. Yes - GHC wants to share the work of (maxBound-x)`div`10 between several partial applications of digit. This is usually a good idea, but in this case it sucks because it's resulted in a massively increased arity. IMHO GHC should fix this by: * Marking divInt# INLINE in the base library. This would result in your code would just containing uses of quotInt# * Making some operations cheap even if they may fail (PrimOp.primpOpIsCheap should change). Though this might mean that we turn non-terminating programs into terminating ones (such operations get pushed inside lambdas) but this is consistent with our treatment of lambdas generally. Actually, your divInt# call wouldn't even usually be floated out to between two lambdas, but at the time FloatOut runs there is something in between the \x lambda and the lambdas from the state monad - the monadic bind operator! So FloatOut feels free to move the computation for x up even though that = will go away as soon as we run the simplifier. What a disaster! For me, making digit INLINE fixes all this, but that's probably because the context of my code is not the same as yours (I had to invent parts of a program to bind the bs, q and n variables). For your immediate problem, I would suggest this bit of GHC witchdoctory: where digit :: Int - ParseInt Int Int digit !x = do !y - lift get ( if y = (maxBound-9)`quot`10 || y = ghc_hacks then let !y' = y*10+x in (lift $ put y') s2 else throwError integer overflow ) where {-# INLINE ghc_hacks #-} ghc_hacks = (maxBound-x)`div`10 With luck, this should make the loop-invariant cheap in GHC's eyes, preventing it from trying to share it. It works for me - but like I say things may differ in your program. In general, the -ddump-inlinings flag is useful for working out why something wasn't inlined - but it wouldn't have helped you in this case, because it only dumps information about inlining at call sites, and you actually want an unconditional inlining to occur. I also tried that, and didn't have much luck with it. I didn't understand the output, which there was 48k lines worth of, and the uniques kept changing which made it hard to grep for names from previous -ddump-simpl runs. Right. I tend to use -ddump-inilnings and -dverbose-core2core (or
Re: Compiler optimizations questions for ghc 6.10...
2009/2/18 Max Bolingbroke batterseapo...@hotmail.com: Yes - GHC wants to share the work of (maxBound-x)`div`10 between several partial applications of digit. This is usually a good idea, but in this case it sucks because it's resulted in a massively increased arity. IMHO GHC should fix this by: * Marking divInt# INLINE in the base library. This would result in your code would just containing uses of quotInt# * Making some operations cheap even if they may fail (PrimOp.primpOpIsCheap should change). Though this might mean that we turn non-terminating programs into terminating ones (such operations get pushed inside lambdas) but this is consistent with our treatment of lambdas generally. Just to let you know - I've filed a bug (#3034 http://hackage.haskell.org/trac/ghc/ticket/3034) about this issue. I've taken the liberty of adding you to the CC list so you can keep track of the problem. Cheers, Max ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
On February 18, 2009 04:29:42 Max Bolingbroke wrote: The part of the code under the first lambda in digit is as follows (I didn't keep the original dump, so the uniques have changed here). It's the second part of the Int overflow bounds check (i.e., y = (maxBound-x)`div`10), and, indeed, something you don't want to compute unless the easy check fails. Yes - GHC wants to share the work of (maxBound-x)`div`10 between several partial applications of digit. This is usually a good idea, but in this case it sucks because it's resulted in a massively increased arity. IMHO GHC should fix this by: * Marking divInt# INLINE in the base library. This would result in your code would just containing uses of quotInt# * Making some operations cheap even if they may fail (PrimOp.primpOpIsCheap should change). Though this might mean that we turn non-terminating programs into terminating ones (such operations get pushed inside lambdas) but this is consistent with our treatment of lambdas generally. Reading your response got me thinking, quot and div, I thought I had changed everything to quot because earlier I found that GHC was leaving evaluation of constant expressions like maxBound-9`div`10 until runtime if I used div. Turns out I had missed that one in the second bounds check, and changing it from y = (maxBound-x)`div`10 to y = (maxBound-x)`qout`10 resulted in GHC just doing the right thing. I presume this is because quot must be either marked INLINE and/or primOpIsCheap like you said div should be. I am guess this is because quot maps directly onto the x86 idiv instruction due to both of them truncating towards zero, while div, with its truncation towards negative infinity, does not. Running with -ddump-asm seems to back this up as quotInt# compiles down to an idiv instruction and divInt# to a call through base_GHCziBase_divIntzh_info. Unfortunately for me, I always seem to instinctively go with div and mod ahead of quot and rem. Actually, your divInt# call wouldn't even usually be floated out to between two lambdas, but at the time FloatOut runs there is something in between the \x lambda and the lambdas from the state monad - the monadic bind operator! So FloatOut feels free to move the computation for x up even though that = will go away as soon as we run the simplifier. What a disaster! I would like to try combining the GHC optimizer with a genetic algorithm so you could set it to pound away on your core loops for an hour or so to find the right sequence of ghc optimization steps to generate the tightest code. Maybe it could then write out an optimization hint file that regular GHC could optionally take in for use alongside the standard rules of thumb to produce great code. All I need is more time and more knowledge. Maybe someday. : ) For me, making digit INLINE fixes all this, but that's probably because the context of my code is not the same as yours (I had to invent parts of a program to bind the bs, q and n variables). Sorry about that. I've put the entire routine at http://www.sharcnet.ca/~tyson/haskell/Example1.hs I should have done this in the first place as it allows me to provide the full code while also stripping it down in the emails to make them more readable. For your immediate problem, I would suggest this bit of GHC witchdoctory: where digit :: Int - ParseInt Int Int digit !x = do !y - lift get ( if y = (maxBound-9)`quot`10 || y = ghc_hacks then let !y' = y*10+x in (lift $ put y') s2 else throwError integer overflow ) where {-# INLINE ghc_hacks #-} ghc_hacks = (maxBound-x)`div`10 With luck, this should make the loop-invariant cheap in GHC's eyes, preventing it from trying to share it. It works for me - but like I say things may differ in your program. I tried that as well, and am pleased to report that it works great. I think I actually understand what is going on here and will hopefully be able to wield it in the future to my advantage. Generally I've had very little luck with the INLINE hammer when the function is at all complex. GHC always seems to finds a way to thwart my pathetic attempts and punish me with even worse code (like then not inlining the monad bind operation in the code resulting from the inlining). : ) If you are interested in trying my script, it's available via Git at http://github.com/batterseapower/scripts/blob/da1f24ba16c27e3994aa66f9db352 ec1102c39d2/ghc-dump-split and is called as ghc-dump-split ghc-dump-file, where ghc-dump-file results from redirection of GHC stdout+stderr to a file. I'll grab a copy of that. : ) Hope all that helps, It sure did. Thanks very much for all your time! I'm really impressed with GHC and the people hacking on it. I've actually
Re: Compiler optimizations questions for ghc 6.10...
On February 18, 2009 12:42:02 Tyson Whitehead wrote: On February 18, 2009 04:29:42 Max Bolingbroke wrote: Yes - GHC wants to share the work of (maxBound-x)`div`10 between several partial applications of digit. This is usually a good idea, but in this case it sucks because it's resulted in a massively increased arity. IMHO GHC should fix this by: * Marking divInt# INLINE in the base library. This would result in your code would just containing uses of quotInt# * Making some operations cheap even if they may fail (PrimOp.primpOpIsCheap should change). Though this might mean that we turn non-terminating programs into terminating ones (such operations get pushed inside lambdas) but this is consistent with our treatment of lambdas generally. I am guess this is because quot maps directly onto the x86 idiv instruction due to both of them truncating towards zero, while div, with its truncation towards negative infinity, does not. Running with -ddump-asm seems to back this up as quotInt# compiles down to an idiv instruction and divInt# to a call through base_GHCziBase_divIntzh_info. Unfortunately for me, I always seem to instinctively go with div and mod ahead of quot and rem. I see what you mean about div as it is defined through divMod which is in turn defined through quotRem in Prelude. n `div` d = q where (q,_) = divMod n d divMod n d = if signum r == negate (signum d) then (q-1, r+d) else qr where qr@(q,r) = quotRem n d However, GHC almost seems to be doing its own thing as, as I mentioned above, it turns it into a divInt# (which is not in GHC.Prim) in the tidy core, which then turns into a call through base_GHCziBase_divIntzh_info in the assembler. (note that I'm just looking at the source from the haskell.org libraries link) Cheers! -Tyson signature.asc Description: This is a digitally signed message part. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
2009/2/17 Tyson Whitehead twhiteh...@gmail.com: (compiled with ghc 6.10 with options -O2 -ddump-simpl) I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core at the bottom of this email as that is the only place it is ever referenced. The relevant GHC code is SimplUtils.preInlineUnconditionally. It looks like it dosen't get inlined for two reasons: 1) It's not a manifest lambda (it's an application) so inlining inside another lambda would change the number of times the FVs of lvl_s1mF might occur 2) I'm not sure if the use-context is considered interesting by GHC because the application of the function might be hidden by the cast. Not sure about this one. So it looks like the problem stems from digit_s1l3 having arity 1 rather than arity 3. You could try and force it into a higher arity somehow, but I can't say exactly how you might do that without seeing the rest of the Core (and in particular the part under the first lambda in the definition of digit). In general, the -ddump-inlinings flag is useful for working out why something wasn't inlined - but it wouldn't have helped you in this case, because it only dumps information about inlining at call sites, and you actually want an unconditional inlining to occur. It also seems the extra levels of indirection are defeating the strictness analyzer on eta_s1CN in a_s1Gv as all code branches either directly force it or ultimately pass it to digit_s1l3 as in the included branch. Also, why isn't digit_s1l3 optimized to take its first parameter unboxed? It is strict in its first argument, and grepping the core shows that it is only ever used like in lvl_s1mF (i.e., passed things like lvl_s1mG). Yeah, that's weird. I don't know the answer to this. Have you actually got to the worker-wrapper stage at the point you copied this core? Thanks! -Tyson PS: Is there any way to get better control over the loop breaker choice? For a slightly simpler digit function, it is not chosen, and great code is produced. I've tried using INLINE on digit, but that seems to result in the monad bind operator not being inlined, which produces even worse code. GHC won't inline inside functions marked with INLINE pragmas, for various very good reasons. I don't know how you could change the loop breaker given the current state of technology, but SPJ has a patch for GHC in the works that revamps the treatment of INLINE pragmas and which should (if my understanding of his patch is correct) solve the issue with = not being inlined within an inlineable digit. Sorry I can't be more helpful. It's possible I could make some more concrete suggestions if you posted the complete code somewhere. Max ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
On February 17, 2009 19:24:44 Max Bolingbroke wrote: 2009/2/17 Tyson Whitehead twhiteh...@gmail.com: It also seems the extra levels of indirection are defeating the strictness analyzer on eta_s1CN in a_s1Gv as all code branches either directly force it or ultimately pass it to digit_s1l3 as in the included branch. Also, why isn't digit_s1l3 optimized to take its first parameter unboxed? It is strict in its first argument, and grepping the core shows that it is only ever used like in lvl_s1mF (i.e., passed things like lvl_s1mG). Yeah, that's weird. I don't know the answer to this. Have you actually got to the worker-wrapper stage at the point you copied this core? Yes. You are right. Contrary to what the top of the email said, I created that output with -ddump-stranal, and -dshow-passes indicates that the worker- wrapper stage comes next. If I dump it (or just the final core with -ddump- simpl), digit* is entirely replaced with a first-argument-unboxed $wdigit*. The inner lambdas (i.e., the second and third arguments) remain boxed. It seems I should have just used the -ddump-simpl output instead of the - ddump-stranal output. I had just got thinking the the -ddump-simpl output did not include strictness analysis because I didn't see it on a bunch of the code (in retrospect, that was because that code was created by later stages) Cheers! -Tyson signature.asc Description: This is a digitally signed message part. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Compiler optimizations questions for ghc 6.10...
On February 17, 2009 19:24:44 Max Bolingbroke wrote: 2009/2/17 Tyson Whitehead twhiteh...@gmail.com: (compiled with ghc 6.10 with options -O2 -ddump-simpl) That should have been -ddump-stranal instead of -ddump-simpl. I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core at the bottom of this email as that is the only place it is ever referenced. The relevant GHC code is SimplUtils.preInlineUnconditionally. It looks like it dosen't get inlined for two reasons: 1) It's not a manifest lambda (it's an application) so inlining inside another lambda would change the number of times the FVs of lvl_s1mF might occur I have to confess my ignorance here as my google fu failed and so I still don't know what a manifest lambda is (other than not a application). : ) 2) I'm not sure if the use-context is considered interesting by GHC because the application of the function might be hidden by the cast. Not sure about this one. I was wondering about that, which is why I didn't remove all the cast noise. So it looks like the problem stems from digit_s1l3 having arity 1 rather than arity 3. You could try and force it into a higher arity somehow, but I can't say exactly how you might do that without seeing the rest of the Core (and in particular the part under the first lambda in the definition of digit). The thing is that the inner lambdas come from inlining that StateT monad transformers in a StateT q (StateT Int (ErrorT String Identity)) monad (i.e., the first one is the q state -- which works out to an Int -- and the second is the Int state). I guess I could explicitly pass them around, but that would seem to defeat the purpose of having StateT. The actual routines under this implement part of a FSM for (hopefully) efficiently extracting an Int from a ByteString (or a uvector UArr -- source of the Step data type). The relevant part of the actual code, which is a bit hacked up with ! patterns from my attempts to get better code, is as follows. type ParseInt q a = StateT q (StateT Int (ErrorT String Identity)) a next :: q - Step q Word8 next i | i==n = Done | otherwise = Yield (bs `BS.unsafeIndex` i) (i+1) wrap :: Monad m = (Word8 - StateT q m a) - StateT q m a - StateT q m a wrap yield (done::StateT q m a) = loop where loop :: StateT q m a loop = do q - get case next q of Yield x q' - put q' yield x Skipq' - put q' loop Done - done s2 :: ParseInt q Int s2 = wrap yield done where yield :: Word8 - StateT q (StateT Int (ErrorT String Identity)) Int yield x | x==48 = digit 0 | x==49 = digit 1 | x==50 = digit 2 | x==51 = digit 3 | x==52 = digit 4 | x==53 = digit 5 | x==54 = digit 6 | x==55 = digit 7 | x==56 = digit 8 | x==57 = digit 9 | otherwise = do !y - lift get return y where digit :: Int - ParseInt q Int digit !x = do !y - lift get ( if y = (maxBound-9)`quot`10 || y = (maxBound-x)`div`10 then let !y' = y*10+x in (lift $ put y') s2 else throwError integer overflow ) done :: ParseInt q Int done= do !y - lift get return y I just finished adding the Parse q Int type to help with email line wrapping. As I alluded to in my original email, if I don't have the Int overflow check in digit, it is not chosen as the loop breaker, all the StateT stuff is compiled away, and you get a really nice efficient assembler loop (which is important because the final FSM has to actually chew through GBs of data). The part of the code under the first lambda in digit is as follows (I didn't keep the original dump, so the uniques have changed here). It's the second part of the Int overflow bounds check (i.e., y = (maxBound-x)`div`10), and, indeed, something you don't want to compute unless the easy check fails. digit_s1lk = \ (x_aqR [ALWAYS Just U(L)] :: GHC.Types.Int) - case x_aqR of x_XsQ [ALWAYS Just A] { GHC.Types.I# ipv_s1bD [ALWAYS Just L] - let { lvl_s1my [ALWAYS Just D(T)] :: GHC.Types.Int [Str: DmdType] lvl_s1my = case GHC.Prim.-# 9223372036854775807 ipv_s1bD of wild2_a1xi [ALWAYS Just L] { __DEFAULT - case GHC.Base.divInt# wild2_a1xi 10 of wild21_a1xj [ALWAYS Just L] { __DEFAULT - GHC.Types.I# wild21_a1xj }; (-9223372036854775808) - case GHC.Base.divInt# (-9223372036854775808) 10 of wild21_a1xl [ALWAYS Just L] { __DEFAULT -