RE: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-16 Thread Simon Peyton Jones
Subject: Re: suboptimal ghc code generation in IO vs equivalent pure code case I have stared at the cmm and assembly quite a bit. Indeed there is no trace of a token in cmm and assembly as expected. Here is what is happening. In the IO case the entire original list is evaluated and unfolded

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-14 Thread Harendra Kumar
I have stared at the cmm and assembly quite a bit. Indeed there is no trace of a token in cmm and assembly as expected. Here is what is happening. In the IO case the entire original list is evaluated and unfolded on the stack first. During the recursion, the stack will have as many closure

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-14 Thread David Feuer
The state token is zero-width and should therefore be erased altogether in code generation. On May 14, 2016 4:21 PM, "Tyson Whitehead" wrote: > On 14/05/16 02:31 PM, Harendra Kumar wrote: > >> The difference seems to be entirely due to memory pressure. At list size >> 1000

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-14 Thread Tyson Whitehead
On 14/05/16 02:31 PM, Harendra Kumar wrote: The difference seems to be entirely due to memory pressure. At list size 1000 both pure version and IO version perform equally. But as the size of the list increases the pure version scales linearly while the IO version degrades exponentially. Here

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-14 Thread Harendra Kumar
On 15 May 2016 at 01:35, David Feuer wrote: > Well, a few weeks ago Bertram Felgenhauer came up with a version of IO > that acts more like lazy ST. That could be just the thing. He placed it in > the public domain/CC0 and told me I could put it up on Hackage if I want. >

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-14 Thread David Feuer
Well, a few weeks ago Bertram Felgenhauer came up with a version of IO that acts more like lazy ST. That could be just the thing. He placed it in the public domain/CC0 and told me I could put it up on Hackage if I want. I'll try to do that this week, but no promises. I could forward his email if

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-14 Thread Harendra Kumar
The difference seems to be entirely due to memory pressure. At list size 1000 both pure version and IO version perform equally. But as the size of the list increases the pure version scales linearly while the IO version degrades exponentially. Here are the execution times per list element in ns as

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-14 Thread Harendra Kumar
You are right about the way IO code is generated because of the ordering semantics. The IO version builds the list entirely in a recursive fashion before returning it while the pure code builds it lazily. I wrote very simple versions to keep things simpler: Pure version: f [] = [] f (x : xs) = x

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-11 Thread Dan Doel
On Tue, May 10, 2016 at 4:45 AM, Harendra Kumar wrote: > Thanks Dan, that helped. I did notice and suspect the update frame and the > unboxed tuple but given my limited knowledge about ghc/core/stg/cmm I was > not sure what is going on. In fact I thought that the

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-10 Thread Harendra Kumar
Thanks Dan, that helped. I did notice and suspect the update frame and the unboxed tuple but given my limited knowledge about ghc/core/stg/cmm I was not sure what is going on. In fact I thought that the intermediate tuple should get optimized out since it is required only because of the realworld

Re: suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-09 Thread Dan Doel
I'm no expert on reading GHC's generated assembly. However, there may be a line you've overlooked in explaining the difference, namely: movq $stg_upd_frame_info,-16(%rbp) This appears only in the IO code, according to what you've pasted, and it appears to be pushing an update frame (I

suboptimal ghc code generation in IO vs equivalent pure code case

2016-05-09 Thread Harendra Kumar
I have a loop which runs millions of times. For some reason I have to run it in the IO monad. I noticed that when I convert the code from pure to IO monad the generated assembly code in essence is almost identical except one difference where it puts a piece of code in a separate block which is