Subject: Re: suboptimal ghc code generation in IO vs equivalent pure code case
I have stared at the cmm and assembly quite a bit. Indeed, there is no trace
of a token in the cmm or assembly, as expected. Here is what is happening:
in the IO case the entire original list is evaluated and unfolded on the
stack first. During the recursion, the stack will have as many closure
entries as there are elements in the list.
The state token is zero-width and should therefore be erased altogether in
code generation.
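To make the zero-width claim concrete, here is a minimal sketch mirroring how GHC itself represents IO (the names MyIO and returnMyIO are mine, to avoid clashing with the real type):

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import GHC.Exts (State#, RealWorld)

-- Mirrors the definition of IO in GHC.Base: a function threading a
-- State# RealWorld token through an unboxed-tuple result.
newtype MyIO a = MyIO (State# RealWorld -> (# State# RealWorld, a #))

-- The token is never inspected; it exists only to impose an evaluation
-- order. Being zero-width, it occupies no register or stack slot, which
-- is why it leaves no trace in the generated cmm or assembly.
returnMyIO :: a -> MyIO a
returnMyIO x = MyIO (\s -> (# s, x #))

main :: IO ()
main = putStrLn "State# RealWorld is erased at runtime"
```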
On May 14, 2016 4:21 PM, "Tyson Whitehead" wrote:
> On 14/05/16 02:31 PM, Harendra Kumar wrote:
>
>> The difference seems to be entirely due to memory pressure. At list size
>> 1000 both the pure version and the IO version perform equally. But as the
>> size of the list increases, the pure version scales linearly while the IO
>> version degrades exponentially.
On 15 May 2016 at 01:35, David Feuer wrote:
> Well, a few weeks ago Bertram Felgenhauer came up with a version of IO that
> acts more like lazy ST. That could be just the thing. He placed it in the
> public domain/CC0 and told me I could put it up on Hackage if I want. I'll
> try to do that this week, but no promises. I could forward his email if
> you're interested.
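Bertram's actual code isn't in this excerpt; as a hypothetical sketch of the general idea, lazy-ST-like behaviour can be approximated with unsafeInterleaveIO, which defers an IO action until its result is demanded:

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Rebuild a list in IO without forcing it all up front: each tail is
-- produced only when the consumer demands it, so the stack never holds
-- the whole structure at once.
lazyFromList :: [a] -> IO [a]
lazyFromList []       = pure []
lazyFromList (x : xs) = do
  rest <- unsafeInterleaveIO (lazyFromList xs)
  pure (x : rest)

main :: IO ()
main = do
  ys <- lazyFromList [1 .. 1000000 :: Int]
  print (take 5 ys)  -- only the demanded prefix is ever constructed
```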
The difference seems to be entirely due to memory pressure. At list size
1000 both the pure version and the IO version perform equally. But as the
size of the list increases, the pure version scales linearly while the IO
version degrades exponentially. Here are the execution times per list
element in ns as the list size increases.
You are right about the way IO code is generated because of the ordering
semantics. The IO version builds the list entirely in a recursive fashion
before returning it, while the pure code builds it lazily. I wrote very
simple versions to keep things simple:

Pure version:

f [] = []
f (x : xs) = x : f xs
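The excerpt cuts off before the IO version; a hypothetical sketch of what such an IO counterpart could look like (the name fIO is mine, not from the thread):

```haskell
-- Hypothetical IO counterpart to the pure f above: the monadic bind
-- forces the recursive call to complete before the cons cell is
-- returned, so the entire list is built before the caller sees any of
-- it -- unlike the pure version, which yields each cons cell lazily.
fIO :: [a] -> IO [a]
fIO []       = pure []
fIO (x : xs) = do
  rest <- fIO xs   -- runs to completion first: no laziness here
  pure (x : rest)

main :: IO ()
main = do
  ys <- fIO [1, 2, 3 :: Int]
  print ys
```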
On Tue, May 10, 2016 at 4:45 AM, Harendra Kumar wrote:
> Thanks Dan, that helped. I did notice and suspect the update frame and the
> unboxed tuple, but given my limited knowledge about ghc/core/stg/cmm I was
> not sure what is going on. In fact I thought that the intermediate tuple
> should get optimized out, since it is required only because of the realworld
> token.
I'm no expert on reading GHC's generated assembly. However, there may be a
line you've overlooked in explaining the difference, namely:

    movq $stg_upd_frame_info,-16(%rbp)

This appears only in the IO code, according to what you've pasted, and it
appears to be pushing an update frame (I believe).
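For background (my gloss, not from the thread): an update frame arranges for a thunk to be overwritten with its value once computed, so it is evaluated at most once. A small illustration of that sharing:

```haskell
import Debug.Trace (trace)

main :: IO ()
main = do
  -- 'expensive' is a thunk; the update frame pushed during its first
  -- evaluation overwrites the thunk with its result, so "evaluating"
  -- is printed (to stderr) only once even though the value is used twice.
  let expensive = trace "evaluating" (sum [1 .. 100 :: Int])
  print expensive
  print expensive
```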
I have a loop which runs millions of times. For some reason I have to run it
in the IO monad. I noticed that when I convert the code from pure to the IO
monad, the generated assembly code is in essence almost identical, except for
one difference where it puts a piece of code in a separate block.