Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-26 Thread Tom Ellis
On Tue, Feb 26, 2013 at 10:00:32AM -, o...@okmij.org wrote:
> Tom Ellis wrote:
> > To avoid retaining a large lazy data structure in memory it is useful to
> > hide it behind a function call.  Below, "many" is used twice.  It is hidden
> > behind a function call so it can be garbage collected between uses. 
> 
> As you discovered, it is quite challenging to ``go against the grain''
> and force recomputation. GHC is quite good at avoiding
> recomputation. This is a trade-off, of time vs space. For large
> search tree, it is space that is a premium, and laziness and similar
> strategies are exactly the wrong trade-off. 

Hi Oleg,

I have good news: Don's suggestion of -fno-full-laziness that fixed the
space leak in my toy example also fixes the space leak in your iterative
deepening code.  To be precise, there is no longer any space leak in your
second implementation where subtrees are hidden behind a function call.

With full laziness your second implementation consumes 99M.  To avoid
sharing hacks are required and your third implementation consumes 2M. 
However, without full laziness your second implementation only consumes 2M.

> The solution (which I've seen in some of the internal library code) is
> to confuse GHC with extra functions:
> http://okmij.org/ftp/Haskell/misc.html#memo-off

Yes, when I read your exposition several months ago it started the thinking
process which culminated in this thread.

> So, eventually it is possible to force recomputation. But the solution
> leaves a poor taste -- fighting a compiler is never a good idea.

I agree, but I'm glad to discover that disabling full laziness is enough to
avoid having to fight the compiler.

> So, this is a bug of sort -- not the bug of GHC, but of lazy evaluation. 
> Lazy evaluation is not the best evaluation strategy.  It is a trade-off,
> which suits a large class of problems and punishes another large class of
> problems.

I'm not as pessimistic as you on this issue.  If the results of function
calls may be shared with previous callers then the situation is problematic. 
However, if it is known that function calls return new thunks then the
programmer has a lot of power to avoid space leaks.

My conclusion is that this is not the bug of lazy evalution.  I would prefer
GHC to be more discerning about when it applies full laziness.  It is
unfortunate that it will apply it in cases where unbounded amounts of
additional memory usage may result.

Tom


% wget --quiet http://okmij.org/ftp/Haskell/STrees.hs

% ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs && ./STrees 
iter 100 +RTS -tstderr > /dev/null
[1 of 1] Compiling STrees   ( STrees.hs, STrees.o )
Linking STrees ...
<>

% ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main3 STrees.hs && ./STrees 
iter 100 +RTS -tstderr > /dev/null
[1 of 1] Compiling STrees   ( STrees.hs, STrees.o )
Linking STrees ...
<>

% ghc -O2 -fno-full-laziness -fforce-recomp -rtsopts -main-is STrees.main2 
STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null 
[1 of 1] Compiling STrees   ( STrees.hs, STrees.o )
Linking STrees ...
<>

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Thunks and GHC pessimisation

2013-02-26 Thread oleg

Tom Ellis wrote:
> To avoid retaining a large lazy data structure in memory it is useful to
> hide it behind a function call.  Below, "many" is used twice.  It is hidden
> behind a function call so it can be garbage collected between uses. 

As you discovered, it is quite challenging to ``go against the grain''
and force recomputation. GHC is quite good at avoiding
recomputation. This is a trade-off, of time vs space. For large
search tree, it is space that is a premium, and laziness and similar
strategies are exactly the wrong trade-off. 

The solution (which I've seen in some of the internal library code) is
to confuse GHC with extra functions:
http://okmij.org/ftp/Haskell/misc.html#memo-off

So, eventually it is possible to force recomputation. But the solution
leaves a poor taste -- fighting a compiler is never a good idea. So,
this is a bug of sort -- not the bug of GHC, but of lazy
evaluation. Lazy evaluation is not the best evaluation strategy. It is
a trade-off, which suits a large class of problems and punishes
another large class of problems.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-25 Thread Joachim Breitner
Hi,

Am Sonntag, den 24.02.2013, 18:41 + schrieb Tom Ellis:
> On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote:
> > You should try:
> > 
> > > million :: () -> Int
> > > million _ = 10 ^ (6 :: Int)
> > >
> > > many :: () -> [Int]
> > > many x = [1..million x]
> 
> Thanks Joachim, but that doesn't work either.

oh, strange, I though I had added
Maybe you will need {-# NOINLINE million #-} as well.
to the mailing. Must have skipped or accidentally removed it... anyways,
with that pragma it works.

Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata



signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Tom Ellis
On Sun, Feb 24, 2013 at 06:25:56PM +, Don Stewart wrote:
> If you explicitly rely on this not happening, turn it off:
> 
> $ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp
> [1 of 1] Compiling Main ( A.hs, A.o )
> Linking A ...
> 
> $ time ./A +RTS -M750k
> (5050,5050)
> ./A +RTS -M750k  0.06s user 0.00s system 97% cpu 0.069 total

Thanks Don, that's good to know.  I am not currently bitten by this issue in
any production code, but I am concerned it will crop up when I least expect
it.

Tom

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Tom Ellis
On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote:
> You should try:
> 
> > million :: () -> Int
> > million _ = 10 ^ (6 :: Int)
> >
> > many :: () -> [Int]
> > many x = [1..million x]

Thanks Joachim, but that doesn't work either.

Tom


% cat thunkfail.hs 
{-# OPTIONS_GHC -fno-warn-unused-binds #-}

import Data.List

million :: () -> Int
million _ = 10 ^ (6 :: Int)
 
many :: () -> [Int]
many x = [1..million x]

different_many :: () -> [Int]
different_many x = [1..million x]

twice :: (Int, Int)
twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ()))

main :: IO ()
main = print twice
% ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail
% +RTS -M5M  
[1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
Linking thunkfail ...
Heap exhausted;
Current maximum heap size is 5242880 bytes (5 MB);
use `+RTS -M' to increase it.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Don Stewart
In toy examples like this it will be generally hard to convince GHC not to
just collapse your program down to a constant, when you're turning up the
optimization level.

In particular, you are implying -ffull-laziness with -O (or -O2), which can
increase sharing.

> GHC doesn't implement complete full-laziness. When optimisation in on,
and -fno-full-laziness is not  given, *some transformations that increase
sharing are performed, such as extracting repeated computations from a loop*.
These are the same transformations that a fully lazy implementation would
do, the difference is that GHC doesn't consistently apply full-laziness, so
don't rely on it.

http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html

If you explicitly rely on this not happening, turn it off:

$ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...

$ time ./A +RTS -M750k
(5050,5050)
./A +RTS -M750k  0.06s user 0.00s system 97% cpu 0.069 total

A 750k heap should be enough for anyone :)

-- Don

On Sun, Feb 24, 2013 at 5:49 PM, Tom Ellis <
tom-lists-haskell-cafe-2...@jaguarpaw.co.uk> wrote:

> To avoid retaining a large lazy data structure in memory it is useful to
> hide it behind a function call.  Below, "many" is used twice.  It is hidden
> behind a function call so it can be garbage collected between uses.  That's
> good.  When compiling with "-O" it seems that GHC 7.4.1 decides to keep it
> in memory anyway.  That's bad.  (I can't read core so I don't know exactly
> what's going on).  Replacing one of the "many" in "twice" with
> "different_many" makes everything fine again.
>
> Is this considered a bug in GHC?  Is it a known bug?  It is incredibly
> concerning that GHC would perform this kind of pessimisation.
>
> Tom
>
>
> % cat thunkfail.hs
> {-# OPTIONS_GHC -fno-warn-unused-binds #-}
>
> import Data.List
>
> million :: Int
> million = 10 ^ (6 :: Int)
>
> many :: () -> [Int]
> many () = [1..million]
>
> different_many :: () -> [Int]
> different_many () = [1..million]
>
> twice :: (Int, Int)
> twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ()))
>
> main :: IO ()
> main = print twice
>
> % ghc -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail
> +RTS -M5M
> [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
> Linking thunkfail ...
> (1784293664,1784293664)
>
> % ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail
> +RTS -M5M
> [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
> Linking thunkfail ...
> Heap exhausted;
> Current maximum heap size is 5242880 bytes (5 MB);
> use `+RTS -M' to increase it.
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Joachim Breitner
Hi,

I believe, without checking, that

Am Sonntag, den 24.02.2013, 17:49 + schrieb Tom Ellis:
> many :: () -> [Int]
> many () = [1..million]

is indeed called many times. The problem is that [1..million] is not
dependent on the parameter, so GHC will float it out to a top level
declaration, and hence share it between the multiple calls to many.

You should try:

> million :: () -> Int
> million _ = 10 ^ (6 :: Int)
>
> many :: () -> [Int]
> many x = [1..million x]

Greetings,
Joachim

(Maybe I should continue to work on http://arxiv.org/abs/1106.3478 and
related ideas, like annotating thunks in the code as unshareable... or
does this mostly occur in small example programs like this?)

-- 
Joachim "nomeata" Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata



signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Tom Ellis
To avoid retaining a large lazy data structure in memory it is useful to
hide it behind a function call.  Below, "many" is used twice.  It is hidden
behind a function call so it can be garbage collected between uses.  That's
good.  When compiling with "-O" it seems that GHC 7.4.1 decides to keep it
in memory anyway.  That's bad.  (I can't read core so I don't know exactly
what's going on).  Replacing one of the "many" in "twice" with
"different_many" makes everything fine again.

Is this considered a bug in GHC?  Is it a known bug?  It is incredibly
concerning that GHC would perform this kind of pessimisation.

Tom


% cat thunkfail.hs  
 
{-# OPTIONS_GHC -fno-warn-unused-binds #-}

import Data.List

million :: Int
million = 10 ^ (6 :: Int)

many :: () -> [Int]
many () = [1..million]
  
different_many :: () -> [Int]
different_many () = [1..million]

twice :: (Int, Int)
twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ()))

main :: IO ()
main = print twice

% ghc -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS 
-M5M 
[1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
Linking thunkfail ...
(1784293664,1784293664)

% ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS 
-M5M
[1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
Linking thunkfail ...
Heap exhausted;
Current maximum heap size is 5242880 bytes (5 MB);
use `+RTS -M' to increase it.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe