Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-26 Thread Tom Ellis
On Tue, Feb 26, 2013 at 10:00:32AM -, o...@okmij.org wrote:
 Tom Ellis wrote:
  To avoid retaining a large lazy data structure in memory it is useful to
  hide it behind a function call.  Below, many is used twice.  It is hidden
  behind a function call so it can be garbage collected between uses. 
 
 As you discovered, it is quite challenging to ``go against the grain''
 and force recomputation. GHC is quite good at avoiding
 recomputation. This is a trade-off, of time vs space. For large
 search tree, it is space that is a premium, and laziness and similar
 strategies are exactly the wrong trade-off. 

Hi Oleg,

I have good news: Don's suggestion of -fno-full-laziness that fixed the
space leak in my toy example also fixes the space leak in your iterative
deepening code.  To be precise, there is no longer any space leak in your
second implementation where subtrees are hidden behind a function call.

With full laziness your second implementation consumes 99M.  To avoid
sharing hacks are required and your third implementation consumes 2M. 
However, without full laziness your second implementation only consumes 2M.

 The solution (which I've seen in some of the internal library code) is
 to confuse GHC with extra functions:
 http://okmij.org/ftp/Haskell/misc.html#memo-off

Yes, when I read your exposition several months ago it started the thinking
process which culminated in this thread.

 So, eventually it is possible to force recomputation. But the solution
 leaves a poor taste -- fighting a compiler is never a good idea.

I agree, but I'm glad to discover that disabling full laziness is enough to
avoid having to fight the compiler.

 So, this is a bug of sort -- not the bug of GHC, but of lazy evaluation. 
 Lazy evaluation is not the best evaluation strategy.  It is a trade-off,
 which suits a large class of problems and punishes another large class of
 problems.

I'm not as pessimistic as you on this issue.  If the results of function
calls may be shared with previous callers then the situation is problematic. 
However, if it is known that function calls return new thunks then the
programmer has a lot of power to avoid space leaks.

My conclusion is that this is not the bug of lazy evalution.  I would prefer
GHC to be more discerning about when it applies full laziness.  It is
unfortunate that it will apply it in cases where unbounded amounts of
additional memory usage may result.

Tom


% wget --quiet http://okmij.org/ftp/Haskell/STrees.hs

% ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs  ./STrees 
iter 100 +RTS -tstderr  /dev/null
[1 of 1] Compiling STrees   ( STrees.hs, STrees.o )
Linking STrees ...
ghc: 159988256 bytes, 308 GCs, 12946822/42139812 avg/max bytes residency (10 
samples), 99M in use, 0.00 INIT (0.00 elapsed), 8.40 MUT (8.40 elapsed), 0.95 
GC (0.95 elapsed) :ghc

% ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main3 STrees.hs  ./STrees 
iter 100 +RTS -tstderr  /dev/null
[1 of 1] Compiling STrees   ( STrees.hs, STrees.o )
Linking STrees ...
ghc: 6488675656 bytes, 12415 GCs, 214291/384036 avg/max bytes residency (37 
samples), 2M in use, 0.00 INIT (0.00 elapsed), 8.64 MUT (8.64 elapsed), 0.32 GC 
(0.32 elapsed) :ghc

% ghc -O2 -fno-full-laziness -fforce-recomp -rtsopts -main-is STrees.main2 
STrees.hs  ./STrees iter 100 +RTS -tstderr  /dev/null 
[1 of 1] Compiling STrees   ( STrees.hs, STrees.o )
Linking STrees ...
ghc: 11235944260 bytes, 21510 GCs, 208262/456880 avg/max bytes residency (44 
samples), 2M in use, 0.00 INIT (0.00 elapsed), 8.03 MUT (8.03 elapsed), 0.52 GC 
(0.52 elapsed) :ghc

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-25 Thread Joachim Breitner
Hi,

Am Sonntag, den 24.02.2013, 18:41 + schrieb Tom Ellis:
 On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote:
  You should try:
  
   million :: () - Int
   million _ = 10 ^ (6 :: Int)
  
   many :: () - [Int]
   many x = [1..million x]
 
 Thanks Joachim, but that doesn't work either.

oh, strange, I though I had added
Maybe you will need {-# NOINLINE million #-} as well.
to the mailing. Must have skipped or accidentally removed it... anyways,
with that pragma it works.

Greetings,
Joachim

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata



signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Joachim Breitner
Hi,

I believe, without checking, that

Am Sonntag, den 24.02.2013, 17:49 + schrieb Tom Ellis:
 many :: () - [Int]
 many () = [1..million]

is indeed called many times. The problem is that [1..million] is not
dependent on the parameter, so GHC will float it out to a top level
declaration, and hence share it between the multiple calls to many.

You should try:

 million :: () - Int
 million _ = 10 ^ (6 :: Int)

 many :: () - [Int]
 many x = [1..million x]

Greetings,
Joachim

(Maybe I should continue to work on http://arxiv.org/abs/1106.3478 and
related ideas, like annotating thunks in the code as unshareable... or
does this mostly occur in small example programs like this?)

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata



signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Don Stewart
In toy examples like this it will be generally hard to convince GHC not to
just collapse your program down to a constant, when you're turning up the
optimization level.

In particular, you are implying -ffull-laziness with -O (or -O2), which can
increase sharing.

 GHC doesn't implement complete full-laziness. When optimisation in on,
and -fno-full-laziness is not  given, *some transformations that increase
sharing are performed, such as extracting repeated computations from a loop*.
These are the same transformations that a fully lazy implementation would
do, the difference is that GHC doesn't consistently apply full-laziness, so
don't rely on it.

http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html

If you explicitly rely on this not happening, turn it off:

$ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...

$ time ./A +RTS -M750k
(5050,5050)
./A +RTS -M750k  0.06s user 0.00s system 97% cpu 0.069 total

A 750k heap should be enough for anyone :)

-- Don

On Sun, Feb 24, 2013 at 5:49 PM, Tom Ellis 
tom-lists-haskell-cafe-2...@jaguarpaw.co.uk wrote:

 To avoid retaining a large lazy data structure in memory it is useful to
 hide it behind a function call.  Below, many is used twice.  It is hidden
 behind a function call so it can be garbage collected between uses.  That's
 good.  When compiling with -O it seems that GHC 7.4.1 decides to keep it
 in memory anyway.  That's bad.  (I can't read core so I don't know exactly
 what's going on).  Replacing one of the many in twice with
 different_many makes everything fine again.

 Is this considered a bug in GHC?  Is it a known bug?  It is incredibly
 concerning that GHC would perform this kind of pessimisation.

 Tom


 % cat thunkfail.hs
 {-# OPTIONS_GHC -fno-warn-unused-binds #-}

 import Data.List

 million :: Int
 million = 10 ^ (6 :: Int)

 many :: () - [Int]
 many () = [1..million]

 different_many :: () - [Int]
 different_many () = [1..million]

 twice :: (Int, Int)
 twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ()))

 main :: IO ()
 main = print twice

 % ghc -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs  ./thunkfail
 +RTS -M5M
 [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
 Linking thunkfail ...
 (1784293664,1784293664)

 % ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs  ./thunkfail
 +RTS -M5M
 [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
 Linking thunkfail ...
 Heap exhausted;
 Current maximum heap size is 5242880 bytes (5 MB);
 use `+RTS -Msize' to increase it.

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Tom Ellis
On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote:
 You should try:
 
  million :: () - Int
  million _ = 10 ^ (6 :: Int)
 
  many :: () - [Int]
  many x = [1..million x]

Thanks Joachim, but that doesn't work either.

Tom


% cat thunkfail.hs 
{-# OPTIONS_GHC -fno-warn-unused-binds #-}

import Data.List

million :: () - Int
million _ = 10 ^ (6 :: Int)
 
many :: () - [Int]
many x = [1..million x]

different_many :: () - [Int]
different_many x = [1..million x]

twice :: (Int, Int)
twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ()))

main :: IO ()
main = print twice
% ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs  ./thunkfail
% +RTS -M5M  
[1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o )
Linking thunkfail ...
Heap exhausted;
Current maximum heap size is 5242880 bytes (5 MB);
use `+RTS -Msize' to increase it.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Thunks and GHC pessimisation

2013-02-24 Thread Tom Ellis
On Sun, Feb 24, 2013 at 06:25:56PM +, Don Stewart wrote:
 If you explicitly rely on this not happening, turn it off:
 
 $ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp
 [1 of 1] Compiling Main ( A.hs, A.o )
 Linking A ...
 
 $ time ./A +RTS -M750k
 (5050,5050)
 ./A +RTS -M750k  0.06s user 0.00s system 97% cpu 0.069 total

Thanks Don, that's good to know.  I am not currently bitten by this issue in
any production code, but I am concerned it will crop up when I least expect
it.

Tom

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe