Re: [Haskell-cafe] Thunks and GHC pessimisation
On Tue, Feb 26, 2013 at 10:00:32AM -, o...@okmij.org wrote: > Tom Ellis wrote: > > To avoid retaining a large lazy data structure in memory it is useful to > > hide it behind a function call. Below, "many" is used twice. It is hidden > > behind a function call so it can be garbage collected between uses. > > As you discovered, it is quite challenging to ``go against the grain'' > and force recomputation. GHC is quite good at avoiding > recomputation. This is a trade-off, of time vs space. For large > search tree, it is space that is a premium, and laziness and similar > strategies are exactly the wrong trade-off. Hi Oleg, I have good news: Don's suggestion of -fno-full-laziness that fixed the space leak in my toy example also fixes the space leak in your iterative deepening code. To be precise, there is no longer any space leak in your second implementation where subtrees are hidden behind a function call. With full laziness your second implementation consumes 99M. To avoid sharing hacks are required and your third implementation consumes 2M. However, without full laziness your second implementation only consumes 2M. > The solution (which I've seen in some of the internal library code) is > to confuse GHC with extra functions: > http://okmij.org/ftp/Haskell/misc.html#memo-off Yes, when I read your exposition several months ago it started the thinking process which culminated in this thread. > So, eventually it is possible to force recomputation. But the solution > leaves a poor taste -- fighting a compiler is never a good idea. I agree, but I'm glad to discover that disabling full laziness is enough to avoid having to fight the compiler. > So, this is a bug of sort -- not the bug of GHC, but of lazy evaluation. > Lazy evaluation is not the best evaluation strategy. It is a trade-off, > which suits a large class of problems and punishes another large class of > problems. I'm not as pessimistic as you on this issue. If the results of function calls may be shared with previous callers then the situation is problematic. However, if it is known that function calls return new thunks then the programmer has a lot of power to avoid space leaks. My conclusion is that this is not the bug of lazy evalution. I would prefer GHC to be more discerning about when it applies full laziness. It is unfortunate that it will apply it in cases where unbounded amounts of additional memory usage may result. Tom % wget --quiet http://okmij.org/ftp/Haskell/STrees.hs % ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null [1 of 1] Compiling STrees ( STrees.hs, STrees.o ) Linking STrees ... <> % ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main3 STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null [1 of 1] Compiling STrees ( STrees.hs, STrees.o ) Linking STrees ... <> % ghc -O2 -fno-full-laziness -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null [1 of 1] Compiling STrees ( STrees.hs, STrees.o ) Linking STrees ... <> ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Thunks and GHC pessimisation
Tom Ellis wrote: > To avoid retaining a large lazy data structure in memory it is useful to > hide it behind a function call. Below, "many" is used twice. It is hidden > behind a function call so it can be garbage collected between uses. As you discovered, it is quite challenging to ``go against the grain'' and force recomputation. GHC is quite good at avoiding recomputation. This is a trade-off, of time vs space. For large search tree, it is space that is a premium, and laziness and similar strategies are exactly the wrong trade-off. The solution (which I've seen in some of the internal library code) is to confuse GHC with extra functions: http://okmij.org/ftp/Haskell/misc.html#memo-off So, eventually it is possible to force recomputation. But the solution leaves a poor taste -- fighting a compiler is never a good idea. So, this is a bug of sort -- not the bug of GHC, but of lazy evaluation. Lazy evaluation is not the best evaluation strategy. It is a trade-off, which suits a large class of problems and punishes another large class of problems. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Thunks and GHC pessimisation
Hi, Am Sonntag, den 24.02.2013, 18:41 + schrieb Tom Ellis: > On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote: > > You should try: > > > > > million :: () -> Int > > > million _ = 10 ^ (6 :: Int) > > > > > > many :: () -> [Int] > > > many x = [1..million x] > > Thanks Joachim, but that doesn't work either. oh, strange, I though I had added Maybe you will need {-# NOINLINE million #-} as well. to the mailing. Must have skipped or accidentally removed it... anyways, with that pragma it works. Greetings, Joachim -- Joachim "nomeata" Breitner Debian Developer nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata signature.asc Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Thunks and GHC pessimisation
On Sun, Feb 24, 2013 at 06:25:56PM +, Don Stewart wrote: > If you explicitly rely on this not happening, turn it off: > > $ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp > [1 of 1] Compiling Main ( A.hs, A.o ) > Linking A ... > > $ time ./A +RTS -M750k > (5050,5050) > ./A +RTS -M750k 0.06s user 0.00s system 97% cpu 0.069 total Thanks Don, that's good to know. I am not currently bitten by this issue in any production code, but I am concerned it will crop up when I least expect it. Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Thunks and GHC pessimisation
On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote: > You should try: > > > million :: () -> Int > > million _ = 10 ^ (6 :: Int) > > > > many :: () -> [Int] > > many x = [1..million x] Thanks Joachim, but that doesn't work either. Tom % cat thunkfail.hs {-# OPTIONS_GHC -fno-warn-unused-binds #-} import Data.List million :: () -> Int million _ = 10 ^ (6 :: Int) many :: () -> [Int] many x = [1..million x] different_many :: () -> [Int] different_many x = [1..million x] twice :: (Int, Int) twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ())) main :: IO () main = print twice % ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail % +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... Heap exhausted; Current maximum heap size is 5242880 bytes (5 MB); use `+RTS -M' to increase it. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Thunks and GHC pessimisation
In toy examples like this it will be generally hard to convince GHC not to just collapse your program down to a constant, when you're turning up the optimization level. In particular, you are implying -ffull-laziness with -O (or -O2), which can increase sharing. > GHC doesn't implement complete full-laziness. When optimisation in on, and -fno-full-laziness is not given, *some transformations that increase sharing are performed, such as extracting repeated computations from a loop*. These are the same transformations that a fully lazy implementation would do, the difference is that GHC doesn't consistently apply full-laziness, so don't rely on it. http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html If you explicitly rely on this not happening, turn it off: $ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp [1 of 1] Compiling Main ( A.hs, A.o ) Linking A ... $ time ./A +RTS -M750k (5050,5050) ./A +RTS -M750k 0.06s user 0.00s system 97% cpu 0.069 total A 750k heap should be enough for anyone :) -- Don On Sun, Feb 24, 2013 at 5:49 PM, Tom Ellis < tom-lists-haskell-cafe-2...@jaguarpaw.co.uk> wrote: > To avoid retaining a large lazy data structure in memory it is useful to > hide it behind a function call. Below, "many" is used twice. It is hidden > behind a function call so it can be garbage collected between uses. That's > good. When compiling with "-O" it seems that GHC 7.4.1 decides to keep it > in memory anyway. That's bad. (I can't read core so I don't know exactly > what's going on). Replacing one of the "many" in "twice" with > "different_many" makes everything fine again. > > Is this considered a bug in GHC? Is it a known bug? It is incredibly > concerning that GHC would perform this kind of pessimisation. > > Tom > > > % cat thunkfail.hs > {-# OPTIONS_GHC -fno-warn-unused-binds #-} > > import Data.List > > million :: Int > million = 10 ^ (6 :: Int) > > many :: () -> [Int] > many () = [1..million] > > different_many :: () -> [Int] > different_many () = [1..million] > > twice :: (Int, Int) > twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ())) > > main :: IO () > main = print twice > > % ghc -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail > +RTS -M5M > [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) > Linking thunkfail ... > (1784293664,1784293664) > > % ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail > +RTS -M5M > [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) > Linking thunkfail ... > Heap exhausted; > Current maximum heap size is 5242880 bytes (5 MB); > use `+RTS -M' to increase it. > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Thunks and GHC pessimisation
Hi, I believe, without checking, that Am Sonntag, den 24.02.2013, 17:49 + schrieb Tom Ellis: > many :: () -> [Int] > many () = [1..million] is indeed called many times. The problem is that [1..million] is not dependent on the parameter, so GHC will float it out to a top level declaration, and hence share it between the multiple calls to many. You should try: > million :: () -> Int > million _ = 10 ^ (6 :: Int) > > many :: () -> [Int] > many x = [1..million x] Greetings, Joachim (Maybe I should continue to work on http://arxiv.org/abs/1106.3478 and related ideas, like annotating thunks in the code as unshareable... or does this mostly occur in small example programs like this?) -- Joachim "nomeata" Breitner Debian Developer nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata signature.asc Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Thunks and GHC pessimisation
To avoid retaining a large lazy data structure in memory it is useful to hide it behind a function call. Below, "many" is used twice. It is hidden behind a function call so it can be garbage collected between uses. That's good. When compiling with "-O" it seems that GHC 7.4.1 decides to keep it in memory anyway. That's bad. (I can't read core so I don't know exactly what's going on). Replacing one of the "many" in "twice" with "different_many" makes everything fine again. Is this considered a bug in GHC? Is it a known bug? It is incredibly concerning that GHC would perform this kind of pessimisation. Tom % cat thunkfail.hs {-# OPTIONS_GHC -fno-warn-unused-binds #-} import Data.List million :: Int million = 10 ^ (6 :: Int) many :: () -> [Int] many () = [1..million] different_many :: () -> [Int] different_many () = [1..million] twice :: (Int, Int) twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ())) main :: IO () main = print twice % ghc -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... (1784293664,1784293664) % ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... Heap exhausted; Current maximum heap size is 5242880 bytes (5 MB); use `+RTS -M' to increase it. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe