3 questions regarding profiling in ghc
Hello,

Sorry for the uninformative subject, but I've got no fewer than three questions about profiling in GHC. Here we go:

1) Can I profile my program if I don't have all the libraries it depends on compiled with profiling? ghc says it can't find profiling variants of this and that module in other packages when I try ghc -prof. I managed to build all the dependencies with profiling, but that being a necessity doesn't look good to me. Maybe there's a way to avoid it? Moreover, as far as I could tell, functions from those packages didn't appear in the call graph after +RTS -p profiling.

2) Can I exclude a function from profiling? That presumably means not assigning a cost centre to it. A typical case, I think: the database connect function is rather heavyweight (in time) compared to the rest of the code, and it takes up to 98% of the time, so the rest of the picture is less informative than it could be.

3) Is it possible to get -p profiling data from an interrupted (ctrl-c) program?

Thanks a lot!

--
Daniil Elovkov

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
A few small points about GHCi command documentation in section 2.7
Please add to the documentation for :set prompt:

    If you enclose \i{prompt} in quotes, you can use Haskell syntax for String literals.

Actually, :set prompt is nearly useless without quotes, because GHCi strips trailing spaces from commands. We should either add a space at the end of a prompt entered without quotes, or just require quotes. Or, at the very least, change the help text to:

    :set prompt \prompt\  set the prompt used in GHCi\n

so that people will know the right thing to do. Perhaps add a few more words of explanation to the docs in section 2.7 once we decide which of these to do.

The :run command is not documented in section 2.7; the only mention of it is buried within the documentation for the :main command. It is also not mentioned in helpText.

Thanks,
Yitz
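To illustrate the trailing-space problem being described, here is a hypothetical GHCi session (the prompt strings are invented for the example):

```
Prelude> :set prompt "quoted> "
quoted> :set prompt unquoted> 
unquoted>:set prompt "quoted again> "
quoted again> 
```

In the second command the trailing space is stripped before the prompt is set, so the next prompt runs into the input; with the quoted String-literal form the space survives.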
Re: [Haskell-cafe] Fwd: (Solved) cabal install with external gcc tool chain not the ghc-bundled one
On Thu, 2009-11-12 at 10:46 +0100, Daniel Kahlenberg wrote:

> To answer this question myself: to specify the use of another gcc, with effect, I used the following options with the 'cabal install' call:
>
>   --ghc-options="-pgmc e:/programme/ghc/mingw-gcc4/bin/gcc.exe -pgml e:/programme/ghc/mingw-gcc4/bin/gcc.exe"
>
> See http://www.haskell.org/ghc/docs/latest/html/users_guide/options-phases.html#replacing-phases (I searched in the wrong direction; too many trees...). Slightly tuned, this should be the way to go in all similar cases.
>
> One thing I haven't considered yet is whether the '--with-ld' and '--with-gcc' options (if curious too, see logs in my previous mail - Subject [Haskell-cafe] caba install with external gcc toolchain not the ghc-bundled one) only affect what gets written into the setup-config/package.conf file, or what other effects they have.

Feel free to file a ticket about this. What makes me somewhat nervous is that the gcc you want to use for, say, .c files is not necessarily the same as the one ghc wants to use to compile .hc files or link stuff. This is particularly the case on Windows, where ghc includes its own copy of gcc. Similarly, on Solaris 10 ghc cannot use /usr/bin/gcc, because it's a hacked-up gcc that uses the Sun CC backend (which doesn't grok some of the crazy GNU C stuff that ghc uses). So it'd certainly be possible to have cabal's --with-gcc/ld override the ones that ghc uses by default, but the question is: should it? I think it's worth asking the ghc hackers about this.

Duncan
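Spelled out as a single command line, the invocation above would look roughly like this (a sketch only: `somepackage` is a placeholder, the gcc paths are the ones from the original message, and the quotes matter because --ghc-options takes a single argument):

```
cabal install --ghc-options="-pgmc e:/programme/ghc/mingw-gcc4/bin/gcc.exe -pgml e:/programme/ghc/mingw-gcc4/bin/gcc.exe" somepackage
```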
Re: 3 questions regarding profiling in ghc
Daniil,

While you're waiting for an answer from a GHC internals expert, here's my experience as a fellow user.

> 1) Can I profile my program if I don't have all the libraries it depends on compiled with profiling?

I don't know how to do that, and I don't know how to automatically reinstall all dependencies of my project with profiling enabled. I recently went through reinstalling said dependencies as I found them, iteratively. I could have blown away and reinstalled GHC instead, and saved time. To prevent a recurrence, I now have this in my ~/.cabal/config:

  library-vanilla: True
  library-profiling: True

This should install both normal and profiling versions of every library that I install with cabal from here on. It's a little slower when installing new library packages, but that doesn't come up often enough to bother me. There may be some pain when I get around to bootstrapping GHC 6.12, if it doesn't install profiling builds of its bundled libs.

> Moreover, as far as I could tell functions from those packages didn't appear in the call graph after +RTS -p profiling.

Did you use -auto-all, to automatically create cost centres for all top-level functions? I find that I get very verbose cost info for definitions in imported libraries.

> 2) Can I exclude a function from profiling? That probably means not assigning a cost centre to it.

If -auto-all doesn't please you, you can manually define your cost centres in your code, leaving out the ones you don't care about. But unless I'm mistaken, that doesn't exclude those costs; rather, it includes them in the calling cost centre. So it may not be what you're asking for.

> Typical case, I think. Database connect function is rather heavyweight (regarding time) compared to the rest of the code, and it takes up to 98% of time. So the rest of the picture is less informative than it could be.

It's your business, but in that case why would you care about the (time) profile of the rest of the code? 
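To make the manual-cost-centre suggestion concrete, here is a minimal sketch (the module and function names are invented for illustration; `connect` stands in for the expensive database call). SCC pragmas are ignored when you compile without -prof, so the program runs either way:

```haskell
module Main where

-- Stands in for the heavyweight database connect: deliberately left
-- without an SCC pragma, so under -prof its cost is charged to its
-- caller rather than getting a cost centre of its own.
connect :: IO ()
connect = return ()

-- The code we actually want a profile of: wrapped in a manual SCC,
-- so it shows up as its own cost centre in the -p report.
process :: [Int] -> Int
process = {-# SCC "process" #-} foldl (+) 0

main :: IO ()
main = do
  connect
  print (process [1 .. 1000])
```

Compiling this with `ghc -prof` (without -auto-all) and running with `+RTS -p` should give a report in which "process" appears but connect does not, which is the "leave out the ones you don't care about" behaviour described above.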
I wouldn't spend ten seconds time-optimizing anything but that hot spot. If it can't be improved, you're done. To be clear, I'm assuming you're talking about 98% of CPU time, not wall time; I don't think the profiler reports wall time, except maybe in the summary.

> 3) Isn't it possible to have -p profiling data of the interrupted (ctrl-c) program?

When I ctrl-c out of my program, I get a nice program.prof file in the directory where it's running. If you're not getting one, the difference could be the OS environment (I'm developing on Linux), or it could be that I'm using happstack and calling a routine that catches the ctrl-c and then exits cleanly. It's Happstack.State.waitForTermination; you can probably distill enough from it to get the same effect.

http://hackage.haskell.org/packages/archive/happstack-state/0.3.4/doc/html/src/Happstack-State-Control.html#waitForTermination

(Pardon the long link.) My main routine spins off threads to do all the work, and the main thread waits on waitForTermination and then shuts down.

Hope some of this helps.

Regards,
John Dorsey
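The "catch the ctrl-c, then exit cleanly" idea can be sketched in a few lines with the unix package's signal handlers (this is not Happstack's actual waitForTermination, just a minimal POSIX-only program of the same shape; to keep it self-contained it delivers SIGINT to itself where a user would press ctrl-c):

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Posix.Signals
  (Handler (CatchOnce), installHandler, raiseSignal, sigINT)

main :: IO ()
main = do
  done <- newEmptyMVar
  -- Turn SIGINT into a clean wakeup instead of an abrupt kill.
  _ <- installHandler sigINT (CatchOnce (putMVar done ())) Nothing
  -- In a real program, worker threads would be forked here.
  raiseSignal sigINT          -- stands in for the user pressing ctrl-c
  takeMVar done               -- main thread waits for the signal
  putStrLn "clean shutdown"   -- returning from main lets -prof write the .prof file
```

The point is that the RTS writes the profiling report on orderly program exit, so converting the interrupt into a normal return from main is what makes the .prof file appear.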
Inliner behaviour - tiny changes lead to huge performance differences
I'm working on measuring and improving the performance of the text library at the moment, and the very first test I tried demonstrated a piece of behaviour that I'm not completely able to understand. Actually, I'm not able to understand what's going on at all, beyond a very shallow level. All the comments below pertain to GHC 6.10.4.

The text library uses stream fusion, and I want to measure the performance of UTF-8 decoding. The code I'm measuring is very simple:

  import qualified Data.ByteString as B
  import Data.Text.Encoding as T
  import qualified Data.Text as T
  import System.Environment (getArgs)
  import Control.Monad (forM_)

  main = do
    args <- getArgs
    forM_ args $ \a -> do
      s <- B.readFile a
      let t = T.decodeUtf8 s
      print (T.length t)

The streamUtf8 function looks roughly like this:

  streamUtf8 :: OnDecodeError -> ByteString -> Stream Char
  streamUtf8 onErr bs = Stream next 0 (maxSize l)
    where
      l = B.length bs
      next i
        | i >= l          = Done
        | U8.validate1 x1 = Yield (unsafeChr8 x1) (i+1)
        | {- etc. -}
  {-# INLINE [0] streamUtf8 #-}

The values being Yielded from the inner function are, as you can see, themselves constructed by functions. Originally, with the inner next function manually marked as INLINE, I found that functions like unsafeChr8 were not being inlined by GHC, and performance was terrible due to the amount of boxing and unboxing happening in the inner loop.

I somehow stumbled on the idea of removing the INLINE annotation from next, and performance suddenly improved by a significant integer multiple. This caused the body of streamUtf8 to be inlined into my test program, as I hoped. However, I wasn't yet out of the woods. 
The length function is defined as follows:

  length :: Text -> Int
  length t = Stream.length (Stream.stream t)
  {-# INLINE length #-}

And the streaming length is:

  length :: Stream Char -> Int
  length = S.lengthI
  {-# INLINE [1] length #-}

And the lengthI function is defined more generally, in the hope that I could use it for both Int and Int64 lengths:

  lengthI :: Integral a => Stream Char -> a
  lengthI (Stream next s0 _len) = loop_length 0 s0
    where
      loop_length !z s = case next s of
        Done       -> z
        Skip s'    -> loop_length z s'
        Yield _ s' -> loop_length (z + 1) s'
  {-# INLINE [0] lengthI #-}

Unfortunately, although lengthI is inlined into the Int-typed streaming length function, that function is not in turn marked with __inline_me in the simplifier output, so the length/decodeUtf8 loops do not fuse. The code is pretty fast, but there's still a lot of boxing and unboxing happening for all the Yields.

So. I am quite baffled by this, and I confess to having no idea what to do to get the remaining functions to fuse. But that's not quite confusing enough! Here's a one-byte change to my test code:

  main = do
    args <- getArgs
    forM_ args $ \a -> do
      s <- B.readFile a
      let !t = decodeUtf8 s  -- notice the strictness annotation
      print (T.length t)

In principle, this should make the code a little slower, because I'm deliberately forcing a Text value to be created, instead of allowing stream/unstream fusion to occur. Now the length function seems to get inlined properly, but while the decodeUtf8 function is inlined, the functions in its inner loop that must be inlined for performance purposes are not. The result is very slow code.

I found another site for this one test where removing a single INLINE annotation makes the strictified code above 2x faster, but that change causes the stream/unstream fusion rule to fail to fire entirely, so the strictness annotation no longer makes a difference to performance. 
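For readers unfamiliar with the Stream/Done/Skip/Yield machinery these messages keep referring to, here is a minimal self-contained sketch of the stream-fusion types (simplified for illustration; the text library's actual definitions carry extra size information and fusion RULES):

```haskell
{-# LANGUAGE BangPatterns, ExistentialQuantification #-}
module Main where

-- One step of a stream: finished, skip without producing, or yield a value.
data Step s a = Done | Skip !s | Yield !a !s

-- A stream hides its state type behind an existential: a stepper plus a seed.
data Stream a = forall s. Stream (s -> Step s a) !s

-- Convert a list into a stream; the remaining list is the state.
streamList :: [a] -> Stream a
streamList xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- A consumer in the same shape as lengthI above: a strict counting loop
-- driven by repeated calls to the stepper.
lengthS :: Stream a -> Int
lengthS (Stream next s0) = go 0 s0
  where
    go !n s = case next s of
      Done       -> n
      Skip s'    -> go n s'
      Yield _ s' -> go (n + 1) s'

main :: IO ()
main = print (lengthS (streamList "hello"))
```

When producer and consumer like these end up in the same module body after inlining, GHC's case-of-case transformation can eliminate the intermediate Step constructors entirely; that is the fusion that the INLINE phase annotations in the messages above are trying to orchestrate, and that the boxing/unboxing of the Yields indicates has failed.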
All of these flip-flops in inliner behaviour are very difficult to understand, and they seem to be exceedingly fragile. Should I expect the situation to be better with the new inliner in 6.12?

Thanks for bearing with that rather long narrative,
Bryan