3 questions regarding profiling in ghc

2009-11-12 Thread Daniil Elovkov

Hello

Sorry for the uninformative subject, but I've got no less than 3 
questions about profiling in ghc. Here we go


1) Can I profile my program if I don't have all the libraries it depends 
on compiled with profiling?


ghc says it can't find profiling variants of this and that module in 
other packages, when I try ghc -prof.


I managed to build all the dependencies with profiling, but having to do 
that doesn't look good to me. Maybe there's a way to avoid it?


Moreover, as far as I could tell functions from those packages didn't 
appear in the call graph after +RTS -p profiling.


2) Can I exclude a function from profiling? That probably means not 
assigning a cost centre to it.


A typical case, I think: the database connect function is rather heavyweight 
(time-wise) compared to the rest of the code, and it takes up to 
98% of the time. So the rest of the picture is less informative than it 
could be.


3) Is it possible to get -p profiling data from a program that was 
interrupted (ctrl-c)?


Thanks a lot!

--
Daniil Elovkov


A few small points about GHCi command documentation in section 2.7

2009-11-12 Thread Yitzchak Gale
Please add to the documentation for :set prompt:

If you enclose the prompt in quotes, you can use Haskell
syntax for String literals.

Actually, :set prompt is nearly useless without quotes, because
GHCi strips off trailing spaces from commands. We should either
add a space at the end of a prompt entered without quotes,
or just require quotes. Or at least change the help text
to be:

:set prompt \"prompt\"  set the prompt used in GHCi\n

so that people will know the right thing to do. Perhaps add
a few more words of explanation to the docs in section 2.7
once we decide which of these to do.
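
For example, something like the following (the prompt text itself is
arbitrary; it's only there to illustrate the quoted form) keeps the trailing
space that would otherwise be stripped:

Prelude> :set prompt "my-ghci> "
my-ghci>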

The :run command is not documented in section 2.7 - the
only mention of it is buried within the documentation for
the :main command. It is also not mentioned in helpText.

Thanks,
Yitz


Re: [Haskell-cafe] Fwd: (Solved) cabal install with external gcc tool chain not the ghc-bundled one

2009-11-12 Thread Duncan Coutts
On Thu, 2009-11-12 at 10:46 +0100, Daniel Kahlenberg wrote:
> To answer this question myself: to make the use of another gcc take
> effect, I used the following options with the 'cabal install' call:
>
>   --ghc-options="-pgmc e:/programme/ghc/mingw-gcc4/bin/gcc.exe -pgml
>   e:/programme/ghc/mingw-gcc4/bin/gcc.exe"
>
> See
> http://www.haskell.org/ghc/docs/latest/html/users_guide/options-phases.html#replacing-phases
> (searched in the wrong direction, too many trees...). Slightly tuned,
> this should be the way to go in all similar cases.
>
> One thing I haven't considered yet is whether the '--with-ld' and
> '--with-gcc' options (if curious, see the logs in my previous mail -
> Subject: [Haskell-cafe] cabal install with external gcc toolchain not the
> ghc-bundled one) only affect what gets written into the
> setup-config/package.conf file, or whether they have other effects.

Feel free to file a ticket about this. What makes me somewhat nervous is
that the gcc you want to use for, say, .c files is not necessarily the
same as the one ghc wants to use to compile .hc files or to link.
This is particularly the case on Windows, where ghc includes its own copy
of gcc. Similarly, on Solaris 10 ghc cannot use /usr/bin/gcc, because
it's a hacked-up gcc that uses the Sun CC backend (which doesn't grok
some of the crazy GNU C stuff that ghc uses).

So it'd certainly be possible to have cabal's --with-gcc/ld override the
ones that ghc uses by default, but the question is whether it should. I
think it's worth asking the ghc hackers about this.
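
For comparison, the two ways of saying it on the command line look roughly
like this (the package name is just a placeholder):

$ cabal install --with-gcc=e:/programme/ghc/mingw-gcc4/bin/gcc.exe some-package
$ cabal install --ghc-options="-pgmc e:/programme/ghc/mingw-gcc4/bin/gcc.exe -pgml e:/programme/ghc/mingw-gcc4/bin/gcc.exe" some-package

The open question is whether the first form should imply the second.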

Duncan



Re: 3 questions regarding profiling in ghc

2009-11-12 Thread John Dorsey
Daniil,

While you're waiting for an answer from a GHC internals expert, here's my
experience as a fellow user.

> 1) Can I profile my program if I don't have all the libraries it depends
> on compiled with profiling?

I don't know how to do that, and I don't know how to automatically reinstall
all dependencies of my project with profiling enabled.  I recently went
through reinstalling said dependencies as I found them, iteratively.  I
could have blown away and reinstalled GHC instead, and saved time.
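
For the record, the iterative step was just something along these lines
(the package name is a placeholder):

$ cabal install --reinstall --enable-library-profiling some-dependency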

To prevent a recurrence, I now have this in my ~/.cabal/config:

library-vanilla: True
library-profiling: True

This should install both normal and profiling versions of every library that
I install with cabal from here out.  It's a little slower when installing
new library packages, but it doesn't come up often enough to bother me.
There may be some pain when I get around to bootstrapping GHC 6.12, if
it doesn't install profiling builds of its bundled libs.

> Moreover, as far as I could tell functions from those packages didn't
> appear in the call graph after +RTS -p profiling.

Did you use -auto-all, to automatically create cost centers for all
top-level functions?  I find that I get very verbose cost info for
definitions under imported libraries.
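
Concretely, I build and profile roughly like this (program and input names
are placeholders):

$ ghc -O2 -prof -auto-all --make Main.hs -o myprog
$ ./myprog +RTS -p -RTS input.txt
$ less myprog.prof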

> 2) Can I exclude a function from profiling? That probably means not
> assigning a cost centre to it.

If -auto-all doesn't please you, you can manually define your cost centers
in your code, leaving out the ones you don't care about.  But unless I'm
mistaken, that doesn't exclude those costs, but rather includes them in the
calling cost center.  So it may not be what you're asking for.
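
As a rough sketch (the names here are made up), manual annotations look like
this; only the annotated expressions get their own cost centres, and
everything else is charged to the enclosing one:

-- Compile with -prof (but without -auto-all) so that only these
-- annotations create cost centres.
expensiveSetup :: IO [Int]
expensiveSetup = return [1 .. 1000000]

main :: IO ()
main = do
  xs <- expensiveSetup                    -- no SCC: charged to the caller
  let s = {-# SCC "sumInput" #-} sum xs   -- gets its own cost centre
  print s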

> A typical case, I think: the database connect function is rather
> heavyweight (time-wise) compared to the rest of the code, and it takes up
> to 98% of the time. So the rest of the picture is less informative than it
> could be.

It's your business, but in that case why would you care about the (time)
profile of the rest of the code?  I wouldn't spend ten seconds
time-optimizing anything but that hot spot.  If it can't be improved, you're
done.

To be clear, I'm assuming you're talking about 98% of CPU time, not wall
time; I don't think the profiler reports wall time, except maybe in the
summary.

> 3) Is it possible to get -p profiling data from a program that was
> interrupted (ctrl-c)?

When I ctrl-c out of my program, I get a nice program.prof file in the
directory where it's running.  If you're not getting that, the difference
could be OS environment (I'm developing on Linux), or it could be that I'm
using happstack and calling a routine that catches the ctrl-c then exits
cleanly.  It's Happstack.State.waitForTermination; you can probably distill
enough from it to get the same effect.

http://hackage.haskell.org/packages/archive/happstack-state/0.3.4/doc/html/src/Happstack-State-Control.html#waitForTermination

(Pardon the long link.)  My main routine spins off threads to do all the
work, and the main thread waits on waitForTermination then shuts down.
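
If you'd rather not pull in happstack just for that, a minimal sketch of the
same idea (POSIX-only, written from memory, so treat it as a starting point)
would be:

import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Posix.Signals (Handler (Catch), installHandler, sigINT, sigTERM)

-- Block until SIGINT (ctrl-c) or SIGTERM arrives, then return so that
-- main can exit normally and the RTS gets a chance to write the .prof file.
waitForTermination :: IO ()
waitForTermination = do
  done <- newEmptyMVar
  let quit = Catch (putMVar done ())
  _ <- installHandler sigINT  quit Nothing
  _ <- installHandler sigTERM quit Nothing
  takeMVar done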

Hope some of this helps.

Regards,
John Dorsey



Inliner behaviour - tiny changes lead to huge performance differences

2009-11-12 Thread Bryan O'Sullivan
I'm working on measuring and improving the performance of the text library
at the moment, and the very first test I tried demonstrated a piece of
behaviour that I'm not completely able to understand. Actually, I'm not able
to understand what's going on at all, beyond a very shallow level. All the
comments below pertain to GHC 6.10.4.

The text library uses stream fusion, and I want to measure the performance
of UTF-8 decoding.

The code I'm measuring is very simple:

import qualified Data.ByteString as B
import Data.Text.Encoding as T
import qualified Data.Text as T
import System.Environment (getArgs)
import Control.Monad (forM_)

main = do
  args <- getArgs
  forM_ args $ \a -> do
    s <- B.readFile a
    let t = T.decodeUtf8 s
    print (T.length t)


The streamUtf8 function looks roughly like this:

streamUtf8 :: OnDecodeError -> ByteString -> Stream Char
streamUtf8 onErr bs = Stream next 0 (maxSize l)
    where
      l = B.length bs
      next i
          | i >= l          = Done
          | U8.validate1 x1 = Yield (unsafeChr8 x1) (i+1)
          | {- etc. -}
{-# INLINE [0] streamUtf8 #-}


The values being Yielded from the inner function are, as you can see,
themselves constructed by functions.

Originally, with the inner next function manually marked as INLINE, I found
that functions like unsafeChr8 were not being inlined by GHC, and
performance was terrible due to the amount of boxing and unboxing happening
in the inner loop.

I somehow stumbled on the idea of removing the INLINE annotation from next,
and performance suddenly improved by a significant integer multiple. This
caused the body of streamUtf8 to be inlined into my test program, as I
hoped.

However, I wasn't yet out of the woods. The length function is defined as
follows:

length :: Text -> Int
length t = Stream.length (Stream.stream t)
{-# INLINE length #-}

And the streaming length is:

length :: Stream Char -> Int
length = S.lengthI
{-# INLINE[1] length #-}


And the lengthI function is defined more generally, in the hope that I could
use it for both Int and Int64 lengths:

lengthI :: Integral a => Stream Char -> a
lengthI (Stream next s0 _len) = loop_length 0 s0
    where
      loop_length !z s  = case next s of
                            Done       -> z
                            Skip    s' -> loop_length z s'
                            Yield _ s' -> loop_length (z + 1) s'
{-# INLINE[0] lengthI #-}


Unfortunately, although lengthI is inlined into the Int-typed streaming
length function, that function is not in turn marked with __inline_me in
simplifier output, so the length/decodeUtf8 loops do not fuse. The code is
pretty fast, but there's still a lot of boxing and unboxing happening for
all the Yields.
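
(For anyone who wants to look at the same thing: the simplifier output I'm
referring to is what GHC emits with something along the lines of the command
below; the module path is whatever file you are compiling.)

$ ghc -O2 -c -ddump-simpl SomeModule.hs > SomeModule.dump-simpl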

So. I am quite baffled by this, and I confess to having no idea what to do
to get the remaining functions to fuse. But that's not quite confusing
enough! Here's a one-byte change to my test code:

main = do
  args <- getArgs
  forM_ args $ \a -> do
    s <- B.readFile a
    let !t = decodeUtf8 s  {- notice the strictness annotation -}
    print (T.length t)


In principle, this should make the code a little slower, because I'm
deliberately forcing a Text value to be created, instead of allowing
stream/unstream fusion to occur. Now the length function seems to get
inlined properly, but while the decodeUtf8 function is inlined, the
functions in its inner loop that must be inlined for performance purposes
are not. The result is very slow code.

I found another site for this one test where removing a single INLINE
annotation makes the strictified code above 2x faster, but that change
causes the stream/unstream fusion rule to fail to fire entirely, so the
strictness annotation no longer makes a difference to performance.

All of these flip-flops in inliner behaviour are very difficult to
understand, and they seem to be exceedingly fragile. Should I expect the
situation to be better with the new inliner in 6.12?

Thanks for bearing with that rather long narrative,
Bryan.