Looing for advice on profiling

2004-11-09 Thread Duncan Coutts
Hi all,

I'm looking for some advice on profiling and any suggestion on what
might be going on with this program.

I'm profiling c2hs,  Manuel Chakravarty's FFI preprocessor with an eye
to reducing its time and space usage. I'm also trying to add binary
serialisation.

c2hs basically works by parsing a load of C header files, building up
maps of info on function  type names. It then scans a .chs file and
produces a .hs file using the info it gained from the C header files.

Producing the .hs file is fairly quick once all the symbol maps have
been constructed. Parsing the C headers and gathering the info is fairly
slow. So building a project that uses lots of .chs files is quite slow
because it re-parses the header files each time. We originally modified
c2hs to do multiple .chs files in one go so the work could be shared
however this style is not compatible with most build tools (ie
automake). Another approach is to dump a representation of the symbol
maps after parsing the header files and then reading this back for each
.chs file.

So enough background, here's the issues: doing the binary serialisation
seems to be incredibly slow and I can't figure out why.

I'm now using the Binary module pinched from the ghc sources and
instances derived by DrIFT for all the c2hs data structures. I've also
derived NFData instances (DeepSeq) so that I can make sure that the data
to be serialised is already fully evaluated.

I inserted timing points in the program so see what's taking so long.
Here's a sample output:

c2hs/c2hs +RTS -M400m -H300m -RTS -C-I/usr/include/glib-2.0
-C-I/usr/lib/glib-2.0/include --precomp=glib.precomp glib.h
starting c2hs
elapsed time: 0.00 (start)
elapsed time: 0.00 (about to run cpp)
elapsed time: 0.01 (about to parse header)
elapsed time: 1.90 (about to emit warnings)
elapsed time: 1.90 (about to deepSeq header)
elapsed time: 5.24 (about to serialise header)
elapsed time: 52.47 (finnished serialising header)
elapsed time: 52.47 (finnish)

glib.h (and it's includes after cpp'ing) is about 3k lines.
So it takes a few seconds to parse the file and a few more to deepSeq
the whole structure. However after that it takes a long time to
serialise the structure. glib.h is quite small in comparison to
gtk/gtk.h which is about 25k lines. So gtk.h takes ages. In fact it uses
so much memory serialising that it cannot complete on my machine with
-H450m (the max I can set it to on my 512Mb machine without swap
thrashing). When I run it in the original mode that does no
serialisation it completes fine in about 1min.

The binary file produced for glib.h in the end is about 4Mb which is not
huge I'd say. We're using String rather than FastString so that slows
things down a bit and takes more space but even so. (I modified the
Binary module to store lists in length + elems format rather than 0 for
[], 1 for (:) format and chars to use just one byte so strings are not
stored too inefficiently). The file size for gtk.h is about 20Mb (tested
on a machine with more memory).

When I do time profiling, the big cost centres come up as putByte and
putWord. When I profile for space it shows the large FiniteMaps
dominating most everything else. I originally guessed from that that the
serialisation must be forcing loads of thunks which is why it shows up
so highly on the profile. However even after doing the deepSeq before
serialisation, it takes a great deal of time, so I'm not sure what's
going on.

The retainer profiling again shows that the FiniteMaps are holding on to
most stuff.

A major problem no doubt is space use. For the large gtk/gtk.h, when I
run with +RTS -B to get a beep every major garbage collection, the
serialisation phase beeps continuously while the file grows.
Occasionally it seems to freeze for 10s of seconds, not dong any garbage
collection and not doing any file output but using 100% CPU, then it
carries on outputting and garbage collecting furiously. I don't know how
to work out what's going on when it does that.

I don't understand how it can be generating so much garbage when it is
doing the serialisation stuff on a structure that has already been fully
deepSeq'ed.

Any ideas? Where should I direct my investigation? I can supply tarballs
and profiling results and graphs if that would be helpful. Unfortunately
the code base is quite big which makes things difficult.

Duncan

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: proposal for ghc-pkg to use a directory of .conf files

2004-11-09 Thread Simon Marlow
On 08 November 2004 18:47, Duncan Coutts wrote:

 We can use ghc-pkg at the build / install-into-temp phase to create
 the $(package).conf files under
 $TMP_INSTALL_ROOT/usr/lib/ghc-$VER/package.conf.d/ and then final
 installation is jsut merging files without any post-install calls to
 ghc-pkg to modify installed files (ie the global ghc package.conf
 file) 
 
 So we can still keep the abstraction of $HC-pkg and gain simpler
 packaging stuff.

Ok, sounds reasonable.

I'm going to be working on the package support in ghc and ghc-pkg to
improve support for Cabal, so let's do this at the same time.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Looing for advice on profiling

2004-11-09 Thread Malcolm Wallace
Duncan Coutts [EMAIL PROTECTED] writes:

 I'm looking for some advice on profiling and any suggestion on what
 might be going on with this program.

One suggestion might be to serialise (key,value) pairs to file as
they are first encountered, rather than waiting until they are all
inside FiniteMaps.  That would eliminate the time you are currently
spending on lookups.  (A subsequent run would then need to do the
insertion of binary (key,value)s, rather than having them already
ordered, but at least you save the textual parsing cost there.)

 A major problem no doubt is space use. For the large gtk/gtk.h, when I
 run with +RTS -B to get a beep every major garbage collection, the
 serialisation phase beeps continuously while the file grows.
 Occasionally it seems to freeze for 10s of seconds, not dong any garbage
 collection and not doing any file output but using 100% CPU, then it
 carries on outputting and garbage collecting furiously. I don't know how
 to work out what's going on when it does that.

One guess might be generational collection: fast beeps are for the
current generation, pauses are older generations?

Regards,
Malcolm
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: Looing for advice on profiling

2004-11-09 Thread Simon Marlow
On 09 November 2004 12:54, Duncan Coutts wrote:

[snip]
 When I do time profiling, the big cost centres come up as putByte and
 putWord. When I profile for space it shows the large FiniteMaps
 dominating most everything else. I originally guessed from that that
 the serialisation must be forcing loads of thunks which is why it
 shows up so highly on the profile. However even after doing the
 deepSeq before serialisation, it takes a great deal of time, so I'm
 not sure what's going on.

let's get the simple things out of the way first: make sure you're
compiling Binary with -O -funbox-strict-fields (very important).  When
compiling for profiling, don't compile Binary with -auto-all, because
that will add cost centres to all the small functions and really skew
the profile.  I find this is a good rule of thumb when profiling: avoid
-auto-all on your low-level libraries that you hope to be inlined a lot.

You say your instances are created using DrIFT - I don't think we ever
modified DrIFT to generate the right kind of instances for the Binary
library in GHC, so are you using the instances designed for the nhc98
binary library?  If so, make sure your instances are using put_ rather
than put, because the former will allow binary output to run in constant
stack space.

Are you using BinMem, or BinIO?

 The retainer profiling again shows that the FiniteMaps are holding on
 to most stuff.
 
 A major problem no doubt is space use. For the large gtk/gtk.h, when I
 run with +RTS -B to get a beep every major garbage collection, the
 serialisation phase beeps continuously while the file grows.
 Occasionally it seems to freeze for 10s of seconds, not dong any
 garbage collection and not doing any file output but using 100% CPU,
 then it carries on outputting and garbage collecting furiously. I
 don't know how to work out what's going on when it does that.

I agree with Malcolm's conjecture: it sounds like a very long major GC
pause.

 I don't understand how it can be generating so much garbage when it is
 doing the serialisation stuff on a structure that has already been
 fully deepSeq'ed.

Yes, binary output *should* do zero allocation, and binary input should
only allocate the structure being created.  The Binary library is quite
heavily tuned so that this is the case (if you compile with profiling
and -auto-all, it will almost certainly break this property, though).

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Export through FFI Haskell List to C array (by MarshalAlloc)

2004-11-09 Thread David Lo
Dear all,

Anyone has ever tried exporting Haskell list to C array ? I read in
the FFI for MarshalAlloc, it is mentioned that Haskell lists are
typically passed as C arrays to C functions.

However, how can we do that ? I have a list of Int (i.e. [Int]) which
I would like to pass as int[] in C.

Please kindly advise how to proceed.

Best Regards,

David Lo
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Export through FFI Haskell List to C array (by MarshalAlloc)

2004-11-09 Thread Peter Simons
David Lo writes:

  Anyone has ever tried exporting Haskell list to C array?

Check out the module Foreign.Marshal.Array, specifically the
functions:

  withArray:: Storable a = [a] - (Ptr a - IO b) - IO b
  withArrayLen :: Storable a = [a] - (Int - Ptr a - IO b) - IO b

These will marshal a Haskell list into a C array and hand
the pointer to the given function. Because the array is gone
when your function returns, you don't need to worry about
freeing resources, catching exceptions, etc. If you need
more fine-grained control, you'll find the module also has
functions equivalent to the usual malloc/free way of doing
things.

Peter

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: GHC CVS HEAD bug!

2004-11-09 Thread Simon Peyton-Jones
Good point. I've fixed this (HEAD only), and added a test so it won't
recur.  Thanks!

Simon

| -Original Message-
| From: [EMAIL PROTECTED]
[mailto:glasgow-haskell-users-
| [EMAIL PROTECTED] On Behalf Of Keean Schupke
| Sent: 03 November 2004 10:08
| To: [EMAIL PROTECTED]
| Subject: GHC CVS HEAD bug!
| 
| The CVS HEAD branch of GHC seems to ignore the fixity of
| type constructors imported from another module... This breaks
| compiling of code that uses this feature:
| 
| module1:
| 
| type a :+ b = (a,b)
| infixr 1 :+
| 
| module2:
| 
| a :: Int :+ Float :+ Double
| a = undefined
| 
| Gives a the wrong type.
| 
| Keean
| ___
| Glasgow-haskell-users mailing list
| [EMAIL PROTECTED]
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



RE: Looing for advice on profiling

2004-11-09 Thread Duncan Coutts
On Tue, 2004-11-09 at 14:45, Simon Marlow wrote:
 On 09 November 2004 12:54, Duncan Coutts wrote:
 
 [snip]
  When I do time profiling, the big cost centres come up as putByte and
  putWord. When I profile for space it shows the large FiniteMaps
  dominating most everything else. I originally guessed from that that
  the serialisation must be forcing loads of thunks which is why it
  shows up so highly on the profile. However even after doing the
  deepSeq before serialisation, it takes a great deal of time, so I'm
  not sure what's going on.
 
 let's get the simple things out of the way first: make sure you're
 compiling Binary with -O -funbox-strict-fields (very important).  When
 compiling for profiling, don't compile Binary with -auto-all, because
 that will add cost centres to all the small functions and really skew
 the profile.  I find this is a good rule of thumb when profiling: avoid
 -auto-all on your low-level libraries that you hope to be inlined a lot.

Ok, I was missing -funbox-strict-fields. I'll try that.

 You say your instances are created using DrIFT - I don't think we ever
 modified DrIFT to generate the right kind of instances for the Binary
 library in GHC, so are you using the instances designed for the nhc98
 binary library?  If so, make sure your instances are using put_ rather
 than put, because the former will allow binary output to run in constant
 stack space.

It's using put_

 Are you using BinMem, or BinIO?

BinIO

  The retainer profiling again shows that the FiniteMaps are holding on
  to most stuff.
  
  A major problem no doubt is space use. For the large gtk/gtk.h, when I
  run with +RTS -B to get a beep every major garbage collection, the
  serialisation phase beeps continuously while the file grows.
  Occasionally it seems to freeze for 10s of seconds, not dong any
  garbage collection and not doing any file output but using 100% CPU,
  then it carries on outputting and garbage collecting furiously. I
  don't know how to work out what's going on when it does that.
 
 I agree with Malcolm's conjecture: it sounds like a very long major GC
 pause.

Right, ok.

  I don't understand how it can be generating so much garbage when it is
  doing the serialisation stuff on a structure that has already been
  fully deepSeq'ed.
 
 Yes, binary output *should* do zero allocation, and binary input should
 only allocate the structure being created.  The Binary library is quite
 heavily tuned so that this is the case (if you compile with profiling
 and -auto-all, it will almost certainly break this property, though).

Yes, it's much better with optimisations. I'll try the
-funbox-strict-fields and report back.

Duncan


___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users