Re: [CIL users] CIL perfomance issues

2009-06-08 Thread Gabriel Kerneis
On Mon, Jun 08, 2009 at 10:20:54AM +0200, Gabriel Kerneis wrote:
> Still looking for a simple way to handle this. Does anybody have an
> idea?

I just sent a patch upstream to handle this. I'll let you know if it gets
accepted.

-- 
Gabriel Kerneis

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
CIL-users mailing list
CIL-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cil-users


Re: [CIL users] CIL perfomance issues

2009-06-08 Thread Gabriel Kerneis
On Mon, Jun 08, 2009 at 09:53:26AM +0200, Gabriel Kerneis wrote:
> On Thu, May 28, 2009 at 10:30:11AM +0200, Christoph Spiel wrote:
> > I have been using the following Tailor configuration to mirror the
> > CIL repository for quite a while now.  It has been working
> > flawlessly so far.
> 
> I'm using it now. But beware, the latest version of Tailor ignores
> external references, and CIL has one (ocamlutil). Unless you track it
> in a separate repository, you should add: ignore-externals = False

Oops, this didn't actually quite work: it does update the external
ocamlutil, but still doesn't record the related changes (since svn log
doesn't report changes made in external repositories).

Still looking for a simple way to handle this. Does anybody have an
idea?

Regards,
-- 
Gabriel Kerneis



Re: [CIL users] CIL perfomance issues

2009-06-08 Thread Gabriel Kerneis
On Thu, May 28, 2009 at 10:30:11AM +0200, Christoph Spiel wrote:
> I have been using the following Tailor
> configuration to mirror the CIL repository for
> quite a while now.  It has been working
> flawlessly so far.

I'm using it now. But beware, the latest version of Tailor ignores
external references, and CIL has one (ocamlutil). Unless you track it in
a separate repository, you should add:

> [DEFAULT]
> projects = cil
> root-directory = /site/mirror/repositories
> 
> [cil]
> source = svn:cil
> start-revision = 10140
> state-file = cil.tailor.state
> target = hg:cil
> filter-badchars = True
> 
> [svn:cil]
> module = /trunk/cil
> repository = svn://hal.cs.berkeley.edu/home/svn/projects
> subdir = cil-svn

ignore-externals = False

> [hg:cil]
> subdir = cil-hg

It took me some time to figure this one out (including the *mandatory*
capitalization of False).

Regards,
-- 
Gabriel Kerneis



Re: [CIL users] CIL perfomance issues

2009-05-28 Thread Gabriel Kerneis
Hi everybody,

well, it turns out the insane amount of memory used was the fault of
curl macros; believe it or not, after preprocessing, I got a line (among
others) of over 15000 characters. Did I mention the word insane before?

Anyway, the patches I wrote for CIL are still worth applying since,
even though they did not improve the timing, they cut allocated memory
by half. Who would want to waste memory, and time GCing it?

I'll clean things up a bit and send them in the next few days.

Regards,
-- 
Gabriel Kerneis



Re: [CIL users] CIL perfomance issues

2009-05-28 Thread Gabriel Kerneis
Hi,

On Thu, May 28, 2009 at 10:22:08AM +0200, Christoph Spiel wrote:
> On Tue, May 26, 2009 at 07:02:16PM +0200, Gabriel Kerneis wrote:
> > I've been suffering performance issues with CIL recently. 30% of the
> > time was spent in garbage collection.
> 
> (1) Recompile the run-time environment of OCaml
> with the best known combination of optimization
> flags.  This may speed up the garbage collector
> a little bit.

This might not be enough for me, see below.

> (2) Drastically increase the heap size.  (Of
> course this action is limited by the amount of
> memory in the target machines.)

This reduces the number of collections A LOT but doesn't really cut the
time down, because there are still too many words allocated.

> (3) Rewrite the analysis to be less functional.
> I had a functional implementation of my analysis
> that used a lot of [Buffer]s for the output:
> many of them and with a large total size.
> Rewriting it to a more procedural style which
> meant immediate output of the string data to the
> results file dramatically reduced the load on
> the GC, increased the performance a lot, and at
> the same time reduced the program's memory
> footprint.

Well, most of my trouble comes from a single visitor which has to go
through my program many times. Sadly, I'm not analysing anything, but
rather transforming the source code. I'm as imperative as can be
(everything is mutated in place), but cil.ml is full of partial
applications. Hence my patches.
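
A minimal sketch of why partial applications put pressure on the minor
heap. It is not taken from cil.ml: [visit], [with_partial] and
[with_full] are hypothetical names, and the exact word counts depend on
the compiler and its flags.

```ocaml
(* A three-argument function standing in for a visitor callback
   (hypothetical, not taken from cil.ml). *)
let visit env acc x = acc := !acc + env + x

(* Words allocated on the minor heap while running [f]. *)
let minor_words_during (f : unit -> unit) =
  let before = Gc.minor_words () in
  f ();
  Gc.minor_words () -. before

let iterations = 1_000_000

(* Partial application: [visit i acc] builds a fresh closure on every
   iteration.  [Sys.opaque_identity] only keeps the optimizer from
   removing that closure. *)
let with_partial () =
  let acc = ref 0 in
  for i = 1 to iterations do
    let k = Sys.opaque_identity (visit i acc) in
    k i
  done

(* Full application: all arguments are passed at once, so no closure
   has to be allocated. *)
let with_full () =
  let acc = ref 0 in
  for i = 1 to iterations do
    visit i acc i
  done

let () =
  Printf.printf "partial application: %.0f minor words\n"
    (minor_words_during with_partial);
  Printf.printf "full application:    %.0f minor words\n"
    (minor_words_during with_full)
```

Both loops compute the same thing; only the first one allocates on
every iteration, which is the kind of cost the patches try to remove.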

Just to give you an idea of the figures involved:
  Memory statistics: total=1173.63Mb, max=3.56Mb, minor=1173.54Mb,
  major=125.06Mb, promoted=124.97Mb
  minor collections=8953  major collections=129 compactions=0

With your tune_garbage_collector(), I got:
  Memory statistics: total=1173.51Mb, max=34.54Mb, minor=1173.42Mb,
  major=16.48Mb, promoted=16.39Mb
  minor collections=323  major collections=18 compactions=3

Which looks like a big win, but the time spent allocating the 1.2Gb
of minor words kills everything else (moreover I'm on 64bit, so it's
2.4Gb in fact).
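
A minimal sketch of the word-to-megabyte conversion behind the "2.4Gb
in fact" remark. That the printed figures assume 4-byte words is a
guess consistent with the doubling, not something checked against
stats.ml; [megabytes] is a hypothetical helper.

```ocaml
(* Bytes per machine word: 4 on 32-bit systems, 8 on 64-bit systems. *)
let bytes_per_word = Sys.word_size / 8

(* Convert a word count (as reported by [Gc]) into megabytes, given
   the number of bytes per word. *)
let megabytes ~bytes_per_word words =
  words *. float_of_int bytes_per_word /. (1024.0 *. 1024.0)

let () =
  let s = Gc.quick_stat () in
  Printf.printf "minor words allocated: %.0f\n" s.Gc.minor_words;
  (* Assumption: a report that hard-codes 4 bytes per word. *)
  Printf.printf "  assuming 4-byte words: %.2fMb\n"
    (megabytes ~bytes_per_word:4 s.Gc.minor_words);
  (* The same count using the word size of the running system. *)
  Printf.printf "  with native %d-byte words: %.2fMb\n"
    bytes_per_word (megabytes ~bytes_per_word s.Gc.minor_words)
```

On a 64-bit machine the second figure is exactly twice the first.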

Thanks anyway for your advice,
-- 
Gabriel Kerneis



Re: [CIL users] CIL perfomance issues

2009-05-28 Thread Christoph Spiel
On Tue, May 26, 2009 at 10:45:30PM +0200, Gabriel Kerneis wrote:
> I'll consider switching to Tailor the day I get fed up with tracking it manually.

I have been using the following Tailor
configuration to mirror the CIL repository for
quite a while now.  It has been working
flawlessly so far.


[DEFAULT]
projects = cil
root-directory = /site/mirror/repositories

[cil]
source = svn:cil
start-revision = 10140
state-file = cil.tailor.state
target = hg:cil
filter-badchars = True

[svn:cil]
module = /trunk/cil
repository = svn://hal.cs.berkeley.edu/home/svn/projects
subdir = cil-svn

[hg:cil]
subdir = cil-hg


Note that you may have to grab the latest
version of Tailor from
http://progetti.arstecnica.it/tailor
if you use a newer release of your favourite
(target) SCM system -- happened to me with
Mercurial v1.2.


/Chris



Re: [CIL users] CIL perfomance issues

2009-05-28 Thread Christoph Spiel
Gabriel -

On Tue, May 26, 2009 at 07:02:16PM +0200, Gabriel Kerneis wrote:
> I've been suffering performance issues with CIL recently. 30% of the
> time was spent in garbage collection.

A while ago I faced similar problems.
At that time, I tried three different approaches to
speed up the analysis.  From least effective to
most effective:

(1) Recompile the run-time environment of OCaml
with the best known combination of optimization
flags.  This may speed up the garbage collector
a little bit.

The improvement was noticeable in my case but
insufficient.

(2) Drastically increase the heap size.  (Of
course this action is limited by the amount of
memory in the target machines.)

I was lucky and the time spent in the GC dropped
significantly after increasing the heap size.

In the documentation of [tune_garbage_collector]
I have collected some of the scarce information
on tuning of the OCaml garbage collector that I
found on the web.

(** Tune the garbage collector's parameters for the extremely large
datasets we usually cope with.

Defaults:
- [minor_heap_size]: 32Kwords
- [major_heap_increment]: 62Kwords
- [space_overhead]: 80%
- [max_overhead]: 500%

Tuning:
- Increasing [minor_heap_size] will reduce the time spent in both
  the minor GC and the major GC.  It is often (but not always)
  preferable to keep it small enough to fit in the cache of the
  machine.
- Increasing [major_heap_increment] reduces the number of times
  that [add_to_heap] is called.
- Increasing [space_overhead] will reduce the time spent in the
  major GC.

Inspect the GC statistics at the end of the program's run.  The
most important figure is the ratio of [promoted_words] to
[minor_words].  This should be as small as possible.  If it is
more than 10%, the program spends too much time in the GC (both
minor and major).  Increasing [minor_heap_size] often helps in
this case. *)
let tune_garbage_collector () =
  let gc = Gc.get () in
  Gc.set {
    gc with
      Gc.minor_heap_size = 64 * gc.Gc.minor_heap_size;
      Gc.major_heap_increment = 64 * gc.Gc.major_heap_increment;
      Gc.space_overhead = 2 * gc.Gc.space_overhead;
      Gc.max_overhead = 2 * gc.Gc.max_overhead;
      Gc.verbose = 0 (* useful value: 0x01d *)
  }
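
A minimal sketch of how the ratio of [promoted_words] to [minor_words]
mentioned in the comment could be printed at the end of a run. The
names [promotion_ratio] and [report_gc] are hypothetical; only the
standard [Gc] module is assumed.

```ocaml
(* Percentage of minor-heap words that survived into the major heap. *)
let promotion_ratio ~minor_words ~promoted_words =
  if minor_words <= 0.0 then 0.0
  else 100.0 *. promoted_words /. minor_words

(* Print the figures the comment above recommends inspecting. *)
let report_gc () =
  let s = Gc.quick_stat () in
  Printf.eprintf
    "minor_words=%.0f promoted_words=%.0f ratio=%.1f%% \
     minor_collections=%d major_collections=%d\n"
    s.Gc.minor_words s.Gc.promoted_words
    (promotion_ratio ~minor_words:s.Gc.minor_words
       ~promoted_words:s.Gc.promoted_words)
    s.Gc.minor_collections s.Gc.major_collections

(* Run the report once, when the program exits. *)
let () = at_exit report_gc
```

Following the rule of thumb in the comment, a ratio above 10% would be
a hint to try a larger [minor_heap_size].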


(3) Rewrite the analysis to be less functional.

Ouch!

I had a functional implementation of my analysis
that used a lot of [Buffer]s for the output:
many of them and with a large total size.
Rewriting it to a more procedural style which
meant immediate output of the string data to the
results file dramatically reduced the load on
the GC, increased the performance a lot, and at
the same time reduced the program's memory
footprint.
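
A minimal sketch of the two styles, not the actual analysis code: the
function names and the record format are invented for illustration.

```ocaml
(* Functional style: accumulate the whole report in a [Buffer], then
   write it out in one go.  The buffer keeps all output alive at once. *)
let report_buffered oc items =
  let buf = Buffer.create 1024 in
  List.iter
    (fun (name, value) ->
       Buffer.add_string buf (Printf.sprintf "%s = %d\n" name value))
    items;
  output_string oc (Buffer.contents buf)

(* Procedural style: write each record to the results file as soon as
   it is available, so no large intermediate string is retained. *)
let report_streaming oc items =
  List.iter
    (fun (name, value) -> Printf.fprintf oc "%s = %d\n" name value)
    items

let () =
  let items = [ ("visited", 3); ("changed", 1) ] in
  report_streaming stdout items
```

Both functions produce the same bytes; the second one simply hands the
data to the output channel instead of keeping it on the OCaml heap.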


HTH,
Chris



Re: [CIL users] CIL perfomance issues

2009-05-26 Thread Gabriel Kerneis
Hi,

> I'm curious, how do you track the upstream svn changes using darcs? I
> find that git interoperates with svn very well, so I'm using git to
> manage my local copy of cil.

Manually, since there are few updates. Something along these lines:

http://weblog.masukomi.org/2007/5/23/using-darcs-with-svn-cvs-flow-chart

I'll consider switching to Tailor the day I get fed up with tracking it manually.

http://wiki.darcs.net/DarcsWiki/Tailor

darcs get http://www.pps.jussieu.fr/~kerneis/software/repos/cil/svn is the
upstream version, btw.

But darcs is not ideally suited for such a task, even with darcs2 some
"merges" take up to a minute on a recent computer. I don't know if git is
better though. Darcs is the preferred tool in my lab, so I stick with it
as long as it works for me.

>> I still have an awful lot of major collections

Please forgive and forget my ramblings about major collections. Only the
minor ones are significant in this case (and they were reduced dramatically
by my patches). Moreover, I did not have that many major collections...

Regards,
-- 
Gabriel Kerneis




Re: [CIL users] CIL perfomance issues

2009-05-26 Thread Wei Hu
I'm curious, how do you track the upstream svn changes using darcs? I
find that git interoperates with svn very well, so I'm using git to
manage my local copy of cil.

On Tue, May 26, 2009 at 1:02 PM, Gabriel Kerneis wrote:
> Hi,
>
> I've been suffering performance issues with CIL recently. 30% of the
> time was spent in garbage collection.
>
> [Not sure if this figure is significant since I still spend 30% of my
> time garbage-collecting, but it's a shorter time now ;-) ]
>
> The issue basically boils down to a lot of partial evaluations, putting
> the GC under heavy load. I'm currently patching cil.ml extensively
> (well, the visitor part at least) and have reduced the number of allocated
> words by half on my benchmarks.
>
> [By the way, ocamlutil/stats.ml is broken on 64bit architectures, the
> figures returned by printM in Stats.print should be doubled. I'll send a
> patch for this one too.]
>
> I still have an awful lot of major collections but I think I'm on the
> right track since there are far fewer allocations.
>
> I'll provide a unified patch on this list soon but curious people could
> look at the latest patches (using darcs) here:
>
> darcs get http://www.pps.jussieu.fr/~kerneis/software/repos/cil/patched
>
> More information on the general issue and work-arounds here:
>    http://ocaml.janestreet.com/?q=node/30
>
> Regards,
> --
> Gabriel Kerneis
