This morning, while attempting to answer a different question related
to Parrot performance, I ran into a very disturbing (to me) result
about Parrot GC, which I share below.

Here's an NQP version of the Fibonacci benchmarks that appear in
examples/benchmarks/.

  $ cat fib.nqp
  
  sub fib($n) {
      ($n < 2) ?? $n !! fib($n-1) + fib($n-2);
  }
  
  my $N := 28;
  
  pir::say("fib($N) = " ~ fib($N));


Running the above program with NQP takes about 20.5 seconds:
  
  $ time ./parrot-nqp fib.nqp
  fib(28) = 317811
  real  0m20.598s
  user  0m20.500s
  sys   0m0.090s

Okay, so let's figure out where the time is going.  Let's see
how long it takes to compile the above into PIR:

  $ time ./parrot-nqp --target=pir fib.nqp >fib-nqp.pir
  real  0m0.258s
  user  0m0.220s
  sys   0m0.030s

Hmm, compilation only requires 0.25 seconds -- not too bad.  So the
remaining time (20.25 seconds) must be spent executing the program, 
right?  Let's try running the PIR file we just generated:

  $ time ./parrot fib-nqp.pir
  fib(28) = 317811
  real  0m9.879s
  user  0m9.870s
  sys   0m0.010s

Huh?!  Only 9.8 seconds?  So, if fib.nqp takes 0.25 seconds to compile 
and 9.8 seconds to run, where are the other 10 seconds coming from in 
the original execution?  Let's try running the original program again,
but this time disabling GC at the beginning:

  $ cat fib-2.nqp
  
  Q:PIR { sweepoff };    # disable GC mark/sweep runs
  Q:PIR { collectoff };  # disable memory compaction
  
  sub fib($n) {
      ($n < 2) ?? $n !! fib($n-1) + fib($n-2);
  }
  
  my $N := 28;
  
  pir::say("fib($N) = " ~ fib($N));
  
  $ time ./parrot-nqp fib-2.nqp
  fib(28) = 317811
  real  0m16.715s
  user  0m10.320s
  sys   0m2.600s

Okay, it appears that simply turning off the GC gives us a 10-second
improvement in execution speed -- or, put another way, roughly half of our
execution time for this simple benchmark is spent dealing with GC overhead
for things not directly related to the code being run.
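
One follow-up I haven't run yet, but which ought to tell us which
half of the GC is responsible, would be to disable only the
mark/sweep runs (and, in a separate run, only the memory compaction)
and compare timings -- something like:

  # sketch only -- disable just the GC mark/sweep runs, leaving
  # memory compaction enabled; use collectoff by itself for the
  # complementary experiment
  Q:PIR { sweepoff };

  sub fib($n) {
      ($n < 2) ?? $n !! fib($n-1) + fib($n-2);
  }

  my $N := 28;

  pir::say("fib($N) = " ~ fib($N));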

Presumably the extra GC overhead is coming from the parse/past/post
tree structures produced during compilation, as well as the nqp-rx/regex/
pct/past/etc. libraries themselves.  If this is true, then what
is especially worrying is that the data structures produced and retained
from compiling the fib.nqp program are almost trivially small, yet
their existence (but not their creation) is resulting in a relatively 
huge degradation in runtime performance -- in this case, effectively
doubling the overall runtime cost.
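
One way to test that hypothesis without involving the compiler at
all would be to create and retain a pile of small PMCs by hand
before defining and running fib(), and see whether a similar
slowdown shows up.  An untested sketch (the array is built in
inline PIR, and the count of 100000 is arbitrary):

  # retain a ResizablePMCArray of small Integer PMCs for the life
  # of the program, as a rough stand-in for the parse/past/post
  # structures that compilation leaves behind
  my $junk := Q:PIR {
      %r = new ['ResizablePMCArray']
      $I0 = 0
    junk_loop:
      $P0 = new ['Integer']
      $P0 = $I0
      push %r, $P0
      inc $I0
      if $I0 < 100000 goto junk_loop
  };

  sub fib($n) {
      ($n < 2) ?? $n !! fib($n-1) + fib($n-2);
  }

  my $N := 28;

  pir::say("fib($N) = " ~ fib($N));

If the hypothesis holds, the timing for this version should land much
closer to the 20-second figure than to the 9.8-second one.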

Some may suggest that we can improve the situation by reducing the
number of objects created or retained as a result of compilation.
While I agree it will be useful to make the compilation process
more efficient, my underlying point is that with the current GC
*any* modest-sized data structure that exists at runtime appears 
to incur an unacceptable runtime cost (and this would include
data structures created in applications as well).
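
For what it's worth, any application could get a rough measure of its
own GC tax the same way this benchmark did, by bracketing an
allocation-heavy section with the GC control ops.  A sketch (I'm
assuming sweepon is the counterpart that re-enables sweeps):

  sub hot_section() {
      Q:PIR { sweepoff };   # suspend mark/sweep runs for this section
      # ... allocation-heavy work goes here ...
      Q:PIR { sweepon };    # re-enable mark/sweep runs afterwards
  }

That's just a way of measuring the cost, of course, not a fix for it.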

Given what I've just found above, I'm really hoping we can get
some significant effort placed on GC *prior* to 2.0.  It's okay
with me if nothing _lands_ before 2.0, but Allison's message that
started this thread sounded much more to me like "we'll start
working on it in earnest shortly after 2.0" as opposed to 
"it will land shortly after 2.0".  I fear "start working on it
after 2.0" will ultimately mean that we're seriously hampered by
GC performance through at least the first half of 2010.

Pm