Hi Don,

This is a very interesting study!

Sylvain

2007/6/25, Don Dailey <[EMAIL PROTECTED]>:

Someone just reminded me of the scalability study I did a few months
back and I reported that I would continue to run it for perhaps a few
more weeks.

I did run about 20% more games, and the extra data was quite useful because
it increased the number of games sampled at the highest levels.  I had
started the highest-level program late, but the auto-tester is designed
to try to equalize the number of games played by each player.

As a reminder, the study was designed to test the improvement of
modern UCT programs as the number of play-outs increase.  In the
study, I had two basic versions, each testing at 12 different levels.

The L series is Lazarus running with "light play-outs" and the H series
is Lazarus running with "heavy play-outs."  Since the study, Lazarus
has actually improved significantly, so these are both older versions
of Lazarus - still relatively strong, and perhaps better candidates for
a study of this type, since older programs tend to be more
universal (less prone to serious intransitivities).

I don't have a graph like I did before, but one can easily be
constructed from the data:

--- Player Key ---

   H_  is heavy play-out version of Lazarus
   L_  is light play-out version of Lazarus

   The numeric portion of the player name describes how
   many play-outs were executed to play each move.


PLAYER    TIME/GME   RATING  GAMES   Total games: 2895
                                     (TIME/GME = average seconds per game)
--------  --------  -------  -----
H_2048    13350.17   2830.2    168
H_1024     6693.84   2768.0    169
H_0512     3147.28   2547.3    168
H_0256     1547.30   2399.3    168
L_2048     4549.37   2375.5    168
H_0128      758.64   2315.7    168
L_1024     2203.88   2287.8    169
H_0064      381.00   2240.3    339
L_0512     1064.80   2174.1    168
H_0032      214.12   2129.2    318
L_0256      523.12   2105.7    168
L_0128      258.54   2097.8    170
gg-3.7.9     68.97   2000.0    307    Standard GnuGo 3.7.9
L_0064      134.17   1981.7    293
H_0016      125.93   1950.2    284
L_0032       72.72   1941.5    284
H_0008       62.27   1872.4    276
L_0016       43.49   1758.6    261
H_0004       31.22   1679.1    253
L_0008       21.07   1556.2    248
H_0002       14.90   1402.1    250
L_0004       10.55   1347.0    248
L_0002        5.03   1123.6    248
H_0001        7.44   1031.6    249
L_0001        2.49    863.6    248



Observations:

If you look at the entire range of the HEAVY player, you will notice
that each doubling of play-outs was worth, on average, 164 ELO points.

You will also notice a gradual falloff in improvement as the levels
increase.

As a general rule of thumb, there is about 150 ELO per doubling.  I
figured this by throwing out the highest and lowest rated HEAVY player
and averaging the increase per doubling.  It seems pragmatic to throw
out the 2 extremes based on empirical observation - I have always
noticed that in a pool of players the highest and lowest often have
at least somewhat distorted ratings.

After throwing out the low and high ratings the top 5 players average
about 132 ELO per doubling and the bottom 5 average an increase of
about 210 per doubling.

So there is a definite decrease per doubling, but it's quite gradual.
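The per-doubling arithmetic above can be reproduced directly from the
ratings in the table.  Here is a small Python sketch (the numbers are
copied from the HEAVY column; the exact 132/210 split for the top and
bottom 5 depends on which players you group, so only the overall and
trimmed averages are computed here):

```python
# HEAVY-series ratings from the table above: play-outs -> ELO rating.
heavy = {
    1: 1031.6, 2: 1402.1, 4: 1679.1, 8: 1872.4,
    16: 1950.2, 32: 2129.2, 64: 2240.3, 128: 2315.7,
    256: 2399.3, 512: 2547.3, 1024: 2768.0, 2048: 2830.2,
}
ratings = [heavy[2 ** i] for i in range(12)]

# Rating gain contributed by each successive doubling of play-outs.
gains = [b - a for a, b in zip(ratings, ratings[1:])]

# Average over the full range: ~164 ELO per doubling.
overall = (ratings[-1] - ratings[0]) / len(gains)

# Throw out the highest- and lowest-rated players: ~152 ELO per
# doubling, close to the ~150 rule of thumb quoted above.
trimmed = (ratings[-2] - ratings[1]) / (len(gains) - 2)

print(f"overall: {overall:.1f} ELO/doubling, trimmed: {trimmed:.1f}")
```

Printing `gains` also makes the gradual falloff at the higher levels
visible at a glance.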


I did a similar study with 7x7 and found that the tapering is extremely
pronounced.  It was quite obvious which komi to use, because if it was
too low black won every game, and if it was too high white won every
game.  The tapering was pronounced because at the higher levels the play
was very close to perfect.  If you are playing perfectly, there is no
improvement to be had by doubling.

It appears as a general rule of thumb (and is supported by empirical
evidence from similar studies of other games) that the rating/resource
curve is almost linear when you are far away from perfect play, but
the tapering becomes pronounced as you approach perfect play.  I
suspect Lazarus at the highest level I tested is within a few hundred
ELO points of perfect play.  That's still a long way off, especially
considering that Lazarus at the highest level was spending almost 4
hours on each 9x9 game!

My auto-tester stores all data, including the configuration, in a
single sqlite3 database.  In it are the SGF game records, individual
results, and even the time spent on each move.  It's available to
anyone who wants it upon request - so you can analyze the results for
yourself and come to your own conclusions!
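For anyone who requests the database, a sqlite3 file like this can be
explored with a few lines of Python.  The table name and columns below
("games", "player", "result", "sgf") are purely illustrative stand-ins -
the post doesn't describe the actual schema, so list the real tables
first and adapt from there:

```python
import sqlite3

# Replace ":memory:" with the path to the study's .db file.
con = sqlite3.connect(":memory:")

# Hypothetical stand-in for one of the tester's tables, just so the
# queries below have something to run against.
con.execute("CREATE TABLE games (player TEXT, result REAL, sgf TEXT)")
con.execute("INSERT INTO games VALUES ('H_2048', 1.0, '(;GM[1]SZ[9])')")

# First step with the real file: discover what tables it actually holds.
tables = [row[0] for row in
          con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)

# Example aggregate once you know the schema: games and score per player.
for player, n, score in con.execute(
        "SELECT player, COUNT(*), SUM(result) FROM games GROUP BY player"):
    print(player, n, score)
```

From there the SGF text can be fed to any SGF parser, and per-move
times aggregated the same way with GROUP BY.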

- Don



_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
