Hi Don,

This is a very interesting study!

Sylvain

2007/6/25, Don Dailey <[EMAIL PROTECTED]>:
Someone just reminded me of the scalability study I did a few months back; I had reported that I would continue to run it for perhaps a few more weeks. I did run about 20% more games, and the extra data was quite useful because it increased the number of games sampled at the highest levels. I had started the highest-level program late, but the auto-tester is designed to try to equalize the number of games played by each player.

As a reminder, the study was designed to test how much modern UCT programs improve as the number of play-outs increases. In the study, I had two basic versions, each tested at 12 different levels. The L series is Lazarus running with "light play-outs" and the H series is Lazarus running with "heavy play-outs." Since the study, Lazarus has actually improved significantly, so these are both older versions of Lazarus - still relatively strong, and perhaps better candidates for a study of this type, since the older programs tend to be more universal (less prone to serious intransitivities).

I don't have a graph like I did before, but one can easily be constructed from the data:

--- Player Key ---

H_ is the heavy play-out version of Lazarus
L_ is the light play-out version of Lazarus

The numeric portion of the player name gives the number of play-outs executed to play each move.
PLAYER      TIME/GME   RATING   GAMES
--------    --------   ------   -----
H_2048      13350.17   2830.2     168
H_1024       6693.84   2768.0     169
H_0512       3147.28   2547.3     168
H_0256       1547.30   2399.3     168
L_2048       4549.37   2375.5     168
H_0128        758.64   2315.7     168
L_1024       2203.88   2287.8     169
H_0064        381.00   2240.3     339
L_0512       1064.80   2174.1     168
H_0032        214.12   2129.2     318
L_0256        523.12   2105.7     168
L_0128        258.54   2097.8     170
gg-3.7.9       68.97   2000.0     307   (standard GnuGo 3.7.9)
L_0064        134.17   1981.7     293
H_0016        125.93   1950.2     284
L_0032         72.72   1941.5     284
H_0008         62.27   1872.4     276
L_0016         43.49   1758.6     261
H_0004         31.22   1679.1     253
L_0008         21.07   1556.2     248
H_0002         14.90   1402.1     250
L_0004         10.55   1347.0     248
L_0002          5.03   1123.6     248
H_0001          7.44   1031.6     249
L_0001          2.49    863.6     248

Total games: 2895

Observations:

If you look at the entire range of the HEAVY player, you will notice that each doubling was worth 164 ELO points on average. You will also notice a gradual falloff in improvement as the levels increase.

As a general rule of thumb, there is about 150 ELO per doubling. I figured this by throwing out the highest- and lowest-rated HEAVY players and averaging the increase per doubling. It seems pragmatic to throw out the 2 extremes based on empirical observation - I have always noticed that in a pool of players the highest and lowest often have at least somewhat distorted ratings. After throwing out the low and high ratings, the top 5 players average about 132 ELO per doubling and the bottom 5 average an increase of about 210 per doubling. So there is a definite decrease per doubling, but it's quite gradual.

I did a similar study with 7x7 and found that the tapering is extremely pronounced. It was quite obvious which komi to use, because if it was too low black won every game, and if it was too high white won every game. The tapering was pronounced because at the higher levels the play was very close to perfect. If you are already playing perfectly, there is no improvement to be had by doubling.
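The per-doubling figures above are easy to reproduce from the table. A quick sketch (ratings transcribed from the H series, ordered from 1 play-out up to 2048, so each step is one doubling):

```python
# Elo ratings of the heavy play-out (H) series, from H_0001 up to H_2048.
heavy = [1031.6, 1402.1, 1679.1, 1872.4, 1950.2, 2129.2,
         2240.3, 2315.7, 2399.3, 2547.3, 2768.0, 2830.2]

def elo_per_doubling(ratings):
    """Average Elo gain per doubling: total gain divided by number of steps."""
    return (ratings[-1] - ratings[0]) / (len(ratings) - 1)

print(round(elo_per_doubling(heavy), 1))        # full range: close to the 164 figure
print(round(elo_per_doubling(heavy[1:-1]), 1))  # extremes thrown out: close to 150
```

The trimmed average drops because the lowest-rated player (H_0001) contributes an unusually large first step; dropping both extremes is the "throw out the high and low ratings" adjustment described above.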
It appears as a general rule of thumb (and is supported by empirical evidence in similar studies of other games) that the rating/resource curve is almost linear when you are far away from perfect play, but the tapering becomes pronounced as you approach perfect play. I suspect Lazarus at the highest level I tested is within a few hundred ELO points of perfect play. Perfect play is still a long way off, though, especially considering that Lazarus at the highest level was spending almost 4 hours on each 9x9 game!

My auto-tester stores all the data, including the configuration, in a single sqlite3 database. In it are the SGF game records, the individual results, and even the time spent on each move, and it's available to anyone who wants it upon request - so you can analyze the results for yourself and come to your own conclusions!

- Don

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/