Hi all, time for the first update with real tests from CGOS 9x9, using a
network trained on Odin AGZ-style exploratory self-play games played
without a network.
The I3 network was trained on the 75000 games from I1 and I2, in
parallel as the games were played.
I1 is 50000 very fast self-play games where Odin has a 90% prior on a
random legal move.
I2 is 25000 AlphaGo Zero exploration-style games using Odin with 5K
simulations and a small prior added to a random move.
I1 games are not completely random, but the bias toward the hard-coded Go
knowledge of Odin is small: the non-random priors sum to 10%.
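As a rough sketch of how such a prior mix could be built (this is not Odin's actual code; the function name, the parameters and the example priors are made up for illustration, and it assumes the 90% goes to a single randomly chosen legal move while the remaining 10% follows the hard-coded priors):

```python
import random


def mixed_priors(legal_moves, knowledge_priors, random_weight=0.9, rng=random):
    """Put `random_weight` of the prior on one randomly chosen legal move and
    spread the rest over all legal moves in proportion to `knowledge_priors`."""
    legal_moves = list(legal_moves)
    if not legal_moves:
        raise ValueError("no legal moves")
    total = sum(knowledge_priors.get(m, 0.0) for m in legal_moves)
    rest = 1.0 - random_weight
    if total > 0:
        priors = {m: rest * knowledge_priors.get(m, 0.0) / total
                  for m in legal_moves}
    else:
        # No knowledge available: spread the remainder uniformly.
        priors = {m: rest / len(legal_moves) for m in legal_moves}
    # One random legal move receives the large random prior on top.
    boosted = rng.choice(legal_moves)
    priors[boosted] += random_weight
    return priors


if __name__ == "__main__":
    # Hypothetical hard-coded priors for three legal moves.
    knowledge = {"C3": 0.5, "D4": 0.3, "E5": 0.2}
    print(mixed_priors(list(knowledge), knowledge, rng=random.Random(1)))
```

With random_weight=0.9 the priors still sum to 1, and the hard-coded knowledge contributes only the remaining 10%.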
MC evaluation is always the same. AGZ exploration only affects the priors
and how the move is selected after search (proportional to the number of
visits at the root).
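A minimal sketch of selecting a move in proportion to root visit counts (the function name and the dict layout are assumptions for illustration, not Odin's internals):

```python
import random


def select_move_by_visits(root_visits, rng=random):
    """Sample a move with probability proportional to its root visit count.

    `root_visits` maps each candidate move to the number of simulations
    that went through it at the root.
    """
    moves = list(root_visits)
    weights = [root_visits[m] for m in moves]
    if sum(weights) <= 0:
        raise ValueError("no visits recorded at the root")
    return rng.choices(moves, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical visit counts after a 5K-simulation search.
    visits = {"C3": 3500, "D4": 1200, "E5": 300}
    print(select_move_by_visits(visits, rng=random.Random(42)))
```

Full-strength play would instead pick the most visited move, e.g. max(visits, key=visits.get).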
NNOdin_1.1.4_E 2591 Elo. This is the state-of-the-art Odin: a single
thread for MC eval and a single thread for CPU NN evaluation of the best
policy NN I have, based on CGOS games.
NNOdin_1.1.8_I3 2413 Elo. This is the same as NNOdin_1.1.4_E but
uses the I3 network, using maximum time for search.
NNOdin_1.1.8_5K_E 2389 Elo. Uses the best old network, E, but limited
to 5K simulations.
NNOdin_1.1.8_5K_I3 2080 Elo. This is the version now using the I3
network for self-play games in iteration 3, but at full strength with no
exploration.
Odin_1.1.8_5K 1983 Elo. This is Odin relying on hard-coded priors, with
no NN and 5K simulations.
Odin_1.1.8_5K_N1 1809 Elo. This uses a random network that overwrites
the hard-coded priors of Odin and makes it play worse.
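To get a feel for what these rating gaps mean, here is a small sketch that converts an Elo difference into an expected score. It assumes the standard logistic Elo formula with a 400-point scale; CGOS computes its ratings with its own procedure, so treat the numbers as approximate.

```python
# Ratings as listed above.
RATINGS = {
    "NNOdin_1.1.4_E": 2591,
    "NNOdin_1.1.8_I3": 2413,
    "NNOdin_1.1.8_5K_E": 2389,
    "NNOdin_1.1.8_5K_I3": 2080,
    "Odin_1.1.8_5K": 1983,
    "Odin_1.1.8_5K_N1": 1809,
}


def expected_score(elo_a, elo_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


if __name__ == "__main__":
    pairs = [
        ("NNOdin_1.1.4_E", "NNOdin_1.1.8_I3"),
        ("NNOdin_1.1.8_5K_E", "NNOdin_1.1.8_5K_I3"),
        ("NNOdin_1.1.8_5K_I3", "Odin_1.1.8_5K"),
    ]
    for a, b in pairs:
        score = expected_score(RATINGS[a], RATINGS[b])
        print(f"{a} vs {b}: expected score {score:.2f}")
```

Under this model the 178-point gap between E and I3 at full search time corresponds to E scoring a bit under three games out of four.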
The data contain almost only unique positions: the branching factor is
so high that after 4-5 ply or so each variation is unique, as far as I
can tell using Drago.
I find that the network overall is already very decent, making a clear
distinction between playable and impossible moves. The best move (in
deeper search) is almost always among the top 5 candidates. But it is
also the case that the search often finds surprisingly strong moves with
a prior of 1% that the network cannot predict yet. It does not have any
complex tactical knowledge yet, just mostly good solid shape.
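The "top 5" impression could be checked with a simple hit-rate count. This is only a sketch: the sample layout (a dict of network priors plus the move chosen by a deeper search) and the function name are assumptions, not something Odin provides.

```python
def top_k_hit_rate(samples, k=5):
    """Fraction of positions where the deep-search move is among the k
    moves with the highest network prior.

    `samples` is a list of (priors, best_move) pairs, where `priors` maps
    each move to its network prior.
    """
    if not samples:
        raise ValueError("no samples")
    hits = 0
    for priors, best_move in samples:
        top_k = sorted(priors, key=priors.get, reverse=True)[:k]
        if best_move in top_k:
            hits += 1
    return hits / len(samples)


if __name__ == "__main__":
    # Two made-up positions: the second is a low-prior surprise move.
    samples = [
        ({"C3": 0.60, "D4": 0.30, "E5": 0.09, "A1": 0.01}, "D4"),
        ({"C3": 0.60, "D4": 0.30, "E5": 0.09, "A1": 0.01}, "A1"),
    ]
    print(top_k_hit_rate(samples, k=2))
```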
In summary the network has learned most basic obvious shapes for 9x9,
and search does not seem to get stuck in high prior moves.
My goal is to reach or even beat the E network, and so far I think the
first network surpassed my expectations. For I4 it will be exciting to
see whether we get another big step or just some incremental changes.
On 13x13: I2 is far from finished and the network loss is slowly
dropping.
On 19x19: I2 is halfway through (using 500 simulations and only 10000
games as a target) and the network seems to have partially learned the
hard coded prior of Odin plus some ugly contact move artifacts that I
expect/hope to be trained away soon.
Magnus