Hi all, time for the first update with real tests on CGOS 9x9, using a network trained on Odin's AGZ-style exploratory self-play games that were played without a network.

The I3 network is trained on 75000 games from I1 and I2, with training running in parallel while the games were being played. I1 consists of 50000 very fast self-play games in which Odin gives a random legal move a prior of 90%. I2 consists of 25000 AlphaGo Zero exploration-style games played by Odin with 5k simulations and a small prior added to a random move.

I1 games are not completely random, but the bias toward Odin's hard-coded Go knowledge is small: the non-random priors sum to 10%.
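One way to read the 90%/10% split for I1 is as a mixture: one uniformly chosen legal move receives 90% of the prior mass, and Odin's hard-coded priors are rescaled to share the remaining 10%. The sketch below assumes exactly that reading; `mix_priors` and the dictionary-of-priors representation are hypothetical, not Odin's actual code. For I2 the post only says a small prior is added to a random move, so calling this with a small `random_weight` is just one possible interpretation.

```python
import random

def mix_priors(hard_priors, legal_moves, random_weight=0.9, rng=random):
    """Hypothetical prior mixing: one random legal move gets `random_weight`
    of the prior mass; the hard-coded priors share the rest."""
    chosen = rng.choice(legal_moves)
    # Normalise the hard-coded priors over the legal moves only.
    total = sum(hard_priors.get(m, 0.0) for m in legal_moves)
    mixed = {}
    for m in legal_moves:
        if total > 0:
            base = hard_priors.get(m, 0.0) / total
        else:
            base = 1.0 / len(legal_moves)  # fall back to uniform
        # Non-random part: sums to (1 - random_weight) over all legal moves.
        mixed[m] = (1.0 - random_weight) * base
    # Random part: all of `random_weight` goes to the chosen move.
    mixed[chosen] += random_weight
    return mixed, chosen
```

For example, `mix_priors({"C3": 0.5, "D4": 0.3, "E5": 0.2}, ["C3", "D4", "E5"], rng=random.Random(0))` returns priors that still sum to 1, with at least 0.9 on the randomly chosen move.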

MC evaluation is always the same. AGZ exploration only affects the priors and how the move is selected after search (proportionally to the number of visits at the root).
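Selecting the move after search in proportion to root visit counts can be sketched as follows. This assumes the root statistics are available as a hypothetical `{move: visits}` dictionary; it uses only `random.choices` from the standard library.

```python
import random

def select_move_by_visits(root_visits, rng=random):
    """Sample a move with probability proportional to its visit count at the root.

    `root_visits` is a hypothetical {move: visit_count} mapping.
    """
    moves = list(root_visits)
    weights = [root_visits[m] for m in moves]
    if sum(weights) == 0:
        raise ValueError("no visits at root")
    # random.choices draws k items with the given relative weights.
    return rng.choices(moves, weights=weights, k=1)[0]
```

A move with zero visits is never chosen, while a move with twice the visits of another is, on average, played twice as often. Playing the most-visited move instead would be the non-exploratory, full-strength setting.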

- NNOdin_1.1.4_E, 2591 Elo: the state-of-the-art Odin, with a single thread for MC evaluation and a single thread for CPU evaluation of the best policy NN I have, which is trained on CGOS games.
- NNOdin_1.1.8_I3, 2413 Elo: the same as NNOdin_1.1.4_E but using the I3 network, with maximum time for search.
- NNOdin_1.1.8_5K_E, 2389 Elo: uses the best old network, E, but limited to 5K simulations.
- NNOdin_1.1.8_5K_I3, 2080 Elo: the version that now uses the I3 network for the self-play games in iteration 3, but at full strength with no exploration.
- Odin_1.1.8_5K, 1983 Elo: Odin relying on its hard-coded priors, with no NN and 5K simulations.
- Odin_1.1.8_5K_N1, 1809 Elo: uses a random network that overwrites Odin's hard-coded priors and makes it play worse.
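To get a feel for what these rating gaps mean, the standard logistic Elo model with a 400-point scale converts a rating difference into an expected score. This is an assumption for illustration; CGOS's exact rating computation may differ in details.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
```

Under this model, the 178-point gap between NNOdin_1.1.4_E (2591) and NNOdin_1.1.8_I3 (2413) corresponds to an expected score of roughly 74% for the E version.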

The data contains almost only unique positions: the branching factor is so high that after 4-5 plies or so every variation is unique, as far as I can tell using Drago.
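The uniqueness check can also be approximated in code instead of by browsing in Drago. The sketch below assumes each game is a hypothetical list of move strings and measures, for a given depth, the fraction of games whose opening sequence of that length appears in no other game. Note that it compares move sequences rather than board positions, so transpositions are not detected.

```python
from collections import Counter

def unique_prefix_fraction(games, depth):
    """Fraction of games whose first `depth` moves occur in no other game.

    `games` is a hypothetical list of move lists, e.g. [["E5", "C3", ...], ...].
    """
    if not games:
        return 0.0
    prefixes = [tuple(g[:depth]) for g in games]
    counts = Counter(prefixes)
    unique = sum(1 for p in prefixes if counts[p] == 1)
    return unique / len(games)
```

Evaluating this for depths 1, 2, 3, ... shows at which ply the fraction approaches 1, i.e. where the game tree stops sharing variations.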

I find that the network is already very decent overall, making a clear distinction between playable and impossible moves. The best move (according to deeper search) is almost always among the top 5 candidates. But it is also the case that search often finds surprisingly strong moves that the network gives a prior of only 1% and cannot predict yet. It does not have any complex tactical knowledge yet, just mostly good, solid shape.
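The "top 5 candidates" observation can be turned into a number with a top-k hit rate. The sketch assumes a hypothetical list of `(priors, best_move)` pairs, where `priors` maps moves to the network's prior and `best_move` is the move preferred by a deeper search.

```python
def top_k_hit_rate(samples, k=5):
    """Fraction of samples where the search's best move is among the k highest priors.

    `samples` is a hypothetical list of (priors_dict, best_move_after_search).
    """
    if not samples:
        return 0.0
    hits = 0
    for priors, best in samples:
        # Rank moves by prior, highest first, and keep the top k.
        ranked = sorted(priors, key=priors.get, reverse=True)[:k]
        if best in ranked:
            hits += 1
    return hits / len(samples)
```

Looking at the misses separately would isolate the low-prior moves (around 1%) that search finds but the network does not predict yet.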

In summary, the network has learned most of the basic, obvious shapes for 9x9, and search does not seem to get stuck on high-prior moves.

My goal is to reach or even beat the E network, and so far I think the first network has surpassed my expectations. For I4 it will be exciting to see whether we get another big step or just some incremental changes.

On 13x13: I2 is far from finished and the network loss is slowly dropping.

On 19x19: I2 is halfway through (using 500 simulations and a target of only 10000 games), and the network seems to have partially learned Odin's hard-coded prior plus some ugly contact-move artifacts that I expect (or hope) will be trained away soon.

Magnus
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
