Dear Petr,
many thanks for your experiments and your posting!
It is an important puzzle piece in understanding
Alpha-Zero.
Keep on with your good work.
Regards, Ingo.
> Sent: Friday, 10 November 2017, 01:47
> From: "Petr Baudis"
> To: computer-go@computer-go.org
> Subject: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication
>
> Hi,
>
> I got the first *somewhat* positive results in my attempt to reproduce
> AlphaGo Zero - a 25% winrate against GNUGo on the easiest reasonable
> task, the 7x7 board. :) a.k.a.
>
> "Sometimes beating GNUGo on a tiny board" without human knowledge
>
> (much wow!)
>
> Normally this would be a pretty weak result, but (A) I wanted to
> help calibrate other efforts on larger boards that are possibly still
> at the "random" stage, and (B) I'll probably move on to other projects
> again soon, so this might be as good as it gets for me.
>
> I started the project by replacing the MC simulations with a Keras model
> in my 550-line educational Go program Michi - it lived in Michi's `nnet`
> branch until now, when I separated it into a project of its own:
>
> https://github.com/rossumai/nochi
>
> Starting from a small base means that the codebase is tiny and should be
> easy to follow, though it's not at all as tidy as Michi is.
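The structural change described above - replacing Michi's random Monte Carlo playouts with a neural-network evaluation at the tree leaves - can be sketched roughly as follows. This is an illustration, not Nochi's actual API: the function names and the dummy value network are hypothetical.

```python
import random

BOARD_N = 7  # board size used in the experiments above

def simulate_random_playout(board):
    """Michi-style Monte Carlo evaluation: play random moves to the end
    of the game and score it. Stubbed here as a random +1/-1 outcome."""
    return random.choice([+1.0, -1.0])

def dummy_value_net(board):
    """Stand-in for the Keras model: maps a position to a value in [-1, 1].
    A real network would run convolutions over feature planes of the board."""
    return 0.0  # a freshly initialized net knows nothing

def evaluate_leaf(board, value_net=None):
    """The core swap: at a tree leaf, ask the network for a position value
    instead of running a random playout."""
    if value_net is not None:
        return value_net(board)            # AlphaGo-Zero-style evaluation
    return simulate_random_playout(board)  # classic Michi behaviour

board = [[0] * BOARD_N for _ in range(BOARD_N)]  # empty 7x7 board
v = evaluate_leaf(board, value_net=dummy_value_net)
```

With this swap the tree search itself stays the same; only the leaf evaluation changes, which is why the change fits into a small codebase.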
>
> You can grab the current training state (== a pickled, chronological
> archive of selfplay positions used for replay) and the neural network
> weights from the GitHub "Releases" page:
>
> https://github.com/rossumai/nochi/releases/tag/G171107T013304_00150
>
> This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> no supervision, and it contains no Monte Carlo simulations or other
> heuristics. But it's not entirely 1:1; I made some tweaks which I
> thought might help early convergence:
>
> * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
> * The neural network is updated after _every_ game, _twice_, on _all_
> of the game's positions plus 64 positions randomly sampled from the
> entire history; all of this is done four times - on the original
> position and its three symmetry flips (but I was too lazy to
> implement 90-degree rotations).
> * Instead of supplying the last 8 positions as the network input, I
> feed just the last position plus two indicator matrices showing
> the locations of the last and second-to-last moves.
> * No symmetry pruning during tree search.
> * The value function is trained with cross-entropy rather than MSE,
> no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> the annealing is reset from time to time due to manual restarts of
> the script from a checkpoint).
> * No auto-tuned resign threshold, but it is important to play 25% of
> games without resigning to escape local "optima".
> * 1/Temperature is 2 for the first three moves.
> * Initially I used 1000 "simulations" per move, but by mistake, the
> last 1500 games (during which the network improved significantly -
> see below) were run with 2000 simulations per move. So that might
> matter.
>
> This has been running for two weeks, self-playing 8500 games. A week
> ago its moves already looked a bit natural, but it was stuck in
> various local optima. Three days ago it beat GNUGo once across 20
> games; now it wins five times across 20 games - so I'll let it
> self-play a little longer, as it might surpass GNUGo quickly at this
> point. This late improvement also coincides with the increased
> simulation count.
>
> At the same time, Nochi supports supervised training (with the rest
> kept the same), which I'm now experimenting with on 19x19.
>
> Happy training,
>
> --
> Petr Baudis, Rossum
> Run before you walk! Fly before you crawl! Keep moving forward!
> If we fail, I'd rather fail really hugely. -- Moist von Lipwig
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go