Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-09 Thread Ingo Althöfer
Dear Petr,

many thanks for your experiments and your posting!
It is an important puzzle piece in understanding
AlphaGo Zero.

Keep up the good work.

Regards, Ingo.

[Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-09 Thread Petr Baudis
  Hi,

  I got the first *somewhat* positive results in my attempt to reproduce
AlphaGo Zero - a 25% winrate against GNUGo on the easiest reasonable
task, the 7x7 board. :)  a.k.a.

"Sometimes beating GNUGo on a tiny board" without human knowledge

(much wow!)

  Normally this would be a pretty weak result, but (A) I wanted to
help calibrate other efforts on larger boards that are possibly still
at the "random" stage, and (B) I'll probably move on to other projects
again soon, so this might be as good as it gets for me.

  I started the project by replacing the MC simulations with a Keras model
in my 550-line educational Go program Michi - it lived in Michi's `nnet`
branch until now, when I separated it into a project of its own:

https://github.com/rossumai/nochi

Starting from a small base means that the codebase is tiny and should be
easy to follow, though it's not nearly as tidy as Michi.
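
For concreteness, here is a minimal Keras sketch of this kind of network,
reconstructed from the description in this mail - the filter count, head
sizes and layer names are illustrative assumptions, not Nochi's actual code:

    # Minimal policy+value residual network for 7x7 Go (Keras 2.x).
    # All sizes are assumptions for illustration; the pass move is
    # omitted from the policy head for brevity.
    from keras.layers import (Input, Conv2D, BatchNormalization,
                              Activation, Add, Flatten, Dense)
    from keras.models import Model

    N = 7          # board size
    FILTERS = 64   # assumed filter count

    def residual_block(x):
        # Standard two-convolution residual block with a skip connection.
        y = Conv2D(FILTERS, 3, padding='same')(x)
        y = BatchNormalization()(y)
        y = Activation('relu')(y)
        y = Conv2D(FILTERS, 3, padding='same')(y)
        y = BatchNormalization()(y)
        return Activation('relu')(Add()([x, y]))

    # Input planes: the current position plus two indicator matrices
    # marking the last and second-to-last moves (the reduced input
    # described in the tweak list below).
    inp = Input(shape=(N, N, 3))
    x = Conv2D(FILTERS, 3, padding='same', activation='relu')(inp)
    for _ in range(7):      # 7 residual layers for 7x7
        x = residual_block(x)

    # Policy head: a distribution over the N*N board points.
    p = Conv2D(2, 1, activation='relu')(x)
    p = Dense(N * N, activation='softmax', name='policy')(Flatten()(p))

    # Value head: win probability in [0, 1], so it can be trained
    # with cross-entropy instead of MSE.
    v = Conv2D(1, 1, activation='relu')(x)
    v = Dense(64, activation='relu')(Flatten()(v))
    v = Dense(1, activation='sigmoid', name='value')(v)

    model = Model(inp, [p, v])
    model.compile(optimizer='adam',   # plain Adam, no hand-tuned SGD
                  loss={'policy': 'categorical_crossentropy',
                        'value': 'binary_crossentropy'})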

You can grab the current training state (== a pickled archive of self-play
positions used for replay, in chronological order) and the neural network
weights from the GitHub "Releases" page:

https://github.com/rossumai/nochi/releases/tag/G171107T013304_00150

  This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
no supervision, and it contains no Monte Carlo simulations or other
heuristics. But it's not an exact 1:1 replication; I made some tweaks
that I thought might help early convergence:

  * AlphaGo Zero used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
  * The neural network is updated after _every_ game, _twice_, on _all_
    of the game's positions plus 64 positions randomly sampled from the
    entire history, all of this done four times - on the original position
    and its three symmetry flips (I was too lazy to implement 90°
    rotations); see the first sketch after this list.
  * Instead of supplying the last 8 positions as the network input, I feed
    just the last position plus two indicator matrices showing the
    locations of the last and second-to-last moves.
  * No symmetry pruning during tree search.
  * The value function is trained with cross-entropy rather than MSE, with
    no L2 regularization, and with plain Adam rather than hand-tuned SGD
    (but the annealing is reset from time to time due to manual restarts
    of the script from a checkpoint).
  * No resign auto-threshold, but it is important to play 25% of games
    without resigning in order to escape local "optima".
  * 1/Temperature is 2 for the first three moves (see the second sketch
    after this list).
  * Initially I used 1000 "simulations" per move, but by mistake the last
    1500 games, during which the network improved significantly (see below),
    were run with 2000 simulations per move.  So that might matter.
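
To make the update scheme concrete, here is a rough sketch of the per-game
training step (function and variable names are illustrative assumptions,
not the actual Nochi code):

    # After each self-play game: train twice on all of the game's
    # positions plus 64 positions sampled from the whole history,
    # each expanded into four symmetry variants (identity + three
    # flips; no 90-degree rotations).
    import random
    import numpy as np

    def symmetries(planes, policy_2d):
        # Yield the position and its three axis flips.
        for flip in (lambda a: a,
                     lambda a: a[::-1, :],
                     lambda a: a[:, ::-1],
                     lambda a: a[::-1, ::-1]):
            yield flip(planes), flip(policy_2d)

    def update_after_game(model, game_positions, history,
                          n_sampled=64, epochs=2):
        history.extend(game_positions)  # archive for later replay
        batch = game_positions + random.sample(
            history, min(n_sampled, len(history)))
        X, P, V = [], [], []
        for planes, policy_2d, outcome in batch:
            for pl, po in symmetries(planes, policy_2d):
                X.append(pl)
                P.append(po.flatten())
                V.append(outcome)
        model.fit(np.array(X),
                  [np.array(P), np.array(V, dtype=float).reshape(-1, 1)],
                  epochs=epochs, verbose=0)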
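
And the move-selection temperature, sketched: moves are sampled with
probability proportional to N(a)^(1/tau), with 1/tau = 2 for the first
three moves (what happens afterwards is my assumption - greedy selection,
i.e. temperature going to zero):

    import numpy as np

    def pick_move(visit_counts, move_number, inv_tau=2.0):
        counts = np.asarray(visit_counts, dtype=float)
        if move_number < 3:
            # pi(a) proportional to N(a)^(1/tau), with 1/tau = 2
            probs = counts ** inv_tau
            probs /= probs.sum()
            return int(np.random.choice(len(counts), p=probs))
        return int(np.argmax(counts))  # assumed greedy afterwards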

  This has been running for two weeks, self-playing 8500 games.  A week
ago its moves already looked a bit natural, but it was stuck in various
local optima.  Three days ago it beat GNUGo once across 20 games; now it
wins five out of 20 - so I'll let it self-play a little longer, as it
might surpass GNUGo quickly at this point.  Also, this late improvement
coincides with the increased simulation count.

  At the same time, Nochi supports supervised training (with the rest
kept the same), which I'm now experimenting with on 19x19.

  Happy training,

-- 
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go