Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-15 Thread Gian-Carlo Pascutto
On 11-11-17 00:58, Petr Baudis wrote:
>>>   * The neural network is updated after _every_ game, _twice_, on _all_
>>> positions plus 64 randomly sampled positions from the entire history,
>>> this all done four times - on original position and the three
>>> symmetry flips (but I was too lazy to implement 90\deg rotation).
>>
>> The reasoning being to give a stronger and faster reinforcement with the
>> latest data?
> 
> Yes.

One thing I wonder about: given the huge size of the network and the
strong reinforcement, don't you get total overfitting?

I guess the next few games will quickly "point out" the overfit, but I
still wonder whether keeping the overfitting under control wouldn't be
better than the see-sawing this would seem to cause.

-- 
GCP

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Petr Baudis
On Fri, Nov 10, 2017 at 03:40:27PM +0100, Gian-Carlo Pascutto wrote:
> On 10/11/2017 1:47, Petr Baudis wrote:
> 
> >   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
> 
> How many filters per layer?

256 like AlphaGo.

> FWIW a 7-layer resnet (14 + 2 layers) is still pretty huge - larger than
> the initial AlphaGo. Given the number of games you have, and the size of
> the board, I would not be surprised if your neural net program is
> "outbooking" the opponent by remembering the sequences rather than
> learning more generic things.
> 
> (But hey, outbooking is learning too!)

I couldn't exclude this, yes.  It would be interesting to try to use the
same convolutions on a bigger board to see if they play shapes and can do
basic tactics.

> >   * The neural network is updated after _every_ game, _twice_, on _all_
> > positions plus 64 randomly sampled positions from the entire history,
> > this all done four times - on original position and the three
> > symmetry flips (but I was too lazy to implement 90\deg rotation).
> 
> The reasoning being to give a stronger and faster reinforcement with the
> latest data?

Yes.

> >   * Value function is trained with cross-entropy rather than MSE,
> > no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> > the annealing is reset from time to time due to manual restarts of the
> > script from a checkpoint).
> 
> I never really had good results with Adam and friends compared to SGD
> (even momentum does not always help - but of course it's much faster
> early on).

It has worked great on all my neural models in other tasks - but this is
actually my first neural model for Go. :)

> >   * No resign auto-threshold, but it is important to play 25% of games
> > without resigning to escape local "optima".
> 
> This makes sense because both sides will miscount in exactly the same way.

Without this, producing value 1.0 for one color and 0.0 for the other
is a super-strong attractor.

> >   * 1/Temperature is 2 for first three moves.
> >   * Initially I used 1000 "simulations" per move, but by mistake, last
> > 1500 games when the network improved significantly (see below) were
> > run with 2000 simulations per move.  So that might matter.
> > 
> >   This has been running for two weeks, self-playing 8500 games.  A week
> > ago its moves already looked a bit natural but it was stuck in various
> > local optima.  Three days ago it beat GNUGo once across 20 games.
> > Now five times across 20 games - so I'll let it self-play a little longer
> > as it might surpass GNUGo quickly at this point?  Also this late
> > improvement coincides with the increased simulation number.
> 
> The simulation number is one of the big black boxes in this setup, I
> think. If the policy network does not have a strong opinion yet, it
> seems that one has to make it sufficiently bigger than the number of
> legal moves. If not, first-play-urgency will cause every successor
> position to be evaluated once and there's no lookahead, which means MCTS
> can't discover anything.

I don't see how first-play-urgency comes into play.  Initially it'll
typically be noise, but that still means growing the tree pretty
asymmetrically.  I saw uniform sampling only in some cases where the
number of simulations was << the number of children.
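
For reference, the selection rule we are both talking about is roughly
this (a hedged sketch; the exploration constant and the FPU convention
are assumed for illustration, not AlphaGo Zero's or Nochi's exact
choices):

    import math

    C_PUCT = 1.5   # assumed exploration constant, for illustration only

    def select_child(node, fpu_value=1.0):
        # node.children: objects with .visits, .value_sum, .prior (assumed interface).
        # Pick the child maximizing Q + U.  Unvisited children get an optimistic
        # first-play-urgency Q, so with a simulation budget that is small relative
        # to the number of legal moves, the search tends to spread one visit over
        # every child before it can deepen anywhere.
        total_visits = sum(child.visits for child in node.children)
        best, best_score = None, float('-inf')
        for child in node.children:
            if child.visits == 0:
                q = fpu_value                      # optimistic FPU value
            else:
                q = child.value_sum / child.visits # mean value of evaluations so far
            u = C_PUCT * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
            if q + u > best_score:
                best, best_score = child, q + u
        return best

With an optimistic fpu_value and a budget on the order of the number of
children, the unvisited children tend to win the argmax, which is the
effect described above; with noisy priors the tree still grows unevenly.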

> So a few times 361 makes sense for 19x19, but don't ask me why 1600 and
> not 1200 etc.

My feeling now is that raising the count really helps, especially
slightly later on.  I think the moment is when you stop seeing regular,
large discrepancies between the network predictions and the scoring
output in the very late endgame.  But it could be an illusion.
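
Concretely, the kind of discrepancy check I mean could look like this
(a rough sketch with an assumed data layout, not Nochi's actual code):

    import numpy as np

    def endgame_value_gap(model, finished_games, last_n=5):
        # finished_games: list of games, each a list of (planes, outcome) pairs,
        # where outcome in [0, 1] is the final scored result from the side to
        # move's point of view.  Returns the mean |value prediction - outcome|
        # over the last `last_n` positions of each game; regular large gaps mean
        # the value head still disagrees with plain scoring in the late endgame.
        gaps = []
        for game in finished_games:
            for planes, outcome in game[-last_n:]:
                _, value = model.predict(planes[np.newaxis])
                gaps.append(abs(float(value) - outcome))
        return float(np.mean(gaps))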

> With only 50-ish moves to consider on 7x7, it's interesting that you see
> a big improvement by making it (relatively) much larger than DeepMind did.
> 
> But uh, you're not simply matching it against GNUGo with more
> simulations, are you? I mean it would be quite normal to win more when
> searching deeper.

All playtests should have been with 2000 simulations.

-- 
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Gian-Carlo Pascutto
On 10/11/2017 1:47, Petr Baudis wrote:

>   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.

How many filters per layer?

FWIW a 7-layer resnet (14 + 2 layers) is still pretty huge - larger than
the initial AlphaGo. Given the number of games you have, and the size of
the board, I would not be surprised if your neural net program is
"outbooking" the opponent by remembering the sequences rather than
learning more generic things.

(But hey, outbooking is learning too!)

>   * The neural network is updated after _every_ game, _twice_, on _all_
> positions plus 64 randomly sampled positions from the entire history,
> this all done four times - on original position and the three
> symmetry flips (but I was too lazy to implement 90\deg rotation).

The reasoning being to give a stronger and faster reinforcement with the
latest data?

>   * Value function is trained with cross-entropy rather than MSE,
> no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> the annealing is reset from time to time due to manual restarts of the
> script from a checkpoint).

I never really had good results with Adam and friends compared to SGD
(even momentum does not always help - but of course it's much faster
early on).

>   * No resign auto-threshold, but it is important to play 25% of games
> without resigning to escape local "optima".

This makes sense because both sides will miscount in exactly the same way.

>   * 1/Temperature is 2 for first three moves.
>   * Initially I used 1000 "simulations" per move, but by mistake, last
> 1500 games when the network improved significantly (see below) were
> run with 2000 simulations per move.  So that might matter.
> 
>   This has been running for two weeks, self-playing 8500 games.  A week
> ago its moves already looked a bit natural but it was stuck in various
> local optima.  Three days ago it beat GNUGo once across 20 games.
> Now five times across 20 games - so I'll let it self-play a little longer
> as it might surpass GNUGo quickly at this point?  Also this late
> improvement coincides with the increased simulation number.

The simulation number is one of the big black boxes in this setup, I
think. If the policy network does not have a strong opinion yet, it
seems that one has to make it sufficiently bigger than the number of
legal moves. If not, first-play-urgency will cause every successor
position to be evaluated once and there's no lookahead, which means MCTS
can't discover anything.

So a few times 361 makes sense for 19x19, but don't ask me why 1600 and
not 1200 etc.

With only 50-ish moves to consider on 7x7, it's interesting that you see
a big improvement by making it (relatively) much larger than DeepMind did.

But uh, you're not simply matching it against GNUGo with more
simulations, are you? I mean it would be quite normal to win more when
searching deeper.

-- 
GCP

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Álvaro Begué
It's a model written using the Keras neural network library:
https://en.wikipedia.org/wiki/Keras
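
To make it concrete, here is a minimal sketch of a small residual
policy/value network in Keras along the lines Petr describes (7 residual
blocks, 256 filters, a sigmoid value head trained with cross-entropy).
This is illustrative only, not Nochi's actual code:

    from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                              add, Flatten, Dense)
    from keras.models import Model

    def residual_block(x, filters=256):
        y = Conv2D(filters, 3, padding='same')(x)
        y = BatchNormalization()(y)
        y = Activation('relu')(y)
        y = Conv2D(filters, 3, padding='same')(y)
        y = BatchNormalization()(y)
        y = add([x, y])                     # skip connection
        return Activation('relu')(y)

    def build_model(board_size=7, blocks=7, filters=256):
        # Input: e.g. the current position plus two last-move indicator planes.
        inp = Input(shape=(board_size, board_size, 3))
        x = Conv2D(filters, 3, padding='same')(inp)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        for _ in range(blocks):
            x = residual_block(x, filters)
        # Policy head: a distribution over board points (pass move omitted here).
        p = Flatten()(Conv2D(2, 1)(x))
        policy = Dense(board_size * board_size, activation='softmax', name='policy')(p)
        # Value head: win probability in [0, 1], trained with cross-entropy.
        v = Dense(64, activation='relu')(Flatten()(Conv2D(1, 1)(x)))
        value = Dense(1, activation='sigmoid', name='value')(v)
        return Model(inp, [policy, value])

    model = build_model()
    model.compile(optimizer='adam',
                  loss={'policy': 'categorical_crossentropy',
                        'value': 'binary_crossentropy'})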


On Fri, Nov 10, 2017 at 7:09 AM, Xavier Combelle wrote:

> You make me really curious,
> what is a Keras model?
>
> On 10/11/2017 at 01:47, Petr Baudis wrote:
> >   Hi,
> >
> >   I got the first *somewhat* positive results in my attempt to reproduce
> > AlphaGo Zero - 25% winrate against GNUGo on the easiest reasonable task
> > - 7x7 board. :)  a.k.a.
> >
> >   "Sometimes beating GNUGo on a tiny board" without human knowledge
> >
> > (much wow!)
> >
> >   Normally this would be a pretty weak result, but (A) I wanted to
> > help calibrate other efforts on larger boards that are possibly still
> > at the "random" stage, and (B) I'll probably move on to other projects
> > again soon, so this might be as good as it gets for me.
> >
> >   I started the project by replacing MC simulations with a Keras model
> > in my 550-line educational Go program Michi - it lived in its `nnet`
> > branch until now, when I separated it into a project of its own:
> >
> >   https://github.com/rossumai/nochi
> >
> > Starting from a small base means that the codebase is tiny and should be
> > easy to follow, though it's not at all as tidy as Michi is.
> >
> > You can grab the current training state (== pickled archive of selfplay
> > positions used for replay, chronological) and neural network weights
> > from the GitHub "Releases" page:
> >
> >   https://github.com/rossumai/nochi/releases/tag/G171107T013304_00150
> >
> >   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> > no supervision, and it contains no Monte Carlo simulations or other
> > heuristics. But it's not entirely 1:1, I did some tweaks which I thought
> > might help early convergence:
> >
> >   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
> >   * The neural network is updated after _every_ game, _twice_, on _all_
> > positions plus 64 randomly sampled positions from the entire history,
> > this all done four times - on original position and the three
> > symmetry flips (but I was too lazy to implement 90\deg rotation).
> >   * Instead of supplying last 8 positions as the network input I feed
> > just the last position plus two indicator matrices showing
> > the location of the last and second-to-last move.
> >   * No symmetry pruning during tree search.
> >   * Value function is trained with cross-entropy rather than MSE,
> > no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> > the annealing is reset from time to time due to manual restarts of the
> > script from a checkpoint).
> >   * No resign auto-threshold, but it is important to play 25% of games
> > without resigning to escape local "optima".
> >   * 1/Temperature is 2 for first three moves.
> >   * Initially I used 1000 "simulations" per move, but by mistake, last
> > 1500 games when the network improved significantly (see below) were
> > run with 2000 simulations per move.  So that might matter.
> >
> >   This has been running for two weeks, self-playing 8500 games.  A week
> > ago its moves already looked a bit natural but it was stuck in various
> > local optima.  Three days ago it beat GNUGo once across 20 games.
> > Now five times across 20 games - so I'll let it self-play a little longer
> > as it might surpass GNUGo quickly at this point?  Also this late
> > improvement coincides with the increased simulation number.
> >
> >   At the same time, Nochi supports supervised training (with the rest
> > kept the same) which I'm now experimenting with on 19x19.
> >
> >   Happy training,
> >
>

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Darren Cook
> You make me really curious, what is a Keras model?

When I was a lad, you had to bike 3 miles (uphill in both directions) to
the library to satisfy curiosity. Nowadays you just type "keras" into
Google ;-)

https://keras.io/

Darren

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Xavier Combelle
You make me really curious,
what is a Keras model?

On 10/11/2017 at 01:47, Petr Baudis wrote:
>   Hi,
>
>   I got the first *somewhat* positive results in my attempt to reproduce
> AlphaGo Zero - 25% winrate against GNUGo on the easiest reasonable task
> - 7x7 board. :)  a.k.a.
>
>   "Sometimes beating GNUGo on a tiny board" without human knowledge
>
> (much wow!)
>
>   Normally this would be a pretty weak result, but (A) I wanted to
> help calibrate other efforts on larger boards that are possibly still
> at the "random" stage, and (B) I'll probably move on to other projects
> again soon, so this might be as good as it gets for me.
>
>   I started the project by replacing MC simulations with a Keras model
> in my 550-line educational Go program Michi - it lived in its `nnet`
> branch until now, when I separated it into a project of its own:
>
>   https://github.com/rossumai/nochi
>
> Starting from a small base means that the codebase is tiny and should be
> easy to follow, though it's not at all as tidy as Michi is.
>
> You can grab the current training state (== pickled archive of selfplay
> positions used for replay, chronological) and neural network weights
> from the GitHub "Releases" page:
>
>   https://github.com/rossumai/nochi/releases/tag/G171107T013304_00150
>
>   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> no supervision, and it contains no Monte Carlo simulations or other
> heuristics. But it's not entirely 1:1, I did some tweaks which I thought
> might help early convergence:
>
>   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
>   * The neural network is updated after _every_ game, _twice_, on _all_
> positions plus 64 randomly sampled positions from the entire history,
> this all done four times - on original position and the three
> symmetry flips (but I was too lazy to implement 90\deg rotation).
>   * Instead of supplying last 8 positions as the network input I feed
> just the last position plus two indicator matrices showing
> the location of the last and second-to-last move.
>   * No symmetry pruning during tree search.
>   * Value function is trained with cross-entropy rather than MSE,
> no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> the annealing is reset time by time due to manual restarts of the
> script from a checkpoint).
>   * No resign auto-threshold, but it is important to play 25% of games
> without resigning to escape local "optima".
>   * 1/Temperature is 2 for first three moves.
>   * Initially I used 1000 "simulations" per move, but by mistake, last
> 1500 games when the network improved significantly (see below) were
> run with 2000 simulations per move.  So that might matter.
>
>   This has been running for two weeks, self-playing 8500 games.  A week
> ago its moves already looked a bit natural but it was stuck in various
> local optima.  Three days ago it beat GNUGo once across 20 games.
> Now five times across 20 games - so I'll let it self-play a little longer
> as it might surpass GNUGo quickly at this point?  Also this late
> improvement coincides with the increased simulation number.
>
>   At the same time, Nochi supports supervised training (with the rest
> kept the same) which I'm now experimenting with on 19x19.
>
>   Happy training,
>


Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Petr Baudis
On Fri, Nov 10, 2017 at 01:47:17AM +0100, Petr Baudis wrote:
>   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> no supervision, and it contains no Monte Carlo simulations or other
> heuristics. But it's not entirely 1:1, I did some tweaks which I thought
> might help early convergence:

  * The usual MC rule of not filling one's true eye is used - eye-filling
moves are never considered by Nochi.  Rather than encoding knowledge,
I prefer to think about this as a tweaked Go ruleset, but YMMV. :)
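
For completeness, the check is roughly of this shape (a simplified
sketch, not Nochi's exact code; the board helpers are assumptions):

    def is_true_eye(board, pt, color):
        # Rough single-point "true eye" test: every neighbour is our own stone,
        # and the opponent holds at most one diagonal (none if pt is on the edge).
        # Moves onto such points are simply never generated.
        neighbours = board.neighbours(pt)              # assumed helper: adjacent points
        if any(board.color_at(n) != color for n in neighbours):
            return False
        diagonals = board.diagonals(pt)                # assumed helper: diagonal points
        opponent = board.opponent(color)               # assumed helper
        bad = sum(1 for d in diagonals if board.color_at(d) == opponent)
        on_edge = len(neighbours) < 4
        return bad == 0 if on_edge else bad <= 1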

Petr Baudis

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-09 Thread Ingo Althöfer
Dear Petr,

many thanks for your experiments and your posting!
It is an important puzzle piece in understanding 
Alpha-Zero.

Keep up the good work.

Regards, Ingo.



> Sent: Friday, 10 November 2017 at 01:47
> From: "Petr Baudis" <pa...@ucw.cz>
> To: computer-go@computer-go.org
> Subject: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication
>
>   Hi,
> 
>   I got the first *somewhat* positive results in my attempt to reproduce
> AlphaGo Zero - 25% winrate against GNUGo on the easiest reasonable task
> - 7x7 board. :)  a.k.a.
> 
>   "Sometimes beating GNUGo on a tiny board" without human knowledge
> 
> (much wow!)
> 
>   Normally this would be a pretty weak result, but (A) I wanted to
> help calibrate other efforts on larger boards that are possibly still
> at the "random" stage, and (B) I'll probably move on to other projects
> again soon, so this might be as good as it gets for me.
> 
>   I started the project by replacing MC simulations with a Keras model
> in my 550-line educational Go program Michi - it lived in its `nnet`
> branch until now, when I separated it into a project of its own:
> 
>   https://github.com/rossumai/nochi
> 
> Starting from a small base means that the codebase is tiny and should be
> easy to follow, though it's not at all as tidy as Michi is.
> 
> You can grab the current training state (== pickled archive of selfplay
> positions used for replay, chronological) and neural network weights
> from the GitHub "Releases" page:
> 
>   https://github.com/rossumai/nochi/releases/tag/G171107T013304_00150
> 
>   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> no supervision, and it contains no Monte Carlo simulations or other
> heuristics. But it's not entirely 1:1, I did some tweaks which I thought
> might help early convergence:
> 
>   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
>   * The neural network is updated after _every_ game, _twice_, on _all_
> positions plus 64 randomly sampled positions from the entire history,
> this all done four times - on original position and the three
> symmetry flips (but I was too lazy to implement 90\deg rotation).
>   * Instead of supplying last 8 positions as the network input I feed
> just the last position plus two indicator matrices showing
> the location of the last and second-to-last move.
>   * No symmetry pruning during tree search.
>   * Value function is trained with cross-entropy rather than MSE,
> no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> the annealing is reset time by time due to manual restarts of the
> script from a checkpoint).
>   * No resign auto-threshold, but it is important to play 25% of games
> without resigning to escape local "optima".
>   * 1/Temperature is 2 for first three moves.
>   * Initially I used 1000 "simulations" per move, but by mistake, last
> 1500 games when the network improved significantly (see below) were
> run with 2000 simulations per move.  So that might matter.
> 
>   This has been running for two weeks, self-playing 8500 games.  A week
> ago its moves already looked a bit natural but it was stuck in various
> local optima.  Three days ago it beat GNUGo once across 20 games.
> Now five times across 20 games - so I'll let it self-play a little longer
> as it might surpass GNUGo quickly at this point?  Also this late
> improvement coincides with the increased simulation number.
> 
>   At the same time, Nochi supports supervised training (with the rest
> kept the same) which I'm now experimenting with on 19x19.
> 
>   Happy training,
> 
> -- 
>   Petr Baudis, Rossum
>   Run before you walk! Fly before you crawl! Keep moving forward!
>   If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

[Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-09 Thread Petr Baudis
  Hi,

  I got the first *somewhat* positive results in my attempt to reproduce
AlphaGo Zero - 25% winrate against GNUGo on the easiest reasonable task
- 7x7 board. :)  a.k.a.

"Sometimes beating GNUGo on a tiny board" without human knowledge

(much wow!)

  Normally this would be a pretty weak result, but (A) I wanted to
help calibrate other efforts on larger boards that are possibly still
at the "random" stage, and (B) I'll probably move on to other projects
again soon, so this might be as good as it gets for me.

  I started the project by replacing MC simulations with a Keras model
in my 550-line educational Go program Michi - it lived in its `nnet`
branch until now, when I separated it into a project of its own:

https://github.com/rossumai/nochi

Starting from a small base means that the codebase is tiny and should be
easy to follow, though it's not at all as tidy as Michi is.

You can grab the current training state (== pickled archive of selfplay
positions used for replay, chronological) and neural network weights
from the GitHub "Releases" page:

https://github.com/rossumai/nochi/releases/tag/G171107T013304_00150

  This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
no supervision, and it contains no Monte Carlo simulations or other
heuristics. But it's not entirely 1:1, I did some tweaks which I thought
might help early convergence:

  * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
  * The neural network is updated after _every_ game, _twice_, on _all_
positions plus 64 randomly sampled positions from the entire history,
this all done four times - on the original position and the three
symmetry flips (but I was too lazy to implement 90\deg rotation); see
the first sketch after this list.
  * Instead of supplying last 8 positions as the network input I feed
just the last position plus two indicator matrices showing
the location of the last and second-to-last move.
  * No symmetry pruning during tree search.
  * Value function is trained with cross-entropy rather than MSE,
no L2 regularization, and plain Adam rather than hand-tuned SGD (but
the annealing is reset from time to time due to manual restarts of the
script from a checkpoint).
  * No resign auto-threshold, but it is important to play 25% of games
without resigning to escape local "optima".
  * 1/Temperature is 2 for the first three moves (see the second sketch
after this list).
  * Initially I used 1000 "simulations" per move, but by mistake, last
1500 games when the network improved significantly (see below) were
run with 2000 simulations per move.  So that might matter.
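
To make the update scheme above concrete, here is a rough sketch (the
array layouts and names are assumptions, not the actual Nochi code):

    import random
    import numpy as np

    def symmetry_flips(planes, policy, board_size=7):
        # The original position plus the three axis flips (no 90-degree
        # rotations).  planes is (H, W, C); policy is a flat length-H*W
        # move distribution.
        pol2d = policy.reshape(board_size, board_size)
        out = []
        for f in (lambda a: a, np.fliplr, np.flipud,
                  lambda a: np.flipud(np.fliplr(a))):
            out.append((f(planes), f(pol2d).reshape(-1)))
        return out

    def post_game_update(model, game_positions, history, sample_size=64, epochs=2):
        # Train after every finished game: all of its positions plus
        # `sample_size` positions drawn from the entire history, twice,
        # each expanded into the four symmetry variants.
        batch = list(game_positions) + random.sample(history, min(sample_size, len(history)))
        X, P, V = [], [], []
        for planes, policy, outcome in batch:
            for planes_f, policy_f in symmetry_flips(planes, policy):
                X.append(planes_f)
                P.append(policy_f)
                V.append(outcome)   # final result from the side to move's view
        model.fit(np.array(X), [np.array(P), np.array(V)], epochs=epochs, verbose=0)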
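And the temperature rule, again just a sketch assuming moves are picked
from the root visit counts:

    import numpy as np

    def pick_move(visit_counts, move_number):
        # Sample proportionally to visit_count ** (1/T) with 1/T = 2 for the
        # first three moves, which sharpens but does not freeze the choice;
        # afterwards just take the most-visited child.
        counts = np.asarray(visit_counts, dtype=float)
        if move_number < 3:
            probs = counts ** 2
            probs /= probs.sum()
            return int(np.random.choice(len(counts), p=probs))
        return int(np.argmax(counts))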

  This has been running for two weeks, self-playing 8500 games.  A week
ago its moves already looked a bit natural but it was stuck in various
local optima.  Three days ago it beat GNUGo once across 20 games.
Now five times across 20 games - so I'll let it self-play a little longer
as it might surpass GNUGo quickly at this point?  Also this late
improvement coincides with the increased simulation number.

  At the same time, Nochi supports supervised training (with the rest
kept the same) which I'm now experimenting with on 19x19.

  Happy training,

-- 
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig