[Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

Aya reaches pro level on GoQuest 9x9 and 13x13.
Aya got the highest rating in 9x9, and the highest best rating in 13x13.
GoQuest is a Go app for Android, iPhone and browser.

In 9x9 and 13x13, Aya uses a Policy network and a Value network.
The Policy net is the same one as for 19x19.
It was trained on 78000 GoGoD games, using 8 symmetries, 120,000,000 positions.
Training took one month with a GTX 980. Accuracy is 51.0%.
12 Layers, 128 Filters.
128 5x5 x1, 128 3x3 x10, 128 3x3 x1
Features are 49 channels.
The network is fully convolutional, so it can also be used on 9x9 and 13x13.
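
A small sketch of why a fully convolutional net carries over to other board sizes: with stride 1 and "same" zero padding, every layer keeps the board size. The padding values (2 for 5x5, 1 for 3x3) are an assumption for illustration; the post gives only the kernel sizes and layer counts.

```python
def conv_output_size(board_size, kernel, padding):
    """Spatial size after one stride-1 convolution."""
    return board_size + 2 * padding - kernel + 1

# Kernel sizes from the post: one 5x5 layer, then eleven 3x3 layers.
# The paddings (2 and 1) are assumed, not stated in the post.
LAYERS = [(5, 2)] + [(3, 1)] * 11

def final_size(board_size):
    """Board size seen at the output of the last convolution."""
    size = board_size
    for kernel, padding in LAYERS:
        size = conv_output_size(size, kernel, padding)
    return size

for n in (9, 13, 19):
    print(n, "->", final_size(n))
```

Because no layer depends on a fixed input size, the same weights can be applied to a 9x9, 13x13 or 19x19 input.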

DCNN without search is +580 (19x19), +448 (13x13) and +393 (9x9) stronger than
GNU Go (CGOS BayesElo).

9x9
DCNN_AyaF128a510x1  2193
Gnugo-3.7.10-a1  1800

13x13
DCNN_AyaF128a510x1  2248
Gnugo-3.7.10-a1  1800

19x19
DCNN_AyaF128a510x1  2380
Gnugo-3.7.10-a1  1800
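
To get a feeling for these rating gaps, here is a rough conversion from an Elo difference to an expected score. It uses the plain logistic Elo formula; BayesElo's own model has extra parameters (draws, first-move advantage), so this is only an approximation.

```python
def expected_score(elo_diff):
    """Expected score of the stronger side under the plain logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Rating gaps over GNU Go quoted in the post.
for label, diff in [("9x9", 393), ("13x13", 448), ("19x19", 580)]:
    print(label, round(expected_score(diff), 3))
```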

Value Net is 32 Filters, 14 Layers.
32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connected 256, fully connected tanh 1
Features are 50 channels.
Training positions are made from Aya's selfplay: 2,200,000 games for 9x9 and
1,000,000 games for 13x13. 16 positions are selected from each game.
9x9   is 2000 playout/move. komi 7.0. (CGOS 2290).
13x13 is  500 playout/move. Only root is created by Policy Net. komi 7.5. (CGOS 
In 9x9, an opening book built from 8607 GoQuest games is used.
In 13x13, the first 16 moves are selected according to the Policy net probabilities.
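
A minimal sketch of picking opening moves from the policy output, assuming "selected from Policy net probability" means sampling in proportion to the probabilities. The `policy` dictionary and the move names are made up for illustration; a real run would query the net after every move.

```python
import random

def sample_move(policy, rng=random):
    """Pick one move with probability proportional to its policy output."""
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

# Hypothetical policy output for three candidate moves.
policy = {"E5": 0.6, "C3": 0.3, "G7": 0.1}
rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw 16 samples, as for the first 16 moves of a 13x13 selfplay game.
opening = [sample_move(policy, rng) for _ in range(16)]
print(opening)
```

Sampling instead of always taking the top move gives the selfplay games different openings.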

At first, I used the playout winrate as training data. If Black's winrate at
move 24 is 59%, the target is set to 0.59. But this is weaker than using the
game result, 0 or 1.

Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9, komi 7.0
0.634  using game result. 0 or 1
0.552  using game result. Cubic approximation.
0.625  using game result. Linear approximation.
0.641  using game result. 0 or 1, dropout, half, all layers
0.554  using playout winrate
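
When comparing these rows it helps to know how much noise 1000 games leave. The sketch below gives the binomial standard error and the Elo difference implied by a winrate. It assumes independent games and that every row was measured over 1000 games, as the header suggests.

```python
import math

def winrate_stats(winrate, games):
    """Return (standard error, implied Elo difference) for a measured winrate."""
    stderr = math.sqrt(winrate * (1.0 - winrate) / games)
    elo = 400.0 * math.log10(winrate / (1.0 - winrate))
    return stderr, elo

rows = [("game result 0 or 1", 0.634),
        ("0 or 1 with dropout", 0.641),
        ("playout winrate", 0.554)]
for label, rate in rows:
    stderr, elo = winrate_stats(rate, 1000)
    print(f"{label}: +/-{stderr:.3f}, {elo:+.0f} Elo")
```

With a standard error of about 0.015, the gap between 0.634 and 0.641 is within noise, while the gap to 0.554 is not.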

Linear approximation means: if a game ends after 60 moves and the result is
a W win (0.0), then the value of the position at move 30 is 0.25.
Linear approximation does reduce training loss, though (from 0.37 to 0.08 on
19x19, with B win +1.0, W win -1.0).
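
The linear approximation can be written as a one-line interpolation. This sketch assumes the target starts at 0.5 on the empty board, which is what the 60-move example implies; the function and parameter names are mine.

```python
def linear_target(move, total_moves, result, start=0.5):
    """Interpolate from `start` at move 0 to the game result at the last move."""
    return start + (result - start) * move / total_moves

# Example from the post: 60-move game, White wins (0.0), position at move 30.
print(linear_target(30, 60, 0.0))  # prints 0.25
```

Using the game result directly corresponds to returning `result` for every position of the game.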

Policy + Value vs Policy, 1000 playouts/move, 13x13, komi 7.5
0.735 1000 playouts/move, 994 games

Compared with 9x9, it seems stronger selfplay makes a stronger value net.

I also made a 19x19 Value net. The 19x19 training positions are from KGS 4d and
above, GoGoD, Tygem and 500 playouts/move selfplay: 990255 games. 32 positions
are selected from each game. Like Detlef's idea, I also use the game result.
I trust B+R and W+R games with komi 5.5, 6.5 and 7.5. In other games,
if the result is B+ and 1000 playouts at the final position give over +0.60, I use it.
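
One possible reading of this filter, as a sketch. The assumptions are: results are SGF-style strings, the playout value is on a scale where positive favours Black, and White wins are checked with the mirrored threshold. The post states only the B+ case, so the W+ branch is a guess.

```python
TRUSTED_KOMI = {5.5, 6.5, 7.5}

def keep_game(result, komi, final_value=None, threshold=0.60):
    """Decide whether a game record is used for value-net training.

    result: SGF-style string such as "B+R", "W+R" or "B+3.5".
    final_value: playout estimate at the final position, positive for Black.
    """
    if result in ("B+R", "W+R"):
        return komi in TRUSTED_KOMI
    if final_value is None:
        return False
    if result.startswith("B+"):
        return final_value > threshold
    if result.startswith("W+"):
        # Assumption: mirror of the B+ rule; not stated in the post.
        return final_value < -threshold
    return False

print(keep_game("B+R", 6.5))         # resignation with trusted komi
print(keep_game("B+3.5", 6.5, 0.7))  # scored game confirmed by playouts
print(keep_game("B+3.5", 6.5, 0.5))  # scored game the playouts do not confirm
```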

Policy + Value vs Policy, 19x19, komi 7.5, Filter  32, Layer 14
0.640  1000 playouts/move, 995 games
0.654  1000 playouts/move, 500 games, explicit symmetry ensemble(Value net only)
0.635  1000 playouts/move, 818 games, Linear approximation
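
A sketch of what an explicit symmetry ensemble could look like: evaluate all 8 rotations and reflections of the position and average the outputs. `toy_evaluate` is a made-up stand-in for the value net, and the 3x3 board is only for illustration.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def mirror(board):
    """Flip a board left to right."""
    return [row[::-1] for row in board]

def symmetries(board):
    """Return the 8 rotations/reflections of a square board."""
    out = []
    current = [list(row) for row in board]
    for _ in range(4):
        out.append(current)
        out.append(mirror(current))
        current = rotate(current)
    return out

def ensemble_value(board, evaluate):
    """Average an evaluator over all 8 symmetric copies of the position."""
    boards = symmetries(board)
    return sum(evaluate(b) for b in boards) / len(boards)

# Toy evaluator standing in for the value net (hypothetical).
def toy_evaluate(board):
    return float(board[0][0])

board = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
print(ensemble_value(board, toy_evaluate))  # prints 0.25
```

The averaged value no longer depends on how the board happens to be oriented, at the cost of 8 net evaluations per position.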

Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14
0.667   500 playouts/move, 501 games.
0.664  2000 playouts/move, 530 games.

Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14, using 2000 playouts winrate
0.694  1000 playouts/move, 572 games
0.771     1 playout/move, 332 games

Recently I found that Black's winrate is low in KGS games, because there are
many komi 0.5 games, and with komi 0.5 White tends to win. Maybe I need
to remove some of the White-win games.

19x19 Black winrate 0.418, komi 7.5,  30,840,000 positions, GoGoD, KGS 4d, Tygem
13x13 Black winrate 0.485, komi 7.5,  16,790,000 positions, selfplay, 500 
9x9   Black winrate 0.514, komi 7.0,  33,760,000 positions, selfplay, 2000 playout/move, draw is 0.5
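
If White-win games are thinned out to balance the 19x19 set, the fraction to keep follows from the Black winrate. This is a sketch that ignores draws and assumes a 50/50 balance is the goal; the post does not say what target would be used.

```python
def white_keep_fraction(black_winrate):
    """Fraction of White-win games to keep so both colours win equally often."""
    if black_winrate >= 0.5:
        return 1.0  # Black already wins at least half; nothing to drop
    return black_winrate / (1.0 - black_winrate)

# 19x19 figure from the list above.
print(round(white_keep_fraction(0.418), 3))
```

With b the Black winrate, keeping a fraction b / (1 - b) of the White wins leaves equal numbers of Black and White wins.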

Using Policy + Value(Filter 32), Aya reaches 7d on KGS.
Machine is W3680 3.3GHz, 6 cores, a GTX 980
 AyaMC 4d
 AyaMC 6d  with Policy
 AyaMC 7d  with Policy and Value, handicaps <= 3, no dynamic komi.

GoQuest ranking. Bots are not listed. "spaceman" is OHASHI Hirofumi 6p.
13x13   http://wars.fm/go13#users/0
 9x9   http://wars.fm/go9#users/0
AyaZBot http://wars.fm/go9#user/:ayazbot

Aya's GoQuest rating
:AyaXBot   2322 2407   1 playout/move, only root node is Policy
:AyaZBot   2466 2361   year 2014
:AyaZBot   2647 2711   Policy+Value, W3680 3.3GHz, 6 core, a GTX 980
:CrazyStoneBot 2592   year 2014

 GoQuest time setting in 13x13 is 5 minutes plus 3 sec/move.
 Computers have an advantage with this setting.

I wrote an article on how to make Policy and Value networks.
I'm afraid it is in Japanese, but some of the links may be useful.
It includes Aya's network definition.

Open source
Ray-nn, Ray with Policy and Value net, CGOS 2

[Computer-go] English commentary available (Re: Zen vs Cho Chikun)

These games will have English commentary by Michael Redmond 9p and 
Antti Tormanen 1p.



>Anyone will be able to watch the games on Niconico video.
>Set the first language of your web browser to English and access
>http://www.nicovideo.jp/.  You will be able to register (free) and
>will be able to watch the games.
>
>The links to the games are
>http://live.nicovideo.jp/watch/lv280070012 (11/19, 12:30 PM JST),
>http://live.nicovideo.jp/watch/lv280070055 (11/20, ditto) and
>http://live.nicovideo.jp/watch/lv280070087 (11/23, ditto).
>You can set (or reserve) "time shift" viewing, which allows you to
>watch later.
>
>Rules: Japanese rules, 6.5 pts komi, 2 hours + 3 x 60 seconds
>byoyomi.  All three games will be played.
>
>CPU: 2 x Intel Xeon E5-2699v4 (44 cores/2.2 GHz)
>GPU: 4 x nVidia Titan X (Pascal)
>RAM: 128GB
>
>Best, Hideki


>>match between Zen and human pro hero Cho Chikun with games to
>>be played on November 19, 20, and 23.
>>
>>For me (and likely for many in the Western world) it would be nice
>>if for instance Hideki Kato could explain what is to come.




