[Computer-go] interesting paper in arXiv

2016-03-04 Thread Ingo Althöfer
I think this new paper by Yuandong Tian and Yan Zhu
is interesting for us:

Better Computer Go Player with Neural Network and Long-term Prediction
http://arxiv.org/abs/1511.06410

Cheers, Ingo.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value Network

2016-03-04 Thread Detlef Schmicker

Hi,

thanks a lot for sharing! I am trying a slightly different approach
at the moment:

I use a combined policy/value network (adding 3-5 layers with about
16 filters at the end of the policy network for the value head, to
avoid overfitting), and I use the game results as the value target.
My main problem is still overfitting!
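
Roughly, the idea is the following (a sketch in PyTorch; the trunk
size, input planes, and head depth are placeholders, not my exact
setup):

import torch.nn as nn

class PolicyValueNet(nn.Module):
    # Sketch only: a shared convolutional trunk (the policy network)
    # plus a small value head of a few 16-filter layers, the idea
    # being that the tiny head limits overfitting on the value target.
    def __init__(self, in_planes=49, trunk_filters=256, board=19):
        super().__init__()
        trunk = [nn.Conv2d(in_planes, trunk_filters, 5, padding=2), nn.ReLU()]
        for _ in range(10):
            trunk += [nn.Conv2d(trunk_filters, trunk_filters, 3, padding=1), nn.ReLU()]
        self.trunk = nn.Sequential(*trunk)
        # policy head: one logit per board point
        self.policy_head = nn.Conv2d(trunk_filters, 1, 3, padding=1)
        # value head: a few small 16-filter layers, then a scalar in [-1, +1]
        self.value_head = nn.Sequential(
            nn.Conv2d(trunk_filters, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Flatten(),
            nn.Linear(board * board, 1), nn.Tanh())

    def forward(self, x):
        h = self.trunk(x)
        return self.policy_head(h).flatten(1), self.value_head(h).squeeze(1)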

As your results look good, I think I will try your bigger database
to get more games into training.

I will keep you posted

Detlef

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Value Network

2016-03-04 Thread Hiroshi Yamashita

Hi,

I tried to make a Value network.

"Policy network + Value network"  vs  "Policy network"
Winrate   Wins/Games   Playouts/move
70.7%     322 / 455     1000
76.6%     141 / 184    10000

It seems the more playouts, the more effective the Value network is.
The number of games is not enough though. Search is similar to
AlphaGo. The mixing parameter lambda is 0.5. Search is synchronous,
using one GTX 980. At 10000 playouts/move, the Policy network is
called 175 times and the Value network 786 times. The node expansion
threshold is 33.
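
The leaf evaluation mixes the Value network output with the rollout
winrate as in the AlphaGo paper; a small sketch (names are
illustrative, not Aya's code):

EXPAND_THRESHOLD = 33  # a leaf is expanded (Policy network queried) after 33 visits

def mixed_leaf_value(value_net_winrate, rollout_winrate, lam=0.5):
    # AlphaGo-style mixing: lam = 0 uses only the Value network,
    # lam = 1 uses only Monte-Carlo rollouts; here lam = 0.5.
    return (1.0 - lam) * value_net_winrate + lam * rollout_winrate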


Value network:
 13 layers, 128 filters (5x5_128, 3x3_128 x10, 1x1_1, fully connected, tanh).
Policy network:
 12 layers, 256 filters (5x5_256, 3x3_256 x10, 3x3_1); its move-prediction accuracy is 50.1%.
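
Written out, the Value network above looks roughly like this (a
sketch in PyTorch; the framework choice and the number of input
feature planes are placeholders):

import torch.nn as nn

def value_net(in_planes=49, board=19):
    # 13 layers, 128 filters: 5x5_128, 3x3_128 x10, 1x1_1,
    # fully connected, tanh; output is a winrate in [-1, +1].
    layers = [nn.Conv2d(in_planes, 128, 5, padding=2), nn.ReLU()]
    for _ in range(10):
        layers += [nn.Conv2d(128, 128, 3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(128, 1, 1), nn.Flatten(),
               nn.Linear(board * board, 1), nn.Tanh()]
    return nn.Sequential(*layers)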

For the Value network, I collected 15,804,400 positions from 987,775
games. Games are from
 GoGoD,
 Tygem 9d: 22,477 games, http://baduk.sourceforge.net/TygemAmateur.7z
 KGS 4d and over: 1,450,946 games, http://www.u-go.net/gamerecords-4d/
 (handicap games excluded).
I select 16 positions randomly from each game: the game is divided
into 16 stages, and one position is taken from each stage. The 1st
and 9th positions are rotated with the same symmetry. Then Aya
searches each position with 500 playouts, using the Policy network,
and the resulting winrate (-1 to +1) is stored as the value target.
Komi is 7.5. This 500-playout setting is around 2730 BayesElo on CGOS.
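
Per game, the sampling looks roughly like this (Python sketch;
helper names and the exact handling of the shared symmetry are
illustrative):

import random

SYMMETRIES = range(8)  # 4 rotations x 2 reflections of the board

def sample_positions(game_moves, n_stages=16):
    # Split the game into 16 equal stages and pick one random move
    # index from each; the 1st and 9th samples share the same
    # randomly chosen symmetry, the rest get their own.
    n = len(game_moves)
    shared_sym = random.choice(SYMMETRIES)
    samples = []
    for stage in range(n_stages):
        lo = stage * n // n_stages
        hi = max(lo + 1, (stage + 1) * n // n_stages)
        move_idx = random.randrange(lo, hi)
        sym = shared_sym if stage in (0, 8) else random.choice(SYMMETRIES)
        samples.append((move_idx, sym))
    return samples  # each position is then labeled by a 500-playout search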

I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It took
2 days and cost $54; spot instances are reasonably priced. However,
g2.2xlarge (GRID K520) is 3x slower than a GTX 980: my Policy
network (12L, 256F) takes 5.37 ms on a GTX 980 and 15.0 ms on
g2.2xlarge. Test and training loss are 0.00923 and 0.00778, so I
think there is no big overfitting.

The Value network is effective, but Aya still has a fatal semeai weakness.

Regards,
Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go