Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Brian Sheppard via Computer-go
>... my value network was trained to tell me the game is balanced at the 
>beginning...

:-)

The best training policy is to select positions that correct errors.

I used the policies below to train a backgammon NN. Together, they reduced the 
expected loss of the network by 50% (cut the error rate in half):

- Select training positions from the program's own games.
    - Can be self-play or versus an opponent.
    - Best is to have a broad panel of opponents.
    - It is beneficial to bootstrap with pro games, but then add ONLY training
      examples from the program's own games.
- Train only on the moves made by the winner of the game.
    - Very important for deterministic games!
    - Note that the winner can be either your program or the opponent.
    - If your program wins, then training reinforces good behavior; if the
      opponent wins, then training corrects bad behavior.
- Per game, you should aim to get only a few training examples (3 in
  backgammon; maybe 10 in Go?). Use two policies:
    - Select positions where the static evaluation of a position is
      significantly different from a deep search.
    - Select positions where the move selected by a deep search did not
      have the highest static evaluation. (And in this case you have two
      training positions, which differ by the move chosen.)
    - Of course, you are selecting examples where you did as badly as possible.
- The training value of the position is the result of a deep search.
    - This is equivalent to "temporal difference learning", but accelerated
      by the depth of the search.
    - Periodically refresh the training evaluations as your search/eval
      improve.

These policies actively seek out cases where your evaluation function has some 
weakness, so training is definitely focused on improving results in the 
distribution of positions that your program will actually face.
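
To make the selection policies concrete, here is a minimal Python sketch; the
game representation and the static_eval/deep_search callables are placeholders
for whatever your engine provides, not code from this thread:

def select_training_examples(game, static_eval, deep_search,
                             winner, value_gap=0.15, max_examples=10):
    """Pick a handful of positions per game where the static evaluation
    disagrees with a deep search.

    game:        list of (position, side_to_move) pairs
    static_eval: callable, position -> (value, best_move)   # placeholder
    deep_search: callable, position -> (value, best_move)   # placeholder
    winner:      the side that won the game
    Returns (example, target) pairs; the target is the deep-search value.
    """
    candidates = []
    for position, side_to_move in game:
        if side_to_move != winner:
            continue                      # train only on the winner's moves
        s_value, s_move = static_eval(position)
        d_value, d_move = deep_search(position)
        error = abs(s_value - d_value)

        if error > value_gap:             # policy 1: value disagreement
            candidates.append((error, position, d_value))

        if d_move != s_move:              # policy 2: move disagreement gives two
            candidates.append((error, (position, s_move), d_value))   # examples that
            candidates.append((error, (position, d_move), d_value))   # differ by the move

    candidates.sort(key=lambda c: c[0], reverse=True)  # worst disagreements first
    return [(example, target) for _, example, target in candidates[:max_examples]]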

You will need about 30 training examples for every free parameter in your NN. 
You can do the math on how many games that will take. It is inevitable: you 
will train your NN based on blitz games.
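
A back-of-the-envelope version of that calculation, with purely illustrative
numbers (the parameter count and examples-per-game figures below are
assumptions, not from this post):

# Illustrative only: plug in your own network size and sampling rate.
free_parameters = 1_000_000        # e.g. a small convolutional value net
examples_needed = 30 * free_parameters
examples_per_game = 10             # the "maybe 10 in Go?" figure above
games_needed = examples_needed // examples_per_game
print(f"{examples_needed:,} examples -> about {games_needed:,} games")
# 30,000,000 examples -> about 3,000,000 games: hence blitz games.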

Good luck!




Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Vincent Richard

Finally found the problem. In the end, it was as stupid as expected:

When I pick a game for batch creation, I randomly select a limited
number of moves inside the game. In the case of the value network I use
around 8-16 moves so as not to overfit the data (I can't take only 1, or the
I/O operations slow down the training), and for the other networks I would
simply take all the moves. Or at least, that was what I thought my code
was doing. Instead of picking N random moves in the game, it was picking
the first N moves in a random order. So... my value network was trained
to tell me the game is balanced at the beginning...
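
For concreteness, the difference between the buggy and the intended sampling
looks roughly like this (a reconstruction in Python, not the actual code):

import random

moves = list(range(200))   # stand-in for the moves of one game
n = 8

# Buggy version: shuffles only the first n moves, so every sample
# comes from the opening and the value target is always ~0.5.
first_n = moves[:n]
random.shuffle(first_n)
buggy_sample = first_n

# Intended version: n moves drawn from anywhere in the game.
correct_sample = random.sample(moves, n)

print(sorted(buggy_sample))    # always [0..7]
print(sorted(correct_sample))  # spread over the whole game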



On 20-Jun-17 at 5:48 AM, Gian-Carlo Pascutto wrote:

On 19/06/2017 21:31, Vincent Richard wrote:

- The data is then analyzed by a script which extracts all kinds of
features from the games. When I'm training a network, I load the features I
want from this analysis to build the batch. I have 2 possible methods
for the batch construction: I can either add moves one after the other
(the fast mode) or pick random moves among different games (slower, but
reduces the variance).

You absolutely need the latter, especially as for outcome prediction the
moves from the same game are not independent samples.
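
A minimal sketch of that batch construction, drawing each element from a
different, randomly chosen game (the game/feature representation here is
assumed, not taken from the thread):

import random

def make_batch(games, batch_size, rng=random):
    """Draw each batch element from a different, randomly chosen game,
    so consecutive samples are not correlated positions of one game."""
    batch = []
    chosen_games = rng.sample(games, batch_size)   # no two positions share a game
    for game in chosen_games:
        position_index = rng.randrange(len(game["positions"]))
        batch.append((game["positions"][position_index], game["result"]))
    return batch

# toy usage with fake data
games = [{"positions": [f"g{g}_m{m}" for m in range(50)], "result": g % 2}
         for g in range(1000)]
print(make_batch(games, batch_size=4))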


During some of the tests, all the networks I was training had the same
layers except for the last. So, as you suggested, I was also wondering
whether this last layer wasn't the problem. Yet, I haven't found any error.

...

However, if I feed a stupid
value as target output (for example, black always wins), it has no trouble
learning.

A problem with side to move/won side marking in the input or feature
planes, or with the expected outcome (0 vs 1 vs -1)?
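
One common convention for that marking, shown as a small illustrative sketch
(not code from the thread): express the label from the side to move's point of
view, and make sure its range matches the output nonlinearity.

def value_target(winner, side_to_move, output="tanh"):
    """winner and side_to_move are 'b' or 'w'.  With a tanh output the target
    should be in [-1, 1]; with a sigmoid output it should be in [0, 1].
    A mismatched range is a classic source of a value net that never moves."""
    won = 1.0 if winner == side_to_move else -1.0
    if output == "tanh":
        return won              # -1 or +1
    elif output == "sigmoid":
        return (won + 1) / 2    # 0 or 1
    raise ValueError(output)

assert value_target("b", "b", "tanh") == 1.0
assert value_target("b", "w", "sigmoid") == 0.0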




Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Vincent Richard

This is what I have been thinking about, yet I have been unable to find an error.

Currently, I'm working with:

- SGF database: fuseki info Tygem -> http://tygem.fuseki.info/index.php 
(until recently I was working with games of all levels from KGS)


- The data is then analyzed by a script which extracts all kinds of 
features from the games. When I'm training a network, I load the features I 
want from this analysis to build the batch. I have 2 possible methods 
for the batch construction: I can either add moves one after the other 
(the fast mode) or pick random moves among different games (slower, but 
reduces the variance). I set the batch size according to my GPU memory 
(200 moves in the case of the full-sized value/policy network). I don't 
think the problem comes from here, since the data is the same for all 
the networks.


- For the input, I'm using the same architecture as 
https://github.com/TheDuck314/go-NN (I have been trying many kinds of 
shapes, from minimalist to AlphaGo)


- For the network, I'm once again using TheDuck314's network 
(EvalModels.Conv11PosDepFC1ELU) with the same layers 
(https://github.com/TheDuck314/go-NN/blob/master/engine/Layers.py) and 
the learning rate he recommends


During some of the tests, all the networks I was training had the same 
layers except for the last. So, as you suggested, I was also wondering 
whether this last layer wasn't the problem. Yet, I haven't found any error.
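
For reference, a typical value-network head on top of a shared trunk looks
something like the following plain-numpy sketch (a generic illustration of the
usual shapes, not TheDuck314's Conv11PosDepFC1ELU):

import numpy as np

def value_head(trunk_features, w_conv, b_conv, w_fc, b_fc):
    """trunk_features: (batch, 19, 19, channels) activations from the shared layers.
    A 1x1 convolution collapses the channels, a fully connected layer maps the
    board to a single scalar, and tanh squashes it into [-1, 1]."""
    batch = trunk_features.shape[0]
    x = np.tensordot(trunk_features, w_conv, axes=([3], [0])) + b_conv  # 1x1 conv
    x = np.maximum(x, 0.0)                                              # ReLU
    x = x.reshape(batch, -1)                                            # flatten 19x19
    v = np.tanh(x @ w_fc + b_fc)                                        # one scalar per example
    return v.squeeze(-1)

# shape check with random weights
feats = np.random.randn(2, 19, 19, 64)
v = value_head(feats,
               w_conv=np.random.randn(64, 1) * 0.01, b_conv=np.zeros(1),
               w_fc=np.random.randn(361, 1) * 0.01, b_fc=np.zeros(1))
print(v.shape)   # (2,)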




On 20-Jun-17 at 3:19 AM, Gian-Carlo Pascutto wrote:

On 19-06-17 17:38, Vincent Richard wrote:


During my research, I've trained a lot of different networks, first on
9x9 and then on 19x19, and as far as I remember all the nets I've worked
with learned quickly (especially during the first batches), except the
value net, which has always been problematic (it diverges easily, doesn't
learn quickly, ...). I have been stuck on the 19x19 value network for a
couple of months now. I've tried countless inputs (feature planes) and
lots of different models, even using the exact same code as others. Yet,
whatever I try, the loss value doesn't move an inch and accuracy stays
at 50% (even after days of training). I've tried changing the learning
rate (increasing/decreasing it); it makes no difference. However, if I feed
a stupid value as target output (for example, black always wins), it has
no trouble learning.
It is even more frustrating that training any other kind of network
(predicting the next move, territory, ...) goes smoothly and fast.

Has anyone experienced a similar problem with value networks or has an
idea of the cause?

1) What is the training data for the value network? How big is it, how
is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.




Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Gian-Carlo Pascutto
On 19-06-17 17:38, Vincent Richard wrote:

> During my research, I've trained a lot of different networks, first on
> 9x9 and then on 19x19, and as far as I remember all the nets I've worked
> with learned quickly (especially during the first batches), except the
> value net, which has always been problematic (it diverges easily, doesn't
> learn quickly, ...). I have been stuck on the 19x19 value network for a
> couple of months now. I've tried countless inputs (feature planes) and
> lots of different models, even using the exact same code as others. Yet,
> whatever I try, the loss value doesn't move an inch and accuracy stays
> at 50% (even after days of training). I've tried changing the learning
> rate (increasing/decreasing it); it makes no difference. However, if I feed
> a stupid value as target output (for example, black always wins), it has
> no trouble learning.
> It is even more frustrating that training any other kind of network
> (predicting the next move, territory, ...) goes smoothly and fast.
> 
> Has anyone experienced a similar problem with value networks or has an
> idea of the cause?

1) What is the training data for the value network? How big is it, how
is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.

-- 
GCP

[Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Vincent Richard

Hello everyone,

For my master's thesis, I have built an AI that takes a strategic approach 
to the game. It doesn't play, but simply describes the strategy behind every 
possible move in a given position ("enclosing this group", "making life 
for this group", "saving these stones", etc.). My main idea is that, once it 
is associated with a playing AI, I will be able to generate comments on a 
position (and then teach people). So for my final experiment, I'm trying 
to build a playing AI. I don't want it to be highly competitive, I just 
need it to be decent (1d or so), so I thought about using a policy 
network, a value network and a simple MCTS. The MCTS works fine, the 
policy network learns quickly and is accurate, but the value network 
never seems to learn, even slightly.


During my research, I've trained a lot of different networks, first on 
9x9 and then on 19x19, and as far as I remember all the nets I've worked 
with learned quickly (especially during the first batches), except the 
value net, which has always been problematic (it diverges easily, doesn't 
learn quickly, ...). I have been stuck on the 19x19 value network for a 
couple of months now. I've tried countless inputs (feature planes) and 
lots of different models, even using the exact same code as others. Yet, 
whatever I try, the loss value doesn't move an inch and accuracy stays 
at 50% (even after days of training). I've tried changing the learning 
rate (increasing/decreasing it); it makes no difference. However, if I feed 
a stupid value as target output (for example, black always wins), it has 
no trouble learning.
It is even more frustrating that training any other kind of network 
(predicting the next move, territory, ...) goes smoothly and fast.
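
Two cheap sanity checks for this kind of symptom (a hypothetical sketch;
train_step stands in for one optimizer step of your own training loop):

import numpy as np

def sanity_check_value_data(positions, labels, train_step, n=64, steps=500):
    """Before blaming the architecture:
    1. check that the labels of the sampled positions are not (nearly) constant;
    2. check that the network can overfit a tiny fixed batch of real examples.
    train_step(batch_x, batch_y) -> loss is a placeholder for your own code."""
    x, y = np.asarray(positions[:n]), np.asarray(labels[:n])
    print("label mean/std over the mini-batch:", y.mean(), y.std())
    if y.std() < 1e-3:
        print("labels are essentially constant -> check the sampling pipeline")
    loss = None
    for _ in range(steps):
        loss = train_step(x, y)
    print("loss after overfitting a fixed batch:", loss)
    # If this loss does not drop far below the 'predict the mean' baseline,
    # the inputs and labels are probably not aligned correctly.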


Has anyone experienced a similar problem with value networks or has an 
idea of the cause?


Thank you

Re: [Computer-go] Value Network

2016-03-19 Thread Detlef Schmicker
t; num_output: 128 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
>> layers { name: "relu8" type: RELU bottom: "conv8" top: "conv8" }
>> 
>> layers { name: "conv9_3x3_128" type: CONVOLUTION blobs_lr: 1. blobs_lr: 2. bottom: "conv8" top: "conv9" convolution_param { num_output: 128 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
>> layers { name: "relu9" type: RELU bottom: "conv9" top: "conv9" }
>> 
>> layers { name: "conv10_3x3_128" type: CONVOLUTION blobs_lr: 1. blobs_lr: 2. bottom: "conv9" top: "conv10" convolution_param { num_output: 128 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
>> layers { name: "relu10" type: RELU bottom: "conv10" top: "conv10" }
>> 
>> layers { name: "conv11_3x3_128" type: CONVOLUTION blobs_lr: 1. blobs_lr: 2. bottom: "conv10" top: "conv11" convolution_param { num_output: 128 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
>> layers { name: "relu11" type: RELU bottom: "conv11" top: "conv11" }
>> 
>> layers { name: "conv12_1x1_1" type: CONVOLUTION blobs_lr: 1. blobs_lr: 2. bottom: "conv11" top: "conv12" convolution_param { num_output: 1 kernel_size: 1 pad: 0 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
>> layers { name: "relu12" type: RELU bottom: "conv12" top: "conv12" }
>> 
>> layers { name: "fc13" type: INNER_PRODUCT bottom: "conv12" top: "fc13" inner_product_param { num_output: 256 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
>> layers { name: "relu13" type: RELU bottom: "fc13" top: "fc13" }
>> 
>> layers { name: "fc14" type: INNER_PRODUCT bottom: "fc13" top: "fc14" inner_product_param { num_output: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
>> layers { name: "tanh14" type: TANH bottom: "fc14" top: "fc14" }
>> 
>> layers { name: "loss" type: EUCLIDEAN_LOSS bottom: "fc14" bottom: "label" top: "loss" }
>> 
>> 
>> Thanks, Hiroshi Yamashita
>> 
>> - Original Message -
>> From: "Detlef Schmicker" <d...@physik.de>
>> To: <computer-go@computer-go.org>
>> Sent: Saturday, March 19, 2016 7:41 PM
>> Subject: Re: [Computer-go] Value Network
>> 
>> 
>> 
>> What are you using for loss?
>>> 
>>> this:
>>> 
>>> layers { name: "loss4" type:  EUCLIDEAN_LOSS loss_weight: 2.0 
>>> bottom: "vvv" bottom: "pool2" top: "accloss4" }
>>> 
>>> 
>>> ?
>>> 
>>> On 04.03.2016 at 16:23, Hiroshi Yamashita wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I tried to make Value network.
>>>> 
>>>> "Policy network + Value network"  vs  "Policy network"
>>>> Winrate Wins/Games 70.7%322 / 455,1000 playouts/move
>>>> 76.6%141 / 184,   1 playouts/move
>>>> 
>>>> It seems more playouts, more Value network is effetctive.
>>>> Games is not enough though. Search is similar to AlphaGo.
>>>> Mixing parameter lambda is 0.5. Search is synchronous. Using
>>>> one GTX 980. In 1 playouts/move, Policy network is called
>>>> 175 times, Value network is called 786 times. Node Expansion
>>>> threshold is 33.
>>>> 
>>>> 
>>>> Value network is 13 layers, 128 filters. (5x5_128, 3x3_128
>>>> x10, 1x1_1, fully connect, tanh) Policy network is 12 layers,
>>>> 256 filters. (5x5_256, 3x3_256 x10, 3x3_1), Accuracy is
>>>> 50.1%
>>>> 
>>>> For Value network, I collected 15804400 positions from
>>>> 987775 games. Games are from GoGoD, tygem 9d,  22477
>>>> games http://baduk.sourceforge.net/TygemAmateur.7z KGS 4d
>>>> over, 1450946 games http://www.u-go.net/gamerecords-4d/
>>>> (except handicaps games). And select 16 positions randomly
>>>> from one game. One game is divided 16 game stage, and select
>>>> one of each. 1st and 9th position are rotated in same
>>>> symmetry. Then Aya searches with 500 playouts, with Policy
>>>> network. And store winrate (-1 to +1). Komi is 7.5. This 500 
>>>> playouts is around 2730 BayesElo on CGOS.
>>>> 
>>>> I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It
>>>> took 2 days, and costed $54. Spot instance is reasonable.
>>>> However g2.2xlarge(GRID K520), is 3x slower than GTX 980. My
>>>> Pocicy network(12L 256F) takes 5.37ms(GTX 980), and
>>>> 15.0ms(g2.2xlarge). Test and Traing loss are 0.00923 and
>>>> 0.00778. I think there is no big overfitting.
>>>> 
>>>> Value network is effective, but Aya has still fatal semeai 
>>>> weakness.
>>>> 
>>>> Regards, Hiroshi Yamashita
>>>> 
>>> 

Re: [Computer-go] Value Network

2016-03-19 Thread Hiroshi Yamashita


Thanks Aja,

Maybe you have read it already, but I found Lee Sedol's comment about you.

-
Playing against a Go AI, "I'm proud", Lee 9dan's mother said
http://japanese.yonhapnews.co.kr/society/2016/03/16/08AJP20160316004200882.HTML

Lee 9dan commented at Gimpo Airport.
Lee 9dan admired Dr. Aja Huang, who played AlphaGo's moves on the board:
"While playing, Mr. Huang never changed his expression, and never went to the
restroom. Mr. Huang was one of the people who put effort into AlphaGo, and he
is really admirable."

Mr. Huang kept his composure while playing, even when AlphaGo played an
unexpected move. He got a lot of attention.

The expressionless Mr. Huang has quite a presence. Net users said he is like
AlphaGo itself.

Lee Sedol 9dan's wife shared an episode: she ordered breakfast at the hotel
and went to wake up Lee 9dan. Surprisingly, Lee 9dan was talking in his
sleep: "I'll eat with Mr. Huang."

Lee 9dan laughed: "I cannot remember the dream. Maybe the impression was
deep. Or, thankfully, he kept his manners."
-

Thanks,
Hiroshi Yamashita

- Original Message - 
From: "Aja Huang" <ajahu...@google.com>

To: <computer-go@computer-go.org>
Sent: Saturday, March 19, 2016 10:25 PM
Subject: Re: [Computer-go] Value Network



Good stuff, Hiroshi. Looks like I don't need to answer the questions
regarding value network. :)

Aja



Re: [Computer-go] Value Network

2016-03-19 Thread Hiroshi Yamashita
layers {
 name: "conv10_3x3_128"
 type: CONVOLUTION
 blobs_lr: 1.
 blobs_lr: 2.
 bottom: "conv9"
 top: "conv10"
 convolution_param {
   num_output: 128
   kernel_size: 3
   pad: 1
   weight_filler {
 type: "xavier"
   }
   bias_filler {
 type: "constant"
   }
 }
}
layers {
 name: "relu10"
 type: RELU
 bottom: "conv10"
 top: "conv10"
}

layers {
 name: "conv11_3x3_128"
 type: CONVOLUTION
 blobs_lr: 1.
 blobs_lr: 2.
 bottom: "conv10"
 top: "conv11"
 convolution_param {
   num_output: 128
   kernel_size: 3
   pad: 1
   weight_filler {
 type: "xavier"
   }
   bias_filler {
 type: "constant"
   }
 }
}
layers {
 name: "relu11"
 type: RELU
 bottom: "conv11"
 top: "conv11"
}

layers {
 name: "conv12_1x1_1"
 type: CONVOLUTION
 blobs_lr: 1.
 blobs_lr: 2.
 bottom: "conv11"
 top: "conv12"
 convolution_param {
   num_output: 1
   kernel_size: 1
   pad: 0
   weight_filler {
 type: "xavier"
   }
   bias_filler {
 type: "constant"
   }
 }
}
layers {
 name: "relu12"
 type: RELU
 bottom: "conv12"
 top: "conv12"
}

layers {
 name: "fc13"
 type: INNER_PRODUCT
 bottom: "conv12"
 top: "fc13"
 inner_product_param {
   num_output: 256
   weight_filler {
 type: "xavier"
   }
   bias_filler {
 type: "constant"
   }
 }
}
layers {
 name: "relu13"
 type: RELU
 bottom: "fc13"
 top: "fc13"
}

layers {
 name: "fc14"
 type: INNER_PRODUCT
 bottom: "fc13"
 top: "fc14"
 inner_product_param {
   num_output: 1
   weight_filler {
 type: "xavier"
   }
   bias_filler {
 type: "constant"
   }
 }
}
layers {
 name: "tanh14"
 type: TANH
 bottom: "fc14"
 top: "fc14"
}

layers {
 name: "loss"
 type: EUCLIDEAN_LOSS
 bottom: "fc14"
 bottom: "label"
 top: "loss"
}
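
For readers not using Caffe, the last three layers above (fc14, tanh14, loss)
amount to the following numpy paraphrase; Caffe's EUCLIDEAN_LOSS is half the
mean squared error over the batch (an illustration, not part of the training
code here):

import numpy as np

def value_output_and_loss(fc13_activations, w_fc14, b_fc14, labels):
    """fc14 (inner product to a single unit), tanh14, and the Euclidean loss,
    i.e. 0.5 * mean over the batch of the squared error."""
    fc14 = fc13_activations @ w_fc14 + b_fc14          # (batch, 1)
    prediction = np.tanh(fc14)                          # winrate in [-1, 1]
    loss = 0.5 * np.mean(np.sum((prediction - labels) ** 2, axis=1))
    return prediction, loss

# shape check with random weights
x = np.random.randn(4, 256)
pred, loss = value_output_and_loss(x, np.random.randn(256, 1) * 0.05,
                                   np.zeros(1), labels=np.ones((4, 1)))
print(pred.shape, float(loss))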


Thanks,
Hiroshi Yamashita

- Original Message - 
From: "Detlef Schmicker" <d...@physik.de>

To: <computer-go@computer-go.org>
Sent: Saturday, March 19, 2016 7:41 PM
Subject: Re: [Computer-go] Value Network



What are you using for loss?

this:

layers {
 name: "loss4"
 type:  EUCLIDEAN_LOSS
 loss_weight: 2.0
 bottom: "vvv"
 bottom: "pool2"
 top: "accloss4"
}


?

On 04.03.2016 at 16:23, Hiroshi Yamashita wrote:

Hi,

I tried to make Value network.

"Policy network + Value network"  vs  "Policy network" Winrate
Wins/Games 70.7%322 / 455,1000 playouts/move 76.6%141 /
184,   1 playouts/move

It seems more playouts, more Value network is effetctive. Games is
not enough though. Search is similar to AlphaGo. Mixing parameter
lambda is 0.5. Search is synchronous. Using one GTX 980. In 1
playouts/move, Policy network is called 175 times, Value network is
called 786 times. Node Expansion threshold is 33.


Value network is 13 layers, 128 filters. (5x5_128, 3x3_128 x10,
1x1_1, fully connect, tanh) Policy network is 12 layers, 256
filters. (5x5_256, 3x3_256 x10, 3x3_1), Accuracy is 50.1%

For Value network, I collected 15804400 positions from 987775
games. Games are from GoGoD, tygem 9d,  22477 games
http://baduk.sourceforge.net/TygemAmateur.7z KGS 4d over, 1450946
games http://www.u-go.net/gamerecords-4d/ (except handicaps
games). And select 16 positions randomly from one game. One game is
divided 16 game stage, and select one of each. 1st and 9th position
are rotated in same symmetry. Then Aya searches with 500 playouts, 
with Policy network. And store winrate (-1 to +1). Komi is 7.5. 
This 500 playouts is around 2730 BayesElo on CGOS.


I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It took 
2 days, and costed $54. Spot instance is reasonable. However 
g2.2xlarge(GRID K520), is 3x slower than GTX 980. My Pocicy 
network(12L 256F) takes 5.37ms(GTX 980), and 15.0ms(g2.2xlarge). 
Test and Traing loss are 0.00923 and 0.00778. I think there is no

big overfitting.

Value network is effective, but Aya has still fatal semeai
weakness.

Regards, Hiroshi Yamashita



Re: [Computer-go] Value Network

2016-03-19 Thread Aja Huang
{
>  name: "conv7_3x3_128"
>  type: CONVOLUTION
>  blobs_lr: 1.
>  blobs_lr: 2.
>  bottom: "conv6"
>  top: "conv7"
>  convolution_param {
>num_output: 128
>kernel_size: 3
>pad: 1
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "relu7"
>  type: RELU
>  bottom: "conv7"
>  top: "conv7"
> }
>
> layers {
>  name: "conv8_3x3_128"
>  type: CONVOLUTION
>  blobs_lr: 1.
>  blobs_lr: 2.
>  bottom: "conv7"
>  top: "conv8"
>  convolution_param {
>num_output: 128
>kernel_size: 3
>pad: 1
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "relu8"
>  type: RELU
>  bottom: "conv8"
>  top: "conv8"
> }
>
> layers {
>  name: "conv9_3x3_128"
>  type: CONVOLUTION
>  blobs_lr: 1.
>  blobs_lr: 2.
>  bottom: "conv8"
>  top: "conv9"
>  convolution_param {
>num_output: 128
>kernel_size: 3
>pad: 1
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "relu9"
>  type: RELU
>  bottom: "conv9"
>  top: "conv9"
> }
>
> layers {
>  name: "conv10_3x3_128"
>  type: CONVOLUTION
>  blobs_lr: 1.
>  blobs_lr: 2.
>  bottom: "conv9"
>  top: "conv10"
>  convolution_param {
>num_output: 128
>kernel_size: 3
>pad: 1
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "relu10"
>  type: RELU
>  bottom: "conv10"
>  top: "conv10"
> }
>
> layers {
>  name: "conv11_3x3_128"
>  type: CONVOLUTION
>  blobs_lr: 1.
>  blobs_lr: 2.
>  bottom: "conv10"
>  top: "conv11"
>  convolution_param {
>num_output: 128
>kernel_size: 3
>pad: 1
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "relu11"
>  type: RELU
>  bottom: "conv11"
>  top: "conv11"
> }
>
> layers {
>  name: "conv12_1x1_1"
>  type: CONVOLUTION
>  blobs_lr: 1.
>  blobs_lr: 2.
>  bottom: "conv11"
>  top: "conv12"
>  convolution_param {
>num_output: 1
>kernel_size: 1
>pad: 0
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "relu12"
>  type: RELU
>  bottom: "conv12"
>  top: "conv12"
> }
>
> layers {
>  name: "fc13"
>  type: INNER_PRODUCT
>  bottom: "conv12"
>  top: "fc13"
>  inner_product_param {
>num_output: 256
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "relu13"
>  type: RELU
>  bottom: "fc13"
>  top: "fc13"
> }
>
> layers {
>  name: "fc14"
>  type: INNER_PRODUCT
>  bottom: "fc13"
>  top: "fc14"
>  inner_product_param {
>num_output: 1
>weight_filler {
>  type: "xavier"
>}
>bias_filler {
>  type: "constant"
>}
>  }
> }
> layers {
>  name: "tanh14"
>  type: TANH
>  bottom: "fc14"
>  top: "fc14"
> }
>
> layers {
>  name: "loss"
>  type: EUCLIDEAN_LOSS
>  bottom: "fc14"
>  bottom: "label"
>  top: "loss"
> }
> 
>
> Thanks,
> Hiroshi Yamashita
>
> - Original Message - From: "Detlef Schmicker" <d...@physik.de>
> To: <computer-go@computer-go.org>
> Sent: Saturday, March 19, 2016 7:41 PM
> Subject: Re: [Computer-go] Value Network
>
>
>
> What are you using for loss?
>>
>> this:
>>
>> layers {
>>  name: "loss4"
>>  type:  EUCLIDEAN_LOSS
>>  loss_weight: 2.0
>>  bottom: "vvv"
>>  bottom: "pool2"
>>  top: "accloss4"
>> }
>>
>>
>> ?
>>
>> On 04.03.2016 at 16:23, Hiroshi Yamashita wrote:
>>
>>> Hi,
>>>
>>> I tried to make Value network.
>>>
>>> "Policy network + Value network"  vs  "Policy network" Winrate
>>> Wins/Games 70.7%322 / 455,1000 playouts/move 76.6%141 /
>>> 184,   1 playouts/move
>>>
>>> It seems more playouts, more Value network is effetctive. Games is
>>> not enough though. Search is similar to AlphaGo. Mixing parameter
>>> lambda is 0.5. Search is synchronous. Using one GTX 980. In 1
>>> playouts/move, Policy network is called 175 times, Value network is
>>> called 786 times. Node Expansion threshold is 33.
>>>
>>>
>>> Value network is 13 layers, 128 filters. (5x5_128, 3x3_128 x10,
>>> 1x1_1, fully connect, tanh) Policy network is 12 layers, 256
>>> filters. (5x5_256, 3x3_256 x10, 3x3_1), Accuracy is 50.1%
>>>
>>> For Value network, I collected 15804400 positions from 987775
>>> games. Games are from GoGoD, tygem 9d,  22477 games
>>> http://baduk.sourceforge.net/TygemAmateur.7z KGS 4d over, 1450946
>>> games http://www.u-go.net/gamerecords-4d/ (except handicaps
>>> games). And select 16 positions randomly from one game. One game is
>>> divided 16 game stage, and select one of each. 1st and 9th position
>>> are rotated in same symmetry. Then Aya searches with 500 playouts, with
>>> Policy network. And store winrate (-1 to +1). Komi is 7.5. This 500
>>> playouts is around 2730 BayesElo on CGOS.
>>>
>>> I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It took 2
>>> days, and costed $54. Spot instance is reasonable. However g2.2xlarge(GRID
>>> K520), is 3x slower than GTX 980. My Pocicy network(12L 256F) takes
>>> 5.37ms(GTX 980), and 15.0ms(g2.2xlarge). Test and Traing loss are 0.00923
>>> and 0.00778. I think there is no
>>> big overfitting.
>>>
>>> Value network is effective, but Aya has still fatal semeai
>>> weakness.
>>>
>>> Regards, Hiroshi Yamashita
>>>
>>

Re: [Computer-go] Value Network

2016-03-19 Thread Detlef Schmicker

What are you using for loss?

this:

layers {
  name: "loss4"
  type:  EUCLIDEAN_LOSS
  loss_weight: 2.0
  bottom: "vvv"
  bottom: "pool2"
  top: "accloss4"
}


?

On 04.03.2016 at 16:23, Hiroshi Yamashita wrote:
> Hi,
> 
> I tried to make Value network.
> 
> "Policy network + Value network"  vs  "Policy network" Winrate
> Wins/Games 70.7%322 / 455,1000 playouts/move 76.6%141 /
> 184,   1 playouts/move
> 
> It seems more playouts, more Value network is effetctive. Games is
> not enough though. Search is similar to AlphaGo. Mixing parameter
> lambda is 0.5. Search is synchronous. Using one GTX 980. In 1
> playouts/move, Policy network is called 175 times, Value network is
> called 786 times. Node Expansion threshold is 33.
> 
> 
> Value network is 13 layers, 128 filters. (5x5_128, 3x3_128 x10,
> 1x1_1, fully connect, tanh) Policy network is 12 layers, 256
> filters. (5x5_256, 3x3_256 x10, 3x3_1), Accuracy is 50.1%
> 
> For Value network, I collected 15804400 positions from 987775
> games. Games are from GoGoD, tygem 9d,  22477 games
> http://baduk.sourceforge.net/TygemAmateur.7z KGS 4d over, 1450946
> games http://www.u-go.net/gamerecords-4d/ (except handicaps
> games). And select 16 positions randomly from one game. One game is
> divided 16 game stage, and select one of each. 1st and 9th position
> are rotated in same symmetry. Then Aya searches with 500 playouts, 
> with Policy network. And store winrate (-1 to +1). Komi is 7.5. 
> This 500 playouts is around 2730 BayesElo on CGOS.
> 
> I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It took 
> 2 days, and costed $54. Spot instance is reasonable. However 
> g2.2xlarge(GRID K520), is 3x slower than GTX 980. My Pocicy 
> network(12L 256F) takes 5.37ms(GTX 980), and 15.0ms(g2.2xlarge). 
> Test and Traing loss are 0.00923 and 0.00778. I think there is no
> big overfitting.
> 
> Value network is effective, but Aya has still fatal semeai
> weakness.
> 
> Regards, Hiroshi Yamashita
> 

Re: [Computer-go] Value Network

2016-03-04 Thread Detlef Schmicker

Hi,

thanks a lot for sharing! I am trying a slightly different approach at the
moment:

I use a combined policy / value network (adding 3-5 layers with about
16 filters at the end of the policy network for the value head, to
avoid overfitting), and I use the results of the games as the value. My
main problem is still overfitting!
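
Schematically, that two-headed arrangement looks like the sketch below; the
layer stacks are passed in as placeholders, and only the structure (shared
trunk, small value head) is the point:

import numpy as np

def combined_forward(planes, trunk, policy_head, value_head):
    """One forward pass of a two-headed network: the convolutional trunk is
    shared, and only a small value head (a few layers, ~16 filters) is added
    on top of it, which keeps the number of value-only parameters small.
    trunk/policy_head/value_head are placeholders for your own layer stacks."""
    shared = trunk(planes)            # e.g. the policy network's conv layers
    move_probs = policy_head(shared)  # per-intersection move distribution
    winrate = value_head(shared)      # small stack -> single scalar per position
    return move_probs, winrate

# toy usage with stand-in heads
planes = np.zeros((1, 19, 19, 8))
probs, v = combined_forward(
    planes,
    trunk=lambda x: x,                                   # stand-in trunk
    policy_head=lambda x: np.full((1, 361), 1.0 / 361),  # uniform stand-in
    value_head=lambda x: np.zeros((1, 1)))               # stand-in value
print(probs.shape, v.shape)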

As your results seem good, I think I will try your bigger database to get
more games into training.

I will keep you posted

Detlef

On 04.03.2016 at 16:23, Hiroshi Yamashita wrote:
> Hi,
> 
> I tried to make Value network.
> 
> "Policy network + Value network"  vs  "Policy network" Winrate
> Wins/Games 70.7%322 / 455,1000 playouts/move 76.6%141 /
> 184,   1 playouts/move
> 
> It seems more playouts, more Value network is effetctive. Games is
> not enough though. Search is similar to AlphaGo. Mixing parameter
> lambda is 0.5. Search is synchronous. Using one GTX 980. In 1
> playouts/move, Policy network is called 175 times, Value network is
> called 786 times. Node Expansion threshold is 33.
> 
> 
> Value network is 13 layers, 128 filters. (5x5_128, 3x3_128 x10,
> 1x1_1, fully connect, tanh) Policy network is 12 layers, 256
> filters. (5x5_256, 3x3_256 x10, 3x3_1), Accuracy is 50.1%
> 
> For Value network, I collected 15804400 positions from 987775
> games. Games are from GoGoD, tygem 9d,  22477 games
> http://baduk.sourceforge.net/TygemAmateur.7z KGS 4d over, 1450946
> games http://www.u-go.net/gamerecords-4d/ (except handicaps
> games). And select 16 positions randomly from one game. One game is
> divided 16 game stage, and select one of each. 1st and 9th position
> are rotated in same symmetry. Then Aya searches with 500 playouts, 
> with Policy network. And store winrate (-1 to +1). Komi is 7.5. 
> This 500 playouts is around 2730 BayesElo on CGOS.
> 
> I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It took 
> 2 days, and costed $54. Spot instance is reasonable. However 
> g2.2xlarge(GRID K520), is 3x slower than GTX 980. My Pocicy 
> network(12L 256F) takes 5.37ms(GTX 980), and 15.0ms(g2.2xlarge). 
> Test and Traing loss are 0.00923 and 0.00778. I think there is no
> big overfitting.
> 
> Value network is effective, but Aya has still fatal semeai
> weakness.
> 
> Regards, Hiroshi Yamashita
> 

[Computer-go] Value Network

2016-03-04 Thread Hiroshi Yamashita

Hi,

I tried to make a Value network.

"Policy network + Value network"  vs  "Policy network"
Winrate  Wins/Games
70.7%    322 / 455,  1000 playouts/move
76.6%    141 / 184,  1 playouts/move

It seems that with more playouts, the Value network is more effective. The
number of games is not enough, though. The search is similar to AlphaGo.
The mixing parameter lambda is 0.5. The search is synchronous, using one
GTX 980. In 1 playouts/move, the Policy network is called 175 times and
the Value network is called 786 times. The node expansion threshold is 33.
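
The lambda mixing here corresponds to the AlphaGo-style leaf evaluation,
roughly as follows (a one-line sketch, assuming both estimates are winrates
in [-1, 1]):

def leaf_evaluation(value_net_winrate, rollout_winrate, mixing_lambda=0.5):
    """Mixed leaf value: lambda = 0.5 weights the value network estimate
    and the playout result equally."""
    return (1.0 - mixing_lambda) * value_net_winrate + mixing_lambda * rollout_winrate

print(leaf_evaluation(0.3, -0.2))   # 0.05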


The Value network is
 13 layers, 128 filters (5x5_128, 3x3_128 x10, 1x1_1, fully connected, tanh).
The Policy network is
 12 layers, 256 filters (5x5_256, 3x3_256 x10, 3x3_1); its accuracy is 50.1%.

For the Value network, I collected 15,804,400 positions from 987,775 games.
The games are from
 GoGoD,
 Tygem 9d, 22,477 games: http://baduk.sourceforge.net/TygemAmateur.7z
 KGS 4d and over, 1,450,946 games: http://www.u-go.net/gamerecords-4d/
 (handicap games excluded).
I select 16 positions randomly from each game: one game is divided into
16 game stages, and one position is selected from each stage. The 1st and
9th positions are rotated to the same symmetry. Then Aya searches each
position with 500 playouts, using the Policy network, and stores the
winrate (-1 to +1). Komi is 7.5.
This 500-playout search is around 2730 BayesElo on CGOS.
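
The stage-based sampling above can be sketched as follows (a minimal Python
illustration; the rotation of the 1st and 9th samples and the 500-playout
labelling are left out):

import random

def sample_positions(game_moves, n_stages=16, rng=random):
    """Split one game into n_stages equal segments and draw one position
    from each; returns a list of move indices."""
    total = len(game_moves)
    picks = []
    for stage in range(n_stages):
        lo = stage * total // n_stages
        hi = max(lo + 1, (stage + 1) * total // n_stages)
        picks.append(rng.randrange(lo, hi))
    return picks

print(sample_positions(list(range(230))))   # 16 indices, one per game stage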

I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It took
2 days and cost $54. Spot instances are reasonable. However, the
g2.2xlarge (GRID K520) is 3x slower than a GTX 980. My Policy
network (12L, 256F) takes 5.37 ms (GTX 980) and 15.0 ms (g2.2xlarge).
Test and training loss are 0.00923 and 0.00778, so I think there is
no big overfitting.

The Value network is effective, but Aya still has a fatal semeai weakness.

Regards,
Hiroshi Yamashita
