Re: [Computer-go] Value network that doesn't want to learn.
> ... my value network was trained to tell me the game is balanced at the
> beginning... :-)

The best training policy is to select positions that correct errors. I used the policies below to train a backgammon NN. Together, they reduced the expected loss of the network by 50% (cut the error rate in half):

- Select training positions from the program's own games.
  - Can be self-play or versus an opponent. Best is to have a broad panel of opponents.
  - Beneficial to bootstrap with pro games, but then add ONLY training examples from the program's own games.

- Train only the moves made by the winner of the game.
  - Very important for deterministic games!
  - Note that the winner can be either your program or the opponent. If your program wins then training reinforces good behavior; if the opponent wins then training corrects bad behavior.

- Per game, you should aim to get only a few training examples (3 in backgammon; maybe 10 in Go?). Use two policies (a sketch follows at the end of this message):
  - Select positions where the static evaluation of a position is significantly different from a deep search.
  - Select positions where the move selected by a deep search did not have the highest static evaluation. (In this case you have two training positions, which differ by the move chosen.)
  - Of course, you are selecting examples where you did as badly as possible.

- The training value of the position is the result of a deep search.
  - This is equivalent to "temporal difference learning", but accelerated by the depth of the search.
  - Periodically refresh the training evaluations as your search/eval improves.

These policies actively seek out cases where your evaluation function has some weakness, so training is definitely focused on improving results in the distribution of positions that your program will actually face.

You will need about 30 training examples for every free parameter in your NN. You can do the math on how many games that will take. It is inevitable: you will train your NN based on blitz games.

Good luck!
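A rough sketch of the two selection policies above. Everything here is hypothetical: `search` (returns the chosen move and its deep value), `static_eval`, and the position objects with `play`/`legal_moves` are stand-ins, not any real engine's API.

def pick_training_examples(winner_positions, search, static_eval, gap=0.1):
    """Collect (position, deep value) pairs where the evaluator did badly.
    `gap` is the threshold for "significantly different"; tune per game."""
    examples = []
    for pos in winner_positions:            # winner's moves only, per above
        best_move, deep_value = search(pos)
        # Policy 1: static eval disagrees sharply with the deep search.
        if abs(static_eval(pos) - deep_value) > gap:
            examples.append((pos, deep_value))
        # Policy 2: the searched move is not the statically best move.
        static_best = max(pos.legal_moves(),
                          key=lambda m: static_eval(pos.play(m)))
        if static_best != best_move:
            # Two training positions that differ only by the move chosen.
            for move in (best_move, static_best):
                child = pos.play(move)
                examples.append((child, search(child)[1]))
    return examples

On the sizing rule: a value net with, say, 5 million free parameters would need about 150 million examples at 30 per parameter; at roughly 10 examples per game, that is on the order of 15 million games, which is why blitz games are unavoidable.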
Re: [Computer-go] Value network that doesn't want to learn.
Finally found the problem. In the end, it was as stupid as expected: when I pick a game for batch creation, I select a limited number of moves from that game at random. For the value network I use something like 8-16 moves to avoid overfitting the data (I can't take just 1, or the I/O operations slow down the training), and for the other networks I simply take all the moves. Or at least that is what I thought my code was doing. Instead of picking N random moves from the game, it was picking the first N moves in a random order. So... my value network was trained to tell me the game is balanced at the beginning... (see the snippet after this message).

On 20-Jun-17 at 5:48 AM, Gian-Carlo Pascutto wrote:
> On 19/06/2017 21:31, Vincent Richard wrote:
>> - The data is then analyzed by a script which extracts all kinds of
>> features from games. When I'm training a network, I load the features
>> I want from this analysis to build the batch. I have 2 possible
>> methods for batch construction. I can either add moves one after the
>> other (the fast mode) or pick random moves among different games
>> (slower but reduces the variance).
>
> You absolutely need the latter, especially as for outcome prediction
> the moves from the same game are not independent samples.
>
>> During some of the tests, all the networks I was training had the same
>> layers except for the last. So as you suggested, I was also wondering
>> if this last layer wasn't the problem. Yet, I haven't found any error.
>> ... However, if I feed a stupid value as target output (for example
>> black always wins) it has no trouble learning.
>
> A problem with side to move/won side marking in the input or feature
> planes, or with the expected outcome (0 vs 1 vs -1)?
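For concreteness, the difference in one runnable snippet (stand-in data; this is my reconstruction of the bug, not Vincent's actual code):

import random

moves = list(range(200))            # stand-in for one game's 200 positions

# The bug: a random permutation of the FIRST 16 moves, so every training
# position comes from the opening, where the game really is balanced.
buggy = random.sample(moves[:16], 16)

# The intent: 16 positions drawn uniformly from the whole game.
fixed = random.sample(moves, 16)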
Re: [Computer-go] Value network that doesn't want to learn.
This is what I have been thinking about, yet I am unable to find an error. Currently, I'm working with:

- SGF database: fuseki info Tygem -> http://tygem.fuseki.info/index.php (until recently I was working with games of all levels from KGS).
- The data is then analyzed by a script which extracts all kinds of features from games. When I'm training a network, I load the features I want from this analysis to build the batch. I have 2 possible methods for batch construction: I can either add moves one after the other (the fast mode) or pick random moves among different games (slower, but reduces the variance; see the sketch after this message). I set the batch size according to my GPU memory (200 moves in the case of a full-sized value/policy network). I don't think the problem comes from here, since the data is the same for all the networks.
- For the input, I'm using the same architecture as https://github.com/TheDuck314/go-NN (I have tried many kinds of shapes, from minimalist to AlphaGo).
- For the network, I'm once again using TheDuck314's network (EvalModels.Conv11PosDepFC1ELU) with the same layers, https://github.com/TheDuck314/go-NN/blob/master/engine/Layers.py, and the learning rate he recommends.

During some of the tests, all the networks I was training had the same layers except for the last. So as you suggested, I was also wondering if this last layer wasn't the problem. Yet, I haven't found any error.

On 20-Jun-17 at 3:19 AM, Gian-Carlo Pascutto wrote:
> 1) What is the training data for the value network? How big is it, how
> is it presented/shuffled/prepared?
>
> 2) What is the *exact* structure of the network and training setup?
>
> My best guess would be an error in the construction of the final layers.
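The second batch-construction method Vincent describes might look like the following sketch (assuming games already decoded into in-memory lists of positions; names are illustrative):

import random

def random_batch(games, batch_size=200):
    """Draw each batch element from a randomly chosen game, so positions
    (and their outcome labels) within a batch are nearly independent.
    Slower than reading games sequentially, but lower-variance gradients."""
    batch = []
    for _ in range(batch_size):
        game = random.choice(games)        # games: list of per-game position lists
        batch.append(random.choice(game))  # one random position from that game
    return batch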
Re: [Computer-go] Value network that doesn't want to learn.
On 19-06-17 17:38, Vincent Richard wrote:
> During my research, I've trained a lot of different networks, first on
> 9x9 then on 19x19, and as far as I remember all the nets I've worked
> with learned quickly (especially during the first batches), except the
> value net, which has always been problematic (diverges easily, doesn't
> learn quickly, ...). I have been stuck on the 19x19 value network for a
> couple of months now. I've tried countless inputs (feature planes) and
> lots of different models, even using the exact same code as others.
> Yet, whatever I try, the loss value doesn't move an inch and accuracy
> stays at 50% (even after days of training). I've tried changing the
> learning rate (increase/decrease); it doesn't change anything. However,
> if I feed a stupid value as target output (for example black always
> wins) it has no trouble learning.
> It is even more frustrating that training any other kind of network
> (predicting next move, territory, ...) goes smoothly and fast.
>
> Has anyone experienced a similar problem with value networks, or has an
> idea of the cause?

1) What is the training data for the value network? How big is it, how is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.

--
GCP
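For reference, a conventional value-net tail looks like the sketch below (Keras-style and purely illustrative; it mirrors the 1x1 conv -> FC 256 -> FC 1 -> tanh stack Hiroshi posts elsewhere in this digest, not Vincent's actual code):

import tensorflow as tf

# Final layers of a value net: collapse the trunk to one plane, then map
# to a single win expectation in [-1, 1].
value_head = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, 1, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="tanh"),
])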
[Computer-go] Value network that doesn't want to learn.
Hello everyone,

For my master's thesis, I have built an AI that takes a strategic approach to the game. It doesn't play, but simply describes the strategy behind every possible move in a given position ("enclosing this group", "making life for this group", "saving these stones", etc.). My main idea is that, once associated with a playing AI, I will be able to generate comments on a position (and then teach people). So for my final experiment, I'm trying to build a playing AI. I don't need it to be highly competitive, just decent (1d or so), so I thought about using a policy network, a value network, and a simple MCTS. The MCTS works fine, the policy network learns quickly and is accurate, but the value network never seems to learn, even slightly.

During my research, I've trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I've worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I've tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet, whatever I try, the loss value doesn't move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example black always wins), it has no trouble learning.

It is even more frustrating that training any other kind of network (predicting next move, territory, ...) goes smoothly and fast.

Has anyone experienced a similar problem with value networks, or has an idea of the cause?

Thank you
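Vincent's "stupid value" experiment is a useful pipeline sanity check in its own right; in miniature, with a toy net (all names, sizes, and data here are mine, not his setup):

import numpy as np
import tensorflow as tf

# Train a tiny value net on a constant target ("black always wins").
# If the loss collapses here but not on real labels, the optimizer and
# architecture are fine and the data/label pipeline is the suspect.
features = np.random.rand(512, 19, 19, 8).astype(np.float32)  # dummy planes
fake_labels = np.ones((512, 1), dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(19, 19, 8)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="tanh"),
])
model.compile(optimizer="sgd", loss="mse")
model.fit(features, fake_labels, batch_size=64, epochs=2)  # loss should drop fast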
Re: [Computer-go] Value Network
Thanks Aja,

Maybe you have read it already, but I found Lee Sedol's comment about you.

- Playing with Go AI, "I'm proud", Lee 9dan's mother said
http://japanese.yonhapnews.co.kr/society/2016/03/16/08AJP20160316004200882.HTML

Lee 9dan commented at Gimpo Airport. He admired Dr. Aja Huang, who played AlphaGo's moves on the board: "While playing, Mr. Huang never changed his expression, and never went to the restroom. Mr. Huang was one of those who made AlphaGo, and is really admirable." Mr. Huang kept a straight face while playing, even when AlphaGo played an unexpected move, and he got attention for it. The expressionless Mr. Huang has a presence; net users said he is like AlphaGo itself.

Sedol 9dan's wife introduced an episode. When she ordered breakfast in the hotel and went to wake up Lee 9dan, surprisingly, he talked in his sleep: "I'll eat with Mr. Huang." Lee 9dan laughed: "I could not remember the dream. Maybe the impression was deep." Thankfully, he kept his manners.

Thanks,
Hiroshi Yamashita

----- Original Message -----
From: "Aja Huang" <ajahu...@google.com>
To: <computer-go@computer-go.org>
Sent: Saturday, March 19, 2016 10:25 PM
Subject: Re: [Computer-go] Value Network

Good stuff, Hiroshi. Looks like I don't need to answer the questions regarding value network. :)

Aja
Re: [Computer-go] Value Network
layers {
  name: "conv7_3x3_128"
  type: CONVOLUTION
  blobs_lr: 1.
  blobs_lr: 2.
  bottom: "conv6"
  top: "conv7"
  convolution_param {
    num_output: 128
    kernel_size: 3
    pad: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "relu7" type: RELU bottom: "conv7" top: "conv7" }

layers {
  name: "conv8_3x3_128"
  type: CONVOLUTION
  blobs_lr: 1.
  blobs_lr: 2.
  bottom: "conv7"
  top: "conv8"
  convolution_param {
    num_output: 128
    kernel_size: 3
    pad: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "relu8" type: RELU bottom: "conv8" top: "conv8" }

layers {
  name: "conv9_3x3_128"
  type: CONVOLUTION
  blobs_lr: 1.
  blobs_lr: 2.
  bottom: "conv8"
  top: "conv9"
  convolution_param {
    num_output: 128
    kernel_size: 3
    pad: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "relu9" type: RELU bottom: "conv9" top: "conv9" }

layers {
  name: "conv10_3x3_128"
  type: CONVOLUTION
  blobs_lr: 1.
  blobs_lr: 2.
  bottom: "conv9"
  top: "conv10"
  convolution_param {
    num_output: 128
    kernel_size: 3
    pad: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "relu10" type: RELU bottom: "conv10" top: "conv10" }

layers {
  name: "conv11_3x3_128"
  type: CONVOLUTION
  blobs_lr: 1.
  blobs_lr: 2.
  bottom: "conv10"
  top: "conv11"
  convolution_param {
    num_output: 128
    kernel_size: 3
    pad: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "relu11" type: RELU bottom: "conv11" top: "conv11" }

layers {
  name: "conv12_1x1_1"
  type: CONVOLUTION
  blobs_lr: 1.
  blobs_lr: 2.
  bottom: "conv11"
  top: "conv12"
  convolution_param {
    num_output: 1
    kernel_size: 1
    pad: 0
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "relu12" type: RELU bottom: "conv12" top: "conv12" }

layers {
  name: "fc13"
  type: INNER_PRODUCT
  bottom: "conv12"
  top: "fc13"
  inner_product_param {
    num_output: 256
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "relu13" type: RELU bottom: "fc13" top: "fc13" }

layers {
  name: "fc14"
  type: INNER_PRODUCT
  bottom: "fc13"
  top: "fc14"
  inner_product_param {
    num_output: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers { name: "tanh14" type: TANH bottom: "fc14" top: "fc14" }

layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "fc14"
  bottom: "label"
  top: "loss"
}

Thanks,
Hiroshi Yamashita

----- Original Message -----
From: "Detlef Schmicker" <d...@physik.de>
To: <computer-go@computer-go.org>
Sent: Saturday, March 19, 2016 7:41 PM
Subject: Re: [Computer-go] Value Network

What are you using for loss?

this:

layers {
  name: "loss4"
  type: EUCLIDEAN_LOSS
  loss_weight: 2.0
  bottom: "vvv"
  bottom: "pool2"
  top: "accloss4"
}

?
Re: [Computer-go] Value Network
What are you using for loss?

this:

layers {
  name: "loss4"
  type: EUCLIDEAN_LOSS
  loss_weight: 2.0
  bottom: "vvv"
  bottom: "pool2"
  top: "accloss4"
}

?
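For readers unfamiliar with Caffe, EUCLIDEAN_LOSS is squared error scaled by half the batch size; a NumPy equivalent (my paraphrase of Caffe's documented definition, not code from this thread):

import numpy as np

def euclidean_loss(pred, label):
    # Caffe's EuclideanLoss: (1 / 2N) * sum((pred - label)^2), N = batch size
    n = pred.shape[0]
    return np.sum((pred - label) ** 2) / (2.0 * n)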
Re: [Computer-go] Value Network
Hi,

thanks a lot for sharing!

I am trying a slightly different approach at the moment: I use a combined policy/value network (adding 3-5 layers with about 16 filters at the end of the policy network for the value part, to avoid overfitting), and I use the results of the games as the value. My main problem is still overfitting!

As your results seem good, I think I will try your bigger database to get more games into training.

I will keep you posted,

Detlef
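Detlef's combined net could be sketched like this (Keras-style; the shared trunk plus a small value head follows his description, while the input planes, trunk depth, and optimizer are my assumptions):

import tensorflow as tf

inputs = tf.keras.Input(shape=(19, 19, 8))        # feature planes (assumed 8)
x = inputs
for _ in range(10):                               # shared policy trunk
    x = tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu")(x)

# Policy head: one plane, softmax over the 361 intersections.
p = tf.keras.layers.Conv2D(1, 1)(x)
policy = tf.keras.layers.Softmax(name="policy")(tf.keras.layers.Flatten()(p))

# Value head: 3 small layers of 16 filters on top of the trunk, as Detlef
# describes, trained on the game result.
v = x
for _ in range(3):
    v = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(v)
value = tf.keras.layers.Dense(1, activation="tanh", name="value")(
    tf.keras.layers.Flatten()(v))

model = tf.keras.Model(inputs, [policy, value])
model.compile(optimizer="sgd",
              loss={"policy": "categorical_crossentropy", "value": "mse"})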
[Computer-go] Value Network
Hi,

I tried to make a Value network.

"Policy network + Value network" vs "Policy network":

  Winrate   Wins/Games
  70.7%     322 / 455    (1000 playouts/move)
  76.6%     141 / 184    (10000 playouts/move)

It seems the more playouts, the more effective the Value network is. The number of games is not enough, though. Search is similar to AlphaGo. Mixing parameter lambda is 0.5. Search is synchronous, using one GTX 980. At 10000 playouts/move, the Policy network is called 175 times and the Value network 786 times. Node expansion threshold is 33.

The Value network is 13 layers, 128 filters (5x5_128, 3x3_128 x10, 1x1_1, fully connected, tanh). The Policy network is 12 layers, 256 filters (5x5_256, 3x3_256 x10, 3x3_1); accuracy is 50.1%.

For the Value network, I collected 15,804,400 positions from 987,775 games. Games are from GoGoD, tygem 9d (22,477 games, http://baduk.sourceforge.net/TygemAmateur.7z), and KGS 4d and over (1,450,946 games, http://www.u-go.net/gamerecords-4d/, except handicap games). I select 16 positions randomly from each game: a game is divided into 16 stages, and one position is selected from each. The 1st and 9th positions are rotated with the same symmetry. Then Aya searches with 500 playouts, with the Policy network, and stores the winrate (-1 to +1). Komi is 7.5. This 500-playout search is around 2730 BayesElo on CGOS.

I did some of this on Amazon EC2 g2.2xlarge, 11 instances. It took 2 days and cost $54; spot instances are reasonable. However, g2.2xlarge (GRID K520) is 3x slower than a GTX 980: my Policy network (12L 256F) takes 5.37 ms on a GTX 980 and 15.0 ms on g2.2xlarge. Test and training loss are 0.00923 and 0.00778, so I think there is no big overfitting.

The Value network is effective, but Aya still has a fatal semeai weakness.

Regards,
Hiroshi Yamashita
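Hiroshi's per-game sampling (16 stages, one random position from each) might be implemented like this sketch (my reconstruction of the description above, not his actual code):

import random

def sample_positions(num_moves, n_stages=16):
    """Split a game of `num_moves` positions into 16 stages and draw one
    random move index from each, so all phases of the game are covered."""
    picks = []
    for s in range(n_stages):
        lo = s * num_moves // n_stages
        hi = max(lo + 1, (s + 1) * num_moves // n_stages)
        picks.append(random.randrange(lo, hi))
    return picks

# Example: a 240-move game yields one index from each 15-move stage.
print(sample_positions(240))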