The downside of training on only the winner's moves is that you "throw away" half of the data. But that isn't so bad, because you can always generate as much data as you need.
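As a minimal sketch of the winner-only filtering being discussed: given a game record and its result, keep only the positions where the eventual winner is to move. The record layout and field names below are illustrative, not from any specific library.

```python
# Hypothetical sketch of winner-only training data. A "game" here is a
# list of (player, position, move) tuples plus the game's winner; real
# pipelines would read SGF records instead.

def winner_moves(moves, winner):
    """Keep only (position, move) training pairs played by the winner.

    moves  -- list of (player, position, move), player in {"B", "W"}
    winner -- "B" or "W"
    """
    return [(position, move)
            for player, position, move in moves
            if player == winner]

# Toy 4-move game won by Black: half of the examples are discarded.
game = [("B", "pos0", "dd"), ("W", "pos1", "pp"),
        ("B", "pos2", "dp"), ("W", "pos3", "pd")]
training_examples = winner_moves(game, winner="B")
```

Truncating the loser's moves "from the last game-losing move onward" would need an evaluation of where the game was actually lost, which this sketch does not attempt.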
The upside is that losing positions do not necessarily carry a training signal. That is: there is no good move in a losing position. So you also benefit by throwing away the moves from the last game-losing move onward.

-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Hiroshi Yamashita
Sent: Tuesday, December 20, 2016 5:45 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

Hi Detlef,

> Interestingly the prediction rate of this moves was slightly higher
> (without training, just taking the previously trained network) than

I have also started training using only the moves by the winner of each game, from GoGoD with 8 symmetries. I also got accuracy up, from 49.1% to 50.1% (without training, iteration 0, fine-tuning from a big learning rate). My training is not finished, so I can't say how strong it is yet.

This result is interesting. Are the loser's moves difficult for a DCNN? Or does a DCNN tend to learn only good moves? I had thought that if a DCNN learns KGS 6k moves, it can reproduce 6k moves. But is this not correct?

Thanks,
Hiroshi Yamashita

----- Original Message -----
From: "Detlef Schmicker" <d...@physik.de>
To: <computer-go@computer-go.org>
Sent: Sunday, December 11, 2016 7:38 PM
Subject: [Computer-go] Some experiences with CNN trained on moves by the winning player

> I want to share some experience training my policy CNN:
>
> As I wondered why reinforcement learning was so helpful, I trained
> on the GoGoD database using only the moves by the winner of each
> game.
>
> Interestingly, the prediction rate on these moves was slightly higher
> (without training, just taking the previously trained network) than
> when taking into account the moves by both players (53% against 52%).
>
> Training on winning-player moves did not help a lot; I got a
> statistically significant improvement of about 20-30 ELO.
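The "8 symmetries" Hiroshi mentions are the standard Go data augmentation: four rotations times two reflections of the board. A minimal numpy-based sketch (board representation is an assumption; move labels would need the same transform applied):

```python
# Hedged sketch of the 8 dihedral board symmetries used as data
# augmentation. The board is assumed to be a square 2-D numpy array.
import numpy as np

def eight_symmetries(board):
    """Return the 8 symmetries of a square board: 4 rotations,
    each with and without a left-right reflection."""
    symmetries = []
    for k in range(4):
        rotated = np.rot90(board, k)
        symmetries.append(rotated)
        symmetries.append(np.fliplr(rotated))
    return symmetries

# An asymmetric toy board yields 8 distinct variants.
board = np.arange(9).reshape(3, 3)
variants = eight_symmetries(board)
```

In a real pipeline the move label (the training target) must be rotated and reflected with the same transform as its position, or the augmented examples become inconsistent.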
> So I still don't understand why reinforcement learning should gain
> around 100-200 ELO :)
>
> Detlef

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go