Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Brian Sheppard
IMO, training using only the moves of winners is obviously the practical choice. Worst case: you "waste" half of your data. But that is actually not a downside provided that you have lots of data, and as your program strengthens you will avoid potential data-quality problems. Asymptotically,

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Álvaro Begué
On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom wrote: > It makes the policy stronger because it makes it more deterministic. The > greedy policy is way stronger than the probability distribution. > I suspected this is what it was mainly about. Did you run any experiments to

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Rémi Coulom
It makes the policy stronger because it makes it more deterministic. The greedy policy is way stronger than the probability distribution. Rémi - Original Message - From: "Detlef Schmicker" To: computer-go@computer-go.org Sent: Sunday, 11 December 2016 11:38:08 Subject:
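The distinction Rémi draws between the greedy policy and sampling from the distribution can be illustrated with a minimal sketch (my own, not from any poster's code; the function name and NumPy usage are assumptions):

```python
import numpy as np

def select_move(policy_probs, greedy=True, rng=None):
    """Pick a move index from a policy network's output distribution.

    policy_probs: 1-D array of move probabilities summing to 1.
    greedy=True picks the argmax (deterministic; typically stronger
    as a stand-alone player), greedy=False samples stochastically
    (weaker, but gives move diversity, e.g. for self-play training).
    """
    policy_probs = np.asarray(policy_probs, dtype=float)
    if greedy:
        return int(np.argmax(policy_probs))
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(policy_probs), p=policy_probs))

# Toy distribution over 4 candidate moves
probs = [0.1, 0.6, 0.1, 0.2]
print(select_move(probs, greedy=True))   # -> 1 (always the top move)
print(select_move(probs, greedy=False))  # any of 0..3, weighted by probs
```

The greedy version never plays the 10%-probability moves, which is why it plays stronger head-to-head even though the underlying network is identical.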

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Erik van der Werf
On Sun, Dec 11, 2016 at 8:44 PM, Detlef Schmicker wrote: > Hi Erik, > > as far as I understood it, it was 250ELO in policy network alone ... Two problems: (1) it is a self-play result, (2) the policy was tested as a stand-alone player. A policy trained to win games will beat a

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Detlef Schmicker
Hi Erik, as far as I understood it, it was 250ELO in policy network alone ... section 2, Reinforcement Learning of Policy Networks: "We evaluated the performance of the RL policy network in game play, sampling each move (...) from its output probability distribution over actions." When

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Erik van der Werf
Detlef, I think your result makes sense. In games between near-equally strong players, the winning player's moves will not be much better than the losing player's moves. The game is typically decided by subtle mistakes. Even if nearly all my moves are perfect, just one blunder can throw away the game.

[Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Detlef Schmicker
I want to share some experience training my policy CNN. Since I wondered why reinforcement learning was so helpful, I trained on the GoGoD database using only the moves played by the winner of each game. Interestingly, the prediction rate of these moves was slightly higher (without training, just
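The winner-only data selection Detlef describes can be sketched as a simple filter over game records (a minimal illustration; the record format and function name are hypothetical, not taken from his pipeline):

```python
def winner_moves(moves, winner):
    """Keep only the training examples played by the winning color.

    moves:  list of (color, move) pairs in game order, color in {'B', 'W'}.
    winner: 'B' or 'W', the game's winner.
    Returns (position_index, move) pairs for the winner's moves only;
    the position index is kept so the matching board state can be rebuilt.
    """
    return [(i, mv) for i, (color, mv) in enumerate(moves) if color == winner]

# Toy game: Black wins, so only Black's moves become training targets
game = [('B', 'dd'), ('W', 'pp'), ('B', 'dp'), ('W', 'pd')]
print(winner_moves(game, 'B'))  # -> [(0, 'dd'), (2, 'dp')]
```

As Brian notes later in the thread, this discards roughly half the positions, which only matters if training data is scarce.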