The downside to training only on the moves of the winner is that you "throw 
away" half of the data. But that isn't so bad, because you can always get as 
much data as you need.

The upside is that losing positions do not necessarily carry a useful training 
signal. That is: there is no good move in a losing position. So you benefit 
from throwing away the portion of the moves that occur from the final 
game-losing move onward.
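
As a concrete illustration, here is a minimal sketch (in Python) of the 
winner-only filtering idea. The game-record format (a dict with "winner" and 
"moves") and the function name are assumptions made for this example, not the 
GoGoD pipeline anyone in this thread actually used.

def winner_move_examples(game):
    """Yield (history, move) training pairs for the eventual winner only.

    The "position" here is just the move history so far; a real pipeline
    would encode it as feature planes for the policy CNN.
    """
    history = []
    for color, move in game["moves"]:
        if color == game["winner"]:
            yield list(history), move      # keep only the winner's moves
        history.append((color, move))

# Example: in a game Black wins, only Black's moves become training targets.
game = {"winner": "B",
        "moves": [("B", "dd"), ("W", "pp"), ("B", "pd"), ("W", "dp")]}
for pos, mv in winner_move_examples(game):
    print(len(pos), mv)    # prints "0 dd" and "2 pd"

The same loop could also stop early to drop the loser's positions near the end 
of the game, as described above, if the game-losing move can be identified.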




-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hiroshi Yamashita
Sent: Tuesday, December 20, 2016 5:45 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by 
the winning player

Hi Detlef,

> Interestingly, the prediction rate of these moves was slightly higher 
> (without training, just taking the previously trained network) than

I have also started training using only the moves by the winner of each game, 
from GoGoD with 8 symmetries.
I also got accuracy up from 49.1% to 50.1%
(without training, Iteration 0, fine-tuning from a big learning rate). My 
learning is not finished, so I can't say how strong it is yet.

This result is interesting.
Are the loser's moves difficult for a DCNN? Or does a DCNN tend to learn only 
good moves?
I had thought that if a DCNN learns from KGS 6k moves, it can reproduce 6k 
moves. But is this not correct?

Thanks,
Hiroshi Yamashita

----- Original Message -----
From: "Detlef Schmicker" <d...@physik.de>
To: <computer-go@computer-go.org>
Sent: Sunday, December 11, 2016 7:38 PM
Subject: [Computer-go] Some experiences with CNN trained on moves by the 
winning player


> I want to share some experience training my policy CNN:
> 
> As I wondered why reinforcement learning was so helpful, I trained
> on the GoGoD database using only the moves by the winner of
> each game.
> 
> Interestingly, the prediction rate of these moves was slightly higher
> (without training, just taking the previously trained network) than
> taking into account the moves by both players (53% against 52%).
> 
> Training on the winning player's moves did not help a lot; I got a
> statistically significant improvement of about 20-30 ELO.
> 
> So I still don't understand why reinforcement learning should give around
> 100-200 ELO :)
> 
> Detlef

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
