Re: [Computer-go] NiceGoZero games during learning
I think it would be quite interesting to at least keep track of the win rate against the 4d version until then (although I recognize it will be zero for some time). Maybe once it wins one?

s.

On Nov 6, 2017 6:00 PM, "Detlef Schmicker" wrote:
> If I manage to beat this version (every learning cycle I play 10 games
> against it), then I will probably also measure the strength of the
> learned net, but I think this will take some weeks :)

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
Re: [Computer-go] NiceGoZero games during learning
Not in this weak state of the learned net. Right now I am measuring on CGOS with a net trained from 4d+ KGS games (NG-learn-ref).

This should be the line that Zero could beat after enough learning. If I manage to beat this version (every learning cycle I play 10 games against it), then I will probably also measure the strength of the learned net, but I think this will take some weeks :)

On 06.11.2017 at 17:05, uurtamo . wrote:
> Detlef,
>
> I'm not sure I understand your last sentence. Do you mean that eventually
> you'll put a subset of functioning nets on CGOS to measure how quickly
> their strength is improving?
>
> s.
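The check of 10 games per learning cycle against the reference net could be logged roughly as below, which also covers tracking a win rate that is still zero. This is a hedged Python sketch, not Detlef's tooling: `ReferenceMatchLog` and `wilson_interval` are hypothetical names, only the 10 games per cycle come from the post, and the Wilson score interval is a standard statistic swapped in here rather than anything described in the thread.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ReferenceMatchLog:
    """Hypothetical per-cycle log of wins against the reference net (NG-learn-ref)."""
    wins_per_cycle: List[int] = field(default_factory=list)
    games_per_cycle: int = 10  # 10 games per learning cycle, as in the post

    def record_cycle(self, wins: int) -> None:
        if not 0 <= wins <= self.games_per_cycle:
            raise ValueError("wins must be between 0 and games_per_cycle")
        self.wins_per_cycle.append(wins)

    def total(self) -> Tuple[int, int]:
        """Return (total wins, total games) over all recorded cycles."""
        wins = sum(self.wins_per_cycle)
        return wins, len(self.wins_per_cycle) * self.games_per_cycle

    def win_rate(self) -> float:
        wins, games = self.total()
        return wins / games if games else 0.0


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a win rate (z = 1.96)."""
    if games == 0:
        return 0.0, 1.0  # no data yet: any win rate is possible
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

With zero wins the point estimate stays at 0, but the upper bound of the interval keeps shrinking as cycles accumulate (roughly 0.28 after a single 10-game cycle), so there is something to track before the first win.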
Re: [Computer-go] NiceGoZero games during learning
Detlef,

I'm not sure I understand your last sentence. Do you mean that eventually you'll put a subset of functioning nets on CGOS to measure how quickly their strength is improving?

s.

On Nov 6, 2017 4:54 PM, "Detlef Schmicker" wrote:
> I will probably add some more games as I get them, and I will try to
> measure on CGOS how strong the bot is with exactly this (weak, broad
> search) configuration but with a net pretrained on 4d+ KGS games...
[Computer-go] NiceGoZero games during learning
I thought it might be fun to have some games from an early stage of learning, starting from nearly zero knowledge.

I did not turn off the (relatively weak) playouts; I mix their result with a 30% weight into the result from the value network. I started from a randomly initialized neural net (a small one, about 4 ms on a GTX 970) and use a relatively wide MC search (much, much wider than I use for good playing strength, unpruning about 5-6 moves) with 100 playouts per move, expanding a node every 3 playouts, thus 33 network evaluations per move.

Additionally, I add Gaussian random numbers with a standard deviation of 0.02 to the policy network output.

With this setup I play 1000 games and then do a reinforcement learning cycle on them. One cycle takes me about 5 hours.

For the first 2 days I did not archive games; then I noticed it might be fun to have games from the training history, so now I always archive one game per cycle.

Here are some games:

http://physik.de/games_during_learning/

I will probably add some more games as I get them, and I will try to measure on CGOS how strong the bot is with exactly this (weak, broad search) configuration but with a net pretrained on 4d+ KGS games...

Detlef
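The search settings described in this post can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NiceGo's actual code: the constant and function names are hypothetical, the 30% mix is assumed to be a linear blend of the two win estimates, and clipping negative noisy priors to zero before renormalizing is an assumption the post does not specify.

```python
import random
from typing import List, Optional

# Hypothetical constant names; the values are the ones given in the post.
PLAYOUT_WEIGHT = 0.3      # share of the playout result in the leaf value
POLICY_NOISE_STD = 0.02   # standard deviation of the Gaussian policy noise
PLAYOUTS_PER_MOVE = 100
EXPAND_EVERY = 3          # a leaf is expanded (one network call) every 3 playouts


def mixed_leaf_value(value_net: float, playout_result: float,
                     playout_weight: float = PLAYOUT_WEIGHT) -> float:
    """Assumed linear blend of the value-net estimate and the playout result."""
    return (1.0 - playout_weight) * value_net + playout_weight * playout_result


def noisy_policy(priors: List[float], std: float = POLICY_NOISE_STD,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Add N(0, std^2) noise to each prior; clipping and renormalizing are assumptions."""
    rng = rng or random.Random()
    noisy = [max(0.0, p + rng.gauss(0.0, std)) for p in priors]
    total = sum(noisy)
    if total == 0.0:
        return list(priors)  # degenerate case: keep the original priors
    return [p / total for p in noisy]


def network_evals_per_move(playouts: int = PLAYOUTS_PER_MOVE,
                           expand_every: int = EXPAND_EVERY) -> int:
    """One network evaluation per expansion: 100 // 3 == 33."""
    return playouts // expand_every


if __name__ == "__main__":
    print(network_evals_per_move())    # 33
    print(mixed_leaf_value(0.6, 1.0))  # blended leaf value
```

Integer division reproduces the post's arithmetic: 100 playouts with one expansion every 3 playouts gives 33 network evaluations per move. A noise level of 0.02 is small next to typical prior values, so it mainly reshuffles the low-probability moves that the wide search unprunes.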