Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

2016-02-23 Thread David Fotland
> to be dynamic,
> favoring winrate the further the game progresses.
> 
> 
> As a general example for the idea, let's assume we have the following
> potential moves generated by our program:
> 
> #1: Winrate 55%, +5 expected final points
> #2: Winrate 53%, +15 expected final points
> 
> Is the move with higher winrate always better? Or would there be some
> benefit to choosing #2? Would this differ depending on how far along the
> game is?
> 
> If we knew the winrate prediction to be perfect, then going by that
> alone would probably result in the best overall performance. But given
> some uncertainty there, expected value could be interesting.
> 
> 
> Any takers for some experiments?
> 
> 
> -Michael

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search

2016-02-23 Thread Petri Pitkanen
Whether this method is brute force is open to interpretation. I think it is: it
uses huge amounts of CPU power to run simulations and evaluate NNs. Even in
chess it was not just about tree search; it needed an evaluation function to
make sense of the search.

2016-02-24 6:52 GMT+02:00 muupan :

> Congratulations, people at DeepMind! Your paper is very interesting to
> read.
>
> I have a question about the paper. Regarding the policy network training, it says
>
> > On the first pass through the training pipeline, the baseline was set to
> zero; on the second pass we used the value network vθ(s) as a baseline;
>
> but I cannot find any other description of the "second pass". What is
> it? It uses vθ(s), so at least it must come after vθ(s) has been trained. Is it
> that, after completing the whole training pipeline depicted in Fig. 1, only the
> RL policy network training part is repeated? Or is training of vθ(s) also
> repeated? Is the second pass the last pass, or are there more passes? Sorry
> if I just missed the relevant part of the paper.
>
>
> 2016-02-13 12:21 GMT+09:00 John Tromp :
>
>> On Wed, Jan 27, 2016 at 1:46 PM, Aja Huang  wrote:
>> > We are very excited to announce that our Go program, AlphaGo, has beaten a
>> > professional player for the first time. AlphaGo beat the European champion
>> > Fan Hui by 5 games to 0.
>>
>> It's interesting to go back nearly a decade and read this 2007 article:
>>
>> http://spectrum.ieee.org/computing/software/cracking-go
>>
>> where Feng-Hsiung Hsu, Deep Blue's lead developer, made this prediction:
>>
>> "Nevertheless, I believe that a world-champion-level Go machine can be
>> built within 10 years"
>>
>> Which now appears to be spot on. March 9 cannot come soon enough...
>> The remainder of his prediction rings less true though:
>>
>> ", based on the same method of intensive analysis—brute force,
>> basically—that Deep Blue employed for chess".
>>
>> regards,
>> -John
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search

2016-02-23 Thread muupan
Congratulations, people at DeepMind! Your paper is very interesting to read.

I have a question about the paper. Regarding the policy network training, it says

> On the first pass through the training pipeline, the baseline was set to
zero; on the second pass we used the value network vθ(s) as a baseline;

but I cannot find any other description of the "second pass". What is
it? It uses vθ(s), so at least it must come after vθ(s) has been trained. Is it
that, after completing the whole training pipeline depicted in Fig. 1, only the
RL policy network training part is repeated? Or is training of vθ(s) also
repeated? Is the second pass the last pass, or are there more passes? Sorry
if I just missed the relevant part of the paper.
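As a rough illustration of what a baseline does in this kind of training (this
is not code from the paper; policy and value_net are assumed PyTorch modules,
states are assumed tensors, and the reward is +1/-1 for the game outcome), a
REINFORCE-style update with an optional value-network baseline might look like:

import torch

def policy_gradient_update(policy, optimizer, states, actions, outcome,
                           value_net=None):
    # One REINFORCE update over the moves of a finished self-play game.
    # outcome: +1 for a win, -1 for a loss, from the learner's perspective.
    # value_net=None reproduces the "first pass" (baseline = 0); passing a
    # trained value network reproduces the "second pass" baseline v_theta(s).
    optimizer.zero_grad()
    loss = 0.0
    for s, a in zip(states, actions):
        log_prob = torch.log_softmax(policy(s), dim=-1)[a]
        baseline = value_net(s).detach() if value_net is not None else 0.0
        advantage = outcome - baseline      # baseline only reduces variance
        loss = loss - advantage * log_prob  # ascend the expected reward
    loss.backward()
    optimizer.step()

The baseline does not change the expected gradient, only its variance, which
may be why it can be introduced on a second pass once vθ(s) exists.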


2016-02-13 12:21 GMT+09:00 John Tromp :

> On Wed, Jan 27, 2016 at 1:46 PM, Aja Huang  wrote:
> > We are very excited to announce that our Go program, AlphaGo, has beaten a
> > professional player for the first time. AlphaGo beat the European champion
> > Fan Hui by 5 games to 0.
>
> It's interesting to go back nearly a decade and read this 2007 article:
>
> http://spectrum.ieee.org/computing/software/cracking-go
>
> where Feng-Hsiung Hsu, Deep Blue's lead developer, made this prediction:
>
> "Nevertheless, I believe that a world-champion-level Go machine can be
> built within 10 years"
>
> Which now appears to be spot on. March 9 cannot come soon enough...
> The remainder of his prediction rings less true though:
>
> ", based on the same method of intensive analysis—brute force,
> basically—that Deep Blue employed for chess".
>
> regards,
> -John
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

2016-02-23 Thread dave.de...@planet.nl
If you accumulate the end scores of playout results, you can make a histogram by 
plotting the frequency f(s) of each score s. The winrate is then the sum of f(s) 
over all s > 0, divided by the sum of f(s) over all s. The average score is 
sum(s * f(s)) / sum(f(s)), both sums taken over all s. 
When the distribution can be approximated by a normal distribution, it may not 
matter much whether you choose to maximize winrate or average score. 
But in general, the distribution could be multimodal (in fact I 
think it always is, unless you solved the game à la Conway). In that case, the 
average score may not be a very reliable representation of the situation. For 
example, a 99% chance of losing by 0.5 points combined with a 1% chance 
of winning by 100 points gives the impression that you are winning by about 0.5 
points (which would be the average score), while in reality you have only a 1% 
chance of winning (which would be the winrate).
Dave de Vos
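A minimal sketch of this histogram bookkeeping in plain Python (the helper name
is made up; the 99%/1% numbers below are Dave's example):

from collections import Counter

def stats_from_playouts(scores):
    # f(s) = frequency of end score s over the playouts.
    f = Counter(scores)
    total = sum(f.values())
    winrate = sum(c for s, c in f.items() if s > 0) / total  # sum_{s>0} f(s) / sum f(s)
    average = sum(s * c for s, c in f.items()) / total       # sum s*f(s) / sum f(s)
    return winrate, average

# Dave's multimodal example: 99% lose by 0.5 points, 1% win by 100 points.
scores = [-0.5] * 99 + [100.0]
print(stats_from_playouts(scores))  # (0.01, 0.505): ahead on average, 1% winrate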
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

2016-02-23 Thread Erik van der Werf
On Tue, Feb 23, 2016 at 4:41 PM, Justin .Gilmer  wrote:

> I made a similar attempt as Alvaro to predict final ownership. You can
> find the code here: https://github.com/jmgilmer/GoCNN/. It's trained to
> predict final ownership for about 15000 professional games which were
> played until the end (didn't end in resignation). It gets about 80.5%
> accuracy on a held out test set, although the accuracy greatly varies based
> on how far through the game you are. Can't say how well it would work in a
> go player.
>

At the risk of sounding like a broken record: that result (~80%) seems
similar to what I got many years ago when excluding life-and-death knowledge.
When life & death knowledge is included (which can also be learned from
examples and/or self-play) then the accuracy should approach 100% for final
positions. For more information see chapter 10 of my thesis (
http://erikvanderwerf.tengen.nl/pubdown/thesis_erikvanderwerf.pdf).

Erik
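For readers who want to try this themselves, here is a minimal sketch of a
per-intersection ownership net in the spirit of the GoCNN experiment (the input
planes, width, and depth below are assumptions for illustration, not the actual
GoCNN architecture):

import torch
import torch.nn as nn

class OwnershipNet(nn.Module):
    # Predicts one ownership logit per intersection of a 19x19 board.
    def __init__(self, in_planes=3, width=64, layers=5):
        super().__init__()
        blocks = [nn.Conv2d(in_planes, width, 3, padding=1), nn.ReLU()]
        for _ in range(layers - 1):
            blocks += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        blocks += [nn.Conv2d(width, 1, 1)]      # one logit per intersection
        self.net = nn.Sequential(*blocks)

    def forward(self, boards):                  # boards: (N, in_planes, 19, 19)
        return self.net(boards).squeeze(1)      # (N, 19, 19) ownership logits

# The training target is the final ownership of each point (1 = current
# player, 0 = opponent), so per-point binary cross-entropy is the natural loss.
model = OwnershipNet()
loss_fn = nn.BCEWithLogitsLoss()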
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

2016-02-23 Thread Justin .Gilmer
I made a similar attempt as Alvaro to predict final ownership. You can
find the code here: https://github.com/jmgilmer/GoCNN/. It's trained to
predict final ownership for about 15000 professional games which were
played until the end (didn't end in resignation). It gets about 80.5%
accuracy on a held out test set, although the accuracy greatly varies based
on how far through the game you are. Can't say how well it would work in a
go player.

Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

2016-02-23 Thread Robert Jasiek

On 23.02.2016 11:36, Michael Markefka wrote:

whether one could train a DCNN for expected territory


First, some definition of territory must be chosen or stated. Second, 
you must decide if territory according to this definition can be 
determined by a neural net meaningfully at all. Third, if yes, do it.


Note that there are very different definitions of territory. The most 
suitable definition for positional judgement (see Positional Judgement 1 
- Territory) is sophisticated and requires a combination of expert rules 
(specifying what to determine, and how to read to determine it) and 
reading.


A weak definition could predict whether a particular intersection will 
be territory in the final scoring position. Such a prediction can be fast for MC 
or NN, and maybe it is good enough as a very rough approximation for 
programs. For humans it is very bad, because it neglects the different 
degrees of safety of (potential) territory and the strategic concepts of 
sacrifice and exchange.
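For this weak definition, the MC version is just frequency counting over
playouts. A minimal sketch, assuming a run_playout helper (hypothetical) that
plays the position to the end and returns final per-point ownership as a 19x19
array of +1 (Black) and -1 (White):

import numpy as np

def mc_ownership(position, run_playout, n_playouts=1000):
    # Fraction of playouts in which each intersection ends up Black.
    counts = np.zeros((19, 19))
    for _ in range(n_playouts):
        counts += (run_playout(position) > 0)   # 1 where Black owns the point
    return counts / n_playouts                  # per-point estimate in [0, 1]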


I have also suggested other definitions, but IMO they are less 
attractive for NN.


--
robert jasiek
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

2016-02-23 Thread Álvaro Begué
I have experimented with a CNN that predicts ownership, but I found it to
be too weak to be useful. The main difference between what Google did and
what I did is in the dataset used for training: I had tens of thousands of
games (I did several different experiments) and I used all the positions
from each game (which is known to be problematic); they used 30M positions
from independent games. I expect you can learn a lot about ownership and
expected number of points from a dataset like that. Unfortunately,
generating such a dataset is infeasible with the resources most of us have.

Here's an idea: Google could make the dataset publicly available for
download, ideally with the final configurations of the board as well. There
is a tradition of making interesting datasets for machine learning
available, so I have some hope this may happen.

The one experiment I would like to make along the lines of your post is to
train a CNN to compute both the expected number of points and its standard
deviation. If you assume the distribution of scores is well approximated by
a normal distribution, maximizing winning probability can be achieved by
maximizing (expected score) / (standard deviation of the score). I wonder
if that results in stronger or more natural play than making a direct model
for winning probability, because you get to learn more about each position.

Álvaro.
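To make the equivalence in Álvaro's last paragraph concrete: if the score is
modeled as Normal(mu, sigma), then P(win) = P(score > 0) = Phi(mu / sigma),
which is monotone in mu / sigma, so maximizing that ratio maximizes the winning
probability. A small self-contained check (the numbers are illustrative):

import math

def win_probability(mu, sigma):
    # P(score > 0) for score ~ Normal(mu, sigma): Phi(mu / sigma).
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

print(win_probability(5.0, 10.0))   # ~0.69
print(win_probability(15.0, 40.0))  # ~0.65: bigger lead, but more variance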



On Tue, Feb 23, 2016 at 5:36 AM, Michael Markefka <
michael.marke...@gmail.com> wrote:

> Hello everyone,
>
> in the wake of AlphaGo using a DCNN to predict expected winrate of a
> move, I've been wondering whether one could train a DCNN for expected
> territory or points successfully enough to be of some use (leaving the
> issue of win by resignation for a more in-depth discussion). And,
> whether winrate and expected territory (or points) always run in
> parallel or whether there are diverging moments.
>
> Computer Go programs play what are considered slack or slow moves when
> ahead, sometimes being too conservative and giving away too much of
> their potential advantage. If expected points and expected winrate
> diverge, this could be a way to make the programs play in a more
> natural way, even if there were no strength increase to be gained.
> Then again there might be a parameter configuration that might yield
> some advantage and perhaps this configuration would need to be
> dynamic, favoring winrate the further the game progresses.
>
>
> As a general example for the idea, let's assume we have the following
> potential moves generated by our program:
>
> #1: Winrate 55%, +5 expected final points
> #2: Winrate 53%, +15 expected final points
>
> Is the move with higher winrate always better? Or would there be some
> benefit to choosing #2? Would this differ depending on how far along
> the game is?
>
> If we knew the winrate prediction to be perfect, then going by that
> alone would probably result in the best overall performance. But given
> some uncertainty there, expected value could be interesting.
>
>
> Any takers for some experiments?
>
>
> -Michael
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

2016-02-23 Thread Michael Markefka
Hello everyone,

in the wake of AlphaGo using a DCNN to predict expected winrate of a
move, I've been wondering whether one could train a DCNN for expected
territory or points successfully enough to be of some use (leaving the
issue of win by resignation for a more in-depth discussion). And,
whether winrate and expected territory (or points) always run in
parallel or whether there are diverging moments.

Computer Go programs play what are considered slack or slow moves when
ahead, sometimes being too conservative and giving away too much of
their potential advantage. If expected points and expected winrate
diverge, this could be a way to make the programs play in a more
natural way, even if there were no strength increase to be gained.
Then again there might be a parameter configuration that might yield
some advantage and perhaps this configuration would need to be
dynamic, favoring winrate the further the game progresses.


As a general example for the idea, let's assume we have the following
potential moves generated by our program:

#1: Winrate 55%, +5 expected final points
#2: Winrate 53%, +15 expected final points

Is the move with higher winrate always better? Or would there be some
benefit to choosing #2? Would this differ depending on how far along
the game is?

If we knew the winrate prediction to be perfect, then going by that
alone would probably result in the best overall performance. But given
some uncertainty there, expected value could be interesting.


Any takers for some experiments?


-Michael
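One hypothetical way to run such an experiment (the weighting scheme and the
points normalization below are made up for illustration; no engine is claimed
to work this way):

def move_value(winrate, expected_points, progress, points_scale=40.0):
    # Blend winrate with normalized expected points; progress in [0, 1] says
    # how far the game has advanced. Weight shifts toward pure winrate late.
    w = progress
    return w * winrate + (1 - w) * (0.5 + expected_points / (2 * points_scale))

moves = {"#1": (0.55, 5.0), "#2": (0.53, 15.0)}
for progress in (0.3, 0.9):
    best = max(moves, key=lambda m: move_value(*moves[m], progress))
    print(progress, best)  # picks #2 early, #1 late

With this particular weighting, Michael's move #2 (more points, lower winrate)
wins early in the game and move #1 (higher winrate) wins late, which is exactly
the dynamic behavior the post asks about.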
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go