[Computer-go] CLeelaz9Test on CGOS 9x9

2018-02-13 Thread Hiroshi Yamashita

Hi,

It seems CLeelaz9Test's rating has changed from 1700 to 2500.
http://www.yss-aya.com/cgos/9x9/cross/CLeelaz9Test.html

Could the CLeelaz9Test author change its name when the program gets
stronger, e.g. to CLeelaz9_1 or CLeelaz9_2?

CGOS users may want to know the correct rating quickly.
Keeping the same program name across different strengths is not good
for the CGOS rating system.
In particular, BayesElo assumes that each program's strength is fixed.

Thanks,
Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread David Wu
Ah, right, the cases where you and your opponent's interests are not
perfectly anti-aligned make things a bit trickier, possibly introducing
some game theory into the mix. Then I don't know. :)

My first instinct is to say that in principle you should provide the neural
net with both sides' "must-win" parameters and have it produce two value
outputs, namely the expected utilities for each side separately (which
might not sum to 0). The MCTS would accumulate these separately, and at
each node it would use the statistics of whichever side is to move to
choose the child to simulate. That's quite a big difference though, and I
haven't thought through the ways this could go wrong; it seems like there
might easily be some big pitfalls here.
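
A minimal sketch of what that two-sided accumulation might look like,
under the assumption of a hypothetical two-output value net (all names
here are illustrative, not from any existing implementation):

import math

class Node:
    def __init__(self, to_move, prior=1.0):
        self.to_move = to_move          # 0 or 1: side to move at this node
        self.prior = prior              # policy prior for the move into this node
        self.visits = 0
        self.utility_sums = [0.0, 0.0]  # one accumulator per side; need not sum to 0
        self.children = []

    def q(self, side):
        # Mean observed utility for the given side.
        return self.utility_sums[side] / self.visits if self.visits else 0.0

    def select_child(self, c_puct=1.5):
        # PUCT selection using the statistics of the side to move here.
        sqrt_n = math.sqrt(self.visits)
        return max(
            self.children,
            key=lambda ch: ch.q(self.to_move)
            + c_puct * ch.prior * sqrt_n / (1 + ch.visits),
        )

def backup(path, utilities):
    # Propagate the net's two utilities (one per side) up the visited path.
    for node in path:
        node.visits += 1
        node.utility_sums[0] += utilities[0]
        node.utility_sums[1] += utilities[1]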

The case where you and your opponent's interests are exactly anti-aligned
should still be straightforward though. In that case the way I think of it
is that "Play chess where draws are worth half a win and half a loss" and
"Play chess where draws are losing for you and winning for your opponent"
are two entirely distinct zero-sum games that merely happen to share a lot
of rules and features. So of course you should train on both games and
distinguish the two so that the neural net always knows which one it's
playing. But you can still share one neural net rather than training two
separate ones, taking advantage of the fact that each game will regularize
the learning for the other.

Maybe you still do need to take a little care. For example, in chess, if
the bot gets sufficiently strong, then playing must-win as Black might
simply always fail, producing only uninformative samples of failure and
harming the training. I'm optimistic, but ultimately all this would still
need testing.

On Tue, Feb 13, 2018 at 12:11 PM, Dan Schmidt  wrote:

> Do you intend to use the same draw values for both sides in the self-play
> games? They can be independent:
>  - in a 3/1/0 scenario, neither player is especially happy with a draw
> (and in fact each would rather throw a game to the other in a two-game
> match than make two draws, but that's a separate issue);
>  - in a match with one game left, both players agree that a draw and a
> Black win (say) are equivalent results;
>  - in a tournament, the must-win situations of both players could be
> independent.
>
> In real life you usually have a good sense of how your opponent's
> "must-win" parameter is set, but that doesn't really apply here.

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread Dave Dyer
The exact meaning of the result MCTS returns is irrelevant.  The
net should just learn it.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread Dan Schmidt
Do you intend to use the same draw values for both sides in the self-play
games? They can be independent:
 - in a 3/1/0 scenario, neither player is especially happy with a draw (and
in fact each would rather throw a game to the other in a two-game match
than make two draws, but that's a separate issue);
 - in a match with one game left, both players agree that a draw and a
Black win (say) are equivalent results;
 - in a tournament, the must-win situations of both players could be
independent.

In real life you usually have a good sense of how your opponent's
"must-win" parameter is set, but that doesn't really apply here.


On Tue, Feb 13, 2018 at 10:58 AM, David Wu  wrote:

> Actually this pretty much solves the whole issue, right? Of course the
> proof would be to actually test it out, but it seems to me a pretty
> straightforward solution, not nontrivial at all.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread David Wu
Actually this pretty much solves the whole issue, right? Of course the
proof would be to actually test it out, but it seems to me a pretty
straightforward solution, not nontrivial at all.

On Feb 13, 2018 10:52 AM, "David Wu"  wrote:

Seems to me like you could fix that in the policy too by providing an input
feature plane that indicates the value of a draw, whether 0 as normal, or
-1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose, etc.

Then just play games with a variety of values for this parameter in your
self-play training pipeline so the policy net gets exposed to each kind of
game.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread David Wu
Seems to me like you could fix that in the policy too by providing an input
feature plane that indicates the value of a draw, whether 0 as normal, or
-1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose, etc.

Then just play games with a variety of values for this parameter in your
self-play training pipeline so the policy net gets exposed to each kind of
game.
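
As a rough sketch of that input plane (numpy-based, purely illustrative):

import numpy as np

def add_draw_value_plane(planes, draw_value):
    # Append one constant plane holding the draw value to the net's input
    # stack. planes: array of shape [C, H, W]; draw_value: 0 as normal,
    # -1 for must-win, -1/3 for 3/1/0, +1 for only-need-not-lose.
    _, h, w = planes.shape
    extra = np.full((1, h, w), draw_value, dtype=planes.dtype)
    return np.concatenate([planes, extra], axis=0)

Each self-play game would then fix one draw value and feed the matching
plane to both the policy and value evaluations for that game.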

On Feb 13, 2018 10:40 AM, "Dan Schmidt"  wrote:

The AlphaZero paper says that they just assign values 1, 0, and -1 to wins,
draws, and losses respectively. This is fine for maximizing your expected
value over an infinite number of games given the way that chess tournaments
(to pick the example that I'm familiar with) are typically scored, where
you get 1, 0.5, and 0 points respectively for wins, draws, and losses.

However 1) not all tournaments use this scoring system (3/1/0 is popular
these days, to discourage draws), and 2) this system doesn't account for
must-win situations where a draw is as bad as a loss (say you are 1 point
behind your opponent and it's the last game of a match). Ideally you'd keep
track of all three probabilities and use some linear meta-scoring function
on top of them. I don't think it's trivial to extend the AlphaZero
architecture to handle this, though. Maybe it is sufficient to train with
the standard meta-scoring (while keeping track of the separate W/D/L
probabilities) but then use the currently applicable meta-scoring while
playing. Your policy network won't quite match your current situation, but
at least your value network and search will.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread Gian-Carlo Pascutto
On 13-02-18 16:05, "Ingo Althöfer" wrote:
> Hello,
> 
> What is known about proper MCTS procedures for games
> that have not only wins and losses but also draws
> (like chess, Shogi, or Go with integer komi)?
>
> Should neural nets provide (win, draw, loss) probabilities
> for positions in such games?

I treat a draw the same as a 50% win-rate score. Works well enough; I
don't really see what advantage treating it separately would give.
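
In code that amounts to a one-line mapping at backup time (illustrative
sketch, not from any particular engine):

def backup_score(result):
    # Score backed up by MCTS: a draw simply counts as a 50% result.
    return {"win": 1.0, "draw": 0.5, "loss": 0.0}[result]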

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread Dan Schmidt
The AlphaZero paper says that they just assign values 1, 0, and -1 to wins,
draws, and losses respectively. This is fine for maximizing your expected
value over an infinite number of games given the way that chess tournaments
(to pick the example that I'm familiar with) are typically scored, where
you get 1, 0.5, and 0 points respectively for wins, draws, and losses.

However 1) not all tournaments use this scoring system (3/1/0 is popular
these days, to discourage draws), and 2) this system doesn't account for
must-win situations where a draw is as bad as a loss (say you are 1 point
behind your opponent and it's the last game of a match). Ideally you'd keep
track of all three probabilities and use some linear meta-scoring function
on top of them. I don't think it's trivial to extend the AlphaZero
architecture to handle this, though. Maybe it is sufficient to train with
the standard meta-scoring (while keeping track of the separate W/D/L
probabilities) but then use the currently applicable meta-scoring while
playing. Your policy network won't quite match your current situation, but
at least your value network and search will.
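
A sketch of such a linear meta-scoring on top of a (win, draw, loss)
value head; the utility vectors below are illustrative assumptions, not
from the AlphaZero paper:

import numpy as np

def meta_score(wdl_probs, utilities=(1.0, 0.0, -1.0)):
    # Collapse (win, draw, loss) probabilities into one scalar value.
    # Example utility vectors for the side to move:
    #   (1, 0, -1)    standard scoring, a draw worth half a point
    #   (1, -1/3, -1) 3/1/0 tournament scoring rescaled to [-1, 1]
    #   (1, -1, -1)   must-win: a draw is as bad as a loss
    return float(np.dot(wdl_probs, utilities))

# A position scored as 40% win / 50% draw / 10% loss looks fine under
# standard scoring but clearly bad in a must-win game:
wdl = np.array([0.4, 0.5, 0.1])
print(meta_score(wdl))                     # 0.3
print(meta_score(wdl, (1.0, -1.0, -1.0)))  # -0.2

Keeping the three probabilities separate means the search can swap in
whichever utilities currently apply at play time, without retraining the net.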

On Tue, Feb 13, 2018 at 10:05 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> Hello,
>
> What is known about proper MCTS procedures for games
> that have not only wins and losses but also draws
> (like chess, Shogi, or Go with integer komi)?
>
> Should neural nets provide (win, draw, loss) probabilities
> for positions in such games?
>
> Ingo.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread Ingo Althöfer
Hello,

What is known about proper MCTS procedures for games
that have not only wins and losses but also draws
(like chess, Shogi, or Go with integer komi)?

Should neural nets provide (win, draw, loss) probabilities
for positions in such games?

Ingo.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go