Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Dan
Alpha-beta rollouts are like MCTS without playouts (as in AlphaZero), but
with the added ability to do alpha-beta pruning.

With standard MCTS, the tree converges to a minimax tree, not an alpha-beta
tree, so as you know there is a huge difference in effective branching factor.

For MCTS to become competitive with alpha-beta, one has to either use
alpha-beta rollouts OR, as in AlphaZero's case, accurately pick, say, the best
3 moves with the PUCT bias, and then search those with an MCTS that converges
to minimax (not alpha-beta).
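
For reference, the selection rule I mean is the PUCT formula from the AlphaGo
Zero / AlphaZero papers. A minimal Python sketch (the node fields and the
c_puct constant are illustrative, not AlphaZero's actual code):

    import math

    def puct_select(node, c_puct=1.5):
        # One PUCT selection step: argmax over children of
        #   Q + c_puct * P * sqrt(N_parent) / (1 + N_child).
        # With an accurate prior P, almost all visits flow into a handful
        # of moves, which is what "picking the best 3 moves" amounts to.
        total_visits = sum(ch.visits for ch in node.children)

        def score(ch):
            q = ch.value_sum / ch.visits if ch.visits > 0 else 0.0
            u = c_puct * ch.prior * math.sqrt(total_visits) / (1 + ch.visits)
            return q + u

        return max(node.children, key=score)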

I went for the former (recasting alpha-beta in rollout form, which can be
combined with standard MCTS as well as with a standard recursive alpha-beta
implementation).

The paper is here
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
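
Here is a toy Python sketch of the core idea, my own illustration of the
paper's formulation rather than its exact pseudocode: every node keeps proven
lower/upper bounds on its minimax value, each "rollout" walks root-to-leaf
through children whose bound interval still overlaps the current alpha-beta
window, and the bounds are backed up along the path. Children whose interval
falls outside the window are never descended into, which is exactly alpha-beta
pruning; and because the search is driven by repeated root-to-leaf passes, it
can share a tree, statistics and time control with a normal MCTS searcher.

    INF = float("inf")

    class Node:
        def __init__(self, is_max, children=None, value=None):
            self.is_max = is_max          # True = max node, False = min node
            self.children = children or []
            self.value = value            # static eval, used at leaves
            self.lo, self.hi = -INF, INF  # proven bounds on the minimax value

    def rollout(root):
        # One root-to-leaf pass; repeated passes converge to the minimax value.
        node, alpha, beta, path = root, -INF, INF, []
        while node.children and node.lo < node.hi:
            if node.is_max:
                alpha = max(alpha, node.lo)
                live = [c for c in node.children if c.hi > alpha and c.lo < beta]
                child = max(live, key=lambda c: c.hi)   # optimistic choice
            else:
                beta = min(beta, node.hi)
                live = [c for c in node.children if c.hi > alpha and c.lo < beta]
                child = min(live, key=lambda c: c.lo)   # pessimistic choice
            path.append(node)
            node = child
        if not node.children:             # leaf: bounds collapse to its eval
            node.lo = node.hi = node.value
        for n in reversed(path):          # back up bounds along the path
            best = max if n.is_max else min
            n.lo = best(c.lo for c in n.children)
            n.hi = best(c.hi for c in n.children)

    def solve(root):
        # Assumes the tree is already built with leaf values set; a real
        # engine would expand and evaluate lazily inside rollout() instead.
        while root.lo < root.hi:
            rollout(root)
        return root.lo                    # exact minimax value of the root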

If it were up to me, I would have used standard alpha-beta search with
policy/value networks to improve move ordering and evaluation respectively.

It seems AlphaZero's MCTS approach hinged on the policy network accurately
picking the best n moves (while still using minimax) AND then of course not
making blunders. I think "shallow traps" are bad-looking moves that turn out
to be tactically brilliant later. How is the policy network able to identify
those moves? A full-width search like alpha-beta does not miss them.

Daniel



On Tue, Mar 6, 2018 at 1:23 PM, Álvaro Begué  wrote:

> Sorry, I haven't been paying enough attention lately to know what
> "alpha-beta rollouts" means precisely. Can you either describe them or give
> me a reference?
>
> Thanks,
> Álvaro.
>
>
>
> On Tue, Mar 6, 2018 at 1:49 PM, Dan  wrote:
>
>> I did a quick test with my MCTS chess engine wth two different
>> implementations.
>> A standard MCTS with averaging, and MCTS with alpha-beta rollouts. The
>> result is like a 600 elo difference
>>
>> Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold
>> repetition}
>> Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2  [0.955] 44
>> Elo difference: 528.89 +/- nan
>>
>> scorpio-mcts uses alpha-beta rollouts
>> scorpio-pmcts is "pure" mcts with averaging and UCB formula.
>>
>> Daniel
>>
>> On Tue, Mar 6, 2018 at 11:46 AM, Dan  wrote:
>>
>>> I am pretty sure it is an MCTS problem and I suspect not something that
>>> could be easily solved with a policy network (could be wrong hree). My
>>> opinon is that DCNN is not
>>> a miracle worker (as somebody already mentioned here) and it is going to
>>> fail  resolving tactics.  I would be more than happy with it if it has same
>>> power as a qsearch to be honest.
>>>
>>> Search traps are the major problem with games like Chess, and what makes
>>> transitioning the success of DCNN from Go to Chess non trivial.
>>> The following paper discusses shallow traps that are prevalent in chess.
>>> ( https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/downl
>>> oad/1458/1571 )
>>> They mention traps make MCTS very inefficient.  Even if the MCTS is
>>> given 50x more time is needed by an exhaustive minimax tree, it could fail
>>> to find a level-5 or level-7 trap.
>>> It will spend, f.i, 95% of its time searching an asymetric tree of depth
>>> > 7 when a shallow trap of depth-7 exists, thus, missing to find the
>>> level-7 trap.
>>> This is very hard to solve even if you have unlimited power.
>>>
>>> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version
>>> in my opinion and i have hard a hard time seeing how it can be competitive
>>> with Stockfish tactically.
>>>
>>> My MCTS chess engine with  AlphaZero like MCTS was averaging was missing
>>> a lot of tactics. I don't use policy or eval networks but qsearch() for
>>> eval, and the policy is basically
>>> choosing which ever moves leads to a higher eval.
>>>
>>> a) My first improvement to the MCTS is to use minimax backups instead of
>>> averaging. This was an improvmenet but not something that would solve the
>>> traps
>>>
>>> b) My second improvment is to use alphabeta rollouts. This is a rollouts
>>> version that can do nullmove and LMR etc... This is a huge improvment and
>>> none of the MCTS
>>> versons can match it. More on alpha-beta rollouts here (
>>> https://www.microsoft.com/en-us/research/wp-content/upload
>>> s/2014/11/huang_rollout.pdf )
>>>
>>> So AlphaZero used none of the above improvements and yet it seems to be
>>> tactically strong. Leela-Zero suffered from tactical falls left and right
>>> too as I expected.
>>>
>>> So the only explanation left is the policy network able to avoid traps
>>> which I find hard to believe it can identify more than a qsearch level
>>> tactics.
>>>
>>> All I am saying is that my experience (as well as many others) with MCTS
>>> for tactical dominated games is bad, and there must be some breakthrough in
>>> that regard in AlphaZero
>>> for it to be able to compete with Stockfish on a tactical level.
>>>
>>> I am curious how Remi's attempt at Shogi using AlphaZero's method will
>>> turnout.
>>>
>>> 

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread uurtamo .
Summarizing the objections to my (non-evidence-based, but hand-wavy,
observationally based) assertion that 9x9 is going down any time someone
really wants it to go down, I get the following:

* value networks can't hack it (okay, maybe? does this make it less likely?
-- we shouldn't expect to cut-and-paste.)
* double ko is some kind of super special problem on 9x9
* the margins are way narrower

the second issue (ko in general, but multi-ko in particular) is exactly why Go
is PSPACE-complete (I made a rough argument that under a slight relaxation it
isn't, but didn't flesh it out or publish it).

I am not seeing strong arguments for the overall picture (i.e. that it is
much harder than 19x19 to beat humans at) other than that the margins are
narrower.

Does anyone else have a different synopsis?

Thanks,

Steve




On Tue, Mar 6, 2018 at 12:17 PM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> Well, AlphaZero did fine at chess tactics, and the papers are clear on the
> details. There must be an error in your deductions somewhere.
>
>
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Dan
> *Sent:* Tuesday, March 6, 2018 1:46 PM
>
> *To:* computer-go@computer-go.org
> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>
>
>
> I am pretty sure it is an MCTS problem and I suspect not something that
> could be easily solved with a policy network (could be wrong hree). My
> opinon is that DCNN is not
>
> a miracle worker (as somebody already mentioned here) and it is going to
> fail  resolving tactics.  I would be more than happy with it if it has same
> power as a qsearch to be honest.
>
>
>
> Search traps are the major problem with games like Chess, and what makes
> transitioning the success of DCNN from Go to Chess non trivial.
>
> The following paper discusses shallow traps that are prevalent in chess. (
> https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571
> )
>
> They mention traps make MCTS very inefficient.  Even if the MCTS is given
> 50x more time is needed by an exhaustive minimax tree, it could fail to
> find a level-5 or level-7 trap.
>
> It will spend, f.i, 95% of its time searching an asymetric tree of depth >
> 7 when a shallow trap of depth-7 exists, thus, missing to find the level-7
> trap.
>
> This is very hard to solve even if you have unlimited power.
>
>
>
> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in
> my opinion and i have hard a hard time seeing how it can be competitive
> with Stockfish tactically.
>
>
>
> My MCTS chess engine with  AlphaZero like MCTS was averaging was missing a
> lot of tactics. I don't use policy or eval networks but qsearch() for eval,
> and the policy is basically
>
> choosing which ever moves leads to a higher eval.
>
>
>
> a) My first improvement to the MCTS is to use minimax backups instead of
> averaging. This was an improvmenet but not something that would solve the
> traps
>
>
>
> b) My second improvment is to use alphabeta rollouts. This is a rollouts
> version that can do nullmove and LMR etc... This is a huge improvment and
> none of the MCTS
>
> versons can match it. More on alpha-beta rollouts here (
> https://www.microsoft.com/en-us/research/wp-content/
> uploads/2014/11/huang_rollout.pdf )
>
>
>
> So AlphaZero used none of the above improvements and yet it seems to be
> tactically strong. Leela-Zero suffered from tactical falls left and right
> too as I expected.
>
>
>
> So the only explanation left is the policy network able to avoid traps
> which I find hard to believe it can identify more than a qsearch level
> tactics.
>
>
>
> All I am saying is that my experience (as well as many others) with MCTS
> for tactical dominated games is bad, and there must be some breakthrough in
> that regard in AlphaZero
>
> for it to be able to compete with Stockfish on a tactical level.
>
>
>
> I am curious how Remi's attempt at Shogi using AlphaZero's method will
> turnout.
>
>
>
> regards,
>
> Daniel
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
> computer-go@computer-go.org> wrote:
>
> Training on Stockfish games is guaranteed to produce a blunder-fest,
> because there are no blunders in the training set and therefore the policy
> network never learns how to refute blunders.
>
>
>
> This is not a flaw in MCTS, but rather in the policy network. MCTS will
> eventually search every move infinitely often, producing asymptotically
> optimal play. But if the policy network does not provide the guidance
> necessary to rapidly refute the blunders that occur in the search, then
> convergence of MCTS to optimal play will be very slow.
>
>
>
> It is necessary for the network to train on self-play games using MCTS.
> For instance, the AGZ approach samples next states during training games by
> sampling from the distribution of visits in the search. Specifically: not
> by choosing the most-visited play!
>
>
>
> 

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Álvaro Begué
Sorry, I haven't been paying enough attention lately to know what
"alpha-beta rollouts" means precisely. Can you either describe them or give
me a reference?

Thanks,
Álvaro.



On Tue, Mar 6, 2018 at 1:49 PM, Dan  wrote:

> I did a quick test with my MCTS chess engine wth two different
> implementations.
> A standard MCTS with averaging, and MCTS with alpha-beta rollouts. The
> result is like a 600 elo difference
>
> Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold
> repetition}
> Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2  [0.955] 44
> Elo difference: 528.89 +/- nan
>
> scorpio-mcts uses alpha-beta rollouts
> scorpio-pmcts is "pure" mcts with averaging and UCB formula.
>
> Daniel
>
> On Tue, Mar 6, 2018 at 11:46 AM, Dan  wrote:
>
>> I am pretty sure it is an MCTS problem and I suspect not something that
>> could be easily solved with a policy network (could be wrong hree). My
>> opinon is that DCNN is not
>> a miracle worker (as somebody already mentioned here) and it is going to
>> fail  resolving tactics.  I would be more than happy with it if it has same
>> power as a qsearch to be honest.
>>
>> Search traps are the major problem with games like Chess, and what makes
>> transitioning the success of DCNN from Go to Chess non trivial.
>> The following paper discusses shallow traps that are prevalent in chess.
>> ( https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/downl
>> oad/1458/1571 )
>> They mention traps make MCTS very inefficient.  Even if the MCTS is given
>> 50x more time is needed by an exhaustive minimax tree, it could fail to
>> find a level-5 or level-7 trap.
>> It will spend, f.i, 95% of its time searching an asymetric tree of depth
>> > 7 when a shallow trap of depth-7 exists, thus, missing to find the
>> level-7 trap.
>> This is very hard to solve even if you have unlimited power.
>>
>> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version
>> in my opinion and i have hard a hard time seeing how it can be competitive
>> with Stockfish tactically.
>>
>> My MCTS chess engine with  AlphaZero like MCTS was averaging was missing
>> a lot of tactics. I don't use policy or eval networks but qsearch() for
>> eval, and the policy is basically
>> choosing which ever moves leads to a higher eval.
>>
>> a) My first improvement to the MCTS is to use minimax backups instead of
>> averaging. This was an improvmenet but not something that would solve the
>> traps
>>
>> b) My second improvment is to use alphabeta rollouts. This is a rollouts
>> version that can do nullmove and LMR etc... This is a huge improvment and
>> none of the MCTS
>> versons can match it. More on alpha-beta rollouts here (
>> https://www.microsoft.com/en-us/research/wp-content/upload
>> s/2014/11/huang_rollout.pdf )
>>
>> So AlphaZero used none of the above improvements and yet it seems to be
>> tactically strong. Leela-Zero suffered from tactical falls left and right
>> too as I expected.
>>
>> So the only explanation left is the policy network able to avoid traps
>> which I find hard to believe it can identify more than a qsearch level
>> tactics.
>>
>> All I am saying is that my experience (as well as many others) with MCTS
>> for tactical dominated games is bad, and there must be some breakthrough in
>> that regard in AlphaZero
>> for it to be able to compete with Stockfish on a tactical level.
>>
>> I am curious how Remi's attempt at Shogi using AlphaZero's method will
>> turnout.
>>
>> regards,
>> Daniel
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
>> computer-go@computer-go.org> wrote:
>>
>>> Training on Stockfish games is guaranteed to produce a blunder-fest,
>>> because there are no blunders in the training set and therefore the policy
>>> network never learns how to refute blunders.
>>>
>>>
>>>
>>> This is not a flaw in MCTS, but rather in the policy network. MCTS will
>>> eventually search every move infinitely often, producing asymptotically
>>> optimal play. But if the policy network does not provide the guidance
>>> necessary to rapidly refute the blunders that occur in the search, then
>>> convergence of MCTS to optimal play will be very slow.
>>>
>>>
>>>
>>> It is necessary for the network to train on self-play games using MCTS.
>>> For instance, the AGZ approach samples next states during training games by
>>> sampling from the distribution of visits in the search. Specifically: not
>>> by choosing the most-visited play!
>>>
>>>
>>>
>>> You see how this policy trains both search and evaluation to be
>>> internally consistent? The policy head is trained to refute the bad moves
>>> that will come up in search, and the value head is trained to the value
>>> observed by the full tree.
>>>
>>>
>>>
>>> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
>>> Behalf Of *Dan
>>> *Sent:* Monday, March 5, 2018 4:55 AM
>>> *To:* computer-go@computer-go.org
>>> *Subject:* 

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Well, AlphaZero did fine at chess tactics, and the papers are clear on the 
details. There must be an error in your deductions somewhere.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Tuesday, March 6, 2018 1:46 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

I am pretty sure it is an MCTS problem and I suspect not something that could 
be easily solved with a policy network (could be wrong hree). My opinon is that 
DCNN is not

a miracle worker (as somebody already mentioned here) and it is going to fail  
resolving tactics.  I would be more than happy with it if it has same power as 
a qsearch to be honest.

 

Search traps are the major problem with games like Chess, and what makes 
transitioning the success of DCNN from Go to Chess non trivial.

The following paper discusses shallow traps that are prevalent in chess. ( 
https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571 )

They mention traps make MCTS very inefficient.  Even if the MCTS is given 50x 
more time is needed by an exhaustive minimax tree, it could fail to find a 
level-5 or level-7 trap.

It will spend, f.i, 95% of its time searching an asymetric tree of depth > 7 
when a shallow trap of depth-7 exists, thus, missing to find the level-7 trap.

This is very hard to solve even if you have unlimited power.

 

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in my 
opinion and i have hard a hard time seeing how it can be competitive with 
Stockfish tactically.

 

My MCTS chess engine with  AlphaZero like MCTS was averaging was missing a lot 
of tactics. I don't use policy or eval networks but qsearch() for eval, and the 
policy is basically

choosing which ever moves leads to a higher eval.

 

a) My first improvement to the MCTS is to use minimax backups instead of 
averaging. This was an improvmenet but not something that would solve the traps

 

b) My second improvment is to use alphabeta rollouts. This is a rollouts 
version that can do nullmove and LMR etc... This is a huge improvment and none 
of the MCTS

versons can match it. More on alpha-beta rollouts here ( 
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
 )

 

So AlphaZero used none of the above improvements and yet it seems to be 
tactically strong. Leela-Zero suffered from tactical falls left and right too 
as I expected.

 

So the only explanation left is the policy network able to avoid traps which I 
find hard to believe it can identify more than a qsearch level tactics.

 

All I am saying is that my experience (as well as many others) with MCTS for 
tactical dominated games is bad, and there must be some breakthrough in that 
regard in AlphaZero

for it to be able to compete with Stockfish on a tactical level.

 

I am curious how Remi's attempt at Shogi using AlphaZero's method will turnout.

 

regards,

Daniel

 

 

 

 

 

 

 

 

On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go 
 > wrote:

Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
 ] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org  
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of stockfish 
games and didn’t do well on tactics (the games were actually a blunder fest). I 
believe this is a problem of the MCTS used and not due to for lack of training. 

 

Go is a strategic game so that is different from chess that is full of traps.   
  

I m not surprised Lela zero did well in go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto  > wrote:

On 02-03-18 17:07, Dan wrote:
> Leela-chess is 

[Computer-go] Game-independent AlphaZero approach

2018-03-06 Thread Ingo Althöfer
Hi Remi, hi friends,

> For the moment, my main objective is shogi. I 
> will participate in the World Computer Shogi 
> Championship in May. 

Good luck! Please keep us informed when
the tournament is running.


> So I am developing a game-independent AlphaZero framework. 

I am hoping several people are working in this direction.

Explanation: In 1998 "Zillions of Games" (by Mark Lefler and
Jeff Mallet) was made public. It is a multi-game program: the
user only has to formulate the rules of a game in a script
language (zrf), and the engine then plays that game, based
on alpha-beta with an evaluation function that assigns each
piece its (local+global) mobility. (Mark Lefler is
co-programmer of the current top chess engine Komodo.)
 
In 2008, Cameron Browne published his Ph.D. thesis on "automated
game design", where a new game is evaluated by self-play of an
alpha-beta engine.

It would be fantastic to have new programs which learn to play
newly designed games in AlphaZero mode. This would revolutionize
the design of new board games.
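
To make "game-independent" concrete, such a framework essentially only needs
the game to expose a small rules interface, roughly like the following
(a hypothetical Python sketch of my own, not Remi's actual API and not the
zrf format):

    class Game:
        # Minimal rules interface a game-independent AlphaZero framework
        # could build on; self-play, MCTS and training talk only to this.

        def initial_state(self):
            raise NotImplementedError

        def legal_moves(self, state):
            # Moves available to the side to play.
            raise NotImplementedError

        def play(self, state, move):
            # Return the successor state.
            raise NotImplementedError

        def result(self, state):
            # None while the game is running, else the outcome in [-1, +1].
            raise NotImplementedError

        def encode(self, state):
            # Board planes fed to the network; together with the move
            # encoding, the only game-specific part beyond the rules.
            raise NotImplementedError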


Side question: Is someone investigating AlphaZero approaches for
non-zero-sum games or for games with more than two players?

Ingo.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Dan
I did a quick test with my MCTS chess engine with two different
implementations: a standard MCTS with averaging, and MCTS with alpha-beta
rollouts. The result is something like a 600 Elo difference:

Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold
repetition}
Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2  [0.955] 44
Elo difference: 528.89 +/- nan

scorpio-mcts uses alpha-beta rollouts
scorpio-pmcts is "pure" MCTS with averaging and the UCB formula.
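
For what it's worth, the 528.89 is just the logistic Elo estimate implied by
the score; a quick Python check using the standard formula, not the match
tool's actual code:

    import math

    wins, losses, draws = 41, 1, 2
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games      # 42/44 = 0.9545...
    elo = -400 * math.log10(1.0 / score - 1)  # logistic Elo model
    print(round(elo, 2))                      # 528.89, as reported above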

Daniel

On Tue, Mar 6, 2018 at 11:46 AM, Dan  wrote:

> I am pretty sure it is an MCTS problem and I suspect not something that
> could be easily solved with a policy network (could be wrong hree). My
> opinon is that DCNN is not
> a miracle worker (as somebody already mentioned here) and it is going to
> fail  resolving tactics.  I would be more than happy with it if it has same
> power as a qsearch to be honest.
>
> Search traps are the major problem with games like Chess, and what makes
> transitioning the success of DCNN from Go to Chess non trivial.
> The following paper discusses shallow traps that are prevalent in chess. (
> https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571
> )
> They mention traps make MCTS very inefficient.  Even if the MCTS is given
> 50x more time is needed by an exhaustive minimax tree, it could fail to
> find a level-5 or level-7 trap.
> It will spend, f.i, 95% of its time searching an asymetric tree of depth >
> 7 when a shallow trap of depth-7 exists, thus, missing to find the level-7
> trap.
> This is very hard to solve even if you have unlimited power.
>
> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in
> my opinion and i have hard a hard time seeing how it can be competitive
> with Stockfish tactically.
>
> My MCTS chess engine with  AlphaZero like MCTS was averaging was missing a
> lot of tactics. I don't use policy or eval networks but qsearch() for eval,
> and the policy is basically
> choosing which ever moves leads to a higher eval.
>
> a) My first improvement to the MCTS is to use minimax backups instead of
> averaging. This was an improvmenet but not something that would solve the
> traps
>
> b) My second improvment is to use alphabeta rollouts. This is a rollouts
> version that can do nullmove and LMR etc... This is a huge improvment and
> none of the MCTS
> versons can match it. More on alpha-beta rollouts here (
> https://www.microsoft.com/en-us/research/wp-content/
> uploads/2014/11/huang_rollout.pdf )
>
> So AlphaZero used none of the above improvements and yet it seems to be
> tactically strong. Leela-Zero suffered from tactical falls left and right
> too as I expected.
>
> So the only explanation left is the policy network able to avoid traps
> which I find hard to believe it can identify more than a qsearch level
> tactics.
>
> All I am saying is that my experience (as well as many others) with MCTS
> for tactical dominated games is bad, and there must be some breakthrough in
> that regard in AlphaZero
> for it to be able to compete with Stockfish on a tactical level.
>
> I am curious how Remi's attempt at Shogi using AlphaZero's method will
> turnout.
>
> regards,
> Daniel
>
>
>
>
>
>
>
>
> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
> computer-go@computer-go.org> wrote:
>
>> Training on Stockfish games is guaranteed to produce a blunder-fest,
>> because there are no blunders in the training set and therefore the policy
>> network never learns how to refute blunders.
>>
>>
>>
>> This is not a flaw in MCTS, but rather in the policy network. MCTS will
>> eventually search every move infinitely often, producing asymptotically
>> optimal play. But if the policy network does not provide the guidance
>> necessary to rapidly refute the blunders that occur in the search, then
>> convergence of MCTS to optimal play will be very slow.
>>
>>
>>
>> It is necessary for the network to train on self-play games using MCTS.
>> For instance, the AGZ approach samples next states during training games by
>> sampling from the distribution of visits in the search. Specifically: not
>> by choosing the most-visited play!
>>
>>
>>
>> You see how this policy trains both search and evaluation to be
>> internally consistent? The policy head is trained to refute the bad moves
>> that will come up in search, and the value head is trained to the value
>> observed by the full tree.
>>
>>
>>
>> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
>> Behalf Of *Dan
>> *Sent:* Monday, March 5, 2018 4:55 AM
>> *To:* computer-go@computer-go.org
>> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>>
>>
>>
>> Actually prior to this it was trained with hundreds of thousands of
>> stockfish games and didn’t do well on tactics (the games were actually a
>> blunder fest). I believe this is a problem of the MCTS used and not due to
>> for lack of training.
>>
>>
>>
>> Go is a strategic game so that is different from chess that is 

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Dan
I am pretty sure it is an MCTS problem, and I suspect it is not something that
could be easily solved with a policy network (I could be wrong here). My
opinion is that a DCNN is not a miracle worker (as somebody already mentioned
here) and it is going to fail at resolving tactics. I would be more than happy
with it if it had the same power as a qsearch, to be honest.

Search traps are the major problem with games like chess, and they are what
makes transferring the success of DCNNs from Go to chess non-trivial.
The following paper discusses the shallow traps that are prevalent in chess:
https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571
They mention that traps make MCTS very inefficient. Even if MCTS is given 50x
more time than an exhaustive minimax tree needs, it can fail to find a level-5
or level-7 trap.
It will spend, for instance, 95% of its time searching an asymmetric tree of
depth > 7 when a shallow trap of depth 7 exists, and thus miss the level-7
trap.
This is very hard to solve even if you have unlimited computing power.

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in
my opinion, and I have a hard time seeing how it can be competitive with
Stockfish tactically.

My MCTS chess engine with AlphaZero-like MCTS with averaging was missing a
lot of tactics. I don't use policy or eval networks but qsearch() for the
eval, and the policy is basically choosing whichever move leads to a higher
eval.

a) My first improvement to the MCTS is to use minimax backups instead of
averaging. This was an improvement, but not something that would solve the
traps (a small sketch of the two backup rules follows after point b).

b) My second improvement is to use alpha-beta rollouts. This is a rollout
version of alpha-beta that can do null-move pruning, LMR, etc. This is a huge
improvement and none of the MCTS versions can match it. More on alpha-beta
rollouts here:
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
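
A minimal sketch of the backup difference in (a), with illustrative node
fields (values are stored from the side to move at each node):

    def backup(path, leaf_value, mode="minimax"):
        # path: nodes from root to leaf; leaf_value: eval from the leaf's
        # side to move.  "average" is the classic MCTS mean of simulation
        # values; "minimax" makes a node's value the best (negamax) child
        # value, so one refutation is not diluted by good-looking siblings.
        value = leaf_value
        for node in reversed(path):           # leaf first, root last
            node.visits += 1
            if mode == "average":
                node.value_sum += value
                node.value = node.value_sum / node.visits
            elif node.children and any(c.visits for c in node.children):
                node.value = max(-c.value for c in node.children
                                 if c.visits > 0)
            else:                             # the leaf itself
                node.value = value
            value = -value                    # flip perspective for the parent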

So AlphaZero used none of the above improvements and yet it seems to be
tactically strong. Leela Zero suffered from tactical failures left and right
too, as I expected.

So the only explanation left is that the policy network is able to avoid
traps, which I find hard to believe, since I doubt it can identify tactics
beyond qsearch level.

All I am saying is that my experience (as well as many others') with MCTS for
tactically dominated games is bad, and there must be some breakthrough in that
regard in AlphaZero for it to be able to compete with Stockfish on a tactical
level.

I am curious how Remi's attempt at Shogi using AlphaZero's method will
turn out.

regards,
Daniel








On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> Training on Stockfish games is guaranteed to produce a blunder-fest,
> because there are no blunders in the training set and therefore the policy
> network never learns how to refute blunders.
>
>
>
> This is not a flaw in MCTS, but rather in the policy network. MCTS will
> eventually search every move infinitely often, producing asymptotically
> optimal play. But if the policy network does not provide the guidance
> necessary to rapidly refute the blunders that occur in the search, then
> convergence of MCTS to optimal play will be very slow.
>
>
>
> It is necessary for the network to train on self-play games using MCTS.
> For instance, the AGZ approach samples next states during training games by
> sampling from the distribution of visits in the search. Specifically: not
> by choosing the most-visited play!
>
>
>
> You see how this policy trains both search and evaluation to be internally
> consistent? The policy head is trained to refute the bad moves that will
> come up in search, and the value head is trained to the value observed by
> the full tree.
>
>
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Dan
> *Sent:* Monday, March 5, 2018 4:55 AM
> *To:* computer-go@computer-go.org
> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>
>
>
> Actually prior to this it was trained with hundreds of thousands of
> stockfish games and didn’t do well on tactics (the games were actually a
> blunder fest). I believe this is a problem of the MCTS used and not due to
> for lack of training.
>
>
>
> Go is a strategic game so that is different from chess that is full of
> traps.
>
> I m not surprised Lela zero did well in go.
>
>
>
> On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto  wrote:
>
> On 02-03-18 17:07, Dan wrote:
> > Leela-chess is not performing well enough
>
> I don't understand how one can say that given that they started with the
> random network last week only and a few clients. Of course it's bad!
> That doesn't say anything about the approach.
>
> Leela Zero has gotten strong but it has been learning for *months* with
> ~400 people. It also took a while to get to 30 kyu.
>
> --
> GCP
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> 

Re: [Computer-go] Crazy Stone is back

2018-03-06 Thread Hideki Kato
valky...@phmp.se: <19f31e7e5cdf310b9afa91f577997...@phmp.se>:
>I think you misunderstood what I wrote,
>if perfect play on 9x9 is 6000 Elo, then if the value function is 3000 
>Elo and MC eval is 2000 Elo with 1 second thinking time then  it might 
>be that the combination of a value function and MC eval ends up being 
>2700 Elo. It could also be that it ends up at 3200 Elo.
>
>I personally believe MC eval alone can be very strong so it might be 
>that the capacity of a neural network is not enough to replace MC eval.
>
>When I wrote "converge to perfection" it was not to claim that the Zero 
>approach reaches perfection, just that it get stronger over time.

How do you guarantee that the improvement does not stop or 
oscillate?  Actually, the first instance (the 40-layer version) 
of AlphaGo Zero stopped improving in three days (at 
least it looks so).

>The interesting question is if old school MC evaluation can fill up the 
>knowledge gaps of the value function.

My point is exactly that.  Since value networks approximate the 
value function only in a very rough manner (i.e., its smoother 
parts), due to not enough freedom and/or samples, while MC rollouts 
can implement (maybe partly) the detailed parts of the function, 
mixing those two could yield a better approximation (in 
theory :).
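
That is essentially what the original AlphaGo did at its leaf 
nodes, V(s) = (1 - lambda) * v_theta(s) + lambda * z with 
lambda = 0.5.  A trivial sketch, with illustrative names:

    def mixed_leaf_value(v_net, z_rollout, lam=0.5):
        # v_net:     value-network output for the leaf, in [-1, 1]
        #            (the smooth part)
        # z_rollout: outcome of a Monte-Carlo playout from the leaf
        #            (the detail part)
        # lam:       mixing weight; the 2016 AlphaGo used lam = 0.5
        return (1.0 - lam) * v_net + lam * z_rollout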

Hideki

>For my own Odin project I am not working on the MC evaluation currently, 
>since only when I a have final neural network solution can I see which 
>weaknesses I need to fix.
>
>Best
>Magnus
>
>
>On 2018-03-05 21:12, Hideki Kato wrote:
>> DCNNs are not magic but just non-linear continuous function
>> approximators with finite freedom and we can provide up to
>> 10^8 samples (board positions) in practice.
>> 
>> Why do most people believe VN can approximate (perfect or
>> near perfect) value function?  What do they estimate the
>> complexity of the value function for 19x19 Go?
>> 
>> valky...@phmp.se: :
>>> My guess is that there is some kind of threshold depending on the
>>> relative strength of MC eval and the value function of the NN.
>>> 
>>> If the value function is stronger than MC eval I would guess MCEval
>>> turns into a bad noisy feature with little benefit.
>>> 
>>> Depending on how strong MC eval is this threshold is probably very
>>> different between engines. Also i can imagine that NN value function 
>>> can
>>> have some gaping holes in its knowledge that even simple MC eval can
>>> patch up. Probably true for supervised learning where training data
>>> probably has a lot of holes since bad moves are not in the data.
>>> 
>>> The Zero approach is different because it should converge to 
>>> perfection
>>> in the limit, thus overcome any weaknesses of the value function early
>>> on. At least in theory.
>>> 
>>> 
>>> On 2018-03-05 14:04, Gian-Carlo Pascutto wrote:
 On 5/03/2018 12:28, valky...@phmp.se wrote:
> Remi twittered more details here (see the discussion with gghideki:
> 
> https://twitter.com/Remi_Coulom/status/969936332205318144
 
 Thank you. So Remi gave up on rollouts as well. Interesting 
 "difference
 of opinion" there with Zen.
 
 Last time I tested this in regular Leela, playouts were beneficial, 
 but
 this was before combined value+policy nets and much more training 
 data
 was available. I do not know what the current status would be.
>>> 
>>> ___
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> http://computer-go.org/mailman/listinfo/computer-go
>___
>Computer-go mailing list
>Computer-go@computer-go.org
>http://computer-go.org/mailman/listinfo/computer-go
-- 
Hideki Kato 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.
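
The guarantee comes from the exploration term in the selection rule; with 
plain UCB1, sketched below, the bonus of any neglected move grows without 
bound as the parent is revisited, so every move is eventually searched 
infinitely often. AGZ's PUCT variant weights the bonus by the policy prior, 
but behaves the same way in the limit for any move with a non-zero prior. 
(Illustrative Python, not any engine's actual code.)

    import math

    def ucb1_select(node, c=1.4):
        # Classic UCB1: mean value plus an exploration bonus that keeps
        # growing for any child that is being neglected.
        def score(child):
            if child.visits == 0:
                return math.inf
            q = child.value_sum / child.visits
            return q + c * math.sqrt(math.log(node.visits) / child.visits)
        return max(node.children, key=score)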

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!
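
Concretely (a small illustration, not DeepMind's code): the self-play move is 
sampled in proportion to N(a)^(1/tau), with tau = 1 for the early moves of 
the game and near 0 later, rather than always taking the argmax:

    import random

    def choose_training_move(moves, visit_counts, tau=1.0):
        # Sample the move to actually play in a self-play training game.
        # With tau = 1 this is exactly the visit distribution, so the games
        # still contain refutable mistakes for the policy head to learn
        # from; tau -> 0 recovers the greedy most-visited choice.
        weights = [n ** (1.0 / tau) for n in visit_counts]
        return random.choices(moves, weights=weights, k=1)[0]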

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of stockfish 
games and didn’t do well on tactics (the games were actually a blunder fest). I 
believe this is a problem of the MCTS used and not due to for lack of training. 

 

Go is a strategic game so that is different from chess that is full of traps.   
  

I m not surprised Lela zero did well in go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto  > wrote:

On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough

I don't understand how one can say that given that they started with the
random network last week only and a few clients. Of course it's bad!
That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Crazy Stone is back

2018-03-06 Thread valkyria

I think you misunderstood what I wrote:
if perfect play on 9x9 is 6000 Elo, and the value function is 3000 
Elo and MC eval is 2000 Elo with 1 second of thinking time, then it 
might be that the combination of a value function and MC eval ends 
up at 2700 Elo. It could also be that it ends up at 3200 Elo.


I personally believe MC eval alone can be very strong, so it might be 
that the capacity of a neural network is not enough to replace MC eval.


When I wrote "converge to perfection" it was not to claim that the Zero 
approach reaches perfection, just that it get stronger over time.


The interesting question is if old school MC evaluation can fill up the 
knowledge gaps of the value function.


For my own Odin project I am not working on the MC evaluation currently, 
since only when I have a final neural network solution can I see which 
weaknesses I need to fix.


Best
Magnus


On 2018-03-05 21:12, Hideki Kato wrote:

DCNNs are not magic but just non-linear continuous function
approximators with finite freedom and we can provide up to
10^8 samples (board positions) in practice.

Why do most people believe VN can approximate (perfect or
near perfect) value function?  What do they estimate the
complexity of the value function for 19x19 Go?

valky...@phmp.se: :

My guess is that there is some kind of threshold depending on the
relative strength of MC eval and the value function of the NN.

If the value function is stronger than MC eval I would guess MCEval
turns into a bad noisy feature with little benefit.

Depending on how strong MC eval is this threshold is probably very
different between engines. Also i can imagine that NN value function 
can

have some gaping holes in its knowledge that even simple MC eval can
patch up. Probably true for supervised learning where training data
probably has a lot of holes since bad moves are not in the data.

The Zero approach is different because it should converge to 
perfection

in the limit, thus overcome any weaknesses of the value function early
on. At least in theory.


On 2018-03-05 14:04, Gian-Carlo Pascutto wrote:

On 5/03/2018 12:28, valky...@phmp.se wrote:

Remi twittered more details here (see the discussion with gghideki:

https://twitter.com/Remi_Coulom/status/969936332205318144


Thank you. So Remi gave up on rollouts as well. Interesting 
"difference

of opinion" there with Zen.

Last time I tested this in regular Leela, playouts were beneficial, 
but
this was before combined value+policy nets and much more training 
data

was available. I do not know what the current status would be.


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go