Re: [Computer-go] MuZero - new paper from DeepMind

2019-11-25 Thread Brian Sheppard via Computer-go
I read through that paper, but I admit that I didn't really get where the extra 
power comes from.


-Original Message-
From: valkyria 
To: Computer-go 
Sent: Mon, Nov 25, 2019 6:41 am
Subject: [Computer-go] MuZero - new paper from DeepMind

Hi,

if anyone still gets email from this list:

DeepMind did it again

https://arxiv.org/abs/1911.08265

Best wishes
Magnus Persson




___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Indexing and Searching Go Positions -- Literature Wanted

2019-09-17 Thread Brian Sheppard via Computer-go
I remember a scheme (from Dave Dyer, IIRC) that indexed positions based on the 
points on which the 20th, 40th, 60th,... moves were made. IIRC it was nearly a 
unique key for pro positions.
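For illustration, a rough sketch of that kind of key (my reconstruction, not Dyer's actual scheme; names are made up):

    # Signature = the board points of the 20th, 40th, 60th, ... moves.
    # 'moves' is a list of (col, row) pairs in play order on a 19x19 board.
    def dyer_signature(moves, stride=20, count=6):
        sig = []
        for i in range(stride - 1, min(len(moves), stride * count), stride):
            col, row = moves[i]
            sig.append(col * 19 + row)    # flatten the point to 0..360
        return tuple(sig)                 # usable directly as a dictionary key

    # index.setdefault(dyer_signature(game_moves), []).append(game_id)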

Best,
Brian

-Original Message-
From: Erik van der Werf 
To: computer-go 
Sent: Tue, Sep 17, 2019 5:55 am
Subject: Re: [Computer-go] Indexing and Searching Go Positions -- Literature 
Wanted

Hi Stephen,
I'm not aware of recent published work. There is an ancient document by Antti 
Huima on hash schemes for easy symmetry detection/lookup. Unfortunately his 
implementation was broken, but other schemes have been proposed that solve the 
issue (I found one myself, but I think many others found the same or similar 
solutions). You may want to search the archives for "Zobrist hashing with easy 
transformation comparison". If you like math, Nic Schraudolph has an interesting 
solution ;-)
In Steenvreter I implemented a 16-segment scheme with a xor update (for 
rotation, mirroring and color symmetry). In GridMaster I have an experimental 
search feature which is somewhat similar except that I don't use hash keys 
(every possible point on the board simply gets its own bits), and I use 'or' 
instead of 'xor' (so stones that are added are never removed, which makes 
parsing game records extremely fast). This makes it very easy to filter 
positions/games that cannot match, and for the remainder (if needed, dealing 
with captures) it simply replays (which is fast enough because the number of 
remaining games is usually very small). I'm not sure what Kombilo does, but I 
wouldn't be surprised if it's similar. The only thing I haven't implemented yet 
is lookup of translated (shifted) local patterns. Still pondering what's most 
efficient for that, but I could simply run multiple searches with a mask.
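As a rough sketch of that kind of "or" prefilter (my own illustration, not GridMaster's code; it ignores color and captures, which is exactly why it only filters and a replay confirms the match):

    # One bit per board point, OR'ed over every stone that ever appears in a game.
    def point_bit(col, row):
        return 1 << (col * 19 + row)

    def game_mask(moves):                  # moves: list of (col, row) plays
        mask = 0
        for col, row in moves:
            mask |= point_bit(col, row)    # stones are added, never removed
        return mask

    def candidate_games(pattern_points, games):
        want = 0
        for col, row in pattern_points:
            want |= point_bit(col, row)
        # Games whose mask lacks any pattern bit cannot match; replay the rest.
        return [g for g in games if game_mask(g) & want == want]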
Best,
Erik

On Tue, Sep 17, 2019 at 10:17 AM Stephen Martindale wrote:
>
> Dear Go programmers,
>
> I'm interested in experimenting with some new ideas for indexing and 
> searching Goban positions and patterns and I want to stand on the shoulders 
> of giants. Which papers, articles, blog posts or open-source code should I 
> read to get concrete knowledge of the approaches used in the past?
>
> I know that Kombilo is (or used to be) the state of the art in this field. 
> The source is available but, beyond reading the Libkombilo sources, are there 
> any other, more human friendly resources out there?
>
> My new ideas are currently insubstantial and vague but I have done some work, 
> in the past, with natural language embeddings and large-database image 
> indexing and searching and concepts from those two domains keep bouncing 
> around in my mind -- I can't help but feel that there must be something there 
> that can be the "next big thing" in Go position indexing.
>
> Any leads would be appreciated.
>
> Stephen Martindale
>
> +49 160 950 27545
> stephen.c.martind...@gmail.com
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Accelerating Self-Play Learning in Go

2019-03-08 Thread Brian Sheppard via Computer-go
>> contrary to intuition built up from earlier-generation MCTS programs in Go, 
>> putting significant weight on score maximization rather than only 
>> win/loss seems to help.

This narrative glosses over important nuances.

Collectively we are trying to find the golden mean of cost efficiency... The 
"Zero" community has proven that you can succeed without knowing anything about 
Go. The rest of us are trying to discover benefits of *some* prior knowledge, 
perhaps because we don't have cloud datacenters with TPUs. 

A lot of the KataGo paper shows that techniques that are useful in MCTS/rollout 
programs are also useful in NN programs.

Brief historical summary...

MCTS/rollout programs that maximized win/loss outperformed programs that 
maximized point differential, because occasionally winning by a lot does not 
compensate for a lot of small losses. Rollouts are stochastic, so every 
position has opportunities to win/lose by a lot.

This result has been widely quoted along the lines of, "Go programs should only 
use wins and losses and not use point differential." This has long been known 
to be overstated.

But using only win/loss has a core problem: a pure win/loss program is 
content to bleed points, which sometimes results in unnecessary losses. 
Fundamentally, it should be easier to win games where the theoretical point 
differential is larger, so bleeding points makes it harder to distinguish wins 
from losses using rollouts.

There were two responses to this: dynamic komi and point diff as tiebreaker. 
Dynamic komi adjusts komi when the rollout winning percentage falls outside of 
a range like [40%,60%]. Point-diff-as-tiebreaker reserves 90% of the result for 
the win/loss value of a rollout and 10% for a sigmoid function of the final 
point differential.
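As a sketch of the point-diff-as-tiebreaker backup (the 90/10 split is the one described above; the sigmoid scale is a made-up parameter):

    import math

    def rollout_value(black_won, point_diff, scale=10.0):
        # 90% win/loss signal plus 10% squashed point differential (Black's view).
        win = 1.0 if black_won else 0.0
        tiebreak = 1.0 / (1.0 + math.exp(-point_diff / scale))
        return 0.9 * win + 0.1 * tiebreak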

IIRC, Pachi invented point-diff-as-tiebreaker. The technique worked in my 
program as well, and it should work in a lot of MCTS/rollout programs.

KataGo uses point-diff-as-tiebreaker. That is, the contribution is to adapt the 
existing idea to the NN framework.

Did the KataGo paper mention dynamic komi? KataGo can use that too, because its 
komi is a settable input parameter.


>Score maximization in self-play means it is encouraged to play more 
>aggressively/dangerously, by creating life/death problems on the board.

Point-diff-as-tiebreaker is *risk-averse*. The purpose is to keep all of the 
points, not to engage in risks to earn more. There is a graph in the KataGo paper 
that will help to visualize how 0.9 * winning + 0.1 * sigmoid(point-diff) 
trades off gains against losses.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
Thanks for the explanation. I agree that there is no actual consistency in 
exploration terms across historical papers.

I confirmed that the PUCT formulas across the AG, AGZ, and AZ papers are all 
consistent. That is unlikely to be an error. So now I am wondering whether the 
faster decay is useful for bootstrapping the learner. A short search (800 
trials) can run out of trials if it sticks with its first thought for too long.

So... following GCP's advice, I will adopt the AGZ formula as published. And 
not worry much about it, as MM advises.  :-)

Many thanks to all.

-Original Message-
From: Computer-go  On Behalf Of Martin 
Mueller
Sent: Friday, March 9, 2018 5:13 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] PUCT formula

I talked to Chenjun just now so this is what we both remember.
The PUCB formula as published in Chris Rosin’s paper actually has an additive 
knowledge term, and it looks nothing like the two different PUCT variants tried 
in AlphaGo and our paper.

Chenjun tried an additive term as in Chris’ paper first, and it did not work 
well for him. Then he tried the “UCT-like” PUCT as in our paper, with the decay 
term under the square root. This worked well for him. He never tried the 
AlphaGo formula with the faster decay.

Beyond the published papers, my memory is that many people tried many different 
combinations of knowledge terms and of decay formulas over time. I have never 
read any systematic comparison, or any principled argument for which type of 
decay is “best” for which type of knowledge, and why. It must depend on the 
quality of knowledge, amongst many other things.

There are also earlier MoGo papers that combined many different evaluation 
terms into one formula and tested them empirically.

Martin

> However, the formula in the AGZ paper doesn't look like any "UCT variant". 
> Formula from paper: Cpuct * P(s,a) * sqrt(Sum(N(s,b))) / (1 + N(s,a)) Note 
> that there is no logarithmic term, and the division by N+1 falls outside the 
> sqrt. For comparison, a normal UCT term looks like sqrt(ln(sum(N(s,b))) / (1 
> + N))
> 
> Since I asked my question, I found that other people have also noticed a 
> discrepancy. I saw a post on a DeepChem board about this subject. I also 
> found a paper 
> (https://webdocs.cs.ualberta.ca/~mmueller/ps/2016/2016-Integrating-Factorization-Ranked.pdf)
>  by our old friends Chenjun Xiao and Martin Muller:
> 
> "We apply a variant of PUCT [11] formula which is used in AlphaGo [12] to 
> integrate FBT knowledge in MCTS. " But then the formula that they give 
> differs: argmax((Q(s,a) + Cpuct * P(s,a) * sqrt( lg(N(s)) / (1 + N(s,a)))
> 
> I am guessing that Chenjun and Martin decided (or knew) that the AGZ paper 
> was incorrect and modified the equation accordingly.
> 
> Anyone remember anything about this?
> 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
However, the formula in the AGZ paper doesn't look like any "UCT variant". The 
formula from the paper is:

    Cpuct * P(s,a) * sqrt(Sum_b N(s,b)) / (1 + N(s,a))

Note that there is no logarithmic term, and the division by N+1 falls outside the 
sqrt. For comparison, a normal UCT term looks like:

    sqrt(ln(Sum_b N(s,b)) / (1 + N))
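To make the difference concrete, here is a sketch of the two exploration terms as written above (variable names are mine):

    import math

    def puct_term(c_puct, prior, parent_visits, child_visits):
        # AGZ paper: Cpuct * P(s,a) * sqrt(Sum_b N(s,b)) / (1 + N(s,a))
        return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

    def uct_term(c_uct, parent_visits, child_visits):
        # "Normal" UCT-style term: sqrt(ln(Sum_b N(s,b)) / (1 + N))
        return c_uct * math.sqrt(math.log(max(parent_visits, 1)) / (1 + child_visits))

    # Selection picks argmax over children of Q(s,a) plus the exploration term.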

Since I asked my question, I found that other people have also noticed a 
discrepancy. I saw a post on a DeepChem board about this subject. I also found 
a paper 
(https://webdocs.cs.ualberta.ca/~mmueller/ps/2016/2016-Integrating-Factorization-Ranked.pdf)
 by our old friends Chenjun Xiao and Martin Muller: 

"We apply a variant of PUCT [11] formula which is used in AlphaGo [12] to 
integrate FBT knowledge in MCTS. " But then the formula that they give 
differs: argmax((Q(s,a) + Cpuct * P(s,a) * sqrt( lg(N(s)) / (1 + N(s,a)))

I am guessing that Chenjun and Martin decided (or knew) that the AGZ paper was 
incorrect and modified the equation accordingly.

Anyone remember anything about this?

-Original Message-
From: Computer-go <computer-go-boun...@computer-go.org> On Behalf Of Gian-Carlo 
Pascutto
Sent: Friday, March 9, 2018 4:48 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] PUCT formula

On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall 
> the result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but after 
tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the exploration 
term by the policy prior forever. In the original formula, it is a diminishing 
term.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


[Computer-go] PUCT formula

2018-03-08 Thread Brian Sheppard via Computer-go
In the AGZ paper, there is a formula for what they call “a variant of the PUCT 
algorithm”, and they cite a paper from Christopher Rosin: 
http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf

 

But that paper has a formula that he calls the PUCB formula, which incorporates 
the priors in a different way.

 

And there is something called the PUCT algorithm, from our old friend Olivier 
Teytaud (et al): https://hal.inria.fr/hal-00835352/document/, but it is not 
about incorporating prior probabilities. It is about progressive widening in a 
provably consistent way.

 

I recall that someone investigated this question, but I don’t recall the 
result. What is the formula that AGZ actually uses?

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] On proper naming

2018-03-08 Thread Brian Sheppard via Computer-go
The technique originated with backgammon players in the late 1970s, who would 
roll out positions manually. Ron Tiekert (Scrabble champion) also applied the 
technique to Scrabble, and I took that idea for Maven. It seemed like people 
were using the terms interchangeably.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Thursday, March 8, 2018 6:16 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] On proper naming

> but then it does not make sense to call that algorithm "rollout".
> 
> In general: when introducing a new name, care should be taken that the 
> name describes properly what is going on.

Speaking of which, why did people start calling them rollouts instead of 
playouts?

Darren

P.S. And don't get me started on "chains": at one point this seemed to be the 
standard term for a solidly connected set of stones, the basic unit of tactical 
search (as distinguished from a "group", which is made up of 1+ chains). But 
then somewhere along the way people started calling them strings.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] 9x9 is last frontier?

2018-03-07 Thread Brian Sheppard via Computer-go
It will spend, e.g., 95% of its time searching an asymmetric tree of depth > 7 
when a shallow trap of depth 7 exists, thus missing the level-7 trap.

This is very hard to solve even if you have unlimited power.

 

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in my 
opinion, and I have a hard time seeing how it can be competitive with 
Stockfish tactically.

 

My MCTS chess engine with AlphaZero-like MCTS (averaging backups) was missing a lot 
of tactics. I don't use policy or eval networks but qsearch() for eval, and the 
policy is basically choosing whichever move leads to a higher eval.

 

a) My first improvement to the MCTS is to use minimax backups instead of 
averaging. This was an improvement, but not something that would solve the traps.

 

b) My second improvement is to use alpha-beta rollouts. This is a rollouts 
version that can do nullmove and LMR etc. This is a huge improvement and none 
of the MCTS versions can match it. More on alpha-beta rollouts here: 
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf

 

So AlphaZero used none of the above improvements and yet it seems to be 
tactically strong. Leela-Zero suffered from tactical failures left and right too, 
as I expected.

 

So the only explanation left is that the policy network is able to avoid traps, 
which I find hard to believe, since I doubt it can identify more than qsearch-level tactics.

 

All I am saying is that my experience (as well as many others') with MCTS for 
tactics-dominated games is bad, and there must be some breakthrough in that 
regard in AlphaZero for it to be able to compete with Stockfish on a tactical level.

 

I am curious how Remi's attempt at Shogi using AlphaZero's method will turn out.

 

regards,

Daniel

 

 

 

 

 

 

 

 

On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of Stockfish 
games and didn't do well on tactics (the games were actually a blunder fest). I 
believe this is a problem of the MCTS used and not due to lack of training.

Go is a strategic game, so it is different from chess, which is full of traps.

I'm not surprised Leela Zero did well in Go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org 
<mailto:g...@sjeng.org> > wrote:

On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough

I don't understand how one can say that given that they started with the
random network last week only and a few clients. Of course it's bad!
That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Well, AlphaZero did fine at chess tactics, and the papers are clear on the 
details. There must be an error in your deductions somewhere.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Tuesday, March 6, 2018 1:46 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

I am pretty sure it is an MCTS problem and I suspect not something that could 
be easily solved with a policy network (could be wrong here). My opinion is that 
a DCNN is not a miracle worker (as somebody already mentioned here) and it is 
going to fail at resolving tactics. I would be more than happy if it had the same 
power as a qsearch, to be honest.

 

Search traps are the major problem with games like Chess, and they are what makes 
transitioning the success of DCNN from Go to Chess non-trivial.

The following paper discusses shallow traps that are prevalent in chess: 
https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571

They mention that traps make MCTS very inefficient. Even if the MCTS is given 50x 
more time than is needed by an exhaustive minimax tree, it can fail to find a 
level-5 or level-7 trap.

It will spend, e.g., 95% of its time searching an asymmetric tree of depth > 7 
when a shallow trap of depth 7 exists, thus missing the level-7 trap.

This is very hard to solve even if you have unlimited power.

 

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in my 
opinion, and I have a hard time seeing how it can be competitive with 
Stockfish tactically.

 

My MCTS chess engine with AlphaZero-like MCTS (averaging backups) was missing a lot 
of tactics. I don't use policy or eval networks but qsearch() for eval, and the 
policy is basically choosing whichever move leads to a higher eval.

 

a) My first improvement to the MCTS is to use minimax backups instead of 
averaging. This was an improvement, but not something that would solve the traps.

 

b) My second improvement is to use alpha-beta rollouts. This is a rollouts 
version that can do nullmove and LMR etc. This is a huge improvement and none 
of the MCTS versions can match it. More on alpha-beta rollouts here: 
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf

 

So AlphaZero used none of the above improvements and yet it seems to be 
tactically strong. Leela-Zero suffered from tactical failures left and right too, 
as I expected.

 

So the only explanation left is that the policy network is able to avoid traps, 
which I find hard to believe, since I doubt it can identify more than qsearch-level tactics.

 

All I am saying is that my experience (as well as many others') with MCTS for 
tactics-dominated games is bad, and there must be some breakthrough in that 
regard in AlphaZero for it to be able to compete with Stockfish on a tactical level.

 

I am curious how Remi's attempt at Shogi using AlphaZero's method will turn out.

 

regards,

Daniel

 

 

 

 

 

 

 

 

On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of Stockfish 
games and didn't do well on tactics (the games were actually a blunder fest). I 
believe this is a problem of the MCTS used and not due to lack of training.

Go is a strategic game, so it is different from chess, which is full of traps.

I'm not surprised Leela Zero did well in Go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org 
<mailto:g...@sjeng.org> > wrote:

On 0

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!
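A minimal sketch of that sampling step (the temperature parameter is my addition for illustration; temperature 1 samples in proportion to visit counts, and a near-zero temperature degenerates to the most-visited move):

    import random

    def sample_move(visit_counts, temperature=1.0):
        # visit_counts: dict of move -> N(s,a) from the completed search.
        moves = list(visit_counts)
        if temperature < 0.01:                      # greedy limit
            return max(moves, key=lambda m: visit_counts[m])
        weights = [visit_counts[m] ** (1.0 / temperature) for m in moves]
        return random.choices(moves, weights=weights, k=1)[0]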

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of Stockfish 
games and didn't do well on tactics (the games were actually a blunder fest). I 
believe this is a problem of the MCTS used and not due to lack of training.

Go is a strategic game, so it is different from chess, which is full of traps.

I'm not surprised Leela Zero did well in Go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto wrote:

On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough

I don't understand how one can say that given that they started with the
random network last week only and a few clients. Of course it's bad!
That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Project Leela Zero

2017-12-29 Thread Brian Sheppard via Computer-go
Seems like extraordinarily fast progress. Great to hear that.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
"Ingo Althöfer"
Sent: Friday, December 29, 2017 12:30 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Project Leela Zero

Hello in the round,

I am not sure how closely people on the list are following the progress of 
Gian-Carlo Pascutto's project Leela Zero. Therefore, here are some impressions.

The project site is:
http://zero.sjeng.org/

Shortly before Christmas some guys on the German Go mailing list claimed that 
LeelaZero had not yet reached a 25-kyu rank. To get my own impression, I (with 
an inactive rating of 18-kyu) went on KGS and played a free game against 
LeelaZSlow. This version takes exactly 28 seconds per move, even in trivial 
situations. Long story short: I lost clearly. You can download the sgf from here:
http://files.gokgs.com/games/2017/12/21/GoIngo-LeelaZSlow.sgf

In the meantime the KGS versions of Leela have made considerable progress. For 
instance, yesterday and today two 1-dans and a 3-dan were beaten in single 
games.
However, Leela also has "strange" weaknesses. The most serious one concerns 
hopeless ladders. The only way out for Leela seems to be to play early 
tengen-like moves (as possible ladder breakers).
At least three versions of LeelaZero are active:
LeelaZFast, LeelaZSlow, and LeelaZeroT.

As soon as a new "best" LeelaZero version has emerged from self-play runs (of 
length 400 games), it replaces the previous version for play on KGS. 
Currently this happens 1 or 2 times per day.

Ingo.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] AGZ Policy Head

2017-12-29 Thread Brian Sheppard via Computer-go
I agree that having special knowledge for "pass" is not a big compromise, but 
it would not meet the "zero knowledge" goal, no?

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Rémi Coulom
Sent: Friday, December 29, 2017 7:50 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AGZ Policy Head

I also wonder about this. A purely convolutional approach would save a lot of 
weights. The output for pass can be set to be a single bias parameter, 
connected to nothing. Setting pass to a constant might work, too. I don't 
understand the reason for such a complication.

- Original Message -
From: "Andy" 
To: "computer-go" 
Sent: Friday, December 29, 2017 06:47:06
Subject: [Computer-go] AGZ Policy Head



Is there some particular reason AGZ uses two 1x1 filters for the policy head 
instead of one? 


They could also have allowed more, but I guess that would be expensive? I 
calculate that the fully connected layer has 2*361*362 weights, where 2 is the 
number of filters. 


By comparison the value head has only a single 1x1 filter, but it goes to a 
hidden layer of 256. That gives 1*361*256 weights. Why not use two 1x1 filters 
here? Maybe since the final output is only a single scalar it's not needed? 










___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mcts and tactics

2017-12-19 Thread Brian Sheppard via Computer-go
>I wouldn't find it so surprising if eventually the 20 or 40 block networks 
>develop a set of convolutional channels that traces possible ladders 
>diagonally across the board.

 

Learning the deep tactics is more-or-less guaranteed because of the interaction 
between search and evaluation through the training process.

 

Let’s look at the base case for tactical expertise: the evaluation function has 
learned that capturing stones is good.

 

In that state, a very short ladder can be seen through by tactics (800 nodes of 
search). This results in a corrective force that adjusts the evaluation 
function.

 

Once the evaluation function knows that very short ladders are favorable, then 
the search can see through deeper ladders.

 

This continues as the ladders become longer and longer.

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaZero & Co. still far from perfect play ?

2017-12-08 Thread Brian Sheppard via Computer-go
Agreed.

 

You can push this farther. If we define an “error” as a move that flips the W/L 
state of a Go game, then only the side that is currently winning can make an 
error. Let’s suppose that 6.5 komi is winning for Black. Then Black can make an 
error, and after he does then White can make an error, and so on. So the game 
ends up as a Black win if the total number of errors in the game is even.

 

Now build a spreadsheet modeling the W/L rate predicted for Black at various 
error rates. You will conclude that AZ and AGZ must still have sizeable error 
rates. For example, if the error rate were 1% then Black would win a lot more 
games than White, yet AZ and AGZ are still pretty close to 50/50.
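A small sketch of that kind of model (my own illustration; p is a per-move error rate and n the number of chances to err, both purely nominal):

    # If Black wins under perfect play, the game ends as a Black win iff the
    # number of W/L-flipping errors is even. With independent errors,
    # P(even) = (1 + (1 - 2p)^n) / 2.
    def black_win_rate(p, n=150):
        return 0.5 * (1.0 + (1.0 - 2.0 * p) ** n)

    # e.g. black_win_rate(0.001) is about 0.87; the closer the observed result
    # sits to 50/50, the larger the implied error rate.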

 

A model like this has limits, but we can be certain that there is a lot of room 
for improvement.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
ChtiGo via Computer-go
Sent: Friday, December 8, 2017 5:12 AM
To: computer-go@computer-go.org
Cc: patrick.bar...@laposte.net
Subject: [Computer-go] AlphaZero & Co. still far from perfect play ?

 

Hi,

 

In Go, komi allows for a very fine balancing of the winning probability, 
whichever player moves first.

 

However, under perfect play, for a given komi, Black has either a win or a 
loss before putting any stone on the board. E.g., it could be a win with 6.5 points 
komi and a loss with 7.5 komi.

 

'Close to perfect play' (if this makes any sense), one would expect to observe 
unbalanced win/loss ratios for a player when taking Black versus taking White. 
This would be the equivalent of the 'drawish' character of chess, observed for 
top programs but already present at amateur level.

 

At the level of play of AlphaZero and AlphaGo Zero (~5200 Elo FWIW), AlphaZero 
has a win/loss count against AG0 3-day which seems quite independent of the color 
(31/19 with White, 29/21 with Black, see Table 1 of the AlphaZero paper). This is in 
great contrast with AZ/SF in the chess tournament (25/0 with White, 3/0 with Black).

 

Although these statistics might be insufficient, can we elaborate a bit on this 
and hypothesize that:

 

1) The commonly used 7.5-point komi, defined from human games, is so far a good value 
up to top-program level.

2) The level of play of AlphaZero (@5200 Elo), running on a big machine, is still 
far from perfect play and there is plenty of room for improvement. Nothing in view 
comes close to a winning strategy.

3) Go (with komi) is deeper than chess (very biased statement ;-), since SOA 
programs on SOA machines still cannot find any 'color benefit'.

 

Regards,
Patrick

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
AZ scalability looks good in that diagram, and it is certainly a good start, 
but it only goes out through 10 sec/move. Also, if the hardware is 7x better 
for AZ than SF, then should we elongate the curve for AZ by 7x? Or compress the 
curve for SF by 7x? Or some combination? Or take the data at face value?

I just noticed that AZ has some losses when the opening was forced into 
specific variations as in Table 2. So we know that AZ is not perfect, but 19 
losses in 1200 games is hard to extrapolate. (Curious: SF was a net winner over 
AZ with White in a B40 Sicilian, the only position/color combination out of 24 
in which SF had an edge.)

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Rémi Coulom
Sent: Thursday, December 7, 2017 11:51 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

>My concern about many of these points of comparison is that they presume how 
>AZ scales. In the absence of data, I would guess that AZ gains much less from 
>hardware than SF. I am basing this guess on >two known facts. First is that AZ 
>did not lose a game, so the upper bound on its strength is perfection. Second, 
>AZ is a knowledge intensive program, so it is counting on judgement to a 
>larger degree.

Doesn't Figure 2 in the paper indicate convincingly that AZ scales better than 
Stockfish?

Rémi
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
> Which is IMHO missing the point a bit ;-)

I saw it the same way, while conceding that the facts are accurate.

It makes sense for SF to internalize the details before making decisions. At 
some point there will be a realization that AZ is a fundamental change.


>What about the data point that AlphaGo Zero gained 2100 Elo from its tree 
>search? In a game commonly considered less tactical?

That is a common perception, especially among those who have never debugged a 
Go program. :-)

I was coming at it from the other direction, reasoning that since SF and AZ are 
close to perfect at chess, then there is less to gain from speed. (Whereas I 
doubt that AGZ is close to perfect at Go.)

All of this is subject to my money back guarantee: my opinions are guaranteed 
wrong, or your money back. :-)


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 8:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

On 7/12/2017 13:20, Brian Sheppard via Computer-go wrote:
> The conversation on Stockfish's mailing list focused on how the match 
> was imbalanced.

Which is IMHO missing the point a bit ;-)

> My concern about many of these points of comparison is that they 
> presume how AZ scales. In the absence of data, I would guess that AZ 
> gains much less from hardware than SF. I am basing this guess on two 
> known facts. First is that AZ did not lose a game, so the upper bound 
> on its strength is perfection. Second, AZ is a knowledge intensive 
> program, so it is counting on judgement to a larger degree.

What about the data point that AlphaGo Zero gained 2100 Elo from its tree 
search? In a game commonly considered less tactical?

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
The conversation on Stockfish's mailing list focused on how the match was 
imbalanced.

- AZ's TPU hardware was estimated at several times (7 times?) the computational 
power of Stockfish's.
- Stockfish's transposition table size (1 GB) was considered much too small for 
a 64 core machine.
- Stockfish's opening book is disabled, whereas AZ has, in effect, memorized a 
huge opening book.
- The match was against SF 8 (one year old) rather than the latest dev version.

To this I would add that the losses of Stockfish that I played through seemed 
to be largely self-similar, so it is possible that Stockfish has a relatively 
limited number of weaknesses that AZ does not, but the format of the match 
amplifies the issue.

So the attitude among the SF core is pretty competitive. Which is great news 
for continued development.

My concern about many of these points of comparison is that they presume how AZ 
scales. In the absence of data, I would guess that AZ gains much less from 
hardware than SF. I am basing this guess on two known facts. First is that AZ 
did not lose a game, so the upper bound on its strength is perfection. Second, 
AZ is a knowledge intensive program, so it is counting on judgement to a larger 
degree.

But I could be wrong. Maybe AZ falls apart tactically without 80K pops. There 
is no data, so all WAGs are valid.


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 4:13 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 rating point edge! I think the
> Stockfish open source project improved Stockfish by ~20 rating points in
> the last year.

It's about 40-45 Elo FWIW.

> AZ would dominate the current TCEC. 

I don't think you'll get to 80 knps with a regular 22 core machine or
whatever they use. Remember that AZ hardware is about 16 x 1080 Ti's.
You'll lose that (70 - 40 = 30 Elo) advantage very, very quickly.

IMHO this makes it all the more clear how silly it is that so much
attention is given to TCEC with its completely arbitrary hardware choice.

> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.

Probably the same the Zen team did. Ignore a large part of the result
because people's actual computers - let alone mobile phones - can't run
a neural net at TPU speeds.

The question is if resizing the network makes the resulting program more
competitive, enough to overcome the speed difference. And, aha, in which
direction are you going to try to resize? Bigger or smaller?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
Requiring a margin > 55% is a defense against a random result. A 55% score in a 
400-game match is 2 sigma.

But I like the AZ policy better, because it does not require arbitrary 
parameters. It also improves more fluidly by always drawing training examples 
from the current probability distribution, and when the program is close to 
perfect you would be able to capture the last 5% of skill.

I am not sure what to make of the AZ vs AGZ result. Mathematically, there 
should be a degree of training sufficient for AZ to exceed any fixed level of 
skill, such as AGZ's 40/40 level. So there must be a reason why DeepMind did 
not report such a result, but it is unclear what that is.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Wednesday, December 6, 2017 12:58 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

> Mastering Chess and Shogi by Self-Play with a General Reinforcement 
> Learning Algorithm https://arxiv.org/pdf/1712.01815.pdf

One of the changes they made (bottom of p.3) was to continuously update the 
neural net, rather than require a new network to beat it 55% of the time to be 
used. (That struck me as strange at the time, when reading the AlphaGoZero 
paper - why not just >50%?)

The AlphaZero paper shows it out-performs AlphaGoZero, but they are comparing 
to the 20-block, 3-day version. Not the 40-block, 40-day version that was even 
stronger.

As papers rarely show failures, can we take it to mean they couldn't 
out-perform their best go bot, do you think? If so, I wonder how hard they 
tried?

In other words, do you think the changes they made from AlphaGo Zero to Alpha 
Zero have made it weaker (when just viewed from the point of view of making the 
strongest possible go program).

Darren
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
The chess result is 64-36: a 100 rating point edge! I think the Stockfish open 
source project improved Stockfish by ~20 rating points in the last year. Given 
the number of people/computers involved, Stockfish’s annual effort level seems 
comparable to the AZ effort.
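For reference, the usual logistic Elo conversion gives roughly that figure (a rough calculation, ignoring draws):

    import math
    score = 0.64                                    # the 64-36 match result
    elo_diff = 400 * math.log10(score / (1 - score))
    # about 100 Elo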

 

Stockfish is really, really tweaked out to do exactly what it does. It is very 
hard to improve anything about Stockfish. To be clear: I am not disparaging the 
code or people or project in any way. The code is great, people are great, 
project is great. It is really easy to work on Stockfish, but very hard to make 
progress given the extraordinarily fine balance of resources that already 
exists.  I tried hard for about 6 months last year without any successes. I 
tried dozens (maybe 100?) experiments, including several that were motivated by 
automated tuning or automated searching for opportunities. No luck.

 

AZ would dominate the current TCEC. Stockfish didn’t lose a game in the 
semi-final, failing to make the final because of too many draws against the 
weaker players.

 

The Stockfish team will have some self-examination going forward for sure. I 
wonder what they will decide to do.

 

I hope this isn’t the last we see of these DeepMind programs.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Richard Lorentz
Sent: Wednesday, December 6, 2017 12:50 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

 

One chess result stood out for me, namely, just how much easier it was for 
AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather than with 
black (3 wins, 47 draws, 0 losses).

Maybe we should not give up on the idea of White to play and win in chess!

On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:

Hi, 

DeepMind has made the strongest Chess and Shogi programs with the AlphaGo Zero method. 

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning 
Algorithm 
https://arxiv.org/pdf/1712.01815.pdf

AlphaZero(Chess) outperformed Stockfish after 4 hours, 
AlphaZero(Shogi) outperformed elmo after 2 hours. 

Search is MCTS. 
AlphaZero(Chess) searches 80,000 positions/sec. 
Stockfish searches 70,000,000 positions/sec. 
AlphaZero(Shogi) searches 40,000 positions/sec. 
elmo searches 35,000,000 positions/sec. 

Thanks, 
Hiroshi Yamashita 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Significance of resignation in AGZ

2017-12-03 Thread Brian Sheppard via Computer-go
I have been interested in a different approach, and it had some elements in 
common with AGZ, so AGZ gave me the confidence to try it.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Chaz G.
Sent: Sunday, December 3, 2017 4:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Hi Brian,

 

Thanks for sharing your genuinely interesting result. One question though: why 
would you train on a non-"zero" program? Do you think your program, as a result 
of your rules, would perform better than zero, or is imitating the best-known 
algorithm inconvenient for your purposes?

 

Best,

-Chaz

 

On Sat, Dec 2, 2017 at 7:31 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

I implemented the ad hoc rule of not training on positions after the first 
pass, and my program is basically playing moves until the first pass is forced. 
(It is not a “zero” program, so I don’t mind ad hoc rules like this.)

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Xavier Combelle
Sent: Saturday, December 2, 2017 12:36 PM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 


Subject: Re: [Computer-go] Significance of resignation in AGZ

 

It might make sense to enable a resignation threshold even at a very weak level. 
That way, the first thing the network should learn would be not to resign too early 
(even before learning not to pass).

 

Le 02/12/2017 à 18:17, Brian Sheppard via Computer-go a écrit :

I have some hard data now. My network’s initial training reached the same 
performance in half the iterations. That is, the steepness of skill gain in the 
first day of training was twice as great when I avoided training on fill-ins.

 

This has all the usual caveats: only one run before/after, YMMV, etc.

 

From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Friday, December 1, 2017 5:39 PM
To: 'computer-go'  <mailto:computer-go@computer-go.org> 
<computer-go@computer-go.org>
Subject: RE: [Computer-go] Significance of resignation in AGZ

 

I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard  <mailto:sheppar...@aol.com> <sheppar...@aol.com>; 
computer-go  <mailto:computer-go@computer-go.org> <computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> >:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is that the positional evaluation is not trained on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian



Re: [Computer-go] Significance of resignation in AGZ

2017-12-02 Thread Brian Sheppard via Computer-go
I implemented the ad hoc rule of not training on positions after the first 
pass, and my program is basically playing moves until the first pass is forced. 
(It is not a “zero” program, so I don’t mind ad hoc rules like this.)
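A minimal sketch of that filter (my own illustration; the record layout is hypothetical):

    def truncate_at_first_pass(examples):
        # examples: list of (state, move, value_target) from one self-play game.
        kept = []
        for state, move, value in examples:
            if move == "pass":
                break              # everything after the first pass is fill-in
            kept.append((state, move, value))
        return kept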

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Xavier Combelle
Sent: Saturday, December 2, 2017 12:36 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

It might make sense to enable a resignation threshold even at a very weak level. 
That way, the first thing the network should learn would be not to resign too early 
(even before learning not to pass).

 

Le 02/12/2017 à 18:17, Brian Sheppard via Computer-go a écrit :

I have some hard data now. My network’s initial training reached the same 
performance in half the iterations. That is, the steepness of skill gain in the 
first day of training was twice as great when I avoided training on fill-ins.

 

This has all the usual caveats: only one run before/after, YMMV, etc.

 

From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Friday, December 1, 2017 5:39 PM
To: 'computer-go'  <mailto:computer-go@computer-go.org> 
<computer-go@computer-go.org>
Subject: RE: [Computer-go] Significance of resignation in AGZ

 

I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard  <mailto:sheppar...@aol.com> <sheppar...@aol.com>; 
computer-go  <mailto:computer-go@computer-go.org> <computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> >:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is that the positional evaluation is not trained on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 






___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Significance of resignation in AGZ

2017-12-02 Thread Brian Sheppard via Computer-go
I have some hard data now. My network’s initial training reached the same 
performance in half the iterations. That is, the steepness of skill gain in the 
first day of training was twice as great when I avoided training on fill-ins.

 

This has all the usual caveats: only one run before/after, YMMV, etc.

 

From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Friday, December 1, 2017 5:39 PM
To: 'computer-go' <computer-go@computer-go.org>
Subject: RE: [Computer-go] Significance of resignation in AGZ

 

I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> >:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is the positional evaluation is not training on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Significance of resignation in AGZ

2017-12-01 Thread Brian Sheppard via Computer-go
I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> >:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is the positional evaluation is not training on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Significance of resignation in AGZ

2017-12-01 Thread Brian Sheppard via Computer-go
I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is the positional evaluation is not training on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.
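As a concrete illustration of the first point, here is a minimal sketch of a "train only up to the first pass" filter, assuming a self-play record is stored as a list of (position, move, outcome) tuples; the names and the pass encoding are illustrative, not any program's actual code:

    PASS = None  # illustrative encoding of a pass move

    def training_positions(game_record):
        """Yield only the positions played before the first pass."""
        for position, move, outcome in game_record:
            if move is PASS:
                break  # everything from the first pass onward is fill-in
            yield position, move, outcome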

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Brian Sheppard via Computer-go
State of the art in computer chess is alpha-beta search, but note that the 
search is very selective because of "late move reductions."

A late move reduction is to reduce depth for moves after the first move 
generated in a node. For example, a simple implementation would be "search the 
first move generated to depth N, and all subsequent moves to depth N-1". That 
simple idea is a big improvement.

LMR works (in large part) because move generators produce probable-best-first 
order. Therefore, if the first few moves do not work (i.e., refutes opponent's 
late move) then there probably isn't a refutation. By searching unlikely moves 
less, you have the time to search likely variations more.

Stockfish's actual implementation massively increases effective search depth 
over the simple policy outlined above. Because of LMR, the effective branching 
factor of chess search trees is less than 2.

IMO, late move reductions should work really well in Go, given that the move 
predictors are above 60% accuracy. But I have not tried. 
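For reference, a sketch of the simple reduction rule described above in a bare negamax, assuming a game object exposing moves(pos), play(pos, move) and evaluate(pos), with moves generated probable-best-first (an illustrative interface, not any engine's actual API):

    import math

    def negamax_lmr(game, pos, depth, alpha=-math.inf, beta=math.inf):
        moves = game.moves(pos)          # assumed probable-best-first order
        if depth <= 0 or not moves:
            return game.evaluate(pos)    # heuristic score for the side to move
        best = -math.inf
        for i, move in enumerate(moves):
            reduction = 0 if i == 0 else 1   # later moves searched one ply shallower
            score = -negamax_lmr(game, game.play(pos, move),
                                 depth - 1 - reduction, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:                # cutoff: this line is already refuted
                break
        return best

Production engines go further: the reduction grows with move index and depth, and a reduced move that unexpectedly raises alpha is re-searched at full depth.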

Using MCTS for training the evaluator is probably necessary, even if the 
ultimate goal is to use the evaluator in alpha-beta.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Petr Baudis
Sent: Thursday, November 16, 2017 10:43 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Is MCTS needed?

  Hi,

  when explaining AlphaGo Zero to a machine learning audience yesterday


(https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)

it occurred to me that using MCTS in this setup is actually such
a kludge!

  Originally, we used MCTS because with the repeated simulations,
we would be improving the accuracy of the arm reward estimates.  MCTS
policies assume stationary distributions, which is violated every time
we expand the tree, but it's an okay tradeoff if all you feed into the
tree are rewards in the form of just Bernoulli trials.  Moreover, you
could argue evaluations are somewhat monotonic with increasing node
depths as you are basically just fixing a growing prefix of the MC
simulation.

  But now, we expand the nodes literally all the time, breaking the
stationarity possibly in drastic ways.  There are no reevaluations that
would improve your estimate.  The input isn't binary but an estimate in
a continuous space.  Suddenly the Multi-armed Bandit analogy loses a lot
of ground.

  Therefore, can't we take the next step, and do away with MCTS?  Is
there a theoretical viewpoint from which it still makes sense as the best
policy improvement operator?

  What would you say is the current state-of-art game tree search for
chess?  That's a very unfamiliar world for me, to be honest all I really
know is MCTS...

-- 
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero is weaker than Master!?

2017-10-26 Thread Brian Sheppard via Computer-go
I would add that "wild guesses based on not enough info" is an indispensable 
skill.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hideki Kato
Sent: Thursday, October 26, 2017 10:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Zero is weaker than Master!?

Xavier Combelle: <62b977d7-d227-a74d-04b7-0d46db6a7...@gmail.com>:
>It is just wild guesses  based on reasonable arguments but without 
>evidence.


Yes, of course. Due to not enough info provided by Google.

Hideki


>On 26/10/2017 at 07:51, Hideki Kato wrote:
>> You can believe
>>> Of what I understand same network architecture imply the same number of
>>> block
>> but David Silver told AlphaGo Master used 40 layers in May.
>> http://www.bestchinanews.com/Science-Technology/10371.html
>> # The paper was submitted in April.
>>
>> Usually, network "architecture" does not imply the number of layers
>> whereas "configuration" may do.
>>
>> Clearly they made the 40 layers version first because it's called the
>> "1st instance" where the 80 layers one is called the "2nd instance."
>> The 1st was trained 3 days and overtook AlphaGo Lee.  Then they changed
>> to the 2nd.  Aware of this fact, and watching the growing curve of the
>> 1st, I guess 40 layers was not enough to reach AlphaGo Master level and
>> so they doubled the layers.
>>
>> Hideki
>>
>> Xavier Combelle: <1550c907-8b96-e4ea-1f5e-2344f394b967@gmail.com>:
>>> As I understand the paper, they directly created AlphaGo Zero with a
>>> 40 block setup.
>>> They just made a reduced 20 block setup to compare on kifu prediction
>>> (as far as I searched in the paper, it is the only place where they
>>> mention the 20 block setup).
>>> They specifically mention comparing several versions of their software
>>> with various parameters.
>>> If the number of blocks was an important parameter, I hope they would
>>> mention it.
>>> Of course there are a lot of things that they tried and failed and we
>>> will not know about, but I have a hard time believing that an AlphaGo
>>> Zero with 20 blocks is one of them.
>>> About the paper, there is no mention of the number of blocks of Master:
>>> "AlphaGo Master is the program that defeated top human players by 60-0
>>> in January, 2017. It was previously unpublished but uses the same
>>> neural network architecture, reinforcement learning algorithm, and
>>> MCTS algorithm as described in this paper. However, it uses the same
>>> handcrafted features and rollouts as AlphaGo Lee and training was
>>> initialised by supervised learning from human data."
>>> Of what I understand, the same network architecture implies the same
>>> number of blocks.
>>> On 25/10/2017 at 17:58, Xavier Combelle wrote:
>>>> I understand better
>>>> On 25/10/2017 at 04:28, Hideki Kato wrote:
>>>>> Are you thinking the 1st instance could reach Master level
>>>>> if giving more training days?
>>>>> I don't think so.  The performance would be stopping
>>>>> improving at 3 days.  If not, why they built the 2nd
>>>>> instance?
>>>>> Best,
>>>>> Hideki
>>>>> Xavier Combelle: <05c04de1-59c4-8fcd-2dd1-094faabf3...@gmail.com>:
>>>>>> How is it a fair comparison if there is only 3 days of training for
>>>>>> Zero?
>>>>>> Master had longer training, no? Moreover, Zero has a bootstrap
>>>>>> problem because, unlike Master, it doesn't learn from expert games,
>>>>>> which means that it is likely to be weaker with little training.
>>>>>> On 24/10/2017 at 20:20, Hideki Kato wrote:
>>>>>>> David Silver told Master used 40 layers network in May.
>>>>>>> According to the new paper, Master used the same architecture
>>>>>>> as Zero.  So, Master used a 20 blocks ResNet.
>>>>>>> The first instance of Zero, the 20 blocks ResNet version, is
>>>>>>> weaker than Master (after 3 days training).  So, with the
>>>>>>> same layers (a fair comparison) Zero is weaker than
>>>>>>> Master.
>>>>>>> Hideki
--
Hideki Kato  
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-26 Thread Brian Sheppard via Computer-go
Well,... good luck with that! :-)

Seriously: it is important to account for PSPACE-completeness. That is, a set 
of rules that covers Go without conflict must be exponential in space usage.

Search has a triple role in system design. It accounts (at least 
asymptotically) for missing knowledge and also papers over disagreements 
between rules. It also evaluates the global situation, which allows rules to be 
expressed in terms of purely local effects.

From my perspective, that is too good a deal to pass by. But I don't want to be 
only a bearer of bad news. If you accept a limitation on your rule sets, then 
there is a higher level conflict resolution method that will lead to good 
results.

Your rules could express their effect as a local point gain, in the sense of 
"temperature". That is, temperature == the difference between moving first and 
letting the opponent move first. Then CGT provides a higher-order theory for 
rationalizing multiple priorities.

This suggestion only addresses one of the three roles of search, though perhaps 
the most important one.
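A toy illustration of the local-point-gain idea (not a full CGT treatment): if each rule reports its move together with the swing it estimates, i.e. the point difference between playing there first and letting the opponent play there first, then conflicting suggestions can be ranked by that single number. The moves and values below are made up:

    def hottest_move(suggestions):
        """suggestions: iterable of (move, swing_in_points) pairs from the rule base."""
        return max(suggestions, key=lambda s: s[1])

    print(hottest_move([("defend weak group", 18.0),
                        ("big endgame hane", 9.5),
                        ("fill dame", 0.0)]))   # -> ('defend weak group', 18.0)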

Best,
Brian


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Robert Jasiek
Sent: Thursday, October 26, 2017 10:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

On 26.10.2017 13:52, Brian Sheppard via Computer-go wrote:
> MCTS is the glue that binds incompatible rules.

This is, however, not what I mean. Conflicting principles (call them rules if 
you like) must be dissolved by higher order principles. Only when all conflicts 
are dissolved should MCTS be applied.

What you describe has been used with success and better success than I expect 
what my knowledge-pure approach can currently achieve. But MCTS as glue for 
conflicting principles has also run into a boundary. I want to see that 
boundary surpassed by my pure approach.

--
robert jasiek
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-26 Thread Brian Sheppard via Computer-go
Robert is right, but Robert seems to think this hasn't been done. Actually 
every prominent non-neural MCTS program since Mogo has been based on the exact 
design that Robert describes. The best of them achieve somewhat greater 
strength than Robert expects.

MCTS is the glue that binds incompatible rules. It rationalizes different 
heuristics into a coherent whole by testing the ideas in a competition against 
one another using a meaningful evaluation (win/loss).

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Xavier Combelle
Sent: Thursday, October 26, 2017 1:50 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?



>> The reason why (b) had became unpopular is because there is no go 
>> theory precise enough to implement it as an algorithm
>
> There is quite some theory of the 95% principle kind which might be 
> implemented as approximation. E.g. "Usually, defend your weak 
> important group." can be approximated by approximating "group", 
> "important" (its loss is too large in a quick positional judgement), 
> "weak" (can be killed in two successive moves), "defend" (after the 
> move, cannot be killed in two successive moves), "usually" (always, 
> unless there are several such groups and some must be chosen, say, 
> randomly; the approximation being that the alternative strategy of 
> large scale exchange is discarded).
>
> Besides, one must prioritise principles to solve conflicting 
> principles by a higher order principle.
>
> IMO, such an expert system combined with tree reading and maybe MCTS 
> to emulate reading used when a principle depends on reading can, with 
> an effort of a few manyears of implementation, already achieve amateur 
> mid dan. Not high dan yet because high dans can choose advanced 
> strategies, such as global exchange, and there are no good enough 
> principles for that yet, which would also consider necessary side 
> conditions related to influence, aji etc. I need to work out such 
> principles during the following years. Currently, the state is that 
> weaker principles have identified the major topics (influence, aji
> etc.) to be considered in fights but they must be refined to create 
> 95%+ principles.
>
> ***
>
> In the 80s and 90s, expert systems failed to do better than ca. 5 kyu 
> because principles were only marginally better than 50%. Today, (my) 
> average principles discard the weaker, 50% principles and are ca. 75%.
> Tomorrow, the 75% principles can be discarded for an average of 95% 
> principles. Expert systems get their chance again! Their major 
> disadvantage remains: great manpower is required for implementation.
> The advantage is semantical understanding.
>
From a software developer point of view enlighten by my knowledge of history of 
ai and history of go development,
 such approximate definition is close to useless to build a software at the 
current state of art.
One of the reason is as you state the considerable work it would require to 
implement a huge number of imprecise rules.
As you are not a software developer, I want you to look on this comics which 
state the difference between apparent difficulty and real difficulty of 
developping software. https://xkcd.com/1425/ As far as I understand your task 
to implement such an expert system would require the many years of 
implementations would be thousands of years.
As far as my experience speak the expected reward would be a win of one or two 
rank and so definitely not a mid dan amateur level.

Xavier Combelle

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-25 Thread Brian Sheppard via Computer-go
I think it uses the champion network. That is, the training periodically 
generates a candidate, and there is a playoff against the current champion. If 
the candidate wins by more than 55% then a new champion is declared.

 

Keeping a champion is an important mechanism, I believe. That creates the 
competitive coevolution dynamic, where the network is evolving to learn how to 
beat the best, and not just most recent. Without that dynamic, the training 
process can go up and down.
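A minimal sketch of that gating loop (the 55% threshold is from the description above; the game count, the play_game interface, and the color alternation are illustrative assumptions):

    def gate(champion, candidate, play_game, games=400, threshold=0.55):
        """play_game(a, b) -> True if a beats b; colors are alternated for fairness."""
        wins = 0
        for g in range(games):
            if g % 2 == 0:
                wins += 1 if play_game(candidate, champion) else 0
            else:
                wins += 0 if play_game(champion, candidate) else 1
        return candidate if wins / games > threshold else champion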

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
uurtamo .
Sent: Wednesday, October 25, 2017 6:07 PM
To: computer-go 
Subject: Re: [Computer-go] Source code (Was: Reducing network size? (Was: 
AlphaGo Zero))

 

Does the self-play step use the most recent network for each move?

 

On Oct 25, 2017 2:23 PM, "Gian-Carlo Pascutto" wrote:

On 25-10-17 17:57, Xavier Combelle wrote:
> Is there some way to distribute learning of a neural network ?

Learning as in training the DCNN, not really unless there are high
bandwidth links between the machines (AFAIK - unless the state of the
art changed?).

Learning as in generating self-play games: yes. Especially if you update
the network only every 25 000 games.

My understanding is that this task is much more bottlenecked on game
generation than on DCNN training, until you get quite a bit of machines
that generate games.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Brian Sheppard via Computer-go
So I am reading that residual networks are simply better than normal 
convolutional networks. There is a detailed write-up here: 
https://blog.waya.ai/deep-residual-learning-9610bb62c355

Summary: the residual network has a fixed connection that adds (with no 
scaling) the output of the previous level to the output of the current level. 
The point is that once some layer learns a concept, that concept is immediately 
available to all downstream layers, without need for learning how to propagate 
the value through a complicated network design. These connections also provide 
a fast pathway for tuning deeper layers.
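A minimal residual block sketch in PyTorch showing the unscaled skip connection described above (channel count and layer details are illustrative, not AlphaGo Zero's exact configuration):

    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            y = F.relu(self.bn1(self.conv1(x)))
            y = self.bn2(self.conv2(y))
            return F.relu(y + x)  # unscaled skip: earlier features pass straight through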

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 4:33 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Select quotes that I find interesting from a brief skim:

1) Using a residual network was more accurate, achieved lower error, and 
improved performance in AlphaGo by over 600 Elo.

2) Combining policy and value together into a single network slightly reduced 
the move prediction accuracy, but reduced the value error and boosted playing 
performance in AlphaGo by around another 600 Elo.

These gains sound very high (much higher than previous experiments with them 
reported here), but are likely due to the joint training.

3) The raw neural network, without using any lookahead, achieved an Elo rating 
of 3,055. ... AlphaGo Zero achieved a rating of 5,185.

The increase of 2000 Elo from tree search sounds very high, but this may just 
mean the value network is simply very good - and perhaps relatively better than 
the policy one. (They previously had problems there that SL > RL for the policy 
network guiding the tree search - but I'm not sure there's any relation)

4) History features Xt; Yt are necessary because Go is not fully observable 
solely from the current stones, as repetitions are forbidden.

This is a weird statement. Did they need 17 planes just to check for ko?
It seems more likely that history features are very helpful for the internal 
understanding of the network as an optimization. That sucks though - it's 
annoying for analysis and position setup.

Lastly, the entire training procedure is actually not very complicated at all, 
and it's hopeful the training is "faster" than previous approaches - but many 
things look fast if you can throw 64 GPU workers at a problem.

In this context, the graphs of the differing network architectures causing huge 
strength discrepancies are both good and bad. Making a better pick can cause 
you to get massively better results, take a bad pick and you won't come close.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
I also expected bootstrapping by self-play. (I also wrote a post to that 
effect. But of course, DeepMind actually DID IT.)

But I didn't envision any of the other stuff. This is why I love their papers. 
Papers from most sources are predictable, skimpy, and sketchy, but theirs 
contain all sorts of deep insights that I never saw coming. And the theory, 
architecture, implementation, and explanation are all first-rate. It's like the 
Poker papers from U Alberta, or the source code for Stockfish. Lessons on every 
line.

Regarding Elo deltas: the length of Go games obscures what might be very small 
differences. E.g., if one player's moves are 3% more likely to be game-losing 
errors, then won't that player lose nearly every game? That is, 3 more 
"blunders" per game.

Regarding these details: at some level, all of these *must* be artifacts of 
training. That is, the NN architectures that did "badly" are still 
asymptotically optimal, so they should also eventually play equally well, 
provided that training continues indefinitely, and the network are large 
enough, and parameters do not freeze prematurely, and training eventually uses 
only self-play data, etc. I believe that is mathematically accurate, so I would 
ask a different question: why do those choices make better use of resources in 
the short run?

I have no idea; I'm just asking.



-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 5:40 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> This paper is required reading. When I read this team’s papers, I 
> think to myself “Wow, this is brilliant! And I think I see the next step.”
> When I read their next paper, they show me the next *three* steps.

Hmm, interesting way of seeing it. Once they had Lee Sedol AlphaGo, it was 
somewhat obvious that just self-playing that should lead to an improved policy 
and value net.

And before someone accuses me of Captain Hindsighting here, this was pointed 
out on this list:
http://computer-go.org/pipermail/computer-go/2017-January/009786.html

It looks to me like the real devil is in the details. Don't use a residual 
stack? -600 Elo. Don't combine the networks? -600 Elo.
Bootstrap the learning? -300 Elo

We made 3 perfectly reasonable choices and somehow lost 1500 Elo along the way. 
I can't get over that number, actually.

Getting the details right makes a difference. And they're getting them right, 
either because they're smart, because of experience from other domains, or 
because they're trying a ton of them. I'm betting on all 3.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
Some thoughts toward the idea of general game-playing...

One aspect of Go is ideally suited for visual NN: strong locality of reference. 
 That is, stones affect stones that are nearby.

I wonder whether the late emergence of ladder understanding within AlphaGo Zero 
is an artifact of the board representation? The authors speculate that ladders 
are not as important as humans surmise.

Another aspect of Go is ideally suited for visual NN: translation invariance. 
The convolutions transfer knowledge around the board, with the presumption that 
good moves will travel well.

I wonder whether we can data-mine positional evaluations to discover features? 
E.g., start with a standard visual NN and then make a database of positions 
where the delta between actual and expected evaluation is large enough to cause 
a different move to be selected. Data mine features from this set, and extend 
the NN with new inputs. (This would not discover a notion like "liberty count", 
but would discover notions like "ladder breaker".)
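A sketch of the collection step, assuming callables children(pos), evaluate_nn(pos) and evaluate_actual(pos) (the latter being, say, a deep search result or game-outcome label); all of these names are illustrative:

    def surprising_positions(positions, children, evaluate_nn, evaluate_actual):
        """Collect positions where the network's evaluation error flips the move choice."""
        mined = []
        for pos in positions:
            succ = children(pos)                       # legal successor positions
            by_net = max(succ, key=evaluate_nn)        # move the network would pick
            by_truth = max(succ, key=evaluate_actual)  # move the deeper oracle picks
            if by_net is not by_truth:
                mined.append(pos)
        return mined  # mine features (e.g. "ladder breaker") from this set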


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 4:33 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Select quotes that I find interesting from a brief skim:

1) Using a residual network was more accurate, achieved lower error, and 
improved performance in AlphaGo by over 600 Elo.

2) Combining policy and value together into a single network slightly reduced 
the move prediction accuracy, but reduced the value error and boosted playing 
performance in AlphaGo by around another 600 Elo.

These gains sound very high (much higher than previous experiments with them 
reported here), but are likely due to the joint training.

3) The raw neural network, without using any lookahead, achieved an Elo rating 
of 3,055. ... AlphaGo Zero achieved a rating of 5,185.

The increase of 2000 Elo from tree search sounds very high, but this may just 
mean the value network is simply very good - and perhaps relatively better than 
the policy one. (They previously had problems there that SL > RL for the policy 
network guiding the tree search - but I'm not sure there's any relation)

4) History features Xt; Yt are necessary because Go is not fully observable 
solely from the current stones, as repetitions are forbidden.

This is a weird statement. Did they need 17 planes just to check for ko?
It seems more likely that history features are very helpful for the internal 
understanding of the network as an optimization. That sucks though - it's 
annoying for analysis and position setup.

Lastly, the entire training procedure is actually not very complicated at all, 
and it's hopeful the training is "faster" than previous approaches - but many 
things look fast if you can throw 64 GPU workers at a problem.

In this context, the graphs of the differing network architectures causing huge 
strength discrepancies are both good and bad. Making a better pick can cause 
you to get massively better results, take a bad pick and you won't come close.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
Yeah, I would expect that encoding stones as "signed liberty count" would train 
faster/better/stronger. You could imagine a follow-up paper where Go features 
are supplied.
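One plausible reading of "signed liberty count" as an input plane, sketched with NumPy (the encoding, the cap, and the array layout are assumptions, not a published spec):

    import numpy as np

    def signed_liberty_plane(board, liberties, to_move, cap=8):
        """board, liberties: integer arrays of shape (19, 19); board holds
        0 = empty, 1 = black, 2 = white; liberties holds each stone's chain
        liberty count; to_move is 1 or 2."""
        plane = np.zeros(board.shape, dtype=np.float32)
        for colour, sign in ((to_move, +1.0), (3 - to_move, -1.0)):
            mask = (board == colour)
            plane[mask] = sign * np.minimum(liberties[mask], cap)
        return plane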

But, if I think about this on a large scale, wouldn't it be huge to put 
together a general game-playing program just based on visual-NN and simple 
MCTS? In the extended section is a comparison of 
checkers/backgammon/poker/scrabble(thanks!)/... that have been addressed by 
rollouts, and how those research efforts differed from the tabula rasa / ab 
initio formulation. The list of games that are played above human caliber now 
includes deterministic and non-deterministic, and perfect and imperfect 
information, so there are examples of every type out there.

I'm hoping that DeepMind will pursue general game-playing. It would be awesome 
to see strong AIs emerging from first principles. I would be happy to supply 
Maven as a sparring partner for Scrabble. I also have a world-class Pineapple 
Poker player that is based on rollouts and hand-coded features.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 4:38 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> A stunning result. The NN uses a standard vision architecture (no Go 
> adaptation beyond what is necessary to represent the game state).

The paper says that Master (4858 rating) uses Go specific features, initialized 
by SL, and the same technique. Without go features, and without initialization, 
it's Zero (5185 rating).

The obvious question is, what would be the result of using go features and not 
initializing?

I would expect that providing liberties is a useful shortcut (see my remark 
about game history!). But I'm willing to be surprised :-)

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
This paper is required reading. When I read this team’s papers, I think to 
myself “Wow, this is brilliant! And I think I see the next step.” When I read 
their next paper, they show me the next *three* steps. I can’t say enough good 
things about the quality of the work.

 

A stunning result. The NN uses a standard vision architecture (no Go adaptation 
beyond what is necessary to represent the game state). The MCTS is the simplest 
imaginable search: it has no playout policy!

 

I can’t wait to see what the team does next.

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Álvaro Begué
Sent: Wednesday, October 18, 2017 3:08 PM
To: computer-go 
Subject: Re: [Computer-go] AlphaGo Zero

 

A link to the paper (from the blog post): 
https://deepmind.com/documents/119/agz_unformatted_nature.pdf

Enjoy!
Álvaro.

 

On Wed, Oct 18, 2017 at 2:29 PM, Richard Lorentz wrote:

Wow! That's very exciting. I'm glad they didn't completely shelve the project 
as they implied they might do after the match with Lee Sedol.

I'm looking forward to seeing some games and "... plays unknown to humans", as 
Hassabis states.

Also, I love this comment from Silver, something I have always promoted: The 
implication is that “algorithms matter much more than either computing or data 
available”.

-Richard




On 10/18/2017 10:50 AM, cazen...@ai.univ-paris8.fr wrote:

https://deepmind.com/blog/

http://www.nature.com/nature/index.html

Impressive!

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 


___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
Possibly you are answering a different question than the one posed? Possibly 
your interpretation is the one actually intended. I don’t know, and maybe you 
could be right about what was being asked.

 

I do know the semantics of brute force, though, which you quoted below.

 

Note that “brute force” != unintelligent. Inevitably, every brute force 
algorithm will incorporate intelligent heuristics. Consider the evolution of 
minimax, for example, via alpha-beta, selective extensions, LMR, etc.

 

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com] 
Sent: Sunday, August 6, 2017 2:52 PM
To: Brian Sheppard <sheppar...@aol.com>
Cc: computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

This is semantics. Yes, in the limit of infinite time, it is brute-force. 
Meanwhile, in the real world, AlphaGo chooses to balance its finite time budget 
between depth & width. The mere fact that the CNN policy network generates a 
score for each coordinate on the board in a given position, does not mean that 
all of those nodes will be expanded in any reasonable scenario.

 

On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard <sheppar...@aol.com 
<mailto:sheppar...@aol.com> > wrote:

I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com 
<mailto:steven.p.cl...@gmail.com> ] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com <mailto:sheppar...@aol.com> >; 
computer-go <computer-go@computer-go.org <mailto:computer-go@computer-go.org> >
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Cai Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang


___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
If AlphaGo actually used hard (e.g. permanent) pruning, then it would not be 
brute force. But it doesn’t operate that way, so it is brute force.

 

BTW, AlphaGo’s papers report benefiting from search and RAVE. That suggests 
that hard pruning is a risky business.

 

From: David Wu [mailto:lightvec...@gmail.com] 
Sent: Sunday, August 6, 2017 2:54 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go@computer-go.org
Cc: Steven Clark <steven.p.cl...@gmail.com>
Subject: Re: [Computer-go] Alphago and solving Go

 

Saying in an unqualified way that AlphaGo is brute force is wrong in the spirit 
of the question. Assuming AlphaGo uses a typical variant of MCTS, it is 
technically correct. The reason it's technically correct but uninteresting is 
because the bias introduced by a policy net is so extreme that it might as well 
be a selective search. 

 

Or put another way, imagine one were to set a threshold on the policy net 
output past a certain point in the tree such that moves below the threshold 
would be hard-pruned, and that threshold were set to a level that would prune, 
say, 70% of the legal moves in an average position. In technical sense, the 
search would no longer be full-width, and therefore it would suddenly become 
"not brute force" according to the definition earlier in the thread. But this 
distinction is not very useful, because moves in the tree that fall below such 
a threshold would receive zero simulations under any reasonable time controls 
anyways, so there would be no practical observable difference in the program's 
search or its play.
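The thought experiment above amounts to something like this (the threshold value and the fallback rule are illustrative):

    def hard_prune(moves, policy, threshold=0.01):
        """moves: legal moves; policy: dict mapping move -> prior probability."""
        kept = [m for m in moves if policy.get(m, 0.0) >= threshold]
        # never prune everything: keep at least the highest-prior move
        return kept if kept else [max(moves, key=lambda m: policy.get(m, 0.0))]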

 

So - spirit of the question - no AlphaGo is not brute force, its search is 
selective to an extreme due to the policy net, the vast majority of 
possibilities will never be in practice given any attention or time whatsoever.

 

Technical answer - yes, AlphaGo is brute force, in that in the limit of having 
enormously vastly many more orders of magnitude of search time than we would 
ever devote to it and unbounded memory, it will theoretically eventually search 
everything (maybe, it would still depend on the actual details of its 
implementation).

 

 

On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com 
<mailto:steven.p.cl...@gmail.com> ] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com <mailto:sheppar...@aol.com> >; 
computer-go <computer-go@computer-go.org <mailto:computer-go@computer-go.org> >
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
There is a definition of “brute force” on Wikipedia. Seems to me that MCTS 
fits. Deep Blue fits.

 

From: Álvaro Begué [mailto:alvaro.be...@gmail.com] 
Sent: Sunday, August 6, 2017 2:56 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

It was already a stretch when people said that Deep Blue was a brute-force 
searcher. If we apply it to AlphaGo as well, the term just means nothing. 
Full-width and brute-force are most definitely not the same thing.

 

Álvaro.

 

 

 

On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com 
<mailto:steven.p.cl...@gmail.com> ] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com <mailto:sheppar...@aol.com> >; 
computer-go <computer-go@computer-go.org <mailto:computer-go@computer-go.org> >
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Cai Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang


___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 


___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Cai Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang


___
Computer-go mailing list
Computer-go@computer-go.org <mailto:Computer-go@computer-go.org> 
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.
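The model behind the 12-errors figure isn't spelled out; as one toy example of this kind of calculation, assume Black wins with perfect play and that every game-losing blunder flips the theoretical result, so Black wins exactly when the total blunder count is even. Under that parity model roughly 1.5 blunders per game already reproduces a 52-53% Black win rate; models in which most errors do not flip the result need many more errors, which is why the estimate depends so strongly on the model:

    import numpy as np

    def black_win_rate(mean_blunders_per_game, games=100_000, seed=1):
        """Monte Carlo estimate under the toy parity model described above."""
        rng = np.random.default_rng(seed)
        blunders = rng.poisson(mean_blunders_per_game, size=games)
        return float(np.mean(blunders % 2 == 0))

    for m in (1.0, 1.5, 2.0, 12.0):
        print(m, round(black_win_rate(m), 3))   # 1.5 blunders -> about 0.52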

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Cai 
Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread Brian Sheppard via Computer-go
>I haven't tried it, but (with the computer chess hat on) these kind of 
>proposals behave pretty badly when you get into situations where your 
>evaluation is off and there are horizon effects.

In computer Go, this issue focuses on cases where the initial move ordering is 
bad. It isn't so much evaluation/horizon.

For example, at some node moves A, B, C, and D are seen as the best moves, but 
they all fail. So perhaps you invest 2000 trials in each, with a 40% win rate 
overall. Now you come to move E, which is best. In fact, E is a refutation, 
with a 70% win rate.

The question is: how long will it take before this node is considered to be 
refuted?

The problem is that we started the node badly. When E is first explored, the 
node has 3200 wins, 4800 losses. Even with a 70% win rate, E needs 4000 trials  
(= 2800 wins, 1200 losses) just to get back to a 50% win rate.

But if you had explored E first, then this node would have been considered 
refuted after maybe a few hundred trials. There is literally a factor of 100 
difference in search efficiency.
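
The arithmetic generalizes: with W prior wins and L prior losses at the node, a refutation with true win rate p needs roughly (L - W) / (2p - 1) further trials before the node's overall record climbs back to 50%. A quick check of the numbers from the example above:

def trials_to_reach_half(prior_wins: int, prior_losses: int, win_rate: float) -> float:
    # Solve (prior_wins + win_rate * t) / (prior_wins + prior_losses + t) = 0.5 for t.
    assert win_rate > 0.5, "a sub-50% move can never pull the node back to 50%"
    return (prior_losses - prior_wins) / (2.0 * win_rate - 1.0)

# A, B, C, D got 2000 trials each at a 40% win rate: 3200 wins, 4800 losses.
print(trials_to_reach_half(3200, 4800, 0.70))   # -> 4000.0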

This situation seems odd, but it arises frequently. For example, E may be a 
winning move now, but becomes a losing move if played later. (For instance, E 
is in a capturing race that the side to move first currently wins.) Unless E is 
tried early in the node's lifecycle, the RAVE score for E is very low.



>> I recall a paper published on this basis. A paper presumably about
>> CrazyStone: Efficient Selectivity and Backup Operators in Monte-Carlo 
>> Tree Search.

>I'm reasonably sure this did not include forgetting/discounting, only shifting 
>between average and maximum by focusing simulations near the maximum.

Yes, but emphasizing the maximum is equivalent to forgetting.

E.g., in my example, giving E a larger weight is equivalent to giving A, B, C, 
and D a lower weight (== forgetting).



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-23 Thread Brian Sheppard via Computer-go
Yes. This is a long-known phenomenon.

 

I was able to get improvements in Pebbles based on the idea of forgetting 
unsuccessful results. It has to be done somewhat carefully, because results 
propagate up the tree. But you can definitely make it work.

 

I recall a paper published on this basis. A paper presumably about CrazyStone: 
Efficient Selectivity and Backup Operators in

Monte-Carlo Tree Search. I see a paper called “Accelerated UCT and Its 
Application to

Two-Player Games”. Could be others.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
David Wu
Sent: Sunday, July 23, 2017 12:25 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Possible idea - decay old simulations?

 

I've been using Leela 0.10.0 for analysis quite often, and I've noticed 
something that might lead to an improvement for the search, and maybe also for 
other MCTS programs.

 

Sometimes, after putting hundreds of thousands of simulations into a few 
possible moves, Leela will appear to favor one, disliking the others for having 
clearly worse reported winrates. But then every once in a while, the winrate 
for one of the other disliked moves will start rising gradually, but very 
consistently.

 

When this happens, if I play down the variation for that move and look at the 
analysis window, I often find that Leela has discovered a new tactic. 
Specifically, I find a node in that subtree where one move has a greatly higher 
winrate than all the others, but does not have too many simulations yet, 
meaning Leela only just now found it. 

(Possibly it already has more simulations than any other single move, but the 
number of simulations of all of the other moves combined still significantly 
outweighs it).

 

Going back to the root, it's clear that if the new tactic has a high enough 
winrate, then the previously disliked move will eventually overtake the favored 
move. But it takes a long time, since the disliked move has a lot of bad 
simulations to outweigh - it's painful to watch the winrate creep up slowly but 
with high upward consistency, until it finally beats out the previously favored 
move.

 

I think there's a chance that the search could be improved by adding a small 
decay over time to the weight of old simulations. This would allow a move to be 
promoted a bit more rapidly with the discovery of a better tactic. You would 
probably want to set the decay over time so that the total weight over time 
still readily approaches infinity (e.g. a fixed exponential decay would 
probably be bad, that would bound the total weight by a constant), but perhaps 
a bit slower than linearly. 

 

Thinking about it from the multi-armed-bandit perspective, I think this also 
makes sense. The distribution of results from each child is nonstationary, 
because the subtree below the child is evolving over time. If they were 
stationary you would weight all historical simulations equally, but since they 
aren't, the more-recent results from a child should get a little bit more 
weight since they give you more information about the current performance of 
the child move.
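
One way to make the "slower than linearly" requirement concrete (purely an illustration of the constraint described above, not a tested scheme): give the k-th most recent simulation weight k^(-alpha) with 0 < alpha < 1. The total weight of n simulations then grows like n^(1-alpha), so it still tends to infinity, whereas a fixed exponential decay gamma^k caps the total weight near 1/(1-gamma).

def total_weight_powerlaw(n: int, alpha: float) -> float:
    # k-th most recent simulation (k = 1 is newest) gets weight k**(-alpha)
    return sum(k ** -alpha for k in range(1, n + 1))

def total_weight_exponential(n: int, gamma: float) -> float:
    # k-th most recent simulation gets weight gamma**(k - 1)
    return sum(gamma ** (k - 1) for k in range(1, n + 1))

for n in (1_000, 10_000, 100_000):
    print(n,
          round(total_weight_powerlaw(n, alpha=0.5), 1),       # keeps growing, ~2*sqrt(n)
          round(total_weight_exponential(n, gamma=0.999), 1))  # saturates near 1000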

 

Has anyone tried this sort of idea before?

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Brian Sheppard via Computer-go
>... my value network was trained to tell me the game is balanced at the 
>beginning...

:-)

The best training policy is to select positions that correct errors.

I used the policies below to train a backgammon NN. Together, they reduced the 
expected loss of the network by 50% (cut the error rate in half):

- Select training positions from the program's own games.
  - Can be self-play or versus an opponent.
  - Best is to have a broad panel of opponents.
  - Beneficial to bootstrap with pro games, but then add ONLY training examples from the program's own games.
- Train only the moves made by the winner of the game.
  - Very important for deterministic games!
  - Note that the winner can be either your program or the opponent.
  - If your program wins then training reinforces good behavior; if the opponent wins then training corrects bad behavior.
- Per game, you should aim to get only a few training examples (3 in backgammon, maybe 10 in Go?). Use two policies:
  - Select positions where the static evaluation of a position is significantly different from a deep search.
  - Select positions where the move selected by a deep search did not have the highest static evaluation. (And in this case you have two training positions, which differ by the move chosen.)
  - Of course, you are selecting examples where you did as badly as possible.
- The training value of the position is the result of a deep search.
  - This is equivalent to "temporal difference learning", but accelerated by the depth of the search.
  - Periodically refresh the training evaluations as your search/eval improve.

These policies actively seek out cases where your evaluation function has some 
weakness, so training is definitely focused on improving results in the 
distribution of positions that your program will actually face.
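
A minimal sketch of the two selection policies, assuming you already have a static_eval, a deep_search that returns (value, best move), and a game record restricted to positions where the eventual winner was to move; the names, the threshold, and the position API are placeholders, not code from any actual program:

def select_training_examples(winner_positions, static_eval, deep_search,
                             eval_gap=0.1, max_examples=10):
    examples = []
    for pos in winner_positions:                    # only the winner's decisions
        deep_value, deep_move = deep_search(pos)    # values are from the mover's perspective
        # Policy 1: static evaluation disagrees significantly with a deep search.
        if abs(static_eval(pos) - deep_value) > eval_gap:
            examples.append((pos, deep_value))
        # Policy 2: the searched best move is not the statically best move.
        static_best = max(pos.legal_moves(), key=lambda m: static_eval(pos.play(m)))
        if static_best != deep_move:
            for move in (static_best, deep_move):   # two examples, differing by the move chosen
                child = pos.play(move)
                examples.append((child, deep_search(child)[0]))
        if len(examples) >= max_examples:
            break
    return examples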

You will need about 30 training examples for every free parameter in your NN. 
You can do the math on how many games that will take. It is inevitable: you 
will train your NN based on blitz games.

Good luck!



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Brian Sheppard via Computer-go
My reaction was "well, if you are using alpha-beta, then at least use LMR 
rather than hard pruning." Your reaction is "don't use alpha-beta", and you 
would know better than anyone!

Yes, LMR in Go is quite different from LMR in chess: Go tactics take many moves 
to play out, whereas chess tactics are often pretty immediate. 
So LMR could hurt Go tactics much more than it hurts chess tactics. Compare the 
benefit of forcing the playout to the end of the game.
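
For reference, a bare-bones sketch of late-move reductions inside a negamax search; the ordering source, reduction amount, and re-search condition are simplified placeholders:

def negamax(pos, depth, alpha, beta, order_moves, evaluate):
    if depth == 0 or pos.game_over():
        return evaluate(pos)                         # score from the side to move's perspective
    best = -float("inf")
    for i, move in enumerate(order_moves(pos)):      # best-first, e.g. by policy-network prior
        child = pos.play(move)
        # Late Move Reduction: late, low-priority moves are searched to reduced depth first.
        reduction = 1 if (i >= 2 and depth >= 3) else 0
        score = -negamax(child, depth - 1 - reduction, -beta, -alpha, order_moves, evaluate)
        if reduction and score > alpha:
            # The reduced search beat alpha, so verify with a full-depth re-search.
            score = -negamax(child, depth - 1, -beta, -alpha, order_moves, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                                    # beta cutoff
    return best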

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Monday, May 22, 2017 4:08 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] mini-max with Policy and Value network

On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote:
> Could use late-move reductions to eliminate the hard pruning. Given 
> the accuracy rate of the policy network, I would guess that even move
> 2 should be reduced.
> 

The question I always ask is: what's the real difference between MCTS with a 
small UCT constant and an alpha-beta search with heavy Late Move Reductions? 
Are the explored trees really so different?

In any case, in my experiments Monte Carlo still gives a strong benefit, even 
with a not so strong Monte Carlo part. IIRC it was the case for AlphaGo too, 
and they used more training data for the value network than is publicly 
available, and Zen reported the same: Monte Carlo is important.

The main problem is the "only top x moves part". Late Move Reductions are very 
nice because there is never a full pruning. This heavy pruning by the policy 
network OTOH seems to be an issue for me. My program has big tactical holes.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-20 Thread Brian Sheppard via Computer-go
Could use late-move reductions to eliminate the hard pruning. Given the 
accuracy rate of the policy network, I would guess that even move 2 should be 
reduced.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hiroshi Yamashita
Sent: Saturday, May 20, 2017 3:42 PM
To: computer-go@computer-go.org
Subject: [Computer-go] mini-max with Policy and Value network

Hi,

The HiraBot author reported a mini-max search with policy and value networks. 
It does not use Monte Carlo.
Only the top 8 moves from the policy network are searched at the root node; at 
other depths, the top 4 moves are searched.

Game result against Policy network best move (without search)

             Win-Loss    Winrate
MaxDepth=1   (558-442)   0.558    +40 Elo
MaxDepth=2   (351-150)   0.701   +148 Elo
MaxDepth=3   (406-116)   0.778   +218 Elo
MaxDepth=4   (670- 78)   0.896   +374 Elo
MaxDepth=5   (490- 57)   0.896   +374 Elo
MaxDepth=6   (520- 20)   0.963   +556 Elo

The search is simple alpha-beta.
There is a modification so that high-probability moves from the policy network 
tend to be selected.
MaxDepth=6 takes one second/move on an i7-4790k + GTX1060.
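
A rough sketch of the search as described (top 8 policy moves at the root, top 4 elsewhere, value network at the leaves); the function names are placeholders, and the "high-probability moves tend to be selected" modification is omitted:

def search(pos, depth, alpha, beta, policy_net, value_net, root=False):
    if depth == 0 or pos.game_over():
        return value_net(pos)                 # value network instead of Monte Carlo playouts
    width = 8 if root else 4                  # hard pruning to the policy network's top moves
    best = -float("inf")
    for move in policy_net.top_moves(pos, width):
        score = -search(pos.play(move), depth - 1, -beta, -alpha, policy_net, value_net)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best

# Top-level call: search(root_pos, 6, -float("inf"), float("inf"), policy_net, value_net, root=True)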

His nega-max code
http://kiyoshifk.dip.jp/kiyoshifk/apk/negamax.zip
CGOS result, MaxDepth=6
http://www.yss-aya.com/cgos/19x19/cross/minimax-depth6.html
His Policy network(without search) is maybe 
http://www.yss-aya.com/cgos/19x19/cross/DCNN-No336-tygem.html
His Policy and Value network(MCTS) is maybe 
http://www.yss-aya.com/cgos/19x19/cross/Hiratuka10_38B100.html

Thanks,
Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Patterns and bad shape

2017-04-18 Thread Brian Sheppard via Computer-go
My comment was addressed to the original question, which mentioned more 
traditional pattern based work, such as Remi’s.

 

Let’s think about how you might build a NN using a large pattern base as inputs.

 

A NN has K features per point on the board, and you don’t want K to be a large 
number. Instead you look for the K most fundamental concepts that are 
non-overlapping and additive in terms of information content.

 

For example: liberty counts are certainly the highest importance, ladders are 
non-local patterns that the NN would not be able to learn, influence of some 
sort, ko,… (AlphaGo did a great job of input design, IMO.)

 

So, how to incorporate knowledge of many patterns, each of which is 
individually rare?

 

You could have just two additional input bits (per point) that represent 
“tactically forced move for side-to-move/opponent”. That is, combine all 
knowledge about “super urgent” moves into two bits per point.

 

Could add value. Can’t hurt? Should add value?
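
A sketch of what those extra inputs might look like as feature planes; the plane layout and the urgent_move_points helper are invented for illustration, not AlphaGo's actual feature set:

import numpy as np

def encode_input_planes(board, tactical_module, size=19):
    planes = np.zeros((4, size, size), dtype=np.float32)
    for (x, y), stone in board.stones():
        planes[0 if stone == board.to_move else 1, x, y] = 1.0   # own / opponent stones
    # Two extra planes: "tactically forced move" for the side to move and for the opponent.
    for x, y in tactical_module.urgent_move_points(board, board.to_move):
        planes[2, x, y] = 1.0
    for x, y in tactical_module.urgent_move_points(board, board.opponent):
        planes[3, x, y] = 1.0
    return planes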

 

But note that current thinking is giving most credit to the value network 
(evaluation function) rather than policy network (move ordering).

 

From: Jim O'Flaherty [mailto:jim.oflaherty...@gmail.com] 
Sent: Tuesday, April 18, 2017 9:06 AM
To: computer-go@computer-go.org; Brian Sheppard <sheppar...@aol.com>
Subject: Re: [Computer-go] Patterns and bad shape

 

Now, I love this idea. A super fast cheap pattern matcher can act as input into 
a neural network input layer in sort of a "pay additional attention here and 
here and...".

 

On Apr 18, 2017 6:31 AM, "Brian Sheppard via Computer-go" 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Adding patterns is very cheap: encode the patterns as an if/else tree, and it 
is O(log n) to match.

 

Pattern matching as such did not show up as a significant component of Pebbles. 
But that is mostly because all of the machinery that makes pattern-matching 
cheap (incremental updating of 3x3 neighborhoods, and related tricks) was a 
large component.

 

But I think that is the basic idea: pre-compute or incrementally compute some 
basic functions of the board position so that pattern matching is cheap. Then 
add as many patterns as possible.

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Jim O'Flaherty
Sent: Monday, April 17, 2017 7:05 AM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
Subject: Re: [Computer-go] Patterns and bad shape

 

It seems chasing down good moves for bad shapes would be an explosion of 
"exception cases", like combinatorially huge. So, while you would be saving 
some branching in the search space, you would be ballooning up the number of 
patterns for which to scan by orders of magnitude.

 

Wouldn't it be preferable to just have the AI continue to make the better move 
emergently and generally from probabilities around win placements as opposed to 
what appears to be a focus on one type of local optima?

 

 

On Mon, Apr 17, 2017 at 5:07 AM, lemonsqueeze <lemonsque...@free.fr 
<mailto:lemonsque...@free.fr> > wrote:

Hi,

I'm sure the topic must have come up before but i can't seem to find it right 
now, i'd appreciate if someone can point me in the right direction.

I'm looking into MM, LFR and similar cpu-based pattern approaches for
generating priors, and was wondering about basic bad shape:

Let's say we use 6d games for training. The system becomes pretty good at 
predicting 6d moves by learning patterns associated with the kind of moves 6d 
players make.

However it seems it doesn't learn to punish basic mistakes effectively (say 
double ataris, shapes with obvious defects ...) because they almost never show 
up in 6d games =) They show up in the search tree though and without good 
answer search might take a long time to realize these moves don't work.

Maybe I missed some later paper / development but basically,
Wouldn't it make sense to also train on good answers to bad moves ?
(maybe harvesting them from the search tree or something like that)

I'm thinking about basic situations like this which patterns should be able to 
recognize:

   A B C D E F G H J K L M N O P Q R S T
 +---+
  19 | . . . . . . . . . . . . . . . . . . . |
  18 | . . . O O . O . . . . . . . . . . . . |
  17 | . . X . X O . X O . . . . . . . . . . |
  16 | . . X . X O . O O . . . . . . X . . . |
  15 | . . . . . X X X X . . . . . . . . . . |
  14 | . . X . . . . . . . . . . . . . . . . |
  13 | . O . . . . . . . . . . . . X . O . . |
  12 | . . O . . . . . . . . . . . . . O . . |
  11 | . . O X . . . . . . . . . . . X . . . |
  10 | . . O X . . . . . . . . . . . X O . . |
   9 | . . O X . . . X . . . X O . . X O . . |
   8 | . . O X . . . O X X X O X)X X O O . . |
   7 | . O X . . . . . O O X O . X O O . . . |
   6 | 

Re: [Computer-go] Patterns and bad shape

2017-04-18 Thread Brian Sheppard via Computer-go
Adding patterns is very cheap: encode the patterns as an if/else tree, and it 
is O(log n) to match.

 

Pattern matching as such did not show up as a significant component of Pebbles. 
But that is mostly because all of the machinery that makes pattern-matching 
cheap (incremental updating of 3x3 neighborhoods, and related tricks) was a 
large component.

 

But I think that is the basic idea: pre-compute or incrementally compute some 
basic functions of the board position so that pattern matching is cheap. Then 
add as many patterns as possible.
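
A sketch of the incremental 3x3 machinery (two bits per neighboring point, with only the eight affected codes touched per move; borders are pre-marked, and captures and liberty counts are left out to keep it short):

EMPTY, BLACK, WHITE, EDGE = 0, 1, 2, 3   # two bits per neighboring point

class Board3x3Index:
    """Maintains, for every point, a 16-bit code of its 8 neighbors' contents."""
    def __init__(self, size=19):
        self.size = size
        self.color = [[EMPTY] * size for _ in range(size)]
        self.index = [[0] * size for _ in range(size)]
        self.offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                        (0, 1), (1, -1), (1, 0), (1, 1)]
        for x in range(size):                # pre-fill edge bits for border points
            for y in range(size):
                for slot, (dx, dy) in enumerate(self.offsets):
                    if not (0 <= x + dx < size and 0 <= y + dy < size):
                        self.index[x][y] |= EDGE << (2 * slot)

    def place(self, x, y, color):
        """Place a stone and update only the 8 affected neighborhood codes."""
        self.color[x][y] = color
        for slot, (dx, dy) in enumerate(self.offsets):
            nx, ny = x + dx, y + dy
            if 0 <= nx < self.size and 0 <= ny < self.size:
                # This point occupies the *opposite* slot in its neighbor's pattern.
                opp = 7 - slot
                self.index[nx][ny] &= ~(3 << (2 * opp))
                self.index[nx][ny] |= color << (2 * opp)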

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Jim 
O'Flaherty
Sent: Monday, April 17, 2017 7:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Patterns and bad shape

 

It seems chasing down good moves for bad shapes would be an explosion of 
"exception cases", like combinatorially huge. So, while you would be saving 
some branching in the search space, you would be ballooning up the number of 
patterns for which to scan by orders of magnitude.

 

Wouldn't it be preferable to just have the AI continue to make the better move 
emergently and generally from probabilities around win placements as opposed to 
what appears to be a focus on one type of local optima?

 

 

On Mon, Apr 17, 2017 at 5:07 AM, lemonsqueeze  > wrote:

Hi,

I'm sure the topic must have come up before but i can't seem to find it right 
now, i'd appreciate if someone can point me in the right direction.

I'm looking into MM, LFR and similar cpu-based pattern approaches for
generating priors, and was wondering about basic bad shape:

Let's say we use 6d games for training. The system becomes pretty good at 
predicting 6d moves by learning patterns associated with the kind of moves 6d 
players make.

However it seems it doesn't learn to punish basic mistakes effectively (say 
double ataris, shapes with obvious defects ...) because they almost never show 
up in 6d games =) They show up in the search tree though and without good 
answer search might take a long time to realize these moves don't work.

Maybe I missed some later paper / development but basically,
Wouldn't it make sense to also train on good answers to bad moves ?
(maybe harvesting them from the search tree or something like that)

I'm thinking about basic situations like this which patterns should be able to 
recognize:

   A B C D E F G H J K L M N O P Q R S T
 +---+
  19 | . . . . . . . . . . . . . . . . . . . |
  18 | . . . O O . O . . . . . . . . . . . . |
  17 | . . X . X O . X O . . . . . . . . . . |
  16 | . . X . X O . O O . . . . . . X . . . |
  15 | . . . . . X X X X . . . . . . . . . . |
  14 | . . X . . . . . . . . . . . . . . . . |
  13 | . O . . . . . . . . . . . . X . O . . |
  12 | . . O . . . . . . . . . . . . . O . . |
  11 | . . O X . . . . . . . . . . . X . . . |
  10 | . . O X . . . . . . . . . . . X O . . |
   9 | . . O X . . . X . . . X O . . X O . . |
   8 | . . O X . . . O X X X O X)X X O O . . |
   7 | . O X . . . . . O O X O . X O O . . . |
   6 | O X X . X X X O . . O . . . X O X . . |
   5 | . O O . . O X . . . . . O . . . . . . |
   4 | . X O O O O X . . . O . . O . O . . . |
   3 | . X X X X O . . X . X X . . . . . . . |
   2 | . . . . X O . . . . . . O . . . . . . |
   1 | . . . . . . . . . . . . . . . . . . . |
 +---+

Patterns probably never see this during training and miss W L9,
For example :

In Remi's CrazyPatterns.exe L9 comes in 4th position:
   [ M10 N10 O6 L9 ...

With Pachi's large patterns it's 8th:
   [ G8  M10 G9  O17 N10 O6  J4  L9  ...

Cheers,
Matt



MM: Computing Elo Ratings of Move Patterns in the Game of Go
   https://www.remi-coulom.fr/Amsterdam2007/

LFR: Move Prediction in Go – Modelling Feature Interactions Using
  Latent Factors
   https://www.ismll.uni-hildesheim.de/pub/pdfs/wistuba_et_al_KI_2013.pdf


___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
OK, so let’s talk theory vs practice.

 

In theory, TD learning approaches asymptotic optimality when used with a full 
state space. That is, if your RL model has one parameter for each state, then 
TD will converge those parameters to the game theoretic values. There are some 
pre-conditions, but this is true with remarkable generality. In particular, 
there is nothing about stochastic vs deterministic in the theory.
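
For concreteness, the tabular update that the convergence theory refers to is just TD(0) with one parameter per state (a generic sketch, not tied to any particular game or program):

from collections import defaultdict

def td0_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    # Nudge V(state) toward the bootstrapped target: reward + gamma * V(next_state).
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])

values = defaultdict(float)    # one entry per state, as the asymptotic result assumes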

 

So yes, the published experiments for Chess were failures. That is, using a 
very shallow search and relatively simple evaluation function does not work for 
chess like it did for Tesauro’s experiments in backgammon. But the conclusion 
that this is because of stochastic/deterministic is incorrect. Consider Go, for 
example, where static move generators can play at 1 dan level, and probably 
quite a bit higher.

 

An inference that is compatible with theory is that to conquer chess you will 
need a deeper, wider network, or better inputs, or a better search engine. Or 
all of those. You could imagine measuring the skill of a variety of 
architectures, and map out the path of steepest ascent.

 

In retrospect, I view Schraudolph's paper as evidence that neural networks have 
always been surprisingly successful at Go. Like Brugmann’s paper about Monte 
Carlo, which was underestimated for a long time. Sigh.

 

Best,

Brian

 

From: Minjae Kim [mailto:xive...@gmail.com] 
Sent: Friday, February 24, 2017 11:56 AM
To: Brian Sheppard <sheppar...@aol.com>; computer-go@computer-go.org
Subject: Re: [Computer-go] dealing with multiple local optima

 

TD-gammon is regarded as a special case from the stochastic characteristics of 
the backgammon game; it smoothens the search space for the value function and 
the value function itself to a great degree compared to those 'static' games 
such as go. Experiments of applying TD learning to chess and go was done after 
the success of TD-gammon, and the result was not good. I also did a small 
experiment a while ago applying TD learning to go using a similar network 
structure to that of TD-gammon. The network gets 'stuck' really fast. You can 
see similar comments to my experience here 
(https://chessprogramming.wikispaces.com/Temporal+Difference+Learning). You can 
make the network more 'actively' trained by doing stochastic move selection 
rather than simply selecting the move with highest value, but this doesn't work 
well enough in my experience, where I applied softmax. Schraudolph experimented 
with TD for go in his 1994 paper, where he applied Gibbs sampling for 
stochastic move selection, although it wasn't a success for building a strong 
go bot.

 

On Fri, Feb 24, 2017 at 11:03 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Neural networks always have a lot of local optima. Simply because they have a 
high degree of internal symmetry. That is, you can “permute” sets of 
coefficients and get the same function.

 

Don’t think of starting with expert training as a way to avoid local optima. It 
is a way to start training with some good examples so that learning can start 
at a higher level. But then you should continue with self-play.

 

Backgammon was trained to expert level based on self-play games that were 
initially random. Google “Tesauro backgammon” and you should be able to find a 
paper.

 

I don’t know NEAT and HyperNEAT; I will look them up. Thank you for the 
reference.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org 
<mailto:computer-go-boun...@computer-go.org> ] On Behalf Of Minjae Kim
Sent: Friday, February 24, 2017 3:39 AM
To: computer-go@computer-go.org <mailto:computer-go@computer-go.org> 
Subject: [Computer-go] dealing with multiple local optima

 

I've recently viewed the paper of AlphaGo, which has done gradient-based 
reinforcement learning to get stronger. The learning was successful enough to 
beat a human master, but in this case, supervised learning with a large 
database of master-level human games preceded the reinforcement learning. 
For a complex enough game as go, one can expect that the search space for the 
policy function would not be smooth at all. So supposedly supervised learning 
was necessary to guide the policy function to a good starting point before 
reinforcement. Without such, applying reinforcement learning directly to a 
random policy can easily make the policy stuck at a bad local optimum. I could 
have a misunderstanding at this point; correct me if so, but to continue on: if 
it is hard to have "the good starting point" such as a trained policy from 
human expert game records, what is a way to devise one. I've had a look on NEAT 
and HyperNEAT, which are evolutionary methods. Do these evolutionary algorithms 
scale well on complex strategic decision processes and not just on simple 
linear decisions such as food gathering and danger avoidance? I

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
Neural networks always have a lot of local optima. Simply because they have a 
high degree of internal symmetry. That is, you can “permute” sets of 
coefficients and get the same function.
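
The symmetry is easy to check directly: permuting the hidden units of a one-hidden-layer network (rows of the first weight matrix and bias, together with the matching columns of the second matrix) leaves the function unchanged, so every permutation of a local optimum is another local optimum. A small numpy check, using an arbitrary made-up network:

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)     # 4 inputs -> 8 hidden units
W2 = rng.normal(size=(1, 8))                             # 8 hidden units -> 1 output

def net(x, W1, b1, W2):
    return W2 @ np.tanh(W1 @ x + b1)

perm = rng.permutation(8)                                # relabel the hidden units
x = rng.normal(size=4)
assert np.allclose(net(x, W1, b1, W2), net(x, W1[perm], b1[perm], W2[:, perm]))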

 

Don’t think of starting with expert training as a way to avoid local optima. It 
is a way to start training with some good examples so that learning can start 
at a higher level. But then you should continue with self-play.

 

Backgammon was trained to expert level based on self-play games that were 
initially random. Google “Tesauro backgammon” and you should be able to find a 
paper.

 

I don’t know NEAT and HyperNEAT; I will look them up. Thank you for the 
reference.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Minjae Kim
Sent: Friday, February 24, 2017 3:39 AM
To: computer-go@computer-go.org
Subject: [Computer-go] dealing with multiple local optima

 

I've recently viewed the paper of AlphaGo, which has done gradient-based 
reinforcement learning to get stronger. The learning was successful enough to 
beat a human master, but in this case, supervised learning with a large 
database of master-level human games preceded the reinforcement learning. 
For a complex enough game as go, one can expect that the search space for the 
policy function would not be smooth at all. So supposedly supervised learning 
was necessary to guide the policy function to a good starting point before 
reinforcement. Without such, applying reinforcement learning directly to a 
random policy can easily make the policy stuck at a bad local optimum. I could 
have a misunderstanding at this point; correct me if so, but to continue on: if 
it is hard to have "the good starting point" such as a trained policy from 
human expert game records, what is a way to devise one. I've had a look on NEAT 
and HyperNEAT, which are evolutionary methods. Do these evolutionary algorithms 
scale well on complex strategic decision processes and not just on simple 
linear decisions such as food gathering and danger avoidance? In case not, what 
alternatives are known? Is there any success case of a chess, go, or any kind 
of complex strategic game playing algorithm, where it gained expert strength 
without domain knowledge such as expert game examples?

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Playout policy optimization

2017-02-12 Thread Brian Sheppard via Computer-go
If your database is composed of self-play games, then the likelihood 
maximization policy should gain strength rapidly, and there should be a way to 
have asymptotic optimality. (That is, the patterns alone will play a perfect 
game in the limit.)

 

Specifically: play self-play games using an asymptotically strong search 
policy, such as UCT, and then add all of the moves played by winning players to 
the training set. (Never train to the moves played by losers if your goal is 
asymptotic optimality.)

 

The intuition about why this results in strong play: at any moment in self-play 
training, the search engine is always working on situations that it believes 
are most equal for both sides. But then it turns out that one side is better 
and wins, so that data is added to the training set. Reinforcement learning 
then biases the pattern base to make the winning pathway more likely in future 
searches. The winning alternatives will be easier to see in future self-play 
games. Inevitably the losing player will prefer a different path, and the 
learning process continues.

 

Regarding asymptotic optimality: obviously you need a pattern set that could 
potentially be asymptotically optimal. For example, a general decision tree, 
rather than any fixed field of view. If you have such a thing, then the proof 
for asymptotic optimality goes along these lines: first show that the tree 
search and pattern base combination will eventually memorize the winning moves 
at terminal nodes, and then apply the same reasoning recursively. 

 

Regarding trying this: I was just starting to code this when AlphaGo debuted, 
and then I stopped working on Go. So: no, I have not tried this.

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Álvaro Begué
Sent: Saturday, February 11, 2017 11:44 PM
To: computer-go 
Subject: [Computer-go] Playout policy optimization

 

Hi,

 

I remember an old paper by Rémi Coulom ("Computing Elo Ratings of Move Patterns 
in the Game of Go") where he computed "gammas" (exponentials of scores that you 
could feed to a softmax) for different move features, which he fit to best 
explain the move probabilities from real games.

 

Similarly, AlphaGo's paper describes how their rollout policy's weights are 
trained to maximize log likelihood of moves from a database.

 

However, there is no a priori reason why imitating the probabilities observed 
in reference games should be optimal for this particular purpose.

 

I thought about this for about an hour this morning, and this is what I came up 
with. You could make a database of positions with a label indicating the result 
(perhaps from real games, perhaps similarly to how AlphaGo trained their value 
network). Loop over the positions, run a few playouts and tweak the move 
probabilities by some sort of reinforcement learning, where you promote the 
move choices from playouts whose outcome matches the label, and you discourage 
the move choices from playouts whose outcome does not match the label.

 

The point is that we would be pushing our playout policy to produce good 
estimates of the result of the game, which in the end is what playout policies 
are for.
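
A sketch of the proposed update, treating the playout policy as a softmax over move features; playout, features, and the +1/-1 reinforcement are the simplest placeholder choices, not a tested recipe:

import numpy as np

def reinforce_playout_policy(weights, labeled_positions, playout, features,
                             learning_rate=0.01, playouts_per_position=4):
    """REINFORCE-style tweak of a softmax playout policy toward matching the labels."""
    for position, label in labeled_positions:             # label: +1 if Black wins, else -1
        for _ in range(playouts_per_position):
            outcome, decisions = playout(position, weights)   # list of (state, chosen_move)
            sign = 1.0 if outcome == label else -1.0       # promote matches, demote mismatches
            for state, chosen in decisions:
                moves = state.legal_moves()
                feats = np.stack([features(state, m) for m in moves])
                probs = np.exp(feats @ weights)
                probs /= probs.sum()
                grad = feats[moves.index(chosen)] - probs @ feats   # d/dw log p(chosen)
                weights = weights + learning_rate * sign * grad
    return weights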

 

Any thoughts? Did anyone actually try something like this?

 

Cheers,

Álvaro.

 

 

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-31 Thread Brian Sheppard via Computer-go

Rollout policy is a black art. Lots of trial and error. That said, you are on 
the right path if you use the published papers as the starting point for your 
own experiments.

Two board representation details are very important. If your board 
representation has these capabilities then you can represent an astounding 
amount of Go theory. In Pebbles, the following properties are incrementally 
updated as part of the make-move operation:

- the number of liberties for any string
- the index of the 3x3 neighborhood of every point.

Best,
Brian
 
-Original Message-
From: Roel van Engelen <ich.bun...@gmail.com>
To: computer-go <computer-go@computer-go.org>
Sent: Tue, Jan 31, 2017 12:55 pm
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?



@Gian-Carlo Pascutto, thanks! but identifying the shapes is not the problem=)

@Brian Sheppard, sorry it was not my goal to make you start guessing any 
implementation details; somehow from your first explanation i got the 
impression that you were familiar with it and i was wondering whether anybody 
wrote something about it.
right now i am sticking to the paper as much as possible and reading, trying to 
understand how others improve their rollouts. I hope that i will be able to 
improve the rollouts at some point.


Roel



On 31 January 2017 at 17:21, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

If a "diamond" pattern is centered on a 5x5 square, then you have 13 points. 
The diagram below will give the idea.
 
__+__
_+++_
+++++
_+++_
__+__
 
 
At one bit per cell, this would be 8192 patterns, so this is why I am guessing 
that this is the pattern set. (You would set one bit for each captured stone, 
then look up in a table.)
 
I feel like I am engaging in a lot of guesswork regarding implementation 
details. I want to emphasize that the implementation details are not 
particularly important. The important point is that you can add this capability 
("reply on the vital point after the capture of a nakade group, provided that 
the opponent's surrounding stones have no additional eyes") to your rollout, 
and the implementation should take less than 1% of total time. Any 
implementation that achieves that goal will make a noticeable difference to 
strength.
 


 
-Original Message-
From: Roel van Engelen <ich.bun...@gmail.com>
To: computer-go <computer-go@computer-go.org>
Sent: Tue, Jan 31, 2017 10:42 am
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?



@Brian Sheppard
Thanks that is a really useful explanation! 
the way you state: "and therefore a 8192-sized pattern set will identify all 
potential nakade." seems to indicate this is a known pattern set? could i find 
some more information on it somewhere? also i was unable to find Pebbles, is it 
open source?

@Robert Jasiek
what definitions/papers/publications are you referring to?



m.v.g. Roel



On 24 January 2017 at 12:57, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

There are two issues: one is the shape and the other is the policy that the 
search should follow.

Of course the vital point is a killing move whether or not a group was just 
captured. So it is possible to detect such shapes on the board and then play 
the vital point.

It is an entirely different thing to say when a rollout should look for such 
features. Rollouts are complicated; playing the "best" play does not always 
make your search engine stronger. Of course, there is a question of the time 
required for analysis. And then there is the question of "balance".

"Balance" means that the rollout should play "equally well" for both sides, 
with the goal that the terminal nodes of the rollout are accurate evaluations 
of the leafs of the tree. If you incorporate all moves that punish tactical 
errors then sometimes you can get unbalanced results because you do not have 
rules that prevent tactical errors from happening.

A common rule for nakade is to only check after a group is captured. The point 
is that the vital point is otherwise not motivated by any heuristics, whereas 
most other moves in capturing races are suggested by local patterns. My 
understanding of Alpha Go's policy is that they were only checking for nakade 
after captures.

The "center of a group of three" rule is a separate issue. My recollection is 
that this pattern should be checked after every move, and that was a discovery 
by the Mogo team.

Note that there are often subtle differences for your program compared to the 
published papers.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Tuesday, January 24, 2017 3:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?


On 23-01-17 20:10, Brian Sheppard via Computer

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-31 Thread Brian Sheppard via Computer-go

If a "diamond" pattern is centered on a 5x5 square, then you have 13 points. 
The diagram below will give the idea.

__+__
_+++_
+++++
_+++_
__+__


At one bit per cell, this would be 8192 patterns, so this is why I am guessing 
that this is the pattern set. (You would set one bit for each captured stone, 
then look up in a table.)
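
A sketch of how such a 13-bit key could be computed for the stones that were just captured; the diamond layout matches the diagram above, and treating the key as an index into an 8192-entry table is the guess being described here, not a confirmed detail:

# Offsets of the 13 points of the diamond, relative to its center.
DIAMOND = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3) if abs(dx) + abs(dy) <= 2]
assert len(DIAMOND) == 13

def nakade_index(captured_points, center):
    """13-bit key: bit i is set if diamond point i held a just-captured stone."""
    cx, cy = center
    captured = set(captured_points)
    key = 0
    for i, (dx, dy) in enumerate(DIAMOND):
        if (cx + dx, cy + dy) in captured:
            key |= 1 << i
    return key          # 0 .. 8191, an index into an 8192-entry pattern table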

I feel like I am engaging in a lot of guesswork regarding implementation 
details. I want to emphasize that the implementation details are not 
particularly important. The important point is that you can add this capability 
("reply on the vital point after the capture of a nakade group, provided that 
the opponent's surrounding stones have no additional eyes") to your rollout, 
and the implementation should take less than 1% of total time. Any 
implementation that achieves that goal will make a noticeable difference to 
strength.


 
-Original Message-
From: Roel van Engelen <ich.bun...@gmail.com>
To: computer-go <computer-go@computer-go.org>
Sent: Tue, Jan 31, 2017 10:42 am
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?



@Brian Sheppard
Thanks that is a really useful explanation! 
the way you state: "and therefore a 8192-sized pattern set will identify all 
potential nakade." seems to indicate this is a known pattern set? could i find 
some more information on it somewhere? also i was unable to find Pebbles, is it 
open source?

@Robert Jasiek
what definitions/papers/publications are you referring to?



m.v.g. Roel



On 24 January 2017 at 12:57, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

There are two issues: one is the shape and the other is the policy that the 
search should follow.

Of course the vital point is a killing move whether or not a group was just 
captured. So it is possible to detect such shapes on the board and then play 
the vital point.

It is an entirely different thing to say when a rollout should look for such 
features. Rollouts are complicated; playing the "best" play does not always 
make your search engine stronger. Of course, there is a question of the time 
required for analysis. And then there is the question of "balance".

"Balance" means that the rollout should play "equally well" for both sides, 
with the goal that the terminal nodes of the rollout are accurate evaluations 
of the leafs of the tree. If you incorporate all moves that punish tactical 
errors then sometimes you can get unbalanced results because you do not have 
rules that prevent tactical errors from happening.

A common rule for nakade is to only check after a group is captured. The point 
is that the vital point is otherwise not motivated by any heuristics, whereas 
most other moves in capturing races are suggested by local patterns. My 
understanding of Alpha Go's policy is that they were only checking for nakade 
after captures.

The "center of a group of three" rule is a separate issue. My recollection is 
that this pattern should be checked after every move, and that was a discovery 
by the Mogo team.

Note that there are often subtle differences for your program compared to the 
published papers.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Tuesday, January 24, 2017 3:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?


On 23-01-17 20:10, Brian Sheppard via Computer-go wrote:
> only captures of up to 9 stones can be nakade.

I don't really understand this.

http://senseis.xmp.net/?StraightThree

Both constructing this shape and playing the vital point are not captures. How 
can you detect the nakade (and play at a in time) if you only check captures?

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go




___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-24 Thread Brian Sheppard via Computer-go
There are two issues: one is the shape and the other is the policy that the 
search should follow.

Of course the vital point is a killing move whether or not a group was just 
captured. So it is possible to detect such shapes on the board and then play 
the vital point.

It is an entirely different thing to say when a rollout should look for such 
features. Rollouts are complicated; playing the "best" play does not always 
make your search engine stronger. Of course, there is a question of the time 
required for analysis. And then there is the question of "balance".

"Balance" means that the rollout should play "equally well" for both sides, 
with the goal that the terminal nodes of the rollout are accurate evaluations 
of the leafs of the tree. If you incorporate all moves that punish tactical 
errors then sometimes you can get unbalanced results because you do not have 
rules that prevent tactical errors from happening.

A common rule for nakade is to only check after a group is captured. The point 
is that the vital point is otherwise not motivated by any heuristics, whereas 
most other moves in capturing races are suggested by local patterns. My 
understanding of Alpha Go's policy is that they were only checking for nakade 
after captures.

The "center of a group of three" rule is a separate issue. My recollection is 
that this pattern should be checked after every move, and that was a discovery 
by the Mogo team.

Note that there are often subtle differences for your program compared to the 
published papers.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Tuesday, January 24, 2017 3:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?

On 23-01-17 20:10, Brian Sheppard via Computer-go wrote:
> only captures of up to 9 stones can be nakade.

I don't really understand this.

http://senseis.xmp.net/?StraightThree

Both constructing this shape and playing the vital point are not captures. How 
can you detect the nakade (and play at a in time) if you only check captures?

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-23 Thread Brian Sheppard via Computer-go

A capturing move has a potential nakade if the string that was removed is among 
a limited set of possibilities. Probably Alpha Go has a 13-point bounding 
region (e.g., the 13-point star) that it uses as a positional index, and 
therefore a 8192-sized pattern set will identify all potential nakade.

It is also easy and inexpensive to identify nakade by if/then/else rules. Very 
few moves in a Go game are captures, and only captures of up to 9 stones can be 
nakade. The captured stones must also fit in a 3x3 bounding box, so code can 
rapidly eliminate non-nakade situations.
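
The cheap filter might look like this (a sketch only, using the thresholds quoted above):

def could_be_nakade(captured_points):
    """Cheap pre-filter: at most 9 stones, all fitting inside a 3x3 bounding box."""
    if not captured_points or len(captured_points) > 9:
        return False
    xs = [x for x, _ in captured_points]
    ys = [y for _, y in captured_points]
    return (max(xs) - min(xs) <= 2) and (max(ys) - min(ys) <= 2)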

The point is that using much less than 1% of your CPU time, you can identify 
potential nakade moves. And since you reach this stage very rarely, you can 
invest a lot of time trying to do precise analysis. In Pebbles, I concluded 
that it was worthwhile to also test that the surrounding strings likely did not 
have another potential eye. That is a pretty expensive calculation, but the 
code path was executed only when it was critical to success.

Nakade are enormously valuable because they are cheap to calculate and a 
playout would otherwise have close to 0% chance of following the right path.
 
-Original Message-
From: terry mcintyre via Computer-go 
To: computer-go 
Sent: Mon, Jan 23, 2017 1:14 pm
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?


I speculate: nakade involves creating a shape (such as three in a row or a 
bulky five) such that, if captured, it would only form one eye, given the 
proper placement. I can imagine a set of patterns which enumerate the 
possibilities. Some examples exist, however, which are quite complex. 


Sent from Yahoo Mail for iPad


On Monday, January 23, 2017, 11:45 AM, Roel van Engelen  
wrote:

I am trying to re-create the fast rollout policy as described by deepMind but 
got stuck on the nakade patterns:
"Nakade, # of patterns 8192: Move matches a nakade pattern at captured stone"

the "at captured stone" confuses me, my first thought is: "this is only 
computed if stones have been captured recently" but 
i don't think that is correct. how should i read it?


since they say "# of patterns 8192" i imagine they found some way to hash them 
just like the 3x3 and 12point diamond shapes but so far
i have not found a way to do so. I found that other engines use heuristics to 
find nakade patterns so my question is does AlphaGo use patterns and does 
somebody know how this works?


Thanks!
Roel

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go