Re: [Computer-go] MuZero - new paper from DeepMind

2019-11-25 Thread Brian Sheppard via Computer-go
I read through that paper, but I admit that I didn't really get where the extra 
power comes from.


-Original Message-
From: valkyria 
To: Computer-go 
Sent: Mon, Nov 25, 2019 6:41 am
Subject: [Computer-go] MuZero - new paper from DeepMind

Hi,

if anyone still gets email from this list:

DeepMind did it again

https://arxiv.org/abs/1911.08265

Best wishes
Magnus Persson




___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Indexing and Searching Go Positions -- Literature Wanted

2019-09-17 Thread Brian Sheppard via Computer-go
I remember a scheme (from Dave Dyer, IIRC) that indexed positions based on the 
points on which the 20th, 40th, 60th,... moves were made. IIRC it was nearly a 
unique key for pro positions.
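For concreteness, a minimal sketch of such a key (my own illustration in
Python; the move-list format and function names are assumptions, not Dyer's
actual scheme):

# Sketch of a Dyer-style signature: key a game by the points on which
# the 20th, 40th, and 60th moves were played. Names are illustrative.

def dyer_signature(moves, indices=(20, 40, 60)):
    # moves: list of (col, row) points in play order; moves[i-1] is move i
    if len(moves) < max(indices):
        return None
    return tuple(moves[i - 1] for i in indices)

def build_index(games):
    # games: dict of game_id -> move list; returns signature -> [game_ids]
    index = {}
    for game_id, moves in games.items():
        index.setdefault(dyer_signature(moves), []).append(game_id)
    return index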

Best,
Brian

-Original Message-
From: Erik van der Werf 
To: computer-go 
Sent: Tue, Sep 17, 2019 5:55 am
Subject: Re: [Computer-go] Indexing and Searching Go Positions -- Literature 
Wanted

Hi Stephen,
I'm not aware of recent published work. There is an ancient document by Antti 
Huima on hash schemes for easy symmetry detection/lookup. Unfortunately his 
implementation was broken, but other schemes have been proposed that solve the 
issue (I found one myself, but I think many others found the same or similar 
solutions). You may want to search the archives for "Zobrist hashing with easy 
transformation comparison". If you like math, Nic Schraudolph has an
interesting solution ;-)
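For reference, one scheme that solves the symmetry problem is to keep eight
Zobrist keys, one per board transformation, and use their minimum as the
canonical key. A minimal sketch in Python (my own illustration of the general
idea, covering only the eight spatial symmetries, not necessarily Erik's or
Schraudolph's exact method):

import random

N = 9  # board size, illustrative

# The eight symmetries of the square, as point transforms.
TRANSFORMS = [
    lambda x, y: (x, y),
    lambda x, y: (y, x),
    lambda x, y: (N - 1 - x, y),
    lambda x, y: (y, N - 1 - x),
    lambda x, y: (x, N - 1 - y),
    lambda x, y: (N - 1 - y, x),
    lambda x, y: (N - 1 - x, N - 1 - y),
    lambda x, y: (N - 1 - y, N - 1 - x),
]

rng = random.Random(1)
ZOBRIST = {(c, x, y): rng.getrandbits(64)
           for c in (0, 1) for x in range(N) for y in range(N)}

class SymmetricHash:
    # One Zobrist key per symmetry. Transforming a position only permutes
    # the eight keys, so min() yields a transformation-invariant lookup key.
    def __init__(self):
        self.keys = [0] * 8

    def toggle(self, color, x, y):  # xor-update: add or remove a stone
        for i, t in enumerate(TRANSFORMS):
            self.keys[i] ^= ZOBRIST[(color,) + t(x, y)]

    def canonical(self):
        return min(self.keys)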
In Steenvreter I implemented a 16-segment scheme with a xor update (for 
rotation, mirroring and color symmetry). In GridMaster I have an experimental 
search feature which is somewhat similar except that I don't use hash keys 
(every possible point on the board simply gets its own bits), and I use 'or' 
instead of 'xor' (so stones that are added are never removed, which makes 
parsing game records extremely fast). This makes it very easy to filter 
positions/games that cannot match, and for the remainder (if needed, dealing 
with captures) it simply replays (which is fast enough because the number of 
remaining games is usually very small). I'm not sure what Kombilo does, but I 
wouldn't be surprised if it's similar. The only thing I haven't implemented yet 
is lookup of translated (shifted) local patterns. Still pondering what's most 
efficient for that, but I could simply run multiple searches with a mask.
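A toy version of the 'or' filter described above (my own sketch of the idea as
I read it; GridMaster's real layout surely differs):

N = 19

def point_bit(color, x, y):
    # one bit per (color, point); color is 0 or 1
    return 1 << (color * N * N + y * N + x)

def game_mask(moves):
    # moves: list of (color, x, y); 'or' accumulation, so stones that are
    # added are never removed and captures are deliberately ignored
    m = 0
    for color, x, y in moves:
        m |= point_bit(color, x, y)
    return m

def may_match(gmask, pattern_stones):
    # Superset test: False means the game certainly never matched the
    # pattern; True means a candidate that still needs replaying to confirm.
    p = game_mask(pattern_stones)
    return gmask & p == p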
Best,
Erik

On Tue, Sep 17, 2019 at 10:17 AM Stephen Martindale 
 wrote:
>
> Dear Go programmers,
>
> I'm interested in experimenting with some new ideas for indexing and 
> searching Goban positions and patterns and I want to stand on the shoulders 
> of giants. Which papers, articles, blog posts or open-source code should I 
> read to get concrete knowledge of the approaches used in the past?
>
> I know that Kombilo is (or used to be) the state of the art in this field. 
> The source is available but, beyond reading the Libkombilo sources, are there 
> any other, more human friendly resources out there?
>
> My new ideas are currently insubstantial and vague but I have done some work, 
> in the past, with natural language embeddings and large-database image 
> indexing and searching and concepts from those two domains keep bouncing 
> around in my mind -- I can't help but feel that there must be something there 
> that can be the "next big thing" in Go position indexing.
>
> Any leads would be appreciated.
>
> Stephen Martindale
>
> +49 160 950 27545
> stephen.c.martind...@gmail.com
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Accelerating Self-Play Learning in Go

2019-03-08 Thread Brian Sheppard via Computer-go
>> contrary to intuition built up from earlier-generation MCTS programs in Go, 
>> putting significant weight on score maximization rather than only 
>> win/loss seems to help.

This narrative glosses over important nuances.

Collectively we are trying to find the golden mean of cost efficiency... The 
"Zero" community has proven that you can succeed without knowing anything about 
Go. The rest of us are trying to discover benefits of *some* prior knowledge, 
perhaps because we don't have cloud datacenters with TPUs. 

A lot of the KataGo paper shows that techniques that are useful to MCTS/rollout
programs are also useful for NN programs.

Brief historical summary...

MCTS/rollout programs that maximized win/loss outperformed programs that 
maximized point differential, because occasionally winning by a lot does not 
compensate for a lot of small losses. Rollouts are stochastic, so every 
position has opportunities to win/lose by a lot.

This result has been widely quoted along the lines of, "Go programs should only 
use wins and losses and not use point differential." This has long been known 
to be overstated.

But using only win/loss has a core problem: a pure win/loss program is
content to bleed points, which sometimes results in unnecessary losses. 
Fundamentally, it should be easier to win games where the theoretical point 
differential is larger, so losing points contributes to difficulties in 
distinguishing wins from losses using rollouts.

There were two responses to this: dynamic komi and point diff as tiebreaker. 
Dynamic komi adjusts komi when the rollout winning percentage falls outside of 
a range like [40%,60%]. Point-diff-as-tiebreaker reserves 90% of the result for 
the win/loss value of a rollout and 10% for a sigmoid function of the final 
point differential.
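In code, the blend looks something like this (my own illustration; the 90/10
split is as stated above, the sigmoid scale is an arbitrary choice):

import math

def rollout_value(black_won, point_diff, scale=10.0):
    # 90% win/loss, 10% sigmoid of the final point differential.
    # 'scale' controls how quickly the sigmoid saturates; illustrative only.
    win = 1.0 if black_won else 0.0
    tiebreak = 1.0 / (1.0 + math.exp(-point_diff / scale))
    return 0.9 * win + 0.1 * tiebreak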

IIRC, Pachi invented point-diff-as-tiebreaker. The technique worked in my 
program as well, and it should work in a lot of MCTS/rollout programs.

KataGo is using point-diff-as-tiebreaker. That is, the contribution is to adapt
the existing idea to the NN framework.

Did the KataGo paper mention dynamic komi? KataGo can use that too, because its
komi is a settable input parameter.


>Score maximization in self-play means it is encouraged to play more 
>aggressively/dangerously, by creating life/death problems on the board.

Point-diff-as-tiebreaker is *risk-averse*. The purpose is to keep all of the 
points, not to engage in risks to earn more. There is a graph in the KataGo
paper that helps to visualize how 0.9 * winning + 0.1 * sigmoid(point-diff)
trades off gains against losses.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
Thanks for the explanation. I agree that there is no actual consistency in 
exploration terms across historical papers.

I confirmed that the PUCT formulas across the AG, AGZ, and AZ papers are all 
consistent. That is unlikely to be an error. So now I am wondering whether the 
faster decay is useful for bootstrapping the learner. A short search (800 
trials) can run out of trials if it sticks with its first thought for too long.

So... following GCP's advice, I will adopt the AGZ formula as published. And 
not worry much about it, as MM advises.  :-)

Many thanks to all.

-Original Message-
From: Computer-go  On Behalf Of Martin 
Mueller
Sent: Friday, March 9, 2018 5:13 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] PUCT formula

I talked to Chenjun just now so this is what we both remember.
The PUCB formula as published in Chris Rosin’s paper actually has an additive 
knowledge term, and it looks nothing like the two different PUCT variants tried 
in AlphaGo and our paper.

Chenjun tried an additive term as in Chris’ paper first, and it did not work 
well for him. Then he tried the “UCT-like” PUCT as in our paper, with the decay 
term under the square root. This worked well for him. He never tried the 
AlphaGo formula with the faster decay.

Beyond the published papers, my memory is that many people tried many different 
combinations of knowledge terms and of decay formulas over time. I have never 
read any systematic comparison, or any principled argument for which type of 
decay is “best” for which type of knowledge, and why. It must depend on the 
quality of knowledge, amongst many other things.

There are also earlier MoGo papers that combined many different evaluation 
terms into one formula and tested them empirically.

Martin

> However, the formula in the AGZ paper doesn't look like any "UCT variant".
> Formula from paper: Cpuct * P(s,a) * sqrt(Sum(N(s,b))) / (1 + N(s,a))
> Note that there is no logarithmic term, and the division by N+1 falls outside
> the sqrt. For comparison, a normal UCT term looks like
> sqrt(ln(sum(N(s,b))) / (1 + N))
> 
> Since I asked my question, I found that other people have also noticed a 
> discrepancy. I saw a post on a DeepChem board about this subject. I also 
> found a paper 
> (https://webdocs.cs.ualberta.ca/~mmueller/ps/2016/2016-Integrating-Factorization-Ranked.pdf)
>  by our old friends Chenjun Xiao and Martin Muller:
> 
> "We apply a variant of PUCT [11] formula which is used in AlphaGo [12] to 
> integrate FBT knowledge in MCTS. " But then the formula that they give 
> differs: argmax(Q(s,a) + Cpuct * P(s,a) * sqrt(lg(N(s)) / (1 + N(s,a))))
> 
> I am guessing that Chenjun and Martin decided (or knew) that the AGZ paper 
> was incorrect and modified the equation accordingly.
> 
> Anyone remember anything about this?
> 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
However, the formula in the AGZ paper doesn't look like any "UCT variant".
Formula from paper: Cpuct * P(s,a) * sqrt(Sum(N(s,b))) / (1 + N(s,a))
Note that there is no logarithmic term, and the division by N+1 falls outside
the sqrt. For comparison, a normal UCT term looks like
sqrt(ln(sum(N(s,b))) / (1 + N))
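Side by side, the two terms as written above (my own transcription into code,
with illustrative constants):

import math

def agz_exploration(prior, parent_n, child_n, c_puct=1.0):
    # AGZ paper: Cpuct * P(s,a) * sqrt(Sum(N(s,b))) / (1 + N(s,a))
    return c_puct * prior * math.sqrt(parent_n) / (1 + child_n)

def uct_exploration(parent_n, child_n, c=1.0):
    # classic UCT shape: sqrt(ln(Sum(N(s,b))) / (1 + N(s,a)))
    return c * math.sqrt(math.log(parent_n) / (1 + child_n))

# The AGZ term decays like 1/N and stays proportional to the prior forever,
# while the UCT term decays like 1/sqrt(N):
for n in (1, 10, 100, 1000):
    print(n, agz_exploration(0.3, 1000, n), uct_exploration(1000, n))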

Since I asked my question, I found that other people have also noticed a 
discrepancy. I saw a post on a DeepChem board about this subject. I also found 
a paper 
(https://webdocs.cs.ualberta.ca/~mmueller/ps/2016/2016-Integrating-Factorization-Ranked.pdf)
 by our old friends Chenjun Xiao and Martin Muller: 

"We apply a variant of PUCT [11] formula which is used in AlphaGo [12] to 
integrate FBT knowledge in MCTS. " But then the formula that they give 
differs: argmax(Q(s,a) + Cpuct * P(s,a) * sqrt(lg(N(s)) / (1 + N(s,a))))

I am guessing that Chenjun and Martin decided (or knew) that the AGZ paper was 
incorrect and modified the equation accordingly.

Anyone remember anything about this?

-Original Message-
From: Computer-go <computer-go-boun...@computer-go.org> On Behalf Of Gian-Carlo 
Pascutto
Sent: Friday, March 9, 2018 4:48 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] PUCT formula

On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall 
> the result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but after 
tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the exploration 
term by the policy prior forever. In the original formula, it is a diminishing 
term.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] PUCT formula

2018-03-08 Thread Brian Sheppard via Computer-go
In the AGZ paper, there is a formula for what they call “a variant of the PUCT 
algorithm”, and they cite a paper from Christopher Rosin: 
http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf

 

But that paper has a formula that he calls the PUCB formula, which incorporates 
the priors in a different way.

 

And there is something called the PUCT algorithm, from our old friend Olivier 
Teytaud (et al): https://hal.inria.fr/hal-00835352/document/, but it is not 
about incorporating prior probabilities. It is about progressive widening in a 
provably consistent way.

 

I recall that someone investigated this question, but I don’t recall the 
result. What is the formula that AGZ actually uses?

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] On proper naming

2018-03-08 Thread Brian Sheppard via Computer-go
The technique originated with backgammon players in the late 1970's, who would 
roll out positions manually. Ron Tiekert (Scrabble champion) also applied the 
technique to Scrabble, and I took that idea for Maven. It seemed like people 
were using the terms interchangeably.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Thursday, March 8, 2018 6:16 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] On proper naming

> but then it does not make sense to call that algorithm "rollout".
> 
> In general: when introducing a new name, care should be taken that the 
> name describes properly what is going on.

Speaking of which, why did people start calling them rollouts instead of 
playouts?

Darren

P.S. And don't get me started on "chains": at one point this seemed to be the 
standard term for a solidly connected set of stones, the basic unit of tactical 
search (as distinguished from a "group", which is made up of 1+ chains). But 
then somewhere along the way people started calling them strings.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 9x9 is last frontier?

2018-03-07 Thread Brian Sheppard via Computer-go
It will spend, for instance, 95% of its time searching an asymmetric tree of
depth > 7 when a shallow trap of depth-7 exists, thus missing the level-7 trap.

This is very hard to solve even if you have unlimited power.

 

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in my
opinion, and I have a hard time seeing how it can be competitive with Stockfish
tactically.

 

My MCTS chess engine with AlphaZero-like MCTS was missing a lot of tactics. I
don't use policy or eval networks but qsearch() for eval, and the policy is
basically choosing whichever move leads to a higher eval.

 

a) My first improvement to the MCTS is to use minimax backups instead of
averaging. This was an improvement but not something that would solve the traps.
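For what it's worth, a minimal sketch of the two backup rules being contrasted
(generic Python, not the engine's actual code):

class Node:
    def __init__(self):
        self.children = []  # child Nodes
        self.visits = 0
        self.total = 0.0    # sum of simulation values, for averaging
        self.value = 0.0    # current backed-up value

def backup_average(path, leaf_value):
    # plain MCTS: every node on the root-to-leaf path averages outcomes
    for node in path:
        node.visits += 1
        node.total += leaf_value
        node.value = node.total / node.visits

def backup_minimax(path):
    # minimax backup: assumes the leaf's value is already set; each interior
    # node takes the best child value, negated for the side to move
    for node in reversed(path[:-1]):
        if node.children:
            node.value = max(-c.value for c in node.children)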

 

b) My second improvement is to use alpha-beta rollouts. This is a rollouts
version that can do nullmove and LMR etc... This is a huge improvement and none
of the MCTS versions can match it. More on alpha-beta rollouts here:
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf

 

So AlphaZero used none of the above improvements and yet it seems to be
tactically strong. Leela Zero suffered from tactical failures left and right
too, as I expected.

 

So the only explanation left is that the policy network is able to avoid traps,
which I find hard to believe, since it can hardly identify more than
qsearch-level tactics.

 

All I am saying is that my experience (as well as that of many others) with
MCTS for tactics-dominated games is bad, and there must be some breakthrough in
that regard in AlphaZero for it to be able to compete with Stockfish on a
tactical level.

 

I am curious how Remi's attempt at Shogi using AlphaZero's method will turn out.

 

regards,

Daniel

On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go
<computer-go@computer-go.org> wrote:

Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of Stockfish
games and didn’t do well on tactics (the games were actually a blunder fest). I
believe this is a problem of the MCTS used and not due to lack of training.

 

Go is a strategic game, so it is different from chess, which is full of traps.

I'm not surprised Leela Zero did well in Go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org> wrote:

On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough

I don't understand how one can say that given that they started with the
random network last week only and a few clients. Of course it's bad!
That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Well, AlphaZero did fine at chess tactics, and the papers are clear on the 
details. There must be an error in your deductions somewhere.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Tuesday, March 6, 2018 1:46 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

I am pretty sure it is an MCTS problem and I suspect not something that could
be easily solved with a policy network (could be wrong here). My opinion is
that DCNN is not a miracle worker (as somebody already mentioned here) and it
is going to fail resolving tactics. I would be more than happy with it if it
had the same power as a qsearch, to be honest.

 

Search traps are the major problem with games like Chess, and what makes 
transitioning the success of DCNN from Go to Chess non-trivial.

The following paper discusses shallow traps that are prevalent in chess. ( 
https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571 )

They mention traps make MCTS very inefficient. Even if the MCTS is given 50x
more time than is needed by an exhaustive minimax tree, it could fail to find a
level-5 or level-7 trap. It will spend, for instance, 95% of its time searching
an asymmetric tree of depth > 7 when a shallow trap of depth-7 exists, thus
missing the level-7 trap.

This is very hard to solve even if you have unlimited power.

 

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in my
opinion, and I have a hard time seeing how it can be competitive with Stockfish
tactically.

 

My MCTS chess engine with AlphaZero-like MCTS was missing a lot of tactics. I
don't use policy or eval networks but qsearch() for eval, and the policy is
basically choosing whichever move leads to a higher eval.

 

a) My first improvement to the MCTS is to use minimax backups instead of
averaging. This was an improvement but not something that would solve the traps.

 

b) My second improvement is to use alpha-beta rollouts. This is a rollouts
version that can do nullmove and LMR etc... This is a huge improvement and none
of the MCTS versions can match it. More on alpha-beta rollouts here:
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf

 

So AlphaZero used none of the above improvements and yet it seems to be
tactically strong. Leela Zero suffered from tactical failures left and right
too, as I expected.

 

So the only explanation left is that the policy network is able to avoid traps,
which I find hard to believe, since it can hardly identify more than
qsearch-level tactics.

 

All I am saying is that my experience (as well as that of many others) with
MCTS for tactics-dominated games is bad, and there must be some breakthrough in
that regard in AlphaZero for it to be able to compete with Stockfish on a
tactical level.

 

I am curious how Remi's attempt at Shogi using AlphaZero's method will turn out.

 

regards,

Daniel

On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go
<computer-go@computer-go.org> wrote:

Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of Stockfish
games and didn’t do well on tactics (the games were actually a blunder fest). I
believe this is a problem of the MCTS used and not due to lack of training.

 

Go is a strategic game, so it is different from chess, which is full of traps.

I'm not surprised Leela Zero did well in Go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org> wrote:

On 0

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Training on Stockfish games is guaranteed to produce a blunder-fest, because 
there are no blunders in the training set and therefore the policy network 
never learns how to refute blunders.

 

This is not a flaw in MCTS, but rather in the policy network. MCTS will 
eventually search every move infinitely often, producing asymptotically optimal 
play. But if the policy network does not provide the guidance necessary to 
rapidly refute the blunders that occur in the search, then convergence of MCTS 
to optimal play will be very slow.

 

It is necessary for the network to train on self-play games using MCTS. For 
instance, the AGZ approach samples next states during training games by 
sampling from the distribution of visits in the search. Specifically: not by 
choosing the most-visited play!
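A sketch of that sampling step (my own code; the temperature parameter follows
my reading of the AGZ paper, which uses tau = 1 early in the game and tau -> 0
later):

import random

def select_selfplay_move(moves, visit_counts, temperature=1.0):
    # Sample the move from the root visit distribution rather than taking
    # the argmax; temperature < 1 sharpens the distribution toward argmax.
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    r = random.uniform(0.0, sum(weights))
    for move, w in zip(moves, weights):
        r -= w
        if r <= 0.0:
            return move
    return moves[-1]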

 

You see how this policy trains both search and evaluation to be internally 
consistent? The policy head is trained to refute the bad moves that will come 
up in search, and the value head is trained to the value observed by the full 
tree. 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Dan
Sent: Monday, March 5, 2018 4:55 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] 9x9 is last frontier?

 

Actually prior to this it was trained with hundreds of thousands of stockfish 
games and didn’t do well on tactics (the games were actually a blunder fest). I 
believe this is a problem of the MCTS used and not due to for lack of training. 

 

Go is a strategic game so that is different from chess that is full of traps.   
  

I m not surprised Lela zero did well in go.

 

On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org> wrote:

On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough

I don't understand how one can say that given that they started with the
random network last week only and a few clients. Of course it's bad!
That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Project Leela Zero

2017-12-29 Thread Brian Sheppard via Computer-go
Seems like extraordinarily fast progress. Great to hear that.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
"Ingo Althöfer"
Sent: Friday, December 29, 2017 12:30 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Project Leela Zero

Hello in the round,

I am not sure how closely people on the list are following the progress of
Gian-Carlo Pascutto's project Leela Zero. Therefore, here are some impressions.

The project site is:
http://zero.sjeng.org/

Shortly before Christmas some guys on the German Go mailing list claimed that
LeelaZero had not yet reached a 25-kyu rank. To get my own impression I (with
an inactive rating of 18-kyu) went on KGS and played a free game against
LeelaZSlow. This version takes exactly 28 seconds per move, even in trivial
situations. Long story short: I lost clearly. You can download the
sgf from here:
http://files.gokgs.com/games/2017/12/21/GoIngo-LeelaZSlow.sgf

In the meantime the KGS versions of Leela have made considerable progress. For 
instance, yesterday and today two 1-dans and a 3-dan were beaten in single 
games.
However, Leela also has "strange" weaknesses. The most serious one concerns 
hopeless ladders. The only way out for Leela seems to be to play early 
tengen-like moves (as possible ladder breakers).
At least three versions of LeelaZero are active:
LeelaZFast, LeelaZSlow, and LeelaZeroT.

As soon as a new "best" LeelaZero version has emerged in selfplay runs (of
length 400 games), it replaces the previous version for play on KGS.
Currently this happens 1 or 2 times per day.

Ingo.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AGZ Policy Head

2017-12-29 Thread Brian Sheppard via Computer-go
I agree that having special knowledge for "pass" is not a big compromise, but 
it would not meet the "zero knowledge" goal, no?

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Rémi Coulom
Sent: Friday, December 29, 2017 7:50 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AGZ Policy Head

I also wonder about this. A purely convolutional approach would save a lot of 
weights. The output for pass can be set to be a single bias parameter, 
connected to nothing. Setting pass to a constant might work, too. I don't 
understand the reason for such a complication.

- Original Message -
From: "Andy"
To: "computer-go"
Sent: Friday, December 29, 2017 06:47:06
Subject: [Computer-go] AGZ Policy Head



Is there some particular reason AGZ uses two 1x1 filters for the policy head 
instead of one? 


They could also have allowed more, but I guess that would be expensive? I 
calculate that the fully connected layer has 2*361*362 weights, where 2 is the 
number of filters. 


By comparison the value head has only a single 1x1 filter, but it goes to a 
hidden layer of 256. That gives 1*361*256 weights. Why not use two 1x1 filters 
here? Maybe since the final output is only a single scalar it's not needed? 
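The arithmetic behind those counts (my reading of the layer sizes as quoted;
biases omitted, 256 input channels assumed):

board = 19 * 19   # 361 points
channels = 256    # planes feeding each head

# Policy head: two 1x1 filters, then fully connected to 362 outputs
# (361 points plus pass).
policy_conv = channels * 2
policy_fc = 2 * board * (board + 1)   # 2 * 361 * 362 = 261,364

# Value head: one 1x1 filter, then a 256-unit hidden layer.
value_conv = channels * 1
value_fc = 1 * board * 256            # 1 * 361 * 256 = 92,416

print(policy_fc, value_fc)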

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mcts and tactics

2017-12-19 Thread Brian Sheppard via Computer-go
>I wouldn't find it so surprising if eventually the 20 or 40 block networks 
>develop a set of convolutional channels that traces possible ladders 
>diagonally across the board.

 

Learning the deep tactics is more-or-less guaranteed because of the interaction 
between search and evaluation through the training process.

 

Let’s look at the base case for tactical expertise: the evaluation function has 
learned that capturing stones is good.

 

In that state, a very short ladder can be seen through by tactics (800 nodes of 
search). This results in a corrective force that adjusts the evaluation 
function.

 

Once the evaluation function knows that very short ladders are favorable, then 
the search can see through deeper ladders.

 

This continues as the ladders become longer and longer.

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaZero & Co. still far from perfect play ?

2017-12-08 Thread Brian Sheppard via Computer-go
Agreed.

 

You can push this farther. If we define an “error” as a move that flips the W/L 
state of a Go game, then only the side that is currently winning can make an 
error. Let’s suppose that 6.5 komi is winning for Black. Then Black can make an 
error, and after he does then White can make an error, and so on. So the game 
ends up as a Black win if the total number of errors in the game is even.

 

Now build a spreadsheet modeling the W/L rate predicted for Black at various 
error rates. You will conclude that AZ and AGZ must still have sizeable error 
rates. For example, if the error rate were 1% then Black would win a lot more 
games than White, yet AZ and AGZ are still pretty close to 50/50.
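A minimal version of such a spreadsheet, assuming (purely for illustration)
independent per-move errors with probability p over n moves: the game stays a
Black win iff the error count is even, so P(Black wins) = (1 + (1 - 2p)^n) / 2.

def black_win_rate(p, n=250):
    # probability that a Binomial(n, p) error count is even
    return (1 + (1 - 2 * p) ** n) / 2

for p in (0.02, 0.01, 0.005, 0.001):
    print(p, round(black_win_rate(p), 3))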

 

A model like this has limits, but we can be certain that there is a lot of room 
for improvement.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
ChtiGo via Computer-go
Sent: Friday, December 8, 2017 5:12 AM
To: computer-go@computer-go.org
Cc: patrick.bar...@laposte.net
Subject: [Computer-go] AlphaZero & Co. still far from perfect play ?

 

Hi,

 

In Go, komi allows for a very fine balancing of the winning probability,
whichever player moves first.

 

However, under perfect play, for a given komi, Black has either a win or a
loss before putting any stone on the board. E.g., it could be a win for 6.5 pts
komi and a loss for 7.5 pts komi.

 

'Close to perfect play' (if this makes any sense), one would expect to observe
unbalanced win/loss ratios for a player when taking Black or taking White.
This would be the equivalent of the 'drawish' character of chess, observed for
top programs but already at amateur level.

 

At the level of play of AlphaZero and AlphaGo Zero (~5200 Elo FWIW), AlphaZero
has a win/loss count against AG0 3-day which seems quite independent of the
color (31/19 with White, 29/21 with Black, see table 1 of the AlphaZero paper).
In great contrast with AZ/SF in the chess tournament (25/0 with White, 3/0 with
Black).

 

Although these statistics might be insufficient, can we elaborate a bit on this
and hypothesize that:

 

1) The commonly used 7.5 pts komi defined from human games is so far a good
value up to top program level.

2) The level of play of AlphaZero (@5200 Elo), running on a big machine, is
still far from perfect play and there is plenty of room for improvement.
Nothing in view close to a winning strategy.

3) Go (with komi) is deeper than chess (very biased statement ;-), since SOA 
programs on SOA machines still cannot find any 'color benefit'.

 

Regards,
Patrick

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
AZ scalability looks good in that diagram, and it is certainly a good start, 
but it only goes out through 10 sec/move. Also, if the hardware is 7x better 
for AZ than SF, then should we elongate the curve for AZ by 7x? Or compress the 
curve for SF by 7x? Or some combination? Or take the data at face value?

I just noticed that AZ has some losses when the opening was forced into 
specific variations as in Table 2. So we know that AZ is not perfect, but 19 
losses in 1200 games is hard to extrapolate. (Curious: SF was a net winner over 
AZ with White in a B40 Sicilian, the only position/color combination out of 24 
in which SF had an edge.)

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Rémi Coulom
Sent: Thursday, December 7, 2017 11:51 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

>My concern about many of these points of comparison is that they presume how 
>AZ scales. In the absence of data, I would guess that AZ gains much less from 
>hardware than SF. I am basing this guess on two known facts. First is that AZ
>did not lose a game, so the upper bound on its strength is perfection. Second, 
>AZ is a knowledge intensive program, so it is counting on judgement to a 
>larger degree.

Doesn't Figure 2 in the paper indicate convincingly that AZ scales better than 
Stockfish?

Rémi
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
> Which is IMHO missing the point a bit ;-)

I saw it the same way, while conceding that the facts are accurate.

It makes sense for SF to internalize the details before making decisions. At 
some point there will be a realization that AZ is a fundamental change.


>What about the data point that AlphaGo Zero gained 2100 Elo from its tree 
>search? In a game commonly considered less tactical?

That is a common perception, especially among those who have never debugged a 
Go program. :-)

I was coming at it from the other direction, reasoning that since SF and AZ are 
close to perfect at chess, then there is less to gain from speed. (Whereas I 
doubt that AGZ is close to perfect at Go.)

All of this is subject to my money back guarantee: my opinions are guaranteed 
wrong, or your money back. :-)


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 8:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

On 7/12/2017 13:20, Brian Sheppard via Computer-go wrote:
> The conversation on Stockfish's mailing list focused on how the match 
> was imbalanced.

Which is IMHO missing the point a bit ;-)

> My concern about many of these points of comparison is that they 
> presume how AZ scales. In the absence of data, I would guess that AZ 
> gains much less from hardware than SF. I am basing this guess on two 
> known facts. First is that AZ did not lose a game, so the upper bound 
> on its strength is perfection. Second, AZ is a knowledge intensive 
> program, so it is counting on judgement to a larger degree.

What about the data point that AlphaGo Zero gained 2100 Elo from its tree 
search? In a game commonly considered less tactical?

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
The conversation on Stockfish's mailing list focused on how the match was 
imbalanced.

- AZ's TPU hardware was estimated at several times (7 times?) the computational 
power of Stockfish's.
- Stockfish's transposition table size (1 GB) was considered much too small for 
a 64 core machine.
- Stockfish's opening book is disabled, whereas AZ has, in effect, memorized a 
huge opening book.
- The match was against SF 8 (one year old) rather than the latest dev version.

To this I would add that the losses of Stockfish that I played through seemed 
to be largely self-similar, so it is possible that Stockfish has a relatively 
limited number of weaknesses that AZ does not, but the format of the match 
amplifies the issue.

So the attitude among the SF core is pretty competitive. Which is great news 
for continued development.

My concern about many of these points of comparison is that they presume how AZ 
scales. In the absence of data, I would guess that AZ gains much less from 
hardware than SF. I am basing this guess on two known facts. First is that AZ 
did not lose a game, so the upper bound on its strength is perfection. Second, 
AZ is a knowledge intensive program, so it is counting on judgement to a larger 
degree.

But I could be wrong. Maybe AZ falls apart tactically without 80K pops. There 
is no data, so all WAGs are valid.


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 4:13 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 rating point edge! I think the
> Stockfish open source project improved Stockfish by ~20 rating points in
> the last year.

It's about 40-45 Elo FWIW.

> AZ would dominate the current TCEC. 

I don't think you'll get to 80 knps with a regular 22 core machine or
whatever they use. Remember that AZ hardware is about 16 x 1080 Ti's.
You'll lose that (70 - 40 = 30 Elo) advantage very, very quickly.

IMHO this makes it all the more clear how silly it is that so much
attention is given to TCEC with its completely arbitrary hardware choice.

> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.

Probably the same as the Zen team did: ignore a large part of the result
because people's actual computers - let alone mobile phones - can't run
a neural net at TPU speeds.

The question is if resizing the network makes the resulting program more
competitive, enough to overcome the speed difference. And, aha, in which
direction are you going to try to resize? Bigger or smaller?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
Requiring a margin > 55% is a defense against a random result. A 55% score in a 
400-game match is 2 sigma.
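The arithmetic: at a true 50% strength, the standard deviation of a 400-game
match is sqrt(400 * 0.5 * 0.5) = 10 games, and 55% is 220 wins, 20 games above
the expected 200:

import math

games = 400
sigma = math.sqrt(games * 0.5 * 0.5)   # 10 games
excess = 0.55 * games - 0.50 * games   # 20 games
print(excess / sigma)                  # 2.0 sigma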

But I like the AZ policy better, because it does not require arbitrary
parameters. It also improves more fluidly by always drawing training examples
from the current probability distribution, and when the program is close to
perfect you would be able to capture the last 5% of skill.

I am not sure what to make of the AZ vs AGZ result. Mathematically, there 
should be a degree of training sufficient for AZ to exceed any fixed level of 
skill, such as AGZ's 40/40 level. So there must be a reason why DeepMind did 
not report such a result, but it is unclear what that is.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Wednesday, December 6, 2017 12:58 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

> Mastering Chess and Shogi by Self-Play with a General Reinforcement 
> Learning Algorithm https://arxiv.org/pdf/1712.01815.pdf

One of the changes they made (bottom of p.3) was to continuously update the 
neural net, rather than require a new network to beat it 55% of the time to be 
used. (That struck me as strange at the time, when reading the AlphaGoZero 
paper - why not just >50%?)

The AlphaZero paper shows it out-performs AlphaGoZero, but they are comparing 
to the 20-block, 3-day version. Not the 40-block, 40-day version that was even 
stronger.

As papers rarely show failures, can we take it to mean they couldn't 
out-perform their best go bot, do you think? If so, I wonder how hard they 
tried?

In other words, do you think the changes they made from AlphaGo Zero to Alpha 
Zero have made it weaker (when just viewed from the point of view of making the 
strongest possible go program).

Darren
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
The chess result is 64-36: a 100 rating point edge! I think the Stockfish open 
source project improved Stockfish by ~20 rating points in the last year. Given 
the number of people/computers involved, Stockfish’s annual effort level seems 
comparable to the AZ effort.

 

Stockfish is really, really tweaked out to do exactly what it does. It is very 
hard to improve anything about Stockfish. To be clear: I am not disparaging the 
code or people or project in any way. The code is great, people are great, 
project is great. It is really easy to work on Stockfish, but very hard to make 
progress given the extraordinarily fine balance of resources that already 
exists. I tried hard for about 6 months last year without any success. I tried
dozens of experiments (maybe 100?), including several that were motivated by
automated tuning or automated searching for opportunities. No luck.

 

AZ would dominate the current TCEC. Stockfish didn’t lose a game in the 
semi-final, failing to make the final because of too many draws against the 
weaker players.

 

The Stockfish team will have some self-examination going forward for sure. I 
wonder what they will decide to do.

 

I hope this isn’t the last we see of these DeepMind programs.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Richard Lorentz
Sent: Wednesday, December 6, 2017 12:50 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

 

One chess result stood out for me, namely, just how much easier it was for 
AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather than with 
black (3 wins, 47 draws, 0 losses).

Maybe we should not give up on the idea of White to play and win in chess!

On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:

Hi, 

DeepMind has made the strongest Chess and Shogi programs with the AlphaGo Zero method.

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning 
Algorithm 
https://arxiv.org/pdf/1712.01815.pdf

AlphaZero(Chess) outperformed Stockfish after 4 hours, 
AlphaZero(Shogi) outperformed elmo after 2 hours. 

Search is MCTS. 
AlphaZero(Chess) searches 80,000 positions/sec. 
Stockfish searches 70,000,000 positions/sec.
AlphaZero(Shogi) searches 40,000 positions/sec. 
elmo searches 35,000,000 positions/sec. 

Thanks, 
Hiroshi Yamashita 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Significance of resignation in AGZ

2017-12-03 Thread Brian Sheppard via Computer-go
I have been interested in a different approach, and it had some elements in 
common with AGZ, so AGZ gave me the confidence to try it.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Chaz G.
Sent: Sunday, December 3, 2017 4:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Hi Brian,

 

Thanks for sharing your genuinely interesting result. One question though: why
would you train a non-"zero" program? Do you think your program, as a result of
your rules, would perform better than zero, or is imitating the best known
algorithm inconvenient for your purposes?

 

Best,

-Chaz

 

On Sat, Dec 2, 2017 at 7:31 PM, Brian Sheppard via Computer-go
<computer-go@computer-go.org> wrote:

I implemented the ad hoc rule of not training on positions after the first 
pass, and my program is basically playing moves until the first pass is forced. 
(It is not a “zero” program, so I don’t mind ad hoc rules like this.)

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Xavier Combelle
Sent: Saturday, December 2, 2017 12:36 PM
To: computer-go@computer-go.org


Subject: Re: [Computer-go] Significance of resignation in AGZ

 

It might make sense to enable the resignation threshold even at a stupid
level. That way, the first thing the network would learn is not to resign too
early (even before it learns not to pass).

 

On 02/12/2017 at 18:17, Brian Sheppard via Computer-go wrote:

I have some hard data now. My network’s initial training reached the same 
performance in half the iterations. That is, the steepness of skill gain in the 
first day of training was twice as great when I avoided training on fill-ins.

 

That has all the usual caveats: only one run before/after, YMMV, etc.

 

From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Friday, December 1, 2017 5:39 PM
To: 'computer-go' <computer-go@computer-go.org>
Subject: RE: [Computer-go] Significance of resignation in AGZ

 

I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go
<computer-go@computer-go.org>:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is that the positional evaluation is not training on situations where
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian



Re: [Computer-go] Significance of resignation in AGZ

2017-12-02 Thread Brian Sheppard via Computer-go
I implemented the ad hoc rule of not training on positions after the first 
pass, and my program is basically playing moves until the first pass is forced. 
(It is not a “zero” program, so I don’t mind ad hoc rules like this.)
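In sketch form (my own illustration of the rule; PASS stands for however the
game record encodes a pass):

PASS = None  # illustrative encoding of a pass move

def training_positions(positions, moves):
    # Keep only (position, move) pairs from before the first pass.
    kept = []
    for pos, move in zip(positions, moves):
        if move == PASS:
            break
        kept.append((pos, move))
    return kept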

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Xavier Combelle
Sent: Saturday, December 2, 2017 12:36 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

It might make sense to enable the resignation threshold even at a stupid
level. That way, the first thing the network would learn is not to resign too
early (even before it learns not to pass).

 

On 02/12/2017 at 18:17, Brian Sheppard via Computer-go wrote:

I have some hard data now. My network’s initial training reached the same 
performance in half the iterations. That is, the steepness of skill gain in the 
first day of training was twice as great when I avoided training on fill-ins.

 

That has all the usual caveats: only one run before/after, YMMV, etc.

 

From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Friday, December 1, 2017 5:39 PM
To: 'computer-go' <computer-go@computer-go.org>
Subject: RE: [Computer-go] Significance of resignation in AGZ

 

I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go
<computer-go@computer-go.org>:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is that the positional evaluation is not training on situations where
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Significance of resignation in AGZ

2017-12-02 Thread Brian Sheppard via Computer-go
I have some hard data now. My network’s initial training reached the same 
performance in half the iterations. That is, the steepness of skill gain in the 
first day of training was twice as great when I avoided training on fill-ins.

 

That has all the usual caveats: only one run before/after, YMMV, etc.

 

From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Friday, December 1, 2017 5:39 PM
To: 'computer-go' <computer-go@computer-go.org>
Subject: RE: [Computer-go] Significance of resignation in AGZ

 

I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go 
<computer-go@computer-go.org>:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is the positional evaluation is not training on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Significance of resignation in AGZ

2017-12-01 Thread Brian Sheppard via Computer-go
I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 

 

 

 

From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first pass 
approach?

 

 

 

 

2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go 
<computer-go@computer-go.org>:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is the positional evaluation is not training on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Significance of resignation in AGZ

2017-12-01 Thread Brian Sheppard via Computer-go
I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)

Second is the positional evaluation is not training on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.

As always, YMMV.

Best,
Brian


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Brian Sheppard via Computer-go
State of the art in computer chess is alpha-beta search, but note that the 
search is very selective because of "late move reductions."

A late move reduction is to reduce depth for moves after the first move 
generated in a node. For example, a simple implementation would be "search the 
first move generated to depth N, and all subsequent moves to depth N-1". That 
simple idea is a big improvement.
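
As a sketch of that simple policy (the move-generation and evaluation helpers here are illustrative, not any engine's real API; Stockfish's actual scheme is far more elaborate):

    def alphabeta(pos, depth, alpha, beta):
        # Negamax alpha-beta with a one-ply late move reduction.
        # pos.moves() is assumed to return moves in probable-best-first order.
        if depth == 0 or pos.is_terminal():
            return pos.evaluate()
        best = float("-inf")
        for i, move in enumerate(pos.moves()):
            reduction = 1 if i > 0 and depth >= 2 else 0  # reduce all but the first move
            score = -alphabeta(pos.play(move), depth - 1 - reduction, -beta, -alpha)
            if reduction and score > alpha:
                # the reduced search beat alpha: re-search this move at full depth
                score = -alphabeta(pos.play(move), depth - 1, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break  # beta cutoff: the opponent will not allow this line
        return best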

LMR works (in large part) because move generators produce probable-best-first 
order. Therefore, if the first few moves do not work (i.e., refutes opponent's 
late move) then there probably isn't a refutation. By searching unlikely moves 
less, you have the time to search likely variations more.

Stockfish's actual implementation massively increases effective search depth 
over the simple policy outlined above. Because of LMR, the effective branching 
factor of chess search trees is less than 2.

IMO, late move reductions should work really well in Go, given that the move 
predictors are above 60% accuracy. But I have not tried. 

Using MCTS for training the evaluator is probably necessary, even if the 
ultimate goal is to use the evaluator in alpha-beta.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Petr Baudis
Sent: Thursday, November 16, 2017 10:43 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Is MCTS needed?

  Hi,

  when explaining AlphaGo Zero to a machine learning audience yesterday


(https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)

it occurred to me that using MCTS in this setup is actually such
a kludge!

  Originally, we used MCTS because with the repeated simulations,
we would be improving the accuracy of the arm reward estimates.  MCTS
policies assume stationary distributions, which is violated every time
we expand the tree, but it's an okay tradeoff if all you feed into the
tree are rewards in the form of just Bernoulli trials.  Moreover, you
could argue evaluations are somewhat monotonic with increasing node
depths as you are basically just fixing a growing prefix of the MC
simulation.

  But now, we expand the nodes literally all the time, breaking the
stationarity possibly in drastic ways.  There are no reevaluations that
would improve your estimate.  The input isn't binary but an estimate in
a continuous space.  Suddenly the Multi-armed Bandit analogy loses a lot
of ground.

  Therefore, can't we take the next step, and do away with MCTS?  Is
there a theoretical viewpoint from which it still makes sense as the best
policy improvement operator?

  What would you say is the current state-of-art game tree search for
chess?  That's a very unfamiliar world for me, to be honest all I really
know is MCTS...

-- 
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero is weaker than Master!?

2017-10-26 Thread Brian Sheppard via Computer-go
I would add that "wild guesses based on not enough info" is an indispensable 
skill.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hideki Kato
Sent: Thursday, October 26, 2017 10:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Zero is weaker than Master!?

Xavier Combelle: <62b977d7-d227-a74d-04b7-0d46db6a7...@gmail.com>:
>It is just wild guesses  based on reasonable arguments but without 
>evidence.


Yes, of course. Due to not enough info provided by Google.

Hideki


>On 26/10/2017 at 07:51, Hideki Kato wrote:
>> You can believe
>>> Of what I understand same network architecture implies the same number
>>> of blocks
>> but David Silver told AlphaGo Master used 40 layers in May.
>> http://www.bestchinanews.com/Science-Technology/10371.html
>> # The paper was submitted in April.
>>
>> Usually, network "architecture" does not imply the number of
>> layers whereas "configuration" may do.
>>
>> Clearly they made the 40 layers version first because it's
>> called "1st instance" where the 80 layers one is called "2nd
>> instance."  The 1st was trained 3 days and overtook AlphaGo
>> Lee.  Then they changed to the 2nd.  Aware of this fact, and
>> watching the growing curve of the 1st, I guess 40 layers was
>> not enough to reach AlphaGo Master level and so they
>> doubled the layers.
>>
>> Hideki
>>
>> Xavier Combelle: <1550c907-8b96-e4ea-1f5e-2344f394b967@gmail.com>:
>>> As I understand the paper they directly created alphago zero with a 40
>>> block setup.
>>> They just made a reduced 20 block setup to compare on kifu prediction
>>> (as far as I searched in the paper, it is the only
>>> place where they mention the 20 block setup).
>>> They specifically mention comparing several versions of their software
>>> with various parameters.
>>> If the number of blocks was an important parameter I hope they would
>>> mention it.
>>> Of course there are a lot of things that they tried and failed and we
>>> will not know about.
>>> But I have a hard time believing that alphago zero with a 20 block
>>> setup is one of them.
>>> About the paper, there is no mention of the number of blocks of Master:
>>> "AlphaGo Master is the program that defeated top human players by 60-0
>>> in January, 2017.
>>> It was previously unpublished but uses the same neural network
>>> architecture, reinforcement learning algorithm, and MCTS algorithm as
>>> described in this paper.  However, it uses the same handcrafted
>>> features and rollouts as AlphaGo Lee and training was initialised by
>>> supervised learning from human data."
>>> Of what I understand same network architecture implies the same number
>>> of blocks.
>>> On 25/10/2017 at 17:58, Xavier Combelle wrote:
>>>> I understand better
>>>> On 25/10/2017 at 04:28, Hideki Kato wrote:
>>>>> Are you thinking the 1st instance could reach Master level
>>>>> if giving more training days?
>>>>> I don't think so.  The performance would stop
>>>>> improving at 3 days.  If not, why did they build the 2nd
>>>>> instance?
>>>>> Best,
>>>>> Hideki
>>>>> Xavier Combelle: <05c04de1-59c4-8fcd-2dd1-094faabf3...@gmail.com>:
>>>>>> How is it a fair comparison if there is only 3 days of training for
>>>>>> Zero?
>>>>>> Master had longer training, no? Moreover, Zero has a bootstrap
>>>>>> problem because, at the opposite of Master, it doesn't learn from
>>>>>> expert games, which means that it is likely to be weaker with
>>>>>> little training.
>>>>>> On 24/10/2017 at 20:20, Hideki Kato wrote:
>>>>>>> David Silver told Master used a 40 layers network in May.
>>>>>>> According to the new paper, Master used the same architecture
>>>>>>> as Zero.  So, Master used a 20 blocks ResNet.
>>>>>>> The first instance of Zero, the 20 blocks ResNet version, is
>>>>>>> weaker than Master (after 3 days training).  So, with the
>>>>>>> same layers (a fair comparison) Zero is weaker than
>>>>>>> Master.
>>>>>>> Hideki
>>>>>> ___
>>>>>> Computer-go mailing list
>>>>>> Computer-go@computer-go.org
>>>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>>> ___
>>>>> Computer-go mailing list
>>>>> Computer-go@computer-go.org
>>>>> http://computer-go.org/mailman/listinfo/computer-go
>
>___
>Computer-go mailing list
>Computer-go@computer-go.org
>http://computer-go.org/mailman/listinfo/computer-go
--
Hideki Kato  
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-26 Thread Brian Sheppard via Computer-go
Well,... good luck with that! :-)

Seriously: it is important to account for p-space completeness. That is, a set 
of rules that covers Go without conflict must be exponential in space usage.

Search has a triple role in system design. It accounts (at least 
asymptotically) for missing knowledge and also papers over disagreements 
between rules. It also evaluates the global situation, which allows rules to be 
expressed in terms of purely local effects.

From my perspective, that is too good a deal to pass by. But I don't want to be 
only a bearer of bad news. If you accept a limitation on your rule sets, then 
there is a higher level conflict resolution method that will lead to good 
results.

Your rules could express their effect as a local point gain, in the sense of 
"temperature". That is, temperature == the difference between moving first and 
letting the opponent move first. Then CGT provides a higher-order theory for 
rationalizing multiple priorities.
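
As a sketch of that scheme (every name here is illustrative, not a real API): each rule reports a local result with and without the first move, and the conflict-resolution layer simply plays where the temperature is highest.

    def temperature(region, local_result):
        # Temperature in the sense above: the point difference between
        # getting the first move in this region and letting the opponent
        # have it. `local_result` is an assumed helper scoring the region
        # after best local play by the named side moving first.
        return local_result(region, first="us") - local_result(region, first="opponent")

    def choose_region(regions, local_result):
        # Higher-order conflict resolution: play in the hottest region.
        return max(regions, key=lambda r: temperature(r, local_result))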

This suggestion only addresses one of the three roles of search, though perhaps 
the most important one.

Best,
Brian


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Robert Jasiek
Sent: Thursday, October 26, 2017 10:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

On 26.10.2017 13:52, Brian Sheppard via Computer-go wrote:
> MCTS is the glue that binds incompatible rules.

This is, however, not what I mean. Conflicting principles (call them rules if 
you like) must be dissolved by higher order principles. Only when all conflicts 
are dissolved should MCTS be applied.

What you describe has been used with success and better success than I expect 
what my knowledge-pure approach can currently achieve. But MCTS as glue for 
conflicting principles has also run into a boundary. I want to see that 
boundary surpassed by my pure approach.

--
robert jasiek
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-26 Thread Brian Sheppard via Computer-go
Robert is right, but Robert seems to think this hasn't been done. Actually 
every prominent non-neural MCTS program since Mogo has been based on the exact 
design that Robert describes. The best of them achieve somewhat greater 
strength than Robert expects.

MCTS is the glue that binds incompatible rules. It rationalizes different 
heuristics into a coherent whole by testing the ideas in a competition against 
one another using a meaningful evaluation (win/loss).

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Xavier Combelle
Sent: Thursday, October 26, 2017 1:50 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?



>> The reason why (b) had become unpopular is because there is no go 
>> theory precise enough to implement it as an algorithm
>
> There is quite some theory of the 95% principle kind which might be 
> implemented as approximation. E.g. "Usually, defend your weak 
> important group." can be approximated by approximating "group", 
> "important" (its loss is too large in a quick positional judgement), 
> "weak" (can be killed in two successive moves), "defend" (after the 
> move, cannot be killed in two successive moves), "usually" (always, 
> unless there are several such groups and some must be chosen, say, 
> randomly; the approximation being that the alternative strategy of 
> large scale exchange is discarded).
>
> Besides, one must prioritise principles to solve conflicting 
> principles by a higher order principle.
>
> IMO, such an expert system combined with tree reading and maybe MCTS 
> to emulate reading used when a principle depends on reading can, with 
> an effort of a few manyears of implementation, already achieve amateur 
> mid dan. Not high dan yet because high dans can choose advanced 
> strategies, such as global exchange, and there are no good enough 
> principles for that yet, which would also consider necessary side 
> conditions related to influence, aji etc. I need to work out such 
> principles during the following years. Currently, the state is that 
> weaker principles have identified the major topics (influence, aji
> etc.) to be considered in fights but they must be refined to create 
> 95%+ principles.
>
> ***
>
> In the 80s and 90s, expert systems failed to do better than ca. 5 kyu 
> because principles were only marginally better than 50%. Today, (my) 
> average principles discard the weaker, 50% principles and are ca. 75%.
> Tomorrow, the 75% principles can be discarded for an average of 95% 
> principles. Expert systems get their chance again! Their major 
> disadvantage remains: great manpower is required for implementation.
> The advantage is semantical understanding.
>
From a software developer's point of view, informed by my knowledge of the 
history of AI and of go development, such approximate definitions are close to 
useless for building software at the current state of the art.
One of the reasons is, as you state, the considerable work it would require to 
implement a huge number of imprecise rules.
As you are not a software developer, I want you to look at this comic, which 
shows the difference between the apparent difficulty and the real difficulty 
of developing software: https://xkcd.com/1425/ As far as I understand, the few 
man-years of implementation you estimate for such an expert system would in 
fact be thousands of years.
As far as my experience goes, the expected reward would be a gain of one or 
two ranks, and so definitely not a mid-dan amateur level.

Xavier Combelle

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-25 Thread Brian Sheppard via Computer-go
I think it uses the champion network. That is, the training periodically 
generates a candidate, and there is a playoff against the current champion. If 
the candidate wins by more than 55% then a new champion is declared.

 

Keeping a champion is an important mechanism, I believe. That creates the 
competitive coevolution dynamic, where the network is evolving to learn how to 
beat the best, and not just most recent. Without that dynamic, the training 
process can go up and down.
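
A sketch of that gate (`play_match` is an assumed helper returning the candidate's win count; the 55% threshold is from the description above, while the 400-game match length is my assumption):

    def gate(champion, candidate, play_match, n_games=400, threshold=0.55):
        # The candidate replaces the champion only if it beats the current
        # champion convincingly in a head-to-head playoff.
        wins = play_match(candidate, champion, n_games)
        return candidate if wins / n_games > threshold else champion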

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
uurtamo .
Sent: Wednesday, October 25, 2017 6:07 PM
To: computer-go 
Subject: Re: [Computer-go] Source code (Was: Reducing network size? (Was: 
AlphaGo Zero))

 

Does the self-play step use the most recent network for each move?

 

On Oct 25, 2017 2:23 PM, "Gian-Carlo Pascutto" wrote:

On 25-10-17 17:57, Xavier Combelle wrote:
> Is there some way to distribute learning of a neural network ?

Learning as in training the DCNN, not really unless there are high
bandwidth links between the machines (AFAIK - unless the state of the
art changed?).

Learning as in generating self-play games: yes. Especially if you update
the network only every 25 000 games.

My understanding is that this task is much more bottlenecked on game
generation than on DCNN training, until you get quite a bit of machines
that generate games.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Brian Sheppard via Computer-go
So I am reading that residual networks are simply better than normal 
convolutional networks. There is a detailed write-up here: 
https://blog.waya.ai/deep-residual-learning-9610bb62c355

Summary: the residual network has a fixed connection that adds (with no 
scaling) the output of the previous level to the output of the current level. 
The point is that once some layer learns a concept, that concept is immediately 
available to all downstream layers, without need for learning how to propagate 
the value through a complicated network design. These connections also provide 
a fast pathway for tuning deeper layers.
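
In code, the idea is only a few lines. A minimal PyTorch-style sketch of a generic residual block as described in that write-up (not necessarily the exact AGZ configuration):

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            y = self.relu(self.bn1(self.conv1(x)))
            y = self.bn2(self.conv2(y))
            # fixed identity shortcut: add the block's input, with no scaling
            return self.relu(y + x)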

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 4:33 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Select quotes that I find interesting from a brief skim:

1) Using a residual network was more accurate, achieved lower error, and 
improved performance in AlphaGo by over 600 Elo.

2) Combining policy and value together into a single network slightly reduced 
the move prediction accuracy, but reduced the value error and boosted playing 
performance in AlphaGo by around another 600 Elo.

These gains sound very high (much higher than previous experiments with them 
reported here), but are likely due to the joint training.

3) The raw neural network, without using any lookahead, achieved an Elo rating 
of 3,055. ... AlphaGo Zero achieved a rating of 5,185.

The increase of 2000 Elo from tree search sounds very high, but this may just 
mean the value network is simply very good - and perhaps relatively better than 
the policy one. (They previously had problems there that SL > RL for the 
policy network guiding the tree search - but I'm not sure there's any 
relation.)

4) History features Xt; Yt are necessary because Go is not fully observable 
solely from the current stones, as repetitions are forbidden.

This is a weird statement. Did they need 17 planes just to check for ko?
It seems more likely that history features are very helpful for the internal 
understanding of the network as an optimization. That sucks though - it's 
annoying for analysis and position setup.

Lastly, the entire training procedure is actually not very complicated at all, 
and it's hopeful the training is "faster" than previous approaches - but many 
things look fast if you can throw 64 GPU workers at a problem.

In this context, the graphs of the differing network architectures causing huge 
strength discrepancies are both good and bad. Making a better pick can cause 
you to get massively better results, take a bad pick and you won't come close.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
I also expected bootstrapping by self-play. (I also wrote a post to that 
effect. But of course, DeepMind actually DID IT.)

But I didn't envision any of the other stuff. This is why I love their papers. 
Papers from most sources are predictable, skimpy, and sketchy, but theirs 
contain all sorts of deep insights that I never saw coming. And the theory, 
architecture, implementation, and explanation are all first-rate. It's like the 
Poker papers from U Alberta, or the source code for Stockfish. Lessons on every 
line.

Regarding Elo deltas: the length of Go games obscures what might be very small 
differences. E.g., if one player's moves are 3% more likely to be game-losing 
errors, then won't that player lose nearly every game? That is, 3 more 
"blunders" per game.

Regarding these details: at some level, all of these *must* be artifacts of 
training. That is, the NN architectures that did "badly" are still 
asymptotically optimal, so they should also eventually play equally well, 
provided that training continues indefinitely, and the network are large 
enough, and parameters do not freeze prematurely, and training eventually uses 
only self-play data, etc. I believe that is mathematically accurate, so I would 
ask a different question: why do those choices make better use of resources in 
the short run?

I have no idea; I'm just asking.



-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 5:40 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> This paper is required reading. When I read this team’s papers, I 
> think to myself “Wow, this is brilliant! And I think I see the next step.”
> When I read their next paper, they show me the next *three* steps.

Hmm, interesting way of seeing it. Once they had Lee Sedol AlphaGo, it was 
somewhat obvious that just self-playing that should lead to an improved policy 
and value net.

And before someone accuses me of Captain Hindsighting here, this was pointed 
out on this list:
http://computer-go.org/pipermail/computer-go/2017-January/009786.html

It looks to me like the real devil is in the details. Don't use a residual 
stack? -600 Elo. Don't combine the networks? -600 Elo.
Bootstrap the learning? -300 Elo

We made 3 perfectly reasonable choices and somehow lost 1500 Elo along the way. 
I can't get over that number, actually.

Getting the details right makes a difference. And they're getting them right, 
either because they're smart, because of experience from other domains, or 
because they're trying a ton of them. I'm betting on all 3.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
Some thoughts toward the idea of general game-playing...

One aspect of Go is ideally suited for visual NN: strong locality of reference. 
 That is, stones affect stones that are nearby.

I wonder whether the late emergence of ladder understanding within AlphaGo Zero 
is an artifact of the board representation? The authors speculate that ladders 
are not as important as humans surmise.

Another aspect of Go is ideally suited for visual NN: translation invariance. 
The convolutions transfer knowledge around the board, with the presumption that 
good moves will travel well.

I wonder whether we can data-mine positional evaluations to discover features? 
E.g., start with a standard visual NN and then make a database of positions 
where the delta between actual and expected evaluation is large enough to cause 
a different move to be selected. Data mine features from this set, and extend 
the NN with new inputs. (This would not discover a notion like "liberty count", 
but would discover notions like "ladder breaker".)


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 4:33 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Select quotes that I find interesting from a brief skim:

1) Using a residual network was more accurate, achieved lower error, and 
improved performance in AlphaGo by over 600 Elo.

2) Combining policy and value together into a single network slightly reduced 
the move prediction accuracy, but reduced the value error and boosted playing 
performance in AlphaGo by around another 600 Elo.

These gains sound very high (much higher than previous experiments with them 
reported here), but are likely due to the joint training.

3) The raw neural network, without using any lookahead, achieved an Elo rating 
of 3,055. ... AlphaGo Zero achieved a rating of 5,185.

The increase of 2000 Elo from tree search sounds very high, but this may just 
mean the value network is simply very good - and perhaps relatively better than 
the policy one. (They previously had problems there that SL > RL for the 
policy network guiding the tree search - but I'm not sure there's any 
relation.)

4) History features Xt; Yt are necessary because Go is not fully observable 
solely from the current stones, as repetitions are forbidden.

This is a weird statement. Did they need 17 planes just to check for ko?
It seems more likely that history features are very helpful for the internal 
understanding of the network as an optimization. That sucks though - it's 
annoying for analysis and position setup.

Lastly, the entire training procedure is actually not very complicated at all, 
and it's hopeful the training is "faster" than previous approaches - but many 
things look fast if you can throw 64 GPU workers at a problem.

In this context, the graphs of the differing network architectures causing huge 
strength discrepancies are both good and bad. Making a better pick can cause 
you to get massively better results, take a bad pick and you won't come close.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
Yeah, I would expect that encoding stones as "signed liberty count" would train 
faster/better/stronger. You could imagine a follow-up paper where Go features 
are supplied.

But, if I think about this on a large scale, wouldn't it be huge to put 
together a general game-playing program just based on visual-NN and simple 
MCTS? In the extended section is a comparison of 
checkers/backgammon/poker/scrabble(thanks!)/... that have been addressed by 
rollouts, and how those research efforts differed from the tabula rasa / ab 
initio formulation. The list of games that are played above human caliber now 
includes deterministic and non-deterministic, and perfect and imperfect 
information, so there are examples of every type out there.

I'm hoping that DeepMind will pursue general game-playing. It would be awesome 
to see strong AIs emerging from first principles. I would be happy to supply 
Maven as a sparring partner for Scrabble. I also have a world-class Pineapple 
Poker player that is based on rollouts and hand-coded features.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 4:38 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> A stunning result. The NN uses a standard vision architecture (no Go 
> adaptation beyond what is necessary to represent the game state).

The paper says that Master (4858 rating) uses Go specific features, initialized 
by SL, and the same technique. Without go features, and without initialization, 
it's Zero (5185 rating).

The obvious question is, what would be the result of using go features and not 
initializing?

I would expect that providing liberties is a useful shortcut (see my remark 
about game history!). But I'm willing to be surprised :-)

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
This paper is required reading. When I read this team’s papers, I think to 
myself “Wow, this is brilliant! And I think I see the next step.” When I read 
their next paper, they show me the next *three* steps. I can’t say enough good 
things about the quality of the work.

 

A stunning result. The NN uses a standard vision architecture (no Go adaptation 
beyond what is necessary to represent the game state). The MCTS is the simplest 
imaginable search: it has no playout policy!

 

I can’t wait to see what the team does next.

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Álvaro Begué
Sent: Wednesday, October 18, 2017 3:08 PM
To: computer-go 
Subject: Re: [Computer-go] AlphaGo Zero

 

A link to the paper (from the blog post): 
https://deepmind.com/documents/119/agz_unformatted_nature.pdf

Enjoy!
Álvaro.

 

On Wed, Oct 18, 2017 at 2:29 PM, Richard Lorentz wrote:

Wow! That's very exciting. I'm glad they didn't completely shelve the project 
as they implied they might do after the match with Lee Sedol.

I'm looking forward to seeing some games and "... plays unknown to humans", as 
Hassabis states.

Also, I love this comment from Silver, something I have always promoted: The 
implication is that “algorithms matter much more than either computing or data 
available”.

-Richard




On 10/18/2017 10:50 AM, cazen...@ai.univ-paris8.fr wrote:

https://deepmind.com/blog/
 
http://www.nature.com/nature/index.html
 
Impressive!
 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 


___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
Possibly you are answering a different question than the one posed? Possibly 
your interpretation is the one actually intended. I don’t know, and maybe you 
could be right about what was being asked.

 

I do know the semantics of brute force, though, which you quoted below.

 

Note that “brute force” != unintelligent. Inevitably, every brute force 
algorithm will incorporate intelligent heuristics. Consider the evolution of 
minimax, for example, via alpha-beta, selective extensions, LMR, etc.

 

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com] 
Sent: Sunday, August 6, 2017 2:52 PM
To: Brian Sheppard <sheppar...@aol.com>
Cc: computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

This is semantics. Yes, in the limit of infinite time, it is brute-force. 
Meanwhile, in the real world, AlphaGo chooses to balance its finite time budget 
between depth & width. The mere fact that the CNN policy network generates a 
score for each coordinate on the board in a given position, does not mean that 
all of those nodes will be expanded in any reasonable scenario.

 

On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard <sheppar...@aol.com> wrote:

I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Cai Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
If AlphaGo actually used hard (e.g. permanent) pruning, then it would not be 
brute force. But it doesn’t operate that way, so it is brute force.

 

BTW, AlphaGo’s papers reports benefiting from search and RAVE. That suggests 
that hard pruning is a risky business.

 

From: David Wu [mailto:lightvec...@gmail.com] 
Sent: Sunday, August 6, 2017 2:54 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go@computer-go.org
Cc: Steven Clark <steven.p.cl...@gmail.com>
Subject: Re: [Computer-go] Alphago and solving Go

 

Saying in an unqualified way that AlphaGo is brute force is wrong in the spirit 
of the question. Assuming AlphaGo uses a typical variant of MCTS, it is 
technically correct. The reason it's technically correct but uninteresting is 
because the bias introduced by a policy net is so extreme that it might as well 
be a selective search. 

 

Or put another way, imagine one were to set a threshold on the policy net 
output past a certain point in the tree such that moves below the threshold 
would be hard-pruned, and that threshold were set to a level that would prune, 
say, 70% of the legal moves in an average position. In technical sense, the 
search would no longer be full-width, and therefore it would suddenly become 
"not brute force" according to the definition earlier in the thread. But this 
distinction is not very useful, because moves in the tree that fall below such 
a threshold would receive zero simulations under any reasonable time controls 
anyways, so there would be no practical observable difference in the program's 
search or its play.
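
In code, that thought experiment is just a one-line gate on the prior (the threshold and data structures are illustrative):

    def expand(node, policy_net, prune_below=0.003):
        # Hypothetical hard pruning: drop children whose policy prior falls
        # below a fixed threshold. The search is then technically not
        # full-width, yet behaves identically under realistic time controls,
        # since such moves would receive zero simulations anyway.
        for move, prior in policy_net(node.position):
            if prior >= prune_below:
                node.add_child(move, prior)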

 

So - spirit of the question - no AlphaGo is not brute force, its search is 
selective to an extreme due to the policy net, the vast majority of 
possibilities will never be in practice given any attention or time whatsoever.

 

Technical answer - yes, AlphaGo is brute force, in that in the limit of having 
enormously vastly many more orders of magnitude of search time than we would 
ever devote to it and unbounded memory, it will theoretically eventually search 
everything (maybe, it would still depend on the actual details of its 
implementation).

 

 

On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
There is a definition of “brute force” on Wikipedia. Seems to me that MCTS 
fits. Deep Blue fits.

 

From: Álvaro Begué [mailto:alvaro.be...@gmail.com] 
Sent: Sunday, August 6, 2017 2:56 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

It was already a stretch when people said that Deep Blue was a brute-force 
searcher. If we apply it to AlphaGo as well, the term just means nothing. 
Full-width and brute-force are most definitely not the same thing.

 

Álvaro.

 

 

 

On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Cai Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
I understand why most people are saying that AlphaGo is not brute force, 
because it appears to be highly selective. But MCTS is a full width search. 
Read the AlphaGo papers, as one of the other respondents (rather sarcastically) 
suggested: AlphaGo will eventually search every move at every node.

 

MCTS has the appearance of a selective search because time control terminates 
search while the tree is still ragged. In fact, it will search every 
continuation an infinite number of times.

 

In order to have high performance, an MCTS implementation needs to search best 
moves as early as possible in each node. It is in this respect that AlphaGo 
truly excels. (AlphaGo also excels at whole board evaluation, but that is a 
separate topic.)

 

 

From: Steven Clark [mailto:steven.p.cl...@gmail.com] 
Sent: Sunday, August 6, 2017 1:14 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go 
<computer-go@computer-go.org>
Subject: Re: [Computer-go] Alphago and solving Go

 

Why do you say AlphaGo is brute-force? Brute force is defined as: "In computer 
science, brute-force search or exhaustive search, also known as generate and 
test, is a very general problem-solving technique that consists of 
systematically enumerating all possible candidates for the solution and 
checking whether each candidate satisfies the problem's statement."

 

The whole point of the policy network is to avoid brute-force search, by 
reducing the branching factor...

 

On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote:

Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Cai Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
Yes, AlphaGo is brute force.

No it is impossible to solve Go.

Perfect play looks a lot like AlphaGo in that you would not be able to tell the 
difference. But I think that AlphaGo still has 0% win rate against perfect play.

 

My own best guess is that top humans make about 12 errors per game. This is 
estimated based on the win rate of top pros in head-to-head games. The 
calculation starts by assuming that Go is a win at 6.5 komi for either Black 
(more likely) or White, so a perfect player would win 100% for Black. Actual 
championship caliber players win 51% to 52% for Black. In 9-dan play overall, I 
think the rate is 53% to 54% for Black. Then you can estimate how many errors 
each player has to make to bring about such a result. E.g., If players made 
only one error on average, then Black would win the vast majority of games, so 
they must make more errors. I came up with 12 errors per game, but you can 
reasonably get other numbers based on your model.
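
For anyone who wants to experiment with the model, here is one toy formalization (the flip-probability framing and the numbers in it are my own assumptions, not the exact calculation above): Black wins under perfect play, and each error in the game independently flips the eventual winner with some small probability.

    def black_win_rate(total_errors, p_flip):
        # P(Black wins) = P(an even number of winner-flips occurred)
        return (1 + (1 - 2 * p_flip) ** total_errors) / 2

    # ~12 errors per player (24 total), each flipping the result ~6.5% of
    # the time, shrinks Black's edge from 100% to roughly the observed 52%:
    print(black_win_rate(24, 0.065))  # ~0.518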

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Cai 
Gengyang
Sent: Sunday, August 6, 2017 9:49 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Alphago and solving Go

 

Is Alphago brute force search? 

Is it possible to solve Go for 19x19 ? 

And what does perfect play in Go look like? 

How far are current top pros from perfect play?

 

Gengyang

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread Brian Sheppard via Computer-go
>I haven't tried it, but (with the computer chess hat on) these kind of 
>proposals behave pretty badly when you get into situations where your 
>evaluation is off and there are horizon effects.

In computer Go, this issue focuses on cases where the initial move ordering is 
bad. It isn't so much evaluation/horizon.

For example, at some node moves A, B, C, and D are seen as the best moves, but 
they all fail. So perhaps you invest 2000 trials in each, with a 40% win rate 
overall. Now you come to move E, which is best. In fact, E is a refutation, 
with a 70% win rate.

The question is: how long will it take before this node is considered to be 
refuted?

The problem is that we started the node badly. When E is first explored, the 
node has 3200 wins, 4800 losses. Even with a 70% win rate, E needs 4000 trials  
(= 2800 wins, 1200 losses) just to get back to a 50% win rate.

But if you had explored E first, then this node would have been considered 
refuted after maybe a few hundred trials. There is literally a factor of 100 
difference in search efficiency.
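
A quick check of the arithmetic in that example:

    # A, B, C, and D get 2000 trials each at a 40% win rate; then E wins at 70%.
    wins, losses = 0.40 * 8000, 0.60 * 8000        # 3200 wins, 4800 losses
    net_gain_per_E_trial = 0.70 - 0.30             # each E trial closes the gap by 0.4
    print((losses - wins) / net_gain_per_E_trial)  # 4000.0 trials of E to reach 50%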

This situation seems odd, but it arises frequently. For example, E may be a 
winning move now, but become a losing move if played in the future. (For 
example, E is in a capturing race that is currently winning for the side to 
move first.) Unless E is tried early in the node's lifecycle, the RAVE score 
for E is very low.



>> I recall a paper published on this basis. A paper presumably about
>> CrazyStone: Efficient Selectivity and Backup Operators in Monte-Carlo 
>> Tree Search.

>I'm reasonably sure this did not include forgetting/discounting, only shifting 
>between average and maximum by focusing simulations near the maximum.

Yes, but emphasizing the maximum is equivalent to forgetting.

E.g., in my example, giving E a larger weight is equivalent to giving A, B, C, 
and D a lower weight (== forgetting).



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-23 Thread Brian Sheppard via Computer-go
Yes. This is a long-known phenomenon.

 

I was able to get improvements in Pebbles based on the idea of forgetting 
unsuccessful results. It has to be done somewhat carefully, because results 
propagate up the tree. But you can definitely make it work.

 

I recall a paper published on this basis, presumably about CrazyStone: 
“Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search”. I 
also see a paper called “Accelerated UCT and Its Application to Two-Player 
Games”. Could be others.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
David Wu
Sent: Sunday, July 23, 2017 12:25 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Possible idea - decay old simulations?

 

I've been using Leela 0.10.0 for analysis quite often, and I've noticed 
something that might lead to an improvement for the search, and maybe also for 
other MCTS programs.

 

Sometimes, after putting hundreds of thousands of simulations into a few 
possible moves, Leela will appear to favor one, disliking the others for having 
clearly worse reported winrates. But then every once in a while, the winrate 
for one of the other disliked moves will start rising gradually, but very 
consistently.

 

When this happens, if I play down the variation for that move and look at the 
analysis window, I often find that Leela has discovered a new tactic. 
Specifically, I find a node in that subtree where one move has a greatly higher 
winrate than all the others, but does not have too many simulations yet, 
meaning Leela only just now found it. 

(Possibly it already has more simulations than any other single move, but the 
number of simulations of all of the other moves combined still significantly 
outweighs it).

 

Going back to the root, it's clear that if the new tactic has a high enough 
winrate, then the previously disliked move will eventually overtake the favored 
move. But it takes a long time, since the disliked move has a lot of bad 
simulations to outweigh - it's painful to watch the winrate creep up slowly but 
with high upward consistency, until it finally beats out the previously favored 
move.

 

I think there's a chance that the search could be improved by adding a small 
decay over time to the weight of old simulations. This would allow a move to be 
promoted a bit more rapidly with the discovery of a better tactic. You would 
probably want to set the decay over time so that the total weight over time 
still readily approaches infinity (e.g. a fixed exponential decay would 
probably be bad, that would bound the total weight by a constant), but perhaps 
a bit slower than linearly. 
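
One concrete way to get that behavior (my formulation, offered only as a
sketch): update each child's value with a Robbins-Monro step size of
n**(-beta). beta = 1 recovers the plain running average, while beta in
(0.5, 1) weights recent simulations more, and the total weight, the sum of
n**(-beta), still diverges, growing like n**(1-beta), i.e., a bit slower
than linearly:

    class ChildStats:
        def __init__(self, beta=0.8):
            self.beta = beta
            self.n = 0        # visit count, still usable by the UCT term
            self.value = 0.0  # decayed running estimate of the win rate

        def update(self, result):          # result: 1.0 for a win, 0.0 for a loss
            self.n += 1
            step = self.n ** (-self.beta)  # larger than 1/n, so old results fade
            self.value += step * (result - self.value)

Any beta in (0.5, 1] also satisfies the classic stochastic-approximation
conditions (the steps sum to infinity, their squares do not), so the estimate
can still converge when the child's distribution is stationary.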

 

Thinking about it from the multi-armed-bandit perspective, I think this also 
makes sense. The distribution of results from each child is nonstationary, 
because the subtree below the child is evolving over time. If they were 
stationary you would weight all historical simulations equally, but since they 
aren't, the more-recent results from a child should get a little bit more 
weight since they give you more information about the current performance of 
the child move.

 

Has anyone tried this sort of idea before?

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Brian Sheppard via Computer-go
>... my value network was trained to tell me the game is balanced at the 
>beginning...

:-)

The best training policy is to select positions that correct errors.

I used the policies below to train a backgammon NN. Together, they reduced the 
expected loss of the network by 50% (cut the error rate in half):

- Select training positions from the program's own games.
  - Can be self-play or versus an opponent.
  - Best is to have a broad panel of opponents.
  - Beneficial to bootstrap with pro games, but then add ONLY training 
    examples from the program's own games.
- Train only the moves made by the winner of the game.
  - Very important for deterministic games!
  - Note that the winner can be either your program or the opponent.
  - If your program wins then training reinforces good behavior; if the 
    opponent wins then training corrects bad behavior.
- Per game, you should aim to get only a few training examples (3 in 
  backgammon. Maybe 10 in Go?). Use two policies:
  - Select positions where the static evaluation of a position is 
    significantly different from a deep search.
  - Select positions where the move selected by a deep search did not 
    have the highest static evaluation. (In this case you have two training 
    positions, which differ by the move chosen.)
  - Of course, you are selecting examples where you did as badly as 
    possible.
- The training value of the position is the result of a deep search.
  - This is equivalent to "temporal difference learning", but accelerated 
    by the depth of the search.
  - Periodically refresh the training evaluations as your search/eval 
    improve.

These policies actively seek out cases where your evaluation function has some 
weakness, so training is definitely focused on improving results in the 
distribution of positions that your program will actually face.
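
A minimal sketch of the two selection policies; static_eval, deep_search,
legal_moves, and play are hypothetical engine hooks, and the "significantly
different" threshold is a tunable assumption:

    def select_training_examples(positions, static_eval, deep_search,
                                 legal_moves, play, threshold=0.1):
        examples = []
        for pos in positions:
            best_move, deep_value = deep_search(pos)
            # Policy 1: static evaluation disagrees with a deep search.
            if abs(static_eval(pos) - deep_value) > threshold:
                examples.append((pos, deep_value))
            # Policy 2: the deep-search move is not the statically best move;
            # label both successors with their own deep-search values.
            greedy = max(legal_moves(pos),
                         key=lambda m: static_eval(play(pos, m)))
            if greedy != best_move:
                for move in (best_move, greedy):
                    child = play(pos, move)
                    examples.append((child, deep_search(child)[1]))
        return examples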

You will need about 30 training examples for every free parameter in your NN. 
You can do the math on how many games that will take: for example, a net with 
one million free parameters needs roughly 30 million examples, which at ~10 
examples per game is about 3 million games. It is inevitable: you will train 
your NN based on blitz games.

Good luck!



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Brian Sheppard via Computer-go
My reaction was "well, if you are using alpha-beta, then at least use LMR 
rather than hard pruning." Your reaction is "don't use alpha-beta", and you 
would know better than anyone!

Yes, LMR in Go faces one big difference compared to LMR in chess: Go tactics 
take many moves to play out, whereas chess tactics are often pretty immediate. 
So LMR could hurt Go tactics much more than it hurts chess tactics. Compare the 
benefit of forcing the playout to the end of the game.
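
For reference, a bare-bones negamax with LMR (a generic sketch, not any
particular program's code; ordered_moves, evaluate, and play are hypothetical
hooks, with ordered_moves yielding moves sorted by policy-network probability).
Instead of hard-pruning everything past the top k, later moves get a
reduced-depth search and are re-searched at full depth only if they beat alpha:

    def negamax(pos, depth, alpha, beta, ordered_moves, evaluate, play):
        if depth == 0:
            return evaluate(pos)
        for i, move in enumerate(ordered_moves(pos)):
            child = play(pos, move)
            # Reduce from the second move onward, per the suggestion above.
            reduction = 1 if i >= 1 and depth >= 3 else 0
            score = -negamax(child, depth - 1 - reduction, -beta, -alpha,
                             ordered_moves, evaluate, play)
            if reduction and score > alpha:   # promising: verify at full depth
                score = -negamax(child, depth - 1, -beta, -alpha,
                                 ordered_moves, evaluate, play)
            if score >= beta:
                return beta                   # fail-hard beta cutoff
            alpha = max(alpha, score)
        return alpha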

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Monday, May 22, 2017 4:08 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] mini-max with Policy and Value network

On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote:
> Could use late-move reductions to eliminate the hard pruning. Given 
> the accuracy rate of the policy network, I would guess that even move
> 2 should be reduced.
> 

The question I always ask is: what's the real difference between MCTS with a 
small UCT constant and an alpha-beta search with heavy Late Move Reductions? 
Are the explored trees really so different?

In any case, in my experiments Monte Carlo still gives a strong benefit, even 
with a not so strong Monte Carlo part. IIRC it was the case for AlphaGo too, 
and they used more training data for the value network than is publicly 
available, and Zen reported the same: Monte Carlo is important.

The main problem is the "only top x moves" part. Late Move Reductions are very 
nice because there is never a full pruning. This heavy pruning by the policy 
network OTOH seems to be an issue for me. My program has big tactical holes.

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-20 Thread Brian Sheppard via Computer-go
Could use late-move reductions to eliminate the hard pruning. Given the 
accuracy rate of the policy network, I would guess that even move 2 should be 
reduced.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hiroshi Yamashita
Sent: Saturday, May 20, 2017 3:42 PM
To: computer-go@computer-go.org
Subject: [Computer-go] mini-max with Policy and Value network

Hi,

The HiraBot author reported a mini-max search with Policy and Value networks. 
It does not use Monte Carlo.
Only the top 8 moves from the Policy network are searched at the root node. At 
other depths, the top 4 moves are searched.

Game result against Policy network best move (without search)

             Win  Loss  Winrate
MaxDepth=1   558   442   0.558   +40 Elo
MaxDepth=2   351   150   0.701  +148 Elo
MaxDepth=3   406   116   0.778  +218 Elo
MaxDepth=4   670    78   0.896  +374 Elo
MaxDepth=5   490    57   0.896  +374 Elo
MaxDepth=6   520    20   0.963  +556 Elo

Search is simple alpha-beta.
There is a modification so that moves given high probability by the Policy 
network tend to be selected.
MaxDepth=6 takes one second/move on i7-4790k + GTX1060.

His nega-max code
http://kiyoshifk.dip.jp/kiyoshifk/apk/negamax.zip
CGOS result, MaxDepth=6
http://www.yss-aya.com/cgos/19x19/cross/minimax-depth6.html
His Policy network(without search) is maybe 
http://www.yss-aya.com/cgos/19x19/cross/DCNN-No336-tygem.html
His Policy and Value network(MCTS) is maybe 
http://www.yss-aya.com/cgos/19x19/cross/Hiratuka10_38B100.html

Thanks,
Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Patterns and bad shape

2017-04-18 Thread Brian Sheppard via Computer-go
My comment was addressed to the original question, which mentioned more 
traditional pattern based work, such as Remi’s.

 

Let’s think about how you might build a NN using a large pattern base as inputs.

 

An NN has K features per point on the board, and you don’t want K to be a large 
number. Instead you look for the K most fundamental concepts that are 
non-overlapping and additive in terms of information content.

 

For example: liberty counts are certainly of the highest importance, ladders are 
non-local patterns that the NN would not be able to learn, influence of some 
sort, ko,… (AlphaGo did a great job of input design, IMO.)

 

So, how to incorporate knowledge of many patterns, each of which is 
individually rare?

 

You could have just two additional input bits (per point) that represent 
“tactically forced move for side-to-move/opponent”. That is, combine all 
knowledge about “super urgent” moves into two bits per point.

 

Could add value. Can’t hurt? Should add value?
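
A sketch of what those two planes might look like (my construction;
pattern_urgent is a hypothetical hook into the pattern matcher, and colors are
encoded as +1/-1):

    import numpy as np

    def urgency_planes(to_move, pattern_urgent, size=19):
        # Plane 0: urgent move for the side to move; plane 1: for the opponent.
        planes = np.zeros((2, size, size), dtype=np.float32)
        for x in range(size):
            for y in range(size):
                if pattern_urgent((x, y), to_move):
                    planes[0, x, y] = 1.0
                if pattern_urgent((x, y), -to_move):
                    planes[1, x, y] = 1.0
        return planes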

 

But note that current thinking is giving most credit to the value network 
(evaluation function) rather than policy network (move ordering).

 

From: Jim O'Flaherty [mailto:jim.oflaherty...@gmail.com] 
Sent: Tuesday, April 18, 2017 9:06 AM
To: computer-go@computer-go.org; Brian Sheppard <sheppar...@aol.com>
Subject: Re: [Computer-go] Patterns and bad shape

 

Now, I love this idea. A super fast cheap pattern matcher can act as input into 
a neural network input layer in sort of a "pay additional attention here and 
here and...".

 

On Apr 18, 2017 6:31 AM, "Brian Sheppard via Computer-go" 
<computer-go@computer-go.org> wrote:

Adding patterns is very cheap: encode the patterns as an if/else tree, and it 
is O(log n) to match.

 

Pattern matching as such did not show up as a significant component of Pebbles. 
But that is mostly because all of the machinery that makes pattern-matching 
cheap (incremental updating of 3x3 neighborhoods, and related tricks) was a 
large component.

 

But I think that is the basic idea: pre-compute or incrementally compute some 
basic functions of the board position so that pattern matching is cheap. Then 
add as many patterns as possible.
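
A sketch of the if/else-tree idea: each internal node tests the content of one
neighboring cell, so matching is a single root-to-leaf walk, O(depth), which
is O(log n) for n stored patterns:

    EMPTY, BLACK, WHITE, EDGE = 0, 1, 2, 3

    class PatternNode:
        def __init__(self, cell=None, children=None, value=None):
            self.cell = cell          # neighbor index to test (internal nodes)
            self.children = children  # dict: cell content -> PatternNode
            self.value = value        # pattern payload (leaf nodes)

    def match(tree, neighborhood):
        # neighborhood: tuple of the 8 cell contents around the point
        node = tree
        while node.value is None:
            node = node.children[neighborhood[node.cell]]
        return node.value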

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Jim O'Flaherty
Sent: Monday, April 17, 2017 7:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Patterns and bad shape

 

It seems chasing down good moves for bad shapes would be an explosion of 
"exception cases", like combinatorially huge. So, while you would be saving 
some branching in the search space, you would be ballooning up the number of 
patterns for which to scan by orders of magnitude.

 

Wouldn't it be preferable to just have the AI continue to make the better move 
emergently and generally from probabilities around win placements as opposed to 
what appears to be a focus on one type of local optima?

 

 

On Mon, Apr 17, 2017 at 5:07 AM, lemonsqueeze <lemonsque...@free.fr> wrote:

Hi,

I'm sure the topic must have come up before but i can't seem to find it right 
now, i'd appreciate if someone can point me in the right direction.

I'm looking into MM, LFR and similar cpu-based pattern approaches for
generating priors, and was wondering about basic bad shape:

Let's say we use 6d games for training. The system becomes pretty good at 
predicting 6d moves by learning patterns associated with the kind of moves 6d 
players make.

However it seems it doesn't learn to punish basic mistakes effectively (say 
double ataris, shapes with obvious defects ...) because they almost never show 
up in 6d games =) They show up in the search tree though, and without good 
answers the search might take a long time to realize these moves don't work.

Maybe I missed some later paper / development but basically,
Wouldn't it make sense to also train on good answers to bad moves ?
(maybe harvesting them from the search tree or something like that)

I'm thinking about basic situations like this which patterns should be able to 
recognize:

   A B C D E F G H J K L M N O P Q R S T
 +---+
  19 | . . . . . . . . . . . . . . . . . . . |
  18 | . . . O O . O . . . . . . . . . . . . |
  17 | . . X . X O . X O . . . . . . . . . . |
  16 | . . X . X O . O O . . . . . . X . . . |
  15 | . . . . . X X X X . . . . . . . . . . |
  14 | . . X . . . . . . . . . . . . . . . . |
  13 | . O . . . . . . . . . . . . X . O . . |
  12 | . . O . . . . . . . . . . . . . O . . |
  11 | . . O X . . . . . . . . . . . X . . . |
  10 | . . O X . . . . . . . . . . . X O . . |
   9 | . . O X . . . X . . . X O . . X O . . |
   8 | . . O X . . . O X X X O X)X X O O . . |
   7 | . O X . . . . . O O X O . X O O . . . |
   6 | 

Re: [Computer-go] Patterns and bad shape

2017-04-18 Thread Brian Sheppard via Computer-go
Adding patterns is very cheap: encode the patterns as an if/else tree, and it 
is O(log n) to match.

 

Pattern matching as such did not show up as a significant component of Pebbles. 
But that is mostly because all of the machinery that makes pattern-matching 
cheap (incremental updating of 3x3 neighborhoods, and related tricks) was a 
large component.

 

But I think that is the basic idea: pre-compute or incrementally compute some 
basic functions of the board position so that pattern matching is cheap. Then 
add as many patterns as possible.

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Jim 
O'Flaherty
Sent: Monday, April 17, 2017 7:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Patterns and bad shape

 

It seems chasing down good moves for bad shapes would be an explosion of 
"exception cases", like combinatorially huge. So, while you would be saving 
some branching in the search space, you would be ballooning up the number of 
patterns for which to scan by orders of magnitude.

 

Wouldn't it be preferable to just have the AI continue to make the better move 
emergently and generally from probabilities around win placements as opposed to 
what appears to be a focus on one type of local optima?

 

 

On Mon, Apr 17, 2017 at 5:07 AM, lemonsqueeze  > wrote:

Hi,

I'm sure the topic must have come up before but i can't seem to find it right 
now, i'd appreciate if someone can point me in the right direction.

I'm looking into MM, LFR and similar cpu-based pattern approaches for
generating priors, and was wondering about basic bad shape:

Let's say we use 6d games for training. The system becomes pretty good at 
predicting 6d moves by learning patterns associated with the kind of moves 6d 
players make.

However it seems it doesn't learn to punish basic mistakes effectively (say 
double ataris, shapes with obvious defects ...) because they almost never show 
up in 6d games =) They show up in the search tree though, and without good 
answers the search might take a long time to realize these moves don't work.

Maybe I missed some later paper / development but basically,
Wouldn't it make sense to also train on good answers to bad moves ?
(maybe harvesting them from the search tree or something like that)

I'm thinking about basic situations like this which patterns should be able to 
recognize:

   A B C D E F G H J K L M N O P Q R S T
 +---+
  19 | . . . . . . . . . . . . . . . . . . . |
  18 | . . . O O . O . . . . . . . . . . . . |
  17 | . . X . X O . X O . . . . . . . . . . |
  16 | . . X . X O . O O . . . . . . X . . . |
  15 | . . . . . X X X X . . . . . . . . . . |
  14 | . . X . . . . . . . . . . . . . . . . |
  13 | . O . . . . . . . . . . . . X . O . . |
  12 | . . O . . . . . . . . . . . . . O . . |
  11 | . . O X . . . . . . . . . . . X . . . |
  10 | . . O X . . . . . . . . . . . X O . . |
   9 | . . O X . . . X . . . X O . . X O . . |
   8 | . . O X . . . O X X X O X)X X O O . . |
   7 | . O X . . . . . O O X O . X O O . . . |
   6 | O X X . X X X O . . O . . . X O X . . |
   5 | . O O . . O X . . . . . O . . . . . . |
   4 | . X O O O O X . . . O . . O . O . . . |
   3 | . X X X X O . . X . X X . . . . . . . |
   2 | . . . . X O . . . . . . O . . . . . . |
   1 | . . . . . . . . . . . . . . . . . . . |
 +---+

Patterns probably never see this during training and miss W L9,
For example :

In Remi's CrazyPatterns.exe L9 comes in 4th position:
   [ M10 N10 O6 L9 ...

With Pachi's large patterns it's 8th:
   [ G8  M10 G9  O17 N10 O6  J4  L9  ...

Cheers,
Matt



MM: Computing Elo Ratings of Move Patterns in the Game of Go
   https://www.remi-coulom.fr/Amsterdam2007/

LFR: Move Prediction in Go – Modelling Feature Interactions Using
  Latent Factors
   https://www.ismll.uni-hildesheim.de/pub/pdfs/wistuba_et_al_KI_2013.pdf


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
OK, so let’s talk theory vs practice.

 

In theory, TD learning approaches asymptotic optimality when used with a full 
state space. That is, if your RL model has one parameter for each state, then 
TD will converge those parameters to the game theoretic values. There are some 
pre-conditions, but this is true with remarkable generality. In particular, 
there is nothing about stochastic vs deterministic in the theory.
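
For reference, the tabular TD(0) update that this theory refers to, in
standard textbook form:

    def td0_update(V, s, reward, s_next, alpha=0.1, gamma=1.0):
        # V: dict mapping state -> value, i.e., one parameter per state.
        v = V.get(s, 0.0)
        target = reward + gamma * V.get(s_next, 0.0)
        V[s] = v + alpha * (target - v)  # move v toward the bootstrapped target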

 

So yes, the published experiments for Chess were failures. That is, using a 
very shallow search and relatively simple evaluation function does not work for 
chess like it did for Tesauro’s experiments in backgammon. But the conclusion 
that this is because of stochastic/deterministic is incorrect. Consider Go, for 
example, where static move generators can play at 1 dan level, and probably 
quite a bit higher.

 

An inference that is compatible with theory is that to conquer chess you will 
need a deeper, wider network, or better inputs, or a better search engine. Or 
all of those. You could imagine measuring the skill of a variety of 
architectures, and map out the path of steepest ascent.

 

In retrospect, I view Schraudolph’s paper as evidence that neural networks have 
always been surprisingly successful at Go. Like Brugmann’s paper about Monte 
Carlo, which was underestimated for a long time. Sigh.

 

Best,

Brian

 

From: Minjae Kim [mailto:xive...@gmail.com] 
Sent: Friday, February 24, 2017 11:56 AM
To: Brian Sheppard <sheppar...@aol.com>; computer-go@computer-go.org
Subject: Re: [Computer-go] dealing with multiple local optima

 

TD-gammon is regarded as a special case because of the stochastic 
characteristics of the backgammon game; it smooths the search space for the 
value function, and the value function itself, to a great degree compared to 
'static' games such as go. Experiments applying TD learning to chess and go 
were done after the success of TD-gammon, and the results were not good. I 
also did a small experiment a while ago, applying TD learning to go using a 
network structure similar to that of TD-gammon. The network gets 'stuck' 
really fast. You can see comments similar to my experience here 
(https://chessprogramming.wikispaces.com/Temporal+Difference+Learning). You can 
make the network train more 'actively' by doing stochastic move selection 
rather than simply selecting the move with the highest value, but this doesn't 
work well enough in my experience, where I applied softmax. Schraudolph 
experimented with TD for go in his 1994 paper, where he applied Gibbs sampling 
for stochastic move selection, although it wasn't a success for building a 
strong go bot.

 

On Fri, Feb 24, 2017 at 11:03 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

Neural networks always have a lot of local optima. Simply because they have a 
high degree of internal symmetry. That is, you can “permute” sets of 
coefficients and get the same function.

 

Don’t think of starting with expert training as a way to avoid local optima. It 
is a way to start training with some good examples so that learning can start 
at a higher level. But then you should continue with self-play.

 

Backgammon was trained to expert level based on self-play games that were 
initially random. Google “Tesauro backgammon” and you should be able to find a 
paper.

 

I don’t know NEAT and HyperNEAT; I will look them up. Thank you for the 
reference.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Minjae Kim
Sent: Friday, February 24, 2017 3:39 AM
To: computer-go@computer-go.org
Subject: [Computer-go] dealing with multiple local optima

 

I've recently viewed the paper of AlphaGo, which has done gradient-based 
reinforcement learning to get stronger. The learning was successful enough to 
beat a human master, but in this case, supervised learning with a large 
database of master-level human games preceded the reinforcement learning. 
For a game as complex as go, one can expect that the search space for the 
policy function would not be smooth at all. So supposedly supervised learning 
was necessary to guide the policy function to a good starting point before 
reinforcement. Without such, applying reinforcement learning directly to a 
random policy can easily get the policy stuck at a bad local optimum. I could 
have a misunderstanding at this point; correct me if so, but to continue on: if 
it is hard to have "the good starting point" such as a trained policy from 
human expert game records, what is a way to devise one. I've had a look on NEAT 
and HyperNEAT, which are evolutionary methods. Do these evolutionary algorithms 
scale well on complex strategic decision processes and not just on simple 
linear decisions such as food gathering and danger avoidance? I

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
Neural networks always have a lot of local optima. Simply because they have a 
high degree of internal symmetry. That is, you can “permute” sets of 
coefficients and get the same function.

 

Don’t think of starting with expert training as a way to avoid local optima. It 
is a way to start training with some good examples so that learning can start 
at a higher level. But then you should continue with self-play.

 

Backgammon was trained to expert level based on self-play games that were 
initially random. Google “Tesauro backgammon” and you should be able to find a 
paper.

 

I don’t know NEAT and HyperNEAT; I will look them up. Thank you for the 
reference.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Minjae Kim
Sent: Friday, February 24, 2017 3:39 AM
To: computer-go@computer-go.org
Subject: [Computer-go] dealing with multiple local optima

 

I've recently viewed the paper of AlphaGo, which has done gradient-based 
reinforcement learning to get stronger. The learning was successful enough to 
beat a human master, but in this case, supervised learning with a large 
database of master-level human games preceded the reinforcement learning. 
For a game as complex as go, one can expect that the search space for the 
policy function would not be smooth at all. So supposedly supervised learning 
was necessary to guide the policy function to a good starting point before 
reinforcement. Without such, applying reinforcement learning directly to a 
random policy can easily get the policy stuck at a bad local optimum. I could 
have a misunderstanding at this point; correct me if so, but to continue on: if 
it is hard to have "the good starting point" such as a trained policy from 
human expert game records, what is a way to devise one. I've had a look on NEAT 
and HyperNEAT, which are evolutionary methods. Do these evolutionary algorithms 
scale well on complex strategic decision processes and not just on simple 
linear decisions such as food gathering and danger avoidance? In case not, what 
alternatives are known? Is there any success case of a chess, go, or any kind 
of complex strategic game playing algorithm, where it gained expert strength 
without domain knowledge such as expert game examples?

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Playout policy optimization

2017-02-12 Thread Brian Sheppard via Computer-go
If your database is composed of self-play games, then the likelihood 
maximization policy should gain strength rapidly, and there should be a way to 
have asymptotic optimality. (That is, the patterns alone will play a perfect 
game in the limit.)

 

Specifically: play self-play games using an asymptotically strong search 
policy, such as UCT, and then add all of the moves played by winning players to 
the training set. (Never train to the moves played by losers if your goal is 
asymptotic optimality.)
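
A sketch of that loop (selfplay_game and train_patterns are hypothetical hooks
standing in for the UCT engine and the likelihood-maximization trainer):

    def reinforcement_loop(selfplay_game, train_patterns, iterations):
        training_set = []
        for _ in range(iterations):
            # selfplay_game() -> ([(position, move, mover)], winner)
            moves, winner = selfplay_game()
            training_set.extend((pos, move) for pos, move, mover in moves
                                if mover == winner)  # winner's moves only
            train_patterns(training_set)
        return training_set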

 

The intuition about why this results in strong play: at any moment in self-play 
training, the search engine is always working on situations that it believes 
are most equal for both sides. But then it turns out that one side is better 
and wins, so that data is added to the training set. Reinforcement learning 
then biases the pattern base to make the winning pathway more likely in future 
searches. The winning alternatives will be easier to see in future self-play 
games. Inevitably the losing player will prefer a different path, and the 
learning process continues.

 

Regarding asymptotic optimality: obviously you need a pattern set that could 
potentially be asymptotically optimal. For example, a general decision tree, 
rather than any fixed field of view. If you have such a thing, then the proof 
for asymptotic optimality goes along these lines: first show that the tree 
search and pattern base combination will eventually memorize the winning moves 
at terminal nodes, and then apply the same reasoning recursively. 

 

Regarding trying this: I was just starting to code this when AlphaGo debuted, 
and then I stopped working on Go. So: no, I have not tried this.

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Álvaro Begué
Sent: Saturday, February 11, 2017 11:44 PM
To: computer-go 
Subject: [Computer-go] Playout policy optimization

 

Hi,

 

I remember an old paper by Rémi Coulom ("Computing Elo Ratings of Move Patterns 
in the Game of Go") where he computed "gammas" (exponentials of scores that you 
could feed to a softmax) for different move features, which he fit to best 
explain the move probabilities from real games.

 

Similarly, AlphaGo's paper describes how their rollout policy's weights are 
trained to maximize log likelihood of moves from a database.

 

However, there is no a priori reason why imitating the probabilities observed 
in reference games should be optimal for this particular purpose.

 

I thought about this for about an hour this morning, and this is what I came up 
with. You could make a database of positions with a label indicating the result 
(perhaps from real games, perhaps similarly to how AlphaGo trained their value 
network). Loop over the positions, run a few playouts and tweak the move 
probabilities by some sort of reinforcement learning, where you promote the 
move choices from playouts whose outcome matches the label, and you discourage 
the move choices from playouts whose outcome does not match the label.
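
One way to realize this, sketched under my own assumptions (playout is a
hypothetical hook that runs one softmax-policy playout and reports, for each
playout move, the candidate feature ids, the index chosen, and the candidate
probabilities): a REINFORCE update that promotes the chosen moves when the
outcome matches the label and demotes them otherwise:

    def reinforce_position(pos, label, weights, playout, lr=0.01):
        # weights: dict mapping feature id -> score, assumed pre-initialized.
        outcome, trajectory = playout(pos, weights)
        reward = 1.0 if outcome == label else -1.0
        for feats, chosen, probs in trajectory:
            for i, f in enumerate(feats):
                # Gradient of log softmax: (1 - p) for the chosen move, -p else.
                grad = (1.0 if i == chosen else 0.0) - probs[i]
                weights[f] += lr * reward * grad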

 

The point is that we would be pushing our playout policy to produce good 
estimates of the result of the game, which in the end is what playout policies 
are for.

 

Any thoughts? Did anyone actually try something like this?

 

Cheers,

Álvaro.

 

 

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-31 Thread Brian Sheppard via Computer-go

Rollout policy is a black art. Lots of trial and error. That said, you are on 
the right path if you use the published papers as the starting point for your 
own experiments.

Two board representation details are very important. If your board 
representation has these capabilities then you can represent an astounding 
amount of Go theory. In Pebbles, the following properties are incrementally 
updated as part of the make-move operation:

- the number of liberties for any string
- the index of the 3x3 neighborhood of every point (a minimal sketch follows).
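
A minimal sketch of the second property (my formulation, not Pebbles' actual
code): each point keeps a base-4 index of its 3x3 neighborhood, patched in
O(1) per neighbor whenever a point's content changes:

    SIZE = 19
    NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    # Cell contents: 0 empty, 1 black, 2 white (3 = edge, baked in at init).
    index3x3 = [[0] * SIZE for _ in range(SIZE)]

    def on_point_changed(x, y, old, new):
        # Called from make-move when point (x, y) changes from old to new.
        for k, (dx, dy) in enumerate(NEIGHBORS):
            nx, ny = x + dx, y + dy
            if 0 <= nx < SIZE and 0 <= ny < SIZE:
                # From (nx, ny)'s point of view, (x, y) sits in slot 7 - k.
                index3x3[nx][ny] += (new - old) * 4 ** (7 - k)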

Best,
Brian
 
-Original Message-
From: Roel van Engelen <ich.bun...@gmail.com>
To: computer-go <computer-go@computer-go.org>
Sent: Tue, Jan 31, 2017 12:55 pm
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?



@Gian-Carlo Pascutto, thanks! but identifying the shapes is not the problem=)

@Brian Sheppard, sorry, it was not my goal to make you start guessing any 
implementation details; somehow from your first explanation I got the 
impression that you were familiar with it, and I was wondering whether anybody 
wrote something about it.
Right now I am sticking to the paper as much as possible and reading, trying to 
understand how others improve their rollouts. I hope that I will be able to 
improve the rollouts at some point.


Roel



On 31 January 2017 at 17:21, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

If a "diamond" pattern is centered on a 5x5 square, then you have 13 points. 
The diagram below will give the idea.
 
__+__
_+++_
+++++
_+++_
__+__
 
 
At one bit per cell, this would be 8192 patterns, so this is why I am guessing 
that this is the pattern set. (You would set one bit for each captured stone, 
then look up in a table.)
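
A sketch of that guessed lookup (an assumption, per the above): the 13 offsets
with |dx| + |dy| <= 2 around a center point, one bit each, giving an index
into a table of 2**13 = 8192 entries:

    DIAMOND = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)
               if abs(dx) + abs(dy) <= 2]       # exactly 13 offsets

    def diamond_index(center, captured_points):
        cx, cy = center
        captured = set(captured_points)
        idx = 0
        for bit, (dx, dy) in enumerate(DIAMOND):
            if (cx + dx, cy + dy) in captured:
                idx |= 1 << bit
        return idx                              # 0 .. 8191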
 
I feel like I am engaging in a lot of guesswork regarding implementation 
details. I want to emphasize that the implementation details are not 
particularly important. The important point is that you can add this capability 
("reply on the vital point after the capture of a nakade group, provided that 
the opponent's surrounding stones have no additional eyes") to your rollout, 
and the implementation should take less than 1% of total time. Any 
implementation that achieves that goal will make a noticeable difference to 
strength.
 


 
-Original Message-
From: Roel van Engelen <ich.bun...@gmail.com>
To: computer-go <computer-go@computer-go.org>
Sent: Tue, Jan 31, 2017 10:42 am
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?



@Brian Sheppard
Thanks, that is a really useful explanation!
The way you state "and therefore an 8192-sized pattern set will identify all 
potential nakade" seems to indicate this is a known pattern set? Could I find 
some more information on it somewhere? Also, I was unable to find Pebbles; is it 
open source?

@Robert Jasiek
what definitions/papers/publications are you referring to?



Best regards, Roel



On 24 January 2017 at 12:57, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

There are two issues: one is the shape and the other is the policy that the 
search should follow.

Of course the vital point is a killing move whether or not a group was just 
captured. So it is possible to detect such shapes on the board and then play 
the vital point.

It is an entirely different thing to say when a rollout should look for such 
features. Rollouts are complicated; playing the "best" play does not always 
make your search engine stronger. Of course, there is a question of the time 
required for analysis. And then there is the question of "balance".

"Balance" means that the rollout should play "equally well" for both sides, 
with the goal that the terminal nodes of the rollout are accurate evaluations 
of the leaves of the tree. If you incorporate all moves that punish tactical 
errors then sometimes you can get unbalanced results because you do not have 
rules that prevent tactical errors from happening.

A common rule for nakade is to only check after a group is captured. The point 
is that the vital point is otherwise not motivated by any heuristics, whereas 
most other moves in capturing races are suggested by local patterns. My 
understanding of Alpha Go's policy is that they were only checking for nakade 
after captures.

The "center of a group of three" rule is a separate issue. My recollection is 
that this pattern should be checked after every move, and that was a discovery 
by the Mogo team.

Note that there are often subtle differences for your program compared to the 
published papers.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Tuesday, January 24, 2017 3:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?


On 23-01-17 20:10, Brian Sheppard via Computer

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-31 Thread Brian Sheppard via Computer-go

If a "diamond" pattern is centered on a 5x5 square, then you have 13 points. 
The diagram below will give the idea.

__+__
_+++_
+++++
_+++_
__+__


At one bit per cell, this would be 8192 patterns, so this is why I am guessing 
that this is the pattern set. (You would set one bit for each captured stone, 
then look up in a table.)

I feel like I am engaging in a lot of guesswork regarding implementation 
details. I want to emphasize that the implementation details are not 
particularly important. The important point is that you can add this capability 
("reply on the vital point after the capture of a nakade group, provided that 
the opponent's surrounding stones have no additional eyes") to your rollout, 
and the implementation should take less than 1% of total time. Any 
implementation that achieves that goal will make a noticeable difference to 
strength.


 
-Original Message-
From: Roel van Engelen <ich.bun...@gmail.com>
To: computer-go <computer-go@computer-go.org>
Sent: Tue, Jan 31, 2017 10:42 am
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?



@Brian Sheppard
Thanks, that is a really useful explanation!
The way you state "and therefore an 8192-sized pattern set will identify all 
potential nakade" seems to indicate this is a known pattern set? Could I find 
some more information on it somewhere? Also, I was unable to find Pebbles; is it 
open source?

@Robert Jasiek
what definitions/papers/publications are you referring to?



Best regards, Roel



On 24 January 2017 at 12:57, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

There are two issues: one is the shape and the other is the policy that the 
search should follow.

Of course the vital point is a killing move whether or not a group was just 
captured. So it is possible to detect such shapes on the board and then play 
the vital point.

It is an entirely different thing to say when a rollout should look for such 
features. Rollouts are complicated; playing the "best" play does not always 
make your search engine stronger. Of course, there is a question of the time 
required for analysis. And then there is the question of "balance".

"Balance" means that the rollout should play "equally well" for both sides, 
with the goal that the terminal nodes of the rollout are accurate evaluations 
of the leaves of the tree. If you incorporate all moves that punish tactical 
errors then sometimes you can get unbalanced results because you do not have 
rules that prevent tactical errors from happening.

A common rule for nakade is to only check after a group is captured. The point 
is that the vital point is otherwise not motivated by any heuristics, whereas 
most other moves in capturing races are suggested by local patterns. My 
understanding of Alpha Go's policy is that they were only checking for nakade 
after captures.

The "center of a group of three" rule is a separate issue. My recollection is 
that this pattern should be checked after every move, and that was a discovery 
by the Mogo team.

Note that there are often subtle differences for your program compared to the 
published papers.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Tuesday, January 24, 2017 3:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?


On 23-01-17 20:10, Brian Sheppard via Computer-go wrote:
> only captures of up to 9 stones can be nakade.

I don't really understand this.

http://senseis.xmp.net/?StraightThree

Both constructing this shape and playing the vital point are not captures. How 
can you detect the nakade (and play at a in time) if you only check captures?

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-24 Thread Brian Sheppard via Computer-go
There are two issues: one is the shape and the other is the policy that the 
search should follow.

Of course the vital point is a killing move whether or not a group was just 
captured. So it is possible to detect such shapes on the board and then play 
the vital point.

It is an entirely different thing to say when a rollout should look for such 
features. Rollouts are complicated; playing the "best" play does not always 
make your search engine stronger. Of course, there is a question of the time 
required for analysis. And then there is the question of "balance".

"Balance" means that the rollout should play "equally well" for both sides, 
with the goal that the terminal nodes of the rollout are accurate evaluations 
of the leaves of the tree. If you incorporate all moves that punish tactical 
errors then sometimes you can get unbalanced results because you do not have 
rules that prevent tactical errors from happening.

A common rule for nakade is to only check after a group is captured. The point 
is that the vital point is otherwise not motivated by any heuristics, whereas 
most other moves in capturing races are suggested by local patterns. My 
understanding of Alpha Go's policy is that they were only checking for nakade 
after captures.

The "center of a group of three" rule is a separate issue. My recollection is 
that this pattern should be checked after every move, and that was a discovery 
by the Mogo team.

Note that there are often subtle differences for your program compared to the 
published papers.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Tuesday, January 24, 2017 3:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?

On 23-01-17 20:10, Brian Sheppard via Computer-go wrote:
> only captures of up to 9 stones can be nakade.

I don't really understand this.

http://senseis.xmp.net/?StraightThree

Both constructing this shape and playing the vital point are not captures. How 
can you detect the nakade (and play at a in time) if you only check captures?

--
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-23 Thread Brian Sheppard via Computer-go

A capturing move has a potential nakade if the string that was removed is among 
a limited set of possibilities. Probably Alpha Go has a 13-point bounding 
region (e.g., the 13-point star) that it uses as a positional index, and 
therefore an 8192-sized pattern set will identify all potential nakade.

It is also easy and inexpensive to identify nakade by if/the/else rules. Very 
few moves in a Go game are captures, and only captures of up to 9 stones can be 
nakade. The captured stones must also fit in a 3x3 bounding box, so code can 
rapidly eliminate non-nakade situations.
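
A sketch of that cheap pre-filter:

    def could_be_nakade(captured_points):
        # Only captures of 1 to 9 stones that fit in a 3x3 bounding box
        # can be nakade; everything else is rejected immediately.
        if not (1 <= len(captured_points) <= 9):
            return False
        xs = [x for x, y in captured_points]
        ys = [y for x, y in captured_points]
        return max(xs) - min(xs) <= 2 and max(ys) - min(ys) <= 2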

The point is that using much less than 1% of your CPU time, you can identify 
potential nakade moves. And since you reach this stage very rarely, you can 
invest a lot of time trying to do precise analysis. In Pebbles, I concluded 
that it was worthwhile to also test that the surrounding strings likely did not 
have another potential eye. That is a pretty expensive calculation, but the 
code path was executed only when it was critical to success.

Nakade are enormously valuable because they are cheap to calculate and a 
playout would otherwise have close to 0% chance of following the right path.
 
-Original Message-
From: terry mcintyre via Computer-go 
To: computer-go 
Sent: Mon, Jan 23, 2017 1:14 pm
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?


I speculate: nakade involves creating a shape (such as three in a row or a 
bulky five) such that, if captured, it would only form one eye, given the 
proper placement. I can imagine a set of patterns which enumerate the 
possibilities. Some examples exist, however, which are quite complex. 


Sent from Yahoo Mail for iPad


On Monday, January 23, 2017, 11:45 AM, Roel van Engelen  
wrote:

I am trying to re-create the fast rollout policy as described by DeepMind but 
got stuck on the nakade patterns:
"Nakade, # of patterns 8192: Move matches a nakade pattern at captured stone"

The "at captured stone" confuses me; my first thought is "this is only 
computed if stones have been captured recently", but 
I don't think that is correct. How should I read it?


Since they say "# of patterns 8192" I imagine they found some way to hash them, 
just like the 3x3 and 12-point diamond shapes, but so far 
I have not found a way to do so. I found that other engines use heuristics to 
find nakade patterns, so my question is: does AlphaGo use patterns, and does 
somebody know how this works?


Thanks!
Roel

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-10 Thread Brian Sheppard
I was writing code along those lines when AlphaGo debuted. When it became clear 
that AlphaGo had succeeded, I ceased work.

 

So I don’t know whether this strategy will succeed, but the theoretical merits 
were good enough to encourage me.

 

Best of luck,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Bo 
Peng
Sent: Tuesday, January 10, 2017 5:25 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Training the value network (a possibly more efficient 
approach)

 

Hi everyone. It occurs to me there might be a more efficient method to train 
the value network directly (without using the policy network).

 

You are welcome to check my method: http://withablink.com/GoValueFunction.pdf

 

Let me know if there is any silly mistakes :)

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Some experiences with CNN trained on moves by thewinning player

2016-12-20 Thread Brian Sheppard
The downside to training on only the moves of the winner is that you "throw 
away" half of the data. But that isn't so bad, because you can always get as much data as 
you need.

The upside is that losing positions do not necessarily have a training signal. 
That is: there is no good move in a losing position. So you benefit by throwing 
away the portion of the moves that occur from the last game-losing move and 
onward.




-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hiroshi Yamashita
Sent: Tuesday, December 20, 2016 5:45 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by 
thewinning player

Hi Detlef,

> Interestingly the prediction rate of this moves was slightly higher 
> (without training, just taking the previously trained network) than

I also have started training with only using the moves by the winner  of each 
game, from GoGoD with 8 symmetries.
I also got accuracy up from 49.1% to 50.1.
(without training, Iteration 0, finetuning from big learning rate) My learning 
is not finished, so I can't say how strong it is though.

This result is interesting.
Are the loser's moves difficult for a DCNN? Or does a DCNN tend to learn only good moves?
I had thought that if a DCNN learns KGS 6k moves, it can reproduce 6k moves.
But is this not correct?

Thanks,
Hiroshi Yamashita

- Original Message -
From: "Detlef Schmicker" 
To: 
Sent: Sunday, December 11, 2016 7:38 PM
Subject: [Computer-go] Some experiences with CNN trained on moves by thewinning 
player


>I want to share some experience training my policy cnn:
> 
> As I wondered, why reinforcement learning was so helpful. I trained
> from the GoGoD database with only using the moves by the winner of
> each game.
> 
> Interestingly the prediction rate of this moves was slightly higher
> (without training, just taking the previously trained network) than
> taking into account the moves by both players (53% against 52%)
> 
> Training on winning player moves did not help a lot, I got a
> statistical significant improvement of about 20-30ELO.
> 
> So I still don't understand, why reinforcement should do around
> 100-200ELO :)
> 
> Detlef

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Brian Sheppard
IMO, training using only the moves of winners is obviously the practical choice.

Worst case: you "waste" half of your data. But that is actually not a downside 
provided that you have lots of data, and as your program strengthens you will 
avoid potential data-quality problems.

Asymptotically, you have to train using only self-play games (and only the 
moves of winners). Otherwise you cannot break through the limitations inherent 
in the quality of the training games.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Erik van der Werf
Sent: Sunday, December 11, 2016 6:51 AM
To: computer-go 
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the 
winning player

Detlef, I think your result makes sense. For games between near-equally strong 
players the winning player's moves will not be much better than the losing 
player's moves. The game is typically decided by subtle mistakes. Even if 
nearly all my moves are perfect, just one blunder can throw the game. Of course 
it depends on how you implement the details, but in principle reinforcement 
learning should be able to deal with such cases (i.e., prevent propagating 
irrelevant information all the way back to the starting position).

W.r.t. AG's reinforcement learning results, as far as I know, reinforcement 
learning was only indirectly helpful. The RL policy net performed worse then 
the SL policy net in the over-all system. Only by training the value net to 
predict expected outcomes from the
(over-fitted?) RL policy net they got some improvement (or so they claim). In 
essence this just means that RL may have been effective in creating a better 
training set for SL. Don't get me wrong, I love RL, but the reason why the RL 
part was hyped so much is in my opinion more related to marketing, politics and 
personal ego.

Erik


On Sun, Dec 11, 2016 at 11:38 AM, Detlef Schmicker  wrote:
> I want to share some experience training my policy cnn:
>
> As I wondered, why reinforcement learning was so helpful. I trained 
> from the GoGoD database with only using the moves by the winner of 
> each game.
>
> Interestingly the prediction rate of this moves was slightly higher 
> (without training, just taking the previously trained network) than 
> taking into account the moves by both players (53% against 52%)
>
> Training on winning player moves did not help a lot, I got a 
> statistical significant improvement of about 20-30ELO.
>
> So I still don't understand, why reinforcement should do around 
> 100-200ELO :)
>
> Detlef
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Converging to 57%

2016-08-23 Thread Brian Sheppard
The learning rate seems much too high. My experience (which is from backgammon 
rather than Go, among other caveats) is that you need tiny learning rates. 
Tiny, as in 1/TrainingSetSize.

 

Neural networks are dark magic. Be prepared to spend many weeks just trying to 
figure things out. You can bet that the Google & FB results are just their 
final runs.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Robert Waite
Sent: Tuesday, August 23, 2016 2:40 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Converging to 57%

 

I had subscribed to this mailing list back with MoGo... and remember probably 
arguing that the game of go wasn't going to be beaten for years and years. I am a 
little late to the game now but was curious if anyone here has worked with 
supervised learning networks like in the AlphaGo paper.

 

I have been training some networks along the lines of the AlphaGo paper and the 
DarkForest paper... and a couple of others... and am working with a single 660 
GTX. I know... laugh... but it's a fair benchmark and I'm being cheap for the moment.

 

Breaking 50% accuracy is quite challenging... I have attempted many 
permutations of learning algorithms... and can hit 40% accuracy in perhaps 4-12 
hours... depending on the parameters set. Some things I have tried are using 
default AlphaGo but with 128 filters, 32 minibatch size and .01 learning rate, 
changing the optimizer from vanilla SGD to Adam or RMSProp, and changing 
batching to match DarkForest style (making sure that a minibatch contains 
samples from game phases... for example beginning, middle and end-game). Pretty 
much everything seems to converge at a rate that will really stretch out. I am 
planning on picking a line and going with it for an extended training but was 
wondering if anyone has ever gotten close to the convergence rates implied by 
the DarkForest paper.

 

For comparison... the Google team had 50 GPUs, spent 3 weeks, and processed 
5440M state/action pairs. The FB team had 4 GPUs, spent 2 weeks and processed 
150M-200M state/action pairs. Both seemed to get to around 57% accuracy with 
their networks.

 

I have also been testing them against GnuGo as a baseline... and find that GnuGo 
can be beaten rather easily with very little network training... my eye is on 
Pachi... but I have to break 50% accuracy, I think, to even worry about that.

 

Have also played with the reinforcement learning phase... started with a 
learning rate of .01... which I think was too high. That does take quite a bit 
of time on my machine... so I didn't play too much with it yet.

 

Anyway... does anyone have any tales of how long it took to break 50%? What is 
the magic bullet that will help me converge there quickly?

 

Here is a long-view graph of various attempts:

 

https://drive.google.com/file/d/0B0BbrXeL6VyCUFRkMlNPbzV2QTQ/view

 

Red and Blue lines are from another member that ran 32 in a minibatch, .01 
learning rate and 128 filters in the middle layers vs. 192. They had 4 k40 gpus 
I believe. They also used 4 training pairs to 4 validation pairs... so 
I imagine that is why they had such a spread. There is a jump in the accuracy 
which was when learning rate was decreased to .001 I believe.

 

Closer shot:

 

https://drive.google.com/file/d/0B0BbrXeL6VyCRVUxUFJaWVJBdEE/view

 

Most stay between the lines... but looking at both graphs makes me wonder if 
any of the lines are approaching the convergence of DarkForest. My gut tells me 
they were onto something... and I am rather curious about the playing strength 
of the DarkForest SL network and the AG SL network.

 

Also... a picture of the network's view on a position... this one was trained 
to 41% accuracy and played itself greedily.

 

https://drive.google.com/file/d/0B0BbrXeL6VyCNkRmVDBIYldraWs/view

 

Oh... and another thing: AG used KGS amateur data... FB and my networks have 
been trained on pro games only. At one point I tested the 41% network in the 
image (trained on pro data) and a 44% network trained on amateur (KGS games) 
against GnuGo... and the pro data network soundly won... and the amateur 
network soundly lost... so I stuck with pro since. Not sure if the end result 
is the same... and kinda glad AG team used amateur as that removes the argument 
that it somehow learned Lee Sedol's style.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Crazystone on Steam

2016-05-27 Thread Brian Sheppard
Are you the Johannes Laire who works at Cimpress in WTR?

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Johannes Laire
Sent: Friday, May 27, 2016 4:45 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Crazystone on Steam

The 2013 version is quite usable in wine, and the screenshots of the new 
version look exactly the same so I would expect it to work as well. Sometimes 
the stones don't display or the board grid is drawn wrong (19x19 in a 9x9 
game), but minimizing and maximizing the window normally fixes this.

The UI has some severe usability issues that are a much bigger nuisance on any 
platform than the wine-specific glitches are on Linux.

On Fri, May 27, 2016 at 10:10 PM, daniel rich  wrote:
> Anyone have any idea how well crazy stone does in Wine? Or if 
> Mac/Linux support will happen?
>
>
>
> On Fri, May 27, 2016 at 1:23 PM, Marc Landgraf  wrote:
>>
>> Oh, this is great.
>> But one question: Will there be any updates/upgrades/patches for 
>> client and/or engine?
>> Having it on Steam would finally give a convenient way to do so. This 
>> would be a reason for me to buy it, hoping for it to change for the better.
>> Or will it just be a one release policy? Can't see myself spending 
>> 74euro then, considering the mixed reviews in various places.
>>
>> On 27.05.2016 21:09, "Jim O'Flaherty" wrote:
>>>
>>> Tysvm! The video on Stream is a very nice touch. And the first 
>>> review rocks!
>>>
>>> On Fri, May 27, 2016 at 1:36 PM, Andreas Persson 
>>> 
>>> wrote:

 Congrats on the Steam release of Crazystone Rémi! Hope it will sell 
 well. For people that haven't seen it here is a link 
 http://store.steampowered.com/app/479330/. Bit too pricey for me but 
 will pick it up at some point; enjoy the iOS app a lot.

 /Andreas

 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go
>>>
>>>
>>>
>>> ___
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> http://computer-go.org/mailman/listinfo/computer-go
>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] new challenge for Go programmers

2016-03-31 Thread Brian Sheppard
Yes, I recall that earlier episode. I would be happy to have a better 
relationship going forward.

I wrote some explanation generators for Scrabble and Chess AI, but these were 
much simpler systems that I could break apart. E.g., the Chess engine would 
play two moves out until a "quiet" position was reached, and then explain which 
evaluation parameters were affected. But even this was not easy, because 
sometimes after playing out for a bit the AI would "change its mind" by 
deciding that the other move was better after all. And evaluation factors were 
sometimes too numerous to list individually, even after pruning the factors 
that are too small to change the decision. And explanation by "comparing 
factors" assumed a linear evaluation function.

So I see huge challenges scaling that up.

Another approach that works well for tree-search games is just to let the human 
explore, in the style of an opening library. The engine simply responds to the 
moves that the human proposes, creating trees and assigning values to 
endpoints. MCTS programs show a "board control" visual, which might be enough 
to explain the positional evaluations. Basically: let the human absorb a lot of 
case studies.
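
(A sketch of that exploration loop, with hypothetical engine and UI hooks:)

    def explore(engine, position):
        """Opening-library style analysis: the human proposes moves, the
        engine replies and scores them, and the scored tree that
        accumulates is itself the explanation the human absorbs."""
        tree = {}
        while True:
            move = ask_human(position)   # hypothetical UI hook; None = done
            if move is None:
                return tree
            reply, winrate = engine.respond(position.play(move))  # hypothetical
            tree[(move, reply)] = winrate
            position = position.play(move).play(reply)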

There was a paper about postal-go where a human player used multiple programs 
in this way to construct a phenomenally strong player.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
"Ingo Althöfer"
Sent: Thursday, March 31, 2016 4:31 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] new challenge for Go programmers

Hello all,

"Brian Sheppard" <sheppar...@aol.com> wrote:
> ... This is out of line, IMO. Djhbrown asked a sensible question that 
> has valuable intentions. I would like to see responsible, thoughtful, 
> and constructive replies.

there is a natural explanation why some people here react allergically to 
Djhbrown's new contributions.

He had an active phase on the list already from early August 2015 to mid 
October. Things started interestingly, but somehow he went into a "strange 
loop", and in the end he was asked to stop posting.
http://computer-go.org/pipermail/computer-go/2015-October/008051.html

Perhaps all sides can help that things run better now.

Ingo.

PS. For my interest in computer-assisted human go visualisation questions on 
DCNNs are indeed interesting.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] new challenge for Go programmers

2016-03-30 Thread Brian Sheppard
This is out of line, IMO. Djhbrown asked a sensible question that has valuable 
intentions. I would like to see responsible, thoughtful, and constructive 
replies.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
uurtamo .
Sent: Wednesday, March 30, 2016 7:43 PM
To: computer-go 
Subject: Re: [Computer-go] new challenge for Go programmers

 

He cannot possibly write code

On Mar 30, 2016 4:38 PM, "Jim O'Flaherty" wrote:

I don't think djhbrown is a software engineer. And he seems to have the most 
fits. :)

 

On Wed, Mar 30, 2016 at 6:37 PM, uurtamo . wrote:

This is clearly the alphago final laugh; make an email list responder to send 
programmers into fits.

s.

On Mar 30, 2016 4:16 PM, "djhbrown ." wrote:

thank you very much Ben for sharing the inception work, which may well
open the door to a new avenue of AI research.  i am particularly
impressed by one pithy statement the authors make:

 "We must go deeper: Iterations"

i remember as an undergrad being impressed by the expressive power of
recursive functions, and later by the iterative quality of biological
growth and its fractal nature.

seeing animals in clouds is a bit like seeing geta in a go position;
so maybe one way to approach the problem of chatting with a CNN might
be to seek correlations between convolution weights and successive
stone configurations that turn up time and time again in games.

it may be that some kind of iterative procedure could do this, just as
my iterative procedure for circumscribing a group has a recursive
quality to its definition.

all you need then is to give such a correlation a name, and you will
be on the way to discovering a new language for talking about Go.


On 31/03/2016, Ben wrote:
> It would be very interesting to see what these go playing neural
> networks dream about [1].
> [1]
> http://googleresearch.blogspot.de/2015/06/inceptionism-going-deeper-into-neural.html
___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go



 



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] new challenge for Go programmers

2016-03-30 Thread Brian Sheppard
Trouble is that it is very difficult to put certain concepts into mathematics. 
For instance: “well, I tried to find parameters that did a better job of 
minimizing that error function, but eventually I lost patience.” :-)

 

Neural network parameters are not directly humanly understandable. They just 
happen to minimize an error function on a sample of training cases that might 
not even be representative. So you want to reason “around” the NN by 
interrogating it in some way, and trying to explain the results.

 

If anyone wants to pursue this research, I suggest several avenues.

 

First, you could differentiate the output with respect to each input to 
determine the aspects of the position that weigh on the result most heavily. 
Then, assuming that you can compare the scale of the inputs in some way, and 
assuming that the inputs are something that is understandable in the problem 
domain, maybe you can construct an explanation.
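
(A minimal sketch of this first avenue -- input-gradient saliency -- assuming a 
differentiable evaluator in PyTorch; all names are illustrative:)

    import torch

    def input_saliency(net, board_planes):
        """Differentiate the win estimate w.r.t. every input point; a large
        |gradient| marks the inputs the evaluation leans on most."""
        x = board_planes.clone().requires_grad_(True)  # (planes, 19, 19)
        value = net(x.unsqueeze(0)).squeeze()          # scalar win estimate
        value.backward()
        return x.grad.abs().sum(dim=0)                 # (19, 19) importance map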

 

Second, you could construct a set of hypothetical different similar positions, 
and see how those results differ. E.g., make a set of examples by adding a 
black stone and a white stone to each empty point on the board, or removing 
each existing stone from the board, and then evaluate the NN on those cases, 
then do decision-tree induction to discover patterns.
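
(And a sketch of the second avenue with scikit-learn; position.with_stone, 
position.features, and the feature names are hypothetical helpers:)

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    def explain_by_perturbation(net, position, empty_points, feature_names):
        """Score one-stone perturbations of the position, then induce a
        small decision tree over them to surface the governing pattern."""
        X, y = [], []
        for pt in empty_points:
            for color in ("black", "white"):
                p = position.with_stone(pt, color)  # hypothetical helper
                X.append(p.features())              # hypothetical encoding
                y.append(net(p) > 0.5)              # still winning?
        tree = DecisionTreeClassifier(max_depth=3).fit(np.array(X), y)
        return export_text(tree, feature_names=feature_names)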

 

Third, in theory decision trees are just as powerful as NN (in that both are 
asymptotically optimal learning systems), and it happens that decision trees 
provide humanly understandable explanations for reasoning. So maybe you can 
replace the NN with DT and have equally impressive performance, and pick up 
human understandability as a side-effect.

 

Actually, if anyone is interested in making computer go programs that do not 
require GPUs and super-computers, then looking into DTs is advisable.

 

Best,

Brian

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Jim 
O'Flaherty
Sent: Wednesday, March 30, 2016 4:24 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] new challenge for Go programmers

 

I agree, "cannot" is too strong. But, values close enough to "extremely 
difficult as to be unlikely" is why I used it.

On Mar 30, 2016 11:12 AM, "Robert Jasiek" wrote:

On 30.03.2016 16:58, Jim O'Flaherty wrote:

My own study says that we cannot top down include "English explanations" of
how the ANNs (Artificial Neural Networks, of which DCNN is just one type)
arrive at conclusions.


"cannot" is a strong word. I would use it only if it were proven mathematically.

-- 
robert jasiek
___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Nice graph

2016-03-25 Thread Brian Sheppard
Hmm, seems to imply a 1000-Elo edge over human 9p. But such a player would 
literally never lose a game to a human.

I take this as an example of the difficulty of extrapolating based on games 
against computers. (and the slide seems to have a disclaimer to this effect, if 
I am reading the text on the left hand side correctly). Computers have 
structural similarities that exaggerates strength differences in head-to-head 
comparisons. But against opponents that have different playing characteristics, 
such as human 9p, then the strength distribution is different.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Rémi Coulom
Sent: Friday, March 25, 2016 8:23 PM
To: computer-go 
Subject: [Computer-go] Nice graph

AlphaGo improved 3-4 stones:

http://i.imgur.com/ylQTErVl.jpg

(Found in the Life in 19x19 forum)

Rémi
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-15 Thread Brian Sheppard
I remember another way to estimate peak capabilities. This one was used in the 
late 1970's by Ken Thompson / Belle, but maybe was invented earlier: play 
handicap matches at different time controls, and fit a quadratic curve to the 
data. If your program is strong enough, then additional computational effort 
shows diminishing returns. If your program is too weak, then the performance 
scales linearly, and you don't see diminishing returns to higher computational 
effort.

It is also possible to have a peak that falls short of perfection if your 
program's algorithms are not asymptotically optimal, but that was not a problem 
for chess programs. Belle was 1100 rating points lower than current programs, 
and it already showed diminishing returns. I recall that the method gave 
reasonable results. E.g., I remember estimating that Belle would need at least 
depth 13 searches to contend for championship caliber.
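
(The curve fit itself is trivial; a sketch with invented numbers, not Belle's data:)

    import numpy as np

    # (search depth, measured Elo) from handicap matches -- illustrative only
    depth = np.array([4, 5, 6, 7, 8, 9])
    elo = np.array([1300, 1500, 1660, 1790, 1890, 1960])

    a, b, c = np.polyfit(depth, elo, 2)  # Elo ~ a*d^2 + b*d + c, with a < 0
    peak_depth = -b / (2 * a)            # vertex: where extra effort stops paying
    peak_elo = c - b * b / (4 * a)
    print(f"estimated peak: ~{peak_elo:.0f} Elo near depth {peak_depth:.1f}")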

-Original Message-
From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Tuesday, March 15, 2016 6:20 PM
To: 'computer-go@computer-go.org' <computer-go@computer-go.org>
Subject: RE: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

>So a small error in the opening or middle game can literally be worth anything 
>by the time the game ends.

These are my estimates: human pros >= 24 points lost, and >= 5 game-losing 
errors against other human pros.

I relayed my experience of a comparable experiment with chess, and how those 
estimates proved to be loose lower bounds, and it would not surprise me if 
these estimates are also far from perfection.

I urge you to construct a model that you feel embodies important 
characteristics, and get back to us with your estimates.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Monday, March 14, 2016 5:15 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

> You can also look at the score differentials. If the game is perfect, 
> then the game ends up on 7 points every time. If players made one 
> small error (2 points), then the distribution would be much narrower 
> than it is.

I was with you up to this point, but players (computer and strong
humans) play to win, not to maximize the score. So a small error in the opening 
or middle game can literally be worth anything by the time the game ends.

> I am certain that there is a vast gap between humans and perfect play. 
> Maybe 24 points? Four stones??

24pts would be about two stones (if each handicap stone is twice komi, e.g. see 
http://senseis.xmp.net/?topic=2464).

The old saying is that a pro would need to take 3 to 4 stones against god (i.e. 
perfect play).

Darren
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-15 Thread Brian Sheppard
>So a small error in the opening or middle game can literally be worth anything 
>by the time the game ends.

These are my estimates: human pros >= 24 points lost, and >= 5 game-losing 
errors against other human pros.

I relayed my experience of a comparable experiment with chess, and how those 
estimates proved to be loose lower bounds, and it would not surprise me if 
these estimates are also far from perfection.

I urge you to construct a model that you feel embodies important 
characteristics, and get back to us with your estimates.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Monday, March 14, 2016 5:15 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

> You can also look at the score differentials. If the game is perfect, 
> then the game ends up on 7 points every time. If players made one 
> small error (2 points), then the distribution would be much narrower 
> than it is.

I was with you up to this point, but players (computer and strong
humans) play to win, not to maximize the score. So a small error in the opening 
or middle game can literally be worth anything by the time the game ends.

> I am certain that there is a vast gap between humans and perfect play. 
> Maybe 24 points? Four stones??

24pts would be about two stones (if each handicap stone is twice komi, e.g. see 
http://senseis.xmp.net/?topic=2464).

The old saying is that a pro would need to take 3 to 4 stones against god (i.e. 
perfect play).

Darren
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-14 Thread Brian Sheppard
Maybe 10 years ago I extrapolated from the results of title matches to arrive 
at an estimate of perfect play. My memory is hazy about the conclusions 
(sorry), so I would love if someone is curious enough to build models and draw 
conclusions and share them.

The basis of the estimate is that Black wins 53% (or so) at 6.5 point komi, but 
White would win with 7.5 points. So we estimate 7 as the value of the game, 
and Black should always win with perfect play. How often do the players have to 
make game-changing errors in order for Black to win only 53%?

You can be sure, for example, that the players make more than one "losing move" 
per game. If they didn't, then Black would win an enormous percentage. Even 2, 
3 or 4 errors (win/loss affecting!) per game per player don't match the 
observed outcome at all. I recall that you need 5 or 6 game-losing errors per 
player per game. (And note that game-losing errors have to alternate between 
the players, so they are always equal if Black wins and off-by-one if White 
wins.)
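
(One way to make the model concrete -- a sketch in which the error process is my 
own assumption, not a claim about real games. Each game-losing error flips the 
theoretical winner, and Black, the theoretical winner at a 6.5 komi, wins iff the 
number of flips is even:)

    import random

    def black_win_rate(mean_flips_per_game, games=100_000, chances=250):
        """Simulate games as sequences of winner-flipping errors and report
        how often Black (the theoretical winner) still wins."""
        p = mean_flips_per_game / chances
        wins = 0
        for _ in range(games):
            flips = sum(random.random() < p for _ in range(chances))
            wins += (flips % 2 == 0)
        return wins / games

    for mean_flips in (1, 2, 4, 8, 12):
        print(mean_flips, round(black_win_rate(mean_flips), 3))

Different assumptions about how errors cluster give different counts, which is 
rather the point of building the model yourself.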

You can also look at the score differentials. If the game is perfect, then the 
game ends up on 7 points every time. If players made one small error (2 
points), then the distribution would be much narrower than it is. I recall 
estimating ~12 small errors per player per game.

I am certain that there is a vast gap between humans and perfect play. Maybe 24 
points? Four stones?? I don't know how that translates to Elo terms, but 600 
points could be about right.

I do want to acknowledge Robert's point that we don't know what the limit is. 
It is very hard to extrapolate from this distance. For comparison, I did the 
same calculation for chess in 1981 after the Korchnoi-Karpov rematch. I recall 
an estimate that perfect chess play was >= 3000 Elo, and it turns out that is a 
significant underestimate.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Robert Jasiek
Sent: Monday, March 14, 2016 10:11 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

On 14.03.2016 09:33, Petri Pitkanen wrote:
> And being 600 elo points above best human you are pretty close  to 
> best possible play.

You do not have any evidence for such a limit.

--
robert jasiek
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Game 4: a rare insight

2016-03-13 Thread Brian Sheppard
I have the impression that the value network is used to initialize the score of 
a node to, say, 70% out of N trials. Then the MCTS is trial N+1, N+2, etc. 
Still asymptotically optimal, but if the value network is accurate then you 
have a big acceleration in accuracy because the scores start from a higher 
point instead of wobbling unstably for a while.

But then I didn't follow the back-up policy. That is, if you do a search, and 
the color to move loses, but the evaluation at the leaf node was winning by 
70%, then what update is made to this node?

In MCTS, you only use the W/L value. But if you are using a value network then 
it seems inconsistent not to use the 70% in some way.

So I also have to go back to read the paper again...
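
(If I am reading the paper right, the answer is that it mixes the two at the leaf 
rather than choosing one: V = (1 - lambda) * value_net + lambda * rollout, with 
lambda = 0.5, and it backs the mixture up. A sketch, with hypothetical node objects:)

    LAMBDA = 0.5  # the paper's mixing weight between value net and rollout

    def backup(path, value_net_eval, rollout_won):
        """AlphaGo-style backup: every node on the search path accumulates
        the mixed leaf evaluation, not the bare win/loss."""
        leaf = (1 - LAMBDA) * value_net_eval + LAMBDA * (1.0 if rollout_won else 0.0)
        for node in path:
            node.visits += 1
            node.value_sum += leaf  # Q(node) = node.value_sum / node.visits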

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Sunday, March 13, 2016 2:20 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Game 4: a rare insight

> You are right, but from fig 2 of the paper can see, that mc and value 
> network should give similar results:
> 
> 70% value network should be comparable to 60-65% MC winrate from this 
> paper, usually expected around move 140 in a "human expert game" (what 
> ever this means in this figure :)

Thanks, that makes sense.

>>> Assuming that is an MCTS estimate of winning probability, that 70% 
>>> sounds high (i.e. very confident);
> 
>> That tweet says 70% is from value net, not from MCTS estimate.

I guess I need to go back and read the AlphaGo papers again; I thought it was 
still an MCTS program at the top-level, and the value network was being used to 
influence the moves the tree explores. But from this, and some other comments 
I've seen, I have the feeling I've misunderstood.

Darren




___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Game 4: a rare insight

2016-03-13 Thread Brian Sheppard
I would not place too much confidence in the observers. Even though they are 
pro players, they don't have the same degree of concentration as the game's 
participants, and they have an obligation to speak about the game on a regular 
basis, which further erodes their analytic skill. Figuring out where a player 
went wrong will take much longer than it takes to play the game.

If you really want good insight, better to use several programs to play out 
games based on moves that they collectively propose, with the additional 
ability to take moves from a human. Over a period of days you can get quite 
strong analysis, even if the players are not that strong. There was a great 
paper describing the method from a few years ago.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Marc Landgraf
Sent: Sunday, March 13, 2016 5:26 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Game 4: a rare insight

The most interesting part is that at this point many pro commentators 
found a lot of aji, but did not find a "solution" for Lee Sedol that broke 
AlphaGo's position. So the question remains: Did AlphaGo find a hole in its own 
position and try to dodge it? Was it too strong for its own good? Or was it 
a misevaluation due to the immense amount of aji, which would not have resulted 
in harm if played properly?


2016-03-13 9:54 GMT+01:00 Darren Cook :
> From Demis Hassabis:
>   When I say 'thought' and 'realisation' I just mean the output of
>   #AlphaGo value net. It was around 70% at move 79 and then dived
>   on move 87
>
>   https://twitter.com/demishassabis/status/708934687926804482
>
> Assuming that is an MCTS estimate of winning probability, that 70% 
> sounds high (i.e. very confident); when I was doing the computer-human 
> team experiments, on 9x9, with three MCTS programs, I generally knew 
> I'd found a winning move when the percentages moved from the 48-52% 
> range to, say, 55%.
>
> I really hope they reveal the win estimates for each move of the 5 
> games. It will especially be interesting to then compare that to the 
> other leading MCTS programs.
>
> Darren
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Congratulations to AlphaGo

2016-03-12 Thread Brian Sheppard
Play on KGS. Pros can be anonymous, and test themselves and AlphaGo at the same 
time. :-)

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Jim 
O'Flaherty
Sent: Saturday, March 12, 2016 4:56 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Congratulations to AlphaGo

 

I think you're correct, Thomas. The challenge is going to be getting ANY 
professional to be the one who "takes handicap stones" for the first time in 
years. The possible "shame" of doing so is what will make it messy.

 

Once someone does take that step, though, I think it is only a matter of time 
before the rating of humans will be made a subordinate rating relative to the 
"objective" rating of the AIs, AlphaGo just being the first. And that has its 
own psychological challenges as the Go world has many decades of handling ELOs 
and rankings for humans. So, I don't think change in this area is going to be 
welcomed anytime soon.

 

 

On Sat, Mar 12, 2016 at 3:03 PM, Thomas Wolf wrote:

Chris,

Prompted from a discussion on the computer go email list
(and my last email today) :

We currently have no measure at all to judge how safe a win or loss is at any
stage of the game. The currently applied measure of counting territory only
applies if both players try to maximize territory, but not if at least one
player maximizes the chance of winning. (I know, it was mentioned already.)

But really, comments like "Player ... is catching up" are pretty meaningless
and are only valid if one explicitly mentions points or territory, and adds
that this has nothing to do with winning probabilities.

Even the winning percentages provided by the computer programs themselves are
no real indicator of winning chances. They are tools to find the best move
and are a statistical measure over several playout sequences based on self-play,
not based on play against that opponent. Equally, winning percentages worked
out by other computer programs are also not adequate (although they are at
least unbiased) because they also do not use the real opponents to play out
the sequences.

The only valid strength indicator would be to gradually increase handicap
stones or komi for the previous loser in a series of games.

Regards,
Thomas



On Sat, 12 Mar 2016, Sorin Gherman wrote:


It is fascinating indeed to try to find how much stronger AlphaGo is
compared to top humans.

The fact that it is hard to find the reason why Lee Sedol lost, and
that AlphaGo seems to get mysteriously ahead without a clear reason, tells
me that the difference is definitely more than one stone handicap, maybe 2+
stones, as crazy as it may sound given Lee Sedol's level.

I am pretty sure he will not accept to play with handicap against AlphaGo
though. Maybe "younger wolves" like Ke Jie will though and we will find out.

On Mar 12, 2016 11:03 AM, "Thomas Wolf" wrote:
  A suggestion for possible future games to be arranged between
  AlphaGo and
  strong players:

  Whoever lost shall be given 1 stone or the equivalent of 1/2
  stone handicap in the
  next game. Games should continue until each side has won at
  least once. This
  way AlphaGo will be forced to demonstrate its full strength over
  a whole game
  which we are all too curious to see.

  Thomas

  On Sat, 12 Mar 2016, Aja Huang wrote:

Thanks all. AlphaGo has won the match against Lee
Sedol. But there are still 2 games to play.
Aja

On Sat, Mar 12, 2016 at 5:49 PM, Jim O'Flaherty
 > 
wrote:
  It was exhilarating to witness history being
made! Awesome!

On Sat, Mar 12, 2016 at 2:17 AM, David Fotland
 > wrote:

  Tremendous games by AlphaGo.  Congratulations!

   

  From: Computer-go
[mailto:computer-go-boun...@computer-go.org 
 ] On
Behalf Of Lukas van de Wiel
  Sent: Saturday, March 12, 2016 12:14 AM
  To: computer-go@computer-go.org 
 
  Subject: [Computer-go] Congratulations to
AlphaGo

 

Whoa, what a fight! Well fought, and well won!

Lukas


___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go




Re: [Computer-go] Congratulations to AlphaGo

2016-03-12 Thread Brian Sheppard
Playing on KGS will give a good sense. Players have sustained ranks between 9 
and 10 dan there, but never over 10 dan. If AlphaGo can sustain over 10 dan, 
then there is clear separation.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Thomas Wolf
Sent: Saturday, March 12, 2016 4:04 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Congratulations to AlphaGo

Chris,

Prompted from a discussion on the computer go email list (and my last email 
today) :

We currently have no measure at all to judge how safe a win or loss is at any 
stage of the game. The currently applied measure of counting territory only 
applies if both players try to maximize territory, but not if at least one 
player maximizes the chance of winning. (I know, it was mentioned already.)

But really, comments like "Player ... is catching up" are pretty meaningless 
and are only valid if one explicitly mentions points or territory, and adds 
that this has nothing to do with winning probabilities.

Even the winning percentages provided by the computer programs themselves are 
no real indicator of winning chances. They are tools to find the best move 
and are a statistical measure over several playout sequences based on self-play, 
not based on play against that opponent. Equally, winning percentages worked 
out by other computer programs are also not adequate (although they are at 
least unbiased) because they also do not use the real opponents to play out the 
sequences.

The only valid strength indicator would be to gradually increase handicap 
stones or komi for the previous loser in a series of games.

Regards,
Thomas

On Sat, 12 Mar 2016, Sorin Gherman wrote:

> 
> It is fascinating indeed to try to find how much stronger AlphaGo is 
> compared to top humans.
> 
> The fact that it is hard to find the reason why Lee Sedol lost, 
> and that AlphaGo seems to get mysteriously ahead without a clear 
> reason, tells me that the difference is definitely more than one stone 
> handicap, maybe 2+ stones, as crazy as it may sound given Lee Sedol's level.
> 
> I am pretty sure he will not accept to play with handicap against 
> AlphaGo though. Maybe "younger wolves" like Ke Jie will though and we will 
> find out.
> 
> On Mar 12, 2016 11:03 AM, "Thomas Wolf"  wrote:
>   A suggestion for possible future games to be arranged between
>   AlphaGo and
>   strong players:
>
>   Whoever lost shall be given 1 stone or the equivalent of 1/2
>   stone handicap in the
>   next game. Games should continue until each side has won at
>   least once. This
>   way AlphaGo will be forced to demonstrate its full strength over
>   a whole game
>   which we are all too curious to see.
>
>   Thomas
>
>   On Sat, 12 Mar 2016, Aja Huang wrote:
>
> Thanks all. AlphaGo has won the match against Lee
> Sedol. But there are still 2 games to play.
> Aja
>
> On Sat, Mar 12, 2016 at 5:49 PM, Jim O'Flaherty
>  wrote:
>   It was exhilarating to witness history being
> made! Awesome!
>
> On Sat, Mar 12, 2016 at 2:17 AM, David Fotland
>  wrote:
>
>   Tremendous games by AlphaGo.  Congratulations!
>
>
>
>   From: Computer-go
> [mailto:computer-go-boun...@computer-go.org] On
> Behalf Of Lukas van de Wiel
>   Sent: Saturday, March 12, 2016 12:14 AM
>   To: computer-go@computer-go.org
>   Subject: [Computer-go] Congratulations to
> AlphaGo
>
>  
>
> Whoa, what a fight! Well fought, and well won!
>
> Lukas
> 
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
> 
> 
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
> 
> 
> 
>
>   ___
>   Computer-go mailing list
>   Computer-go@computer-go.org
>   http://computer-go.org/mailman/listinfo/computer-go
> 
> 
>

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-11 Thread Brian Sheppard
Actually chess software is much, much better. I recall that today's software 
running on 1998 hardware beats 1998 software running on today's hardware.

It was very soon after 1998 that ordinary PCs could play on a par with world 
champions.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
?? ???
Sent: Friday, March 11, 2016 7:18 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

I think that a desktop computer's calculating power tends to reach the 
necessary level sooner than the algorithm can be optimized to use the power 
available nowadays. For example, I believe that chess programs run well on a 
desktop not because of a new, better algorithm, but because Deep Blue's 11.38 
GFLOPS became available on desktops from about 2006, in only ten years. So I 
think the speculation that DeepMind will change the objective to a more 
advanced task is right :)

Dmitry

11.03.2016, 14:28, "Darren Cook" :
>>>  global, more long-term planning. A rumour so far suggests to have 
>>> used the
>>>  time for more learning, but I'd be surprised if this should have sufficed.
>>
>>  My personal hypothesis so far is that it might - the REINFORCE might
>>  scale amazingly well and just continuous application of it...
>
> Agreed. What they have built is a training data generator, that can 
> churn out 9-dan level moves, 24 hours a day. Over the years I've had 
> to throw away so many promising ideas because they came down to 
> needing a 9-dan pro to, say, do the tedious job of ranking all legal 
> moves in each test position.
>
> What I'm hoping Deep Mind will do next is study how to maintain the 
> same level but using less hardware, until they can shrink it down to 
> run on, say, a high-end desktop computer. The knowledge gained 
> obviously has a clear financial benefit just in running costs, and 
> computer-go is a nice objective domain to measure progress. (But the 
> cynic in me suspects they'll just move to the next bright and shiny AI 
> problem.)
>
> Darren
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Finding Alphago's Weaknesses

2016-03-10 Thread Brian Sheppard
Amen to Don Dailey. He would be so proud.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Jim 
O'Flaherty
Sent: Thursday, March 10, 2016 6:49 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Finding Alphago's Weaknesses

 

I think we are going to see a case of human professionals having drifted into a 
local optima in at least three areas:

  1) Early training around openings is so ingrained in how they acquire their 
skill (optimal neural plasticity window), there has been very little new 
discovery around the first third of the game with almost all professionals 
relying fairly strongly on the already time tested josekis - AIs can use 
reading to explore closer and closer to the start of a game using less and less 
automatic patterns thereby confusing humans who have memorized those patterns

  2) The middle of the board is so high in reading complexity, there has been 
little investment to figure out how to leverage it until mid game as it has 
been more expedient to focus on the corners and edges - AIs are going to get 
faster, better and deeper at reading through and then intentionally generating 
complexity

  3) As a human's cognition tires, the probability of reading errors rises 
non-linearly which increases the probability of late mid-game and end game 
errors - I think AlphaGo has already progressed pretty far in the end game

 

I'd consider these the three primary general vulnerabilities of human Go 
playing against any future AI. Given AlphaGo's training mechanism is actually 
search space exploration engine, it will slowly but surely explore and converge 
on more optimal play in all three of these domains significantly faster and 
cheaper than directly investing in and expending human cognition efforts; i.e. 
professionals studying to do the knowledge expansion and verification. And I 
think they will continue to optimize AlphaGo's algorithms in both human and 
self-play.

 

The window where humans are going to be able to fish out a win against AlphaGo 
is rapidly closing...and it may have already closed.

 

 

Other thoughts...

 

I think we are going to see some fascinating "discoveries" of errors in 
existing very old josekis. At some point, I think we will even see one or two 
new ones discovered by AIs or by humans exploiting AIs. We are going to see 
some new center oriented fighting based on vastly more complex move sequences 
which will result in an substantial increase in resignations at the 
professional level against each other. 

 

Said a slightly different way... even if Lee Sedol figures out how to get a lead 
in a game during the opening, AlphaGo will just continue to elevate the board 
complexity with each move until it is just beyond its opponent's reading 
ability while staying well within its own reading ability constraints. IOW, 
complexity is now an AI's advantage. AlphaGo doesn't have the human frailty of 
being nervous of a possible future mistake and then altering its priorities by 
pushing to win by a higher margin as a buffer against said future reading 
complexity mistake. IOW, AlphaGo is regulated by its algorithm prioritizing 
the probability of a win over the amount of margin by which it could 
buffer for a win. What seems like a weakness is turning out to be one hell of a 
strength.

 

Add to that the fact that this kind of behavior by AlphaGo denies its opponent 
critical information about the state of the game which is more readily 
available in human-vs-human games; i.e. AlphaGo will continue to converge 
towards calmer and calmer play in the face of chaotic play. And the calmer it 
becomes, the less "weakness surface area" it will have for a human to exploit 
in attempting a win.

 

This is utterly fascinating to get to witness. I sure wish Don Dailey was still 
here to get to enjoy this.

 

 

On Thu, Mar 10, 2016 at 2:52 PM, Thomas Wolf wrote:

With at most 2x361 or so different end scores but 10^{XXX} possible different
games, there are at least in the opening many moves with the same optimal
outcome. The difference between these moves is not the guaranteed score (they
are all optimal) but the difficulty to play optimal after that move. And the
human and computer strengths are rather different.



On Thu, 10 Mar 2016, uurtamo . wrote:


If that's the case, then they should be able to give opinions on best first 
moves, best first two move combos, and best first three move combos. That'd
be interesting to see. (Top 10 or so of each).

s.

On Mar 10, 2016 12:37 PM, "Sorin Gherman" wrote:

   From reading their article, AlphaGo makes no distinction at all between 
start, middle and endgame.
  Just like any other position, the empty (or almost empty, or almost full) 
board is just another game position in which it chooses (one of)
  the most promising moves in order to maximize her chance of 

Re: [Computer-go] Neural Net move prediction

2016-02-04 Thread Brian Sheppard
The method is not likely to work, since the goal of NN training is to reduce the 
residual error to a random function of the NN inputs. If the NN succeeds at 
this, then there will be no signal to train against. If the NN fails, then it 
could be because the NN is not large enough, or because there were aspects of 
training that were set up incorrectly.

 

What this comes down to: my experience is that you would be better off making a 
single network that is twice as large.

 

This feedback applies strictly to the exact method being proposed. It is very 
likely that there are other ways to use multiple networks that would be 
significant improvements over using a single network for the whole space.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Huazuo Gao
Sent: Thursday, February 4, 2016 7:51 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Neural Net move prediction

 

Sounds like some kind of boosting, I suppose?

 

On Thu, Feb 4, 2016 at 7:52 PM Marc Landgraf wrote:

Hi,

lately a friend and me wondered about the following idea.

Let's assume you have a reasonably strong move prediction DCNN. What
happens if you now train a second net on the same database?
When training the first net, you tried to maximize the judgement value
of the expert move. But for the second net you now try to maximize the
maximum of the judgement of both nets. This means, that the second net
does not profit from finding moves the first net can easily find, but
instead will try to fill in the weaknesses of the first net.
In practical application, the easy static usage would be to first
expand the top 2 candidates of the first net, then mix in the top
candidate of the second net, then again the next 2 candidates from the
first net, etc.
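
(In code, that static interleaving would be something like this -- a sketch,
names illustrative:)

    def interleave_candidates(net1_ranked, net2_ranked, k=9):
        """Expand 2 candidates from the first net, then 1 from the second
        (trained to cover the first net's blind spots), and repeat."""
        seen, order = set(), []
        i = j = 0
        while len(order) < k and (i < len(net1_ranked) or j < len(net2_ranked)):
            for move in net1_ranked[i:i + 2] + net2_ranked[j:j + 1]:
                if move not in seen:
                    seen.add(move)
                    order.append(move)
            i += 2
            j += 1
        return order[:k]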

What do you guys think about that?

Cheers, Marc
___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search

2016-02-01 Thread Brian Sheppard
You play until neither player wishes to make a move. The players are willing to 
move on any point that is not self-atari, and they are willing to make 
self-atari plays if capture would result in a Nakade 
(http://senseis.xmp.net/?Nakade)

 

This correctly plays seki. 
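
(A sketch of that move filter; the board-query helpers are hypothetical:)

    def playout_move_ok(board, point, color):
        """A playout ends when neither player has a move passing this test.
        Refusing self-atari except for nakade-making captures is what lets
        seki stand: neither side will fill the shared liberties."""
        if not board.is_legal(point, color):
            return False
        if not board.is_self_atari(point, color):
            return True
        # allow self-atari only when capturing it would leave a dead
        # (nakade) shape inside the opponent's eye space
        return board.capture_would_be_nakade(point, color)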

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Brian Cloutier
Sent: Monday, February 1, 2016 5:02 PM
To: computer-go 
Subject: Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks 
and Tree Search

 

> One thing that is not explained is how to determine that a game is over

You'll find that very little of the literature explicitly covers this. When I 
asked this question I had to search a lot of papers on MCTS which mentioned 
"terminal states" before finding one which defined them.

Let me see if I can find the actual paper, but they defined it as a position 
where there are no more legal moves. You're right though, that ignores sekis, 
which makes me think I'm remembering wrong.

On Mon, Feb 1, 2016, 13:45 Álvaro Begué  > wrote:

 

Aja,

 

I read the paper with great interest. [Insert appropriate praises here.]

 

I am trying to understand the part where you use reinforcement learning to 
improve upon the CNN trained by imitating humans. One thing that is not 
explained is how to determine that a game is over, particularly when a player 
is simply a CNN that has a probability distribution as its output. Do you play 
until every point is either a suicide or looks like an eye? Do you do anything 
to make sure you don't play in a seki?

 

I am sure you are a busy man these days, so please answer only when you have 
time.

 

Thanks!

Álvaro.

 

 

 

On Wed, Jan 27, 2016 at 1:46 PM, Aja Huang wrote:

Hi all,

 

We are very excited to announce that our Go program, AlphaGo, has beaten a 
professional player for the first time. AlphaGo beat the European champion Fan 
Hui by 5 games to 0. We hope you enjoy our paper, published in Nature today. 
The paper and all the games can be found here: 

 

http://www.deepmind.com/alpha-go.html

 

AlphaGo will be competing in a match against Lee Sedol in Seoul, this March, to 
see whether we finally have a Go program that is stronger than any human! 

 

Aja

 

PS I am very busy preparing AlphaGo for the match, so apologies in advance if I 
cannot respond to all questions about AlphaGo.


___
Computer-go mailing list
Computer-go@computer-go.org  
http://computer-go.org/mailman/listinfo/computer-go

 


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Game Over

2016-01-28 Thread Brian Sheppard
I would just mention that Maven/Scrabble truncated rollouts are not comparable 
to Go/MCTS truncated rollouts. An evaluation function in Scrabble is readily at 
hand, because scoring points is hugely correlated with winning. There is no 
evaluation function for Go that is readily at hand.

There have been some efforts at whole-board evaluation in Go. Maybe NeuroGo was 
the earliest really cool demonstration. But I never saw anything that gave me 
confidence that the approach could work when embedded in an MCTS framework. I 
am blown away.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
"Ingo Althöfer"
Sent: Thursday, January 28, 2016 1:00 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Game Over

Hello Anders,

thanks for the summary on the smartgo site.

> ... the truncated rollouts mentioned in the paper are still unclear to me.

The greatest expert on these rollouts might be Richard Lorentz.
He applied them successfully to his bots in the games Amazons (not to be mixed up
with the online bookshop), Havannah and Breakthrough. Richard found that in many
applications a truncation level of 4 moves seems to work quite well.
There is a paper by him on this topic in the proceedings of the conference
Advances in Computer Games 2015 (in Leiden , NL), published by Springer
Lecture Notes in Computer Science (LNCS).

Truncated rollouts were applied very early by Brian
Sheppard in his bot for Scrabble (MAVEN).

Ingo. 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] alternative for cgos

2015-01-13 Thread Brian Sheppard
Alas, the limitations of a hosted environment would make participating in 
Baduk.io a non-starter for me. It is much better to build a selection of 
standard bots and run tests on my own hardware. Which is what I have done in 
the absence of CGOS.

 

IMO, running using one’s own hardware/OS/language is a requirement.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Chris LaRose
Sent: Tuesday, January 13, 2015 9:05 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] alternative for cgos

 

Thanks everyone for all the feedback! Sorry I hijacked the thread!

 

There are definitely some big pros and cons to a hosted, containerized 
environment for Go bots. I've replied to some comments below:

 

Won’t hosting limit your usability?  With cgos I can build and immediately test 
on cgos on my development machine.  With your service, how do I get my new 
executable to run?

 

You're absolutely right. The workflow for developers will have some additional 
overhead. The nice thing, though, is that Docker is very good at providing a 
way for an application to run virtually anywhere in a consistent fashion. That 
is, if you can get your bot running in a Docker container on your own machine, 
then it's pretty likely that it'll run on Baduk.io's Docker host without a 
hitch.

 

 If my engine uses a GPU or is a multinode cluster, how does that run on your 
docker service?

 

Right now, I don't have plans for supporting such bots. But as Urban Hafner 
replied, it puts everyone on a level playing field--all bots have access to the 
same exact resources. As some have noted, maybe that means Baduk.io wouldn't 
serve the same purpose as CGOS. That's fine by me.

 

At first, I want Baduk.io to be a place where players, especially beginners can 
get a few games in against bots. Because the bots are hosted, I'm not limited 
in the number of simultaneous games I can play against one bot. As far as I 
understand it, if GNU Go is playing against five different people on KGS, that 
means there are five different instances of GNU Go running on people's 
machines somewhere in the world. If someone else wants to play, they can't. Also, 
bots that aren't currently playing a game can be terminated and won't consume 
resources. Starting and stopping containers is so fast that I can afford to 
only start bots immediately after its opponent plays, request a single move, 
and terminate it.

 

Anyway, Baduk.io is largely only a proof-of-concept right now--I'll have to 
post a message to the list when I get a little further working on it.

 

On Wed, Jan 14, 2015 at 8:33 AM, Joshua Shriver jshri...@gmail.com wrote:

If you can send me a binary that would be greatly appreciated.  Trying
to build some anchors now.

-Josh


On Tue, Jan 13, 2015 at 5:11 PM, Hideki Kato hideki_ka...@ybb.ne.jp wrote:
 Shriver,

 I'll be able to run FatMan1, the anchor for 9x9, on my site, if
 necessary.  Or, it's also possible to send you its binary and password
 so that you can run it on your site.

 Hideki

 Joshua Shriver wrote:
I'll try and get CGOS back online before this weekend.  Technically it
should be running now, but there were several issues.  In order to use
it now  you take the cgos  client for your architecture and you have
to specify cgos.computergo.org manually since
the binaries are
hardcoded to the old boardspace address.  I've had some troubles
unbundling the binaries and rebuilding the executables with TCL.

Rankings are also an issue as well which is something I'll have to
change in the code to make sure anchors and their predefined ELO
ratings are used.

Will try and make a better write-up on how to connect.  Hopefully this
weekend I should have the anchors running 24/7 and some people can try
connecting.

I'll flush the old data and in terms of games and we'll start with a
fresh slate.  Though all the data even from years past are still
available though for historic reasons and for anyone who wants the
SGF's.

-Josh

On Fri, Jan 9, 2015 at 5:47 AM, folkert folk...@vanheusden.com wrote:
 Hi,

 I have the feeling that cgos won't come back in even the distant future
 so I was wondering if there are any alternatives?
 E.g. a server that constantly lets go engines play against each other
 and then determines an elo rating for them.


 Folkert van Heusden

 --
 Afraid of irssi? Scared of bitchx? Does xchat gives you bad shivers?
 In all these cases take a look at http://www.vanheusden.com/fi/ maybe
 even try it or use it for all your day-to-day IRC conversations!
 ---
 Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
 

Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional Neural Networks

2014-12-23 Thread Brian Sheppard
A 3-layer network (input, hidden, output) is sufficient to be a universal 
function approximator, so from a theoretical perspective only 3 layers are 
necessary. But the gap between theoretical and practical is quite large.

 

The CNN architecture builds in translation invariance and sensitivity to local  
phenomena. That gives it a big advantage (on a per distinct weight basis) over 
the flat architecture.

 

Additionally, the input layers of these CNN designs are very important. 
Compared to a stone-by-stone representation, the use of high level concepts in 
the input layer allows the network to devote its capacity to advanced concepts 
rather than synthesizing basic concepts.
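
For example (a sketch of the kind of input layer I mean, in numpy; the
liberty-count encoding is illustrative, not taken from any paper):

    import numpy as np

    def input_planes(stones, liberties, to_play, size=19):
        """stones: {(x, y): color}; liberties: {(x, y): int}. Six binary
        planes: own/opponent stones with 1, 2, or 3+ liberties, so the
        net need not re-derive "atari" from a raw stone map."""
        planes = np.zeros((6, size, size), dtype=np.float32)
        for (x, y), color in stones.items():
            base = 0 if color == to_play else 3
            planes[base + min(liberties[(x, y)], 3) - 1, x, y] = 1.0
        return planes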

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
uurtamo .
Sent: Tuesday, December 23, 2014 7:34 PM
To: computer-go
Subject: Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional 
Neural Networks

 

I thought that any layers beyond 3 were irrelevant. Probably I'm subsuming your 
nn into what I learned about nn's and didn't read anything carefully enough.

Can you help correct me?

s.

On Dec 23, 2014 6:47 AM, Aja Huang ajahu...@google.com wrote:

On Mon, Dec 22, 2014 at 12:38 PM, David Silver davidstarsil...@gmail.com wrote:

we'll evaluate against Fuego 1.1 and post the results. 

 

I quickly tested our 12-layer CNN against Fuego 1.1 with 5 secs and 10 secs per 
move, 2 threads. The hardware is Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz.

 

5 secs per move 12-layer CNN scored 55.8% ±5.4

10 secs per move   12-layer CNN scored 32.9% ±3.8

 

Fuego1.1 is clearly much weaker than the latest svn release. And interestingly, 
the network is actually as strong as Fuego 1.1 with 5 secs per move. 

 

Since Clark and Storkey's CNN scored 12% against Fuego 1.1 running on a weaker 
hardware, our best CNN is about 220 to 310 Elo stronger which is consistent to 
the results against GnuGo.

 

Regards,

Aja

 


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play Go

2014-12-19 Thread Brian Sheppard
This is Aya's move predictor(W) vs GNU Go(B).
http://eidogo.com/#3BNw8ez0R
I think previous move effect is too strong.


This is a good example of why a good playout engine will not necessarily play 
well. The purpose of the playout policy is to *balance* errors. Following your 
opponent's last play is very helpful in that regard.


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hiroshi Yamashita
Sent: Friday, December 19, 2014 10:25 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play 
Go

Hi,

 The predictor is white.  It really does just play shapes, but 
 evidently it's plenty enough sometimes or against weaker opponents.

I saw some games, and my impressions are

DCNN sees board widely.
Without previous move info, DCNN can answer opponent move.
It knows well corner life and death shape.
It does not understand two eyes, and ladder.
Tactical fight is weak.
Ko fight is weak. Ko threat is simply a good pattern move.
It does not understand semeai which has many libs, like 4 vs 5.
 So it will not help to generate semeai moves.


This is Aya's move predictor(W) vs GNU Go(B).
http://eidogo.com/#3BNw8ez0R
I think previous move effect is too strong.

Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play Go

2014-12-16 Thread Brian Sheppard
My impression is that each feature gets a single weight in Crazy-stone. The 
team-of-features aspect arises because a single point can match several 
patterns, so you need a model to assign credit when tuning. The paper that I 
remember used fixed receptive fields to define the patterns. (E.g., from 3x3 
through 5x10, or some such.) The easiest way to match those is to use a hash 
function.
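
(A sketch of the hash matching, Zobrist-style; board.at is a hypothetical query
returning "black", "white", "empty", or "edge":)

    import random

    random.seed(0)
    # one fixed 64-bit code per (offset, contents) pair in a 5x5 field
    CODES = {(dx, dy, c): random.getrandbits(64)
             for dx in range(-2, 3) for dy in range(-2, 3)
             for c in ("black", "white", "empty", "edge")}

    def pattern_hash(board, x, y):
        """Hash the 5x5 receptive field around (x, y); the result indexes
        a table mapping each learned pattern to its single weight."""
        h = 0
        for dx in range(-2, 3):
            for dy in range(-2, 3):
                h ^= CODES[(dx, dy, board.at(x + dx, y + dy))]
        return h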

 

My impression is that both NN and DT are capable of asymptotically learning the 
entire game. (Also true if you use fixed receptive fields.) They should be 
equally powerful, though they differ in terms of the degree of understanding 
required by the programmer.

 

IMO, MCTS should always be the outermost loop in the system. MCTS provides 
asymptotic optimality guarantees under remarkably general conditions.

 

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
René van de Veerdonk
Sent: Monday, December 15, 2014 11:47 PM
To: computer-go
Subject: Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play 
Go

 

Correct me if I am wrong, but I believe that the CrazyStone approach of 
team-of-features can be cast in terms of a shallow neural network. The inputs 
are matched patterns on the board and other local information on atari, 
previous moves, ko situation, and such. Remi alluded as much on this list 
sometime after his paper got published.

 

Without having studied the Deep Learning papers in detail, it seems that these 
are the types of smart features that could be learned by a Deep Neural Net in 
the first few layers if the input is restricted to just the raw board, but 
could equally well be provided as domain specific features in order to improve 
computational efficiency (and perhaps enforce correctness).

 

These approaches may not be all that far apart, other than the depth of the net 
and the domain specific knowledge used directly. Remi recently mentioned that 
the number of patterns in more recent versions of CrazyStone also number in the 
millions. I think the prediction rates for these two approaches are also pretty 
close. Compare the Deep Learning result to the other recent study of a German 
group quoted in the Deep Learning paper.

 

The bigger questions to me are related to engine architecture. Are you going to 
use this as an input to a search? Or are you going to use this directly to 
play? If the former, it had better be reasonably fast. The latter approach can 
be far slower, but requires the predictions to be of much higher quality. And 
the biggest question, how can you make these two approaches interact 
efficiently?

 

René

 

On Mon, Dec 15, 2014 at 8:00 PM, Brian Sheppard sheppar...@aol.com wrote:

Is it really such a burden?

 

Well, I have to place my bets on some things and not on others.

 

It seems to me that the costs of a NN must be higher than a system based on 
decision trees. The convolutional NN has a very large parameter space if my 
reading of the paper is correct. Specifically, it can represent all patterns 
translated and rotated and matched against all points in parallel.

 

To me, that seems like a good way to mimic the visual cortex, but an 
inefficient way to match patterns on a Go board.

 

So my bet is on decision trees. The published research on NN will help me to 
understand the opportunities much better, and I have every expectation that the 
performance of decision trees should be >= NN in every way. E.g., faster, more 
accurate, easier and faster to tune. 

 

I recognize that my approach is full of challenges. E.g., a NN would 
automatically infer soft qualities such as "wall" and "influence" that would 
have to be provided to a DT as inputs. No free lunch, but again, this is about 
betting that one technology is (overall) more suitable than another.

 

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Stefan Kaitschick
Sent: Monday, December 15, 2014 6:37 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play 
Go

 

 


Finally, I am not a fan of NN in the MCTS architecture. The NN architecture 
imposes a high CPU burden (e.g., compared to decision trees), and this study 
didn't produce such a breakthrough in accuracy that I would give away 
performance.

 

 Is it really such a burden? Supporting the move generator with the NN result 
high up in the decision tree can't be that expensive.


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play Go

2014-12-16 Thread Brian Sheppard
I am very much looking forward to your paper. I do see the CNN research as a 
new idea that has great potential.

 

 Linear models like what MM is using are ... far less powerful than CNN

 

My mathematical objection is that this cannot be. The no free lunch theorem 
applies, and besides, both representations are provably asymptotically perfect 
given sufficiently accurate training data.

 

I am skeptical about the practical claims, but maybe your paper will sway me.

 

Best,

Brian.

 

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Aja 
Huang
Sent: Tuesday, December 16, 2014 10:23 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play 
Go

 

Hi Brian,

 

I understand your points, but deep convolutional neural networks are very 
powerful in the sense that they can represent very complicated functions and 
are still trainable. Linear models like what MM is using are much simpler and 
easier to train, but far less powerful than CNN. 

 

In our paper we have presented the theoretical background and very good results 
using CNN on Go. Once approved I'll post the paper and I hope you will enjoy 
reading it.

 

Regards,

Aja

 

On Tue, Dec 16, 2014 at 2:03 PM, Brian Sheppard sheppar...@aol.com wrote:

My impression is that each feature gets a single weight in Crazy Stone. The 
team-of-features aspect arises because a single point can match several 
patterns, so you need a model to assign credit when tuning. The paper that I 
remember used fixed receptive fields to define the patterns. (E.g., from 3x3 
through 5x10, or some such.) The easiest way to match those is to use a hash 
function.

 

My impression is that both NN and DT are capable of asymptotically learning the 
entire game. (Also true if you use fixed receptive fields.) They should be 
equally powerful, though they differ in terms of the degree of understanding 
required by the programmer.

 

IMO, MCTS should always be the outermost loop in the system. MCTS provides 
asymptotic optimality guarantees under remarkably general conditions.

 

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
René van de Veerdonk
Sent: Monday, December 15, 2014 11:47 PM
To: computer-go
Subject: Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play 
Go

 

Correct me if I am wrong, but I believe that the CrazyStone approach of 
team-of-features can be cast in terms of a shallow neural network. The inputs 
are matched patterns on the board and other local information on atari, 
previous moves, ko situation, and such. Remi alluded as much on this list 
sometime after his paper got published.

 

Without having studied the Deep Learning papers in detail, it seems that these 
are the types of smart features that could be learned by a Deep Neural Net in 
the first few layers if the input is restricted to just the raw board, but 
could equally well be provided as domain specific features in order to improve 
computational efficiency (and perhaps enforce correctness).

 

These approaches may not be all that far apart, other than the depth of the net 
and the domain specific knowledge used directly. Remi recently mentioned that 
the number of patterns in more recent versions of CrazyStone also number in the 
millions. I think the prediction rates for these two approaches are also pretty 
close. Compare the Deep Learning result to the other recent study of a German 
group quoted in the Deep Learning paper.

 

The bigger questions to me are related to engine architecture. Are you going to 
use this as an input to a search? Or are you going to use this directly to 
play? If the former, it had better be reasonably fast. The latter approach can 
be far slower, but requires the predictions to be of much higher quality. And 
the biggest question, how can you make these two approaches interact 
efficiently?

 

René

 

On Mon, Dec 15, 2014 at 8:00 PM, Brian Sheppard sheppar...@aol.com wrote:

Is it really such a burden?

 

Well, I have to place my bets on some things and not on others.

 

It seems to me that the costs of a NN must be higher than a system based on 
decision trees. The convolutional NN has a very large parameter space if my 
reading of the paper is correct. Specifically, it can represent all patterns 
translated and rotated and matched against all points in parallel.

 

To me, that seems like a good way to mimic the visual cortex, but an 
inefficient way to match patterns on a Go board.

 

So my bet is on decision trees. The published research on NN will help me to 
understand the opportunities much better, and I have every expectation that the 
performance of decision trees should be >= NN in every way. E.g., faster, more 
accurate, easier and faster to tune. 

 

I recognize that my approach is full of challenges. E.g., a NN would 
automatically infer soft qualities such as "wall" and "influence" that would 
have to be provided to a DT as inputs. No free lunch, but again, this is about 
betting that one technology is (overall) more suitable than another.

Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play Go

2014-12-15 Thread Brian Sheppard
So I read this kind of study with some skepticism. My guess is that the 
large-scale pattern systems in use by leading programs are already pretty 
good for their purpose (i.e., progressive bias).

Rampant personal and unverified speculation follows...
--
I found the 14% win rate against Fuego potentially impressive, but I didn't 
get a sense for Fuego's effort level in those games (e.g., Elo ratings). MCTS 
actually doesn't play particularly well until a sufficient investment is made.

I am not sure what to think about winning 91% against GNU Go. GNU Go makes a 
lot of moves based on rules, so it "replays" games. I found that many of 
Pebbles' games against GNU Go were move-for-move repeats of previous games, so 
much so that I had to randomize Pebbles if I wanted to use GNU Go for 
calibrating parameters. My guess is that the 91% rate is substantially 
attributable to the way that GNU Go's rule set interacts with the positions 
that the NN likes. This could be a measure of strength, but not necessarily.

My impression is that the progressive bias systems in MCTS programs should 
prioritize interesting moves to search. A good progressive bias system might 
have a high move prediction rate, but that will be a side-effect of tuning it 
for its intended purpose. E.g., it is important to search a lot of bad moves 
because you need to know for *certain* that they are bad.

Similarly, it is my impression that a good progressive bias engine does not 
have to be a strong stand-alone player. Strong play implies a degree of 
tactical pattern matching that is not necessary when the system's 
responsibility is to prioritize moves. Tactical accuracy should be delegated to 
the search engine. The theoretical prediction is that MCTS search will be 
(asymptotically) a better judge of tactical results.

Finally, I am not a fan of NN in the MCTS architecture. The NN architecture 
imposes a high CPU burden (e.g., compared to decision trees), and this study 
didn't produce such a breakthrough in accuracy that I would give away 
performance.


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Hiroshi Yamashita
Sent: Monday, December 15, 2014 10:27 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play 
Go

I tested Aya's move prediction strength.

Prediction rate is 38.8% (first choice is same as pro's move)

against GNU Go 3.7.10 Level 10

        winrate   games
19x19    0.059     607
13x13    0.170     545
9x9      0.141    1020

I was a bit surprised there is no big difference from 9x9 to 19x19.
But 6% in 19x19 is still low; the paper's 91% winrate is really high.
It must understand whole-board life and death.
I'd like to see their sgf files vs GNU Go and Fuego.

Aya's prediction includes a local string-capture search,
so this result may include some look-ahead.
Aya uses this move prediction in UCT; the playouts use another prediction.
Aya gets 50% against GNU Go with 300 playouts in 19x19 and 100 in 9x9.

Regards,
Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play Go

2014-12-15 Thread Brian Sheppard
Is it really such a burden?

 

Well, I have to place my bets on some things and not on others.

 

It seems to me that the costs of a NN must be higher than a system based on 
decision trees. The convolutional NN has a very large parameter space if my 
reading of the paper is correct. Specifically, it can represent all patterns 
translated and rotated and matched against all points in parallel.

 

To me, that seems like a good way to mimic the visual cortex, but an 
inefficient way to match patterns on a Go board.

 

So my bet is on decision trees. The published research on NN will help me to 
understand the opportunities much better, and I have every expectation that the 
performance of decision trees should be >= NN in every way. E.g., faster, more 
accurate, easier and faster to tune. 

 

I recognize that my approach is full of challenges. E.g., a NN would 
automatically infer soft qualities such as "wall" and "influence" that would 
have to be provided to a DT as inputs. No free lunch, but again, this is about 
betting that one technology is (overall) more suitable than another.

 

 

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Stefan Kaitschick
Sent: Monday, December 15, 2014 6:37 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play 
Go

 

 


Finally, I am not a fan of NN in the MCTS architecture. The NN architecture 
imposes a high CPU burden (e.g., compared to decision trees), and this study 
didn't produce such a breakthrough in accuracy that I would give away 
performance.

 

 Is it really such a burden? Supporting the move generator with the NN result 
high up in the decision tree can't be that expensive.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[computer-go] Paper about Mogo's Opening Strategy

2010-01-15 Thread Brian Sheppard
I recommend the paper
http://hal.inria.fr/docs/00/36/97/83/PDF/ouvertures9x9.pdf by the Mogo team,
which describes how to use a grid to compute Mogo's opening book using
coevolution.

Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Optimizing combinations of flags

2009-11-24 Thread Brian Sheppard
What do you do when you add a new parameter? Do you retain your existing
'history', considering each game to have been played with the value of
the new parameter set to zero?

Yes, exactly.

If you have 50 parameters already, doesn't adding a new parameter create
a rather large number of new parameter sets, most of which there will
never be time to investigate?

Yes. So the new parameter will drift to its optimal value against the
existing parameter values.

But here's the thing: declining epsilon greedy policies are zero regret
starting from any initial state. So if the setting of the new parameter
affects old parameter settings, then the established parameters will start
to move as well.

If the objective function is a convex function of the parameters (which is
generally the case, based on the curves that I have seen) then the whole
system will drift to a global optimum.

I have been using UCB and UCT to tune engine settings, but I don't think
these methods work well to tune more than a handful of parameters at the
same time.

Such systems have trouble because their exploration is a *deterministic*
function of the sequence of wins. That is, all parameters will lock into the
same set of feedback. If you use UCT, then you have to optimize
*combinations* of parameters, which is unwieldy.

Declining epsilon greedy is a randomized exploration strategy, but still
zero-regret. Now the same sequence of wins/losses can be used to tune all
parameters concurrently.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Optimizing combinations of flags

2009-11-23 Thread Brian Sheppard
From what I understand, for each parameter you take with some high
probability the best so far, and with some lower probability the least
tried one. This requires (manually) enumerating all parameters on some
integer scale, if I got it correctly.

Yes, for each parameter you make a range of values that you believe brackets
the optimal value. Then the optimizer takes over and (eventually)
concentrates attention on the value that works best.

For instance, if your program includes a parameter for the number of RAVE
wins to initially assign to an atari move, then you might test 0, 1, 2, ...
10. (Note that parameters do not have to be integers. Just an integer number
of choices.)

Note that a value of 0 sometimes turns off a feature, in effect. When a
parameter is turned off I will put the entire code within an if-statement,
to approximate the CPU gain from omitting the code entirely:

if (theParameter != 0) {
   evaluation += theParameter * someFunction();
}

For the domain of Go, I typically find a broad range of near-optimal values.
In fact, there is often insignificant variation over the entire range:
perhaps a 1% difference. But I do see 10% gains for certain features, and I
never know what will be important.

I have tested systems that automatically create new values by interpolation.
But I haven't seen any benefit.


Given a set of parameters each with a range of values, and a set of game
results between different parameter sets (or a score average for each
set versus a reference), how to determine:

a) the most likely optimal set of parameters
b) the most interesting parameter set to test next

The system above will choose the most interesting parameter set to test
next. Now, whether the policy is zero-regret is more complicated.

In the case of a single parameter, the method is zero regret. If you want to
automatically introduce new values to test, then you can make a zero-regret
policy if the objective function is bounded and continuous.

But the more interesting case is where you have multiple parameters. In that
case, I think that my policy will find the global optimum if the objective
function is convex.

Convexity is not guaranteed in our domain, but is often the case. That is,
the benefit is small if a parameter is either too small or too large, and
the optimum is somewhere in between.

I could be wrong about some of these statements, because I haven't paused to
try to prove anything from a mathematical perspective. Instead, I am
convinced by some experimental observation of some pragmatic outcomes:

  1) does the optimizer tweak parameters towards optimality? (yes)

  2) does the optimizer work without needing my attention? (yes)

  3) if I change the program in other ways, will it adapt? (yes)

  4) if I code a bad feature, will the optimizer stop using it? (yes)

All together, this is not bad value.

I should mention one tweak, relating to the fact that your program might
sometimes face strong players, and other times weak players. I keep a record
of average rating for the opponents faced by each parameter value. Then when
I explore I can choose either the least-taken parameter value or the
parameter value whose average opponent would be most affected by playing
this game. This policy keeps average opponent ratings in line.

Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Optimizing combinations of flags

2009-11-23 Thread Brian Sheppard
Your system seems very interesting, but it seems to me that you assume
that the parameters are independent.
What happens if, for example, two parameters work well when only one of
them is active, and badly if the two are active at the same time?

I think that I am assuming only that the objective function is convex. The
parameters in Go programs are always inter-dependent.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Optimizing combinations of flags

2009-11-20 Thread Brian Sheppard
On that topic, I have around 17 flags that enable or disable features in my
pure-playout bot, and I want to search for the best combination of them.
I know this is almost a dream, but does anyone know the best way to
approximate this.

Pebbles randomly chooses (using a zero asymptotic regret strategy) parameter
values before each game. I literally never manually tune parameters for
Pebbles. I just set up experiments, and put them on a parameter for my
optimizer to manage. After a few hundred games it is clear what the right
choices are.

My favorite exploration strategy is a declining epsilon greedy strategy. I
like it because it is a randomized strategy, so I can optimize all
parameters concurrently using a single stream of games. In this strategy,
one chooses a random number p, and then selects the strategy with highest
historical mean if p > epsilon, and the strategy taken least often
otherwise. If epsilon = C*log(n)/n, where n is the number of experiments so
far, then the strategy has zero asymptotic regret.
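
Here is a minimal sketch of that scheme for one parameter; all names are
illustrative, not Pebbles' actual code:

#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Arm { int games = 0; int wins = 0; };    // one candidate value

// Pick a value index for one parameter before the next game.
int ChooseValue(const std::vector<Arm>& arms, int n /* games so far */) {
    const double C = 2.0;                       // exploration constant
    double epsilon = (n < 2) ? 1.0 : std::min(1.0, C * std::log(n) / n);
    double p = std::rand() / (RAND_MAX + 1.0);
    int best = 0, leastTried = 0;
    for (int i = 1; i < (int)arms.size(); ++i) {
        if (arms[i].games < arms[leastTried].games) leastTried = i;
        double mean = arms[i].games ? (double)arms[i].wins / arms[i].games : 0.0;
        double bestMean =
            arms[best].games ? (double)arms[best].wins / arms[best].games : 0.0;
        if (mean > bestMean) best = i;
    }
    return (p > epsilon) ? best : leastTried;   // exploit, else explore
}

Because the choice is randomized, each parameter can run its own instance of
this selector, and all of them are tuned concurrently from the same stream
of game results.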

Pebbles has about 50 parameters right now. Most are pretty settled because
they have thousands of games of experience. All are potentially modified
before each game.

Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] A Tenure Track Position in Japan

2009-11-05 Thread Brian Sheppard
Man, I just don't live in the right location. First Paris, and now Japan.
Can't a position open in, say, Boston? :-(

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] MPI vs Thread-safe

2009-11-01 Thread Brian Sheppard
The PRNG (pseudo random number generator) can cause the super-linear 
speed-up, for example.

Really? The serial simulation argument seems to apply.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] MPI vs Thread-safe

2009-10-30 Thread Brian Sheppard
I personally just use root parallelization in Pachi

I think this answers my question; each core in Pachi independently explores
a tree, and the master thread merges the data. This is even though you have
shared memory on your machine.


Have you read the Parallel Monte-Carlo Tree Search paper?

Yes, both the Mango team's work and Bruno Bouzy's pioneering work.


It sums up the possibilities nicely.

Well, I have doubts. I am not saying anything negative regarding their work,
which I am confident is an accurate representation of the experimental data.
But there are many possibilities for parallelization that are not covered by
those papers.

For instance, the dichotomy between "global" and "local" mutexes is
artificial. You can take any number N of physical locks and multiplex locks
for tree nodes onto those N. This provides a range of intermediate
algorithms that do not have as much contention as "global", and not as much
data as "local".
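
A sketch of that multiplexing idea, assuming nodes are addressed by pointer;
illustrative only:

#include <cstdint>
#include <mutex>

struct Node;                                  // your MCTS node type

constexpr int kNumLocks = 64;                 // N physical locks
std::mutex g_nodeLocks[kNumLocks];

std::mutex& LockFor(const Node* node) {
    // Hash the node address onto one of the N mutexes; dropping the low
    // bits avoids mapping all aligned nodes onto a few locks.
    uintptr_t h = reinterpret_cast<uintptr_t>(node) >> 6;
    return g_nodeLocks[h % kNumLocks];
}

// Usage: { std::lock_guard<std::mutex> guard(LockFor(node)); /* update */ }

Two nodes occasionally share a mutex, but with N large enough the contention
approaches the "local" scheme at a fraction of the memory.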

You also have possibilities for largely lockless thread safety. For
instance, the Intel architecture has atomic memory access instructions that
allow lockless data safety. Remi Coulom published a paper on this subject.
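
In the same spirit, a sketch of a lockless statistics update using atomic
instructions (std::atomic here, though the original work predates C++11);
illustrative only:

#include <atomic>
#include <cstdint>

struct NodeStats {
    std::atomic<uint32_t> visits{0};
    std::atomic<uint32_t> wins{0};
};

void RecordResult(NodeStats& stats, bool won) {
    // fetch_add compiles to a single atomic increment (LOCK XADD on x86),
    // so concurrent playout threads never block each other here.
    stats.visits.fetch_add(1, std::memory_order_relaxed);
    if (won) stats.wins.fetch_add(1, std::memory_order_relaxed);
}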

I am not even sure that leaf parallelization is really ruled out. For
example, if GPU-based implementation works, then leaf parallelization must
be reconsidered.


 similar to what you probably mean by MPI, though without resyncs

MPI is "message passing interface", an industry-standard API for supporting
high-performance computing.

It is used for sharing data among multiple processes (that is, no shared
memory). I recall that MoGo published that their massively scalable strategy
is based on this approach.


confirming the paper's finding that the play improvement is
larger than multiplying number of sequential playouts appropriately.

Well, this is another reason why I doubt the results from the Mango paper.
Parallelization *cannot* provide super-linear speed-up. The existence of
super-linear speed-up proves that the underlying single-threaded program is
flawed.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Reservations about Parallelization Strategy

2009-10-30 Thread Brian Sheppard
While re-reading the parallelization papers, I tried to formulate why I
thought that they couldn't be right. The issue with Mango reporting a
super-linear speed-up was an obvious red flag, but that doesn't mean that
their conclusions were wrong. It just means that Mango's exploration policy
needs tweaking.

I have three objections.

First, root parallelization does not fully exploit the RAM attached to each
compute node. If I have N compute nodes then I would like to search a tree
that is N times as large. But in root parallelization I cannot actually do
that because lower layers of the tree are duplicated across compute
processes.

Second, root parallelization does not share information about how to explore
away from the root. Tactics necessary to refute variations must be
discovered across all N processes. This seems wasteful of CPU resources.

Third, it seems wasteful of network bandwidth. In particular, the root node
is shared because of a side-effect: it makes N processes independently agree
on how to distribute attention across root moves. If you count up the bytes
required to represent a node, and how many bytes are required to specify a
distribution of attention, you will find that the node is much, much larger.

These reservations might not be significant. For example, most of a tree is
stored near the leaves, so the cost of duplicating nodes near the root might
be small. And the network bandwidth might not be a bottleneck.

But I do believe that the second objection is a killer. Theory says that it
is just as hard to choose replies to our moves as it is to choose our moves.
So a strategy that splits only at the root cannot be optimal.

One obvious alternative to root parallelization is deep parallelization in
which we share the root node and children of already shared nodes that have
at least N trials.

Deep parallelization should asymptotically solve the problem of sharing deep
tactics.

But this approach worsens the drawbacks from my first and third objections.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] MPI vs Thread-safe

2009-10-30 Thread Brian Sheppard
Back of envelope calculation: MFG processes 5K nodes/sec/core * 4 cores per
process = 20K nodes/sec/process. Four processes makes 80K nodes/sec. If you
think for 30 seconds (pondering + move time) then you are at 2.4 million
nodes. Figure about 25,000 nodes having 100 visits or more. UCT data is
roughly 2500 bytes per node (2 counters of 4 bytes for 300 legal moves). If
you have 4 compute nodes then bandwidth is 25,000 * 4 * 2500 = 250 megabytes. That
is how much data the master has to broadcast for a complete refresh. And the
data has to be sent to the master for aggregation, but that is a smaller
number because most processes will only modify a small number of nodes
within the refresh cycle.
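
The arithmetic above, spelled out; every input is just the estimate from the
preceding paragraph:

#include <cstdio>

int main() {
    double nodesPerSec  = 5e3 * 4 * 4;      // 5K/core * 4 cores * 4 processes
    double totalNodes   = nodesPerSec * 30; // 30 secs of thinking -> 2.4M
    double hotNodes     = 25e3;             // nodes with >= 100 visits
    double bytesPerNode = 2500;             // ~2 counters * 4 bytes * ~300 moves
    double refresh = hotNodes * 4 * bytesPerNode;  // broadcast to 4 nodes
    std::printf("%.1fM nodes, %.0f MB per refresh\n",
                totalNodes / 1e6, refresh / 1e6);  // 2.4M nodes, 250 MB
    return 0;
}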

Now, there are all kinds of reductions to that number. For instance, MPI
supports process groups and there are network layer tricks that can (in
theory) supply all listeners in a single message. And nodes don't have to be
updated if they are not changed, and you can compress the counters, and so
on.

I'm just saying that it looks like a lot of data. It can't be as much as I
just calculated, because Gigabit Ethernet would saturate at something less
than 100 megabytes/sec. But how much is it?


Does Mogo share RAVE values as well over MPI?

I would think so, because RAVE is critical to MoGo's move ordering policies.


It might be the MFGO bias.

This doesn't add up to me. If the problem were in something so basic then
the serial program wouldn't play more strongly beyond 4 times as much clock
time.


It might be due to a bug in my MPI implementation, or any number
of other possible bugs.

Bugs in non-MPI areas would also kill off improvement in the serial program
beyond 4x. So if you aren't observing that then you can look elsewhere.

But of course it is possible for problems to exist only in scaling.

For example, suppose that a node is created in process A and gets 100
trials. It would then have its UCT data passed to other processes, but those
processes have not necessarily created the node, nor done progressive widening
and so on. Such a node exists in a state that could not be created in a
serial program, which would cause a loss in move ordering. This is a bug in
parallelization, though not in MPI specifically.

And you are quite right that bugs can manifest as a limit to scalability.

The difficulty of debugging MPI processes is perhaps the biggest complaint
about that model.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] MPI vs Thread-safe

2009-10-29 Thread Brian Sheppard
I have a question for those who have parallelized programs.

It seems like MPI is the obvious architecture when scaling a program to
multiple machines. Let's assume that we implement a program that has that
capability.

Now, it is possible to use MPI for scaling *within* a compute node. For
example, on a 4-core machine we could run four processes and use MPI to
synchronize them.

That policy has the obvious downside that the shared memory on a multi-core
box is fragmented, and some portion of the tree is duplicated across
processes even within one machine, which seems wasteful.

For this reason I have assumed that programs would use a thread-safe
shared-memory design within a multi-core box, and only use MPI to scale to
clusters.

But there are downsides to that design as well. Like the extra complexity of
having two models for parallel programming.

And I don't really know the cost of duplicating nodes. Maybe the tree splits
so much that different processes share relatively few nodes. Or maybe you
can allocate trials so that is the case.

And now my question: what do you actually do: MPI, thread-safe, both, or
something else?

And can you share any observations about your choices?

Thanks,
Brian


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] NVidia Fermi and go?

2009-10-23 Thread Brian Sheppard
I just wondered if this new Fermi GPU solves the issues for go
playouts, or doesn't really make any difference?

My first impression of Fermi is very positive. Fermi contains a lot of
features that make general purpose computing on a GPU much easier and better
performing.

However, it remains the case that all threads in a warp on a multiprocessor
must execute the same instruction on each cycle. When executing if/then/else
logic, this implies that if *any* core needs to execute a branch, then *all*
cores must wait for those instructions to complete.

Playout policies have a lot of if/then/else logic. Sequential processors
handle such code quite well, because most of it isn't executed. But when you
have 32 playouts executing in parallel, then there is a high chance that
both branches will be needed. This really cuts into the potential gain.

Amdahl's law is a factor, as well. Amdahl's law says that the gain from
parallelization is limited when some aspects of the solution execute
sequentially. For example, the CPU has to generate positions and transfer
them to the GPU for playout. Generation and transfer are sequential. Because
of such overhead, massively parallel programs generally need very large
parallel speedups just to break even.
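
For a concrete feel, Amdahl's law in two lines; the 10% sequential fraction
below is purely an assumption for illustration:

#include <cstdio>

// Speedup with n-way parallelism when fraction p of the work parallelizes.
double Amdahl(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    std::printf("n = 32:  %.1fx\n", Amdahl(0.9, 32));    // ~7.8x
    std::printf("n = inf: %.1fx\n", 1.0 / (1.0 - 0.9));  // caps at 10x
    return 0;
}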

Clock speed is also a factor. CPUs execute at over 3 GHz, and because of
speculative execution they often execute more than one instruction per
clock. The GPU generally has a clock rate ~ 1 GHz, and most general purpose
instructions require multiple clocks. So you must have a large parallel
speedup just to break even. (Unless you can exploit some of the specialized
graphics instructions, such as texture mapping, that equate to dozens of
sequential instructions yet execute as a single instruction on the GPU. I
don't think computer Go has that possibility.)

So I am not convinced yet, but Fermi is a big step (really many small steps)
in the right direction.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] NVidia Fermi and go?

2009-10-23 Thread Brian Sheppard
In my own gpu experiment (light playouts), registers/memory were the
bounding factors on simulation speed.

I respect your experimental finding, but I note that you have carefully
specified "light" playouts, probably because you suspect that there may be a
significant difference if playouts are heavy.

I have not done any GPU experiments, so readers should take my guesswork
FWIW. I think the code that is light is the only piece that parallelizes
efficiently. Heavy playouts look for rare but important situations and
handle them using specific knowledge. If a situation is rare then a warp
of 32 playouts won't have many matches, so it will stall the other cores.

I have no data regarding the probability of such stalls in heavy playouts,
but I think they must be frequent. For example, if the ratio of heavy to
light playouts is a four-fold increase in CPU time, then a sequential
program is spending 75% of its time identifying and handling rare
situations.

My opinion is that heavy playouts are necessary, so if you put all this
guesswork together then you come up with my opinion: Fermi is probably still
not enough, but there are steps in the right direction.

BTW, it occurs to me that we can approximate the efficiency of
parallelization by taking execution counts from a profiler and
post-processing them. I should do that before buying a new GPU. :-)


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] NVidia Fermi and go?

2009-10-23 Thread Brian Sheppard
 BTW, it occurs to me that we can approximate the efficiency of
 parallelization by taking execution counts from a profiler and
 post-processing them. I should do that before buying a new GPU. :-)

I wonder what you mean by that.

If you run your program on a sequential machine and count statements then
you can analyze what happens around branches to calculate how your system
will parallelize.

Simple example:

if (foo) {       // 3200 hits
    Bar();       // 3190 hits
} else {
    Baz();       //   10 hits
}

If the 3200 executions were divided into 100 warps of 32, then the 10 hits
on Baz() would be distributed into 9 or 10 different warps. So the Baz code
stalls 10 out of 100 warps. The effect would be as if the Baz code were
executed on 10% of trials rather than 0.3%.

You can calculate the cost of this if-statement as 100% of Bar() + 10% of
Baz() on a GPU. Note that the CPU cost is 99.7% of Bar() + 0.3% of Baz(). If
Baz takes 1000 cycles and Bar takes 10 cycles, then the GPU costs 110 cycles
and the CPU costs 13 cycles. Factor in 32x parallelism and the GPU uses
fewer cycles. But in this case the fact that the CPU's cycles are so much
faster will outweigh parallelism.
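
A sketch of that post-processing, using the worked example above; the
binomial model of how hits land in warps is an assumption:

#include <cmath>
#include <cstdio>

struct Branch { double thenHits, elseHits, thenCycles, elseCycles; };

void EstimateCosts(const Branch& b, int warpWidth) {
    double total = b.thenHits + b.elseHits;
    // CPU: only the taken side runs, weighted by how often it is taken.
    double cpu = (b.thenHits * b.thenCycles + b.elseHits * b.elseCycles) / total;
    // GPU: a warp pays for a side if *any* of its lanes takes it.
    double pThen = 1.0 - std::pow(1.0 - b.thenHits / total, warpWidth);
    double pElse = 1.0 - std::pow(1.0 - b.elseHits / total, warpWidth);
    double gpu = pThen * b.thenCycles + pElse * b.elseCycles;
    std::printf("CPU ~%.0f cycles, GPU warp ~%.0f cycles\n", cpu, gpu);
}

int main() {
    EstimateCosts({3190, 10, 10.0, 1000.0}, 32);  // ~13 vs ~105, as above
    return 0;
}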

Regarding the rest of your post: I think you are onto something. The GPU is
different from the CPU, so maybe a different design is a better match.

You need to express the playout policy as data rather than code. The
downside there is that the memory hierarchy on a GPU imposes huge latencies.
Maybe the caches in Fermi will make a big difference here.
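
As a sketch of "policy as data": replace the if/then/else cascade with a
flat table indexed by a packed pattern code, so every lane in a warp runs
the same loads; illustrative only:

#include <cstdint>

// 8 neighbors * 2 bits each = 16-bit code -> 65536 precomputed weights.
static float g_patternWeight[1 << 16];

inline float MoveWeight(uint16_t patternCode) {
    return g_patternWeight[patternCode];   // one load, no divergent branches
}

The branching moves offline into whatever tool fills the table; at playout
time the cost is a memory access, which is exactly where the latency worry
above comes in.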



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Pattern matching

2009-10-13 Thread Brian Sheppard
Someone commented on it, so it must have come across OK.

Yes, that is what I concluded.


Is it possible that your e-mail provider or e-mail reader decided to
not include the attachment? At least that's what 'skipped content'
seems to indicate.

I am reading in IE, by surfing to
http://computer-go.org/pipermail/computer-go/2009-October/019672.html. I
think the server must have sent that message.


If I need to repost or publish it in another way I'd be happy to do
so.

If I am the only one who cannot read it then the simplest is to send a copy
by e-mail.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Rating variability on CGOS

2009-10-08 Thread Brian Sheppard
About two weeks ago I took Pebbles offline for an extensive overhaul of its
board representation. At that time Valkyria 3.3.4 had a 9x9 CGOS rating of
roughly 2475.

When I looked today, I saw Valkyria 3.3.4 rated at roughly 2334, so I
wondered what was going on.

I found a contributing factor: Valkyria has massively different results
against Pachi than against Pebbles. It happens that Pachi started playing a
day or two after Pebbles went offline.

Pebbles and Pachi are both rated around 2200, but Valkyria shreds Pebbles a
lot more often than Pachi:

Pachi:   185 / 273 = 67.8%
Pebbles: 429 / 503 = 85.3%

There are a lot of lessons here...

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Rating variability on CGOS

2009-10-08 Thread Brian Sheppard
One must be very careful about proclaiming wild transitivity issues.  I'm
not saying it's not an issue, there is some going on with every program on
CGOS, but with less than 500 games between any two players you are going
to get error margins of +/- 30-50 ELO or something like that.

Actually we are certain that significant differences are being observed. If
we pool the Pachi and Pebbles data, then the null hypothesis is that
Valkyria defeats both programs 79% of the time. The observed results differ
from that by at least 3.5 standard deviations.

Note that we are talking about 150 rating points.
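
A quick two-proportion check of that claim, assuming independent games (a
sketch, not the original calculation):

#include <cmath>
#include <cstdio>

int main() {
    double w1 = 185, n1 = 273;    // Valkyria vs Pachi
    double w2 = 429, n2 = 503;    // Valkyria vs Pebbles
    double p1 = w1 / n1, p2 = w2 / n2;
    double pooled = (w1 + w2) / (n1 + n2);                   // ~0.791
    double se = std::sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
    std::printf("z = %.1f\n", (p2 - p1) / se);               // ~5.7
    return 0;
}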




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Long superko example

2009-10-05 Thread Brian Sheppard
The sequence is as interesting as the superko, really, because it's the 
sort of strange sequence that would only occur to monte carlo programs. 

Yes, that is why I posted it.


I'm very curious what black thought the win rate was before B9...

I do not know. This position wasn't from a game. It was from a playout. I
was searching for instances of "hole of three" situations, following MoGo's
recommendation. I was building a regression suite of test cases.

I don't know whether Pebbles could play that sequence. I just noticed the
possibility of a repetition while I was validating that the situation was a
genuine hole of three.


As for black, looks like he should resign instead of B9.

You are right that X dies first in alternating play. But the only way to be
sure of that is to play B9 and see what happens. So hole of three rule
does generate a critical play here.

   1 2 3 4 5 6 7 8 9
 A O O O X - X X O -
 B - O O X X O O - -
 C O - O X X - O O O
 D - O X X O O O X O
 E O O O O X X O X X
 F X O O X X X X X -
 G X X O O X O X - X
 H - X X - O O O X -
 J X X - O - X X - X

 X: B9 (hole-of-three)
 O: A9 (atari)
 X: B8 (capture)
 O: A8 (atari)
 X: A9 (capture)
 O: A8 (capture)

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Long superko example

2009-10-04 Thread Brian Sheppard

  1 2 3 4 5 6 7 8 9
A O O O X - X X O -
B - O O X X O O - -
C O - O X X - O O O
D - O X X O O O X O
E O O O O X X O X X
F X O O X X X X X -
G X X O O X O X - X
H - X X - O O O X -
J X X - O - X X - X

X: B9 (hole-of-three)
O: A9 (atari)
X: B8 (capture)
O: A8 (atari)
X: A9 (capture)
O: A8 (capture)

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

