Re: [Computer-go] Is MCTS needed?

2017-11-17 Thread Gian-Carlo Pascutto
On 16-11-17 18:24, Stephan K wrote:
> 2017-11-16 17:37 UTC+01:00, Gian-Carlo Pascutto :
>> Third, evaluating with a different rotation effectively forms an
>> ensemble that improves the estimate.
> 
> Could you expand on that? I understand rotating the board has an
> impact for a neural network, but how does that change anything for a
> tree search? Or is it because the Monte Carlo tree search relies on
> the policy network?

It was a response to the statement "There are no reevaluations that
would improve your estimate."

Consider a quiet position where the tree search wouldn't reveal any
tactics. Normally, searching deeper won't give an immediate benefit. But
because of the rotations, the value network's score is improved from a
single estimate to an ensemble.
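
As a concrete illustration, here is a minimal sketch of that
ensembling, assuming a hypothetical net object whose value() method
returns a scalar evaluation of a board encoded as a NumPy array (an
assumed interface, not any particular program's API):

import numpy as np

def ensemble_value(board, net):
    # Evaluate the value net on all 8 symmetries of the board
    # (4 rotations, each optionally mirrored) and average.
    estimates = []
    for k in range(4):
        rotated = np.rot90(board, k)
        estimates.append(net.value(rotated))
        estimates.append(net.value(np.fliplr(rotated)))
    # The 8 estimates are correlated but not identical, so their
    # average has lower variance than any single evaluation.
    return sum(estimates) / len(estimates)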

In chess/alpha-beta terms, the quiescence search resolves the tactics
(if any), so running it again with part of the tactics resolved would
produce the same score. But with value nets, this is not entirely true.

> Would it be possible to train a value net using only the results of
> already finished games, rather than Monte Carlo rollouts?

Isn't this how it works already?

> My (extremely vague and possibly fallacious) understanding of the
> situation was that Monte Carlo tree search was less effective for
> chess because of the more sudden changes there might be when
> evaluating chess positions. For instance, a player with an apparently
> lesser position might actually be a few moves away from a checkmate
> (or just from a big gain), which might be missed by the Monte Carlo
> tree search because it depends on one particular branch of the tree.

Life-and-death fights and capture races behave the same way. The
inability of MCTS to switch to a new PV instantly isn't necessarily very
different from the requirement in chess that all moves be searched to an
equal (nominal!) depth. In practical alpha-beta implementations, failing
high on a new best move requires a re-search as well.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread uurtamo .
Hideki,

This is a very nice observation.

s.


On Nov 16, 2017 12:37 PM, "Hideki Kato"  wrote:

Hi,

I strongly believe adding rollouts makes Zero stronger.
They removed rollouts just to say "no human knowledge".
#Though the number of past moves (16) has been tuned by
humans :).

Hideki

Petr Baudis: <20171116154309.tfq5ix2hzwzci...@machine.or.cz>:
>  Hi,
>
>  when explaining AlphaGo Zero to a machine learning audience yesterday
>
>
>(https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)
>
>it occurred to me that using MCTS in this setup is actually such
>a kludge!
>
>  Originally, we used MCTS because with the repeated simulations,
>we would be improving the accuracy of the arm reward estimates.  MCTS
>policies assume stationary distributions, which is violated every time
>we expand the tree, but it's an okay tradeoff if all you feed into the
>tree are rewards in the form of just Bernoulli trials.  Moreover, you
>could argue evaluations are somewhat monotonic with increasing node
>depths as you are basically just fixing a growing prefix of the MC
>simulation.
>
>  But now, we expand the nodes literally all the time, breaking the
>stationarity possibly in drastic ways.  There are no reevaluations that
>would improve your estimate.  The input isn't binary but an estimate in
>a continuous space.  Suddenly the Multi-armed Bandit analogy loses a lot
>of ground.
>
>  Therefore, can't we take the next step, and do away with MCTS?  Is
>there a theoretical viewpoint from which it still makes sense as the best
>policy improvement operator?
>
>  What would you say is the current state-of-the-art game tree search for
>chess?  That's a very unfamiliar world for me; to be honest, all I really
>know is MCTS...
>
>--
>   Petr Baudis, Rossum
>   Run before you walk! Fly before you crawl! Keep moving forward!
>   If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
--
Hideki Kato 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Dave Dyer

While MCTS works "better" in games with a forward direction, it eventually
converges to the same answer as alpha-beta in any game. The general
architecture is to set a maximum depth and use a suitable evaluator
at the leaf nodes.
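
A minimal sketch of that architecture: UCT with a hard depth cap and a
static evaluator at the leaves. The game interface (legal_moves, play,
is_terminal) and evaluate are assumed names here, with evaluate(state)
returning a value in [-1, 1] from the side to move:

import math

def uct_best_move(root_state, evaluate, n_sims=10000, max_depth=20, c=1.4):
    def new_node(state):
        return {'n': 0, 'w': 0.0, 'children': {}, 'state': state}

    def select(node):
        # UCB1 over the children; unvisited children go first. Each
        # child's mean is stored from its chooser's perspective.
        for child in node['children'].values():
            if child['n'] == 0:
                return child
        return max(node['children'].values(),
                   key=lambda ch: ch['w'] / ch['n'] +
                       c * math.sqrt(math.log(node['n']) / ch['n']))

    root = new_node(root_state)
    for _ in range(n_sims):
        node, path, depth = root, [root], 0
        while depth < max_depth and not node['state'].is_terminal():
            if not node['children']:  # expand
                for m in node['state'].legal_moves():
                    node['children'][m] = new_node(node['state'].play(m))
            node = select(node)
            path.append(node)
            depth += 1
        # Static evaluation at the leaf instead of a random playout,
        # backed up with sign flips so each node scores moves from
        # its own player's point of view.
        value = -evaluate(node['state'])
        for n in reversed(path):
            n['n'] += 1
            n['w'] += value
            value = -value
    return max(root['children'].items(), key=lambda mc: mc[1]['n'])[0]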

I haven't done detailed studies, but there is definitely a regime where
(for example) a 30-second MCTS with a depth limit of 20 ply will outperform
a progressive-deepening alpha-beta search that is allowed the same 30 seconds.

Another advantage of MCTS is that it is easy to run across multiple threads,
which is not true of alpha-beta.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Gian-Carlo Pascutto
On 16-11-17 18:15, "Ingo Althöfer" wrote:
> Something like MCTS would not work in chess, because in
> contrast to Go (and Hex and Amazons and ...) Chess is
> not a "game with forward direction".

Ingo, I think the reason Petr brought the whole thing up is that AlphaGo
Zero uses "MCTS" but it does not actually use Monte Carlo playouts. I
think your condition about "forward direction" only applies to the
randomized playouts, yes? A neural network evaluation on the other hand
is very much like a classical static chess evaluation.

There are publications about Parallel Randomized Best First Search in
chess; read them and notice how it compares to MCTS.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Brian Sheppard via Computer-go
The state of the art in computer chess is alpha-beta search, but note that
the search is very selective because of "late move reductions."

A late move reduction reduces the search depth for moves after the first move
generated in a node. For example, a simple implementation would be "search the
first move generated to depth N, and all subsequent moves to depth N-1". That
simple idea is a big improvement.

LMR works (in large part) because move generators produce probable-best-first
order. Therefore, if the first few moves do not work (i.e., do not refute the
opponent's late move), then there probably isn't a refutation. By searching
unlikely moves less, you have the time to search likely variations more.

Stockfish's actual implementation massively increases effective search depth 
over the simple policy outlined above. Because of LMR, the effective branching 
factor of chess search trees is less than 2.
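
For illustration, a minimal negamax sketch of the simple one-ply policy
described above (ordered_moves, play, is_terminal, and evaluate are
assumed interfaces; real engines reduce by more than one ply and gate
the reduction on many extra conditions):

def lmr_search(pos, depth, alpha, beta, evaluate):
    # Alpha-beta (negamax form) with a one-ply late move reduction:
    # the first move is searched to full depth, later moves one ply
    # shallower. If a reduced move unexpectedly raises alpha, it is
    # re-searched at full depth, mirroring a fail-high re-search.
    if depth <= 0 or pos.is_terminal():
        return evaluate(pos)
    best = -float('inf')
    for i, move in enumerate(pos.ordered_moves()):  # probable-best-first
        child = pos.play(move)
        reduction = 0 if i == 0 else 1  # "late" moves get less depth
        score = -lmr_search(child, depth - 1 - reduction, -beta, -alpha,
                            evaluate)
        if reduction and score > alpha:  # surprise: verify at full depth
            score = -lmr_search(child, depth - 1, -beta, -alpha, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent will avoid this node
    return best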

IMO, late move reductions should work really well in Go, given that the move 
predictors are above 60% accuracy. But I have not tried. 

Using MCTS for training the evaluator is probably necessary, even if the 
ultimate goal is to use the evaluator in alpha-beta.

Best,
Brian

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Petr Baudis
Sent: Thursday, November 16, 2017 10:43 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Is MCTS needed?

  Hi,

  when explaining AlphaGo Zero to a machine learning audience yesterday


(https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)

it occurred to me that using MCTS in this setup is actually such
a kludge!

  Originally, we used MCTS because with the repeated simulations,
we would be improving the accuracy of the arm reward estimates.  MCTS
policies assume stationary distributions, which is violated every time
we expand the tree, but it's an okay tradeoff if all you feed into the
tree are rewards in the form of just Bernoulli trials.  Moreover, you
could argue evaluations are somewhat monotonic with increasing node
depths as you are basically just fixing a growing prefix of the MC
simulation.

  But now, we expand the nodes literally all the time, breaking the
stationarity possibly in drastic ways.  There are no reevaluations that
would improve your estimate.  The input isn't binary but an estimate in
a continuous space.  Suddenly the Multi-armed Bandit analogy loses a lot
of ground.
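
[For reference, the stationary-bandit selection rule alluded to here is
UCB1, which pulls the arm maximizing

    \bar{x}_i + c \sqrt{ \ln N / n_i },

where \bar{x}_i is the mean reward of arm i, n_i its pull count, N the
total number of pulls, and c the exploration constant. Its regret
guarantees assume each arm's reward distribution is fixed, which is
exactly what repeated tree expansion violates.]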

  Therefore, can't we take the next step, and do away with MCTS?  Is
there a theoretical viewpoint from which it still makes sense as the best
policy improvement operator?

  What would you say is the current state-of-the-art game tree search for
chess?  That's a very unfamiliar world for me; to be honest, all I really
know is MCTS...

-- 
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Hideki Kato
Hi,

I strongly believe adding rollouts makes Zero stronger.
They removed rollouts just to say "no human knowledge".
#Though the number of past moves (16) has been tuned by
humans :).

Hideki

Petr Baudis: <20171116154309.tfq5ix2hzwzci...@machine.or.cz>:
>  Hi,
>
>  when explaining AlphaGo Zero to a machine learning audience yesterday
>
>   
>(https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)
>
>it occurred to me that using MCTS in this setup is actually such
>a kludge!
>
>  Originally, we used MCTS because with the repeated simulations,
>we would be improving the accuracy of the arm reward estimates.  MCTS
>policies assume stationary distributions, which is violated every time
>we expand the tree, but it's an okay tradeoff if all you feed into the
>tree are rewards in the form of just Bernoulli trials.  Moreover, you
>could argue evaluations are somewhat monotonic with increasing node
>depths as you are basically just fixing a growing prefix of the MC
>simulation.
>
>  But now, we expand the nodes literally all the time, breaking the
>stationarity possibly in drastic ways.  There are no reevaluations that
>would improve your estimate.  The input isn't binary but an estimate in
>a continuous space.  Suddenly the Multi-armed Bandit analogy loses a lot
>of ground.
>
>  Therefore, can't we take the next step, and do away with MCTS?  Is
>there a theoretical viewpoint from which it still makes sense as the best
>policy improvement operator?
>
>  What would you say is the current state-of-the-art game tree search for
>chess?  That's a very unfamiliar world for me; to be honest, all I really
>know is MCTS...
>
>-- 
>   Petr Baudis, Rossum
>   Run before you walk! Fly before you crawl! Keep moving forward!
>   If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
-- 
Hideki Kato 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Stephan K
2017-11-16 17:37 UTC+01:00, Gian-Carlo Pascutto :
> Third, evaluating with a different rotation effectively forms an
> ensemble that improves the estimate.

Could you expand on that? I understand rotating the board has an
impact for a neural network, but how does that change anything for a
tree search? Or is it because the Monte Carlo tree search relies on
the policy network?

> As for a theoretical viewpoint: the value net is an estimate of the
> value of some fixed number of Monte Carlo rollouts.

Would it be possible to train a value net using only the results of
already finished games, rather than Monte Carlo rollouts?

What about the value network from "Multi-Labelled Value Networks for
Computer Go" (https://arxiv.org/abs/1705.10701), which can estimate the
score by assigning each intersection of the board a probability that it
will become Black territory? (It does compute the more usual winrate
estimate, but it also computes a territory estimate.)
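
For illustration, given such a per-intersection ownership map, a score
estimate is just the sum of per-point expectations minus komi. A minimal
sketch of that reading of the idea (not the paper's actual code):

import numpy as np

def expected_score(ownership, komi=7.5):
    # ownership[i, j] is the predicted probability that point (i, j)
    # ends up as Black territory. Each point contributes +1 if Black
    # and -1 if White, so its expectation is p*(+1) + (1-p)*(-1) = 2p - 1.
    return float(np.sum(2.0 * ownership - 1.0)) - komi

# A maximally uncertain map gives Black an expected margin of -komi:
print(expected_score(np.full((19, 19), 0.5)))  # -7.5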

>> What would you say is the current state-of-the-art game tree search for
>> chess?  That's a very unfamiliar world for me; to be honest, all I
>> really know is MCTS...
>
> The same as it was 20 years ago: alpha-beta. Though one could certainly
> make the argument that an alpha-beta searcher using late move reductions
> (searching everything but the best moves less deeply) is searching a
> tree of a very similar shape to a UCT searcher with a small exploration
> constant.

My (extremely vague and possibly fallacious) understanding of the
situation was that Monte Carlo tree search was less effective for
chess because of the more sudden changes there might be when
evaluating chess positions. For instance, a player with an apparently
lesser position might actually be a few moves away from a checkmate
(or just from a big gain), which might be missed by the Monte Carlo
tree search because it depends on one particular branch of the tree.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Ingo Althöfer
Hi Petr,

>   What would you say is the current state-of-the-art game tree search for
> chess?  That's a very unfamiliar world for me; to be honest, all I really
> know is MCTS...

Stockfish is one of the top three chess programs, and
it is open source. It is mainly iterative-deepening
alpha-beta, but with many tricky details.
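
For illustration, the iterative-deepening skeleton looks like this (a
hedged sketch: search(pos, depth) is an assumed alpha-beta searcher
returning a (move, score) pair, and a real engine would also abort the
in-flight iteration when time runs out):

import time

def iterative_deepening(pos, search, budget_seconds=30.0):
    # Search depth 1, 2, 3, ... until the budget expires, keeping the
    # result of the last completed iteration. Shallow iterations are
    # cheap and seed move ordering (hash/PV moves) for deeper ones,
    # which is what makes repeating the work nearly free in practice.
    deadline = time.time() + budget_seconds
    best = None
    depth = 1
    while time.time() < deadline:
        best = search(pos, depth)
        depth += 1
    return best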

Something like MCTS would not work in chess, because in
contrast to Go (and Hex and Amazons and ...) Chess is
not a "game with forward direction".

Just my 2 cents, Ingo.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Gian-Carlo Pascutto
On 16/11/2017 16:43, Petr Baudis wrote:
> But now, we expand the nodes literally all the time, breaking the 
> stationarity possibly in drastic ways.  There are no reevaluations
> that would improve your estimate.

First, you don't expect the network evaluations to vary drastically
between parent and children, unless there are tactics that you are
not understanding.

Second, the evaluations are rather noisy, so averaging still makes sense.

Third, evaluating with a different rotation effectively forms an
ensemble that improves the estimate.

> Therefore, can't we take the next step, and do away with MCTS?  Is 
> there a theoretical viewpoint from which it still makes sense as the
> best policy improvement operator?

People have posted results with that on this list and IIRC programs
using regular alpha-beta were weaker.

As for a theoretical viewpoint: the value net is an estimate of the
value of some fixed number of Monte Carlo rollouts.

> What would you say is the current state-of-the-art game tree search for
> chess?  That's a very unfamiliar world for me; to be honest, all I
> really know is MCTS...

The same as it was 20 years ago: alpha-beta. Though one could certainly make
the argument that an alpha-beta searcher using late move reductions
(searching everything but the best moves less deeply) is searching a
tree of a very similar shape to a UCT searcher with a small exploration
constant.
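
For concreteness: the selection rule in the AlphaGo family (PUCT, per
the published papers) adds to each child's value estimate an exploration
bonus

    U(s, a) = c_{puct} P(s, a) \sqrt{\sum_b N(s, b)} / (1 + N(s, a)),

so with a small c_{puct} and a sharp prior P, visits concentrate on the
few moves the policy likes, much as LMR concentrates full-depth search
on the first moves in the ordering.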

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Xavier Combelle
As far as I know, the state of the art in chess is some flavor of alpha-beta
(as far as I read the Stockfish source correctly), so basically they prove
that their current estimation is the best one up to a certain depth.

MCTS has the benefit of allowing variable search depth depending on how
good the evaluation is. I believe that, combined with the way AlphaGo and
AlphaGo Zero work, it is far superior to alpha-beta.



On 16/11/2017 at 16:43, Petr Baudis wrote:
>   Hi,
>
>   when explaining AlphaGo Zero to a machine learning audience yesterday
>
>   
> (https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)
>
> it occurred to me that using MCTS in this setup is actually such
> a kludge!
>
>   Originally, we used MCTS because with the repeated simulations,
> we would be improving the accuracy of the arm reward estimates.  MCTS
> policies assume stationary distributions, which is violated every time
> we expand the tree, but it's an okay tradeoff if all you feed into the
> tree are rewards in the form of just Bernoulli trials.  Moreover, you
> could argue evaluations are somewhat monotonic with increasing node
> depths as you are basically just fixing a growing prefix of the MC
> simulation.
>
>   But now, we expand the nodes literally all the time, breaking the
> stationarity possibly in drastic ways.  There are no reevaluations that
> would improve your estimate.  The input isn't binary but an estimate in
> a continuous space.  Suddenly the Multi-armed Bandit analogy loses a lot
> of ground.
>
>   Therefore, can't we take the next step, and do away with MCTS?  Is
> there a theoretical viewpoint from which it still makes sense as the best
> policy improvement operator?
>
>   What would you say is the current state-of-the-art game tree search for
> chess?  That's a very unfamiliar world for me; to be honest, all I really
> know is MCTS...
>

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go