Re: [Computer-go] dealing with multiple local optima

2017-02-28 Thread Cyris Sargon
On Mon, Feb 27, 2017 at 8:05 AM, Erik van der Werf  wrote:

> On Mon, Feb 27, 2017 at 4:30 PM, Darren Cook  wrote:
>
>> > But those video games have a very simple optimal policy. Consider Super Mario:
>> > if you see an enemy, step on it; if you see a hole, jump over it; if you see a
>> > pipe sticking up, also jump over it; etc.
>>
>> A bit like go? If you see an unsettled group, make it live. If you have
>> a ko, play a ko threat. If you see two 1-eye groups near each
>> other, join them together. :-)
>>
>> Okay, those could be considered higher-level concepts, but I still
>> thought it was impressive to learn to play arcade games with no hints at
>> all.
>>
>
>
> The impressive part is hidden in what most humans consider trivial:
> making the programs 'see'.
>
> Erik
>
>
Which is what (at least in part) Terry McIntyre was pointing at when he
wrote:

> "seeing" is complex when the input is just a bunch of pixels.
>
> Terry McIntyre

Re: [Computer-go] dealing with multiple local optima

2017-02-27 Thread Erik van der Werf
On Mon, Feb 27, 2017 at 4:30 PM, Darren Cook  wrote:

> > But those video games have a very simple optimal policy. Consider Super Mario:
> > if you see an enemy, step on it; if you see a hole, jump over it; if you see a
> > pipe sticking up, also jump over it; etc.
>
> A bit like go? If you see an unsettled group, make it live. If you have
> a ko, play a ko threat. If you see two 1-eye groups near each
> other, join them together. :-)
>
> Okay, those could be considered higher-level concepts, but I still
> thought it was impressive to learn to play arcade games with no hints at
> all.
>


The impressive part is hidden in what most humans consider trivial: making
the programs 'see'.

Erik

Re: [Computer-go] dealing with multiple local optima

2017-02-27 Thread Darren Cook
> But those video games have a very simple optimal policy. Consider Super Mario:
> if you see an enemy, step on it; if you see a hole, jump over it; if you see a
> pipe sticking up, also jump over it; etc.

A bit like go? If you see an unsettled group, make it live. If you have
a ko, play a ko threat. If you see two 1-eye groups near each
other, join them together. :-)

Okay, those could be considered higher-level concepts, but I still
thought it was impressive to learn to play arcade games with no hints at
all.

Darren


> 
> On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook  wrote:
> 
>  > ...if it is hard to have "the good starting point" such as a trained
>  > policy from human expert game records, what is a way to devise one.
> 
> My first thought was to look at the DeepMind research on learning to
> play video games (which I think either pre-dates the AlphaGo research,
> or was done in parallel with it): https://deepmind.com/research/dqn/
> 
> 
> It just learns from trial and error, no expert game records:
> 
> 
> http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge
> 
> 
> 

Re: [Computer-go] dealing with multiple local optima

2017-02-25 Thread Erik van der Werf
On Sat, Feb 25, 2017 at 12:30 AM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> In retrospect, I view Schraudolph’s paper as evidence that neural networks
> have always been surprisingly successful at Go. Like Brügmann’s paper about
> Monte Carlo, which was underestimated for a long time. Sigh.
>

Hear hear :-)

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
OK, so let’s talk theory vs practice.

 

In theory, TD learning approaches asymptotic optimality when used with a full 
state space. That is, if your RL model has one parameter for each state, then 
TD will converge those parameters to the game-theoretic values. There are some 
pre-conditions, but this is true with remarkable generality. In particular, 
there is nothing about stochastic vs. deterministic in the theory.
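
For concreteness, a tabular TD(0) update is only a few lines. This Python sketch
assumes hypothetical env/policy interfaces (reset/step, a move-choosing
function); it is an illustration of the idea, not code from any particular
program:

    from collections import defaultdict

    def td0(env, policy, episodes=10000, alpha=0.1, gamma=1.0):
        # Tabular TD(0): one value parameter per state, moved toward the
        # bootstrapped target r + gamma * V(s') after every transition.
        # 'env' (reset/step) and 'policy' (state -> action) are placeholders.
        V = defaultdict(float)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                action = policy(state)
                next_state, reward, done = env.step(action)
                V[state] += alpha * (reward + gamma * V[next_state] - V[state])
                state = next_state
        return V

With one entry per state, the fixed point of this update is the value of the
policy being followed, which is what the asymptotic-optimality result builds on.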

 

So yes, the published experiments for Chess were failures. That is, using a 
very shallow search and relatively simple evaluation function does not work for 
chess like it did for Tesauro’s experiments in backgammon. But the conclusion 
that this is because of stochastic/deterministic is incorrect. Consider Go, for 
example, where static move generators can play at 1 dan level, and probably 
quite a bit higher.

 

An inference that is compatible with theory is that to conquer chess you will 
need a deeper, wider network, or better inputs, or a better search engine. Or 
all of those. You could imagine measuring the skill of a variety of 
architectures and mapping out the path of steepest ascent.

 

In retrospect, I view Schraudolph’s paper as evidence that neural networks have 
always been surprisingly successful at Go. Like Brügmann’s paper about Monte 
Carlo, which was underestimated for a long time. Sigh.

 

Best,

Brian

 

From: Minjae Kim [mailto:xive...@gmail.com] 
Sent: Friday, February 24, 2017 11:56 AM
To: Brian Sheppard <sheppar...@aol.com>; computer-go@computer-go.org
Subject: Re: [Computer-go] dealing with multiple local optima

 

TD-gammon is regarded as a special case because of the stochastic characteristics 
of backgammon; the randomness smooths the search space for the value function, 
and the value function itself, to a great degree compared to 'static' games such 
as go. Experiments applying TD learning to chess and go were done after the 
success of TD-gammon, and the results were not good. I also did a small 
experiment a while ago applying TD learning to go using a network structure 
similar to that of TD-gammon. The network gets 'stuck' really fast. You can see 
comments similar to my experience here 
(https://chessprogramming.wikispaces.com/Temporal+Difference+Learning). You can 
make the network train more 'actively' by doing stochastic move selection rather 
than simply selecting the move with the highest value, but in my experience this 
doesn't work well enough either; I applied softmax. Schraudolph experimented 
with TD for go in his 1994 paper, where he applied Gibbs sampling for stochastic 
move selection, although it wasn't enough to build a strong go bot.
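
For reference, softmax (Gibbs/Boltzmann) move selection of the kind described
above looks roughly like this Python sketch; the per-move scoring function and
the temperature value are placeholders, not anyone's actual code:

    import math, random

    def softmax_select(moves, score, temperature=1.0):
        # Pick a move with probability proportional to exp(score / T).
        # High T gives near-uniform exploration, low T is near-greedy.
        # 'score' is whatever value estimate the network assigns to a move.
        scaled = [score(m) / temperature for m in moves]
        top = max(scaled)                        # subtract max for numerical stability
        weights = [math.exp(s - top) for s in scaled]
        return random.choices(moves, weights=weights, k=1)[0]

The temperature controls how far play deviates from greedy selection, which is
the knob being tuned when such self-play training gets "stuck".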

 

On Fri, Feb 24, 2017 at 11:03 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

Neural networks always have a lot of local optima, simply because they have a 
high degree of internal symmetry. That is, you can “permute” sets of 
coefficients and get the same function.

 

Don’t think of starting with expert training as a way to avoid local optima. It 
is a way to start training with some good examples so that learning can start 
at a higher level. But then you should continue with self-play.

 

Backgammon was trained to expert level based on self-play games that were 
initially random. Google “Tesauro backgammon” and you should be able to find a 
paper.

 

I don’t know NEAT and HyperNEAT; I will look them up. Thank you for the 
reference.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Minjae Kim
Sent: Friday, February 24, 2017 3:39 AM
To: computer-go@computer-go.org
Subject: [Computer-go] dealing with multiple local optima

 

I've recently read the AlphaGo paper, which used gradient-based
reinforcement learning to get stronger. The learning was successful enough
to beat a human master, but in this case supervised learning on a large
database of master-level human games preceded the reinforcement learning.
For a game as complex as go, one can expect that the search space for the
policy function would not be smooth at all, so supposedly supervised
learning was necessary to guide the policy function to a good starting
point before reinforcement. Without that, applying reinforcement learning
directly to a random policy can easily leave the policy stuck at a bad
local optimum. I could have a misunderstanding at this point; correct me if
so, but to continue on: if it is hard to get "the good starting point",
such as a policy trained from human expert game records, what is a way to
devise one? I've had a look at NEAT and HyperNEAT, which are evolutionary
methods. Do these evolutionary algorithms scale well to complex strategic
decision processes, and not just to simple linear decisions such as food
gathering and danger avoidance? In case not, what alternatives are known?
Is there any success case of a chess, go, or any other complex strategic
game-playing algorithm that gained expert strength without domain knowledge
such as expert game examples?

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Jim O'Flaherty
NEAT and HyperNEAT are awesome for "evolving" fairly simple networks with
a very limited number of input and output dimensions. However, without
access to some serious computational power, scaling the NEAT method up to
the level you would need for the input-layer encodings AlphaGo feeds into
its ANNs, and for the decoding of its output layer, is likely not feasible
for anything less than a Google-sized team and investment; i.e. half a
dozen people and millions of dollars of computational access on Google's
Cloud distributed computing architecture. Amazon (with AWS) and Microsoft
(with Azure) are the only two other companies that would have the excess
capacity to support both the personnel and the distributed computation
costs. A distant second would be companies like IBM, who could carry the
personnel while leveraging AWS, Azure, Google Cloud, etc.

Once there is a sufficient investment in this kind of evolutionary
meta-modeling, it will be a very useful starting point for others. However,
until someone is willing to play the extremely long game and pony up the
HUGE up-front costs of evolutionary bootstrapping beyond the simple models
NEAT handles today, it is a short-term dead end.



On Fri, Feb 24, 2017 at 2:39 AM, Minjae Kim  wrote:

> I've recently read the AlphaGo paper, which used gradient-based
> reinforcement learning to get stronger. The learning was successful enough
> to beat a human master, but in this case supervised learning on a large
> database of master-level human games preceded the reinforcement learning.
> For a game as complex as go, one can expect that the search space for the
> policy function would not be smooth at all, so supposedly supervised
> learning was necessary to guide the policy function to a good starting
> point before reinforcement. Without that, applying reinforcement learning
> directly to a random policy can easily leave the policy stuck at a bad
> local optimum. I could have a misunderstanding at this point; correct me
> if so, but to continue on: if it is hard to get "the good starting point",
> such as a policy trained from human expert game records, what is a way to
> devise one? I've had a look at NEAT and HyperNEAT, which are evolutionary
> methods. Do these evolutionary algorithms scale well to complex strategic
> decision processes, and not just to simple linear decisions such as food
> gathering and danger avoidance? In case not, what alternatives are known?
> Is there any success case of a chess, go, or any other complex strategic
> game-playing algorithm that gained expert strength without domain
> knowledge such as expert game examples?
>

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread terry mcintyre via Computer-go
"seeing" is complex when the input is just a bunch of pixels.  Terry McIntyre 
 Unix/Linux Systems Administration Taking time to do 
it right saves having to do it twice. 

On Friday, February 24, 2017 12:32 PM, Minjae Kim  wrote:
 

 But those video games have a very simple optimal policy. Consider Super Mario: 
if you see an enemy, step on it; if you see a hole, jump over it; if you see a
pipe sticking up, also jump over it; etc.

On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook  wrote:

> ...if it is hard to have "the good starting point" such as a trained
> policy from human expert game records, what is a way to devise one.

My first thought was to look at the DeepMind research on learning to
play video games (which I think either pre-dates the AlphaGo research,
or was done in parallel with it): https://deepmind.com/research/dqn/

It just learns from trial and error, no expert game records:

http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge

Darren



--
Darren Cook, Software Researcher/Developer
My New Book: Practical Machine Learning with H2O:
  http://shop.oreilly.com/product/0636920053170.do

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Álvaro Begué
I should point out that Reinforcement Learning is a relatively unimportant
part of AlphaGo, according to the paper. They only used it to turn the
move-prediction network into a stronger player (presumably increasing the
weights of the layer before the SoftMax would do most of the job, by making
it play the highest-probability move the vast majority of the time). This
stronger player was then used to label positions by the result of self-play,
and these labelled positions were then used to learn a "value network" (what
we call an "evaluation function" in computer chess). The final AlphaGo
playing program doesn't actually use the RL network at all.
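
To illustrate that parenthetical remark, here is a tiny numerical sketch (my
own toy numbers, not from the paper) showing that scaling the pre-SoftMax
scores concentrates nearly all of the probability on the top move:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())     # subtract max for numerical stability
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.5, -1.0])   # made-up scores for 4 candidate moves
    print(softmax(logits))        # roughly [0.61, 0.22, 0.14, 0.03]
    print(softmax(5 * logits))    # roughly [0.99, 0.01, 0.00, 0.00] -- almost always the top move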


> Is there any success case of a chess, go, or any kind of complex
strategic game playing algorithm, where it gained expert strength without
domain knowledge such as expert game examples?

Yes! In 2015 I made a Spanish checkers program that used a neural network
as evaluation function (121 binary inputs, two hidden layers with 32 units
each and ReLU activation functions and one tanh output unit). I started by
generating 6-man tablebases, because the endgames are hard to get right
with shallow searches, and it is critical to know them to learn what
"enough advantage to win" is. I bootstrapped the learning process by
playing games with random evaluation function and depth-6 search. Then I
used these games to train the neural network, then played games with the
neural network, at around 0.1s/move (that's about 100K nodes searched).
Rinse, repeat (3 times, I think). The first time around, the program didn't
really know if it was better to have more pieces than the opponent or
fewer. By the 3rd iteration it had at least as much positional knowledge of
the game as I do, and I spent a lot of time in my teenage years playing
this game.
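
A rough sketch of a network with that shape (121 binary inputs, two 32-unit
ReLU hidden layers, one tanh output in [-1, 1]); the framework choice and the
code are illustrative only, not the actual implementation:

    import torch
    import torch.nn as nn

    # Evaluation network: 121 binary features in, a single score in [-1, 1] out.
    value_net = nn.Sequential(
        nn.Linear(121, 32), nn.ReLU(),
        nn.Linear(32, 32),  nn.ReLU(),
        nn.Linear(32, 1),   nn.Tanh(),
    )

    def evaluate(features):
        # 'features': a float tensor of 121 zeros/ones encoding the position.
        with torch.no_grad():
            return value_net(features).item()

The bootstrap loop described above then alternates: play games with the current
evaluation function and a fixed-depth search, train the network on the results,
and repeat.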

One important detail: The first few moves of the games were played using
UCT as a sort of opening book, to create a lot of variety. I should also
point out that the last move in the UCT tree is pretty much random. I
believe both AlphaGo and Giraffe used random moves when generating
databases to learn from.

I sent the program to my friend José Manuel Morán, who is probably the
strongest player in Spain (there are several comparably strong players in
Portugal, where the game is more popular). After a few days his report was
"It cannot be beaten."

Perhaps alpha-beta on a modern computer is too successful for this game to
really be able to tell if the evaluation function was very good, but it was
certainly not bad, and I didn't use any human knowledge anywhere in the
process of producing it.

Regards,
Álvaro.



On Fri, Feb 24, 2017 at 3:39 AM, Minjae Kim  wrote:

> I've recently read the AlphaGo paper, which used gradient-based
> reinforcement learning to get stronger. The learning was successful enough
> to beat a human master, but in this case supervised learning on a large
> database of master-level human games preceded the reinforcement learning.
> For a game as complex as go, one can expect that the search space for the
> policy function would not be smooth at all, so supposedly supervised
> learning was necessary to guide the policy function to a good starting
> point before reinforcement. Without that, applying reinforcement learning
> directly to a random policy can easily leave the policy stuck at a bad
> local optimum. I could have a misunderstanding at this point; correct me
> if so, but to continue on: if it is hard to get "the good starting point",
> such as a policy trained from human expert game records, what is a way to
> devise one? I've had a look at NEAT and HyperNEAT, which are evolutionary
> methods. Do these evolutionary algorithms scale well to complex strategic
> decision processes, and not just to simple linear decisions such as food
> gathering and danger avoidance? In case not, what alternatives are known?
> Is there any success case of a chess, go, or any other complex strategic
> game-playing algorithm that gained expert strength without domain
> knowledge such as expert game examples?
>

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Minjae Kim
But those video games have a very simple optimal policy. Consider Super
Mario: if you see an enemy, step on it; if you see a hole, jump over it;
if you see a pipe sticking up, also jump over it; etc.

On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook  wrote:

> > ...if it is hard to have "the good starting point" such as a trained
> > policy from human expert game records, what is a way to devise one.
>
> My first thought was to look at the DeepMind research on learning to
> play video games (which I think either pre-dates the AlphaGo research,
> or was done in parallel with it): https://deepmind.com/research/dqn/
>
> It just learns from trial and error, no expert game records:
>
> http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge
>
> Darren
>
>
>
> --
> Darren Cook, Software Researcher/Developer
> My New Book: Practical Machine Learning with H2O:
>   http://shop.oreilly.com/product/0636920053170.do

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Darren Cook
> ...if it is hard to have "the good starting point" such as a trained
> policy from human expert game records, what is a way to devise one.

My first thought was to look at the DeepMind research on learning to
play video games (which I think either pre-dates the AlphaGo research,
or was done in parallel with it): https://deepmind.com/research/dqn/

It just learns from trial and error, no expert game records:

http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge

Darren



-- 
Darren Cook, Software Researcher/Developer
My New Book: Practical Machine Learning with H2O:
  http://shop.oreilly.com/product/0636920053170.do

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
Neural networks always have a lot of local optima, simply because they have a 
high degree of internal symmetry. That is, you can “permute” sets of 
coefficients and get the same function.
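
A tiny numpy illustration of that symmetry (toy sizes, my own example):
swapping two hidden units together with their incoming and outgoing weights
leaves the network's output unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input (3) -> hidden (4)
    W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # hidden (4) -> output (1)

    def net(x, W1, b1, W2, b2):
        hidden = np.tanh(W1 @ x + b1)
        return W2 @ hidden + b2

    # Swap hidden units 0 and 2, along with their incoming and outgoing weights.
    perm = [2, 1, 0, 3]
    x = rng.normal(size=3)
    print(net(x, W1, b1, W2, b2))                       # original network
    print(net(x, W1[perm], b1[perm], W2[:, perm], b2))  # permuted network: same output

Every such permutation is a distinct point in weight space computing the exact
same function, which is one reason the loss surface has so many equivalent optima.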

 

Don’t think of starting with expert training as a way to avoid local optima. It 
is a way to start training with some good examples so that learning can start 
at a higher level. But then you should continue with self-play.

 

Backgammon was trained to expert level based on self-play games that were 
initially random. Google “Tesauro backgammon” and you should be able to find a 
paper.

 

I don’t know NEAT and HyperNEAT; I will look them up. Thank you for the 
reference.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Minjae Kim
Sent: Friday, February 24, 2017 3:39 AM
To: computer-go@computer-go.org
Subject: [Computer-go] dealing with multiple local optima

 

I've recently read the AlphaGo paper, which used gradient-based
reinforcement learning to get stronger. The learning was successful enough
to beat a human master, but in this case supervised learning on a large
database of master-level human games preceded the reinforcement learning.
For a game as complex as go, one can expect that the search space for the
policy function would not be smooth at all, so supposedly supervised
learning was necessary to guide the policy function to a good starting
point before reinforcement. Without that, applying reinforcement learning
directly to a random policy can easily leave the policy stuck at a bad
local optimum. I could have a misunderstanding at this point; correct me if
so, but to continue on: if it is hard to get "the good starting point",
such as a policy trained from human expert game records, what is a way to
devise one? I've had a look at NEAT and HyperNEAT, which are evolutionary
methods. Do these evolutionary algorithms scale well to complex strategic
decision processes, and not just to simple linear decisions such as food
gathering and danger avoidance? In case not, what alternatives are known?
Is there any success case of a chess, go, or any other complex strategic
game-playing algorithm that gained expert strength without domain knowledge
such as expert game examples?


[Computer-go] dealing with multiple local optima

2017-02-24 Thread Minjae Kim
I've recently read the AlphaGo paper, which used gradient-based
reinforcement learning to get stronger. The learning was successful enough
to beat a human master, but in this case supervised learning on a large
database of master-level human games preceded the reinforcement learning.
For a game as complex as go, one can expect that the search space for the
policy function would not be smooth at all, so supposedly supervised
learning was necessary to guide the policy function to a good starting
point before reinforcement. Without that, applying reinforcement learning
directly to a random policy can easily leave the policy stuck at a bad
local optimum. I could have a misunderstanding at this point; correct me if
so, but to continue on: if it is hard to get "the good starting point",
such as a policy trained from human expert game records, what is a way to
devise one? I've had a look at NEAT and HyperNEAT, which are evolutionary
methods. Do these evolutionary algorithms scale well to complex strategic
decision processes, and not just to simple linear decisions such as food
gathering and danger avoidance? In case not, what alternatives are known?
Is there any success case of a chess, go, or any other complex strategic
game-playing algorithm that gained expert strength without domain knowledge
such as expert game examples?