Re: [Computer-go] Learning related stuff

2017-11-29 Thread Ray Tayek

On 11/29/2017 6:15 PM, Dave Dyer wrote:


My question is this; people have been messing around with neural nets
and machine learning for 40 years; what was the breakthrough that made
alphago succeed so spectacularly.



Maybe it was 
https://en.wikipedia.org/wiki/Vanishing_gradient_problem#Residual_networks. 
Residual networks are pretty new, I think. Or some combination of things.


thanks

--
Honesty is a very expensive gift. So, don't expect it from cheap people 
- Warren Buffett

http://tayek.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Learning related stuff

2017-11-29 Thread Darren Cook
> My question is this; people have been messing around with neural nets
> and machine learning for 40 years; what was the breakthrough that made
> alphago succeed so spectacularly.

5 or 6 orders of magnitude more CPU power (relative to the late 90s) (*).

This means you can try out ideas to see if they work, and get the answer
back in hours, rather than years.

After 10 hrs it was playing with an Elo somewhere between 0 and 1000
(Figure 3 in the AlphaGo Zero paper), i.e. idiot level. That is
something like 1100 years of effort on 1995 hardware.
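Darren's figure checks out as rough arithmetic. A back-of-envelope sketch, assuming the 10^6x (6 orders of magnitude) speedup estimate above:

```python
# Back-of-envelope check of the "1100 years" figure, assuming a
# 10^6x speedup (6 orders of magnitude) over 1995 hardware.
speedup = 10 ** 6
hours_now = 10                        # AlphaGo Zero's first 10 hours
hours_1995 = hours_now * speedup      # equivalent compute time in 1995
years_1995 = hours_1995 / (24 * 365)  # convert hours to years
print(round(years_1995))              # -> 1142, i.e. "something like 1100 years"
```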

They put together a large team (by hobbyist computer go standards) of
top people, at least two of whom had made strong go programs before.

I'd name two other things: dropout (and other regularization techniques)
allowed deeper networks; the work on image recognition gave you
production-ready CNNs, without having to work through all the dead ends
yourself. Also better optimization techniques. Taken together maybe
algorithmic advances are worth another order of magnitude.

Darren

*: The source is the intro to my own book ;-) From memory, I made the
estimate as the average of top supercomputers 20 years apart, and of a
typical high-end PC 20 years apart.
https://en.wikipedia.org/wiki/History_of_supercomputing#Historical_TOP500_table

-- 
Darren Cook, Software Researcher/Developer
My New Book: Practical Machine Learning with H2O:
  http://shop.oreilly.com/product/0636920053170.do

Re: [Computer-go] Learning related stuff

2017-11-29 Thread Dave Dyer

My question is this; people have been messing around with neural nets
and machine learning for 40 years; what was the breakthrough that made
alphago succeed so spectacularly.



Re: [Computer-go] Learning related stuff

2017-11-29 Thread uurtamo .
It's nearly comic to imagine a player at 1-1 trying to figure things out.

It's not a diss on you; I honestly want people to relax, take a minute,
and treat badmouthing the AlphaGo team's ideas as a secondary
consideration. They did good work. Arguing about the essentials probably
won't prove that they're stupid in any way. So let's learn, move forward,
and have no bad words about their ridiculously well-funded effort.

Recreating their work at a smaller scale would be awesome.

s.

On Nov 29, 2017 4:33 PM, "Eric Boesch"  wrote:

> Could you be reading too much into my comment? AlphaGo Zero is an amazing
> achievement, and I might guess its programmers will succeed in applying
> their methods to other fields. Nonetheless, I thought it was interesting,
> and it would appear the programmers did too, that before improving to
> superhuman level, AlphaGo was temporarily stuck in a rut of playing
> literally the worst first move on the board (excluding pass). That doesn't
> mean I think I could do better.
>
>
> On Tue, Nov 28, 2017 at 4:50 AM, uurtamo .  wrote:
>
>> This is starting to feel like asking along the lines of, "how can I
>> explain this to myself or improve on what's already been done in a way that
>> will make this whole process work faster on my hardware".
>>
>> It really doesn't look like there are a bunch of obvious shortcuts.
>> That's the lesson of the decision trees humans imposed on the game for
>> 20+ years; they weren't really better.
>>
>> Probably what would be good to convince oneself of these things would be
>> to challenge each assumption in divergent branches (suggested earlier) and
>> watch the resulting players' strength over time. Yes, this might take a
>> year or more on your hardware.
>>
>> I feel like maybe a lot of this is sour grapes; let's please again
>> acknowledge that the hobbyists aren't there yet without trying to tear down
>> the accomplishments of others.
>>
>> s.
>>
>> On Nov 27, 2017 7:36 PM, "Eric Boesch"  wrote:
>>
>>> I imagine implementation determines whether transferred knowledge is
>>> helpful. It's like asking whether forgetting is a problem -- it often is,
>>> but evidently not for AlphaGo Zero.
>>>
>>> One crude way to encourage stability is to include an explicit or
>>> implicit age parameter that forces the program to perform smaller
>>> modifications to its state during later stages. If the parameters you copy
>>> from problem A to problem B also include that age parameter, so the network
>>> acts old even though it is faced with a new problem, then its initial
>>> exploration may be inefficient. For an MCTS based example, if a MCTS node
>>> is initialized to a 10877-6771 win/loss record based on evaluations under
>>> slightly different game rules, then with a naive implementation, even if
>>> the program discovers the right refutation under the new rules right away,
>>> it would still need to revisit that node thousands of times to convince
>>> itself the node is now probably a losing position.
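The stale-statistics effect described in the quoted paragraph can be sketched numerically. A hypothetical illustration using the 10877-6771 record from the example, assuming a naive running mean over visits:

```python
# A node preloaded with a 10877-6771 win/loss record under the old rules.
# Even if every new evaluation under the new rules is a loss, the naive
# running mean only creeps downward one visit at a time.
wins = 10877
visits = 10877 + 6771          # 17648 prior visits carried over
new_losses = 0
while wins / visits > 0.5:     # how many losses until it merely looks even?
    visits += 1                # each new visit scores a loss (wins unchanged)
    new_losses += 1
print(new_losses)              # -> 4106 losing visits just to reach 50%
```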
>>>
>>> But unlearning bad plans in a reasonable time frame is already a feature
>>> you need from a good learning algorithm. Even AlphaGo almost fell into trap
>>> states; from their paper, it appears that it stuck with 1-1 as an opening
>>> move for much longer than you would expect from a program probably already
>>> much better than 40 kyu. Even if it's unrealistic for Go specifically, you
>>> could imagine some other game where after days of analysis, the program
>>> suddenly discovers a reliable trick that adds one point for white to every
>>> single game. The effect would be the same as your komi change -- a mature
>>> network now needs to adapt to a general shift in the final score. So the
>>> task of adapting to handle similar games may be similar to the task of
>>> adapting to analysis reversals within a single game, and improvements to
>>> one could lead to improvements to the other.
>>>
>>>
>>>
>>> On Fri, Nov 24, 2017 at 7:54 AM, Stephan K 
>>> wrote:
>>>
 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <3-hirn-ver...@gmx.de>:
 > My understanding is that the AlphaGo hardware is standing
 > somewhere in London, idle and waiting for new action...
 >
 > Ingo.

 The announcement at
 https://deepmind.com/blog/applying-machine-learning-mammography/ seems
 to disagree:

 "Our partners in this project wanted researchers at both DeepMind and
 Google involved in this research so that the project could take
 advantage of the AI expertise in both teams, as well as Google’s
 supercomputing infrastructure - widely regarded as one of the best in
 the world, and the same global infrastructure that powered DeepMind’s
 victory over the world champion at the ancient game of Go."

Re: [Computer-go] Learning related stuff

2017-11-29 Thread Eric Boesch
Could you be reading too much into my comment? AlphaGo Zero is an amazing
achievement, and I might guess its programmers will succeed in applying
their methods to other fields. Nonetheless, I thought it was interesting,
and it would appear the programmers did too, that before improving to
superhuman level, AlphaGo was temporarily stuck in a rut of playing
literally the worst first move on the board (excluding pass). That doesn't
mean I think I could do better.


On Tue, Nov 28, 2017 at 4:50 AM, uurtamo .  wrote:

> This is starting to feel like asking along the lines of, "how can I
> explain this to myself or improve on what's already been done in a way that
> will make this whole process work faster on my hardware".
>
> It really doesn't look like there are a bunch of obvious shortcuts. That's
> the lesson of the decision trees humans imposed on the game for 20+ years;
> they weren't really better.
>
> Probably what would be good to convince oneself of these things would be
> to challenge each assumption in divergent branches (suggested earlier) and
> watch the resulting players' strength over time. Yes, this might take a
> year or more on your hardware.
>
> I feel like maybe a lot of this is sour grapes; let's please again
> acknowledge that the hobbyists aren't there yet without trying to tear down
> the accomplishments of others.
>
> s.
>
> On Nov 27, 2017 7:36 PM, "Eric Boesch"  wrote:
>
>> I imagine implementation determines whether transferred knowledge is
>> helpful. It's like asking whether forgetting is a problem -- it often is,
>> but evidently not for AlphaGo Zero.
>>
>> One crude way to encourage stability is to include an explicit or
>> implicit age parameter that forces the program to perform smaller
>> modifications to its state during later stages. If the parameters you copy
>> from problem A to problem B also include that age parameter, so the network
>> acts old even though it is faced with a new problem, then its initial
>> exploration may be inefficient. For an MCTS based example, if a MCTS node
>> is initialized to a 10877-6771 win/loss record based on evaluations under
>> slightly different game rules, then with a naive implementation, even if
>> the program discovers the right refutation under the new rules right away,
>> it would still need to revisit that node thousands of times to convince
>> itself the node is now probably a losing position.
>>
>> But unlearning bad plans in a reasonable time frame is already a feature
>> you need from a good learning algorithm. Even AlphaGo almost fell into trap
>> states; from their paper, it appears that it stuck with 1-1 as an opening
>> move for much longer than you would expect from a program probably already
>> much better than 40 kyu. Even if it's unrealistic for Go specifically, you
>> could imagine some other game where after days of analysis, the program
>> suddenly discovers a reliable trick that adds one point for white to every
>> single game. The effect would be the same as your komi change -- a mature
>> network now needs to adapt to a general shift in the final score. So the
>> task of adapting to handle similar games may be similar to the task of
>> adapting to analysis reversals within a single game, and improvements to
>> one could lead to improvements to the other.
>>
>>
>>
>> On Fri, Nov 24, 2017 at 7:54 AM, Stephan K 
>> wrote:
>>
>>> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <3-hirn-ver...@gmx.de>:
>>> > My understanding is that the AlphaGo hardware is standing
>>> > somewhere in London, idle and waiting for new action...
>>> >
>>> > Ingo.
>>>
>>> The announcement at
>>> https://deepmind.com/blog/applying-machine-learning-mammography/ seems
>>> to disagree:
>>>
>>> "Our partners in this project wanted researchers at both DeepMind and
>>> Google involved in this research so that the project could take
>>> advantage of the AI expertise in both teams, as well as Google’s
>>> supercomputing infrastructure - widely regarded as one of the best in
>>> the world, and the same global infrastructure that powered DeepMind’s
>>> victory over the world champion at the ancient game of Go."

[Computer-go] Update Odin Zero Iteration 4 9x9 on CGOS

2017-11-29 Thread valkyria
The new Iteration 4 network has 25000 games added to the game pool based 
on the first Iteration 3 network, but note there is a delay in learning 
since the games are generated and added in the background.


From visual inspection the prior distribution looks more finished: 
really bad moves have close to zero probability. But the most important 
question is whether the high-probability moves are strong or not.


On CGOS it seems both _I4 versions (full search and 5000 simulations) 
have a 55% winrate over their respective previous versions. But against 
other programs I4 seems equal or even slightly worse.


It could be that the I3 versions won games by playing very 
unconventional moves, and when I4 moves into more normal games, it gets 
beaten by programs that are tuned to play those variations better. Pure 
speculation!


I also had to restart network training for 9x9 komi 7.0 and 5.5 because 
I got a "NaN" as the reported loss value; from testing it looked like 
the networks might have exploded for some reason. My suspicion is that I 
use a very small batch size to avoid overfitting (a lower batch size 
increases randomness and avoids sharp local minima that can lead to 
overfitting), but this might also cause instability. So I reduced the 
learning rate by a factor of 2.
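A minimal sketch of the kind of guard this describes (function names are hypothetical, not Odin's actual code): when the reported loss goes NaN, roll back to the last good checkpoint and halve the learning rate.

```python
import math

def guard_lr(loss, lr, restore_checkpoint):
    """If the reported loss exploded to NaN/inf, roll back and halve lr."""
    if math.isnan(loss) or math.isinf(loss):
        restore_checkpoint()   # hypothetical: reload the last good weights
        return lr * 0.5        # reduce the learning rate by a factor of 2
    return lr

lr = 0.01
lr = guard_lr(float("nan"), lr, restore_checkpoint=lambda: None)
print(lr)  # -> 0.005
```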


Yesterday I also made the 13x13 training play fewer games per iteration, 
with faster games in selfplay, because it will take an eternity 
otherwise. It still has not generated the I3 network.


On 19x19, which is already running fast training with only 1 games 
from the first iteration, it seems the network has currently learned to 
predict the forced moves generated by Odin. Perhaps a little too much 
for my taste. In the extreme, the network will just copy the Monte Carlo 
playouts of Odin...


There is a risk in my experiment that the networks will just become 
strongly biased toward the original MCTS evaluation of Odin without 
learning any deeper knowledge about go.


Best
Magnus Persson