Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-11 Thread Brian Sheppard
Actually, the chess software itself is much, much better. I recall that
today's software running on 1998 hardware beats 1998 software running on
today's hardware.

It was very soon after 1998 that ordinary PCs could play on a par with
world champions.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Рождественский Дмитрий
Sent: Friday, March 11, 2016 7:18 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

I think that a desktop computer's computing power tends to reach the
necessary level sooner than the algorithms can be optimized to exploit the
power that is nowadays available. For example, I believe that chess programs
run well on a desktop not because of a new, better algorithm but because
Deep Blue's 11.38 GFLOPS became available on the desktop from about 2006,
in only ten years. So I think the speculation that Deep Mind will change
the objective to a more advanced task is right :)

Dmitry

11.03.2016, 14:28, "Darren Cook" :
>>>  global, longer-term planning. One rumour suggests they simply used
>>>  the time for more learning, but I'd be surprised if that alone would
>>>  have sufficed.
>>
>>  My personal hypothesis so far is that it might: REINFORCE might
>>  scale amazingly well, and just continued application of it...
>
> Agreed. What they have built is a training-data generator that can
> churn out 9-dan-level moves, 24 hours a day. Over the years I've had
> to throw away so many promising ideas because they came down to
> needing a 9-dan pro to, say, do the tedious job of ranking all legal
> moves in each test position.
>
> What I'm hoping Deep Mind will do next is study how to maintain the
> same level while using less hardware, until they can shrink it down to
> run on, say, a high-end desktop computer. The knowledge gained would
> have a clear financial benefit just in running costs, and computer-go
> is a nice, objective domain in which to measure progress. (But the
> cynic in me suspects they'll just move on to the next bright and shiny
> AI problem.)
>
> Darren
>

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-11 Thread Рождественский Дмитрий
I think that a desktop computer's computing power tends to reach the
necessary level sooner than the algorithms can be optimized to exploit the
power that is nowadays available. For example, I believe that chess programs
run well on a desktop not because of a new, better algorithm but because
Deep Blue's 11.38 GFLOPS became available on the desktop from about 2006,
in only ten years. So I think the speculation that Deep Mind will change
the objective to a more advanced task is right :)
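
As a rough sanity check of that timeline, a short Python sketch; the 1997
desktop figure and the 18-month doubling period are my own back-of-envelope
assumptions, only the 11.38 GFLOPS figure comes from this thread:

    # Back-of-envelope check of the "only ten years" claim.
    deep_blue_gflops = 11.38    # Deep Blue figure quoted above
    desktop_gflops = 0.2        # assumed order of magnitude for a 1997 desktop
    doubling_years = 1.5        # assumed Moore's-law-style doubling period

    years = 0.0
    while desktop_gflops < deep_blue_gflops:
        desktop_gflops *= 2     # one doubling of desktop performance
        years += doubling_years

    print(f"Desktop matches Deep Blue roughly {years:.0f} years after 1997")
    # -> about 9 years, i.e. around 2006, consistent with the claim above.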

Dmitry

11.03.2016, 14:28, "Darren Cook" :
>>>  global, longer-term planning. One rumour suggests they simply used the
>>>  time for more learning, but I'd be surprised if that alone would have sufficed.
>>
>>  My personal hypothesis so far is that it might: REINFORCE might
>>  scale amazingly well, and just continued application of it...
>
> Agreed. What they have built is a training-data generator that can
> churn out 9-dan-level moves, 24 hours a day. Over the years I've had to
> throw away so many promising ideas because they came down to needing a
> 9-dan pro to, say, do the tedious job of ranking all legal moves in each
> test position.
>
> What I'm hoping Deep Mind will do next is study how to maintain the same
> level while using less hardware, until they can shrink it down to run on,
> say, a high-end desktop computer. The knowledge gained would have a clear
> financial benefit just in running costs, and computer-go is a nice,
> objective domain in which to measure progress. (But the cynic in me
> suspects they'll just move on to the next bright and shiny AI problem.)
>
> Darren
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-11 Thread Darren Cook
>> global, longer-term planning. One rumour suggests they simply used the
>> time for more learning, but I'd be surprised if that alone would have sufficed.
> 
> My personal hypothesis so far is that it might: REINFORCE might
> scale amazingly well, and just continued application of it...

Agreed. What they have built is a training-data generator that can
churn out 9-dan-level moves, 24 hours a day. Over the years I've had to
throw away so many promising ideas because they came down to needing a
9-dan pro to, say, do the tedious job of ranking all legal moves in each
test position.
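
(For concreteness, a hypothetical sketch of that idea in Python/PyTorch:
use a strong policy network as the tireless "9-dan labeller" and rank every
legal move in a test position by its policy score. The network interface
and point encoding here are my own placeholders, not anything from the
AlphaGo paper.)

    import torch

    def rank_legal_moves(policy_net, board_tensor, legal_points):
        # policy_net: any network mapping a (1, planes, 19, 19) board
        # encoding to (1, 361) move logits -- a stand-in for the strong
        # policy, not AlphaGo's actual interface.
        # legal_points: iterable of point indices (0..360) legal here.
        with torch.no_grad():
            logits = policy_net(board_tensor).squeeze(0)   # (361,) scores
        return sorted(legal_points, key=lambda p: logits[p].item(),
                      reverse=True)                        # best point first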

What I'm hoping Deep Mind will do next is study how to maintain the same
level while using less hardware, until they can shrink it down to run on,
say, a high-end desktop computer. The knowledge gained would have a clear
financial benefit just in running costs, and computer-go is a nice,
objective domain in which to measure progress. (But the cynic in me
suspects they'll just move on to the next bright and shiny AI problem.)
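
(One plausible route to "same strength, less hardware" would be
distillation: train a small student network to match the big network's move
distribution. Nothing in this thread says DeepMind would do that; this is a
minimal sketch under that assumption, with PyTorch and placeholder networks.)

    import torch
    import torch.nn.functional as F

    def distill_step(student, teacher, positions, optimizer, temperature=1.0):
        # One training step: pull the student's move distribution toward
        # the (frozen) teacher's distribution via KL divergence.
        with torch.no_grad():
            t_logits = teacher(positions) / temperature
        s_logits = student(positions) / temperature
        loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.softmax(t_logits, dim=-1),
                        reduction="batchmean")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()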

Darren

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-11 Thread Petr Baudis
On Fri, Mar 11, 2016 at 09:33:52AM +0100, Robert Jasiek wrote:
> On 11.03.2016 08:24, Huazuo Gao wrote:
> > Points at the center of the board indeed depend on the full board, but
> > points near the edge do not.
> 
> I have been wondering why AlphaGo could improve so much between the Fan Hui
> and Lee Sedol matches, including learning sente and showing greater signs of
> more global, longer-term planning. One rumour suggests they simply used the
> time for more learning, but I'd be surprised if that alone would have sufficed.

My personal hypothesis so far is that it might: REINFORCE might
scale amazingly well, and just continued application of it (or possibly
more frequent sampling to get more data points; once per game always
seemed quite conservative to me) could make AlphaGo amazingly strong.
We know that after 30 million self-play games, the RL value network bumps
the strength by ~450 Elo, but what about after 300 million self-play games?
(Possibly after training the RL policy further, too.)
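
(To be concrete about the update I mean, a minimal REINFORCE sketch in
PyTorch. This is my simplified reading, not DeepMind's code: it omits any
baseline for variance reduction and glosses over the per-colour sign
handling within a game.)

    import torch
    import torch.nn.functional as F

    def reinforce_update(policy, optimizer, games):
        # games: list of (states, moves, outcome) from self-play, where
        # states is a (T, planes, 19, 19) tensor, moves a (T,) LongTensor
        # of chosen point indices, and outcome is +1 for a win / -1 for a
        # loss from the perspective of the player whose moves these are.
        loss = 0.0
        for states, moves, outcome in games:
            logp = F.log_softmax(policy(states), dim=-1)    # (T, 361)
            chosen = logp[torch.arange(len(moves)), moves]  # played moves
            loss = loss - outcome * chosen.sum()            # ascend reward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

(For scale, a 450 Elo gap corresponds to an expected score of
1 / (1 + 10^(-450/400)) ≈ 0.93 for the stronger side.)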

(My main clue for this was the comment that current AlphaGo self-play
games already look quite different from human games.  Another
explanation for that might be that they found a way to replace the SL
policy with the RL policy in the tree.)

-- 
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

2016-03-11 Thread Robert Jasiek

On 11.03.2016 08:24, Huazuo Gao wrote:
> Points at the center of the board indeed depend on the full board, but
> points near the edge do not.

I have been wondering why AlphaGo could improve so much between the Fan
Hui and Lee Sedol matches, including learning sente and showing greater
signs of more global, longer-term planning. One rumour suggests they
simply used the time for more learning, but I'd be surprised if that
alone would have sufficed. So far, I have the following theories:


- deeper net
- larger convolution kernels: instead of 5x5 and 3x3, (also) use larger
kernel sizes, or combine the earlier kernel sizes with additional larger
ones, or add a further NN having only / mostly larger kernels (see the
sketch after this list)
- replace or supplement the top KGS games with 100,000+ pro games
- instead of / in addition to feed-forward nets, use long short-term
memory nets (but I cannot know whether this is advantageous, considering
the presumably greater GPU time)
- instead of patterns over single positions, use combinations of the
current position and later positions, with different (dynamic) time-shift
parameters, so as to model long-term effects
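
To make the second theory concrete, a minimal sketch (my own illustration,
assuming PyTorch; the channel counts are arbitrary) of a block that runs
3x3, 5x5, and 7x7 kernels side by side and concatenates the results:

    import torch
    import torch.nn as nn

    class MultiKernelBlock(nn.Module):
        # Parallel 3x3 / 5x5 / 7x7 convolutions over the board planes,
        # concatenated Inception-style; padding keeps the 19x19 shape.
        def __init__(self, in_ch=48, out_ch_per_branch=32):
            super().__init__()
            self.branches = nn.ModuleList(
                [nn.Conv2d(in_ch, out_ch_per_branch, k, padding=k // 2)
                 for k in (3, 5, 7)])

        def forward(self, x):                # x: (N, in_ch, 19, 19)
            return torch.relu(
                torch.cat([b(x) for b in self.branches], dim=1))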


--
robert jasiek
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go