Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency
Actually chess software is much, much better. I recall that today's software running on 1998 hardware beats 1998 software running on today's hardware. It was very soon after 1998 that ordinary PCs could play on a par with world champions.

-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of ?? ???
Sent: Friday, March 11, 2016 7:18 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency

I think that a desktop computer's calculating power appears to develop to the necessary level sooner than the algorithm can be optimized to use the power nowadays available. For example, I believe that chess programs run well on a desktop not because of a new, better algorithm, but because Deep Blue's 11.38 GFLOPS of power has been available on desktops since about 2006, in only ten years. So I think the speculation that Deep Mind will change the objective to a more advanced task is right :)

Dmitry

11.03.2016, 14:28, "Darren Cook":
>>> global, more long-term planning. A rumour so far suggests to have
>>> used the time for more learning, but I'd be surprised if this should
>>> have sufficed.
>>
>> My personal hypothesis so far is that it might - the REINFORCE might
>> scale amazingly well and just continuous application of it...
>
> Agreed. What they have built is a training data generator that can
> churn out 9-dan level moves, 24 hours a day. Over the years I've had
> to throw away so many promising ideas because they came down to
> needing a 9-dan pro to, say, do the tedious job of ranking all legal
> moves in each test position.
>
> What I'm hoping Deep Mind will do next is study how to maintain the
> same level while using less hardware, until they can shrink it down to
> run on, say, a high-end desktop computer. The knowledge gained
> obviously has a clear financial benefit just in running costs, and
> computer-go is a nice objective domain in which to measure progress.
> (But the cynic in me suspects they'll just move on to the next bright
> and shiny AI problem.)
>
> Darren

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency
I think that a desktop computer's calculating power appears to develop to the necessary level sooner than the algorithm can be optimized to use the power nowadays available. For example, I believe that chess programs run well on a desktop not because of a new, better algorithm, but because Deep Blue's 11.38 GFLOPS of power has been available on desktops since about 2006, in only ten years. So I think the speculation that Deep Mind will change the objective to a more advanced task is right :)

Dmitry

11.03.2016, 14:28, "Darren Cook":
>>> global, more long-term planning. A rumour so far suggests to have used the
>>> time for more learning, but I'd be surprised if this should have sufficed.
>>
>> My personal hypothesis so far is that it might - the REINFORCE might
>> scale amazingly well and just continuous application of it...
>
> Agreed. What they have built is a training data generator that can
> churn out 9-dan level moves, 24 hours a day. Over the years I've had to
> throw away so many promising ideas because they came down to needing a
> 9-dan pro to, say, do the tedious job of ranking all legal moves in each
> test position.
>
> What I'm hoping Deep Mind will do next is study how to maintain the same
> level while using less hardware, until they can shrink it down to run on,
> say, a high-end desktop computer. The knowledge gained obviously has a
> clear financial benefit just in running costs, and computer-go is a nice
> objective domain in which to measure progress. (But the cynic in me suspects
> they'll just move on to the next bright and shiny AI problem.)
>
> Darren
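[On the GFLOPS point above: the "about ten years" figure is roughly consistent with a compute-doubling sketch. Both the 1997 desktop baseline of 0.2 GFLOPS and the two-year doubling period below are illustrative assumptions, not measured data.]

```python
# Back-of-the-envelope check of "Deep Blue power on a desktop in ~10 years",
# assuming desktop compute doubles every 2 years (an assumption, not data).
deep_blue_gflops = 11.38    # Deep Blue (1997), figure quoted in the thread
desktop_1997_gflops = 0.2   # rough order of magnitude for a 1997 desktop CPU

years = 0
gflops = desktop_1997_gflops
while gflops < deep_blue_gflops:
    years += 2              # one doubling period
    gflops *= 2

print(years)                # ~12 years under these assumptions
```

Under these rough assumptions the gap closes in about a dozen years, i.e. the same order of magnitude as Dmitry's estimate.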
Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency
>> global, more long-term planning. A rumour so far suggests to have used the
>> time for more learning, but I'd be surprised if this should have sufficed.
>
> My personal hypothesis so far is that it might - the REINFORCE might
> scale amazingly well and just continuous application of it...

Agreed. What they have built is a training data generator that can churn out 9-dan level moves, 24 hours a day. Over the years I've had to throw away so many promising ideas because they came down to needing a 9-dan pro to, say, do the tedious job of ranking all legal moves in each test position.

What I'm hoping Deep Mind will do next is study how to maintain the same level while using less hardware, until they can shrink it down to run on, say, a high-end desktop computer. The knowledge gained obviously has a clear financial benefit just in running costs, and computer-go is a nice objective domain in which to measure progress. (But the cynic in me suspects they'll just move on to the next bright and shiny AI problem.)

Darren
Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency
On Fri, Mar 11, 2016 at 09:33:52AM +0100, Robert Jasiek wrote:
> On 11.03.2016 08:24, Huazuo Gao wrote:
> > Points at the center of the board indeed depend on the full board, but
> > points near the edge do not.
>
> I have been wondering why AlphaGo could improve a lot between the Fan Hui
> and Lee Sedol matches incl. learning sente and showing greater signs of more
> global, more long-term planning. A rumour so far suggests to have used the
> time for more learning, but I'd be surprised if this should have sufficed.

My personal hypothesis so far is that it might - the REINFORCE might scale amazingly well, and just continuous application of it (or possibly more frequent sampling to get more data points; once per game always seemed quite conservative to me) could make AlphaGo amazingly strong. We know that after 30 mil. self-play games, the RL value network bumps the strength by ~450 Elo, but what about after 300 mil. self-play games? (Possibly after training the RL policy further too.)

(My main clue for this was the comment that current AlphaGo self-play games are already looking quite different from human games. Another explanation for that might be that they found a way to replace the SL policy with the RL policy in the tree.)

--
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
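[For scale: under the standard Elo logistic model, the ~450 Elo bump mentioned above corresponds to the stronger side scoring about 93% against the weaker one. A minimal sketch of that formula:]

```python
# Expected score under the standard Elo model, to put the "~450 Elo"
# strength bump from the RL value network into win-rate terms.
def elo_expected_score(rating_diff: float) -> float:
    """Expected score for a player `rating_diff` Elo points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(round(elo_expected_score(450), 3))  # ~0.93: about a 93% expected score
```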
Re: [Computer-go] AlphaGo & DCNN: Handling long-range dependency
On 11.03.2016 08:24, Huazuo Gao wrote:
> Points at the center of the board indeed depend on the full board, but
> points near the edge do not.

I have been wondering why AlphaGo could improve a lot between the Fan Hui and Lee Sedol matches, incl. learning sente and showing greater signs of more global, more long-term planning. A rumour so far suggests to have used the time for more learning, but I'd be surprised if this should have sufficed. So far, I have the following theories:

- deeper net
- greater parameters for convolutional patterns (instead of 5x5 and 3x3, (also) use larger parameters), or combine the earlier parameters with additional larger parameters, or with an additional NN having only / mostly larger parameters
- replace or enhance top KGS games by 100,000+ pro games
- instead of / in addition to feed-forward nets, use long short-term memory nets (but I cannot know if this is advantageous considering presumably greater GPU time)
- instead of single position patterns, use combinations of the current position and later positions, for different (dynamic) parameters of time shift, so as to model long-term effects

--
robert jasiek
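[On the kernel-size theory and Gao's point about which board points "see" the whole board: for stride-1 convolutions, the receptive field grows by (kernel - 1) per layer, so even small kernels eventually cover the 19x19 board if the net is deep enough. A sketch, taking one 5x5 layer, eleven 3x3 layers and a final 1x1 layer as an approximation of the published AlphaGo policy network (treat the exact layer counts as an assumption):]

```python
# Receptive field of a stack of stride-1 convolutions: starts at 1 and
# grows by (kernel_size - 1) per layer.
def receptive_field(kernel_sizes):
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# Approximate AlphaGo policy-net shape (an assumption): 5x5, then 11 x 3x3,
# then a 1x1 output layer.
policy_net = [5] + [3] * 11 + [1]
rf = receptive_field(policy_net)
print(rf, rf >= 19)  # 27 True: every output point can, in principle,
                     # depend on the whole 19x19 board
```

So small kernels are not by themselves a hard limit on global planning; depth already buys full-board coverage, which is why larger kernels are only one of several plausible levers here.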