Re: [Computer-go] Teaching Deep Convolutional Neural Networks to Play Go

2015-03-17 Thread David Silver
Hi Oliver, Reinforcement learning is different from unsupervised learning. We used reinforcement learning to train the Atari game-playing network. We also published a more recent paper (www.nature.com/articles/nature14236) that applied the same network to 50 different Atari games (achieving human level in around

Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional Neural Networks

2014-12-22 Thread David Silver
Hi Martin - Would you be willing to share some of the sgf game records played by your network with the community? I tried to replay the game record in your paper, but got stuck since it does not show any of the moves that got captured. Sorry about that, we will correct the figure and repost.

Re: [computer-go] David Silvers Rave formula

2009-05-07 Thread David Silver
Hi Lars, is there anyone who can repost the pdf (rave.pdf?) the following mails are talking about? http://computer-go.org/pipermail/computer-go/2008-February/014095.html I think you can still find the original attachment here:

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-04 Thread David Silver
Hi, We used alpha=0.1. There may well be a better setting of alpha, but this appeared to work nicely in our experiments. -Dave On 3-May-09, at 2:01 AM, elife wrote: Hi Dave, In your experiments what's the constant value alpha you set? Thanks. 2009/5/1 David Silver sil...@cs.ualberta.ca

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-01 Thread David Silver
Hi Yamato, If M and N are the same, is there any reason to run M simulations and N simulations separately? What happens if you combine them and calculate V and g in the single loop? I think it gives the wrong answer to do it in a single loop. Note that the simulation outcomes z are used
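The point in this snippet is that the value estimate and the gradient estimate must come from separate batches of rollouts, because the update multiplies them together and correlated estimates would bias the product. A minimal sketch of that two-loop structure, using a hypothetical one-parameter toy policy (`simulate`, `balancing_update`, and the ±outcome coding are illustrative assumptions, not the paper's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta):
    """Hypothetical stand-in for one rollout: returns the outcome z in {0, 1}
    and the accumulated score function sum_t d/dtheta log pi(a_t | s_t)."""
    p = 1.0 / (1.0 + np.exp(-theta))      # toy one-parameter stochastic policy
    a = rng.random() < p
    z = 1.0 if a else 0.0                 # pretend the sampled action decides the game
    score = (1.0 - p) if a else -p        # d/dtheta log pi(a) for a Bernoulli policy
    return z, score

def balancing_update(theta, v_star, alpha=0.1, M=100, N=100):
    # First loop: estimate the simulation value V with M independent rollouts.
    v_hat = np.mean([simulate(theta)[0] for _ in range(M)])
    # Second loop: estimate the policy gradient g with N *fresh* rollouts.
    samples = [simulate(theta) for _ in range(N)]
    g_hat = np.mean([z * s for z, s in samples])
    # Reusing the same rollouts for both would bias E[(V* - V_hat) * g_hat],
    # because V_hat and g_hat would then be correlated.
    return theta + alpha * (v_star - v_hat) * g_hat
```

With M = N (as in the experiments), the two loops cost the same as one combined loop of 2M rollouts, so separating them buys unbiasedness at no extra simulation cost.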

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Remi, This is strange: you do not take lost playouts into consideration. I believe there is a problem with your estimation of the gradient. Suppose for instance that you count z = +1 for a win, and z = -1 for a loss. Then you would take lost playouts into consideration. This makes me

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Remi, I understood this. What I find strange is that using -1/1 should be equivalent to using 0/1, but your algorithm behaves differently: it ignores lost games with 0/1, and uses them with -1/1. Imagine you add a big constant to z. One million, say. This does not change the problem.
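Rémi's point can be checked exactly for a tiny softmax policy: because the score function has zero expectation, adding any constant to z leaves the expected REINFORCE gradient unchanged, so an estimator that behaves differently for 0/1 and -1/+1 coding is suspect. A small enumeration sketch (the two-action setup is an illustrative assumption):

```python
import numpy as np

# Two-action softmax policy with scalar preferences theta.
theta = np.array([0.3, -0.2])
p = np.exp(theta) / np.exp(theta).sum()

# Score function grad log pi(a): row a of (I - ones * p^T) is e_a - p.
score = np.eye(2) - p

z01 = np.array([1.0, 0.0])   # outcomes coded as 0/1 (action 0 wins)
zpm = 2 * z01 - 1            # the very same outcomes coded as +1/-1

# Exact expected gradients E[z * grad log pi(a)], computed by enumeration.
g01 = ((p * z01)[:, None] * score).sum(axis=0)
gpm = ((p * zpm)[:, None] * score).sum(axis=0)
# Since E[grad log pi] = 0, the "-1" shift contributes nothing:
# gpm = 2 * g01 exactly, i.e. the same direction, merely rescaled.
```

The shift only affects the variance of the estimator (it acts as a baseline), which is why lost playouts can legitimately carry zero weight under one coding and nonzero weight under another without changing what is being estimated.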

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Yamato, Thanks for the detailed explanation. M, N and alpha are constant numbers, right? What did you set them to? You're welcome! Yes, in our experiments they were just constant numbers M=N=100. The feature vector is the set of patterns you use, with value 1 if a pattern is matched and
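A binary pattern feature vector with a softmax simulation policy, as described in the snippet, can be sketched as follows. The `match` interface (returning the indices of patterns matched at a candidate move) is a hypothetical assumption for illustration, not the paper's actual code:

```python
import numpy as np

def softmax_policy(legal_moves, match, weights, rng=np.random.default_rng()):
    """Pick a move with probability proportional to exp(theta . phi), where
    phi is binary: phi[i] = 1 iff pattern i matches at that move.

    `match(move)` returns the indices of the patterns matched there, so the
    dot product reduces to summing the weights of the matched patterns."""
    scores = np.array([weights[match(m)].sum() for m in legal_moves])
    probs = np.exp(scores - scores.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return legal_moves[rng.choice(len(legal_moves), p=probs)]
```

With all weights zero this reduces to the uniform random policy, which is the natural starting point for the learning runs described in the thread.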

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
IMO other people's equations/code/ideas/papers always seem smarter than your own. The stuff you understand and do yourself just seems like common sense, and the stuff you don't always has a mystical air of complexity, at least until you understand it too :-) On 30-Apr-09, at 1:59 PM,

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver
Hi Yamato, Could you give us the source code which you used? Your algorithm is too complicated, so it would be very helpful if possible. Actually I think the source code would be much harder to understand! It is written inside RLGO, and makes use of a substantial existing framework that

[computer-go] Re: [computer go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver
Hi Remi, What komi did you use for 5x5 and 6x6 ? I used 7.5 komi for both board sizes. I find it strange that you get only 70 Elo points from supervised learning over uniform random. Don't you have any feature for atari extension ? This one alone should improve strength immensely (extend

[computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread David Silver
Hi Yamato, I like your idea, but why do you use only 5x5 and 6x6 Go? 1. Our second algorithm, two-ply simulation balancing, requires a training set of two-ply rollouts. Rolling out every position from a complete two-ply search is very expensive on larger board sizes, so we would probably

[computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread David Silver
Hi Remi, If I understand correctly, your method makes your program 250 Elo points stronger than my pattern-learning algorithm on 5x5 and 6x6, by just learning better weights. Yes, although this is just in a very simple MC setting. Also we did not compare directly to the algorithm you used

[computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread David Silver
Hi Michael, But one thing confuses me: You are using the value from Fuego's 10k simulations as an approximation of the actual value of the position. But isn't the actual value of the position either a win or a loss? On such small boards, can't you assume that Fuego is able to correctly

[computer-go] Re: RAVE formula of David Silver (reposted)

2008-11-28 Thread David Silver
This document is confusing, but here is my interpretation of it. And it works well for Valkyria. I would really want to see a pseudocode version of it. I might post the code I use for Valkyria, but it is probably not the same thing so I would probably just increase the confusion if I did... The

[computer-go] Paper for AAAI

2008-04-17 Thread David Silver
Hi Petr, Thanks for the great comments, sorry to be so slow in getting back to you (on vacation/workshop...) Hello, On Sun, Apr 06, 2008 at 08:55:26PM -0600, David Silver wrote: Here is a draft of the paper, any feedback would be very welcome :-) http://www.cs.ualberta.ca/~silver/research

[computer-go] Paper for AAAI

2008-04-06 Thread David Silver
Hi everyone, Sylvain and I have had a paper accepted for the Nectar track at the 23rd Conference on Artificial Intelligence (AAAI-08). The idea of this track is to summarise previously published results from a specific field to a wider audience interested in general AI. Please bear

[computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol 43, Issue 8)

2008-02-16 Thread David Silver
I am very confused about the new UCT-RAVE formula. Equation 9 seems to mean: variance_u = value_ur * (1 - value_ur) / n. Is it wrong? If correct, why is it the variance? I think that the variance of the UCT should be: variance_u = value_u * (1 - value_u). Hi Yamato, There are two
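The formula Yamato quotes is the variance of the *mean* of n Bernoulli outcomes, not of a single outcome: one playout result has variance p(1-p), and averaging n of them divides that by n. A sketch of this, together with a hypothetical inverse-MSE weighting of the UCT and RAVE estimates in the spirit of the thread (the exact schedule in Silver's rave.pdf may differ):

```python
def bernoulli_mean_variance(p, n):
    """Variance of the average of n Bernoulli(p) outcomes: p(1-p)/n.
    A single outcome has variance p(1-p); averaging n of them divides by n."""
    return p * (1.0 - p) / n

def combined_value(v_uct, n_uct, v_rave, n_rave, bias):
    """Illustrative inverse-MSE combination: each estimate is weighted by
    1 / (variance + bias^2). The RAVE estimate carries an extra squared-bias
    term `bias`, standing in for the br discussed later in the thread.
    Assumes both estimates are strictly between 0 and 1."""
    mse_u = bernoulli_mean_variance(v_uct, n_uct)
    mse_r = bernoulli_mean_variance(v_rave, n_rave) + bias ** 2
    w_u, w_r = 1.0 / mse_u, 1.0 / mse_r
    return (w_u * v_uct + w_r * v_rave) / (w_u + w_r)
```

As n_uct grows, the UCT variance shrinks toward zero while the RAVE term keeps its bias floor, so the combination automatically shifts weight from RAVE to UCT, which is the qualitative behaviour the formula is designed to produce.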

[computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol 43, Issue 8)

2008-02-16 Thread David Silver
Hi Erik, Thanks for the thought-provoking response! Yes, but why add upper confidence bounds to the rave values at all? If they really go down that fast, does it make much of a difference? According to the recent experiments in MoGo, you are right :-) However, I've seen slightly different

[computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol 43, Issue 8)

2008-02-16 Thread David Silver
David Silver wrote: BTW if anyone just wants the formula, and doesn't care about the derivation - then just use equations 11-14. Yes, I just want to use the formula. But I don't know what the bias is... How can I get the value of br? Sorry for the slow reply... The simplest answer

[computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread David Silver
In other words, UCT works well when the evaluation/playouts are strong. I believe there are still improvements possible to the UCT algorithm, as shown by the recent papers by the MoGo and CrazyStone authors, but what will really make a difference is the quality of the playouts. Sylvain said

[computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread David Silver
Seems like it should be up to the person in the other environment to adapt your successful algorithm (and notation/terminology) to their environment. But how do the other people in other environments find out about the algorithm? And find out that it is something they could use in their

[computer-go] Re: Explanation to MoGo paper wanted. (BackGammon Code)

2007-07-03 Thread David Silver
It's because Go is not the only game in the world, and certainly not the only reinforcement learning problem. They are using a widely accepted terminology. But a very inappropriate one. I have read Sutton's book, and all the things I know (e.g. TD-Gammon) are completely obfuscated. Really? I think

[computer-go] Re: Explanation to MoGo paper wanted. (BackGammon Code)

2007-07-03 Thread David Silver
It's because Go is not the only game in the world, and certainly not the only reinforcement learning problem. They are using a widely accepted terminology. But a very inappropriate one. I have read Sutton's book, and all the things I know (e.g. TD-Gammon) are completely obfuscated. It's maybe suitable

[computer-go] Re: Amsterdam 2007 paper

2007-05-21 Thread David Silver
On 5/18/07, Rémi Coulom [EMAIL PROTECTED] wrote: My idea was very similar to what you describe. The program built a collection of rules of the kind if condition then move. A condition could be anything from a tree-search rule of the kind in this particular position play x to a general rule such

[computer-go] Re: computer-go Digest, Vol 34, Issue 15

2007-05-18 Thread David Silver
Very interesting paper! I have one question. The assumption in your paper is that increasing the performance of the simulation player will increase the performance of Monte-Carlo methods that use that simulation player. However, we found in MoGo that this is not necessarily the case! Do

[computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread David Silver
I also use an online learning algorithm in RLGO to adjust feature weights during the game. I use around a million features (all possible patterns from 1x1 up to 3x3 at all locations on the board) and update the weights online from simulated games using temporal difference learning. I also
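The snippet describes online temporal-difference learning over around a million binary pattern features; the reason this scale is practical is that only the features active in the current position need to be touched on each update. A minimal TD(0) sketch under that assumption (the sparse index-list interface is illustrative, not RLGO's actual code):

```python
import numpy as np

def td0_sparse_update(weights, active, active_next, reward,
                      alpha=0.01, gamma=1.0, terminal=False):
    """One TD(0) step with binary features stored as index arrays.
    V(s) is the sum of weights at the active indices, so only those
    entries are read and written; indices are assumed unique."""
    v = weights[active].sum()
    v_next = 0.0 if terminal else weights[active_next].sum()
    delta = reward + gamma * v_next - v          # TD error
    weights[active] += alpha * delta             # update only the matched patterns
    return delta
```

With 1x1 through 3x3 patterns at every board location, each position activates only a handful of the million features, so a full update is a few dozen array accesses rather than a million.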

[computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread David Silver
Thanks for the great paper. And thanks for sharing it before it's published. Now I know what directions to take my engine in next. Time for Team MoGo to share some more secrets :) We are publishing MoGo's secrets at ICML 2007, in just over a month. So not long to wait now! -Dave