Re: [computer-go] ELO Ratings of move pattern

2007-12-26 Thread Jason House
I was doing an in-depth review of Hunter's paper over the extended
weekend...  I think I now see the value of the MM algorithms, and
understand the divergence problem.

MM has the nice property that convergence is guaranteed.  Newton-Raphson
(N-R) carries no such guarantee.  The paper includes a simple (83 variable?)
example where N-R fails.  That doesn't mean that it has no value, but any
self-respecting implementation may need to replace N-R steps with MM
steps if N-R fails to increase the objective function (Hunter calls this
modified Newton-Raphson).

There are a few critical properties of a minorization function.  Below I use
Q to represent the minorized version, and L to represent the original
objective function.
 1. It must be guaranteed that Q( old x ) == L( old x )
 2. It must be guaranteed that L( new x ) >= Q( new x )
 3. It must be guaranteed that Q( new x ) > Q( old x )

Combining those together:
 L( new x ) >= Q( new x ) > Q( old x ) == L( old x )
... or L( new x ) > L( old x ), ensuring forward progress.
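These three properties can be checked numerically on a toy one-parameter Bradley-Terry objective (all numbers below are hypothetical; the minorizer replaces -log(gamma + c) with its tangent line, which lies below it because -log is convex):

```python
import math

# Toy 1-D Bradley-Terry log-likelihood: gamma won w of n games
# against an opponent pool of total strength c (hypothetical numbers).
w, n, c = 3.0, 10.0, 1.0

def L(g):                        # objective (log-likelihood)
    return w * math.log(g) - n * math.log(g + c)

def Q(g, g0):                    # minorizer: tangent to -log(g + c) at g0
    return w * math.log(g) - n * (math.log(g0 + c) + (g - g0) / (g0 + c))

g0 = 0.5                         # current estimate
g1 = w * (g0 + c) / n            # argmax of Q: the MM update

assert abs(Q(g0, g0) - L(g0)) < 1e-12   # property 1: Q(old x) == L(old x)
assert L(g1) >= Q(g1, g0)               # property 2: L(new x) >= Q(new x)
assert Q(g1, g0) > Q(g0, g0)            # property 3: Q(new x) > Q(old x)
assert L(g1) > L(g0)                    # hence guaranteed ascent
```

Iterating g1 = w*(g0 + c)/n is exactly the MM update for this toy case; each step is guaranteed to climb L.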

Properties 1 and 2 are basic requirements for minorized functions.  #3 is a
criterion for selecting between candidate minorized functions.  The math of
the new function must remain easy for this to be achievable.  Usually this
means defining a function that is easy to maximize.  In practice this is
further translated to mean that the partial derivatives of Q are independent
of all other variables being adjusted.  Without this, optimizing variables
independently (and doing a batch update) breaks property 3 and the
convergence guarantees are lost.

This completely explains the divergence Remi observed and rationalizes why
his heuristic is correct (don't update more than one member of any team).
That restores the independence that makes finding a true maximum easy.

I've convinced myself that exactly replicating Remi's solution is the
correct first step in doing the ELO computations.  If speed is a factor, my
second step may be to compute multiple update strategies (such as N-R) and
then use whichever gives the largest jump in performance.
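The "replace N-R with MM when it fails to climb" safeguard mentioned above can be sketched on the same kind of toy one-parameter objective (all numbers hypothetical; take the N-R step only when it actually increases the objective, otherwise fall back to the MM step):

```python
import math

# Sketch of Hunter's "modified Newton-Raphson" safeguard on a toy
# 1-D objective (hypothetical Bradley-Terry setup: w wins out of n
# games against an opponent pool of total strength c).
w, n, c = 3.0, 10.0, 1.0
L   = lambda g: w * math.log(g) - n * math.log(g + c)
dL  = lambda g: w / g - n / (g + c)           # first derivative
d2L = lambda g: -w / g**2 + n / (g + c)**2    # second derivative

def safeguarded_step(g):
    nr = g - dL(g) / d2L(g)                   # Newton-Raphson proposal
    if nr > 0 and L(nr) > L(g):               # accept only if it climbs
        return nr
    return w * (g + c) / n                    # otherwise fall back to MM

g = 0.5
for _ in range(20):
    g = safeguarded_step(g)
# converges to the analytic maximum w*c/(n - w)
assert abs(g - w * c / (n - w)) < 1e-6
```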

On Dec 21, 2007 2:07 PM, Rémi Coulom [EMAIL PROTECTED] wrote:

 Jason House wrote:
  For each iteration of batch updates, do you pass over the data
  computing whatever you can and then pick a set of mutually exclusive
  items to update?  How bad would it be to update everything, including
  those on the same team?

 The algorithm may not converge. In the beginning, when I had only a few
 features, I updated everything at the same time, and it worked. But when
 I added more features, it diverged.


 One advantage of MM is that it has proven convergence. It is
 difficult to get such a guarantee with approximate Newton.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] ELO Ratings of move pattern

2007-12-21 Thread Jason House
On Dec 21, 2007 8:53 AM, Rémi Coulom [EMAIL PROTECTED] wrote:

 Hi,

 Minorization-maximization is a simple optimization method, and I agree
 that it is likely that more efficient algorithms can be applied.

 Newton's method implies estimating the inverse of the Hessian matrix.
 Really computing the inverse has a cost cubic in the size of the matrix,
 so it is not practical with tens of thousands of parameters.

 Nevertheless, Newton's method can be applied to each parameter one by
 one. If I understand correctly, this is what Jason proposed. One step of
 Newton's method on just one parameter has a computational cost very
 similar to MM, and is much more efficient (it should converge in just
 one or two iterations).


You understand correctly.  Mine was based on what I remembered of Newton's
method from high school.  We didn't mess with Hessian matrices!  Hunter's
paper [1] does compare MM with the matrix form of the Newton-Raphson
method.  It shows convergence in far fewer iterations (6 vs. 26 in an
example with 82 parameters).

Given that doing one parameter at a time may be less than ideal, I don't
know whether my method would really inherit those properties or not.



 Still, MM has the advantage that it can be applied in batch mode, thus
 saving a lot of computation.


What do you mean by batch mode?  Are you implying something like what's at
the top of page 387 in Hunter's paper [1]?  I initially interpreted it as
that, but I don't really believe that any more.  It looks like too much work
to be useful.

For each iteration of batch updates, do you pass over the data computing
whatever you can and then pick a set of mutually exclusive items to update?
How bad would it be to update everything, including those on the same team?

In what I'm envisioning for my method, computations of 1/Ej would also be
shared across computations.  I'd also keep a running sum of the selection
probability and the square of the selection probability.  Once at the end of
the data, I'd apply my equation to compute the update of all parameters.


[1]
http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1079120141

Re: [computer-go] ELO Ratings of move pattern

2007-12-21 Thread Álvaro Begué
On Dec 21, 2007 8:53 AM, Rémi Coulom [EMAIL PROTECTED] wrote:

 Hi,

 Minorization-maximization is a simple optimization method, and I agree
 that it is likely that more efficient algorithms can be applied.

 Newton's method implies estimating the inverse of the Hessian matrix.
 Really computing the inverse has a cost cubic in the size of the matrix,
 so it is not practical with tens of thousands of parameters.


Well, you don't need the exact solution to the system of linear equations,
so you can apply an iterative method instead of doing full Gaussian
elimination. Also, I wasn't thinking of having tens of thousands of
parameters... How much data do you intend to use to get decent coefficients
in a regression with that many coefficients to be determined? Aren't they
going to be extremely noisy?

I am sure MM is a perfectly good algorithm for this purpose, but it has the
serious downside that I don't understand it. :) I do understand the general
idea behind it and how it works in some simple cases, but I don't know
enough to adapt it to my particular needs.

Nevertheless, Newton's method can be applied to each parameter one by
 one. If I understand correctly, this is what Jason proposed. One step of
 Newton's method on just one parameter has a computational cost very
 similar to MM, and is much more efficient (it should converge in just
 one or two iterations).


Hmmm... That sounds similar to just considering the diagonal part of the
Hessian matrix, or something of that sort. If the features are sort of
orthogonal in some sense (uncorrelated?), this should be a good
approximation, but I don't know how well this would apply to our particular
regression problems.

Remi, do you dump the training data to a simple text file and then feed
that to a program that does the regression, or does the program compute it
on the fly from the database of positions?

Álvaro.

Re: [computer-go] ELO Ratings of move pattern

2007-12-21 Thread Jason House
On Dec 21, 2007 10:03 AM, Álvaro Begué [EMAIL PROTECTED] wrote:

 I am sure MM is a perfectly good algorithm for this purpose, but it has
 the serious down side that I don't understand it. :) I do understand the
 general idea behind it and how it works in some simple cases, but I don't
 know enough to adapt it to my particular needs.



Based on my analysis, the assumptions in MM are not always that accurate.
It's partly what made me derive my own method.  Of course, I simply used
what I knew.



 Nevertheless, Newton's method can be applied to each parameter one by
  one. If I understand correctly, this is what Jason proposed. One step of
 
  Newton's method on just one parameter has a computational cost very
  similar to MM, and is much more efficient (it should converge in just
  one or two iterations).


 Hmmm... That sounds similar to just considering the diagonal part of the
 Hessian matrix, or something of that sort. If the features are sort of
 orthogonal in some sense (uncorrelated?), this should be a good
 approximation, but I don't know how well this would apply to our particular
 regression problems.


I doubt (very highly) that parameters are uncorrelated.



 Remi, do you dump the training data to a simple text file and you then
 feed that to a program that does the regression, or does the program compute
 it on the fly from the database of positions?


I'm not Remi, but I can at least say that I plan to dump my data to a text
file.  I view that as having a few advantages.  It removes logic from my
engine that is almost orthogonal to its true purpose and allows me to easily
share my tool with others (I plan to release it under GPLv3 when done).

I'm expecting the iterative passes over the very large input set to be the
limiting factor for the tool.

Re: [computer-go] ELO Ratings of move pattern

2007-12-21 Thread Rémi Coulom

Jason House wrote:


Given that doing one parameter at a time may be less than ideal, I don't 
know whether my method would really inherit those properties or not.


Probably not, because the Hessian has significant non-diagonal values. 
But I expect it would still converge in fewer iterations than MM.




 


Still, MM has the advantage that it can be applied in batch mode, thus
saving a lot of computation.


What do you mean by batch mode?  Are you implying something like 
what's at the top of page 387 in Hunter's paper [1]?  I initially 
interpreted it as that, but I don't really believe that any more.  It 
looks like too much work to be useful.


No, batch mode is the contrary of what is explained on that page. It 
means all the parameters are updated at the same time, so that the 1/Ej 
are computed only once.




For each iteration of batch updates, do you pass over the data 
computing whatever you can and then pick a set of mutually exclusive 
items to update?  How bad would it be to update everything, including 
those on the same team?


The algorithm may not converge. In the beginning, when I had only a few 
features, I updated everything at the same time, and it worked. But when 
I added more features, it diverged.




In what I'm envisioning for my method, computations of 1/Ej would also 
be shared across computations.  I'd also keep a running sum of the 
selection probability and the square of the selection probability.  
Once at the end of the data, I'd apply my equation to compute the 
update of all parameters.


If what you do is Newton's method with diagonal approximation of the 
Hessian, you may have to add a small negative constant to terms of that 
diagonal. One advantage of MM is that it has proven convergence. It is 
difficult to get such a guarantee with approximate Newton.


Maybe conjugate gradient would work, too (chapter 10.6):
http://www.nrbook.com/a/bookcpdf.php

Rémi


Re: [computer-go] ELO Ratings of move pattern

2007-12-20 Thread Jason House
On Dec 5, 2007 4:44 AM, Lars [EMAIL PROTECTED] wrote:

 I have some questions concernig this paper of Remi:
 http://remi.coulom.free.fr/Amsterdam2007/MMGoPatterns.pdf

 @Remi: How many iterations did you use?

 Do any of you have similar or other experiences with the algorithm?



I seem to have more time to think than to code lately.  I believe I've
derived an alternate update method.  I'd be curious how quickly it
converges.

Using Minorization-Maximization, Remi derived
  new gamma = wins/sum(C/E)

Using Newton-Raphson method, I derived
  fractional change = (estimation error) * (fractional uncertainty)^2
  new gamma = gamma * (1 + fractional change)
Where
  estimation error = (wins-expected wins)
  fractional uncertainty = sigma/gamma
  fractional change = new gamma/gamma - 1

Changing gamma based on both the estimation error and the uncertainty that
gamma is correct makes intuitive sense to me.  Sigma matches Remi's
definition.  Because strengths are multiplied together, fractional
uncertainty makes more sense.  I'm avoiding the difficult issue of
explaining why, but I do point people to consider two things:
  1. If gamma=gamma+sigma, what does that do to team strength?
  (It multiplies the team strength by 1 + fractional uncertainty)
  2. Look at the similar definitions of fractional uncertainty
  and fractional change.

A less intuitive form is:
  fractional change = (wins-expected wins) / (expected wins^2 - wins)
  Using Remi's notation, expected wins = gamma*sum(C/E)

Regardless of whether someone else plays with the alternate equation, I'll be
playing with both after the holidays.  I'd also be curious what thoughts
people have about this.

To derive this, I applied the Newton-Raphson method to find where the
partial derivative of the log-evidence is zero (AKA, where log-evidence is
maximized).  If L = log evidence, and *L = 1st partial derivative of L with
respect to gamma and **L = 2nd partial derivative of L with respect to
gamma, then new gamma - gamma = *L/**L.

Re: [computer-go] ELO Ratings of move pattern

2007-12-20 Thread Álvaro Begué
I was trying to come up with my own algorithm to maximize likelihood and I
am having a hard time getting it all in my mind. I managed to write a
working algorithm for the case of logistic regression, but it was kind of
brittle and I didn't know how to extend it to the softmax case, which is
what I wanted.

Over lunch I thought of another way of doing it that would be very general
and easy to implement. Basically, I can compute the log-likelihood for a
particular setting of the weights, and I can compute the gradient and the
Jacobian matrix (the derivative with respect to each weight or pair of
weights, respectively). Then I can approximate the function I am minimizing
using a paraboloid computed from that data, compute the minimum of the
paraboloid and take that point to be the new setting of the weights. Has
anyone tried something like this? It seems like the natural way to do it to
me.

Anyway, I would really appreciate it if other people who are trying this
kind of thing could exchange some test datasets with me to see if I get
similar weights. I think I will make the code available for anyone to use
after I get it to be satisfactory enough.

Álvaro.


On Dec 20, 2007 11:43 AM, Jason House [EMAIL PROTECTED] wrote:



 On Dec 5, 2007 4:44 AM, Lars [EMAIL PROTECTED] wrote:

  I have some questions concernig this paper of Remi:
  http://remi.coulom.free.fr/Amsterdam2007/MMGoPatterns.pdf
 
  @Remi: How many iterations did you use?
 
  Do any of you have similar or other experiences with the algorithm?



 I seem to have more time to think than to code lately.  I believe I've
 derived an alternate update method.  I'd be curious how quickly it
 converges.

 Using Minorization-Maximization, Remi derived
   new gamma = wins/sum(C/E)

 Using Newton-Raphson method, I derived
   fractional change = (estimation error) * (fractional uncertainty)^2
   new gamma = gamma * (1 + fractional change)
 Where
   estimation error = (wins-expected wins)
   fractional uncertainty = sigma/gamma
   fractional change = new gamma/gamma - 1

 Changing gamma based on both the estimation error and the uncertainty that
 gamma is correct makes intuitive sense to me.  Sigma matches Remi's
 definition.  Because strengths are multiplied together, fractional
 uncertainty makes more sense.  I'm avoiding the difficult issue of
 explaining why, but I do point people to consider two things:
   1. If gamma=gamma+sigma, what does that do to team strength?
   (It multiplies the team strength by 1 + fractional uncertainty)
   2. Look at the similar definitions of fractional uncertainty
   and fractional change.

 A less intuitive form is:
   fractional change = (wins-expected wins) / (expected wins^2 - wins)
   Using Remi's notation, expected wins = gamma*sum(C/E)

 Regardless of whether someone else plays with the alternate equation, I'll be
 playing with both after the holidays.  I'd also be curious what thoughts
 people have about this.

 To derive this, I applied the Newton-Raphson method to find where the
 partial derivative of the log-evidence is zero (AKA, where log-evidence is
 maximized).  If L = log evidence, and *L = 1st partial derivative of L with
 respect to gamma and **L = 2nd partial derivative of L with respect to
 gamma, then new gamma - gamma = *L/**L.



Re: [computer-go] ELO Ratings of move pattern

2007-12-20 Thread Jason House
On Dec 20, 2007 11:43 AM, Jason House [EMAIL PROTECTED] wrote:

 I seem to have more time to think than to code lately.  I believe I've
 derived an alternate update method.


Thinking more, I realize I messed up three things...
For one, Newton-Raphson requires
  new gamma - gamma = -*L/**L
instead of
  new gamma - gamma = *L/**L

Another,
  1/sigma^2 = -**L
instead of
  1/sigma^2 = **L

Finally,
  **L = sum[ (selection probability)^2 ] - wins
instead of
  **L = (estimated wins)^2 - wins



 Using Newton-Raphson method, I derived
   fractional change = (estimation error) * (fractional uncertainty)^2
   new gamma = gamma * (1 + fractional change)



A less intuitive form is:
   fractional change = (wins-expected wins) / (expected wins^2 - wins)
   Using Remi's notation, expected wins = gamma*sum(C/E)


This becomes:
  fractional change =
  ( wins - sum(selection probability) )
/ ( wins - sum((selection probability)^2) )

selection probability = gamma*C/E
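A minimal side-by-side sketch of the corrected per-parameter N-R update and Remi's MM update, on made-up data (synthetic competitions among three patterns, teams of size one, so C = 1 and the selection probability is gamma/E):

```python
# Hypothetical data: each competition is (candidate pattern ids, winner).
comps = [([0, 1], 0), ([0, 2], 0), ([1, 2], 1), ([0, 1, 2], 2),
         ([0, 1], 1), ([1, 2], 2)]

def fit(update, sweeps=500):
    g = [1.0, 1.0, 1.0]
    for _ in range(sweeps):
        for i in range(len(g)):          # one parameter at a time
            wins = sum(1 for cand, winner in comps if winner == i)
            inv_E = [1.0 / sum(g[k] for k in cand)
                     for cand, winner in comps if i in cand]
            g[i] = update(g[i], wins, inv_E)
    return g

def mm(gi, wins, inv_E):
    # Remi's MM update: new gamma = wins / sum(C/E), with C = 1 here
    return wins / sum(inv_E)

def nr(gi, wins, inv_E):
    # per-parameter Newton-Raphson in the fractional-change form above
    p = [gi * ie for ie in inv_E]        # selection probabilities gamma*C/E
    frac = (wins - sum(p)) / (wins - sum(q * q for q in p))
    return gi * (1.0 + frac)

g_mm, g_nr = fit(mm), fit(nr)
r_mm = [x / g_mm[0] for x in g_mm]       # only ratios are identifiable
r_nr = [x / g_nr[0] for x in g_nr]
assert all(abs(a - b) < 1e-3 for a, b in zip(r_mm, r_nr))
```

Both coordinate-wise schemes settle on the same maximum-likelihood ratios on this toy data; only ratios are compared because the model is invariant to a common rescaling of all gammas.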

Re: [computer-go] ELO Ratings of move pattern

2007-12-20 Thread Jason House
On Dec 20, 2007 5:39 PM, Álvaro Begué [EMAIL PROTECTED] wrote:

 I was trying to come up with my own algorithm to maximize likelihood and I
 am having a hard time getting it all in my mind. I managed to write a
 working algorithm for the case of logistic regression, but it was kind of
 brittle and I didn't know how to extend it to the softmax case, which is
 what I wanted.

 Over lunch I thought of another way of doing it that would be very general
 and easy to implement. Basically, I can compute the log-likelihood for a
 particular setting of the weights, and I can compute the gradient and the
 Jacobian matrix (the derivative with respect to each weight or pair of
 weights, respectively). Then I can approximate the function I am minimizing
 using a paraboloid computed from that data, compute the minimum of the
 paraboloid and take that point to be the new setting of the weights. Has
 anyone tried something like this? It seems like the natural way to do it to
 me.


My knowledge of the Jacobian matrix is limited.
http://en.wikipedia.org/wiki/Jacobian seems to imply that your definition of
the Jacobian matrix is unusual.

To fit a paraboloid, don't you need the 2nd derivative as well?  I'd assume
you'd fit a parabola in each direction (for each gamma) and then sum them
together into one paraboloid.

Interestingly enough, I believe assuming a paraboloid is equivalent to
assuming a linear first derivative (which is what I did).

Of course, I have no idea how the Jacobian comes into play, so I assume I'm
missing a few things.  Is it possible to share a bit more of your final
method?



 Anyway, I would really appreciate it if other people who are trying this
 kind of thing could exchange some test datasets with me to see if I get
 similar weights. I think I will make the code available for anyone to use
 after I get it to be satisfactory enough.


I'm not that far yet.  I'll share when I have it, although I plan to try and
use the same starting data set (games) as Remi.  I have yet to code
calculation of some of the features.

Re: [computer-go] ELO Ratings of move pattern

2007-12-20 Thread Álvaro Begué
On Dec 20, 2007 10:36 PM, Jason House [EMAIL PROTECTED] wrote:



 On Dec 20, 2007 5:39 PM, Álvaro Begué [EMAIL PROTECTED] wrote:

  Over lunch I thought of another way of doing it that would be very
  general and easy to implement. Basically, I can compute the log-likelihood
  for a particular setting of the weights, and I can compute the gradient and
  the Jacobian matrix (the derivative with respect to each weight or pair of
  weights, respectively). Then I can approximate the function I am minimizing
  using a paraboloid computed from that data, compute the minimum of the
  paraboloid and take that point to be the new setting of the weights. Has
  anyone tried something like this? It seems like the natural way to do it to
  me.


 My knowledge of the Jacobian matrix is limited.  
 http://en.wikipedia.org/wiki/Jacobian
 seems to imply that your definition of the Jacobian matrix is unusual.


Oops! It's been too long since I learned these things. What I meant to say
was the Hessian matrix ( http://en.wikipedia.org/wiki/Hessian_matrix ),
which does contain the second derivatives. You are right that assuming the
derivative changes linearly and using a paraboloid are the exact same thing.
So we are probably thinking of the same method, although I think I can code
it very elegantly (maybe slow too?).


Álvaro.

Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Rémi Coulom

Jason House wrote:



On Dec 6, 2007 11:38 AM, Rémi Coulom [EMAIL PROTECTED] wrote:


Jason House wrote:

 This may serve as a good test of whether there is enough data to assign
 values to the patterns.

I did not mention this in my paper, but you can rather easily estimate
uncertainty margins around Elo values. This involves computing the
second-order derivative of the probability distribution with respect to
log(gamma). Since the distribution has a shape that looks very much like
a Gaussian, the second-order derivative at the maximum is a good
estimation of -1/sigma². That is how I compute confidence intervals in
bayeselo.


What do you mean by the probability distribution with respect to 
log(gamma)?  Do you mean a plot of the prediction rate with only the 
gamma of interest varying? 


Not the prediction rate, but the probability of the training data. More 
precisely, the logarithm of that probability.


If you have P(x)=A*exp(-x²/2sigma²), then log(P(x))=log(A)-x²/2sigma², 
and d²(log(P(x)))/dx²=-1/sigma². This means that, for a Gaussian 
probability distribution, the second-order derivative directly gives the 
variance. For distributions that look similar to a Gaussian, the 
second-order derivative is a good approximation.
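The Gaussian identity above is easy to verify with a central finite difference (the constants here are arbitrary):

```python
import math

# Check: for a Gaussian P(x) = A*exp(-x²/2σ²), the second derivative
# of log P at the maximum equals -1/σ².
sigma, A, h = 2.0, 0.37, 1e-4        # arbitrary constants, small step
logP = lambda x: math.log(A) - x * x / (2 * sigma * sigma)

# central-difference estimate of d²(log P)/dx² at the maximum x = 0
d2 = (logP(h) - 2 * logP(0.0) + logP(-h)) / (h * h)
assert abs(d2 - (-1.0 / sigma**2)) < 1e-6
```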


Rémi


Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Jason House
On Dec 6, 2007 11:38 AM, Rémi Coulom [EMAIL PROTECTED] wrote:

 Jason House wrote:
 
  This may serve as a good test of whether there is enough data to assign
  values to the patterns.

 I did not mention this in my paper, but you can rather easily estimate
 uncertainty margins around Elo values. This involves computing the
 second-order derivative of the probability distribution with respect to
 log(gamma). Since the distribution has a shape that looks very much like
 a Gaussian, the second-order derivative at the maximum is a good
 estimation of -1/sigma². That is how I compute confidence intervals in
 bayeselo.


What do you mean by the probability distribution with respect to
log(gamma)?  Do you mean a plot of the prediction rate with only the gamma
of interest varying?

Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Jason House
On Dec 12, 2007 2:59 PM, Rémi Coulom [EMAIL PROTECTED] wrote:

  Do you mean a plot of the prediction rate with only the
  gamma of interest varying?

 Not the prediction rate, but the probability of the training data. More
 precisely, the logarithm of that probability.


I still don't know what you mean by this.



 If you have P(x)=A*exp(-x²/2sigma²), then log(P(x))=log(A)-x²/2sigma²,
 and d²(log(P(x)))/dx²=-1/sigma². This means that, for a Gaussian
 probability distribution, the second-order derivative directly gives the
 variance. For distributions that look similar to a Gaussian, the
 second-order derivative is a good approximation.


This part I understand.

Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Álvaro Begué
On Dec 12, 2007 3:05 PM, Jason House [EMAIL PROTECTED] wrote:



 On Dec 12, 2007 2:59 PM, Rémi Coulom [EMAIL PROTECTED] wrote:

   Do you mean a plot of the prediction rate with only the
   gamma of interest varying?
 
  Not the prediction rate, but the probability of the training data. More
  precisely, the logarithm of that probability.


 I still don't know what you mean by this.


He probably should use the word "likelihood" instead of "probability".
http://en.wikipedia.org/wiki/Likelihood_function





  If you have P(x)=A*exp(-x²/2sigma²), then log(P(x))=log(A)-x²/2sigma²,
  and d²(log(P(x)))/dx²=-1/sigma². This means that, for a Gaussian
  probability distribution, the second-order derivative directly gives the
  variance. For distributions that look similar to a Gaussian, the
  second-order derivative is a good approximation.


 This part I understand



Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Jason House
On Dec 12, 2007 3:09 PM, Álvaro Begué [EMAIL PROTECTED] wrote:



 On Dec 12, 2007 3:05 PM, Jason House [EMAIL PROTECTED] wrote:

 
 
  On Dec 12, 2007 2:59 PM, Rémi Coulom [EMAIL PROTECTED] wrote:
 
Do you mean a plot of the prediction rate with only the
gamma of interest varying?
  
   Not the prediction rate, but the probability of the training data. More
   precisely, the logarithm of that probability.
 
 
  I still don't know what you mean by this.
 

 He probably should use the word likelihood instead of probability.
 http://en.wikipedia.org/wiki/Likelihood_function


Clearly I'm missing something, because I still don't understand.  Let's take
a simple example: a move is on the 3rd line and has a gamma value of 1.75.
What is the equation or sequence of discrete values that I can take the
derivative of?

Doing conditional probabilities based on "move is on 3rd line" and "move is
selected" (AKA pure training data) seems to yield a fixed value rather than
something approximating a normal distribution.

Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Álvaro Begué
On Dec 12, 2007 3:31 PM, Jason House [EMAIL PROTECTED] wrote:



 On Dec 12, 2007 3:09 PM, Álvaro Begué [EMAIL PROTECTED] wrote:

 
 
  On Dec 12, 2007 3:05 PM, Jason House [EMAIL PROTECTED]
  wrote:
 
  
  
   On Dec 12, 2007 2:59 PM, Rémi Coulom [EMAIL PROTECTED]
   wrote:
  
 Do you mean a plot of the prediction rate with only the
 gamma of interest varying?
   
Not the prediction rate, but the probability of the training data. More
precisely, the logarithm of that probability.
  
  
   I still don't know what you mean by this.
  
 
  He probably should use the word likelihood instead of probability.
  http://en.wikipedia.org/wiki/Likelihood_function
 

 Clearly I'm missing something, because I still don't understand.  Let's
  take a simple example: a move is on the 3rd line and has a gamma value of
  1.75.  What is the equation or sequence of discrete values that I can take
  the derivative of?


We start with a database of games, and we are trying to find a set of gamma
values. For a given set of gamma values, we can compute the probability of
all the moves happening exactly as they happened in the database. So if the
first move is E4 and we had E4 as having a probability of 0.005, we start
with that, then we take the next move, and multiply 0.005 by the probability
of the second move, etc. By the end of the database, we'll have some number
like 3.523E-9308 which is the probability of all of the moves in the
database happening. This is the probability of the database if it had been
generated by a random process following the probability distributions
modeled by the set of gamma values. You can see this as a function of the
gamma values. This function is usually called the likelihood function. In
order to pick the best gammas, we choose the ones with the maximum
likelihood. Sometimes we use the logarithm of the likelihood instead, which
has the interpretation of being minus the amount of information in the
database, plus it's not a number with a gazillion 0s after the decimal
point.

Now, around the point where the maximum likelihood happens, you can try to
move one of the gammas and see how much it hurts the likelihood. For some
features it will hurt a lot, which means that the value has to be very close
to the one you computed, or you'll get a bad model, and for some features it
will hurt very little, which means that there are other settings of the
value that are sort of equivalent. The second derivative of the likelihood
(or the log of the likelihood, I don't think it should matter much), will
tell you how narrow a peak you are at.

Does that make some sense?
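A minimal sketch of that likelihood function on a made-up three-position database (the feature indices, legal moves, and played moves are all hypothetical; a move's strength is the product of the gammas of its active features):

```python
import math

# Hypothetical database: for each position, the list of feature indices
# active for each legal move, plus the index of the move that was played.
positions = [
    ([[0], [1], [0, 1]], 2),   # three legal moves, move 2 was played
    ([[0], [1]], 0),
    ([[1], [0, 1]], 1),
]

def log_likelihood(gammas):
    ll = 0.0
    for moves, played in positions:
        # strength of each move = product of its feature gammas
        strengths = [math.prod(gammas[f] for f in feats) for feats in moves]
        # probability of the played move under the model
        ll += math.log(strengths[played] / sum(strengths))
    return ll

# higher (closer to 0) is better; maximum-likelihood fitting searches
# for the gammas that maximize this value
assert log_likelihood([1.0, 1.0]) < 0.0
```

In this toy data, raising the gamma of feature 0 increases the likelihood, since feature-0 moves were played more often than the uniform model predicts.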

Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Rémi Coulom

Jason House wrote:



On Dec 12, 2007 3:09 PM, Álvaro Begué [EMAIL PROTECTED] wrote:




On Dec 12, 2007 3:05 PM, Jason House [EMAIL PROTECTED] wrote:



On Dec 12, 2007 2:59 PM, Rémi Coulom [EMAIL PROTECTED] wrote:

 Do you mean a plot of the prediction rate with only the
 gamma of interest varying?

Not the prediction rate, but the probability of the training data. More
precisely, the logarithm of that probability.



I still don't know what you mean by this.


He probably should use the word likelihood instead of
probability. http://en.wikipedia.org/wiki/Likelihood_function


Clearly I'm missing something, because I still don't understand.  
Let's take a simple example: a move is on the 3rd line and has a 
gamma value of 1.75.  What is the equation or sequence of discrete 
values that I can take the derivative of?


Doing conditional probabilities based on "move is on 3rd line" and 
"move is selected" (AKA pure training data) seems to yield a fixed 
value rather than something approximating a normal distribution.


Consider, in the Elo rating analogy, a player with a win and a loss against 
a player whose gamma is 1. There you have P(gamma)=gamma/(1+gamma)², whose 
maximum is at gamma = 1. It is that probability that I am talking about. 
It is the probability that is maximized by the MM algorithm.


Rémi


Re: [computer-go] ELO Ratings of move pattern

2007-12-12 Thread Jason House
On Dec 12, 2007 4:27 PM, Álvaro Begué [EMAIL PROTECTED] wrote:

 Clearly I'm missing something, because I still don't understand.  Let's
  take a simple example: a move is on the 3rd line and has a gamma value of
  1.75.  What is the equation or sequence of discrete values that I can
  take the derivative of?
 

 We start with a database of games, and we are trying to find a set of
 gamma values. For a given set of gamma values, we can compute the
 probability of all the moves happening exactly as they happened in the
 database. So if the first move is E4 and we had E4 as having a probability
 of 0.005, we start with that, then we take the next move, and multiply
 0.005 by the probability of the second move, etc. By the end of the
 database, we'll have some number like 3.523E-9308 which is the probability
 of all of the moves in the database happening. This is the probability of
 the database if it had been generated by a random process following the
 probability distributions modeled by the set of gamma values. You can see this
 as a function of the gamma values. This function is usually called the
 likelihood function. In order to pick the best gammas, we choose the ones
 with the maximum likelihood. Sometimes we use the logarithm of the
 likelihood instead, which has the interpretation of being minus the amount
 of information in the database, plus it's not a number with a gazillion 0s
 after the decimal point.

 Now, around the point where the maximum likelihood happens, you can try to
 move one of the gammas and see how much it hurts the likelihood. For some
 features it will hurt a lot, which means that the value has to be very close
 to the one you computed, or you'll get a bad model, and for some features it
 will hurt very little, which means that there are other settings of the
 value that are sort of equivalent. The second derivative of the likelihood
 (or the log of the likelihood, I don't think it should matter much), will
 tell you how narrow a peak you are at.

 Does that make some sense?


It makes perfect sense.  Thanks.

If I do the math right, using the notation of Remi's paper,

d^2 log(L) / (d gamma_i)^2
 = sum_j [ C_ij^2/(C_ij*gamma_i + D_ij)^2 - A_ij^2/(A_ij*gamma_i + B_ij)^2 ]
 = -1 / (sigma for the Elo of gamma_i)^2

(the second derivative is negative at the maximum; its negative is 1/sigma^2)

Re: [computer-go] ELO Ratings of move pattern

2007-12-06 Thread Álvaro Begué
Thanks for the file! This should be very helpful when I try to reproduce
results.

It looks like you are not taking advantage of symmetries. For instance,

88|0|17.033168
88|1|12.263955

and

164|0|17.388714
164|1|25.862695

are identical except for swapping the roles of white and black (88 == 1120
in base 4, 164 == 2210 in base 4), and therefore they should have the same
scores. If you take symmetries of this type into consideration, you'll
reduce the number of gamma values you have to compute by a factor of almost
16 (8 symmetries in the D4 group, times 2 colors). This should result in
faster learning and better estimates.

Álvaro.


On Dec 6, 2007 4:28 AM, Lars [EMAIL PROTECTED] wrote:

 While you can find the gamma-values of the non-shape patterns in Remi's
 paper, I can give you the list of shape-pattern gamma-values.

 The source code is mixed with my engine code, which I don't want to
 publish. But the algorithm is more or less easy to implement.

 I'll try to attach the file to this mail; if that doesn't work, I'll send
 it to your address.

 The file has the following format:
 It's a text file with one pattern per line.
 Lines look like:
 Pattern-ID|Move-right(0=black, 1=white)|gamma-value\n

 The Pattern-IDs have the following coding:
 In the binary representation of the ID you have 2 bits for every field,
 so you have 18 bits in all.
 00 = Empty, 01 = Black, 10 = White, 11 = Edge
 The pairs are ordered as:
 1 2 3
 4 5 6
 7 8 9
 in the ID it is
 123456789

 Since patterns can be rotated and mirrored, only one representative of
 the various possible representations is contained in the file.

 
  On Dec 5, 2007 3:17 PM, Lars [EMAIL PROTECTED] wrote:
  Thank you for your explanations!
  You were right, it was a bug, and now it works really fine.
 
  Are you (or Remi) willing to publish any of the following?
  1. Source code for extracting ELO ratings for moves
  2. Full list of gamma values, including patterns
 
 
  Either of those would help kick start those of us that are just
  starting to go down a similar road.

Re: [computer-go] ELO Ratings of move pattern

2007-12-06 Thread Álvaro Begué
Oh, I see you have applied the symmetries, but not the swapping of roles.
Still, this should probably be done and cut the number of gamma values in
half.

Álvaro.



Re: [computer-go] ELO Ratings of move pattern

2007-12-06 Thread Chris Fant
But then you lose information on player-to-move, right?

On Dec 6, 2007 7:18 AM, Álvaro Begué [EMAIL PROTECTED] wrote:
 Oh, I see you have applied the symmetries, but not the swapping of roles.
 Still, this should probably be done and cut the number of gamma values in
 half.

 Álvaro.






Re: [computer-go] ELO Ratings of move pattern

2007-12-06 Thread Chris Fant
Oh, I didn't notice at first that the player-to-move was encoded
separately from the pattern shape.

On Dec 6, 2007 9:53 AM, Álvaro Begué [EMAIL PROTECTED] wrote:



 On Dec 6, 2007 9:31 AM, Chris Fant [EMAIL PROTECTED] wrote:
  But then you lose information on player-to-move, right?

 No. What I am saying is that it is as urgent for black to play on

 . . .
 . . X
 X O .

 as it is for white to play in

 . . .
 . . O
 O X .

 One way of implementing what I said is to store only patterns
 with black to move. If it is white's turn, turn the 01s into 10s and
 the 10s into 01s in the pattern and then do the lookup.

 Some code like this would do:

 unsigned switch_players_in_pattern(unsigned x) {
   /* Swap the two bits of each 2-bit point: 01 (black) <-> 10 (white);
      00 (empty) and 11 (edge) are unchanged.  0x2AAAA and 0x15555
      select the high and low bit of each of the 9 pairs in the
      18-bit ID. */
   return ((x & 0x2AAAA) >> 1) | ((x & 0x15555) << 1);
 }

 Álvaro.


 
 
 
 
 
 
 
 


Re: [computer-go] ELO Ratings of move pattern

2007-12-06 Thread Jason House
On Dec 6, 2007 7:13 AM, Álvaro Begué [EMAIL PROTECTED] wrote:

 88|0|17.033168
 88|1|12.263955

 and

 164|0|17.388714
 164|1|25.862695

 Are identical except for swapping the roles of white and black


Curiously, the gamma values in your example are quite different:

17.033168 vs 25.862695
and
12.263955 vs 17.388714

Re: [computer-go] ELO Ratings of move pattern

2007-12-06 Thread Jason House
On Dec 6, 2007 10:13 AM, Álvaro Begué [EMAIL PROTECTED] wrote:



 On Dec 6, 2007 10:06 AM, Jason House [EMAIL PROTECTED] wrote:

 
 
  On Dec 6, 2007 7:13 AM, Álvaro Begué [EMAIL PROTECTED] wrote:
 
   88|0|17.033168
   88|1|12.263955
  
   and
  
   164|0|17.388714
   164|1|25.862695
  
   Are identical except for swapping the roles of white and black
 
 
  Curiously, the gamma values in your example are way different
 
  17.033168 vs 25.862695
  and
  12.263955 vs 17.388714
 

 That was exactly my point. Those should theoretically be identical, which
 means that the difference comes purely from noise. The games that were used
 for training probably don't have enough examples of these patterns to get a
 good estimate of their true strength.


This may serve as a good test of whether there is enough data to assign
values to the patterns.



Actually, the gamma values are only determined up to multiplication of all
 of them by a constant. Because patterns with white to move and patterns with
 black to move never compete with each other, they may have drifted in such a
 way that the discrepancy is not as large as it seems (since the two ratios,
 25.862695/17.033168 and 17.388714/12.263955, are similar).


That may be a fluke.  Other pairs have a very different ratio:
0|0|1.463386
0|1|1.337276



 Still, forcing those gamma values to be identical seems like the right
 thing to do.


I agree

Re: [computer-go] ELO Ratings of move pattern

2007-12-05 Thread Rémi Coulom

Lars wrote:

I have some questions concernig this paper of Remi:
http://remi.coulom.free.fr/Amsterdam2007/MMGoPatterns.pdf

1. What sense does the prior make (Section 3.3 in the paper), and where is
it applied?
I understand it the way that you put 2 more competitions on each pattern
in the minorization-maximization formula. But how does this produce a
probability distribution with mean 0 and standard deviation 302?
  


This probability distribution is in the scale of Elo ratings. The 
formula is Elo = 400*log10(gamma). The probability distribution of 
getting one win and one loss against one player of rating zero has 
standard deviation 302 (and mean zero).



Is the prior only necessary to make sure that every pattern has at
least one win?
  


It also has a regularization effect that avoids overfitting. That is to 
say, even if a pattern has, say, one win and ten losses against a player 
of rating zero, the prior will make it two wins and eleven losses, thus 
moving the strength of the pattern a little bit towards zero. It avoids 
high ratings for patterns that have been seen rarely.



2. I ran the algorithm on 400 games (including handicap games) from
the same game-records source Remi used (Section 3.2), but I used a
different month. I considered only 3x3 shape patterns and simple non-shape
patterns including dist-to-border, dist-to-prevMove, atari, capture,
extension...
After 1 iteration of the algorithm, I got strength-parameter values
completely different from the results of the paper (Table 1). Most of my
parameters (including all the dist-to-border parameters!) have values
less than 1.0. Does anyone have an explanation for this?
After 5 iterations it is even worse. Most of the low values (less than
1.0) get even lower, the high values even higher.
  


This sounds like a bug. The number of iterations depends on the features. I 
do not remember exactly how many iterations it took for mine. In general, 
2-3 iterations are enough to get very reasonable values, and a dozen to have 
good convergence.


Note that the values of some features are relative and not absolute. For 
instance, distance to the previous move: if you multiply all of them by 
a constant value, it will not change the likelihood. Nevertheless, if 
you have no bug in your implementation, you should get values that look 
like mine.


My suggestion would be to test your code with very small amounts of 
artificial data. For instance, start with two players A and B, where, say, A 
beats B twice and B beats A once. The gammas should converge to gamma_A = 2 
* gamma_B. You can make up similar tests with teams.


Rémi


Re: [computer-go] ELO Ratings of move pattern

2007-12-05 Thread Jason House
On Dec 5, 2007 4:44 AM, Lars [EMAIL PROTECTED] wrote:

 2. I ran the algorithm on 400 games (including handicap games) from
 the same game-records source Remi used (Section 3.2), but I used a
 different month. I considered only 3x3 shape patterns and simple non-shape
 patterns including dist-to-border, dist-to-prevMove, atari, capture,
 extension...
 After 1 iteration of the algorithm, I got strength-parameter values
 completely different from the results of the paper (Table 1). Most of my
 parameters (including all the dist-to-border parameters!) have values
 less than 1.0. Does anyone have an explanation for this?
 After 5 iterations it is even worse. Most of the low values (less than
 1.0) get even lower, the high values even higher.


Some of Remi's features will score every move on the board.  When this
occurs, any scalar multiple of those feature values will work.  This
seems to apply to both the distance-to-move features and the MC Owner
feature.  When I get around to doing this (hopefully soon), I'll probably
leave out something to better pin the values down.  Essentially, omissions
are like forcing a value of 1.0.

As far as what the values should be in absolute terms, I can't yet comment
:(



 Anyone of you have similar or other experiences with the algorithm?


The holidays are slowing me down.  I hope to be ready to compare notes in
January.  I'm really glad to see someone else doing this.  Any lessons
learned will be incredibly helpful...