Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

2007-07-04 Thread chrilly
Thanks, the dictionary is really great.

Chrilly
  - Original Message - 
  From: David Silver 
  To: computer-go@computer-go.org 
  Sent: Tuesday, July 03, 2007 11:29 PM
  Subject: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)


 It's because Go is not the only game in the world and certainly not the
 only reinforcement learning problem. They are using a widely accepted
 terminology.

But a very inappropriate one. I have read Sutton's book, and all the things I
know (e.g. TD-Gammon) are completely obfuscated. It's maybe suitable to
present general concepts, but it is extremely complicated to formulate an
algorithm in this framework.


  Here is a quick and dirty RL-to-Computer-Go translation kit to try and
  help bridge the gap!


  RL terminology                         Go terminology

  State                                  Position
  Action                                 Move
  Reward                                 Win/Loss
  Return                                 Win/Loss
  Episode                                Game
  Time-step                              One move
  Agent                                  Program
  Value function                         Evaluation function
  Policy                                 Player
  Default policy                         Simulation player
  Uniform random policy                  Light simulation player
  Other stochastic policy                Heavy simulation player
  Greedy policy                          1-ply search player
  Epsilon-greedy policy                  1-ply search player with some random moves
  Feature                                Factor used for position evaluation
  Weight                                 Weight of each factor in evaluation function
  Tabular representation                 One weight for each complete position
  Partial tabular representation         UCT tree
  State abstraction                      One weight for many positions
  Linear value function approximation    Evaluation function using weighted sum of various factors
  Feature discovery                      Learning new factors for the evaluation function
  Sample-based search                    Simulation (Monte-Carlo methods, etc.)
  Transition function                    Rules of the game
  Environment                            Rules of the game + opponent
  Trajectory                             Move sequence
  Online                                 During actual play
  Offline                                Before/after actual play (e.g. preprocessing)
  On-policy                              If both players play as normal
  Off-policy                             If either player behaves differently
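
  As a rough illustration of a few rows of the table, here is a minimal
  Python sketch. It assumes a hypothetical toy interface that is not taken
  from any real Go program: play(position, move) returns the resulting
  position, and features(position) returns a list of numbers (the
  "factors"). All function names are made up for this example.

```python
import random

def linear_value(weights, feats):
    """Linear value function approximation: weighted sum of factors."""
    return sum(w * f for w, f in zip(weights, feats))

def greedy_policy(position, moves, play, features, weights):
    """Greedy policy = 1-ply search player: pick the move whose
    resulting position gets the highest evaluation."""
    return max(moves,
               key=lambda m: linear_value(weights, features(play(position, m))))

def epsilon_greedy_policy(position, moves, play, features, weights,
                          epsilon, rng=random):
    """Epsilon-greedy policy = 1-ply search player with some random moves."""
    if rng.random() < epsilon:
        return rng.choice(moves)  # behaves like the uniform random policy
    return greedy_policy(position, moves, play, features, weights)

# Toy usage (hypothetical): positions are integers, a move adds to the
# position, and the only factor with non-zero weight is the position itself.
play = lambda position, move: position + move
features = lambda position: [position, 1.0]
print(greedy_policy(0, [1, 2, 3], play, features, [1.0, 0.0]))  # prints 3
```

  With epsilon = 0 the epsilon-greedy player is exactly the greedy player;
  with epsilon = 1 it is the uniform random (light simulation) player.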


  -Dave 




--


  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

2007-07-04 Thread Brian Slesinsky

I wonder whether the use of games as a metaphor would make general
machine learning concepts more easily understood by non-specialists?
That is, if you took a machine learning paper and rewrote it in terms
of games, would that make it easier or harder to understand for people
unfamiliar with both game programming and machine learning?

Of course, this is hard to tell once you've learned either vocabulary.
It's easy to assume that whatever jargon you know (or invent) is
inherently easier to understand.

- Brian


Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Don Dailey
On Wed, 2007-07-04 at 11:34 +0200, Magnus Persson wrote:
 but what really will make a
 difference is the quality of the playouts.

I would like to suggest a more abstract view of things.  In the purest
form of the algorithm there isn't an artificial distinction between the
tree and the play-outs.  The algorithm is applied as if the whole tree
already exists (conceptually) and nodes are updated to the end of the
game.

We had to impose end nodes and a tree that grows in depth because it's
impractical to store the whole tree in memory.  So we have a tree phase
on the one hand, and on the other hand a play-out phase that simulates
an unexplored tree (but without updates, which out of necessity
introduces a small inefficiency.)
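
A minimal sketch of these two phases, under loud assumptions: the game is
single-pile Nim (take 1 to 3 stones, whoever takes the last stone wins)
rather than Go, selection in the tree uses plain UCB1, and the names
Node, simulate, playout and best_move as well as the exploration constant
c=1.4 are illustrative choices, not taken from MoGo, Lazarus or any other
program. Statistics are updated only for positions reached in the tree
phase; the play-out phase stores nothing.

```python
import math
import random

class Node:
    """Statistics for one position stored in the tree."""
    def __init__(self):
        self.visits = 0
        self.wins = 0.0     # wins for the player who moved INTO this position
        self.children = {}  # move -> Node

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def playout(stones, rng):
    """Play-out phase: uniformly random moves to the end, nothing stored.
    Returns 1.0 if the player to move at `stones` takes the last stone."""
    turn = 0  # 0 = the player to move when the play-out starts
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1.0 if turn == 0 else 0.0
        turn ^= 1
    return 0.0  # no stones left: the player to move has already lost

def simulate(node, stones, rng, c=1.4):
    """One simulation; returns 1.0 if the player to move at `stones` wins."""
    if stones == 0:
        result = 0.0                   # the opponent just took the last stone
    elif node.visits == 0:
        result = playout(stones, rng)  # edge of the stored tree: play-out phase
    else:
        # Tree phase: choose a move with UCB1, then recurse.
        def ucb(move):
            child = node.children.setdefault(move, Node())
            if child.visits == 0:
                return math.inf        # try every move at least once
            mean = child.wins / child.visits
            return mean + c * math.sqrt(math.log(node.visits) / child.visits)
        move = max(legal_moves(stones), key=ucb)
        result = 1.0 - simulate(node.children[move], stones - move, rng, c)
    node.visits += 1
    node.wins += 1.0 - result          # credit the player who moved into `node`
    return result

def best_move(stones, simulations=2000, seed=0):
    """Run the simulations and return the most visited move at the root."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(simulations):
        simulate(root, stones, rng)
    return max(root.children, key=lambda m: root.children[m].visits)
```

Each stored node is first evaluated by one random play-out and only later
expanded, so the tree grows by at most one level per simulation along the
path taken.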

This makes everything a bit of a compromise, but a well-advised one due
to hardware limitations.   But then we started imposing our will on the
play-outs in order to make them smarter.   But we didn't do the same
to the tree portion because we now believe they are two separate things
(even though they really are not.)

So I prefer to think of the play-outs and the tree as the same thing.  I
think whatever is done can be applied to both.   For instance Lazarus
does a lot of pruning and the pruning rules are the same for tree
portion and the play-out portion.   Actually, Lazarus saw most of the
improvement from the tree pruning when I tested each without the other.

But I notice that we are now looking at the tree as the search portion
and the play-outs as the evaluation function.   I think that is
incredible because I have always believed that tree search and
evaluation are the same thing, just different forms or states.  Like
water and ice,  or matter and energy.  

It's interesting that chess has this too.  Traditionally programs have
always had these three very crude phases: search, quiescence, evaluation.
Modern programs have somewhat blurred these distinctions but it hasn't
changed very much.

UCT comes along and finally does away with the distinction altogether.
Now you can call it all evaluation or search, whatever pleases you.
But in its purest form, UCT with totally random play-outs is a
beautiful thing - a recursive evaluation function with (almost) zero
domain-specific knowledge.

Of course now we just had to go and spoil it all by imposing domain
specific rules.  I have done the same and I admit it.  It would be fun
to see how far we could go if domain specific knowledge were forbidden as
an experiment.   Once patterns are introduced along with other direct Go
knowledge, it's still fun but it feels a bit wrong, kind of like
cheating.   It's clear that when we do this, we introduce strengths and
weaknesses to the program,  making it a bit more fragile, less
universal or robust.  Stronger too, but more susceptible to
intransitivity.

Of course we do this in Chess programs in a big way.  We very tediously
tell the program what is good and what is bad.   It has no choice, it
must accept our definition of right and wrong, our morality.   However
in our great wisdom we provide a search mechanism in order to correct
our bad judgments.   The search mechanism is an admission that we know
we are wrong about many things.  

Of course you are right - if the play-outs are improved, the quality of
the moves will also improve.

- Don


   




Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Don Dailey
On Wed, 2007-07-04 at 16:57 -0400, George Dahl wrote:
  Of course now we just had to go and spoil it all by imposing domain
  specific rules.  I have done the same and I admit it.  It would be fun
  to see how far we could go if domain specific knowledge was forbidden
  as an experiment.   Once patterns are introduced along with other
  direct Go knowledge, it's still fun but it feels a bit wrong, kind of
  like cheating.
 
 Is it still cheating if the program learns and discovers the patterns
 itself?  Then isn't it just precomputing a few things?

Of course it isn't cheating really,  but it seems more elegant to me if
the computer is doing the figuring out, not the programmer.   Of course
the programmer has to figure out how to write the program in the first
place.

But the idea of writing a Go program without any hand-coded Go knowledge
is very appealing to me.   Of course, there HAS to be Go knowledge, even
if it's figured out by the software.

In Lazarus, I use several patterns for pruning moves.  But those
patterns are not generated by ME.  Lazarus knows more about Go than I do
and so Lazarus generated those patterns (off-line.)

Ultimately, I would like programs to figure out on the fly what to do.

It's fun to imagine how a program would work if God wrote it.   Would
there be tons of hard coded knowledge built into it, or would it be a
learning meta-system that had facilities for quickly finding out things
for itself that it needed to know?

- Don




Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Benjamin Teuber
And how much would generating patterns from pro games be cheating? How
about a system that gives a reward to shapes it actually played in a
game, where the pro games are then used as a seed to start the system?



Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread George Dahl

Pro games are cheating unless the program is one of the players. :)

You are right though, sometimes compromises must be made when
seeding an algorithm.  My ideas on using domain knowledge from
humans are sort of about maximizing a ratio: the ratio of program
performance to domain knowledge added (by humans, directly).
Obviously it is hard to quantify these sorts of things, but if program
A is 3 times as good (whatever that means) as program B and uses only
twice the human-given Go knowledge, I would rather have program A.
- George

On 7/4/07, Benjamin Teuber [EMAIL PROTECTED] wrote:

And how much would generating patterns from pro games be cheating? How
about a system that gives a reward to shapes it actually played in a
game, where the pro games are then used as a seed to start the system?


Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Don Dailey
On Thu, 2007-07-05 at 00:53 +0200, Benjamin Teuber wrote:
 And how much would generating patterns from pro games be cheating? How
 about a system that gives a reward to shapes it actually played in a
 game, where the pro games are then used as a seed to start the system?

I have posted before about the evils of trying to extract knowledge from
human games.   I don't think it is very effective compared to generating
that knowledge from computer games for several reasons.   

Of course I realize this is not a popular point of view!

- Don





Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Don Dailey
On Wed, 2007-07-04 at 19:23 -0400, Don Dailey wrote:
 On Thu, 2007-07-05 at 01:09 +0200, Magnus Persson wrote:
  Just to disturb the vision of a strong Go program without hardwired Go
  knowledge: I currently think that there are some really important
  things in Go that are really hard or even impossible to learn with,
  for example, patterns.  The ideal program would need to learn
  procedural skills (algorithms).
 
 I'm not saying a program can be as good without hardwired knowledge, I'm
 just saying it would be a cool thing!

And even if you could, it would still require hard coded meta-skills -
skills programmed explicitly to enable it to LEARN or discover what it
needed.   So even if it wasn't direct go knowledge it would be indirect
go knowledge.

Kind of like, give a man a fish or teach him to fish.

- Don






Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Don Dailey
On Thu, 2007-07-05 at 01:09 +0200, Magnus Persson wrote:
 Just to disturb the vision of a strong Go program without hardwired Go
 knowledge: I currently think that there are some really important
 things in Go that are really hard or even impossible to learn with,
 for example, patterns.  The ideal program would need to learn
 procedural skills (algorithms).

I'm not saying a program can be as good without hardwired knowledge, I'm
just saying it would be a cool thing!

- Don







Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Yamato
In other words UCT works well when evaluation/playouts is/are strong. I
believe there are still improvements possible to the UCT algorithm as
shown by the recent papers by the MoGo and Crazystone authors, but what
really will make a difference is the quality of the playouts.

Sylvain said that good moves in the playouts do not always improve
the performance of UCT. What do you think about this claim?

--
Yamato


Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Peter Drake

I believe this claim is true in two senses:

1) If the computation necessary to find better moves is too
expensive, performing many dumb playouts may be a better investment.

2) If the playouts are too deterministic, and the moves are merely
pretty good, the program may avoid an important move and thus
misjudge the value of a position.
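
A deliberately crude toy illustration of point 2, not a Go experiment:
assume single-pile Nim (take 1 to 3 stones, whoever takes the last stone
wins) and two hypothetical playout policies. With 2 stones left the
player to move wins by taking both. A fully deterministic playout that
always takes one stone never tries that move, so its estimate stays at
0.0 no matter how many playouts are run, while the uniform random playout
at least sees the win about half the time.

```python
import random

def playout(stones, policy, rng):
    """Return 1.0 if the player to move at the start takes the last stone."""
    turn = 0  # 0 = the player to move when the playout starts
    while stones > 0:
        stones -= policy(stones, rng)
        if stones == 0:
            return 1.0 if turn == 0 else 0.0
        turn ^= 1
    return 0.0

def uniform_random(stones, rng):
    """Light playout policy: any legal move with equal probability."""
    return rng.choice([m for m in (1, 2, 3) if m <= stones])

def always_take_one(stones, rng):
    """Fully deterministic playout policy (hypothetical): always take one."""
    return 1

def estimate(stones, policy, n, seed=0):
    """Average playout result for the player to move at `stones`."""
    rng = random.Random(seed)
    return sum(playout(stones, policy, rng) for _ in range(n)) / n

print(estimate(2, always_take_one, 1000))  # prints 0.0
print(estimate(2, uniform_random, 1000))   # roughly 0.5
```

More playouts reduce the noise of the random estimate, but they cannot
repair the deterministic one, because its error is a bias and not
variance.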


Peter Drake
http://www.lclark.edu/~drake/



On Jul 4, 2007, at 5:52 PM, Yamato wrote:

 In other words UCT works well when evaluation/playouts is/are strong. I
 believe there are still improvements possible to the UCT algorithm as
 shown by the recent papers by the MoGo and Crazystone authors, but what
 really will make a difference is the quality of the playouts.

Sylvain said that good moves in the playouts do not always improve
the performance of UCT. What do you think about this claim?

--
Yamato

Re: [computer-go] Explanation to MoGo paper wanted.

2007-07-04 Thread Chris Fant

2) If the playouts are too deterministic, and the moves are merely pretty
good, the program may avoid an important move and thus misjudge the value of
a position.


IMO, this is the most interesting part of Computer Go today.  How can
one possibly design an optimal playout agent when making a playout
agent that plays strongly is not the solution?  The only known method
seems to be trial and error.