Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-09 Thread Gunnar Farnebäck
Dave wrote:
> We have seen a similar effect many times in MoGo. Often we try
> something that seems like it should improve the quality of the
> simulation player, but it makes the overall performance worse. It is
> frustrating and surprising! Has anyone else encountered this?

I'm not surprised. The goal of Monte Carlo simulations should be to
provide an unbiased estimate of the true min-max value with as low
variance as possible. This has little to do with strength, unless you
happen to find a perfect simulation player, but then the whole search
business becomes moot.
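
To make the bias/variance point concrete, here is a small sketch. It is an
illustration only, not code from MoGo or GNU Go. It assumes each playout is
an independent win/loss sample that comes out as a win with probability
`p_policy` under the simulation policy, and that `true_value` is the value
the playouts are meant to estimate; both names are hypothetical. Under those
assumptions the mean squared error of the average of `n` playouts is the
squared bias plus the variance of the mean:

```python
def playout_mse(p_policy, true_value, n):
    """Mean squared error of the win rate averaged over n playouts.

    Assumes independent Bernoulli playouts with win probability p_policy
    (an illustrative simplification, not a model of any real program).
    """
    bias = p_policy - true_value               # systematic error of the policy
    variance = p_policy * (1.0 - p_policy) / n  # shrinks as playouts are added
    return bias * bias + variance


# An unbiased policy with few playouts versus a biased one with many.
print(playout_mse(0.5, 0.5, 100))
print(playout_mse(0.6, 0.5, 1_000_000))
```

In this toy setting more playouts only shrink the variance term. A policy
whose win rate is off by 0.1 keeps a squared error of about 0.01 however
many simulations are run, which is larger than the error of 100 unbiased
playouts.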

The fact that many modifications of uniformly random playouts
simultaneously improve simulation playing strength and overall strength
is a red herring. Uniformly random playouts are strongly biased to
overestimate the value of having tightly connected stones since e.g. one
space jumps become cut through disproportionally often compared to what
happens in relevant paths through the min-max tree. Almost any change in
simulation policy that counters this tendency will improve overall
strength and likewise pretty much every sensible change will improve
simulation strength compared to uniformly random play.

At higher levels something that may happen is that a change in the
simulation policy improves the skill at making life in tight spots,
without changing other skills. This would likely improve simulation
strength but would cause a bias for positions where there's room for a
futile invasion that barely fails, decreasing overall strength.

Similar phenomena have turned up in GNU Go over the years. If you tune
tactical reading or life and death reading to find some new class of
attacking moves, results are likely to become worse if you don't do
matching changes in the capability to find defense moves.

There's also the classical effect of fixing an obvious mistake just to
find some regression tests starting to fail. Closer examination shows
that the tests were previously only passing because there were two
mistakes that cancelled each other and fixing one of them breaks the
balance.

/Gunnar
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-09 Thread Brian Slesinsky

This discussion reminds me of a naive theory that I sometimes wonder about:

Since the players in the playouts are so weak, it seems like
improving the ability to defend a strong position from a
not-very-clever move (and not lose it via a blunder) should be more
important than improving the ability to find an attack.  If there are
two equally bad players that can easily attack each other but can't
defend, it seems like the results will be close to random, almost
regardless of starting position, unless it is very strong.  On the
other hand, if two bad players are somewhat better at defense but
lousy at seeing weaknesses in the other side, there will be less noise
and the one with more territory will tend to win, but an attack on a
mostly solid position will sometimes be found via a random move, and
given enough playouts, this will result in the probability of defense
with a weakness being slightly lower than a truly winning position.

It seems like this effect would be especially true of the endgame
where there aren't so many points to take, but a position could be
lost due to a blunder.

I'm not sure how useful that is, since to defend a position you need
to know how it might be attacked, but perhaps it leads somewhere?

- Brian


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-07 Thread steve uurtamo
> How is this a ko threat?   Lazarus threatens a chain of 4 or 5 stones
> with a self-atari move.   If the opponent captures, where is the ko?
> If the opponent doesn't capture, where is the ko?

sorry, this is just terminology on my part -- a 'ko threat' is any threat
that can be used during a ko, regardless of whether or not there is a
ko on the board.  so for instance, a strong player may count and evaluate
the size and number of all of his (and his opponent's) ko threats on the
board before deciding whether or not to start a crucial ko.  he may also
create situations that will generate multiple ko threats later, stockpiling
them for later use.  wasting one of these only reduces his ability to
successfully win one or more kos later.

one way that strong players can crush weaker players is simply to start
a game-deciding ko (for instance by having the ko winner either connect
or split two important one-eyed groups), knowing that their stockpile of
threats can't be matched.  once their opponent runs out of threats, the
stronger player will lose the ko, but have one of his threats go unanswered.

all this to say that lazarus can actually play and *expect* to have one
or more of these threats unanswered if there is a standing ko of large
enough value on the board.  just start an invasion/reduction ko or a
group-splitting/connecting ko, and some of these threats can be potential
points.

s.


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-06 Thread steve uurtamo
> There is one other issue I have seen that is similar.  Sometimes
> Lazarus will play a move that doesn't hurt nor help its position.
> It's not a wasted move because the opponent must respond or else lose.

this sounds a good bit like a ko threat, which is tricky to distinguish
from a good play.

s.


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-06 Thread steve uurtamo
> The attack is easily
> refuted with a capture, and when that happens no time was lost.   But
> the opponent must capture immediately or the threat Lazarus made
> actually works.

this, in fact, is a ko threat.  if you play it *outside* of a ko, then it's a
wasted ko threat.  no big loss if there are no kos expected and no
lost kos have happened.  otherwise it's more tricky.

s.


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-06 Thread forrestc
steve uurtamo said:
>> There is one other issue I have seen that is similar.  Sometimes
>> Lazarus will play a move that doesn't hurt nor help its position. It's
>> not a wasted move because the opponent must respond or else lose.
>
> this sounds a good bit like a ko threat, which is tricky to distinguish
> from a good play.

To play a ko threat in a ko fight is usually good play.

To play such moves anytime else is a really insidious bad habit, one
difficult to extinguish for anyone who assumes forced moves are
necessarily a good thing. One can burn a lot of opportunities this way
without even knowing they were ever there.

Forrest Curo


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-06 Thread Don Dailey
On Fri, 2007-07-06 at 16:52 -0700, steve uurtamo wrote:
>> The attack is easily
>> refuted with a capture, and when that happens no time was lost.   But
>> the opponent must capture immediately or the threat Lazarus made
>> actually works.
>
> this, in fact, is a ko threat.  if you play it *outside* of a ko, then it's a
> wasted ko threat.  no big loss if there are no kos expected and no
> lost kos have happened.  otherwise it's more tricky.


How is this a ko threat?   Lazarus threatens a chain of 4 or 5 stones
with a self-atari move.   If the opponent captures, where is the ko?
If the opponent doesn't capture, where is the ko?

- Don


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-06 Thread Peter Drake
I think Steve meant that the move /should have been used as/ a ko  
threat.


Peter Drake
http://www.lclark.edu/~drake/



On Jul 6, 2007, at 5:12 PM, Don Dailey wrote:

> On Fri, 2007-07-06 at 16:52 -0700, steve uurtamo wrote:
>>> The attack is easily
>>> refuted with a capture, and when that happens no time was lost.  But
>>> the opponent must capture immediately or the threat Lazarus made
>>> actually works.
>>
>> this, in fact, is a ko threat.  if you play it *outside* of a ko,
>> then it's a wasted ko threat.  no big loss if there are no kos
>> expected and no lost kos have happened.  otherwise it's more tricky.
>
> How is this a ko threat?   Lazarus threatens a chain of 4 or 5 stones
> with a self-atari move.   If the opponent captures, where is the ko?
> If the opponent doesn't capture, where is the ko?
>
> - Don

Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-06 Thread Yamato
> In Go things are worse insofar as there is only one standard sparring
> partner, GNU Go. This creates severe inbreeding effects. In chess there was
> a similar problem. There were more strong opponents around, but over the
> years they became very similar. Suddenly there was a new program, Rybka,
> which plays differently, and all the inbred programs have a lot of
> difficulties.
>
> I think there is no better way. One can do some pre-filtering with test
> positions. If a version is especially bad in these tests, one can ignore it.
> But being good in test positions and in games are different things.

When MonteGNU is published, it will be an alternative to GNU Go.
Of course MC vs MC may have some problems, but at least it is
stronger than GNU Go on 9x9.
And if a way to combine UCT with local tactical search is
discovered, regression tests like GNU Go's will also be useful.

--
Yamato


[computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread David Silver

>>> In other words UCT works well when evaluation/playouts is/are
>>> strong. I believe there are still improvements possible to the UCT
>>> algorithm as shown by the recent papers by Mogo and Crazystone
>>> authors, but what really will make a difference is in the quality
>>> in the playouts.
>>
>> Sylvain said that good moves in the playouts do not always improve
>> the performance of UCT. What do you think about this claim?
>
> I believe this claim is true in two senses:
>
> 1) If the computation necessary to find better moves is too
> expensive, performing many dumb playouts may be a better investment.

Sure, this is true. But even with the same number of simulations,
stronger playouts do not necessarily perform better than dumb
playouts. This is the real mystery!

> 2) If the playouts are too deterministic, and the moves are merely
> pretty good, the program may avoid an important move and thus
> misjudge the value of a position.

We tried the whole spectrum from completely random to completely
deterministic playouts, but we never came close to the performance of
the dumb playouts!

We have seen a similar effect many times in MoGo. Often we try
something that seems like it should improve the quality of the
simulation player, but it makes the overall performance worse. It is
frustrating and surprising! Has anyone else encountered this?


-Dave



[computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread David Silver
> Seems like it should be up to the person in the other environment
> to adapt your successful algorithm (and notation/terminology) to
> their environment.


But how do the other people in other environments find out about the  
algorithm? And find out that it is something they could use in their  
own environment? I think we can help with both, by presenting our  
work in a more general way.


-Dave

Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread Don Dailey
On Thu, 2007-07-05 at 10:50 -0600, David Silver wrote:
> We tried the whole spectrum from completely random to completely
> deterministic playouts, but we never came close to the performance of
> the dumb playouts!

I don't understand - I thought MoGo wasn't using dumb play-outs?

> We have seen a similar effect many times in MoGo. Often we try
> something that seems like it should improve the quality of the
> simulation player, but it makes the overall performance worse. It is
> frustrating and surprising! Has anyone else encountered this?


Here is why this happens:

Let's think of this in the context of pruning moves, since controlling
the play-outs can be cast in this way.

Presumably, if you run 1000 random play-outs from a given position you
will get a fair indication of how good the position is.   

But what if you are able to prune out many of the bad moves in that
simulation?   Would this improve the accuracy of the simulation?   

Probably, but not necessarily.   Suppose that during the play-outs, you
are able to prune out 50% of the bad black moves, but only 30% of the
bad white moves? You would be playing 1000 simulations where BLACK
was playing consistently stronger, regardless of how good the actual
position was.   

If the chances were in fact pretty much even,  it would look as if black
had a big advantage.   If that color bias was consistent for that type
of position, building a UCT tree below it would not quickly fix the
problem.

The extra knowledge you impose is not impartial knowledge: it will work
better for one side than the other, and for one position differently than
another.  So even if the average quality of the play-outs improves, each
position responds differently to the extra knowledge, making it more
difficult to compare one position to another.
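
A toy calculation of the colour-bias effect described above. This is a
sketch under stated assumptions, not a model of Lazarus or any real
program: both sides play `moves` moves, each move is a blunder with
probability `p_bad`, pruning a fraction of the bad moves simply scales that
probability down, and the side with fewer blunders wins (ties count as half
a win). All names, and all numbers other than the 50%/30% split, are
hypothetical.

```python
from math import comb


def blunder_distribution(moves, p_bad):
    """Binomial probabilities of making k blunders in `moves` moves."""
    return [comb(moves, k) * p_bad**k * (1 - p_bad) ** (moves - k)
            for k in range(moves + 1)]


def black_win_rate(moves, p_bad, black_pruned, white_pruned):
    """Exact black win rate in a position that is truly even.

    Assumption: pruning a fraction f of a side's bad moves lowers its
    blunder probability from p_bad to p_bad * (1 - f).
    """
    black = blunder_distribution(moves, p_bad * (1 - black_pruned))
    white = blunder_distribution(moves, p_bad * (1 - white_pruned))
    win = 0.0
    for b, prob_b in enumerate(black):
        for w, prob_w in enumerate(white):
            if b < w:        # black blundered less often: black wins
                win += prob_b * prob_w
            elif b == w:     # equal blunders: count as half a win
                win += 0.5 * prob_b * prob_w
    return win


# Even position, no pruning, then the 50% / 30% pruning from the text.
print(black_win_rate(30, 0.4, 0.0, 0.0))
print(black_win_rate(30, 0.4, 0.5, 0.3))
```

With symmetric pruning the estimate stays at one half, as it should for an
even position. With the asymmetric pruning the same position looks clearly
better for black, and running more play-outs does not remove the skew,
because the skew is in the expected value and not in the noise.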

There is one other issue I have seen that is similar.  Sometimes
Lazarus will play a move that neither hurts nor helps its position.
It's not a wasted move because the opponent must respond or else lose.
An example is a simple self-atari which itself is a direct threat.   The
opponent is forced to respond, so there is no reason not to try for the
cheap shot in his territory, but in the grand scheme of things this move
is a distraction, and if you could remove such moves from the tree it would
help the program focus on what is really important.  However, it
sometimes pays to try moves like these.   When I fixed this problem in
Lazarus, it started winning less against weaker programs simply because
they sometimes fail to defend.

I imagine this can happen in more sophisticated contexts, where certain
moves could be very effective in exploiting more naive (but not totally
stupid) programs.  In such a case, an improvement could make your
program appear weaker.  I don't think this is merely academic because I
model skill not as how many good moves you play, but as how many bad
moves you avoid playing.   In other words there is no such thing as a
good move - there are only bad moves.   So you want to present your
opponent with opportunities to play them.  

In chess it works the same, if you are playing a weak opponent it's
really stupid to trade quickly into a drawn ending just because the
position is even.  


- Don


[computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread Dave Dyer

One of my favorite observations about Go is that expert play tends
to be on the edge of catastrophe.

By playing better moves on the average, you become more vulnerable
to the occasional misstep.

If a program is not very good, random better or worse moves do not have
much effect.  If the program improves, it's closer to the edge, and 
random missteps are more likely to make it fall off the cliff.

Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread David Doshay
We have encountered this consistently in our non-MC/UCT program.  
Things that fix an obvious problem lead to unintended consequences  
that sometimes take weeks to tease apart. So far we have been able to  
understand how this comes about in each situation, but still have  
little ability to predict in advance when the fix will simply work  
(and thus improve play) and when it will lead to the unintended  
consequences that result in worse play.



Cheers,
David



On 5, Jul 2007, at 9:50 AM, David Silver wrote:

> Often we try something that seems like it should improve the
> quality of the simulation player, but it makes the overall
> performance worse. It is frustrating and surprising! Has anyone
> else encountered this?




Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-05 Thread Yamato
> There is one other issue I have seen that is similar.  Sometimes
> Lazarus will play a move that doesn't hurt nor help its position.
> It's not a wasted move because the opponent must respond or else lose.
> An example is a simple self-atari which itself is a direct threat.   The
> opponent is forced to respond, so there is no reason not to try for the
> cheap shot in his territory, but in the grand scheme of things this move
> is a distraction and if you could remove them from the tree it would
> help the program focus on what is really important.  However, it
> sometimes pays to try moves like these.   When I fixed this problem in
> Lazarus, it started winning less against weaker programs simply because
> they sometimes fail to defend.

And is that version stronger against higher-level programs?
Losing against weaker programs might be the cost that we should pay
temporarily.
I think one of the problems is in testing. Currently we have almost
no way to judge whether an improvement is good or bad, other than
playing a lot of games against GNU Go. It takes a very long time and
seems inefficient. Moreover, it may not even be a very good method.
GNU Go often cannot respond correctly to an obviously bad move, so
pruning such moves decreases the winning rate.

--
Yamato


Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

2007-07-04 Thread chrilly
Thanks, the dictionary is really great.

Chrilly
  - Original Message - 
  From: David Silver 
  To: computer-go@computer-go.org 
  Sent: Tuesday, July 03, 2007 11:29 PM
  Subject: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)


  >> It's because Go is not the only game in the world and certainly not
  >> the only reinforcement learning problem. They are using a widely
  >> accepted terminology.
  >
  > But a very inappropriate one. I have read Sutton's book and all the
  > things I know (e.g. TD-Gammon) are completely obfuscated. It's maybe
  > suitable to present general concepts, but it is extremely complicated
  > to formulate an algorithm in this framework.

  Here is a quick and dirty RL-Computer Go translation kit to try and help
  bridge the gap!

  RL terminology                        Go terminology

  State                                 Position
  Action                                Move
  Reward                                Win/Loss
  Return                                Win/Loss
  Episode                               Game
  Time-step                             One move
  Agent                                 Program
  Value function                        Evaluation function
  Policy                                Player
  Default policy                        Simulation player
  Uniform random policy                 Light simulation player
  Other stochastic policy               Heavy simulation player
  Greedy policy                         1-ply search player
  Epsilon-greedy policy                 1-ply search player with some random moves
  Feature                               Factor used for position evaluation
  Weight                                Weight of each factor in evaluation function
  Tabular representation                One weight for each complete position
  Partial tabular representation        UCT tree
  State abstraction                     One weight for many positions
  Linear value function approximation   Evaluation function using weighted sum of various factors
  Feature discovery                     Learning new factors for the evaluation function
  Sample-based search                   Simulation (Monte-Carlo methods, etc.)
  Transition function                   Rules of the game
  Environment                           Rules of the game + opponent
  Trajectory                            Move sequence
  Online                                During actual play
  Offline                               Before/after actual play (e.g. preprocessing)
  On-policy                             If both players play as normal
  Off-policy                            If either player behaves differently


  -Dave

Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

2007-07-04 Thread Brian Slesinsky

I wonder whether the use of games as a metaphor would make general
machine learning concepts more easily understood by non-specialists?
That is, if you took a machine learning paper and rewrote it in terms
of games, would that make it easier or harder to understand for people
unfamiliar with both game programming and machine learning?

Of course, this is hard to tell once you've learned either vocabulary.
It's easy to assume that whatever jargon you know (or invent) is
inherently easier to understand.

- Brian


[computer-go] Re: Explanation to MoGo paper wanted. (BackGammon Code)

2007-07-03 Thread David Silver


>> It's because Go is not the only game in the world and certainly not
>> the only reinforcement learning problem. They are using a widely
>> accepted terminology.
>
> But a very inappropriate one. I have read Sutton's book and all the
> things I know (e.g. TD-Gammon) are completely obfuscated.

Really? I think it is a wonderful example of clear thinking and clear
writing - I couldn't put the book down. It is the reason I chose to
study RL, and to come study with Rich Sutton.

> It's maybe suitable to present general concepts, but it is extremely
> complicated to formulate an algorithm in this framework.

Of course everyone hopes that ideas will be presented to them in
their personal terminology, as it saves them some effort. But we make
progress in science by unifying and identifying the general concepts.

> But the main point is: I think game programmers should be more
> proud of their work and should present their results in the language
> of game programming. We are the ones who make progress, not these
> paper tigers.

Isn't there room for both? Shouldn't we present our work within our
own community, but also make efforts to share our ideas with others?


-Dave


Re: [computer-go] Re: Explanation to MoGo paper wanted.

2007-07-03 Thread chrilly
> 2. We want other communities to find out about UCT, and start using it
> in many different domains. It is not just a Go-programming algorithm!

Yes. I think the idea has many potential fields of application. In the
somewhat dated book by R. Epstein, The Theory of Gambling and Statistical
Logic, the simple algorithm "play an arm as long as it is winning" is
proposed.
But it does not help too much. E.g. J. Schaeffer invented the history
heuristic. This was long before jump prediction became an important topic
in microprocessor design. The first jump predictors were static rules.
After some years the hardware designers then invented their own history
heuristic for jumps.

> ideas:
> http://www.cs.ualberta.ca/~silver/research/presentations/files/sylvain-silver.pdf

Thanks.

> Actually I think the best notation would be: description in plain text +
> mathematical notation + pseudocode + many diagrams. But in a conference
> paper we have just 8 pages to describe everything, so we must make some
> compromises.

Yes, I fully agree.

> Why do you call this UCT if there is no tree? Isn't this just roll-out
> simulation, as used by Tesauro and Galperin in 1996?

It's the selection rule for making the rollouts at the root. This is a
variant of UCT. To be exact, it is the multi-armed bandit selection rule.
It differs e.g. from the rule proposed by Epstein.
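
For readers who have not seen the bandit rule, a minimal sketch of UCB1
selection among the root moves follows. It is the textbook UCB1 formula
(mean reward plus sqrt(2 ln N / n_i)), written as an illustration; it is
not code from MoGo or from any program discussed in this thread, and the
names `wins`, `visits` and `exploration` are made up for the example.

```python
import math


def ucb1_select(wins, visits, exploration=math.sqrt(2)):
    """Return the index of the arm (root move) to simulate next.

    wins[i] and visits[i] are the playout wins and playout count of arm i.
    Unvisited arms are tried first; otherwise the arm with the highest
    upper confidence bound is chosen.
    """
    total = sum(visits)
    best_arm, best_score = None, float("-inf")
    for arm, (w, n) in enumerate(zip(wins, visits)):
        if n == 0:
            return arm  # every arm gets at least one playout
        mean = w / n
        bonus = exploration * math.sqrt(math.log(total) / n)
        if mean + bonus > best_score:
            best_arm, best_score = arm, mean + bonus
    return best_arm


# Arm 1 has the lower win rate but very few playouts, so it is explored.
print(ucb1_select([60, 1], [100, 4]))
# With the exploration term switched off the rule is purely greedy.
print(ucb1_select([60, 1], [100, 4], exploration=0.0))
```

The contrast with the "play an arm as long as it is winning" rule is that
UCB1 keeps returning to rarely tried arms, because their bonus term grows
while the other arms accumulate playouts.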

Chrilly

Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

2007-07-03 Thread chrilly
> Isn't there room for both? Shouldn't we present our work within our own
> community, but also make efforts to share our ideas with others?

Yes, I do this by writing popular articles about computer chess and games
programming.
The point of concern is: one is only considered important if one considers
one's own work important. Sometimes I have the feeling that academic
researchers are a little bit ashamed that they do not do something more
serious or important, and then they hide their work behind a more serious
title/topic and vocabulary.
E.g. J. Schaeffer and Donskoy wrote "Falling from Grace". Both made
important contributions to computer chess. But in this article they blamed
themselves: it is their own (and the community's) fault that they have
fallen from AI grace. But isn't it the problem of AI when the concepts do
not work? Why didn't they write an article "The concepts of AI are
bullshit"? Feng Hsu was the first one who did this. He was proud enough of
his work.

Chrilly

Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

2007-07-03 Thread steve uurtamo
the language of mathematics is perhaps the most universal language for
computer scientists.  pseudocode comes in somewhere after that, and
well-known algorithms probably somewhere in between.  game programming is
an application of computer science, and the language of game programming
isn't necessarily appropriate (and would seem obtuse) to the much larger
audience of potential readers outside of its domain.  whenever an
algorithm is applicable outside of the game programming field, rephrasing
its game-specific language might make the most sense to the readers who
are intended to read about it.

that having been said, the most appropriate language is obviously that
which is understandable by the largest number of potential readers
interested in the title and (if there is one) the abstract.

s.

Re: [computer-go] Re: Explanation to MoGo paper wanted.(BackGammonCode)

2007-07-03 Thread chrilly
> the language of mathematics is perhaps the most universal language for
> computer scientists.  pseudocode comes in somewhere after that, and
> well-known algorithms probably somewhere in between.  game programming is
> an application of computer science, and the language of game programming
> isn't necessarily appropriate (and would seem obtuse) to the much larger
> audience of potential readers outside of its domain.  whenever an
> algorithm is applicable outside of the game programming field, rephrasing
> its game-specific language might make the most sense to the readers who
> are intended to read about it.
>
> that having been said, the most appropriate language is obviously that
> which is understandable by the largest number of potential readers
> interested in the title and (if there is one) the abstract.
>
> s.

I have a PhD in mathematical statistics, so I am not at all against the use
of mathematics. I think the language chosen should be the one that is most
appropriate. For some mathematical proofs about the Big-O behaviour of
algorithms there is no other language than mathematics. But this notation
is not suited to describing algorithms.

D. Knuth chose structured English in The Art of Computer Programming, and
MIX for precise analysis. His argument for MIX is that he writes books for
eternity; therefore he cannot use the latest fashion in programming
languages. There is some reason behind this argument, but I think that only
a few programmers can read MIX nowadays. MIX also does not reflect the
capabilities of modern hardware, so Knuth has to rewrite his books in
MMIX (Inschallah). Maybe pseudo-Algol would have been more eternal than
MIX.
But in any case he uses different levels of notation.

Chrilly

[computer-go] Re: Explanation to MoGo paper wanted. (BackGammon Code)

2007-07-03 Thread David Silver

>> It's because Go is not the only game in the world and certainly not
>> the only reinforcement learning problem. They are using a widely
>> accepted terminology.
>
> But a very inappropriate one. I have read Sutton's book and all the
> things I know (e.g. TD-Gammon) are completely obfuscated. It's maybe
> suitable to present general concepts, but it is extremely complicated
> to formulate an algorithm in this framework.

Here is a quick and dirty RL-Computer Go translation kit to try and
help bridge the gap!

RL terminology                        Go terminology

State                                 Position
Action                                Move
Reward                                Win/Loss
Return                                Win/Loss
Episode                               Game
Time-step                             One move
Agent                                 Program
Value function                        Evaluation function
Policy                                Player
Default policy                        Simulation player
Uniform random policy                 Light simulation player
Other stochastic policy               Heavy simulation player
Greedy policy                         1-ply search player
Epsilon-greedy policy                 1-ply search player with some random moves
Feature                               Factor used for position evaluation
Weight                                Weight of each factor in evaluation function
Tabular representation                One weight for each complete position
Partial tabular representation        UCT tree
State abstraction                     One weight for many positions
Linear value function approximation   Evaluation function using weighted sum of various factors
Feature discovery                     Learning new factors for the evaluation function
Sample-based search                   Simulation (Monte-Carlo methods, etc.)
Transition function                   Rules of the game
Environment                           Rules of the game + opponent
Trajectory                            Move sequence
Online                                During actual play
Offline                               Before/after actual play (e.g. preprocessing)
On-policy                             If both players play as normal
Off-policy                            If either player behaves differently
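
As a small illustration of two rows of the table, here is a sketch of a
greedy policy (1-ply search player) and an epsilon-greedy policy (the same
player with some random moves). The function names, the `evaluate` callback
and the toy move values are all hypothetical; a real program would plug in
its own move generator and evaluation function.

```python
import random


def greedy_move(moves, evaluate):
    """Greedy policy: pick the move with the best 1-ply evaluation."""
    return max(moves, key=evaluate)


def epsilon_greedy_move(moves, evaluate, epsilon, rng=random):
    """Epsilon-greedy policy: a random move with probability epsilon,
    otherwise the greedy move."""
    if rng.random() < epsilon:
        return rng.choice(moves)
    return greedy_move(moves, evaluate)


# Toy position with three candidate moves and made-up evaluations.
values = {"a": 0.2, "b": 0.9, "c": 0.5}
moves = list(values)
print(epsilon_greedy_move(moves, values.get, epsilon=0.0))  # always greedy
print(epsilon_greedy_move(moves, values.get, epsilon=0.1,
                          rng=random.Random(0)))
```

Setting `epsilon` to 0 gives the plain 1-ply search player, and setting it
to 1 gives the uniform random policy, i.e. the light simulation player in
the table.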

-Dave
