[computer-go] UCT RefBot

2008-11-20 Thread Mark Boon
After getting some strange results with my experiments I realised  
that doing these tests on the reference-bot is probably not  
appropriate. The MC+AMAF combination is a completely different beast  
than MC+UCT-search.


So I need to do the testing on a reference implementation for MC+UCT- 
search. People have mentioned here before they'd like to see  
something like that. I have something like a reference implementation  
that I use for my own experiments. But I'm hardly an expert on either  
MC or UCT so I'm a bit hesitant to propose a reference  
implementation. But maybe I can make a start. At least we do have a  
good standard definition of a reference MC algorithm. So I'm not  
going to repeat that here apart from saying it plays uniformly random  
simulations of moves that are legal and don't fill in an eye.


That leaves the UCT part, so let me take a stab at it:

- Let's say we're going to limit the search to N Monte-Carlo  
simulations.
- For each simulation we need to determine which node in the tree to  
expand. This involves a recursive definition starting from the root- 
node. If a node is incomplete it is expanded further with the next  
move, the selection of which is done randomly by the MC algorithm. (A  
node is 'complete' when it has all the moves that the MC algorithm  
can generate as its children. I define this as the case when the MC  
algorithm produces a pass-move. See below.)
- If a node is 'complete', it tries to (recursively) expand the child  
with the best UCT value. This value combines the win-loss ratio  
recorded in that node with an exploration term. So the values used  
for comparison are based on the following formula:


	wins / visits + exploration-factor * sqrt( 2 * log(parent-visits) / (10 * visits) )


- I'm actually using an exploration-factor of 1.0 at the moment and  
have no idea what is the best or standard value for light playouts.
- When the node to expand is selected, a random move is chosen, for  
which a new node is created. Next, provided the selected move is not  
a pass, an MC playout is performed. The win-visits ratio of the node  
just created is set to 1/1 or 0/1 depending on the playout result. It  
also updates the win-visits ratio of its parent(s) by the same amount  
until it reaches the root-node of the tree.
- When the randomly selected move is a pass it doesn't expand the  
tree unless the game is nearly finished. The node is set to  
'complete'. The game is considered 'nearly finished' when all empty  
points have more than 2 occupied neighbors. In that case it counts  
the score and sees if the side that passes would win the game. If it  
does win it updates the wins-visits ratio, otherwise it doesn't do  
anything.
- When it reaches N simulations, the child of the root-node with the  
best wins-visits ratio is played. I've also seen that simply the child  
with the highest number of visits is selected. Is there a difference?
- Towards the end of the game, it may not be possible to do N  
expansions of the tree. To prevent an endless loop, I record how many  
times it failed to expand the tree and exit the loop when it exceeds  
a certain number (currently set to 1000).
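The steps above can be sketched in a few lines (an illustrative Python sketch, not the actual reference code: the `Node` class is my own, and where the post says ancestors are updated "by the same amount", I substitute the common alternating-perspective convention, in which each node stores wins for the player to move there):

```python
import math

EXPLORATION_FACTOR = 1.0  # the value used in the post

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.wins = 0
        self.visits = 0
        self.complete = False  # set once the MC generator produces a pass

    def uct_value(self, parent_visits):
        # win-loss ratio plus the exploration term from the formula above
        return (self.wins / self.visits +
                EXPLORATION_FACTOR * math.sqrt(
                    2 * math.log(parent_visits) / (10 * self.visits)))

    def best_child(self):
        # used when the node is 'complete': descend into the child
        # with the best UCT value
        return max(self.children, key=lambda c: c.uct_value(self.visits))

def backpropagate(node, won):
    # set the new leaf to 1/1 or 0/1 and update every ancestor up to
    # the root; the per-ply flip is the alternating-perspective
    # convention mentioned above, not taken from the post
    while node is not None:
        node.visits += 1
        if won:
            node.wins += 1
        won = not won
        node = node.parent
```
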



The treatment of pass feels a bit convoluted. The idea behind it is  
that I don't want to allow pass-moves in the tree early in the game.  
But towards the very end of the game I want to allow pass to avoid  
being forced to make a bad move or not being able to make a move at all.
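The 'nearly finished' test from the outline above (all empty points have more than 2 occupied neighbors) is easy to state in code. A sketch, with the board-as-dictionary representation being purely my own choice for illustration:

```python
def nearly_finished(board):
    # board maps (x, y) -> 'b', 'w', or None for every point on the board
    for (x, y), color in board.items():
        if color is not None:
            continue  # only empty points matter
        occupied = sum(
            1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            # off-board neighbors return None from .get and don't count
            if board.get((x + dx, y + dy)) is not None
        )
        if occupied <= 2:
            return False  # this empty point keeps the game 'unfinished'
    return True
```

Note that an empty corner point can never satisfy the test, since it only has two neighbors.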


I'd also like to hear opinions on what would be a good N for a  
reference bot. If I set N to 2,000, just like the reference-bots on  
CGOS, it plays rather poorly. Much worse than the ref-bot. I haven't  
tested it a lot yet, but I think the break-even point against the  
MC-AMAF ref bot would be somewhere around N=10,000.


Any opinions, ideas or criticism are welcome.

Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] UCT RefBot

2008-11-20 Thread Magnus Persson
I couldn't find anything problematic with your specification, so I'll  
just make some comments.


Quoting Mark Boon [EMAIL PROTECTED]:


- When it reaches N simulations, the child of the root-node with the
best wins-visits ratio is played. I've also seen that simply the child
with the highest number of visits is selected. Is there a difference?


The following can happen if you choose the move with the best  
wins-visits ratio: if the search is very selective (for example using  
progressive widening or AMAF (RAVE)), a move can be searched a small  
number of times, get a really good ratio, and be selected. In Valkyria,  
which uses both methods of selectivity, there are always moves that have  
higher win-visits ratios than the most-searched move (19x19).


If you do plain MC and UCT I think win-visits may be just as good as  
highest number of visits, because the search will be quite wide.


Using the highest number of visits is sort of robust. Even if you know  
the move is not as good as it initially seemed, you sort of know that  
one has to search deep to see this and maybe the opponent cannot see  
the refuting move. Also the reason for the low win-visits ratio may be  
that problems in the position will give a low ratio for all moves if  
they are searched deep enough. This is by definition true in any lost  
position. If you can prove that all moves are losing the move that  
required the largest tree for the proof is the candidate for provoking  
a mistake.


Valkyria plays fast whenever win-visits and highest number of visits  
agree and slow otherwise, in an attempt to eliminate uncertainty.
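The contrast Magnus describes can be seen with a toy example (illustrative Python; the (wins, visits) pairs are made up):

```python
def by_win_rate(children):
    # final move = child with the best win-visits ratio
    return max(children, key=lambda c: c[0] / c[1])

def by_visits(children):
    # final move = the "robust" most-searched child
    return max(children, key=lambda c: c[1])

# A very selective search can leave a barely-visited child with a
# flattering ratio:
children = [(3, 4), (450, 1000), (60, 100)]  # (wins, visits) per child
risky = by_win_rate(children)   # the 3-out-of-4 child
robust = by_visits(children)    # the child backed by 1000 simulations
```

Here the two rules disagree, which in Valkyria's terms would be a position worth searching slowly.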



I'd also like to hear opinions on what would be a good N for a
reference bot. If I set N to 2,000, just like the reference-bots on
CGOS, it plays rather poorly. Much worse than the ref-bot. I haven't
tested it a lot yet, but I think the break-even point against the
MC-AMAF ref bot would be somewhere around N=10,000.


What about doing an MC-AMAF-UCT version? Or perhaps just simply try an  
MC-AMAF-TS in a best-win-visits-ratio-first manner?


Best
Magnus


Re: [computer-go] UCT RefBot

2008-11-20 Thread Michael Williams

Mark Boon wrote:
The treatment of pass feels a bit convoluted. The idea behind it is that 
I don't want to allow pass-moves in the tree early in the game. But 
towards the very end of the game I want to allow pass to avoid being 
forced to make a bad move or not being able to make a move at all.



You could simply allow the pass all the time.  Early in the game, it will be a significantly inferior move and not often explored in the tree.  That may not be 
optimal, but it's certainly not convoluted and you are guaranteed to never fail to generate the pass move when you needed it.





Re: [computer-go] UCT RefBot

2008-11-20 Thread Mark Boon

Thanks for the comments Magnus.

On 20-nov-08, at 13:00, Magnus Persson wrote:




I'd also like to hear opinions on what would be a good N for a
reference bot. If I set N to 2,000, just like the reference-bots on
CGOS, it plays rather poorly. Much worse than the ref-bot. I haven't
tested it a lot yet, but I think the break-even point against the
MC-AMAF ref bot would be somewhere around N=10,000.


What about doing an MC-AMAF-UCT version? Or perhaps just simply try  
an MC-AMAF-TS in a best-win-visits-ratio-first manner?




To start off, I wanted to keep things simple.

But to be honest, I hadn't given AMAF (or RAVE?) in combination with  
tree-search much thought yet. Only yesterday did I look up an article  
describing RAVE and it's actually not entirely clear to me yet how  
this would be best implemented. I also did a little searching for  
past posts to this mailing-list but it did little to clarify things  
in my mind.


The way I understood the article, after a playout it updates the  
nodes at the current level for all the moves played during the playout  
(if it's a win for the player) with a RAVE value that is used in a  
similar fashion to the UCT value of a node. Only for the current node  
does it update the win-visit ratio. Is that correct? This implies  
creating a lot more nodes than I'm currently doing. I have seen  
remarks that others postpone expanding a node until a certain number  
of simulations have been done. I never quite understood the need for  
that, but maybe this has to do with the fact that AMAF requires a  
much larger number of node-creations?


The way I had implemented my UCT-search, it became very intuitive how  
to make the MC playout strategy prioritize certain moves in the case of  
heavy (or semi-light) playouts, and it needed zero modifications to the  
actual search algorithm. I'd probably need to completely review  
things when a RAVE value starts to influence move-priority 'a priori'.


Mark



Re: [computer-go] UCT RefBot

2008-11-20 Thread Mark Boon


On 20-nov-08, at 13:22, Michael Williams wrote:


Mark Boon wrote:
The treatment of pass feels a bit convoluted. The idea behind it  
is that I don't want to allow pass-moves in the tree early in the  
game. But towards the very end of the game I want to allow pass to  
avoid being forced to make a bad move or not being able to make a  
move at all.



You could simply allow the pass all the time.  Early in the game,  
it will be a significantly inferior move and not often explored in  
the tree.  That may not be optimal, but it's certainly not  
convoluted and you are guaranteed to never fail to generate the  
pass move when you needed it.




I remember that's how I implemented it initially. I do not remember  
the exact details of why I moved away from that, but I believe that  
(in part) it had to do with the program occasionally passing too  
early in the game.


Mark



Re: [computer-go] UCT RefBot

2008-11-20 Thread Michael Williams
You could simply allow the pass all the time.  Early in the game, it 
will be a significantly inferior move and not often explored in the 
tree.  That may not be optimal, but it's certainly not convoluted and 
you are guaranteed to never fail to generate the pass move when you 
needed it.




I remember that's how I implemented it initially. I do not remember the 
exact details of why I moved away from that, but I believe that (in 
part) it had to do with the program occasionally passing too early in 
the game.



When you say too early, do you mean before the game is decided or before all 
dead stones are removed? Only the first one is truly too early.
In other words, there is no harm in leaving a dead stone on the board given 
that it does not impact who the winner is.


Re: [computer-go] UCT RefBot

2008-11-20 Thread Mark Boon


On 20-nov-08, at 14:03, Michael Williams wrote:

You could simply allow the pass all the time.  Early in the game,  
it will be a significantly inferior move and not often explored  
in the tree.  That may not be optimal, but it's certainly not  
convoluted and you are guaranteed to never fail to generate the  
pass move when you needed it.


I remember that's how I implemented it initially. I do not  
remember the exact details of why I moved away from that, but I  
believe that (in part) it had to do with the program occasionally  
passing too early in the game.



When you say too early, do you mean before the game is decided or  
before all dead stones are removed? Only the first one is truly  
too early.
In other words, there is no harm in leaving a dead stone on the  
board given that it does not impact who the winner is.




The problem is there were large intervals while working on it. So  
sometimes I don't exactly remember why I made certain decisions. (The  
older you get, the more you learn how important it is to document  
these kinds of decisions ;-( )


I only remember being particularly annoyed by it. So possibly it  
occasionally passed too early in both cases you mentioned. It's also  
possible that it was a solution to a problem that doesn't exist  
anymore due to other changes / improvements in the code made in the  
meantime.


Mark



Re: [computer-go] UCT RefBot

2008-11-20 Thread Magnus Persson

Quoting Mark Boon [EMAIL PROTECTED]:


Thanks for the comments Magnus.

On 20-nov-08, at 13:00, Magnus Persson wrote:

The way I understood the article, after a playout it updates the
nodes at the current level for all the moves played during the playout
(if it's a win for the player) with a RAVE value that is used in a
similar fashion to the UCT value of a node. Only for the current node
does it update the win-visit ratio. Is that correct? This implies
creating a lot more nodes than I'm currently doing. I have seen remarks
that others postpone expanding a node until a certain number of
simulations have been done. I never quite understood the need for that,
but maybe this has to do with the fact that AMAF requires a much larger
number of node-creations?



I think you understand the basic principle of RAVE correctly. The hard  
part is how to weight together the AMAF value (which I call the *virtual  
win-visit* ratio) with the actual win-visit ratio. And I have to admit  
that my implementation of RAVE in Valkyria might be a quite unique  
interpretation of the somewhat hard-to-understand information that has  
been posted here. I just implemented something that seemed to work and  
tested the parameters thoroughly. But I think I need to do new tests  
since there have been so many changes. Especially, I should do tests on  
large boards.


Valkyria postpones expansion of new nodes for about 10 simulations, and  
an expansion includes all children of a position. This is necessary in  
order to update virtual win-visit scores for all moves played by the  
current simulation's color, even if there is only one. Thus a  
node in the tree is a list of children for a given position. This  
simplifies the code that biases the tree search, since all necessary  
algorithms are executed simultaneously for all children during  
expansion. This is perhaps not efficient, but it is perhaps less prone  
to bugs.


The effect of RAVE with a complete expansion is that after a few  
simulations (let us say 40) we have a rather good sample of virtual  
win-visit scores for all moves. Also, if patterns bias exploration,  
real win-visit scores have been collected for those moves (often a  
single move) that are most likely to be the best. Thus the win rate  
for this position is already close to that of the best move. This is  
different from pure MC-UCT, where all candidate moves are searched once  
and the win rate initially is the average of all moves, bad as well as  
good ones. It takes a lot of time until MC-UCT converges to the  
win rate of the best move in each position.


Simply put, combining virtual and real win-visit values for searching  
the tree lets us safely ignore the worst moves in all positions, and  
the search basically only searches moves that our pattern heuristics  
predicted to be good or moves that were discovered thanks to AMAF.


The search is very selective when it works. And of course it often  
misses the best move. But it kind of works because a) often the second-  
best move wins as well, and b) the selective search effectively searches  
so deeply that it corrects itself quickly.
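One common way to combine the virtual (AMAF) and real win-visit ratios is a visit-dependent weight that fades the virtual value out as real data accumulates. The linear schedule and the constant `k` below are illustrative assumptions, not necessarily what Valkyria uses:

```python
def blended_value(real_wins, real_visits, virt_wins, virt_visits, k=100):
    # beta -> 1 when the node has few real visits (trust the virtual
    # ratio), beta -> 0 as real visits accumulate (trust the real ratio);
    # k is an assumed tuning constant
    beta = k / (k + real_visits)
    real = real_wins / real_visits if real_visits else 0.0
    virt = virt_wins / virt_visits if virt_visits else 0.0
    return beta * virt + (1 - beta) * real
```

With this kind of blend a node can be searched sensibly before it has any real visits at all, which is what makes the postponed expansion above workable.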


Magnus



Re: [computer-go] UCT RefBot

2008-11-20 Thread Mark Boon


On 20-nov-08, at 14:03, Michael Williams wrote:

You could simply allow the pass all the time.  Early in the game,  
it will be a significantly inferior move and not often explored  
in the tree.  That may not be optimal, but it's certainly not  
convoluted and you are guaranteed to never fail to generate the  
pass move when you needed it.


I remember that's how I implemented it initially. I do not  
remember the exact details of why I moved away from that, but I  
believe that (in part) it had to do with the program occasionally  
passing too early in the game.



When you say too early, do you mean before the game is decided or  
before all dead stones are removed? Only the first one is truly  
too early.
In other words, there is no harm in leaving a dead stone on the  
board given that it does not impact who the winner is.




I'm looking at my code and it's coming back to me. When deciding  
which node to expand, which is a recursive process, I stop when  
encountering two passes in a row. Instead of expanding the tree in  
that case, I just determine the winner and update the win-visits  
numbers in the tree. I believe I did this to allow for early  
termination of the search. Without this, it would continue the search  
until the time-allowance was depleted. This would cause a great waste  
of thinking time towards the end of the game when the maximum  
possible search-tree is much smaller than N.


But with the termination on two consecutive passes, what would  
occasionally happen is that one side would pass and the simulation  
would return a win. Then the other side could pass as well and based  
on the number of stones still on the board, now the second player  
would win. This caused problems. Therefore I decided only to allow a  
pass towards the end of the game when it would also win when the  
opponent's stones still on the board are all considered alive.
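The termination rule described here can be sketched as part of the descent (illustrative Python; `score_for_side_to_move` is a hypothetical helper on a hypothetical state object):

```python
def descend(node, state, consecutive_passes):
    # two passes in a row end the descent: instead of expanding the
    # tree, score the final position and report a win or a loss so
    # the win-visits numbers can be updated up the tree
    if consecutive_passes >= 2:
        return state.score_for_side_to_move() > 0
    ...  # otherwise select/expand recursively as usual (omitted)
```
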


I hope that makes sense :)

Mark





Re: [computer-go] UCT RefBot

2008-11-20 Thread Magnus Persson

About allowing early passes.

I think my problem is that RAVE/AMAF really says nothing about the  
value of pass moves, which makes it problematic when a selective search  
does not search bad moves such as pass. So how are we supposed to know  
when to search pass and when not?


Thus, every time I had some kind of design flaw or a bug in Valkyria, it  
seemed to play early passes because of some weird interaction in my  
code.


So allowing passes early seemed to be a good bug catcher... on the  
other hand it also seemed to cause bugs for me. I do not even remember  
anymore how Valkyria handles pass because I changed it so many times  
and had to add a lot of kludges to avoid problems with older kludges  
and so on...



--
Magnus Persson
Berlin, Germany


Re: [computer-go] UCT RefBot

2008-11-20 Thread Mark Boon


On 20-nov-08, at 14:42, Magnus Persson wrote:


I think you understand the basic principle of RAVE correctly. The  
hard part is how to weight together the AMAF value (which I call the  
*virtual win-visit* ratio) with the actual win-visit ratio. And I  
have to admit that my implementation of RAVE in Valkyria might be a  
quite unique interpretation of the somewhat hard-to-understand  
information that has been posted here. I just implemented something  
that seemed to work and tested the parameters thoroughly. But I  
think I need to do new tests since there have been so many changes.  
Especially, I should do tests on large boards.


Valkyria postpones expansion of new nodes for about 10 simulations,  
and an expansion includes all children of a position. This is necessary  
in order to update virtual win-visit scores for all moves played by  
the current simulation's color, even if there is only one.  
Thus a node in the tree is a list of children for a given position.  
This simplifies the code that biases the tree search, since all necessary  
algorithms are executed simultaneously for all children during  
expansion. This is perhaps not efficient, but it is perhaps less prone  
to bugs.


The effect of RAVE with a complete expansion is that after a few  
simulations (let us say 40) we have a rather good sample of virtual  
win-visit scores for all moves. Also, if patterns bias exploration,  
real win-visit scores have been collected for those moves (often  
a single move) that are most likely to be the best. Thus the win  
rate for this position is already close to that of the best move.  
This is different from pure MC-UCT, where all candidate moves are  
searched once and the win rate initially is the average of all  
moves, bad as well as good ones. It takes a lot of time until  
MC-UCT converges to the win rate of the best move in each position.


Simply put, combining virtual and real win-visit values for  
searching the tree lets us safely ignore the worst moves in all  
positions, and the search basically only searches moves that our  
pattern heuristics predicted to be good or moves that were  
discovered thanks to AMAF.


The search is very selective when it works. And of course it often  
misses the best move. But it kind of works because a) often the  
second-best move wins as well, and b) the selective search effectively  
searches so deeply that it corrects itself quickly.




Again, thanks for sharing. So it seems by and large I understood it  
in a similar fashion as you did :-)


What is not exactly clear to me is what you mean by 'postponing  
expansion'. Let me write it in my own words to see if that's what you  
mean. When you have selected a best node based on the UCT + wins/visits  
value which has no children yet, you first simply do a  
simulation and collect the playout result in the current node,  
including the AMAF value that you call 'virtual win-visit ratio', and  
only when that is done a certain number of times (in your case 10) do  
you suddenly create all the children and weight them based on the  
virtual win-visit ratio, and possibly weight them based on other move-  
priorities that resulted from 'heavy' playout selection?


When thinking about it this way it makes sense. I already had a  
nagging feeling about the requirement of visiting each move at least  
once. Especially on 19x19 I could see that becoming a burden.
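Spelled out, the postponed-expansion idea reads something like this (a sketch under the assumptions above; the threshold of 10 comes from Magnus's description, and the `Node` layout is my own):

```python
EXPANSION_THRESHOLD = 10  # simulations collected in a leaf before expanding

class Node:
    def __init__(self):
        self.visits = 0
        self.children = {}  # move -> Node; empty until expansion

def maybe_expand(node, legal_moves):
    # a leaf just accumulates playout results (and AMAF statistics)
    # until it has been visited enough; then all children are created
    # at once, as Magnus describes
    if node.children or node.visits < EXPANSION_THRESHOLD:
        return False
    node.children = {move: Node() for move in legal_moves}
    return True
```
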


Mark



Re: [computer-go] UCT RefBot

2008-11-20 Thread Mark Boon


On 20-nov-08, at 14:51, Magnus Persson wrote:

and had to add a lot of kludges to avoid problems with older  
kludges and so on...


Hey, that sounds strangely familiar :)


Re: [computer-go] UCT RefBot

2008-11-20 Thread Mark Boon


On 20-nov-08, at 15:20, Magnus Persson wrote:


Quoting Mark Boon [EMAIL PROTECTED]:

What is not exactly clear to me is what you mean by 'postponing
expansion'. Let me write it in my own words to see if that's what you
mean. When you have selected a best node based on the UCT + wins/visits
value which has no children yet, you first simply do a simulation and
collect the playout result in the current node, including the AMAF
value that you call 'virtual win-visit ratio', and only when that is
done a certain number of times (in your case 10) do you suddenly
create all the children and weight them based on the virtual win-visit
ratio, and possibly weight them based on other move-priorities that
resulted from 'heavy' playout selection?


Yes, I suddenly create all children, but at creation I have no  
simulation and thus no virtual win-visit ratios for the children  
(although one might copy such values from higher up in the tree,  
which I think the Mogo team tried, but with little or no success).  
The virtual win-visit ratios are initialized to some default value.  
But one can initialize the ratios differently depending on static  
evaluation of the position, using patterns for example, or the  
proximity heuristic. It is not clear to me how best to do this. And  
I need to test the parameters for biasing. The nice thing is that  
one can initialize the virtual win-visit ratios and keep the real  
win-visit ratios unbiased. You can afford to make mistakes here,  
because if the position is searched a lot the virtual values get  
data much quicker than the real ones.




OK, things start to fall in place for me. I was wondering all this  
time what would happen with the information of the simulations that  
happen before expansion. So the answer is: nothing. But at least the  
result of the playout gets percolated up the tree, so I suppose you  
get the most important bit of information of the playout in your tree  
anyway. I have also been wondering before what was meant when I read  
that CrazyStone uses ownership information in the patterns (if I  
remember that correctly). I was always wondering how you got  
ownership information before doing playouts. So it turns out these  
come from simulations done before expansion. Somehow I had  
disregarded that idea because I was under the assumption it would  
discard too much information. But I suppose keeping the playout  
results in the tree is valuable enough by itself. And like you say,  
you win back a lot by not having to do at least a playout for every  
single move at every level in the tree.


From the beginning I have been experimenting with ownership  
information and I think I have a few interesting ideas. But they  
didn't fit in yet. Maybe with my new understanding I can go back and  
try out a few of those ideas again. It means rewriting my search-code  
a bit. Hopefully it's not too much work.


Mark



[computer-go] Re: UCT RefBot

2008-11-20 Thread Claus Reinke
My first monte carlo programs used ownership info very effectively, but 
it may be that by using AMAF this information is already used.

As a relative beginner in these matters, the more I look at AMAF,
the less I like it, and I think that has to do with AMAF ignoring
relative sequencing. By looking at all moves as if they were played
first, it ignores that some moves only make sense after certain other
moves. I haven't yet been able to pin this down with evidence,
because the "if the game is lost, all moves are equally bad" effect
tends to obscure it (I guess I need to insert resign moves at the
right places to limit the game records to the interesting pieces, even
if that means my bot can't score the result).

If I ignore that the Go board is 2d (dimensions, not dans;-), and think
only of time (number of moves), space (board positions), and alternatives
(simulation runs), then ownership maps project this 3d spacetime multiverse
onto a 2d space histogram (likelihood of each position being black/white)
while AMAF tries to project these three dimensions onto a 2d time
histogram (likelihood of each move resulting in win/lose).

As the various tweaks (don't count moves by other player, don't count
moves already played, value early moves higher than later moves) and
remaining bot blunders demonstrate, that doesn't quite work right, in
spite of first appearances.

My (as yet unproven) hypothesis is that the issues stem from AMAF
trying to ignore the time dimension of its projection, hence you get
highly rated moves that place themselves into atari (because that would
have been the last, necessary, move to capture the opponent group,
*after* surrounding it), that would be suicide right now (but not at some
later point in many playouts), or even on an occupied position (no longer
occupied later in many playouts), that try to fill an opponent's eye (which
would make sense only if the second, nearly complete eye were prevented
first, but that doesn't always happen in the same way, hence lower rating
for each of the possible ways).

Giving priority to early moves alleviates this somewhat, as the experiments
show, but then you still don't get perfect loss or win results, even if the
board is effectively owned by one of the players, because the tweaks
have removed some data from the statistics to cover the worst holes.

There should be a variation of AMAF that records move win rates in
move sequences (these moves are good, but this one always comes
after that one; never try this one unless you are sure you can make that
one as well), preserving more of the structure in the time dimension
of its projection without just reproducing the whole 3d input data space.

Given that ownership maps don't suffer from this kind of trouble (unless
one collapses their information to score, which has been said to perform
badly), it is surprising that they are hard to make good use of. It might
just be a matter of asking different questions: AMAF variants try to
answer which move is likely best to make first; ownership maps answer
which positions are likely to be mine in the end. The latter is not as directly
useable, at least not as long as we are only asking about the next move.

That continues with UCT: apart from offering scaffolding for introducing
more Go knowledge, it also introduces a way of interleaving use of
playout information with continued generation of it, but it tends to do so
in a way biased towards move sequences. Again, not the kind of
question that ownership maps can offer answers to, but that just means
that there are other questions we are not asking yet.

Perhaps there is a complement to UCT/Amaf that achieves such
interleaving based on territorial information? One could allocate the
first half of simulation runs to acquiring an ownership map, then use
that information to narrow down the second half of simulations to
the more interesting parts of the board (perhaps less than half the
runs are needed to weigh the rest successfully?).
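The ownership-map projection described above (3d playout data collapsed to a per-point histogram) is simple to accumulate (illustrative Python; the final-position representation is an assumption):

```python
from collections import defaultdict

def ownership_map(final_positions):
    # final_positions: one dict per finished playout, mapping each
    # board point to its owner, 'b' or 'w' (assumed representation)
    black = defaultdict(int)
    points = set()
    for position in final_positions:
        points.update(position)
        for point, owner in position.items():
            if owner == 'b':
                black[point] += 1
    n = len(final_positions)
    # likelihood of each point belonging to black at the end of a playout
    return {point: black[point] / n for point in points}
```

Splitting the simulation budget as suggested would then mean building this map from the first half of the runs and restricting move generation in the second half to points whose likelihood is far from 0 and 1.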

Claus

PS. Is there a location where one can find the current versions of
the plain Amaf reference bots? I seem to be reaching the stage
where issues I find are not always in my own code, but I don't
know whether, e.g., JrefBot 081016-2022 is the latest version.





Re: [computer-go] Re: UCT RefBot

2008-11-20 Thread Mark Boon

Claus,

I think you are raising some very valid questions. I'm a bit  
ambivalent towards AMAF for very similar reasons. One thing in  
defense of AMAF though, is that it doesn't necessarily need to make  
Go-sense to be useful. MC simulations also don't make much Go-sense.  
For example, moves are generally rewarded highly just because they  
allow for the possibility of the other side putting itself into  
atari with a big group. So they ignore the time dimension almost just  
as much as AMAF does. Yet evidence is very strong that the random MC  
playouts yield useful information despite the obvious shortcomings.  
For now I want to keep an open mind and allow for the possibility  
that AMAF does yield useful information as well. But I do foresee its  
usefulness is much more limited. For example, the most obvious way to  
improve MC playouts is to make the playouts smarter (and heavier) to  
avoid its pitfalls, like trying to avoid putting yourself into atari  
with a big group. To have a similar solution for AMAF's shortcomings  
is not so easy to see.


So possibly the yield of AMAF becomes more and more limited with  
smarter playouts. It's definitely worth it to keep monitoring that.  
Just now I ran a test of a few thousand games of just plain UCT  
against UCT+AMAF using 2,000 simulations. The win-rate was 53.3% vs.  
46.6% (± 0.9). So it seems AMAF made things a little worse, not  
better. But this was the first test-run after I finished implementing  
it, so I'm actually surprised it came this close. I'm sure I'll have  
a few bugs to fix before I can cast a final verdict. The other thing  
different is I now only expand the tree after 10 initial simulations  
whereas originally I always expanded immediately. It's possible that  
2,000 playouts is a bit low to overcome the initial inefficiency of  
doing 10 simulations before expanding. Lastly there is the
weighting of the RAVE value. Right now I use:

win/loss + uct-value + rave-value

where the rave-value is the accumulation of the weighted win/loss
ratios, with the weight decreasing according to the depth in the
sequence at which the move was encountered, the same way it's done in
the original ref-bot. This may well be too braindead a formula, and
at some point I'll have to think a bit about what to do there.
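Combining this with the UCT formula from earlier in the thread, the per-child selection value might be computed roughly as below (a sketch with my own names; `raveValue` stands in for the depth-weighted accumulation described above, and the unvisited-child case would need separate handling):

```java
// Sketch of the node-selection value discussed in this thread:
//   wins/visits + exploration * sqrt(2 * log(parentVisits) / (10 * visits))
//   + rave-value
// The exploration factor of 1.0 is what Mark says he uses for light
// playouts; it is not a tuned constant.
public final class SelectionValue {

    // the UCT exploration term from the earlier post
    static double uctTerm(int parentVisits, int visits, double exploration) {
        return exploration * Math.sqrt(2.0 * Math.log(parentVisits) / (10.0 * visits));
    }

    // win/loss + uct-value + rave-value
    static double value(int wins, int visits, int parentVisits,
                        double exploration, double raveValue) {
        return (double) wins / visits
             + uctTerm(parentVisits, visits, exploration)
             + raveValue;
    }

    public static void main(String[] args) {
        // 6 wins in 10 visits, parent visited 100 times, no RAVE bonus
        System.out.println(value(6, 10, 100, 1.0, 0.0));
    }
}
```

The child maximizing this value is the one descended into during selection; with `visits == 0` the term divides by zero, which is why many implementations visit every child once (or use a first-play-urgency constant) before applying the formula.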


What I'm going to do next is clean up my new implementation such that  
I can easily adjust two knobs. One controls the number of simulations  
before expansion and the other controls the weight of the rave-value  
in the formula above. In that case I should get exactly the same  
result as my original plain UCT implementation when I set both to  
zero. Only after that checks out will I start experimenting with  
other settings.
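The two knobs described above lend themselves to a simple regression check: with both set to zero, the search should reduce exactly to the plain-UCT implementation. A minimal sketch of such a parameter holder (hypothetical names, not Mark's actual code):

```java
// Hypothetical knob holder for the experiment described above.
// With both knobs at zero the search should behave identically to
// the plain-UCT implementation, which gives a sanity check before
// experimenting with other settings.
public final class SearchKnobs {
    final int simulationsBeforeExpansion; // 0 = expand immediately
    final double raveWeight;              // 0 = ignore RAVE entirely

    SearchKnobs(int simulationsBeforeExpansion, double raveWeight) {
        this.simulationsBeforeExpansion = simulationsBeforeExpansion;
        this.raveWeight = raveWeight;
    }

    boolean reducesToPlainUct() {
        return simulationsBeforeExpansion == 0 && raveWeight == 0.0;
    }

    public static void main(String[] args) {
        System.out.println(new SearchKnobs(0, 0.0).reducesToPlainUct());
        System.out.println(new SearchKnobs(10, 1.0).reducesToPlainUct());
    }
}
```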


Mark

P.S. I got Don's javabot implementation here:
http://cgos.boardspace.net/public/javabot.zip
Or you can get mine here:
https://plug-and-go.dev.java.net/servlets/ProjectProcess?tab=4&stage=1


On 20-nov-08, at 23:34, Claus Reinke wrote:

My first monte carlo programs used ownership info very effectively,
but it can be that by using AMAF this information is used already.


As a relative beginner in these matters, the more I look at AMAF,
the less I like it, and I think that has to do with AMAF ignoring
relative sequencing. By looking at all moves as if they were played
first, it ignores that some moves only make sense after certain other
moves. I haven't yet been able to pin this down with evidence,
because the "the game is lost, all moves are equally bad" effect
tends to obscure it (I guess I need to insert resign moves at the
right places to limit the game records to the interesting pieces, even
if that means my bot can't score the result).

If I ignore that the Go board is 2d (dimensions, not dans ;-), and
think only of time (number of moves), space (board positions), and
alternatives (simulation runs), then ownership maps project this 3d
spacetime multiverse onto a 2d space histogram (likelihood of each
position being black/white), while AMAF tries to project these three
dimensions onto a 2d time histogram (likelihood of each move
resulting in win/lose).

As the various tweaks (don't count moves by other player, don't count
moves already played, value early moves higher than later moves) and
remaining bot blunders demonstrate, that doesn't quite work right, in
spite of first appearances.
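The tweaks listed above can be made concrete in a small sketch of an AMAF statistics update (my own naming and decay constant, not taken from any particular refbot): count only moves by the player to move at the node, skip points already counted earlier in the playout, and weight early moves more than later ones.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of an all-moves-as-first update with the usual tweaks.
public final class AmafUpdate {
    // amafWins/amafVisits are indexed by board point
    static void update(double[] amafWins, double[] amafVisits,
                       int[] moves, int[] colors, int nodeColor, boolean win) {
        Set<Integer> seen = new HashSet<>();
        double weight = 1.0;
        for (int i = 0; i < moves.length; i++) {
            // only the node's player's moves, first occurrence only
            if (colors[i] == nodeColor && seen.add(moves[i])) {
                amafVisits[moves[i]] += weight;
                if (win) amafWins[moves[i]] += weight;
            }
            weight *= 0.98; // decay: earlier moves count more (constant is a guess)
        }
    }

    public static void main(String[] args) {
        double[] wins = new double[9], visits = new double[9];
        // playout: black plays 3, white plays 4, black plays 3 again;
        // updating black's (color 1) node after a black win
        update(wins, visits, new int[]{3, 4, 3}, new int[]{1, 2, 1}, 1, true);
        System.out.println(visits[3] + " " + visits[4]);
    }
}
```

Note that even with these tweaks the update still discards the ordering constraints Claus describes: point 3 is credited as if it had been playable first.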

My (as yet unproven) hypothesis is that the issues stem from AMAF
trying to ignore the time dimension of its projection. Hence you get
highly rated moves that place themselves into atari (because that
would have been the last, necessary, move to capture the opponent
group, *after* surrounding it), that would be suicide right now (but
not at some later point in many playouts), or even on an occupied
position (no longer occupied later in many playouts), that try to
fill an opponent's eye (which would make sense only if the second,
nearly complete eye were prevented

[computer-go] Selling a computer go program

2008-11-20 Thread Michael Gherrity

Hi,

I have read that the amount of money that a winning computer go  
program would make in a go tournament is insignificant compared to the  
amount of money that such a program would earn selling to the general  
public. I have also read that the biggest pirates of computer software  
come from Germany, the UK, and the US. The foreign exchange student we  
are hosting from Beijing, China, said that most people in China do not
buy software, but download it for free off the net.


So what is true?

mike