Hi, Marc,
You may want to take a look at my papers; they are somewhat related to
these ideas (although I have not yet changed the team on the fly, as you
are suggesting):
* http://teamcore.usc.edu/people/sorianom/aaai14.pdf
* http://teamcore.usc.edu/papers/2013/ijcai13.pdf
* http://teamcore.usc.edu/people/sorianom/AAMAS2011.pdf
Concerning changing the settings dynamically, my most recent work may help
(when using voting):
* http://teamcore.usc.edu/papers/2015/aamas2015.pdf
Let me know if you have questions/comments... :)
Best,
Leandro
On Sun, Apr 26, 2015 at 7:59 AM, Marc Landgraf mahrgel...@gmail.com wrote:
Heya,
I lately tried to think about whether it would be possible to combine the
strengths of different bots, or at least different parameter sets/bias
systems for one bot, in some way. They may shine in different
situations/phases of the game, but how do you figure out which one is
currently the better one?
What I came up with is the following:
For simplicity, assume for now that our different bots use the same
playouts but different approaches during the tree phase. So maybe they use
different ways to bias nodes, different selection formulas, etc.
I'm going to focus on them using different bias systems.
Now you split up your playouts into portions:
25%: Bot1 selects the white moves, Bot2 the black ones
25%: the other way around
x% (with x between 0 and 50): Bot1 selects for both.
50%-x%: Bot2 selects for both.
You track the win rates of those first 2 quarters separately and also
calculate the win rate of Bot1 vs Bot2.
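
A minimal sketch of this bookkeeping, assuming playouts are handed out in
repeating blocks of 100 and that x is given in percentage points. The names
pick_schedule and HeadToHead are made up for illustration and do not come
from any existing engine:

```python
from typing import Tuple


def pick_schedule(i: int, x: float) -> Tuple[str, str]:
    """Return (selector for white, selector for black) for playout number i.

    Assumption: playouts are assigned in repeating blocks of 100, and x is
    Bot1's share of the adjustable half, in percentage points (0..50).
    """
    slot = i % 100
    if slot < 25:
        return ("bot1", "bot2")  # first quarter: Bot1 picks white's moves
    if slot < 50:
        return ("bot2", "bot1")  # second quarter: the other way around
    if slot < 50 + x:
        return ("bot1", "bot1")  # x%: Bot1 selects for both sides
    return ("bot2", "bot2")      # 50% - x%: Bot2 selects for both sides


class HeadToHead:
    """Win counters for the two mixed quarters, kept separately."""

    def __init__(self) -> None:
        # index 0: Bot1 picks white's moves; index 1: Bot1 picks black's moves
        self.games = [0, 0]
        self.bot1_wins = [0, 0]

    def record(self, white_sel: str, black_sel: str, winner: str) -> None:
        """winner is "white" or "black"."""
        if white_sel == black_sel:
            return  # single-bot playouts say nothing about Bot1 vs Bot2
        quarter = 0 if white_sel == "bot1" else 1
        bot1_color = "white" if quarter == 0 else "black"
        self.games[quarter] += 1
        self.bot1_wins[quarter] += int(winner == bot1_color)

    def bot1_winrate(self) -> float:
        """Combined win rate of Bot1 vs Bot2 over both mixed quarters."""
        total = sum(self.games)
        return sum(self.bot1_wins) / total if total else 0.5
```

Returning 0.5 before any mixed playout has finished is just a neutral
default chosen here; any other prior would work as well.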
Now if the bots are identical, obviously both should win 50%. But if the
bots are different, you may see different results.
E.g. when Bot1 wins 55% of its games, its move selection is probably
better than Bot2's move selection. Here you have to be careful about wrong
conclusions, because if you set up a depth-first bot vs a width-first one,
you would certainly also get win rates heavily in favor of the depth-first.
But where this could shine is with different bias systems, because it
actually tells you which bias system is doing better in the current
board situation.
Now you can use that knowledge to calculate x. E.g. if either bot wins
60%+, it gets all 50% of the remaining playouts, and the balance slides
linearly if the bots' win rate against each other is between 40% and 60%.
(Use whatever formula here; open for testing.)
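
One way the example rule could look in code, as a sketch only. The function
name bot1_share and its parameters are made up, and the 40%/60% thresholds
are the placeholder values from the example, not tuned numbers:

```python
def bot1_share(bot1_winrate: float,
               lo: float = 0.40,
               hi: float = 0.60,
               pool: float = 50.0) -> float:
    """Return x, Bot1's share (in percentage points) of the adjustable pool.

    bot1_winrate is Bot1's head-to-head win rate against Bot2 (0..1).
    At or above hi, Bot1 gets the whole pool; at or below lo, Bot2 gets it;
    in between, the share slides linearly.
    """
    if bot1_winrate >= hi:
        return pool
    if bot1_winrate <= lo:
        return 0.0
    return pool * (bot1_winrate - lo) / (hi - lo)
```

With these defaults an even 50% head-to-head result splits the pool in the
middle, so each bot selects for both sides in about 25% of all playouts.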
This should enable you to figure out on the fly which bias system is
doing a better job in the current situation while doing playouts. You are
just tracking some additional stats.
Of course, there are pros and cons to this method:
+ In general, switching the selection method in the tree should not cost
any time, and tracking those additional stats also costs close to no time.
The only additional time comes from using a second bias function or similar
(because now you have to calculate the bias twice for most nodes).
- At the same time, the amount of data actually increases, because you
have to track the stats for the different bots. This may cause memory
issues! (But when using a distributed memory solution anyway, it does not
create additional data if each memory/thread unit is assigned to one of
those portions of the playouts.)
+- In general, the cost increase depends on how different the 2 bots are.
If they are the same, there would be basically no cost.
+ Allows you to figure out which bot/bias/selection is doing better right
now
- but may lead to false conclusions, like the above-mentioned depth vs
width example
+- As long as both bots are of similar strength, you should not lose from
using this kind of system. The worst case is that you play the wrong bot's
move because of the above-mentioned false conclusions. But when they are
close in strength, that is not worse than using just one bot all the time.
Of course, if one bot dominates the other in all situations, you are
losing quality by figuring out again and again that this is the case
(because in 50% of the playouts, half of the moves were selected by the
worse bot).
So some quick ideas how it could be modified further:
- all percentages are obviously placeholders and could be adjusted (even
dynamically)
- assuming you have a low-cost bias function and one with a very high time
cost, you can actually check whether it is worth using the high-cost one,
or whether the board situation is simple enough to churn out more playouts
using the cheaper function.
- identifying certain game situations: you may know that one bias is
doing much better in corner fights or liberty races, so if that one is
dominating, you are probably in such a situation and can now adjust further
(e.g. modify playouts, add additional routines, whatever)
- using more than 2 routines, possibly in conjunction with the idea above
to identify situations.
Of course, one could now also think about how to expand this idea by using
different playouts, but I'm not sure how to judge,