You are welcome. Figure 1 in [2] is the diagram I was thinking
of.
On 03-Nov-15 20:39, Tobias Pfeiffer wrote:
This helps very much, thank you for taking the time to answer!
You might be looking for "Combining Online and Offline
Knowledge in UCT" [1] by Gelly and Silver. Silver and Tesauro
reference it in "Monte-Carlo Simulation Balancing" [2] with
"Unfortunately, a stronger simulation policy can actually lead to
a weaker Monte-Carlo search (Gelly & Silver, 2007), a paradox that
we explore further in this paper."
I'll make it a priority to read both papers in detail, thank you!
If you meant another paper, or someone else knows one, I'm happy to
see more references.
Thanks!
Tobi
[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
[2] http://www.machinelearning.org/archive/icml2009/papers/500.pdf
You have to be careful what heuristics you apply. This was a
surprising result: using a playout policy which in itself is a
stronger Go player can actually make MCTS/AMAF weaker. The
reason is that MCTS depends entirely on accurate estimations
of the value of each position in the tree. Any playout policy
which introduces a bias therefore weakens MCTS. It may
increase precision (lower standard deviation) but gives a less
accurate assessment of the value (an incorrect mean). Most
playouts at the moment (at least published ones) are based on
Remi's MoGo playout policy, which increases precision without
sacrificing accuracy.
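
To make the precision-versus-accuracy point concrete, here is a minimal sketch. It is not taken from any of the cited papers or from a real engine: the true win rates, the bias values, the noise levels and the Gaussian noise model (real playouts return win/loss) are all assumptions chosen for illustration. It compares how often the truly better of two moves gets the higher estimate under a noisy but unbiased playout policy and under a precise but biased one.

```python
import random

def estimate(true_value, bias, noise, n_playouts, rng):
    """Mean of n noisy playout results centred on (true_value + bias)."""
    total = 0.0
    for _ in range(n_playouts):
        total += true_value + bias + rng.gauss(0.0, noise)
    return total / n_playouts

def pick_rate(bias_a, bias_b, noise, n_playouts=200, trials=2000, seed=0):
    """Fraction of trials in which the truly better move A is ranked above B."""
    rng = random.Random(seed)
    true_a, true_b = 0.55, 0.50  # hypothetical true values; A is the better move
    picked = 0
    for _ in range(trials):
        est_a = estimate(true_a, bias_a, noise, n_playouts, rng)
        est_b = estimate(true_b, bias_b, noise, n_playouts, rng)
        if est_a > est_b:
            picked += 1
    return picked / trials

# Unbiased policy with high variance (assumed noise 0.5, no bias).
unbiased_noisy = pick_rate(bias_a=0.0, bias_b=0.0, noise=0.5)
# Low-variance policy whose bias flips the ordering of the two means.
biased_precise = pick_rate(bias_a=-0.05, bias_b=0.05, noise=0.1)

print("unbiased, noisy  :", unbiased_noisy)
print("biased, precise  :", biased_precise)
```

Under these assumptions the noisy, unbiased policy picks the right move most of the time, while the precise, biased one almost never does: more playouts only make it more certain of the wrong answer.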
There's a really nice diagram in one of David Silver's papers
illustrating the effect that bias can have on playouts. As
soon as you see it you understand the problem. I don't have it
to hand and have run out of time looking for it, otherwise I'd
reference it. Hopefully somebody else can give the reference. I
suspect David co-authored the paper, in which case apologies to
the other author for not crediting them here!
I hope this helps
Regards
Raffles
On 03-Nov-15 19:38, Tobias Pfeiffer wrote:
Hi everyone,
I haven't yet caught up on the most recent Go papers. If what I ask is
answered in one of these, please point me there.
It seems everyone is using quite heavy playouts these days (nxn
patterns, atari escapes, opening libraries, lots of stuff that I don't
know yet, ...) - my question is: how does that mix with AMAF/RAVE? I
remember from the early papers that they said it'd be dangerous to do
it with non-random playouts and that they shouldn't have too much logic.
Which, well, makes sense (to me) because the argument is that we play
random moves so they are order independent. With patterns that doesn't
hold true anymore.
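
For readers less familiar with AMAF, the order-independence assumption can be sketched as follows. This is a simplified illustration of basic all-moves-as-first bookkeeping, not code from any particular engine; the `amaf_update` helper, the `(player, move)` playout format and the `[visits, wins]` layout are all hypothetical.

```python
from collections import defaultdict

def amaf_update(stats, playout, player, winner):
    """Credit each distinct move `player` made in the playout as if played first.

    stats maps move -> [amaf_visits, amaf_wins]; playout is a list of
    (mover, move) pairs in the order they were played.
    """
    credited = set()
    for mover, move in playout:
        if mover != player or move in credited:
            continue  # skip the opponent's moves and repeated points
        credited.add(move)
        stats[move][0] += 1
        stats[move][1] += int(winner == player)

playout = [("B", "D4"), ("W", "Q16"), ("B", "C3"), ("W", "D4"), ("B", "D4")]

stats = defaultdict(lambda: [0, 0])
amaf_update(stats, playout, player="B", winner="B")

# The same moves in reverse order yield identical statistics: the update
# ignores when in the playout a move was made.
reordered = defaultdict(lambda: [0, 0])
amaf_update(reordered, list(reversed(playout)), player="B", winner="B")

print(dict(stats))
```

Because the update discards move order, it is only as trustworthy as the assumption that a move's value does not depend much on when it was played, which is exactly what pattern-driven playouts call into question.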
What's the experience out there? Does it just still work? Does it not
matter because you just "warm up" the tree? Or do you need to be careful
about which heuristics you apply so as not to break RAVE/AMAF?
Thank you!
Tobi
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
--
www.pragtob.info