You are welcome. Figure 1 in [2] is the diagram I was thinking of.

On 03-Nov-15 20:39, Tobias Pfeiffer wrote:
This helps very much, thank you for taking the time to answer!

You might be looking for "Combining Online and Offline Knowledge in UCT" [1] by Gelly and Silver. Silver and Tesauro reference it in "Monte-Carlo Simulation Balancing" [2] with "Unfortunately, a stronger simulation policy can actually lead to a weaker Monte-Carlo search (Gelly & Silver, 2007), a paradox that we explore further in this paper."

I'll make it a priority to read both papers in detail, thank you! If you meant another paper, or someone else knows one, I'm happy to see more references.

Thanks!
Tobi


[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
[2] http://www.machinelearning.org/archive/icml2009/papers/500.pdf


On 03.11.2015 21:03, robertfinkng...@o2.co.uk wrote:
You have to be careful what heuristics you apply. This was a surprising result: using a playout policy which is in itself a stronger Go player can actually make MCTS/AMAF weaker. The reason is that MCTS depends entirely on accurate estimates of the value of each position in the tree. Any playout policy which introduces a bias therefore weakens MCTS. It may increase precision (lower standard deviation) but give a less accurate assessment of the value (an incorrect mean). Most playouts at the moment (at least published ones) are based on Rémi's MoGo playout policy, which increases precision without sacrificing accuracy.
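A toy illustration of the precision/accuracy split (in Python; all the numbers are invented for the demo, nothing to do with a real engine):

    import random, statistics

    TRUE_VALUE = 0.5  # "true" win rate of the position (made up)
    BIAS = 0.15       # systematic error a biased policy might add

    def light_playout():
        # unbiased but noisy: single results scatter widely,
        # yet the long-run mean sits on the true value
        return TRUE_VALUE + random.gauss(0.0, 0.30)

    def heavy_playout():
        # precise (small spread) but biased: consistently wrong
        return TRUE_VALUE + BIAS + random.gauss(0.0, 0.05)

    for name, policy in (("light", light_playout), ("heavy", heavy_playout)):
        xs = [policy() for _ in range(10000)]
        print(name, "mean=%.3f stdev=%.3f"
              % (statistics.mean(xs), statistics.stdev(xs)))
    # light: mean ~0.50 (accurate), stdev ~0.30 (imprecise)
    # heavy: mean ~0.65 (wrong),    stdev ~0.05 (precise)

Averaging more playouts shrinks the noise in the first case; no amount of averaging repairs the shifted mean in the second.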

There's a really nice diagram in one of David Silver's papers illustrating the effect that bias can have on playouts. As soon as you see it you understand the problem. Unfortunately I don't have it to hand and have run out of time looking for it, otherwise I'd reference it. Hopefully somebody else can give the reference. I suspect David probably co-authored the paper, in which case apologies to the other author for not crediting them here!

I hope this helps

Regards

Raffles

On 03-Nov-15 19:38, Tobias Pfeiffer wrote:
Hi everyone,

I haven't yet caught up on the most recent Go papers. If what I ask is
answered in one of them, please point me there.

It seems everyone is using quite heavy playouts these days (nxn
patterns, atari escapes, opening libraries, lots of stuff that I don't
know yet, ...) - my question is: how does that mix with AMAF/RAVE? I
remember from the early papers that they said it'd be dangerous to do
it with non-random playouts and that they shouldn't have too much logic.

Which, well, makes sense (to me), because the argument is that we play
random moves, so their values are order-independent. With patterns that
doesn't hold true anymore; I've tried to spell out the assumption in
the sketch below.
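To spell out my own understanding, here's a rough Python sketch of the AMAF bookkeeping plus one blending schedule I've seen around (the names and the constant k are my guesses, not anyone's published code):

    from collections import defaultdict

    # per (node, move) statistics: plain Monte-Carlo and AMAF counts
    mc_wins, mc_visits = defaultdict(float), defaultdict(int)
    amaf_wins, amaf_visits = defaultdict(float), defaultdict(int)

    def backup(node, moves, result):
        # moves: the moves our side played in the simulation below
        # node, in order (assumed non-empty); result: 1 win, 0 loss
        mc_wins[(node, moves[0])] += result
        mc_visits[(node, moves[0])] += 1
        # AMAF credits *every* move of the playout as if it had been
        # played first -- only sound if a move's value doesn't depend
        # on when it is played, which uniform random playouts roughly
        # guarantee and pattern-driven playouts don't.
        for m in set(moves):
            amaf_wins[(node, m)] += result
            amaf_visits[(node, m)] += 1

    def rave_value(node, move, k=1000.0):
        # blend the two estimates; beta -> 0 as real visits pile up,
        # so the (possibly biased) AMAF value only steers early on
        n, na = mc_visits[(node, move)], amaf_visits[(node, move)]
        q = mc_wins[(node, move)] / n if n else 0.0
        qa = amaf_wins[(node, move)] / na if na else 0.0
        beta = (k / (3.0 * n + k)) ** 0.5
        return (1 - beta) * q + beta * qa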

What's the experience out there? Does it just still work? Does it not
matter because you just "warm up" the tree? Or do you need to be careful
with what heuristics you apply so as not to break RAVE/AMAF?

Thank you!
Tobi

-- 
www.pragtob.info

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
