Re: [Computer-go] AMAF/RAVE + heavy playouts - is it safe?
You are welcome. Figure 1 in [2] is the diagram I was thinking of.

On 03-Nov-15 20:39, Tobias Pfeiffer wrote:
> This helps very much, thank you for taking the time to answer!
>
> You might be looking for "Combining Online and Offline Knowledge in UCT" [1]
> by Gelly and Silver. Silver and Tesauro reference it in "Monte-Carlo
> Simulation Balancing" [2] with "Unfortunately, a stronger simulation policy
> can actually lead to a weaker Monte-Carlo search (Gelly & Silver, 2007), a
> paradox that we explore further in this paper." I'll make it a priority to
> read both papers in detail, thank you! If you meant another paper, or if
> someone else knows one, I'm happy to see more references.
>
> Thanks!
> Tobi
>
> [1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
> [2] http://www.machinelearning.org/archive/icml2009/papers/500.pdf

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
Re: [Computer-go] AMAF/RAVE + heavy playouts - is it safe?
To make matters more difficult, I assume this also depends on the exact node evaluation you're using. There's UCT + RAVE, then there's plain RAVE (as used by Michi). And then you can add other things in there as well, like criticality (as in Pachi, and at least at one point CrazyStone). I personally saw a definite strength increase when adding RAVE to the exploration strategy with heavy playouts, but then my bot isn't that strong yet, so it may be different for you. As always, there's no replacement for benchmarks.

Urban

On Tue, Nov 3, 2015 at 8:38 PM, Tobias Pfeiffer wrote:
> Hi everyone,
>
> I haven't yet caught up on the most recent go papers. If what I ask is
> answered in one of these, please point there.
>
> It seems everyone is using quite heavy playouts these days (nxn
> patterns, atari escapes, opening libraries, lots of stuff that I don't
> know yet, ...) - my question is: how does that mix with AMAF/RAVE? I
> remember from the early papers that they said it'd be dangerous to do
> it with non-random playouts and that they shouldn't have too much logic.
>
> Which, well, makes sense (to me), because the argument is that we play
> random moves, so they are order independent. With patterns that doesn't
> hold true anymore.
>
> What's the experience out there? Does it just still work? Does it not
> matter because you just "warm up" the tree? Or do you need to be careful
> with what heuristics you apply not to break RAVE/AMAF?
>
> Thank you!
> Tobi
>
> --
> www.pragtob.info

--
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
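For readers who haven't seen the "UCT + RAVE" mixing Urban mentions: the usual scheme (following Gelly & Silver's paper cited downthread) blends a node's own win rate with its AMAF/RAVE win rate, with the RAVE term fading out as real visits accumulate. A minimal sketch; the decay schedule and the constants `c` and `rave_equiv` here are illustrative choices, not values from any particular engine:

```python
import math

def rave_uct_value(wins, visits, amaf_wins, amaf_visits,
                   parent_visits, c=1.4, rave_equiv=1000.0):
    """Blend the node's own mean with its AMAF/RAVE mean.

    beta decays from 1 toward 0 as the node accumulates real visits,
    so RAVE dominates early and plain UCT takes over later.
    """
    mean = wins / visits if visits else 0.0
    amaf = amaf_wins / amaf_visits if amaf_visits else 0.0
    beta = rave_equiv / (rave_equiv + 3.0 * visits)  # simple decay schedule
    exploration = c * math.sqrt(math.log(parent_visits + 1) / (visits + 1))
    return (1 - beta) * mean + beta * amaf + exploration
```

"Just RAVE" (as in Michi) corresponds to keeping beta close to 1 and dropping the UCT exploration term; criticality would enter as yet another additive term.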
Re: [Computer-go] AMAF/RAVE + heavy playouts - is it safe?
The name "Monte Carlo" strongly suggests that randomness is at the core of the method, and randomness does play a role. But what really happened in the shift to MC was that bots stopped trying to evaluate intermediate positions. Instead, all game knowledge was put into selecting candidate moves. It turns out that, for bots, it's much easier to suggest promising moves than to say who is ahead in an ongoing game. The tree of possible go games is so vast that trying to explore it with pure randomness fails, even with statistical feedback. It's already a minor miracle that it works as well as it does with good move generators.
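The "knowledge goes into move selection" idea above can be sketched as a weighted playout policy: tactical moves get large weights, plain empty points a small one, and the playout samples in proportion. The feature names and weights below are made up for illustration, not taken from any published engine:

```python
import random

# Hypothetical playout-policy weights; the categories and numbers are
# purely illustrative.
WEIGHTS = {"capture": 30.0, "escape_atari": 20.0, "pattern": 10.0, "plain": 1.0}

def pick_playout_move(candidates, rng=None):
    """Pick a move in proportion to its heuristic weight, so tactical
    moves dominate but plain points are never fully excluded.

    `candidates` is a list of (move, kind) pairs.
    """
    rng = rng or random.Random()
    total = sum(WEIGHTS[kind] for _, kind in candidates)
    r = rng.uniform(0.0, total)
    for move, kind in candidates:
        r -= WEIGHTS[kind]
        if r <= 0:
            return move
    return candidates[-1][0]  # guard against floating-point leftovers
```

Note that the playout never evaluates the resulting position; all the knowledge sits in which moves get proposed and how heavily they are weighted.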
[Computer-go] AMAF/RAVE + heavy playouts - is it safe?
Hi everyone,

I haven't yet caught up on the most recent go papers. If what I ask is answered in one of these, please point there.

It seems everyone is using quite heavy playouts these days (nxn patterns, atari escapes, opening libraries, lots of stuff that I don't know yet, ...) - my question is: how does that mix with AMAF/RAVE? I remember from the early papers that they said it'd be dangerous to do it with non-random playouts and that they shouldn't have too much logic.

Which, well, makes sense (to me), because the argument is that we play random moves, so they are order independent. With patterns that doesn't hold true anymore.

What's the experience out there? Does it just still work? Does it not matter because you just "warm up" the tree? Or do you need to be careful with what heuristics you apply not to break RAVE/AMAF?

Thank you!
Tobi

--
www.pragtob.info
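For context on why order-independence matters here: AMAF credits a playout's result to every (player, point) pair that occurred anywhere in the playout, as if each move had been played first. A minimal sketch of that update step (the data structures are assumptions for illustration):

```python
from collections import defaultdict

# Per-move AMAF statistics, keyed by (player, point).
amaf_visits = defaultdict(int)
amaf_wins = defaultdict(int)

def amaf_update(playout_moves, black_won):
    """Credit the playout result to every (player, point) pair that
    occurred anywhere in the playout -- this is exactly the step that
    assumes moves are order-independent."""
    seen = set()
    for player, point in playout_moves:
        if (player, point) in seen:   # count each pair once per playout
            continue
        seen.add((player, point))
        amaf_visits[(player, point)] += 1
        if (player == "B") == black_won:
            amaf_wins[(player, point)] += 1
```

With a pattern-driven playout, a move's value can depend strongly on when it was played, which is what makes this "as if played first" credit assignment questionable.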
Re: [Computer-go] AMAF/RAVE + heavy playouts - is it safe?
Many Faces of Go doesn't use Remi's playout policy, and I don't think Zen does either. I don't think Remi's and Mogo's are similar either, since they were in some ways competing developments. The bias issue is very real, so as you add knowledge to the playouts you have to be careful to add (for example) both attack and defense moves in a situation.

David

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Tobias Pfeiffer
Sent: Tuesday, November 03, 2015 12:39 PM
To: r...@ffles.com; computer-go@computer-go.org
Subject: Re: [Computer-go] AMAF/RAVE + heavy playouts - is it safe?
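David's point about adding knowledge in attack/defense pairs can be illustrated with a toy candidate generator: whenever a tactical feature fires, the policy proposes a move for whichever side is to play, so neither colour is systematically favoured. The position representation and field names here are invented for the sketch:

```python
# Toy candidate generator following the advice above: each tactical
# trigger (a group in atari) yields knowledge for BOTH sides -- a
# capture when the group belongs to the opponent, an escape when it is
# our own. The `position` dict layout is hypothetical.

def candidate_moves(position, to_move):
    candidates = []
    for group in position["groups_in_atari"]:
        if group["owner"] != to_move:
            candidates.append(("capture", group["capture_point"]))  # attack
        else:
            candidates.append(("escape", group["escape_point"]))    # defense
    if not candidates:
        candidates.append(("plain", position["random_empty_point"]))
    return candidates
```

A policy with only the capture rule would let every atari succeed in playouts, systematically inflating the attacker's win rate: precisely the kind of bias discussed in this thread.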
Re: [Computer-go] AMAF/RAVE + heavy playouts - is it safe?
You have to be careful what heuristics you apply. This was a surprising result: using a playout policy which is in itself a stronger go player can actually make MCTS/AMAF weaker. The reason is that MCTS depends entirely on accurate estimates of the value of each position in the tree. Any playout policy which introduces a bias therefore weakens MCTS. It may increase precision (lower standard deviation) but give a less accurate assessment of the value (an incorrect mean). Most playouts at the moment (at least published ones) are based on Remi's Mogo playout policy, which increases precision without sacrificing accuracy.

There's a really nice diagram in one of David Silver's papers illustrating the effect that bias can have on playouts. As soon as you see it you understand the problem. Unfortunately I don't have it to hand and have run out of time looking for it, otherwise I'd reference it. Hopefully somebody else can give the reference. I suspect David probably co-authored the paper, in which case apologies to the other author for not crediting them here!

I hope this helps

Regards

Raffles

On 03-Nov-15 19:38, Tobias Pfeiffer wrote:
> Hi everyone,
>
> I haven't yet caught up on the most recent go papers. If what I ask is
> answered in one of these, please point there.
>
> It seems everyone is using quite heavy playouts these days (nxn
> patterns, atari escapes, opening libraries, lots of stuff that I don't
> know yet, ...) - my question is: how does that mix with AMAF/RAVE? I
> remember from the early papers that they said it'd be dangerous to do
> it with non-random playouts and that they shouldn't have too much logic.
>
> Which, well, makes sense (to me), because the argument is that we play
> random moves, so they are order independent. With patterns that doesn't
> hold true anymore.
>
> What's the experience out there? Does it just still work? Does it not
> matter because you just "warm up" the tree? Or do you need to be careful
> with what heuristics you apply not to break RAVE/AMAF?
>
> Thank you!
> Tobi
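The precision-vs-accuracy distinction above can be demonstrated with a toy estimator: model playouts as coin flips, where the "heavy" policy systematically favours one side. Its estimates have a lower standard deviation (more "precise") but converge to the wrong mean. All probabilities below are illustrative numbers, not measurements from any engine:

```python
import random
import statistics

rng = random.Random(1)
TRUE_VALUE = 0.5  # true value of the position: a fair 50% game

def light_playout():
    """Unbiased random playout: noisy, but the mean is correct."""
    return 1 if rng.random() < TRUE_VALUE else 0

def heavy_biased_playout():
    """A 'stronger' policy that systematically favours one side: each
    sample is more decisive (lower spread), but the mean is wrong.
    The 0.9 win rate is an illustrative number."""
    return 1 if rng.random() < 0.9 else 0

def estimate(playout, n_playouts=100):
    """One MCTS-style value estimate: the mean of n playout results."""
    return statistics.mean(playout() for _ in range(n_playouts))

light = [estimate(light_playout) for _ in range(2000)]
heavy = [estimate(heavy_biased_playout) for _ in range(2000)]
```

Running this, the heavy estimates cluster more tightly (smaller standard deviation) yet around 0.9 instead of 0.5, which is exactly how a stronger playout policy can mislead the search.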
Re: [Computer-go] AMAF/RAVE + heavy playouts - is it safe?
This helps very much, thank you for taking the time to answer!

You might be looking for "Combining Online and Offline Knowledge in UCT" [1] by Gelly and Silver. Silver and Tesauro reference it in "Monte-Carlo Simulation Balancing" [2] with "Unfortunately, a stronger simulation policy can actually lead to a weaker Monte-Carlo search (Gelly & Silver, 2007), a paradox that we explore further in this paper." I'll make it a priority to read both papers in detail, thank you! If you meant another paper, or if someone else knows one, I'm happy to see more references.

Thanks!
Tobi

[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
[2] http://www.machinelearning.org/archive/icml2009/papers/500.pdf

On 03.11.2015 21:03, robertfinkng...@o2.co.uk wrote:
> You have to be careful what heuristics you apply. This was a
> surprising result: using a playout policy which is in itself a
> stronger go player can actually make MCTS/AMAF weaker. The reason is
> that MCTS depends entirely on accurate estimates of the value of
> each position in the tree. Any playout policy which introduces a bias
> therefore weakens MCTS. It may increase precision (lower standard
> deviation) but give a less accurate assessment of the value (an
> incorrect mean). Most playouts at the moment (at least published ones)
> are based on Remi's Mogo playout policy, which increases precision
> without sacrificing accuracy.
>
> There's a really nice diagram in one of David Silver's papers
> illustrating the effect that bias can have on playouts. As soon as you
> see it you understand the problem. Unfortunately I don't have it to
> hand and have run out of time looking for it, otherwise I'd
> reference it. Hopefully somebody else can give the reference. I
> suspect David probably co-authored the paper, in which case apologies
> to the other author for not crediting them here!
>
> I hope this helps
>
> Regards
>
> Raffles

--
www.pragtob.info