Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
Thanks for the explanation. I agree that there is no actual consistency in exploration terms across historical papers. I confirmed that the PUCT formulas across the AG, AGZ, and AZ papers are all consistent. That is unlikely to be an error. So now I am wondering whether the faster decay is

Re: [Computer-go] PUCT formula

2018-03-09 Thread Martin Mueller
I talked to Chenjun just now so this is what we both remember. The PUCB formula as published in Chris Rosin’s paper actually has an additive knowledge term, and it looks nothing like the two different PUCT variants tried in AlphaGo and our paper. Chenjun tried an additive term as in Chris’

Re: [Computer-go] PUCT formula

2018-03-09 Thread Gian-Carlo Pascutto
On 09-03-18 18:03, Brian Sheppard via Computer-go wrote:
> I am guessing that Chenjun and Martin decided (or knew) that the AGZ
> paper was incorrect and modified the equation accordingly.
I doubt it's just the paper that was incorrect, given that the formula has been given without log

Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
However, the formula in the AGZ paper doesn't look like any "UCT variant". Formula from paper: Cpuct * P(s,a) * sqrt(Sum(N(s,b))) / (1 + N(s,a)) Note that there is no logarithmic term, and the division by N+1 falls outside the sqrt. For comparison, a normal UCT term looks like
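To make the contrast concrete, here is a minimal Python sketch of the two exploration terms. `puct_term` transcribes the formula quoted from the AGZ paper above. `ucb1_term` is the textbook UCB1 form, c * sqrt(ln(N_parent) / N_child), which is an assumption on my part and not necessarily the exact expression the message goes on to give. The function and parameter names are illustrative only.

```python
import math

def puct_term(c_puct, prior, parent_visits, child_visits):
    # AGZ form as quoted above:
    # Cpuct * P(s,a) * sqrt(Sum(N(s,b))) / (1 + N(s,a))
    # parent_visits stands for Sum(N(s,b)) over all moves b at state s.
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def ucb1_term(c, parent_visits, child_visits):
    # Textbook UCB1 exploration term (assumed here for comparison):
    # the log sits on the parent count and the division by the
    # child count happens inside the square root.
    return c * math.sqrt(math.log(parent_visits) / child_visits)

if __name__ == "__main__":
    # Compare how quickly each term shrinks as one child gains visits.
    for n in (1, 10, 100):
        print(n, puct_term(1.0, 0.5, 1000, n), ucb1_term(1.0, 1000, n))
```

With the parent count held fixed, the PUCT term falls off roughly as 1/N in the child's visit count, while the UCB1 term falls off as 1/sqrt(N).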

[Computer-go] Thought on LeelaZero training

2018-03-09 Thread ChtiGo via Computer-go
Hi, Generating self-play games represents most of the computational burden in the LZ project. With the current settings, games are generated with a budget of 1000 nodes/move. As a rough guide, considering a game length of 250 moves and ignoring resignation and possible tree reuse, I assume generating a game,
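The rough per-game cost implied by those figures can be worked out directly. This is only a back-of-the-envelope sketch under the message's own simplifications (no resignation, no tree reuse); the variable names are illustrative.

```python
# Figures taken from the message: 1000 nodes per move, 250 moves per game.
nodes_per_move = 1000
moves_per_game = 250

# Total search nodes expanded to generate one self-play game,
# ignoring resignation and tree reuse as the message does.
nodes_per_game = nodes_per_move * moves_per_game
print(nodes_per_game)  # 250000
```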

Re: [Computer-go] PUCT formula

2018-03-09 Thread Gian-Carlo Pascutto
On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall the
> result. What is the formula that AGZ actually uses?
The one mentioned in their paper, I assume. I investigated both that and the original from the referenced