On 11/01/2017 at 16:14, Bo Peng wrote:
> Hi,
>
>> How do you get the V(s) for those datasets? You play out the endgame
>> with the Monte Carlo playouts?
>>
>> I think one problem with this approach is that errors in the data for
>> V(s) directly correlate to errors in MC playouts. So a large benefit of
>> "mixing" the two (otherwise independent) evaluations […]
> https://games.slashdot.org/story/17/01/04/2022236/googles-alphago-ai-secretively-won-more-than-50-straight-games-against-worlds-top-go-players
The five Lee Sedol games last year never felt like they were probing
AlphaGo's potential weaknesses, e.g. things like whole-board semeai,
complex whole […]

Hi,
> How do you get the V(s) for those datasets? You play out the endgame
> with the Monte Carlo playouts?
>
> I think one problem with this approach is that errors in the data for
> V(s) directly correlate to errors in MC playouts. So a large benefit of
> "mixing" the two (otherwise independent) evaluations […]
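
For reference, the AlphaGo paper realises this mixing at the search leaves. A
minimal sketch, with v_net and rollout as placeholder callables:

def mixed_leaf_value(v_net, rollout, state, lam=0.5):
    # AlphaGo-style leaf evaluation: blend the value network's estimate
    # with the outcome of one fast Monte Carlo playout from the same
    # position. lam = 0.5 is the mixing constant reported in the paper.
    v = v_net(state)    # value-network estimate, in [-1, 1]
    z = rollout(state)  # playout outcome, +1 win / -1 loss
    return (1.0 - lam) * v + lam * z

If the labels used to train the value network come from the same playout
policy, the two error terms correlate and the variance reduction from mixing
shrinks, which is the concern raised above.
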
Hi,
> How do you get the V(s) for those datasets? You play out the endgame
> with the Monte Carlo playouts?

Yes, I use the result of 100 playouts from the endgame.
Sometimes the result stored in the SGF differs from the result of the playouts.
zakki
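
A minimal sketch of the labelling step zakki describes, assuming a placeholder
playout() that plays the game out from a given position and returns the
outcome as +1/-1:

def playout_label(state, playout, n=100):
    # Estimate V(s) for a dataset position by averaging the outcomes of
    # n playouts from it, rather than trusting the result in the SGF.
    return sum(playout(state) for _ in range(n)) / n  # in [-1, 1]
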
Hi zakki,
> I couldn't get positive experiment results on Ray.
> Rn's network structures for V and W are similar and share parameters;
> only the final convolutional layers differ.
> I trained Rn's network to minimize the MSE of V(s) + W(s).
> It uses only KGS and GoGoD data sets, no self play with RL policy. […]

It's nice to see so many discussions.
Another reason could be that training a good-quality v(s) (or V(s)) may
require a network structure different from that of W(s).
It is usually helpful to have an ensemble of different networks, each
constructed from different principles.
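
A minimal sketch of that suggestion, with the differently constructed
networks as placeholder callables:

def ensemble_value(state, nets):
    # Average the value estimates of an ensemble of networks built on
    # different principles; errors that are independent across members
    # tend to cancel in the mean.
    return sum(net(state) for net in nets) / len(nets)
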
On 10-01-17 23:25, Bo Peng wrote:
> Hi everyone. It occurs to me there might be a more efficient method to
> train the value network directly (without using the policy network).
>
> You are welcome to check my
> method: http://withablink.com/GoValueFunction.pdf
>
For Method 1 you state:
"However […]"

On 11-01-17 14:33, Kensuke Matsuzaki wrote:
> Hi,
>
> I couldn't get positive experiment results on Ray.
>
> Rn's network structures for V and W are similar and share parameters;
> only the final convolutional layers differ.
> I trained Rn's network to minimize the MSE of V(s) + W(s).
> It uses only KGS and GoGoD data sets, no self play with RL policy. […]

Hi,
1) Simultaneous policy and value function reinforcement learning by
MCTS-TD-Lambda?
What is a good policy network, from a 'Policy & Value - MCTS' (PV-MCTS) point
of view (i.e. in the AlphaGo implementation)?
Referring to Silver's paper terminology and results, greedy policy using RL […]

Hi,
I couldn't get positive experiment results on Ray.
Rn's network structures for V and W are similar and share parameters;
only the final convolutional layers differ.
I trained Rn's network to minimize the MSE of V(s) + W(s).
It uses only KGS and GoGoD data sets, no self play with RL policy. […]
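
A minimal PyTorch sketch of the structure described above, reading "MSE of
V(s) + W(s)" as the sum of the two per-head MSEs. The input planes, trunk
depth, and width are placeholders, not Rn's actual configuration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedVWNet(nn.Module):
    def __init__(self, in_planes=4, width=64):
        super().__init__()
        self.trunk = nn.Sequential(              # shared by V and W
            nn.Conv2d(in_planes, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        self.v_head = nn.Conv2d(width, 1, 1)     # only these final
        self.w_head = nn.Conv2d(width, 1, 1)     # convolutions differ

    def forward(self, x):
        h = self.trunk(x)
        v = torch.tanh(self.v_head(h).mean(dim=(1, 2, 3)))
        w = torch.tanh(self.w_head(h).mean(dim=(1, 2, 3)))
        return v, w

def vw_loss(v, w, v_target, w_target):
    # Sum of the two per-head mean-squared errors.
    return F.mse_loss(v, v_target) + F.mse_loss(w, w_target)
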
Hi Remi,
Thanks for sharing your experience.
As I am writing this, it seems there could be a third method: the perfect
value function must have the minimax property in the obvious way. So we
can train our value function to satisfy the minimax property as well. In
fact, we can train it such that […]
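
A minimal sketch of such a consistency loss, assuming a negamax convention
(V always scores the position for the side to move) and placeholder
legal_moves/play helpers:

def minimax_consistency_loss(V, state, legal_moves, play):
    # A perfect value function satisfies
    #     V(s) = max over legal m of -V(play(s, m)),
    # so penalise the squared disagreement with the one-ply backup.
    backup = max(-V(play(state, m)) for m in legal_moves(state))
    return (V(state) - backup) ** 2
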
Hi,
Thanks for sharing your idea.
In my experience it is rarely efficient to train value functions from very
short-term data (i.e., the next move). TD(lambda), or training from the final
outcome of the game, is often better, because it uses a longer horizon. But of
course, it is difficult to tell without […]
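
A minimal sketch of the targets being contrasted, assuming values[t] is the
network's estimate at position t (all from one player's perspective, no
intermediate rewards) and z is the final outcome. lam=1 recovers pure
final-outcome training; smaller lam leans on shorter-horizon bootstrapped
targets:

def lambda_return_targets(values, z, lam):
    # Backward recursion for the TD(lambda) forward-view target:
    #     G[t] = (1 - lam) * values[t + 1] + lam * G[t + 1],
    # with the last position's target being the game outcome z.
    g = z
    targets = [z]
    for t in reversed(range(len(values) - 1)):
        g = (1 - lam) * values[t + 1] + lam * g
        targets.append(g)
    return targets[::-1]  # one training target per position
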
For people interested in seeing the ICLR 2017 reviews for the paper:
https://openreview.net/forum?id=Bk67W4Yxl
On Tue, Jan 10, 2017 at 6:46 AM, Detlef Schmicker wrote:
> Very interesting,
>
> but let's wait some days to get an idea of the strength;
> the 4d it reached was due to games against Ay […]

Hi John,
> You say "the perfect policy network can be
> derived from the perfect value network (the best next move is the move
> that maximises the value for the player, if the value function is
> perfect), but not vice versa.", but a perfect policy for both players
> can be used to generate a perfect value function […]
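
A minimal sketch of the forward direction being quoted, with legal_moves/play
as placeholder helpers and V assumed to score the resulting position for the
player making the move:

def greedy_policy(V, state, legal_moves, play):
    # With a perfect value function, the best next move is simply the
    # one whose resulting position has the highest value; this one-ply
    # lookup is the derived perfect policy.
    return max(legal_moves(state), key=lambda m: V(play(state, m)))
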