It's interesting to leave unused parameters or unnecessary
parameterizations in the paper. It telegraphs what was being tried as
opposed to simply writing something more concise and leaving the reader to
wonder why and how those decisions were made.
On Nov 7, 2017 10:54 PM, "Imran Hendley" wrote:
Great, thanks guys!
On Tue, Nov 7, 2017 at 1:51 PM, Gian-Carlo Pascutto wrote:
> On 7/11/2017 19:07, Imran Hendley wrote:
> > Am I understanding this correctly?
>
> Yes.
>
> It's possible they had in-betweens or experimented with variations at
> some point, then settled on the simplest case. You can vary the
> randomness if you define it as a softmax with varying temperature.
I can only speculate, but I see two advantages to using MSE:
* MSE accommodates games that have more than just win/loss. One of
AlphaGo Zero's goals (I'm extrapolating from the paper) was to develop
a system that is easy to apply to domains other than Go.
* It can be used with TD-lambda-like updates.
I wonder whether any of the Zero projects (projects based on the AlphaGo
Zero paper) that, if I understood correctly, have been launched have
already achieved some measurable success, even if only on the order of
hundreds of points.
If I understand the previous mails correctly, the computation power you
have is 1700
On 7/11/2017 19:08, Petr Baudis wrote:
> Hi!
>
> Does anyone know why the AlphaGo team uses MSE on [-1,1] as the
> value output loss rather than binary crossentropy on [0,1]? I'd say
> the latter is way more usual when training networks, as typically
> binary crossentropy yields better results, so that's what I'm using.
On 7/11/2017 19:07, Imran Hendley wrote:
> Am I understanding this correctly?
Yes.
It's possible they had in-betweens or experimented with variations at
some point, then settled on the simplest case. You can vary the
randomness if you define it as a softmax with varying temperature, but
that's harder
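GCP's point about varying randomness via a softmax temperature can be sketched as follows (a minimal NumPy illustration; the function name and example scores are mine, not from the thread):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn raw preference scores into a probability distribution.

    temperature -> 0 approaches deterministic argmax play;
    temperature = 1 is a plain softmax; large values approach uniform.
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()              # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))    # moderately peaked
print(softmax_with_temperature(logits, 0.05))   # nearly one-hot on index 0
print(softmax_with_temperature(logits, 100.0))  # nearly uniform
```

Sweeping the temperature between those extremes is exactly the kind of knob that would let you interpolate between random sampling and deterministic play.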
If I understand your question correctly, "goes to 1" can happen as quickly
or slowly as you'd like. Yes?
On Nov 7, 2017 7:26 PM, "Imran Hendley" wrote:
Hi, I might be having trouble understanding the self-play policy for
AlphaGo Zero. Can someone let me know if I'm on the right track here?
The paper states:
Your understanding matches mine. My guess is that they had a temperature
parameter in the code that would allow for things like slowly transitioning
from random sampling to deterministically picking the maximum, but they
ended up using only those particular values.
Álvaro.
On Tue, Nov 7, 2017
Hi!
Does anyone know why the AlphaGo team uses MSE on [-1,1] as the value
output loss rather than binary crossentropy on [0,1]? I'd say the
latter is way more usual when training networks, as typically binary
crossentropy yields better results, so that's what I'm using in
https://github.com/pa
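For concreteness, the two losses being compared can be sketched like this (a minimal NumPy illustration, not code from either project; the mapping p = (v + 1) / 2 between the two parameterizations is standard):

```python
import numpy as np

def mse_loss(z, v):
    """Squared error between game outcome z in [-1, 1] and value output v."""
    return (z - v) ** 2

def binary_crossentropy(y, p, eps=1e-12):
    """Cross-entropy between outcome y in {0, 1} and win probability p."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# The two parameterizations are related by p = (v + 1) / 2.
v = 0.6   # value head output: "probably a win"
z = 1.0   # the game was actually won
print(mse_loss(z, v))                                 # about 0.16
print(binary_crossentropy((z + 1) / 2, (v + 1) / 2))  # -ln(0.8), about 0.223

# One point in MSE's favor: draws fit naturally as z = 0.
print(mse_loss(0.0, v))                               # about 0.36
```

Note that MSE on [-1,1] is bounded, whereas crossentropy grows without bound as the prediction approaches the wrong extreme.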
Hi, I might be having trouble understanding the self-play policy for
AlphaGo Zero. Can someone let me know if I'm on the right track here?
The paper states:
In each position s, an MCTS search is executed, guided by the neural
network f_θ. The MCTS search outputs probabilities π of playing each
move.
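The self-play policy being discussed can be sketched from the paper's definition, π(a) ∝ N(a)^(1/τ), where N(a) is the root visit count of move a and τ is the temperature (per the paper, τ = 1 for the first 30 moves of self-play, then τ → 0). The function name and visit counts below are hypothetical:

```python
import numpy as np

def mcts_policy(visit_counts, tau):
    """pi(a) proportional to N(a)^(1/tau), the scheme in the AlphaGo Zero paper."""
    counts = np.asarray(visit_counts, dtype=float)
    if tau == 0:                        # limit tau -> 0: deterministic play
        pi = np.zeros_like(counts)
        pi[counts.argmax()] = 1.0
        return pi
    powered = counts ** (1.0 / tau)
    return powered / powered.sum()

visits = [800, 150, 50]                     # hypothetical root visit counts
pi_explore = mcts_policy(visits, tau=1.0)   # proportional to visit counts
pi_greedy = mcts_policy(visits, tau=0)      # one-hot on the most-visited move

rng = np.random.default_rng(0)
move = rng.choice(len(visits), p=pi_explore)   # sample a move for self-play
```

This matches the earlier speculation in the thread: τ is exactly the parameter that would let one interpolate between sampling and argmax, even though only the two extreme settings appear in the paper.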
Hi, I also have some Zero stuff brewing, for almost two days now,
although I depend on heavy-playout MC evaluation for self-play.
I am using my Odin MC engine as a base, as it is. It can use a small
AG-style policy network running on the CPU, implemented with Eigen
(C++). It does not have any value network