On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362 other moves have been explored.
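What Brian describes can be sketched as PUCT selection where an edge that has never been visited falls back to its parent's running value instead of 0. This is only an illustrative sketch with hypothetical names, not MuGo's actual code, and the exploration constant is an assumption:

```python
import math
from dataclasses import dataclass, field

C_PUCT = 1.5  # exploration constant; this value is an assumption


@dataclass
class Node:
    prior: float
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)

    @property
    def value(self) -> float:
        # Mean backed-up value Q of this node.
        return self.total_value / self.visits if self.visits else 0.0


def select_child(parent: Node) -> Node:
    """PUCT selection with first-play urgency = parent's Q: an unvisited
    edge scores with the parent's value rather than 0."""
    sqrt_total = math.sqrt(max(1, sum(c.visits for c in parent.children)))

    def q_plus_u(child: Node) -> float:
        q = child.value if child.visits else parent.value  # FPU fallback
        u = C_PUCT * child.prior * sqrt_total / (1 + child.visits)
        return q + u

    return max(parent.children, key=q_plus_u)
```

In a losing position the parent's Q is already very negative, so a fresh sibling is not artificially more attractive than the move that was just refuted.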
2017-12-06 13:52 GMT+00:00 Gian-Carlo Pascutto :
> On 06-12-17 11:47, Aja Huang wrote:
> > All I can say is that first-play-urgency is not a significant
> > technical detail, and that's why we didn't specify it in the paper.
>
> I will have to disagree here. Of course, it's always possible I'm
> misunderstanding something, or I have a program bug that I'm not aware of.
Thanks for letting us know the situation, Aja. It must be hard for an
engineer not to be able to discuss the details of his work!
As for the first-play-urgency value, if we indulge in some reading between
the lines: it's possible to interpret the paper as saying
first-play-urgency is zero.
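Under that reading, a quick sketch shows what Q = 0 at unvisited edges does in a lost position: every untried move scores better than any tried one, so the search spreads one visit across all children before revisiting anything. The values and exploration constant below are illustrative assumptions, not numbers from the paper:

```python
import math

C_PUCT = 1.5  # assumed exploration constant


def pick(children, fpu):
    """Index of the child maximizing Q + U, with Q = fpu on unvisited edges."""
    sqrt_total = math.sqrt(max(1, sum(c["N"] for c in children)))

    def score(c):
        q = c["W"] / c["N"] if c["N"] else fpu
        return q + C_PUCT * c["P"] * sqrt_total / (1 + c["N"])

    return max(range(len(children)), key=lambda i: score(children[i]))


# A lost position: every move's backed-up value is about -0.8.
kids = [{"N": 0, "W": 0.0, "P": 1 / 8} for _ in range(8)]
order = []
for _ in range(8):
    i = pick(kids, fpu=0.0)
    order.append(i)
    kids[i]["N"] += 1
    kids[i]["W"] -= 0.8
# With FPU = 0, each visited move drops to Q = -0.8 while untried siblings
# still score 0, so all eight moves get one visit before any repeat.
```

Whether that breadth-first sweep is good or bad is exactly what the thread is arguing about.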
2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto :
> On 03-12-17 17:57, Rémi Coulom wrote:
> > They have a Q(s,a) term in their node-selection formula, but they
> > don't tell what value they give to an action that has not yet been
> > visited. Maybe Aja can tell us.
>
> FWIW I
I made a pull request to Leela, and put some data in there. It shows that
the details of how Q is initialized are actually important:
https://github.com/gcp/leela-zero/pull/238
2017-12-03 19:56 GMT-06:00 Álvaro Begué :
> You are asking about the selection of the move that
Álvaro, you are quoting from "Expand and evaluate (Figure 2b)". But my
question is about the section before that, "Select (Figure 2a)". So the node
has not been expanded and initialized yet.
As Brian Lee mentioned, his MuGo uses the parent's value, which assumes,
absent further information, that the child's value equals the parent's.
Date: Sun, 3 Dec 2017 10:44:00 -0500
From: Álvaro Begué <alvaro.be...@gmail.com>
To: computer-go <computer-go@computer-go.org>
The text in the appendix has the answer, in a paragraph titled "Expand and
evaluate (Fig. 2b)":
"[...] The leaf node is expanded and and each edge (s_t, a) is
initialized to {N(s_t, a) = 0, W(s_t, a) = 0, Q(s_t, a) = 0, P(s_t, a) =
p_a}; [...]"
On Sun, Dec 3, 2017 at 11:27 AM, Andy wrote:
Sent: Sunday, 3 December 2017 16:44:00
Subject: Re: [Computer-go] action-value Q for unexpanded nodes
I am not sure where in the paper you think they use Q(s,a) for a node s that
hasn't been expanded yet. Q(s,a) is a property of an edge of the graph. At a
leaf they only use the `value' output of the neural network.
Figure 2a shows two bolded Q+U max values. The second one is going to a
leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that
Q value from?
The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo
tree search in AlphaGo Zero. a Each simulation traverses the tree by
selecting the edge with maximum action value Q, plus an upper confidence
bound U [...]"
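For what it's worth, the selection rule itself is easy to write down; the open question in the thread is only what Q to use when N = 0. A sketch using the paper's U term (the `fpu` parameter and c_puct value are assumptions):

```python
import math


def q_plus_u(edges, fpu=0.0, c_puct=1.5):
    """Score every edge by Q + U with
    U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a));
    `fpu` is the Q assigned to an edge that has never been visited."""
    total_n = sum(e["N"] for e in edges.values())
    scores = {}
    for a, e in edges.items():
        q = e["W"] / e["N"] if e["N"] else fpu
        scores[a] = q + c_puct * e["P"] * math.sqrt(total_n) / (1 + e["N"])
    return scores
```

With Q = 0 at a fresh edge, a large enough prior P can make it the argmax purely on the U term, which is one way the second bolded Q+U in Figure 2a could point at a not-yet-expanded leaf.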
I am not sure where in the paper you think they use Q(s,a) for a node s
that hasn't been expanded yet. Q(s,a) is a property of an edge of the
graph. At a leaf they only use the `value' output of the neural network.
If this doesn't match your understanding of the paper, please point to the
relevant passage.