Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-07 Thread Gian-Carlo Pascutto
On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362
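The effect Brian describes can be seen in a minimal one-ply Python sketch. It assumes a PUCT selection rule of the AlphaGo Zero form and a hypothetical losing position in which every playout returns -0.9 for the player choosing at the root. The names `Edge`, `select`, `run` and the `fpu` parameter (the Q assumed for an unvisited edge) are illustrative only and are not taken from MuGo or Leela Zero. With `fpu=0.0`, a visited move looks worse than any untried sibling, so the search widens. With `fpu` set to the parent's value estimate, the high-prior move keeps the visits.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float               # P(s, a) from the policy head
    visits: int = 0            # N(s, a)
    total_value: float = 0.0   # W(s, a), from the perspective of the player choosing a

    def q(self, fpu: float) -> float:
        # Q(s, a) = W / N once visited; an unvisited edge falls back to `fpu`.
        return self.total_value / self.visits if self.visits else fpu


def select(edges, c_puct: float, fpu: float) -> int:
    """Return the index maximizing Q + U, with U = c_puct * P * sqrt(sum N) / (1 + N)."""
    sqrt_total = math.sqrt(sum(e.visits for e in edges))

    def score(e: Edge) -> float:
        return e.q(fpu) + c_puct * e.prior * sqrt_total / (1 + e.visits)

    # max() keeps the first index on ties, so edge 0 wins the very first selection.
    return max(range(len(edges)), key=lambda i: score(edges[i]))


def run(fpu: float, sims: int = 50, value: float = -0.9, c_puct: float = 1.0) -> int:
    """Simulate `sims` root selections in a losing position; return how many moves were visited."""
    priors = [0.9] + [0.1 / 9] * 9          # one strong candidate, nine weak ones
    edges = [Edge(prior=p) for p in priors]
    for _ in range(sims):
        i = select(edges, c_puct, fpu)
        edges[i].visits += 1
        edges[i].total_value += value       # hypothetical: every playout is lost
    return sum(1 for e in edges if e.visits > 0)


print(run(fpu=0.0))    # 10: every move gets tried, since untried ones start at Q = 0
print(run(fpu=-0.9))   # 1: with the parent's value as default, the strong prior decides
```

For simplicity the sketch holds the parent's estimate fixed at -0.9. An engine would more likely derive it from the node's running statistics.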

2017-12-06 Thread Aja Huang
2017-12-06 13:52 GMT+00:00 Gian-Carlo Pascutto:
> On 06-12-17 11:47, Aja Huang wrote:
> > All I can say is that first-play-urgency is not a significant
> > technical detail, and that's why we didn't specify it in the paper.
>
> I will have to disagree here. Of course, it's always

2017-12-06 Thread Gian-Carlo Pascutto
On 06-12-17 11:47, Aja Huang wrote:
> All I can say is that first-play-urgency is not a significant
> technical detail, and that's why we didn't specify it in the paper.

I will have to disagree here. Of course, it's always possible I'm misunderstanding something, or I have a program bug that I'm

2017-12-06 Thread Andy
Thanks for letting us know the situation Aja. It must be hard for an engineer to not be able to discuss the details of his work! As for the first-play-urgency value, if we indulge in some reading between the lines: It's possible to interpret the paper as saying first-play-urgency is zero. After

2017-12-06 Thread Aja Huang
2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto:
> On 03-12-17 17:57, Rémi Coulom wrote:
> > They have a Q(s,a) term in their node-selection formula, but they
> > don't tell what value they give to an action that has not yet been
> > visited. Maybe Aja can tell us.
>
> FWIW I

2017-12-03 Thread Andy
I made a pull request to Leela, and put some data in there. It shows that the details of how Q is initialized are actually important: https://github.com/gcp/leela-zero/pull/238

2017-12-03 19:56 GMT-06:00 Álvaro Begué:
> You are asking about the selection of the move that

2017-12-03 Thread Andy
Álvaro, you are quoting from "Expand and evaluate (Figure 2b)". But my question is about the section before that, "Select (Figure 2a)". So the node has not been expanded and initialized yet. As Brian Lee mentioned, his MuGo uses the parent's value, which assumes without further information the value

2017-12-03 Thread Brian Lee
> Message: 2
> Date: Sun, 3 Dec 2017 10:44:00 -0500
> From: Álvaro Begué <alvaro.be...@gmail.com>
> To: computer-go <computer-go@computer-go.org>

2017-12-03 Thread Álvaro Begué
The text in the appendix has the answer, in a paragraph titled "Expand and evaluate (Fig. 2b)": "[...] The leaf node is expanded and each edge (s_t, a) is initialized to {N(s_t, a) = 0, W(s_t, a) = 0, Q(s_t, a) = 0, P(s_t, a) = p_a}; [...]"

On Sun, Dec 3, 2017 at 11:27 AM, Andy
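Read literally, the quoted initialization amounts to the following minimal Python sketch. `EdgeStats` and `expand` are hypothetical names, and `policy` is assumed to map each legal move to its prior p_a from the network. Under this reading, an edge that has never been visited carries Q = 0.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EdgeStats:
    N: int = 0        # visit count
    W: float = 0.0    # total action value
    Q: float = 0.0    # mean action value, W / N once visited
    P: float = 0.0    # prior probability from the policy head


def expand(policy: Dict[str, float]) -> Dict[str, EdgeStats]:
    """Create one edge per move, initialized to {N = 0, W = 0, Q = 0, P = p_a}."""
    return {move: EdgeStats(P=p) for move, p in policy.items()}


edges = expand({"D4": 0.6, "Q16": 0.4})   # hypothetical two-move policy
print(edges["Q16"])   # EdgeStats(N=0, W=0.0, Q=0.0, P=0.4)
```

Whether this zero is also what the selection step should use for a move that has never been tried is the point debated in this thread.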

2017-12-03 Thread Rémi Coulom
er-go@computer-go.org>
Sent: Sunday, 3 December 2017 16:44:00
Subject: Re: [Computer-go] action-value Q for unexpanded nodes

I am not sure where in the paper you think they use Q(s,a) for a node s that hasn't been expanded yet. Q(s,a) is a property of an edge of the graph. At a leaf

2017-12-03 Thread Andy
Figure 2a shows two bolded Q+U max values. The second one is going to a leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that Q value from? The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo tree search in AlphaGo Zero. a Each simulation traverses the
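For reference, the selection rule behind the bolded Q+U values in Fig. 2a, as given in the AlphaGo Zero paper (c_puct is a constant that controls exploration), is written out below. The second line is simply the first specialized to an edge that has never been visited.

```latex
a_t = \operatorname*{arg\,max}_a \bigl( Q(s_t, a) + U(s_t, a) \bigr),
\qquad
U(s, a) = c_{\mathrm{puct}}\, P(s, a)\, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}

U(s, a)\big|_{N(s, a) = 0} = c_{\mathrm{puct}}\, P(s, a)\, \sqrt{\textstyle\sum_b N(s, b)}
```

The formula fixes U for an untried move but says nothing about which Q to pair with it, which is exactly the gap being asked about here.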

2017-12-03 Thread Álvaro Begué
I am not sure where in the paper you think they use Q(s,a) for a node s that hasn't been expanded yet. Q(s,a) is a property of an edge of the graph. At a leaf they only use the `value' output of the neural network. If this doesn't match your understanding of the paper, please point to the
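Once the leaf is evaluated, the network's value output is what updates the edges along the path back to the root. The Python sketch below is minimal and rests on two assumptions. First, `v` is from the perspective of the player to move at the leaf. Second, each edge stores values from the perspective of the player who chose it, so the sign flips once per ply. `EdgeStats` and `backup` are hypothetical names, and the sign convention is an implementation choice rather than something stated in the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeStats:
    N: int = 0        # visit count
    W: float = 0.0    # total action value
    Q: float = 0.0    # mean action value
    P: float = 0.0    # prior probability


def backup(path: List[EdgeStats], v: float) -> None:
    """Add one visit and the leaf value v to every edge on the path, leaf first."""
    for edge in reversed(path):
        # Each edge was chosen by the opponent of the player to move after it,
        # so flip the sign once per ply (assumed convention).
        v = -v
        edge.N += 1
        edge.W += v
        edge.Q = edge.W / edge.N


root_edge, reply_edge = EdgeStats(P=0.5), EdgeStats(P=0.3)
backup([root_edge, reply_edge], v=0.5)   # leaf is good for the player to move there
print(root_edge.Q, reply_edge.Q)         # 0.5 -0.5
```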