On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362 other moves have been explored.
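What Brian describes can be sketched as PUCT selection where an edge that has never been visited falls back to its parent's running value instead of 0. This is only an illustrative sketch with hypothetical names, not MuGo's actual code, and the exploration constant is an assumption:

```python
import math
from dataclasses import dataclass, field

C_PUCT = 1.5  # exploration constant; this value is an assumption


@dataclass
class Node:
    prior: float
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)

    @property
    def value(self) -> float:
        # Mean backed-up value Q of this node.
        return self.total_value / self.visits if self.visits else 0.0


def select_child(parent: Node) -> Node:
    """PUCT selection with first-play urgency = parent's Q: an unvisited
    edge scores with the parent's value rather than 0."""
    sqrt_total = math.sqrt(max(1, sum(c.visits for c in parent.children)))

    def q_plus_u(child: Node) -> float:
        q = child.value if child.visits else parent.value  # FPU fallback
        u = C_PUCT * child.prior * sqrt_total / (1 + child.visits)
        return q + u

    return max(parent.children, key=q_plus_u)
```

In a losing position the parent's Q is already very negative, so a fresh sibling is not artificially more attractive than the move that was just refuted.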
2017-12-06 13:52 GMT+00:00 Gian-Carlo Pascutto :
> On 06-12-17 11:47, Aja Huang wrote:
> > All I can say is that first-play-urgency is not a significant
> > technical detail, and that's why we didn't specify it in the paper.
>
> I will have to disagree here. Of course, it's always possible I'm
> misunderstanding something, or I have a program bug that I'm not aware of.
Thanks for letting us know the situation, Aja. It must be hard for an
engineer not to be able to discuss the details of his work!
As for the first-play-urgency value, if we indulge in some reading between
the lines: it's possible to interpret the paper as saying
first-play-urgency is zero.
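Under that reading, a quick sketch shows what Q = 0 at unvisited edges does in a lost position: every untried move scores better than any tried one, so the search spreads one visit across all children before revisiting anything. The values and exploration constant below are illustrative assumptions, not numbers from the paper:

```python
import math

C_PUCT = 1.5  # assumed exploration constant


def pick(children, fpu):
    """Index of the child maximizing Q + U, with Q = fpu on unvisited edges."""
    sqrt_total = math.sqrt(max(1, sum(c["N"] for c in children)))

    def score(c):
        q = c["W"] / c["N"] if c["N"] else fpu
        return q + C_PUCT * c["P"] * sqrt_total / (1 + c["N"])

    return max(range(len(children)), key=lambda i: score(children[i]))


# A lost position: every move's backed-up value is about -0.8.
kids = [{"N": 0, "W": 0.0, "P": 1 / 8} for _ in range(8)]
order = []
for _ in range(8):
    i = pick(kids, fpu=0.0)
    order.append(i)
    kids[i]["N"] += 1
    kids[i]["W"] -= 0.8
# With FPU = 0, each visited move drops to Q = -0.8 while untried siblings
# still score 0, so all eight moves get one visit before any repeat.
```

Whether that breadth-first sweep is good or bad is exactly what the thread is arguing about.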
2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto :
> On 03-12-17 17:57, Rémi Coulom wrote:
> > They have a Q(s,a) term in their node-selection formula, but they
> > don't tell what value they give to an action that has not yet been
> > visited. Maybe Aja can tell us.
>
> FWIW I
I made a pull request to Leela, and put some data in there. It shows that
the details of how Q is initialized are actually important:
https://github.com/gcp/leela-zero/pull/238
2017-12-03 19:56 GMT-06:00 Álvaro Begué :
> You are asking about the selection of the move that
Álvaro, you are quoting from "Expand and evaluate (Figure 2b)". But my
question is about the section before that, "Select (Figure 2a)". So the node
has not been expanded and initialized yet.
As Brian Lee mentioned, his MuGo uses the parent's value, which assumes,
absent further information, that the child's value equals the parent's.
Date: Sun, 3 Dec 2017 10:44:00 -0500
From: Álvaro Begué <alvaro.be...@gmail.com>
To: computer-go <computer-go@computer-go.org>
The text in the appendix has the answer, in a paragraph titled "Expand and
evaluate (Fig. 2b)":
"[...] The leaf node is expanded and and each edge (s_t, a) is
initialized to {N(s_t, a) = 0, W(s_t, a) = 0, Q(s_t, a) = 0, P(s_t, a) =
p_a}; [...]"
On Sun, Dec 3, 2017 at 11:27 AM, Andy wrote:
Sent: Sunday, 3 December 2017 16:44:00
Subject: Re: [Computer-go] action-value Q for unexpanded nodes
I am not sure where in the paper you think they use Q(s,a) for a node s that
hasn't been expanded yet. Q(s,a) is a property of an edge of the graph. At a
leaf they only use the `value' output of the neural network.
Figure 2a shows two bolded Q+U max values. The second one is going to a
leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that
Q value from?
The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo
tree search in AlphaGo Zero. a Each simulation traverses the tree by
selecting the edge with maximum action value Q, plus an upper confidence
bound U [...]"
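For what it's worth, the selection rule itself is easy to write down; the open question in the thread is only what Q to use when N = 0. A sketch using the paper's U term (the `fpu` parameter and c_puct value are assumptions):

```python
import math


def q_plus_u(edges, fpu=0.0, c_puct=1.5):
    """Score every edge by Q + U with
    U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a));
    `fpu` is the Q assigned to an edge that has never been visited."""
    total_n = sum(e["N"] for e in edges.values())
    scores = {}
    for a, e in edges.items():
        q = e["W"] / e["N"] if e["N"] else fpu
        scores[a] = q + c_puct * e["P"] * math.sqrt(total_n) / (1 + e["N"])
    return scores
```

With Q = 0 at a fresh edge, a large enough prior P can make it the argmax purely on the U term, which is one way the second bolded Q+U in Figure 2a could point at a not-yet-expanded leaf.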
I am not sure where in the paper you think they use Q(s,a) for a node s
that hasn't been expanded yet. Q(s,a) is a property of an edge of the
graph. At a leaf they only use the `value' output of the neural network.
If this doesn't match your understanding of the paper, please point to the
relevant passage.