Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Gian-Carlo Pascutto
On 24-05-17 05:33, "Ingo Althöfer" wrote: >> So, 0.001% probability. Demis commented that Lee Sedol's winning move in >> game 4 was a one in 10 000 move. This is a 1 in 100 000 move. > > In Summer 2016 I checked the games of AlphaGo vs Lee Sedol > with repeated runs of CrazyStone DL: > In 3 of 20

Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Ingo Althöfer
Hi, just my 2 cents. "Gian-Carlo Pascutto" wrote: > In the attached SGF, AlphaGo played P10, which was considered a very > surprising move by all commentators... > I can sort-of confirm this: > > 0.295057654 (E13) > ...(60 more moves follow)... > 0.11952 (P10) > > So, 0.001% probability.

Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Hideki Kato
Generalizing shoulder-hit moves to lower lines may make the network prefer the move in question. Hideki Gian-Carlo Pascutto: : >On 23-05-17 17:19, Hideki Kato wrote: >> Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>: >> >>> Now, even the original AlphaGo played moves that surprised human

Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Hideki Kato
Alvaro Begue: : >On Tue, May 23, 2017 at 4:51 AM, Hideki Kato wrote: > >> (3) CNN cannot learn exclusive-or function due to the ReLU >> activation function, instead of traditional sigmoid (tangent >> hyperbolic). CNN is good at approximating continuous (analog) >> functions but not Boolean (digital)

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 17:19, Hideki Kato wrote: > Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>: > >> Now, even the original AlphaGo played moves that surprised human pros >> and were contrary to established sequences. So where did those come >> from? Enough computation power to over

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread valkyria
> (3) CNN cannot learn exclusive-or function due to the ReLU > activation function, instead of traditional sigmoid (tangent > hyperbolic). CNN is good at approximating continuous (analog) > functions but not Boolean (digital) ones. Are you sure about that? I can imagine using two ReLU units to construct
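For reference, two ReLU units are indeed enough; a minimal sketch of one such construction (my own illustration, not code from the thread):

    def relu(z):
        return max(z, 0.0)

    def xor_relu(x, y):
        # XOR(x, y) for x, y in {0, 1} using two ReLU units and a linear output
        h1 = relu(x + y)        # takes values 0, 1, 2
        h2 = relu(x + y - 1.0)  # takes values 0, 0, 1
        return h1 - 2.0 * h2    # gives 0, 1, 1, 0 over the four inputs

    for x in (0, 1):
        for y in (0, 1):
            print(x, y, xor_relu(x, y))  # -> 0.0, 1.0, 1.0, 0.0

Whether gradient descent finds such weights in a given training run is a separate question from what the ReLU family can represent.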

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Álvaro Begué
On Tue, May 23, 2017 at 4:51 AM, Hideki Kato wrote: > (3) CNN cannot learn exclusive-or function due to the ReLU > activation function, instead of traditional sigmoid (tangent > hyperbolic). CNN is good at approximating continuous (analog) > functions but not Boolean (digital) > ones. > Oh, not this

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Gian-Carlo Pascutto: : >On 23-05-17 10:51, Hideki Kato wrote: >> (2) The number of possible positions (input of the value net) in >> real games is at least 10^30 (10^170 in theory). If the value >> net can recognize all? L&Ds depend on very small difference of >> the placement of stones or lib

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>: >Now, even the original AlphaGo played moves that surprised human pros >and were contrary to established sequences. So where did those come >from? Enough computation power to overcome the low probability? >Synthesized by infere

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Erik van der Werf: : >On Tue, May 23, 2017 at 10:51 AM, Hideki Kato >wrote: > >> Agree. >> >> (1) To solve L&D, some search is necessary in practice. So, the >> value net cannot solve some of them. >> (2) The number of possible positions (input of the value net) in >> real games is at least 10^3

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Erik van der Werf
On Tue, May 23, 2017 at 10:51 AM, Hideki Kato wrote: > Agree. > > (1) To solve L&D, some search is necessary in practice. So, the > value net cannot solve some of them. > (2) The number of possible positions (input of the value net) in > real games is at least 10^30 (10^170 in theory). If the v

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 10:51, Hideki Kato wrote: > (2) The number of possible positions (input of the value net) in > real games is at least 10^30 (10^170 in theory). If the value > net can recognize all? L&Ds depend on very small difference of > the placement of stones or liberties. Can we provide nece

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 22-05-17 21:01, Marc Landgraf wrote: > But what you should really look at here is Leela's evaluation of the game. Note that this is completely irrelevant for the discussion about tactical holes and the position I posted. You could literally plug any evaluation into it (save for a static oracle,

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Erik van der Werf
On Mon, May 22, 2017 at 4:54 PM, Gian-Carlo Pascutto wrote: > On 22-05-17 15:46, Erik van der Werf wrote: > > Anyway, LMR seems like a good idea, but last time I tried it (in Migos) > > it did not help. In Magog I had some good results with fractional depth > > reductions (like in Realization Pro
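A minimal sketch of what fractional depth reductions in the spirit of realization-probability search could look like when driven by a policy net; the helpers (evaluate, policy_net_moves, pos.play, pos.is_terminal) are assumptions for illustration, not Magog's actual code:

    import math

    def rp_search(pos, budget, alpha, beta, scale=1.0, min_cost=0.25):
        # Each move consumes depth budget proportional to -log of its prior,
        # so high-probability moves extend lines cheaply while low-probability
        # moves exhaust the budget quickly (a soft, fractional reduction
        # rather than a hard prune).
        if budget <= 0.0 or pos.is_terminal():
            return evaluate(pos)
        for move, prior in policy_net_moves(pos):  # (move, prior probability)
            cost = max(min_cost, -scale * math.log(max(prior, 1e-6)))
            score = -rp_search(pos.play(move), budget - cost, -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha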

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 03:39, David Wu wrote: > Leela playouts are definitely extremely bad compared to competitors like > Crazystone. The deep-learning version of Crazystone has no value net as > far as I know, only a policy net, which means it's going on MC playouts > alone to produce its evaluations. Nonet

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Agree. (1) To solve L&D, some search is necessary in practice. So, the value net cannot solve some of them. (2) The number of possible positions (input of the value net) in real games is at least 10^30 (10^170 in theory). If the value net can recognize all? L&Ds depend on very small differe

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread David Wu
Addendum: Some additional playing around with the same position can flip the roles of the playouts and value net - so now the value net is very wrong and the playouts are mostly right. I think this gives good insight into what the value net is doing and why as a general matter playouts are still u
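Since the playouts and the value net evidently fail on different positions, the usual remedy is to blend the two at the leaves; a minimal sketch of the AlphaGo-paper-style mix (the names and the example numbers are mine, not Leela's internals):

    def leaf_value(value_net_eval, rollout_result, lambda_mix=0.5):
        # Both inputs are winning-rate estimates from the side to move;
        # the AlphaGo paper reported lambda_mix = 0.5 working best.
        return (1.0 - lambda_mix) * value_net_eval + lambda_mix * rollout_result

    # e.g. the value net believes the game is won (0.9) but the playouts
    # keep losing the key group (0.1):
    print(leaf_value(0.9, 0.1))  # 0.5 -- neither signal dominates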

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread David Wu
Leela playouts are definitely extremely bad compared to competitors like Crazystone. The deep-learning version of Crazystone has no value net as far as I know, only a policy net, which means it's going on MC playouts alone to produce its evaluations. Nonetheless, its playouts often have noticeable

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Marc Landgraf
And talking about tactical mistakes: another game, where a trick joseki early in the game (top right) completely fools Leela. Leela plays this like it would be done in similar shapes, but then gets completely blindsided. To make things worse, it finds the one way to make the loss the biggest

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Marc Landgraf
Leela has surprisingly large tactical holes. Right now it is throwing a good number of games against me in completely won endgames by fumbling away entirely alive groups. As an example I attached one game of myself (3d), played even vs Leela10 @7d. But this really isn't a one-time occurrence. If you look

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 17:47, Erik van der Werf wrote: > On Mon, May 22, 2017 at 3:56 PM, Gian-Carlo Pascutto > wrote: > > Well, I think that's fundamental; you can't be wide and deep at the same > time, but at least you can choose an algorithm that (eventually) explores > all directio

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Erik van der Werf
On Mon, May 22, 2017 at 3:56 PM, Gian-Carlo Pascutto wrote: > On 22-05-17 11:27, Erik van der Werf wrote: > > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto > > wrote: > > > > ... This heavy pruning > > by the policy network OTOH seems to be an issue for me.

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 15:46, Erik van der Werf wrote: > Oh, haha, after reading Brian's post I guess I misunderstood :-) > > Anyway, LMR seems like a good idea, but last time I tried it (in Migos) > it did not help. In Magog I had some good results with fractional depth > reductions (like in Realization Pro

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 14:48, Brian Sheppard via Computer-go wrote: > My reaction was "well, if you are using alpha-beta, then at least use > LMR rather than hard pruning." Your reaction is "don't use > alpha-beta", and you would know better than anyone! There are two aspects to my answer: 1) Unless you've mad

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 11:27, Erik van der Werf wrote: > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto > wrote: > > ... This heavy pruning > by the policy network OTOH seems to be an issue for me. My program has > big tactical holes. > > > Do you do any hard prun

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Erik van der Werf
On Mon, May 22, 2017 at 11:27 AM, Erik van der Werf < erikvanderw...@gmail.com> wrote: > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto > wrote: >> >> ... This heavy pruning >> by the policy network OTOH seems to be an issue for me. My program has >> big tactical holes. > > > Do you do any

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Brian Sheppard via Computer-go
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote: > Could use late-move reductions to eliminate the hard pruning

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Erik van der Werf
On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto wrote: > > ... This heavy pruning > by the policy network OTOH seems to be an issue for me. My program has > big tactical holes. Do you do any hard pruning? My engines (Steenvreter, Magog) always had a move predictor (a.k.a. policy net), but I
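For contrast with hard pruning, a minimal sketch of the PUCT selection rule from the AlphaGo paper, where the policy prior only biases exploration and never removes a move from consideration (the field names are mine):

    import math

    def select_child(children, c_puct=1.5):
        # children: list of dicts with keys 'prior' P(s,a), 'visits' N(s,a),
        # and 'value_sum' W(s,a); every child keeps a nonzero chance of
        # eventually being visited, unlike a hard prior cutoff.
        total_visits = sum(ch["visits"] for ch in children)
        def puct_score(ch):
            q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
            u = c_puct * ch["prior"] * math.sqrt(total_visits) / (1 + ch["visits"])
            return q + u
        return max(children, key=puct_score)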

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote: > Could use late-move reductions to eliminate the hard pruning. Given > the accuracy rate of the policy network, I would guess that even move > 2 should be reduced. > The question I always ask is: what's the real difference between MCTS w

Re: [Computer-go] mini-max with Policy and Value network

2017-05-20 Thread Brian Sheppard via Computer-go
Could use late-move reductions to eliminate the hard pruning. Given the accuracy rate of the policy network, I would guess that even move 2 should be reduced.
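A minimal sketch of late-move reductions on top of a policy-net move ordering, reducing from the second move on as suggested above; the helpers (evaluate, policy_net_ordered_moves, pos.play, pos.is_terminal) are assumptions for illustration, not any engine's actual API:

    def alphabeta(pos, depth, alpha, beta):
        if depth <= 0 or pos.is_terminal():
            return evaluate(pos)                    # value net and/or playouts
        for i, move in enumerate(policy_net_ordered_moves(pos)):  # best prior first
            child = pos.play(move)
            reduction = 1 if (i >= 1 and depth >= 3) else 0  # reduce from move 2 on
            score = -alphabeta(child, depth - 1 - reduction, -beta, -alpha)
            if reduction and score > alpha:         # re-search at full depth
                score = -alphabeta(child, depth - 1, -beta, -alpha)
            if score >= beta:
                return beta                         # fail-hard beta cutoff
            alpha = max(alpha, score)
        return alpha

Unlike a hard prior cutoff, every legal move is still examined; the low-prior ones just get a shallower look unless they prove themselves.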