Re: [Computer-go] Adding roll-out to Zero

2017-11-16 Thread Hideki Kato
Hi,

Thank you for the info and an interesting idea.  I wonder,
however, whether a DCNN can replace handcrafted rollouts...

In the case of Zen, there are many lines of code for special
cases such as snapbacks, approach moves, and nakade, which
play the "correct" reply move with 100% probability.  ReLU
networks are not so good at approximating such binary
functions and, as a result, "dead" stones, for example, may
live in a few tens of percent of rollouts on average.
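
A rough way to see the size of this effect, assuming that killing a
group needs the correct reply several times in a row and that a learned
policy finds each reply independently with probability p < 1. The
function name, the independence assumption, and the values of p and k
are illustrative only, not measurements from Zen or from any network.

```python
def survival_probability(p_correct: float, replies_needed: int) -> float:
    """Chance that a 'dead' group survives one rollout.

    Assumption (illustrative): each of the replies_needed correct replies
    is found independently with probability p_correct, and a single miss
    lets the group live.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be in [0, 1]")
    if replies_needed < 0:
        raise ValueError("replies_needed must be non-negative")
    return 1.0 - p_correct ** replies_needed


if __name__ == "__main__":
    # A handcrafted rule (p = 1.0) versus softer learned probabilities.
    for p in (1.0, 0.99, 0.95, 0.90):
        for k in (1, 3, 5):
            print(f"p={p:.2f} k={k}: survives {survival_probability(p, k):.1%}")
```

Under these assumptions a rule that always fires gives zero survival,
while p = 0.95 over five required replies already lets the group live
in about 23% of rollouts.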

Hideki

patrick.bardou via Computer-go:
>Hi Hideki,
>I think they could have used a rollout policy network (RPN), as described 
>in "Convolutional Monte Carlo Rollouts in Go" 
>(https://arxiv.org/abs/1512.03375),
>and trained it on the MCTS outcome, at the same time and in the same way 
>as the policy head is trained. This RPN would start out playing random 
>rollouts, then benefit from the policy head training.
>This would leave the mixing factor between rollout and value net 
>evaluations as "human knowledge". But there is such a mixing factor in 
>the Zero training pipeline anyway, in the loss function that combines 
>the policy and value heads.
>Regards,
>Patrick
>
>
>
>Message: 1
>Date: Fri, 17 Nov 2017 02:32:29 +0900
>From: Hideki Kato 
>To: computer-go@computer-go.org
>Subject: Re: [Computer-go] Is MCTS needed?
>Message-ID: <5a0dcba9.8060%hideki_ka...@ybb.ne.jp>
>Content-Type: text/plain; charset=US-ASCII
>
>Hi,
>
>I strongly believe adding rollout makes Zero stronger.  
>They removed rollout just to say "no human knowledge".
># Though the number of past moves (16) was tuned by 
>humans :).
>
>
>___
>Computer-go mailing list
>Computer-go@computer-go.org
>http://computer-go.org/mailman/listinfo/computer-go
-- 
Hideki Kato 

[Computer-go] Adding roll-out to Zero

2017-11-16 Thread patrick.bardou via Computer-go
Hi Hideki,
I think they could have used a rollout policy network (RPN), as described in 
"Convolutional Monte Carlo Rollouts in Go" (https://arxiv.org/abs/1512.03375),
and trained it on the MCTS outcome, at the same time and in the same way as 
the policy head is trained. This RPN would start out playing random rollouts, 
then benefit from the policy head training.
This would leave the mixing factor between rollout and value net evaluations 
as "human knowledge". But there is such a mixing factor in the Zero training 
pipeline anyway, in the loss function that combines the policy and value heads.
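
The two "mixing" points can be sketched as follows. The leaf evaluation
follows the form used by the earlier rollout-based AlphaGo,
(1 - lam) * v + lam * z, and the loss follows the published Zero form,
a squared value error plus a policy cross-entropy (the L2
regularization term is omitted). The function names and the
value_weight parameter are assumptions for illustration; the published
loss weights the two terms equally, which corresponds to
value_weight = 1.

```python
import math
from typing import Sequence


def mixed_leaf_value(value_net: float, rollout_outcome: float,
                     lam: float = 0.5) -> float:
    """Blend a value-net estimate with a rollout result at a leaf.

    lam is the hand-chosen mixing factor: 0 uses only the value net,
    1 uses only the rollout outcome.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (1.0 - lam) * value_net + lam * rollout_outcome


def zero_style_loss(z: float, v: float,
                    pi: Sequence[float], p: Sequence[float],
                    value_weight: float = 1.0) -> float:
    """Squared value error plus policy cross-entropy for one position.

    z: game outcome, v: value head output,
    pi: MCTS visit distribution, p: policy head output (assumed > 0
    wherever pi > 0). value_weight is a hypothetical knob showing where
    a mixing factor between the two heads would enter.
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0.0)
    return value_weight * value_loss + policy_loss


if __name__ == "__main__":
    print(mixed_leaf_value(value_net=0.2, rollout_outcome=1.0))
    print(zero_style_loss(z=1.0, v=0.5, pi=[0.5, 0.5], p=[0.5, 0.5]))
```

An RPN trained alongside the policy head would only change where
rollout_outcome comes from; lam would still have to be chosen by hand.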
Regards,
Patrick



Message: 1
Date: Fri, 17 Nov 2017 02:32:29 +0900
From: Hideki Kato 
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Is MCTS needed?
Message-ID: <5a0dcba9.8060%hideki_ka...@ybb.ne.jp>
Content-Type: text/plain; charset=US-ASCII

Hi,

I strongly believe adding rollout makes Zero stronger.  
They removed rollout just to say "no human knowledge".
# Though the number of past moves (16) was tuned by 
humans :).

