DeepMind has published a number of papers on how to stabilize RL strategies
in a landscape of nontransitive cycles. See
https://papers.nips.cc/paper/2018/file/cdf1035c34ec380218a8cc9a43d438f9-Paper.pdf
I haven't fully digested the paper, but what I'm getting from it is that if
you want your
Agreed on the book interface!
If you click through to the end of a variation, you can see that the
evaluation shown is a minimaxed value rather than an average over the
subtree's nodes, even though the overall tree was generated via a UCT
algorithm.
The values seem a bit more fragile to me as a
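To illustrate the distinction, here's a toy sketch of the two backups over the same tree (Node is a stand-in with .value, .visits, and .children; this is not the book's actual code, and interior nodes' own evaluations are folded into their children for simplicity):

    def subtree_average(node):
        # What plain UCT accumulates during search: a visit-weighted mean
        # of the evaluations in the subtree.
        if not node.children:
            return node.value
        total = sum(c.visits for c in node.children)
        return sum(c.visits * subtree_average(c) for c in node.children) / total

    def minimax_value(node, maximizing=True):
        # What the book interface reports at the end of a variation: the
        # value backed up along best replies.
        if not node.children:
            return node.value
        vals = [minimax_value(c, not maximizing) for c in node.children]
        return max(vals) if maximizing else min(vals)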
this doesn't actually speed up the neural networks that much; it's a
technique for more quickly brute-forcing the search space of possible neural
networks to find ones that execute faster while maintaining similar accuracy.
Typical hype article.
Anyway, the effort spent looking for bizarre architectures
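As a rough sketch of what that kind of search loop looks like (train_and_score and measure_latency are hypothetical stand-ins for the expensive parts, and the thresholds are made up):

    import random

    def search(configs, baseline_acc, baseline_ms, trials=100):
        # Randomly sample candidate architectures, keep the fastest one that
        # stays within 1% accuracy of the baseline network.
        best = None
        for config in random.sample(configs, min(trials, len(configs))):
            acc = train_and_score(config)   # hypothetical: train briefly, measure accuracy
            ms = measure_latency(config)    # hypothetical: time one forward pass
            if acc >= baseline_acc - 0.01 and ms < baseline_ms:
                if best is None or ms < best[1]:
                    best = (config, ms)
        return best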
In particular, they had no way to train a value net, so it was back to the
AlphaGo v1 style of training just a policy net and reusing it as the
rollout policy.
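For concreteness, reusing the policy net as the rollout policy amounts to something like this sketch (position and policy_net are stand-ins, not anyone's actual API):

    import numpy as np

    def rollout(position, policy_net):
        # Play the game out by sampling every move from the policy net's
        # output distribution, then use the result as the rollout value.
        while not position.is_game_over():
            probs = policy_net(position.features())  # probabilities over moves
            move = np.random.choice(len(probs), p=probs)
            position = position.play_move(move)
        return position.result()  # e.g. +1 / -1 for the side to move at the root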
On Fri, Apr 6, 2018 at 6:31 AM Fidel Santiago wrote:
> Hello,
>
> Apparently the lessons of AlphaGo (and many
Hi everyone,
I remember that when I was first starting out and put a fledgling MuGo on
CGOS, I was discouraged by the vast gulf between GnuGo and the high-dan
bots. Now that CGOS is flooded with all sorts of NN-based bots that have
very similar playstyles and weaknesses, the Minigo team has been
Hi everyone,
The artifacts of our latest 9x9 run are publicly available for download at
gs://minigo-pub/v3-9x9
Under this directory you can find
data/    TensorFlow TFRecord files containing the raw training data
models/  The saved model files for each generation
sgf/     Zip archives of all games in
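As a minimal sketch of poking at the raw data (this assumes TensorFlow 2.x with GCS support; the glob pattern and record layout are my assumptions, so check the bucket first):

    import tensorflow as tf

    # List the training chunks; the exact sharding under data/ may differ.
    files = tf.io.gfile.glob("gs://minigo-pub/v3-9x9/data/*")
    dataset = tf.data.TFRecordDataset(files)
    for raw in dataset.take(1):
        # The record schema isn't spelled out here, so just confirm we can read.
        print(len(raw.numpy()), "bytes in the first record")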
Hi everybody,
I'm happy to announce MiniGo is now open source.
https://github.com/tensorflow/minigo
We're not aiming for a top-level Go AI; we're merely aiming for a correct,
very readable implementation of the AlphaGo Zero algorithm and a
demonstration of Google Cloud / Kubernetes / TensorFlow.
It should default to the Q of the parent node. Otherwise, suppose the root
node is a losing position. Upon choosing a follow-up move, its Q will be
updated to a very negative value, and that node won't get explored again -
at least until all 362 top-level children have been explored and
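A minimal sketch of PUCT child selection with that parent-Q fallback (the node fields N, W, Q, and prior are illustrative, not Minigo's exact code):

    import math

    def select_child(node, c_puct=1.5):
        total_visits = sum(child.N for child in node.children)

        def score(child):
            # First-play urgency: an unvisited child inherits the parent's Q
            # rather than zero, so a bad root doesn't make every unexplored
            # move look better than the one move already searched.
            q = child.W / child.N if child.N > 0 else node.Q
            u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.N)
            return q + u

        return max(node.children, key=score)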
I've been wondering about something I've seen in a few papers (the AlphaGo
paper, Cazenave's ResNet policy architecture), which is the presence of an
input plane filled with 0s.
The input features also typically include a plane of 1s, which makes sense
to me - zero-padding before a convolution
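A quick numpy illustration of why the ones-plane matters: after the zero-padding that a 'SAME' convolution applies, the ones-plane is the only thing distinguishing on-board cells from off-board ones.

    import numpy as np

    ones_plane = np.ones((5, 5))      # toy 5x5 board, the all-ones input plane
    padded = np.pad(ones_plane, 1)    # zero-padding, as a 'SAME' conv applies
    print(padded)
    # Off-board cells are 0 while on-board cells are 1, so a 3x3 filter can
    # tell an edge intersection from an interior one by reading this plane.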
Hello everyone,
Is anybody else planning to go to this event?
https://www.reddit.com/r/baduk/comments/6ge1ul/the_1st_world_ai_go_open/
I'm a first-timer at in-person computer Go tournaments and am unsure how I
should handle logistics. I don't want to pack up my desktop in a
suitcase, so I've
As I understand it, RL boosts the performance of the policy network not
because the winner's moves are particularly better than the loser's moves,
but because it specifically shores up the weaknesses of the SL-trained
network. In other words, a network trained purely with SL will have certain
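In the spirit of AlphaGo's RL stage, the update itself is plain REINFORCE with the game result as the reward; a sketch (names and batching are illustrative):

    import tensorflow as tf

    def reinforce_loss(logits, chosen_moves, outcomes):
        # outcomes is +1 for moves played by the eventual winner, -1 for the
        # loser. Cross-entropy is -log p(move), so minimizing outcome * ce
        # raises the winner's move probabilities and lowers the loser's.
        ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=chosen_moves, logits=logits)
        return tf.reduce_mean(outcomes * ce)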
I've been working on my own AlphaGo replication (code on github
https://github.com/brilee/MuGo), and I've found it reasonably easy to hit a
45% prediction rate with basic features (stone locations, liberty counts,
and turns since last move), and a relatively small network (6 intermediate
layers, 32
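For reference, a network of roughly that shape in Keras (I'm assuming the "32" means 32 filters per layer, and the input plane count is a placeholder):

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_policy_net(board_size=19, input_planes=8):
        x = inputs = tf.keras.Input((board_size, board_size, input_planes))
        for _ in range(6):  # the 6 intermediate layers mentioned above
            x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(1, 1)(x)     # collapse to a single plane of move logits
        logits = layers.Flatten()(x)   # one logit per board point
        return tf.keras.Model(inputs, logits)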