Re: [Computer-go] Accelerating Self-Play Learning in Go

2019-03-11 Thread Gian-Carlo Pascutto
On 8/03/19 16:14, David Wu wrote:
> I suspect Leela Zero would come off as far *less* favorable if one
> tried to do such a comparison using their actual existing code rather
> than abstracting down to counting neural net evals, because as far as
> I know in Leela Zero there is no cross-game batching of neural net
> evaluations, which makes a huge difference in the ability to use a
> strong GPU efficiently.

We found that the speedup from batching mostly depended on whether:

- you're using cuDNN (i.e. NVIDIA hardware)
- you're using cards with Tensor Cores
- you're using smaller boards

Not so coincidentally all of these are true for *you*, i.e. a
homogeneous farm of powerful NVIDIA Volta cards where you control the
entire software stack, and are mixing in different sized boards in the
games.

But that's not quite the platform Leela Zero was targeted at, it's
almost the exact opposite :-)

There are extensive benchmarks in the github issues of cuDNN vs TensorRT
vs generic OpenCL performance with varying batch sizes, and you'll see
that for the majority of hardware there wasn't that much to gain by
adding batching. Once Tensor Core support was added to the OpenCL code,
batching immediately made a huge difference (on Volta/RTX at least...)
and was thus merged shortly after.

> Only in the last couple months or so based on
> what I've been seeing in chat and pull requests, Leela Zero
> implemented within-search batching of neural net evals, but clients
> still only play one game at a time.

Playing multiple games at the same time has been supported since very
early on (-g switch in AutoGTP or always by running multiple clients),
it's batching networks over multiple games that wasn't (and still isn't)
implemented in the default client.

I do think it's useful to have, it's just that for Leela Zero this
wasn't - and probably still isn't - very important compared to
everything else.

Anyway, I agree and note this is all completely tangential to the
question of "extra computation overhead" for the changes, which should
be negligible.

-- 
GCP

Re: [Computer-go] A new ELF OpenGo bot and analysis of historical Go games

2019-02-19 Thread Gian-Carlo Pascutto
On 17/02/19 23:24, Hiroshi Yamashita wrote:
> Hi Ingo,
> 
>> * How strong is the new ELF bot in comparison with Leela-Zero?
> 
> from CGOS BayesElo, new ELF(ELFv2) is about +100 stronger than Leela-Zero.

We ran a test match and ELFv2 lost 34 - 62 against LZ-204 at 1600 visits
each, so that's about +100 Elo in favor of LZ at visits rather than time
parity.
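
(For reference, that figure follows from the usual logistic Elo model; a
quick sanity-check snippet, not from any particular program:)

#include <cmath>
#include <cstdio>

// Elo difference implied by a match score under the logistic model.
double elo_from_score(double wins, double losses) {
    double score = wins / (wins + losses);
    return -400.0 * std::log10(1.0 / score - 1.0);
}

int main() {
    // 62-34 for LZ-204 over ELFv2 at 1600 visits each:
    std::printf("%+.0f Elo\n", elo_from_score(62.0, 34.0));   // about +104
    return 0;
}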

This would mean going 800p -> 400p gives -200 Elo? Seems more than I
would expect.

I think I'd want to do more testing before forming an opinion :-)

> Leela Zero's playout is half. Because its net size is double.
> http://www.yss-aya.com/cgos/19x19/bayes.html

Smaller networks have more overhead (heads etc are fixed), so it's
closer to ~60%. Probably depends on the GPU too.

-- 
GCP

Re: [Computer-go] GCP passing on the staff ...

2019-01-29 Thread Gian-Carlo Pascutto
On 29/01/19 11:23, Petri Pitkanen wrote:
> Just pure curiosity: how strong is Leela now? Googling gives that
> it is better than the best humans already? Is that true?

The network is over 100 Elo stronger than the second generation of ELF,
which was about 100 Elo stronger than the first generation, which
defeated a set of Korean top professional players 14-0.

Differences in implementation speed will shift the strength difference
around a bit, but not enough to change the conclusion that it's likely a
lot better than the best humans now.

I hear rumors it's not 100% undefeatable, and that with some trial and
error you can occasionally still find a weakness to pounce on.

It is used by professionals for analysis, e.g.:
https://lifein19x19.com/viewtopic.php?f=13&t=16074

-- 
GCP

Re: [Computer-go] AI Ryusei 2018 result

2018-12-18 Thread Gian-Carlo Pascutto
On 17/12/18 01:53, Hiroshi Yamashita wrote:
> Hi,
> 
> AI Ryusei 2018 was held on 15,16th December in Nihon-kiin, Japan.
> 14 programs played preliminary swiss 7 round, and top 6 programs
>  played round-robin final. Then, Golaxy won.
> 
> Result
> https://www.igoshogi.net/ai_ryusei/01/en/result.html

It appears the 2nd place finisher after Golaxy was a hybrid of Rn and
Leela Zero, using rollouts to compensate for Leela's network being
trained with the "wrong" komi for this competition:

https://github.com/zakki/Ray/issues/171#issuecomment-447637052
https://img.igoshogi.net/ai_ryusei/01/data/11.pdf

-- 
GCP

Re: [Computer-go] Message by Facebook AI group

2018-05-05 Thread Gian-Carlo Pascutto
On 5/05/2018 7:30, "Ingo Althöfer" wrote:
> It was meant from the viewpoint of an
> outside observer/commentator.
> 
> In Germany we have a proverb:
> "Konkurrenz belebt das Geschaeft."
> Roughly translated:
> "Competition enlivens the bbusiness."

So does cooperation.

Thanks to Facebook for making (so far part of) the data public. They
have soundly beaten some of their competition in this regard.

-- 
GCP

Re: [Computer-go] Message by Facebook AI group

2018-05-04 Thread Gian-Carlo Pascutto
On 3/05/2018 5:24, "Ingo Althöfer" wrote:
> Hello,
> 
> in the German computer go forum a link to this message by the
> Facebook AI Research group was posted: 
> https://research.fb.com/facebook-open-sources-elf-opengo/

FYI, we were able to convert the Facebook network into Leela Zero
format, which should make it a lot easier to play against or test with.

https://github.com/gcp/leela-zero/releases
https://github.com/gcp/leela-zero/issues/1329

> I think this action will speed up "the" development.

Depends on what "the" is, I guess.

-- 
GCP

[Computer-go] Leela Zero on 9x9

2018-04-30 Thread Gian-Carlo Pascutto
There has been some discussion whether value networks can "work" on 9x9
and whether the bots can beat the best humans.

While I don't expect this to resolve the discussion, Leela Zero now tops
the CGOS 9x9 list. This seems to be entirely the work of a single user
who has run 3.2M self-play games on a single GPU over the course of 3
months. He has made the resulting weight file available.

https://github.com/gcp/leela-zero/issues/1291

There was an interesting trick done with switching komi, which you can
read about above.

FWIW, BayesElo suggests there may have been another bot that is very
close in strength, but the name "Maximus_160B_512F" rather suggests that
this is also a DCNN-based bot...

-- 
GCP

Re: [Computer-go] PUCT formula

2018-03-09 Thread Gian-Carlo Pascutto
On 09-03-18 18:03, Brian Sheppard via Computer-go wrote:

> I am guessing that Chenjun and Martin decided (or knew) that the AGZ
> paper was incorrect and modified the equation accordingly.
> 

I doubt it's just the paper that was incorrect, given that the formula
was already given without the log in the original AlphaGo (Lee Sedol) paper.

Of course it would be funny if it was a mistake and just got copy pasted.

I never tried "fixing" the formula, I just tried the original (with
priors), and what they gave, and strength was rather similar.

-- 
GCP

Re: [Computer-go] PUCT formula

2018-03-09 Thread Gian-Carlo Pascutto
On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall the
> result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but
after tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the
exploration term by the policy prior forever. In the original formula,
it is a diminishing term.
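
To illustrate what I mean, schematically (this is a sketch from memory;
the exact PUCB constants are an assumption and may be off, but the shape
of the two terms is the point):

#include <cmath>

// AGZ-style exploration term: the prior P(s,a) multiplies the whole term,
// so it keeps scaling exploration no matter how many visits accumulate.
double u_agz(double c_puct, double prior, double parent_visits,
             double child_visits) {
    return c_puct * prior * std::sqrt(parent_visits) / (1.0 + child_visits);
}

// PUCB-style term (roughly after Rosin 2011): the prior only enters via a
// penalty that shrinks as the parent visit count grows, so its influence
// diminishes over the course of the search.
double u_pucb(double prior, double parent_visits, double child_visits) {
    double explore = std::sqrt(3.0 * std::log(parent_visits) /
                               (2.0 * (child_visits + 1.0)));
    double penalty = (2.0 / prior) *
                     std::sqrt(std::log(parent_visits) / parent_visits);
    return explore - penalty;
}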

-- 
GCP

Re: [Computer-go] Crazy Stone is back

2018-03-05 Thread Gian-Carlo Pascutto
On 5/03/2018 12:28, valky...@phmp.se wrote:
> Remi twittered more details here (see the discussion with gghideki:
> 
> https://twitter.com/Remi_Coulom/status/969936332205318144

Thank you. So Remi gave up on rollouts as well. Interesting "difference
of opinion" there with Zen.

Last time I tested this in regular Leela, playouts were beneficial, but
this was before combined value+policy nets and much more training data
was available. I do not know what the current status would be.

-- 
GCP

Re: [Computer-go] 9x9 is last frontier?

2018-03-05 Thread Gian-Carlo Pascutto
On 5/03/2018 10:54, Dan wrote:
> I believe this is a problem of the MCTS used and not due
> to lack of training.
> 
> Go is a strategic game so that is different from chess that is full of
> traps.     

Does the Alpha Zero result not indicate the opposite, i.e. that MCTS is
workable?

-- 
GCP

Re: [Computer-go] Crazy Stone is back

2018-03-05 Thread Gian-Carlo Pascutto
On 28-02-18 07:13, Rémi Coulom wrote:
> Hi,
> 
> I have just connected the newest version of Crazy Stone to CGOS. It
> is based on the AlphaZero approach.

In that regard, are you still using Monte Carlo playouts?

-- 
GCP

Re: [Computer-go] 9x9 is last frontier?

2018-03-05 Thread Gian-Carlo Pascutto
On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough 

I don't understand how one can say that, given that they started from a
random network only last week, with just a few clients. Of course it's bad!
That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

-- 
GCP

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread Gian-Carlo Pascutto
On 13-02-18 16:05, "Ingo Althöfer" wrote:
> Hello,
> 
> what is known about proper MCTS procedures for games
> which do not only have wins and losses, but also draws
> (like chess, Shogi or Go with integral komi)?
> 
> Should neural nets provide (win, draw, loss)-probabilities
> for positions in such games?

I treat a draw the same as a 50% win-rate score. Works well enough; I don't
really see what advantage treating it separately would give.
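
In code terms it's just a matter of what value gets backed up through the
tree; a sketch, with everything scored on a [0, 1] win-rate scale:

// result: +1 win, 0 draw, -1 loss, from the point of view of the side
// whose statistics we are updating. A draw simply counts as half a win.
double backup_value(int result) {
    if (result > 0) return 1.0;
    if (result < 0) return 0.0;
    return 0.5;
}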

-- 
GCP

Re: [Computer-go] MiniGo open sourced

2018-01-30 Thread Gian-Carlo Pascutto
On 30-01-18 20:59, Álvaro Begué wrote:
> Chrilly Donninger's quote was probably mostly true in the 90s, but
> it's now obsolete. That intellectual protectionism was motivated by
> the potential economic profit of having a strong engine. It probably
> slowed down computer chess for decades, until the advent of strong
> open-source programs. Paradoxically, when the economic incentive to
> create strong engines was removed, we saw an explosion in strength.

There still seems to be an economic incentive to improve [1] strong
engines and try to sell them.

It should be noted that until Stockfish came along, open source computer
chess engines were a graveyard where every strong enough engine just got
cloned or plagiarized and real enduring cooperation was essentially
nonexistent. You just had 10 non-cooperating forks (some closed source,
and some allegedly commercial ones) that added <-20 ... >+100 Elo.

There had been open source engines as early as GNUChess (or probably
earlier...), and very strong ones like Fruit.

I don't know for sure what allowed Stockfish to (mostly) escape the same
fate. Right now I would say fishtest is a huge factor, but it might've
been doing fine before that.

[1] I originally wrote "create" here but that might not be correct.

-- 
GCP

Re: [Computer-go] Nvidia Titan V!

2017-12-08 Thread Gian-Carlo Pascutto
On 08-12-17 09:29, Rémi Coulom wrote:
> Hi,
> 
> Nvidia just announce the release of their new GPU for deep learning: 
> https://www.theverge.com/2017/12/8/16750326/nvidia-titan-v-announced-specs-price-release-date
>
>  "The Titan V is available today and is limited to two per
> customer."
> 
> $2,999, 110 TFLOPS!

You can test Voltas on AWS, the prices are very acceptable.

I had problems getting good convergence with fp16 training, even taking
into account all the tricks in NVIDIA's "mixed precision learning"
document and using the respective NVIDIA-caffe branches. It worked for
the policy network, but not for the value network.

You only get 110 TFLOPS when using the mixed precision fp16 into fp32
accumulator matrix multipliers from the Tensor Cores, otherwise it's not
so different from a 1080 Ti in speed. It has a lot of cores, but the
clock-speed is much lower.

I also had the impression that using the Tensor Cores disables the
Winograd transform, perhaps due to accuracy issues? So you lose a factor
~3 in speedup.

Things to consider before plunking down 3000 USD :-)

-- 
GCP

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-07 Thread Gian-Carlo Pascutto
On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362 top-level children have been
> explored and revealed to have negative values. So without initializing Q
> to the parent's Q, you would end up wasting 362 MCTS iterations.

Note that the same argument could be made for making it 0, which some
people think the AGZ paper implies, so the above can't be the entire
explanation.

That said, empirical testing indicates that initializing Q(s, a) to the
parent is indeed a well performing setting for both strong and weak
policy networks.
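
For clarity, the variants being discussed, as I understand them (a sketch
with invented names, values on a [0, 1] win-rate scale):

// First-play urgency: the Q value assumed for a child that has never been
// visited, before its first network evaluation comes back.
enum class Fpu { Classic, Zero, Parent };

double first_play_urgency(Fpu kind, double parent_q) {
    switch (kind) {
        case Fpu::Classic: return 1.1;       // above any real score: every child gets tried once
        case Fpu::Zero:    return 0.0;       // what some read the AGZ paper as implying
        case Fpu::Parent:  return parent_q;  // inherit the parent's current estimate
    }
    return parent_q;
}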

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 rating point edge! I think the
> Stockfish open source project improved Stockfish by ~20 rating points in
> the last year.

It's about 40-45 Elo FWIW.

> AZ would dominate the current TCEC. 

I don't think you'll get to 80 knps with a regular 22 core machine or
whatever they use. Remember that AZ hardware is about 16 x 1080 Ti's.
You'll lose that (70 - 40 = 30 Elo) advantage very, very quickly.

IMHO this makes it all the more clear how silly it is that so much
attention is given to TCEC with its completely arbitrary hardware choice.

> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.

Probably the same the Zen team did. Ignore a large part of the result
because people's actual computers - let alone mobile phones - can't run
a neural net at TPU speeds.

The question is if resizing the network makes the resulting program more
competitive, enough to overcome the speed difference. And, aha, in which
direction are you going to try to resize? Bigger or smaller?

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 06-12-17 21:19, Petr Baudis wrote:
> Yes, that also struck me.  I think it's good news for the community
> to see it reported that this works, as it makes the training process
> much more straightforward.  They also use just 800 simulations,
> another good news.  (Both were one of the first tradeoffs I made in
> Nochi.)

The 800 simulations are a setting that works over all 3 games. It's not
necessarily as good for 19x19 Go (more legal moves than the other games,
so less deep trees).

As for both the lack of testing and this parameter, someone has remarked
on github that the DeepMind hardware is fixed, so this also represents a
tuning between the speed of the learning machine and the speed of the
self-play machines.

In my experience, just continuing to train the network further (when no
new data is batched in) often regresses the performance by 200 or more
Elo. So it's not clear this step is *entirely* ignorable unless you have
already tuned the speed of the other two aspects.

> Another interesting tidbit: they use the TPUs to also generate the 
> selfplay games.

I think this was already known.

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 19:48, Xavier Combelle wrote:
> Another result is that chess is really drawish, at the opposite of shogi

We sort-of knew that, but OTOH isn't that also because the resulting
engine strength was close to Stockfish, unlike in other games?

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 18:57, Darren Cook wrote:
>> Mastering Chess and Shogi by Self-Play with a General Reinforcement
>> Learning Algorithm
>> https://arxiv.org/pdf/1712.01815.pdf
> 
> One of the changes they made (bottom of p.3) was to continuously update
> the neural net, rather than require a new network to beat it 55% of the
> time to be used. (That struck me as strange at the time, when reading
> the AlphaGoZero paper - why not just >50%?)

I read that as a simple way of establishing confidence that the result
was statistically significant > 0. (+35 Elo over 400 games - I don't
know by heart how large the typical error margin of 400 games is, but I
think it won't be far off!)
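
A back-of-the-envelope check (ignoring draws, so take it as a rough
estimate only) suggests it is indeed about a 2-sigma result:

#include <cmath>
#include <cstdio>

int main() {
    double games = 400.0, score = 0.55;   // the 55% gating threshold
    double sigma = std::sqrt(score * (1.0 - score) / games);
    // Convert the +/- 2 sigma score range to Elo via the logistic model.
    auto elo = [](double s) { return -400.0 * std::log10(1.0 / s - 1.0); };
    std::printf("55%% = %+.0f Elo, 2 sigma = +/- %.1f%%, i.e. %+.0f to %+.0f Elo\n",
                elo(score), 200.0 * sigma,
                elo(score - 2.0 * sigma), elo(score + 2.0 * sigma));
    return 0;
}

That prints roughly "+35 Elo, +/- 5.0%", i.e. the 95% interval barely
excludes zero, which matches the "confidence that it's > 0" reading.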

-- 
GCP

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Gian-Carlo Pascutto
On 06-12-17 11:47, Aja Huang wrote:
> All I can say is that first-play-urgency is not a significant 
> technical detail, and that's why we didn't specify it in the paper.

I will have to disagree here. Of course, it's always possible I'm
misunderstanding something, or I have a program bug that I'm mixing up
with this.

Or maybe you mean that you expect the program to improve regardless of
this setting. In any case, I've now seen people state here twice that
this is a detail that doesn't matter. But practical results suggest otherwise.

For a strong supervised network, FPU=0 (i.e. not exploring all successor
nodes for a longer time, relying strongly on policy priors) is much
stronger. I've seen this in Leela Zero after we tested it, and I've
known it to be true from regular Leela for a long time. IIRC, the strong
open source Go bots also use some form of progressive widening, which
produces the same effect.

For a weak RL network without much useful policy priors, FPU>1 is much
stronger than FPU=0.

Now these are relative scores of course, so one could argue they don't
affect the learning process. But they actually do that as well!

The new AZ paper uses MCTS playouts = 800, and plays proportionally
according to MCTS output. (Previous AGZ had playouts = 1600,
proportional for first 30 moves).

Consider what this means for the search probability outputs, exactly the
thing the policy network has to learn. With FPU=1, the move
probabilities are much more uniform, and the moves played are
consequentially much more likely to be bad or even blunders, because
there are less playouts that can be spent on the best move, even if it
was found.

> The initial value of Q is not very important because Q+U is
> dominated by the U piece when the number of visits is small.

a = Q(s, a) + coeff * P(s, a) * sqrt(parent->visits) / (1.0f +
child->visits());

Assume parent->visits = 100, sqrt = 10
Assume child->visits = 0
Assume P(s, a) = 0.0027 (near uniform prior for "weak" network)

The right most side of this (U term) is ~1. This clearly does not
dominate the Q term. If Q > 1 (classic FPU) then every child node will
get expanded. If Q = 0 (Q(s, a) = 0) then the first picked child
(largest policy prior) will get something like 10 expansions before
another child gets picked. That's a massive difference in search tree
shape, *especially* with only 800 total playouts.

-- 
GCP

Re: [Computer-go] Is MCTS needed?

2017-11-17 Thread Gian-Carlo Pascutto
On 16-11-17 18:24, Stephan K wrote:
> 2017-11-16 17:37 UTC+01:00, Gian-Carlo Pascutto <g...@sjeng.org>:
>> Third, evaluating with a different rotation effectively forms an
>> ensemble that improves the estimate.
> 
> Could you expand on that? I understand rotating the board has an
> impact for a neural network, but how does that change anything for a
> tree search? Or is it because the monte carlo tree search relies on
> the policy network?

It was a response to the statement "There are no reevaluations that
would improve your estimate."

Consider a quiet position where the tree search wouldn't reveal any
tactics. Normally, searching deeper won't give an immediate benefit. But
because of the rotations, the value network's score is improved from a
single estimate to an ensemble.
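
A sketch of why that acts like an ensemble (this is schematic, not Leela's
actual code): if every (re)evaluation of the same position applies one of
the 8 board symmetries, averaging repeated visits averages 8 slightly
different network estimates instead of repeating a single one:

#include <functional>

struct Position {};   // stand-in for a board state

double ensemble_value(const std::function<double(const Position&, int)>& net_eval,
                      const Position& pos) {
    double sum = 0.0;
    for (int symmetry = 0; symmetry < 8; ++symmetry) {
        sum += net_eval(pos, symmetry);   // same position, rotated/mirrored
    }
    return sum / 8.0;                     // the ensembled estimate
}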

In chess/alpha-beta terms, the quiescence search resolves the tactics
(if any), so running it again with part of the tactics resolved would
produce the same score. But with value nets, this is not entirely true.

> Could it be possible to train a value net using only the results of
> already finished games, rather than monte carlo rollouts?

Isn't this how it works already?

> My (extremely vague and possibly fallacious) understanding of the
> situation was that monte carlo tree search was less effective for
> chess because of the more sudden changes there might be when
> evaluating chess positions. For instance, a player with an apparently
> lesser position might actually be a few moves away from a checkmate
> (or just from a big gain), which might be missed by the monte carlo
> tree search because it depends on one particular branch of the tree.

Life and death and capture races behave the same. The inability of MCTS
to switch to a new PV instantly isn't necessarily very different from
the requirement in chess that all moves are searched to an equal
(nominal!) depth. In practical alpha-beta implementations, failing high
on a new best move requires a re-search as well.

-- 
GCP

Re: [Computer-go] Is MCTS needed?

2017-11-17 Thread Gian-Carlo Pascutto
On 17-11-17 02:15, Hideki Kato wrote:
> Stephan K: 
> 

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Gian-Carlo Pascutto
On 16-11-17 18:15, "Ingo Althöfer" wrote:
> Something like MCTS would not work in chess, because in
> contrast to Go (and Hex and Amazons and ...) Chess is
> not a "game with forward direction".

Ingo, I think the reason Petr brought the whole thing up is that AlphaGo
Zero uses "MCTS" but it does not actually use Monte Carlo Playouts. I
think your condition about "forward direction" only applies to the
randomized playouts, yes? A neural network evaluation on the other hand
is very much like a classical static chess evaluation.

There are publications about Parallel Randomized Best First Search in
chess, read them and notice how it compares to MCTS.

-- 
GCP

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Gian-Carlo Pascutto
On 16/11/2017 16:43, Petr Baudis wrote:
> But now, we expand the nodes literally all the time, breaking the 
> stationarity possibly in drastic ways.  There are no reevaluations
> that would improve your estimate.

First of all, you don't expect the network evaluations to drastically
vary between parent and children, unless there are tactics that you are
not understanding.

Secondly, the evaluations are rather noisy, so averaging still makes sense.

Third, evaluating with a different rotation effectively forms an
ensemble that improves the estimate.

> Therefore, can't we take the next step, and do away with MCTS?  Is 
> there a theoretical viewpoint from which it still makes sense as the
> best policy improvement operator?

People have posted results with that on this list and IIRC programs
using regular alpha-beta were weaker.

As for a theoretical viewpoint: the value net is an estimation of the
value of some fixed amount of Monte Carlo rollouts.

> What would you say is the current state-of-art game tree search for 
> chess?  That's a very unfamiliar world for me, to be honest all I
> really know is MCTS...

The same as it was 20 years ago: alpha-beta. Though one could certainly make
the argument that an alpha-beta searcher using late move reductions
(searching everything but the best moves less deeply) is searching a
tree of a very similar shape as an UCT searcher with a small exploration
constant.

-- 
GCP

Re: [Computer-go] what is reachable with normal HW

2017-11-15 Thread Gian-Carlo Pascutto
On 15-11-17 10:51, Petri Pitkanen wrote:
> I think the interesting question left now is: how strong a Go program one
> can have on a normal laptop? TPU and GPU are fine for showing what can be
> done but as practical tool for a go player the bot  has to run something
> people can afford. And can buy from shop? From KGS 100 list I can spot
> 8d bots but I do not know how big HW they are using. 
> 
> Could todays laptop with best possible SW beat best humans?

What does "best possible SW" mean? The one that isn't written yet? :-)

Zero was reportedly very strong with 4 TPUs. If we say 1 TPU = 1 GTX 1080
Ti, then even with the Elo loss from the slowdown from 4 to 1 it would
still be far stronger than the best humans.

As for things that are available right now:

The latest Zen is very strong even without a GPU.

My bot was 8d on KGS with a GTX 1080 Ti and a Ryzen 1700 (roughly ~1000
USD hardware). I don't run on KGS anymore but people from Tygem told me
it is equal to lower ranked pros there, on a smaller system.

You can get very strong software right now, but nothing will change the
fact that better hardware always helps. The difference between a laptop
and a desktop with a real GPU will always be there.

-- 
GCP

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-15 Thread Gian-Carlo Pascutto
On 11-11-17 00:58, Petr Baudis wrote:
>>>   * The neural network is updated after _every_ game, _twice_, on _all_
>>> positions plus 64 randomly sampled positions from the entire history,
>>> this all done four times - on original position and the three
>>> symmetry flips (but I was too lazy to implement 90\deg rotation).
>>
>> The reasoning being to give a stronger and faster reinforcement with the
>> latest data?
> 
> Yes.

One thing I wonder about, given the huge size of the network and the
strong reinforcement, don't you get total overfitting?

I guess the next few games will quickly "point out" the overfit, but I
still wonder whether keeping the overfit under control wouldn't be
better rather than the see-sawing this would seem to cause.

-- 
GCP

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Gian-Carlo Pascutto
On 10/11/2017 1:47, Petr Baudis wrote:

>   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.

How many filters per layer?

FWIW 7 layer resnet (14 + 2 layers) is still pretty huge - larger than
the initial AlphaGo. Given the amount of games you have, and the size of
the board, I would not be surprised if your neural net program is
"outbooking" the opponent by remembering the sequences rather than
learning more generic things.

(But hey, outbooking is learning too!)

>   * The neural network is updated after _every_ game, _twice_, on _all_
> positions plus 64 randomly sampled positions from the entire history,
> this all done four times - on original position and the three
> symmetry flips (but I was too lazy to implement 90\deg rotation).

The reasoning being to give a stronger and faster reinforcement with the
latest data?

>   * Value function is trained with cross-entropy rather than MSE,
> no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> the annealing is reset time by time due to manual restarts of the
> script from a checkpoint).

I never really had good results with Adam and friends compared to SGD
(even momentum does not always help - but of course it's much faster
early on).

>   * No resign auto-threshold but it is important to play 25% games
> without resigning to escape local "optima".

This makes sense because both sides will miscount in exactly the same way.

>   * 1/Temperature is 2 for first three moves.
>   * Initially I used 1000 "simulations" per move, but by mistake, last
> 1500 games when the network improved significantly (see below) were
> run with 2000 simulations per move.  So that might matter.
> 
>   This has been running for two weeks, self-playing 8500 games.  A week
> ago its moves already looked a bit natural but it was stuck in various
> local optima.  Three days ago it has beaten GNUGo once across 20 games.
> Now five times across 20 games - so I'll let it self-play a little longer
> as it might surpass GNUGo quickly at this point?  Also this late
> improvement coincides with the increased simulation number.

The simulation number is one of the big black boxes in this setup, I
think. If the policy network does not have a strong opinion yet, it
seems that one has to make it sufficiently bigger than the number of
legal moves. If not, first-play-urgency will cause every successor
position to be evaluated and there's no look ahead, which means MCTS
can't discover anything.

So a few times 361 makes sense for 19x19, but don't ask me why 1600 and
not 1200 etc.

With only 50-ish moves to consider on 7x7, it's interesting that you see
a big improvement by making it (relatively) much larger than DeepMind did.

But uh, you're not simply matching it against GNUGo with more
simulations are you? I mean it would be quite normal to win more when
searching deeper.

-- 
GCP

Re: [Computer-go] AlphaGo Zero Loss

2017-11-07 Thread Gian-Carlo Pascutto
On 7/11/2017 19:08, Petr Baudis wrote:
> Hi!
> 
> Does anyone knows why the AlphaGo team uses MSE on [-1,1] as the
> value output loss rather than binary crossentropy on [0,1]?  I'd say
> the latter is way more usual when training networks as typically
> binary crossentropy yields better result, so that's what I'm using
> in https://github.com/pasky/michi/tree/nnet for the time being, but
> maybe I'm missing some good reason to use MSE instead?

Not that I know of. You can certainly get some networks to converge
better by using cross-entropy over MSE.

Maybe it's related to the nature of the errors? More avoidance of the
output being entirely wrong? Or habit? MSE is generally preferred for
regression-like problems, but you can argue whether a go position is
being regressed to some winrate%, or to win/loss...
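
For what it's worth, the two options side by side (just a sketch; z is the
game outcome target):

#include <cmath>

// MSE on a value output v in [-1, 1] against an outcome z in {-1, +1}.
double mse_loss(double v, double z) {
    return (v - z) * (v - z);
}

// Binary cross-entropy on a value output p in (0, 1) against z01 in {0, 1}.
double bce_loss(double p, double z01) {
    return -(z01 * std::log(p) + (1.0 - z01) * std::log(1.0 - p));
}

The cross-entropy version punishes confidently-wrong outputs much harder,
which is more or less the "avoidance of being entirely wrong" effect
mentioned above.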

-- 
GCP

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Gian-Carlo Pascutto
On 7/11/2017 19:07, Imran Hendley wrote:
> Am I understanding this correctly?

Yes.

It's possible they had in-betweens or experimented with variations at
some point, then settled on the simplest case. You can vary the
randomness if you define it as a softmax with varying temperature; that's
harder if you only define the policy as select best or select
proportionally.
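
Something along these lines (a sketch, not AlphaGo's actual code): pick
the root move with probability proportional to N(s,a)^(1/T), so T -> 0
approaches "select best" and T = 1 is plain proportional selection:

#include <cmath>
#include <random>
#include <vector>

int sample_move(const std::vector<double>& visits, double temperature,
                std::mt19937& rng) {
    std::vector<double> weights;
    weights.reserve(visits.size());
    for (double n : visits) {
        weights.push_back(std::pow(n, 1.0 / temperature));   // N^(1/T)
    }
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return pick(rng);
}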

-- 
GCP

Re: [Computer-go] Zero is weaker than Master!?

2017-10-27 Thread Gian-Carlo Pascutto
On 27-10-17 10:15, Xavier Combelle wrote:
> Maybe I'm wrong but both curves for alphago zero looks pretty similar
> except than the figure 3 is the zoom in of figure 6

The blue curve in figure 3 is flat at around 60 hours (2.5 days). In
figure 6, at 2.5 days the line is near vertical. So it is not a zoom.

Maybe this can help you:
https://www.reddit.com/r/baduk/comments/77hr3b/elo_table_of_alphago_zero_selfplay_games/

Note the huge Elo advantage of the 20 blocks version early on (it can
learn faster, but stalls out faster).

-- 
GCP

Re: [Computer-go] November KGS bot tournament

2017-10-27 Thread Gian-Carlo Pascutto
On 26-10-17 09:43, Nick Wedd wrote:
> Please register by emailing me at mapr...@gmail.com
> , with the words "KGS Tournament Registration"
> in the email title.
> With the falling interest in these events since the advent of AlphaGo,
> it is likely that this will be the last of the series of KGS bot
> tournaments.

Thank you for organizing them for so long!

-- 
GCP

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-27 Thread Gian-Carlo Pascutto
On 27-10-17 00:33, Shawn Ligocki wrote:
> But the data should be different for different komi values, right? 
> Iteratively producing self-play games and training with the goal of 
> optimizing for komi 7 should converge to a different optimal player 
> than optimizing for komi 5.

For the policy (head) network, yes, definitely. It makes no difference
to the value (head) network.

> But maybe having high quality data for komi 7 will still save a lot
> of the work for training a komi 5 (or komi agnostic) network?

I'd suspect so.

-- 
GCP

Re: [Computer-go] Zero is weaker than Master!?

2017-10-26 Thread Gian-Carlo Pascutto
Figure 6 has the same graph as Figure 3 but for 40 blocks. You can compare
the Elo.

On Thu, Oct 26, 2017, 23:35 Xavier Combelle <xavier.combe...@gmail.com>
wrote:

> Unless I'm mistaken, figure 3 shows the plot of supervised learning to
> reinforcement learning, not 20 block/40 block
>
> For searching mention of the 20 blocks I searched for 20 in the whole
> paper and did not find any other mention
>
> than of the kifu thing.
>
>
> On 26/10/2017 at 15:10, Gian-Carlo Pascutto wrote:
> > On 26-10-17 10:55, Xavier Combelle wrote:
> >> It is just wild guesses  based on reasonable arguments but without
> >> evidence.
> > David Silver said they used 40 layers for AlphaGo Master. That's more
> > evidence than there is for the opposite argument that you are trying to
> > make. The paper certainly doesn't talk about a "small" and a "big"
> Master.
> >
> > You seem to be arguing from a bunch of misreadings and
> > misunderstandings. For example, Figure 3 in the paper shows the Elo plot
> > for the 20 block/40 layer version, and it compares to Alpha Go Lee, not
> > Alpha Go Master. The Alpha Go Master line would be above the flattening
> > part of the 20 block/40 layer AlphaGo Zero. I guess you missed this when
> > you say that they "only mention it to compare on kifu prediction"?
> >
>

-- 

GCP

Re: [Computer-go] AlphaGo Zero

2017-10-26 Thread Gian-Carlo Pascutto
On 25-10-17 16:00, Petr Baudis wrote:
> That makes sense.  I still hope that with a much more aggressive 
> training schedule we could train a reasonable Go player, perhaps at
> the expense of worse scaling at very high elos...  (At least I feel 
> optimistic after discovering a stupid bug in my code.)

By the way, a trivial observation: the initial network is random, so
there's no point in using it for playing the first batch of games. It
won't do anything useful until it has run a learning pass on a bunch of
"win/loss" scored games and it can at least tell who is the likely
winner in the final position (even if it mostly won't be able to make
territory at first).

This suggests that bootstrapping probably wants 500k starting games with
just random moves.

FWIW, it does not seem easy to get the value part of the network to
converge in the dual-res architecture, even when taking the appropriate
steps (1% weighting on error, strong regularizer).

-- 
GCP

Re: [Computer-go] Zero is weaker than Master!?

2017-10-26 Thread Gian-Carlo Pascutto
On 26-10-17 10:55, Xavier Combelle wrote:
> It is just wild guesses  based on reasonable arguments but without
> evidence.

David Silver said they used 40 layers for AlphaGo Master. That's more
evidence than there is for the opposite argument that you are trying to
make. The paper certainly doesn't talk about a "small" and a "big" Master.

You seem to be arguing from a bunch of misreadings and
misunderstandings. For example, Figure 3 in the paper shows the Elo plot
for the 20 block/40 layer version, and it compares to Alpha Go Lee, not
Alpha Go Master. The Alpha Go Master line would be above the flattening
part of the 20 block/40 layer AlphaGo Zero. I guess you missed this when
you say that they "only mention it to compare on kifu prediction"?

-- 
GCP

Re: [Computer-go] AlphaGo Zero

2017-10-25 Thread Gian-Carlo Pascutto
On 25-10-17 16:00, Petr Baudis wrote:

>> The original paper has the value they used. But this likely needs tuning. I
>> would tune with a supervised network to get started, but you need games for
>> that. Does it even matter much early on? The network is random :)
> 
>   The network actually adapts quite rapidly initially, in my experience.
> (Doesn't mean it improves - it adapts within local optima of the few
> games it played so far.)

Yes, but once there's structure, you can tune the parameter with CLOP or
whatever.

>   Yes, but why wouldn't you want that randomness in the second or third
> move?

You only need to play a different move at the root in order for the game
to deviate.

-- 
GCP

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-25 Thread Gian-Carlo Pascutto
On 25-10-17 05:43, Andy wrote:
> Gian-Carlo, I didn't realize at first that you were planning to create a
> crowd-sourced project. I hope this project can get off the ground and
> running!
> 
> I'll look into installing this but I always find it hard to get all the
> tool chain stuff going.

I will provide pre-made packages for common operating systems. Right now
we (Jonathan Roy is helping with the server) are exploring what's
possible for such a crowd-sourced effort, and testing the server. I'll
provide an update here when there's something to play with.

-- 
GCP

[Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-24 Thread Gian-Carlo Pascutto
On 23-10-17 10:39, Darren Cook wrote:
>> The source of AlphaGo Zero is really of zero interest (pun intended).
> 
> The source code is the first-hand account of how it works, whereas an
> academic paper is a second-hand account. So, definitely not zero use.

This should be fairly accurate:

https://github.com/gcp/leela-zero

-- 
GCP

Re: [Computer-go] AlphaGo Zero

2017-10-22 Thread Gian-Carlo Pascutto
On 21/10/2017 14:21, David Ongaro wrote:
> I understand that DeepMind might be unable to release the source code
> of AlphaGo due to policy or licensing reasons, but it would be great
> (and probably much more valuable) if they could release the fully
> trained network.

The source of AlphaGo Zero is really of zero interest (pun intended). It
can be obtained by ripping out ~50% of the Ray/Rn or AQ code (everything
related to MC playouts) and some minimal changes to evaluate the same
network for scoring and policy. Same for Leela.

It's literally possible to have a "Leela/Ray/AQ Zero" in a week or so
(it'll require a GPU or performance will be atrocious).

Of course, I can't give you the trained network to load into it. That'll
take another 88642 weeks.

So yes, the database of 29M self-play games would be immensely more
valuable. (Probably like the last 5M or so is fine, too). I prefer the
games over the network - with the games it's easier to train a smaller
network that gives better results on PC's that don't have 4 TPUs in them.

-- 
GCP

Re: [Computer-go] Zero performance

2017-10-21 Thread Gian-Carlo Pascutto
On 20/10/2017 22:48, fotl...@smart-games.com wrote:
> The paper describes 20 and 40 block networks, but the section on
> comparison says AlphaGo Zero uses 20 blocks. I think your protobuf
> describes a 40 block network. That's a factor of two 

They compared with both, the final 5180 Elo number is for the 40 block
one. For the 20 block one, the numbers stop around 4300 Elo.
See for example:

https://www.reddit.com/r/baduk/comments/77hr3b/elo_table_of_alphago_zero_selfplay_games/

A factor of 2 isn't much, but sure, it seems sensible to start with the
smaller one, given how intractable the problem looks right now.

> Your time looks reasonable when calculating the time to generate the
> 29M games at about 10 seconds per move. This is only the time to
> generate the input data. Do you have an estimate of the additional
> time it takes to do the training? It's probably small in comparison,
> but it might not be.

So far I've assumed that it's zero, because it can happen in parallel
and the time to generate the self-play games dominates. From the revised
hardware estimates, we can also see that the training machines used 64
GPUs, which is a lot smaller than the 1500+ TPU estimate for the
self-play machines.

Training on the GTX 1080 Ti does 4 batches of 32 positions per second.
They use 2048 position batches, and train for 1000 batches before
checkpointing. So the GTX can produce a checkpoint every 4.5 hours [1].
Testing that over 400 games takes 8.5 days (400 x 200 x 9.3s).
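
In numbers (just a sanity check of the arithmetic above):

#include <cstdio>

int main() {
    double positions_per_second = 4.0 * 32.0;   // 4 batches of 32 on a 1080 Ti
    double checkpoint_s = 1000.0 * 2048.0 / positions_per_second;
    double match_s = 400.0 * 200.0 * 9.3;       // 400 games x ~200 moves x 9.3 s
    std::printf("checkpoint every %.1f h, match test %.1f days\n",
                checkpoint_s / 3600.0, match_s / 86400.0);
    return 0;
}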

So again, it totally bottlenecks on playing games, not on training. At
least, if the improvement is big, one needn't play the 400 games out,
but SPRT termination can be used.

[1] To be honest, this seems very fast - even starting from 0 such a big
network barely advances in 1000 iterations (or I misinterpreted a
training parameter). But I guess it's important to have a very fast -
learn knowledge - use new knowledge - feedback cycle.

-- 
GCP

Re: [Computer-go] Zero performance

2017-10-21 Thread Gian-Carlo Pascutto
On 20/10/2017 22:41, Sorin Gherman wrote:
> Training of AlphaGo Zero has been done on thousands of TPUs,
> according to this source: 
> https://www.reddit.com/r/baduk/comments/777ym4/alphago_zero_learning_from_scratch_deepmind/dokj1uz/?context=3
>
>  Maybe that should explain the difference in orders of magnitude that
> you noticed?

That would make a lot more sense, for sure. It would also explain the
25M USD number from Hassabis. That would be a lot of money to spend on
"only" 64 GPUs, or 4 TPU (which are supposed to be ~1 GPU).

There's no explanation where the number came from, but it seems that he
did similar math as in the original post here.

-- 
GCP

Re: [Computer-go] Zero performance

2017-10-20 Thread Gian-Carlo Pascutto
I agree. Even on 19x19 you can use smaller searches. 400 iterations MCTS is
probably already a lot stronger than the raw network, especially if you are
expanding every node (very different from a normal program at 400
playouts!). Some tuning of these mini searches is important. Surely you
don't want to explore every child node for the first play urgency... I
remember this little algorithmic detail was missing from the first paper as
well.

So that's a factor 32 gain. Because the network is smaller, it should learn
much faster too. Someone on reddit posted a comparison of 20 blocks vs 40
blocks.

With 10 people you can probably get some results in a few months. The
question is, how much Elo have we lost on the way...

Another advantage would be that, as long as you keep all the SGF, you can
bootstrap a bigger network from the data! So, nothing is lost from starting
small. You can "upgrade" if the improvements start to plateau.

On Fri, Oct 20, 2017, 23:32 Álvaro Begué <alvaro.be...@gmail.com> wrote:

> I suggest scaling down the problem until some experience is gained.
>
> You don't need the full-fledge 40-block network to get started. You can
> probably get away with using only 20 blocks and maybe 128 features (from
> 256). That should save you about a factor of 8, plus you can use larger
> mini-batches.
>
> You can also start with 9x9 go. That way games are shorter, and you
> probably don't need 1600 network evaluations per move to do well.
>
> Álvaro.
>
>
> On Fri, Oct 20, 2017 at 1:44 PM, Gian-Carlo Pascutto <g...@sjeng.org>
> wrote:
>
>> I reconstructed the full AlphaGo Zero network in Caffe:
>> https://sjeng.org/dl/zero.prototxt
>>
>> I did some performance measurements, with what should be
>> state-of-the-art on consumer hardware:
>>
>> GTX 1080 Ti
>> NVIDIA-Caffe + CUDA 9 + cuDNN 7
>> batch size = 8
>>
>> Memory use is about ~2G. (It's much more for learning, the original
>> minibatch size of 32 wouldn't fit on this card!)
>>
>> Running 2000 iterations takes 93 seconds.
>>
>> In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
>> simulations, and they expand 1 node per visit (if I got it right) so
>> that would be 1600 network evaluations as well, or 200 of my iterations.
>>
>> So it would take me ~9.3s to produce a self-play move, compared to 0.4s
>> for them.
>>
>> I would like to extrapolate how long it will take to reproduce the
>> research, but I think I'm missing how many GPUs are in each self-play
>> worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.
>>
>> Let's say the latter is around 200 moves. They generated 29 million
>> games for the final result, which means it's going to take me about 1700
>> years to replicate this. I initially estimated 7 years based on the
>> reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
>> in the calculations above, or was it really a *pile* of those 64 GPU
>> machines?
>>
>> Because the performance on playing seems reasonable (you would be able
>> to actually run the MCTS on a consumer machine, and hence end up with a
>> strong program), I would be interested in setting up a distributed
>> effort for this. But realistically there will be maybe 10 people
>> joining, 80 if we're very lucky (looking at Stockfish numbers). That
>> means it'd still take 20 to 170 years.
>>
>> Someone please tell me I missed a factor of 100 or more somewhere. I'd
>> love to be wrong here.
>>
>
>> --
>> GCP
>
>

-- 

GCP

Re: [Computer-go] AlphaGo Zero

2017-10-20 Thread Gian-Carlo Pascutto
On Fri, Oct 20, 2017, 21:48 Petr Baudis  wrote:

>   Few open questions I currently have, comments welcome:
>
>   - there is no input representing the number of captures; is this
> information somehow implicit or can the learned winrate predictor
> never truly approximate the true values because of this?
>

They are using Chinese rules, so prisoners don't matter. There are simply
fewer stones of one color on the board.


>   - what ballpark values for c_{puct} are reasonable?
>

The original paper has the value they used. But this likely needs tuning. I
would tune with a supervised network to get started, but you need games for
that. Does it even matter much early on? The network is random :)


>   - why is the dirichlet noise applied only at the root node, if it's
> useful?
>

It's only used to get some randomness in the move selection, no? It's not
actually useful for anything besides that.
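
For reference, the root noise as I read it from the paper (eps = 0.25 and
alpha = 0.03 are the values reported for Go; treat the snippet itself as a
sketch):

#include <cstddef>
#include <random>
#include <vector>

// Mix Dirichlet noise into the root priors: P'(a) = (1 - eps) * P(a) + eps * n_a,
// where n ~ Dir(alpha). The Dirichlet sample is built from normalized gammas.
void add_root_dirichlet_noise(std::vector<double>& priors, double eps,
                              double alpha, std::mt19937& rng) {
    std::gamma_distribution<double> gamma(alpha, 1.0);
    std::vector<double> noise(priors.size());
    double sum = 0.0;
    for (double& n : noise) { n = gamma(rng); sum += n; }
    for (std::size_t i = 0; i < priors.size(); ++i) {
        priors[i] = (1.0 - eps) * priors[i] + eps * noise[i] / sum;
    }
}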


>   - the training process is quite lazy - it's not like the network sees
> each game immediately and adjusts, it looks at last 500k games and
> samples 1000*2048 positions, meaning about 4 positions per game (if
> I understood this right) - I wonder what would happen if we trained
> it more aggressively, and what AlphaGo does during the initial 500k
> games; currently, I'm training on all positions immediately, I guess
> I should at least shuffle them ;)
>

I think the laziness may be related to the concern that reinforcement
methods can easily "forget" things they had learned before. The value
network training also likes positions from distinct games.


-- 

GCP

[Computer-go] Zero performance

2017-10-20 Thread Gian-Carlo Pascutto
I reconstructed the full AlphaGo Zero network in Caffe:
https://sjeng.org/dl/zero.prototxt

I did some performance measurements, with what should be
state-of-the-art on consumer hardware:

GTX 1080 Ti
NVIDIA-Caffe + CUDA 9 + cuDNN 7
batch size = 8

Memory use is about ~2G. (It's much more for learning, the original
minibatch size of 32 wouldn't fit on this card!)

Running 2000 iterations takes 93 seconds.

In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
simulations, and they expand 1 node per visit (if I got it right) so
that would be 1600 network evaluations as well, or 200 of my iterations.

So it would take me ~9.3s to produce a self-play move, compared to 0.4s
for them.

I would like to extrapolate how long it will take to reproduce the
research, but I think I'm missing how many GPUs are in each self-play
worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.

Let's say the latter is around 200 moves. They generated 29 million
games for the final result, which means it's going to take me about 1700
years to replicate this. I initially estimated 7 years based on the
reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
in the calculations above, or was it really a *pile* of those 64 GPU
machines?
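
For anyone who wants to check the arithmetic, the estimate is simply:

#include <cstdio>

int main() {
    double s_per_iter = 93.0 / 2000.0;            // measured above
    double s_per_move = 200.0 * s_per_iter;       // 1600 evals / batch size 8
    double total_s = 29e6 * 200.0 * s_per_move;   // 29M games x ~200 moves
    std::printf("%.1f s/move, %.0f years on one GPU\n",
                s_per_move, total_s / (365.25 * 24.0 * 3600.0));
    return 0;
}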

Because the performance on playing seems reasonable (you would be able
to actually run the MCTS on a consumer machine, and hence end up with a
strong program), I would be interested in setting up a distributed
effort for this. But realistically there will be maybe 10 people
joining, 80 if we're very lucky (looking at Stockfish numbers). That
means it'd still take 20 to 170 years.

Someone please tell me I missed a factor of 100 or more somewhere. I'd
love to be wrong here.

-- 
GCP

Re: [Computer-go] AlphaGo Zero

2017-10-20 Thread Gian-Carlo Pascutto
On 19-10-17 13:00, Aja Huang via Computer-go wrote:
> Hi Hiroshi,
> 
> I think these are good questions. You can ask them at 
> https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama_we_are_david_silver_and_julian_schrittwieser/

It seems the question was indeed asked but not answered:
https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama_we_are_david_silver_and_julian_schrittwieser/dol03aq/

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-20 Thread Gian-Carlo Pascutto
On 19-10-17 13:23, Álvaro Begué wrote:
> Summing it all up, I get 22,837,864 parameters for the 20-block network
> and 46,461,544 parameters for the 40-block network.
> 
> Does this seem correct?

My Caffe model file is 185887898 bytes / 32-bit floats = 46 471 974

So yes, that seems pretty close. I'll send the model file and some
observations in a separate post.

-- 
GCP

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Gian-Carlo Pascutto
On 18-10-17 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Another interesting tidbit:

The inputs don't contain a reliable board edge. The "white to move"
plane contains it, but only when white is to move.

So until AG Zero "black" learned that a go board is 19 x 19, the white
player had a serious advantage.

I think I will use 18 input layers :-)

-- 
GCP

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Gian-Carlo Pascutto
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> This paper is required reading. When I read this team’s papers, I think
> to myself “Wow, this is brilliant! And I think I see the next step.”
> When I read their next paper, they show me the next *three* steps.

Hmm, interesting way of seeing it. Once they had Lee Sedol AlphaGo, it
was somewhat obvious that just self-playing that should lead to an
improved policy and value net.

And before someone accuses me of Captain Hindsighting here, this was
pointed out on this list:
http://computer-go.org/pipermail/computer-go/2017-January/009786.html

It looks to me like the real devil is in the details. Don't use a
residual stack? -600 Elo. Don't combine the networks? -600 Elo.
Bootstrap the learning? -300 Elo

We made 3 perfectly reasonable choices and somehow lost 1500 Elo along
the way. I can't get over that number, actually.

Getting the details right makes a difference. And they're getting them
right, either because they're smart, because of experience from other
domains, or because they're trying a ton of them. I'm betting on all 3.

-- 
GCP

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Gian-Carlo Pascutto
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> A stunning result. The NN uses a standard vision architecture (no Go
> adaptation beyond what is necessary to represent the game state).

The paper says that Master (4858 rating) uses Go specific features,
initialized by SL, and the same technique. Without go features, and
without initialization, it's Zero (5185 rating).

The obvious question is, what would be the result of using go features
and not initializing?

I would expect that providing liberties is a useful shortcut (see my
remark about game history!). But I'm willing to be surprised :-)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Gian-Carlo Pascutto
On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Select quotes that I find interesting from a brief skim:

1) Using a residual network was more accurate, achieved lower error, and
improved performance in AlphaGo by over 600 Elo.

2) Combining policy and value together into a single network slightly
reduced the move prediction accuracy, but reduced the value error and
boosted playing performance in AlphaGo by around another 600 Elo.

These gains sound very high (much higher than previous experiments with
them reported here), but are likely due to the joint training.
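
For anyone who wants to see what those two choices look like
structurally, here is a rough sketch (Python/PyTorch, purely
illustrative - the filter counts, block count and head sizes are my own
made-up numbers, not the paper's) of a residual block and a shared
tower feeding both a policy head and a value head:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b1 = nn.BatchNorm2d(ch)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        y = F.relu(self.b1(self.c1(x)))
        y = self.b2(self.c2(y))
        return F.relu(x + y)  # the skip connection is the whole point

class DualHeadNet(nn.Module):
    def __init__(self, planes=18, ch=64, blocks=4, board=19):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(planes, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.tower = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        self.policy = nn.Sequential(nn.Conv2d(ch, 2, 1), nn.Flatten(),
                                    nn.Linear(2 * board * board,
                                              board * board + 1))
        self.value = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Flatten(),
                                   nn.Linear(board * board, 64), nn.ReLU(),
                                   nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        return self.policy(h), self.value(h)  # both heads share the tower

net = DualHeadNet()
p, v = net(torch.zeros(1, 18, 19, 19))
print(p.shape, v.shape)  # torch.Size([1, 362]) torch.Size([1, 1])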

3) The raw neural network, without using any lookahead, achieved an Elo
rating of 3,055. ... AlphaGo Zero achieved a rating of 5,185.

The increase of 2000 Elo from tree search sounds very high, but this may
just mean the value network is simply very good - and perhaps relatively
better than the policy one. (They previously had problems there that SL
> RL for the policy network guiding the tree search - but I'm not sure
there's any relation)

4) History features Xt; Yt are necessary because Go is not fully
observable solely from the current stones, as repetitions are forbidden.

This is a weird statement. Did they need 17 planes just to check for ko?
It seems more likely that history features are very helpful for the
internal understanding of the network as an optimization. That sucks
though - it's annoying for analysis and position setup.
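
On the ko point: repetition can be detected in the engine itself,
without feeding the network any history, e.g. by hashing earlier
whole-board positions. A rough sketch (Python; not how DeepMind or
anyone else necessarily does it, and situational superko would also
need the side to move):

import random

BOARD = 19
random.seed(42)
# one random 64-bit key per (colour, point), Zobrist-style
KEYS = [[random.getrandbits(64) for _ in range(BOARD * BOARD)]
        for _ in range(2)]

def position_hash(black_stones, white_stones):
    """Hash of the stones currently on the board - no history needed."""
    h = 0
    for pt in black_stones:
        h ^= KEYS[0][pt]
    for pt in white_stones:
        h ^= KEYS[1][pt]
    return h

seen = set()

def violates_superko(black_stones, white_stones):
    """True if this whole-board position already occurred in the game."""
    return position_hash(black_stones, white_stones) in seen

seen.add(position_hash({0, 20}, {1}))  # record a position after playing it
print(violates_superko({0, 20}, {1}))  # True
print(violates_superko({0}, {1}))      # False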

Lastly, the entire training procedure is actually not very complicated
at all, and hopefully the training is "faster" than previous
approaches - but many things look fast if you can throw 64 GPU workers
at a problem.

In this context, the graphs of the differing network architectures
causing huge strength discrepancies are both good and bad. Making a
better pick can cause you to get massively better results; take a bad
pick and you won't come close.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Deep Blue the end, AlphaGo the beginning?

2017-08-18 Thread Gian-Carlo Pascutto
On 18/08/2017 23:07, uurtamo . wrote:
> They run on laptops. A program that could crush a grandmaster will run
> on my laptop. That's an assertion I can't prove, but I'm asking you to
> verify it or suggest otherwise.

Sure.

> Now the situation with go is different.

For what it's worth, I would expect the next release of Zen to make this
achievable as well. Especially if it supports GPU acceleration, and you
have one of those laptops with a GTX 1080 in it :-) But yes, chess is
comparatively further ahead against humans.

> But if we do agree that the problem itself is fundamentally harder,
> (which I believe it is) and we don't want to ascribe its solution simply
> to hardware (which people tried to do with big blue), then we should
> acknowledge that it required more innovation.
> 
> I do agree, and hope that you do, that this innovation is all part of a
> continuum of innovation that is super exciting to understand.

Of course I do. That is the whole point I was making with "appreciating
the sharpened tools".

My objection was to the claim that making Deep Blue didn't require any
innovation or new methods at all. They beat Kasparov in 1997, not 2017!

There is a secondary argument whether the methods used for Deep Blue
generalize as well as the methods used for AlphaGo. I think that
argument may not be as simple and clear-cut as Kasparov implied, because
for one, there are similarities and crossover in which methods both
programs used.

But I understand where it comes from. SL/RL and DCNN's (more associated
with AlphaGo) seem like a broader hammer than tree search (more
associated with Deep Blue).

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Deep Blue the end, AlphaGo the beginning?

2017-08-18 Thread Gian-Carlo Pascutto
On 18/08/2017 20:34, Petr Baudis wrote:
>   You may be completely right!  And yes, I was thinking about Deep Blue
> in isolation, not that aware about general computer chess history.  Do
> you have some suggested reading regarding Deep Blue and its lineage and
> their contributions to the field of AI at large?

I sure do. I *love* Hsu's papers, although it's hard now to imagine the
context some of his statements were made in, 30 years ago, because we
know the outcome. Back then, I'm sure many people thought he was a
raving lunatic.

His key idea was that, given ample evidence that program strength scaled
fairly well with computer speed, we should make the programs faster. He
did so by converting chess search to a literal hardware circuit, in an
implementation that was both algorithmically more efficient, and faster,
achieving about *3 orders of magnitude* improvement over what was then
the state of the art. And then designing a parallel search that worked
with it and scaled well enough.

Saying that these "implemented existing methods" is factually wrong, and
betrays a deep misunderstanding of the importance of computing power in
AI research. But I'll get back to this, later.

The original paper, "A Two-Million Moves/s CMOS Single-Chip Chess Move
Generator" was published in 1987. In the conclusion, it states "The best
chess machine now in existence is still about 400 to 500 rating points
below the Human World Chess Champion. Earlier experimental evidence
shows that each doubling in machine speed roughly corresponds to a 100
rating points increase...It is questionable that this remains true at
high level play. But nonetheless with a potential 100-1000 fold speed-up
at the door, *something interesting is probably about to happen*."

In his PhD thesis, he goes further, and draws the scaling graph with
Kasparov on it, at the end, and says in the introduction: "This
dissertation is mainly a collection of exploratory work on what I call
the "ultimate" chess machine - a chess machine that is capable of
searching at least 100 million nodes per second and possibly beyond 1
billion nodes per second. Current evidence seems to indicate that such a
machine will have an overwhelming chance of defeating the human World
Chess Champion."

He wrote that in 1989!

Kasparov, the same expert whose claims about Go and chess started this
very thread, had said the year before that no Grandmaster would be
defeated in tournament play before the year 2000. That gives you some
idea how outlandish Hsu's ideas seemed at the time. Or, for that matter,
how reliable Kasparov's opinion is in these matters.

Hsu achieved his goal in 1997, with 3 years to spare. Kasparov's
response was to call him a cheater.

Now, now, you might be thinking, was it all about speed? It was not -
the above was just Hsu's shtick, who was one member of the team. But, do
not for a moment make the mistake of underestimating just how important
speed is.

Do you know why, decades after having discarded them, we suddenly
started using neural networks again - and why they turn out to work so
well for Go now?

It's because we have several orders of magnitude more computing power.
Made possible by dedicated chips for neural network computations (OK, so
maybe they were intended for computer games - turns out the functions
are pretty similar, not to speak of TPUs).

And Hsu? He's working on FPGA's at Microsoft, who're mainly using them
to accelerate AI research and applications. In one of his last
interviews, in 2007, he predicted that "world-champion-level Go machine
can be built within 10 years." He got the details of the approach wrong,
though.

Others members also published several papers, i.e. Murray Campbell,
Thomas Anantharaman and Andreas Nowatzyk.

Nowatzyk has published the original automated evaluation tuning code
used by Deep Thought. It's available, together with an explanation, at
http://www.tim-mann.org/deepthought.html

This was significant, because software-based programs at the time had to
trade off evaluation terms for speed, so they mostly had very few and
could rely on manual tuning. Existing methods, you say?

Anantharaman's best-known work is the publication of Singular
Extensions. The contribution of this method is somewhat hazy - with Hsu
admitting that they overestimated the initial gain from them due to
measurement errors - but improved methods are in fact in use in current
top of the line chess engines.

Campbell has published on a bunch of subjects. A ton on parallel game
tree search, and a method for biasing the tree search based on human
games. We call that a policy net, nowadays. Ok, maybe I'm stretching a
bit here.

Now, as to how these methods "contributed to the AI field at large",
which I interpret as asking how well they generalize, that's an
interesting question. But it's also an interesting question that you can
ask of AlphaGo's contributions. Doing move prediction with a DCNN was
first done by Clark and Storkey of the 

Re: [Computer-go] Deep Blue the end, AlphaGo the beginning?

2017-08-18 Thread Gian-Carlo Pascutto
On 17-08-17 21:35, Darren Cook wrote:
> "I'm sure some things were learned about parallel processing... but the
> real science was known by the 1997 rematch... but AlphaGo is an entirely
> different thing. Deep Blue's chess algorithms were good for playing
> chess very well. The machine-learning methods AlphaGo uses are
> applicable to practically anything."
> 
> Agree or disagree?

Deep Thought (the predecessor of Deep Blue) used a Supervised Learning
approach to set the initial evaluation weights. The details might be
lost in time but it's reasonable to assume some were carried over to
Deep Blue. Deep Blue itself used hill-climbing to find evaluation
features that did not seem to correlate with strength much, and improve
them.

A lot of the strength of AlphaGo comes from a fast, parallelized tree
search.

Uh, what was the argument again?

Maybe we should stop inventing artificial differences and appreciate
that the tools in our toolbox have become much sharper over the years.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread Gian-Carlo Pascutto
On 24-07-17 16:07, David Wu wrote:
> Hmm. Why would discounting make things worse? Do you mean that you
> want the top move to drop off slower (i.e. for the bot to take longer
> to achieve the correct valuation of the top move) to give it "time"
> to search the other moves enough to find that they're also bad?

I don't want the top move to drop off slower, I just don't want to play
other moves until they've been searched to comparable "depth".

If there's a disaster lurking behind the main-variation that we only
just started to understand, the odds are, the same disaster also lurks
in a few of the alternative moves.

> I would have thought that with typical exploration policies, whether
> the top move drops off a little faster or a little slower, once its
> winrate drops down close to the other moves, the other moves should
> get a lot of simulations as well.

Yes. But the goal of the discounting is, that a new move can make it
above the old one, despite having had less total search effort.

My point is that it is not always clear this is a positive effect.

> I know that there are ways to handle this at the root, via time
> control or otherwise.

The situation isn't necessarily different here, if you consider that at
the root the best published technique is still "think longer so the new
move can overtake the old one", not "play the new move".

Anyway, not saying this can't work. Just pointing out the problem areas.

I would be a bit surprised if discounting worked for Go because it's
been published for other areas (e.g. Amazons) but I don't remember any
reports of success in Go. But the devil can be in the details (i.e. the
discounting formula) for tricks like this.
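
For reference, the kind of discounting I mean is usually written as an
exponential decay of the accumulated statistics, and the decay factor
is exactly where the devil lives. A toy sketch (Python, constants made
up):

class Node:
    def __init__(self):
        self.visits = 0.0
        self.wins = 0.0

    def update(self, result, gamma=0.999):
        """Discounted backup: old simulations fade out, so recent results
        - and newly discovered moves - can overtake an old favourite
        without matching its total visit count. gamma=1.0 is plain MCTS."""
        self.visits = gamma * self.visits + 1.0
        self.wins = gamma * self.wins + result

    def winrate(self):
        return self.wins / self.visits if self.visits > 0 else 0.5

# toy illustration: 1000 early wins followed by 200 recent losses
n = Node()
for _ in range(1000):
    n.update(1.0)
for _ in range(200):
    n.update(0.0)
print(round(n.winrate(), 3))  # noticeably below the undiscounted 1000/1200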

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] KGS Bot tournament July

2017-07-09 Thread Gian-Carlo Pascutto
On 9/07/2017 17:41, "Ingo Althöfer" wrote:
> Hello,
> 
> it seems that the KGS bot tournament did not start, yet.
> What is the matter?

The tournament was played, I am not sure why the standings did not update.

If I'm reading the game histories correctly:

1. Zen    7 pts
2. Leela  4 pts
3. Aya    3 pts
4. gnugo  0 pts

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] July KGS bot tournament

2017-07-08 Thread Gian-Carlo Pascutto
On 8/07/2017 9:07, Nick Wedd wrote:
> The July KGS bot tournament will be on Sunday, July 7th, starting at
> 08:00 UTC and end by 15:00 UTC.  It will use 19x19 boards, with
> time limits of 14 minutes each and  very fast Canadian overtime, and
> komi of 7½.  It will be a Swiss tournament.
>  See http://www.gokgs.com/tournInfo.jsp?id=1116
> 

The announced time control doesn't match up with the one on the web page.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Gian-Carlo Pascutto
On 19-06-17 17:38, Vincent Richard wrote:

> During my research, I’ve trained a lot of different networks, first on
> 9x9 then on 19x19, and as far as I remember all the nets I’ve worked
> with learned quickly (especially during the first batches), except the
> value net which has always been problematic (diverge easily, doesn't
> learn quickly,...) . I have been stuck on the 19x19 value network for a
> couple months now. I’ve tried countless of inputs (feature planes) and
> lots of different models, even using the exact same code as others. Yet,
> whatever I try, the loss value doesn’t move an inch and accuracy stays
> at 50% (even after days of training). I've tried to change the learning
> rate (increase/decrease), it doesn't change. However, if I feed a stupid
> value as target output (for example black always win) it has no trouble
> learning.
> It is even more frustrating that training any other kind of network
> (predicting next move, territory,...) goes smoothly and fast.
> 
> Has anyone experienced a similar problem with value networks or has an
> idea of the cause?

1) What is the training data for the value network? How big is it, how
is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.
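
To illustrate what I mean by the final layers: a value head typically
ends in a 1x1 convolution, a small fully connected layer and a tanh (or
sigmoid) output, and a mismatch between that output range and the
targets or loss is an easy way to sit at 50% forever. A rough sketch
(Python/PyTorch; the layer sizes are my guess, loosely following the
Aya-style head discussed elsewhere on this list):

import torch
import torch.nn as nn

class ValueHead(nn.Module):
    """Final layers of a value network: reduce to 1 plane, flatten,
    one small FC layer, then a single tanh output in [-1, 1]."""
    def __init__(self, channels=32, board=19):
        super().__init__()
        self.reduce = nn.Conv2d(channels, 1, kernel_size=1)
        self.fc1 = nn.Linear(board * board, 256)
        self.fc2 = nn.Linear(256, 1)

    def forward(self, x):
        y = torch.relu(self.reduce(x))
        y = torch.relu(self.fc1(y.flatten(1)))
        return torch.tanh(self.fc2(y))

head = ValueHead()
out = head(torch.zeros(2, 32, 19, 19))
print(out.shape)  # torch.Size([2, 1])
# train with MSE against +1/-1 game outcomes; an output nonlinearity or
# target range that doesn't match is the sort of final-layer mistake
# that leaves the loss (and the 50% accuracy) parked.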

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Gian-Carlo Pascutto
On 24-05-17 05:33, "Ingo Althöfer" wrote:
>> So, 0.001% probability. Demis commented that Lee Sedol's winning move in
>> game 4 was a one in 10 000 move. This is a 1 in 100 000 move.
> 
> In Summer 2016 I checked the games of AlphaGo vs Lee Sedol
> with repeated runs of CrazyStone DL:
> In 3 of 20 runs the program selected P10. It
> turned out that a rather early "switch" in the search was
> necessary to arrive at P10. But if CS did that it
> remained with this candidate.

I guess it's possible this move is selected by a policy other than the
neural network. Or perhaps the probability can be much higher with a
differently trained policy net.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Xeon Phi result

2017-06-07 Thread Gian-Carlo Pascutto
Hi all,

I managed to get a benchmark off of a Intel® Xeon Phi™ Processor 7250
16GB, 1.40 GHz, 68 core (272 thread) system.

I used a version of Leela essentially identical to the public Leela
0.10.0, but compiled with -march=knl (using gcc 5.3), using an
appropriate version of Intel MKL (2017.1 for MIC) and increasing the
maximum amount of threads.

benchmark:

~ 151000 g/s (557 g/s per thread)

netbench:

predictions ->   670 p/s
evaluations ->  3007 p/s

This was with the 16G HBM bound as addressable memory. Using the regular
DDR4 cuts "netbench" numbers in half, but has no big impact on "benchmark".

This means it's about 5 times faster (in integer operations) than a
quad-core desktop with HT, and roughly similar in floating point
performance to a mid-range videocard.

It's a nice machine, if a bit pricey.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 17:19, Hideki Kato wrote:
> Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>:
> 
>> Now, even the original AlphaGo played moves that surprised human pros
>> and were contrary to established sequences. So where did those come
>> from? Enough computation power to overcome the low probability?
>> Synthesized by inference from the (much larger than mine) policy network?
> 
> Demis Hassabis said in a talk:
> After the game with Sedol, the team used "adversarial learning" in 
> order to fill the holes in policy net (such as the Sedol's winning 
> move in the game 4).

I said, the "original AlphaGo", i.e. the one used in the match against
Lee Sedol. According to the Nature paper, the policy net was trained
with supervised learning only [1]. And yet...

In the attached SGF, AlphaGo played P10, which was considered a very
surprising move by all commentators. Presumably, this means it's not
seen in high level human play, and would not get a high rating in the
policy net. I can sort-of confirm this:

0.295057654 (E13)
...(60 more moves follow)...
0.000011952 (P10)

So, 0.001% probability. Demis commented that Lee Sedol's winning move in
game 4 was a one in 10 000 move. This is a 1 in 100 000 move.
Differently trained policy nets might rate it a bit higher or lower, but
simply due to the fact that was considered very un-human to do, it seems
unlikely to ever be rated highly by a policy net based on supervised
learning.

So in AlphaGo's formula, you're dealing with a reduction of the UCT term
by a factor 100 000 plus or minus some order of magnitude.

  D6 -> 1359934 (W: 53.21%) (U: 49.34%) (V: 55.15%:  38918) (N:  6.3%)
PV: D6 F6 E7 F7 C8 B8 D7 B7 E9 C9 F8 H7 H9 K7 H3 K9
...many moves...
 P10 -> 421 (W: 52.68%) (U: 50.09%) (V: 53.98%:  8) (N:  0.0%)
PV: P10 Q10 P8 Q9

Now, of course AlphaGo had a few orders of magnitude more hardware, but
you can see from the above that it's, eh, not easy for P10 to overtake
the top moves here in playout count.

And yet, that's the move that was played.

[1] I'm assuming that what played the match corresponds to what they
published there - maybe that is my mistake. I'm not sure I remember the
relevant timeline correctly.

-- 
GCP


sedol.sgf
Description: application/go-sgf
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 10:51, Hideki Kato wrote:
> (2) The number of possible positions (input of the value net) in 
> real games is at least 10^30 (10^170 in theory).  If the value 
> net can recognize all?  L depend on very small difference of 
> the placement of stones or liberties.  Can we provide necessary 
> amount of training data?  Have the network enough capacity?  
> The answer is almost obvious by the theory of function 
> approximation.  (ANN is just a non-linear function 
> approximator.)

DCNN clearly have some ability to generalize from learned data and
perform OK even with unseen examples. So I don't find this a very
compelling argument. It's not like Monte Carlo playouts are going to
handle all sequences correctly either.

Evaluations are heuristic guidance for the search, and a help when the
search terminates in an unresolved position. Having multiple independent
ones improves the accuracy of the heuristic - a basic ensemble.

> (3) CNN cannot learn exclusive-or function due to the ReLU 
> activation function, instead of traditional sigmoid (tangent 
> hyperbolic).  CNN is good at approximating continuous (analog) 
> functions but Boolean (digital) ones.

Are you sure this is correct? Especially if we allow leaky ReLU?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 22-05-17 21:01, Marc Landgraf wrote:
> But what you should really look at here is Leelas evaluation of the game.

Note that this is completely irrelevant for the discussion about
tactical holes and the position I posted. You could literally plug any
evaluation into it (save for a static oracle, in which case why search
at all...) and it would still have the tactical blindness being discussed.

It's an issue of limitations of the policy network, combined with the
way one uses the UCT formula. I'll use the one from the original AlphaGo
paper here, because it's public and should behave even worse:

u(s, a) = c_puct * P(s, a) * sqrt(total_visits / (1 + child_visits))

Note that P(s, a) is a direct factor here, which means that for a move
ignored by the policy network, the UCT term will almost vanish. In other
words, unless the win is immediately visible (and for tactics it won't),
you're not going to find it. Also note that this is a deviation from
regular UCT or PUCT, which do not have such a direct term and hence only
have a disappearing prior, making the search eventually more exploratory.
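
To put numbers on how quickly that term vanishes, a quick sketch
(Python; c_puct and the visit counts are arbitrary):

import math

def u_term(c_puct, prior, total_visits, child_visits):
    """Exploration bonus, exactly as written above."""
    return c_puct * prior * math.sqrt(total_visits / (1.0 + child_visits))

# a move the policy net likes (P=0.3) vs. one it ignores (P=0.00001),
# both unvisited, after 10000 simulations at the node:
print(u_term(5.0, 0.3, 10000, 0))      # 150.0
print(u_term(5.0, 0.00001, 10000, 0))  # 0.005 - effectively invisible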

Now, even the original AlphaGo played moves that surprised human pros
and were contrary to established sequences. So where did those come
from? Enough computation power to overcome the low probability?
Synthesized by inference from the (much larger than mine) policy network?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 03:39, David Wu wrote:
> Leela playouts are definitely extremely bad compared to competitors like
> Crazystone. The deep-learning version of Crazystone has no value net as
> far as I know, only a policy net, which means it's going on MC playouts
> alone to produce its evaluations. Nonetheless, its playouts often have
> noticeable and usually correct opinions about early midgame game
> positions (as confirmed by the combination of own judgment as a dan
> player and Leela's value net). Which I find amazing - that it can even
> approximately get these right.

Leela's Monte Carlo playouts were designed and implemented in 2007,
before most of the current literature around them was public. Back then,
they were very "thick" and good enough to make the program one of the
strongest around. Needless to say, in the ~9 years or so when I was
absent from go programming, others made substantial progress in that
area, especially as before value nets this was clearly one of the most
important components of strength. Leela's Monte Carlo playouts for sure
are weaker than those of Crazy Stone and Zen, and even pachi. I have
done work on this in the last year, but a more complete overhaul isn't
in 0.10.0 yet.

Nevertheless (as you also observe below) they still contribute a benefit
to the strength of the engine. That's why I've been consistently saying
dropping them doesn't seem to be good, and why I like the orthogonality
they provide with the value net (and am generally wary of methods that
tune the playouts with or towards the value net).

> So clearly what's going on is that the playouts allow suicide,

I'll need to reconstruct the position you set up, but this is something
that shouldn't happen. Thank you for pointing it out, I'll try to
confirm on my side.

> Now I'm just speculating. My guess is that somehow 3% of the time, the
> game is scored without black having captured white's group. As in -
> black passes, white passes, white's dead group is still on the board, so
> white wins. The guess would be that liberties and putting it in atari
> increases the likelihood that the playouts kill the group before having
> both players pass and score. But that's just a guess, maybe there's also
> more black magic involving adjusting the "value" of a win depending on
> unknown factors beyond just having a "big win". Would need Gian-Carlo to
> actually confirm or refute this guess though.

Leela allows passes with a very low probability, so your analysis is
probably right.

> given that they're a significant weight in the evaluation
> alongside the value net, they're probably one of the major things
> holding Leela back at this point.

I assume that as well, which is why I've been doing some work on them,
but I'm also prepared to be disappointed. Note that I didn't put the
significant weighting arbitrarily: it's set to what gave the maximum
playing strength.

I suspect that when there are multiple options that seem objectively
equally good (from the value net), the playouts also help play towards
the option where it is harder to mess up. In this case, a larger amount
of stochasticity is not a bad thing.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 17:47, Erik van der Werf wrote:
> On Mon, May 22, 2017 at 3:56 PM, Gian-Carlo Pascutto <g...@sjeng.org
> <mailto:g...@sjeng.org>> wrote:
> 
> Well, I think that's fundamental; you can't be wide and deep at the same
> time, but at least you can chose an algorithm that (eventually) explores
> all directions.

Right. But I'm uncomfortable with the current setup, because many
options won't get explored at all in practical situations. It would seem
logical that some minimum amount of (more spread) search effort would
plug enough holes to stop bad blunders, but finding a way to do that and
preserve strength seems elusive so far.

> BTW I'm a bit surprised that you are still able to find 'big tactical
> holes' with Leela now playing as 8d KGS

I've attached an example. It's not the prettiest one (one side has a 0.5
pt advantage in the critical variation so exact komi is an issue), but
it's a recent one from my mailbox.

This is with Leela 0.10.0 so you can follow along:

Leela: loadsgf tactics2.sgf 261
=


Passes: 0Black (X) Prisoners: 7
White (O) to moveWhite (O) Prisoners: 19

   a b c d e f g h j k l m n o p q r s t
19 . X O O . . . . . . . O O X X X O X . 19
18 . X X O O . . . . O O O X X X O O . O 18
17 . . X X O . . . O O X O O X . X O O . 17
16 . . X O O . . O O X X X X . . X X X . 16
15 . . X O . . O X X X X . . . . . X . X 15
14 . . . X O . O X O O O X . X X O O X(X)14
13 . . . X O O O O O X X X X O O X X X O 13
12 . . . . X X . O X X O O X O . O O X O 12
11 . . . . . . . O O X O X X O . . O O O 11
10 . . . X . X X O O O O O X O O O O . . 10
 9 . . . . X . X X O . O X O X X X O . .  9
 8 . . . . X . X O . . O X O O X O . . .  8
 7 X X X X O X X O . O O X O X X O O . .  7
 6 O O O O O O O . . . O X O . X X O . .  6
 5 X O O . X O O O O O X X X X X O . . .  5
 4 X X O O O X X O . O X X . X O O . . .  4
 3 X . X X X X O . O . O X . . X O . . .  3
 2 X O X . . O O . O . O O X X X O O . .  2
 1 . X . O O . . O O . O X X . X X O . .  1
   a b c d e f g h j k l m n o p q r s t

Hash: 106A3898CEC94132 Ko-Hash: 67E390C41BF2577

Black time: 00:30:00
White time: 00:30:00

Leela: heatmap
=

94.46% G11
 4.20% C1
 1.31% E2
 0.03% all other moves together

Note that O16 P17 N15 wins immediately. It's not that Leela is
completely blind to it, because that sequence is in some variations. But
in here, O16 won't get searched for a long time (it is actually the 4th
rated move) due to the skewed probabilities.

Leela: play w g11
=

Leela: heatmap
=

99.9% F11
 0.1% C1
   0% all other moves together

Leela: genmove b


Score looks bad. Resigning.
= resign

https://timkr.home.xs4all.nl/chess2/resigntxt.htm

If black plays O16 instead here he wins by 0.5 points.

-- 
GCP


tactics2.sgf
Description: application/go-sgf
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 15:46, Erik van der Werf wrote:
> Oh, haha, after reading Brian's post I guess I misunderstood :-)
> 
> Anyway, LMR seems like a good idea, but last time I tried it (in Migos)
> it did not help. In Magog I had some good results with fractional depth
> reductions (like in Realization Probability Search), but it's a long
> time ago and the engines were much weaker then...

What was generating your probabilities, though? A strong policy DCNN or
something weaker?

ERPS (LMR with fractional reductions based on move probabilities) with
alpha-beta seems very similar to having MCTS with the policy prior being
a factor in the UCT formula. This is what AlphaGo did according to their
2015 paper, so it can't be terrible, but it does mean that you are 100%
blind to something the policy network doesn't see, which seems
worrisome. I think I asked Aja once about what they do with first play
urgency given that the paper doesn't address it - he politely ignored
the question :-)

The obvious defense (when looking at it in alpha-beta formulation) would
be to cap the depth reduction, and (in MCTS/UCT formulation) to cap the
minimum probability. I had no success with this in Go so far.
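
The prior cap itself is trivial to write down - the hard part is that
it doesn't seem to help. A sketch (Python, threshold made up):

def floored_priors(priors, floor=0.005):
    """Clamp every legal move's prior to at least `floor`, then
    renormalise. No move's exploration term can vanish completely, but
    the floor is stolen from the moves the network actually likes,
    which may be part of why it costs strength in practice."""
    clamped = [max(p, floor) for p in priors]
    total = sum(clamped)
    return [p / total for p in clamped]

print(floored_priors([0.9, 0.0999, 0.0001]))
# the 0.0001 move now keeps about 0.5% of the probability mass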

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 14:48, Brian Sheppard via Computer-go wrote:
> My reaction was "well, if you are using alpha-beta, then at least use
> LMR rather than hard pruning." Your reaction is "don't use 
> alpha-beta", and you would know better than anyone!

There's 2 aspects to my answer:

1) Unless you've made a breakthrough with value nets, there appears to
be a benefit to keeping the Monte Carlo simulations.

2) I am not sure the practical implementations of both algorithms end up
searching in a different manner.

(1) Is an argument against using alpha-beta. If we want to get rid of
the MC simulations - for whatever reason - it disappears. (2) isn't an
argument against. Stating the algorithm in a different manner may make
some heuristics or optimizations more obvious.

> Yes, LMR in Go has is a big difference compared to LMR in chess: Go 
> tactics take many moves to play out, whereas chess tactics are often
>  pretty immediate.

Not sure I agree with the basic premise here.

> So LMR could hurt Go tactics much more than it hurts chess tactics.
> Compare the benefit of forcing the playout to the end of the game.

LMR doesn't prune anything, it just reduces the remaining search depth
for non-highly rated moves. So it's certainly not going to make
something tactically weaker than hard pruning? If you're talking about
not pruning or reducing at all, you get the issue of the branching
factor again.
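
For those less familiar with chess engines, a bare-bones sketch of LMR
inside a negamax alpha-beta search (Python; the toy position class is
only there so the snippet runs, it is not a real game):

import random

class ToyPos:
    """Tiny stand-in for a game position: a random tree with heuristic
    scores at the horizon."""
    def __init__(self, seed=0):
        self.seed = seed
    def legal_moves(self):
        return list(range(8))
    def play(self, move):
        return ToyPos(self.seed * 8 + move + 1)
    def evaluate(self):
        return random.Random(self.seed).uniform(-1.0, 1.0)
    def game_over(self):
        return False

def policy(move):
    return -move  # pretend the move predictor prefers low-numbered moves

def lmr_search(pos, depth, alpha, beta):
    """Negamax alpha-beta with Late Move Reductions: late, low-rated
    moves are searched to a reduced depth first, and only re-searched at
    full depth if they unexpectedly beat alpha. Nothing is pruned
    outright."""
    if depth <= 0 or pos.game_over():
        return pos.evaluate()
    moves = sorted(pos.legal_moves(), key=policy, reverse=True)
    for i, move in enumerate(moves):
        child = pos.play(move)
        reduction = 1 if (i >= 3 and depth >= 2) else 0
        score = -lmr_search(child, depth - 1 - reduction, -beta, -alpha)
        if reduction and score > alpha:
            score = -lmr_search(child, depth - 1, -beta, -alpha)  # verify
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

print(round(lmr_search(ToyPos(), depth=4, alpha=-2.0, beta=2.0), 3))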

In chess you have quiescent search to filter out the simpler tactics. I
guess Monte Carlo simulations may act similar in that they're going to
raise/lower the score if in some simulations tactical shenanigans happen.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 11:27, Erik van der Werf wrote:
> On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto <g...@sjeng.org
> <mailto:g...@sjeng.org>> wrote:
> 
> ... This heavy pruning
> by the policy network OTOH seems to be an issue for me. My program has
> big tactical holes.
> 
> 
> Do you do any hard pruning? My engines (Steenvreter,Magog) always had a
> move predictor (a.k.a. policy net), but I never saw the need to do hard
> pruning. Steenvreter uses the predictions to set priors, and it is very
> selective, but with infinite simulations eventually all potentially
> relevant moves will get sampled.

With infinite simulations everything is easy :-)

In practice moves with, say, a prior below 0.1% aren't going to get
searched, and I still regularly see positions where they're the winning
move, especially with tactics on the board.

Enforcing the search to be wider without losing playing strength appears
to be hard.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote:
> Could use late-move reductions to eliminate the hard pruning. Given
> the accuracy rate of the policy network, I would guess that even move
> 2 should be reduced.
> 

The question I always ask is: what's the real difference between MCTS
with a small UCT constant and an alpha-beta search with heavy Late Move
Reductions? Are the explored trees really so different?

In any case, in my experiments Monte Carlo still gives a strong benefit,
even with a not so strong Monte Carlo part. IIRC it was the case for
AlphaGo too, and they used more training data for the value network than
is publicly available, and Zen reported the same: Monte Carlo is important.

The main problem is the "only top x moves part". Late Move Reductions
are very nice because there is never a full pruning. This heavy pruning
by the policy network OTOH seems to be an issue for me. My program has
big tactical holes.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Patterns and bad shape

2017-04-18 Thread Gian-Carlo Pascutto
On 17-04-17 15:04, David Wu wrote:

> If you want an example of this actually mattering, here's example where
> Leela makes a big mistake in a game that I think is due to this kind of
> issue.

Ladders have specific treatment in the engine (which also has both known
limitations and actual bugs in 0.9.0 with it) so trying to infer
anything about learning, patterns, bad shape, example games etc. with
them isn't going to work.

At this point you're likely to have ascribed some deeper meaning to a
typing mistake on my part.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zen lost to Mi Yu Ting

2017-03-22 Thread Gian-Carlo Pascutto
On 22-03-17 16:27, Darren Cook wrote:
> (Japanese rules are not *that* hard. IIRC, Many Faces, and all other
> programs, including my own, scored in them

There is a huge difference between doing some variation of territory
scoring and implementing Japanese rules. Understanding this difference
will get you some way to understanding why some people do not like them,
and that has got nothing to do with computer go.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zen lost to Mi Yu Ting

2017-03-22 Thread Gian-Carlo Pascutto
On 22-03-17 09:41, Darren Cook wrote:
>> The issue with Japanese rules is easily solved by refusing to play
>> under ridiculous rules. Yes, I do have strong opinions. :)
> 
> And the problem with driver-less cars is easily "solved" by banning
> all road users that are not also driver-less cars (including all 
> pedestrians, bikes and wild animals).

I think you misunderstand the sentiment completely. It is not: Japanese
rules are difficult for computers, so we don't like them.

It is: Japanese rules are problematic on many levels, so we prefer to
work with Chinese ones and as a consequence that's what the programs are
trained for and tested on. It is telling that Zen is having these
troubles despite being made by Japanese programmers. I believe the
saying for this is "voting with your feet".

> Or how about this angle: humans are still better than the programs
> at Japanese rules. Therefore this is an interesting area of study.

Maybe some people are interested in studying Japanese rules, like
finding out what they actually are
(http://home.snafu.de/jasiek/j1989c.html). That's fine, but not all that
interesting for AI, or, actually, computer go.

Of course, commercial programs that need to cater to a Japanese (or
Korean) audience are stuck. As are people that want to play the UEC Cup etc.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zen lost to Mi Yu Ting

2017-03-22 Thread Gian-Carlo Pascutto
On 22-03-17 00:36, cazen...@ai.univ-paris8.fr wrote:
> 
> Why can't you reuse the same self played games but score them

If you have self-play games that are played to the final position so
scoring is fool-proof, then it could work. But I think things get really
interesting when timing of a pass matters (which is the kind of
situation we're trying to resolve) and you're using pure policy players.

Does your DCNN-only player know *precisely* when to pass *first* under
Japanese rules?

> The policy network does not use the komi to choose its moves so it
> should make no difference.

Do you not play different moves when you are behind 0.5 points compared
to when you're ahead 0.5 points?

(Or if you're ignoring komi completely, behind multiple stones vs ahead
multiple stones?)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zen lost to Mi Yu Ting

2017-03-21 Thread Gian-Carlo Pascutto
On 21/03/2017 21:08, David Ongaro wrote:
>> But how would you fix it? Isn't that you'd need to retrain your value
>> network from the scratch?
> 
> I would think so as well. But I some months ago I already made a
> proposal in this list to mitigate that problem: instead of training a
> different value network for each Komi, add a “Komi adjustment” value as
> input during the training phase. That should be much more effective,
> since the “win/lost” evaluation shouldn’t change for many (most?)
> positions for small adjustments but the resulting value network (when
> trained for different Komi adjustments) has a much greater range of
> applicability.

The problem is not the training of the network itself (~2-4 weeks of
letting a program someone else wrote run in the background, easiest
thing ever in computer go), or whether you use a komi input or a
separate network, the problem is getting data for the different komi values.

Note that if getting data is not a problem, then a separate network
would perform better than your proposal.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] AMD Ryzen benchmarks for Go

2017-03-10 Thread Gian-Carlo Pascutto
Linux 4.10.1 (has SMT scheduler fix)
GCC 5.4 - so no Ryzen optimizations

pachi-git-13115394
Intel Haswell
t=8   13325 g/s t=1 1665 g/s   @3.6GHz
t=49352 g/s t=1 2338 g/s   @3.6GHz
t=12542 g/s@3.8GHz

AMD Ryzen
t=16  26589 g/s t=1 1661 g/s   @3.7GHz
t=8   15464 g/s t=1 1933 g/s   @3.7GHz
t=48141 g/s t=1 2035 g/s   @3.7GHz
t=12221 g/s@3.7GHz

Leela 0.9.4
Intel Haswell @3.8GHz (OpenBLAS Haswell BLAS)
1 thread
benchmark = 5685 g/s   (mostly INT)
netbench predictions = 33 p/s  (DCNN AVX2 FPU)
netbench evaluations = 238 p/s (DCNN AVX2 FPU)

AMD Ryzen @3.7GHz (OpenBLAS Haswell BLAS)
1 thread
benchmark = 5099 g/s   (mostly INT)
netbench predictions = 27 p/s  (DCNN AVX2 FPU)
netbench evaluations = 239 p/s (DCNN AVX2 FPU)

Observations:
- SMT performance of Ryzen appears to be extremely good (+72% on pachi
vs +42% Intel).
- Single core IPC is 8.5% ~ 11.5% behind Haswell.
- Ryzen's AVX2 performance is too good. Ryzen has 2 x 128 bit FPU vs. 2
x 256 bit FPU for Haswell, and the majority of the time in Leela 0.9 is
spent in SGEMM, which is an ideal case for AVX2 code. I would have
predicted AVX2 results to be about half as fast on Ryzen, but its
results are extremely competitive or even better. I have no real
explanation for this, my best guess is a win due to a better fit cache
subsystem.
- By default OpenBLAS selects the Barcelona kernel for Ryzen (ugh!).
Overriding with the Haswell kernel gives much better results due to AVX2
usage.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] New AMD processors

2017-03-03 Thread Gian-Carlo Pascutto
On 03-03-17 21:29, "Ingo Althöfer" wrote:
> Hi,
> 
> AMD has published a new (fast and cool) processor, the Ryzen.
> Did some go programmers already collect experiences with it?
> Do they combine well with GPUs?

I'm not getting one until there are mainboard reviews out, because there
seem to be early BIOS issues, but some things can be predicted:

- *without* GPU: Ryzen has half the AVX power, so running the DCNN on
the CPU goes at half speed. But this is not the only thing the program
needs to do, and you do get twice the amount of cores. So I think it
should easily make up for the difference, compared to the 4 core Intel
chips.

- *with* GPU: in this case only integer performance matters, which
appears to be outstanding. e.g.:
http://www.hardware.fr/articles/956-14/ia-echecs-stockfish-komodo.html

Price/performance should be outstanding.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] UEC wild cards?

2017-02-24 Thread Gian-Carlo Pascutto
On 21/02/2017 16:11, "Ingo Althöfer" wrote:
> Dear UEC organizers,
> 
> GCP wrote (on behalf of Leela):
>> I did not register for the UEC Cup. I seem to be in good company there,
>> sadly.
> 
> do you have a few wild cards for strong late entries?

Posting on behalf of the UEC organizers:

Yes, and one was offered to Leela.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Leela Superstar!

2017-02-21 Thread Gian-Carlo Pascutto
On 21-02-17 16:27, Aja Huang via Computer-go wrote:
> Congrats for Leela's significant improvements. :)

Thank you.

When I said I was "in good company" by not having registered for the UEC
Cup, I was actually referring to you (AlphaGo), BTW.

I feel that maybe Ingo may have misunderstood me there :-)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Leela Superstar!

2017-02-21 Thread Gian-Carlo Pascutto
On 19-02-17 17:00, "Ingo Althöfer" wrote:
> Hi,
> the rank graph of LeelaX on KGS looks impressive:
> 
> http://www.dgob.de/yabbse/index.php?action=dlattach;topic=6048.0;attach=5658;image
> 
> Of course, its shape will be more "gnubbled" after a few days.

Thank you for the kind words, it is appreciated.

I need my machine back to try a few more things, so Leela won't be on
KGS for a while. (I was surprised to see it on CGOS - it's not my account).

I did not register for the UEC Cup. I seem to be in good company there,
sadly.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Playout policy optimization

2017-02-13 Thread Gian-Carlo Pascutto
On 12/02/2017 5:44, Álvaro Begué wrote:

> I thought about this for about an hour this morning, and this is what I
> came up with. You could make a database of positions with a label
> indicating the result (perhaps from real games, perhaps similarly to how
> AlphaGo trained their value network). Loop over the positions, run a few
> playouts and tweak the move probabilities by some sort of reinforcement
> learning, where you promote the move choices from playouts whose outcome
> matches the label, and you discourage the move choices from playouts
> whose outcome does not match the label.
> 
> The point is that we would be pushing our playout policy to produce good
> estimates of the result of the game, which in the end is what playout
> policies are for.
> 
> Any thoughts? Did anyone actually try something like this?

This is how Facebook trained the playout policy of Darkforest. I
couldn't tell from the paper, but inspecting the code shows exactly this
algorithm at work.
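
In spirit the update is a promote/discourage nudge on the weights of
the features of the moves a playout chose, conditioned on whether that
playout's result agreed with the position's label. A crude sketch of
that spirit (Python; this is my paraphrase, not their actual code, and
a proper version would use a real policy-gradient step):

def reinforce_playout(weights, chosen_feature_sets, outcome_matched_label,
                      lr=0.01):
    """After one playout from a labelled position: if the playout's
    result agreed with the label, push up the weights of the features of
    the moves it chose; otherwise push them down. A caricature of the
    real gradient (which would also subtract the expected feature
    counts), but it shows the promote/discourage idea."""
    sign = 1.0 if outcome_matched_label else -1.0
    for feats in chosen_feature_sets:  # one feature set per move played
        for f in feats:
            weights[f] = weights.get(f, 0.0) + sign * lr

w = {}
reinforce_playout(w, [("atari_escape",), ("pattern_3x3_17",)], True)
reinforce_playout(w, [("pattern_3x3_17",)], False)
print(w)  # {'atari_escape': 0.01, 'pattern_3x3_17': 0.0}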

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-31 Thread Gian-Carlo Pascutto
On 31-01-17 16:32, Roel van Engelen wrote:
> @Brain Sheppard
> Thanks that is a really useful explanation! 
> the way you state: "and therefore a 8192-sized pattern set will identify
> all potential nakade." seems to indicate this is a known pattern set?
> could i find some more information on it somewhere? 

http://senseis.xmp.net/?BasicLivingEyeShapes

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-24 Thread Gian-Carlo Pascutto
On 23-01-17 20:10, Brian Sheppard via Computer-go wrote:
> only captures of up to 9 stones can be nakade.

I don't really understand this.

http://senseis.xmp.net/?StraightThree

Both constructing this shape and playing the vital point are not
captures. How can you detect the nakade (and play at a in time) if you
only check captures?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Messages classified as spam.

2017-01-12 Thread Gian-Carlo Pascutto
On 12/01/2017 11:55, Rémi Coulom wrote:
> It is the mail server of this mailing list that is not well
> configured. Even my own messages are classified as spam for me now.
> The list does not send DKIM identification.

It's been a while since I looked at this in depth, but the problem seems
to be that it modifies the email but doesn't strip the original DKIM,
which then fails to validate. Even adding a DKIM from the mailinglist
wouldn't help, because in Patricks' case, his domain has a stated DMARC
policy, which requires a valid DKIM from that same domain. It's the
DMARC that makes this so much worse as just failing DKIM isn't usually
enough to get classified as spam.

The list is on MailMan 2.1.18, which has support for working around this
problem:
http://www.spamresource.com/2016/09/dmarc-support-in-mailman.html
https://wiki.list.org/DEV/DMARC

Admin, can you try this dmarc_moderation_action = Munge From?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-12 Thread Gian-Carlo Pascutto
On 11-01-17 18:09, Xavier Combelle wrote:
> Of course it means distribute at least the binary so, or the source,
> so proprietary software could be reluctant to share it. But for free
> software there should not any problem. If someone is interested by my
> proposition, I would be pleased to realize it.

It is obvious that having a 30M dataset of games between strong players
(i.e. replicating the AlphaGo training set) would be beneficial to the
community. It is clear that most of us are trying to do the same now,
that is somehow trying to learn a value function from the about ~1.5M
KGS+Tygen+GoGoD games while trying to control overfitting via various
measures. (Aya used small network + dropout. Rn trained multiple outputs
on a network of unknown size. I wonder why no-one tried normal L1/L2
regularization, but then again I didn't get that working either!)

Software should also not really be a problem: Leela is free, Ray and
Darkforest are open source. If we can use a pure DCNN player I think
there are several more options, for example I've seen several programs
in Python. You can resolve score disagreement by invoking GNU Go --score
aftermath.

I think it's an open question though, *how* the games should be
generated, i.e.:

* Follow AlphaGo procedure but with SL instead of RL player (you can use
bigger or smaller networks too, many tradeoffs possible)
* Play games with full MCTS search and small number of playouts. (More
bias, much higher quality games).
* The author of Aya also stated his procedure.
* Several of those and mix :-)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Computer-go - Simultaneous policy and value functions reinforcement learning by MCTS-TD-Lambda ?

2017-01-12 Thread Gian-Carlo Pascutto
Patrick, for what it's worth, I think almost no-one will have seen your
email because laposte.net claims it's forged. Either your or
laposte.net's email server is mis-configured.

> Refering to Silver's paper terminology and results, greedy policy 
> using RL Policy Network beated greedy policy using SL Policy
> Network, but PV-MCTS performed better when used with SL Policy
> Networks than with RL-Policy Network. Authors hypothetized that it is
> "presumably because humans select a diverse beam of promising moves,
> whereas RL optimizes for the single best move".

I've always found this to be a rather strange argument. If the wideness
of the selection is an issue, this can be resolved by tuning the UCT
parameters and prior differently, it doesn't need to be tuned in the
DCNN itself.

Someone on the list made a different argument: when there are several
good shape moves and one that tactically resolves the situation, SL may
prefer shape moves. But SL has bad tactical awareness, so resolving the
situation might be better for it and this is what RL learns to strongly
favor. Compare this with playouts (who also have little tactical
awareness themselves) strongly favoring settling the local situation. I
find this a more persuasive argument.

> Thus, one quality of a policy function to be used to bias the search
>  in a MCTS is a good balance between 'sharpness' (being selective)
> and 'open-mindness' (giving a chance to some low-value moves which
> could turn to be important; avoid blind spot).

Because of the above I disagree with this: this is a matter of tuning
the UCT parameters. The goal of the DCNN should be to give an objective
as possible judgment as to the likelihood that a move is best.

> Coudld someone direct me to litterature exploring this idea or 
> explaining why it doesnt't work in practice ?

I think simply no-one has tried it yet, at least publicly. There are
many other ideas to explore.

> I'm wondering  if someone has ever considered using a gradient of 
> temperature, in the softmax layer of the policy network,  with 
> temperature parameter varying with depth in the tree, so that the 
> search is broader in the first levels and becomes narrow in the 
> deepest levels (ultimately, it would turn the search into rollout to 
> the end of the game for deepest nodes). 

Don't typical UCT implementations already do this? If you use priors and
scale the priors down with the amount of visits a node has had, you get
the described effect. Or the opposite way, if you use progressive
widening it has the same effect.
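
Concretely, the two variants I have in mind look something like this
(Python sketch, constants invented):

def prior_bonus(prior, child_visits, weight=10.0):
    """'Scale the prior down with visits': the prior acts like `weight`
    virtual visits, dominating selection early and washing out as real
    simulations accumulate."""
    return weight * prior / (weight + child_visits)

def widened_moves(moves_sorted_by_prior, parent_visits, k=2.0, alpha=0.25):
    """Progressive widening: only the top k * visits^alpha candidates
    are eligible, so the tree is narrow where visits are few (deep in
    the tree) and broad where they are many (near the root) - the same
    broad-then-narrow gradient, implemented in the search rather than in
    the network's softmax temperature."""
    n = max(1, int(k * parent_visits ** alpha))
    return moves_sorted_by_prior[:n]

print(prior_bonus(0.3, 0), prior_bonus(0.3, 100))            # 0.3 vs ~0.027
print(widened_moves(list("abcdefgh"), parent_visits=10000))  # all 8
print(widened_moves(list("abcdefgh"), parent_visits=4))      # top 2 only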

You seem to be thinking all of this fudging of probabilities has to be
done at the DCNN level, but why not do it in the MCTS/UCT search
directly? It has more information, after all.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Gian-Carlo Pascutto
On 10-01-17 23:25, Bo Peng wrote:
> Hi everyone. It occurs to me there might be a more efficient method to
> train the value network directly (without using the policy network).
> 
> You are welcome to check my
> method: http://withablink.com/GoValueFunction.pdf
> 

For Method 1 you state:

"However, because v is an finer function than V (which is already finer
than W), the bias is better controlled than the case of W, and we can
use all states in the game to train our network, instead of just picking
1 state in each game to avoid over-fitting"

This is intuitively true, and I'm sure it will reduce some overfitting
behavior, but empirically the author of Aya reported the opposite, i.e.
training on W/L is superior to a linear interpolation towards the endgame.

It's possible this happens because V(s) flipping more steeply from 0.5
towards 0 or 1 helps the positions where this happens stand out from the
MC noise.

Combining this with Kensuke's comment, I think it might be worth trying
to train V(s) and W(s) simultaneously, but with V(s) being the linear
interpolation depending on move number, not the value function (which
leaves us without a way to play handicap games and a bunch of other
benefits).

This could reduce overfitting during training, and if we only use W(s)
during gameplay we still have the "strong signal" advantage.
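
One plausible way to spell out the two targets per position (Python
sketch; exactly how to interpolate is of course up for debate):

def training_targets(move_number, game_length, black_won):
    """Two targets for the same position:
    W: the raw game outcome (0/1), the strong but noisy signal;
    V: a linear interpolation from 'unknown' (0.5) at move 1 towards the
       outcome at the end of the game, used as an auxiliary target
       during training only."""
    w = 1.0 if black_won else 0.0
    t = move_number / float(game_length)
    v = 0.5 + t * (w - 0.5)
    return w, v

print(training_targets(20, 200, True))   # early position: (1.0, 0.55)
print(training_targets(180, 200, True))  # late position:  (1.0, 0.95)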

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Gian-Carlo Pascutto
On 11-01-17 14:33, Kensuke Matsuzaki wrote:
> Hi,
> 
> I couldn't get positive experiment results on Ray.
>  
> Rn's network structure of V and W are similar and share parameters,
> but only final convolutional layer are different.
> I trained Rn's network to minimize MSE of V(s) + W(s).
> It uses only KGS and GoGoD data sets, no self play with RL policy.

How do you get the V(s) for those datasets? You play out the endgame
with the Monte Carlo playouts?

I think one problem with this approach is that errors in the data for
V(s) directly correlate to errors in MC playouts. So a large benefit of
"mixing" the two (otherwise independent) evaluations is lost.

This problem doesn't exist when using raw W/L data from those datasets,
or when using SL/RL playouts. (But note that using the full engine to
produce games *would* suffer from the same correlation. That might be
entirely offset by the higher quality of the data, though.)

> But I have no idea about how to use V(s) or v(s) in MCTS.

V(s) seems potentially useful for handicap games where W(s) is no longer
accurate. I don't see any benefit over W(s) for even games.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Golois5 is KGS 4d

2017-01-10 Thread Gian-Carlo Pascutto
On 10-01-17 15:05, Hiroshi Yamashita wrote:
> Hi,
> 
> Golois5 is KGS 4d.
> I think it is a first bot that gets 4d by using DCNN without search.

I found this paper:

https://openreview.net/pdf?id=Bk67W4Yxl

They are using residual layers in the DCNN.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] GTX 1080 benchmark

2016-12-15 Thread Gian-Carlo Pascutto
On 15/12/2016 12:35, Hiroshi Yamashita wrote:
>   F32F128F256MNIST
> GTX 1080  0.48ms  1.45ms  2.38ms  17sec,  CUDA 8.0, cuDNN v5.0, Core i7
> 980X 3.3GHz 6core
> GTX 1080  0.87ms  1.79ms  2.65ms  19sec,  CUDA 8.0, cuDNN v5.1, Core i7
> 980X 3.3GHz 6core
> GTX 980   0.60ms  1.51ms  2.80ms  24sec,  CUDA 7.5, cuDNN v5.0, Xeon
> W3680   3.3GHz 6core

The speedup from the GTX980 -> GTX1080 is very bad, isn't it? The card
has almost 100% more theoretical FLOPS, and much of the increase is due
to clock-speed and more shaders (and less so due to uarch changes) so
the extra FLOPS should be observable at least for the big networks.

I think you are entirely limited by setup/CPU/driver/API overhead. The
Hirabot network is bigger, the author has a smaller GPU, and his CPU is
faster. This would reduce the relative CPU overhead.

I suspect these overheads are also very large in cuDNN with a mini-batch
size of 1. My OpenCL code does not use the Winograd optimization and is
generic, identical code for AMD and NVIDIA cards, yet the performance is
very similar to cuDNN v5.1. This seems to indicate GPU processing is not
the actual bottleneck.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-21 Thread Gian-Carlo Pascutto
On 17-11-16 22:38, Hiroshi Yamashita wrote:
> Value Net is 32 Filters, 14 Layers.
> 32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connect 256, fully connect tanh 1

I think this should be:
32 5x5 x1, 32 3x3 x11, 1 1x1 x1, fully connect 256, fully connect tanh 1

Else one has a 361 * 32 * 256 layer with 3M weights, while all the conv
layers together have maybe 100k weights in total. That looks strange.
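
For reference, rough parameter counts assuming the 50 input channels
from the linked post and ignoring biases (my arithmetic, so double-check
it):

conv_first = 50 * 5 * 5 * 32          # 40,000    (32 filters of 5x5 over 50 planes)
conv_mid   = 11 * (32 * 3 * 3 * 32)   # 101,376   (eleven 3x3 layers, 32 -> 32 planes)
fc_with_32 = 19 * 19 * 32 * 256       # 2,957,312 (if the last conv keeps 32 planes)
fc_with_1  = 19 * 19 * 1 * 256        # 92,416    (with a single 1x1 output plane)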

> Features are 50 channels.
> http://computer-go.org/pipermail/computer-go/2016-March/008768.html

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-21 Thread Gian-Carlo Pascutto
On 20-11-16 11:16, Detlef Schmicker wrote:
> Hi Hiroshi,
> 
>> Now I'm making 13x13 selfplay games like AlphaGo paper. 1. make a
>> position by Policy(SL) probability from initial position. 2. play a
>> move uniformly at random from available moves. 3. play left moves
>> by Policy(RL) to the end. (2) means it plays very bad move usually.
>> Maybe it is because making completely different position? I don't
>> understand why this (2) is needed.
> 
> I did not read the alphago paper like this.
> 
> I read it uses the RL policy the "usual" way (I would say it means
> something like randomizing with the net probabilities for the best 5
> moves or so)
> 
> but randomize the opponent uniformly, meaning the net values of the
> opponent are taken from an earlier step in the reinforcement learning.
> 
> Meaning e.g.
> 
> step 1 playing against step 7645 in the reinforcement history?
> 
> Or did I understand you wrong?

You are confusing the Policy Network RL procedure with the Value Network
data production.

For the Value Network the procedure is indeed as described, with the
move at time step U sampled uniformly from {1, ..., 361} until a legal
one is found. I think it's done because we're not (only) interested in
playing good moves, but also in analyzing positions that are as diverse
as possible, to learn whether they're won or lost. Throwing in one
totally random move vastly increases the diversity and the number of
odd positions the network sees, while still not leading to totally
nonsensical positions.
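
In rough pseudo-code, this is how I read the data-generation scheme.
new_game(), sample_move(), legal_moves() and friends are hypothetical
helpers standing in for a real engine:

import random

def make_value_net_example(sl_policy, rl_policy, max_game_length=450):
    board = new_game()                       # hypothetical: fresh 19x19 position
    U = random.randint(1, max_game_length)   # time step that gets the random move
    for t in range(1, U):
        board.play(sample_move(sl_policy, board))   # "good" moves up to U-1
    board.play(random.choice(board.legal_moves()))  # one uniformly random legal move
    training_position = board.copy()         # the position we will label
    while not board.game_over():
        board.play(sample_move(rl_policy, board))   # RL policy finishes the game
    return training_position, board.result() # one (position, win/loss) pair per game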

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

2016-11-18 Thread Gian-Carlo Pascutto
On 17/11/2016 22:38, Hiroshi Yamashita wrote:
> Features are 49 channels.
> http://computer-go.org/pipermail/computer-go/2016-February/008606.html
...
> Value Net is 32 Filters, 14 Layers.
> 32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connect 256, fully connect tanh 1
> Features are 50 channels.
> http://computer-go.org/pipermail/computer-go/2016-March/008768.html

Thank you for this information. It takes a long time to train the
networks, so knowing which experiments have not worked is very valuable.

Did you not find a benefit from a larger value network? Too little data
and too much overfitting? Or more benefit from more frequent evaluation?

> Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9, komi 7.0
> 0.634  using game result. 0 or 1

I presume this is a winrate, but over what base? Policy network?

> I also made 19x19 Value net. 19x19 learning positions are from KGS 4d over,
> GoGoD, Tygem and 500 playouts/move selfplay. 990255 games. 32 positions
> are selected from a game. Like Detlef's idea, I also use game result.
> I trust B+R and W+R games with komi 5.5, 6.5 and 7.5. In other games,
> If B+ and 1000 playouts at final position is over +0.60, I use it.

How do you handle handicap games? I see you excluded them from the KGS
dataset. Can your value network deal with handicap?

At least in the KGS ruleset, handicap stones are added to the score
calculation, so it is required that the network knows the exact handicap.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Time policy

2016-11-04 Thread Gian-Carlo Pascutto
On 04-11-16 04:45, Billy White wrote:
> Hi,
> 
> Our team is working on a computer go system mainly followed alphago.
> We try to add time policy to our system but cannot find something
> useful.
> 
> I am wondering whether there are some useful material?

Take a large games database, and construct a table of expected number of
moves remaining based on the current move of the game.

Divide total amount of time left by the output of that table.

Test if biasing it to think slightly longer early on helps playing strength.

If there is byo-yomi time, the required extra thinking time generally
flows logically from the byo-yomi time control and the above.
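
A minimal sketch of that calculation; the cut-offs and the byo-yomi
handling are placeholders, not tuned values:

def think_time(main_time_left, move_number, expected_moves_left,
               byoyomi_per_move=0.0, early_bias=1.2):
    # expected_moves_left: table built from a games database, indexed by
    # the current move number, giving the expected number of our moves
    # still to be played.
    idx = min(move_number, len(expected_moves_left) - 1)
    remaining = max(1, expected_moves_left[idx])
    t = main_time_left / float(remaining)
    if move_number < 60:          # arbitrary cut-off for "early on"
        t *= early_bias
    # Crude byo-yomi handling: each overtime period effectively adds
    # itself to every move once the main time budget is gone.
    return t + byoyomi_per_move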

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] DarkForest policy network training code is open-source now.

2016-10-05 Thread Gian-Carlo Pascutto
On 04-10-16 23:47, Yuandong Tian wrote:
> Hi all, 
> 
> DarkForest training code is open source now. Hopefully it will help the
> community.
> 
> https://github.com/facebookresearch/darkforestGo
> 
> 
> With 4 GPUs, the training procedure gives 56.1% top-1 accuracy in KGS
> dataset in 3.5 days, and 57.1% top-1 in 6.5 days (see the simple log
> below). The parameters used are the following: --epoch_size 256000 --GPU
> 4 --data_augmentation --alpha 0.1 --nthread 4

It's probably due to my unfamiliarity with Torch but I couldn't find
where the actual network structure is defined.

I think the script runs with alpha=0.05, not alpha=0.1.

I understood from previous comments you didn't find momentum to be
beneficial. This highly surprises me. Is that still the case?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Congratulations to AyaMC!

2016-09-08 Thread Gian-Carlo Pascutto
On 7/09/2016 21:21, Nick Wedd wrote:
> Congratulations to AyaMC, undefeated winner of the September slow KGS
> bot tournament, which ended earlier today!
> 
> My report is at http://www.weddslist.com/kgs/past/S16.2/index.html
> As usual, I will welcome your comments and corrections.

Given that LeelaBot and ManyFaces1 have the same points, SOS and SoDOS,
it seems more appropriate to list both at place 2 rather than Leela as
second and ManyFaces as third. Unless there's some tiebreak shenanigans
I'm not aware of.

The NeuralZ graphs are pretty. I wonder how they were generated.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Converging to 57%

2016-08-23 Thread Gian-Carlo Pascutto
On 23/08/2016 11:26, Brian Sheppard wrote:
> The learning rate seems much too high. My experience (which is from
> backgammon rather than Go, among other caveats) is that you need tiny
> learning rates. Tiny, as in 1/TrainingSetSize.

I think that's overkill, as in you effectively end up doing batch
gradient descent instead of mini-batch/SGD.

But yes, 0.01 is rather high with momentum. Try 0.001 for methods with
momentum, and with the default Adam parameters you have to go even lower
and try 0.0001.

> Neural networks are dark magic. Be prepared to spend many weeks just
> trying to figure things out. You can bet that the Google & FB results
> are just their final runs.

As always it's sad nobody publishes what didn't work, saving us the time
of trying it all over again :-)

> Changing batching to match DarkForest style (making sure that a 
> minibatch contains samples from game phases... for example
> beginning, middle and end-game).

This sounds a bit suspicious. The entries in your minibatch should be
randomly selected from your entire training set, so statistically having
positions from all phases would be guaranteed. (Or you can shuffle the
entire training set before the epoch, instead of randomly picking during
it).

Don't feed the positions in in-order or from the same game...
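
Concretely, the "shuffle the entire training set before the epoch"
variant looks something like this (a generic sketch, not anyone's
actual pipeline):

import random

def minibatches(training_set, batch_size=128):
    # Shuffle the whole training set once per epoch, then slice it up.
    # Every minibatch is then a random sample of the full set, so all
    # game phases end up mixed together automatically.
    indices = list(range(len(training_set)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [training_set[i] for i in indices[start:start + batch_size]]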

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Converging to 57%

2016-08-23 Thread Gian-Carlo Pascutto
On 23-08-16 08:57, Detlef Schmicker wrote:

> So, if somebody is sure, it is measured against GoGod, I think a 
> number of other go programmers have to think again. I heard them 
> reaching 51% (e. g. posts by Hiroshi in this list)

I trained a 128 x 14 network for Leela 0.7.0 and this gets 51.1% on GoGoD.

Something I noticed from the papers is that the prediction percentage
keeps going upwards with more epochs, even if slowly, but still
clearly up.

In my experience my networks converge rather quickly (like >0.5% per
epoch after the first), get stuck, get one more 0.5% gain if I lower
the learning rate (by a factor 5 or 10) and don't gain any more
regardless of what I do thereafter.
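
In other words the schedule I end up with looks roughly like this
(epoch boundaries purely illustrative):

def learning_rate(epoch, base_lr=0.001, drop_factor=0.2, drop_epochs=(6, 10)):
    # Step schedule: keep the base rate until the accuracy stalls, then
    # divide it by 5-10; further drops haven't bought me anything.
    lr = base_lr
    for boundary in drop_epochs:
        if epoch >= boundary:
            lr *= drop_factor
    return lr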

I do use momentum. IIRC I tested without momentum once and it was
worse, and much slower.

I did not find any improvement in playing strength from doing
Facebook's 3 move prediction. Perhaps it needs much bigger networks
than 128 x 12.

Adding ladder features also isn't good enough to (consistently) keep
the network from playing into them. (And once it's played the first
move, you're totally SOL because the resulting positions aren't in the
training set and you'll get 99% confidence for continuing the losing
ladder moves)

I'm currently doing a more systematic comparison of all methods (and
GoGoD vs KGS+GoGoD) on 128 x 12, and testing the resulting strength
(rather than looking at prediction %). I'll post the results here, if
anything definite comes out of it.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Congratulations to Zen!

2016-07-17 Thread Gian-Carlo Pascutto
On 17/07/2016 17:03, Xavier Combelle wrote:
> It seems that on my firefox 47.0.1 some html entities are rendered for
> example komi 7  but with the 9times;9 tournament still running,

Renders correctly here and also with a clean Firefox profile, so the
problem is likely either your addons/extensions or your settings.

That said, the HTML of that page has a fair number of errors: the meta
tag looks corrupted, there's no encoding information, ampersands in
links aren't properly escaped, there are a few stray tags, and the Annual
points table also has stray closing table tags. So it's also not that
surprising that it can get misrendered.

Nick, maybe it's worthwhile to run your template through an HTML validator.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] DarkForest is open-source now.

2016-06-10 Thread Gian-Carlo Pascutto
On 10/06/2016 19:57, Darren Cook wrote:
> At 5d KGS, is this the world's strongest MIT/BSD licensed program? ...
> actually, is there any other MIT/BSD go program out there? (I thought
> Pachi was, but it is GPLv2)

Huh, that's interesting, because Darkforest seems to have copy-pasted
the pachi playout policy:

https://github.com/facebookresearch/darkforestGo/blob/master/board/pattern.c#L36

https://github.com/pasky/pachi/blob/master/playout/moggy.c#L101

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
