Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-27 Thread Álvaro Begué
To be clear, what I was talking about was building an opening book as
part of the game-generation process that produces training data for
the neural network. This makes sure you don't generate the same game
over and over again.

A few more things about my Spanish checkers experiment from a few
years ago:
 * I used a neural network as an evaluation function, and alpha-beta
as the search algorithm. The networks I tried were fully connected and
quite small compared to anything people are trying these days. The
only game-specific knowledge I provided was not stopping the search if
a capture is available (a primitive quiescence search that works well
for checkers; see the sketch after this list).
 * I couldn't get very far until I provided access to endgame
tablebases. An important purpose of the evaluation function is to
establish if there is enough advantage for one side to convert the
game into a win, and the shallow searches I was performing in the
generated games weren't strong enough in the endgame to determine
this. Once I generated 6-men tablebases (pretty easy to do for
checkers), it became very strong very quickly (about 1 week of
computation, if I remember correctly).
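Here is a minimal sketch in Python of the search from the first point.
This is not the original code; the helpers capture_moves, all_moves,
make_move and net_eval are hypothetical stand-ins:

def search(pos, depth, alpha, beta):
    captures = capture_moves(pos)          # captures are forced in checkers
    if depth <= 0 and not captures:
        return net_eval(pos)               # quiet leaf: ask the network
    moves = captures if captures else all_moves(pos)
    if not moves:
        return -1.0                        # no moves: the side to move loses
    for move in moves:
        score = -search(make_move(pos, move), depth - 1, -beta, -alpha)
        if score >= beta:
            return score                   # beta cutoff
        alpha = max(alpha, score)
    return alpha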

If I find some time in the next few weeks, I'll try to repeat the
process for Ataxx.

Álvaro.

>
> Building an opening book is a good idea. I do it too.
>
> By the way, if anybody is interested, I have put a small 9x9 opening book 
> online:
> https://www.crazy-sensei.com/book/go_9x9/
> Evaluation is +1 for a win, -1 for a loss, for a komi of 7. It may not be
> very good, because the evaluations were done by my 19x19 network. I started
> to train a specialized 9x9 network last week, and it is already stronger.
>
> Rémi


Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-27 Thread Álvaro Begué
For checkers, I used a naive implementation of UCT as my opening book
(the "playout" being the actual game where the engine is thinking). So
towards the end of the opening book there is always a position where
it will try a random move, but in the long run good opening moves will
be explored more often. I think this method might work well for other
games.
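
A minimal sketch of such a book in Python, assuming hashable position
keys and a game result in {0, 0.5, 1} from the perspective of the player
to move at the first position; the class and names are mine, not actual
engine code:

import math
import random

class OpeningBook:
    def __init__(self, c=1.4):
        self.c = c
        self.edge = {}   # (pos_key, move) -> (visits, wins)
        self.node = {}   # pos_key -> total visits

    def pick(self, pos_key, legal_moves):
        n = self.node.get(pos_key, 0)
        def ucb1(move):
            v, w = self.edge.get((pos_key, move), (0, 0))
            if v == 0:
                return float('inf')        # unexplored moves go first
            return w / v + self.c * math.sqrt(math.log(n) / v)
        moves = list(legal_moves)
        random.shuffle(moves)              # break ties randomly
        return max(moves, key=ucb1)

    def update(self, path, result):
        # path: [(pos_key, move), ...] for the in-book prefix of the game;
        # result: outcome for the player to move at the first position
        for pos_key, move in path:
            self.node[pos_key] = self.node.get(pos_key, 0) + 1
            v, w = self.edge.get((pos_key, move), (0, 0))
            self.edge[(pos_key, move)] = (v + 1, w + result)
            result = 1 - result            # switch perspective each ply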

Álvaro.

On Mon, Jan 27, 2020 at 6:04 AM Rémi Coulom  wrote:
>
> This is a report after my first day of training my Ataxx network:
> https://www.game-ai-forum.org/viewtopic.php?f=24&t=693
> Ataxx is played on a 7x7 board. The rules are different, but I expect 7x7 Go 
> would produce similar results. 2k self-play games are more than enough to 
> produce a huge strength improvement at the beginning.
>
> It would take my system less than one day to generate 285k games on a single 
> GPU. But speed optimizations are probably not your biggest problem at the 
> moment.
>
> As I wrote in my previous message, it is important to control the variety of
> your self-play games. In my program, I have a function to count the number of
> distinct board configurations for each move number of the self-play games. 
> This way, I can ensure that the same opening is not replicated too many times.
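
A minimal sketch of such a diversity check, assuming each self-play game
is recorded as a list of board-position hashes, one per move (the
function name is made up):

from collections import defaultdict

def opening_diversity(games):
    distinct = defaultdict(set)            # move number -> distinct positions
    for game in games:
        for move_number, pos_hash in enumerate(game):
            distinct[move_number].add(pos_hash)
    return {m: len(s) for m, s in sorted(distinct.items())}

# If opening_diversity(games)[k] is small relative to len(games), the
# same openings are being replicated too often through move k.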


Re: [Computer-go] Accelerating Self-Play Learning in Go

2019-03-03 Thread Álvaro Begué
Since before AlphaGo was announced, I have thought the way forward was
generating games played to the bitter end, maximizing score, and
then using the final ownership as something to predict. I am very glad
that someone has had the time to put this idea (and many others!) into
practice. Congratulations on a very compelling paper.

Álvaro.


On Sun, Mar 3, 2019 at 9:21 PM David Wu  wrote:
>
> For any interested people on this list who don't follow Leela Zero discussion 
> or reddit threads:
>
> I recently released a paper on ways to improve the efficiency of 
> AlphaZero-like learning in Go. A variety of the ideas tried deviate a little
> from "pure zero" (e.g. ladder detection, predicting board ownership), but the
> method still uses only self-play, starting from random play, with no outside
> human data.
>
> Although longer training runs have NOT yet been tested, for reaching up to 
> about LZ130 strength so far (strong human pro or just beyond it, depending on 
> hardware), you can speed up the learning to that point by roughly a factor of 
> 5 at least compared to Leela Zero, and closer to a factor of 30 for merely 
> reaching the earlier level of very strong amateur strength rather than pro or 
> superhuman.
>
> I found some other interesting results, too - for example contrary to 
> intuition built up from earlier-generation MCTS programs in Go, putting 
> significant weight on score maximization rather than only win/loss seems to 
> help.
>
> Blog post: https://blog.janestreet.com/accelerating-self-play-learning-in-go/
> Paper: https://arxiv.org/abs/1902.10565
> Code: https://github.com/lightvector/KataGo

Re: [Computer-go] Paper “Complexity of Go” by Robson

2018-06-21 Thread Álvaro Begué
...or you could just not get your knickers in a twist over somebody's
pronoun selection. I am here for the discussions about computer go,
not gender politics.



On Thu, Jun 21, 2018 at 6:24 PM, Mario Xerxes Castelán Castro
 wrote:
> “He” is the generic singular pronoun in English. If anybody feels
> excluded, it is because he wants to feel excluded or is intentionally
> playing the ignorant card. What happens is that the social justice
> warrior-dominated United States is the source of many attempts to
> redefine reality when it goes against the ultraliberal agenda.
>
>

Re: [Computer-go] Paper “Complexity of Go” by Robson

2018-06-18 Thread Álvaro Begué
I don't think ko fights have anything to do with this. John Tromp told
me that ladders are PSPACE complete: https://tromp.github.io/lad.ps

Álvaro.



On Mon, Jun 18, 2018 at 2:58 PM, uurtamo  wrote:
> FWIW, first-capture go (i.e. winner is first one to make a capture) should
> not be PSPACE-complete.
>
> the thing in go that makes it hard is ko fights, which don't exist in
> capture go.
>
> s.
>
>
> On Mon, Jun 18, 2018 at 11:55 AM Marcel Crasmaru 
> wrote:
>>
>> Errata: > reduction from GO to an EXP hard problem
>>
>> should be the other way around :)
>>
>> --Marcel
>>
>> On 18 June 2018 at 19:36, Marcel Crasmaru  wrote:
>> >>   J. M. Robson (1983) “The Complexity of Go”. Proceedings of the IFIP
>> >> Congress 1983 p. 413-417.
>> >
>> > If you are interested in how to prove that GO with kos and Japanese
>> > rules is EXP complete you can get the gist of it from a very early
>> > draft of my master thesis
>> > - I used Robson's idea of reduction from GO to an EXP hard problem
>> > using ladders instead of pipes (he used groups
>> > connected through long string of pieces, aka, "pipes")
>> >
>> > If you have related questions I am happy to answer them although John
>> > Tromp might have even better insights - ask him too.
>> >
>> > Best,
>> > Marcel
>> >
>> > On 18 June 2018 at 17:54, Mario Xerxes Castelán Castro
>> >  wrote:
>> >> Hello. I am asking for help finding the following paper:
>> >>
>> >>   J. M. Robson (1983) “The Complexity of Go”. Proceedings of the IFIP
>> >> Congress 1983 p. 413-417.
>> >>
>> >> I could not find it online. There is no DOI anywhere to be found (I
>> >> searched Crossref and here:
>> >> https://dblp.uni-trier.de/db/conf/ifip/ifip83.html#Robson83 ) and the
>> >> conference proceedings are not in Library Genesis either.
>> >>
>> >> Thanks in advance.
>> >>
>> >>

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Álvaro Begué
Sorry, I haven't been paying enough attention lately to know what
"alpha-beta rollouts" means precisely. Can you either describe them or give
me a reference?

Thanks,
Álvaro.



On Tue, Mar 6, 2018 at 1:49 PM, Dan  wrote:

> I did a quick test with my MCTS chess engine with two different
> implementations.
> A standard MCTS with averaging, and MCTS with alpha-beta rollouts. The
> result is like a 600 elo difference
>
> Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold
> repetition}
> Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2  [0.955] 44
> Elo difference: 528.89 +/- nan
>
> scorpio-mcts uses alpha-beta rollouts
> scorpio-pmcts is "pure" mcts with averaging and UCB formula.
>
> Daniel
>
> On Tue, Mar 6, 2018 at 11:46 AM, Dan  wrote:
>
>> I am pretty sure it is an MCTS problem, and I suspect not something that
>> could be easily solved with a policy network (could be wrong here). My
>> opinion is that a DCNN is not a miracle worker (as somebody already
>> mentioned here) and it is going to fail at resolving tactics. I would be
>> more than happy if it had the same power as a qsearch, to be honest.
>>
>> Search traps are the major problem with games like chess, and they are what
>> makes transitioning the success of DCNNs from Go to chess non-trivial.
>> The following paper discusses shallow traps that are prevalent in chess:
>> ( https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571 )
>> They mention that traps make MCTS very inefficient. Even if the MCTS is
>> given 50x more time than is needed by an exhaustive minimax search, it
>> could fail to find a level-5 or level-7 trap.
>> It will spend, for instance, 95% of its time searching an asymmetric tree
>> of depth > 7 while a shallow trap of depth 7 exists, thus failing to find
>> the level-7 trap.
>> This is very hard to solve even if you have unlimited power.
>>
>> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version,
>> in my opinion, and I have a hard time seeing how it can be competitive
>> with Stockfish tactically.
>>
>> My MCTS chess engine with AlphaZero-like MCTS (with averaging) was missing
>> a lot of tactics. I don't use policy or eval networks, but qsearch() for
>> eval, and the policy is basically choosing whichever move leads to a
>> higher eval.
>>
>> a) My first improvement to the MCTS is to use minimax backups instead of
>> averaging. This was an improvement, but not something that would solve the
>> traps.
>>
>> b) My second improvement is to use alpha-beta rollouts. This is a rollouts
>> version that can do null-move, LMR, etc. This is a huge improvement, and
>> none of the MCTS versions can match it. More on alpha-beta rollouts here (
>> https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf )
>>
>> So AlphaZero used none of the above improvements, and yet it seems to be
>> tactically strong. Leela Zero suffered from tactical failures left and
>> right too, as I expected.
>>
>> So the only explanation left is the policy network being able to avoid
>> traps, though I find it hard to believe it can identify more than
>> qsearch-level tactics.
>>
>> All I am saying is that my experience (as well as many others') with MCTS
>> for tactics-dominated games is bad, and there must be some breakthrough in
>> that regard in AlphaZero
>> for it to be able to compete with Stockfish on a tactical level.
>>
>> I am curious how Rémi's attempt at Shogi using AlphaZero's method will
>> turn out.
>>
>> regards,
>> Daniel
>>
>> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
>> computer-go@computer-go.org> wrote:
>>
>>> Training on Stockfish games is guaranteed to produce a blunder-fest,
>>> because there are no blunders in the training set and therefore the policy
>>> network never learns how to refute blunders.
>>>
>>>
>>>
>>> This is not a flaw in MCTS, but rather in the policy network. MCTS will
>>> eventually search every move infinitely often, producing asymptotically
>>> optimal play. But if the policy network does not provide the guidance
>>> necessary to rapidly refute the blunders that occur in the search, then
>>> convergence of MCTS to optimal play will be very slow.
>>>
>>>
>>>
>>> It is necessary for the network to train on self-play games using MCTS.
>>> For instance, the AGZ approach samples next states during training games by
>>> sampling from the distribution of visits in the search. Specifically: not
>>> by choosing the most-visited play!
>>>
>>>
>>>
>>> You see how this policy trains both search and evaluation to be
>>> internally consistent? The policy head is trained to refute the bad moves
>>> that will come up in search, and the value head is trained to the value
>>> observed by the full tree.
>>>

Re: [Computer-go] Crazy Stone is back

2018-03-01 Thread Álvaro Begué
> I tried chain pooling too, and it was too slow. It made the network about
twice as slow in TensorFlow (using tf.unsorted_segment_sum or max). I'd
rather have twice as many layers.

tf.unsorted_segment_max didn't exist in the first public release of
TensorFlow, so I requested it just for this purpose (
https://github.com/tensorflow/tensorflow/issues/549). Too bad it's too slow
to be useful.

Thanks for sharing some details of what you have learned so far!

Álvaro.




On Thu, Mar 1, 2018 at 5:48 AM, Rémi Coulom  wrote:

> Hi David,
>
> Thanks for sharing your experiments. It is very interesting.
>
> I tried chain pooling too, and it was too slow. It made the network about
> twice as slow in TensorFlow (using tf.unsorted_segment_sum or max). I'd
> rather have twice as many layers.
>
> I never tried dilated convolutions. That sounds interesting.
>
> The value network of AQ has an interesting architecture. It does not go
> directly from 19x19 to scalar, but works like image-recognition networks,
> with 2x2 pooling until it reaches 1x1. I have not tried it yet, but that
> feels like a good idea.
>
> Rémi
>
> - Original Message -
> From: "David Wu"
> To: computer-go@computer-go.org
> Sent: Wednesday, February 28, 2018 20:04:11
> Subject: Re: [Computer-go] Crazy Stone is back
>
>
>
>
> It's not even just liberties and semeai, it's also eyes. Consider for
> example a large dragon that has miai for 2 eyes in distant locations, and
> the opponent then takes one of them - you'd like the policy net to now
> suggest the other eye-making move far away. And you'd also like the value
> net to distinguish the situations where the whole group has 2 eyes, even
> when they are distant, from the ones where it doesn't.
>
>
> I've been doing experiments with somewhat smaller neural nets (roughly 4-7
> residual blocks = 8-14 layers), without sticking to an idealized "zero"
> approach. I've only experimented with policy nets so far, but presumably
> much of this should also transfer to a value net's understanding too.
>
>
>
> 1. One thing I tried was chain pooling, which was neat, but ultimately
> didn't seem promising:
>
> https://github.com/lightvector/GoNN#chain-pooling
> It solves all of these problems when the strings are solidly connected. It
> also helps when the strings are long but not quite solidly connected:
> the information still propagates faster than without it. But if
> there are lots of little strings forming a group, diagonal connections,
> bamboo joints, etc., then of course it won't help. And also chain pooling is
> computationally costly, at least in Tensorflow, and it might have negative
> effects on the rest of the neural net that I don't understand.
>
>
>
> 2. A new thing I've been trying recently that actually does seem
> moderately promising is dilated convolutions, although I'm still early in
> testing. They also help increase the speed of information propagation, and
> don't require solidly connected strings, and also are reasonably cheap.
>
>
>
> In particular: my residual blocks have 192 channels, so I tried taking
> several of the later residual blocks in the neural net and making 64 of the
> channels of the first convolution in each block use dilated convolutions
> (leaving 128 channels of regular convolutions), with dilation factors of 2
> or 3. Intuitively, the idea is that earlier blocks could learn to compute
> 2x2 or 3x3 connectivity patterns, and then the dilated convolutions in
> later residual blocks will be able to use that to propagate information
> several spaces at a time across connected groups or dragons.
>
>
> So far, indications are that this works. When I looked at it in various
> board positions, it helped in a variety of capturing race and
> large-dragon-two-eye-miai situations, correctly suggesting moves that the
> net without dilated convolutions would fail to find due to the move being
> too far away. Also, dilated convolutions seem pretty cheap - they only
> slightly increase the computational cost of the net.
>
>
> So far, I've found that it doesn't significantly improve the overall loss
> function, presumably because now there are 128 channels instead of 192
> channels of ordinary convolutions, so in return for being better at
> long-distance interactions, the neural net has gotten worse at some local
> tactics. But it also hasn't gotten worse the way it would if I simply
> dropped the number of channels from 192 to 128 without adding any new
> channels, so the dilated convolutions are being "used" for real work.
>
> I'd be curious to hear if anyone else has tried dilated convolutions and
> what results they got. If there's anything at all to do other than just add
> more layers, I think they're the most promising thing I know of.
>
>
>
>
> On Wed, Feb 28, 2018 at 12:34 PM, Rémi Coulom < remi.cou...@free.fr >
> wrote:
>
>
> 192 and 256 are the numbers of channels. They are fully connected, so the
> number of 3x3 

Re: [Computer-go] MiniGo open sourced

2018-01-30 Thread Álvaro Begué
Chrilly Donninger's quote was probably mostly true in the 90s, but it's now
obsolete. That intellectual protectionism was motivated by the potential
economic profit of having a strong engine. It probably slowed down computer
chess for decades, until the advent of strong open-source programs.
Paradoxically, when the economic incentive to create strong engines was
removed, we saw an explosion in strength.

Álvaro.


On Tue, Jan 30, 2018 at 2:14 PM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> Hi,
>
> GCP wrote:
> > ...
> > > Of course, in the end, strength is the best way to tell that your
> > > implementation is correct :)
> >
> > In other words, do not take "correct" as necessarily meaning "matching
> > the published research".
>
> Chrilly Donninger, one of the computer chess gurus of the 1990s and
> the early 2000s (project Hydra), had an expressed opinion:
> "Those who know, do not publish.
> And those who publish do not know."
> He himself violated this rule in the early 1990s when he published
> a prize-winning paper on how to implement null-move search correctly.
>
> Ingo.

Re: [Computer-go] Art or not?

2017-12-05 Thread Álvaro Begué
I have a personal definition of art that works pretty well: Pretentious
entertainment. Emphasis on “pretentious”.

On a more serious note, I don’t care if anything I produce is art or not,
and neither should you. If you enjoy what you are doing, keep it up!

Álvaro.

On Tuesday, December 5, 2017, "Ingo Althöfer" <3-hirn-ver...@gmx.de> wrote:

> Hi,
> for the traditional Go scene it is hard to
> live in the period of the Alpha revolution.
>
> Part of my process to cope with the changes is
> to design some collages. Here is an example:
>
> http://www.dgob.de/yabbse/index.php?action=dlattach;topic=5107.0;attach=6159;image
>
> The German Go scene is just discussing
> if this is art or not.
>
> Ingo.

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Álvaro Begué
The text in the appendix has the answer, in a paragraph titled "Expand and
evaluate (Fig. 2b)":
  "[...] The leaf node is expanded and and each edge (s_t, a) is
initialized to {N(s_t, a) = 0, W(s_t, a) = 0, Q(s_t, a) = 0, P(s_t, a) =
p_a}; [...]"
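
In code, that initialization is just the following (a minimal sketch in
Python, assuming a dict-based tree and priors {move: p_a} from the policy
head; the names are mine, not from the paper):

class Edge:
    def __init__(self, prior):
        self.N = 0       # visit count
        self.W = 0.0     # total action-value
        self.Q = 0.0     # mean action-value, defined to be 0 before any visit
        self.P = prior   # prior probability p_a from the network

def expand(node, priors):
    node.edges = {move: Edge(p) for move, p in priors.items()}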



On Sun, Dec 3, 2017 at 11:27 AM, Andy <andy.olsen...@gmail.com> wrote:

> Figure 2a shows two bolded Q+U max values. The second one is going to a
> leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that
> Q value from?
>
> The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo
> tree search in AlphaGo Zero. a Each simulation traverses the tree by
> selecting the edge with maximum action-value Q, plus an upper confidence
> bound U that depends on a stored prior probability P and visit count N for
> that edge (which is incremented once traversed). b The leaf node is
> expanded..."
>
> 2017-12-03 9:44 GMT-06:00 Álvaro Begué <alvaro.be...@gmail.com>:
>
>> I am not sure where in the paper you think they use Q(s,a) for a node s
>> that hasn't been expanded yet. Q(s,a) is a property of an edge of the
>> graph. At a leaf they only use the `value' output of the neural network.
>>
>> If this doesn't match your understanding of the paper, please point to
>> the specific paragraph that you are having trouble with.
>>
>> Álvaro.
>>
>>
>>
>> On Sun, Dec 3, 2017 at 9:53 AM, Andy <andy.olsen...@gmail.com> wrote:
>>
>>> I don't see the AGZ paper explain what the mean action-value Q(s,a)
>>> should be for a node that hasn't been expanded yet. The equation for Q(s,a)
>>> has the term 1/N(s,a) in it because it's supposed to average over N(s,a)
>>> visits. But in this case N(s,a)=0 so that won't work.
>>>
>>> Does anyone know how this is supposed to work? Or is it another detail
>>> AGZ didn't spell out?
>>>
>>>
>>>

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Álvaro Begué
I am not sure where in the paper you think they use Q(s,a) for a node s
that hasn't been expanded yet. Q(s,a) is a property of an edge of the
graph. At a leaf they only use the `value' output of the neural network.

If this doesn't match your understanding of the paper, please point to the
specific paragraph that you are having trouble with.

Álvaro.



On Sun, Dec 3, 2017 at 9:53 AM, Andy  wrote:

> I don't see the AGZ paper explain what the mean action-value Q(s,a) should
> be for a node that hasn't been expanded yet. The equation for Q(s,a) has
> the term 1/N(s,a) in it because it's supposed to average over N(s,a)
> visits. But in this case N(s,a)=0 so that won't work.
>
> Does anyone know how this is supposed to work? Or is it another detail AGZ
> didn't spell out?

Re: [Computer-go] Learning related stuff

2017-11-21 Thread Álvaro Begué
The term you are looking for is "transfer learning":
https://en.wikipedia.org/wiki/Transfer_learning


On Tue, Nov 21, 2017 at 5:27 PM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> Hi Erik,
>
> > No need for AlphaGo hardware to find out; any
> > toy problem will suffice to explore different
> > initialization schemes...
>
> I know that.
>
> My intention with the question is a different one:
> I am thinking how humans are learning. Is it beneficial
> to have learnt related - but different - stuff before?
> The answer will depend on the case, of course.
>
> And in my role as a voyeur, I want to understand if it helps to have
> learnt a Go variant X before turning my interest to a
> "slightly" different Go variant Y. So, I want to combine
> the subject with some entertaining learning process.
> (For instance, looking at the AlphaGo Zero games from the
> 72 h experiment in steps of 2 hours was not only insightful
> but also entertaining.)
>
>
> > you typically want to start with small weights so
> > that the initial mapping is relatively smooth.
>
> But again: For instance, when an eight-year-old child starts
> to play the violin, is it helpful or not if it has played,
> say, a trumpet before?
>
> My understanding is that the AlphaGo hardware is standing
> somewhere in London, idle and waiting for new action...
>
> Ingo.
>

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Álvaro Begué
It's a model written using the Keras neural network library:
https://en.wikipedia.org/wiki/Keras


On Fri, Nov 10, 2017 at 7:09 AM, Xavier Combelle 
wrote:

> You make me really curious,
> what is a Keras model ?
>
> On 10/11/2017 at 01:47, Petr Baudis wrote:
> >   Hi,
> >
> >   I got first *somewhat* positive results in my attempt to reproduce
> > AlphaGo Zero - 25% winrate against GNUGo on the easiest reasonable task
> > - 7x7 board. :)  a.k.a.
> >
> >   "Sometimes beating GNUGo on a tiny board" without human knowledge
> >
> > (much wow!)
> >
> >   Normally this would be a pretty weak result, but (A) I wanted to
> > help calibrate other efforts on larger boards that are possibly still
> > at the "random" stage, and (B) I'll probably move on to other projects
> > again soon, so this might be as good as it gets for me.
> >
> >   I started the project by replacing MC simulations with a Keras model
> > in my 550-line educational Go program Michi - it lived in its `nnet`
> > branch until now when I separated it to a project on its own:
> >
> >   https://github.com/rossumai/nochi
> >
> > Starting from a small base means that the codebase is tiny and should be
> > easy to follow, though it's not at all as tidy as Michi is.
> >
> > You can grab the current training state (== pickled archive of selfplay
> > positions used for replay, chronological) and neural network weights
> > from the github's "Releases" page:
> >
> >   https://github.com/rossumai/nochi/releases/tag/
> G171107T013304_00150
> >
> >   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> > no supervision, and it contains no Monte Carlo simulations or other
> > heuristics. But it's not entirely 1:1, I did some tweaks which I thought
> > might help early convergence:
> >
> >   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
> >   * The neural network is updated after _every_ game, _twice_, on _all_
> > positions plus 64 randomly sampled positions from the entire history,
> > this all done four times - on original position and the three
> > symmetry flips (but I was too lazy to implement 90\deg rotation).
> >   * Instead of supplying last 8 positions as the network input I feed
> > just the last position plus two indicator matrices showing
> > the location of the last and second-to-last move.
> >   * No symmetry pruning during tree search.
> >   * Value function is trained with cross-entropy rather than MSE,
> > no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> > the annealing is reset time by time due to manual restarts of the
> > script from a checkpoint).
> >   * No resign auto-threshold, but it is important to play 25% of games
> > without resigning to escape local "optima".
> >   * 1/Temperature is 2 for first three moves.
> >   * Initially I used 1000 "simulations" per move, but by mistake, last
> > 1500 games when the network improved significantly (see below) were
> > run with 2000 simulations per move.  So that might matter.
> >
> >   This has been running for two weeks, self-playing 8500 games.  A week
> > ago its moves already looked a bit natural but it was stuck in various
> > local optima.  Three days ago it beat GNUGo once across 20 games.
> > Now five times across 20 games - so I'll let it self-play a little longer
> > as it might surpass GNUGo quickly at this point?  Also this late
> > improvement coincides with the increased simulation number.
> >
> >   At the same time, Nochi supports supervised training (with the rest
> > kept the same) which I'm now experimenting with on 19x19.
> >
> >   Happy training,
> >
>

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Álvaro Begué
Your understanding matches mine. My guess is that they had a temperature
parameter in the code that would allow for things like slowly transitioning
from random sampling to deterministically picking the maximum, but they
ended up using only those particular values.
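
A minimal sketch of that selection rule in Python (visit_counts maps each
move to N(s_0, a); the names are mine, not DeepMind's code):

import numpy as np

def select_move(visit_counts, tau):
    moves = list(visit_counts)
    counts = np.array([visit_counts[m] for m in moves], dtype=np.float64)
    if tau == 0:                        # the tau -> 0 limit: argmax of visits
        return moves[int(np.argmax(counts))]
    probs = counts ** (1.0 / tau)       # pi(a) proportional to N^(1/tau)
    probs /= probs.sum()
    return moves[np.random.choice(len(moves), p=probs)]

With tau = 1 this samples moves proportionally to their visit counts, and
as tau -> 0 it degenerates to picking the most-visited move, which matches
the two cases described in the paper.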

Álvaro.




On Tue, Nov 7, 2017 at 1:07 PM, Imran Hendley 
wrote:

> Hi, I might be having trouble understanding the self-play policy for
> AlphaGo Zero. Can someone let me know if I'm on the right track here?
>
> The paper states:
>
> In each position s, an MCTS search is executed, guided by the neural
> network f_θ. The MCTS search outputs probabilities π of playing each move.
>
>
> This wasn't clear at first since MCTS outputs wins and visits, but later
> the paper explains further:
>
> MCTS may be viewed as a self-play algorithm that, given neural
> network parameters θ and a root position s, computes a vector of search
> probabilities recommending moves to play, π = α_θ(s), proportional to
> the exponentiated visit count for each move, π_a ∝ N(s, a)^(1/τ), where
> τ is
> a temperature parameter.
>
>
> So this makes sense, but when I looked for the schedule for decaying the
> temperature all I found was the following in the Self-play section of
> Methods:
>
>
> For the first 30 moves of each game, the temperature is set to τ = 1; this
> selects moves proportionally to their visit count in MCTS, and ensures a
> diverse set of positions are encountered. For the remainder of the game, an
> infinitesimal temperature is used, τ → 0.
>
> This sounds like they are sampling proportional to visits for the first 30
> moves since τ = 1 makes the exponent go away, and after that they are
> playing the move with the most visits, since the probability of the move
> with the most visits goes to 1 and the probability of all other moves goes
> to zero in the expression π(a | s_0) = N(s_0, a)^(1/τ) / ∑_b N(s_0, b)^(1/τ)
> as τ goes to 0 from the right.
>
> Am I understanding this correctly? I am confused because it seems a little
> convoluted to define this simple policy in terms of a temperature. When
> they mentioned temperature I was expecting something that slowly decays
> over time rather than only taking two trivial values.
>
> Thanks!
>
>

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-30 Thread Álvaro Begué
I am not sure how people are designing self-driving cars, but if it were up
to me, it would be very explicitly about maximizing expected utility. A
neural network can be trained to estimate the expected sum of future
rewards, usually with some exponential future discount. Actually, that's
explicitly what Q-learning does, and it's not that different from how
AlphaGo's value network works.

The fact that it's hard to figure out why a neural network did what it did
is not worse than the situation with humans. We don't understand neurology
well enough to know why someone didn't see a pedestrian or a red light. And
somehow the legal system doesn't collapse. In the case of neural networks,
the case that resulted in the accident and similar cases can be added to
the training database to make future versions of the network more robust,
so over time the number of accidents should drop fast.

Álvaro.



On Mon, Oct 30, 2017 at 6:06 PM, Pierce T. Wetter III <
pie...@alumni.caltech.edu> wrote:

> I would argue that if I was an engineer for a hypothetical autonomous car
> manufacturer, that it would be critically important to keep a running
> circular buffer of all the inputs over time for the car. Sort of like how
> existing cars have Dash Cams that continuously record to flash, but only
> keep the video if you tell it to or it detects major G forces.
>
> To your point, I’m not sure the car would necessarily be able to tell tree
> from child, tree might be “certain large obstacle” and child is “smaller
> large obstacle”. So that would give them the same utility function -1000.
> But utility functions are rarely so straightforward in a neural network as
> you suppose.
>
> I think it would take differential analysis (A term I just made up) to
> determine the utility function, which is why having a continuous log of all
> the input streams is necessary.
>
> On Oct 30, 2017, 3:45 PM -0700, Álvaro Begué <alvaro.be...@gmail.com>,
> wrote:
>
> In your hypothetical scenario, if the car can give you as much debugging
> information as you suggest (100% tree is there, 95% child is there), you
> can actually figure out what's happening. The only other piece of
> information you need is the configured utility values for the possible
> outcomes.
>
> Say the utility of hitting a tree is -1000, the utility of hitting a child
> is -5000 and the utility of not hitting anything is 0. A rational agent
> maximizes the expected value of the utility function. So:
>  - Option A: Hit the tree. Expected utility = -1000.
>  - Option B: Avoid the tree, possibly hitting the child, if there is a
> child there after all. Expected utility: 0.95 * (-5000) + 0.05 * 0 = -4750.
>
> So the car should pick option A. If the configured utility function is
> such that hitting a tree and hitting a child have the same value, the
> lawyers would be correct that the programmers are endangering the public
> with their bad programming.
>
> Álvaro.
>
>
>
> On Mon, Oct 30, 2017 at 2:22 PM, Pierce T. Wetter III <
> pie...@alumni.caltech.edu> wrote:
>
>> Unlike humans, who have these pesky things called rights, we can abuse
>> our computer programs to deduce why they made decisions. I can see a future
>> where that has to happen. From my experience in trying to best the stock
>> market with an algorithm I can tell you that you have to be able to explain
>> why something happened, or the CEO will wrest control away from the
>> engineers.
>>
>> Picture a court case where the engineers for an electric car are called
>> upon to testify about why a child was killed by their self driving car. The
>> fact that the introduction of the self-driving car has reduced the accident
>> rate by 99% doesn’t matter, because the court case is about *this* car
>> and *this* child. The 99% argument is for the closing case, or for the
>> legislature, but it’s early yet.
>>
>> The Manufacturer throws up their arms and says “we dunno, sorry”.
>>
>> Meanwhile, the plaintiff has hired someone who has manipulated the inputs
>> to the neural net, and they’ve figured out that the car struck the child,
>> because the car was 100% sure the tree was there, but it could only be 95%
>> sure the child was there. So it ruthlessly aimed for the lesser
>> probability.
>>
>> The plaintiff’s lawyer argues that a human would have rather hit a tree
>> than a child.
>>
>> Jury awards $100M in damages to the plaintiffs.
>>
>> I would think it would be possible to do “differential” analysis on AGZ
>> positions to see why AGZ made certain moves. Add an eye to a weak group,
>> etc. Essentially that’s what we’re doing with MCTS, right?

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-30 Thread Álvaro Begué
In your hypothetical scenario, if the car can give you as much debugging
information as you suggest (100% tree is there, 95% child is there), you
can actually figure out what's happening. The only other piece of
information you need is the configured utility values for the possible
outcomes.

Say the utility of hitting a tree is -1000, the utility of hitting a child
is -5000 and the utility of not hitting anything is 0. A rational agent
maximizes the expected value of the utility function. So:
 - Option A: Hit the tree. Expected utility = -1000.
 - Option B: Avoid the tree, possibly hitting the child, if there is a
child there after all. Expected utility: 0.95 * (-5000) + 0.05 * 0 = -4750.

So the car should pick option A. If the configured utility function is such
that hitting a tree and hitting a child have the same value, the lawyers
would be correct that the programmers are endangering the public with their
bad programming.

Álvaro.



On Mon, Oct 30, 2017 at 2:22 PM, Pierce T. Wetter III <
pie...@alumni.caltech.edu> wrote:

> Unlike humans, who have these pesky things called rights, we can abuse our
> computer programs to deduce why they made decisions. I can see a future
> where that has to happen. From my experience in trying to best the stock
> market with an algorithm I can tell you that you have to be able to explain
> why something happened, or the CEO will wrest control away from the
> engineers.
>
> Picture a court case where the engineers for an electric car are called
> upon to testify about why a child was killed by their self driving car. The
> fact that the introduction of the self-driving car has reduced the accident
> rate by 99% doesn’t matter, because the court case is about *this* car
> and *this* child. The 99% argument is for the closing case, or for the
> legislature, but it’s early yet.
>
> The Manufacturer throws up their arms and says “we dunno, sorry”.
>
> Meanwhile, the plaintiff has hired someone who has manipulated the inputs
> to the neural net, and they’ve figured out that the car struck the child,
> because the car was 100% sure the tree was there, but it could only be 95%
> sure the child was there. So it ruthlessly aimed for the lesser
> probability.
>
> The plaintiff’s lawyer argues that a human would have rather hit a tree
> than a child.
>
> Jury awards $100M in damages to the plaintiffs.
>
> I would think it would be possible to do “differential” analysis on AGZ
> positions to see why AGZ made certain moves. Add an eye to a weak group,
> etc. Essentially that’s what we’re doing with MCTS, right?
>
> It seems like a fun research project to try to build a system that can
> reverse engineer AGZ, and not only would it be fun, but its a moral
> imperative.
>
> Pierce
>
>

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-25 Thread Álvaro Begué
There are ways to do it, but it might be messy. However, the vast majority
of the computational effort will be in playing games to generate a training
database, and that part is trivial to distribute. Testing if the new
version is better than the old version is also very easy to distribute.

Álvaro.


On Wed, Oct 25, 2017 at 11:57 AM, Xavier Combelle  wrote:

> Is there some way to distribute learning of a neural network ?
>
> On 25/10/2017 at 05:43, Andy wrote:
>
> Gian-Carlo, I didn't realize at first that you were planning to create a
> crowd-sourced project. I hope this project can get off the ground and
> running!
>
> I'll look into installing this but I always find it hard to get all the
> tool chain stuff going.
>
>
>
> 2017-10-24 15:02 GMT-05:00 Gian-Carlo Pascutto :
>
>> On 23-10-17 10:39, Darren Cook wrote:
>> >> The source of AlphaGo Zero is really of zero interest (pun intended).
>> >
>> > The source code is the first-hand account of how it works, whereas an
>> > academic paper is a second-hand account. So, definitely not zero use.
>>
>> This should be fairly accurate:
>>
>> https://github.com/gcp/leela-zero
>>
>> --
>> GCP

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-23 Thread Álvaro Begué
No, they are too few games for that.



On Mon, Oct 23, 2017 at 8:05 AM, Jim O'Flaherty 
wrote:

> Couldn't they be useful as part of a set of training data for newly
> trained engines and networks?
>
> On Oct 23, 2017 2:34 AM, "Petri Pitkanen" 
> wrote:
>
>> They are free to use in any attribution. Game score is a reflection of
>> historical fact and hence not copyrightable. Dunno what use they are to
>> anyone though.
>>
>> Petri
>>
>> 2017-10-23 2:29 GMT+03:00 Lucas Baker :
>>
>>> Hi Robert,
>>>
>>> The AlphaGo Zero games are free to use with proper attribution, so
>>> please use them as you like for commentaries as long as you credit DeepMind.
>>>
>>> Best,
>>> Lucas Baker
>>>
>>> On Sun, Oct 22, 2017 at 3:59 PM Robert Jasiek  wrote:
>>>
 AlphaGo Zero games are available as zipped SGF from Deepmind at
 http://www.alphago-games.com/ For earlier AlphaGo games, I have seen
 statements from Deepmind encouraging free use (presuming stating origin,
 of course) so that the games may be commented etc. I cannot find a
 similar statement from Deepmind for the published AlphaGo Zero games.
 Are they for free use or copyrighted? I hope the former so everybody
 including Deepmind can see more commentaries.

 --
 robert jasiek

Re: [Computer-go] Zero performance

2017-10-20 Thread Álvaro Begué
I suggest scaling down the problem until some experience is gained.

You don't need the full-fledged 40-block network to get started. You can
probably get away with using only 20 blocks and maybe 128 features (from
256). That should save you about a factor of 8, plus you can use larger
mini-batches.
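
The factor of 8 comes from a rough cost model in which the 3x3
convolutions dominate, so time scales as blocks * channels^2:

full    = 40 * 256 ** 2   # blocks * channels^2
reduced = 20 * 128 ** 2
print(full / reduced)     # 8.0: a factor of 2 from blocks, 4 from channels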

You can also start with 9x9 go. That way games are shorter, and you
probably don't need 1600 network evaluations per move to do well.

Álvaro.


On Fri, Oct 20, 2017 at 1:44 PM, Gian-Carlo Pascutto  wrote:

> I reconstructed the full AlphaGo Zero network in Caffe:
> https://sjeng.org/dl/zero.prototxt
>
> I did some performance measurements, with what should be
> state-of-the-art on consumer hardware:
>
> GTX 1080 Ti
> NVIDIA-Caffe + CUDA 9 + cuDNN 7
> batch size = 8
>
> Memory use is about ~2G. (It's much more for learning, the original
> minibatch size of 32 wouldn't fit on this card!)
>
> Running 2000 iterations takes 93 seconds.
>
> In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
> simulations, and they expand 1 node per visit (if I got it right) so
> that would be 1600 network evaluations as well, or 200 of my iterations.
>
> So it would take me ~9.3s to produce a self-play move, compared to 0.4s
> for them.
>
> I would like to extrapolate how long it will take to reproduce the
> research, but I think I'm missing how many GPUs are in each self-play
> worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.
>
> Let's say the latter is around 200 moves. They generated 29 million
> games for the final result, which means it's going to take me about 1700
> years to replicate this. I initially estimated 7 years based on the
> reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
> in the calculations above, or was it really a *pile* of those 64 GPU
> machines?
>
> Because the performance on playing seems reasonable (you would be able
> to actually run the MCTS on a consumer machine, and hence end up with a
> strong program), I would be interested in setting up a distributed
> effort for this. But realistically there will be maybe 10 people
> joining, 80 if we're very lucky (looking at Stockfish numbers). That
> means it'd still take 20 to 170 years.
>
> Someone please tell me I missed a factor of 100 or more somewhere. I'd
> love to be wrong here.
>
> --
> GCP
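
Checking the arithmetic above in Python, under the same assumptions (200
moves per game, 9.3 s per move):

games, moves_per_game, sec_per_move = 29e6, 200, 9.3
years = games * moves_per_game * sec_per_move / (365.25 * 24 * 3600)
print(round(years))   # 1709

so the "about 1700 years" figure holds up given those assumptions.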

Re: [Computer-go] AlphaGo Zero

2017-10-20 Thread Álvaro Begué
When I did something like this for Spanish checkers (training a neural
network to be the evaluation function in an alpha-beta search, without any
human knowledge), I solved the problem of adding game variety by using UCT
for the opening moves. That means that I kept a tree structure with the
opening moves and I used the UCB1 formula to pick the next move as long as
the game was in the tree. Once outside the tree, I used alpha-beta search
to play a normal [very fast] game.

One important characteristic of this UCT opening-book builder is that the
last move inside the tree is basically random, so this explores a lot of
unbalanced positions.

Álvaro.



On Fri, Oct 20, 2017 at 9:23 AM, Petr Baudis  wrote:

>   I tried to reimplement the system - in a simplified way, trying to
> find the minimum that learns to play 5x5 in a few thousands of
> self-plays.  Turns out there are several components which are important
> to avoid some obvious attractors (like the network predicting black
> loses on every move from its second game on):
>
>   - disabling resignation in a portion of games is essential not just
> for tuning the resignation threshold (if you even want to do that), but
> to correct the prediction signal with actual scoring, rather than
> letting the network start to always resign early in the game
>
>   - Dirichlet (or other) noise is essential to keep the network from
> getting looped into the same game - which is also self-reinforcing
>
>   - I have my doubts about the idea of high-temperature move choices
> at the beginning, especially with T=1 ... maybe that's just bad
> very early in the training
>
> On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> >   The order of magnitude matches my parameter numbers.  (My attempt to
> > reproduce a simplified version of this is currently evolving at
> > https://github.com/pasky/michi/tree/nnet but the code is a mess right
> > now.)
>
> --
> Petr Baudis, Rossum
> Run before you walk! Fly before you crawl! Keep moving forward!
> If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Álvaro Begué
Yes, residual networks are awesome! I learned about them at ICML 2016 (
http://kaiminghe.com/icml16tutorial/index.html). Kaiming He's exposition
was fantastically clear. I used them in my own attempts at training neural
networks for move prediction. It's fairly easy to train something with 20
layers with residual networks, even without using batch normalization. With
batch normalization apparently you can get to hundreds of layers without
problems, and the models do perform better on the test data for vision
tasks. But I didn't implement that part, and the additional computational
cost probably makes this not worth it for go.
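
For concreteness, one residual block of the kind described below looks
roughly like this in Keras (a sketch only; my own experiments predate
this API):

from tensorflow.keras import layers

def residual_block(x, channels=256):
    y = layers.Conv2D(channels, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(channels, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([x, y])             # the unscaled identity connection
    return layers.Activation('relu')(y)

The Add() is the whole trick: features and gradients pass through it
unchanged, which is what makes 20+ layers easy to train.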

Álvaro.




On Thu, Oct 19, 2017 at 8:51 PM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> So I am reading that residual networks are simply better than normal
> convolutional networks. There is a detailed write-up here:
> https://blog.waya.ai/deep-residual-learning-9610bb62c355
>
> Summary: the residual network has a fixed connection that adds (with no
> scaling) the output of the previous level to the output of the current
> level. The point is that once some layer learns a concept, that concept is
> immediately available to all downstream layers, without need for learning
> how to propagate the value through a complicated network design. These
> connections also provide a fast pathway for tuning deeper layers.
>
> -Original Message-
> From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf
> Of Gian-Carlo Pascutto
> Sent: Wednesday, October 18, 2017 4:33 PM
> To: computer-go@computer-go.org
> Subject: Re: [Computer-go] AlphaGo Zero
>
> On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> >
> > https://deepmind.com/blog/
> >
> > http://www.nature.com/nature/index.html
>
> Select quotes that I find interesting from a brief skim:
>
> 1) Using a residual network was more accurate, achieved lower error, and
> improved performance in AlphaGo by over 600 Elo.
>
> 2) Combining policy and value together into a single network slightly
> reduced the move prediction accuracy, but reduced the value error and
> boosted playing performance in AlphaGo by around another 600 Elo.
>
> These gains sound very high (much higher than previous experiments with
> them reported here), but are likely due to the joint training.
>
> 3) The raw neural network, without using any lookahead, achieved an Elo
> rating of 3,055. ... AlphaGo Zero achieved a rating of 5,185.
>
> The increase of 2000 Elo from tree search sounds very high, but this may
> just mean the value network is simply very good - and perhaps relatively
> better than the policy one. (They previously had problems there that SL
> > RL for the policy network guiding the tree search - but I'm not sure
> there's any relation)
>
> 4) History features Xt; Yt are necessary because Go is not fully
> observable solely from the current stones, as repetitions are forbidden.
>
> This is a weird statement. Did they need 17 planes just to check for ko?
> It seems more likely that history features are very helpful for the
> internal understanding of the network as an optimization. That sucks though
> - it's annoying for analysis and position setup.
>
> Lastly, the entire training procedure is actually not very complicated at
> all, and it's hopeful the training is "faster" than previous approaches -
> but many things look fast if you can throw 64 GPU workers at a problem.
>
> In this context, the graphs of the differing network architectures causing
> huge strength discrepancies are both good and bad. Making a better pick can
> cause you to get massively better results, take a bad pick and you won't
> come close.
>
> --
> GCP

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Álvaro Begué
Yes, it seems really odd that they didn't add a plane of all ones. The
"heads" have weights that depend on the location of the board, but all the
other layers can't tell the difference between a lonely stone at (1,1) and
one at (3,3).

In my own experiments (trying to predict human moves) I found that 3 inputs
worked well: signed liberties, age capped at 8, all ones. I think of the
number of liberties as a key part of the game mechanics, so I don't think
it detracts from the purity of the approach, and it's probably helpful for
learning about life and death.
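
For reference, a minimal sketch of how those three planes can be computed
(numpy; the conventions are mine: board holds +1 for the player to move,
-1 for the opponent, 0 for empty, and age counts moves since each stone
was played):

import numpy as np

def input_planes(board, age, size=19):
    libs = np.zeros((size, size), dtype=np.float32)
    seen = np.zeros((size, size), dtype=bool)
    for i in range(size):
        for j in range(size):
            if board[i, j] == 0 or seen[i, j]:
                continue
            # flood-fill the chain at (i, j), collecting its liberties
            color, chain, liberties, stack = board[i, j], [], set(), [(i, j)]
            seen[i, j] = True
            while stack:
                y, x = stack.pop()
                chain.append((y, x))
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if 0 <= ny < size and 0 <= nx < size:
                        if board[ny, nx] == 0:
                            liberties.add((ny, nx))
                        elif board[ny, nx] == color and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
            for y, x in chain:
                libs[y, x] = color * len(liberties)   # signed liberty count
    ones = np.ones((size, size), dtype=np.float32)    # also marks the board edge
    return np.stack([libs, np.minimum(age, 8).astype(np.float32), ones])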

Álvaro.




On Thu, Oct 19, 2017 at 7:42 AM, Gian-Carlo Pascutto  wrote:

> On 18-10-17 19:50, cazen...@ai.univ-paris8.fr wrote:
> >
> > https://deepmind.com/blog/
> >
> > http://www.nature.com/nature/index.html
>
> Another interesting tidbit:
>
> The inputs don't contain a reliable board edge. The "white to move"
> plane contains it, but only when white is to move.
>
> So until AG Zero "black" learned that a go board is 19 x 19, the white
> player had a serious advantage.
>
> I think I will use 18 input layers :-)
>
> --
> GCP

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Álvaro Begué
This is a quick check of my understanding of the network architecture.
Let's count the number of parameters in the model:
 * convolutional block: (17*9+1)*256 + 2*256
[ 17 = number of input channels
   9 = size of the 3x3 convolution window
   1 = bias (I am not sure this is needed if you are going to do batch
normalization immediately after)
 256 = number of output channels
   2 = mean and standard deviation of the output of the batch normalization
 256 = number of channels in the batch normalization ]
 * residual block: (256*9+1)*256 + 2*256 + (256*9+1)*256 + 2*256
 * policy head: (256*1+1)*2 + 2*2 + (2*361+1)*362
 * value head: (256*1+1)*1 + 2*1 + (1*361+1)*256 + (256+1)*1

Summing it all up, I get 22,837,864 parameters for the 20-block network and
46,461,544 parameters for the 40-block network.

Does this seem correct?
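
A quick check of the sums in Python (note the "20-block" total works out
if the count includes the initial convolutional block, i.e. 19 residual
blocks):

conv_block = (17 * 9 + 1) * 256 + 2 * 256
residual   = 2 * ((256 * 9 + 1) * 256 + 2 * 256)
policy     = (256 * 1 + 1) * 2 + 2 * 2 + (2 * 361 + 1) * 362
value      = (256 * 1 + 1) * 1 + 2 * 1 + (1 * 361 + 1) * 256 + (256 + 1) * 1

print(conv_block + 19 * residual + policy + value)   # 22837864
print(conv_block + 39 * residual + policy + value)   # 46461544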

Álvaro.



On Thu, Oct 19, 2017 at 6:17 AM, Petr Baudis  wrote:

> On Wed, Oct 18, 2017 at 04:29:47PM -0700, David Doshay wrote:
> > I saw my first AlphaGo Zero joke today:
> >
> > After a few more months of self-play the games might look like this:
> >
> > AlphaGo Zero Black - move 1
> > AlphaGo Zero White - resigns
>
> ...which is exactly what my quick attempt to reproduce AlphaGo Zero
> yesterday converged to overnight. ;-)  But I'm afraid it's because of
> a bug, not wisdom...
>
> Petr Baudis

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Álvaro Begué
A link to the paper (from the blog post):
https://deepmind.com/documents/119/agz_unformatted_nature.pdf

Enjoy!
Álvaro.


On Wed, Oct 18, 2017 at 2:29 PM, Richard Lorentz 
wrote:

> Wow! That's very exciting. I'm glad they didn't completely shelve the
> project as they implied they might do after the match with Lee Sedol.
>
> I'm looking forward to seeing some games and "... plays unknown to
> humans", as Hassabis states.
>
> Also, I love this comment from Silver, something I have always promoted:
> The implication is that “algorithms matter much more than either computing
> or data available”.
>
> -Richard
>
>
>
> On 10/18/2017 10:50 AM, cazen...@ai.univ-paris8.fr wrote:
>
> https://deepmind.com/blog/
> http://www.nature.com/nature/index.html
>
> Impressive!
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Neural nets for Go - chain pooling?

2017-08-18 Thread Álvaro Begué
When TensorFlow was first released I used it to implement a CNN for move
prediction and evaluation, and I requested the addition of a function to
implement chain pooling: https://github.com/tensorflow/tensorflow/issues/549

It's now implemented here:
https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/unsorted-segment-max

By the time they got around to implementing it I wasn't actively doing
computer go anymore (I went back to chess for a while), so I haven't
actually used it. But it is a very natural idea.
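
A minimal sketch of how that op would implement chain pooling (untested; it
assumes the engine supplies an int32 chain id per board point, with empty
points given unique ids so they only pool with themselves):

import tensorflow as tf

def chain_pool(features, chain_ids, num_chains):
    # features:  [361, C] per-point features for a flattened 19x19 board
    # chain_ids: [361] one id per solidly-connected chain
    pooled = tf.math.unsorted_segment_max(features, chain_ids, num_chains)
    # broadcast each chain's maximum back to all of its points
    return tf.gather(pooled, chain_ids)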

Regards,
Álvaro.




On Fri, Aug 18, 2017 at 2:14 PM, David Wu  wrote:

> While browsing online, I found an interesting idea, "chain pooling",
> presented here:
> https://github.com/jmgilmer/GoCNN
>
> The idea is to have some early layers that perform a max-pool across
> solidly-connected stones. I could also imagine it being useful to perform a
> sum. So the input would be a 19x19 layer, and the output would be a 19x19
> layer where the output at a given position, if that position is occupied by
> a stone, is equal to the maximum (or the sum of) all the values in the
> input layer across all stones that are solidly connected to that group.
>
> One might imagine going further and allowing the neural net some early
> convolutional layers that determine the connectivity strength for this
> pooling between groups, so that it could choose to pool across definite
> single-point eyes or bamboo joints, etc. It's possible that one would not
> want to force all layers through this operation, so possibly only some
> feature planes would be fed through this operation, or perhaps all of them
> but the identity transformation would also be an output of the layer to
> feed into the next.
>
> Speculatively, in the best case one might imagine this has a chance to
> improve the ability of the neural net to evaluate large semeai or to judge
> the status of large dragons, by letting it propagate liberty count
> information (including virtual liberties due to approach moves) and
> information about eyes across the board more rapidly than a series of local
> convolutions could do so. In fact, it seems that convolutional layers
> followed by an early pooling of this sort would make it unnecessary to
> provide liberties as an input feature because it would become easy for the
> neural net to compute it on its own, although one would still probably want
> to provide it to save the network the effort of having to learn it.
>
> Of course, this idea could also easily turn out worthless. One thing I'm
> very not sure about is how GPU-friendly this kind of operation could be
> made to be, since I don't understand GPUs. Any thoughts?
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Álvaro Begué
Eventually exploring the entire tree is what I would call "mathematically
sound", meaning that given enough time the algorithm is guaranteed to play
optimally. I would reserve "brute force" for algorithms that simply search
every possible variant exhaustively, like John Tromp's connect 4 program
Fhourstones does [very well, I may add].

But I do smell troll too, so I'll stop here. Enough feeding.

Álvaro.



On Sun, Aug 6, 2017 at 4:08 PM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> Possibly you are answering a different question than the one posed?
> Possibly your interpretation is the one actually intended. I don’t know,
> and maybe you could be right about what was being asked.
>
>
>
> I do know the semantics of brute force, though, which you quoted below.
>
>
>
> Note that “brute force” != unintelligent. Inevitably, every brute force
> algorithm will incorporate intelligent heuristics. Consider the evolution
> of minimax, for example, via alpha-beta, selective extensions, LMR, etc.
>
> *From:* Steven Clark [mailto:steven.p.cl...@gmail.com]
> *Sent:* Sunday, August 6, 2017 2:52 PM
> *To:* Brian Sheppard 
> *Cc:* computer-go 
>
> *Subject:* Re: [Computer-go] Alphago and solving Go
>
>
>
> This is semantics. Yes, in the limit of infinite time, it is brute-force.
> Meanwhile, in the real world, AlphaGo chooses to balance its finite time
> budget between depth & width. The mere fact that the CNN policy network
> generates a score for each coordinate on the board in a given position,
> does not mean that all of those nodes will be expanded in any reasonable
> scenario.
>
>
>
> On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard  wrote:
>
> I understand why most people are saying that AlphaGo is not brute force,
> because it appears to be highly selective. But MCTS is a full width search.
> Read the AlphaGo papers, as one of the other respondents (rather
> sarcastically) suggested: AlphaGo will eventually search every move at
> every node.
>
>
>
> MCTS has the appearance of a selective search because time control
> terminates search while the tree is still ragged. In fact, it will search
> every continuation an infinite number of times.
>
>
>
> In order to have high performance, an MCTS implementation needs to search
> best moves as early as possible in each node. It is in this respect that
> AlphaGo truly excels. (AlphaGo also excels at whole board evaluation, but
> that is a separate topic.)
>
> *From:* Steven Clark [mailto:steven.p.cl...@gmail.com]
> *Sent:* Sunday, August 6, 2017 1:14 PM
> *To:* Brian Sheppard ; computer-go <
> computer-go@computer-go.org>
> *Subject:* Re: [Computer-go] Alphago and solving Go
>
>
>
> Why do you say AlphaGo is brute-force? Brute force is defined as: "In
> computer science, brute-force search or exhaustive search, also known as
> generate and test, is a very general problem-solving technique that
> consists of *systematically enumerating all possible candidates* for the
> solution and checking whether each candidate satisfies the problem's
> statement."
>
>
>
> The whole point of the policy network is to avoid brute-force search, by
> reducing the branching factor...
>
>
>
> On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go <
> computer-go@computer-go.org> wrote:
>
> Yes, AlphaGo is brute force.
>
> No it is impossible to solve Go.
>
> Perfect play looks a lot like AlphaGo in that you would not be able to
> tell the difference. But I think that AlphaGo still has 0% win rate against
> perfect play.
>
>
>
> My own best guess is that top humans make about 12 errors per game. This
> is estimated based on the win rate of top pros in head-to-head games. The
> calculation starts by assuming that Go is a win at 6.5 komi for either
> Black (more likely) or White, so a perfect player would win 100% for Black.
> Actual championship caliber players win 51% to 52% for Black. In 9-dan play
> overall, I think the rate is 53% to 54% for Black. Then you can estimate
> how many errors each player has to make to bring about such a result. E.g.,
> If players made only one error on average, then Black would win the vast
> majority of games, so they must make more errors. I came up with 12 errors
> per game, but you can reasonably get other numbers based on your model.
>
>
>
> Best,
>
> Brian
>
>
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Cai Gengyang
> *Sent:* Sunday, August 6, 2017 9:49 AM
> *To:* computer-go@computer-go.org
> *Subject:* [Computer-go] Alphago and solving Go
>
>
>
> Is Alphago brute force search?
>
> Is it possible to solve Go for 19x19 ?
>
> And what does perfect play in Go look like?
>
> How far are current top pros from perfect play?
>
>
>
> Gengyang
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
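
One toy way to make that estimate concrete (a simplification of my own, not
Brian's actual model): assume Black wins with perfect play, every decisive
error flips the game's outcome, and each player commits such errors at some
Poisson rate per game. Black then wins iff the total number of flips is even.

import math

def black_win_rate(e):  # e = outcome-flipping errors per player per game
    lam = 2.0 * e  # total flips ~ Poisson(lam)
    return 0.5 * (1.0 + math.exp(-2.0 * lam))  # P(Poisson(lam) is even)

for e in (0.5, 0.7, 1.0, 2.0):
    print(e, round(black_win_rate(e), 3))  # e = 0.7 gives about 53% for Black

Under that reading, a 53% Black win rate implies well under one decisive
error per player per game, so most of the 12 errors would have to be
non-decisive.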

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Álvaro Begué
No, it is not possible to solve go on a 19x19 board. The closest we have is
5x5, I believe. We have a pretty good idea what optimal play looks like on
7x7. The difficulty of finding optimal play on large boards is
unfathomable.

Álvaro.


On Sun, Aug 6, 2017 at 10:06 AM Cai Gengyang  wrote:

> Is Alphago brute force search?
> Is it possible to solve Go for 19x19 ?
> And what does perfect play in Go look like?
> How far are current top pros from perfect play?
>
> Gengyang
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] purpose of zero-filled feature planes in CNN

2017-07-18 Thread Álvaro Begué
I agree with you. It makes no sense. You'll take whatever linear
combinations you want and they'll all be zero.

Álvaro.


On Tue, Jul 18, 2017 at 6:53 AM, Brian Lee 
wrote:

> I've been wondering about something I've seen in a few papers (AlphaGo's
> paper, Cazenave's resnet policy architecture), which is the presence of an
> input plane filled with 0s.
>
> The input features also typically include a plane of 1s, which makes sense
> to me - zero-padding before a convolution means that the 0/1 demarcation
> line tells the CNN where the edge of the board is. But as far as I can
> tell, a plane of constant 0s should do absolutely nothing. Can anyone
> enlighten me?
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Álvaro Begué
On Tue, May 23, 2017 at 4:51 AM, Hideki Kato  wrote:

> (3) CNN cannot learn exclusive-or function due to the ReLU
> activation function, instead of traditional sigmoid (tangent
> hyperbolic).  CNN is good at approximating continuous (analog)
> functions but not Boolean (digital) ones.
>

Oh, not this nonsense with the XOR function again.

You can see a neural network with ReLU activation function learning XOR
right here:
http://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.01&regularizationRate=0&noise=0&networkShape=4,4&seed=0.96791&percTrainData=50&problem=classification

Enjoy,
Álvaro.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] What was the final score after the counting of AlphaGo-vs-Ke Jie Game #1?

2017-05-23 Thread Álvaro Begué
AlphaGo as white won by 0.5 points.



On Tue, May 23, 2017 at 3:00 AM, Jim O'Flaherty 
wrote:

> The announcer didn't have her mic on, so I couldn't hear the final score
> announced...
>
> So, what was the final score after the counting of AlphaGo-vs-Ke Jie Game
> #1?
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Aya's selfplay games for training value network

2017-04-05 Thread Álvaro Begué
Oh, that's unfortunate. When I read the description of what you did, I
thought the whole point of using a sigmoid in the last stage was
precisely so the ownership map would be reasonably correct.

Your database should still be very useful, but it feels like it should be
much easier for the DCNN to learn about life and death if you can look at
who eventually owned this part of the board.

Regards,
Álvaro.




On Wed, Apr 5, 2017 at 3:51 PM, Hiroshi Yamashita <y...@bd.mbn.or.jp> wrote:

> Dear Álvaro,
>
> It looks like "remove all dead stones" doesn't always
>>
>
> I checked this. White never passes if the position is losing under
> Tromp/Taylor rules. In this case, white can win even without removing the
> last black stone. And with some unlucky randomness, a pass was selected.
> And when a player is winning by a big margin like +30.5, both players
> tend to play bad moves, like self-atari, filling eyes, or breaking seki,
> if those moves do not change the result.
> So it is maybe not good for an ownership map.
>
> Thanks,
> Hiroshi Yamashita
>
>
> - Original Message - From: "Álvaro Begué" <alvaro.be...@gmail.com>
> To: "computer-go" <computer-go@computer-go.org>
> Sent: Thursday, April 06, 2017 1:53 AM
> Subject: Re: [Computer-go] Aya's selfplay games for training value network
>
>
> Dear Yamashita,
>
> This is a great resource and I am very thankful that you made it available
> to all of us.
>
> One minor issue: It looks like "remove all dead stones" doesn't always
> work. An example is 206_19_0114_2k_r16_add300_7/20170312_0018_05484.sgf,
> where a black stone at P18 is left on the board for no apparent reason. Do
> you know why white passed on move 420?
>
> I haven't studied how often this happens, but it means I will need to do
> some additional processing before I can use the final ownership map as
> something that a DCNN would predict.
>
> Thanks!
> Álvaro.
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Regarding MCTS playouts

2017-04-05 Thread Álvaro Begué
For identifying points that look like eyes, it's useful to have a 16-bit
value at each position of the board that contains the colors of the 8
neighbors (2 bits per neighbor, with an encoding like 00=empty, 01=black,
10=white, 11=outside). You can maintain this incrementally when a point on
the board changes color. Then a single table lookup will tell you if this
looks like an eye or not. That should speed up your light playouts
considerably.

Other arrays are useful, depending on what kinds of queries you want to
make fast.
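
A sketch of the scheme (my illustration; the eye test itself is left as a
placeholder):

EMPTY, BLACK, WHITE, OUTSIDE = 0, 1, 2, 3  # 2 bits per neighbor

def neighbor_code(neighbors):
    # neighbors: the 8 neighbor colors in a fixed order -> 16-bit index
    code = 0
    for i, c in enumerate(neighbors):
        code |= c << (2 * i)
    return code

def update_neighbor(code, slot, new_color):
    # incremental update when the point in a given slot changes color
    code &= ~(3 << (2 * slot))
    return code | (new_color << (2 * slot))

# is_eye_like = [eye_test(code) for code in range(1 << 16)]  # built once

The playout loop then answers "does this look like an eye?" with a single
array access.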

Álvaro.



On Wed, Apr 5, 2017 at 10:41 AM, Michail Drozdov 
wrote:

> Hello,
> I'm trying to write my own go playing program. Right now it is GnuGo style
> program - no dedicated search is performed. It is pleasure to see bot
> improving, but recently I decided to learn and implement MCTS, meaning I
> have to use my go board implementation often (for playouts).
> I see some fundamental performance problems, but can't decide if it is
> lack of my programming knowledge or need of paradigm shift. I'm having
> around 1000 playouts/s with very light playouts (which, as I understand, is
> very small number). My playouts consist of selecting all empty fields,
> which are not potential eyes of player to move and playing one at random.
> Benchmarking shows, that current biggest offender is function, which checks
> if field is potential eye, by checking values of color array around point
> of interest.
> My board implementation is strightforward - one dimensional array of
> colors of intersections (black, white, empty or border) and some additional
> info like player to move, ko position, etc. Not going into programming part
> I would like to ask, if there would be benefit of also having other arrays
> for liberties, numbers of neighbours and so on, which I would update
> incrementally and then check? Looks like such libraries like Fuego or
> libego use this approach. Or I am looking at this problem from wrong side
> and need to search for bottlenecks in code itself? Thank you in advance.
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zen lost to Mi Yu Ting

2017-03-22 Thread Álvaro Begué
Thank you, Gian-Carlo. I couldn't have said it better.

Álvaro.



On Wed, Mar 22, 2017 at 7:07 AM, Gian-Carlo Pascutto  wrote:

> On 22-03-17 09:41, Darren Cook wrote:
> >> The issue with Japanese rules is easily solved by refusing to play
> >> under ridiculous rules. Yes, I do have strong opinions. :)
> >
> > And the problem with driver-less cars is easily "solved" by banning
> > all road users that are not also driver-less cars (including all
> > pedestrians, bikes and wild animals).
>
> I think you misunderstand the sentiment completely. It is not: Japanese
> rules are difficult for computers, so we don't like them.
>
> It is: Japanese rules are problematic on many levels, so we prefer to
> work with Chinese ones and as a consequence that's what the programs are
> trained for and tested on. It is telling that Zen is having these
> troubles despite being made by Japanese programmers. I believe the
> saying for this is "voting with your feet".
>
> > Or how about this angle: humans are still better than the programs
> > at Japanese rules. Therefore this is an interesting area of study.
>
> Maybe some people are interested in studying Japanese rules, like
> finding out what they actually are
> (http://home.snafu.de/jasiek/j1989c.html). That's fine, but not all that
> interesting for AI, or, actually, computer go.
>
> Of course, commercial programs that need to cater to a Japanese (or
> Korean) audience are stuck. As are people that want to play the UEC Cup
> etc.
>
> --
> GCP
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zen lost to Mi Yu Ting

2017-03-21 Thread Álvaro Begué
I was thinking the same thing. You can easily equip the value network with
several outputs, corresponding to several settings of komi, then train as
usual.
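
A minimal sketch of such a head (the trunk, input shape and komi set here
are placeholders of mine, not taken from any real program):

import tensorflow as tf

KOMIS = [5.5, 6.5, 7.5]                     # one output per komi setting
inputs = tf.keras.Input(shape=(19, 19, 8))  # whatever planes the net already uses
x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Flatten()(x)
value = tf.keras.layers.Dense(len(KOMIS), activation="sigmoid")(x)
model = tf.keras.Model(inputs, value)
# targets: the same self-play games, rescored once for each komi in KOMIS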

The issue with Japanese rules is easily solved by refusing to play under
ridiculous rules. Yes, I do have strong opinions. :)

Álvaro.



On Tue, Mar 21, 2017 at 7:36 PM,  wrote:

>
> Why can't you reuse the same self-played games but score them with a
> different komi value? The policy network does not use the komi to choose
> its moves, so it should make no difference.
>
>
> > On 21/03/2017 21:08, David Ongaro wrote:
> >>> But how would you fix it? Isn't that you'd need to retrain your value
> >>> network from the scratch?
> >>
> >> I would think so as well. But some months ago I already made a
> >> proposal in this list to mitigate that problem: instead of training a
> >> different value network for each Komi, add a "Komi adjustment" value as
> >> input during the training phase. That should be much more effective,
> >> since the "win/lost" evaluation shouldn't change for many (most?)
> >> positions for small adjustments, but the resulting value network (when
> >> trained for different Komi adjustments) has a much greater range of
> >> applicability.
> >
> > The problem is not the training of the network itself (~2-4 weeks of
> > letting a program someone else wrote run in the background, easiest
> > thing ever in computer go), or whether you use a komi input or a
> > separate network, the problem is getting data for the different komi
> > values.
> >
> > Note that if getting data is not a problem, then a separate network
> > would perform better than your proposal.
> >
> > --
> > GCP
> > ___
> > Computer-go mailing list
> > Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
> >
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] New AMD processors

2017-03-04 Thread Álvaro Begué
Oh, you are using a value net? How did you train it? I don't see anything
about it in the bitbucket repository...

Álvaro.

P.S.- Sorry about the thread hijacking, everyone.


On Sat, Mar 4, 2017 at 4:29 AM, Detlef Schmicker  wrote:

> I looked into this too:
>
> oakfoam would not benefit a lot from more cpu power at the moment, with
> 4 cores I mix 10 playouts with the value net in the ratio (3:7) at the
> moment.
>
> In case of buying a Ryzen: make sure the board allows two GTX 1080 Ti
> (wait till the end of March to buy them) and buy a power supply with 1 kW
> (and send a copy of the machine to me) :)
>
> Detlef
>
> P.S.: oakfoam status
> http://www.dragongoserver.net/userinfo.php?uid=97704
> http://www.yss-aya.com/cgos/19x19/cross/NG-06.html
>
> On 03.03.2017 at 21:29, "Ingo Althöfer" wrote:
> > Hi,
> >
> > AMD has released a new (fast and cool) processor, the Ryzen.
> > Have some go programmers already collected experiences with it?
> > Do they combine well with GPUs?
> >
> > Ingo.
> > ___
> > Computer-go mailing list
> > Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
> >
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Álvaro Begué
I should point out that Reinforcement Learning is a relatively unimportant
part of AlphaGo, according to the paper. They only used it to turn the
move-prediction network into a stronger player (presumably increasing the
weights of the layer before SoftMax would do most of the job, by making the
highest-probability move the vast majority of the time). This stronger
player was then used to label positions by the result of self-play, and
these labelled positions were then used to learn a "value network" (what we
call an "evaluation function" in computer chess). The final AlphaGo playing
program doesn't actually use the RL network at all.


> Is there any success case of a chess, go, or any kind of complex
strategic game playing algorithm, where it gained expert strength without
domain knowledge such as expert game examples?

Yes! In 2015 I made a Spanish checkers program that used a neural network
as evaluation function (121 binary inputs, two hidden layers with 32 units
each and ReLU activation functions and one tanh output unit). I started by
generating 6-men tablebases, because the endgames are hard to get right
with shallow searches, and it is critical to know them to learn what
"enough advantage to win" is. I bootstrapped the learning process by
playing games with random evaluation function and depth-6 search. Then I
used these games to train the neural network, then played games with the
neural network, at around 0.1s/move (that's about 100K nodes searched).
Rinse, repeat (3 times, I think). The first time around, the program didn't
really know if it was better to have more pieces than the opponent or
fewer. By the 3rd iteration it had at least as much positional knowledge of
the game as I do, and I spent a lot of time in my teenage years playing
this game.
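
For concreteness, the whole evaluation network is tiny by today's standards;
a sketch (the framework choice is mine, the shapes are as described above):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(121,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="tanh"),  # evaluation in [-1, 1]
])
model.compile(optimizer="adam", loss="mse")  # fit to self-play game results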

One important detail: The first few moves of the games were played using
UCT as a sort of opening book, to create a lot of variety. I should also
point out that the last move in the UCT tree is pretty much random. I
believe both AlphaGo and Giraffe used random moves when generating
databases to learn from.

I sent the program to my friend José Manuel Morán, who is probably the
strongest player in Spain (there are several comparably strong players in
Portugal, where the game is more popular). After a few days his report was
"It cannot be beaten."

Perhaps alpha-beta on a modern computer is too successful for this game to
really be able to tell if the evaluation function was very good, but it was
certainly not bad, and I didn't use any human knowledge anywhere in the
process of producing it.

Regards,
Álvaro.



On Fri, Feb 24, 2017 at 3:39 AM, Minjae Kim  wrote:

> I've recently viewed the paper of AlphaGo, which has done gradient-based
> reinforcement learning to get stronger. The learning was successful enough
> to beat a human master, but in this case, supervised learning with a large
> database of master level human games was preceded the reinforcement
> learning. For a complex enough game as go, one can expect that the search
> space for the policy function would not be smooth at all. So supposedly
> supervised learning was necessary to guide the policy function to a good
> starting point before reinforcement. Without such, applying reinforcement
> learning directly to a random policy can easily get the policy stuck at a
> bad local optimum. I could have a misunderstanding at this point; correct me
> if so, but to continue on: if it is hard to have "the good starting point"
> such as a trained policy from human expert game records, what is a way to
> devise one. I've had a look on NEAT and HyperNEAT, which are evolutionary
> methods. Do these evolutionary algorithms scale well on complex strategic
> decision processes and not just on simple linear decisions such as food
> gathering and danger avoidance? In case not, what alternatives are known?
> Is there any success case of a chess, go, or any kind of complex strategic
> game playing algorithm, where it gained expert strength without domain
> knowledge such as expert game examples?
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Fwd: Playout policy optimization

2017-02-12 Thread Álvaro Begué
Thanks, Rémi!

-- Forwarded message --
From: Rémi Coulom <remi.cou...@free.fr>
Date: Sun, Feb 12, 2017 at 4:24 AM
Subject: Playout policy optimization
To: Álvaro Begué <alvaro.be...@gmail.com>


Hi Alvaro,

I cannot post to the list any more. Please forward this message to the list
if you can.

Your idea is simulation balancing:
https://www.remi-coulom.fr/CG2010-Simulation-Balancing/
http://www.machinelearning.org/archive/icml2009/papers/500.pdf

Maybe using the actual result of a game instead of the outcome of a tree
search might give better results than what we got.

Rémi

PS: This is what I got when I tried to post to the list:
<computer-go@computer-go.org>: host mail.eugeneweb.com[184.105.139.163] said:
554 5.7.1 <remi.cou...@free.fr>: Sender address rejected: Access denied
(in reply to RCPT TO command)
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Playout policy optimization

2017-02-11 Thread Álvaro Begué
Hi,

I remember an old paper by Rémi Coulom ("Computing Elo Ratings of Move
Patterns in the Game of Go") where he computed "gammas" (exponentials of
scores that you could feed to a softmax) for different move features, which
he fit to best explain the move probabilities from real games.

Similarly, AlphaGo's paper describes how their rollout policy's weights are
trained to maximize log likelihood of moves from a database.

However, there is no a priori reason why imitating the probabilities
observed in reference games should be optimal for this particular purpose.

I thought about this for about an hour this morning, and this is what I
came up with. You could make a database of positions with a label
indicating the result (perhaps from real games, perhaps similarly to how
AlphaGo trained their value network). Loop over the positions, run a few
playouts and tweak the move probabilities by some sort of reinforcement
learning, where you promote the move choices from playouts whose outcome
matches the label, and you discourage the move choices from playouts whose
outcome does not match the label.

The point is that we would be pushing our playout policy to produce good
estimates of the result of the game, which in the end is what playout
policies are for.
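
In pseudo-code, the update I have in mind is a REINFORCE-style step on a
linear-softmax playout policy (my formulation; all names are made up):

import numpy as np

def update_policy(theta, playout, matched_label, lr=0.01):
    # playout: list of (X, a) pairs, where X is the [moves x features]
    # matrix at one position and a is the index of the move played
    sign = 1.0 if matched_label else -1.0  # promote or discourage
    for X, a in playout:
        z = X @ theta
        p = np.exp(z - z.max())
        p /= p.sum()
        theta += lr * sign * (X[a] - p @ X)  # gradient of log p(a | X)
    return theta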

Any thoughts? Did anyone actually try something like this?

Cheers,
Álvaro.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Notes from the Asilomar Conference on Beneficial AI

2017-02-09 Thread Álvaro Begué
If you like video commentary, Haylee has five game reviews, starting with
this one: https://www.youtube.com/watch?v=b_24iaUMRFs&t=1109s

You may also enjoy this lecture (probably best for kyu players):
https://www.youtube.com/watch?v=v8Eh41m7gVA (you may want to skip to around
9:00).

Enjoy,
Álvaro.



On Thu, Feb 9, 2017 at 9:28 PM, Shawn Ligocki  wrote:

> I remember hearing widespread surprise at many of AlphaGo's moves vs. Lee
> Sedol:
> * Game 1 - Moves 10&12 - Attach and pull back after pincer. Which is
> widely taught as bad play.
> * Game 2 - Move 37 - 5th line shoulder hit. Likewise students are taught
> to almost never play 5th line shoulder hit.
>
> and to a lesser extent:
> * Game 2 - Move 15 - Peeping so early.
>
> Those are the examples I remember. Don't remember any references to style
> pros didn't like (except maybe that AlphaGo likes the middle more than most
> pros).
>
> On Thu, Feb 9, 2017 at 7:32 PM, Richard J Lorentz 
> wrote:
>
>> Thanks for the interesting link. Indeed, some good reading there.
>>
>> One quote that I've seen various versions of a number of times now: "
>> More interesting for the rest of us, AlphaGo is playing moves and styles
>> that all human masters had dismissed as stupid centuries ago."
>>
>> Can any one point me to concrete examples of such moves, styles, and/or
>> games? With some commentary?!
>>
>> Thanks!
>>
>> -Richard
>>
>>
>> On 02/09/2017 02:42 PM, Freeman Ng wrote:
>>
>> My favorite blogger's account of this conference that the Deep Mind team
>> also participated in.
>>
>> http://slatestarcodex.com/2017/02/06/notes-from-the-asilomar-conference-on-beneficial-ai/
>>
>> Note: he's neither a Go player nor a computer programmer, which limits
>> his AlphaGo related reporting, but I thought this might be interesting to
>> the list nonetheless.
>>
>> Freeman
>>
>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Our Silicon Overlord

2017-01-07 Thread Álvaro Begué
If you are killed by an AI-driven car, the manufacturer will use the case
to improve the algorithm and make sure that this type of death never
happens again. Unfortunately a death by a drunk driver doesn't seem to
teach anyone anything and will keep happening as long as people need to
drive and alcoholism exists.



On Sat, Jan 7, 2017 at 10:35 PM, Gonçalo Mendes Ferreira 
wrote:

> Well, I don't know what is the likelihood of being hit by drunk drivers
> or AI driven cars, but if it were the same I'd prefer to have drunk
> drivers. Drunk drivers you can understand: you can improve your chances
> by making yourself more visible, do not jump from beyond obstacles, be
> more careful when crossing or not crossing before they actually stop. A
> failure in an AI car seems much more unpredictable.
>
> Gonçalo
>
> On 07/01/2017 21:24, Xavier Combelle wrote:
> >
> >> ...this is a major objective. E.g., we do not want AI driven cars
> >> working right most of the time but sometimes killing people because
> >> the AI faces situations (such as a local sand storm or a painting on
> >> the street with a fake landscape or fake human being) outside its
> >> current training and reading.
> > currently I don't like to be killed by a drunk driver, and to my opinion
> > it is very more likely to happen than an AI killing me because a mistake
> > in programming (I know, it is not the point of view of most of people
> > which want a perfect AI with zero dead and not an AI which would reduce
> > the death on road by a factor 100)
> > ___
> > Computer-go mailing list
> > Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
> >
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Álvaro Begué
On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom  wrote:

> It makes the policy stronger because it makes it more deterministic. The
> greedy policy is way stronger than the probability distribution.
>

I suspected this is what it was mainly about. Did you run any experiments
to see if that explains the whole effect?



>
> Rémi
>
> - Original Message -
> From: "Detlef Schmicker" 
> To: computer-go@computer-go.org
> Sent: Sunday, December 11, 2016, 11:38:08
> Subject: [Computer-go] Some experiences with CNN trained on moves by the
> winning player
>
> I want to share some experience training my policy CNN:
>
> As I wondered why reinforcement learning was so helpful, I trained
> from the GoGoD database using only the moves by the winner of
> each game.
>
> Interestingly, the prediction rate on these moves was slightly higher
> (without training, just taking the previously trained network) than
> when taking into account the moves by both players (53% against 52%).
>
> Training on the winning player's moves did not help a lot; I got a
> statistically significant improvement of about 20-30 Elo.
>
> So I still don't understand why reinforcement learning should gain around
> 100-200 Elo :)
>
> Detlef
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Time policy

2016-11-04 Thread Álvaro Begué
Start by computing a "normal" amount of time to spend, using the kinds of
rules described by others in this thread.

Since you are using MCTS, you may want to experiment with spending more
time if the move with the best score is not the one that has been explored
the most, since that probably signals an unclear situation.

You can save a bit of time in obvious moves once you realize that the other
moves cannot catch up in the normal time allocated for this move.
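
Put together, a sketch of such a policy (the constants are placeholders; the
base formula is Urban's, quoted below):

def think_time(remaining, vacant_points, best_is_most_visited,
               C=0.5, M=24, unclear_factor=1.5):
    base = remaining / (C * max(vacant_points, M))
    # spend longer when the best-scoring move is not the most-visited one
    return base if best_is_most_visited else base * unclear_factor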

Álvaro.



On Fri, Nov 4, 2016 at 5:02 AM, Gonçalo Mendes Ferreira 
wrote:

> I similarly use C x (T / E) in matilda, with
> C = 1.24
> T = time left on current period (absolute or byo yomi)
> E = max(estimate of game length divided by two; 19) or byo yomi stones
> remaining
>
> The length estimate is around 2/3 of the board points.
>
> It lacks the decision to expend a byo yomi period in more difficult
> positions.
>
> Gonçalo Ferreira
> Em 04/11/2016 08:40, Urban Hafner  escreveu:
>
> I think there are some short papers about it out there. But I would
> suggest looking at the source code of existing bots like michi or pachi.
> What I use in my bot is really simple. I use the following formula:
>
> time for next move = remaining time / (C * max(vacant points, M))
>
> Where C is some constant you need to figure out (I use 0.5 right now),
> “vacant points” is the number of empty intersections on the board and M is
> a lower limit (I currently use 24) so that you don’t use up too much time
> that you might need when a capture happens.
>
> It has worked well enough so far that I haven’t looked at more intricate
> algorithms. Oh, and once I hit byo-yomi time I just divide the time into
> equal parts by the number of stones for the byo-yomi period.
>
> Urban
>
> On Fri, Nov 4, 2016 at 9:00 AM, Gian-Carlo Pascutto  wrote:
>
> On 04-11-16 04:45, Billy White wrote:
> > Hi,
> >
> > Our team is working on a computer go system that mainly follows AlphaGo.
> > We are trying to add a time policy to our system but cannot find anything
> > useful.
> >
> > I am wondering whether there is some useful material?
>
> Take a large games database, and construct a table of expected number of
> moves remaining based on the current move of the game.
>
> Divide total amount of time left by the output of that table.
>
> Test if biasing it to think slightly longer early on helps playing
> strength.
>
> If there is byo-yomi time. the required extra thinking time generally
> flows logically from the byo-yomi timecontrol and the above.
>
> --
> GCP
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
> --
> Blog: http://bettong.net/
> Twitter: https://twitter.com/ujh
> Homepage: http://www.urbanhafner.com/
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Having an "estimated winrate" on the AGA pro game broadcasts

2016-08-31 Thread Álvaro Begué
One could use a curve to map the MC winning rate to an actual winning
probability. It would take only thousands of games to learn such a curve
(as opposed to the 30 million games used to train the value network in
AlphaGo).
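
A sketch of fitting such a curve as a one-dimensional logistic regression on
(raw winrate, game result) pairs (my formulation):

import numpy as np

def fit_calibration(raw_rates, outcomes, steps=2000, lr=0.5):
    # raw_rates: MCTS winrates in (0,1); outcomes: 1 = win, 0 = loss
    r = np.clip(np.asarray(raw_rates, dtype=float), 1e-6, 1 - 1e-6)
    x = np.log(r / (1 - r))                # logit of the raw winrate
    y = np.asarray(outcomes, dtype=float)
    a, b = 0.0, 1.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a + b * x)))
        g = y - p                          # gradient of the log-likelihood
        a += lr * g.mean()
        b += lr * (g * x).mean()
    return a, b  # calibrated probability = sigmoid(a + b * logit(raw))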

Álvaro.

On Wed, Aug 31, 2016 at 8:24 PM, Dan Schmidt  wrote:

> Hi Andrew, I enjoy the AGA broadcasts. I am mostly just an observer on
> this list rather than a go programming expert, but I do have a couple of
> things to say on this topic:
>
> 1. This is done all the time on chess broadcasts these days, and I hate
> it. I want to know what a strong player thinks is going on, not a computer.
> This is a bigger "problem" with chess than go because computers are now
> hugely stronger than the pro commentators, so the commentators become
> largely reduced to interpreting the computer output rather than getting
> across their own human ideas. Of course, a) this is less of an issue right
> now as go programs (except for AlphaGo, which is private) are not quite pro
> strength yet, b) top-level go games have less volatile swings than chess
> games, and perhaps most importantly c) my opinion may be in the minority.
>
> 2. Win rate is tough. Many programs can produce the win rate of their
> Monte Carlo Tree Search, but because that involves looking at lots of
> suboptimal moves on both sides, the reported win rate is closer to 50% than
> it should be. A reported win rate of 60% (in the futures explored by MCTS)
> probably means at least 80% in reality, assuming high-level play on both
> sides.
>
> Crazy Stone reports an evaluation (like B+3.5) and confidence though it
> cautions against taking it too seriously. My assumption is that it's
> something like the median result, and standard deviation, of all the MCTS
> playouts. I find this more useful than the win rate it provides.
>
> Dan
>
>
> On Wed, Aug 31, 2016 at 6:41 PM, Jackson, Andrew 
> wrote:
>
>> Hello all!  Long time lurker, first time poster:  I'm Andrew Jackson, i
>> volunteer w/ the AGA in a number of capacities.
>>
>> Most recently, i've been putting together live broadcasts of games --
>> professional games from CJK, and also games from AGA events [1]
>>
>> These broadcasts feature professional commentary to help amateurs
>> understand what's going on, but the number one question we get in the chat
>> during the broadcasts is invariably "who's winning?"  I was hoping
>> computer-go could provide some suggestions on how we might run the
>> game-in-progress through an engine to get an 'estimated winrate' -- or
>> better yet, a graph -- that could be overlaid in the corner of the screen.
>>
>> Which brings me to the computer-go mailing list :)  It seems like someone
>> on this mailing list would probably have some good ideas about how this
>> might be accomplished :)  What do you think?
>>
>>
>> Here are some more details:
>>
>> The game sgf is kept mirrored on KGS by a human transcriber w/ permission
>> from the KBA.
>>
>> The overlay for these broadcasts is already rendered as a local webserver
>> hosting a simple html shell holding the graphics; my thought was that this
>> server could easily make requests to another service somewhere.  That
>> service would be tracking the game and able to send back some json
>> describing the engine's view of the game, its confidence, etc.  We could
>> then plot with d3.js right on the overlay.
>>
>> The service would need to run somewhere else, as the computer streaming
>> the game is already using all the CPU for transcoding & pushing the video.
>>
>> Given general overview of the goal & constraints, I ask you:
>>
>>  - Does this sound plausible?
>>  - Any ideas for an engine that would be easy to hook this into?
>>  - Are bots capable of spectating games on KGS?
>>
>> Thanks in advance for any help you might provide!
>> -Andrew
>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Converging to 57%

2016-08-23 Thread Álvaro Begué
There are situations where carefully crafting the minibatches makes sense.
For instance, if you are training an image classifier it is good to build
the minibatches so the classes are evenly represented. In the case of
predicting the next move in go I don't expect this kind of thing will make
much of a difference.

I got to around 52% on a subset of GoGoD using ideas from the ResNet paper (
https://arxiv.org/abs/1512.03385). I used 128x20 followed by 64x20 and
finally 32x20, with skip connections every two layers. I started the
training with Adam(1e-4) and later on I lowered it to 1e-5 and eventually
1e-6. The only inputs I use are the signed count of liberties (positive for
black, negative for white), the age of each stone capped at 8, and a block
of ones indicating where the board is.

I'll be happy to share some code if people are interested.
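
In the meantime, a reconstruction of the three input planes from the
description above (the board API here is hypothetical):

import numpy as np

def input_planes(board, n=19):
    planes = np.zeros((3, n, n), dtype=np.float32)
    for y in range(n):
        for x in range(n):
            c = board.color(x, y)  # assumed: +1 black, -1 white, 0 empty
            if c != 0:
                planes[0, y, x] = c * board.liberties(x, y)  # signed liberties
                planes[1, y, x] = min(board.age(x, y), 8)    # age capped at 8
    planes[2, :, :] = 1.0  # all ones: marks where the board is
    return planes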

Álvaro.



On Tue, Aug 23, 2016 at 7:29 AM, Gian-Carlo Pascutto  wrote:

> On 23/08/2016 11:26, Brian Sheppard wrote:
> > The learning rate seems much too high. My experience (which is from
> > backgammon rather than Go, among other caveats) is that you need tiny
> > learning rates. Tiny, as in 1/TrainingSetSize.
>
> I think that's overkill, as in you effectively end up doing batch
> gradient descent instead of mini-batch/SGD.
>
> But yes, 0.01 is rather high with momentum. Try 0.001 for methods with
> momentum, and with the default Adam parameters you have to go even lower
> and try 0.0001.
>
> > Neural networks are dark magic. Be prepared to spend many weeks just
> > trying to figure things out. You can bet that the Google & FB results
> > are just their final runs.
>
> As always it's sad nobody publishes what didn't work, saving us the time
> of trying it all over again :-)
>
> > Changing batching to match DarkForest style (making sure that a
> > minibatch contains samples from game phases... for example
> > beginning, middle and end-game).
>
> This sounds a bit suspicious. The entries in your minibatch should be
> randomly selected from your entire training set, so statistically having
> positions from all phases would be guaranteed. (Or you can shuffle the
> entire training set before the epoch, instead of randomly picking during
> it).
>
> Don't feed the positions in in-order or from the same game...
>
> --
> GCP
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Creating the playout NN

2016-06-12 Thread Álvaro Begué
I don't understand the point of using the deeper network to train the
shallower one. If you had enough data to be able to train a model with many
parameters, you have enough to train a model with fewer parameters.
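
For context, the proposal quoted below is what the linked paper calls model
compression or mimicry: train the small net on the big net's soft outputs
instead of the raw labels. A minimal sketch of one such mimicry loss (my
formulation):

import tensorflow as tf

def mimic_loss(teacher_logits, student_logits):
    soft_targets = tf.nn.softmax(teacher_logits)  # the deep net's distribution
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=soft_targets, logits=student_logits))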

Álvaro.


On Sun, Jun 12, 2016 at 5:52 AM, Michael Markefka <
michael.marke...@gmail.com> wrote:

> Might be worthwhile to try the faster, shallower policy network as a
> MCTS replacement if it were fast enough to support enough breadth.
> Could cut down on some of the scoring variations that confuse rather
> than inform the score expectation.
>
> On Sun, Jun 12, 2016 at 10:56 AM, Stefan Kaitschick
>  wrote:
> > I don't know how the added training compares to direct training of the
> > shallow network.
> > It's prob. not so important, because both should be much faster than the
> > training of the deep NN.
> > Accuracy should be slightly improved.
> >
> > Together, that might not justify the effort. But I think the fact that
> you
> > can create the mimicking NN, after the deep NN has been refined with self
> > play, is important.
> >
> > On Sun, Jun 12, 2016 at 9:51 AM, Petri Pitkanen <
> petri.t.pitka...@gmail.com>
> > wrote:
> >>
> >> Would the expected improvement be reduced training time or improved
> >> accuracy?
> >>
> >>
> >> 2016-06-11 23:06 GMT+03:00 Stefan Kaitschick
> >> :
> >>>
> >>> If I understood it right, the playout NN in AlphaGo was created by
> using
> >>> the same training set as the one used for the large NN that is used in
> the
> >>> tree. There would be an alternative though. I don't know if this is
> the best
> >>> source, but here is one example: https://arxiv.org/pdf/1312.6184.pdf
> >>> The idea is to teach a shallow NN to mimic the outputs of a deeper net.
> >>> For one thing, this seems to give better results than direct training
> on the
> >>> same set. But also, more importantly, this could be done after the
> large NN
> >>> has been improved with selfplay.
> >>> And after that, the selfplay could be restarted with the new playout
> NN.
> >>> So it seems to me, there is real room for improvement here.
> >>>
> >>> Stefan
> >>>
> >>> ___
> >>> Computer-go mailing list
> >>> Computer-go@computer-go.org
> >>> http://computer-go.org/mailman/listinfo/computer-go
> >>
> >>
> >>
> >> ___
> >> Computer-go mailing list
> >> Computer-go@computer-go.org
> >> http://computer-go.org/mailman/listinfo/computer-go
> >
> >
> >
> > ___
> > Computer-go mailing list
> > Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Hajin Lee will play a live commented game against CrazyStone

2016-05-22 Thread Álvaro Begué
Disclaimer: I haven't actually implemented MCTS with NNs, but I have played
around with both techniques.

Would it make sense to artificially scale down the values before the
SoftMax is applied, so the probability distribution is not as concentrated,
and unlikely moves are not penalized as much?
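
Concretely, "scaling down before the softmax" is just a temperature
parameter (sketch):

import numpy as np

def softmax_with_temperature(z, T=2.0):
    z = np.asarray(z, dtype=float) / T  # T > 1 flattens the distribution
    e = np.exp(z - z.max())
    return e / e.sum()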



On Sun, May 22, 2016 at 3:54 PM, Rémi Coulom  wrote:

> Hi,
>
> Thanks for using Crazy Stone.
>
> I tried changes during the week, but nothing worked. So the version that
> played the game was almost identical to the commercial version.
>
> The search did not anticipate Black E8 after B3. It seems the NN makes the
> search too selective. I will investigate more.
>
> Rémi
>
> - Original Message -
> From: "Paweł Morawiecki" 
> To: computer-go@computer-go.org
> Sent: Sunday, May 22, 2016, 21:29:56
> Subject: Re: [Computer-go] Hajin Lee will play a live commented game
> against CrazyStone
>
> Hi,
>
> It's fun to hear the pro making comments as she goes. I had hoped for a
> better game, though.
> Any comments from the CS camp?
>
> I'm not from the CrazyStone team, but a happy user of CS Deep Learning.
>
> I analyzed the game (30,000 playouts per move) with the version
> commercially available and it got everything right. I mean every move Hajin
> Lee questioned was also questioned by CrazyStone running on my PC. It
> includes:
>
> - the questionable attachment in the first joseki they played
> - the hane in the upper-left corner
> - and finally the blunder at b3 (the losing move), and the selection of
> this joseki as well
>
> Remi said that he's been working hard to improve CS over the last week,
> but it looks like something went wrong and instead he got a clearly weaker
> program. Particularly this B3 is really strange, where right after it
> white catches everything with one move. Remi, what went wrong?
>
> Cheers,
> Paweł
>
> Thanks,
> Álvaro.
>
> On Mon, May 16, 2016 at 3:58 AM, Xavier Combelle <
> xavier.combe...@gmail.com > wrote:
>
> That's fantastic.
>
> I suppose CrazyStone will play with the crazystone account, but what will
> be her handle?
>
> 2016-05-16 9:50 GMT+02:00 Rémi Coulom < remi.cou...@free.fr >:
>
> Hi,
>
> I am very happy to announce that Hajin Lee will play a live commented game
> against Crazy Stone on Sunday, at 8PM Korean time. The game will take place
> on KGS, and she will make live comments on her youtube channel.
>
> Haylee's youtube:
> https://www.youtube.com/c/HayleesWorldofGoBaduk
>
> Rémi
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Hajin Lee will play a live commented game against CrazyStone

2016-05-22 Thread Álvaro Begué
I just saw the video here: https://www.youtube.com/watch?v=ZdrV2H5zIOM

It's fun to hear the pro making comments as she goes. I had hoped for a
better game, though.

Any comments from the CS camp?

Thanks,
Álvaro.



On Mon, May 16, 2016 at 3:58 AM, Xavier Combelle 
wrote:

> That's fantastic
>
> I suppose crazystone will play with crazystone account, but what will be
> her handle ?
>
> 2016-05-16 9:50 GMT+02:00 Rémi Coulom :
>
>> Hi,
>>
>> I am very happy to announce that Hajin Lee will play a live commented
>> game against Crazy Stone on Sunday, at 8PM Korean time. The game will take
>> place on KGS, and she will make live comments on her youtube channel.
>>
>> Haylee's youtube:
>> https://www.youtube.com/c/HayleesWorldofGoBaduk
>>
>> Rémi
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Machine for Deep Neural Net training

2016-04-27 Thread Álvaro Begué
What are you doing that uses so much disk space? An extremely naive
computation of required space for what you are doing is:
30M samples * (42 input planes + 1 output plane)/sample * 19*19
floats/plane * 4 bytes/float = 1.7 TB

So that's cutting it close, But I think the inputs and outputs are all
binary, which allows a factor of 32 compression right there, and you might
be using constant planes for some inputs, and if the output is a move it
fits in 9 bits...

Álvaro.


On Wed, Apr 27, 2016 at 12:55 AM, David Fotland 
wrote:

> I have my deep neural net training setup working, and it's working so well
> I
> want to share.  I already had Caffe running on my desktop machine (4 core
> i7) without a GPU, with inputs similar to AlphaGo generated by Many Faces
> into an LMDB database.  I trained a few small nets for a day each to get
> some feel for it.
>
> I bought an Alienware Area 51 from Dell, with two GTX 980 TI GPUs, 16 GB of
> memory, and 2 TB of disk.  I set it up to dual boot Ubuntu 14.04, which
> made
> it trivial to get the latest caffe up and running with CUDNN.  2 TB of disk
> is not enough.  I'll have to add another drive.
>
> I expected something like 20x speedup on training, but I was shocked by
> what
> I actually got.
>
> On my desktop, the Caffe MNIST sample took 27 minutes to complete.  On the
> new machine it was 22 seconds.  73x faster.
>
> My simple network has 42 input planes, and 4 layers of 48 filters each.
> Training runs about 100x faster on the Alienware.  Training 100k Caffe
> iterations (batches) of 50 positions takes 13 minutes, rather than almost a
> full day on my desktop.
>
> David
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Beginner question : how to choose a board representation

2016-04-10 Thread Álvaro Begué
Hi,

I also did computer chess before go (and checkers before chess). I would
start with a straight-forward implementation and learn with it. If you end
up finding your board representation limiting, rewrite it.

Here's some code from my program:

int const N = 19;
int const XN = N + 2;    // board size plus a one-point border on each side
int const XN2 = XN * XN; // padded board area

enum class Color : char {
  Empty, Black, White, Outside  // Outside marks the border points
};

inline bool is_occupied(Color c) {
  return c == Color::Black || c == Color::White;
}

inline Color enemy(Color c) {
  return c == Color::Black ? Color::White
       : c == Color::White ? Color::Black
                           : Color::Empty;
}

struct Board {
  Color array[XN2];         // color of each padded point
  int chain_id[XN2];        // stones in the same chain share an id
  int chain_size[XN2];
  int chain_liberties[XN2];
  int next_in_chain[XN2];   // linked list connecting the stones of a chain
  int age[XN2];             // age of the stone on each point
[...]



On Sun, Apr 10, 2016 at 3:19 AM, Jean-Francois Romang 
wrote:

> Hello to everyone ; I'm a newcomer in this list and computer go
> programming. I have a chess programming background, but I want to start
> something new. :-)
> I'm currently in the early phases of developing GTP compatible go engine ;
> now it's time for me to choose a board representation : are there some
> articles or tips on this ?
> Thanks,
> Jean-Francois
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] new challenge for Go programmers

2016-03-31 Thread Álvaro Begué
A very simple-minded way of trying to identify what a particular neuron in
the upper layers is doing is to find the 50 positions in the database that
make it produce the highest activation values. If the neuron is in one of
the convolutional layers, you get a full 19x19 image of activation values,
which would let you figure out what particular local pattern it seems to be
detecting. If the neuron is in a fully-connected layer at the end, you only
get one overall value, but you could still try to compute the gradient of
its activation with respect to all the inputs, and that would tell you
something about what parts of the board led to this activation being high.
I think this would be a fun exercise, and you'll probably be able to
understand something about at least some of the neurons.
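
In symbols (my notation): for a neuron with activation a, the per-input
saliency is s_j = |da/dx_j|, evaluated at the position in question, where
x_j runs over the values of the input planes. Points with a large s_j are
the ones whose perturbation would change the activation the most, which
gives a crude but useful "where is it looking" map.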

Álvaro.



On Thu, Mar 31, 2016 at 9:55 AM, Michael Markefka <
michael.marke...@gmail.com> wrote:

> Then again DNNs also manage feature extraction on unlabeled data with
> increasing levels of abstraction towards upper layers. Perhaps one
> could apply such a specifically trained DNN to artificial board
> situations that emphasize specific concepts and examine the network's
> activation, trying to map activation patterns to human Go concepts.
>
> Still hard work, and questionable payoff, but just wanted to pitch
> that in as idea.
>
>
> > However, if someone was to do all the dirty work setting up all the
> > infrastructure, hunt down the training data and then financially
> facilitate
> > the thousands of hours of human work and the tens to hundreds of
> thousands
> > of hours of automated learning work, I would become substantially more
> > interested...and think a high quality desired outcome remains a low
> > probability.
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] new challenge for Go programmers

2016-03-30 Thread Álvaro Begué
> no lack of respect for DeepMind's achievement was contained in my
> posting; on the contrary, i was as surprised as anyone at how well she
> did and it gave me great pause for thought.
>

Well, you wrote this:

> but convolutional neural networks and monte-carlo simulators have not
> advanced the science of artificial intelligence one whit further than
> being engineered empirical validations of the 1940s-era theories of
> McCulloch & Pitts and Ulam respectively, albeit their conjunction
> being a seminal validation insofar as duffing up human Go players is
> concerned.
>

That paragraph is disrespectful of AlphaGo and every important development
that it was built on. Theorists of the 40s didn't know jackshit about how
to make a strong go program or any other part of AI, for that matter.

This is like giving credit to the pre-Socratic philosophers for atomic
theory, or to Genesis for the Big Bang theory. I am sure there are people
that see connections, but no. Just no.

> one has to expect a certain amount of abuse when going public, and to
> expect that eager critics will misrepresent what was said.
>

Your vast experience in the field means your opinions were formed way
before we knew what works and what doesn't, and are essentially worthless.

There, you like abuse?

Álvaro.


On Wed, Mar 30, 2016 at 6:04 AM, djhbrown .  wrote:

> one has to expect a certain amount of abuse when going public, and to
> expect that eager critics will misrepresent what was said.
>
> no lack of respect for DeepMind's achievement was contained in my
> posting; on the contrary, i was as surprised as anyone at how well she
> did and it gave me great pause for thought.
>
> as to preconceived notions, my own notions are postconceived, having
> studied artificial intelligence and biological computation over 40
> post-doctoral years during which i have published 50 or so
> peer-reviewed scientific papers, some in respectable journals,
> including New Scientist.
>
> On 30/03/2016, Stefan Kaitschick  wrote:
> > Your lack of respect for task performance is misguided imo. Your
> > preconceived notions of what intelligence is, will lead you astray.
> >
>
>
> --
> patient: "whenever i open my mouth, i get a shooting pain in my foot"
> doctor: "fire!"
> http://sites.google.com/site/djhbrown2/home
> https://www.youtube.com/user/djhbrown
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] UEC cup 2nd day

2016-03-24 Thread Álvaro Begué
I have used TensorFlow to train a CNN that predicts the next move, with a
similar architecture to what others have used (1 layer of 5x5 convolutions
followed by 10 more layers of 3x3 convolutions, with 192 hidden units per
layer and ReLU activation functions) but with much simpler inputs. I found
the Python parts worked quite well, and that's how I did the training,
using a GTX 980 GPU. But getting the trained network to work from C++ was a
pain, and I ended up rolling my own code to evaluate it using the CPU(s).

I also tried to train the network to predict the final ownership for each
point on the board (in the hopes that it would learn about life and death),
but this didn't work too well, presumably because I didn't have good enough
data to train it with.

Álvaro.



On Thu, Mar 24, 2016 at 2:42 PM, Darren Cook  wrote:

> Thanks for the very interesting replies, David, and Remi.
>
> No-one is using TensorFlow, then? Any reason not to? (I'm just curious
> because there looks to be a good Udacity DNN course
> (https://www.udacity.com/course/deep-learning--ud730), which I was
> considering, but it is using TensorFlow.)
>
>
> Remi wrote:
> > programming back-propagation efficiently on the GPU. We did get a GPU
> > version working, but it took a lot of time to program it, and was not
> > so efficient. So the current DCNN of Crazy Stone is 100% trained on
> > the CPU, and 100% running on the CPU. My CPU code is efficient,
> > though. It is considerably faster than Caffe. My impression is that
> > Caffe is inefficient because it uses the GEMM approach, which may be
> > good for high-resolution pictures, but is not for small 19x19
> > boards.
>
> I did a bit of study on what GEMM is, and found this article and the 2nd
> comment on it quite interesting:
>
>
> http://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
>
> The comment, by Scott Gray, mentioned:
>
>   So instead of thinking of convolution as a problem of one large gemm
> operation, it’s actually much more efficient as many small gemms. To
> compute a large gemm on a GPU you need to break it up into many small
> tiles anyway. So rather than waste time duplicating your data into a
> large matrix, you can just start doing small gemms right away directly
> on the data. Let the L2 cache do the duplication for you.
>
>
> He doesn't quantify large vs. small; though I doubt anyone is doing
> image recognition on 19x19 pixel images :-)
>
> Darren
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Congratulations to AlphaGo (Statistical significance of results)

2016-03-22 Thread Álvaro Begué
On Tue, Mar 22, 2016 at 1:40 PM, Nick Wedd <mapr...@gmail.com> wrote:

> On 22 March 2016 at 17:20, Álvaro Begué <alvaro.be...@gmail.com> wrote:
>
>> A very simple-minded analysis is that, if the null hypothesis is that
>> AlphaGo and Lee Sedol are equally strong, AlphaGo would do as well as we
>> observed or better 15.625% of the time. That's a p-value that even social
>> scientists don't get excited about. :)
>>
>>
> "For "as well ... or better", I make it 18.75%.
>

I obviously can't count. :)

Thanks for the correction.
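
For the record, the arithmetic behind both numbers, under the fair-coin
null hypothesis:

  P(exactly 4 wins in 5) = C(5,4) / 2^5 = 5/32 = 15.625%
  P(4 or more wins in 5) = (C(5,4) + C(5,5)) / 2^5 = 6/32 = 18.75%

My figure counted only the 4-1 result and left out the 5-0 case that "or
better" includes.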

Álvaro.




>
> Nick
>
>
>
>> Álvaro.
>>
>>
>> On Tue, Mar 22, 2016 at 12:48 PM, Jason House <
>> jason.james.ho...@gmail.com> wrote:
>>
>>> Statistical significance requires a null hypothesis... I think it's
>>> probably easiest to ask the question of if I assume an ELO difference of x,
>>> how likely it's a 4-1 result?
>>> Turns out that 220 to 270 ELO has a 41% chance of that result.
>>> >= 10% is -50 to 670 ELO
>>> >= 1% is -250 to 1190 ELO
>>> My numbers may be slightly off from eyeballing things in a simple excel
>>> sheet. The idea and ranges should be clear though
>>> On Mar 22, 2016 12:00 PM, "Lucas, Simon M" <s...@essex.ac.uk> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I was discussing the results with a colleague outside
>>>> of the Game AI area the other day when he raised
>>>> the question (which applies to nearly all sporting events,
>>>> given the small sample size involved)
>>>> of statistical significance - suggesting that on another week
>>>> the result might have been 4-1 to Lee Sedol.
>>>>
>>>> I pointed out that in games of skill there's much more to judge than
>>>> just the final
>>>> outcome of each game, but wondered if anyone had any better (or worse :)
>>>> arguments - or had even engaged in the same type of
>>>> conversation.
>>>>
>>>> With AlphaGo winning 4 games to 1, from a simplistic
>>>> stats point of view (with the prior assumption of a fair
>>>> coin toss) you'd not be able to claim much statistical
>>>> significance, yet most (me included) believe that
>>>> AlphaGo is a genuinely better Go player than Lee Sedol.
>>>>
>>>> From a stats viewpoint you can use this approach:
>>>> http://www.inference.phy.cam.ac.uk/itprnn/book.pdf
>>>> (see section 3.2 on page 51)
>>>>
>>>> but given even priors it won't tell you much.
>>>>
>>>> Anyone know any good references for refuting this
>>>> type of argument - the fact is of course that a game of Go
>>>> is nothing like a coin toss.  Games of skill tend to base their
>>>> outcomes on the result of many (in the case of Go many hundreds of)
>>>> individual actions.
>>>>
>>>> Best wishes,
>>>>
>>>>   Simon
>>>>
>>>>
>>>> ___
>>>> Computer-go mailing list
>>>> Computer-go@computer-go.org
>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>
>>>
>>> ___
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> http://computer-go.org/mailman/listinfo/computer-go
>>>
>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
>
>
> --
> Nick Wedd  mapr...@gmail.com
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Congratulations to AlphaGo (Statistical significance of results)

2016-03-22 Thread Álvaro Begué
A very simple-minded analysis is that, if the null hypothesis is that
AlphaGo and Lee Sedol are equally strong, AlphaGo would do as well as we
observed or better 15.625% of the time. That's a p-value that even social
scientists don't get excited about. :)

Álvaro.


On Tue, Mar 22, 2016 at 12:48 PM, Jason House 
wrote:

> Statistical significance requires a null hypothesis... I think it's
> probably easiest to ask the question of if I assume an ELO difference of x,
> how likely it's a 4-1 result?
> Turns out that 220 to 270 ELO has a 41% chance of that result.
> >= 10% is -50 to 670 ELO
> >= 1% is -250 to 1190 ELO
> My numbers may be slightly off from eyeballing things in a simple excel
> sheet. The idea and ranges should be clear though
> On Mar 22, 2016 12:00 PM, "Lucas, Simon M"  wrote:
>
>> Hi all,
>>
>> I was discussing the results with a colleague outside
>> of the Game AI area the other day when he raised
>> the question (which applies to nearly all sporting events,
>> given the small sample size involved)
>> of statistical significance - suggesting that on another week
>> the result might have been 4-1 to Lee Sedol.
>>
>> I pointed out that in games of skill there's much more to judge than just
>> the final
>> outcome of each game, but wondered if anyone had any better (or worse :)
>> arguments - or had even engaged in the same type of
>> conversation.
>>
>> With AlphaGo winning 4 games to 1, from a simplistic
>> stats point of view (with the prior assumption of a fair
>> coin toss) you'd not be able to claim much statistical
>> significance, yet most (me included) believe that
>> AlphaGo is a genuinely better Go player than Lee Sedol.
>>
>> From a stats viewpoint you can use this approach:
>> http://www.inference.phy.cam.ac.uk/itprnn/book.pdf
>> (see section 3.2 on page 51)
>>
>> but given even priors it won't tell you much.
>>
>> Anyone know any good references for refuting this
>> type of argument - the fact is of course that a game of Go
>> is nothing like a coin toss.  Games of skill tend to base their
>> outcomes on the result of many (in the case of Go many hundreds of)
>> individual actions.
>>
>> Best wishes,
>>
>>   Simon
>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
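
Jason's quoted figures are easy to reproduce. Here is a rough sketch (my
code, not his spreadsheet; it assumes the standard Elo logistic model and
computes the probability of exactly a 4-1 result):

#include <cmath>
#include <cstdio>

// Standard Elo model: probability that the stronger side wins one game.
double win_prob(double elo_diff) {
  return 1.0 / (1.0 + std::pow(10.0, -elo_diff / 400.0));
}

// Probability of exactly k wins in n independent games.
double binomial(int n, int k, double p) {
  double c = 1.0;
  for (int i = 0; i < k; ++i)
    c = c * (n - i) / (i + 1);
  return c * std::pow(p, k) * std::pow(1.0 - p, n - k);
}

int main() {
  for (double d : { -50.0, 220.0, 270.0, 670.0 })
    std::printf("Elo diff %+5.0f: P(4-1) = %4.1f%%\n",
                d, 100.0 * binomial(5, 4, win_prob(d)));
}

This prints about 41% for the 220-270 range and roughly 10% near the -50
and 670 endpoints, matching the quoted ranges.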
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Go Bot for the Browser?

2016-03-19 Thread Álvaro Begué
Actually the DCNN plays on 9x9 acceptably well (somewhere in the
single-digit kyus).

On Friday, March 18, 2016, Benjamin Teuber  wrote:

> This is really cool. Now it just needs to learn 9x9 via reinforcement
> learning ;-)
>
> Josef Moudrik wrote on Fri, 18 March 2016 at 10:21:
>
>> Aha! Thanks for the clarification.
>>
>> Josef
>>
>> On Fri, 18 Mar 2016 at 9:59, Darren Cook wrote:
>>
>>> > If I remember correctly, it is not a browser implementation, but rather
>>> > a frontend. The actual computation runs on a server; the browser only
>>> > communicates the moves and shows the results.
>>>
>>> No, a quick test shows once it loads it has not made any server calls.
>>> It has a 14MB file which looks like [1]
>>>
>>> Darren
>>>
>>>
>>> [1]:
>>> // 8 layer network trained on GoGoD data, truncated to 6 decimal places
>>> to reduce size
>>> var json_net = {"layers": [{"layer_type": "input", "out_sy": 25,
>>> "out_depth": 8, "out_sx": 25}, {"layer_type"
>>> : "conv", "sy": 25, "sx": 25, "out_sx": 19, "out_sy": 19, "stride": 1,
>>> "pad": 0, "biases": {"depth":
>>>  64, "sx": 1, "sy": 1, "w": [0.519023, -1.379795, -0.495255, -0.051380,
>>> -0.466160, -1.380873, -0.630742
>>> , -0.174662, -0.743714, -1.288785, -0.607110, -0.536119, -0.819585,
>>> -0.248130, -0.629681, -0.004683,
>>>  -0.408890, -1.701742, -0.011255, -0.833270, -0.665327, -0.127002,
>>> -0.793772, -0.518614, -1.390844, -1
>>> .982825, -0.012530, -0.140848, -1.255086, -0.761665, -0.077154,
>>> -0.748323, -0.086952, -0.175683, -1.526860
>>> , 0.098685, -0.030402, -0.903232, -
>>> ...
>>>
>>> ___
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> 
>>> http://computer-go.org/mailman/listinfo/computer-go
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> 
>> http://computer-go.org/mailman/listinfo/computer-go
>
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Go Bot for the Browser?

2016-03-19 Thread Álvaro Begué
A while back somebody posted a link to a browser implementation of a DCNN:
https://chrisc36.github.io/deep-go/

Would something like that do?

Álvaro.



On Wed, Mar 16, 2016 at 4:44 PM, Benjamin Teuber  wrote:

> Hi everyone,
>
> for a Go beginner website I would like to have a bot that runs in
> client-side javascript, it doesn't have to be that strong.
> An option might be transpiling gnugo with emscripten, but I'm not very
> familiar with that technology and the first google entry (
> http://comments.gmane.org/gmane.comp.compilers.emscripten/1051) looks
> like it might be not so easy.
>
> Any ideas? Or even better, is anyone curious about it and willing to help?
> :)
>
> Regards Benjamin
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move evalution by expected value, as product of expected winrate and expected points?

2016-02-23 Thread Álvaro Begué
I have experimented with a CNN that predicts ownership, but I found it to
be too weak to be useful. The main difference between what Google did and
what I did is in the dataset used for training: I had tens of thousands of
games (I did several different experiments) and I used all the positions
from each game (which is known to be problematic); they used 30M positions
from independent games. I expect you can learn a lot about ownership and
expected number of points from a dataset like that. Unfortunately,
generating such a dataset is infeasible with the resources most of us have.

Here's an idea: Google could make the dataset publicly available for
download, ideally with the final configurations of the board as well. There
is a tradition of making interesting datasets for machine learning
available, so I have some hope this may happen.

The one experiment I would like to make along the lines of your post is to
train a CNN to compute both the expected number of points and its standard
deviation. If you assume the distribution of scores is well approximated by
a normal distribution, maximizing winning probability can be achieved by
maximizing (expected score) / (standard deviation of the score). I wonder
if that results in stronger or more natural play than making a direct model
for winning probability, because you get to learn more about each position.
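
In symbols: if the final margin of victory (score minus komi) is
approximately normal with mean mu and standard deviation sigma, then

  P(win) = P(margin > 0) = Phi(mu / sigma)

where Phi is the standard normal CDF. Since Phi is increasing, maximizing
mu/sigma is exactly maximizing the winning probability under that
normality assumption.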

Álvaro.



On Tue, Feb 23, 2016 at 5:36 AM, Michael Markefka <
michael.marke...@gmail.com> wrote:

> Hello everyone,
>
> in the wake of AlphaGo using a DCNN to predict expected winrate of a
> move, I've been wondering whether one could train a DCNN for expected
> territory or points successfully enough to be of some use (leaving the
> issue of win by resignation for a more in-depth discussion). And,
> whether winrate and expected territory (or points) always run in
> parallel or whether there are diverging moments.
>
> Computer Go programs play what are considered slack or slow moves when
> ahead, sometimes being too conservative and giving away too much of
> their potential advantage. If expected points and expected winrate
> diverge, this could be a way to make the programs play in a more
> natural way, even if there were no strength increase to be gained.
> Then again there might be a parameter configuration that might yield
> some advantage and perhaps this configuration would need to be
> dynamic, favoring winrate the further the game progresses.
>
>
> As a general example for the idea, let's assume we have the following
> potential moves generated by our program:
>
> #1: Winrate 55%, +5 expected final points
> #2: Winrate 53%, +15 expected final points
>
> Is the move with higher winrate always better? Or would there be some
> benefit to choosing #2? Would this differ depending on how far along
> the game is?
>
> If we knew the winrate prediction to be perfect, then going by that
> alone would probably result in the best overall performance. But given
> some uncertainty there, expected value could be interesting.
>
>
> Any takers for some experiments?
>
>
> -Michael
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Frisbee Go

2016-02-22 Thread Álvaro Begué
You could express the intended move as a pair of real numbers. A random
offset is then added, following some probability distribution (Gaussian, or
uniform in a disk of a certain radius, or ...), and then the result is
rounded to the nearest point of integer coordinates. What possibilities
does this not cover?
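
A sketch of what I mean, with Gaussian noise (the names are mine, for
illustration only):

#include <cmath>
#include <random>

struct Throw { int x, y; };  // nearest intersection; may land off the board

// Aim at real-valued coordinates (ax, ay); independent Gaussian noise of
// standard deviation sigma is added to each coordinate, and the result is
// rounded to the nearest point of integer coordinates.
Throw throw_stone(double ax, double ay, double sigma, std::mt19937 &rng) {
  std::normal_distribution<double> noise(0.0, sigma);
  return Throw{ (int)std::lround(ax + noise(rng)),
                (int)std::lround(ay + noise(rng)) };
}

Aiming midway between two adjacent points is then just ax = x + 0.5.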

I like the idea of using Gaussian noise and handicapping games by assigning
a larger variance to the stronger player. :)

Álvaro.



On Mon, Feb 22, 2016 at 10:27 AM, John Tromp  wrote:

> dear Nick,
>
> > There's an assumption implicitly made here, which does not accord with my
> > experience of frisbee Go: that the player will always aim at an
> > intersection.
> >
> > Suppose I want to play on either of two adjacent points, and I don't care
> > which. If I aim for one of them, I will land on one of them with
> probability
> > (3p+1)/4, or whatever the formula says. I feel that I ought to be able
> to do
> > better by aiming midway between them.
>
> But then why stop there? You may also want to aim in between 4 points.
> Or perhaps just epsilon more toward the right of there.
>
> There's no accounting for all possibilities of real life frisbee Go,
> so we settle for the simplest rule that captures the essence...
>
> regards,
> -John
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search (value network)

2016-02-04 Thread Álvaro Begué
I am not sure how exactly they define MSE. If you look at the plot in
figure 2b, the MSE at the very beginning of the game (where you can't
possibly know anything about the result) is 0.50. That suggests it's
something other than your [very sensible] interpretation.

Álvaro.



On Thu, Feb 4, 2016 at 2:24 PM, Detlef Schmicker  wrote:

>
> >> Since all positions of all games in the dataset are used, winrate
>> should distribute from 0% to 100%, or -1 to 1, not 1. Then, the
> >> number 70% could be wrong.  MSE is 0.37 just means the average
> >> error is about 0.6, I think.
>
> 0.6 in the range of -1 to 1,
>
> which means -1 (eg lost by b) games -> typical value -0.4
> and +1 games -> typical value +0.4 of the value network
>
> if I rescale -1 to +1 to  0 - 100% (eg winrate for b) than I get about
> 30% for games lost by b and 70% for games won by B?
>
> Detlef
>
>
> On 04.02.2016 at 20:10, Hideki Kato wrote:
> > Detlef Schmicker: <56b385ce.4080...@physik.de>: Hi,
> >
> > I try to reproduce numbers from section 3: training the value
> > network
> >
> On the test set of KGS games the MSE is 0.37. Is it correct that
> the results are represented as +1 and -1?
> >
> >> Looks correct.
> >
> This means that in a typical board position you get a value of
> > 1-sqrt(0.37) = 0.4  --> this would correspond to a win rate of 70%
> > ?!
> >
> >> Since all positions of all games in the dataset are used, winrate
> >> should distribute from 0% to 100%, or -1 to 1, not 1. Then, the
> >> number 70% could be wrong.  MSE is 0.37 just means the average
> >> error is about 0.6, I think.
> >
> >> Hideki
> >
> Is it really true that a typical KGS 6d+ position is judged with
> such a high win rate (even though it is overfitted, so the test
> set number is too bad!), or do I misinterpret the MSE calculation?!
> >
> > Any help would be great,
> >
> > Detlef
> >
> On 27.01.2016 at 19:46, Aja Huang wrote:
>  Hi all,
> 
>  We are very excited to announce that our Go program, AlphaGo,
>  has beaten a professional player for the first time. AlphaGo
>  beat the European champion Fan Hui by 5 games to 0. We hope
>  you enjoy our paper, published in Nature today. The paper and
>  all the games can be found here:
> 
>  http://www.deepmind.com/alpha-go.html
> 
>  AlphaGo will be competing in a match against Lee Sedol in
>  Seoul, this March, to see whether we finally have a Go
>  program that is stronger than any human!
> 
>  Aja
> 
>  PS I am very busy preparing AlphaGo for the match, so
>  apologies in advance if I cannot respond to all questions
>  about AlphaGo.
> 
> 
> 
>  ___ Computer-go
>  mailing list Computer-go@computer-go.org
>  http://computer-go.org/mailman/listinfo/computer-go
> 
> >> ___ Computer-go
> >> mailing list Computer-go@computer-go.org
> >> http://computer-go.org/mailman/listinfo/computer-go
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search (value network)

2016-02-04 Thread Álvaro Begué
The positions they used are not from high-quality games. They actually
include one last move that is completely random.

Álvaro.


On Thursday, February 4, 2016, Detlef Schmicker  wrote:

>
> Hi,
>
> I try to reproduce numbers from section 3: training the value network
>
> On the test set of KGS games the MSE is 0.37. Is it correct that the
> results are represented as +1 and -1?
>
> This means that in a typical board position you get a value of
> 1-sqrt(0.37) = 0.4  --> this would correspond to a win rate of 70% ?!
>
> Is it really true that a typical KGS 6d+ position is judged with
> such a high win rate (even though it is overfitted, so the test set
> number is too bad!), or do I misinterpret the MSE calculation?!
>
> Any help would be great,
>
> Detlef
>
> On 27.01.2016 at 19:46, Aja Huang wrote:
> > Hi all,
> >
> > We are very excited to announce that our Go program, AlphaGo, has
> > beaten a professional player for the first time. AlphaGo beat the
> > European champion Fan Hui by 5 games to 0. We hope you enjoy our
> > paper, published in Nature today. The paper and all the games can
> > be found here:
> >
> > http://www.deepmind.com/alpha-go.html
> >
> > AlphaGo will be competing in a match against Lee Sedol in Seoul,
> > this March, to see whether we finally have a Go program that is
> > stronger than any human!
> >
> > Aja
> >
> > PS I am very busy preparing AlphaGo for the match, so apologies in
> > advance if I cannot respond to all questions about AlphaGo.
> >
> >
> >
> > ___ Computer-go mailing
> > list Computer-go@computer-go.org 
> > http://computer-go.org/mailman/listinfo/computer-go
> >
> ___
> Computer-go mailing list
> Computer-go@computer-go.org 
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search (value network)

2016-02-04 Thread Álvaro Begué
I re-read the relevant section and I agree with you. Sorry for adding noise
to the conversation.

Álvaro.







On Thu, Feb 4, 2016 at 12:21 PM, Detlef Schmicker <d...@physik.de> wrote:

>
> Thanks for the response. I am not referring to the finally used data set:
> in the chapter I referred to, they state they used their KGS dataset
> in a first try (which another part of the paper describes as a 6d+
> data set).
>
> On 04.02.2016 at 18:11, Álvaro Begué wrote:
> > The positions they used are not from high-quality games. They
> > actually include one last move that is completely random.
> >
> > Álvaro.
> >
> >
> > On Thursday, February 4, 2016, Detlef Schmicker <d...@physik.de>
> > wrote:
> >
> > Hi,
> >
> > I try to reproduce numbers from section 3: training the value
> > network
> >
> > On the test set of KGS games the MSE is 0.37. Is it correct that
> > the results are represented as +1 and -1?
> >
> > This means that in a typical board position you get a value of
> > 1-sqrt(0.37) = 0.4  --> this would correspond to a win rate of 70%
> > ?!
> >
> > Is it really true that a typical KGS 6d+ position is judged with
> > such a high win rate (even though it is overfitted, so the test
> > set number is too bad!), or do I misinterpret the MSE calculation?!
> >
> > Any help would be great,
> >
> > Detlef
> >
> > On 27.01.2016 at 19:46, Aja Huang wrote:
> >>>> Hi all,
> >>>>
> >>>> We are very excited to announce that our Go program, AlphaGo,
> >>>> has beaten a professional player for the first time. AlphaGo
> >>>> beat the European champion Fan Hui by 5 games to 0. We hope
> >>>> you enjoy our paper, published in Nature today. The paper and
> >>>> all the games can be found here:
> >>>>
> >>>> http://www.deepmind.com/alpha-go.html
> >>>>
> >>>> AlphaGo will be competing in a match against Lee Sedol in
> >>>> Seoul, this March, to see whether we finally have a Go
> >>>> program that is stronger than any human!
> >>>>
> >>>> Aja
> >>>>
> >>>> PS I am very busy preparing AlphaGo for the match, so
> >>>> apologies in advance if I cannot respond to all questions
> >>>> about AlphaGo.
> >>>>
> >>>>
> >>>>
> >>>> ___ Computer-go
> >>>> mailing list Computer-go@computer-go.org <javascript:;>
> >>>> http://computer-go.org/mailman/listinfo/computer-go
> >>>>
> >> ___ Computer-go
> >> mailing list Computer-go@computer-go.org <javascript:;>
> >> http://computer-go.org/mailman/listinfo/computer-go
> >
> >
> >
> > ___ Computer-go mailing
> > list Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
> >
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search (value network)

2016-02-04 Thread Álvaro Begué
I just want to see how to get 0.5 for the initial position on the board
with some definition.

One possibility is that 0=loss, 1=win, and the number they are quoting is
sqrt(average((prediction-outcome)^2)).
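
A quick check that this definition gives 0.5 at the start of the game:
with prediction v = 0.5 and outcome z in {0, 1}, (v - z)^2 = 0.25 no
matter how the game ends, so sqrt(average((v - z)^2)) = sqrt(0.25) = 0.5,
which matches the left edge of the plot.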


On Thu, Feb 4, 2016 at 3:40 PM, Hideki Kato  wrote:

> I think the error is defined as the difference between the
> output of the value network and the average output of the
> simulations done by the policy network (RL) at each position.
>
> Hideki
>

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search

2016-02-03 Thread Álvaro Begué
I searched for the file name on the web and found this copy:
http://airesearch.com/wp-content/uploads/2016/01/deepmind-mastering-go.pdf

Álvaro.



On Wed, Feb 3, 2016 at 4:37 AM, Oliver Lewis  wrote:

> Is the paper still available for download? The direct link appears to be
> broken.
>
> Thanks
>
> Oliver
>
>
> On Wed, Feb 3, 2016 at 2:06 AM, Igor Polyakov 
> wrote:
>
>> I think it would be an awesome commercial product for strong Go players.
>> Maybe even if the AI shows the continuations and the score estimates
>> between different lines, it will give the player enough reasoning to
>> understand why one move is better than the other.
>>
>>
>> On 2016-02-02 8:29, Jim O'Flaherty wrote:
>>
>> And to meta this awesome short story...
>>
>> AI Software Engineers: Robert, please stop asking our AI for
>> explanations. We don't want to distract it with limited human
>> understanding. And we don't want the Herculean task of coding up that
>> extremely frail and error prone bridge.
>> On Feb 1, 2016 3:03 PM, "Rainer Rosenthal"  wrote:
>>
>>> ~~
>>> Robert: "Hey, AI, you should provide explanations!"
>>> AI: "Why?"
>>> ~~
>>>
>>> Cheers,
>>> Rainer
>>>
 Date: Mon, 1 Feb 2016 08:15:12 -0600
 From: "Jim O'Flaherty" 
 To: computer-go@computer-go.org
 Subject: Re: [Computer-go] Mastering the Game of Go with Deep Neural
 Networks and Tree Search

 Robert,

 I'm not seeing the ROI in attempting to map human idiosyncratic
 linguistic
 systems to/into a Go engine.

>>>
>>> ___
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> http://computer-go.org/mailman/listinfo/computer-go
>>
>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search

2016-02-01 Thread Álvaro Begué
Aja,

I read the paper with great interest. [Insert appropriate praises here.]

I am trying to understand the part where you use reinforcement learning to
improve upon the CNN trained by imitating humans. One thing that is not
explained is how to determine that a game is over, particularly when a
player is simply a CNN that has a probability distribution as its output.
Do you play until every point is either a suicide or looks like an eye? Do
you do anything to make sure you don't play in a seki?
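
For concreteness, the usual "looks like an eye" test I have in mind is
something like the following sketch (my guess at a reasonable rule,
reusing the Color enum, XN stride, and is_occupied() helper from the
board-representation code that appears elsewhere in this archive; I am
not claiming this is what AlphaGo does):

// p is an eye-like point for color c if all four neighbors are c or
// Outside, and the enemy holds fewer than two diagonals -- fewer than
// one if the point touches the edge. Self-play can then stop once only
// suicides and eye fills remain.
bool looks_like_eye(Color const *board, int p, Color c) {
  for (int d : { -XN, -1, +1, +XN }) {
    Color n = board[p + d];
    if (n != c && n != Color::Outside)
      return false;
  }
  int bad_diagonals = 0, on_edge = 0;
  for (int d : { -XN - 1, -XN + 1, XN - 1, XN + 1 }) {
    Color n = board[p + d];
    if (n == Color::Outside)
      on_edge = 1;
    else if (is_occupied(n) && n != c)
      ++bad_diagonals;
  }
  return bad_diagonals + on_edge < 2;
}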

I am sure you are a busy man these days, so please answer only when you
have time.

Thanks!
Álvaro.



On Wed, Jan 27, 2016 at 1:46 PM, Aja Huang  wrote:

> Hi all,
>
> We are very excited to announce that our Go program, AlphaGo, has beaten a
> professional player for the first time. AlphaGo beat the European champion
> Fan Hui by 5 games to 0. We hope you enjoy our paper, published in Nature
> today. The paper and all the games can be found here:
>
> http://www.deepmind.com/alpha-go.html
>
> AlphaGo will be competing in a match against Lee Sedol in Seoul, this
> March, to see whether we finally have a Go program that is stronger than
> any human!
>
> Aja
>
> PS I am very busy preparing AlphaGo for the match, so apologies in advance
> if I cannot respond to all questions about AlphaGo.
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo MCTS & Reinforcement Learning?

2016-01-31 Thread Álvaro Begué
How about you read the paper first? The conversation would make much more
sense if you actually spent some time trying to understand the details of
what they did. :) <-- (mandatory smiley to indicate I am not upset or
anything)



On Sun, Jan 31, 2016 at 10:20 AM, Greg Schmidt 
wrote:

> The articles I've read so far about AlphaGo mention both MCTS and
> RL/Q-Learning.  Since MCTS (and certainly UCT) keeps statistics on wins and
> propagates that information up the tree, that in and of itself would seem
> to constitute RL, so how does it make sense to have both?  It seems
> redundant to me.  Any thoughts on that?
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Game Over

2016-01-27 Thread Álvaro Begué
It's in the paper: "ladder capture" and "ladder escape" are features that
are fed as inputs into the CNN.

Álvaro.




On Wed, Jan 27, 2016 at 6:03 PM, Ryan Grant  wrote:

> To the authors: Did the deep-NN architecture learn ladders on its own,
> or was any extra ladder-evaluation code added to the playout module?
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Facebook Go

2016-01-27 Thread Álvaro Begué
Yes, it has been:
http://computer-go.org/pipermail/computer-go/2015-November/008267.html

Are there any news on Google's efforts?
Álvaro.



On Wed, Jan 27, 2016 at 10:10 AM, Richard Lorentz 
wrote:

> Not sure if this has been posted here already or not:
>  http://arxiv.org/abs/1511.06410
>
> I found out about it from here, which makes sense I guess:
>  https://www.facebook.com/zuck/posts/10102619979696481?fref=nf
>
> -Richard
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Board evaluation using a convolutional neural network

2016-01-15 Thread Álvaro Begué
I understand that using games from humans to learn about life and death
introduces all sorts of biases. That's why I tried to use games from an
engine instead.

In their standard configuration, MCTS engines will sometimes let lots of
groups die after they know the game is hopeless, or if they have a large
advantage and they still see a guaranteed victory after the group dies.
That's why I tried to configure Pachi to maximize score for this particular
purpose. However, as I said, I couldn't get Pachi to really maximize score
and play the games to the bitter end; in a good fraction of the games there
were dead stones left on the board, and in some games the penultimate eye
of a group was plugged for no good reason (probably because of some bug in
determining if it's OK to pass).

Álvaro.



On Fri, Jan 15, 2016 at 4:26 PM, Michael Sué  wrote:

> Hi,
>
>> My experience is the same: My CNN was a very poor judge of life and
>> death. Part of the problem is that I couldn't get Pachi to behave
>> exactly the way I wanted (play to maximize score; play to the bitter
>> end, assuming everything left after two passes is considered alive). But
>> perhaps there is some deeper problem, or we are just missing an
>> important twist to make the technique work.
>>
>
> I think the problem with life and death is that if it comes to a group
> that will die, the pro will resign; if he can save the group, the bot may
> not even notice that there was a life-and-death issue.
>
> So, why can't you train the DNN on pure life-and-death problems, just as
> human players do from time to time? There are many collections (some
> even with solutions).
>
> - Michael.
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Board evaluation using a convolutional neural network

2016-01-15 Thread Álvaro Begué
On Fri, Jan 15, 2016 at 5:14 PM, Petr Baudis <pa...@ucw.cz> wrote:

>   Two ideas re Pachi:
>
>   (i) In uct/uct.c maximize_score section, try changing the
> max_losing_komi from 30 to, say, 300.
>
>   (ii) Try to work out that if Pachi decides it's losing, the gameplay
> is switched over to GNUGo to finish the game.
>

Interesting. I hadn't thought of (ii). Either GNUGo or a CNN would probably
work well to finish off the games after Pachi thinks the probability of
winning is outside of [0.1,0.9] or something like that.

Thanks!
Álvaro.


On Fri, Jan 15, 2016 at 5:14 PM, Petr Baudis <pa...@ucw.cz> wrote:

>   Hi!
>
> On Fri, Jan 15, 2016 at 04:54:18PM -0500, Álvaro Begué wrote:
> > In their standard configuration, MCTS engines will sometimes let lots of
> > groups die after they know the game is hopeless, or if they have a large
> > advantage and they still see a guaranteed victory after the group dies.
> > That's why I tried to configure Pachi to maximize score for this
> > particular purpose. However, as I said, I couldn't get Pachi to really
> > maximize score and play the games to the bitter end; in a good fraction
> > of the games there were dead stones left on the board, and in some games
> > the penultimate eye of a group was plugged for no good reason (probably
> > because of some bug in determining if it's OK to pass).
>
>   Two ideas re Pachi:
>
>   (i) In uct/uct.c maximize_score section, try changing the
> max_losing_komi from 30 to, say, 300.
>
>   (ii) Try to work out that if Pachi decides it's losing, the gameplay
> is switched over to GNUGo to finish the game.
>
> Petr Baudis
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Board evaluation using a convolutional neural network

2016-01-12 Thread Álvaro Begué
> Regarding 9x9, I believe Alvaro Begue has explored this idea in a way
> which perhaps would work better in a go engine. He used pachi to generate a
> database of games by playing against itself and then trained a model in a
> similar fashion to what I did. I'm not sure about the results of his
> experiments. If someone can point me to a large database of 9x9 games it
> would be easy to edit my code to do that.
>

My experience is the same: My CNN was a very poor judge of life and death.
Part of the problem is that I couldn't get Pachi to behave exactly the way
I wanted (play to maximize score; play to the bitter end, assuming
everything left after two passes is considered alive). But perhaps there is
some deeper problem, or we are just missing an important twist to make the
technique work.

Since those initial 9x9 experiments I have worked on the much easier
problem of coming up with a probability distribution for the next move, to
make sure I wasn't doing anything really wrong with the neural network. It
seems to be working well enough (48% accuracy with a very limited set of
inputs), so I think I'll switch back to trying to predict ownership and
score.

Álvaro.


On Tue, Jan 12, 2016 at 6:10 PM, Justin .Gilmer  wrote:

> Quick question: when using this mailing list, how do I explicitly reply
> to a thread? So far I've just been editing the subject and sending it to
> computer-go@computer-go.org.
>
> Regarding use in an MCTS engine, I strongly suspect it would perform poorly
> in its current form. It is quite poor at life and death, especially if you
> give it situations very different from the training set. One issue with the
> method of training was that I only used games which were played until the
> end (i.e. didn't end in resignation); as a result, the model is strongly
> biased toward large groups of stones living, simply because games not
> ending in resignation tend to be close and not have large groups die.
>
> Depending on how hard it would be to integrate into an MCTS engine, I
> could try it. My hope was that a well-trained evaluator could allow
> alpha-beta pruning to be competitive with MCTS; I'm interested to hear the
> group's thoughts on this.
>
> Regarding 9x9, I believe Alvaro Begue has explored this idea in a way
> which perhaps would work better in a go engine. He used pachi to generate a
> database of games by playing against itself and then trained a model in a
> similar fashion to what I did. I'm not sure about the results of his
> experiments. If someone can point me to a large database of 9x9 games it
> would be easy to edit my code to do that.
>
>
> -Justin
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] CNN with 54% prediction on KGS 6d+ data

2015-12-08 Thread Álvaro Begué
I don't think the CPU-GPU communication is what's going to kill this idea.
The latency in actually computing the feed-forward pass of the CNN is going
to be on the order of 0.1 seconds (I am guessing here), which at 200-300
moves per playout means finishing the first playout will take many seconds.

So perhaps it would be interesting to do something like this for
correspondence games, but not for regular games.


Álvaro.



On Tue, Dec 8, 2015 at 12:03 PM, Petr Baudis  wrote:

>   Hi!
>
>   Well, for this to be practical the entire playout would have to be
> executed on the GPU, with no round-trips to the CPU.  That's what my
> email was aimed at.
>
> On Tue, Dec 08, 2015 at 04:37:05PM +, Josef Moudrik wrote:
> > Regarding full CNN playouts, I think the problem is that a playout is a
> > long serial process, given 200-300 moves a game. You need to construct
> > planes and transfer them to the GPU for each move and read the result back
> > (at least with current CNN implementations, afaik), so my guess would be
> > that such a playout would take on the order of seconds. So there seems to
> > be a tradeoff: CNN playouts are (probably much) better (at "playing better
> > games") than e.g. distribution playouts, but whether this is worth the
> > implied (probably much) lower height of the MC tree is a question.
> >
> > Maybe if you had really a lot of GPUs and very high thinking time, this
> > could be the way.
> >
> > Josef
> >
> > On Tue, Dec 8, 2015 at 5:17 PM Petr Baudis  wrote:
> >
> > >   Hi!
> > >
> > >   In case someone is looking for a starting point to actually implement
> > > Go rules etc. on GPU, you may find useful:
> > >
> > >
> > > https://www.mail-archive.com/computer-go@computer-go.org/msg12485.html
> > >
> > >   I wonder if you can easily integrate caffe GPU kernels in another GPU
> > > kernel like this?  But without training, reimplementing the NN could be
> > > pretty straightforward.
> > >
> > > On Tue, Dec 08, 2015 at 04:53:14PM +0100, Michael Markefka wrote:
> > > > Hello Detlef,
> > > >
> > > > I've got a question regarding CNN-based Go engines I couldn't find
> > > > anything about on this list. As I've been following your posts here,
> > > > I thought you might be the right person to ask.
> > > >
> > > > Have you ever tried using the CNN for complete playouts? I know that
> > > > CNNs have been tried for move prediction, immediate scoring and move
> > > > generation to be used in an MC evaluator, but couldn't find anything
> > > > about CNN-based playouts.
> > > >
> > > > It might only be feasible to play out the CNN's first choice move for
> > > > evaluation purposes, but considering how well the performance of batch
> > > > sizes scales, especially on GPU-based CNN applications, it might be
> > > > possible to setup something like 10 candidate moves, 10 reply
> > > > candidate moves and then have the CNN play out the first choice move
> > > > for those 100 board positions until the end and then sum up scores
> > > > again for move evaluation (and/or possibly apply some other tried and
> > > > tested methods like minimax). Given that the number of 10 moves is
> > > > supposed to be illustrative rather than representative, other
> > > > configurations of depth and width in position generation and
> > > > evaluation would be possible.
> > > >
> > > > It feels like CNN can provide a very focused, high-quality width in
> > > > move generation, but it might also be possible to apply that quality
> > > > to depth of evaluation.
> > > >
> > > > Any thoughts to share?
> > > >
> > > >
> > > > All the best
> > > >
> > > > Michael
> > > >
> > > > On Tue, Dec 8, 2015 at 4:13 PM, Detlef Schmicker  wrote:
> > > > > -BEGIN PGP SIGNED MESSAGE-
> > > > > Hash: SHA1
> > > > >
> > > > > Hi,
> > > > >
> > > > > as somebody ask I will offer my actual CNN for testing.
> > > > >
> > > > > It has 54% prediction on KGS 6d+ data (which I thought would be state
> > > > > of the art when I started training, but it is not anymore :).
> > > > >
> > > > > it has:
> > > > > 1, 2, 3, and >4 libs, playing color
> > > > > 1, 2, 3, and >4 libs, opponent color
> > > > > Empty points
> > > > > last move
> > > > > second last move
> > > > > third last move
> > > > > fourth last move
> > > > >
> > > > > input layers, and it is fully convolutional, so with just editing the
> > > > > golast19.prototxt file you can use it for 13x13 as well, as I did on
> > > > > last Sunday. It was used in the November tournament as well.
> > > > >
> > > > > You can find it
> > > > > http://physik.de/CNNlast.tar.gz
> > > > >
> > > > >
> > > > >
> > > > > If you try it, here are some points I would like to discuss:
> > > > >
> > > > > - it seems to me that the playouts get much more important with such
> > > > > a strong move prediction. Often the move prediction seems better than
> > > > > the playouts (I use 8000 at the moment against pachi 32000 with about
> > > > > 70%

Re: [Computer-go] CNN with 54% prediction on KGS 6d+ data

2015-12-08 Thread Álvaro Begué
Of course whether these "neuro-playouts" are any better than the heavy
playouts currently being used by strong programs is an empirical question.
But I would love to see it answered...



On Tue, Dec 8, 2015 at 1:31 PM, David Ongaro <david.ong...@hamburg.de>
wrote:

> Did everyone forget the fact that stronger playouts don't necessarily lead
> to a better evaluation function? (Yes, that's what playouts essentially
> are: a dynamic evaluation function.) This is even under the assumption
> that we can reach the same number of playouts per move.
>
>
> On 08 Dec 2015, at 10:21, Álvaro Begué <alvaro.be...@gmail.com> wrote:
>
> I don't think the CPU-GPU communication is what's going to kill this idea.
> The latency in actually computing the feed-forward pass of the CNN is going
> to be in the order of 0.1 seconds (I am guessing here), which means
> finishing the first playout will take many seconds.
>
> So perhaps it would be interesting to do something like this for
> correspondence games, but not for regular games.
>
>
> Álvaro.
>
>
>
> On Tue, Dec 8, 2015 at 12:03 PM, Petr Baudis <pa...@ucw.cz> wrote:
>
>>   Hi!
>>
>>   Well, for this to be practical the entire playout would have to be
>> executed on the GPU, with no round-trips to the CPU.  That's what my
>> email was aimed at.
>>
>> On Tue, Dec 08, 2015 at 04:37:05PM +, Josef Moudrik wrote:
>> > Regarding full CNN playouts, I think the problem is that a playout is a
>> > long serial process, given 200-300 moves a game. You need to construct
>> > planes and transfer them to the GPU for each move and read the result
>> > back (at least with current CNN implementations, afaik), so my guess
>> > would be that such a playout would take on the order of seconds. So
>> > there seems to be a tradeoff: CNN playouts are (probably much) better
>> > (at "playing better games") than e.g. distribution playouts, but whether
>> > this is worth the implied (probably much) lower height of the MC tree is
>> > a question.
>> >
>> > Maybe if you had really a lot of GPUs and very high thinking time, this
>> > could be the way.
>> >
>> > Josef
>> >
>> > On Tue, Dec 8, 2015 at 5:17 PM Petr Baudis <pa...@ucw.cz> wrote:
>> >
>> > >   Hi!
>> > >
>> > >   In case someone is looking for a starting point to actually implement
>> > > Go rules etc. on GPU, you may find useful:
>> > >
>> > >
>> > >
>> https://www.mail-archive.com/computer-go@computer-go.org/msg12485.html
>> > >
>> > >   I wonder if you can easily integrate caffe GPU kernels in another GPU
>> > > kernel like this?  But without training, reimplementing the NN could be
>> > > pretty straightforward.
>> > >
>> > > On Tue, Dec 08, 2015 at 04:53:14PM +0100, Michael Markefka wrote:
>> > > > Hello Detlef,
>> > > >
>> > > > I've got a question regarding CNN-based Go engines I couldn't find
>> > > > anything about on this list. As I've been following your posts here,
>> > > > I thought you might be the right person to ask.
>> > > >
>> > > > Have you ever tried using the CNN for complete playouts? I know that
>> > > > CNNs have been tried for move prediction, immediate scoring and move
>> > > > generation to be used in an MC evaluator, but couldn't find anything
>> > > > about CNN-based playouts.
>> > > >
>> > > > It might only be feasible to play out the CNN's first choice move for
>> > > > evaluation purposes, but considering how well the performance of batch
>> > > > sizes scales, especially on GPU-based CNN applications, it might be
>> > > > possible to setup something like 10 candidate moves, 10 reply
>> > > > candidate moves and then have the CNN play out the first choice move
>> > > > for those 100 board positions until the end and then sum up scores
>> > > > again for move evaluation (and/or possibly apply some other tried
>> and
>> > > > tested methods like minimax). Given that the number of 10 moves is
>> > > > supposed to be illustrative rather than representative, other
>> > > > configurations of depth and width in position generation and
>> > > > evaluation would be possible.
>> > > >
>> > > > It feels like CNN can provide a very focused, high-quality width in
>>

Re: [Computer-go] Strong engine that maximizes score

2015-11-17 Thread Álvaro Begué
Thanks for your answer.

Unfortunately Pachi doesn't seem to really try to maximize score, even with
these settings: once one side has won by a large enough margin, it will
stop trying to kill small groups, which defeats my purpose, since I am
precisely trying to generate a database to learn about life and death.
Perhaps I can play around with the settings in uct/dynkomi.c to see if I
can make it behave closer to what I need.

If anyone knows of a program that can actually use something like expected
score in the UCT move selection formula, that's probably what I need. I
might end up modifying Pachi to do this, but it sounds daunting.
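
Concretely, what I have in mind is a blended node value along the lines of
(my notation, with lambda in [0, 1]):

  value(node) = (1 - lambda) * winrate(node) + lambda * scaled_score(node)

plugged into the usual UCT selection formula, with the expected score
squashed into [0, 1] somehow (e.g. a sigmoid of the expected margin). As it
turns out, this is essentially what Pachi's val_scale option already does
(see the val_scale discussion elsewhere in this thread).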


Álvaro.


On Tue, Nov 17, 2015 at 5:26 AM, Josef Moudrik  wrote:

> I am trying to create a database of games to do some machine-learning
>> experiments. My requirements are:
>>  * that all games be played by the same strong engine on both sides,
>>  * that all games be played to the bitter end (so everything on the board
>> is alive at the end), and
>>  * that both sides play trying to maximize score, not winning probability.
>>
>> This last requirement is a bit unusual. Ideally I would like to do this
>> on 9x9 first (to be able to try many things quickly) and then 19x19 (the
>> real thing).
>>
>> Is there a strong engine that would allow me to do this? Linux or Mac
>> strongly preferred. I'll be happy to pay for it if it's commercial.
>>
>
> Hello,
>
> pachi ( pachi.or.cz ) is reasonably strong and you can achieve what you
> want by using the following command-line options:
>
> pass_all_alive,maximize_score,resign_threshold=0.0
>
> but as the README for maximize_score says:
> Note that Pachi in this mode may be slightly weaker, and result margin
> should not be taken into account when judging either player's strength.
> During the game, the winning/losing margin can be approximated from
> Pachi's "extra komi" or "xkomi" reporting in the progress messages.
>
>
> Regards,
> Josef
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Strong engine that maximizes score

2015-11-17 Thread Álvaro Begué
After reading the relevant code, I realized that val_scale=1.0 should do
precisely what I wanted. I have tested it a bit, and so far so good.
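
(For the archive: the mixing I understood from the code, written out as a
sketch rather than Pachi's actual implementation. With val_scale=1.0 the node
value is driven entirely by the scaled expected score.)

/* win_rate and scaled_score are both in [0,1]; val_scale picks the blend. */
double node_value(double win_rate, double scaled_score, double val_scale) {
    return (1.0 - val_scale) * win_rate + val_scale * scaled_score;
}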

Thanks!
Álvaro.



On Tue, Nov 17, 2015 at 7:12 AM, Petr Baudis <pa...@ucw.cz> wrote:

>   Hi!
>
> On Tue, Nov 17, 2015 at 07:05:34AM -0500, Álvaro Begué wrote:
> > If anyone knows of a program that can actually use something like
> expected
> > score in the UCT move selection formula, that's probably what I need. I
> > might end up modifying Pachi to do this, but it sounds daunting.
>
>   That's already implemented - see the options described in uct/uct.c
> under the heading "Node value result scaling".
>
>   This even is one of the things that happens when you enable
> maximize_score, but the score takes just (at most) 0.01 of the value,
> where the other 0.99 depends on win/loss.  You might try to see what happens
> if, in addition to maximize_score, you also pass
>
> val_scale=0.1
>
> (instead of the default 0.01) or even larger value.  Not sure what
> effect on strength it might have.
>
> Petr Baudis
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Strong engine that maximizes score

2015-11-17 Thread Álvaro Begué
I wouldn't say they are "not compatible", since the move that maximizes
score is always in the top class (win>draw>loss) for any setting of komi.
You probably mean it in a practical sense, in that MCTS engines are
stronger when maximizing win probability.

I am more interested in attempting to maximize score, even if the engine is
significantly weaker. Of course this is not what most people want, so I
understand I am looking for something unusual.


Álvaro.


On Tue, Nov 17, 2015 at 2:58 PM, David Fotland 
wrote:

> Attempting to maximize the score is not compatible with being a strong
> engine.  If you want a dan level engine it is maximizing win-probability.
>
> David
>
> > -Original Message-
> > From: Computer-go [mailto:computer-go-boun...@computer-go.org] On
> Behalf Of
> > Darren Cook
> > Sent: Tuesday, November 17, 2015 6:49 AM
> > To: computer-go@computer-go.org
> > Subject: Re: [Computer-go] Strong engine that maximizes score
> >
> > > I am trying to create a database of games to do some machine-learning
> > > experiments. My requirements are:
> > >  * that all games be played by the same strong engine on both sides,
> > >  * that all games be played to the bitter end (so everything on the
> > > board is alive at the end), and
> > >  * that both sides play trying to maximize score, not winning
> probability.
> >
> > GnuGo might fit the bill, for some definition of strong. Or Many Faces,
> on
> > the level that does not use MCTS.
> >
> > Sticking with MCTS, you'd have to use komi adjustments: first find two
> > extreme values that give each side a win, then use a binary-search-like
> > algorithm to narrow it down until you find the correct value for komi for
> > that position. This will take approx 10 times longer than normal MCTS,
> for
> > the same strength level.
> >
> > (I'm not sure if this is what Pachi is doing?)
> >
> > Darren
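
(A sketch of that bisection; the name and the wins() callback are
illustrative, not any engine's API. Bisecting a komi range of roughly 722
points needs about 10 probe matches, matching the factor-of-10 estimate.)

#include <functional>

// Largest komi at which Black still wins; assumes wins(komi) plays a quick
// engine-vs-engine match, and that Black wins at lo but loses at hi.
int fair_komi(int lo, int hi, const std::function<bool(int)> &wins) {
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (wins(mid))
            lo = mid;   // Black still wins: fair komi is at least mid
        else
            hi = mid;   // Black loses: fair komi is below mid
    }
    return lo;
}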
> >
> > ___
> > Computer-go mailing list
> > Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Strong engine that maximizes score

2015-11-16 Thread Álvaro Begué
Hi,

I am trying to create a database of games to do some machine-learning
experiments. My requirements are:
 * that all games be played by the same strong engine on both sides,
 * that all games be played to the bitter end (so everything on the board
is alive at the end), and
 * that both sides play trying to maximize score, not winning probability.

This last requirement is a bit unusual. Ideally I would like to do this on
9x9 first (to be able to try many things quickly) and then 19x19 (the real
thing).

Is there a strong engine that would allow me to do this? Linux or Mac
strongly preferred. I'll be happy to pay for it if it's commercial.

Regards,
Álvaro.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Frisbee Go Simulation

2015-11-12 Thread Álvaro Begué
Normalizing the probabilities and re-throwing the frisbee until it lands in
a valid move are equivalent, of course: conditioning on a legal landing
rescales each legal point's probability by the same factor.



On Thu, Nov 12, 2015 at 5:01 AM, David Peters  wrote:

> To keep changes to the protocol and number of parameters low, wouldn't it
> be a possibility to consider multiple 'throws' of a frisbee?
>
> So if the engine decides to play a move you have the described arrangement
> of hitting adjacent fields with probability eps. If this results in a move
> outside the board or an illegal move, you just repeat until you get a legal
> move. This could even mean that you could use an existing engine
> without change. You just add the additional step generating the random
> noise on the moves.
>
> Or is this orthogonal to the envisioned game?
>
> Best regards,
> David Peters
>
> *Sent:* Thursday, 12 November 2015 at 10:24
> *From:* "Darren Cook" 
> *To:* computer-go@computer-go.org
> *Subject:* Re: [Computer-go] Frisbee Go Simulation
> > If one or two of these cells are outside the board the
> > move will count as a pass. If the landing cell is occupied by another
> > stone the move is also counted as a pass. Illegal moves are also counted
> > as pass moves.
>
> Alternatively, the probability could be adjusted for the number of legal
> moves. (E.g. taking the easy example of (1,1) on an empty board,and eps
> of 0.2, you'd adjust (1,1), (2,1) and (1,2) to each be 1/3 probability).
>
> This does away with the involuntary pass concept. (But if you keep it, I
> agree with John Tromp that it is just a wasted move, not able to cause
> early game termination.)
>
> Darren
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Frisbee Go Simulation

2015-11-11 Thread Álvaro Begué
1/5 also seems natural (equal chance of hitting each of the 5 possible points).

Álvaro.



On Wed, Nov 11, 2015 at 10:08 AM, John Tromp  wrote:

> > By the way: It would also be necessary to decide about
> > the eps for the event. Natural candidates would be
> > eps=0.1 or eps=0.125.
>
> I would say the 2 most interesting choices are 1/8 or 1/4.
> The latter guarantees you miss your aim by distance 1,
> while the former gives you an even chance to hit it.
>
> -John
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Frisbee Go Simulation

2015-11-11 Thread Álvaro Begué
Oh! You can have a continuous handicap control by giving the players
different epsilons. :)



On Wed, Nov 11, 2015 at 2:25 PM, John Tromp  wrote:

> >> Would the game end after two unintentional passes?
>
> > Good point. In principle I would say so.
>
> That makes little sense to me.
> IMO, the principled rule is that two consecutive intentional passes
> end the game.
>
> To make sure that infinitely long games have 0 probability,
> we must then require that
>   the frisbee aim itself be a legal move (if eps < 1/4)
> or, in case eps=1/4, that
>   at least one of its neighbours be a legal move
>
> -John
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Number of 3x3 patterns

2015-11-03 Thread Álvaro Begué
I get 1107 (954 in the middle + 135 on the edge + 18 on a corner).
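
(A brute-force check of the 954 middle-of-board figure; the symmetry tables
below are a sketch of mine, not from this thread.)

#include <cstdio>

// Enumerate all 3^8 colorings of the 8 neighbors of an empty center point
// and keep one representative per rotation/reflection class.
int main() {
    // Neighbor indices:  0 1 2
    //                    3 . 4
    //                    5 6 7
    static const int sym[8][8] = {
        {0, 1, 2, 3, 4, 5, 6, 7},   // identity
        {5, 3, 0, 6, 1, 7, 4, 2},   // rotate 90
        {7, 6, 5, 4, 3, 2, 1, 0},   // rotate 180
        {2, 4, 7, 1, 6, 0, 3, 5},   // rotate 270
        {2, 1, 0, 4, 3, 7, 6, 5},   // mirror left-right
        {5, 6, 7, 3, 4, 0, 1, 2},   // mirror top-bottom
        {0, 3, 5, 1, 6, 2, 4, 7},   // transpose
        {7, 4, 2, 6, 1, 5, 3, 0},   // anti-transpose
    };
    int count = 0;
    for (int p = 0; p < 6561; ++p) {   // 3^8 patterns as base-3 digits
        int cell[8];
        for (int i = 0, t = p; i < 8; ++i, t /= 3) cell[i] = t % 3;
        int canon = p;
        for (int s = 1; s < 8; ++s) {
            int q = 0;
            for (int i = 7; i >= 0; --i) q = q * 3 + cell[sym[s][i]];
            if (q < canon) canon = q;
        }
        if (canon == p) ++count;   // p is the smallest in its class
    }
    std::printf("%d\n", count);    // prints 954
    return 0;
}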

Álvaro.



On Tue, Nov 3, 2015 at 2:00 PM, Detlef Schmicker  wrote:

>
> Thanks, but I need them reduced by reflection and rotation symmetries
> (and leave the center empty so 3^8 + 3^5 + 3^3 and than reduce)
>
>
>
> Am 03.11.2015 um 19:32 schrieb Gonçalo Mendes Ferreira:
> > If you are considering only black stone, white, empty and border,
> > ignoring symmetry, wouldn't it be
> >
> > 3^9 + 3^6 + 3^4
> >
> > 3^9 for patterns away from the border, 3^6 for near the sides and
> > 3^4 near the corners, assuming you are also interested in the
> > center value.
> >
> > This makes 20493, then you need to take out illegal patterns
> > (surrounded middle stone). So I'd hint it's close to 2.
> >
> > On 03/11/2015 18:17, Detlef Schmicker wrote: I could not find the
> > number of 3x3 patterns in Go, if using all symmetries.
> >
> > Can anybody give me a hint, were to find. Harvesting 4 games I
> > get 1093:)
> >
> > Thanks, Detlef
> >> ___ Computer-go
> >> mailing list Computer-go@computer-go.org
> >> http://computer-go.org/mailman/listinfo/computer-go
> >
> > ___ Computer-go mailing
> > list Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
> >
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] How to handle triple ko efficiently?

2015-10-16 Thread Álvaro Begué
> Btw does anyone have a good initialization vector for the Zobrist table?

The obvious thing to try is random numbers. Another idea is turning your
Zobrist key into CRC64, which I think is what you get if you generate your
numbers like this:

#include <cstdio>

int main() {
  // CRC-64 (ECMA-182) polynomial: shift left each step and XOR in P when a
  // bit falls off the top, so the sequence walks through nonzero values.
  unsigned long long const P = 0x42F0E1EBA9EA3693ull;
  unsigned long long h = P;
  for (int i = 0; i < 1000; ++i) { // Or as many as you need
    std::printf("%016llx\n", h);
    h = (h & 0x8000000000000000ull) ? (h << 1) ^ P : (h << 1);
  }
}


Álvaro.


On Fri, Oct 16, 2015 at 9:36 AM, Gonçalo Mendes Ferreira 
wrote:

> Hmmm, that's interesting but for random playouts seems a bit overkill. In
> a random playout kos will only consistently happen if they are the last
> legal positions in the board, at which point it might be better just
> waiting for a few more plays, reaching the max depth and calling the score
> as it is. Testing simple kos should be faster too. Did you benchmark the
> difference between your method and just prohibiting play at the last eaten
> position (if one stone only)? What max depth (if any) did you use to limit
> the playouts?
>
> For hashing I use 64 bit Zobrist hashes of the board contents without
> last played/eaten information, so it is not changed by passing. On
> collision I memcmp the whole board; if I didn't, play legality would be off
> in false eyes, kos, suicides, etc., and I store that in the node statistics.
> Btw does anyone have a good initialization vector for the Zobrist table?
>
> Gonçalo
>
> On 16/10/2015 13:50, Ben Ellis wrote:
>
>> My go playing program mostly just does random playouts so I don't claim
>> to
>> be an expert, but I check for super KO in my go program and it doesn't
>> seem
>> to slow it down too much. I use the same algorithm for checking for simple
>> KO as well.
>>
>> *This doesn't catch 100% of super KO but it is good enough to prevent
>> looping in your play-outs.*
>>
>>
>>  From memory, the high-level algorithm,
>>
>> At the start of the play out, start with
>> - HashSet of board hashes
>> - HashSet of board intersections (pre-populated with all board
>> intersections)
>>
>> After each move has been played and captured stones removed,
>>
>> 1. If the played intersection has not been played on before,
>>  - empty the board hashes set. O(1)
>>
>> 2. If any stones were captured and the intersection has not been played
>> before,
>>  - Generate hash O(1)
>>  - Insert hash into set of board hashes. O(1)
>>
>> 3. If the intersection has been played on before,
>> - Generate hash O(1)
>> - Insert hash into set of board hashes and check for duplicate O(1)
>> - If duplicate, KO or super KO detected O(1)
>>
>> 4. Remove intersection from set of intersections not yet played. O(1)
>>
>> Conditions at 2 and 3 are infrequent and tend to only happen in the end
>> game.
>>
>> Zobrist hash is common; I'm not sure of any alternatives.
>>
>> If you want to check for all moves that would result in a KO or super KO,
>> you can iterate through all the empty intersections that have already been
>> played (there usually aren't many unless a big dragon is captured), and
>> see
>> if they would cause a duplicate hash.
>>
>> I'm sure someone more academic than me on this list will be able to pick
>> plenty of holes out of this :)
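
(The steps above condensed into code, as a sketch; the types and the
incremental board_hash argument are my assumptions, not Ben's code.)

#include <cstdint>
#include <unordered_set>

struct SuperkoFilter {
    std::unordered_set<std::uint64_t> seen_hashes;
    std::unordered_set<int> played_points;

    // Call after the move at p is on the board and captures are removed.
    bool is_repetition(int p, bool captured, std::uint64_t board_hash) {
        bool first_time = played_points.insert(p).second;   // step 4
        if (first_time) {
            seen_hashes.clear();                            // step 1
            if (captured) seen_hashes.insert(board_hash);   // step 2
            return false;
        }
        return !seen_hashes.insert(board_hash).second;      // step 3
    }
};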
>>
>>
>> On 31 August 2015 at 09:18,  wrote:
>>
>> I once tried to use 32bit hashes only. It might work for super ko
>>> detection, but if you use it for transposition detection storing a lot of
>>> poisitions you will get collisions now and then. So my reasoning is: if
>>> 32
>>> bits almost works then using another 32 bits will work.
>>>
>>> Best
>>> Magnus
>>>
>>>
>>> On 2015-08-31 06:26, Minjae Kim wrote:
>>>
>>> Is 64 bits really enough?

 I may be wrong but there are 3^361 possible board positions in 19x19.
 log2(3^361) gives me 572.171..., which means I need at least 573 bits
 to record all possible board positions. This still doesn't mean that a
 2048 bit Zobrist hash for example will never collide for a 19x19
 board, but the odds will be significantly lesser.

 Do you have some source about the probability of Zobrist hash
 collision on a go board for different bit string sizes?

 On Mon, Aug 31, 2015 at 12:37 PM, Peter Drake 
 wrote:

 64 bits seems to be enough. As I understand it, the convention is to

> simply _ignore_ the possibility of collisions; you're more likely to
> have a hardware error.
>
> On Sun, Aug 30, 2015 at 8:27 PM, Minjae Kim 
> wrote:
>
> To my understanding, you need a sufficiently large bitstring to
> minimize possible hash collisions when using Zobrist hashing. When a
> hash collision does occur, it can possibly generate an illegal move.
> What is an acceptable size of the hash bitstring for a 19x19 board?
>
> On Mon, Aug 31, 2015 at 10:52 AM, 

Re: [Computer-go] How to handle triple ko efficiently?

2015-10-16 Thread Álvaro Begué
That sounds kind of obsessive. I think the probability of having a 0 or
repeated numbers in the table is about 3.18*10^-15 (someone else please
check my math). To generate some intuition for how small this number is, if
you did this a thousand times per second, the expected time to finding a
problematic table is about 10,000 years.
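
(The arithmetic, sketched: for a table of K independent 64-bit entries,
P(some entry is 0, or two entries collide) is roughly
(K + K(K-1)/2) / 2^64 = K(K+1) / 2^65, so a table of a few hundred entries
lands on the order of 10^-15, consistent with the figure above.)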

Álvaro.



On Fri, Oct 16, 2015 at 9:59 AM, Gonçalo Mendes Ferreira <go...@sapo.pt>
wrote:

> I'm currently using random numbers ensuring they are not 0 or repeated in
> the table, for obvious reasons. I suspect the next step for those
> interested would be ensuring each bit has a similar ratio of occurrences,
> or bruteforcing it.
>
> On 16/10/2015 14:51, Álvaro Begué wrote:
>
>> Btw does anyone have a good initialization vector for the Zobrist table?
>>>
>> The obvious thing to try is random numbers. Another idea is turning your
>> Zobrist key into CRC64, which I think is what you get if you generate your
>> numbers like this:
>>
>> #include <cstdio>
>>
>> int main() {
>>unsigned long long const P = 0x42F0E1EBA9EA3693ull;
>>unsigned long long h = P;
>>for (int i = 0; i < 1000; ++i) { // Or as many as you need
>>  std::printf("%016llx\n", h);
>>  h = (h & 0x8000000000000000ull) ? (h << 1) ^ P : (h << 1);
>>}
>> }
>>
>>
>> Álvaro.
>>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] How to handle triple ko efficiently?

2015-10-16 Thread Álvaro Begué
We are talking about initialization of the Zobrist table. That will happen
once per execution of the program; or even just once ever, if he just
generates the table and hard-codes it into the program --which is what I do.

Álvaro.



On Fri, Oct 16, 2015 at 10:29 AM, Igor Polyakov <weiqiprogramm...@gmail.com>
wrote:

> If you're only getting 1000 table generations a second, you should look
> into your algorithm. You should get at least 100,000 table generations a
> second!
>
>
> On 2015-10-16 7:21, Álvaro Begué wrote:
>
> That sounds kind of obsessive. I think the probability of having a 0 or
> repeated numbers in the table is about 3.18*10^-15 (someone else please
> check my math). To generate some intuition for how small this number is, if
> you did this a thousand times per second, the expected time to finding a
> problematic table is about 10,000 years.
>
> Álvaro.
>
>
>
> On Fri, Oct 16, 2015 at 9:59 AM, Gonçalo Mendes Ferreira <go...@sapo.pt> wrote:
>
>> I'm currently using random numbers ensuring they are not 0 or repeated in
>> the table, for obvious reasons. I suspect the next step for those
>> interested would be ensuring each bit has a similar ratio of occurrences,
>> or bruteforcing it.
>>
>> On 16/10/2015 14:51, Álvaro Begué wrote:
>>
>>> Btw does anyone have a good initialization vector for the Zobrist table?
>>>>
>>> The obvious thing to try is random numbers. Another idea is turning your
>>> Zobrist key into CRC64, which I think is what you get if you generate
>>> your
>>> numbers like this:
>>>
>>> #include <cstdio>
>>>
>>> int main() {
>>>unsigned long long const P = 0x42F0E1EBA9EA3693ull;
>>>unsigned long long h = P;
>>>for (int i = 0; i < 1000; ++i) { // Or as many as you need
>>>  std::printf("%016llx\n", h);
>>>  h = (h & 0x8000000000000000ull) ? (h << 1) ^ P : (h << 1);
>>>}
>>> }
>>>
>>>
>>> Álvaro.
>>>
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] what do you (or does your computer) see?

2015-09-04 Thread Álvaro Begué
Could you please stop posting your videos to this list? I find nothing of
value in them. If others disagree, please speak up.

Álvaro.



On Thu, Sep 3, 2015 at 11:31 PM, djhbrown .  wrote:

>
>
https://www.youtube.com/watch?v=IoO7Nhlf_k4&list=PL4y5WtsvtduqNW0AKlSsOdea3Hl1X_v-S&index=10
>
> Please reply by posting a comment on YouTube
>
>
>
>
>
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] fast + good RNG

2015-03-29 Thread Álvaro Begué
If your PRNG is consuming 40% of your CPU time, your playouts are too light.

Anyway, it's very easy to make a fast PRNG these days. The first thing that
comes to mind is a 64-bit linear congruential generator of which you use
the middle bits, or you can XOR the high 32 bits and the low 32 bits
together. LCGs have well-understood limitations that don't really matter
for a go program.

If you want higher-quality PRNs you need to use a large state, but you can
still make it be very fast. Still, try the LCG first. I would be surprised
if you find any degradation in strength of your engine compared to the
Mersenne twister.
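
(One possible reading of that suggestion, as a sketch; the constants are
Knuth's MMIX LCG, my choice rather than anything from this thread.)

#include <cstdint>

// 64-bit LCG; XOR the high and low halves of the state for 32 output bits.
struct Lcg64 {
    std::uint64_t state;
    explicit Lcg64(std::uint64_t seed) : state(seed) {}
    std::uint32_t next() {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return static_cast<std::uint32_t>(state >> 32) ^
               static_cast<std::uint32_t>(state);
    }
    // Integer in [0, n): the slight modulo bias is harmless in playouts.
    std::uint32_t next_below(std::uint32_t n) { return next() % n; }
};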

Álvaro.



On Sun, Mar 29, 2015 at 12:05 PM, folkert folk...@vanheusden.com wrote:

 Ah!
 But how do you make sure the numbers are uniformly distributed?

 On Sun, Mar 29, 2015 at 05:58:56PM +0800, remco.bloe...@singularityu.org
 wrote:
  I switched to SFMT [0].  But that was some years ago, there might be
 faster options now.
 
  I also generated it in megabyte batches and consume it from there,
 generating a new megabyte as needed.
 
  Lastly, I had some code to make sure I did not consume more bits of
 entropy than required.  Two uniform choices: one bit.  Three choices:
 fractional bits.
 
  [0]
 
  http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/
 
  -- Remco
 
  -Original Message-
  From: folkert folk...@vanheusden.com
  To: computer-go@computer-go.org
  Sent: Sun, 29 Mar 2015 17:50
  Subject: [Computer-go] fast + good RNG
 
  Hi,
 
  I measured that std::mt19937_64 (the mersenne twister from the standard
  c++ libraries) uses about 40% cpu time during playouts.
 
  So I wonder: is there a faster prng while still generating good enough
  random?
 
 
  Folkert van Heusden
 
  --
  Nagios user? Check out CoffeeSaint - the versatile Nagios status
  viewer! http://www.vanheusden.com/java/CoffeeSaint/
  --
  Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
  ___
  Computer-go mailing list
  Computer-go@computer-go.org
  http://computer-go.org/mailman/listinfo/computer-go

  ___
  Computer-go mailing list
  Computer-go@computer-go.org
  http://computer-go.org/mailman/listinfo/computer-go



 Folkert van Heusden

 --
 MultiTail is a flexible tool for monitoring logfiles and commands.
 With filters, colors, merging, different views etc.
 http://www.vanheusden.com/multitail/
 --
 Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread Álvaro Begué
I am not sure I understand the question. The only thing that is typically
not checked in the playouts is superko. What other validity checks are
you performing?

Álvaro.



On Sat, Mar 28, 2015 at 9:54 AM, holger krekel hol...@merlinux.eu wrote:

 On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
  Hi,
 
  For a monte carlo search, are only valid moves performed? Or does it
  work from beginning to the end of a playout using whatever free position
  is available on the board?

 I am also interested in the question.

  Because I read here that people can do 25k playouts per second while my
  program can only do ~ 20 per second when doing full validity checks on
  all steps.

 Do you have a reference, some context for the 25K playouts?

 holger


 
  Folkert van Heusden
 
  --
  --
  Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
  ___
  Computer-go mailing list
  Computer-go@computer-go.org
  http://computer-go.org/mailman/listinfo/computer-go

 --
 about me:http://holgerkrekel.net/about-me/
 contracting: http://merlinux.eu
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Representing Komi for neural network

2015-03-20 Thread Álvaro Begué
On Fri, Mar 20, 2015 at 8:24 PM, Hugh Perkins hughperk...@gmail.com wrote:

 On 1/12/15, Álvaro Begué alvaro.be...@gmail.com wrote:
  A CNN that starts with a board and returns a single number will typically
  have a few fully-connected layers at the end. You could make the komi an
  extra input in the first one of those layers, or perhaps in each of them.

 That's an interesting idea.  But then, the komi wont really
 participate in the hierarchical representation we are hoping that the
 network will build, that I suppose we are hoping is the key to
 obtaining human-comparable results?


I don't see why komi needs to participate in the hierarchical
representation at all. The representation is supposed to learn higher-level
notions like good shape, life and death, territory... The effect of komi
can easily be incorporated into the mix at a later stage, since it has no
bearing on what's good shape, what's alive or dead or what constitutes
territory.
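
(A toy illustration of "komi as an extra input to a fully-connected layer";
the function and its layout are assumptions of mine, not from any paper.)

#include <cstddef>
#include <vector>

// komi is appended as one extra scalar feature to the flattened conv
// output, so each w[j] holds features.size() + 1 weights.
std::vector<float> fc_with_komi(const std::vector<float> &features, float komi,
                                const std::vector<std::vector<float>> &w,
                                const std::vector<float> &b) {
    std::vector<float> x = features;
    x.push_back(komi);          // the extra komi input
    std::vector<float> y = b;   // start from the biases
    for (std::size_t j = 0; j < y.size(); ++j)
        for (std::size_t i = 0; i < x.size(); ++i)
            y[j] += w[j][i] * x[i];
    return y;
}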



On Fri, Mar 20, 2015 at 8:24 PM, Hugh Perkins hughperk...@gmail.com wrote:

 On 1/12/15, Álvaro Begué alvaro.be...@gmail.com wrote:
  A CNN that starts with a board and returns a single number will typically
  have a few fully-connected layers at the end. You could make the komi an
  extra input in the first one of those layers, or perhaps in each of them.

 That's an interesting idea.  But then, the komi wont really
 participate in the hierarchical representation we are hoping that the
 network will build, that I suppose we are hoping is the key to
 obtaining human-comparable results?

 But on the other hand, in the general case, where we want to give a
 variety of inputs to the computer, eg a map, and an x/y position, has
 anyone come up with a clean, effective way of combining these inputs
 into the net?  I dont recall seeing any such attempt/paper?
 - if we feed the map into a conv net, and the x/y pos into the fc
 layers, it seems like the x/y pos wont really participate in any
 hierarchical representation?
 - if we have 100 conv input planes for each possible value of x, and
 another 100 for each possible value of y, seems like overkill ... ?
 - feeding reals into neural nets, which have layered activation
 functions, empirically doesnt work well, and logically doesnt sound
 like it should work that well
 - contemplating just feeding them in as visual representations of the
 number, printed each on a single plane :-D

 Are there some papers/research/approaches in the area of combining
 non-image inputs into convnets, in such a way that the non-image
 inputs participate in the hierarchical structure, and at the same
 without creating hundreds of input planes, for each single natural
 input, which planes might contain only 5-10 bits of actual
 information?
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Fwd: Teaching Deep Convolutional Neural Networks to Play Go

2015-03-16 Thread Álvaro Begué
The human brain is not the most powerful AI, because it fails the "A" test.

I suspect bootstrapping is not very hard. I have recently written a Spanish
checkers program starting with no knowledge and I got it to play top-human
level checkers within a few weeks.

You can build a database of games as you go. The important thing is that
the games don't have to be played perfectly: They just need to be
significantly better than your current model, so you can tweak the model to
learn from them. In the case of checkers the model is the evaluation
function, for which I used a small neural network. In the case of go the
model could be an evaluation function or it could be something that assigns
probabilities to moves. In either case, a database with hundreds of
thousands of games played by the current version of the program is very
useful to refine the model. A few iterations of (1. build database using
current model, 2. train model using database) should figure out how to play
the game well without any human knowledge.
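
(The loop spelled out as a sketch; Model, Game, self_play and train are
placeholders for whatever engine and learner are used, not a real API.)

#include <vector>

struct Model {};   // e.g. evaluation-function weights
struct Game {};    // one self-play game record

std::vector<Game> self_play(const Model &m, int games);    // 1. build database
Model train(const Model &m, const std::vector<Game> &db);  // 2. refine model

Model bootstrap(Model m, int iterations, int games_per_iteration) {
    for (int i = 0; i < iterations; ++i) {
        std::vector<Game> db = self_play(m, games_per_iteration);
        m = train(m, db);
    }
    return m;
}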

Álvaro.



On Mon, Mar 16, 2015 at 4:52 AM, Darren Cook dar...@dcook.org wrote:

  To be honest, what I really want is for it to self-learn,...

 I wonder if even the world's most powerful AI (i.e. the human brain)
 could self-learn go to, say, strong dan level? I.e. Give a boy genius a
 go board, the rules, and two years, but don't give him any books, hints,
 or the chance to play against anyone who has had access to books/teaching.

 Darren

 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] public valid move generator algorithms?

2015-03-15 Thread Álvaro Begué
You can keep track of pseudo-liberties, where you count a liberty
multiple times if it is adjacent to multiple stones in a chain. That seems
to be the easiest way to implement it, although a serious program will
eventually need the actual liberty count, so perhaps you should just do
that from the beginning.
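
(A minimal sketch of that bookkeeping; the layout is mine, and chain merging
is left out.)

#include <vector>

// Pseudo-liberties: an empty point adjacent to k stones of a chain counts
// k times, so placing a stone reduces to a few +/- updates.
struct PseudoLibBoard {
    int N;
    std::vector<int> color;   // 0 empty, 1 black, 2 white
    std::vector<int> chain;   // chain id per point, -1 if empty
    std::vector<int> plibs;   // pseudo-liberty count per chain id

    explicit PseudoLibBoard(int n) : N(n), color(n * n, 0), chain(n * n, -1) {}

    std::vector<int> neighbors(int p) const {
        std::vector<int> out;
        int x = p % N, y = p / N;
        if (x > 0) out.push_back(p - 1);
        if (x + 1 < N) out.push_back(p + 1);
        if (y > 0) out.push_back(p - N);
        if (y + 1 < N) out.push_back(p + N);
        return out;
    }

    void place(int p, int c) {
        color[p] = c;
        int id = static_cast<int>(plibs.size());
        chain[p] = id;
        plibs.push_back(0);
        for (int q : neighbors(p)) {
            if (color[q] == 0)
                ++plibs[id];        // empty neighbor: +1 for the new stone
            else
                --plibs[chain[q]];  // we just filled one of q's liberties
        }
        // Merging same-colored neighbor chains simply sums their counts
        // (e.g. with union-find); omitted here.
    }
};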

Álvaro.


On Sunday, March 15, 2015, holger krekel hol...@merlinux.eu wrote:

 Hi Petr,

 On Sun, Mar 15, 2015 at 19:08 +0100, Petr Baudis wrote:
Hi!
 
  On Sun, Mar 15, 2015 at 03:18:46PM +, holger krekel wrote:
   could anyone point me to a current good public source algorithm
   for generating valid moves from a given Go board position?
  
   All else failing, i am going to analyze Gnugo first (board.c mostly)
   but wanted to ask for hints here first.  I am mostly interested in
   readability of the algorithm, not speed, at this point.  So high-level
   language implementations are fine.
 
Well, any open source Go program should be able to do the job,
  in principle.  But it's also a little unclear for me exactly what
  are you asking for.  In principle, the algorithm is
 
for point in board.empty_points():
if board.play(point) is ok:
# Not a suicide or ko violation
yield point
 
  ...but I guess you are asking for something more/else?

 I am particularly interested in the implementation behind the comment
 (the suicide check).
 It seems you need to track stone strings and their liberties to properly
 determine if something is a suicide move.  Determining Ko-status is
 comparatively easy (superko or avoiding board repetetitions is a
 different matter).

(If you are interested in Python, there are various Go implementations
  in it.  I'm working on
 https://github.com/pasky/michi/blob/master/michi.py)

 thanks for the link!

 holger

 --
 about me:http://holgerkrekel.net/about-me/
 contracting: http://merlinux.eu
 ___
 Computer-go mailing list
 Computer-go@computer-go.org javascript:;
 http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Tromp Taylor rules http://senseis.xmp.net/?LogicalRules

2015-03-11 Thread Álvaro Begué
Ko is not missing: It is a particular case of the prohibition to repeat
positions. Making suicide illegal is an easy patch.

Álvaro.



On Wed, Mar 11, 2015 at 7:08 AM, folkert folk...@vanheusden.com wrote:

 Hi,

 After 3 years of not working on my Go software, I decided to tinker
 again a bit on it.
 First thing I noticed is that it is very slow. I have a feeling that my
 implementation of the rules is way too complex. I did it all from
 scratch and as I never played a game of Go before, it may have a clumsy
 design.
 So I'm considering rewriting things.
 I read back in the archives of this mailinglist and I read about the
 Tromp/Taylor rules. If I implement those, will I really have a program
 that plays valid Go and can participate in CGOS, maybe even KGS, one day?
 Or are things like ko and suicide missing?


 thanks

 Folkert van Heusden

 --
 You've probably gotten really fed up with never winning in the Mega-
 Millions lottery. Well, weep no longer: www.smartwinning.info tells
 you everything that might help you deciding what numbers to choose.
 With nice graphs and pretty animations!
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] CNN for winrate and territory

2015-02-08 Thread Álvaro Begué
What network architecture did you use? Can you give us some details?



On Sun, Feb 8, 2015 at 5:22 AM, Detlef Schmicker d...@physik.de wrote:

  Hi,

 I am working on a CNN for winrate and territory:

 approach:
  - input 2 layers for b and w stones
  - 1. output: 1 layer territory (0.0 for owned by white, 1.0 for owned by
 black (because I missed TANH in the first place I used SIGMOID))
  - 2. output: label for -60 to +60 territory leading by black
 the loss of both outputs is trained

 the idea is that this way I do not have to put komi into the input, and I
 can make the winrate from the statistics of the trained label:

 e.g. komi 6.5: I sum the probabilities from +7 to +60 and get something
 like a winrate
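
(One way to read that, as a sketch; the array layout is my assumption.)

// p_lead[s + 60] = predicted probability that Black leads by s points,
// for s in [-60, +60]; Black wins when s > komi.
float winrate_for_komi(const float p_lead[121], float komi) {
    float w = 0.0f;
    for (int s = -60; s <= 60; ++s)
        if (static_cast<float>(s) > komi)
            w += p_lead[s + 60];
    return w;
}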

 I trained with 80 positions with territory information through 500
 playouts from oakfoam, which I symmetrized by the 8 transformations, leading
 to 600 positions. (It is expensive to produce the positions due to the
 playouts)

 The layers are the same as the large network from Christopher Clark and
 Amos Storkey: http://arxiv.org/abs/1412.3409


 I get reasonable territory predictions from this network (compared to 500
 playouts of oakfoam), but the winrates seem to be overestimated. Anyway,
 it looks as if it is worth doing some more work on it.

 The idea is, I can do the equivalent of let's say 1000 playouts with a call
 to the CNN for the cost of 2 playouts some time...


 Now I try to do a soft turnover from conventional playouts to CNN
 predicted winrates within the framework of MC.

 I do have some ideas, but I am not happy with them.

 Maybe you have better ones :)


 Thanks a lot

 Detlef


 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Representing Komi for neural network

2015-01-11 Thread Álvaro Begué
A CNN that starts with a board and returns a single number will typically
have a few fully-connected layers at the end. You could make the komi an
extra input in the first one of those layers, or perhaps in each of them.

Álvaro.



On Sun, Jan 11, 2015 at 10:59 AM, Detlef Schmicker d...@physik.de wrote:

 Hi,

 I am planning to play around a little with CNN for learning who is leading
 in a board position.

 What would you suggest to represent the komi?

 I would try an additional layer with every point having the value of komi.

 Any better suggestions:)


 By the way:
 Today's bot tournament nicego19n (oakfoam) played with a CNN for move
 prediction.
 It was mixed into the original gamma with some quickly optimized parameters,
 leading to a 100 ELO improvement for selfplay with 2000 playouts/move. I used
 the Clark and Storkey Network, but with no additional features (only a
 black and a white layer). I trained it on 6 kgs games and reached about
 41% prediction rate. I have no delayed evaluation, as I evaluate no
 mini-batch but only one position taking about 1.6ms on the GTX-970. A
 little delay might happen anyway, as only one evaluation is done at once
 and other threads might go on playing while one thread is doing CNN. We
 have quite slow playouts anyway, so I had around 7 playouts/move during
 the game.

 If you want to get an impression, how such a bot plays, have a look at the
 games :)

 Detlef
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional NeuralNetworks

2015-01-09 Thread Álvaro Begué
Yes, it's 0.15 seconds for 128 positions.

A minibatch is a small set of samples that is used to compute an
approximation to the gradient before you take a step of gradient descent. I
think it's not simply called a batch because batch training refers to
computing the full gradient with all the samples before you take a step of
gradient descent. Minibatch is standard terminology in the NN community.
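
(To make the terminology concrete, a sketch; grad_of_sample stands in for the
per-sample gradient of whatever loss is being trained.)

#include <cstddef>
#include <vector>

// One minibatch SGD step: average the gradient over a small batch, then
// move the parameters. Batch training would average over the entire
// training set before each step instead.
std::vector<float> sgd_step(
    std::vector<float> params, const std::vector<int> &batch, float lr,
    std::vector<float> (*grad_of_sample)(int, const std::vector<float> &)) {
    std::vector<float> g(params.size(), 0.0f);
    for (int s : batch) {
        std::vector<float> gs = grad_of_sample(s, params);
        for (std::size_t i = 0; i < g.size(); ++i)
            g[i] += gs[i] / static_cast<float>(batch.size());
    }
    for (std::size_t i = 0; i < params.size(); ++i)
        params[i] -= lr * g[i];
    return params;
}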

Álvaro.





On Fri, Jan 9, 2015 at 6:04 PM, Darren Cook dar...@dcook.org wrote:

 Aja wrote:
  I hope you enjoy our work. Comments and questions are welcome.

 I've just been catching up on the last few weeks, and its papers. Very
 interesting :-)

 I think Hiroshi's questions got missed?

 Hiroshi Yamashita asked on 2014-12-20:
  I have three questions.
 
  I don't understand minibatch. Does CNN need 0.15sec for a positon, or
  0.15sec for 128 positions?

 I also wasn't sure what minibatch meant. Why not just say batch?

  Is KGS rank set 9 dan when it plays against Fuego?

 For me, the improvement from just using a subset of the training data
 was one of the most surprising results.

 Darren


 --
 Darren Cook, Software Researcher/Developer
 My new book: Data Push Apps with HTML5 SSE
 Published by O'Reilly: (ask me for a discount code!)
   http://shop.oreilly.com/product/0636920030928.do
 Also on Amazon and at all good booksellers!
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional Neural Networks

2014-12-25 Thread Álvaro Begué
No, you don't need double precision at all.

Álvaro.


On Thu, Dec 25, 2014 at 5:00 AM, Detlef Schmicker d...@physik.de wrote:

 Hi,

 as I want to buy a graphics card for CNN: do I need double precision
 performance? I give caffe (http://caffe.berkeleyvision.org/) a try, and
 as far as I understood most is done in single precision?!

 You get comparable single precision performance from NVIDIA (as caffe uses
 CUDA I look for NVIDIA) for about 340$ but the double precision
 performance is 10x smaller than the 1000$ cards

 thanks a lot

 Detlef

 On Wednesday, 24.12.2014, at 12:14 +0800, hughperkins2 wrote:
  Whilst it's technically true that you can use an NN with one hidden
  layer to learn the same function as a deeper net, you might need a
  combinatorially large number of nodes :-)
 
 
  scaling learning algorithms towards ai, by bengio and lecunn, 2007,
  makes a convincing case along these lines.
 
 
 
  ___
  Computer-go mailing list
  Computer-go@computer-go.org
  http://computer-go.org/mailman/listinfo/computer-go


 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional Neural Networks

2014-12-25 Thread Álvaro Begué
You are going to be computing gradients of functions, and most people find
it easier to think about these things using a type that roughly corresponds
to the notion of real number. You can use a fixed-point representation of
reals, which uses ints in the end, but then you have to worry about what
scale to use, so you get enough precision but you don't run the risk of
overflowing.

The only reason I might consider a fixed-point representation is to achieve
reproducibility of results.
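
(A toy Q16.16 illustration of that trade-off, entirely a sketch of mine.)

#include <cstdint>

// 16 fractional bits give ~1.5e-5 resolution, but anything beyond +/-32767
// overflows: exactly the "what scale to use" worry mentioned above.
typedef std::int32_t fix32;
const int kFracBits = 16;

fix32  from_double(double x) { return static_cast<fix32>(x * (1 << kFracBits)); }
double to_double(fix32 x)    { return static_cast<double>(x) / (1 << kFracBits); }
fix32  fix_mul(fix32 a, fix32 b) {
    return static_cast<fix32>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}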




On Thu, Dec 25, 2014 at 5:44 AM, hughperkins2 hughperki...@gmail.com
wrote:

  as I want to buy a graphics card for CNN: do I need double precision
 performance?

 Personally, I was thinking of experimenting with ints, bytes, and shorts,
 even less precise than singles :-)

 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional Neural Networks

2014-12-25 Thread Álvaro Begué
This is my guess as to what the number of parameters actually is:
First layer: 128 * (5*5*36 + 19*19) (128 filters of size 5x5 on 36 layers
of input, position-dependent biases)
11 hidden layers: 11 * 128 * (3*3*128 + 19*19) (128 filters of size 3x3 on
128 layers of input, position-dependent biases)
Final layer: 2 *(3*3*128 + 19*19) (2 filters of size 3x3 on 128 layers of
input, position-dependent biases)

Total number of parameters: 2294738

Did I get that right?

I have the same question about the use of symmetry as Hugh.

Álvaro.


On Thu, Dec 25, 2014 at 8:49 PM, Hugh Perkins hughperk...@gmail.com wrote:

 Hi Aja,

 Couple of questions:

 1. connectivity, number of parameters

 Just to check, each filter connects to all the feature maps below it,
 is that right?  I tried to check that by ball-park estimating number
 of parameters in that case, and comparing to the section paragraph in
 your section 4.  And that seems to support that hypothesis.  But
 actually my estimate is for some reason under-estimating the number of
 parameters, by about 20%:

 Estimated total number of parameters
 approx = 12 layers * 128 filters * 128 previous featuremaps * 3 * 3
 filtersize
 = 1.8 million

 But you say 2.3 million.  It's similar, so seems feature maps are
 fully connected to lower level feature maps, but I'm not sure where
 the extra 500,000 parameters should come from?

 2. Symmetry

 Aja, you say in section 5.1 that adding symmetry does not modify the
 accuracy, neither higher or lower.  Since adding symmetry presumably
 reduces the number of weights, and therefore increases learning speed,
 why did you thus decide not to implement symmetry?

 Hugh
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Building a database for training CNNs

2014-12-20 Thread Álvaro Begué
Hi,

There are things a CNN could probably do well, if only we had the right
database to train it. I have in mind these two possibilities:
 * using a CNN as an evaluation function,
 * using a CNN to estimate ownership for each point (i.e., a number
between -1 and 1 that is an estimate of who is going to end up scoring it).

So we need a large set of positions labelled with a final score for the
game and who ended up scoring each point.

I believe the right database to use for this purpose would consist of
positions from games played by strong computer opponents which play to
maximize score and which play to the bitter end, passing only when the
opponent has no dead stones left on the board.

I would like to know if you think this would be an interesting resource to
have, if you have any recommendations on what engine(s) to use and if you
would be willing to collaborate in creating it. Any other comments are
welcome too, of course.

Cheers,
Álvaro.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Move Evaluation in Go Using Deep Convolutional Neural Networks

2014-12-20 Thread Álvaro Begué
If you start with a 19x19 grid and you take convolutional filters of size
5x5 (as an example), you'll end up with a board of size 15x15, because a
5x5 box can be placed inside a 19x19 board in 15x15 different locations. We
can get 19x19 outputs if we allow the 5x5 box to be centered on any point,
but then you need to do multiply by values outside of the original 19x19
board. Zero-padding just means you'll use 0 as the value coming from
outside the board. You can either prepare a 23x23 matrix with two rows of
zeros along the edges, or you can just keep the 19x19 input and do your
math carefully so terms outside the board are ignored.
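
(The "do your math carefully" option, sketched for a 5x5 filter.)

// Zero-padded 5x5 convolution tap at (r, c): taps falling outside the
// 19x19 board contribute nothing, so the output stays 19x19.
float conv_at(const float in[19][19], const float f[5][5], int r, int c) {
    float sum = 0.0f;
    for (int dr = -2; dr <= 2; ++dr)
        for (int dc = -2; dc <= 2; ++dc) {
            int rr = r + dr, cc = c + dc;
            if (rr >= 0 && rr < 19 && cc >= 0 && cc < 19)
                sum += f[dr + 2][dc + 2] * in[rr][cc];
        }
    return sum;
}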



On Sat, Dec 20, 2014 at 12:01 PM, Detlef Schmicker d...@physik.de wrote:

 Hi,

 I am still fighting with the NN slang, but why do you zero-pad the
 output (page 3: 4 Architecture & Training)?

 From all I read up to now, most are zero-padding the input to make the
 output fit 19x19?!

 Thanks for the great work

 Detlef

  On Friday, 19.12.2014, at 23:17 +0000, Aja Huang wrote:
  Hi all,
 
 
  We've just submitted our paper to ICLR. We made the draft available at
  http://www.cs.toronto.edu/~cmaddis/pubs/deepgo.pdf
 
 
 
  I hope you enjoy our work. Comments and questions are welcome.
 
 
  Regards,
  Aja
  ___
  Computer-go mailing list
  Computer-go@computer-go.org
  http://computer-go.org/mailman/listinfo/computer-go


 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [computer-go] More details about MM + source code

2009-12-13 Thread Álvaro Begué
2009/12/13 Petr Baudis pa...@ucw.cz:
 On Sun, Dec 13, 2009 at 12:15:42PM +0100, Rémi Coulom wrote:
 If I understand correctly, you are refering to that loop:

 for (int i = Max; --i >= Min;)

 It starts at Max - 1, not Max.

 I am pretty confident this part is correct. Otherwise, it would have
 generated obvious errors.

 Oh, I'm sorry, you are right, it's obvious when I look at it after good
 night sleep. ;-)


A few years ago I could have written that loop the same way, but now I
think this is significantly easier to read:

for (int i = Max-1; i >= Min; --i)

My job involves being called at 3 o'clock in the morning because a
program is doing the wrong thing, and this has made me develop an
enormous appreciation for clarity. I prefer code that is still obvious
when you haven't had a good night sleep.


Álvaro.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Kinds of Zobrist hashes

2009-12-09 Thread Álvaro Begué
2009/12/9 Christian Nentwich christ...@modeltwozero.com:
 On Tue, Dec 08, 2009 at 09:30:47PM -0500, Eric Boesch wrote:

  You can mathematically prove the two systems are almost the same, so
  there's no need to test.

  Yes, this was my line of thought, but I wasn't sure if I'm not missing
 anything...


 If you ever decide to test which is faster, please post the results, I'm
 curious about how expensive the branch prediction miss is when using two
 values :-)

I don't think there is any branching involved. When you place a stone,
you add zobrist_table[point][color]. When you remove it, you subtract
it. That's all you need to do. If you had a value for empty, you would
have to add and subtract
zobrist_table[point][color]-zobrist_table[point][empty]. Nothing else
changes.
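
(Both variants side by side, as a sketch; the table layout is mine.)

#include <cstdint>

std::uint64_t zobrist_table[361][3];  // [point][0 = empty, 1 = black, 2 = white]
std::uint64_t hash = 0;

// Two-value scheme: no entry for "empty".
void place2(int p, int c)  { hash += zobrist_table[p][c]; }
void remove2(int p, int c) { hash -= zobrist_table[p][c]; }

// Three-value scheme: swap the empty entry for the color entry and back.
void place3(int p, int c)  { hash += zobrist_table[p][c] - zobrist_table[p][0]; }
void remove3(int p, int c) { hash -= zobrist_table[p][c] - zobrist_table[p][0]; }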

Álvaro.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

