Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
I could be drawing wrong inferences from incomplete information, but as Darren pointed out, this paper does leave the impression that Alpha Zero is not as strong as the real AlphaGo Zero, in which case it would be clearer to say so explicitly. Of course the chess and shogi results are impressive regardless. (In chess, the 28/100 wins figure is good, but 0 losses is even better. Entering a drawn sequence starting from an inferior position -- such as playing black -- is a desirable result even for a perfect program without contempt, so failing to win as black is not a good indicator of strength.)

Comparing the Elo charts in this new paper and the Nature paper on AlphaGo Zero, and assigning AlphaGo Lee a reference rating of 0 Elo, it appears that the order of strength at go is Alpha Zero (~900 Elo), then AlphaGo Master (~1400 Elo), then the full-strength AlphaGo Zero (~1500 Elo). I would also think Alpha Zero's 8 hours of training with the help of an immense network of 5,000 first-generation TPUs is more expensive, and only faster in a strictly chronological sense, than AlphaGo Zero's 20-block, 3-day training with 4 second-generation TPUs.

On Wed, Dec 6, 2017 at 4:29 PM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote:

> The chess result is 64-36: a 100 rating point edge! I think the Stockfish
> open source project improved Stockfish by ~20 rating points in the last
> year. Given the number of people/computers involved, Stockfish’s annual
> effort level seems comparable to the AZ effort.
>
> Stockfish is really, really tweaked out to do exactly what it does. It is
> very hard to improve anything about Stockfish. To be clear: I am not
> disparaging the code or people or project in any way. The code is great,
> the people are great, the project is great. It is really easy to work on
> Stockfish, but very hard to make progress given the extraordinarily fine
> balance of resources that already exists. I tried hard for about 6 months
> last year without any successes.
> I tried dozens (maybe 100?) of experiments, including several that were
> motivated by automated tuning or automated searching for opportunities.
> No luck.
>
> AZ would dominate the current TCEC. Stockfish didn’t lose a game in the
> semi-final, failing to make the final because of too many draws against
> the weaker players.
>
> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.
>
> I hope this isn’t the last we see of these DeepMind programs.
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Richard Lorentz
> *Sent:* Wednesday, December 6, 2017 12:50 PM
> *To:* computer-go@computer-go.org
> *Subject:* Re: [Computer-go] Mastering Chess and Shogi by Self-Play with
> a General Reinforcement Learning Algorithm
>
> One chess result stood out for me, namely, just how much easier it was
> for AlphaZero to win with white (25 wins, 25 draws, 0 losses) than with
> black (3 wins, 47 draws, 0 losses).
>
> Maybe we should not give up on the idea of White to play and win in
> chess!
>
> On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:
>
> Hi,
>
> DeepMind makes strongest Chess and Shogi programs with the AlphaGo Zero
> method.
>
> Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> https://arxiv.org/pdf/1712.01815.pdf
>
> AlphaZero (Chess) outperformed Stockfish after 4 hours.
> AlphaZero (Shogi) outperformed elmo after 2 hours.
>
> Search is MCTS.
> AlphaZero (Chess) searches 80,000 positions/sec.
> Stockfish searches 70,000,000 positions/sec.
> AlphaZero (Shogi) searches 40,000 positions/sec.
> elmo searches 35,000,000 positions/sec.
> Thanks,
> Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
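A side note on the "64-36: a 100 rating point edge" arithmetic quoted above: under the logistic Elo model, an expected score p corresponds to a rating difference of 400 * log10(p / (1 - p)). A quick sketch (my code, not from the thread):

```python
import math

def elo_diff(score):
    # logistic Elo model: an expected score p corresponds to a
    # rating difference of 400 * log10(p / (1 - p))
    return 400 * math.log10(score / (1 - score))

# AlphaZero's 64-36 match result, counting draws as half a point
print(round(elo_diff(0.64)))  # -> 100
```

log10(64/36) is almost exactly 0.25, so the 64% score works out to essentially a 100-point edge, as Brian says.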
Re: [Computer-go] Learning related stuff
Could you be reading too much into my comment? AlphaGo Zero is an amazing achievement, and I might guess its programmers will succeed in applying their methods to other fields. Nonetheless, I thought it was interesting, and it would appear the programmers did too, that before improving to superhuman level, AlphaGo was temporarily stuck in a rut of playing literally the worst first move on the board (excluding pass). That doesn't mean I think I could do better.

On Tue, Nov 28, 2017 at 4:50 AM, uurtamo . <uurt...@gmail.com> wrote:

> This is starting to feel like asking along the lines of, "how can I
> explain this to myself or improve on what's already been done in a way
> that will make this whole process work faster on my hardware".
>
> It really doesn't look like there are a bunch of obvious shortcuts.
> That's the whole point of the decision-trees imposed by humans for 20+
> years on the game; it wasn't really better.
>
> Probably what would be good to convince oneself of these things would be
> to challenge each assumption in divergent branches (suggested earlier)
> and watch the resulting players' strength over time. Yes, this might
> take a year or more on your hardware.
>
> I feel like maybe a lot of this is sour grapes; let's please again
> acknowledge that the hobbyists aren't there yet without trying to tear
> down the accomplishments of others.
>
> s.
>
> On Nov 27, 2017 7:36 PM, "Eric Boesch" <ericboe...@gmail.com> wrote:
>
>> I imagine implementation determines whether transferred knowledge is
>> helpful. It's like asking whether forgetting is a problem -- it often
>> is, but evidently not for AlphaGo Zero.
>>
>> One crude way to encourage stability is to include an explicit or
>> implicit age parameter that forces the program to perform smaller
>> modifications to its state during later stages.
>> If the parameters you copy from problem A to problem B also include
>> that age parameter, so the network acts old even though it is faced
>> with a new problem, then its initial exploration may be inefficient.
>> For an MCTS-based example, if an MCTS node is initialized to a
>> 10877-6771 win/loss record based on evaluations under slightly
>> different game rules, then with a naive implementation, even if the
>> program discovers the right refutation under the new rules right
>> away, it would still need to revisit that node thousands of times to
>> convince itself the node is now probably a losing position.
>>
>> But unlearning bad plans in a reasonable time frame is already a
>> feature you need from a good learning algorithm. Even AlphaGo almost
>> fell into trap states; from their paper, it appears that it stuck
>> with 1-1 as an opening move for much longer than you would expect
>> from a program probably already much better than 40 kyu. Even if it's
>> unrealistic for Go specifically, you could imagine some other game
>> where, after days of analysis, the program suddenly discovers a
>> reliable trick that adds one point for white to every single game.
>> The effect would be the same as your komi change -- a mature network
>> now needs to adapt to a general shift in the final score. So the task
>> of adapting to handle similar games may be similar to the task of
>> adapting to analysis reversals within a single game, and improvements
>> to one could lead to improvements to the other.
>>
>> On Fri, Nov 24, 2017 at 7:54 AM, Stephan K <stephan.ku...@gmail.com>
>> wrote:
>>
>>> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <3-hirn-ver...@gmx.de>:
>>> > My understanding is that the AlphaGo hardware is standing
>>> > somewhere in London, idle and waiting for new action...
>>> >
>>> > Ingo.
>>> The announcement at
>>> https://deepmind.com/blog/applying-machine-learning-mammography/
>>> seems to disagree:
>>>
>>> "Our partners in this project wanted researchers at both DeepMind and
>>> Google involved in this research so that the project could take
>>> advantage of the AI expertise in both teams, as well as Google’s
>>> supercomputing infrastructure - widely regarded as one of the best in
>>> the world, and the same global infrastructure that powered DeepMind’s
>>> victory over the world champion at the ancient game of Go."
Re: [Computer-go] Learning related stuff
I imagine implementation determines whether transferred knowledge is helpful. It's like asking whether forgetting is a problem -- it often is, but evidently not for AlphaGo Zero.

One crude way to encourage stability is to include an explicit or implicit age parameter that forces the program to perform smaller modifications to its state during later stages. If the parameters you copy from problem A to problem B also include that age parameter, so the network acts old even though it is faced with a new problem, then its initial exploration may be inefficient. For an MCTS-based example, if an MCTS node is initialized to a 10877-6771 win/loss record based on evaluations under slightly different game rules, then with a naive implementation, even if the program discovers the right refutation under the new rules right away, it would still need to revisit that node thousands of times to convince itself the node is now probably a losing position.

But unlearning bad plans in a reasonable time frame is already a feature you need from a good learning algorithm. Even AlphaGo almost fell into trap states; from their paper, it appears that it stuck with 1-1 as an opening move for much longer than you would expect from a program probably already much better than 40 kyu. Even if it's unrealistic for Go specifically, you could imagine some other game where, after days of analysis, the program suddenly discovers a reliable trick that adds one point for white to every single game. The effect would be the same as your komi change -- a mature network now needs to adapt to a general shift in the final score. So the task of adapting to handle similar games may be similar to the task of adapting to analysis reversals within a single game, and improvements to one could lead to improvements to the other.
On Fri, Nov 24, 2017 at 7:54 AM, Stephan K wrote:

> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <3-hirn-ver...@gmx.de>:
>> My understanding is that the AlphaGo hardware is standing
>> somewhere in London, idle and waiting for new action...
>>
>> Ingo.
>
> The announcement at
> https://deepmind.com/blog/applying-machine-learning-mammography/ seems
> to disagree:
>
> "Our partners in this project wanted researchers at both DeepMind and
> Google involved in this research so that the project could take
> advantage of the AI expertise in both teams, as well as Google’s
> supercomputing infrastructure - widely regarded as one of the best in
> the world, and the same global infrastructure that powered DeepMind’s
> victory over the world champion at the ancient game of Go."
Re: [computer-go] Kinds of Zobrist hashes
On Tue, Dec 8, 2009 at 5:49 PM, Petr Baudis <pa...@ucw.cz> wrote:

> Hi!
>
> In most papers I've read, three-valued Zobrist hashes are used - per
> intersection, values for empty, black and white coloring [1]. I'm not
> clear on why the empty value is needed, however; shouldn't black and
> white values alone work just fine? Is the hash behaving better with
> extra values for empty intersections? Has anyone measured it?
>
> [1] In pattern-matching, it is desirable to also use edge coloring.
>
> Thanks,
> Petr "Pasky" Baudis

You can mathematically prove the two systems are almost the same, so there's no need to test. For simplicity, suppose you're doing a 3-color hash of just two intersections, and your Zobrist values are b1, w1, e1, b2, w2, and e2 (where, e.g., e1 means the first intersection is empty and b2 means the second intersection is black). This is equivalent to using the Zobrist values b1' = (b1 xor e1), w1' = (w1 xor e1), e1' = 0, b2' = (b2 xor e2), w2' = (w2 xor e2), e2' = 0, and then, when you're finished, xoring the result with (e1 xor e2). So a 3-color Zobrist hash with one set of constants is equivalent to a 2-color Zobrist hash (with zero for empty) using a different set of constants, xored with a constant. Assuming your Zobrist values are random, a random 3-color Zobrist hash is equivalent to a random 2-color Zobrist hash (one that doesn't hash the empty intersections) xored with a random constant.
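The equivalence argument above is easy to check numerically. A small sketch (my code, not anyone's actual program): build a random 3-color table for two intersections, derive the 2-color table by folding each point's empty value into its black and white values, and verify that every coloring hashes identically up to a fixed constant.

```python
import random

random.seed(1)
EMPTY, BLACK, WHITE = 0, 1, 2

# independent random 64-bit Zobrist values for each (point, color) pair,
# for a toy "board" of just two intersections
z3 = [[random.getrandbits(64) for _ in range(3)] for _ in range(2)]

# derived 2-color table: fold each point's empty value into its black
# and white values; empty itself contributes nothing
z2 = [[0, z3[i][BLACK] ^ z3[i][EMPTY], z3[i][WHITE] ^ z3[i][EMPTY]]
      for i in range(2)]
const = z3[0][EMPTY] ^ z3[1][EMPTY]

def h(table, coloring):
    acc = 0
    for i, c in enumerate(coloring):
        acc ^= table[i][c]
    return acc

# every one of the 9 colorings hashes identically up to the constant
for c0 in (EMPTY, BLACK, WHITE):
    for c1 in (EMPTY, BLACK, WHITE):
        assert h(z3, (c0, c1)) == h(z2, (c0, c1)) ^ const
```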
Re: [computer-go] MPI vs Thread-safe
On Fri, Oct 30, 2009 at 12:50 PM, Brian Sheppard <sheppar...@aol.com> wrote:

>>> Parallelization *cannot* provide super-linear speed-up.
>>
>> I don't see that at all.
>
> This is standard computer science stuff, true of all parallel programs
> and not just Go players. No parallel program can be better than N times
> a serial version.
>
> The result follows from a simulation argument. Suppose that you had a
> parallel process that performed better than N times a serial program.
> Construct a new serial program that simulates the parallel process.
> There is a contradiction.
>
> Technically, this argument only establishes the fact up to a
> multiplicative constant. But in the case of parallel Go players, I
> cannot accept that simulating a parallel process using serial code
> would result in a slowdown. (If anything, serialization would be
> faster.)

At the risk of belaboring the obvious, extra memory associated with each processor or node (cache, main memory, hard disk, whatever) is one reason adding nodes can in practice give a speedup greater than the increase in processing power. It may not be a typical result, but it's not rare, either. It's like when you have two processes that hum along when each is given its own machine, but when you try to run them both on the same machine, the system thrashes and runs ten times slower.
Re: [computer-go] First ever win of a computer against a pro 9P as black (game of Go, 9x9).
Am I remembering correctly (maybe not) that Mogo communicates between nodes three times per second? That isn't a lot of communication opportunities if each turn lasts a few seconds.

Olivier, have you tested parallel Mogo's ability to scale with core count at blitz speeds? I might imagine, for example, playing a series against itself with pondering turned off, one side playing blitz with 100 cores, and the other side playing with 10 cores but given 5 times as much thinking time.

As others have said, congratulations...
Re: [computer-go] Complicated seki with Ko
Not sure this helps, but... I found send-two receive-one, as in your example, to be the most common non-simple ko cycle in my program. The program was heavy, so forbidding send-two added little to the total cost. Specifically, the rule forbade playing on an isolated empty intersection that neighbored a lone friendly stone that had started with two liberties. With light playouts, your mileage will probably vary.

The reason I bothered with the prohibition is that I didn't properly handle graph-history interaction. Forbidding send-two greatly reduced GHI problems (simple ko was handled by counting the ko ban point, if any, as part of the position). In my program, not only did send-two receive-one outnumber all other long cycles combined four to one, but it happened that even when the longer cycles did occur, they rarely led to erroneous output, which was not the case for send-two receive-one.

Double ko and triple ko seem more prominent in real games because they affect best-play lines. Unbalanced ko cycles are usually very stupid plays for one side or the other, so people don't think about them very much. For a program dumb enough to be trapped by it, send-two receive-one can be important.
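For concreteness, the prohibition described above might be implemented along these lines -- a hypothetical sketch with a dict-based board, not the actual program's code:

```python
def is_send_two(board, move, color, size):
    """Sketch of the rule described above: forbid playing on an isolated
    empty intersection that neighbors a lone friendly stone with two
    liberties. `board` maps (x, y) -> color for occupied points only."""
    def neighbors(p):
        x, y = p
        return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < size and 0 <= y + dy < size]

    # the candidate point must be empty and isolated (no empty neighbors)
    if board.get(move) is not None:
        return False
    if any(board.get(n) is None for n in neighbors(move)):
        return False
    for n in neighbors(move):
        if board.get(n) != color:
            continue
        lone = all(board.get(m) != color for m in neighbors(n))
        libs = sum(1 for m in neighbors(n) if board.get(m) is None)
        if lone and libs == 2:
            return True
    return False


# a 4x4 corner shape where Black playing (0, 0) would "send two"
board = {(0, 1): 'B', (1, 0): 'W', (1, 1): 'W'}
assert is_send_two(board, (0, 0), 'B', 4)
```

After Black plays (0, 0) here, the two-stone chain has a single liberty at (0, 2), so White captures two stones and then recaptures one -- the cycle the filter is meant to break.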
Re: [computer-go] April KGS bot tournament: results
On Mon, Apr 6, 2009 at 11:32 AM, Jason House <jason.james.ho...@gmail.com> wrote:

> I disagree with your comment about AyaMC's move 28 in round 4. The move
> looks to me like it primarily aims to build a large framework along the
> bottom/center. All basic fights seem to favor white. I'm only KGS 3k,
> so take my comments with a grain of salt...

(Idle kibitzing, no real go programming content...)

Yeah, I'm only KGS 1k so I'm not sure either, but I kind of agree. I think that at this point killing the stones would have been smaller than Aya's move 28, but maybe locally someplace else, like the spot immediately down and right of the chosen move, would have been even better -- or not. I'm positive 28 is either a bad half-measure all around (to quote Lessons in the Fundamentals of Go) or a good multi-purpose move; I just don't understand the tactics well enough to confidently say which one it is.

Much easier to say the opening was unusual but decent for kyu level, and to point to the obvious tactical mistakes that came later, like the broken ladder or the Oh move where Aya connected two stones to the killable top group instead of saving the group (both sides could be criticized there, one for bothering to attack two unimportant stones and the other for bothering to defend them, when the life of the whole group was unsettled -- presumably this is the old seki-reading problem that many programs have).
Re: [computer-go] CGT approximating the sum of subgames
On Tue, Feb 17, 2009 at 4:39 PM, dave.de...@planet.nl wrote:

> I've been looking into CGT lately and I stumbled on some articles about
> approximating strategies for determining the sum of subgames
> (Thermostrat, MixedStrat, HotStrat etc.) It is not clear to me why
> approximating strategies are needed. What is the problem? Is Ko the
> problem? Is an exact computation too slow? Can anyone shed some light
> on this?

I'm sure someone else could give a better answer, but it does come down to slowness. It's the same thing as assuming the value of a gote move equals the position's value if black moves first averaged with its value if white moves first, and that the game score equals the original score plus half the value of the largest move on the board -- these assumptions are wrong, and these estimates are not guaranteed to yield either the correct score or the biggest play on the board, but you have to do something if you can't perfectly read out the rest of the game. If CGT values were all nice real numbers that summed normally when combining subgames, then CGT would be a lot simpler.

The possibility of wanting to play a ko threat in another part of the board prevents the two areas from being separate subgames in the technical sense -- the information sets are shared.
Re: [computer-go] Human Learning against MoGo
An amateur 5D also beat Mogo with 3 handicap. I would love to see more serious games between top programs and roughly evenly matched human opponents.

On Sun, Feb 15, 2009 at 8:08 AM, Ingo Althöfer <3-hirn-ver...@gmx.de> wrote:

> Hello,
>
> During the last week (February 10 - 13, 2009) there were several
> exhibition games between the program MoGo and some professional go
> players from Taiwan (Jun-Xun Zhou 9p; Li-Chen Chien (12 years old) 1p;
> Shih Chin 2p).
Re: [computer-go] 7.5-komi for 9x9 in Beijing
On Wed, Oct 8, 2008 at 3:46 PM, Don Dailey [EMAIL PROTECTED] wrote:

> Is there any way to prove that with best play the game cannot end in
> seki?

It seems like most reasonable sequences in Chinese-rules 4x4 go end in a whole-board seki. I would expect that for 19x19 go, some avenues of best play can also lead to seki, because the perfect-play game tree is likely to be immense, and seki is common in 19x19 games (the larger the board, the greater the number of groups that could possibly end in seki). I do not have any proof, and I think a proof either way would be extremely difficult to do.

I think a proof would also be very difficult (nearly as bad as solving the empty board) for 9x9.
Re: [computer-go] Test position set for MC programs
Huh is right. You wrote C2 C1 and so did I, but I somehow did so while looking at C1 C2. I'm not sure how I managed that myself.

On Sat, Apr 26, 2008 at 5:24 PM, Gunnar Farnebäck [EMAIL PROTECTED] wrote:

> Huh? A seki doesn't get much more basic than this.
>
> 4 X X X X X . X . X 4
> 3 . O O O O X X O . 3
> 2 O O O . O O X . . 2
> 1 . X X X O X X . . 1
>   A B C D E F G H J
>
> /Gunnar
>
> Eric Boesch wrote:
>> Black doesn't need the lower-right corner as territory after killing
>> the lower left. After B B1 W C2 B C1 W H3 B G3, white gains about 7
>> points in the lower right but has already lost something like 31 in
>> the lower left, and black wins by 15. Either I'm missing something
>> obvious, or you overlooked that the lower left is not a seki, or
>> something like that.
>>
>> On Sat, Apr 26, 2008 at 2:58 PM, Gunnar Farnebäck [EMAIL PROTECTED]
>> wrote:
>>> Yamato wrote:
>>>> I attached the fixed version to this email.
>>>
>>> Thanks for your help. Another correction. In 119 black has a serious
>>> weakness on the right. Is there any way for black to win after B B1,
>>> W C2, B C1, W H3?
Re: [computer-go] 19x19 MC improvement
On Jan 23, 2008 7:39 PM, Jason House [EMAIL PROTECTED] wrote:

> On Wed, 2008-01-23 at 18:57 -0500, Eric Boesch wrote:
>> I am curious if any of those of you who have heavy-playout programs
>> would find a benefit from the following modification:
>>
>> exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
>> uct = exp_param * sqrt( log(sum of all children playouts)
>>                         * (child-win-rate-2) / (number of child playouts) );
>> uct_value = (child winning rate) + uct;
>>
>> where child-win-rate-2 is defined as (#wins + 1) / (#wins + #losses + 2)
>
> I'm surprised to see that this works as listed, because the math looks
> all wrong to me...

Argh. I have to retract the claim that this helps. I didn't optimize the libego parameters correctly before I tested it. Sorry about that -- I thought I had. There's a lot more I could add, but I thought I'd get that out there before anyone wasted any more time on my error.

By the way, does anybody know of any nifty tools or heuristics for efficient probabilistic multi-parameter optimization? In other words, like multi-dimensional optimization, except that instead of your function returning a deterministic value, it returns the result of a Bernoulli trial, and the heuristic uses those trial results to converge as rapidly as possible to parameter values that roughly maximize the success probability. The obvious approach is to cycle through all the dimensions in sequence, treating each as a one-dimensional optimization problem -- though the best way to optimize in one dimension isn't obvious to me either -- but just as with the deterministic version of optimization, I assume it's possible to do better than that. It might be a fun problem to play with, but if good tools already exist then it wouldn't be very productive for me to waste time reinventing the broken wheel.
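For what it's worth, the "cycle through all the dimensions" approach can be sketched as follows -- a toy illustration where the hidden success probability, the parameter names, and all the numbers are my own made-up assumptions:

```python
import math
import random

random.seed(0)

def true_win_prob(params):
    # hidden objective for the demo: success probability peaks at (0.3, -0.5)
    x, y = params
    return 0.9 * math.exp(-((x - 0.3) ** 2 + (y + 0.5) ** 2))

def trial(params):
    # one Bernoulli trial (think: one game) at this parameter setting
    return random.random() < true_win_prob(params)

def wins_at(params, trials):
    return sum(trial(params) for _ in range(trials))

def coordinate_search(params, step=0.4, trials=2000, sweeps=5):
    # cycle through the dimensions, hill-climbing each one on noisy
    # win counts, and shrink the step size between sweeps
    params = list(params)
    for _ in range(sweeps):
        for d in range(len(params)):
            cur = wins_at(params, trials)
            for direction in (1, -1):
                while True:
                    cand = params[:]
                    cand[d] += direction * step
                    w = wins_at(cand, trials)
                    if w <= cur:
                        break
                    params, cur = cand, w
        step /= 2
    return params

found = coordinate_search([1.5, 1.0])
assert true_win_prob(found) > true_win_prob([1.5, 1.0])
```

This is exactly the "obvious approach" from the post, with no claim of efficiency; smarter methods exist for this noisy setting (e.g. stochastic approximation techniques such as SPSA), which may be closer to what the question is after.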
Re: [computer-go] 19x19 MC improvement
On Jan 23, 2008 3:11 AM, Hiroshi Yamashita [EMAIL PROTECTED] wrote:

> 2. Change UCT exploration parameter
>
> exp_param = sqrt(2.0);
> uct = exp_param * sqrt( log(sum of all children playouts) / (number of child playouts) );
> uct_value = (child winning rate) + uct;
>
> I changed exp_param from sqrt(2.0) to sqrt(0.1).

2.0 to 0.1? That's a pretty big step. I did notice that libego worked better with lower constants than recommended by the formula -- in libego, the original UCB1 formula corresponds to an explore_rate coefficient of 8, but the default explore_rate coefficient of 1 seemed to be about the best.

I am curious if any of those of you who have heavy-playout programs would find a benefit from the following modification:

exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
uct = exp_param * sqrt( log(sum of all children playouts)
                        * (child-win-rate-2) / (number of child playouts) );
uct_value = (child winning rate) + uct;

where child-win-rate-2 is defined as (#wins + 1) / (#wins + #losses + 2).

If it happens that you do the equivalent of initializing #wins to 1 and #losses to 1 (in libego, setting the initial bias to 2), then you can just use your original (child winning rate) value. (W+1)/(W+L+2) is the mean of a beta(W+1, L+1) random variable, which is what you get when you start with a uniform(0,1) distribution and condition it on the observation of a W-L record. The explore parameter inside the square root is doubled so that when you have an average child-win-rate-2 value of 0.5, the formula returns the same value as before. (Using an initial bias of 2.0 seems like a good thing anyhow.)

Adding this extra term seemed to help a bit (57% +/- 4.5% win rate over the unmodified program) when the basic playouts were uniform. This change tends to make the formula stick with proven winners more than before, so you might need to increase the explore rate by a little more than sqrt(2). Your mileage may vary -- I'd also like to know if you try this change and find it unhelpful.
The justification is that the standard deviation of a beta(1,21) variable is lower than the standard deviation of a beta(11,11) variable. The error term of a beta(21,1) is lower too, but applying the (L+1)/(W+L+2) term to the UCB1 formula can yield nonsensical results: when two moves have the same bias, the one with the worse win-loss record would be assigned the higher UCB1 value.
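In code, the modified formula might look like this sketch (my rendering of the formulas above; natural log assumed, as in UCB1):

```python
import math

def uct_value(wins, losses, parent_playouts, exp_param=math.sqrt(0.2)):
    # modified UCB1 from the post: scale the exploration term by
    # child-win-rate-2 = (W+1)/(W+L+2), the mean of a Beta(W+1, L+1)
    # posterior (a uniform prior updated by the observed record)
    n = wins + losses
    win_rate = wins / n
    win_rate2 = (wins + 1) / (n + 2)
    explore = exp_param * math.sqrt(math.log(parent_playouts) * win_rate2 / n)
    return win_rate + explore
```

With an even record, win_rate2 is 1/2, so the sqrt(2) increase in exp_param cancels it and the value matches the unmodified formula with exp_param = sqrt(0.1); children with poor records get smaller exploration bonuses, which is the point of the change.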
Re: [computer-go] Is MC-UCT really scalable against humans?
On Jan 22, 2008 1:43 AM, Petri Pitkanen [EMAIL PROTECTED] wrote:

> Even top MC programs fail to see that a group with 3 liberties with no
> eyes is dead.

A 3-liberty group with no eyes has a 100% chance to die during playouts unless a surrounding group dies first. A 100% chance to die is as good a job of seeing deadness as a generic MC playout can ever do.

MC playouts explicitly understand eyes at the "don't fill a 1-eye in playouts" level and implicitly have a crude understanding of the 2-eye rule. They essentially give you Benson's algorithm -- clearly an eye-counting algorithm -- for free. Benson's-alive chains almost never die in playouts, while chains that lack eyespace and are surrounded by Benson's-alive chains almost always do.

Under-the-stones chains can kill Benson's-alive chains during playouts,

 --------------   --------------   --------------
| . O O # . . .| | . O O # . . .| | . O O # # . .|
| O . O # # # #| | O . O # # # #| | O . O # # # #|
| O O O # # . .| | O O O # # . .| | O O O # # # .|
| # # # O # # #| | # # # . # # #| | # # # O # # #|
| . # . O O # .| | . # # . . # .| | a # # O . # .|
 --------------   --------------   --------------

but I think this scenario is so unlikely that it's not worth a second thought.
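The "don't fill a 1-eye" playout rule mentioned above is typically a cheap static test along these lines -- one common heuristic (my sketch; the details vary from program to program):

```python
def is_one_point_eye(board, pt, color, size):
    """An empty point is treated as an eye for `color` if all orthogonal
    neighbors are friendly and the diagonals don't reveal a false eye:
    no enemy diagonal on the edge/corner, at most one in the center.
    `board` maps (x, y) -> 'B' or 'W' for occupied points only."""
    x, y = pt
    if board.get(pt) is not None:
        return False
    # every on-board orthogonal neighbor must be a friendly stone
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size and board.get((nx, ny)) != color:
            return False
    off_board = enemy = 0
    for dx, dy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        nx, ny = x + dx, y + dy
        if not (0 <= nx < size and 0 <= ny < size):
            off_board += 1
        elif board.get((nx, ny)) not in (None, color):
            enemy += 1
    # edge/corner: no enemy diagonals allowed; center: at most one
    return enemy == 0 if off_board > 0 else enemy <= 1
```

A playout policy then simply never plays on points where this test is true for the side to move, which is what gives playouts their implicit grasp of the two-eye rule.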
Re: [computer-go] Please have your bot resign, for your own good
In chess, playing the game out to the bitter end can be defensible, but in go, it isn't. The meaning of "the end of the game" in go is fluid, but it is not "when it's no longer possible to play a move." In absolute-time-limit games, when significant per-move lag exists (which is true in all human matches and some computer ones), it is impossible to schedule correctly to deal with the possibility that the opponent will continue playing for as long as possible after the game is already over. Either you allocate too little time for the real game, putting yourself at a disadvantage if you and your opponent play reasonably, or you leave yourself without enough time to handle unreasonable play.

All go players know how to keep playing after the game ends, but it's as childish and outside the spirit of the rules as starting a no-time-limit game, realizing you have fallen behind, and never returning to the board. In the eyes of most players, a time win in a lost position well after the game has ended isn't even a win-with-an-asterisk -- it's a dead loss, even more so if the player with the winning position played as quickly as lag would permit. This is why byo yomi can actually shorten games: compared to absolute time, it removes much of the incentive for children to keep playing after the game is over.

All methods of compensating for net lag require some level of trust. Even if a fraud-proof way to detect net lag existed, it would interfere with time controls and scheduling. If it takes a second before program B knows what move program A played, then that's a second during which program A has better information about the game than program B has. You can say that it hopefully averages out in the wash, but the possibility that a program might gain an advantage from its poor network connection is still there.
Any compensation for net lag at all means that a program that ponders gains a greater advantage than the official time controls would suggest: imagine 1 second per move plus 2 seconds of net lag, in which case the programs have 1 second per move to think, plus 2+1+2 seconds between moves to ponder. So I think Peter was right to direct his request to other programmers, instead of asking Don to compensate for net lag.

I was going to suggest copying KGS's "no time cost for a pass within a reasonable number of seconds" rule, but I see Erik just did that. Well, I'll just second the suggestion. Unfortunately, the reasonable number of seconds would probably have to be low, just so it doesn't increase game durations to unreasonable levels. One player passing over and over while the other keeps playing dumb moves is a scenario that is certain to occur over and over, and the only reason it's not a bigger issue is that long delays by the passing player ought to be infrequent.

With a little bit of go knowledge, it is possible for the server to punish programs that play time-wasting moves, but if you can't count on the players to be smart enough to know when they're wasting time, that could mean assigning a loss to programs that are actually winning. A simpler approach would be for the server to just end the game as soon as it became statically solvable -- though exotic sekis aren't statically solvable, so if they appeared then you'd have to fall back on the two-passes-end-the-game rule. (In theory, superko can break static analysis -- superko might forbid capturing a surrounded chain, for example -- but in practice, I bet it never would.)

Finally, I'm still not aware of any go programs that keep playing after they have obtained a dead-lost position because the programmer wants them to do it. It's just easier to write a program that doesn't know when to resign than to write one that does. On this point, I disagree with Erik.
We get enough stupid games by accident that we don't need programs that play asinine games on purpose. Peter already identified some reasons. Also, if programs prolong the game as much as possible, it means that the next round of ALL programs will be delayed, and the more programs you have that deliberately refuse to resign, the greater the probability that you will get two that, because of a failure on the part of the side that is ahead to correctly bring the game to a close, play a match that goes on basically forever. In that case, only a double forfeit is sufficient -- switching to true absolute time limits would not be good enough, because two players in a "who can play stupid moves fastest?" battle could swamp the network and cause lag for programs playing real games. One could even temporarily block the accounts so other programs can continue while the programmers in question debug their broken programs. On Jan 2, 2008 8:22 AM, Don Dailey [EMAIL PROTECTED] wrote: Hi Peter, CGOS doesn't count the first 1/4 second of thinking time and this could help a little. This isn't the same as Fischer time however because you are not given the time if it adds to your surplus. It is designed so that if you
Re: [computer-go] rotate board
Taking the min of the 8 rotated and reflected values is safe enough. Yes, the probability density will be eight times higher at the low end, so you're left with 61 bits and change worth of collision protection instead of 64. If that's not enough, then you can use a bigger hash size, as has been mentioned. And since all practical hash table sizes are far less than 2^61, let alone 2^64, I think that (minimum hash % hash_table_size) should work fine as a key to your hash table, while -- and this may be different from what Jason had in mind -- the formula ((bit-reverse of minimum hash) % hash_table_size) will, if hash_table_size is a multiple of 8, needlessly favor hash values that are even or multiples of 4 or 8.

On Dec 20, 2007 1:33 PM, Don Dailey [EMAIL PROTECTED] wrote: If you are going to compute all 8 hash keys, you can just add them up at the end instead of picking the minimum. Wouldn't that be better? I think that's pretty workable. XOR is definitely wrong here. If you use xor, then the empty board would hash to the same value as the position after a stone (of either color) is placed on e5, as well as any other symmetry like this. I also think symmetries like putting a black stone on 2 points across from each other (such as in diagonal corners) would create a zero hash, because you have 2 sets of 4 hashes that cancel each other. I think addition as Álvaro suggests fixes this.

No, the problem you identified applies to addition too. There is no 100% certainty of collision, but there is a very elevated probability of it. The eight symmetries include reflections and 180 degree rotations, all of which have the property that s(s(p)) = p. Suppose your symmetry transformation exchanges points a and c, and points b and d. How does the sum of the Zobrist hashes compare for the set {a,b} versus the set {a,d}? They will collide if (a XOR b) + (c XOR d) = (a XOR d) + (c XOR b). If (but not only if) ((a XOR c) AND (b XOR d)) == 0, then a collision is guaranteed.
The probability of this is closer to 2^-32 than to 2^-64. I suggest that those who are interested follow Erik's link (http://computer-go.org/pipermail/computer-go/2007-February/thread.html#8653), as this is not the first or second (or probably even third) time this issue has come up, and as people have warned several times before, it is easy to get wrong. I vaguely remember that somebody found a safe set of Zobrist values that allows reflections to be detected without recomputation and without greatly increasing the collision probability, but I don't remember the details. I also vaguely remember hearing that the random-values-with-rotated-nybbles approach is broken too. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
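The min-of-eight scheme from the post above can be sketched in a few lines. This is a minimal illustration, not anyone's actual program: the board encoding, `N`, and all function names are invented for the demo, and the crude `rand64()` stands in for a proper 64-bit generator.

```c
#include <stdint.h>
#include <stdlib.h>

#define N 9

static uint64_t zobrist[2][N][N];   /* one 64-bit value per color/point */

static uint64_t rand64(void) {      /* crude generator, fine for a demo */
    uint64_t x = 0;
    for (int i = 0; i < 4; i++)
        x = (x << 16) ^ (uint64_t)(rand() & 0xffff);
    return x;
}

static void init_zobrist(void) {
    for (int c = 0; c < 2; c++)
        for (int r = 0; r < N; r++)
            for (int f = 0; f < N; f++)
                zobrist[c][r][f] = rand64();
}

/* board[r][c]: 0 empty, 1 black, 2 white.
   sym bits select transpose / row flip / column flip; the 8
   combinations are exactly the 8 board symmetries. */
static uint64_t hash_transformed(const int board[N][N], int sym) {
    uint64_t h = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            int tr = r, tc = c;
            if (sym & 1) { int t = tr; tr = tc; tc = t; }
            if (sym & 2) tr = N - 1 - tr;
            if (sym & 4) tc = N - 1 - tc;
            if (board[r][c])
                h ^= zobrist[board[r][c] - 1][tr][tc];
        }
    return h;
}

/* Canonical hash: the minimum over the 8 symmetries, so symmetric
   positions hash identically (at the cost of ~2.4 bits of range). */
static uint64_t canonical_hash(const int board[N][N]) {
    uint64_t best = hash_transformed(board, 0);
    for (int sym = 1; sym < 8; sym++) {
        uint64_t h = hash_transformed(board, sym);
        if (h < best) best = h;
    }
    return best;
}
```

A real program would compute the eight hashes incrementally rather than rescanning the board, but the canonicalization step is the same.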
Re: [computer-go] rotate board
I wrote: If (but not only if) ((a XOR c) AND (b XOR d)) == 0 then a collision is guaranteed. The probability of this is closer to 2^-32 than to 2^-64. Before anybody else feels the need to correct me here -- to be more precise, the probability of collision is at least E(0.5**binomial_variable(64, 0.5)) ~= 1/100,000,000.
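That expectation has a closed form: with B ~ Binomial(64, 1/2), each of the 64 bit positions independently contributes a factor of (1/2)(1) + (1/2)(1/2) = 3/4, so E(0.5**B) = 0.75^64. A quick numeric check (the function name is mine, invented for the sketch):

```c
#include <math.h>

/* E[0.5^B] with B ~ Binomial(64, 0.5) factors into one 0.75 per bit
   position, giving 0.75^64 -- roughly 1e-8, i.e. ~1/100,000,000. */
double collision_lower_bound(void) {
    return pow(0.75, 64);
}
```

Evaluating it gives about 1.01e-8, matching the 1/100,000,000 figure in the post.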
Re: [computer-go] Hall of fame for CGOS
For what it's worth, Hall of Fame can mean anything from the Baseball Hall of Fame (serious business) to the Mullet Hall of Fame (a total joke). To me this looks like a pretty clear misunderstanding. Don and Hideki have both contributed usefully to the mailing list, and it would be too bad if this incident spoiled that from either end. I think most people who even know what CGOS is probably know the context in which it is used, so they're not going to take poor results too seriously. Nobody would deny that a static rating may not be representative of a program that keeps changing, but if that is an issue, then people can create new IDs for their 100% debugged bots so that static ratings will give them all the credit they deserve. The new bot would not get established immediately, but I doubt there are that many people outside this mailing list who are even aware of the CGOS all-time 9x9 rating list yet. As for any lower-rated versions of the same bot, I think that as long as there exists a version of a given program with a high established rating, people will have enough sense to ignore lower-rated versions that may represent experiments, buggy prototypes, and the like.
Re: [computer-go] How does MC do with ladders?
On 12/11/07, Mark Boon [EMAIL PROTECTED] wrote: Question: how do MC programs perform with a long ladder on the board? My understanding of MC is limited but thinking about it, a crucial long ladder would automatically make the chances of any playout winning 50-50, regardless of the actual outcome of the ladder.

No, 50/50 would make too much sense. It might be higher or it might be lower, depending on whose move it is in the ladder and what style of playouts you use, but exactly 50/50 would be like flipping a coin and having it land on its edge. In test cases, MC-UCT evaluations tend to cluster near 50/50 in any case, because MC-UCT, especially dumb uniform MC-UCT, tends to be conservative about predicting the winner, especially in 19x19, where the opportunities for luck to overwhelm the actual advantage on the board are greater. But if you accept this as a mere scaling issue -- that a clearly lopsided exchange can mean just a 2% increment in winning percentage even if read more or less correctly -- then the numbers may not look so even after all. It's certainly possible for MC-UCT to climb a broken ladder in a winning position (and climbing a broken ladder in an even position is at least half as bad as that anyhow).

I tried testing this on 19x19 using libego at 1 million playouts per move. The behavior was not consistent, but the numbers trended in the defender's favor as the sides played out the ladder. In one bizarre case, the attacker played out the ladder until there were just 17 plies left, and then backed off. Why would the attacker give up a winning ladder? It appears the MC-UCT was never actually reading the ladder to begin with; just four or five plies in, sometimes just a few thousand simulations were still following the key line. 1 million playouts were not nearly enough to read the ladder in this case; maybe 100 million would be enough, but I couldn't test that.
Also, after enough simulations, decisively inferior moves lead to fewer losses than slightly inferior ones. Suppose you have three moves available: one wins 75% of the time, one 50%, and one 25%. In the long run, the 75% move will be simulated almost all the time, but the middle move will be simulated roughly four times as often as the 25% move, which, compared to the best move available, is twice as bad -- and four times the simulations at half the loss per simulation adds up to twice the excess losses. That is apropos here, because giving up on an open-field ladder once it has been played out for a dozen moves is much more painful for the defender than for the attacker. The longer the ladder got, the more the evaluations trended in the defender's favor, and my best explanation is that -- until you actually read the ladder all the way out and find that the defender is dead -- every move except pulling out of atari is so obviously bad that even uniform MC-UCT did a better job of focusing on that one good move.

(Incidentally, the conservative nature of MC-UCT ratings largely explains why maximizing winning probability alone is not a bad strategy, at least in even games. The classic beginner mistake, when you already have a clear lead in theory, is to fail to fight hard to grab still more points as blunder insurance. But an MC-UCT evaluation of 90% typically means a 90% probability of actually winning against even opposition, not just a 90% likelihood of a theoretical win. Assigning a 65% evaluation to an obvious THEORETICAL win allows plenty of room to assign higher evaluations to even more lopsided advantages.
As Don said, when MC-UCT starts blatantly throwing away points for no obvious reason, it's almost certainly because the game is REALLY over, because MC-UCT's errors tend to be probabilistic instead of absolute -- it may in effect evaluate a dead group as 75% alive, but it won't call it 100% alive except in the rare cases when the underlying random playout rules forbid the correct line of play.)
Re: [computer-go] How does MC do with ladders?
Make sure that you use the -19 argument when starting 19x19 Mogo, and restart GoGui (in order to restart Mogo indirectly) after you change the settings. Somewhat confusingly, Mogo does not automatically play 19x19 style just because it receives a request for a 19x19 board. Poor ladder handling and squeezing the toothpaste are both behaviors that Mogo can exhibit when playing 9x9-style on the 19x19 board. If the GTP shell window (assuming you're using GoGui) shows "shishoCheck is called" comments, then you're really playing 19x19 style.
Re: [computer-go] Computer Go tournaments - various
On Nov 27, 2007 8:29 PM, Don Dailey [EMAIL PROTECTED] wrote: Ian Osgood wrote: Checking the participants, I see that MoGo and CrazyStone were specifically invited. Also playing is a version of GNU Go (presumably), as well as veterans Aya and Katsunari, and two dozen others. What boggles my mind is the lack of participation in these events from commercial players like KCC Igo, Haruka, Go4++, Handtalk, and Many Faces. It's because these programs will get killed by the top Monte Carlo programs. It's risky competing when your reputation is involved. In fact, it's better not to compete than to compete and score poorly.

I was about to call you on the Many Faces case if Dave hadn't. He has never hesitated to admit it when other programs were stronger, and Many Faces 11 plays on 19x19 CGOS when the site is up. Also, I don't know if Handtalk is in active development anymore. But you're basically right, and your direct language is justified to cut through the hemming and hawing. Some programs do not compete because they are no longer being maintained -- but these obsolescent programs would lose anyway. Other programs do not compete because they would lose. Self-promotion while ducking stronger competition still works, but hopefully more people are starting to see through that old trick. CrazyStone and Mogo win by winning, not by hiding. I haven't seen any reason to believe that there currently exist any commercial programs with as strong a claim to third place as the latest GnuGo or MonteGnu.

Aside: fair descriptions of your program and how it works, and possibly what didn't work, can be very useful. The Mogo and CrazyStone papers are excellent, and it would be great to see more of those one-sentence program descriptions on Sensei's turned into links to actual web pages. If any of you do this, be sure to inform the mailing list. That kind of information is definitely not the sort of self-promotion I was criticizing.
Re: [computer-go] Tweaking the Pseudo Liberty Code
On Nov 24, 2007 10:38 AM, [EMAIL PROTECTED] wrote: * For liberty values, I use the following equation: values[i] = 25000 + (16 * (i+1)) + ((i+1) * (i+1)); Some bits are constant, some are linear, and some are quadratic, which guarantees that the sum of up to 4 values will only be in the set if they are from the same liberty. I have more confidence in an equation than in random numbers.

It's embarrassing how long it took me to notice that you were exploiting the theorem that if you have n terms that sum to m, then the sum of the squares of the terms will exceed (m**2)/n unless each term equals exactly m/n. (This is essentially the same statistics theorem that says that the expected value of the square equals the square of the expected value plus the variance.) Instead of smushing the three parts of your triple together, why not keep them separate? When you add a 1-dimensional coordinate, you do { libs++; sum += coordinate; sumSquares += coordinate * coordinate; } which is admittedly a little more complicated than a table lookup, but then the atari test becomes (sum * sum == sumSquares * libs). Computing the triple-sum and testing for atari are just a couple of adds and multiplies. The process involves no initialization, lookup tables, 64-bit numbers, loops, branching, divisions, or remainders, and (even though this doesn't matter for normal go, it's still nice) there's almost no limit to how many duplicates of the same value you can detect. I like it.
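The sum/sum-of-squares idea above fits in a handful of lines. A minimal runnable sketch, assuming 1-dimensional liberty coordinates (the struct and function names are invented for the demo):

```c
/* Pseudo-liberty tracking via sum and sum of squares.  A chain is in
   atari exactly when all its pseudo-liberties are the same point,
   i.e. sum^2 == sumSquares * libs (the equality case of the
   sum-of-squares inequality mentioned in the post). */
typedef struct {
    long libs;        /* pseudo-liberty count */
    long sum;         /* sum of liberty coordinates */
    long sumSquares;  /* sum of squared liberty coordinates */
} Chain;

static void add_lib(Chain *c, long coord) {
    c->libs++;
    c->sum += coord;
    c->sumSquares += coord * coord;
}

static void remove_lib(Chain *c, long coord) {
    c->libs--;
    c->sum -= coord;
    c->sumSquares -= coord * coord;
}

static int in_atari(const Chain *c) {
    return c->libs > 0 && c->sum * c->sum == c->sumSquares * c->libs;
}
```

Merging two chains is just adding the three fields componentwise, which is part of the appeal.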
Re: [computer-go] Speed of generating random playouts
On 11/15/07, Chris Fant [EMAIL PROTECTED] wrote: On Nov 15, 2007 3:20 PM, Eric Boesch [EMAIL PROTECTED] wrote: On 11/14/07, Chris Fant [EMAIL PROTECTED] wrote: Based on more recent emails, this may not be useful anymore, but I have a list of 361 32-bit numbers that satisfy these properties in case anyone is interested. I'd be interested in your implementation tricks. Where did the number 17 come from? 17 didn't come from me. Here's how I made my list: Start with 500 random numbers. Throw out the ones that violate the tests. Hope that you are left with enough (361). This actually worked all the way down to 15-bit numbers. Neat. Was the 15-bit version for 81 values or 361? At the risk of putting my foot in my mouth, I don't think there exist 361 15-bit numbers that satisfy minimum requirements (if the floating-point average of any four code values is a code value, then the four code values are identical).
Re: [computer-go] Speed of generating random playouts
On 11/16/07, John Tromp [EMAIL PROTECTED] wrote: On Nov 16, 2007 10:05 AM, Chris Fant [EMAIL PROTECTED] wrote: Neat. Was the 15-bit version for 81 values or 361? At the risk of putting my foot in my mouth, I don't think there exist 361 15-bit numbers that satisfy minimum requirements (if the floating-point average of any four code values is a code value, then the four code values are identical). It was 361 values. So either you are wrong or I have a bug. I probably have a bug. Here's the list. If it violates the rules, please let me know. Yep, I think I had a bug. I just removed an optimization that I thought was valid and now I'm getting smaller lists. So I guess it was not valid. Let me see how small I can get the numbers without that optimization... No, it was far from valid; e.g. 14+14+14+3022 = 4 * 766. I don't think you can get 81 15-bit values either...

I think we are applying different standards. For the minimum requirements I mentioned, 14 bits will do, for a strictly limited definition of "do". 81 equals 1010001 base 2, and 1010001 base 5 is a hair under 2^14. But you'll need to do your arithmetic in base 16, and you'll need some external way (e.g. a separate pseudoliberty count) to distinguish 1+5+6+6 from 6+6+6. So one could argue that these minimum requirements aren't good enough, and really I would agree with that.
Re: [computer-go] Speed of generating random playouts
Sorry, I didn't mean to send that one yet. I pressed tab, meaning to insert a tab character, and return soon after, which caused gmail to, first, change the focus to the send button, and second, send the mail. That last bit was supposed to be if (code_sum < 5 * threshold) { int pseudoliberty_count = code_sum / threshold; if (code_sum % pseudoliberty_count == 0) { int average = code_sum / pseudoliberty_count; if (average is in my original code_value set) then there is just one liberty. } } The "if average is in my original code_value set" test seems like a bottleneck here, requiring about #bits operations (i.e. about 9, since 361 is a 9-bit number) no matter how you do it as far as I can tell (unless you use a stupidly gigantic lookup table), and there's a solution to that, too, if you don't mind wasting a little more space. Use base 8 instead of base 5. Unfortunately, then it is no longer possible (without using a separate pseudoliberty count) to squeeze the result into a 32-bit number and be sure that, for chains with 19 liberties, for example, you don't get overflow. But it does permit using a bitmask to convert the "in my original code_value set" test to constant time: if ((average & 0b110110110110110110110110110) == 0) { then there is just one liberty } Here, 0bfoo is the binary number foo (yes, I know that's not legitimate C code) and I'm supposing that #bits == 9. I hope I got this right -- I'm sort of hurrying to correct it before anybody wastes too much time trying to decode it. (Sorry, Jason :) On 11/14/07, Eric Boesch [EMAIL PROTECTED] wrote: Encode each number by swapping base 2 with base 5, without changing the digits. So binary 11011 (27) is represented as base-5 11011 (756). This allows you to sum up to 4 such numbers without experiencing overflow from one digit to the next (since 4 1's adds up to just 4). Essentially, you are performing b additions in parallel.
If the numbers you are summing are all the same, and the pseudoliberty count (#libs) is no more than 4, then each base-5 digit of the sum will equal 0 or #libs. With a slight modification, you can also eliminate the pseudoliberty count completely and just use a single number. Take the largest code value you will need, add 1, multiply by four, round up to the next power of 5, and add this value to every code value, and now you can just use the test
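A runnable reconstruction of the base-5 scheme as described in the two posts above: binary digits are re-read as base-5 digits so up to four codes add without carries, and the large threshold offset folds the pseudo-liberty count into the sum itself. The function names and the 361-point limit are illustrative choices, not from any actual program.

```c
#include <stdint.h>

#define MAX_COORD 361

static uint64_t spread5(uint64_t x) {   /* binary digits -> base-5 digits */
    uint64_t r = 0, place = 1;
    for (; x; x >>= 1, place *= 5)
        if (x & 1) r += place;
    return r;
}

static uint64_t threshold(void) {       /* next power of 5 >= 4*(max+1) */
    uint64_t need = 4 * (spread5(MAX_COORD) + 1), t = 1;
    while (t < need) t *= 5;
    return t;
}

static uint64_t code(int coord) {       /* offset makes count recoverable */
    return threshold() + spread5((uint64_t)coord);
}

/* If code_sum holds at most four pseudo-liberties that are all the same
   point, return that coordinate; otherwise return -1. */
static int single_liberty(uint64_t code_sum) {
    uint64_t t = threshold();
    if (code_sum < t || code_sum >= 5 * t)
        return -1;
    uint64_t k = code_sum / t;          /* pseudo-liberty count */
    if (code_sum % k)
        return -1;
    uint64_t avg = code_sum / k;
    /* avg is a code value iff avg - t has only 0/1 base-5 digits */
    uint64_t rem = avg - t;
    int coord = 0;
    for (int bit = 0; rem; rem /= 5, bit++) {
        uint64_t d = rem % 5;
        if (d > 1) return -1;
        coord |= (int)d << bit;
    }
    return coord;
}
```

The sums here exceed 32 bits for a 361-point board, which is the trade-off the posts discuss; the point of the sketch is the carry-free addition and the threshold trick, not the storage size.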
Re: [computer-go] Speed of generating random playouts
Encode each number by swapping base 2 with base 5, without changing the digits. So binary 11011 (27) is represented as base-5 11011 (756). This allows you to sum up to 4 such numbers without experiencing overflow from one digit to the next (since 4 1's adds up to just 4). Essentially, you are performing b additions in parallel. If the numbers you are summing are all the same, and the pseudoliberty count (#libs) is no more than 4, then each base-5 digit of the sum will equal 0 or #libs. With a slight modification, you can also eliminate the pseudoliberty count completely and just use a single number. Take the largest code value you will need, add 1, multiply by four, round up to the next power of 5, and add this value to every code value, and now you can just use the test if (code_sum = threshold) { ((code_sum code_sum div threshold ! All you have to do is to add a sufficiently large fixed value to every single representation. Intersperse double-zeroes inside your input value, so binary 11011 is encoded as 100101001. Now summing up to 7 such values is guaranteed not to overflow one triad of bits to another, so essentially you're performing b binary additions in parallel. That's it. Divide the sum by a little bit more than 3b bits. 3b bits seems to be very much This could be made more compact, but then bitmasking wouldn't be as easy. . Any coordinate is just a sequence of bits. Each bit can be encoded separately. So the problem reduces to how to encode a single bit (0 or 1) so that the sum of up to 4 values equals a given number (say 0) only if the bit is equal each time. For bitmasking ease, you can use 3 bits per bit. 
Spread each input number out like this: the Up to 4 times 1 (mod 0) equals 0 only if On 11/14/07, John Tromp [EMAIL PROTECTED] wrote: On Nov 14, 2007 5:03 PM, Imran Hendley [EMAIL PROTECTED] wrote: On Nov 14, 2007 3:19 PM, John Tromp [EMAIL PROTECTED] wrote: On Nov 14, 2007 2:00 PM, John Tromp [EMAIL PROTECTED] wrote: My solution doesn't make use of that, and satisfies the stronger property: 0 <= a_i <= 4 and sum a_i * n_i is in 1*nums union 2*nums union 3*nums union 4*nums => only one a_i is nonzero. that was not quite correct. it should say: let a_i be the number of adjacencies to a liberty at point i. if sum a_i <= 4, and sum (a_i * n_i) is in {1,2,3,4} * nums, then only one a_i is nonzero. I'm really lost now. a_i is the number of stones we have adjacent to a liberty at intersection i? Do we need to know the location of our liberties to update sum a_i? How is this easier than just remembering For every string, you can keep track of this sum incrementally. When the string establishes a new adjacency to an empty point i, you add code[i] into the sum. the locations of all real liberties you saw? How do we know that the stones around i are from the same group? What are the n_i in n_i was the name that Tom gave to my code[i]. sum(a_i*n_i)? is {1,2,3,4}*nums supposed to be a cartesian product of no, it's just my shorthand for the union of the 4 sets nums, 2*nums, 3*nums and 4*nums. regards, -John
Re: [computer-go] Speed of generating random playouts
On 11/12/07, Petr Baudis [EMAIL PROTECTED] wrote: Does any frequently playing real-world bot use libEGO? It seems still order of magnitude faster, but it looks like that is because it simplifies some things too much. For example, its board::merge_chains() does not seem to take account for merging chains that might have had liberties in common. But maybe I just overlooked something? I believe those are pseudo-liberty counts, not actual liberty counts. If either count equals zero, then the other one will too, so pseudo-liberties are good enough for identifying suicide and captures. Check the mailing list archive.
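To illustrate the definition at work here: a pseudo-liberty counts an empty point once per adjacent chain stone, so merging chains is plain addition (no deduplication of shared liberties needed), and the pseudo count is zero exactly when the real liberty count is zero. A self-contained sketch on an invented 5x5 encoding (0 empty, 1 chain under test, 2 other stones):

```c
#define N 5

/* Pseudo-liberty count of the chain marked 1: each empty neighbor is
   counted once per adjacent chain stone, so a liberty shared by two
   stones of the chain is counted twice. */
static int pseudo_libs(const int b[N][N]) {
    static const int dr[4] = { -1, 1, 0, 0 };
    static const int dc[4] = { 0, 0, -1, 1 };
    int count = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            if (b[r][c] == 1)
                for (int d = 0; d < 4; d++) {
                    int rr = r + dr[d], cc = c + dc[d];
                    if (rr >= 0 && rr < N && cc >= 0 && cc < N
                        && b[rr][cc] == 0)
                        count++;
                }
    return count;
}
```

An L-shaped chain whose two branches share a liberty gets that point counted twice, yet the count still hits zero precisely when the chain is captured, which is all that suicide/capture detection needs.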
Re: [computer-go] Former Deep Blue Research working on Go
On 10/9/07, Andrés Domínguez [EMAIL PROTECTED] wrote: 2007/10/9, Eric Boesch [EMAIL PROTECTED]: Naive null move is unhelpful because throughout much of a go game, almost every move is better than passing, I think this is not the point of null move. Null move is if pass is good enough to an alpha cut, then will be a _better_ move. It is not important if pass is the worse move, is important that there is a better (=) move than pass (not zugzwang). Then you bet searching not so deep. Sorry, that was sloppy writing in several places. I was not trying to argue why null-move pruning (NMP) would give the wrong answer, but why, even if NMP performs as intended and horizon effects don't spoil its evaluation, it might not prune many moves. The hope is to prune variations that are bad and lopsided enough that starting at some point, one side loses the equivalent of a whole move compared to, er, more or less, correct play by both sides, right? The fraction of variations that fit that description will increase with the depth of the tree and the variability of move quality. The depth and variability are both likely to be lower in global go search than in global chess search. (As for local go search, as I already explained, I think that even if NMP is effective when compared with bare-bones alpha-beta, it is still less effective than other approaches like lambda search.) If all moves except pass for both players are better than nothing, then if NMP works as intended, no moves will be pruned in the first two plies of the search (it takes at least two moves by the same player to fall a full move behind). 
If an average move is more than two-thirds as valuable as the best move -- which is usually true in go for, very roughly, the first 20 moves of a typical 19x19 game -- you'll have to go six levels deep before you see many NMP cutoffs (even if white's sequence is below average and cumulatively a move worse than best, it may not lose a full move to black's imperfect responses, so only a minority of 6-ply sequences will be eligible, and then you have to consider how many of those sequences would be cut off by alpha-beta anyhow -- I would assume the sequences that NMP might prune would be cut off by ordinary alpha-beta at a greater rate than more balanced sequences would be). You won't see NMP cutoffs at the bottom of the tree, either, because it's too late to prune then. If NMP doesn't prune much near the root or the deepest nodes, and the tree is not very deep because the branching factor and static evaluation cost are high enough that there isn't time to search very deeply, then NMP isn't doing much, period. I think that is at least part of what has limited the benefits of null move pruning for full-breadth global search in go. Selective global search allows deeper searches, but a good selector should prune away most of the sequences NMP might otherwise have caught. None of this is an argument that NMP would be literally useless, just that it's unlikely to lead to a dramatic strength improvement. Even in chess, Chrilly Donninger said NMP was good for, what, 100 Elo points? The only alpha-beta tweak that can add 400 Elo to a chess program on its own is transposition tables, and everybody already has those. That makes it difficult to understand why non-go-programmers are sometimes so willing to believe that just souping up an alpha-beta search could turn today's top go programs, which I would say are at about the go equivalent of class B at 19x19, into the go equivalent of grandmasters. 
A simple-but-fast full-breadth alpha-beta go solver would have even further to go to reach grandmaster level, because it would first need to become competitive with the top tier of existing programs (which no such program currently is). Either way, in terms of performance measured in human terms, the jump from the state of the art to world-champion-caliber play would be a far bigger leap than Deep Thought and Deep Blue ever made. (The leap to dan level, if gaining just two stones can be called that, surely requires only throwing a little more hardware at existing programs.) Okay, enough of that. If people aren't persuaded by other programmers' experience trying to map computer chess methods to computer go in a straightforward way, then they're not likely to be convinced by my hand-waving arguments either. [Regarding programmers' experience: when a top chess programmer (Chrilly) and a successful go programmer (Peter Woitke) collaborated on a chess-style go program, the result fell -- at last report, anyhow -- about 600 Elo points short of the top tier of programs at 9x9, and presumably much farther short at real go. (The 600 figure is derived from Chrilly's claims of a 60% success record against GnuGo, and GnuGo's placement nearly 700 Elo points behind Mogo on CGOS -- 9x9 is not GnuGo's long suit.) That should dispel any residual hopes that applying state-of-the-art chess-search
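For reference, the null-move pruning being debated above looks like this in negamax form. To keep the sketch self-contained and testable, it plays a toy additive game invented for the demo (each side alternately scores 1, 2, or 3 points for `depth` plies); in that game every real move beats passing, so the null move (score 0, searched R plies shallower) yields a sound bound. None of this is from any actual go or chess program.

```c
#define R 2   /* null-move depth reduction */

/* Value of the toy game for the side to move: my remaining points
   minus the opponent's, under best play by both sides. */
static int negamax(int depth, int alpha, int beta, int allow_null) {
    if (depth <= 0)
        return 0;

    /* Null-move pruning: pretend to pass (scoring nothing) and let the
       opponent reply at reduced depth; if even that fails high, prune.
       This is exactly the step that misfires when "almost every move
       is better than passing" stops holding (zugzwang-like spots). */
    if (allow_null && depth > R + 1) {
        int score = -negamax(depth - 1 - R, -beta, -beta + 1, 0);
        if (score >= beta)
            return beta;
    }

    int best = -1000;
    for (int m = 1; m <= 3; m++) {   /* real moves: score m points now */
        int score = m - negamax(depth - 1, m - beta, m - alpha, 1);
        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;    /* ordinary alpha-beta cutoff */
    }
    return best;
}
```

With perfect play both sides always take 3, so the game's value alternates between 3 (odd depth) and 0 (even depth), and the pruned search still returns the exact value at the root.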
Re: [computer-go] Former Deep Blue Research working on Go
On 10/8/07, Tapani Raiko [EMAIL PROTECTED] wrote: May sound unpolite. But Deep Blue reached a very important step in IA. They will be known for ever. But, from a research point of view, they didn't much really. It was mainly a technological/technical achivement. Maybe they will reimplement Mogo, try a null-move tweak, use a supercomputer, and claim to have the strongest computer Go player ever. :-)

Naive null move is unhelpful because throughout much of a go game, almost every move is better than passing, but generalizations of null move can help in local fights, where most of the board really doesn't matter. Thomsen calls lambda search an extension of null move. I implemented a local search that involved a "pass to fill outside liberties" move that acted as a stand-in for all nonlocal moves. Maybe Feng-hsiung has something similar in mind. For programs that read out local goals in the first place, it's natural to implement some method -- lambda proof-number search with inversions, as in Thomsen's MadLab, is probably one of the better ones -- to ensure you're not searching the whole board just to solve, say, a lousy crane's nest (http://senseis.xmp.net/?CranesNestTesuji). I think Mogo and CrazyStone do not do this, instead using very good whole-board vision to compensate for relatively weak local tactics.

Even MadLab can be slow to solve the kind of tactical problems you would think it was made for. MadLab's search is admissible (though a bit buggy in case of ko), and it seems that admissible search is often very hard even when making a guess that is probably right is easy. With many harder problems (MadLab did solve some tricky, let's say 3 dan level, problems very quickly, when the key variation stayed reasonably narrow all the way to the end) I concluded that MadLab was finding the tesujis that you would normally call the solution, but then getting bogged down in the easier (to human eyes) life and death problem of mopping up cut-off chains.
There are endless practical examples of easy to guess, hard to prove positions, with wide-branching (even after narrowing the search region down to intersections that really matter), deep, boring, straightforward grinds towards inevitable victory, where a glance or 100 Monte Carlo simulations might reveal the correct answer. For example, can a black stone in the center of an empty 19x19 board live? Of course the answer is yes. Okay, now try to prove it -- or don't, because it's my bet that even with computer help, no one will succeed in doing so in the next five years. In running battles with sketchy boundaries and nothing resembling an eye yet, you can usually forget about trying to prove who will win. (If the aforementioned stone in the center of the board had the 17x17 region above the first line all to itself, it might still be dead -- strong players say that if just the border of the 19x19 board is filled with stones of one color, then with correct play by both sides, the other player cannot live anywhere inside.) Even in the closed and semi-closed go problems the Smart Tools team examined in their paper, they said (I'm paraphrasing from memory, but I hope I get the gist right) that often, proving the correct answer with their tsume-go solver took far longer than just being 95% sure. Similar issues also arise in chess, but are easier to handle within a classic alpha-beta framework -- if proving checkmate is hard but recognizing the sure win is easy, it's usually because one side forces a material advantage, which even the crudest static evaluator can recognize. 
If you're writing a general go-playing program, there's plenty of opportunity to admissibly optimize tactical searches, but don't expect tweaking the admissible elements of your search to the limit to adequately compensate for a lack of guessing skill when proof is not practical, even if the search is meant only for clearly tactical problems and not for direct application to opening play, strategic decisions, or loose positions.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Heuristics for MC/UCT with all-or-nothing payouts
The UCT heuristic of trying every child of a node once before trying any child twice is reasonable when the payoff distribution is unknown. Why try the lever that paid $5 a second time if there might be another lever that pays $1,000,000? But when the set of possible payoffs is known to be {1, -1} -- you either win or you lose -- it makes no sense to abandon a move with a perfect winning record. Suppose there are 1000 candidate moves: if the first move you try happens to win, are you going to waste the next 999 playouts on moves that (because you only try them once) can't possibly end up with better records than the one you just found? The repeat-winners heuristic (keep trying a move until it loses) lets you find exceptionally promising moves without having to identify 500 mildly promising ones first.

It may make sense to revisit promising children that have both won and lost at least once before visiting some other children at all. In a game with an infinite branching factor, this is obviously the case -- otherwise the tree would never grow past the first two levels, so you'd never advance past naive play -- and the same may hold if the branching factor is merely very high, even if you have no a priori reason to prefer one child to another. (I *think* that's a difference from iterative/progressive widening and progressive unpruning. Heuristics are good, and it's no surprise that a good pattern database leads to a much stronger MC player -- but using a progressive approach without heuristics requires an expansion strategy that won't just end up trying new children all the time, as the repeat-winners heuristic does when the children's pre-evaluations are all equal, or, even worse, retry random old children without regard to their win/loss record.)

If you have already examined 100 child nodes, then you'd expect to have to examine about 100 more before finding a more promising candidate.
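A minimal sketch of the repeat-winners selection rule described above (Python; the node representation and the tie-breaking are my own choices for illustration, not anyone's published algorithm):

```python
import random

def select_child(children, rng=random):
    """Pick a child to simulate under the repeat-winners heuristic:
    a move with a perfect winning record keeps getting retried;
    otherwise fall back to an untried move; otherwise take the
    best win rate among already-explored children."""
    # children: list of dicts with 'wins' and 'visits' counts
    perfect = [c for c in children if c['visits'] > 0 and c['wins'] == c['visits']]
    if perfect:
        return rng.choice(perfect)
    untried = [c for c in children if c['visits'] == 0]
    if untried:
        return rng.choice(untried)
    # Every child has lost at least once; hand back the best record.
    return max(children, key=lambda c: c['wins'] / c['visits'])

# Tiny example: the second child is 3-0, so it is retried even though
# two children have never been visited at all.
kids = [{'wins': 1, 'visits': 2}, {'wins': 3, 'visits': 3},
        {'wins': 0, 'visits': 0}, {'wins': 0, 'visits': 0}]
assert select_child(kids) is kids[1]
```

Once kids[1] finally loses, the rule falls through to the untried children, which is where the pathology discussed below (never getting past trying new children) can begin.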
Before doing that, it might make sense to expend some effort on reexamining the most promising candidates so far, to sort the really good candidates from the ones that started out on a lucky streak, and to sort the robust moves from the ones that only work against inferior replies. As a dead-simple heuristic, one might, while some children remain unexpanded, allocate half of the effort to reexamining the most promising previously expanded children (not counting children with perfect winning records, which would be immediately reexamined anyhow). This approach only doubles the number of playouts needed before all children are explored (to repeat myself, I don't consider the children properly explored until each child has lost at least once), but it can lead to a less trivial and hopefully smarter tree topology -- in short, it is more resistant to pathological behavior (such as never expanding any part of the tree past depth 2 because you're too busy trying new children) than the repeat-winners heuristic alone. The extra effort is not wasted once all children are explored, either, because the extra playouts are mostly performed on nodes that would probably soon be reexamined anyhow.

It is also easy to see why MC programmers in games with {1, -1} payoff sets would discover that the choose-best-average-payoff heuristic is not as good as the choose-the-move-with-the-most-wins heuristic when selecting the best move to play. The older Mogo paper mentioned some plausible reasons, and it's easy to imagine others. For typical distributions, a 10-2 record is better than a 6-1 record, which is better than a 1-0 record.
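One quick way to see why 10-2 should outrank 6-1 and 1-0 even though its raw win rate is the lowest of the three: penalize the average by a simple uncertainty term. (The particular penalty below is my own choice for illustration; it is not the rule from the Mogo paper, just a stand-in for any estimate that distrusts small samples.)

```python
import math

def mean(wins, visits):
    return wins / visits

def pessimistic(wins, visits, c=1.0):
    # win rate minus a crude uncertainty penalty of c / sqrt(visits)
    return wins / visits - c / math.sqrt(visits)

records = [(10, 12), (6, 7), (1, 1)]  # 10-2, 6-1, 1-0 as (wins, visits)

# Raw win rates rank them backwards: the 1-0 move looks best.
assert max(records, key=lambda r: mean(*r)) == (1, 1)

# The pessimistic estimate ranks 10-2 first, 6-1 second, 1-0 last.
ranked = sorted(records, key=lambda r: pessimistic(*r), reverse=True)
assert ranked == [(10, 12), (6, 7), (1, 1)]
```

Choosing the move with the most wins gives the same ordering here, which is one reason it tends to beat choosing the best raw average at the root.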
Finally, with the repeat-winners or progressive-widening-type heuristics (I'm not sure what to call the latter; a quick glance at the papers suggested to me -- perhaps wrongly -- that it's not precisely either of the heuristics called progressive widening or progressive unpruning, but a more obvious and naive cousin of both), the odd/even ply bias that Jason brought up is not likely to be much of an issue (whether or not it is ever a big issue for the try-every-child-once heuristic), because the search tree becomes imbalanced so quickly. My guess is that a roughly 50/50 mix of tree leaves at even and odd depths develops very rapidly, provided wins and losses are roughly equally frequent even after minimaxing.

So the normal UCT protocol makes sense with unknown sets of possible payout values, but it also makes sense to use slightly different choice heuristics when the payout value set is {1, -1}. Of course most of you have probably done that already, along with possibly more radical changes -- I know some of these heuristic suggestions are not new (because I copied them from this mailing list), and I would bet that none of them are.
Re: [computer-go] Paper presents results on proximity heuristic
On 2/7/07, David Doshay [EMAIL PROTECTED] wrote:

> On 7 Feb 2007, at 1:35 PM, Don Dailey wrote:
>
>> Although suicide can occasionally be the best move in some rule-sets, I think it weakens your program to include it. At best you are going to get a ko threat, so it requires a pretty sophisticated program to know how and when to use it.

There are weird cases where suicide is more than just a ko threat. http://www.goban.demon.co.uk/go/bestiary/rule_challenge.html (Just picking nits.)
Re: [computer-go] Effective Go Library v0.101
On 2/7/07, steve uurtamo [EMAIL PROTECTED] wrote:

> And of course don't forget about this no_variable_in_for thing... i'll have to read up on what you're describing.

The following pseudocode block

    for (int i = 0; i < 10; ++i) {
        ... code ...
    }   // i's lifetime ends after the for loop does

is valid in just about any version of C++ and in the 1999 ISO C standard (also known as C99), but it is not valid in most older C standards. (Some older versions of gcc would accept this code but assign the wrong lifetime (according to the standard) to the variable i. If you want to test this code with gcc, use the -std=c99 flag, which, as of quite recently at least, was not enabled by default.)

There are at least a couple of other C++-isms -- offhand, the // single-line comment form and variable declarations in the middle of code blocks come to mind -- that are also valid in C99 but invalid in at least some of the older C standards.

I'm not trying to run off on a language-standards tangent! I just feared that we might be headed towards one anyway, and I wanted to make sure the information in it was not 8 years out of date. Sorry, now back to go (I hope)...