Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-27 Thread Álvaro Begué
To be clear, what I was talking about was building an opening book as
part of the game-generation process that produces training data for
the neural network. This makes sure you don't generate the same game
over and over again.

A few more things about my Spanish checkers experiment from a few
years ago:
 * I used a neural network as an evaluation function, and alpha-beta
as the search algorithm. The networks I tried were fully connected and
quite small compared to anything people are trying these days. The
only game-specific knowledge I provided was not stopping the search if
a capture is available (a primitive quiescence search that works well
for checkers).
 * I couldn't get very far until I provided access to endgame
tablebases. An important purpose of the evaluation function is to
establish whether one side has enough of an advantage to convert the
game into a win, and the shallow searches I was performing in the
generated games weren't strong enough in the endgame to determine
this. Once I generated 6-men tablebases (pretty easy to do for
checkers), it became very strong very quickly (about 1 week of
computation, if I remember correctly).
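
The capture-extension idea can be sketched as follows. The Toy position class and its moves/is_capture/make/evaluate interface are invented stand-ins for a real checkers move generator and evaluation network, not the actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Toy:
    score: int = 0          # static evaluation, from the side to move
    children: tuple = ()    # tuple of (child_position, move_is_capture)

    def moves(self):         return list(range(len(self.children)))
    def is_capture(self, m): return self.children[m][1]
    def make(self, m):       return self.children[m][0]
    def evaluate(self):      return self.score

def search(pos, depth, alpha, beta):
    """Negamax alpha-beta that refuses to stand pat while a capture exists."""
    moves = pos.moves()
    if not moves:
        return -1000        # no legal moves: the side to move has lost
    # Only stop at the horizon if the position is quiet (no captures),
    # the primitive quiescence search mentioned above:
    if depth <= 0 and not any(pos.is_capture(m) for m in moves):
        return pos.evaluate()
    for m in moves:
        score = -search(pos.make(m), depth - 1, -beta, -alpha)
        if score >= beta:
            return beta     # fail-hard cutoff
        alpha = max(alpha, score)
    return alpha
```

The extension matters exactly when a position looks good statically but has a capture hanging: the search keeps going past the nominal depth until the tactics resolve.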

If I find some time in the next few weeks, I'll try to repeat the
process for Ataxx.

Álvaro.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-27 Thread Rémi Coulom
Building an opening book is a good idea. I do it too.

By the way, if anybody is interested, I have put a small 9x9 opening book
online:
https://www.crazy-sensei.com/book/go_9x9/
Evaluation is +1 for a win, -1 for a loss, with a komi of 7. It may not be
very good, because the evaluations were done by my 19x19 network. I started
training a specialized 9x9 network last week, and it is already stronger.

Rémi


Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-27 Thread Álvaro Begué
For checkers, I used a naive implementation of UCT as my opening book
(the "playout" being the actual game where the engine is thinking). So
towards the end of the opening book there is always a position where
it will try a random move, but in the long run good opening moves will
be explored more often. I think this method might work well for other
games.
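
A hypothetical sketch of such a naive UCT opening book: each new self-play game first descends the book with UCB1, expands one untried move at the frontier, and the "playout" is the actual engine game, whose result is backed up along the opening line. The legal_moves callback and the result source stand in for a real engine:

```python
import math
import random

class Node:
    def __init__(self, moves):
        self.n = 0                 # visit count
        self.w = 0.0               # accumulated results
        self.untried = list(moves)
        self.kids = {}             # move -> Node

def select_opening(root, legal_moves, max_depth, c=1.4):
    """Choose an opening line; return (moves_played, nodes_to_update)."""
    line, path, node = [], [root], root
    for _ in range(max_depth):
        if node.untried:           # frontier: try one random untried move
            m = node.untried.pop(random.randrange(len(node.untried)))
            node.kids[m] = Node(legal_moves(line + [m]))
            node = node.kids[m]
            line.append(m)
            path.append(node)
            break                  # hand over to the engine from here
        if not node.kids:
            break
        parent = node
        m, node = max(parent.kids.items(),
                      key=lambda kv: kv[1].w / kv[1].n
                      + c * math.sqrt(math.log(parent.n) / kv[1].n))
        line.append(m)
        path.append(node)
    return line, path

def backup(path, result):
    """Credit the game result (e.g. 1 = win, 0 = loss) to the line."""
    for node in path:
        node.n += 1
        node.w += result
```

Towards the end of the book there is always a position where a random move is tried, but in the long run good opening moves get explored more often, exactly as described.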

Álvaro.



Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-27 Thread Rémi Coulom
This is a report after my first day of training my Ataxx network:
https://www.game-ai-forum.org/viewtopic.php?f=24&t=693
Ataxx is played on a 7x7 board. The rules are different, but I expect 7x7
Go would produce similar results. 2k self-play games are more than enough
to produce a huge strength improvement at the beginning.

It would take my system less than one day to generate 285k games on a
single GPU. But speed optimizations are probably not your biggest problem
at the moment.

As I wrote in my previous message, it is important to control the variety
of your self-play games. In my program, I have a function to count the
number of distinct board configurations at each move number of the
self-play games. This way, I can ensure that the same opening is not
replicated too many times.
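
A sketch of such a diversity check, under the assumption that each self-play game is recorded as a sequence of hashable positions (for example Zobrist hashes or board strings); this is an illustration, not Crazy Stone's actual code:

```python
def opening_diversity(games):
    """Map move number -> number of distinct positions seen at that ply."""
    seen = {}
    for game in games:
        for ply, position in enumerate(game):
            # Collect the set of distinct boards occurring at this ply
            seen.setdefault(ply, set()).add(position)
    return {ply: len(positions) for ply, positions in seen.items()}
```

If the counts stay near 1 for the first several plies, the self-play openings are collapsing onto the same line and need more randomization.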


Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-26 Thread Igor Polyakov
I would be surprised if my model ever lost to GNU Go on 9x9. It's a lot
stronger than Fuego, which already stomps GNU Go. It would be a waste of
time to test it vs. GNU Go or even MCTS bots. I only plan on running tests
vs. current best models to see how it does against the state-of-the-art 9x9
nets.


Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-26 Thread cody2007 via Computer-go
Thanks again for your thoughts and experiences Rémi and Igor.

I'm still puzzled by what is making training slower for me than for Rémi
(although I wouldn't be surprised if Igor's results were faster when matched
for hardware, model size, strength, etc. -- see below). Certainly komi sounds
like it might help a lot. I'm going to have to check out the code from David Wu.

It takes me longer than a day for "training" to actually start with my code --
because I first generate 128*2*32*35 = 285k training samples before even
running the first round of backprop. After the first day, therefore, my model
is always still entirely random. So, possibly:

(1) either your and David Wu's implementations are faster in wall clock time 
computationally
(2) backprop is being started before the initial training buffer is filled (the 
Wu paper used 250k but it's not 100% clear to me if training did not start 
until that initial buffer was filled)
(3) "training" time is being counted as the time when backprop starts 
regardless of how long the initial training buffer took to create.
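
The difference between interpretations (2) and (3) can be made concrete with a sketch of the outer loop. generate_samples() and train_step() are placeholders for self-play and SGD; only the buffer threshold distinguishes the two readings:

```python
INITIAL_BUFFER = 128 * 2 * 32 * 35    # initial self-play sample target

def training_loop(generate_samples, train_step, steps,
                  wait_for_full_buffer, min_samples=10_000):
    buffer = []
    done = 0
    while done < steps:
        buffer.extend(generate_samples())
        # Gate backprop on buffer size: either wait for the full initial
        # buffer, or start as soon as a small minimum is available.
        threshold = INITIAL_BUFFER if wait_for_full_buffer else min_samples
        if len(buffer) >= threshold:
            train_step(buffer)
            done += 1
```

With wait_for_full_buffer=True the first day produces no gradient steps at all (the situation described above); with False, training starts almost immediately on a partially filled buffer.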

Another thing is that I'm not using any of the techniques beyond AlphaGo Zero
that David Wu used. So, depending on whether you guys are using some or all of
those additional features and/or loss functions, it'd be expected that you're
getting much faster training than me. I was actually starting to test adding
some of his ideas from that paper to my code a while back, but then
coincidentally discovered the models I was training weren't as horrible as I
had first thought.

Have either of you ever benchmarked your 7x7 (or 9x9) models against GNU Go?

By the way, all benchmarking against GNU Go that I've reported was in
single-pass mode only (i.e., I was not running the tree search on top of the
net outputs).

Thanks,
Cody



Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-26 Thread Igor Polyakov
I trained using David Wu's code on 9x9 only, and after a few months it has
been superhuman.

I'm not sure if anyone's interested, but I can release my network to the
world. It's around the strength of KataGo, but only on 9x9. I could do a
final test before releasing it into the wild.



Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-26 Thread Rémi Coulom
Yes, using komi would help a lot. Still, I feel that something else must be
wrong, because winning 100% of the games as Black without komi should be
very easy on 7x7.

I have not written anything about what I did with Crazy Stone. But my
experiments and ideas were really very similar to what David Wu did:
https://blog.janestreet.com/accelerating-self-play-learning-in-go/

To clarify what I wrote in my previous message: "strong from scratch in a
single day" was for 7x7. I like testing new ideas with small networks on
small boards, because training is very fast, and what works on small boards
with small networks usually also works on large boards with big networks.

Rémi



Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-25 Thread cody2007 via Computer-go
Hi Rémi,

Thanks for your comments! I am not using any komi and had not given it much
thought. Although, I suppose, by having Black win most games, I'm depriving
the network of its only learning signal. I will have to try with an
appropriately set komi next...

>When I started to develop the Zero version of Crazy Stone, I spent a lot of
>time optimizing my method on a single (V100) GPU
Any chance you've written about it somewhere? I'd be interested to learn more 
but wasn't able to find anything on the Crazy Stone website.

Thanks,
Cody



Re: [Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-25 Thread Rémi Coulom
Hi,

Thanks for sharing your experiments.

Your match results are strange. Did you use a komi? You should use a komi
of 9:
https://senseis.xmp.net/?7x7

The final strength of your network looks surprisingly weak. When I started
to develop the Zero version of Crazy Stone, I spent a lot of time
optimizing my method on a single (V100) GPU. I could train a strong network
from scratch in a single day. Using a wrong komi might have hurt you. Also,
on such a small board, it is not so easy to make sure that the self-play
games have enough variety. You'd have to find many balanced random initial
positions in order to avoid replicating the same game again and again.
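
One way to sketch the balanced-openings idea: sample random initial positions and keep only those the value network scores near zero. random_position() and evaluate() are placeholders for a real position generator and network, with evaluate() returning a value in [-1, 1] from the first player's point of view:

```python
def balanced_openings(random_position, evaluate, n, threshold=0.1):
    """Collect n initial positions whose evaluation is close to zero."""
    book = []
    while len(book) < n:
        pos = random_position()
        # Rejection sampling: discard clearly won/lost starting positions
        if abs(evaluate(pos)) <= threshold:
            book.append(pos)
    return book
```

Self-play games can then start from a position drawn at random from this book, instead of always starting from the empty board.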

Rémi


[Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

2020-01-25 Thread cody2007 via Computer-go
Hi All,

I wanted to share an update to a post I wrote last year about using the AlphaGo 
Zero algorithm on small boards (7x7). I trained for approximately two months on
a single desktop PC with two GPU cards.

In the article I was getting mediocre performance from the networks. Since
then, I've found a bug in the way I was evaluating them, and what I've been
training actually seems to match GNU Go's level of performance.

Anyway, I'm aware I'm not exactly pushing the bounds of what's been done 
before, but I thought some might be interested to see how one can still get 
decent performance, at least in my opinion, on extremely limited hardware 
setups -- orders of magnitude less than what DeepMind (and Leela) have used.

The post where I talk about the model's performance, training, and setup:
https://medium.com/@cody2007.2/how-i-trained-a-self-supervised-neural-network-to-beat-gnugo-on-small-7x7-boards-6b5b418895b7

A video where I play the network and show some of its move probabilities during 
self-play games:
https://www.youtube.com/watch?v=a5vq1OjZrCU

The model weights and tensorflow code:
https://github.com/cody2007/alpha_go_zero_implementation

-Cody