Re: [Computer-go] AlphaZero paper difference between 2017 and 2018

2019-04-04 Thread Yuji Ichikawa
Yamashita san,

Regarding your question, I think the answer is yes.

AlphaZero Symmetries seems to have saturated successfully.
That means the 20-block neural network with symmetries has the capacity to learn at
most 21M full games.
If you let the network learn 21M full games without preprocessing the inputs for
symmetries, the network may overfit by breaking symmetries, since the training
data are too small (1/8).
So they generated more games in place of the symmetry preprocessing.

I agree with you that they could not remove the domain-dependent knowledge
completely.
Thinning out positions of each game to account for game symmetries may be important.
I have no knowledge about generalization over symmetries. It sounds like a hard
problem if you don't preprocess the training inputs.
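For reference, the 1/8 factor discussed above comes from the 8 symmetries of the Go board (4 rotations, each with and without a reflection). A minimal sketch enumerating them on a toy board, in plain Python:

```python
def rotate(board):
    """Rotate a square board (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def flip(board):
    """Mirror a board left-right."""
    return [row[::-1] for row in board]

def symmetries(board):
    """The 8 dihedral symmetries of a square board:
    4 rotations, each with and without a reflection."""
    out = []
    b = board
    for _ in range(4):
        out.append(b)
        out.append(flip(b))
        b = rotate(b)
    return out

# A generic (asymmetric) 3x3 board stands in for a Go position.
board = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
images = symmetries(board)
assert len(images) == 8
# All 8 images are distinct because this board has no symmetry of its own.
assert len({str(s) for s in images}) == 8
```

With symmetry augmentation, each self-play position effectively yields 8 training inputs, which matches the "8 times more games / 1/8 positions per game" trade-off guessed at below.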
-
ICHIKAWA, Yuji

> On 2019/04/04 23:34, Hiroshi Yamashita wrote:

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] AlphaZero paper difference between 2017 and 2018

2019-04-04 Thread Hiroshi Yamashita

Hi Ichikawa san,

Thank you for the nice explanation. I think your guess may be right.
And the 2018 paper might have no mistake.

I carefully checked Figure 1 in both papers.

1. 2017 reaches AlphaGo Lee level at 170,000 steps; 2018 reaches it at 80,000 steps.
2. 2017 and 2018 reach "AlphaGo Zero (20 block)" level in a similar number of steps.
3. Final strength is similar.

So I had thought, "If you use 7 times as many game records, initial learning is
faster, but final strength is similar."
So maybe they wanted to say "21 million training games is enough."

But that was wrong.
In Go, if you use all positions from a game, does it cause overfitting? And does
learning fail?
Without symmetry augmentation, Go can use only 20 positions from a game.
Chess and Shogi are OK. It looks domain dependent...

Thanks,
Hiroshi Yamashita




Re: [Computer-go] AlphaZero paper difference between 2017 and 2018

2019-04-01 Thread Yuji Ichikawa
Yamashita san,

The Go version in AlphaZero 2017 finished training in 34 hours according to
Table S3.
And it looks like AlphaZero Symmetries in the AlphaZero 2018 paper finished training
in the same time according to Figure S1.
So I think the authors had adopted AlphaZero Symmetries in the 2017 paper by
mistake and redid the experiment for the 2018 paper.
To compensate for the symmetries with real self-play games, they generated 8 times
more games and reduced the positions used per game to 1/8.
It is just my guess ^^
-
ICHIKAWA Yuji

> On 2019/03/29 10:11, Hiroshi Yamashita wrote:


[Computer-go] AlphaZero paper difference between 2017 and 2018

2019-03-31 Thread Hiroshi Yamashita

Hi,

Number of learned positions from a game record
(pos/game = training steps * minibatch size / games generated)

                  pos/game      steps   minibatch        games
AlphaGoZero            293    700,000 *     2048 /   4,900,000   (3 days)
AlphaGoZero            219  3,100,000 *     2048 /  29,000,000   (256 x 40 blocks, 40 days)
AlphaZero 2017         137    700,000 *     4096 /  21,000,000
AlphaZero 2018          20    700,000 *     4096 / 140,000,000
ELF 2019               154  1,500,000 *     2048 /  20,000,000
AlphaZero (Chess)       65    700,000 *     4096 /  44,000,000
AlphaZero (Shogi)      119    700,000 *     4096 /  24,000,000

All networks are 256 x 20 blocks, except the 40-day AlphaGoZero.

Average game lengths in moves are:
Go    220
Chess  80
Shogi 120

So I had thought learning every position from a game once was nice.
But AlphaZero 2018 uses only 20 positions from a game.
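The pos/game column above is just steps * minibatch / games, rounded to the nearest integer; a quick sketch to reproduce it, with all numbers copied from the table:

```python
# Positions learned per game = training steps * minibatch size / games generated.
# (name, steps, minibatch, games, expected pos/game) -- copied from the table above.
runs = [
    ("AlphaGoZero (3 days)",    700_000, 2048,   4_900_000, 293),
    ("AlphaGoZero (40 days)", 3_100_000, 2048,  29_000_000, 219),
    ("AlphaZero 2017",          700_000, 4096,  21_000_000, 137),
    ("AlphaZero 2018",          700_000, 4096, 140_000_000,  20),
    ("ELF 2019",              1_500_000, 2048,  20_000_000, 154),
    ("AlphaZero (Chess)",       700_000, 4096,  44_000_000,  65),
    ("AlphaZero (Shogi)",       700_000, 4096,  24_000_000, 119),
]

for name, steps, batch, games, expected in runs:
    pos_per_game = round(steps * batch / games)
    assert pos_per_game == expected, name
    print(f"{name:24s} {pos_per_game:4d}")
```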


By the way, I have not received any mails since Ingo's mail (Mar 1, 2019).

Erik reported on Feb 17, 2019:

It looks like gmail is broken again for this list. I never got Remi's


Remi also reported on Mar 24, 2019 (I found this in the archives):

I have just found out that the list is not sending emails to my free.fr


Thanks,
Hiroshi Yamashita

[Computer-go] AlphaZero paper difference between 2017 and 2018

2019-03-31 Thread Hiroshi Yamashita

Hi,

I found that Table S3 of the AlphaZero paper differs between the 2017 and 2018 versions.


                2017             2018
Mini-batches    700k             700k
Training Time   34h              13d
Training Games  21 million       140 million
Thinking Time   800 sims, 200ms  800 sims, 200ms


Training Time:   34h        -> 13d          (9.2 times)
Training Games:  21 million -> 140 million  (6.7 times)
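A quick sanity check of those ratios, taking 13 days as 13 x 24 hours:

```python
# Growth of the Go training run between the 2017 and 2018 versions of Table S3.
hours_2017 = 34
hours_2018 = 13 * 24            # 13 days expressed in hours
games_2017 = 21_000_000
games_2018 = 140_000_000

print(f"training time:  x{hours_2018 / hours_2017:.1f}")   # x9.2
print(f"training games: x{games_2018 / games_2017:.1f}")   # x6.7
```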

Chess and Shogi are the same.
And Figure 1 is also a bit different for Shogi and Go. Chess looks the same.

Why are these numbers so different? Is it a typo?


AlphaZero(2017/12/05)
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning 
Algorithm
https://arxiv.org/abs/1712.01815

AlphaZero(2018/12/07)
A general reinforcement learning algorithm that masters chess, shogi, and Go 
through self-play
https://deepmind.com/documents/260/alphazero_preprint.pdf

Thanks,
Hiroshi Yamashita