Re: [Computer-go] AlphaZero paper difference between 2017 and 2018
Yamashita san,

About your question, I think the answer is yes: AlphaZero Symmetries seems to have saturated successfully. That would mean the 20-block network with symmetries has the capacity to learn at most 21 million full games. If you let the network learn from 21 million full games without preprocessing the inputs for symmetries, the network may overfit by breaking symmetries, since the training data are too small (1/8). So they generated more games in exchange for dropping the preprocessing.

I agree with you that they could not remove domain-dependent knowledge completely. Thinning out the positions of each game to account for game symmetries may be important. I have no knowledge about generalization of symmetries; it sounds like a hard problem if you don't preprocess the training inputs.

- ICHIKAWA, Yuji

> On 2019/04/04 23:34, Hiroshi Yamashita wrote:
>
> Hi Ichikawa san,
>
> Thank you for the nice explanation. I think your guess may be right,
> and the 2018 Nature paper might have no mistake.
>
> I had checked both Figure 1 carefully.
>
> 1. 2017 reaches AlphaGo Lee in 170,000 steps; 2018 reaches it in 80,000 steps.
> 2. 2017 and 2018 reach "AlphaGo Zero (20 block)" in similar steps.
> 3. Final strength is similar.
>
> So I had thought: "If you use 7 times as many game records, initial learning is fast,
> but final strength is similar." So maybe they want to say "21 million training games is enough."
>
> But that is wrong. In Go, if you use all positions from a game, does it overfit,
> and does learning fail? Without symmetry augmentation, Go can use only 20 positions
> from a game, while Chess and Shogi are OK. It looks domain-dependent...
>
> Thanks,
> Hiroshi Yamashita
>
>> The Go version in AlphaZero 2017 finished training in 34 hours according to Table S3.
>> And it looks like AlphaZero Symmetries in AlphaZero 2018 finished training in the
>> same time according to Figure S1.
>> So I think the authors had adopted AlphaZero Symmetries in the 2017 paper by
>> mistake and retried the experiment in the 2018 paper.
>> To compensate for symmetries with real self-play, they generated 8 times more
>> games and reduced positions per game to 1/8.
>> It is just my guess ^^

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
[Computer-go] AlphaZero paper difference between 2017 and 2018
Hi Ichikawa san,

Thank you for the nice explanation. I think your guess may be right, and the 2018 Nature paper might have no mistake.

I had checked both Figure 1 carefully.

1. 2017 reaches AlphaGo Lee in 170,000 steps; 2018 reaches it in 80,000 steps.
2. 2017 and 2018 reach "AlphaGo Zero (20 block)" in similar steps.
3. Final strength is similar.

So I had thought: "If you use 7 times as many game records, initial learning is fast, but final strength is similar." So maybe they want to say "21 million training games is enough."

But that is wrong. In Go, if you use all positions from a game, does it overfit, and does learning fail? Without symmetry augmentation, Go can use only 20 positions from a game, while Chess and Shogi are OK. It looks domain-dependent...

Thanks,
Hiroshi Yamashita

> The Go version in AlphaZero 2017 finished training in 34 hours according to Table S3.
> And it looks like AlphaZero Symmetries in AlphaZero 2018 finished training in the same time according to Figure S1.
> So I think the authors had adopted AlphaZero Symmetries in the 2017 paper by mistake and retried the experiment in the 2018 paper.
> To compensate for symmetries with real self-play, they generated 8 times more games and reduced positions per game to 1/8.
> It is just my guess ^^
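The "only 20 positions from a game" figure discussed above can be pictured as subsampling a game record. The papers do not spell out the exact sampling scheme, so the following is only an illustrative sketch (all names are hypothetical) of using ~20 of the ~220 positions in an average Go game:

```python
import random

def sample_positions(game_positions, n=20, seed=0):
    """Uniformly subsample at most n positions from one game record.

    Illustrative only: this is one plausible way to train on ~20 of
    the ~220 positions of an average Go game, not DeepMind's method."""
    rng = random.Random(seed)
    if len(game_positions) <= n:
        return list(game_positions)
    idx = sorted(rng.sample(range(len(game_positions)), n))
    return [game_positions[i] for i in idx]

game = list(range(220))   # stand-in for the ~220 positions of a Go game
picked = sample_positions(game)
print(len(picked))  # 20
```

With 140 million games at ~20 positions each, the total number of trained positions stays comparable to 21 million games at all ~137 positions.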
Re: [Computer-go] AlphaZero paper difference between 2017 and 2018
Yamashita san,

The Go version in AlphaZero 2017 finished training in 34 hours according to Table S3. And it looks like AlphaZero Symmetries in AlphaZero 2018 finished training in the same time according to Figure S1.

So I think the authors had adopted AlphaZero Symmetries in the 2017 paper by mistake and retried the experiment in the 2018 paper. To compensate for symmetries with real self-play, they generated 8 times more games and reduced positions per game to 1/8. It is just my guess ^^

- ICHIKAWA, Yuji

> On 2019/03/29 10:11, Hiroshi Yamashita wrote:
>
> Hi,
>
> Number of learned positions from a game record:
>
>                      pos    (steps * minibatch) / games
> AlphaGoZero          293    (  700,000 * 2048) /   4,900,000   3 days
> AlphaGoZero          219    (3,100,000 * 2048) /  29,000,000   256 x 40 block, 40 days
> AlphaZero 2017       137    (  700,000 * 4096) /  21,000,000
> AlphaZero 2018        20    (  700,000 * 4096) / 140,000,000
> ELF 2019             154    (1,500,000 * 2048) /  20,000,000
> AlphaZero (Chess)     65    (  700,000 * 4096) /  44,000,000
> AlphaZero (Shogi)    119    (  700,000 * 4096) /  24,000,000
>
> All networks are 256 x 20 blocks, except the AlphaGoZero 40-day run.
>
> Average game lengths are:
>   Go    220
>   Chess  80
>   Shogi 120
>
> So I had thought learning all positions from a game once was nice.
> But AlphaZero 2018 uses only 20 positions from a game.
>
> By the way, I have not received any mails since Ingo's mail (Mar 1 2019).
>
> Erik reported on Feb 17 2019:
>> It looks like gmail is broken again for this list. I never got Remi's
>
> Remi also reported on Mar 24 2019 (I found this in the archives):
>> I have just found out that the list is not sending emails to my free.fr
>
> Thanks,
> Hiroshi Yamashita
[Computer-go] AlphaZero paper difference between 2017 and 2018
Hi,

Number of learned positions from a game record:

                     pos    (steps * minibatch) / games
AlphaGoZero          293    (  700,000 * 2048) /   4,900,000   3 days
AlphaGoZero          219    (3,100,000 * 2048) /  29,000,000   256 x 40 block, 40 days
AlphaZero 2017       137    (  700,000 * 4096) /  21,000,000
AlphaZero 2018        20    (  700,000 * 4096) / 140,000,000
ELF 2019             154    (1,500,000 * 2048) /  20,000,000
AlphaZero (Chess)     65    (  700,000 * 4096) /  44,000,000
AlphaZero (Shogi)    119    (  700,000 * 4096) /  24,000,000

All networks are 256 x 20 blocks, except the AlphaGoZero 40-day run.

Average game lengths are:
  Go    220
  Chess  80
  Shogi 120

So I had thought learning all positions from a game once was nice. But AlphaZero 2018 uses only 20 positions from a game.

By the way, I have not received any mails since Ingo's mail (Mar 1 2019).

Erik reported on Feb 17 2019:
> It looks like gmail is broken again for this list. I never got Remi's

Remi also reported on Mar 24 2019 (I found this in the archives):
> I have just found out that the list is not sending emails to my free.fr

Thanks,
Hiroshi Yamashita
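The "pos" column in the table above is positions learned per game, i.e. training steps times minibatch size divided by the number of self-play games. A short Python check reproduces the rounded figures:

```python
# Recompute the "pos" column from the table above:
# positions learned per game = steps * minibatch / self-play games.
runs = {
    "AlphaGoZero (3 days)":  (700_000,   2048,   4_900_000),
    "AlphaGoZero (40 days)": (3_100_000, 2048,  29_000_000),
    "AlphaZero 2017 (Go)":   (700_000,   4096,  21_000_000),
    "AlphaZero 2018 (Go)":   (700_000,   4096, 140_000_000),
    "ELF 2019":              (1_500_000, 2048,  20_000_000),
    "AlphaZero (Chess)":     (700_000,   4096,  44_000_000),
    "AlphaZero (Shogi)":     (700_000,   4096,  24_000_000),
}
pos = {name: round(steps * mb / games)
       for name, (steps, mb, games) in runs.items()}
for name, p in pos.items():
    print(f"{name:22s} {p:4d}")
```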
[Computer-go] AlphaZero paper difference between 2017 and 2018
Hi,

I found that the AlphaZero paper's Table S3 differs between 2017 and 2018.

                  2017              2018
Mini-batches      700k              700k
Training Time     34h               13d
Training Games    21 million        140 million
Thinking Time     800 sims, 200ms   800 sims, 200ms

Training Time: 34h -> 13d, about 9.2 times.
Training Games: 21 million -> 140 million, about 6.7 times.

Chess and Shogi are the same. And Figure 1 is also a bit different for Shogi and Go; Chess looks the same. Why are these numbers so different? Is it a typo?

AlphaZero (2017/12/05)
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
https://arxiv.org/abs/1712.01815

AlphaZero (2018/12/07)
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
https://deepmind.com/documents/260/alphazero_preprint.pdf

Thanks,
Hiroshi Yamashita
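The two ratios quoted above are simple arithmetic on the Table S3 numbers and can be checked directly:

```python
# Check the two ratios between the 2017 and 2018 Go training runs.
hours_2017 = 34
hours_2018 = 13 * 24                    # 13 days, in hours
games_2017 = 21_000_000
games_2018 = 140_000_000

time_ratio = hours_2018 / hours_2017    # 312 / 34
games_ratio = games_2018 / games_2017

print(f"training time:  {time_ratio:.1f}x")   # 9.2x
print(f"training games: {games_ratio:.1f}x")  # 6.7x
```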