Re: [Computer-go] Messages classified as spam.
Hello,

On Thu, Jan 12, 2017 at 12:44:44PM +0100, Gian-Carlo Pascutto wrote:
> On 12/01/2017 11:55, Rémi Coulom wrote:
> > It is the mail server of this mailing list that is not well
> > configured. Even my own messages are classified as spam for me now.
> > The list does not send DKIM identification.

For mailing lists, the topic of DKIM is complicated; it's not just about
outgoing email. It saddens me if Rémi's ISP free.fr goes as far as
assuming emails without DKIM are spam, but I believe this is still quite
uncommon, and none of us probably has time to start fiddling with this
either.

> It's been a while since I looked at this in depth, but the problem seems
> to be that the list modifies the email but doesn't strip the original
> DKIM signature, which then fails to validate. Even adding a DKIM
> signature from the mailing list wouldn't help, because in Patrick's
> case his domain has a stated DMARC policy, which requires a valid DKIM
> signature from that same domain. It's the DMARC that makes this so much
> worse, as just failing DKIM isn't usually enough to get classified as
> spam.
>
> The list is on Mailman 2.1.18, which has support for working around this
> problem:
> http://www.spamresource.com/2016/09/dmarc-support-in-mailman.html
> https://wiki.list.org/DEV/DMARC
>
> Admin, can you try dmarc_moderation_action = Munge From?

I just tried to enable this, even though the action taken is mildly
horrifying to me - hopefully it will indeed happen only to emails from
DMARC-reject domains. Thanks for the pointer!

--
				Petr Baudis
	Run before you walk! Fly before you crawl! Keep moving forward!
	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
Re: [Computer-go] Training the value network (a possibly more efficient approach)
So I will start to create the software, and if someone wants to use it,
it will be free, as in free software. I have already found someone who
is ready to host the server side. From a practical point of view, I will
use public-key signing to distribute Go software (binary or source), so
I will ask each author to sign it and give me their public key.

Xavier Combelle

On 12/01/2017 at 11:04, Gian-Carlo Pascutto wrote:
> On 11-01-17 18:09, Xavier Combelle wrote:
>> Of course it means distributing at least the binary, or the source,
>> so proprietary software could be reluctant to share it. But for free
>> software there should not be any problem. If someone is interested in
>> my proposition, I would be pleased to realize it.
>
> It is obvious that having a 30M dataset of games between strong players
> (i.e. replicating the AlphaGo training set) would be beneficial to the
> community. It is clear that most of us are trying to do the same now,
> that is, somehow trying to learn a value function from the ~1.5M
> KGS+Tygem+GoGoD games while trying to control overfitting via various
> measures. (Aya used a small network + dropout. Rn trained multiple
> outputs on a network of unknown size. I wonder why no one tried normal
> L1/L2 regularization, but then again I didn't get that working either!)
>
> Software should also not really be a problem: Leela is free, Ray and
> Darkforest are open source. If we can use a pure DCNN player I think
> there are several more options; for example, I've seen several programs
> in Python. You can resolve score disagreements by invoking GNU Go
> --score aftermath.
>
> I think it's an open question, though, *how* the games should be
> generated, i.e.:
>
> * Follow the AlphaGo procedure but with the SL instead of the RL player
>   (you can use bigger or smaller networks too; many tradeoffs possible)
> * Play games with a full MCTS search and a small number of playouts
>   (more bias, much higher quality games)
> * The author of Aya also stated his procedure.
> * Several of those and mix :-)
Re: [Computer-go] Messages classified as spam.
On 12/01/2017 11:55, Rémi Coulom wrote:
> It is the mail server of this mailing list that is not well
> configured. Even my own messages are classified as spam for me now.
> The list does not send DKIM identification.

It's been a while since I looked at this in depth, but the problem seems
to be that the list modifies the email but doesn't strip the original
DKIM signature, which then fails to validate. Even adding a DKIM
signature from the mailing list wouldn't help, because in Patrick's case
his domain has a stated DMARC policy, which requires a valid DKIM
signature from that same domain. It's the DMARC that makes this so much
worse, as just failing DKIM isn't usually enough to get classified as
spam.

The list is on Mailman 2.1.18, which has support for working around this
problem:
http://www.spamresource.com/2016/09/dmarc-support-in-mailman.html
https://wiki.list.org/DEV/DMARC

Admin, can you try dmarc_moderation_action = Munge From?

--
GCP
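For context, the reason the Munge From action only needs to fire for some senders is that each domain publishes its own DMARC policy in a TXT record at `_dmarc.<domain>`, and only restrictive policies (p=reject, and optionally p=quarantine) cause the problem described above. A minimal sketch of reading such a record; the record strings here are illustrative, and no real DNS lookup is performed:

```python
def parse_dmarc(record):
    """Split a DMARC TXT record like 'v=DMARC1; p=reject; ...'
    into a tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip()] = value.strip()
    return tags

def should_munge(record):
    """Mailman's dmarc_moderation_action applies to senders whose
    domain publishes a restrictive policy."""
    return parse_dmarc(record).get("p") in ("reject", "quarantine")

# Hypothetical records; real ones live in the _dmarc.<domain> TXT record.
print(should_munge("v=DMARC1; p=reject; rua=mailto:reports@example.com"))  # True
print(should_munge("v=DMARC1; p=none"))  # False
```

A domain with p=none (or no DMARC record at all) would pass through the list unmodified, which is why the munging should only affect a subset of posters.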
Re: [Computer-go] Messages classified as spam.
It is the mail server of this mailing list that is not well configured.
Even my own messages are classified as spam for me now. The list does
not send DKIM identification.

----- Original Message -----
From: "Gian-Carlo Pascutto"
To: computer-go@computer-go.org
Sent: Thursday, 12 January 2017 10:45:43
Subject: Re: [Computer-go] Computer-go - Simultaneous policy and value
functions reinforcement learning by MCTS-TD-Lambda ?

Patrick, for what it's worth, I think almost no one will have seen your
email because laposte.net claims it's forged. Either your or
laposte.net's email server is misconfigured.
Re: [Computer-go] Training the value network (a possibly more efficient approach)
On 11-01-17 18:09, Xavier Combelle wrote:
> Of course it means distributing at least the binary, or the source,
> so proprietary software could be reluctant to share it. But for free
> software there should not be any problem. If someone is interested in
> my proposition, I would be pleased to realize it.

It is obvious that having a 30M dataset of games between strong players
(i.e. replicating the AlphaGo training set) would be beneficial to the
community. It is clear that most of us are trying to do the same now,
that is, somehow trying to learn a value function from the ~1.5M
KGS+Tygem+GoGoD games while trying to control overfitting via various
measures. (Aya used a small network + dropout. Rn trained multiple
outputs on a network of unknown size. I wonder why no one tried normal
L1/L2 regularization, but then again I didn't get that working either!)

Software should also not really be a problem: Leela is free, Ray and
Darkforest are open source. If we can use a pure DCNN player I think
there are several more options; for example, I've seen several programs
in Python. You can resolve score disagreements by invoking GNU Go
--score aftermath.

I think it's an open question, though, *how* the games should be
generated, i.e.:

* Follow the AlphaGo procedure but with the SL instead of the RL player
  (you can use bigger or smaller networks too; many tradeoffs possible)
* Play games with a full MCTS search and a small number of playouts
  (more bias, much higher quality games)
* The author of Aya also stated his procedure.
* Several of those and mix :-)

--
GCP
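The L1/L2 regularization mentioned above is just an extra penalty term added to the training loss. A minimal numpy sketch on a toy linear model; the data, penalty strength, and learning rate are illustrative placeholders, not anything from the actual value-network training:

```python
import numpy as np

def train_ridge(X, y, lam, lr=0.01, steps=2000):
    """Gradient descent on mean((Xw - y)^2) + lam * ||w||^2.
    lam = 0 recovers plain least squares; lam > 0 is the L2 penalty."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w
        w -= lr * grad
    return w

# Illustrative toy data; nothing here comes from the Go datasets above.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))
y = X @ rng.standard_normal(8)

w_plain = train_ridge(X, y, lam=0.0)
w_l2 = train_ridge(X, y, lam=1.0)
# The penalty shrinks the weights toward zero - that shrinkage is the
# capacity control being discussed.
print(np.linalg.norm(w_l2) < np.linalg.norm(w_plain))  # True
```

In a DCNN framework the same idea is usually applied as weight decay on the convolution kernels; the mechanics are identical to this toy version.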
Re: [Computer-go] Computer-go - Simultaneous policy and value functions reinforcement learning by MCTS-TD-Lambda ?
Patrick, for what it's worth, I think almost no one will have seen your
email because laposte.net claims it's forged. Either your or
laposte.net's email server is misconfigured.

> Referring to Silver's paper terminology and results, a greedy policy
> using the RL Policy Network beat a greedy policy using the SL Policy
> Network, but PV-MCTS performed better when used with the SL Policy
> Network than with the RL Policy Network. The authors hypothesized that
> this is "presumably because humans select a diverse beam of promising
> moves, whereas RL optimizes for the single best move".

I've always found this to be a rather strange argument. If the wideness
of the selection is an issue, it can be resolved by tuning the UCT
parameters and priors differently; it doesn't need to be tuned in the
DCNN itself.

Someone on the list made a different argument: when there are several
good shape moves and one that tactically resolves the situation, SL may
prefer the shape moves. But SL has bad tactical awareness, so resolving
the situation might be better for it, and this is what RL learns to
strongly favor. Compare this with playouts (which also have little
tactical awareness themselves) strongly favoring settling the local
situation. I find this a more persuasive argument.

> Thus, one quality of a policy function to be used to bias the search
> in MCTS is a good balance between 'sharpness' (being selective) and
> 'open-mindedness' (giving a chance to some low-value moves which
> could turn out to be important; avoiding blind spots).

Because of the above, I disagree with this: it is a matter of tuning the
UCT parameters. The goal of the DCNN should be to give as objective a
judgment as possible of the likelihood that a move is best.

> Could someone direct me to literature exploring this idea, or
> explaining why it doesn't work in practice?

I think simply no one has tried it yet, at least publicly. There are
many other ideas to explore.
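To make the "tune it in the search, not in the net" point concrete: in a PUCT-style selection rule the network prior P(a) enters only through an exploration bonus, and its weight c_puct controls how wide the beam is, with no retraining of the DCNN. A minimal sketch with illustrative numbers (the priors, visits, and constants are made up for the example):

```python
import math

def puct_select(priors, visits, values, total_visits, c_puct):
    """PUCT-style selection:
    argmax_a  Q(a) + c_puct * P(a) * sqrt(N) / (1 + n_a).
    A larger c_puct weights the prior and exploration more heavily,
    widening the search beam."""
    def score(a):
        q = values[a] / visits[a] if visits[a] > 0 else 0.0
        u = c_puct * priors[a] * math.sqrt(total_visits) / (1 + visits[a])
        return q + u
    return max(range(len(priors)), key=score)

# Toy node: move 0 has a sharp prior, move 1 a better observed value.
priors = [0.8, 0.15, 0.05]
visits = [10, 2, 0]
values = [4.0, 1.2, 0.0]   # summed playout outcomes
n = sum(visits)

# Small c_puct trusts Q and picks move 1; a large one lets the
# prior dominate and picks move 0.
print(puct_select(priors, visits, values, n, c_puct=0.1))  # 1
print(puct_select(priors, visits, values, n, c_puct=5.0))  # 0
```

The same node and the same priors yield a narrow or a wide search purely as a function of the search-side constant, which is the disagreement with the "balance must live in the policy function" framing.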
> I'm wondering if someone has ever considered using a gradient of
> temperature in the softmax layer of the policy network, with the
> temperature parameter varying with depth in the tree, so that the
> search is broader in the first levels and becomes narrower in the
> deepest levels (ultimately, it would turn the search into a rollout
> to the end of the game for the deepest nodes).

Don't typical UCT implementations already do this? If you use priors
and scale them down with the number of visits a node has had, you get
the described effect. Or, the opposite way, if you use progressive
widening it has the same effect.

You seem to be thinking all of this fudging of probabilities has to be
done at the DCNN level, but why not do it in the MCTS/UCT search
directly? It has more information, after all.

--
GCP
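For reference, the depth-dependent temperature proposed above is straightforward to write down; a minimal sketch with an illustrative linear schedule (the constants t_root, t_leaf, and max_depth are placeholders, not values from the thread):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over policy logits. T > 1 flattens the distribution
    (broader search); T -> 0 sharpens it toward the argmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_at_depth(depth, t_root=2.0, t_leaf=0.1, max_depth=20):
    """Linear schedule: broad near the root, near-greedy at depth."""
    frac = min(depth, max_depth) / max_depth
    return t_root + frac * (t_leaf - t_root)

logits = [2.0, 1.0, 0.5]
shallow = softmax_with_temperature(logits, temperature_at_depth(0))
deep = softmax_with_temperature(logits, temperature_at_depth(20))
# The distribution concentrates on the top move as depth grows.
print(max(shallow) < max(deep))  # True
```

As the reply notes, visit-scaled priors or progressive widening achieve a similar broad-to-narrow effect inside the search itself, without touching the network's softmax.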