Re: [Computer-go] Messages classified as spam.

2017-01-12 Thread Petr Baudis
  Hello,

On Thu, Jan 12, 2017 at 12:44:44PM +0100, Gian-Carlo Pascutto wrote:
> On 12/01/2017 11:55, Rémi Coulom wrote:
> > It is the mail server of this mailing list that is not well
> > configured. Even my own messages are classified as spam for me now.
> > The list does not send DKIM identification.

  for mailing lists, the topic of DKIM is complicated; it's not just
about outgoing email.  It saddens me if Rémi's ISP free.fr goes as far
as assuming emails without DKIM are spam, but I believe this is still
quite uncommon, and probably none of us has time to start fiddling with
this either.

> It's been a while since I looked at this in depth, but the problem seems
> to be that the list modifies the email but doesn't strip the original
> DKIM signature, which then fails to validate. Even adding a DKIM
> signature from the mailing list wouldn't help, because in Patrick's
> case his domain has a stated DMARC policy, which requires a valid DKIM
> signature from that same domain. It's the DMARC policy that makes this
> so much worse, as just failing DKIM isn't usually enough to get
> classified as spam.
> 
> The list is on MailMan 2.1.18, which has support for working around this
> problem:
> http://www.spamresource.com/2016/09/dmarc-support-in-mailman.html
> https://wiki.list.org/DEV/DMARC
> 
> Admin, can you try setting dmarc_moderation_action = Munge From?

  I just tried enabling this, even though the action taken is mildly
horrifying to me; hopefully it will indeed apply only to emails from
DMARC-reject domains.  Thanks for the pointer!
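
  For anyone else administering a Mailman 2.1.18 list, the option lives
under Privacy options -> Sender filters in the web admin UI, and can
also be set with bin/config_list. A minimal sketch of the input file;
the numeric value mapping is assumed from the 2.1.18 defaults, so
double-check against "bin/config_list -o current.py listname" first:

    # settings.py -- input for Mailman 2.1.18's bin/config_list.
    # Apply with:  bin/config_list -i settings.py computer-go
    # Assumed mapping: 0 = Accept, 1 = Munge From, 2 = Wrap Message,
    #                  3 = Reject, 4 = Discard
    dmarc_moderation_action = 1   # Munge From, applied only to posters
                                  # whose domain publishes a DMARC
                                  # reject/quarantine policy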

-- 
Petr Baudis
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-12 Thread Xavier Combelle
So I will start creating the software; it will be released as free
software for anyone who wants to use it, and I have already found
someone who is ready to host the server side.

From a practical point of view, I will use public-key signing to
distribute the Go software (binary or source), so I will ask each
author to sign their release and give me their public key.
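
Concretely, the check on the server side could be as simple as the
sketch below, shelling out to gpg; the file names and the helper are
illustrative, not an actual project API:

    import subprocess

    def verify_release(tarball, sig, keyring):
        """Return True iff `sig` is a valid detached signature of
        `tarball` made by a key in `keyring` (gpg exits 0 on a good
        signature)."""
        result = subprocess.run(
            ["gpg", "--no-default-keyring", "--keyring", keyring,
             "--verify", sig, tarball],
            capture_output=True)
        return result.returncode == 0

    # An author would sign with:  gpg --detach-sign engine.tar.gz
    if verify_release("engine.tar.gz", "engine.tar.gz.sig", "authors.kbx"):
        print("signature OK, accepting upload")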

Xavier Combelle


On 12/01/2017 11:04, Gian-Carlo Pascutto wrote:
> On 11-01-17 18:09, Xavier Combelle wrote:
>> Of course it means distributing at least the binary, or the source,
>> so proprietary software authors could be reluctant to share it. But
>> for free software there should not be any problem. If someone is
>> interested in my proposition, I would be pleased to realize it.
> It is obvious that having a 30M dataset of games between strong players
> (i.e. replicating the AlphaGo training set) would be beneficial to the
> community. It is clear that most of us are trying to do the same thing
> now, that is, somehow trying to learn a value function from the roughly
> 1.5M KGS+Tygem+GoGoD games while trying to control overfitting via
> various measures. (Aya used a small network + dropout. Rn trained
> multiple outputs on a network of unknown size. I wonder why no-one has
> tried normal L1/L2 regularization, but then again I didn't get that
> working either!)
>
> Software should also not really be a problem: Leela is free, and Ray
> and Darkforest are open source. If we can use a pure DCNN player I
> think there are several more options; for example, I've seen several
> programs in Python. You can resolve score disagreements by invoking
> GNU Go --score aftermath.
>
> I think it's an open question though, *how* the games should be
> generated, i.e.:
>
> * Follow the AlphaGo procedure but with an SL instead of an RL player
> (you can use bigger or smaller networks too; many tradeoffs are
> possible)
> * Play games with a full MCTS search and a small number of playouts.
> (More bias, much higher quality games.)
> * The author of Aya has also stated his procedure.
> * Several of those, mixed :-)
>




Re: [Computer-go] Messages classified as spam.

2017-01-12 Thread Gian-Carlo Pascutto
On 12/01/2017 11:55, Rémi Coulom wrote:
> It is the mail server of this mailing list that is not well
> configured. Even my own messages are classified as spam for me now.
> The list does not send DKIM identification.

It's been a while since I looked at this in depth, but the problem seems
to be that the list modifies the email but doesn't strip the original
DKIM signature, which then fails to validate. Even adding a DKIM
signature from the mailing list wouldn't help, because in Patrick's
case his domain has a stated DMARC policy, which requires a valid DKIM
signature from that same domain. It's the DMARC policy that makes this
so much worse, as just failing DKIM isn't usually enough to get
classified as spam.

The list is on MailMan 2.1.18, which has support for working around this
problem:
http://www.spamresource.com/2016/09/dmarc-support-in-mailman.html
https://wiki.list.org/DEV/DMARC

Admin, can you try setting dmarc_moderation_action = Munge From?
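
As a sanity check, anyone can look up whether a sender's domain
publishes a strict policy; a minimal sketch using the dnspython
package (my choice here, any DNS client will do):

    import dns.resolver

    def dmarc_policy(domain):
        """Return the domain's published DMARC record, or None."""
        try:
            answers = dns.resolver.resolve("_dmarc." + domain, "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None
        for rdata in answers:
            txt = b"".join(rdata.strings).decode("ascii", "replace")
            if txt.lower().startswith("v=dmarc1"):
                return txt
        return None

    # A p=reject or p=quarantine policy is what Munge From targets:
    print(dmarc_policy("laposte.net"))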

-- 
GCP

Re: [Computer-go] Messages classified as spam.

2017-01-12 Thread Rémi Coulom
It is the mail server of this mailing list that is not well configured. Even my 
own messages are classified as spam for me now. The list does not send DKIM 
identification.

----- Original Message -----
From: "Gian-Carlo Pascutto" 
To: computer-go@computer-go.org
Sent: Thursday, 12 January 2017 10:45:43
Subject: Re: [Computer-go] Computer-go - Simultaneous policy and value functions 
reinforcement learning by MCTS-TD-Lambda ?

Patrick, for what it's worth, I think almost no-one will have seen your
email because laposte.net claims it's forged. Either your email server
or laposte.net's is misconfigured.

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-12 Thread Gian-Carlo Pascutto
On 11-01-17 18:09, Xavier Combelle wrote:
> Of course it means distributing at least the binary, or the source,
> so proprietary software authors could be reluctant to share it. But
> for free software there should not be any problem. If someone is
> interested in my proposition, I would be pleased to realize it.

It is obvious that having a 30M dataset of games between strong players
(i.e. replicating the AlphaGo training set) would be beneficial to the
community. It is clear that most of us are trying to do the same thing
now, that is, somehow trying to learn a value function from the roughly
1.5M KGS+Tygem+GoGoD games while trying to control overfitting via
various measures. (Aya used a small network + dropout. Rn trained
multiple outputs on a network of unknown size. I wonder why no-one has
tried normal L1/L2 regularization, but then again I didn't get that
working either!)

Software should also not really be a problem: Leela is free, and Ray
and Darkforest are open source. If we can use a pure DCNN player I
think there are several more options; for example, I've seen several
programs in Python. You can resolve score disagreements by invoking
GNU Go --score aftermath.
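
In practice that arbitration step can be a one-liner around the GNU Go
binary; a sketch is below. The flags follow the GNU Go manual, but the
output parsing is an assumption, so check it against your version:

    import re
    import subprocess

    def arbitrate_score(sgf_path):
        """Have GNU Go play out the aftermath of a finished game and
        return (winner, margin)."""
        out = subprocess.run(
            ["gnugo", "--score", "aftermath", "-l", sgf_path],
            capture_output=True, text=True).stdout
        m = re.search(r"(Black|White) wins by (\d+(?:\.\d+)?)", out)
        if m is None:
            raise ValueError("could not parse GNU Go output: " + out)
        return m.group(1), float(m.group(2))

    print(arbitrate_score("disputed-game.sgf"))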

I think it's an open question though, *how* the games should be
generated, i.e.:

* Follow the AlphaGo procedure but with an SL instead of an RL player
(you can use bigger or smaller networks too; many tradeoffs are
possible; see the sketch after this list)
* Play games with a full MCTS search and a small number of playouts.
(More bias, much higher quality games.)
* The author of Aya has also stated his procedure.
* Several of those, mixed :-)
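
For the first option, the paper's recipe is: sample a cut point U,
play moves 1..U-1 from the policy, play move U uniformly at random,
play the game out, and keep a single (position, outcome) pair per game
to avoid correlated samples. A sketch with SL throughout; Board and
policy are hypothetical stand-ins for whatever engine API you have:

    import random

    MAX_MOVES = 450  # roughly AlphaGo's cap for 19x19 games

    def one_training_pair(policy):
        board = Board(19)
        u = random.randint(1, MAX_MOVES)
        for _ in range(u - 1):              # moves 1..U-1: SL policy
            if board.game_over():
                return None                 # game ended early, discard
            board.play(policy.sample_move(board))
        if board.game_over():
            return None
        board.play(random.choice(board.legal_moves()))  # move U: uniform
        state = board.features()            # the position we train on
        while not board.game_over():        # play out to the end
            board.play(policy.sample_move(board))
        return state, board.winner()        # (input, value target)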

-- 
GCP

Re: [Computer-go] Computer-go - Simultaneous policy and value functions reinforcement learning by MCTS-TD-Lambda ?

2017-01-12 Thread Gian-Carlo Pascutto
Patrick, for what it's worth, I think almost no-one will have seen your
email because laposte.net claims it's forged. Either your email server
or laposte.net's is misconfigured.

> Referring to the terminology and results of Silver's paper, a greedy
> policy using the RL policy network beat a greedy policy using the SL
> policy network, but PV-MCTS performed better when used with the SL
> policy network than with the RL policy network. The authors
> hypothesized that this is "presumably because humans select a diverse
> beam of promising moves, whereas RL optimizes for the single best
> move".

I've always found this to be a rather strange argument. If the breadth
of the selection is an issue, it can be addressed by tuning the UCT
parameters and priors differently; it doesn't need to be baked into
the DCNN itself.

Someone on the list made a different argument: when there are several
good shape moves and one that tactically resolves the situation, SL may
prefer the shape moves. But SL has bad tactical awareness, so resolving
the situation might be better for it, and this is what RL learns to
strongly favor. Compare this with playouts (which also have little
tactical awareness themselves) strongly favoring settling the local
situation. I find this a more persuasive argument.

> Thus, one quality of a policy function to be used to bias the search
> in an MCTS is a good balance between 'sharpness' (being selective)
> and 'open-mindedness' (giving a chance to some low-value moves which
> could turn out to be important; avoiding blind spots).

Because of the above I disagree with this: it is a matter of tuning the
UCT parameters. The goal of the DCNN should be to give as objective a
judgment as possible of the likelihood that a move is best.

> Could someone direct me to literature exploring this idea, or
> explaining why it doesn't work in practice?

I think simply no-one has tried it yet, at least publicly. There are
many other ideas to explore.

> I'm wondering if someone has ever considered using a gradient of
> temperature in the softmax layer of the policy network, with the
> temperature parameter varying with depth in the tree, so that the
> search is broader in the first levels and becomes narrower in the
> deepest levels (ultimately, it would turn the search into a rollout
> to the end of the game for the deepest nodes).

Don't typical UCT implementations already do this? If you use priors
and scale them down with the number of visits a node has had, you get
the described effect. Or, the opposite way: if you use progressive
widening, it has the same effect.
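
That prior decay is exactly what the PUCT-style selection in the
AlphaGo paper does. A sketch; the node attributes are made up, and the
exploration constant is illustrative:

    import math

    def select_child(node, c_puct=5.0):
        """Pick the child maximizing Q plus a prior bonus.  The bonus
        dominates for unvisited children (search follows the DCNN) and
        decays as 1/(1+visits), so well-explored subtrees rely on
        search values instead: broad near the root, narrow deeper."""
        total = sum(ch.visits for ch in node.children)
        def score(ch):
            q = ch.value_sum / ch.visits if ch.visits else 0.0
            u = c_puct * ch.prior * math.sqrt(total) / (1.0 + ch.visits)
            return q + u
        return max(node.children, key=score)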

You seem to be thinking all of this fudging of probabilities has to be
done at the DCNN level, but why not do it in the MCTS/UCT search
directly? It has more information, after all.
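
That said, if you did want the quoted temperature schedule inside the
network output itself, it is only a few lines; the constants here are
purely illustrative:

    import numpy as np

    def policy_with_temperature(logits, depth, t_root=2.0, t_min=0.2,
                                decay=0.8):
        """Soften the move distribution near the root and sharpen it
        with depth; as T -> 0 this approaches argmax, i.e. a
        rollout-like playout policy at the deepest nodes."""
        t = max(t_min, t_root * decay ** depth)
        z = logits / t
        z -= z.max()             # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()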

-- 
GCP