Re: Email filtering theory and the definition of spam

2018-02-13 Thread Rupert Gallagher
Humans tend to confuse Science and Engineering, including professional journalists: their mistake does not change the facts, but certainly confuses the weaker minds. Sent from ProtonMail Mobile On Mon, Feb 12, 2018 at 08:49, Groach wrote: > On 12/02/2018

Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Horváth Szabolcs
Dear members, A user repeatedly sends us spam messages to train SA. Training - at the moment - requires manual intervention: the administrator verifies that it's really spam, then issues sa-learn. Then the user thinks the process is done, and the next time when the same email arrives, it will

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Horváth Szabolcs
Reindl Harald [mailto:h.rei...@thelounge.net] wrote: > > However, that doesn't happen. > > 0.000 0 338770 0 non-token data: nspam > > 0.000 0 1460807 0 non-token data: nham > what do you expect when you train 4 times more ham than spam? > frankly you
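
The "4 times more ham than spam" remark can be checked from the two counts quoted above. The following is only a minimal sketch: it assumes the second quoted line reads "0.000 0 1460807 0" once the spacing is restored, and that the count sits in the third column, as in `sa-learn --dump magic` output. The names DUMP and parse_counts are illustrative, not part of SpamAssassin.

```python
import re

# The two lines quoted in the post, with spacing restored (an assumption:
# the archive fused "0" and "1460807" in the nham line).
DUMP = """\
0.000 0 338770 0 non-token data: nspam
0.000 0 1460807 0 non-token data: nham
"""

# Columns assumed: probability, spam count, ham count, atime, token name.
# For these "non-token data" lines the value of interest is the third column.
LINE_RE = re.compile(
    r"\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (nspam|nham)\s*$"
)


def parse_counts(dump_text):
    """Return a dict such as {'nspam': 338770, 'nham': 1460807}."""
    counts = {}
    for line in dump_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


counts = parse_counts(DUMP)
ratio = counts["nham"] / counts["nspam"]
print(f"ham:spam = {ratio:.2f}:1")
```

Under these assumptions the ratio comes out slightly above 4:1, which matches the figure in the reply.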

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Horváth Szabolcs
Reindl Harald [mailto:h.rei...@thelounge.net] wrote: >> This is a mail gateway for multiple companies. I'm not supposed to read >> e-mails on it, or to pick mails that can be used for learning ham > > how did you then manage 1.4 million ham samples in your biased corpus Looks like in this

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Horváth Szabolcs
Reindl Harald [mailto:h.rei...@thelounge.net] wrote: >> I think I have no control over what is learnt automatically. > surely, don't do autolearning at all This is a mail gateway for multiple companies. I'm not supposed to read e-mails on it, or to pick mails that can be used for learning ham.

Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread David Jones
On 02/13/2018 07:55 AM, Horváth Szabolcs wrote: Dear members, A user repeatedly sends us spam messages to train SA. Training - at the moment - requires manual intervention: the administrator verifies that it's really spam, then issues sa-learn. Then the user thinks the process is done, and the next

Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread David Jones
On 02/13/2018 11:24 AM, Horváth Szabolcs wrote: Hello, David Jones [mailto:djo...@ena.com] wrote: There should be many more rule hits than just these 3. It looks like network tests aren't happening. Can you post the original email to pastebin.com with minimal redacting so the rest of us can

Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread David Jones
On 02/13/2018 11:45 AM, Horváth Szabolcs wrote: Reindl Harald [mailto:h.rei...@thelounge.net] wrote: I think I have no control over what is learnt automatically. surely, don't do autolearning at all This is a mail gateway for multiple companies. I'm not supposed to read e-mails on that, or

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Horváth Szabolcs
Hello, David Jones [mailto:djo...@ena.com] wrote: > There should be many more rule hits than just these 3. It looks like > network tests aren't happening. > Can you post the original email to pastebin.com with minimal redacting > so the rest of us can run it through our SA to see how it

Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread John Hardin
On Tue, 13 Feb 2018, Horváth Szabolcs wrote: After: pts rule name description -- -- 0.0 HTML_IMAGE_RATIO_08 BODY: HTML has a low ratio of text to image area 0.0 HTML_MESSAGE BODY: HTML included

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Horváth Szabolcs
Hello, David Jones [mailto:djo...@ena.com] wrote: > With non-English email flow, it's more challenging. If no RBLs hit, then you > really must train your Bayes properly, which requires some way to accurately > determine the ham and spam. You must keep a copy of the ham and spam corpora and be

Re: Email filtering theory and the definition of spam

2018-02-13 Thread @lbutlr
On 13 Feb 2018, at 06:57, Rupert Gallagher wrote: > Not sure why you guys are still discussing RFCs, though, Because one person keeps insisting that RFC822 is the relevant active standard despite being shown multiple times that it’s been obsoleted. Twice. -- If you

URIBL_BLOCKED

2018-02-13 Thread @lbutlr
0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [URIs:

Re: URIBL_BLOCKED

2018-02-13 Thread David B Funk
If you read that informational spamassassin wiki page referenced in that message you'd know that it has nothing to do with querying a Russian RBL. That Russian URI is what the query to URIBL was asking. So your use of URIBL (via spamassassin) hit a threshold and was blocked. Read that

Re: Malformed List-Id header

2018-02-13 Thread Kenneth Porter
On 2/4/2018 3:35 PM, Kenneth Porter wrote: I've noticed quite a bit of spam lately with a malformed List-Id header. Most notably, the angle brackets are missing, but the contents of the angle brackets when present often don't look like a domain. No dots, for example.

Re: Email filtering theory and the definition of spam

2018-02-13 Thread Rupert Gallagher
Said the blind person... Sent from ProtonMail Mobile On Tue, Feb 13, 2018 at 21:03, @lbutlr wrote: > On 13 Feb 2018, at 06:57, Rupert Gallagher wrote: > Not sure why you guys are > still discussing RFCs, though, Because one person keeps insisting that RFC822 > is the

Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread John Hardin
On Tue, 13 Feb 2018, David Jones wrote: Properly training your Bayes and increasing the score for BAYES_80, BAYES_95, and BAYES_99 and BAYES_999 is the best bet on this one. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174

Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Benny Pedersen
John Hardin wrote on 2018-02-14 02:28: Properly training your Bayes and increasing the score for BAYES_80, BAYES_95, and BAYES_99 and BAYES_999 score BAYES_999 5000 /me hides, could not resist :=)

RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread John Hardin
On Tue, 13 Feb 2018, Horváth Szabolcs wrote: 3. populate the ham database That's the tricky part. As I mentioned earlier, I don't really want end-users involved in this. You might be able to find a few that are somewhat technically competent and don't mind their ham samples being manually

Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-13 Thread Bill Cole
On 13 Feb 2018, at 9:33, Horváth Szabolcs wrote: This is a production mail gateway that has been in service since 2015. I saw that a few messages (both hams and spams) were automatically learned by amavisd/spamassassin. Today's statistics: 3616 autolearn=ham 10076 autolearn=no 2817 autolearn=spam 134