Adding IPs to the check list

2018-02-14 Thread Pedro David Marco
Is there any "relatively easy" way to add a new IP found in a non-standard 
header to the IP checks (e.g. DNSRBL)? Or is a plugin the only way?
Thanks.
--PedroD


Re: URIBL_BLOCKED

2018-02-14 Thread @lbutlr
On 2018-02-14 (09:55 MST), Tobi wrote:
> 
> On 14.02.2018 at 17:16, @lbutlr wrote:
>> I can't imagine why I'd be over limit, my mail server is tiny.
> 
> It's not the mail server that got blocked by limits, but the DNS resolver
> your mail server uses!

I use my own DNS on Bind 9.12, however the block error is not appearing today, 
so...



-- 
"...and that's not incense"



Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-14 Thread RW
On Wed, 14 Feb 2018 16:20:30 +0100
Matus UHLAR - fantomas wrote:

> >On Tue, 13 Feb 2018 21:02:46 +
> >Horváth Szabolcs wrote:  
> >> One more question: is there a recommended ham to spam ratio? 1:1?  
> 
> On 14.02.18 15:09, RW wrote:
> >No, this is a myth.  Bayes computes token probabilities from a
> >token's frequencies in spam and ham, so it all scales through. If
> >you have 2000 ham and 200 spam the problem is too few spams, not a
> >bad ratio.  
> 
> my experience says you will need more ham than spam, because you want
> to get rid of false positives (ham marked as spam) much more than of
> false negatives.


My point is that an imbalance doesn't create a bias.



Re: URIBL_BLOCKED

2018-02-14 Thread John Hardin

On Wed, 14 Feb 2018, Tobi wrote:

> On 14.02.2018 at 17:16, @lbutlr wrote:
>> I can't imagine why I'd be over limit, my mail server is tiny.
>
> It's not the mail server that got blocked by limits, but the DNS resolver
> your mail server uses!
> If you're using a third-party resolver (e.g. the ones from your provider or
> 8.8.8.8), you can hit the limits quite fast, depending on how many other
> users use the same resolver for their URIBL queries.
> I recommend setting up a local resolver (unbound or something similar) and
> using that resolver for your mail server(s).


This detail always gets glossed over: set up a local NON-FORWARDING 
resolver.


If you set up a local resolver and it just forwards requests to your ISP's 
DNS servers, you have not materially changed the problem.
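
To see whether the resolver your mail server uses is the one being refused, you can look at the A record that the URIBL zone returns. The following is only a rough sketch, not anything shipped with SpamAssassin: it assumes the bit values used by the stock SpamAssassin URIBL rules (1 = query blocked, 2 = black, 4 = grey, 8 = red), and the helper names `decode_uribl_answer` and `lookup` are invented for this example.

```python
import socket

# Bit values assumed from the stock SpamAssassin URIBL rules;
# check your own rule files before relying on them.
URIBL_BITS = {1: "blocked", 2: "black", 4: "grey", 8: "red"}


def decode_uribl_answer(address):
    """Return the labels encoded in a 127.0.0.x style answer."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError("not a URIBL-style answer: %r" % address)
    mask = int(octets[3])
    return [label for bit, label in sorted(URIBL_BITS.items()) if mask & bit]


def lookup(domain, zone="multi.uribl.com"):
    """Query the zone through the system resolver (needs network access)."""
    try:
        answer = socket.gethostbyname("%s.%s" % (domain, zone))
    except socket.gaierror:
        return []  # no answer, typically NXDOMAIN (domain not listed)
    return decode_uribl_answer(answer)
```

If `lookup()` keeps reporting `blocked` for arbitrary domains, the limit is being hit by whatever resolver the system is configured to use, which is the situation a forwarding resolver does not fix.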


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  A sword is never a killer, it is but a tool in the killer's hands.
  -- Lucius Annaeus Seneca (Martial) 4BC-65AD
---
 8 days until George Washington's 286th Birthday


Re: URIBL_BLOCKED

2018-02-14 Thread Tobi


On 14.02.2018 at 17:16, @lbutlr wrote:
> I can't imagine why I'd be over limit, my mail server is tiny.

It's not the mail server that got blocked by limits, but the DNS resolver
your mail server uses!
If you're using a third-party resolver (e.g. the ones from your provider or
8.8.8.8), you can hit the limits quite fast, depending on how many other
users use the same resolver for their URIBL queries.
I recommend setting up a local resolver (unbound or something similar) and
using that resolver for your mail server(s).

Cheers

tobi


Re: URIBL_BLOCKED

2018-02-14 Thread Kevin A. McGrail

On 2/14/2018 11:16 AM, @lbutlr wrote:
> Ah, I didn't know URIBL was a blacklist, I thought it was being used as a 
> generic abbreviation variant of RBL.
>
> I can't imagine why i'd be over limit, my mail server is tiny.

It's confusing, I agree.  See 
https://issues.apache.org/jira/browse/COMDEV-267?jql=text%20~%20%22GSOC%202018%22 
for one of the ideas I wrote for improving it.




Re: URIBL_BLOCKED

2018-02-14 Thread @lbutlr
On 2018-02-13 (14:45 MST), Reindl Harald wrote:
> 
> On 13.02.2018 at 21:21, @lbutlr wrote:
>> 0.0 URIBL_BLOCKED  ADMINISTRATOR NOTICE: The query to URIBL was blocked.
>> See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
>> for more information.
>> [URIs: cz-salda.ru]
>> So, I’ve never heard of cz-salda.ru, is that the RBL that is blocking me? If 
>> so, where is it listed in SA’s configuration (FreeBSD 11.1-RELEASE)? (tried 
>> a `grep salda.ru /usr/local/etc/mail/spamassassin/*` for no results)
> 
> jesus christ click on the link you even quote

I did click on the link.

> "cz-salda.ru" was the domain which would have been checked against URIBL and 
> URIBL said "you are over limit, go away"

Ah, I didn't know URIBL was a blacklist, I thought it was being used as a 
generic abbreviation variant of RBL.

I can't imagine why I'd be over limit, my mail server is tiny.

-- 
Women like silent men, they think they're listening.



Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-14 Thread David Jones

On 02/14/2018 09:20 AM, Matus UHLAR - fantomas wrote:
>> On Tue, 13 Feb 2018 21:02:46 +
>> Horváth Szabolcs wrote:
>>> One more question: is there a recommended ham to spam ratio? 1:1?
>
> On 14.02.18 15:09, RW wrote:
>> No, this is a myth.  Bayes computes token probabilities from a token's
>> frequencies in spam and ham, so it all scales through. If you have
>> 2000 ham and 200 spam the problem is too few spams, not a bad ratio.
>
> my experience says you will need more ham than spam, because you want to
> get rid of false positives (ham marked as spam) much more than of false
> negatives.

This is also my experience.

> what really matters is how many of FP/FNs you have, you can decrease
> probability by training anything too far from BAYES_00 for ham and BAYES_99
> for spam


Correct.  You want to get ham hitting BAYES_00 and spam hitting 
BAYES_80, BAYES_95, BAYES_99, or BAYES_999 which mine does very well.


A problem I have found is you shouldn't blindly train all spam as spam. 
I have some spam hitting BAYES_00 because it truly could be ham based on 
the body contents but it's spam because it was unsolicited email from 
someone "cold" emailing for a meeting or something.


In this case, I block the sender and report it to SpamCop and other 
abuse so the account can be blocked/locked/disabled hopefully.


If I had trained my Bayes with this email as spam, then legit email 
could hit BAYES_99.  That is why my nightly process to train my Bayes DB 
in redis learns ham first then spam second.  This seems to be the best 
order from my experience.
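
As an illustration of the "ham first, then spam" ordering, here is a minimal sketch of how such a nightly job might be driven. It is not the actual script described above: the directory paths are placeholders, the function names are invented, and it only assumes that `sa-learn` is on the PATH and accepts its usual `--ham` and `--spam` options.

```python
import subprocess


def build_training_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations in the order they should run."""
    return [
        ["sa-learn", "--ham", ham_dir],    # learn known-good mail first
        ["sa-learn", "--spam", spam_dir],  # then learn reviewed spam
    ]


def run_nightly_training(ham_dir, spam_dir):
    """Run each command in order and stop at the first failure."""
    for command in build_training_commands(ham_dir, spam_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical corpus locations; adjust to your own layout.
    for command in build_training_commands("/var/spool/train/ham",
                                           "/var/spool/train/spam"):
        print(" ".join(command))
```

Only hand-reviewed messages should land in the spam directory, so that "cold" but ham-looking mail of the kind described above is kept out of training.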


--
David Jones


Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-14 Thread Matus UHLAR - fantomas

> On Tue, 13 Feb 2018 21:02:46 +
> Horváth Szabolcs wrote:
>> One more question: is there a recommended ham to spam ratio? 1:1?

On 14.02.18 15:09, RW wrote:
> No, this is a myth.  Bayes computes token probabilities from a token's
> frequencies in spam and ham, so it all scales through. If you have
> 2000 ham and 200 spam the problem is too few spams, not a bad ratio.

my experience says you will need more ham than spam, because you want to get
rid of false positives (ham marked as spam) much more than of false negatives.

what really matters is how many of FP/FNs you have, you can decrease
probability by training anything too far from BAYES_00 for ham and BAYES_99
for spam
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
LSD will make your ECS screen display 16.7 million colors


Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-14 Thread RW
On Tue, 13 Feb 2018 21:02:46 +
Horváth Szabolcs wrote:

> One more question: is there a recommended ham to spam ratio? 1:1? 

No, this is a myth.  Bayes computes token probabilities from a token's 
frequencies in spam and ham, so it all scales through. If you have
2000 ham and 200 spam the problem is too few spams, not a bad ratio.


Theoretically there is a case for new training to match the ratio that's
already in the database because then a new token will get a token
probability that reflects its frequencies in recent mail. But I wouldn't
worry about that, it's hard to stick to, and probably minor. 
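
To make the "it all scales through" point concrete, here is a deliberately simplified sketch. It is not SpamAssassin's actual code (the real implementation applies further smoothing and combining steps that are not modelled here), and the function name and the counts are made up for illustration. It only shows that a token probability built from per-corpus frequencies does not change when one corpus is larger than the other.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate how spammy a token is from its frequency in each corpus."""
    spam_freq = spam_hits / n_spam  # fraction of spam messages with the token
    ham_freq = ham_hits / n_ham     # fraction of ham messages with the token
    return spam_freq / (spam_freq + ham_freq)


# Same token behaviour (10% of spam, 1% of ham), two corpus shapes.
balanced = token_spam_probability(20, 2, n_spam=200, n_ham=200)
ham_heavy = token_spam_probability(20, 20, n_spam=200, n_ham=2000)

print(round(balanced, 4), round(ham_heavy, 4))
```

Both calls give the same value because only the per-corpus frequencies enter the formula. What suffers with 200 spams is the reliability of `spam_freq`, which rests on very few messages.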


ApacheCon NA 2018 Travel Assistance Applications now open!

2018-02-14 Thread Sidney Markowitz
Hi everyone,

The Travel Assistance Committee has asked the various Apache Project
Management Committees to forward the following announcement to the user and
dev mailing lists:

 ---

The Travel Assistance Committee (TAC) are pleased to announce that travel
assistance applications for ApacheCon NA 2018 are now open!

We will be supporting ApacheCon NA Montreal, Canada on 24th - 29th September 2018.

TAC exists to help those that would like to attend ApacheCon events, but are
unable to do so for financial reasons.
For more info on this year's applications and qualifying criteria, please visit
the TAC website at < http://www.apache.org/travel/ >. Applications are now
open and will close 1st May.

Important: Applications close on May 1st, 2018. Applicants have until the
closing date above to submit their applications (which should contain as much
supporting material as required to efficiently and accurately process their
request), this will enable TAC to announce successful awards shortly afterwards.

As usual, TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking about
sending in an application to do so ASAP.
We look forward to greeting many of you in Montreal.

Kind Regards,
Gavin - (On behalf of the Travel Assistance Committee)


Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-14 Thread Rupert Gallagher
They cannot (do not want to, do not have the know-how to) study the e-mails, and 
therefore they cannot build a reliable corpus. All they can do is trust the 
ability of their users to study their own e-mails well enough to do the job, 
hence the mess with ham/spam when feeding the Bayesian filter. They need to 
consult with a lawyer, fix their paperwork, hire people who can teach them 
everything they need to know, and invest at least two years full-time in the 
process. They cannot just install CentOS and SA and hope that Bayesian filters 
will do their job by magic. It just does not work that way.

Sent from ProtonMail Mobile

On Wed, Feb 14, 2018 at 05:48, Bill Cole wrote:

> On 13 Feb 2018, at 9:33, Horváth Szabolcs wrote:
> > This is a production mail gateway serving since 2015. I saw that a few
> > messages (both hams and spams) automatically learned by
> > amavisd/spamassassin. Today's statistics:
> >
> > 3616 autolearn=ham
> > 10076 autolearn=no
> > 2817 autolearn=spam
> > 134 autolearn=unavailable
>
> That's quite high for spam, ham, AND "unavailable" (which indicates
> something wrong with the Bayes subsystem, usually transient.) This seems
> like a recipe for a mis-learning disaster. For comparison, my 2018
> autolearn counts:
>
> spam: 418
> ham: 15018
> unavailable: 166
> no: 129555
>
> I also manually train any spam that gets through to me (the biggest spam
> target,) a small number of spams reported by others, and 'trap' hits.
>
> A wide variety of ham is harder to get for training but I have found it
> useful to give users a well-documented and simple way to help. One way is
> to look at what happens to mail AFTER delivery which can indicate that a
> message is ham without needing an admin to try to make a determination
> based on content. The simplest one is to learn anything users mark as
> $NotJunk as ham. Another is to create an "Archive" mailbox for every user
> and learn anything as ham that has been moved there a day after it is
> moved. The most important factor (especially in jurisdictions where human
> examination of email is a problem) is to tell users how to protect their
> email and then do what you tell them, robotically.
>
> In the US, Canada, and *SOME* of the EU, this is not risky. However, I
> have been told by people in *SOME* EU countries that they can't even
> robotically scan ANY mail content, so you shouldn't take my advice as
> authoritative: I'm not even a lawyer in the US, much less Hungary...
>
> > I think I have no control over what is learnt automatically.
>
> Yes, you do. Run "perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold"
> for details. You can set the learning thresholds, which control what gets
> learned. The defaults (0.1 and 12) mis-learn far too much spam as ham and
> not enough spam. I use -0.2 and 6, which means I don't autolearn a lot but
> everything I autolearn as ham has at least one hit on a substantial "nice"
> rule or 2 hits on weak ones.
>
> There's a lot of vehemence against autolearn expressed here but not a lot
> of evidence that it operates poorly when configured wisely. The defaults
> are NOT wise.
>
> > Let's just assume for a moment that 1.4M ham-samples are valid.
>
> Bad assumption. Your Bayes checks are uncertain about mail you've told SA
> is definitely spam. That's broken. It's a sort of breakage that cannot
> exist if you do not have a large quantity of spam that has been learned as
> ham.
>
> > Is there a ham:spam ratio I should stick to it?
>
> No.
>
> > I presume if we have a 1:1 ratio then future messages won't be
> > considered as spam as well.
>
> The ham:spam ratio in the Bayes DB or its autolearning is not a generally
> useful metric. 1:1 is not magically good and neither is any other ratio,
> even with reference to a single site's mailstream. A very large ratio *on
> either side* indicates a likely problem in what is being learned, but you
> can't correlate the ratio to any particularly wrong bias in Bayes scoring.
> It is an inherently chaotic relationship. Factors that actually matter are
> correctness of learning, sample quality, and currency. You can control how
> current your Bayes DB is (USE AUTO-EXPIRE) but the other two factors are
> never going to be perfect.
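
The effect of the two threshold pairs mentioned in the quoted message (defaults 0.1 and 12 versus -0.2 and 6) can be sketched as below. This is a simplified model, not the plugin's real logic: it assumes a message is learned as ham when its score is below the lower threshold and as spam when it reaches the upper one, and it ignores the extra conditions the AutoLearnThreshold plugin applies. In a SpamAssassin configuration I believe these correspond to `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam`; the perldoc named above is the authority. The function name and sample scores are invented.

```python
def autolearn_decision(score, ham_threshold=0.1, spam_threshold=12.0):
    """Classify a message score as 'ham', 'spam' or 'no' (not learned)."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# Hypothetical scores, compared under both threshold pairs.
for score in (-1.5, 0.0, 3.0, 8.0, 15.0):
    default = autolearn_decision(score)
    tuned = autolearn_decision(score, ham_threshold=-0.2, spam_threshold=6.0)
    print(score, default, tuned)
```

Under this model a message scoring 0.0 is learned as ham with the defaults but left alone with -0.2, while one scoring 8.0 is learned as spam only with the lower spam threshold.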