Adding IPs to the check list
Is there any "relatively easy" way to add a new IP found in a non-standard header to the IP checks (e.g. DNSBL)? Is a plugin the only way? Thanks. --PedroD
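As far as I know, SpamAssassin's stock DNSBL rules look up the relays it parses from Received headers, so an IP that only appears in some other header is not checked. Whatever ends up doing the lookup (a plugin, as the question suggests), the query itself follows the usual DNSBL convention: reverse the IPv4 octets and append the list's zone. The Python sketch below shows only that part. The header name X-Originating-IP and the zone dnsbl.example.org are hypothetical placeholders, and hooking this into SpamAssassin is not shown.

```python
from __future__ import annotations

import ipaddress
import re
import socket

# Hypothetical header name: substitute the non-standard header your MTA actually adds.
CUSTOM_HEADER = "X-Originating-IP"

# Loose dotted-quad pattern; every candidate is validated with ipaddress below.
_IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def extract_ip(headers: dict[str, str], name: str = CUSTOM_HEADER) -> str | None:
    """Return the first valid IPv4 address in the named header (case-sensitive dict key), or None."""
    value = headers.get(name)
    if value is None:
        return None
    for candidate in _IPV4_RE.findall(value):
        try:
            return str(ipaddress.IPv4Address(candidate))
        except ValueError:  # e.g. "999.1.1.1" matches the pattern but is not an address
            continue
    return None


def dnsbl_query_name(ip: str, zone: str) -> str:
    """DNSBL convention: reversed octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def is_listed(ip: str, zone: str) -> bool:
    """Listed IPs resolve to an A record; any lookup failure (NXDOMAIN, timeout) counts as not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

For example, a header value of "[192.0.2.10]" checked against dnsbl.example.org produces the query name 10.2.0.192.dnsbl.example.org.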
Re: URIBL_BLOCKED
On 2018-02-14 (09:55 MST), Tobi wrote:
> On 14.02.2018 at 17:16, @lbutlr wrote:
>> I can't imagine why I'd be over limit, my mail server is tiny.
>
> It's not the mail server that got blocked by limits, but the DNS resolver your mail server uses!

I use my own DNS on BIND 9.12; however, the block error is not appearing today, so...

-- 
"...and that's not incense"
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
On Wed, 14 Feb 2018 16:20:30 +0100
Matus UHLAR - fantomas wrote:

> >On Tue, 13 Feb 2018 21:02:46 +
> >Horváth Szabolcs wrote:
> >> One more question: is there a recommended ham to spam ratio? 1:1?
>
> On 14.02.18 15:09, RW wrote:
> >No, this is a myth. Bayes computes token probabilities from a
> >token's frequencies in spam and ham, so it all scales through. If
> >you have 2000 ham and 200 spam the problem is too few spams, not a
> >bad ratio.
>
> my experience says you will need more ham than spam, because you want
> to get rid of false positives (ham marked as spam) much more than of
> false negatives.

My point is that an imbalance doesn't create a bias.
Re: URIBL_BLOCKED
On Wed, 14 Feb 2018, Tobi wrote:

> On 14.02.2018 at 17:16, @lbutlr wrote:
>> I can't imagine why I'd be over limit, my mail server is tiny.
>
> It's not the mail server that got blocked by limits, but the DNS resolver your mail server uses! If you're using a third-party resolver (e.g. the ones from your provider or 8.8.8.8), you can hit the limits quite fast, depending on how many other users use the same resolver for their URIBL queries.
>
> I recommend setting up a local resolver (Unbound or something similar) and using that resolver for your mail server(s).

This detail always gets glossed over: set up a local NON-FORWARDING resolver. If you set up a local resolver and it just forwards requests to your ISP's DNS servers, you have not materially changed the problem.

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
 ---
  A sword is never a killer, it is but a tool in the killer's hands.
                 -- Lucius Annaeus Seneca (Martial) 4BC-65AD
 ---
  8 days until George Washington's 286th Birthday
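A quick first check is whether the mail host's stub resolver points only at a loopback address. The Python sketch below parses resolv.conf text for that. It is an assumption-laden helper, not a complete test: it cannot tell whether the local resolver forwards upstream (in Unbound that is a forward-zone clause in unbound.conf, which you have to inspect yourself), and if SpamAssassin's dns_server option is set, that setting rather than resolv.conf decides which resolver SpamAssassin queries.

```python
from __future__ import annotations

import ipaddress


def nameservers(resolv_conf: str) -> list[str]:
    """Collect the addresses on 'nameserver' lines; commented-out lines do not match."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_resolver(resolv_conf: str) -> bool:
    """True if at least one nameserver is configured and every one is a loopback address."""
    servers = nameservers(resolv_conf)
    if not servers:
        return False
    # Drop an IPv6 zone index such as "%eth0" before parsing the address.
    return all(ipaddress.ip_address(s.split("%")[0]).is_loopback for s in servers)
```

Usage would be along the lines of uses_only_local_resolver(open("/etc/resolv.conf").read()) on the mail host.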
Re: URIBL_BLOCKED
On 14.02.2018 at 17:16, @lbutlr wrote:
> I can't imagine why I'd be over limit, my mail server is tiny.

It's not the mail server that got blocked by limits, but the DNS resolver your mail server uses! If you're using a third-party resolver (e.g. the ones from your provider or 8.8.8.8), you can hit the limits quite fast, depending on how many other users use the same resolver for their URIBL queries.

I recommend setting up a local resolver (Unbound or something similar) and using that resolver for your mail server(s).

Cheers

tobi
Re: URIBL_BLOCKED
On 2/14/2018 11:16 AM, @lbutlr wrote:
> Ah, I didn't know URIBL was a blacklist, I thought it was being used as a generic abbreviation variant of RBL. I can't imagine why I'd be over limit, my mail server is tiny.

It's confusing, I agree. See https://issues.apache.org/jira/browse/COMDEV-267?jql=text%20~%20%22GSOC%202018%22 for one of the ideas I wrote for improving it.
Re: URIBL_BLOCKED
On 2018-02-13 (14:45 MST), Reindl Harald wrote:
>
> On 13.02.2018 at 21:21, @lbutlr wrote:
>> 0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked.
>>     See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
>>     for more information.
>>     [URIs: cz-salda.ru]
>>
>> So, I’ve never heard of cz-salda.ru, is that the RBL that is blocking me? If so, where is it listed in SA’s configuration (FreeBSD 11.1-RELEASE)? (tried a `grep salda.ru /usr/local/etc/mail/spamassassin/*` for no results)
>
> jesus christ click on the link you even quote

I did click on the link.

> "cz-salda.ru" was the domain which would have been checked against URIBL and URIBL said "you are over limit, go away"

Ah, I didn't know URIBL was a blacklist, I thought it was being used as a generic abbreviation variant of RBL. I can't imagine why I'd be over limit, my mail server is tiny.

-- 
Women like silent men, they think they're listening.
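To make the distinction concrete: URIBL_BLOCKED means URIBL answered with its "query refused" code rather than with a listing result for the domain. The Python sketch below classifies the A record returned for <domain>.multi.uribl.com. The assumption, based on my understanding of URIBL's published return codes and the stock rule, is that 127.0.0.1 is the "blocked" answer; any other answer is lumped together as "listed" here without decoding which list.

```python
from __future__ import annotations

import socket

# Zone used by SpamAssassin's URIBL rules (assumption: multi.uribl.com).
URIBL_ZONE = "multi.uribl.com"
# Assumed "query refused / over limit" answer; it says nothing about the domain itself.
BLOCKED_ANSWER = "127.0.0.1"


def classify_uribl_answer(answer: str | None) -> str:
    """Map the A record returned for <domain>.<zone> to a coarse status."""
    if answer is None:
        return "not listed"
    if answer == BLOCKED_ANSWER:
        return "blocked"
    return "listed"


def lookup(domain: str, zone: str = URIBL_ZONE) -> str | None:
    """Query via the system resolver; any lookup failure is treated as no answer."""
    try:
        return socket.gethostbyname(f"{domain}.{zone}")
    except socket.gaierror:
        return None
```

Running classify_uribl_answer(lookup("cz-salda.ru")) through a busy shared resolver is exactly the situation where "blocked" comes back even though nothing is known about the domain.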
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
On 02/14/2018 09:20 AM, Matus UHLAR - fantomas wrote:
>> On Tue, 13 Feb 2018 21:02:46 + Horváth Szabolcs wrote:
>>> One more question: is there a recommended ham to spam ratio? 1:1?
>
> On 14.02.18 15:09, RW wrote:
>> No, this is a myth. Bayes computes token probabilities from a token's frequencies in spam and ham, so it all scales through. If you have 2000 ham and 200 spam the problem is too few spams, not a bad ratio.
>
> my experience says you will need more ham than spam, because you want to get rid of false positives (ham marked as spam) much more than of false negatives.

This is also my experience.

> what really matters is how many FP/FNs you have; you can decrease their probability by training anything too far from BAYES_00 for ham and BAYES_99 for spam

Correct. You want to get ham hitting BAYES_00 and spam hitting BAYES_80, BAYES_95, BAYES_99, or BAYES_999, which mine does very well.

A problem I have found is that you shouldn't blindly train all spam as spam. I have some spam hitting BAYES_00 because, based on the body contents, it truly could be ham; it's spam only because it was unsolicited email from someone "cold" emailing for a meeting or something. In this case, I block the sender and report it to SpamCop and other abuse contacts so the account can hopefully be blocked, locked, or disabled. If I had trained my Bayes with this email as spam, then legit email could hit BAYES_99.

That is why my nightly process to train my Bayes DB in Redis learns ham first, then spam second. This seems to be the best order from my experience.

-- 
David Jones
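The "ham first, then spam" nightly order can be expressed as a list of sa-learn runs. The Python sketch below builds and runs them; the corpus directories are hypothetical, and they should hold only hand-verified messages, given the caution above about cold-contact spam. sa-learn writes to whichever Bayes store local.cf configures (Redis in David's case), and it takes directories of one-message-per-file (mbox files would need --mbox).

```python
from __future__ import annotations

import subprocess


def training_commands(ham_dirs: list[str], spam_dirs: list[str]) -> list[list[str]]:
    """Build sa-learn invocations: every ham directory first, then every spam directory."""
    commands = [["sa-learn", "--ham", d] for d in ham_dirs]
    commands += [["sa-learn", "--spam", d] for d in spam_dirs]
    return commands


def run_training(commands: list[list[str]]) -> None:
    """Run each command in order; check=True stops at the first failing sa-learn run."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A nightly job would then call run_training(training_commands(["/srv/corpus/ham"], ["/srv/corpus/spam"])), with paths adjusted to the local layout.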
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
>On Tue, 13 Feb 2018 21:02:46 + Horváth Szabolcs wrote:
>> One more question: is there a recommended ham to spam ratio? 1:1?

On 14.02.18 15:09, RW wrote:
>No, this is a myth. Bayes computes token probabilities from a token's frequencies in spam and ham, so it all scales through. If you have 2000 ham and 200 spam the problem is too few spams, not a bad ratio.

my experience says you will need more ham than spam, because you want to get rid of false positives (ham marked as spam) much more than of false negatives.

what really matters is how many FP/FNs you have; you can decrease their probability by training anything too far from BAYES_00 for ham and BAYES_99 for spam.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
LSD will make your ECS screen display 16.7 million colors
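"Train anything too far from BAYES_00 for ham and BAYES_99 for spam" is a train-on-error rule with a safety margin. The Python sketch below applies it to hand-verified messages. The bucket edges come from the stock rules (BAYES_00 covers 0 to 1%, BAYES_99 covers 99 to 100%); the (message_id, label, probability) tuple format, and how you extract the Bayes probability from your logs or headers, are hypothetical.

```python
from __future__ import annotations

# Bucket edges of SpamAssassin's stock BAYES_00 and BAYES_99 rules.
BAYES_00_MAX = 0.01
BAYES_99_MIN = 0.99


def needs_training(label: str, bayes_prob: float) -> bool:
    """True if a hand-verified message scored outside its ideal Bayes bucket."""
    if label == "ham":
        return bayes_prob > BAYES_00_MAX
    if label == "spam":
        return bayes_prob < BAYES_99_MIN
    raise ValueError(f"unknown label: {label!r}")


def select_for_training(messages: list[tuple[str, str, float]]) -> list[str]:
    """messages: hypothetical (message_id, verified_label, bayes_probability) tuples."""
    return [msg_id for msg_id, label, prob in messages if needs_training(label, prob)]
```

Messages already sitting in the right bucket are skipped, so the database is only nudged where it is still uncertain or wrong.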
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
On Tue, 13 Feb 2018 21:02:46 + Horváth Szabolcs wrote:

> One more question: is there a recommended ham to spam ratio? 1:1?

No, this is a myth. Bayes computes token probabilities from a token's frequencies in spam and ham, so it all scales through. If you have 2000 ham and 200 spam the problem is too few spams, not a bad ratio.

Theoretically there is a case for new training to match the ratio that's already in the database because then a new token will get a token probability that reflects its frequencies in recent mail. But I wouldn't worry about that, it's hard to stick to, and probably minor.
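Why the ratio "scales through": the per-token estimate is built from the fraction of each class that contains the token, so the class sizes cancel out. The Python sketch below shows only that raw estimate. SpamAssassin's Bayes goes further (Robinson-style smoothing of rare tokens and chi-square combining across tokens), and those steps are deliberately omitted.

```python
from __future__ import annotations


def token_spam_prob(spam_count: int, ham_count: int, nspam: int, nham: int) -> float:
    """Raw P(spam | token) estimate from per-class frequencies.

    spam_count / ham_count: learned messages in each class containing the token.
    nspam / nham: total messages learned in each class.
    """
    spam_freq = spam_count / nspam  # fraction of learned spam containing the token
    ham_freq = ham_count / nham     # fraction of learned ham containing the token
    if spam_freq + ham_freq == 0:
        return 0.5                  # token seen in neither class: no evidence either way
    return spam_freq / (spam_freq + ham_freq)
```

A token found in 10% of 200 spams and 10% of 2000 hams gets 0.5, exactly as it would with 200 of each. What the small spam corpus does hurt is precision: 20 sightings out of 200 is a much noisier frequency estimate than 200 out of 2000, which is the "too few spams" point.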
ApacheCon NA 2018 Travel Assistance Applications now open!
Hi everyone,

The Travel Assistance Committee has asked the various Apache Project Management Committees to forward the following announcement to the user and dev mailing lists:

---

The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for ApacheCon NA 2018 are now open! We will be supporting ApacheCon NA in Montreal, Canada, on 24th - 29th September 2018.

TAC exists to help those who would like to attend ApacheCon events but are unable to do so for financial reasons. For more info on this year's applications and qualifying criteria, please visit the TAC website at <http://www.apache.org/travel/>. Applications are now open and will close 1st May.

Important: Applications close on May 1st, 2018. Applicants have until the closing date above to submit their applications, which should contain as much supporting material as required to efficiently and accurately process their request. This will enable TAC to announce successful awards shortly afterwards.

As usual, TAC expects to deal with a range of applications from a diverse range of backgrounds. We therefore encourage (as always) anyone thinking about sending in an application to do so ASAP.

We look forward to greeting many of you in Montreal.

Kind Regards,
Gavin - (On behalf of the Travel Assistance Committee)
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
They cannot (do not want to, or do not have the know-how to) study the e-mails, and therefore they cannot build a reliable corpus. All they can do is trust their users to study their own e-mails well enough to do the job, hence the mess with ham/spam when feeding the Bayesian filter. They need to consult with a lawyer, fix their paperwork, hire people who can teach them everything they need to know, and invest at least two years full-time in the process. They just cannot install CentOS and SA and hope the Bayesian filters do their job by magic. It just does not work that way.

On Wed, Feb 14, 2018 at 05:48, Bill Cole wrote:
> On 13 Feb 2018, at 9:33, Horváth Szabolcs wrote:
>> This is a production mail gateway serving since 2015. I saw that a few messages (both hams and spams) automatically learned by amavisd/spamassassin. Today's statistics:
>>
>>    3616 autolearn=ham
>>   10076 autolearn=no
>>    2817 autolearn=spam
>>     134 autolearn=unavailable
>
> That's quite high for spam, ham, AND "unavailable" (which indicates something wrong with the Bayes subsystem, usually transient.) This seems like a recipe for a mis-learning disaster. For comparison, my 2018 autolearn counts:
>
>   spam: 418
>   ham: 15018
>   unavailable: 166
>   no: 129555
>
> I also manually train any spam that gets through to me (the biggest spam target,) a small number of spams reported by others, and 'trap' hits. A wide variety of ham is harder to get for training, but I have found it useful to give users a well-documented and simple way to help. One way is to look at what happens to mail AFTER delivery, which can indicate that a message is ham without needing an admin to try to make a determination based on content. The simplest one is to learn anything users mark as $NotJunk as ham. Another is to create an "Archive" mailbox for every user and learn anything that has been moved there as ham a day after it is moved.
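The two post-delivery signals just described can be sketched as a selection step that runs before sa-learn --ham. Everything in the Python sketch below is hypothetical: the DeliveredMessage record, and how its mailbox, keywords, and move time are collected from your IMAP server. Only the rule follows the description: the $NotJunk keyword that some mail clients set, or a move into "Archive" at least one day ago.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical constants mirroring the description above.
NOT_JUNK_KEYWORD = "$NotJunk"
ARCHIVE_MAILBOX = "Archive"


@dataclass(frozen=True)
class DeliveredMessage:
    path: str                  # where the stored message file lives (hypothetical)
    mailbox: str               # mailbox the user currently keeps it in
    keywords: frozenset[str]   # IMAP keywords/flags set on the message
    moved_at: datetime | None  # when it entered its current mailbox, if known


def ham_candidates(messages, now: datetime, min_age: timedelta = timedelta(days=1)):
    """Pick messages whose post-delivery handling suggests they are ham."""
    picked = []
    for msg in messages:
        if NOT_JUNK_KEYWORD in msg.keywords:
            picked.append(msg)
        elif (msg.mailbox == ARCHIVE_MAILBOX and msg.moved_at is not None
              and now - msg.moved_at >= min_age):
            picked.append(msg)
    return picked
```

The one-day delay gives a user time to move a message back out of "Archive" if it landed there by mistake, before it is learned.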
> The most important factor (especially in jurisdictions where human examination of email is a problem) is to tell users how to protect their email and then do what you tell them, robotically. In the US, Canada, and *SOME* of the EU, this is not risky. However, I have been told by people in *SOME* EU countries that they can't even robotically scan ANY mail content, so you shouldn't take my advice as authoritative: I'm not even a lawyer in the US, much less Hungary...
>
>> I think I have no control over what is learnt automatically.
>
> Yes, you do. Run "perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold" for details. You can set the learning thresholds, which control what gets learned. The defaults (0.1 and 12) mis-learn far too much spam as ham and not enough spam. I use -0.2 and 6, which means I don't autolearn a lot, but everything I autolearn as ham has at least one hit on a substantial "nice" rule or 2 hits on weak ones. There's a lot of vehemence against autolearn expressed here but not a lot of evidence that it operates poorly when configured wisely. The defaults are NOT wise.
>
>> Let's just assume for a moment that 1.4M ham-samples are valid.
>
> Bad assumption. Your Bayes checks are uncertain about mail you've told SA is definitely spam. That's broken. It's a sort of breakage that cannot exist if you do not have a large quantity of spam that has been learned as ham.
>
>> Is there a ham:spam ratio I should stick to it?
>
> No.
>
>> I presume if we have a 1:1 ratio then future messages won't be considered as spam as well.
>
> The ham:spam ratio in the Bayes DB or its autolearning is not a generally useful metric. 1:1 is not magically good and neither is any other ratio, even with reference to a single site's mailstream. A very large ratio *on either side* indicates a likely problem in what is being learned, but you can't correlate the ratio to any particularly wrong bias in Bayes scoring. It is an inherently chaotic relationship.
>
> Factors that actually matter are correctness of learning, sample quality, and currency. You can control how current your Bayes DB is (USE AUTO-EXPIRE) but the other two factors are never going to be perfect.
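The effect of moving the autolearn thresholds from the defaults (0.1 and 12) to Bill Cole's -0.2 and 6 can be seen with a deliberately simplified decision rule. In local.cf these correspond to bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam. The Python sketch below is an approximation, not the plugin's code: the real AutoLearnThreshold plugin computes the learning score differently from the final score and applies extra conditions before learning spam (see its perldoc), and the exact boundary handling here is assumed.

```python
from __future__ import annotations

# (nonspam, spam) threshold pairs: the defaults quoted above and Bill Cole's values.
DEFAULT_THRESHOLDS = (0.1, 12.0)
BILL_COLE_THRESHOLDS = (-0.2, 6.0)


def autolearn_decision(score: float, nonspam_threshold: float, spam_threshold: float) -> str:
    """Simplified rule: learn ham below the nonspam threshold, spam above the spam threshold."""
    if score < nonspam_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return "no"
```

With the defaults, a message scoring 0.05 is learned as ham and one scoring 8.0 is not learned at all; with -0.2 and 6 the first is left alone and the second is learned as spam, which is the shift away from mis-learning spam as ham that Bill describes.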