Re: Update SA on CentOS

2021-04-03 Thread Amir Caspi
On Apr 3, 2021, at 9:15 PM, Simon Wilson wrote: > > And then you are not stepping away from one of CentOS's main advantages - > stable packages not built outside of RPM. For what it's worth, using the Fedora package has been exceedingly stable on my CentOS 7 system. SA is extensively tested

Re: Update SA on CentOS

2021-04-03 Thread Amir Caspi
On Apr 3, 2021, at 12:26 PM, Olaf Sommer wrote: > > Can someone help me to update SA to a newer version? The easiest way would probably be to build a local package from source, using the Fedora SRPM. Under a regular (not root) account, do the following: # wget

Re: Technically not spam

2020-05-31 Thread Amir Caspi
On May 31, 2020, at 3:35 PM, @lbutlr wrote: > > Good to know (I guess?) the last update note I saw for Squirrelmail was to > make it work with PHP 5.5 back in 2013. Is there a fork somewhere or does it > just work with PHP 7.3. And does that include 7.2? The SVN version (1.4.23+) supports PHP

Re: rpm of centos

2020-02-06 Thread Amir Caspi
On Jan 11, 2020, at 11:14 AM, John Hardin wrote: > > I found one very minor modification was needed: the spec file requires > "perl-interpreter" and that doesn't seem to see the "perl" package installed > under Centos 7. Ah, now I recall why I didn't run into this... I had installed the

Re: rpm of centos

2020-01-09 Thread Amir Caspi
On Jan 9, 2020, at 6:59 PM, Rick Gutierrez wrote: > > Hi everyone, someone from the list who can share the rpm of the > latest version of spamassassin for centos 7 and 6 of x64, I want to > update to the latest version and I can't find the rpm. SA 3.4.2 is available for Fedora, and you can

Re: DMARC_REJECT?

2019-11-15 Thread Amir Caspi
On Nov 15, 2019, at 4:35 PM, RW wrote: > > DKIM_VALID_AU is too strict for DMARC as it requires strict alignment. Indeed, although I wonder if DKIM_VALID_AU is itself too strict? In particular, one sender that triggers this issue is coming from a .gov 3rd-level subdomain where the valid DKIM

Re: DMARC_REJECT?

2019-11-15 Thread Amir Caspi
On Nov 15, 2019, at 9:50 AM, David Jones wrote: > > If SA is being run post MTA (i.e. inside Thunderbird) then any filtering > can change the content to remove potentially bad attachments, add an > "EXTERNAL" warning to the Subject or body, etc. which will break DKIM > signing. I believe

DMARC_REJECT?

2019-11-13 Thread Amir Caspi
Hi all, Over the past few weeks I've been getting occasional DMARC_REJECT hits on valid mail (e.g., from family, from valid bank emails, etc.). It's unclear why this is happening since the SA report doesn't really give any information. The score on DMARC_REJECT is +10, which is

Re: Where to find the highest version to be installed by "yum"?

2019-09-27 Thread Amir Caspi
On Sep 27, 2019, at 10:40 AM, Bowie Bailey wrote: >> Question: Are you folks aware of any 'yum' repository that carries a version >> higher than 3.3.1? >> > > I'm not aware of any newer yum repositories What version of Linux distro are you using? For RHEL/CentOS 7, SA 3.4.0 is available

Re: Loads of recent low-scoring snowshoe spam

2019-09-26 Thread Amir Caspi
On Sep 26, 2019, at 10:18 AM, John Hardin wrote: > > Some of those are following a pattern I've recently noticed - fairly > obviously bogus spamvertising domain URLs with some .gov URLs thrown in as > well. I'm assuming that's an attempt to leverage naïve domain whitelisting. > One has a

Loads of recent low-scoring snowshoe spam

2019-09-25 Thread Amir Caspi
Hi all, In recent weeks, my server has been getting hit with tons of snowshoe spam. Much of it is not getting filtered because even when it hits Bayes, it doesn't hit basically any other rules, and therefore is scoring just below 5 points. (Much of it hits only BAYES_50 and is therefore

Re: new emotet campain

2019-09-18 Thread Amir Caspi
On Sep 18, 2019, at 3:19 AM, Riccardo Alfieri wrote: > > You are correct, URLhaus domains enter DBL as abused legit malware, but the > default SA score is not enough to mark the email as spam (and that's correct > as it checks only the domain). Since the return code for the domain is

Re: new emotet campain

2019-09-17 Thread Amir Caspi
On Sep 17, 2019, at 12:15 PM, John Hardin wrote: > > On Tue, 17 Sep 2019, hg user wrote: > >> It is a "dumb" rule but the quickest I could create. >> >> https://pastebin.com/bxRSds7a > > Suggestions: > > (1) use a URI rule rather than a BODY rule > > (2) escape the periods; you want to match

Re: FSL_BULK_SIG firing on only one of two identical spams?

2019-07-26 Thread Amir Caspi
On Jul 26, 2019, at 4:24 PM, RW wrote: > > Most of the difference in score came from RAZOR2_CHECK and > RAZOR2_CF_RANGE_51_100 rather than FSL_BULK_SIG. Thanks to both you and Bill for the clarification. If only the first spam had come 15 seconds later, Razor would have caught it. What a

FSL_BULK_SIG firing on only one of two identical spams?

2019-07-26 Thread Amir Caspi
Hi all, In recent weeks I've been receiving many of my spams in doubles -- essentially identical spam except for the faked From and the various "Bayes poison" random text. I just got one such pair where FSL_BULK_SIG fired on one spam, but not the other, even though their content (except for

Re: Meta for bogus MIME with DKIM valid?

2019-07-08 Thread Amir Caspi
On Jul 8, 2019, at 2:15 PM, Joseph Brennan wrote: > > I am sorry to say that this spammer seems to have fixed the error. I have > seen none at all for a few weeks. What I *have* seen are heavy spam barrages > once a week that are from similar IP ranges that the spammer used but without > the

Re: Zero-width rules?

2019-06-28 Thread Amir Caspi
On Jun 28, 2019, at 11:33 AM, Antony Stone wrote: > > Indeed - people even promote its use: > > https://litmus.com/blog/the-little-known > Uuughh. I'd argue they deserve to be classified as spam just for doing that. =P I know, I

Machine learning with or vs. Bayes?

2019-06-27 Thread Amir Caspi
Hi all, I don't suppose anyone has a neural-net-based SA Machine Learning plugin or external program, to complement or replace Bayes? There are a number of fairly compact Python ML packages that would greatly ease this task nowadays, like TensorFlow. It looks like rspamd has a neural net

Re: Zero-width rules?

2019-06-27 Thread Amir Caspi
On Jun 27, 2019, at 12:04 PM, John Hardin wrote: > >> There's still not enough of that to trigger a scored rule, though. It may >> need some review of the masscheck results, and tuning. > > OK, retuned. FWIW, the x200b entity occurs only in my spam; I see it nowhere in my ham inbox or

Re: Rules for invisible div and 0pt font?

2019-06-26 Thread Amir Caspi
On Jun 18, 2019, at 2:21 AM, Giovanni Bechis wrote: > >> rawbody AC_HIDDEN_FONT /font-size\s*:\s*0\s*(?:em|pt|px|%)\s*;/ >> > There is T_HIDDEN_WORD on my sandbox > (https://ruleqa.spamassassin.org/20190617-r1861495-n/T_HIDDEN_WORD/detail) > I have just committed a more generic version.

Re: Zero-width rules?

2019-06-26 Thread Amir Caspi
On Jun 26, 2019, at 4:13 PM, Kevin A. McGrail wrote: > I don't know charset="UTF-8" is in the email and for the ZWNJ at least, that > was in windows-1256. Is anything in UTF-8? > According to https://www.codetable.net/hex/200b and

Re: Zero-width rules?

2019-06-26 Thread Amir Caspi
On Jun 26, 2019, at 4:04 PM, Kevin A. McGrail wrote: > > The sample you sent isn't encoded with a charset that will do anything > with . I think it's a literal string of "" because the > email is just plain text. I am using KAM.cf, yes. The plaintext portion of the email is just the literal

Zero-width rules?

2019-06-26 Thread Amir Caspi
John et al, I recall from a prior thread last year that there were supposed to be some rules to check for zero-width joiner characters... but I'm seeing spams recently that have these, but don't hit any such rules. Here's one spample, where the ZWJ entity #x200B is being used to try to

Re: Rules for invisible div and 0pt font?

2019-06-18 Thread Amir Caspi
On Jun 18, 2019, at 10:55 AM, Bill Cole wrote: > > Looking at the 2 most recent (a USPS "Informed Delivery Daily Digest" message > and Office Depot order followup) I see display:none only in inline style > attributes of block elements. e.g.: Looks like the first one is a web bug. The

Re: Rules for invisible div and 0pt font?

2019-06-18 Thread Amir Caspi
Are the matches all within @media blocks like lbutlr suggested or do they occur inline within div/span/etc as well? Thanks! --- Amir thumbed via iPhone > On Jun 18, 2019, at 8:42 AM, Bill Cole > wrote: > >> On 17 Jun 2019, at 15:25, @lbutlr wrote: >> >>> On

Re: Rules for invisible div and 0pt font?

2019-06-17 Thread Amir Caspi
On Jun 17, 2019, at 2:17 PM, Amir Caspi wrote: > > rawbody AC_MEDIA_DISPLAYNONE > /@media[^{]*{[^}]*display\s*:\s*none\s*;/i > Well, urgh, this particular rule wouldn't work well since it wouldn't capture classes within the @media block. But something LIKE it. --- Amir
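A quick way to see where the quoted pattern stops working is to run it against a few stylesheet fragments. The sketch below ports the regex to Python's `re` module (the `{` is escaped for clarity, which does not change what it matches). The CSS samples are invented for illustration and do not come from the thread's spamples.

```python
import re

# The AC_MEDIA_DISPLAYNONE pattern quoted above, ported from Perl to Python.
MEDIA_DISPLAY_NONE = re.compile(
    r"@media[^{]*\{[^}]*display\s*:\s*none\s*;", re.IGNORECASE
)

# Hypothetical CSS fragments, invented for illustration.
samples = {
    "direct": "@media (max-width:900px) { display:none; }",
    "first_class": "@media (max-width:900px) { .a { display: none; } }",
    "later_class": "@media (max-width:900px) { .a { color:red; } .b { display:none; } }",
    "no_media": ".a { display:none; }",
}

results = {name: bool(MEDIA_DISPLAY_NONE.search(css)) for name, css in samples.items()}

for name, hit in results.items():
    print(f"{name:12s} {'hit' if hit else 'miss'}")
```

In this sketch the `later_class` fragment is missed because `[^}]*` cannot cross the `}` that closes the earlier class, which is one way a rule of this shape fails to cover classes inside an `@media` block.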

Re: Rules for invisible div and 0pt font?

2019-06-17 Thread Amir Caspi
On Jun 17, 2019, at 1:45 PM, @lbutlr wrote: > > Would only be active if the width of the window is 900px or less. That can > include setting a display property to hidden or not. One way of working around that, then, would be to ensure this is only within a div/span tag... Maybe something

Re: Rules for invisible div and 0pt font?

2019-06-17 Thread Amir Caspi
On Jun 17, 2019, at 1:18 PM, Antony Stone wrote: > > If this feature *is* used for screenreaders, you could be creating a false > positive trap here... You may well be right, hence the request to sandbox and see how it compares against masscheck. On Jun 17, 2019, at 1:25 PM, @lbutlr wrote:

Re: Rules for invisible div and 0pt font?

2019-06-17 Thread Amir Caspi
On Jun 17, 2019, at 1:14 PM, Amir Caspi wrote: > > rawbody AC_HIDDEN_FONT /font-size\s*:\s*0\s*(?:em|pt|px|%)\s*;/ > Actually, based on another spample (https://pastebin.com/rrU2AsVT), let's modify this one -- the em/pt/px/% isn't req
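To illustrate why the unit matters, here is a minimal Python sketch that runs the quoted AC_HIDDEN_FONT pattern against a few style fragments. The revised rule itself is cut off in this snippet, so `UNIT_OPTIONAL` below is only a guess at one possible variant, not the rule from the thread. The sample strings are invented.

```python
import re

# AC_HIDDEN_FONT as quoted above: a unit is required after the zero.
UNIT_REQUIRED = re.compile(r"font-size\s*:\s*0\s*(?:em|pt|px|%)\s*;")

# Hypothetical variant with the unit made optional. This is an assumption
# for illustration, not the revised rule posted to the list.
UNIT_OPTIONAL = re.compile(r"font-size\s*:\s*0\s*(?:em|pt|px|%)?\s*;")

# Invented style fragments.
samples = [
    "font-size:0px;",
    "font-size: 0 ;",
    "font-size:0;",
    "font-size:10px;",
    "font-size:0.5em;",
]

# Maps each sample to (matches UNIT_REQUIRED, matches UNIT_OPTIONAL).
table = {
    s: (bool(UNIT_REQUIRED.search(s)), bool(UNIT_OPTIONAL.search(s)))
    for s in samples
}

for sample, (required, optional) in table.items():
    print(f"{sample:18s} required={required!s:5s} optional={optional}")
```

A unitless `font-size:0;` slips past the quoted pattern but is caught by the hypothetical variant, while non-zero sizes such as `10px` and `0.5em` match neither.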

Rules for invisible div and 0pt font?

2019-06-17 Thread Amir Caspi
Hi all, In reviewing today's FNs I came across the following spample: https://pastebin.com/9QQVwUY6 There is a div here with display:none, as well as font-size:0px. The spample hits HTML_FONT_LOW_CONTRAST but does not appear to hit any rule relating to a hidden div or tiny font. Does

Re: Meta for bogus MIME with DKIM valid?

2019-06-12 Thread Amir Caspi
On Jun 4, 2019, at 2:11 PM, Amir Caspi wrote: > > Locally, I've got the score at 4.0, and will be increasing it to 4.5 shortly. > At least with my spamset (per the spamples I posted), a score of 4.5 seems > to be the "magic" value that should catch almost all the FNs

Re: Proposed rule for too many dots in From

2019-06-10 Thread Amir Caspi
On Jan 26, 2019, at 10:27 AM, John Hardin wrote: > > On Thu, 24 Jan 2019, Amir Caspi wrote: > >> On Jan 15, 2019, at 8:46 AM, John Hardin wrote: >>> >>>> On Dec 20, 2018, at 6:16 PM, Amir Caspi wrote: >>>>> >>>>&

Re: New URL shortener

2019-06-06 Thread Amir Caspi
On Jun 6, 2019, at 9:03 PM, Kenneth Porter wrote: > I'm seeing a lot of fake DHL delivery notices using the shortener > smarturl.it. I suggest adding it to __URL_SHORTENER. FWIW there is a long list of url shorteners as part of the DecodeShortURLs plugin (sadly, no longer maintained), here:

Re: Help matching a spam (regex)

2019-06-04 Thread Amir Caspi
On Jun 4, 2019, at 4:05 PM, RW wrote: > > On Tue, 4 Jun 2019 16:06:10 -0300 Marcio Vogel Merlone dos Santos wrote: > >> Trying to match a message using uri_detail with no luck. On body I >> have something like this: >> >> Something > represents a '→' (right arrow) character, IIWY I'd try

Re: Meta for bogus MIME with DKIM valid?

2019-06-04 Thread Amir Caspi
On Jun 4, 2019, at 1:24 PM, Paul Stead wrote: > > Certainly worth letting QA do its thing and autoscore? My worry about autoscore is that if it looks at network tests, particularly RBLs, then it may reduce the value of the rule. The primary value of this rule is for early botnet runs before

Re: Meta for bogus MIME with DKIM valid?

2019-06-03 Thread Amir Caspi
Hi Kevin, Here are some spamples -- I've specifically chosen the ones that did NOT score enough through other means to get tagged, i.e., these are false negatives. Note that many of them have valid DKIM and hit no other markers. (The spample will NOT pass DKIM because headers have been

Re: Meta for bogus MIME with DKIM valid?

2019-05-29 Thread Amir Caspi
171 > > >> On Wed, May 29, 2019 at 7:44 PM Amir Caspi wrote: >> I’m surprised, a huge percentage of the spam we get hits this rule. I am >> happy to submit spamples, but it is a very big spam indicator for our little >> server. >> >> --- Amir >>

Re: Meta for bogus MIME with DKIM valid?

2019-05-29 Thread Amir Caspi
the rule had no merit based on current > mailstreams. Our guess was that the spam run it hit has ended. It is a > deadweight rule. > >> On Wed, May 29, 2019, 18:05 John Hardin wrote: >> On Thu, 16 May 2019, John Hardin wrote: >> >> > On Thu, 16 May 2019,

Re: Meta for bogus MIME with DKIM valid?

2019-05-16 Thread Amir Caspi
On Apr 26, 2019, at 4:51 PM, RW wrote: > > header BOGUS_MIME_VERSION MIME-Version =~ /^(?!\s*1\.0).+/ > > it may be better to change that to > > /^(?!.*\b1\.0\b).+/ > > to avoid punishing the form > > Mime-Version: (Nosuch Mail 2.0) 1.0 > > which is valid, though I don't think
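The difference between the two quoted patterns can be checked with a short Python sketch. Both use negative lookahead, which Python's `re` module supports with the same syntax as Perl. The header values are examples only, and the real rule sees whatever SpamAssassin passes in as the header value, which may differ in whitespace.

```python
import re

# Original rule body: flags any value that does not begin with "1.0".
ORIGINAL = re.compile(r"^(?!\s*1\.0).+")

# RW's suggested replacement: flags only values with no standalone "1.0" anywhere.
SUGGESTED = re.compile(r"^(?!.*\b1\.0\b).+")

# Example MIME-Version header values.
values = [
    "1.0",
    " 1.0",
    "1.0 (comment)",
    "(Nosuch Mail 2.0) 1.0",
    "2.0",
]

# Maps each value to (flagged by ORIGINAL, flagged by SUGGESTED).
flagged = {v: (bool(ORIGINAL.search(v)), bool(SUGGESTED.search(v))) for v in values}

for value, (orig, sugg) in flagged.items():
    print(f"{value!r:26s} original={orig!s:5s} suggested={sugg}")
```

Only the `(Nosuch Mail 2.0) 1.0` form is treated differently: the original pattern flags it, the suggested one does not.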

Meta for bogus MIME with DKIM valid?

2019-04-26 Thread Amir Caspi
I've been getting a bunch of FNs lately that are managing to avoid my Bayes DB. Invariably, they ALL seem to hit on BOGUS_MIME_VERSION (I don't know whether it is standard, but I implemented it locally and would recommend it for the distro if it's not there already), and it seems like most

Re: uninitialized value $( in Util.pm line 1595

2019-04-08 Thread Amir Caspi
On Apr 8, 2019, at 9:46 PM, Bill Cole wrote: > I think it's right now... I'm still a bit puzzled by how a null apparently > got into the first position of $(, but I hope it was an aberration. > > Got it. Try the new patch. I'll spin up my Centos test VM tomorrow and try to > reproduce this

Re: uninitialized value $( in Util.pm line 1595

2019-04-08 Thread Amir Caspi
On Apr 8, 2019, at 7:06 PM, Bill Cole wrote: > I believe the issue is with group 0. I'm working on it... > Have you tested with a user who is NOT in group 0? I'm a bit confused. spamd is running setuid root so it starts in group 0, but spamc is called with -u so spamd does a setuid and setgid

Re: uninitialized value $( in Util.pm line 1595

2019-04-08 Thread Amir Caspi
On Apr 8, 2019, at 5:37 PM, Bill Cole wrote: > What does running 'id -a' as the problem user say? uid=1000(centos) gid=1000(centos) groups=1000(centos),4(adm),10(wheel),190(systemd-journal) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 > I still haven't been able to reproduce

Re: uninitialized value $( in Util.pm line 1595

2019-04-08 Thread Amir Caspi
On Apr 8, 2019, at 4:38 PM, @lbutlr wrote: > Is it usual to have a spam user? My system runs SA per-user, using sendmail MTA, procmail LDA "glue" calling spamc with the -u option. So spamc will setuid to the calling user. Any user who is in more than one group will experience this

uninitialized value $( in Util.pm line 1595

2019-04-08 Thread Amir Caspi
Kevin et al, I am getting errors in maillog relating to an uninitialized $( in Util.pm, line 1595: Mar 24 03:28:35 server spamd[27149]: Use of uninitialized value $( in concatenation (.) or string at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Util.pm line 1595. I saw an existing bugzilla

Re: Proposed rule for too many dots in From

2019-01-24 Thread Amir Caspi
On Jan 15, 2019, at 8:46 AM, John Hardin wrote: > >> On Dec 20, 2018, at 6:16 PM, Amir Caspi wrote: >>> >>> header AC_FROM_MANY_DOTS From =~ /<(?:\w{2,}\.){2,}\w+@/ > > Argh. I lost track of that over the holidays. Thanks for the reminder, add

Re: Proposed rule for too many dots in From

2019-01-14 Thread Amir Caspi
On Dec 20, 2018, at 6:16 PM, Amir Caspi wrote: > > header AC_FROM_MANY_DOTS From =~ /<(?:\w{2,}\.){2,}\w+@/ > > John, could you update the sandbox rule to the above? That should whittle > down FPs. I'd recommend leaving it as 2 letters, though, since a n
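As a rough check of what the quoted pattern does and does not hit, the following Python sketch applies it to a handful of made-up From headers. The addresses are invented and use example.com.

```python
import re

# AC_FROM_MANY_DOTS as quoted above: two or more dot-terminated segments of
# at least two word characters each, directly after "<" in the local part.
FROM_MANY_DOTS = re.compile(r"<(?:\w{2,}\.){2,}\w+@")

# Invented From header values.
headers = {
    "three_words": '"A" <alpha.beta.gamma@example.com>',
    "middle_initial": '"B" <first.m.last@example.com>',
    "one_dot": '"C" <first.last@example.com>',
    "four_segments": '"D" <ab.cd.ef.gh@example.com>',
}

hits = {name: bool(FROM_MANY_DOTS.search(value)) for name, value in headers.items()}

for name, hit in hits.items():
    print(f"{name:15s} {'hit' if hit else 'miss'}")
```

Because each dot-terminated segment must be at least two characters long, a firstname.initial.surname address with a one-letter initial does not match, which speaks to the false-positive concern raised elsewhere in this thread.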

Re: Proposed rule for too many dots in From

2018-12-20 Thread Amir Caspi
On Dec 20, 2018, at 7:49 PM, Grant Taylor wrote: > > So here's the user parts (left hand side of the @) of emails. Are these in the From: header or the envelope-from (Return-Path)? Some of the ones with equal-signs look like bounce addresses from envelopes, that would not be in the From

Re: Proposed rule for too many dots in From

2018-12-20 Thread Amir Caspi
On Dec 20, 2018, at 5:13 PM, Noel Butler wrote: > I have to agree with Grant, two dots is crazy low, you might as well score at > one dot. A lot of emails are firstname.initial.surname even many government > departments in this part of the world use two dot format. > I never intended for the

Proposed rule for too many dots in From

2018-12-20 Thread Amir Caspi
John, would you mind sandboxing a rule? Two or more dots in the From username seems to be rather spammy (and we've talked about it before on the list). Would you mind sandboxing this test rule to see if it would be helpful as a main rule? I get a lot of spam locally that hits this...

Re: Bayes underperforming, HTML entities?

2018-12-07 Thread Amir Caspi
On Dec 6, 2018, at 12:14 PM, John Hardin wrote: > > Runaway backtracking that was killing masscheck for several people. Hrm, that is disconcerting. I'm not sure where any backtracking might be occurring... Can anyone help improve this suggested rule? rawbody AC_HTML_ENTITY_BONANZA_NEW

Re: Bayes underperforming, HTML entities?

2018-12-04 Thread Amir Caspi
On Dec 1, 2018, at 10:31 AM, John Hardin wrote: > >> On Thu, 29 Nov 2018, Amir Caspi wrote: >> >>> A) Could you sandbox the proposed rule change (AC_HTML_ENTITY_BONANZA_NEW) >>> and see how it performs, including possible FPs? > > Done. Any preliminar

Re: Bayes underperforming, HTML entities?

2018-11-30 Thread Amir Caspi
On Nov 30, 2018, at 7:00 AM, Bill Cole wrote: > >> Since HTML is already getting rendered to text, then perhaps the conversion >> code should strip (literally, just delete) any zero-width characters during >> this conversion? That should make normal body rules, and Bayes, function >>

Re: Bayes underperforming, HTML entities?

2018-11-30 Thread Amir Caspi
On Nov 30, 2018, at 6:09 AM, RW wrote: > > The most substantial problem here is that these invisible characters > make it very hard to write ordinary body rules. Thanks for the clarification on my confusion. Since HTML is already getting rendered to text, then perhaps the conversion code

Re: Bayes underperforming, HTML entities?

2018-11-29 Thread Amir Caspi
On Nov 29, 2018, at 10:11 PM, Bill Cole wrote: > > I have no issue with adding a new rule type to act on the output of a partial > well-defined HTML parsing, something in between 'rawbody' and 'body' types, > but overloading normalize_charset with that and so affecting every existing > rule

Re: Bayes underperforming, HTML entities?

2018-11-29 Thread Amir Caspi
On Nov 29, 2018, at 3:27 PM, John Hardin wrote: > > I'll see whether those can be incorporated into the existing UNICODE_OBFU_ZW > rule (which of course will no longer actually be UNICODE :) ) Great. Maybe rename the rule. ;-) What are your thoughts on item #2? Specifically: A) Could you

Re: Bayes underperforming, HTML entities?

2018-11-29 Thread Amir Caspi
On Nov 10, 2018, at 11:30 AM, John Hardin wrote: > > Initial results (again, all corpora aren't in yet)... > > The rawbody rules perform much better (unsurprising), and the ASCII-only one > has a better raw S/O: > >

Re: Bayes underperforming, HTML entities?

2018-11-15 Thread Amir Caspi
On Nov 15, 2018, at 2:36 PM, John Hardin wrote: > >> It doesn't seem to have a very high score just yet... I'm still getting FNs >> with the rule hitting (due to those messages hitting BAYES_00/05). > > Manually train those messages as spam and that should repair itself... Actually... right

Re: Bayes underperforming, HTML entities?

2018-11-15 Thread Amir Caspi
On Nov 15, 2018, at 2:36 PM, John Hardin wrote: > > That and its resistance to FP avoidance. Despite the generality, I don't see a significant FP risk on the general unicode version. I don't see ANY legitimate reason why an email would hard-encode long sequences of human-readable text, in

Re: Bayes underperforming, HTML entities?

2018-11-15 Thread Amir Caspi
On Nov 10, 2018, at 11:30 AM, John Hardin wrote: > > The rawbody rules perform much better (unsurprising), and the ASCII-only one > has a better raw S/O: It looks like HTML_ENTITY_ASCII has been rolled out -- did you decide against the more general unicode due to S/O score? I predict we will

Re: Bayes underperforming, HTML entities?

2018-11-09 Thread Amir Caspi
On Nov 9, 2018, at 8:49 AM, John Hardin wrote: > >> rawbody HTML_ENC_ASCII >> /(?:&\#(?:(?:\d{1,2}|1[01]\d|12[0-7])|x[0-7][0-9a-f])\s*;\s*){10}/i > > I'll add that too so that we can compare the results. Per my reply a few minutes ago, I think this will be too restrictive. While the
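The restriction is easier to see with the quoted HTML_ENC_ASCII pattern run against generated input. This Python sketch ports the regex as written; the `encode` helper and the sample strings are hypothetical, added only to produce test data.

```python
import re

# HTML_ENC_ASCII as quoted above: ten consecutive numeric entities whose
# code points fall in the ASCII range (decimal 0-127 or hex 00-7f).
HTML_ENC_ASCII = re.compile(
    r"(?:&\#(?:(?:\d{1,2}|1[01]\d|12[0-7])|x[0-7][0-9a-f])\s*;\s*){10}",
    re.IGNORECASE,
)


def encode(text, hexadecimal=False):
    """Hypothetical helper: encode every character as a numeric HTML entity."""
    fmt = "&#x{:x};" if hexadecimal else "&#{:d};"
    return "".join(fmt.format(ord(ch)) for ch in text)


samples = {
    "ascii_decimal": encode("Click here now"),
    "ascii_hex": encode("Click here now", hexadecimal=True),
    "too_short": encode("short"),
    "zero_width": encode("\u200b" * 12),
    "plain_text": "Click here now",
}

matches = {name: bool(HTML_ENC_ASCII.search(body)) for name, body in samples.items()}

for name, hit in matches.items():
    print(f"{name:14s} {'hit' if hit else 'miss'}")
```

Runs of entity-encoded ASCII match, but a run made entirely of non-ASCII entities such as `&#8203;` does not, which is the kind of gap a more general Unicode version would close.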

Re: Bayes underperforming, HTML entities?

2018-11-09 Thread Amir Caspi
On Nov 9, 2018, at 8:10 AM, Matus UHLAR - fantomas wrote: > > how many spams and hams did you train then? As of right now: 0.000 0 258427 0 non-token data: nspam 0.000 0 106813 0 non-token data: nham 0.000 0 438310 0 non-token

Re: Bayes underperforming, HTML entities?

2018-11-09 Thread Amir Caspi
On Nov 9, 2018, at 7:41 AM, RW wrote: > > I was really referring to the fact that it's pure ASCII text that's > being encoded rather than long runs per se That is true for the current batch of messages, but as we've seen, spammers love to use unicode obfuscation to try to foil Bayes and other

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 8, 2018, at 7:55 PM, John Hardin wrote: > > I left it case-sensitive; is there some reason the entities cannot be coded > as (e.g.) ? I kinda doubt it, so it should *probably* be > case-insensitive to avoid trivial bypass. I think it should be insensitive, sorry for that oversight on

SpamAssassin 3.4.2 -- RPM for CentOS 5

2018-11-08 Thread Amir Caspi
Hi all, I finally had some bandwidth and was able to get an RPM built for CentOS 5. I used Kevin Fenzi's CentOS 6 source RPM from COPR rather than one from Fedora, though I imagine Fedora would probably work just fine. The only thing I had to do to get this to work was to install the

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 8, 2018, at 7:41 PM, John Hardin wrote: > > Sure, but I'd also prefer to have a sample to test on before committing. I'll > see if I can get the pastebin to work (i.e. fix the boundary) > I can send you some new spamples via attachment, privately. Unfortunately I lost those

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 8, 2018, at 4:51 PM, RW wrote: > > Unnecessary encoding is fairly common, but a long run of ASCII > characters encoded like this seems extreme. Right, that was a question I had asked in my email this morning... whether we have a rule to detect long sequences of HTML entities. It would

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 8, 2018, at 2:19 PM, Bill Cole wrote: > > [Resending because it looks like my first send went into a black hole...] All SA messages appear to be coming with significant delays today... not sure why. I got RW's first message, sent at 8am today, only about an hour ago, AFTER the

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 8, 2018, at 12:20 PM, RW wrote: > > these emails don't contain a valid HTML mime section. They contain a bogus > html section that doesn't > start with the separator defined in the top-level Content-Type header. Sorry, that is totally my fault. In the spample, I was trying to sanitize

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 8, 2018, at 12:20 PM, RW wrote: > > I've already explained this. Sorry, I don't recall this discussion, my apologies. > Do these actually display on any email client? Yes. For example, for the first spample (https://pastebin.com/peiXZivJ), Apple Mail (OS X) displays the decoded HTML

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 8, 2018, at 2:30 AM, Matus UHLAR - fantomas wrote: > > Do you use autolearn? There are a few rules to detect ham (score > negatively), many of them based on default whitelists and DNS whitelists, > where many mails come from grey area companies, not necessarily spam, but > training their

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
> do you regularly perform sa-update on that box? Yes, it is run every night. However, I am still running 3.4.1, so if the sha1 access has already been disabled, my updates are likely failing as of recently. I'm working on updating to 3.4.2 but this is an ancient box and I haven't yet had the

Re: Bayes underperforming, HTML entities?

2018-11-08 Thread Amir Caspi
On Nov 7, 2018, at 12:33 PM, Amir Caspi wrote: > > In many cases, it would appear that these spams have either very little > (real) text (besides the usual attempt at Bayes poisoning) and/or are using > HTML-entity encoding to try to bypass Bayes. Here are a couple of spamples

Bayes underperforming, HTML entities?

2018-11-07 Thread Amir Caspi
Hi all, In the past couple of weeks I've gotten a number of clearly-spam messages that slipped past SA, and the only reason was because they were getting low Bayes scores (BAYES_50 or even down to BAYES_00 or BAYES_05). I do my Bayes training manually on both ham and spam so there

Re: [ANNOUNCE] Apache SpamAssassin 3.4.2 available

2018-09-17 Thread Amir Caspi
> On Sep 17, 2018, at 11:22 AM, Kevin A. McGrail wrote: > > I'd be pretty shocked if you have to do very much to that src rpm for > 3.4.1 to get 3.4.2 working. Possibly if I knew what I was doing with src rpms, that would be the case. ;-) Hoping someone who knows a lot more than I do is

Re: [ANNOUNCE] Apache SpamAssassin 3.4.2 available

2018-09-17 Thread Amir Caspi
Is there anyone so kind as to perhaps make an RPM for CentOS 5? There are still more than a few dinosaurs running that OS that can't upgrade but would love to have SA. I could probably build it from the src rpm but I'm not an expert... Kevin Fenzi has a repo with 3.4.1 for CentOS 5 and 6, but

Re: Periodic error

2018-08-01 Thread Amir Caspi
On Aug 1, 2018, at 6:09 PM, John Hardin wrote: > Recommendation: download the spamassassin-3.4.1-12 (or later) SRPM from > Fedora and try building an RPM from it in a Centos 6 dev environment. That's > what I did for Centos 7 and it works jes' fine. Kevin Fenzi maintains an SA repo for

Re: Spamassassin and spamc do not use same rules

2018-04-25 Thread Amir Caspi
On Apr 25, 2018, at 11:08 AM, Matus UHLAR - fantomas wrote: > > actually, no. it changes to users as needed. This way multiple users can use > spamd with per-user config files. Sorry, let me be more clear: if spamc is invoked by root, such that spamd would try to setuid

Re: Spamassassin and spamc do not use same rules

2018-04-25 Thread Amir Caspi
On Apr 25, 2018, at 8:57 AM, Paul R. Ganci wrote: > > Sorry I should have mentioned that. I was aware of that issue. As you can see > spamd is running as root in this case and the spamassassin tests were also > done as root. spamd running as root doesn't run as root; it

Re: SpamAssassin 3.4.2.

2018-04-17 Thread Amir Caspi
On Apr 17, 2018, at 4:23 PM, Bill Cole wrote: > > At my last job where there were supported RHEL machines, I asked a RH support > person a similar question regarding Postfix and got the answer: "If you want > Fedora, you know where to get it." I'd

Re: SpamAssassin 3.4.2.

2018-04-17 Thread Amir Caspi
On Apr 17, 2018, at 2:38 PM, David Jones wrote: > > The CentOS 5 and 6 boxes out there aren't going to get the new version unless > it gets put in some other repo like EPEL or another third party since they > are not getting any updates. EPEL 5 is frozen AFAIK. This would

Re: SpamAssassin 3.4.2.

2018-04-17 Thread Amir Caspi
On Apr 17, 2018, at 1:12 PM, David Jones wrote: > > Once 3.4.2 comes out soon, we need to get an official version in EPEL or > something. Hopefully someone knows someone at EPEL to make this happen. I > think everyone had to build 3.4.1 themselves from the Fedora RPM spec

Re: Differing scores on spamassassin checks

2018-04-16 Thread Amir Caspi
On Apr 16, 2018, at 11:15 AM, RW wrote: > > You seem to be confusing unix and virtual users. Sorry, I was confusing "virtual hosting" with "virtual users." Oops. Ignore me! --- Amir

Re: Differing scores on spamassassin checks

2018-04-16 Thread Amir Caspi
> On Apr 15, 2018, at 12:39 PM, Computer Bob wrote: > > I still am a bit puzzled how bayes db gets handled when using virtual users > and domains. I see no trace of bayes or .spamassassin files in any of the > virtual locations or in the sql databases. If you want

Re: Spam from compromised accounts scoring just under block threshold

2018-04-02 Thread Amir Caspi
On Mar 31, 2018, at 4:52 AM, Pedro David Marco wrote: > > Amir, can you provide any pastebin sample, please? I thought it was relatively self-explanatory, but I'm talking about names very much like the ones that Rich Wales included in his recent email (subject: "Spam

Re: Spam from addresses where full name mirrors left-hand side of address

2018-04-02 Thread Amir Caspi
On Apr 1, 2018, at 11:33 PM, Rich Wales wrote: > > I do realize some perfectly legitimate "From:" lines conform to this same > pattern, and the only way to really tell the difference may be via AI or a > real human brain. Not just "some" legitimate mail... a LOT of legitimate

Re: This sucks

2018-04-01 Thread Amir Caspi
On Apr 1, 2018, at 10:26 AM, Michael Brunnbauer wrote: > > running my example spam through spamassassin gets it marked as spam while > using spamc+spamd does not. I know this is the equivalent of “did you plug it in” but... did you restart spamd after rebuilding Net::DNS?

Re: Spam from compromised accounts scoring just under block threshold

2018-03-30 Thread Amir Caspi
s four or more. Has anyone been testing this as a meta rule? Cheers. --- Amir > On Mar 6, 2018, at 9:37 AM, John Hardin <jhar...@impsec.org> wrote: > > On Mon, 5 Mar 2018, Amir Caspi wrote: > >> On Mar 5, 2018, at 11:13 PM, John Hardin <jhar...@impsec.org> wr

Bayes and hyphens

2018-03-30 Thread Amir Caspi
Hi all, Does Bayes tokenize on word boundaries and hence would ignore hyphens? Or does it include them? I've seen a lot of spam lately inserting random hyphens between key spammy words (like "economic-crisis"), presumably in an attempt to bypass word filters and/or Bayes. So would

Re: DecodeShortURLs database breaks with setuid spamd

2018-03-06 Thread Amir Caspi
On Mar 6, 2018, at 5:19 PM, RW wrote: > > Or probably more commonly when running the spamassassin perl script as > an ordinary user for test purposes. Right, if the DB is owned by that user, then they would see the rule fire with spamassassin and might assume it's

Re: Spam from compromised accounts scoring just under block threshold

2018-03-05 Thread Amir Caspi
On Mar 5, 2018, at 11:13 PM, John Hardin wrote: > > *before* the @ sign. > > It may be perfectly valid to do that, but if it happens more often in spam > than in legitimate mail it is useful to us. I’m seeing a lot of spam lately with usernames like

DecodeShortURLs database breaks with setuid spamd

2018-03-05 Thread Amir Caspi
Hi all, Just FYI, for those of you who use DecodeShortURLs.pm ... it appears that, if you are running in a per-user setup (i.e., running spamd as root such that it does a setuid when invoked from spamc, and/or allowing individual users to run spamassassin), then the short-URL cache

Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-02-26 Thread Amir Caspi
> On Feb 26, 2018, at 11:00 AM, Kevin A. McGrail > wrote: > >> DecodeShortURLs has been on my list of must-have plugins for years, so >> I was a little surprised it took so long for someone to mention it >> in this thread. > Yeah, my firm is going to look at

Re: Bayes not auto-learning?

2018-02-23 Thread Amir Caspi
On Feb 23, 2018, at 11:47 PM, David B Funk wrote: > It could have 20 points from a whole bunch of body rules but if it only hit 2 > points via header rules it still will not auto-learn. Gotcha. The spam in question that triggered this hit a lot of rules, but hard
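The point about header versus body rules can be sketched as a small decision function. This is a simplified model, not SpamAssassin's code. It assumes the documented behaviour of the stock AutoLearnThreshold plugin, where a message is auto-learned as spam only if its autolearn score reaches `bayes_auto_learn_threshold_spam` and at least 3 points come from header rules and at least 3 from body rules. The function and parameter names are invented, and further conditions in the real plugin are left out (for example, the autolearn score is computed without the Bayes rules).

```python
def would_autolearn_spam(header_points, body_points, threshold=12.0, min_each=3.0):
    """Simplified model of the spam autolearn decision.

    threshold stands in for bayes_auto_learn_threshold_spam; min_each is the
    assumed per-category minimum for header and body points.
    """
    total = header_points + body_points
    return (
        total >= threshold
        and header_points >= min_each
        and body_points >= min_each
    )


# The case described above: plenty of body points, too few header points.
print(would_autolearn_spam(header_points=2.0, body_points=20.0))

# Same body score, but with enough header points.
print(would_autolearn_spam(header_points=5.0, body_points=20.0))
```

Under these assumptions, a message scoring well above the threshold is still skipped when either category contributes fewer than 3 points.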

Bayes not auto-learning?

2018-02-23 Thread Amir Caspi
Hi all, So, I've been trying to tweak my setup and noticed that VERY few of my emails are being autolearned as spam, even when their spam score is far above the autolearn threshold. The threshold is set to 12; I just saw a spam with score >25 not being autolearned. Are

Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-02-21 Thread Amir Caspi
> On Feb 21, 2018, at 12:45 PM, Dianne Skoll wrote: > > Someone earlier posted a link to https://github.com/smfreegard/DecodeShortURLs Oops, I missed that... must have thought it was just about decoding and not about SA. Thanks for clarifying! --- Amir

Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-02-21 Thread Amir Caspi
> On Feb 21, 2018, at 9:57 AM, Dianne Skoll wrote: > > That's why you only want to do it for URLs that are > absolutely known to be shortened URLs. You have to keep a list of > known URL-shorteners. On that note -- regardless of what OTHER HW/SW solutions might do,

Re: From:name spoofing

2018-02-16 Thread Amir Caspi
> On Feb 16, 2018, at 4:41 PM, John Hardin wrote: > > Not necessarily safe. If your MTA receives a message without a Message-ID, it > is supposed to generate one. And if it does so, it will probably do so using > your (recipient) domain... Wouldn't this also FP on messages

Re: New Mail::SpamAssassin::Plugin::HeadersEqual plugin

2016-09-08 Thread Amir Caspi
> On Sep 8, 2016, at 10:05 AM, apache.org+spamassas...@daniel-rudolf.de wrote: > > As you can see, SA will increase the score by 0.5 when the From: and > Return-Path: headers don't match ("ne" for "not equal"). This particular rule will FP for most mailing list emails... including this one.

Re: Malware URI rule

2015-11-09 Thread Amir Caspi
On Nov 9, 2015, at 10:20 AM, Benny Pedersen wrote: > > and it was the only rule that hit? > > think again A score of 6 is a poison pill for a threshold of 5 unless there are significant negative-score rules that hit. If an email is otherwise "neutral" (Bayes 50, no
