Re: recent update to __STYLE_GIBBERISH_1 leads to 100% CPU usage
On Wed, 2019-05-29 at 12:47 +0200, Stoiko Ivanov wrote:
> On Wed, 29 May 2019 11:31:42 +0200 Matthias Egger wrote:
> > On 28.05.19 10:31, Stoiko Ivanov wrote:
> > > with a recent update to the ruleset, we're encountering certain
> > > mails, which cause the rule-evaluation to use 100% cpu.

Thanks for the report, Stoiko.

> > Your sample just triggered the error and therefore the system started
> > blowing off partially :-) So next time, please paste that example to
> > e.g. pastebin or github or some website and link to it ;-)
>
> Aye - sorry for that! I first wanted to open a bug-report at bugzilla,
> but since the one which dealt with a similar issue contained the
> suggestion to contact the user-list with problems for single rules - I
> did just that - without considering those implications!
>
> Next time I'll definitely take the pastebin-option!

Both are good advice: filing a bug report, as well as generally using
pastebin or a similar external method to provide samples... I see this
has been filed in bugzilla by now.

> > But anyway, can you tell me how you found out __STYLE_GIBBERISH_1 is
> > the culprit? I have no clue how to isolate that, since a strace does
> > not really help... Or is there some strace for perl which i do not
> > know?
>
> hmm - in that case the way to go was to enable a commented out
> debug-statement in the spamassassin source, which lists which rule is
> evaluated. (on 3.4.2 installed on a Debian this is
> in /usr/share/perl5/Mail/Spamassassin/Plugin/Check.pm - in
> do_rawbody_tests - just comment out the if-condition for would_log)
>
> Then you see it in the debug-output.

Hmm, curious why that would be commented out. It's the rules-all debug
area feature that should generally be available since the 3.4 branch,
IIRC.

spamassassin -D rules-all will then announce regex rules *before*
evaluating them, so even long-running regex rules that do not match are
easy to identify.

-- 
Karsten Bräckelmann -- open source. hacker. assassin.
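For readers hitting the same kind of runaway rule, a minimal sketch of the rules-all approach described above (file names are placeholders; the exact debug line format varies by SA version):

```
# Feed the offending sample to a local, no-network SA run with rule
# announcements enabled. The last rule announced before the CPU pegs
# is the culprit.
spamassassin -D rules-all -L < sample.eml 2> debug.log
tail debug.log
```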
Re: recent update to __STYLE_GIBBERISH_1 leads to 100% CPU usage
On Wed, 2019-05-29 at 08:27 +0200, Markus Benning wrote:
> Hi,
>
> seems to work.
>
> Had to add
>
>   score __STYLE_GIBBERISH_1 0

That's a non-scoring sub-rule; setting its score to 0 has no effect.
Redefining the rule to disable it is the way to go:

  meta __STYLE_GIBBERISH_1 0

> to my SA config to make your mail pass.
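As a concrete sketch, the override suggested above would go into a local configuration file that is read after the updated rules (file name assumed):

```
# local.cf -- disable the sub-rule by redefining it as a constant-false meta.
# A "score __STYLE_GIBBERISH_1 0" line is ignored, because
# double-underscore sub-rules never score on their own.
meta __STYLE_GIBBERISH_1 0
```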
Re: Can't Get Removed From List
On Mon, 2018-02-26 at 10:13 -0700, Kevin Viner wrote:
> Hi everybody, I have an opt-in mailing list through MailChimp, and
> follow all best practices for my monthly emails. Unfortunately, every
> time I send out a list, I'm getting my fingerprint marked by Razor as
> spammy. SpamAssassin advice is:

The following text is not SA "advice" nor report. You should start by
consulting who / what gave that text in response to get details.

> "You're sending messages that people don't want to receive, for example
> "=?utf-8?Q?=E2=9D=A4=C2=A0Valentine=27s=20Day=20Mind=20Reading?=". You
> need to audit your mailing lists."
>
> The problem I'm having is that I'm not receiving any abuse reports
> through MailChimp, I follow all best practice sending guidelines, and
> am not sending out spammy emails. I'm a professional entertainer with a
> fairly large list.
>
> Cloudmark has been helpful in resetting my fingerprint upon request,
> but this has become an ongoing monthly problem that they don't seem to
> be interested in resolving with me. Please advise, as nobody seems to
> be able to tell me what is happening. I have a monthly email database
> of 10,000+, so if there are 1 or 2 complaints happening (which
> MailChimp isn't even seeing), it seems like a 0.1% or less rate of
> complaints isn't anything I can really do something about. And every
> time I'm flagged, I start having issues sending out emails in my day to
> day work.
Re: FROM header with two email addresses
On Tue, 2017-10-24 at 13:22 +0200, Merijn van den Kroonenberg wrote:
> > Hello all, I was the original poster of this topic but was away for a
> > couple of days. I find it amazing to see the number of suggestions
> > and ideas that have come up here.
> >
> > However none of the constructions matched "my" From: lines of the form
> >
> >   From: "Firstname Lastname@" <sendern...@real-senders-domain.com>
>
> My comments in this mail are only about the
>   "us...@companya.com" <us...@companyb.com>
> situation, not about actual double from addresses.

Indeed, in this thread multiple different forms of "email address alike
in From: sender real name" have surfaced.

This type is occasionally used to try to look legit by using real, valid
addresses of the recipient's domain (a colleague) instead of a real
name, which is harder to get correct and easier for humans to spot
irregularities in.

The OP's form looks like a broken From header and an intermediate SMTP
choking on and rewriting it.
Re: Sender needs help with false positive
On Mon, 2017-08-07 at 19:15 -0400, Alex wrote:
> > version=3.4.0
>
> Version 3.4.0 is like ten years old. I also don't recall BAYES_999
> being available in that version, so one thing or the other is not
> correct.

Minor nitpick: 3.4.0 was released in Feb 2014, slightly less than 10
years ago. ;) But that's code only anyway; with sa-update, the rules'
version and age are kept up-to-date independently.

Similarly, the BAYES_999 test indeed is not part of the original 3.4.0
release. It has been published via sa-update though, and even older
3.3.x installations with sa-update have that rule today.

The check_bayes() eval rule always supported the 99.9% variant, it's
just a float number less than 1.0...

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8?
c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){
putchar(t[s]);h=m;s=0; }}}
Re: Results of Individual Tests on spamd "CHECK"
On Mon, 2017-08-07 at 14:17 -0500, Jerry Malcolm wrote:
> I tried SYMBOLS. You are correct that it lists the tests, but not the
> results:
>
> BAYES_95,HTML_IMAGE_ONLY_32,HTML_MESSAGE,JAM_DO_STH_HERE,LOTS_OF_MONEY,MIME_HTML_ONLY,
> [...]
>
> But I saw this line in a forum discussion... So I'm sure there is some
> way to generate it.
>
> >>> tests=[AWL=-1.103, BAYES_00=-2.599,
> >>> HTML_MESSAGE=0.001,URIBL_BLACK=1.955, URIBL_GREY=0.25]
>
> Any ideas?

That particular one appears to be part of the Amavisd-new generated
headers. You can get the same rules with individual scores in stock SA
using the _TESTSSCORES(,)_ template tag with the add_header config
option. See the M::SA::Conf docs [1].

For ad-hoc testing without adding this to your general SA / spamd
configuration, feed the sample message to the plain spamassassin script
with additional --cf configuration:

  spamassassin --cf="add_header all TestsScores tests=_TESTSSCORES(,)_" < message

Also see 10_default_prefs.cf for more informational detail in the stock
Status header.

> On 8/7/2017 1:13 PM, Daniel J. Luke wrote:
> > On Aug 7, 2017, at 2:00 PM, Jerry Malcolm wrote:
> > > I'm invoking spamd using:
> > >
> > >   CHECK SPAMC/1.2\r\n

Not your best option for ad-hoc tests... ;)

> > > Can someone tell me what I need to add to the spamd call (and the
> > > syntax) in order to get the results of the individual tests
> > > returned as part of the status?

You will need SA configuration. The spamd protocol itself does not allow
such fine grained configuration.

[1] http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html
Re: Is this really the SpamAssassin list? (was Re: unsubscribe)
On Tue, 2014-10-28 at 11:19 -0700, jdebert wrote:
> On Tue, 28 Oct 2014 04:27:14 +0100 Karsten Bräckelmann
> <guent...@rudersport.de> wrote:
> > On Mon, 2014-10-27 at 19:44 -0700, jdebert wrote:
> > > Redirecting them makes people lazy. Better than annoying but they
> > > don't learn anything except to repeat their mistakes.
> >
> > Your assumption, the list moderators (aka owner, me being one of
> > them) would simply and silently obey and dutifully do the
> > un-subscription for them, is flawed. ;)
>
> This assumption is unwarranted. I did not say that.

You said that the unsubscribe-to-list posting user would not learn and
get lazy, when those posts get redirected to the owner rather than
hitting the list.

Not learning: False. As I said, moderators would respond with
explanation and instructions. In particular learning about his mistake
and how to properly (and in future) unsubscribe does make him learn.
Since we'd not just unsub him, the user will even have to prove that he
learned, by following procedures unsubscribing himself.

Getting lazy: People are lazy. But since there's absolutely nothing we
would simply do for them, there's no potential in the process to get
lazy over. They will have to read and understand how to do it. And they
will have to follow every step of the unsub procedure themselves.

So if my assumption was really that unwarranted, please explain what
else you did mean with those two sentences.

> Did you read the rest of the message?

Yes. And quite frankly, catching unsub messages and bouncing them with a
note as you mentioned is almost identical to the proposed "redirect them
to owner to handle it". With the latter involving moderators, having the
advantage that we can and will offer additional help if need be.
Re: procmail
On Tue, 2014-10-28 at 22:10 -0400, David F. Skoll wrote:
> > frankly in times of LMTP and Sieve there is hardly a need to use
> > procmail - it is used because i know it and it just works - so why
> > should somebody step in and maintain it while nobody is forced to
> > use it
>
> I use Email::Filter, not procmail, but tell me: Can LMTP and Sieve do
> the following?

Dammit, this is just too teasing... Sorry. ;) procmail can do all of
those. (Yeah, not your question, but still...)

> 1) Cc: mail containing a specific header to a certain address, but
>    only between 08:00-09:00 or 17:00-21:00.

Sure. Limiting to specific days or hours can be achieved without an
external process by recipe conditions based on our own SMTP server's
Received header, which we can trust to be correct.

> 2) Archive mail in a folder called Received-Archive/YYYY-MM.

Trivial. See man procmailex.

> 3) Take mail to a specific address, shorten it by replacing things
>    like "four" with "4", "this" with "dis", etc. and send as much of
>    the result as possible as a 140-character SMS message? Oh, and only
>    do this if the support calendar says that I am on the support pager
>    that week.

Yep. Completely internal, given there's an email to SMS gateway
(flashback 15 years ago), calling an external process for SMS delivery
otherwise.

> 4) Take the voicemail notifications produced by our Asterisk software
>    and replace the giant .WAV attachment with a much smaller .MP3
>    equivalent.

Check. Calling an external process, but I doubt procmail and ffmpeg /
avconv is worse than Perl and the modules required for that audio
conversion. Granted, in this case I'd need some rather skillful sed-fu
in the pipe, or a little help of an external Perl script using
MIME-tools... ;)

> These are all real-world requirements that my filter fulfills. And it
> does most of them without forking external processes. (Item 3 actually
> consults a calendar program to see who's on support, but the rest are
> all handled in-process.)
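For the monthly-archive item, a minimal procmail sketch along the lines of `man procmailex` (folder layout and Maildir-style naming assumed):

```
# ~/.procmailrc -- file a carbon copy of every message into a
# per-month archive folder, e.g. Received-Archive/2014-10
MONTHFOLDER=`date +%Y-%m`

:0 c
Received-Archive/${MONTHFOLDER}
```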
That said, and all joking apart: Do you guys even remember when this got
completely off topic?
Re: Is this really the SpamAssassin list? (was Re: unsubscribe)
On Tue, 2014-10-28 at 19:56 -0700, jdebert wrote:
> On Wed, 29 Oct 2014 00:33:04 +0100 Karsten Bräckelmann
> <guent...@rudersport.de> wrote:
> > > > Redirecting them makes people lazy. Better than annoying but
> > > > they don't learn anything except to repeat their mistakes.
> > >
> > > Your assumption, the list moderators (aka owner, me being one of
> > > them) would simply and silently obey and dutifully do the
> > > un-subscription for them, is flawed. ;)
> >
> > > This assumption is unwarranted. I did not say that.
> >
> > You said that the unsubscribe-to-list posting user would not learn
> > and get lazy, when those posts get redirected to the owner rather
> > than hitting the list.
>
> Not exactly what I said.

In the part you snipped of my previous post, I asked you to explain what
you did mean, if not what I discussed in detail. This response is not
helpful, neither constructive.

> > Not learning: False. As I said, moderators would respond with
> > explanation and instructions. In particular learning about his
> > mistake and how to properly (and in future) unsubscribe does make
> > him learn. Since we'd not just unsub him, the user will even have to
> > prove that he learned, by following procedures unsubscribing himself.
>
> False as evidenced by how the same people repeat the same thing on the
> same list and on other lists. Got it.

Show me an example of one subscriber repeating this mistake on this
list. Show me an example of one subscriber repeating this mistake on
this list, after the proposed and discussed redirect-to-owner procedure
is in effect, which is meant to help with the issue.

You cannot possibly show the latter, since it is not yet in effect. So
there is no evidence as you just claimed. Moreover, there is absolutely
no basis to your "evidence" claim that directly approaching those
subscribers by moderators would not make them learn. You'll have a
really hard time showing the first, too.

> Got it.

(Not a native English speaker -- what's that supposed to mean in the
context of your quote? Equivalent of a foot-stomp?)
> > Getting lazy: People are lazy. But since there's absolutely nothing
> > we would simply do for them, there's no potential in the process to
> > get lazy over. They will have to read and understand how to do it.
> > And they will have to follow every step of the unsub procedure
> > themselves.
>
> The long form of saying we're agreed. And one of the reasons to
> automate the process.

Fun research project for you, in strong favor of automation: How many
such posts did this list get in the last month? Statistically irrelevant
spike. Last 6 months? Last year? Two years?

I am a moderator of this list. I do know that handling those bad unsub
requests manually would be barely noticeable compared to the general
moderation load. Which isn't high either.

> > > Did you read the rest of the message?
> >
> > Yes. And quite frankly, catching unsub messages and bouncing them
> > with a note as you mentioned is almost identical to the proposed
> > "redirect them to owner to handle it". With the latter involving
> > moderators, having the advantage that we can and will offer
> > additional help if need be.
>
> Having the listserver catch the messages and handle them is almost
> identical to redirecting them to the owner for manual handling? I
> could see that if list owners still managed lists manually. But
> there's this nifty new software that manages lists automatically,
> freeing the list owners from all that drudge work.

I am very sorry, but it appears you have absolutely no clue what nursing
mailing lists today means. Yes, all subscription (and un-subscription)
is handled automatically. No owner intervention, not even notices.
Automation.

What we mostly do face is posts by non-subscribers. Mostly spam (just
ignore), but also a non-negligible amount of valid posts by
non-subscribers, or list-replies by subscribers using a wrong address.
The latter outweighs by far the amount of non-subscribers. Unsub posts
to the list? About the same as non-subscriber posts. Very limited.
Almost negligible, if some rare samples won't trigger an on-list
shitstorm. With the proposed process in place, I would have spent less
time managing and resolving the last 12 months' bad unsub requests than
it took me arguing with you about something that really does not concern
you.

> Your assumption is that I am telling you to do all this manually. You
> seemed to be ambivalent about this, not preferring to do it manually
> but seeming to prefer to do it manually.

No. I know from experience that doing this manually is the easiest,
least time consuming solution. And with no word did I imply you are
telling me to do all this manually. Quite the contrary.

> My assumption was expecting it to occur to everyone that it might be
> done automatically. I really did not expect to have to write to
> ISO-9002 standards on a user list.

Exactly, *might*. Not the best solution in this case.
Re: Is this really the SpamAssassin list? (was Re: unsubscribe)
On Mon, 2014-10-27 at 17:00 -0400, Kevin A. McGrail wrote:
> On 10/27/2014 4:48 PM, Kevin A. McGrail wrote:
> > On 10/27/2014 4:45 PM, David F. Skoll wrote:
> > > How hard would it be to have the mailing list quarantine a message
> > > whose subject consists solely of the word "unsubscribe"?
> >
> > Heh... Apparently more needed than I hoped. I'll have to ask the
> > foundation if they can implement something to achieve this.
>
> I've emailed infra with the following request:

Might help, but not worth much effort if infra cannot set it up easily.
While we've seen a few recently, usual and overall frequency is *much*
lower.

> header __KAM_SA_BLOCK_UNSUB1 Subject =~ /unsubscribe/i

Ouch. Would you please /^anchor$/ that beast? Unless you actually intend
this sub-thread to be swept off the list, too. ;)
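The anchoring point matters: an unanchored /unsubscribe/i also fires on ordinary discussion subjects like this thread's. A quick illustration (shown in Python's `re`, whose semantics match Perl closely for this pattern):

```python
import re

unanchored = re.compile(r'unsubscribe', re.I)
anchored = re.compile(r'^unsubscribe$', re.I)

# A bare "unsubscribe" posting hits both forms...
assert unanchored.search("Unsubscribe")
assert anchored.search("Unsubscribe")

# ...but a legitimate discussion subject only hits the unanchored one,
# so the anchored rule would not sweep this sub-thread off the list.
subject = "Re: Is this really the SpamAssassin list? (was Re: unsubscribe)"
assert unanchored.search(subject)
assert anchored.search(subject) is None
print("anchored rule only hits bare 'unsubscribe' subjects")
```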
Re: Is this really the SpamAssassin list? (was Re: unsubscribe)
On Mon, 2014-10-27 at 19:44 -0700, jdebert wrote:
> On Mon, 27 Oct 2014 17:00:11 -0400 Kevin A. McGrail
> <kmcgr...@pccc.com> wrote:
> > I've emailed infra with the following request:
> >
> > ...we have been getting consistent unsubscribe messages posted to
> > the entire users list which begs the question if there is a way to
> > redirect those to the mailing list owner instead of just posting
> > them?
>
> Redirecting them makes people lazy. Better than annoying but they
> don't learn anything except to repeat their mistakes.

Your assumption, the list moderators (aka owner, me being one of them)
would simply and silently obey and dutifully do the un-subscription for
them, is flawed. ;)

Just as with regular moderation, we'd respond with a template explaining
things, offering instructions -- and additional information on a
case-by-case basis.
Re: How is it that my X-Spam-Status is no, but my header gets marked with
On Mon, 2014-10-27 at 20:19 -0700, jdebert wrote:
> On Mon, 27 Oct 2014 15:45:03 -0700 (PDT) John Hardin
> <jhar...@impsec.org> wrote:
> > The apparent culprit is a procmail rule that explicitly passes a
> > message through the mail system again. The message is being scanned
> > twice.
> >
> > If she can either deliver to a local mailbox rather than forwarding
> > to an email address, or modify the procmail rule that calls SA to
> > ignore messages that have already passed through the server once, I
> > think the problem would go away.
>
> It looks as if it's the global procmailrc that always puts all mail,
> even mail between local users, through spamassassin. However, I don't
> see how going through spamassassin again will modify the header. It's

It is not the second run that modifies the header. It's the first one.
With the second run classifying the mail as not-spam.

> already modified before the user procmail rule sees it. Something
> appears to be causing the first run of sa to modify the header
> unconditionally. If global procmail actually does the first run.

A system-wide procmail recipe feeds mail to SA. Then there's a user
procmail recipe that forwards mail with a Subject matching /SPAM/ to
another dedicated spam dump address with the same domain, which ends up
being delivered to that domain's MX. The same SMTP server.

Now re-processing the original mail (possibly wrapped in an RFC822
attachment by SA), feeding it to SA due to the system-wide procmail
recipe... On that second run, the message previously classified spam
does not exceed the threshold. Thus the X-Spam-Status of "no",
overriding the previous Status header, which is being ignored by SA
anyway.

Result: Subject header rewritten by SA, despite a final (delivery time)
spam status of "no". This thread's Subject.
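A common guard for this double-scan loop is to make the system-wide recipe skip messages SA has already marked. A minimal sketch (header name taken from the samples in this thread; paths and the SA invocation are assumptions to adjust to the local setup):

```
# /etc/procmailrc -- only pipe mail through SpamAssassin once:
# skip messages that already carry an X-Spam-Checker-Version header
:0fw
* !^X-Spam-Checker-Version:
| spamassassin
```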
Re: .link TLD spammer haven?
On Fri, 2014-10-24 at 19:05 -0700, John Hardin wrote:
> On Fri, 24 Oct 2014, John Hardin wrote:
> > On Sat, 25 Oct 2014, Martin Gregorie wrote:
> > > Less obviously, it doesn't seem to matter whether you write the
> > > rule as /\.link\b/ or /\.link$/ - both give identical matches.
> > >
> > > Both match the following just as you'd expect:
> > >
> > >   http://www.linkedin.com/home/user/data.link
> > >   http://www.example.link
> > >
> > > but, less obviously, both also match this:
> > >
> > >   http://www.example.link/path/to/file.txt
> >
> > {boggle}
> >
> > ...but grep -P '\.link\b' matches it, but grep -P '\.link$' does not.
> >
> > > I presume that this means that the uri rule tests against two
> > > strings: one being just the domain name and the other being the
> > > whole URI, and declares a rule hit if either string matches.

Basically correct. SA uri rules are not only tested against the raw URI
as extracted from the message, but also some normalized variations.
Without going into details, OTOH this includes un-escaping, protocol
prefix (if missing) and path stripping.

  $ echo -e "\n apache.org/path/" | ./spamassassin -D -L \
      --cf="uri URI_DOMAIN /^http:\/\/[^\/]+$/"
  dbg: rules: ran uri rule URI_DOMAIN ======> got hit: "http://apache.org"

Note the regex matching a domain-only, anything-but-slash [^/]+
substring anchored at the end of the string. Also note the input
message's URI lacking a protocol, but the rule hit showing the (default)
protocol added by SA in one variation.

> > I don't think so, but I'm not positive. If you have a testing
> > environment set up, try adding this and see what you get in the log:
> >
> >   uri __ALL_URI /.*/
>
> oops. This too:
>
>   tflags __ALL_URI multiple
>
> Sorry for forgetting that bit, it's rather important. :)

That seemingly straight-forward approach does not work in this case. The
tflags multiple option does not make uri rules match multiple times on a
single URI extracted from the message. It still generates a single hit
per extracted URI only, not including multiple hits on its normalized
variations.

The tflags multiple option on a uri rule enables it to match multiple
times on different URIs extracted from the message.
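The observed behavior is easy to reproduce outside SA: against the raw URI, /\.link$/ indeed does not match, but against the path-stripped variant SA also tests, it does -- hence the "identical matches" seen for both rules (Python `re` used for illustration):

```python
import re

dot_link_end = re.compile(r'\.link$')
dot_link_b = re.compile(r'\.link\b')

raw = "http://www.example.link/path/to/file.txt"
stripped = "http://www.example.link"   # path-stripped variant SA also tests

assert dot_link_end.search(raw) is None  # raw URI alone: no hit, as grep shows
assert dot_link_b.search(raw)            # \b matches at the following '/'
assert dot_link_end.search(stripped)     # normalized variant: hit -> rule fires
print("rule hits via the normalized variant, not the raw URI")
```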
Re: How is it that my X-Spam-Status is no, but my header gets marked with
On Sat, 2014-10-25 at 20:06 -0700, Cathryn Mataga wrote:
> Okay, here's another header. Shows X-Spam-Status as no. In local.cf I
> changed to this, just to be sure.
>
>   rewrite_header Subject [SPAM][JUNGLEVISION SPAM CHECK]
>
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on
>     ecuador.junglevision.com
> X-Spam-Level: *
> X-Spam-Status: No, score=1.5 required=3.5 tests=BAYES_50,HTML_MESSAGE,
>     MIME_HTML_ONLY,MIME_QP_LONG_LINE autolearn=disabled version=3.3.2
> Subject: [SPAM][JUNGLEVISION SPAM CHECK] Confirmation of Order Number
>     684588 * Please Do Not Reply To This Email *

Somehow, you are passing messages to SA twice. First one classifies it
spam and rewrites the Subject. Second run doesn't. Added headers,
content wrapping, or most likely re-transmission from trusted networks
makes the second run fail.
Re: URIBL_RHS_DOB high hits
On Sat, 2014-10-11 at 23:40 +0200, Reindl Harald wrote:
> it hits again and i doubt that sourceforge is a new domain
>
> whatever the reason is - for me enough to disable it forever

Jumping to conclusions, aren't you?

> Oct 11 23:34:43 mail-gw spamd[28079]: spamd: result: . 0 -
> BAYES_50,CUST_DNSWL_7,CUST_DNSWL_9,DKIM_ADSP_ALL,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD,URIBL_RHS_DOB,USER_IN_MORE_SPAM_TO
> scantime=0.9,size=8902,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=39381,mid=7655276d-92b5-4dbd-8041-6db5c4fb8...@tieman.se,bayes=0.499983,autolearn=disabled
> Oct 11 23:34:43 mail-gw postfix/qmgr[28308]: 3jFfYt4WVTz1l:
> from=netatalk-admins-boun...@lists.sourceforge.net, size=8829, nrcpt=1
> (queue active)

$ host sourceforge.net.dob.sibl.support-intelligence.net
Host sourceforge.net.dob.sibl.support-intelligence.net not found:
3(NXDOMAIN)

$ host tieman.se.dob.sibl.support-intelligence.net
tieman.se.dob.sibl.support-intelligence.net has address 127.0.0.2

$ whois tieman.se | grep 2014
created:      2014-01-11
modified:     2014-09-20
Re: URIBL_RHS_DOB high hits
On Sun, 2014-10-12 at 00:29 +0200, Reindl Harald wrote:
> On 12.10.2014 at 00:23, Reindl Harald wrote:
> > On 12.10.2014 at 00:18, Karsten Bräckelmann wrote:
> > > On Sat, 2014-10-11 at 23:40 +0200, Reindl Harald wrote:
> > > > it hits again and i doubt that sourceforge is a new domain
> > > > whatever the reason is - for me enough to disable it forever
> > >
> > > Jumping to conclusions, aren't you?
> >
> > yes - the conclusion is that it had way too much FP's recently

Arguably, tieman.se should be sufficiently old to not be listed.

However, what I am much more annoyed about is your rambling, claiming
DOB would list sourceforge.net -- and by that, particularly with this
thread's topic, giving the impression of DOB again listing the world.
Which it doesn't. Obviously, you did not check facts or investigate the
issue at all.

> frankly it hit even my own message you replied to, see at bottom

Yes, so will this one. DOB does NOT operate on sender or From header.
See for yourself:

  echo -e "\n tieman.se" | ./spamassassin

So yes, it hit on your mail. But no, it does not list your domain.

> Oct 11 23:34:43 mail-gw spamd[28079]: spamd: result: . 0 -

FWIW, you can investigate and check any detail you want, because the
mail has been accepted by your SMTP server. With a configuration of
add_header all Report _REPORT_, the listed domain even is included in
the report, without any need for manual post-processing.

  * 0.3 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
  *     [URIs: tieman.se]
Re: URIBL_RHS_DOB high hits
On Sun, 2014-10-12 at 01:28 +0200, Reindl Harald wrote:
> On 12.10.2014 at 01:09, Karsten Bräckelmann wrote:
> > > it hits again and i doubt that sourceforge is a new domain
> >
> > However, what I am much more annoyed about is your rambling, claiming
> > DOB would list sourceforge.net -- and by that, particularly with this
> > thread's topic, giving the impression of DOB again listing the world.
> > Which it doesn't.
>
> it seems to hit randomly which is even worse, because listing the
> world is more obvious - i claim that it is not trustable currently,
> not more and not less, may anybody make his own decision, i told mine
> and there is nothing wrong with that

You have exactly one false positive listing. That is not even close to
"hit randomly". Please stop the repeated, false accusations on this
list.

> > Obviously, you did not check facts or investigate the issue at all.
>
> don't get me wrong, there is not much to investigate if it hits legit
> mailing-list messages

Correct, there is not much to investigate. The *only* thing would be to
verify *which* domain hit the DOB listing, and whether it actually is a
bad or warranted listing. Besides, that one is absolutely crucial to
check before claiming a false positive. A single thing to verify. You
did not.

Besides, it is just a coincidence that another domain in your log paste
actually was listed when I checked. Any other domain from the body could
have been the culprit. And still potentially can, since you only posted
logs -- no SA headers, body, or list of URIs.

> > With a configuration of add_header all Report _REPORT_, the listed
> > domain even is included in the report, without any need for manual
> > post-processing.
> >
> >   * 0.3 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
> >   *     [URIs: tieman.se]
>
> which is just not true - the domain is way older

Yes, that seems to be a DOB false positive listing (and the only one
known right now, see above). Get over it.

And BTW, that was meant as a helpful hint for you and anyone else
reading this thread, about getting crucial details while investigating
(or reporting) issues. No need to bark at me and repeat yet again that's
the one bad listing you encountered. The above is how to do it and what
you get.

> and the SBL hit because of support-intelligence.net makes things not
> better
>
>   URIBL_SBL Contains an URL's NS IP listed in the SBL blocklist
>   *   [URIs: tieman.se.dob.sibl.support-intelligence.net]

That is a SpamHaus listing. Support Intelligence is not responsible for
it, but the victim. This is entirely unrelated to URIBL_RHS_DOB and this
thread's topic.
Re: URIBL_RHS_DOB high hits
On Sun, 2014-10-12 at 02:58 +0200, Reindl Harald wrote: Am 12.10.2014 um 02:20 schrieb Karsten Bräckelmann: You have exactly one false positive listing. That is not even close to hit randomly. well, i can't verify the other hits because don't have access to other users email - the follwoing is another one and that *is* the definition of randomly - in doubt such a list must not answer when there is not verified data instead hit a FP URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread) [URIs: goo.gl] Another false positive DOB listing. Not good. Thanks for taking some time to actually provide detail. As for your personal definition of randomness, please see what others have to say about it. Multiple bad listings still is not random. http://en.wikipedia.org/wiki/Randomness Please stop the repeated, false accusations on this list. point out that it is not trustable currently is not a accusation and You claimed DOB listed sourceforge.net, which it didn't. You repeatedly claimed their listing to be random, which it isn't. That is what I referred to as false accusations. frankly http://support-intelligence.com/dob/ itself states The list is currently in BETA and should be used accordingly. We still have some kinks in it and occasionally domains older than five days, or other important domains end up in the list Yes. So what? You are free to disable DOB on your server. You are free and in fact welcome to report any issue with stock SA included DNSBLs, on-list or in bugzilla, with founded evidence. You are not free to claim $list responses to be random without proof. Obviously, you did not check facts or investigate the issue at all. don't get me wrong, there ist not much to investigate if it hits legit mailing-list messages Correct, there is not much to investigate. The *only* thing would be to verify *which* domain hit the DOB listing, and whether it actually is a bad or warranted listing. 
Besides, that one is absolutely crucial to check before claiming a false positive. A single thing to verify. You did not. if it hits a regular mailing list thread it is problematic and as said No. It depends on the content. See this list for a prime example. if there are no data for whatever reason the answer should be NXDOMAIN and not 127.0.0.1 in doubt because FP does more harm than FN False accusation, again. You just claimed $list would return anything other than NXDOMAIN in case of not-being-listed. $ host not-registered-domain.com.dob.sibl.support-intelligence.net Host not-registered-domain.com.dob.sibl.support-intelligence.net not found: 3(NXDOMAIN) We're talking false positive listings. Not random responses, nor a positive listing when in doubt. Again, stop unfounded false accusations on this list.
Re: Score Ignored
On Wed, 2014-10-08 at 15:48 -0500, Robert A. Ober wrote: On Mon, 22 Sep 2014 15:11:44 -0500 Robert A. Ober wrote: *Yes, my test messages and SPAM hit the rules but ignore the score.* What is the easiest way to know what score is applied per rule? Neither the server log nor the header breaks it down. Wait. If there's no Report, if you do not have the list of rules hit and their respective scores, how do you tell your custom rule's score is ignored by SA? Besides the Report as mentioned by Axb already, you can also modify the default Status header to include per-rule scores. add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES(,)_ autolearn=_AUTOLEARN_ version=_VERSION_
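With `_TESTSSCORES(,)_` in the Status header template, the per-rule scores can be extracted mechanically from the resulting header. A minimal sketch (the header value and rule names below are made up for illustration, not taken from any real message):

```python
import re

def parse_tests_scores(status_header):
    """Extract RULE=score pairs from an X-Spam-Status header produced
    with a template containing tests=_TESTSSCORES(,)_."""
    m = re.search(r'tests=([A-Z0-9_]+=-?[\d.]+(?:,[A-Z0-9_]+=-?[\d.]+)*)',
                  status_header)
    if not m:
        return {}
    return {name: float(score)
            for name, score in (pair.split('=') for pair in m.group(1).split(','))}

# Hypothetical header, shaped like the template above would produce:
header = ('Yes, score=7.5 required=5.0 '
          'tests=BAYES_99=3.5,URIBL_RHS_DOB=1.5,MY_CUSTOM_RULE=2.5 '
          'autolearn=no version=3.4.0')
print(parse_tests_scores(header))
```

If `MY_CUSTOM_RULE` shows up here with a different score than your local.cf assigns, the score really is being overridden somewhere; if it doesn't show up at all, the rule never hit.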
Re: rejected Null-Senders
On Tue, 2014-10-07 at 17:46 +0200, Reindl Harald wrote: can somebody comment in what context null-senders and so bounces and probably autoresponders are blocked by DKIM_ADSP_NXDOMAIN,USER_IN_BLACKLIST SA does not block. *sigh* In this context, the DKIM_ADSP_NXDOMAIN hit is irrelevant, given its low score. The USER_IN_BLACKLIST hit is what's pushing the score beyond your SMTP reject threshold. DKIM_ADSP_NXDOMAIN,USER_IN_BLACKLIST from= to=u...@example.com 3jC2XD1j8Cz1y: milter-reject: END-OF-MESSAGE See whitelist_from documentation for the from / sender type mail headers SA uses for black- and whitelisting. The above seems to show SMTP stage MAIL FROM, which results in only one of the possible headers and depends on your SMTP server (and milter in your case). a customer sends out his yearly members-invitation and i see some bounces / autoresponders pass through and some are blocked with the above tags, at least one from his own outgoing mailserver what i don't completely understand is the DKIM_ADSP_NXDOMAIN since in case of NXDOMAIN the message triggering the response could not have been delivered and how the USER_IN_BLACKLIST comes with an empty sender not that i am against blocking some amount of backscatter, i just want to understand the conditions
Re: spamd does not start
On Tue, 2014-10-07 at 18:55 +0300, Jari Fredrisson wrote: I built SA 3.4 using cpan on my old Debian Squeeze-lts. root@hurricane:~# time service spamassassin start Starting SpamAssassin Mail Filter Daemon: child process [4868] exited or timed out without signaling production of a PID file: exit 255 at /usr/local/bin/spamd line 2960. real 0m1.230s I read that line in spamd and it talks about two bugs. And a long timeout needed. But this dies at once, hardly a timeout? It states the child process exited or timed out. Indeed, obviously not a timeout, so the child process simply exited. Anything in syslog left by the child?
Re: recent channel update woes
On Tue, 2014-10-07 at 18:49 -0400, Eric Cunningham wrote: Is there a way to configure URIBL_RHS_DOB conditionally such that if there are issues with dob.sibl.support-intelligence.net like we're seeing, that associated scoring remains neutral rather than increasing (or decreasing)? No. As-is, a correct DNSxL listing is indistinguishable from a false positive listing. One possible strategy to detect FP listings would be an additional DNSxL query of a test-point or known-to-be-not-listed value. This comes at the cost of increased load both for the DNSxL as well as the SA instance, and will lag behind due to TTL and DNS caching. The lower the lag, the lower the caching, the higher the additional load. By doing such tests not on a per-message basis but per spamd child, or even having the parent process monitor for possible world-listed situations, the additional overhead and load could be massively reduced. Simply monitoring real results (without test queries) likely would not work. It is entirely possible that really large chunks of the mail stream continuously result in positive DNSxL listings. Prime candidates would be PBL hitting botnet spew, or exclusively DNSWL trusted messages during otherwise low traffic conditions. Distinguishing lots of consecutive correct listings from false positives would be really hard and prone to errors.
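The canary-query strategy described above can be sketched as follows. This is a hypothetical illustration, not anything SA ships: the lookup uses the system resolver, and the decision logic is kept separate so it can run without any network at all. The canary hostname would be some low-profile name that must never legitimately be listed.

```python
import socket

def dnsbl_lookup(name):
    """Query a DNSBL name via an A lookup; None means NXDOMAIN (not listed)."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None

def list_is_sane(canary_result):
    """The canary must NOT be listed. Any positive answer for it means
    the list is returning false positives (e.g. 'listing the world'),
    and its rules should be neutralized until it recovers."""
    return canary_result is None

# Pure decision logic, demonstrated without touching the network:
print(list_is_sane(None))         # NXDOMAIN for the canary: list behaving
print(list_is_sane('127.0.0.2'))  # canary listed: treat list as broken
```

As suggested above, running `dnsbl_lookup()` once per spamd child (or from a parent monitor process) rather than per message keeps the extra query load negligible, at the cost of some lag bounded by the check interval and DNS TTLs.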
Re: recent channel update woes
On Wed, 2014-10-08 at 01:18 +0200, Reindl Harald wrote: Am 08.10.2014 um 00:49 schrieb Eric Cunningham: Is there a way to configure URIBL_RHS_DOB conditionally such that if there are issues with dob.sibl.support-intelligence.net like we're seeing, that associated scoring remains neutral rather than increasing (or decreasing)? not really - if you get the response from the DNS - well, you are done the only exception are dnslists which stop answering if you exceed the free limit but in that case they answer with a different response which is caught by the rules Exceeding the free usage limit is totally different from the recent DOB listing the world issue. Also, exceeding the limit is handled differently in lots of ways. It ranges from specific limit exceeded results, up to listing the world at the hostile end or in extreme situations to finally get the admin's attention. It also includes simply no results other than NXDOMAIN, which is hard to distinguish from proper operation in certain low-listing conditions.
Re: recent channel update woes
On Tue, 2014-10-07 at 16:37 -0700, Dave Warren wrote: If you're paranoid, you can monitor the DNSBLs that you use via script (externally from SpamAssassin) and generate something that reports to you when there's a possible issue. If you're really paranoid, you can have it write a .cf that would 0 out the scores, but I assure you that you'll spend more time building, testing and maintaining such a system than it's worth in the long run, in my experience it's better to just page an admin. I monitor positive and negative responses, for IP-based DNSBLs, I use the following by default: 127.0.0.1 should not be listed. 127.0.0.2 should be listed. Depending on how the DNSBL implements such static test-points, they might not be affected by the issue causing the false listings. Similarly, domains likely to appear on exonerate lists (compare uridnsbl_skip_domain e.g.) might also not be affected. For paranoid monitoring, low-profile domains that definitely do not and will not match the listing criteria might be better suited for the task. $MYIP should not be listed. Obviously these need to be tweaked and configured per-list, not all lists list 127.0.0.2, and some lists use status codes, so "should not be listed" and "should be listed" are really match/do-not-match some condition. In the case of DNSWL, $MYIP should be listed, if I get de-listed, I want to know about that too.
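For IP-based lists like the test points above, the query name is formed by reversing the address octets and appending the list's zone. A small sketch of that construction (the zone name is illustrative, not a real list):

```python
def dnsbl_query_name(ip, zone):
    """Build the standard IP-based DNSBL query name: the IPv4 octets
    reversed, with the blocklist zone appended."""
    return '.'.join(reversed(ip.split('.'))) + '.' + zone

# The two static test points mentioned above, against a made-up zone:
zone = 'bl.example.net'
print(dnsbl_query_name('127.0.0.1', zone))  # should NOT be listed
print(dnsbl_query_name('127.0.0.2', zone))  # should be listed
```

A monitoring script would resolve both names on a timer and alert (or, per Dave's caveat, just page an admin) when either expectation fails.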
Administrivia (was: Re: recent channel update woes)
On Mon, 2014-10-06 at 13:36 -0400, Kevin A. McGrail wrote: On 10/6/2014 1:23 PM, Kevin A. McGrail wrote: On 10/6/2014 1:11 PM, Jason Goldberg wrote: How do I get removed from this stupid list. I love being spammed by a list about spam which I did not sign up for. Email users-h...@spamassassin.apache.org and the system will mail you instructions. If you did not sign up for the list, that is very troublesome and we can ask infrastructure to research but I believe we have a confirmation email requirement to get on the list. First of all: Jason's posts are stuck in moderation. The sender address he uses is not the one he subscribed with. Sidney and I (both list moderators) have been contacting Jason off-list with detailed instructions on how to find the subscribed address and offering further help. Obviously we take this very seriously as anti-spammers because the definition I follow for spam is it's about consent not content. If you didn't consent to receive these emails, we have a major issue. The list server requires clear and active confirmation of the subscription request by mail, validating both the address as well as consent. I've confirmed we have a confirmation email process in place that requires the subscribee to confirm the subscription request. And I believe this has been in place for many years. So if you did not subscribe to the list or confirm the subscription, you may need to check if your email address credentials have been compromised as that's the second most likely scenario for the cause beyond an administrator adding you directly. Karsten, any thoughts other than if a list administrator added them directly? Have infrastructure check the records for when and how the subscriber was added? Open a ticket with Google? He has not been added by a list administrator. Without the subscribed address, there is absolutely nothing we can do. I grepped the subscription list and transaction logs for parts of Jason's name and company. 
The address in question is entirely different. Just to give some answers. This issue should further be handled off-list.
Re: SpamAssassin false positive bayes with attachments
On Mon, 2014-10-06 at 09:03 -0400, jdime abuse wrote: I have been seeing some issues with bayes detection from base64 strings within attachments causing false positives. Example: Oct 6 09:02:14.374 [15869] dbg: bayes: token 'H4f' = 0.71186828264 Oct 6 09:02:14.374 [15869] dbg: bayes: token 'wx2' = 0.68644662127 Oct 6 09:02:14.374 [15869] dbg: bayes: token 'z4f' = 0.68502147581 Oct 6 09:02:14.378 [15869] dbg: bayes: token '0vf' = 0.66604823748 Is there a solution to prevent triggering bayes from the base64 data in an attachment? It was my impression that attachments should not trigger bayes data, but it seems that it is parsing it as text rather than an attachment. Bayes tokens are basically taken from rendered, textual body parts (and mail headers). Attachments are not tokenized. Unless the message's MIME structure is severely broken, these tokens appear somewhere other than a base64 encoded attachment. Can you provide a sample uploaded to a pastebin?
Re: running own updateserver
On Wed, 2014-10-01 at 13:19 +0200, A. Schulze wrote: Hello, I had the idea to run my own updateserver for two purposes: 1. distribute own rules 2. override existing rules But somehow I fail on #2. SA rules normally reside in /var/.../spamassassin/$SA-VERSION/channelname/*.cf Also there are files /var/.../spamassassin/$SA-VERSION/channelname.cf including the real files in channelname/ Now I had some rules overriding existing SA rules in /etc/mail/spamassassin/local.cf These rules I moved to my own channelname and now the defaults from updates_spamassassin_org are active again. My guess: rules are included in lexical order from Correct. /var/.../spamassassin/$SA-VERSION/channelname.cf and my new channel spamassassin_example_org is *not after* updates_spamassassin_org I proved my guess by renaming the channelfiles to z_spamassassin_example_org (adjusted the .cf + include also) Immediately the intended override was active again. Is my guess right? Yes. If so, any (other than renaming the channel) chance to modify the order? No. The directory name and accompanying cf file are generated by sa-update based on the channel name. There is no way for the channel to enforce order. Besides picking a channel name that lexicographically comes after the to-be-overridden target channel, you're limited to local post sa-update rename or symlink hacks with additional maintenance cost.
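The lexical-ordering behavior discussed above is easy to demonstrate. Since the channel .cf files are included in sorted order and later definitions win, the channel whose generated filename sorts last gets the final word (filenames below are the ones from the thread):

```python
# Channel .cf files are included in lexical (sorted) order; later
# definitions override earlier ones.

before = ['spamassassin_example_org.cf', 'updates_spamassassin_org.cf']
print(sorted(before))
# 's' < 'u': the custom channel loads FIRST, so stock rules override it.

after = ['updates_spamassassin_org.cf', 'z_spamassassin_example_org.cf']
print(sorted(after))
# 'z' > 'u': the renamed custom channel loads LAST and overrides stock rules.
```

Which is exactly why the z_ rename in the post above made the override take effect.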
Re: bad local parts (thisisjusttestletter)
On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote: i recently found thisisjusttestletter@random-domain as sender as well as thisisjusttestletter@random-of-our-domains as RCPT in my logs and remember that crap for many years now Surely, SA would never see that message, since that's not an actual, valid address at your domain. And you're not using catch-all, are you? (Yes, that question is somewhere between rhetorical and sarcastic.) well, postfix access maps after switch away from commercial appliances - are there other well-known local-parts to add to this list? What would you need a blacklist of spammy address local parts for? Do not accept messages to SMTP RCPT addresses that don't exist. Do not use catch-all. Problem solved... Other than that, this is an OT postfix question.
Re: bad local parts (thisisjusttestletter)
On Sun, 2014-10-05 at 01:53 +0200, Reindl Harald wrote: On 05.10.2014 at 01:41, Karsten Bräckelmann wrote: On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote: i recently found thisisjusttestletter@random-domain as sender as well as thisisjusttestletter@random-of-our-domains as RCPT in my logs and remember that crap for many years now Surely, SA would never see that message, since that's not an actual, valid address at your domain. And you're not using catch-all, are you? (Yes, that question is somewhere between rhetorical and sarcastic.) but thisisjusttestletter@random-domain is a valid address in his domain until you prove the opposite with sender-verification and its drawbacks Correct. And it is unsafe to assume any given address local part could not possibly be valid and used as sender address in ham. If at all, such tests should be assigned a low-ish score, not used in SMTP access map blacklisting. However, I seriously doubt it's actually worthwhile to maintain such rules. well, postfix access maps after switch away from commercial appliances - are there other well-known local-parts to add to this list? What would you need a blacklist of spammy address local parts for? Do not accept messages to SMTP RCPT addresses that don't exist. Do not use catch-all. Problem solved... don't get me wrong but you missed the 'i recently found thisisjusttestletter@random-domain' as sender at the start of my post As sender, continued by as well as [...] as RCPT using the exact same local part. So you just found one such instance in your logs. And yes, I have seen that very address local part, too, occasionally. Although only in SMTP logs and AFAIR never ever in SMTP accepted spam, let alone FNs, because just like your sample, they always sported a similarly invalid RCPT address. Did you ever see this in MAIL FROM with a *valid* RCPT TO address? And did it end up scored low-ish? Below 15? Otherwise, it's just not worth it. 
Re: bad local parts (thisisjusttestletter)
On Sun, 2014-10-05 at 02:43 +0200, Reindl Harald wrote: On 05.10.2014 at 02:27, Karsten Bräckelmann wrote: On Sun, 2014-10-05 at 01:53 +0200, Reindl Harald wrote: On 05.10.2014 at 01:41, Karsten Bräckelmann wrote: On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote: i recently found thisisjusttestletter@random-domain as sender as well as thisisjusttestletter@random-of-our-domains as RCPT in my logs and remember that crap for many years now Surely, SA would never see that message, since that's not an actual, valid address at your domain. And you're not using catch-all, are you? (Yes, that question is somewhere between rhetorical and sarcastic.) but thisisjusttestletter@random-domain is a valid address in his domain until you prove the opposite with sender-verification and its drawbacks Correct. And it is unsafe to assume any given address local part could not possibly be valid and used as sender address in ham. most - any excludes that one honestly I would agree, gladly. If only I would not have these pictures in my head of an admin creating that as a deliverability testing address. Same ball park as a Subject of "test". I almost can hear his accent... If at all, such tests should be assigned a low-ish score, not used in SMTP access map blacklisting. However, I seriously doubt it's actually worthwhile to maintain such rules. agreed - i only asked if there are known other local parts of that sort because i noticed that one at least 5 years ago as annoying Annoying? That was before using SA and with using catch-all, right? So it was annoying back then. Doesn't explain why you're chasing it today. How many of them can you find in your logs? Even including its variants (e.g. atall appended), I assume the total number to be really low. And, frankly, exclusively existent in SMTP logs rejecting the message. Unless there still is catch-all in effect, that should have been axed some 10 years ago. 
Re: Custom rule not hitting suddenly?
On Mon, 2014-09-08 at 11:35 -0600, Amir Caspi wrote: One of my spammy URI template rules is, for some reason, not hitting any more. Spample here: http://pastebin.com/jy6WZhWW In my local.cf sandbox I have the following: uri __AC_STOPRANDDOM_URI1 /(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b/ This is part of my AC_SPAMMY_URI_PATTERNS meta rule, which hits just fine on other emails (including others of this particular format). Debug output shows this subrule didn't hit anything (that is, the rule isn't mentioned at all in the debug output), but regexpal.com says it should have hit just fine. Works for me. Pulled the sample from pastebin and fed it to spamassassin -D with your custom rule added as additional configuration. That rule hits. Could the problem be with the \b delimiter at the end? No. The word-boundary \b does not match only between a word \w and non-word \W char, but also at the beginning or end of the string, if the adjacent char is a word char. I've noticed that sometimes can cause issues in failing to hit, but usually only when a URI ends with a slash... That, too, would be unrelated to the \b word-boundary. What bothers me is that "sometimes" qualification. Either it matches or it doesn't. If it matches sometimes, something yet unnoticed has a severe impact. Did you grep the -D debug output for the hostname? Also try grepping for URIHOSTS (SA 3.4, without -L local-only mode), which lists all hostnames found in the message. and this same rule hits other matching URIs in other spams. However, this isn't the first time I've noticed a failure to match... so any idea why it's not hitting? Per the regex rules, it SHOULD be hitting fine unless it's the \b... Any ideas? The URI is at the very end of a line with a CRLF delimiter following and the next line beginning with a word character. If you inject a space after the URI, does that make the rule match? 
(That should not be the issue, just trying to rule out conversion problems.) I also noticed the headers are CRLF-delimited, too. How did you get that sample? Any chance it has been modified or re-formatted by a text editor and does not equal the raw, original message? Does the pastebin-uploaded file still not trigger the rule for you?
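The \b behavior claimed above is easy to verify. The sketch below mirrors the rule's structure in Python (shortened alternations, same shape; Perl's \b semantics match here), using the hostname from the thread:

```python
import re

# Structural equivalent of the uri subrule from the thread:
pattern = re.compile(r'(?:stop|halt|out|exit)\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b')

# \b matches at the very end of the string when the last char is a word char:
print(bool(pattern.search('http://out.dosearchcarsonsale.us')))

# ...and before a CRLF, since \r is a non-word character, so a URI at
# end-of-line is not the problem:
print(bool(pattern.search('http://out.dosearchcarsonsale.us\r\nNext line')))

# A trailing slash also leaves the boundary intact ('s' | '/'):
print(bool(pattern.search('http://out.dosearchcarsonsale.us/')))
```

All three searches match, supporting the point that \b at the end of the rule is not what makes it fail to hit.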
Re: Custom rule not hitting suddenly?
On Mon, 2014-09-08 at 18:08 -0600, Amir Caspi wrote: On Sep 8, 2014, at 4:09 PM, Karsten Bräckelmann guent...@rudersport.de wrote: Pulled the sample from pastebin and fed it to spamassassin -D with your custom rule added as additional configuration. That rule hits. It does not hit on mine, and I think I've figured out why. I'm using SA 3.3.2 with perl 5.8.8 on CentOS 5.10. Yes, I know I should be using 3.4, but I haven't yet had a chance to try the RPM that a couple of people have built. Nonetheless, with SA 3.3.2, it appears that the URI engine doesn't like the .club TLD. See below. Good one. Yes, it's the TLD. Sep 8 20:02:58.897 [9267] dbg: rules: ran uri rule AC_ALL_URI == got hit: negative match So, for some reason, the URI engine is not picking out these .club URIs, it's getting negative match. Is it because the engine in 3.3.2 doesn't like that TLD? To test this, I manually changed the TLD of the second spam URI (out.blah) to .us or .org, and then the engine picked it out just fine: At the time of the 3.3.2 release, the .club TLD simply didn't exist. It has been accepted by IANA just recently. Of course I was conveniently using a trunk checkout for testing and kind of shrugged off that TLD in question. FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes, that is a *recent* TLD addition... *sigh* Sep 8 20:03:43.151 [9197] dbg: rules: ran uri rule AC_ALL_URI == got hit: "http://out.dosearchcarsonsale.us" Sep 8 20:04:35.578 [9227] dbg: rules: ran uri rule AC_ALL_URI == got hit: "http://out.dosearchcarsonsale.org" So, it seems to me that the URI engine is barfing on the TLD, and that's the problem... Is there a patch I can apply that would fix this, until I can upgrade to 3.4? SVN revision 1615088. The "text changed" link shows the diff and has a link to the plain patch. http://svn.apache.org/viewvc?view=revision&revision=1615088 Dunno if the patch applies cleanly to 3.3.2, though. You also can change M::SA::Util::RegistrarBoundaries manually. 
As per the svn diff above, two blobs are involved: (a) the VALID_TLDS hash foreach() definition and (b) the VALID_TLDS_RE. So you could get those out of trunk and edit RegistrarBoundaries.pm locally. It should also be possible to simply replace that Perl module with the current trunk version. And last but not least, generation of both these TLD blobs is documented in the code right before their definition. You can always generate them fresh.
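The mechanism at play here can be illustrated with a toy model (this is not SA's actual code): URI extraction is gated on a hard-coded set of valid TLDs, so a hostname whose TLD is unknown never becomes a URI for rules to match in the first place.

```python
# Toy model of TLD-gated URI extraction. The TLD set is a tiny stand-in
# for the hard-coded VALID_TLDS of a 3.3.2-era release, which predates
# the .club TLD.

VALID_TLDS = {'com', 'net', 'org', 'us', 'me'}

def host_recognized(host, valid_tlds):
    """A host only counts as a URI candidate if its TLD is known."""
    return host.rsplit('.', 1)[-1].lower() in valid_tlds

print(host_recognized('out.dosearchcarsonsale.us', VALID_TLDS))    # extracted
print(host_recognized('out.dosearchcarsonsale.club', VALID_TLDS))  # invisible

# What the r1615088 patch effectively does -- extend the valid set:
print(host_recognized('out.dosearchcarsonsale.club', VALID_TLDS | {'club'}))
```

This is why changing the TLD of the test URI to .us or .org made the engine "pick it out just fine": the rule regex was never the problem.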
Valid TLDs (was: Re: Custom rule not hitting suddenly?)
Some discussion of the underlying issue. On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote: At the time of the 3.3.2 release, the .club TLD simply didn't exist. It has been accepted by IANA just recently. Of course I was conveniently using a trunk checkout for testing and kind of shrugged off that TLD in question. FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes, that is a *recent* TLD addition... *sigh* Unlike the util_rb_[23]tld options, the set of valid TLDs is actually hard-coded. It would not be a problem to make that an option, too. Which, on the plus side, would make it possible to propagate new TLDs via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0 instances. Plus, it would be generally faster anyway. There is one down side: A new dependency on Regexp::List [1]. The RE pre-compile one-time upstart penalty should be negligible. The question is: Is it worth it? WILL it be worth it? This incident is part of the initial round of IANA accepting generic TLDs. There's hundreds in this wave, and some are abused early. This is moonshine registration, nothing like new TLDs being accepted in the coming years. Or is it? Will new generic TLDs in the future be abused like that, too? How frequently will that happen? Is it worth being able to react to it quickly? How long will URIBLs take to list them? How long will it take for the average MUA to even linkify them? Opinions? Discussion in here, or should I move this to dev? I guess I'd be happy to introduce to you... util_rb_tld. [1] Well, or a really, really f*cking ugly option that takes a pre-optimized qr// blob containing the VALID_TLDS_RE.
Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)
On Mon, 2014-09-08 at 22:15 -0400, Daniel Staal wrote: --As of September 9, 2014 3:45:33 AM +0200, Karsten Bräckelmann is alleged to have said: This incident is part of the initial round of IANA accepting generic TLDs. There's hundreds in this wave, and some are abused early. This is moonshine registration, nothing like new TLDs being accepted in the coming years. Or is it? Will new generic TLDs in the future be abused like that, too? How frequently will that happen? Is it worth being able to react to it quickly? How long will URIBLs take to list them? How long will it take for the average MUA to even linkify them? Opinions? Discussion in here, or should I move this to dev? --As for the rest, it is mine. New TLDs will always be abused... And old ones. TK, re-naming the web. Yes, sometimes it is valid to add a point or two for the mere occurrence of a TLD in a URI. For how long? Whoever applied for new generic $tld put about 180 grand up the shelve. How much is it worth them to prevent spammers from tasting domains and actually turn their investment into serious customers paying bucks? Anyway, personal opinion: Spamassassin is currently structured to have code and rules as separate things. Putting this in the code blurs that - it's a rule. Unless there is a major performance penalty, I would move it to be with the rest of the rules. It should make maintenance easier and clearer. It is and would not be a rule as you stated, but configuration. Apart from that nitpick, I understand you would be in favor of a Valid TLD option, rather than hard-coded. Noted.
Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)
On Mon, 2014-09-08 at 22:37 -0400, listsb-spamassas...@bitrate.net wrote: On Sep 8, 2014, at 21.45, Karsten Bräckelmann guent...@rudersport.de wrote: Some discussion of the underlying issue. On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote: At the time of the 3.3.2 release, the .club TLD simply didn't exist. It has been accepted by IANA just recently. Of course I was conveniently using a trunk checkout for testing and kind of shrugged off that TLD in question. FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes, that is a *recent* TLD addition... *sigh* Unlike the util_rb_[23]tld options, the set of valid TLDs is actually hard-coded. It would not be a problem to make that an option, too. Which, on the plus side, would make it possible to propagate new TLDs via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0 instances. Plus, it would be generally faster anyway. There is one down side: A new dependency on Regexp::List [1]. The RE pre-compile one-time upstart penalty should be negligible. The question is: Is it worth it? WILL it be worth it? pardon my possible technical ignorance here - could this potentially be a network test, rather than a list propagated by sa-update? e.g. query dns for existence of delegation? This cannot be queried for. Because the Valid TLDs (code|option) is what is used to identify URIs in the first place, even from plain text links any normal MUA would linkify. Apart from that, the list of generic TLDs is not going to change *that* frequently, that the few days sa-update takes between IANA acceptance, SA incorporating it, and first occurrence in mail would make a difference. And as I hinted at before, (new) generic TLD owners have a vital interest in their TLD not be mostly abused. If it is, it's not worth the investment.
Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)
On Mon, 2014-09-08 at 21:45 -0500, Dave Pooser wrote: On 9/8/14 8:45 PM, Karsten Bräckelmann guent...@rudersport.de wrote: There is one down side: A new dependency on Regexp::List [1]. The RE pre-compile one-time upstart penalty should be negligible. [1] Well, or a really, really f*cking ugly option that takes a pre-optimized qr// blob containing the VALID_TLDS_RE. I may be biased as I've been dealing with a different CPAN dependency flustercluck recently (love maintainers who can't be bothered to update the version info so CPAN doesn't realize there's an update and I have to manually un/re install), but I vote for the hideously ugly preoptimized blob over adding a new dependency. That said, I'd rather have the new dependency than keep the configuration embedded in the rules. ^ Code, not rules. Which basically is the issue here... So, in order of preference: 1) Pre-optimized blob 2) Regexp::List dependency 3) Current method Got ya. Both (1) and (2) would require code changes, so it's 3.4.1+ only anyway. Thanks.
Re: shouldn't spamc -L spam always create BAYES_99?
On Sun, 2014-09-07 at 09:09 +1200, Jason Haar wrote:
> We've got a problem with a tonne of spam getting BAYES_50 or even
> BAYES_00. We're re-training SA using "spamc -L spam" but it doesn't seem
> to do as much as we'd like. Sometimes it doesn't change the BAYES_
> score, and other times it might go from BAYES_50 to BAYES_80.
>
> I think bayes is working (there's also a tonne of mail getting BAYES_99)
> but I'm guessing there's some learning logic I'm not aware of to explain
> why me telling SA "this is spam" doesn't seem to be entirely listened to?

The Bayesian classifier operates on tokens, not messages. So while
training a message as spam is like "this is spam" as you put it,
according to Bayes it's "these tokens appear in spam".

For each token (think of it as words), the number of ham and spam they
appeared in and have been learned from are counted. The higher that ratio
is, the higher the probability of a message to be the same classification
for any given token found in later mail.

> So my question is: shouldn't "-L spam"/"-L ham" always make SA re-train
> the bayes more explicitly? Or is that really not possible with a single
> email message? (ie it's a statistics thing). Just trying to understand
> the backend :-)

It's statistics. Learning (increasing the number of ham or spam a token
has been seen in) has less effect for tokens seen about equally
frequently in both ham and spam than if there already is a bias.
Similarly, tokens with high counts need more training to change overall
probability than tokens less common in mail. IOW, words like "and" will
never be a strong spammyness indicator.

For more details on that entire topic of Bayes and training, I suggest
the sa-learn man page / documentation. For a closer look at the tokens
used for classification, see the hammy/spammytokens Template Tags in the
M::SA::Conf docs. Both available here:
http://spamassassin.apache.org/doc/

For ad-hoc debugging after training, see the spamassassin --cf option to
add_header the token details without a need to actually add them to
every mail.
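The per-token ratio described above can be sketched in a few lines. This is the general idea only, not SA's exact Bayes math (SA combines Robinson-style per-token estimates with a chi-squared test):

```python
def token_probability(spam_count, ham_count, nspam, nham):
    # Fraction of learned spam vs. learned ham containing this token,
    # normalized by corpus size; 0.5 means "no evidence either way".
    s = spam_count / max(nspam, 1)
    h = ham_count / max(nham, 1)
    return s / (s + h) if (s + h) else 0.5
```

A token seen in 90 of 100 spam but only 10 of 100 ham yields 0.9 (spammy); a word like "and", seen equally often in both, stays pinned at 0.5 no matter how much you train -- which is why a single training run often barely moves the BAYES_xx bucket.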
Re: Bayes autolearn questions
Please use plain text rather than HTML, in particular with that really
bad indentation format of quoting.

On Sat, 2014-09-06 at 17:22 -0400, Alex wrote:
> On Thu, Sep 4, 2014 at 1:44 PM, Karsten Bräckelmann wrote:
> > On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
> > > I looked in the quarantined message, and according to the _TOKEN_
> > > header I've added:
> > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > Isn't that sufficient for auto-learning this message as spam?
> >
> > That's clearly referring to the _TOKEN_ data in the custom header, is
> > it not?
>
> Yes. Burning the candle at both ends. Really overworked.

Sorry to hear. Nonetheless, did you take the time to really understand my
explanations? It seems you sometimes didn't in the past, and I am not
happy to waste my time on other people's problems if they aren't
following thoroughly.

> > That has absolutely nothing to do with auto-learning. Where did you
> > get the impression it might?
>
> If the conditions for autolearning had been met, I understood that it
> would be those new tokens that would be learned.

Learning is not limited to new tokens. All tokens are learned, regardless
of their current (h|sp)ammyness. Still, the number of (new) tokens is not
a condition for auto-learning. That header shows some more or less nice
information, but in this context absolutely irrelevant information.

> I understood "new" to mean the tokens that have not been seen before,
> and would be learned if the other conditions were met.

Well, yes. So what? Did you understand that the number of previously not
seen tokens has absolutely nothing to do with auto-learning? Did you
understand that all tokens are learned, regardless of whether they have
been seen before? This whole part is entirely unrelated to auto-learning
and your original question.

Auto-learning in a nutshell: Take all tests hit. Drop some of them with
certain tflags, like the BAYES_xx rules. For the remaining rules, look up
their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to a
total, and compare with the auto-learn threshold values. For spam, also
check there are at least 3 points each by header and body rules. Finally,
if all that matches, learn.

> Is it important to understand how those three points are achieved or
> calculated?

In most cases, no, I guess. Though that is really just a distinction
usually easy to make based on the rule's type: header vs body-ish rule
definitions. If the re-calculated total score in scoreset 0 or 1 exceeds
the auto-learn threshold but the message still is not learned -- then it
is important. Unless you trust the auto-learn discriminator to not cheat
on you.

> Okay, of course I understood the difference between points and tokens.
> Since the points were over the specified threshold, I thought those new
> tokens would have been added.

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score using the respective
non-Bayes scoreset.

> Very helpful, thanks. Is there a way to see more about how it makes
> that decision on a particular message?

  spamassassin -D learn

Unsurprisingly, the -D debug option shows information on that decision.
In this case, limiting debug output to the 'learn' area comes in handy,
eliminating the noise. The output includes the important details like the
auto-learn decision with a human-readable explanation, the score computed
for autolearn, as well as head and body points.
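The nutshell above can be sketched as code. This is a hedged simplification of the spam side of the discriminator, not SA's actual implementation; each hit is modeled as (non-Bayes score, rule type, tflags), and the scores come from scoreset 0 or 1, never from the message's reported total:

```python
def should_autolearn_spam(hits, threshold=9.0):
    # Drop rules whose tflags exclude them from the decision
    # (e.g. the BAYES_xx rules carry "tflags learn").
    eligible = [h for h in hits
                if "learn" not in h[2] and "noautolearn" not in h[2]]
    total = sum(s for s, _, _ in eligible)
    header_pts = sum(s for s, t, _ in eligible if t == "header")
    body_pts = sum(s for s, t, _ in eligible if t == "body")
    # Spam additionally needs >= 3 points each from header and body rules.
    return total >= threshold and header_pts >= 3.0 and body_pts >= 3.0
```

Note how a message can score far above the threshold overall and still fail -- e.g. when most points come from a Bayes rule that is dropped, or from body rules alone.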
Re: Large commented out body HTML causing SA to timeout/give up/allow spam
On Fri, 2014-09-05 at 11:55 -0400, Justin Edmands wrote:
> We are seeing a few emails that are about 1 MB and [...]
>
>   dbg: timing: total 46640 ms
>
> BUT, because the live test likely took 46 seconds, I think SA is giving
> up or something similar. The actual email run through the live SA
> instance shows no score at all.

If SA timed out, this would be reflected in your logs. Your guessing
suggests you did not check logs.

How are you passing messages to SA? Using spamc/d? With spamc, the size
limit of messages it will process is 500 kByte by default. Other methods
and glue are likely to have a size limit, too.

Odds are, that message simply has not been passed to SA.
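A quick way to reason about that default limit: spamc passes over-sized messages through unscanned, and its `-s` option raises the limit. The snippet below is an illustrative sketch (the message file and sizes are made up), not a diagnosis of this particular setup:

```shell
# Simulate a ~600 kB message; real mail would be an .eml file on disk.
MSG=$(mktemp)
head -c 600000 /dev/zero > "$MSG"

LIMIT=512000                      # spamc's default max message size, in bytes
SIZE=$(wc -c < "$MSG")

if [ "$SIZE" -gt "$LIMIT" ]; then
    echo "over spamc's default limit -- would pass unscanned; raise with e.g. spamc -s 2097152"
else
    echo "within limit -- spamc hands it to spamd"
fi
```

An unscanned message shows no score at all, which matches the symptom described above better than a timeout would.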
Re: Bayes autolearn questions
On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
> I looked in the quarantined message, and according to the _TOKEN_ header
> I've added:
> X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> Isn't that sufficient for auto-learning this message as spam?

That's clearly referring to the _TOKEN_ data in the custom header, is it
not? That has absolutely nothing to do with auto-learning. Where did you
get the impression it might?

> If the conditions for autolearning had been met, I understood that it
> would be those new tokens that would be learned.

Learning is not limited to new tokens. All tokens are learned, regardless
of their current (h|sp)ammyness. Still, the number of (new) tokens is not
a condition for auto-learning. That header shows some more or less nice
information, but in this context absolutely irrelevant information.

Auto-learning in a nutshell: Take all tests hit. Drop some of them with
certain tflags, like the BAYES_xx rules. For the remaining rules, look up
their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to a
total, and compare with the auto-learn threshold values. For spam, also
check there are at least 3 points each by header and body rules. Finally,
if all that matches, learn.

> Okay, of course I understood the difference between points and tokens.
> Since the points were over the specified threshold, I thought those new
> tokens would have been added.

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score using the respective
non-Bayes scoreset.
Re: A rule for Phil
On Thu, 2014-09-04 at 13:54 -0600, Philip Prindeville wrote:
> On Sep 3, 2014, at 7:36 PM, Karsten Bräckelmann
> <guent...@rudersport.de> wrote:
> > >   header __KAM_PHIL1  To =~ /phil\@example\.com/i
> > >   header __KAM_PHIL2  Subject =~ /(?:CV|Curriculum)/i
> >
> > Bonus points for using non-matching grouping. But major deduction of
> > points for that entirely un-anchored, case-insensitive 'cv' substring
> > match.
>
> I'd anchor both matches,

Generally correct, of course. For anchoring the To header regex, I
suggest using the To:addr variant I used in my rules. That way the
address easily can be anchored at the beginning /^ and end $/ of the
whole string, which equals the address. Without the :addr option, proper
anchoring is a real mess.

> or else amp...@example.community.org will fire.

Granted, the To header is cosmetic and does not necessarily hold the
actual recipient address. However, since example.com is the OP's domain
(so to speak), it is unlikely he'll receive mail with addresses like
that. ;)
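The anchoring point is easy to demonstrate outside SA. A Python sketch (SA rules escape the `@` for Perl's sake; plain Python regexes don't need that, and the lookalike address below is made up):

```python
import re

# Un-anchored, as in the original rule -- a substring match.
unanchored = re.compile(r"phil@example\.com", re.I)
# Anchored at both ends -- safe when matching against the bare address,
# which is what SA's To:addr pseudo-header provides.
anchored = re.compile(r"^phil@example\.com$", re.I)

addr = "phil@example.com"
lookalike = "phil@example.com.community.invalid"  # hypothetical lookalike

hit_un = bool(unanchored.search(lookalike))  # fires on the lookalike
hit_an = bool(anchored.search(lookalike))    # does not
```

Against the raw To header (display names, comments, multiple addresses), `^` and `$` would mis-anchor -- which is exactly why the To:addr variant is recommended above.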
Re: correct AWL on training
On Thu, 2014-09-04 at 09:11 -0600, Jesse Norell wrote:
> On Thu, 2014-09-04 at 13:04 +0200, Matus UHLAR - fantomas wrote:
> > On 03.09.14 15:13, Jesse Norell wrote:
> > > Both today and in the past I've looked at some FPs that scored very
> > > high on AWL. At least today I dug up the old messages that caused
> > > AWL to get out of line, and trained them as ham. AWL's scores still
> > > show the high scores on those (in this case I manually corrected
> > > AWL). It sure seems like manual training should at minimum remove
> > > the incorrect score from AWL, if not actually make an adjustment in
> > > the opposite direction.

I can see how one could wish for this. However, keep in mind those are
entirely unrelated sub-systems. The AWL really only is a rather simple
historic score-averager.

In this context it is also important to note that sa-learn is Bayes only.
Any other type of reporting is spamc or spamassassin, including AWL
manipulation. The spamassassin executable notably is the only one that
actually can handle both. The AWL manipulating options are rather
limited, offering addition of a high-scoring positive or negative entry,
or plain removal of an address. In particular, unlike Bayes, AWL doesn't
work on a per-message basis. Forgetting a single message's history entry
is not supported.

> > spamassassin has options for manipulating the address list:
> >   --add-to-whitelist
> >   --add-to-blacklist
> >   --remove-from-whitelist
> >   --add-addr-to-whitelist
> >   --add-addr-to-blacklist
> >   --remove-addr-from-whitelist
> > and you can clean up AWL by using sa-awl.
>
> I can as an admin, but pop/imap users can't. They can access the
> spam/ham training, it just doesn't correct the AWL data any. In this

So you implemented a feedback / training mechanism for Bayes for your POP
or IMAP users. SA doesn't provide it.

> case I'm looking at, a few messages came in first that got AWL way off,
> and now training it as ham (which is hard enough to get users to do)
> doesn't help the situation. (Some of our systems allow the user access
> to the whitelist, but unfortunately this one doesn't - they can't fix
> it.)

Bayes training will have an effect of ~5.5 at max, which is the extreme
between BAYES_00 and 999. The real-life effect of training is commonly
about half of that max. This is likely to not suffice for way-off AWL
scores. Besides, you're trying to correct AWL by Bayes training.

The question is: Why was the AWL score way off in the first place? In
your FP case, why have (more than one?) messages from that sender
address, originating from a given net-block, been classified spam before?
Even worse, given AWL now was way off and pulled the score above
threshold, the previous messages recorded in AWL are not just spam, but
spam with a high score. Again, why?

> > > Ie. after training, AWL had a score of ~47 from 7 messages. Seems
> > > like those FP scores should be subtracted, and even another -5 per
> > > message trained wouldn't hurt. Likewise, FN should adjust AWL
> > > upwards on manual training, no?
> >
> > I am not sure how the manual training should be done when talking
> > about AWL. The only way I can think of is to remove the address from
> > AWL.
>
> Just adjusting the score would be another option. "AWL, you got it
> wrong, let's take the score the other direction." (or at least undo the
> mistake/damage it just did) You could have a config option for how much
> adjustment to make in the other direction (maybe 3 to 5ish?).
Re: correct AWL on training
On Fri, 2014-09-05 at 01:05 +0200, Karsten Bräckelmann wrote:
> The AWL manipulating options are rather limited, offering addition of a
> high-scoring positive or negative entry, or plain removal of an address.
> In particular, unlike Bayes, AWL doesn't work on a per-message basis.
> Forgetting a single message's history entry is not supported.

In related news: The AWL plugin was enabled by default in 3.1 and 3.2,
and disabled by default again since 3.3. TxRep is a proposed replacement
(see bugzilla). It might be worth evaluating whether it better addresses
the features you'd benefit from in this case, including forgetting or
correcting per-message entries. Since it still is under development, even
feature requests or discussing these issues for TxRep might be worth it.
Re: A rule for Phil
On Wed, 2014-09-03 at 12:30 +0200, Luciano Rinetti wrote:
> Thank you for the answer, Karsten. You are right, Phil doesn't exist
> (nor does example.com), but I hid the real address for obvious reasons.
> It is a role email that should receive only mail with subject CV or
> Curriculum, and all the general mail is to be treated and scored as
> spam. My intentions are not top secret; I would be glad even if you only
> pointed me to the SA conf docs or the rule-writing wiki.

Let me google that for you. The first result should be the SA wiki
WritingRules page as a starter.
http://lmgtfy.com/?q=spamassassin+rule+writing

> Il 03/09/2014 05:21, Karsten Bräckelmann ha scritto:
> > On Mon, 2014-09-01 at 07:36 +0200, Luciano Rinetti wrote:
> > > I need a rule that, when a message is sent to p...@example.com and
> > > the Subject contains CV or Curriculum, scores the message with -9;
> > > and a rule that, when a message is sent to p...@example.com and the
> > > Subject doesn't contain CV or Curriculum, scores the message with 7.
> >
> > The specified criteria are trivial, and can be easily translated into
> > rules. Reading the SA conf docs and maybe some of the rule-writing
> > wiki docs should enable the reader to do exactly that. (Hint: meta
> > rules)

Oh well, here goes. Untested.

  header   __PHIL_TO    To:addr =~ /phil\@example\.com/i
  header   __PHIL_SUBJ  Subject =~ /\b(cv|curriculum)\b/i

  meta     PHIL_CURRICULUM  __PHIL_TO && __PHIL_SUBJ
  describe PHIL_CURRICULUM  CV for Phil
  score    PHIL_CURRICULUM  -2

  meta     PHIL_NOT_CURRICULUM  __PHIL_TO && !__PHIL_SUBJ
  describe PHIL_NOT_CURRICULUM  Not a CV for Phil
  score    PHIL_NOT_CURRICULUM  1

Do note though, that this approach is NOT fool-proof. Messages containing
a CV still can end up classified spam for various reasons.
Re: A rule for Phil
On Wed, 2014-09-03 at 17:18 -0400, Kevin A. McGrail wrote:
> On 9/3/2014 5:14 PM, Karsten Bräckelmann wrote:
> > The specified criteria are trivial, and can be easily translated into
> > rules. [...]
> >
> >   header   __PHIL_TO    To:addr =~ /phil\@example\.com/i
> >   header   __PHIL_SUBJ  Subject =~ /\b(cv|curriculum)\b/i
> >
> >   meta     PHIL_CURRICULUM  __PHIL_TO && __PHIL_SUBJ
> >   describe PHIL_CURRICULUM  CV for Phil
> >   score    PHIL_CURRICULUM  -2
> >
> >   meta     PHIL_NOT_CURRICULUM  __PHIL_TO && !__PHIL_SUBJ
> >   describe PHIL_NOT_CURRICULUM  Not a CV for Phil
> >   score    PHIL_NOT_CURRICULUM  1
>
> It appears I did not email the list my response, but it should provide
> an interesting exercise if only to see how similar our approach was:

Which isn't much of a surprise. It's practically the very translation of
the stated requirements into simple logic and regex header rules. ;)

>   header __KAM_PHIL1  To =~ /phil\@example\.com/i
>   header __KAM_PHIL2  Subject =~ /(?:CV|Curriculum)/i

Bonus points for using non-matching grouping. But major deduction of
points for that entirely un-anchored, case-insensitive 'cv' substring
match. (As a matter of principle, since that's a seriously short
substring match. Granted, that char combination is pretty rare in
dict/words.)
Re: Bayes autolearn questions
On Tue, 2014-09-02 at 21:11 -0400, Alex wrote:
> I have a spamassassin-3.4 system with the following bayes config:
>
>   required_hits 5.0
>   rbl_timeout 8
>   use_bayes 1
>   bayes_auto_learn 1
>   bayes_auto_learn_on_error 1
>   bayes_auto_learn_threshold_spam 9.0
>   bayes_expiry_max_db_size 950
>   bayes_auto_expire 0
>
> However, spam with scores greater than 9.0 aren't being autolearned:

http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

> Sep 2 21:01:51 mail01 amavis[25938]: (25938-10) header_edits_for_quar:
> bmu011...@bmu-011.hichina.com - bestd...@example.com, Yes, score=16.519
> tag=-200 tag2=5 kill=5 tests=[BAYES_50=0.8, KAM_LAZY_DOMAIN_SECURITY=1,
> KAM_LINKBAIT=5, LOC_DOT_SUBJ=0.1, LOC_SHORT=3.1,
> RCVD_IN_BL_SPAMCOP_NET=1.347, RCVD_IN_BRBL_LASTEXT=1.449,
> RCVD_IN_PSBL=2.3, RCVD_IN_UCEPROTECT1=0.01, RCVD_IN_UCEPROTECT2=0.01,
> RDNS_NONE=0.793, RELAYCOUNTRY_CN=0.1, RELAYCOUNTRY_HIGH=0.5,
> SAGREY=0.01] autolearn=no autolearn_force=no
>
> I've re-read the autolearn section of the docs,

The one I linked to above?

> and don't see any reason why this 16-point email wouldn't have any new
> tokens to be learned?

Rules with certain tflags are ignored when determining whether a message
should be trained upon. Most notably here, BAYES_xx. Moreover, the
auto-learning decision occurs using scores from either scoreset 0 or 1,
that is, using scores of a non-Bayes scoreset. IOW, the message's score
of 16 is irrelevant, since the auto-learn algorithm uses different scores
per rule.

The next safety net is requiring at least 3 points each from header and
body rules, unless autolearn_force is enabled. Which it is not in your
sample. Either of those could have prevented auto-learning.

Also, according to your wording, you seem to think in terms of (number
of) new tokens to be learned. Which has nothing in common with
auto-learning. (Even worse, new tokens would strongly apply to random
gibberish strings, hapaxes in Bayes context. Which are commonly ignored
in Bayes classification.)

> I looked in the quarantined message, and according to the _TOKEN_ header
> I've added:
> X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> Isn't that sufficient for auto-learning this message as spam?

That has absolutely nothing to do with auto-learning. Where did you get
the impression it might?

> I just wanted to be sure this is just a case of not enough new points
> (tokens?) for the message to be learned, and that I wasn't doing
> something wrong.

Points: aka score, used in the context of per-rule (per-test) and overall
score, classifying a message based on the required_score setting.

Token: think of it as a word, used by the Bayesian classifier sub-system.
In practice, it is more complicated than simply space-separated words.
Context (e.g. headers) and case might be taken into account, too.
Re: Bayes autolearn questions
On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:
> On 02 Sep 2014, at 19:11, Alex <mysqlstud...@gmail.com> wrote:
> > However, spam with scores greater than 9.0 aren't being autolearned:
>
> I believe the score threshold is the base score WITHOUT bayes. Try
> running the email through with a -D flag and see what you get. (And
> that is only a partial answer; the threshold number ignores certain
> classes of tests beyond bayes, but I don't remember which ones.) It's
> unfortunate that the learn_threshold_spam uses a number that appears to
> be related to the spam score, because it isn't.

It is. Using the accompanying, non-Bayes score-set. To avoid direct Bayes
self-feeding, and other rules' indirect self-feeding due to Bayes-enabled
scores.

BTW, if one knows of that mysterious (bayes_auto_)learn_threshold_spam
you mentioned, one has found the AutoLearnThreshold doc mentioning
exactly that: Bayes auto-learning is based on non-Bayes scores.
Re: A rule for Phil
On Mon, 2014-09-01 at 07:36 +0200, Luciano Rinetti wrote:
> I need a rule that, when a message is sent to p...@example.com and the
> Subject contains CV or Curriculum, scores the message with -9

Scoring the message with $number is impossible and not how SA works.
Triggering a rule with a negative score (e.g. -9) is possible.

> and a rule that, when a message is sent to p...@example.com and the
> Subject doesn't contain CV or Curriculum, scores the message with 7

Same. Won't score the message with 7, but can trigger a rule worth some
points.

The specified criteria are trivial, and can be easily translated into
rules. Reading the SA conf docs and maybe some of the rule-writing wiki
docs should enable the reader to do exactly that. (Hint: meta rules)

However, since this request is just too simple, and it is way too easy to
shoot one's own foot, I'll spend more time on this explanation than
simply dumping the requested flawed rules would take.

What are you actually after? What is your problem? And why would Phil
distinguish that strongly between Subject-tagged mail and general mail to
him? Sure, because it's not phil but a role account. But you chose to
disguise the purpose, so it's harder for us to help you. It's easier if
you don't try to hide your actual question.
Re: Bayes autolearn questions
On Tue, 2014-09-02 at 21:16 -0600, LuKreme wrote:
> On 02 Sep 2014, at 20:50, Karsten Bräckelmann <guent...@rudersport.de>
> wrote:
> > On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:
> > > I believe the score threshold is the base score WITHOUT bayes. Try
> > > running the email through with a -D flag and see what you get. (And
> > > that is only a partial answer; the threshold number ignores certain
> > > classes of tests beyond bayes, but I don't remember which ones.)
> > > It's unfortunate that the learn_threshold_spam uses a number that
> > > appears to be related to the spam score, because it isn't.
> >
> > It is. Using the accompanying, non-Bayes score-set. To avoid direct
> > Bayes self-feeding, and other rules' indirect self-feeding due to
> > Bayes-enabled scores.
> >
> > BTW, if one knows of that mysterious (bayes_auto_)learn_threshold_spam
> > you mentioned, one has found the AutoLearnThreshold doc mentioning
> > exactly that: Bayes auto-learning is based on non-Bayes scores.
>
> But that is not the case. You can have a score without bayes that
> exceeds the threshold and still have the message not auto-learned.

True. I chose to not repeat myself highlighting the details and
mentioning the constraint of header and body rules' points. See my other
post half an hour earlier to this thread. And the docs.
Re: Add spamassassin triggered rules in logs when email is blocked
On Fri, 2014-08-29 at 11:27 -0400, Karl Johnson wrote:
> I'm using amavisd-new-2.9.1 and SpamAssassin v3.3.1. I would like to
> know if it's possible to add the SpamAssassin triggered rules when an
> email is blocked, because I discard the email when it's spam and I want
> to know why it's blocked (which rules).

Wrong place, that is an Amavis question. SA does not reject, discard or
otherwise block mail. Amavis does, based on the SA score.

> For now I only have the score (hits) in maillog:
>
> Aug 24 04:04:36 relais amavis[3475]: (03475-08) Blocked SPAM
> {DiscardedInternal}, MYNETS LOCAL [205.0.0.0]:54459 [205.0.0.0]
> bluew...@zzz.zzz.ca - z...@zzz.ca, Message-ID:
> e1xlsmo-0002nt...@zz.zz.ca, mail_id: 4RZ-Vm0_iZmi, Hits: 13.573,
> size: 4269, 10089 ms

That log line is generated by Amavis. SA has no control of its contents.

> I would like to add in the logs, for example:
> DATE_IN_FUTURE_06_12=0.001, DCC_CHECK=4, SPF_PASS=-0.001,
> TVD_SPACE_RATIO=0.001
>
> Is that possible?
Re: Advice on how to block via a mail domain in maillog
On Fri, 2014-08-29 at 12:43 -0600, Philip Prindeville wrote:
> On Aug 29, 2014, at 6:45 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
> > On 8/29/2014 5:48 AM, emailitis.com wrote:
> > > I have a lot of Spam getting into our mail servers where the common
> > > thread is cloudapp
>
> You guys realize cloudapp.net is Microsoft Azure, don't you?
>
> > > And the hyperlinks in the emails are http://expert.cloudapp.net/.
> > > Please could you advise on how I can block by the information on the
> > > maillog on that, or using a rule which checks the URL to include the
> > > above thread?
> >
> > SA does not block. There is a new feature in trunk that I believe will
> > help you easily, called URILocalBL.pm
>
> That should do it. There's a configuration example in the bug, and POD
> documentation in the plugin, but in this particular case you'd do
> something like:
>
>   uri_block_cidr L_BLOCK_CLOUDAPP 191.237.208.246
>   body L_BLOCK_CLOUDAPP eval:check_uri_local_bl()

That seems an overly complicated variant of a simple uri regex rule. And
it really depends on the IP to match a URI? And on manually looking it
up?

  uri      URI_EXPERT_CLOUDAPP  m~^https?://expert\.cloudapp\.net(?:/|$)~
  describe URI_EXPERT_CLOUDAPP  URIs pointing to expert.cloudapp.net
  score    URI_EXPERT_CLOUDAPP  5.0

SA does not block. *sigh*
Re: remove_header not working?
On Fri, 2014-08-29 at 11:46 +0200, Axb wrote:
> Those reports are added by Exim's interface, which does not seem to
> respect the local.cf directives.

Exim accessing SA template tags?

> On 08/29/2014 11:29 AM, Fürtbauer Wolfgang wrote:
> > unfortunately not, X-Spam-Reports are still there

If the option report_safe 0 is set, SA automatically adds a Report
header, though only to spam. Equivalent:

  add_header spam Report _REPORT_

The following is not only added to ham, but its contents are not the
_REPORT_ template tag but resemble the default report template -- the
body text used for spam with report_safe 1. There is no template tag to
access the report template. Thus, this header must be defined somewhere
in the configuration, complete with all that text, embedded \n newlines
and _PREVIEW_ and _SUMMARY_ template tags.

> > X-Spam-Report: Spam detection software, running on the system
> >   hausmeister.intern.luisesteiner.at, has NOT identified this incoming
> >   email as spam. The original message has been attached to this so you
> >   can view it or label similar future email. If you have any
> >   questions, see postmaster for details.
> >   Content preview: [...]
> >   Content analysis details: (-221.0 points, 5.0 required)
> >    pts rule name         description
> >   ---- ----------------- -------------------------------------------
> >   -100 USER_IN_WHITELIST From: address is in the user's white-list
> >
> > X-Spam-Report: Software zur Erkennung von Spam auf dem Rechner
> >   aohsupport02.asamer.holding.ah

Are there really *two* X-Spam-Report headers? Also, why is this one in
German? SA doesn't mix languages during a single run. Why do the
hostnames differ? And, well, which hostmaster fat-fingered that ccTLD?
Re: Spam info headers
On Fri, 2014-08-29 at 00:30 -0400, Alex wrote:
> Regarding report_safe, the docs say it can only be applied to spam. Is
> that correct?

Yes, it only applies to spam. It defines whether classified spam will be
attached to a newly generated reporting message, or only modified by
adding some X-Spam headers. Ham will never get wrapped in another message
by SA...
Re: no subject tagging in case of X-Spam-Status: Yes
On Fri, 2014-08-29 at 12:02 +0200, Reindl Harald wrote: Am 29.08.2014 um 04:03 schrieb Karsten Bräckelmann: Now, moving forward: I've had a look at the message diffs. Quite interesting, and I honestly want to figure out what's happening. it looks really like spamass-milter is responsible in the second version below it whines it can't extract the score to decide if it's above reject and so it really looks like the milter heavily relies on headers Yay for case in-sensitive parsing... found that out much later last night by plaing with headers in general spamass-milter[14891]: Could not extract score from Yes: Score=5.7, Tag-Level=5.0, Block-Level=10 add_header all Status _YESNO_, score=_SCORE_, tag-level=_REQD_, block-level=10 add_header all Status _YESNO_, Score=_SCORE_, Tag-Level=_REQD_, Block-Level=10 If you use the SA default Status header, or at least the prefix containing score and required, is header rewriting retained by the milter without the Flag header? add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ ... Given that log line, a likely explanation simply is that the milter needs to determine the spam status, to decide which SA generated headers to apply to the message. Your choice of custom Status header is not what the milter expects, and thus needs to resort to the simple Flag header. (Note the comma after yes/no, but no comma between score and required.) First of all, minus all those different datetime strings, IDs and ordering, the real differences are -Subject: [SPAM] Test^M -X-Spam-Flag: Yes^M +Subject: Test^M So it appears that only the sample with add_header spam Flag has the Subject re-written. correct However, there's something else going on. When re-writing the Subject header, SA adds an X-Spam-Prev-Subject header with the original. Which is clearly missing. 
the version is killed in smtp_header_checks which is also the reason that i started to play around with headers nobody but me has a reason to know exact versions of running software Previous-Subject, not Version. I mentioned this specifically, because the absence of the Previous Subject header with Subject rewrite clearly shows that SA-generated headers are not unconditionally added to the message, but single headers are cherry-picked. IOW, header rewriting does work without the Flag header. It is the glue that decides whether to inherit the rewritten header, and outright ignores the Previous Subject header. Thus, something else has a severe impact on which headers are added or modified. In *both* cases, there is at least one SA-generated header missing and/or SA-modified header not preserved.
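For comparison, SA's stock Status header template reads approximately as follows (as of the 3.4 branch); a header of this shape is what the milter should be able to parse. Note the comma after _YESNO_, but no comma between score and required:

```
# SA's default Status header template (approximate)
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
```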
Re: formatting of report headers
On Thu, 2014-08-28 at 11:08 +0200, Reindl Harald wrote: is it somehow possible to get line-breaks in the report headers to have them better readable? SA inserts line-breaks by default, to keep headers below 80 chars wide. report_safe 0 clear_headers add_header spam Flag _YESNO_ add_header all Status _YESNO_, score=_SCORE_/_REQD_, tests=_TESTS_, report=_REPORT_ on the shell it looks like this What you get in the shell is precisely what SA returns -- to the shell or any other calling process. Any reformatting or re-flow of multiline headers has been done by other tools. X-Spam-Status: No, score=4.3/5.0, tests=ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,ALL_TRUSTED,BAYES_99,BAYES_999,DEAR_SOMETHING,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,LOTS_OF_MONEY,T_MONEY_PERCENT,URG_BIZ, report= * -2.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100% * [score: 1.] That long _TESTS_ string without line-breaks is due to the very long _REPORT_ in that header. If you add a dedicated Report header, the Status header and its list of tests will be wrapped appropriately, too. FWIW, SA even generates the Report header by default with your setting of report_safe 0. Not in your case, because you chose to clear_headers and manually define almost identical versions to the default headers. -- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
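A sketch of the suggestion above: moving the long report into a dedicated Report header lets SA wrap the Status header and its tests list appropriately. This mirrors the poster's configuration with _REPORT_ split out:

```
report_safe 0
clear_headers
add_header spam Flag _YESNO_
add_header all Status _YESNO_, score=_SCORE_/_REQD_, tests=_TESTS_
add_header all Report _REPORT_
```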
Re: Certain types of spam seem to get through SA
On Thu, 2014-08-28 at 09:15 -0600, LuKreme wrote: X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.covisp.net X-Spam-Level: * X-Spam-Status: No, score=1.7 required=5.0 tests=URIBL_BLACK autolearn=no version=3.3.2 X-Spam-Status: No, score=-0.0 required=5.0 tests=SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 Bayes and auto-learning are enabled, yet there are no BAYES_XX rules hit in either sample. Something seems broken. (Not a first time poster, so I just assume the Bayes DB isn't fresh.) -- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: formatting of report headers
On Thu, 2014-08-28 at 21:43 +0200, Reindl Harald wrote: Am 28.08.2014 um 19:11 schrieb Karsten Bräckelmann: FWIW, SA even generates the Report header by default with your setting of report_safe 0. Not in your case, because you chose to clear_headers and manually define almost identical versions to the default headers. no, it don't Yes, it does. Read my comment again, carefully. And see the docs, option report_safe in the section Basic Message Tagging Options. http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html -- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: formatting of report headers
On Thu, 2014-08-28 at 21:43 +0200, Reindl Harald wrote: Am 28.08.2014 um 19:11 schrieb Karsten Bräckelmann: FWIW, SA even generates the Report header by default with your setting of report_safe 0. Not in your case, because you chose to clear_headers and manually define almost identical versions to the default headers. More detail, in addition to my other reply. # header configuration fold_headers 1 report_safe 0 If this option is set to 0, [...]. In addition, a header named X-Spam-Report will be added to spam. -- M::SA::Conf docs X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_50,CUST_DNSBL_2, CUST_DNSBL_5,CUST_DNSWL_7,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,SPF_SOFTFAIL autolearn=disabled version=3.4.0 Not spam, no X-Spam-Report header. -- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Reporting to SpamCop
On Thu, 2014-08-28 at 16:14 -0500, Chris wrote: I'm having an issue with getting SA 3.4.0 when run as spamassassin -D -r to report spam to SpamCop. The errors I'm seeing are: Ignoring the Perl warnings for now. In my v310.pre file I have: loadplugin Mail::SpamAssassin::Plugin::SpamCop /usr/local/share/perl/5.18.2/Mail/SpamAssassin/Plugin/SpamCop.pm It should never be necessary to provide the (optional) filename argument with stock SA plugins. Even worse, absolute paths will eventually be harmful. I have set the SpamCop from and to addresses in the SpamCop.pm file: The Perl modules are no user-serviceable parts. Do not edit them. Moreover, the SpamCop plugin provides the spamcop_(from|to)_address options to set these in your configuration. See http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_SpamCop.html setting = 'cpoll...@example.com', setting = 'submit.exam...@spam.spamcop.net', Wait... What exactly did you edit? The only instances of 'setting' in SpamCop.pm are the ones used to register SA options. Did you replace the string spamcop_from_address with your email address? I have a gut feeling the Perl warnings will disappear, if you revert any modifications to the SpamCop.pm Perl module and set the options in your configuration instead... -- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
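A sketch of the configuration-based approach suggested above, instead of editing the Perl module. The addresses are placeholders; the options are documented in the SpamCop plugin docs linked above:

```
# v310.pre: load the plugin without the optional path argument
loadplugin Mail::SpamAssassin::Plugin::SpamCop

# local.cf: placeholder addresses, substitute your own
spamcop_from_address reporter@example.com
spamcop_to_address   submit.example@spam.spamcop.net
```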
Re: writing own rbl rules
On Fri, 2014-08-29 at 00:22 +0200, Reindl Harald wrote: the simple answer to my question would have been no, in no case SA does any RBL check if the client is from the same network range and there is no way to change that temporary even for development [...] That would have been simpler indeed, but that also would have been wrong. if there is no hop before and hence no received headers before there is still a known IP - the one and only and in that case the currently connection client - there is no reason not fire a DNSBL/DNSWL against that IP SA is not the SMTP server, it has no knowledge of the connection's remote IP. SA depends on the Received headers added by the internal network's SMTP server (or its milter) to get that information. Besides: SA is not an SMTP. It does not add the Received header. And it absolutely has to inspect headers, whether you like that or not. That is how SA determines exactly that last, trustworthy, physical IP. And for that, trusted and internal networks need be correct, so by extension external networks also are correct. and the machine SA is running on receiving the message adds that header which is in case of direct testing the one and only and so trustable Your configuration stated that machine is not trustable. In particular, your MX, your first internal relay, absolutely MUST be trusted by SA. That is the SMTP relay identifying the sending host, complete with IP and rDNS. again: the machine running SA *is the MX* Correct (even though it is irrelevant whether it is or not). So don't configure SA to not trust that machine, and include at the very least that IP in your trusted_networks. Your configuration stated that machine is not trustable. Received headers before that simply CANNOT be trusted. 
There is no way to guarantee the host they claim to have received the message from is legit in case running postfix with SA as milter *there are no* Received headers *before* because there is nobody before There almost always is at least one Received header before, the sender's outgoing SMTP server.
Re: no subject tagging in case of X-Spam-Status: Yes
On Fri, 2014-08-29 at 00:30 +0200, Reindl Harald wrote: besides the permissions problem after the nightly sa-update the reason was simply clear_headers without add_header spam Flag _YESNO_ which is entirely unexpected behavior No, that is not the cause. $ echo -e "Subject: Foo\n" | ./spamassassin | grep Subject Subject: [SPAM] Foo X-Spam-Prev-Subject: Foo $ cat rules/99_DEVEL.cf required_score -999  # regardless of score, classify spam, to enforce header rewriting clear_headers rewrite_header Subject [SPAM] Besides, your own reply to my first post to this thread on Mon also shows this claim to be false. The output of the command I asked you to run clearly shows clear_headers in your config being in effect and a rewritten Subject.
Re: writing own rbl rules
On Fri, 2014-08-29 at 01:06 +0200, Reindl Harald wrote: the question was just how can i enforce RBL tests inside the own LAN the question was just how can i enforce RBL tests inside the own LAN the question was just how can i enforce RBL tests inside the own LAN RBL tests cannot be enforced. Internal and trusted networks settings need to be configured correctly to match the RBL test's scope, in your case last-external. If there are trusted relays found in the Received headers, and the first trusted one's connecting relay is external (not in the internal_networks set), then an RBL test for last-external will be run. This is entirely unrelated to own LAN or network range. Received headers before that simply CANNOT be trusted. There is no way to guarantee the host they claim to have received the message from is legit in case running postfix with SA as milter *there are no* Received headers *before* because there is nobody before There almost always is at least one Received header before, the sender's outgoing SMTP server *no no no and no again* there is no Received header before because a botnet zombie don't use a outgoing SMTP server I said almost always, with direct-to-MX delivery being the obvious exception. Possible with botnet spam, yes, but too easy to detect. Thus, botnet zombies frequently forge Received headers. (Besides, in your environment SA won't see much botnet spam anyway. Spamhaus PBL as first level of defense in your Postfix configuration will reject most of them. But that's not the point here.) -- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
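As an illustration of the last-external scope discussed above, a custom DNSBL rule of that kind might look like the following sketch (zone and rule names are hypothetical; the -lastexternal suffix on the set name restricts the query to the last external relay):

```
# Query the DNSBL only for the last external relay's IP
header   RCVD_IN_MY_DNSBL eval:check_rbl('mybl-lastexternal', 'bl.example.net.')
describe RCVD_IN_MY_DNSBL Last external relay listed in local DNSBL
tflags   RCVD_IN_MY_DNSBL net
score    RCVD_IN_MY_DNSBL 2.0
```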
Re: no subject tagging in case of X-Spam-Status: Yes
On Fri, 2014-08-29 at 01:23 +0200, Reindl Harald wrote: Am 29.08.2014 um 01:20 schrieb Karsten Bräckelmann: On Fri, 2014-08-29 at 00:30 +0200, Reindl Harald wrote: besides the permissions problem after the nightly sa-update the reason was simply clear_headers without add_header spam Flag _YESNO_ which is entirely unexpected behavior No, that is not the cause. $ echo -e "Subject: Foo\n" | ./spamassassin | grep Subject Subject: [SPAM] Foo X-Spam-Prev-Subject: Foo $ cat rules/99_DEVEL.cf required_score -999  # regardless of score, classify spam, to enforce header rewriting clear_headers rewrite_header Subject [SPAM] Besides, your own reply to my first post to this thread on Mon also shows this claim to be false. The output of the command I asked you to run clearly shows clear_headers in your config being in effect and a rewritten Subject i verified that 20 times in my environment removing the line add_header spam Flag _YESNO_ and no tagging maybe the combination of spamass-milter and SA but it's fact So far I attributed most of your arguing to being stubborn and opinionated. Not any longer. Now you're outright lying.
Re: writing own rbl rules
On Fri, 2014-08-29 at 01:59 +0200, Reindl Harald wrote: Am 29.08.2014 um 01:51 schrieb Karsten Bräckelmann: On Fri, 2014-08-29 at 01:06 +0200, Reindl Harald wrote: the question was just how can i enforce RBL tests inside the own LAN RBL tests cannot be enforced. Internal and trusted networks settings need to be configured correctly to match the RBL test's scope, in your case last-external. If there are trusted relays found in the Received headers, and the first trusted one's connecting relay is external (not in the internal_networks set), then an RBL test for last-external will be run. This is entirely unrelated to own LAN or network range that may all be true for blacklists and default RBL rules it is no longer true in case of 4 internal WHITELISTS which you want to use to LOWER scores to reduce false positives while otherwise bayes may hit - such traffic can also come from the internal network There is absolutely no difference between black and whitelists. With the only, obvious exception of the rule's score. So, yes, it still is true in the case of (internal) whitelists. Besides that, you are (still) confusing SA *_networks settings with the local network topology. They are loosely related, but don't have to match. You can easily run RBL tests against IPs from within the local network and treat them like any other sending SMTP client, by (a) excluding them from the appropriate *_networks settings, and (b) define the RBL test accordingly. If you want to query for the last-external, it has to be the last external relay according to the configuration. BTW, unless the set of IPs to whitelist is permanently changing, it is much easier to write a negative score rule based on the X-Spam-Relays-* pseudo-headers. This also has the benefit of being highly flexible, not depend on trust borders and allow to maintain internal_networks matching the LAN topology. 
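A hedged sketch of the X-Spam-Relays-* pseudo-header approach suggested above; the IP range and rule name are purely illustrative:

```
# Negative score for messages whose last external relay is a known IP,
# matched against SA's internal X-Spam-Relays-External pseudo-header
header   LOCAL_KNOWN_SENDER X-Spam-Relays-External =~ /^\[ ip=192\.168\.1\./
describe LOCAL_KNOWN_SENDER Last external relay is a known local sender
score    LOCAL_KNOWN_SENDER -5.0
```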
Re: no subject tagging in case of X-Spam-Status: Yes
On Fri, 2014-08-29 at 02:15 +0200, Reindl Harald wrote: look at the attached zp-archive and both messages produced with the same content before you pretend others lying damned - to make it easier i even added a config-diff But no message diff. ;) and now what? maybe you should accept that even new users are no idiots and know what they are talking about Please accept my apologies. It appears something else is going on here, and you in fact did not lie. I'd like to add, though, that I do *not* assume new users to be idiots. Plus, I generally spend quite some time on helping others fixing their problems, including new users, as you certainly have noticed. Now, moving forward: I've had a look at the message diffs. Quite interesting, and I honestly want to figure out what's happening. First of all, minus all those different datetime strings, IDs and ordering, the real differences are -Subject: [SPAM] Test^M -X-Spam-Flag: Yes^M +Subject: Test^M So it appears that only the sample with add_header spam Flag has the Subject re-written. However, there's something else going on. When re-writing the Subject header, SA adds an X-Spam-Prev-Subject header with the original. Which is clearly missing. Thus, something else has a severe impact on which headers are added or modified. In *both* cases, there is at least one SA generated header missing and/or SA modified header not preserved. Definitely involved: Postfix, spamass-milter, SA. And probably some other tool rewriting the message / reflowing headers, as per some previous posts (and the X-Spam-Report header majorly inconvenienced by re-flowing headers). Regarding SA and the features in question: There is no different behavior between calling the plain spamassassin script and using spamc/d. There is absolutely nothing in SA itself that could explain the discrepancy in Subject rewriting, nor the missing X-Spam-Prev-Subject header. 
My best bet would be on the SA-invoking glue, not accepting or overwriting headers as returned by SA. Which tool that actually is, I don't know. But I'd be interested to hear about it, if you find out. (The additional empty line between message headers and body in the case without X-Spam-Flag header most likely is just a copy-n-paste artifact. Or possibly another artifact of some tool munging messages.)
Re: no subject tagging in case of X-Spam-Status: Yes
On Fri, 2014-08-29 at 02:15 +0200, Reindl Harald wrote: look at the attached zip archive [...] Since I already had a closer look at the contents including your local cf, and I am here to offer help and mean no harm, some comments regarding the SA config. # resolves a bug with milter always triggering a wrong informational header score UNPARSEABLE_RELAY 0 See the RH bug you filed and its upstream report. Do you still need that? This would be the first instance of continued triggering of that test I ever encountered. # disable most builtin DNSBL/DNSWL to not collide with webinterface settings score __RCVD_IN_SORBS 0 score __RCVD_IN_ZEN 0 score __RCVD_IN_DNSWL 0 Rules starting with double-underscore are non-scoring sub-rules. Assigning a zero score doesn't disable them like it does with regular rules. In the case of RBL sub-rules like the above, it does not prevent DNS queries. It is better to overwrite the sub-rule entirely, e.g. meta __FOO 0, rather than set a score that does not apply to sub-rules. # unconditional sender whitelists whitelist_from *@apache.org whitelist_from *@bipa.co.at whitelist_from *@centos.org whitelist_from *@dovecot.org [...] Unconditional whitelisting generally is a bad idea, since these addresses might appear forged in spam. If possible, it is strongly suggested to use whitelist_from_auth, or at least whitelist_from_rcvd (which requires *_networks to be set correctly).
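A sketch of the safer whitelisting alternatives mentioned above; the domains are taken from the quoted config, and the rDNS argument is illustrative:

```
# Requires an SPF or DKIM pass for the sender address
whitelist_from_auth *@apache.org

# Or at least require the expected relay rDNS (second argument);
# needs trusted/internal networks set correctly
whitelist_from_rcvd *@dovecot.org dovecot.org
```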
Re: Spam info headers
On Wed, 2014-08-27 at 17:07 -0400, Alex wrote: I've set up a local URI DNSBL and I believe there are some FPs that I'd like to identify. I've currently set up amavisd to set $sa_tag_level_deflt at a value low enough that it always produces the X-Spam-Status header on every email. It will show LOC_URIBL=1 in the status, but is it possible to have it somehow report/show the domain that caused the rule to fire, in the same way that it can be done with spamassassin directly on the command-line using -t? The URIs [1] are automatically added to the uridnsbl rule's description for the _REPORT_ and _SUMMARY_ template tags. The latter is identical to the additional summary at the end with the -t option, the first one is suitable for headers. add_header spam Report _REPORT_ That Report header is set by default with report_safe 0 (stock SA, not Amavis). [1] Actually lists only a single one, if multiple URIs are hit. That's a TODO item, documented in a code comment.
Re: Spam info headers
On Wed, 2014-08-27 at 21:37 -0400, Alex wrote: On Wed, Aug 27, 2014 at 6:18 PM, Karsten Bräckelmann guent...@rudersport.de wrote: The URIs [1] are automatically added to the uridnsbl rule's description for _REPORT_ and _SUMMARY_ template tags. The latter is identical to the additional summary at the end with the -t option, the first one is suitable for headers. add_header spam Report _REPORT_ That Report header is set by default with report_safe 0 (stock SA, not Amavis). I now recall having added a few custom headers in the past, and it was indeed necessary to instruct amavis to display them. I did a little more digging around, and learned how I was doing it previously was replaced with the following, in amavisd.conf: $allowed_added_header_fields{lc('X-Spam-Report')} = 1; So I've modified my local.cf with the following: report_safe 0 clear_report_template That's actually a historic, unfortunate naming. Despite its name, the report option (see 10_default_prefs.cf) sets the template used with report_safe 1 or 2, which by default shows a brief description, (attached spam) content preview and _SUMMARY_. It does not have any impact on the X-Spam-Report header added with report_safe 0 by default or the _REPORT_ template tag. In the case of report_safe 0, the clear_report_template option actually has no effective impact at all. That report will just not be added anyway. add_header all Report _REPORT_ Despite specifying all, it's only displayed in quarantined messages. I need it to be displayed on non-spam messages, and all messages would be most desirable. That'd be an Amavis-specific issue. Using add_header all, SA does add that header to both ham and spam no matter what. In particular, quarantining is outside the scope of SA, and if that makes a difference whether a certain header appears or not, that's also outside the scope of SA. There's also this in the SA conf docs: report ...some text for a report... 
Set the report template which is attached to spam mail messages. See the 10_misc.cf configuration file in /usr/share/spamassassin for an example. Is this still valid? 10_misc.cf apparently no longer exists, so I wasn't able to follow through there. Wow. 10_misc.cf last appeared in 3.1.x, and is otherwise identical to 10_default_prefs.cf since 3.2. In particular with respect to that very doc snippet -- nothing at all changed in that paragraph, except that file name. You want to update your docs bookmarks. It's times like these I wonder whether I am the only one left grepping his way through files and directories, searching for $option. Or remembering the ancient magic of a tab, when looking for possibly matching (numbered!) files...
Re: Prevent DNSBL URI matches, without affecting regex URI rules?
On Tue, 2014-08-26 at 11:22 -0400, Kris Deugau wrote: Is there a way to prevent a URI from being looked up in DNSBLs, without *also* preventing that URI from matching on uri regex rules? I would like to add quite a few popular URL shorteners to uridnsbl_skip_domain, but then I can't match those domains in uri regex rules for feeding x and URL shortener meta rules. Works for me. $ echo -e "\n example.com" | ./spamassassin -D --cf="uri HAS_URI /.+/" dbg: rules: ran uri rule HAS_URI == got hit: http://example.com $ ./spamassassin --version SpamAssassin version 3.3.3-r1136734 running on Perl version 5.14.2 $ grep example.com rules/25_uribl.cf uridnsbl_skip_domain example.com example.net example.org Still using SA 3.3.2; if the behaviour of uridnsbl_skip_domain has been narrowed down in 3.4 to only skipping the listed domains on DNSBL lookups (as per its name) that may prod me to get 3.4 running. Oh, 3.3.2... Also verified the 3.3.2 (and 3.3.0 for that matter) svn tag version, in addition to my local 3.3 branch above. Same result, works for me.
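A hedged sketch of the setup under discussion: skip DNSBL lookups for a shortener domain while still matching it with a uri regex rule. The domain, rule name and score are illustrative:

```
# No URIBL lookups for the shortener itself...
uridnsbl_skip_domain bit.ly

# ...but still match it in uri regex rules
uri      URI_SHORTENER /^https?:\/\/bit\.ly\//i
describe URI_SHORTENER Contains a bit.ly shortened URL
score    URI_SHORTENER 0.5
```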
Re: writing own rbl rules
On Wed, 2014-08-27 at 01:08 +0200, Reindl Harald wrote: below the stdout/stderr of the following script filtered for dns so the lists are asked, but the question remains why that don't happen from an IP in the same network Nope, no RBL queries. See below. in the meantime there are a lot of custcount-lastexternal generated from a web-interface including the 4 below and the local network range is listed on them, hence why i want them used unconditionally and not only with foreign IP's If it's internal, it's internal. There is a reason you are setting up lastexternal DNSxL rules. Do not invalidate SA *_networks configuration in an attempt to adjust it to poor, non-real-life generated samples. Generate a proper sample instead, either by actually sending mail from external IPs, or if need be by manually editing the MX Received header, forging an external source (do pay attention to detail). Besides, there is no point in whitelisting your own LAN IPs. Those should simply hit ALL_TRUSTED, or just not be filtered in the first place. /usr/bin/spamassassin -D /var/lib/spamass-milter/spam-example.eml [sa-milt@mail-gw:~]$ cat debug.txt | grep -i dns Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-Untrusted: [ ip=10.0.0.19 rdns=mail-gw.thelounge.net helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 id=3hjPzJ6TWVz23 auth= msa=0 ] [ ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net by=mail-gw.thelounge.net ident= envfrom= intl=0 id=3hjPzJ2tkPz1w auth= msa=0 ] There is no X-Spam-Relays-Trusted metadata in your grep for dns, which means there is absolutely no trusted relay. Given those relays are in the 10/8 class A network, and that you deliberately broke trusted_networks in a previous post, that seems about right... 
Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-External: [ ip=10.0.0.19 rdns=mail-gw.thelounge.net helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 id=3hjPzJ6TWVz23 auth= msa=0 ] [ ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net by=mail-gw.thelounge.net ident= envfrom= intl=0 id=3hjPzJ2tkPz1w auth= msa=0 ] Same issue with X-Spam-Relays-Internal not showing up in the grep, thus being completely empty. Unless you specified internal_networks manually, it is set to trusted_networks. Thus equally invalid. Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL bl.spameatingmonkey.net., set cust12-lastexternal Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL spam.dnsbl.sorbs.net., set cust15-lastexternal Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL psbl.surriel.com., set cust14-lastexternal All those third-party RBLs with your cust sets are extremely fishy. Anyway, there are no dbg: dns: IPs found: and dbg: dns: launching lines, so this clearly shows the RBLs are NOT queried. Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL dnswl-low.thelounge.net., set cust16-lastexternal No activity with your custom RBL either. But well, how would you expect SA to query *last* external, given you deliberately told SA there are no internal relays... All external. No internal, no last external aka hop before first internal either. First of all, do read and understand the (trusted|internal)_networks options in the M::SA::Conf [1] docs, section Network Test Options. Then remove the current bad *_networks options in your conf. If you don't fully understand those docs, keep it at that, default. If you do understand and see an actual need to manually set them, do so, but do so *correctly*. Hints on gathering relevant information from the debug output: Don't just grep for generic dns, but check specifics by grepping for X-Spam-Relays and (trusted|internal)_networks. 
Better yet, don't grep but search the debug output interactively, and read nearby / related info. While debugging, actually reading, searching for terms or at least glimpsing the entire debug output is good advice anyway. [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
Re: writing own rbl rules
On Wed, 2014-08-27 at 03:01 +0200, Reindl Harald wrote: If it's internal, it's internal. There is a reason you are setting up lastexternal DNSxL rules. the intention is to handle the internal IP like it would be external Again: Craft your samples to match the real-life (production) environment. Do not configure or try to fake an environment that will not match production later. It won't work. You want to configure SA. So configure SA. Correctly. If you insist on not following that advice, please refrain from further postings to this list. Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-Untrusted: [ ip=10.0.0.19 rdns=mail-gw.thelounge.net helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 id=3hjPzJ6TWVz23 auth= msa=0 ] [ ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net by=mail-gw.thelounge.net ident= envfrom= intl=0 id=3hjPzJ2tkPz1w auth= msa=0 ] There is no X-Spam-Relays-Trusted metadata in your grep for dns, which means there is absolutely no trusted relay. Given those relays are in the 10/8 class A network and you deliberately breaking trusted_networks in a previous post, that seems about right... the intention to break it was to behave like it is external and just check the RBL behavior Read my previous post again, carefully. If you define everything to be external, there is no *last* external SA can trust. Anyway, there are no dbg: dns: IPs found: and dbg: dns: launching lines, so this clearly shows the RBLs are NOT queried. that's my problem :-) So you know how to fix it. Configure *_networks in SA correctly, and send a message from an external host. No activity with your custom RBL either. But well, how would you expect SA to query *last* external, given you deliberately told SA there are no internal relays... well, there will never be internal relays, just an inbound-only MX That IS an internal relay. Your MX must be in your internal_networks, and it is by the very definition of MX an SMTP relay. All external. 
No internal, no last external aka hop before first internal either. i want RBL checks in general only for the *physical* IP with no header inspections - 90% of inflow will be finally filtered out by postscreen anyways You need an internal, trusted relay to get that IP you desire. That relay is what generates the Received header with precisely that IP. Besides: SA is not an SMTP. It does not add the Received header. And it absolutely has to inspect headers, whether you like that or not. That is how SA determines exactly that last, trustworthy, physical IP. And for that, trusted and internal networks need be correct, so by extension external networks also are correct. First of all, do read and understand the (trusted|internal)_networks options in the M::SA::Conf [1] docs, section Network Test Options. Then remove the current bad *_networks options in your conf. If you don't fully understand those docs, keep it at that, default. If you do understand and see an actual need to manually set them, do so, but do so *correctly*. the intention is no trust / untrust at all and handle any IP with its physical connection Do read the docs I linked to. You are totally misunderstanding trust. It is not about what you trust, or don't. It is about which Received headers SA can trust to be correct. In particular, your MX, your first internal relay, absolutely MUST be trusted by SA. That is the SMTP relay identifying the sending host, complete with IP and rDNS. Received headers before that simply CANNOT be trusted. There is no way to guarantee the host they claim to have received the message from is legit. [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html thanks! In general, I stand by what I wrote in the previous post. And I strongly suggest you follow that advice. The approach you tried and defended with claws in this already lengthy thread will not work and is bound to fail. Stop arguing, and start setting up a serious test environment and correct SA options. 
-- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: no subject tagging in case of X-Spam-Status: Yes
On Mon, 2014-08-25 at 11:37 +0200, Reindl Harald wrote:

> header contains X-Spam-Status: Yes, score=7.5 required=5.0 but the subject does not get [SPAM] tagging with the config below - not sure what I am missing

What does this command return?

  echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"
Re: drop of score after update tonight
On Mon, 2014-08-25 at 17:47 +0200, Reindl Harald wrote:

> yes, and that is one which the currently existing Barracuda Spamfirewall scored with around 20, grabbed from the backend there for testing
>
> the plain content I attached as ZIP (what made it to the list) is used for testing by just copying the content to a formmailer, or into a new plaintext message in TB pointed directly at the test MX

Given (a) you disabled RBL checks in SA, (b) that sample is a plain body without any headers, and (c) your method of sending the sample even hits ALL_TRUSTED, SA still does a pretty decent job in comparison. The Barracuda appliance you're comparing results to did not have those disadvantages.

Anyway, changed scores after a successful sa-update are to be expected. The re-scoring algorithm only uses the default threshold of 5.0; it does not know the concept of a second reject score.
Re: no subject tagging in case of X-Spam-Status: Yes
On Mon, 2014-08-25 at 18:55 +0200, Reindl Harald wrote:

> On 25.08.2014 at 18:00, Karsten Bräckelmann wrote:
> > What does this command return?
> >   echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"
>
> as root, as expected, the modified subject; as the milter user, the unmodified one
>
> [root@mail-gw:~]$ echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"
> X-Spam-Status: Yes, score=3.7 required=1.0 tests=MISSING_DATE,MISSING_FROM,
>     MISSING_HEADERS,MISSING_MID,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
> Subject: [SPAM] Foo
> X-Spam-Prev-Subject: Foo

Exactly as expected. Subject tagging works.

> [root@mail-gw:~]$ su - sa-milt
> [sa-milt@mail-gw:~]$ echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"
> X-Spam-Status: No, score=0.0 required=1.0 tests=none
> Subject: Foo

No tests at all. I doubt the milter generated all those missing headers including From and Date, instead of a Received one only. So it seems the restricted sa-milt user has no read permissions on the SA config.

As that user, have a close look at the -D debug output.

  spamassassin -D --lint
Re: no subject tagging in case of X-Spam-Status: Yes
On Mon, 2014-08-25 at 19:43 +0200, Reindl Harald wrote:

> On 25.08.2014 at 19:13, Karsten Bräckelmann wrote:
> > No tests at all. I doubt the milter generated all those missing headers including From and Date, instead of a Received one only. So it seems the restricted sa-milt user has no read permissions on the SA config.
> >
> > As that user, have a close look at the -D debug output.
> >   spamassassin -D --lint
>
> bingo - only a snippet below, thank you so much for stepping into that thread
>
> the files inside, except one, have correct permissions (0644), but /var/lib/spamassassin/3.004000/updates_spamassassin_org does not
>
> I guess I will set up a cronjob to make sure the permissions below /var/lib/spamassassin/ are 755 and 644 for any item

A dedicated cron job doesn't make sense. You should add that to the existing cron job that runs sa-update and conditionally restarts spamd. Changing permissions has to be done before restarting spamd.

Alternatively, ensure the respective users for spamd, sa-update and the milter are identical, or at least share a common group.
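A hypothetical sketch of that combined cron job -- update first, fix permissions, and only then restart spamd. The schedule, paths and restart command are assumptions and will differ per distribution:

```
# m h dom mon dow  command -- run as root, nightly.
# sa-update exits 0 only when new rules were actually installed, so the
# chmod and the spamd restart happen conditionally, in that order.
30 3 * * *  sa-update && chmod -R u=rwX,go=rX /var/lib/spamassassin && service spamd restart
```

chmod's capital X sets the execute bit on directories only, which yields exactly the 755/644 split mentioned above.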
Re: drop of score after update tonight
On Tue, 2014-08-26 at 00:08 +0200, Reindl Harald wrote:

> the bayes=1.00 below makes me wonder, because of around 1000 carefully selected ham/spam messages for training - IMHO that should be more in such clear cases

Please do read the docs, or at least the rule's description (hint, see the BAYES_99 one), before venting such an opinion.

The Bayesian Classifier returns a probability of the mail being ham or spam, in a range between 0 and 1. Zero being ham, 1 spam, and a value of 0.5 being neutral, kind of undecided. A bayes value of 1.00 is as high as it gets, and the rules' descriptions also clearly state the spam probability being 99.9 to 100%.

> however, I admit that I am a beginner with SA!
>
> Aug 26 00:01:32 mail-gw spamd[6836]: spamd: result: Y 5 - ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,ALL_TRUSTED,BAYES_99,BAYES_999,DEAR_SOMETHING,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,LOTS_OF_MONEY,T_MONEY_PERCENT,URG_BIZ scantime=0.3,size=4760,user=sa-milt,uid=189,required_score=1.0,rhost=localhost,raddr=127.0.0.1,rport=29317,mid=*,bayes=1.00,autolearn=disabled
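To make those buckets concrete, a small Python sketch mapping a Bayes probability onto the stock BAYES_* rules. The bucket boundaries follow the stock 3.x rule descriptions; the helper function itself is made up for this illustration and is not part of SA:

```python
def bayes_rules(p):
    """Return the stock BAYES_* rule names a Bayes probability p hits.

    Boundaries as in the stock 3.x ruleset; BAYES_999 fires in
    addition to BAYES_99 for probabilities of 0.999 and above.
    """
    buckets = [
        ("BAYES_00", 0.00, 0.01), ("BAYES_05", 0.01, 0.05),
        ("BAYES_20", 0.05, 0.20), ("BAYES_40", 0.20, 0.40),
        ("BAYES_50", 0.40, 0.60), ("BAYES_60", 0.60, 0.80),
        ("BAYES_80", 0.80, 0.95), ("BAYES_95", 0.95, 0.99),
        ("BAYES_99", 0.99, 1.00),
    ]
    hits = [name for name, lo, hi in buckets
            if lo <= p < hi or (hi == 1.00 and p == 1.00)]
    if p >= 0.999:
        hits.append("BAYES_999")   # fires on top of BAYES_99
    return hits
```

So bayes=1.00 in the log line above hits both BAYES_99 and BAYES_999, exactly as shown.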
Re: Rule to check return-path for To address
On Sat, 2014-08-23 at 14:59 -0400, Jeff wrote:

> I recently started getting hammered by spam, and nearly all of the spam emails have one thing in common. The return-path header contains the email address that the spam is being sent to. Below is a sample header:
>
>   Return-Path: amazon-voucher-myname=mydomain@indiarti.com
>
> The address above is the email address that the spam is being sent to (i.e., myn...@mydomain.com).

That's common practice with legitimate mail, too, in particular mailing lists. Have a look at this mail's Return-Path header.

> Is there a way to write a custom SpamAssassin rule that will mark any message as spam if the return-path contains the 'To' address, regardless of what it may be, and the equal sign (i.e., user=domain.tld)?

See the TO_EQ_FROM stock rule. A similar rule for the Return-Path should actually be simpler, though. The Return-Path header (or similar envelope from type headers) is generated by the MTA, so the order of Return-Path and To headers should be static -- unlike To and From, which are set by the sending MUA.
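For illustration only, a Python sketch of the underlying check -- does the Return-Path localpart embed the To address rewritten as user=domain? The function and regexes are made up for this demo, not an actual SA rule, and note that it also fires on legitimate VERP-style mailing list mail, which is exactly the false-positive risk pointed out above:

```python
import re

def rp_embeds_to(headers: str) -> bool:
    """Hypothetical sketch: True if the Return-Path localpart contains
    the To address rewritten as user=domain (VERP-style).

    Beware: legitimate mailing list bounces look the same."""
    rp = re.search(r"^Return-Path:\s*<?([^<>\s]+)>?", headers, re.M | re.I)
    to = re.search(r"^To:.*?<?([\w.+-]+)@([\w.-]+)>?", headers, re.M | re.I)
    if not rp or not to:
        return False
    user, domain = to.group(1), to.group(2)
    return f"{user}={domain}" in rp.group(1)
```

In SA itself this would have to be a header rule against the ALL pseudo-header with a backreference; the Python is just to make the logic concrete.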
Re: Bayes training via inotify (incron)
On Fri, 2014-08-22 at 17:32 -0700, Ian Zimmerman wrote:

> Isn't inotify a bit of overkill for this? If you have a dedicated maildir for training, you know that anything in maildir/new is, uh, new. So you process it and move it to maildir/cur. What am I missing?

The new/ directory is for delivery; messages moved will end up in cur/. Training on messages in new/ means training solely on classification. These messages have not been seen by a human, who most likely isn't even aware there's new mail at all.

Messages moved (copied) into dedicated (ham|spam) learning folders will be placed in cur/. Thus, training on content in dedicated learning folders' new/ dirs won't work, because human-reviewed mail does not go there. And training on new/ dirs in general is like overriding all of the precaution measures of SA auto-learning, and blindly training anything and everything above or below the required_score threshold.

Besides, moving messages from new/ to cur/ is the IMAP server's duty. No third-party script should ever mess with that.
Re: Learning both spam and ham, edge case
On Fri, 2014-08-22 at 17:44 -0700, Ian Zimmerman wrote:

> I know that if you misclassify a mail as spam with sa-learn --spam /path/to/ham you can later run sa-learn --ham /path/to/ham to correct the mistake, and SA will do the right thing (ie. forget the wrong classification). And conversely, with ham - spam.

Correct. SA will recognize it has been learned before, and automatically forget the previous training before re-training.

> My question is, what happens if you run sa-learn --spam /path/to/spam --ham /path/to/ham and the same message is in both mailboxes? Is the behavior even well-defined (ie. not random)? And if so, can it be relied on in new versions?

Interesting... First of all, see the man-page. --ham and --spam are options, they don't take arguments.

  sa-learn [options] [file]...

So your example is flawed by the assumption that --ham or --spam would affect its file/path arguments, or possibly any following file/paths. Which they don't.

Experimenting with --ham and --spam options, and two (identical) file arguments, yields: Learning as ham or spam is not based on command-line option order, but on sa-learn code: --ham file --spam file results in learning spam, then ham.

If you want to know more about sa-learn innards, I recommend looking at its source code, or at least investigating

  sa-learn -D [...] 2>&1 | egrep '(learn|archive-iterator)'

In short: It is not random, but well-defined (see the source code). In particular, there is no ordering of options. It is not guaranteed to be the same in future (major|minor) versions, since your invocation sample is not even documented.
Re: Delays with Check_Bayes
On Thu, 2014-08-21 at 13:13 -0700, redtailjason wrote:

> Are you open to the possibility of upgrading to 3.4.0 and using the Redis backend for Bayes? (Just offering an alternative.)
>
> We have been developing an upgrade plan to 3.4. Based on this, we are prioritizing this upgrade and will be expediting it. Thanks.

Thanks for including the part you're directly referring to, as I requested. However, please do distinguish the quoted part from your comments. The first paragraph actually was written by John, but your post lacks any hint of the author, and even worse displays the quote and your text visually identically.

See the difference between your latest two posts and any other post in this thread? I blame Nabble for even making this possible. In a reply, the quoted text must be visually distinctive. More reason to avoid Nabble.

> View this message in context: http://spamassassin.1065346.n5.nabble.com/Delays-with-Check-Bayes-tp111067p18.html
> Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Sic. This is a mailing list. And Nabble a third-party list archive service and poor forum-style web frontend to the mailing list.
Re: Delays with Check_Bayes
On Wed, 2014-08-20 at 07:35 -0700, redtailjason wrote:

> Here is the dump from one of the scanners:
>
> netset: cannot include 127.0.0.1/32 as it has already been included
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        613          0  non-token data: nspam
> 0.000          0          0          0  non-token data: nham
> 0.000          0      50382          0  non-token data: ntokens
> 0.000          0 1362372138          0  non-token data: oldest atime
> 0.000          0 1396547409          0  non-token data: newest atime

That's back in April -- and obviously not a production database. You need to run sa-learn (and any other SA commands) as the user SA uses during scan. In your case that's the user Amavis uses.
Re: Delays with Check_Bayes
On Wed, 2014-08-20 at 08:51 -0700, redtailjason wrote:

> The initial post was data extracted from mail.log on the scanner using cat /var/log/mail.log | grep check_bayes while logged in as administrator.

It doesn't matter what user greps the logs. It was Amavis generating the logs. Thus, for debugging, all execution of Amavis or SA commands must be done as the user Amavis runs as.
Re: Delays with Check_Bayes
On Wed, 2014-08-20 at 06:15 -0700, redtailjason wrote:

> Hello and good morning. We are running into some delays that we are trying to pin down a root cause for. Below are some examples. Within the examples, you can see that the check_bayes: scan is consuming most of the timing. Does anyone have any suggestions on what to look at? We use 3.3.2. We have eight scanners set up to handle the scanning, with 5GB RAM and 4 CPUs each. Volume is 250K - 500K per day.

That volume means a throughput of about 350 messages per minute, 5.8 per second. Sounds reasonable for 8 dedicated scanners.

Your samples are showing overall timings between about 90 seconds and more than 2 minutes. Which means processing commonly takes less time, and these are some extreme cases -- unless you really do have 50-100 busy processes per machine. How many such long-running processes do you see, how frequent are they?

Also, you mentioned you are using the MySQL backend for Bayes. You did not add any further detail, though. Do you have dedicated MySQL servers for Bayes? Or does each scanner machine run a local MySQL server? Do they share / sync databases somehow? Please elaborate on your environment, in particular everything concerning Bayes.
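The arithmetic behind that estimate, using the upper bound of the stated volume:

```python
# Back-of-the-envelope check of the figures quoted above:
# 500K messages/day (the upper bound), spread over 8 scanner machines.
per_day = 500_000
per_minute = per_day / (24 * 60)        # about 347 messages per minute
per_second = per_day / (24 * 60 * 60)   # about 5.8 messages per second
per_scanner = per_second / 8            # about 0.72 messages/sec per machine

print(round(per_minute), round(per_second, 1), round(per_scanner, 2))
# prints: 347 5.8 0.72
```

At well under one message per second per scanner on average, the observed 2-minute outliers really do stand out.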
Re: Delays with Check_Bayes
On Wed, 2014-08-20 at 13:38 -0700, redtailjason wrote:

> We are seeing about 4000-7000 delayed messages per day. We do utilize a dedicated MySQL server for the Bayes, and all 8 scanners share it. Please let me know if this does not fully clarify our setup for you.

So we're talking about 1% of the messages.

Does this happen with all scanner machines, or is this isolated to a single one? If not all scanners are affected, any differences in network connection? When did this start? Any relevant changes roughly about that time?

What's your DB server load? Any noticeable load spikes, like 5k times a day? In particular, while a message is taking 2 minutes wall-clock time for Bayes, does either the scanner or database server have an unusually high load? Do you have MySQL logs which might show issues?

Can you reproduce the Bayes lags? That is, can you identify a sample message, and re-process it manually?

When replying, please include the relevant quoted parts you're directly referring to. With some context it is easier to follow the thread.
Re: Advice sought on how to convince irresponsible Megapath ISP.
On Sun, 2014-08-17 at 07:37 -0700, Linda Walsh wrote:
> Karsten Bräckelmann wrote:
> > Be liberal in what you accept, strict in what you send. In particular, later stages simply must not be less liberal than early stages. Your MX has accepted the message.
>
> My ISP's MX has accepted it, because it doesn't do domain checking. My machine's MX rejects it, so fetchmail keeps trying to deliver it.

There is only one MX, run by your ISP. You are running an SMTP relay, not an MX.

> While I *could* figure out how to hack sendmail to not reject the message,

You don't have a choice. That sendmail is an *internal* SMTP relay after the MX border. While you certainly are not looking at it this way, your own services *together* with the SMTP run by your ISP form your internal network.

The internal relay you run must not be stricter than the MX. In fact, it simply cannot be stricter, without mail ending up in limbo. Exactly what you have...

> > There is no forwarding.
>
> It comes in their MX, and is forwarded to their users.

Again, that is not forwarding. (Hint: You are using fetchmail, not being-forwarded-to-me-mail.)

> > > Any ideas on how to get a cheapo-doesn't-want-to-support-anything ISP to start blocking all the garbage they pass on?
> >
> > Change ISP. You decided for them to run your MX.
>
> I didn't decide for them, I inherited them when they bought out the competition to supply lower quality service for the same price.

We're about to split hairs, but it is your decision to try to get your ISP to behave as you want, instead of taking your business elsewhere. So, yes, it is your decision to let them run your MX.

> > It is your choice to aim for a cheapo service (your words).
>
> It wasn't when I signed up. Cost $100 extra/month. Now only $30 extra/month that I don't host the domain with them.

But it is now, and all you're doing is complaining about it. Expenses dropped to a fraction of what they used to be, yet you expect the same service as before? If you're unhappy with the service, take your business elsewhere.
> > Better service doesn't necessarily mean more expensive, but you might need to shell out a few bucks for the service you want.
>
> I already am... my ISP (cable company) doesn't have the services I want for mail hosting. I went to another company for that,

It is irrelevant whether your mail service provider happens to also be your cable provider. You are paying for mail services. And if you want better service, you might need to pay more -- which is what I said.

Besides, your wording is almost ironic. Your ISP didn't offer the email service you want, so you went to another company. Now your current (mail) service provider doesn't offer the service you want...
Re: Advice sought on how to convince irresponsible Megapath ISP.
On Fri, 2014-08-15 at 19:06 -0700, Linda A. Walsh wrote:

> My old email service was bought out by Megapath, who is letting a lot of services slide. My main issue is that my incoming email scripts follow the SMTP RFCs, and if the sender address isn't valid, then it's not a valid email that should be forwarded. My script simply checks for the domain existing or not - if it doesn't exist, then it rejects it. This causes about 100-200 messages a month that get stuck in an IMAP queue waiting for download -- only to be downloaded and rejected due to the sender domain not existing.

Linda, you are rather vague on details, and definitely confusing terms and terminology.

You state your ISP would forward mail to you. While on the other hand, a sub-set of the mail is not accepted by your scripts, thus stuck in an IMAP account waiting for download. Both the usage of IMAP and the mention of downloading show your ISP is not forwarding mail, but you fetching mail. Similarly, your scripts do not reject messages, but choose not to fetch them.

Pragmatic solution: If you insist on your scripts not fetching those spam messages (which have been accepted by the MX, mind you), automate the manual download and delete stage, which frankly only exists due to your choice of not downloading them in the first place. Make your scripts delete, instead of skipping over them.

Be liberal in what you accept, strict in what you send. In particular, later stages simply must not be less liberal than early stages. Your MX has accepted the message. At that point, there is absolutely no way to not accept, reject it later. You can classify, which you use SA for (I guess, given you're posting here). You can filter or even delete based on classification, or other criteria.

> The only response my ISP will give is to turn on their spam filtering. I tried that. In about a 2 hour time frame, over 400 messages were blocked as spam. Of those, less than 10 were actually spam; the rest were from various lists.
>
> So having them censoring my incoming mail isn't gonna work, but neither will they reject the obvious invalid-domain email. I can't believe that they insist on forwarding SPAM to their users even though they know it is invalid and is spam.

There is no censoring. There is no forwarding.

> Any ideas on how to get a cheapo-doesn't-want-to-support-anything ISP to start blocking all the garbage they pass on?

Change ISP. You decided for them to run your MX. It is your choice to aim for a cheapo service (your words). If you're unhappy with the service, take your business elsewhere. Better service doesn't necessarily mean more expensive, but you might need to shell out a few bucks for the service you want.
RE: Hotfix/phishing spam
On Thu, 2014-08-14 at 19:37 -0500, John Traweek CCNA, Sec+ wrote:

> Usually an end user has to request the hotfix and fill out a form on the MS site, and then MS will send out an email with the URI.

Pardon my ignorance, but... WHY!?

Why would anyone require filling out a web form, to send an automated email with a link as response? Why not simply, you know, put the link in the page the user gets in return after sending that completed form anyway?

Using an email message as response to an HTTP GET or POST request to transfer a http(s) URI is beyond clusterfuck. (Yes, I do realize you merely described what MS does, and you're not responsible for their lame process.)

> So to answer your question, yes, MS does send out emails with hotfixes, but only when an end user requests it, at least in my experience… If the end user did not specifically fill out a form/request the hot fix, then I would be very suspicious…
Re: Second step with SA
On Fri, 2014-08-15 at 12:21 -0400, Daniel Staal wrote:

> --As of August 15, 2014 1:23:37 PM +0200, Antony Stone is alleged to have said:
> > http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#language_options
>
> Both of these links are out of date. The whitelist/blacklist one probably doesn't matter too much, but the language option in the first has been discontinued entirely.

Nope. The ok_languages option has not been discontinued. It has been plugin-ized since 3.1, and still lives to this date in the TextCat language guesser plugin. I do however agree that those 3.0 links are way too old. I guess Antony should clean up some bookmarks. ;)

Regarding white- and blacklist options, there have been some significant changes since. Most notably, in addition to whitelist_from_rcvd, today there's the most convenient whitelist_auth and its piece-meal whitelist_from_(spf|dk|dkim) counterparts.

The correct links for the current version of SpamAssassin are:

  http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#language_options
  http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#whitelist_and_blacklist_options

Latest stable version documentation, always: http://spamassassin.apache.org/doc/
Re: spamassassin at 100 percent CPU
On Wed, 2014-08-13 at 11:20 -0700, Noah wrote:

> This is a new machine with rules copied over from another machine. How about this? I just start new. Is there a good page out there that explains setting up spamassassin from scratch and getting the sa rules set up well and cleaned up nicely? I am happy to start from the beginning with best practices.

If you cannot answer our rather specific questions, you're in for a much steeper learning curve than you seem to expect...

What's the best way of setting up SA on a new machine? Just install the distro provided SA packages. Getting the SA rules set up well? Same. Cleaned up? Do not copy over configuration and rules from $ome other system, unless you know what you are copying. IOW, don't. That's clean by definition.

What I really don't get from your reply is this, though: A new machine, with rules copied over. Yet, you seem to be unable to answer our questions regarding custom rules and configuration you put there. Which equals everything you copied over to begin with. If you did the copying, why can't you answer our questions? Or revert that copying, which results in the cleaned-up state you asked for.

Regardless of continuing with the current system, or setting up the whole system from scratch again -- there are important questions raised that you just didn't answer. Which, frankly, are likely to have a *much* more severe impact than removing bad, copied rules.

What mail is that system handling, if it is not an MX? How large are those messages, and what's your size limit? How is SA integrated, what software is passing mail to SA? What is the actual process's name, and for how long does it run at CPU max?

Without answering these (basically, get back to my previous post and actually answer all my very specific questions), there is absolutely no point in you posing more or other questions. It won't help.

Reference:
On 8/11/14 4:31 PM, Karsten Bräckelmann wrote:
On Mon, 2014-08-11 at 09:18 -0400, Joe Quinn wrote:
Keep replies on list.
Do you remember making any changes, or are you using spamassassin as it comes? What kind of email is going through your server? Very large emails can cause trouble with poorly written rules. If you can, perhaps systematically turning off things that are pushing email to that server could narrow it down to a particular type of email.

On 8/9/2014 4:41 PM, Noah wrote: thanks for your response. I am not handling much email - it's a new server and currently the MX points to another server.

What mail is it handling? Not MX, so I assume it does not receive externally generated mail at all. Which pretty much leaves us with locally generated -- cron noise and other report types.

How is SA integrated? What's your message size limit (see config of the service passing mail to SA)? Are you per chance scanning multi-MB text reports? A sane size limit is about 500 kB. Besides, locally generated mail isn't worth processing with SA, and in the case of cron mail often harmful (think virus scanner report).

How do I check the SA configuration? How do I check if I am using additional rules?

By additional rules, we mean any rules or configuration that is not stock SA. Anything other than the debian package or running sa-update. Generally, anything *you* added.

On 7/31/2014 3:19 PM, Noah wrote: what are some things to check with spamassassin commonly running at 100 percent?

For how long does it run at CPU max? What is the actual process name? It would be rather common for the plain 'spamassassin' script to consume a couple wall-clock seconds of CPU, since it has to read and compile the full rule-set at each invocation. Unlike the 'spamd' daemon, which has that considerable overhead only once during service start. In both cases, the actual scan time at high CPU load may be lower than the start-up overhead.
Re: Rule for single URL in body with very few text
On Tue, 2014-08-12 at 11:42 -0400, Karl Johnson wrote:

> Thanks for the rule Karsten. I've already searched the archive to find this kind of rule and found a few topics, but I haven't been able to make it work yet. I will try this one and see how it goes.

Searching is much easier if you know some unique pointers, like the name of the sub-rule in question. Which is what I used to dig up the rules. ;)

I didn't mean to RTFM you, just didn't feel like discussing yet again what should be possible to deduce from the rules themselves, or from the archived threads. Hence me pointing at the archives with info on how to find what you need, just in case you do need or want more details.
Re: Running SA without the bayesian classifier
On Mon, 2014-08-11 at 16:38 +0200, Matteo Dessalvi wrote:

> I am planning to install SA on our SMTP MTAs, which deal only with outgoing traffic generated in the internal network.

Outgoing traffic. That means most DNSBLs are either completely useless or effectively disabled. You'll also need to zero out the ALL_TRUSTED rule for the same reason.

> I am making the assumption that our clients are mostly sending 'clean' email (I know, I am trusting my users *a lot*, but nevertheless). So the question is: how efficient will SA be without using the bayesian classifier? Are all the remaining rulesets (apart from BAYES_*) sufficient to shave off spam email?

Define spam. Running SA on your outgoing SMTP will not catch botnet generated junk, neither spam nor malware. This would require sniffing raw traffic. Or completely firewalling off outgoing port 25 connections.

You explicitly mention your users (corporate or home?) sending mail. Are you talking about them possibly running bulk sending services, or hand-crafted unsolicited mail to individual recipients? Unless there's a 419 gang operating from your internal network, there might not be much left for SA with stock rules to classify as spam...

That said, it is entirely possible to run SA without the Bayesian classifier. There's an option to disable it, and different score sets, generated specifically for this case, are used.
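That disable option is use_bayes; a one-line local.cf fragment:

```
# Disable the Bayesian classifier entirely. SA then switches to the
# score set generated without Bayes (and, analogously, with or
# without network tests).
use_bayes 0
```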
Re: Rule for single URL in body with very few text
On Mon, 2014-08-11 at 15:48 -0400, Karl Johnson wrote:

> Is there any rule to score an email with only 1 URL and very little text? It could trigger only on text-formatted email, because they usually aren't in HTML.

Identify very short (raw)bodies.

  rawbody __RB_GT_200  /^.{201}/s
  meta    __RB_LE_200  !__RB_GT_200

Chain together with the stock __HAS_URI sub-test.

  meta SHORT_BODY_WITH_URI  __RB_LE_200 && __HAS_URI

I have discussed and explained the rule to identify short messages a few times already. Please search your preferred archive [1] for the rule's name to find the complete threads.

[1] List of archives: http://wiki.apache.org/spamassassin/MailingLists
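To make the length check concrete, a hedged Python equivalent of those rules -- the URI regex here is a deliberately crude stand-in for SA's real __HAS_URI sub-rule:

```python
import re

def short_body_with_uri(rawbody: str) -> bool:
    """Toy re-implementation of the SHORT_BODY_WITH_URI meta sketched
    above: a raw body of at most 200 characters containing a URI."""
    gt_200 = re.search(r"^.{201}", rawbody, re.S) is not None   # __RB_GT_200
    has_uri = re.search(r"https?://\S+", rawbody, re.I) is not None
    return (not gt_200) and has_uri                             # the meta
```

Note the /s modifier (re.S) on the length check, so the 201-character match runs across newlines, just like the rawbody rule.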
Re: Rule for single URL in body with very few text
On Mon, 2014-08-11 at 22:57 +0300, Jari Fredriksson wrote:
> *  1.8 DKIM_ADSP_DISCARD No valid author signature, domain signs all
> *      mail and suggests discarding the rest
>
> This is a corner case. I got it tagged, but probably just because I
> tested it later and URIBL has it now.

Minus the 1.8 score for DKIM_ADSP_DISCARD, it wouldn't have crossed
the 5.0 threshold for you either.

Seeing all those x instead of (real|user|host) names and domains, it
seems safe to assume the unredacted message does not claim to be sent
from an x.com address... ;)
Re: spamassassin at 100 percent CPU
On Mon, 2014-08-11 at 09:18 -0400, Joe Quinn wrote:
> Keep replies on list.
>
> Do you remember making any changes, or are you using spamassassin as
> it comes? What kind of email is going through your server? Very
> large emails can cause trouble with poorly written rules. If you
> can, systematically turning off things that push email to that
> server could narrow it down to a particular type of email.
>
> On 8/9/2014 4:41 PM, Noah wrote:
> > thanks for your response. I am not handling much email its a new
> > server and currently the MX points to another server.

What mail is it handling? Not MX, so I assume it does not receive
externally generated mail at all. Which pretty much leaves us with
locally generated mail -- cron noise and other report types.

How is SA integrated? What's your message size limit (see the config
of the service passing mail to SA)? Are you per chance scanning
multi-MB text reports? A sane size limit is about 500 kB.

Besides, locally generated mail isn't worth processing with SA, and in
the case of cron mail it is often harmful (think virus scanner
report).

> > How do I check the SA configuration? How do I check if I am using
> > additional rules?

By additional rules, we mean any rules or configuration that is not
stock SA. Anything other than the Debian package or running sa-update.
Generally, anything *you* added.

> > On 7/31/2014 3:19 PM, Noah wrote:
> > > what are some things to check with spamassassin commonly running
> > > at 100 percent?

For how long does it run at CPU max? What is the actual process name?

It would be rather common for the plain 'spamassassin' script to
consume a couple of wall-clock seconds of CPU, since it has to read
and compile the full rule-set at each invocation. Unlike the 'spamd'
daemon, which incurs that considerable overhead only once, during
service start. In both cases, the actual scan time with high CPU load
may be lower than the start-up overhead.
Re: Similar pattern of emails Comparing Prices
On Thu, 2014-08-07 at 17:14 +0100, emailitis.com wrote:
> I have had a fair number of VERY similar Spam emails that are all
> about comparing prices. I have put a number in a pastebin below.

We need full, raw samples. Those are mostly just headers with the raw
body missing (multipart/alternative, thus most likely HTML and plain
text versions). The blobs including a body-ish part appear to be
copied from your MUA's rendered display.

> They all seem to be originating from Fasthosts in UK which I cannot
> really blacklist in entirety. Can anyone suggest how to block it
> with a Spamassassin rule?

My first thought was to match on that List-Unsubscribe header's
domain. On second thought, bad idea, since cloudapp.net is MS Azure,
not the spammer's domain.

Still, that might make for an easy rule. That unsub link includes some
campaign, recipient, etc. identifying numbers. And one that most
likely identifies the sender, identical in all 7 samples.

  header AZURE_BAD_CUSTOMER List-Unsubscribe =~ /email-delivery\.cloudapp\.net\/sender\/box\.php\?.*s=bfa2e2429e7a4f0b0993c32a75aebc0e/

Note: This is only assuming the s value identifies the campaign's
sender and misbehaving Azure customer. The body most certainly
contains links with a very similar structure.

> http://pastebin.com/B9YqTsvZ
>
> I had tried to create something from a meta rule, but that has not
> worked so far:
>
>   body __CGK_CLOUDAPP_1 /cloudapp/i
>   body __CGK_CLOUDAPP_2 /\bCompare\b/i
>   meta CGK_CLOUDAPP ((__CGK_CLOUDAPP_1 + __CGK_CLOUDAPP_2) > 1)

No surprise. There is no cloudapp string in the body at all, according
to your two formatted samples.
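[Editor's note: a quick sanity check of the escaped header pattern
above, sketched in Python. The sample List-Unsubscribe value is
fabricated for illustration; only the s= value is taken from the rule.]

```python
import re

# The AZURE_BAD_CUSTOMER pattern, with the dots and the literal '?'
# escaped so they are not treated as regex metacharacters.
PAT = re.compile(
    r'email-delivery\.cloudapp\.net/sender/box\.php\?'
    r'.*s=bfa2e2429e7a4f0b0993c32a75aebc0e'
)

# Fabricated header value, for illustration only:
hdr = ('<http://email-delivery.cloudapp.net/sender/box.php'
       '?c=123&r=456&s=bfa2e2429e7a4f0b0993c32a75aebc0e>')

print(bool(PAT.search(hdr)))                           # True
print(bool(PAT.search('<http://example.com/unsub>')))  # False
```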
Re: unsubscribe
Wrong address. To unsubscribe, send a mail to the appropriate
list-command address, not the mailing list itself. See the headers of
each and every post on this list:

  list-help: mailto:users-h...@spamassassin.apache.org
  list-unsubscribe: mailto:users-unsubscr...@spamassassin.apache.org
Re: New at SpamAssassin - how to not get headers
On Mon, 2014-08-04 at 14:11 -0700, Robert Grimes wrote:
> Both spamc and hMailServer SA service are running in the same
> directory where the binaries for SA are. I am not sure the
> significance of the directory name. As I stated both use the same
> parameters which is only -l therefore SA uses default config file
> locations for both.

Earlier in this thread you mentioned using the -l option with spamd.
Now you mention using that option with both. So, by hMailServer SA
service, are you referring to spamd?

In either case, your assumption that using identical command line
options results in spamd and spamc using the same configuration is
false.

* For spamc, the -l option sends log messages to stderr instead of
  syslog. Given you're running Windows, I don't even know if that
  option has any effect at all.

* For spamd, the -l option enables "tell" commands, that is, it
  allows learning (Bayes) and reporting spam to external services via
  spamc. The latter is a rather uncommon option, and even less likely
  to be used deliberately in the environment of a new SA user.

For spamc/d options and a lot more details, see the documentation. In
particular the docs named after their respective programs, and the
Conf one.  http://spamassassin.apache.org/doc/

> I have had several hundred hams. Wouldn't that be enough?

Yes, as Martin mentioned, learning 200 spam and ham each is sufficient
for Bayes to start working. But see my other reply to this thread in a
few.
Re: New at SpamAssassin - how to not get headers
On Mon, 2014-08-04 at 13:02 -0700, Robert Grimes wrote:
> Robert Grimes wrote
> > I have changed the user that runs the spamd service to be the same
> > as when I ran from command line. I will see what, if any changes
> > occur. I will leave Bayes alone for the moment; just try one thing
> > at a time to keep the confusion down.

By that change of the user your spamd service runs as, you lost your
previous Bayes training (which appears to be linked to the service
user). Unless you deliberately nuked the Bayes DB to start fresh.

Ignoring DNSBL blocking and the broken format, which have been covered
already:

> X-Spam-Status: No, score=0.0 required=5.0 tests=HTML_MESSAGE,SPF_HELO_PASS,
>       URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0

There is no BAYES_xx rule hit. If Bayes is enabled and has been
trained sufficiently, there will *always* be a BAYES_xx rule
indicating the Bayesian probability of being spam. The absence of any
such rule since you changed the spamd service user means that user has
no access to the previously trained Bayes DB.

> I saved the message from outlook and ran spamc [...]
>
> X-Spam-Status: Yes, score=7.3 required=5.0 tests=MISSING_DATE,MISSING_FROM,
>       MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,
>       NO_RECEIVED,NO_RELAYS,NULL_IN_BODY,URIBL_BLOCKED,URI_HEX autolearn=no
>       autolearn_force=no version=3.4.0

No BAYES_xx rule either, same problem as above. However, do note the
autolearn=no part. Bayes is enabled (just not sufficiently trained
yet).

In a follow-up to this thread, you pasted headers of spam manually
scanned with spamc, showing autolearn=ham. A spam message has
incorrectly been learned as ham. You want to correct that by
re-training (simply learn it as spam). And keep an eye on that part in
the future.

> both should be running under the same administrator account.

It is important to use the same user (a) scanning incoming mail, (b)
training, and (c) manually running messages through spamc later.
Unless spamd changes user on a per-recipient basis (which it seems is
not the case in your setup), that's a single user. Changing that user,
as you just did, requires moving $HOME data or changing ownership of
the Bayes DB.
Re: moving from fetched mail to direct deliver mail
On Mon, 2014-08-04 at 18:16 -0400, Joe Acquisto-j4 wrote:
> On 8/4/2014 at 5:03 PM, RW rwmailli...@googlemail.com wrote:
> > > Do I gotta start fresh? or will the config changes to SA for
> > > direct drop allow magic to happen?

There's magic. And there are probably no SA conf changes needed. ;)

> > I'm not sure whether you are referring to the Bayes database or a
> > collection of email, but either way I'd keep it - at least until I
> > had a few thousand new hams and spams to reset it.
>
> Well, either or both, I guess. I guess my question really is, is
> Bayes OK as is, or will the changes that will exist in the headers
> make it useless? I think I hear, it should be OK, for now. ?

Bayes is entirely fine with that. For now, and later.

Your change in environment only affects a very few headers added by
the relays, like the Received ones. Bayes tokens taken from headers do
include header specifics. With a change like this, you will only lose
a *very* few indicators for spam vs ham. There's hardly any potential
for damage at all regarding your Bayes training. You'll probably not
even notice.

> > If you are going to learn from older mail you should ideally keep
> > the old internal and trusted network settings. You can comment
> > them out in normal use, but they should be present for sa-learn.
>
> Umm. ?. So, I can keep the existing Bayes, but if I should have to
> re-learn, I should revert to my old settings for learning.

Yes. The only settings you'd want to keep in case of re-training from
a corpus including those old mails are internal_networks and
trusted_networks, though. If at all.

SA does detect certain mail fetching and does the magic for you. E.g.
in a rather straightforward environment of using 'fetchmail' with
local SA afterward (postfix, and possibly procmail), the internal and
trusted networks do not need to be set. So in that case, there's no
config that needs to be retained, because there's no config you had to
set due to your mail fetching environment in the first place.
Case in point: Retain configuration you did need in the previous
setup, which becomes obsolete with your new environment.

> I guess I should also, once I change, start a second corpus with the
> new settings and, at least until I amass a sufficient store of new
> mail, relearn from both, adjusting SA config as appropriate?

As I hopefully made clear above, there's no need to start a new
corpus. There's probably no need for new settings either, or very
limited ones at most. Your text sounds like major conf changes to me.
Go through 'em: which changes do you think you'll need? My guess is
little to none.

> Make sense? Am I way off base and/or making this too complicated?

Too complicated. ;)
Re: stable branch vs trunk (was: Re: colors TLDs in spam)
On Sun, 2014-08-03 at 09:22 -0400, Kevin A. McGrail wrote:
> Hi Karsten, I did bring this up a few months ago discussing
> releases.

I'm currently catching up on list mail, and figured recent threads
might be more important than revising old-ish, finished threads, in
particular about releases already published. *sigh*

> Right now trunk is effectively 3.4.1 and there is no reason to
> maintain a branch. When 3.4.1 is released, I would make sure this
> was the case and recopy from trunk but do not stress as I will
> confirm this. We should aim for a Sept 30 3.4.1 release. But until
> we have a need for the branch, to me it is a waste of time to sync
> both.

Fair enough.

> And the plugin system lets new, experimental code go into trunk
> without risking stability.

That holds true only for new plugins, like TxRep (trunk) or the Redis
BayesStore during 3.4 development. It does not prevent potential major
issues in cases like e.g. new URIDNSBL features, the general DNS
system rewrite or tflags changes, which happened in trunk with the
(then) stable 3.3 branch being unaffected.

Not opposing in general. Just pointing out that this argument is only
valid as long as substantial changes are in fact isolated in new
plugins.

> So right now, I do not really envision a need for a branch and I run
> trunk. My $0.02.

Hey, I didn't say trunk is unsafe either! Even while Mark happens to
rewrite large parts of DNS handling or DNSBL return value masks. ;)

As long as there is no real need for separating stable and development
branches, I'm fine with this. Given branching will happen prior to
disruptive commits.

I guess my concerns can also be outlined by anecdotal evidence: I
recently asked for RTC votes, to commit a patch not only to trunk, but
to the 3.4 branch also. You told me we're not in RTC mode and to go
ahead, so I committed to the stable branch and closed the bug report.
You did not tell me committing to the branch would be needless...