Re: How to get removed from spamcop?
On Mon, Oct 28, 2013 at 3:08 PM, John Levine jo...@taugh.com wrote: They have several of our IP addresses listed and delisting doesn't seem to work. We're a spam filtering company (Junk Email Filter) and if we fail to block a spam it can appear we are the source. Uh, Marc, if the spam comes out of your servers, you ARE the source. Nobody but you cares about your business model. More to the point, if you're a spam filtering company, you shouldn't be delivering something you failed to block to anybody but your own customers. Doesn't that make this a customer education issue? Why are your customers reporting you to spamcop?
Re: Really getting discouraged... when does the learning happen?
On Mon, Sep 16, 2013 at 1:38 PM, Harry Putnam rea...@newsguy.com wrote: Yes, here is an example of a message rated as spam: X-Spam-Report: * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100% * [score: 0.] OK, so you've got a BAYES_99 on that message, which is a pretty good indication that the training has worked. However, SA's confidence in the Bayes algorithm is only worth about one point out of a necessary five, so the rest of the rules have to contribute the other (just a bit more than) four points, and they do not: * 0.4 STOX_REPLY_TYPE STOX_REPLY_TYPE * 1.2 RCVD_NUMERIC_HELO Received: contains an IP address used for HELO * 1.8 STOX_REPLY_TYPE_WITHOUT_QUOTES STOX_REPLY_TYPE_WITHOUT_QUOTES This could be because the scores are tuned to include network tests which aren't able to be applied to your archive, or some such. In any case it's not the training that is failing you here. You have a couple of choices. You can assign your own higher score to the BAYES_99 rule in your local spamassassin config, or you can modify your procmail recipe to look for BAYES_99 in the filtered message and treat messages that have it as spam even if they do not score above the five point threshold. Anything that's falsely BAYES_99 is probably something you want to re-learn as ham anyway.
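The second choice above (treating anything tagged BAYES_99 as spam regardless of the total score) can be sketched outside procmail as well. This is only an illustration in Python, assuming the filtered message carries the usual X-Spam-Status header with a tests= list; the function name treat_as_spam is made up for the example.

```python
import email
import re

def treat_as_spam(raw_message: str) -> bool:
    """Return True if SA flagged the message or the BAYES_99 rule fired."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The header is often folded across lines; collapse the whitespace.
    status = " ".join(status.split())
    if status.lower().startswith("yes"):
        return True
    # Below the threshold, but Bayes is nearly certain: treat as spam anyway.
    return re.search(r"\bBAYES_99\b", status) is not None
```

Anything this catches by mistake is a candidate for re-learning as ham, as noted above.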
Re: procmail/spassassin training session
On Sat, Sep 14, 2013 at 1:07 PM, Harry Putnam rea...@newsguy.com wrote: 1) Does it matter that I have autolearn turned off in spamassassin conf file 'local.cf' while doing my sandbox work No, it doesn't. In fact it's probably better that way because SA won't waste time updating the bayes database with the mis-classified stuff that will have to be backed out later. 2) I've derived the mbox files of pure ham and pure spam by running mixed mail so SA has already seen this mail. That definitely doesn't make any difference *IF* you disabled auto-learning in the previous step. It shouldn't make any difference even if autolearning was on, because sa-learn will discard the tokens from the first pass on each message before re-learning, but it'll be somewhat faster if that's not necessary.
Re: Really getting discouraged... when does the learning happen?
On Sun, Sep 15, 2013 at 7:53 PM, Harry Putnam rea...@newsguy.com wrote: I've been trying to `teach' SA to spam from ham in my mail system. I've made it thru two main learning sessions where I ran around 450 msgs (each time) thru sa-learn spam/ham and yet SA is still incapable of getting it right more than about 40 % or maybe less. You say you've run 1100 messages through -- have at least 200 of those been ham? Bayes won't kick in until 200 *each* of spam and ham are trained. You can run sa-learn --dump magic to see how many of each it believes it has seen. If you've sa-learned enough of both types, is it possible you haven't enabled bayes scoring? Are the BAYES_* rules showing up at all in the score details for newly arrived messages fed through spamc?
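The 200-each check can be automated by reading the output of sa-learn --dump magic. The sketch below is in Python and assumes the usual layout, where the count sits in the third column of the lines ending in nspam and nham; the sample text is illustrative, not output from a real database, and the function names are invented for the example.

```python
def bayes_counts(dump_magic_output: str) -> dict:
    """Extract the nspam and nham counters from 'sa-learn --dump magic' text."""
    counts = {}
    for line in dump_magic_output.splitlines():
        fields = line.split()
        # Assumed layout: prob, spam, ham, atime, then "non-token data: <name>".
        if len(fields) >= 3 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts

def bayes_is_active(counts: dict, minimum: int = 200) -> bool:
    """Bayes scoring only starts once both counters reach the minimum."""
    return counts.get("nspam", 0) >= minimum and counts.get("nham", 0) >= minimum

# Illustrative sample only; real numbers will differ.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        450          0  non-token data: nspam
0.000          0        120          0  non-token data: nham
"""
```

With the sample above, the ham count is short of 200, so no BAYES_* rule would be expected to fire yet.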
Re: Comment - GFI/SORBS
http://blog.wordtothewise.com/2010/12/gfi-sorbs-considered-harmful-part-5/
RAZOR2 and SpamAssassin version or configuration
We have a couple of mail servers running SpamAssassin. One is stock CentOS5 and therefore running SA 3.2.4. The other is a test platform running SA 3.3.1 (installed from rpmforge in case that matters). Both have the latest sa-update configurations for their respective versions. On both hosts, when I put the 3.3.1 sample-spam.txt message through spamassassin, it reports RAZOR2_CHECK as expected. If I run with -D, SA3.2.4 reports that it is using razor2 version 2.82, and SA3.3.1 reports razor2 version 2.84. Again this is as I expect. However, I have another message received from outside, which when put through spamassassin 3.2.4 reports a hit on RAZOR2_CHECK, but when put through 3.3.1 it does not. Run with -D, it does appear that the razor server is being contacted in both cases, but I confess I haven't yet resorted to sniffing traffic to be sure. Where should I be looking for a configuration difference that would cause this?
OT: SPF: Some statistics
Coincidental to the recent thread on SPF comes this from Terry Zink: http://blogs.msdn.com/tzink/archive/2010/02/23/some-stats-and-figures-on-dkim-and-spf.aspx
Re: outlook 2007 Test email scores 30+
On Sat, Oct 31, 2009 at 9:31 AM, John Hardin jhar...@impsec.org wrote: Here is a site that gives you your IP address and lets you check it against DNSBLs: http://cqcounter.com/rbl_check/ Just as a word of warning, that site is still checking blacklist.spambag.org, which has been offline since 2007 and now lists the entire Internet.
Re: Problem starting/stoping spamassasin
On RedHat systems, at least, the init.d script that runs spamd is named spamassassin. So possibly what was meant here was
    service spamassassin start
    service spamassassin stop
Re: googlepages.com abuse
On Nov 13, 2007 3:32 AM, Michael Scheidell [EMAIL PROTECTED] wrote: How do you folks trap these mails , And how do we report abuse to google ( if they really bother ) You can't. Google ignores complaints, and email to @googlepages.com will bounce in 5 days due to their refusal to even follow the RFC's and have a server to receive email. Er, which RFC are you claiming requires them to have a server to receive mail? In any case I inquired of an acquaintance who works at Google and got this response to the question of how do we report abuse: Web link: http://www.google.com/support/pages/bin/request.py with links there for spam, phishing, scumware, etc. Email links get too heavily spammed so the one contact address which has leaked gets a push-back mail to use the web form. All complaints are acted upon quickly but no individual replies are sent. Quickly here is in practice within one business day (but I don't believe that there's a formal commitment to this). Any evidence that a complaint has been ignored would be interesting to see; ultimately, is the page still up? is the check.
Re: Myspace's mail headers
On Oct 30, 2007 12:27 PM, Joseph Brennan [EMAIL PROTECTED] wrote: Notice the all-lower-case field names, which do not conform to the RFC 2822 field names that they almost match. That is, a from: header is not the same as a From: header! It's *supposed* to be the same. RFC *822 does not specify case for field names. See RFC 2234 where the ABNF format is defined, section 2.3. If SA is not finding the from-address as a result of lower-casing from:, then SA has a bug. If SA is *intentionally* ignoring from: because lower-casing is a spam-sign, then that's a somewhat different matter ... but probably still wrong.
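As an illustration of what a conforming parser is expected to do, Python's standard email package looks up header field names without regard to case. This says nothing about how SpamAssassin itself parses headers; the helper name sender_of is made up for the example.

```python
import email

def sender_of(raw_message: str) -> str:
    """Return the From field, however the field name was capitalised."""
    msg = email.message_from_string(raw_message)
    # Field-name lookup in the email package is case-insensitive.
    return msg["From"]

lower = "from: alice@example.com\nsubject: hello\n\nbody\n"
mixed = "From: alice@example.com\nSubject: hello\n\nbody\n"
```

Both sample messages yield the same sender, which is the behaviour the RFCs call for.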
Re: Bit OT but it's about SPAM
On 10/17/07, Randal, Phil [EMAIL PROTECTED] wrote: Hyperbole? Well, let's take a look at the figures on my mail relay boxes Not to single out Phil, but so far everyone is quoting (among other things) the percentage of mail that they reject out of hand. You're all 100% confident that none of those were false positives? My point was that "rejected/filtered by anti-spam techniques" and "is spam" are not synonymous, but nearly everyone who publishes spam figures behaves as if they were, and most of them have a vested interest in making the number sound as big and scary as possible.
Re: Bit OT but it's about SPAM
On 10/17/07, Tom Ray [EMAIL PROTECTED] wrote: I just thought if anyone hasn't read it yet, this article might be interesting to many of you. According to this report SPAM has now reached being 95% of all email. This is hyperbole. What it really means is that 95% of the mail processed by someone's commercial spam filter has been classified, possibly incorrectly, as spam. The rates are much lower (though still too high for comfort) if false positives are accounted for. See, for example: http://www.bcs.org/server.php?show=conWebDoc.14617
Re: Suggestion to developers
On 9/13/07, Justin Mason [EMAIL PROTECTED] wrote: if anyone feels like trying it out to see if they can make an auto-shortcircuiting plugin which outperforms base SpamAssassin over a mixed corpus of 50:50 nonspam and spam, go for it ;) I dunno about your mail, but if it outperformed base SA on a corpus of 20:80 ham:spam that'd be worth it for what we end up filtering. Of course outperform means it also has to maintain the same (or a smaller) FP ratio, not just that it does the wrong thing faster.
Re: another bouncing list person
On 7/25/07, Jerry Durand [EMAIL PROTECTED] wrote: It also is trying to claim to come from me, I don't have a POST or OFFICE address here. Begin forwarded message: From: [EMAIL PROTECTED], [EMAIL PROTECTED] It's almost certainly the case that your own mail server did that. The originating mail server very likely sent you From: Post Office (that is, an invalid address which your server attempted to clean up by attaching the local domain in places it thought appropriate).
TQMcube apparently gone dormant
If you read JM's Planet Antispam, you know this already, but: http://www.dnsbl.com/2007/06/status-of-dnsbltqmcubecom-abandoned.html
Re: Blacklist a mailing list
On 7/1/07, dougp23 [EMAIL PROTECTED] wrote: How do I go about blocking the mailing list? here are some headers from a recent message: (It seems everyone on [EMAIL PROTECTED] is getting this junk). Prompted by Doug but directed to no one in particular: Please don't use things like mailinglist.org and especially mydomain.com as either generic examples or as placeholders for whatever your domain really is. There actually *is* a mydomain.com and unless that really is your domain it just causes needless confusion. If for some reason you think its essential to purge references to your domain name, then simply replace them with obvious mark-out like --.com or the like. Thanks.
Re: A different approach to scoring spamassassin hits
On 6/29/07, Tom Allison [EMAIL PROTECTED] wrote: The thought I had, and have been working on for a while, is changing how the scoring is done. Rather than making Bayes a part of the scoring process, make the scoring process a part of the Bayes statistical Engine. As an example you would simply feed into the Bayesian process, as tokens, the indications of scoring hits (binary yes/no) would be examined next to the other tokens in the message. There are a few problems with this. (1) It assumes that Bayesian (or similar) classification is more accurate than SA's scoring system. Either that, or you're willing to give up accuracy in the name of removing all those confusing knobs you don't want to touch, but it would seem to me to be better to have the knobs and just not touch them. (2) For many SA rules you would be, in effect, double-counting some tokens. An SA scoring rule that matches a phrase, for example, is effectively matching a collection of tokens that are also being fed individually to the Bayes engine. In theory, you should not second-guess the system by passing such compound tokens to Bayes; instead it should be allowed to learn what combinations of tokens are meaningful when they appear together. (It might be worthwhile, though, to e.g. add tokens that are not otherwise present in the message, such as for the results of network tests.) (3) It introduces a bootstrapping problem, as has already been noted. Everyone has to train the engine and re-train it when new rules are developed. I've thought of a few more, but they all have to do with the benefits of having all those knobs and if you've already adopted the basic premise that they should be removed there doesn't seem to be any reason to argue that part.
To summarize my opinion: If what you want is to have a Bayesian-type engine make all the decisions, then you should install a Bayesian engine and work on ways to feed it the right tokens; you should not install SpamAssassin and then work on ways to remove the scoring.
Re: My Newly Expanded DNS Blacklist - Who wants to try it?
On 6/16/07, Marc Perkel [EMAIL PROTECTED] wrote: Using my new ideas here's my raw blacklist file. It has about 80k IP addresses and is updated every 10 minutes. http://iplist.junkemailfilter.com/black.txt Just glancing through the list and reversing an IP address whose first two quads I recognize, I see you've blacklisted Red Condor (redcondor.com), a network security and anti-phishing service provider (64.84.16.173). So either they've got a problem they ought to be made aware of, or you do ...
Re: R: Inappropriate use of E-Mail addresses
On 5/13/07, Gregory P. Ennis [EMAIL PROTECTED] wrote: SPF seems very interesting. Does spamAssassin automatically use an SPF record if it exists? There's a plugin. Do I set up an SPF record with whoever manages my MX DNS record? Yes. It's a TXT record. Some DNS hosting companies will set it up for you, some will give you the ability to create a TXT record through their management interface (but you have to figure out what to put in the record yourself), and some don't support TXT records at all. Note that SPF is not a magic bullet. It's not yet that widely adopted, and any MTA that's doing accept-and-bounce for unknown addresses is probably not checking SPF either. You probably also want to look at SenderID. Wikipedia has a reasonable summary.
Re: Weirdsvill
On 4/13/07, Gene Heskett [EMAIL PROTECTED] wrote: Now, I *think* I have that X-Originating-Ip: 193.93.97.195 in my .procmailrc, but it didn't fire. Odd... Is that rule before or after the point at which you run the message through spamassassin? If after, it probably didn't fire because spamassassin moved it out of the top-level message header. You'd have to be looking for X-Originating-IP in the body, then.
Re: Weirdsvill
On 4/13/07, Gene Heskett [EMAIL PROTECTED] wrote: The trail starts at localhost! HTF did they do that? You're looking at the header of the wrapper message created by spamassassin, not at the header of the actual spam (which will be inside a message/rfc822 body part of the message created by spamassassin).
Re: newbie question on spamassassin trainer
On 4/3/07, JOYDEEP [EMAIL PROTECTED] wrote: how can I configure spamassassin to look after the spam and ham folder of all the cyrus mail boxes, so that all the users has their own spamassasin trainer ? it is something like white box and black box per user could any one kindly suggest me how to implement this ? I don't know if sa-learn can read cyrus mailboxes; I suspect it can't. So it's up to you to get the mail out of the mailboxes and into a format sa-learn can read. As for how you do this, you'll have to set up a periodic cron job (overnight may be often enough) to pull the mail out of the mailboxes and feed it through sa-learn. You may be able to do this with logrotate and some pre/postrotate scripts in the config file, but having cyrus in the equation may interfere with that approach.
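A cron job along those lines might look roughly like the following Python sketch. It assumes the mail has already been exported from cyrus into per-user mbox files named spam.mbox and ham.mbox under a directory tree of your choosing; that layout and the function names are hypothetical. Only the sa-learn options --spam, --ham and --mbox are used. Keeping a separate Bayes database per user would additionally require running each command as, or on behalf of, that user, which is not shown here.

```python
import subprocess
from pathlib import Path

def sa_learn_command(kind: str, mbox_path: Path) -> list:
    """Build the sa-learn invocation for one exported mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", "--mbox", str(mbox_path)]

def train_all(export_root: Path, dry_run: bool = True) -> list:
    """Walk <export_root>/<user>/{spam,ham}.mbox (a hypothetical layout)."""
    commands = []
    for user_dir in sorted(p for p in export_root.iterdir() if p.is_dir()):
        for kind in ("spam", "ham"):
            mbox = user_dir / f"{kind}.mbox"
            if mbox.exists():
                cmd = sa_learn_command(kind, mbox)
                commands.append(cmd)
                if not dry_run:
                    # Actually feed the folder to sa-learn.
                    subprocess.run(cmd, check=True)
    return commands
```

Running it with dry_run left on just lists what would be learned, which is a reasonable first step before letting it loose overnight.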
Re: dkim: lookup failed: DNS query timeout for _policy._domainkey.joysticktowers.com
On 3/10/07, Chris [EMAIL PROTECTED] wrote: For some reason when this happens fetchmail will not delete the message after downloading it therefore it just sits there and gets downloaded over and over again and prevents other mail after it from being downloaded. Could this be a) a fetchmail issue Yes, it is. fetchmail won't download messages out of order, it won't mark the message on the server as one it has already seen until it has successfully delivered it locally, and any timeout either in the connection to the server or in the local delivery causes the local delivery to be treated as a failure (so the message isn't marked) and the entire fetchmail process to exit (so no later messages get fetched in that session). You're effectively deadlocked until you either fix the timeout problem or delete the offending message from the server by some other access.
Re: Sorting SA Discussion List Messages
On 3/3/07, Don Ireland [EMAIL PROTECTED] wrote: Every email list I've ever subscribed to has had something in the subject line (usually in square brackets) to identify 1) that it is a mailing list and 2) what list it is. Why doesn't this list have something similar? Because it's a really annoying thing to do and interacts badly with both threading algorithms and with other automated header rewriting that's done by mail readers? I'm on a couple of lists where this kind of tagging is done and there are always threads where the subject has become Re: [List Name] Re: [List Name] Re: [List Name] Silly subject rewrites ad infinitum Just Say No to unnecessary administrative mangling of messages that pass through list exploders.
False positive on LONGWORDS
A technical newsletter about transistors contains the introductory paragraph Use of gallium nitride (GaN) power transistors in microwave applications is expected to increase significantly with recent technology improvements, but lateral double diffuse metal oxide semiconductor (LDMOS) transistors are expected to stay in the lead, according to a new report by RF Design editorial director Ashok Bindra, and technical editor Mark Valentine. The duo's report also addresses the general GaN market and exposes the latest advancements in complementary metal oxide semiconductor (CMOS) radio-frequency integrated circuits (RFICs) and integrated passive components. This integrated passives technology could potentially displace conventional manufacturing techniques presently used to produce passive RF components. Several substrings including integrated passives technology could potentially displace conventional manufacturing techniques presently match the LONGWORDS regex, for 3.0 points. That seems a bit excessive ... is this worth filing a bugzilla?
Re: Google Summer of Code 2007 ...
On 2/16/07, Justin Mason [EMAIL PROTECTED] wrote: Also, any suggestions from outside the dev team? Anyone got good ideas for new SpamAssassin features that would be good to pay someone to work on for 3 months? http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3785
Re: Yum and Spamassassin
On 2/7/07, Theo Van Dinter [EMAIL PROTECTED] wrote: On Wed, Feb 07, 2007 at 03:04:43PM +, Michael Bartlett wrote: I can't believe the yum package is so out of date, am I missing something? You're running Fedora Core 4 (hey, me too,) which is generally out of date at this point. I'd suggest ditching their package, grabbing the SA tarball, and building your own 3.1.x. The RPMforge project at rpmforge.net has up-to-date versions of spamassassin in a yum repository for most RedHat-derived operating systems including Fedora Core. I don't use FC4 so I don't know specifically about that one, but it's pretty likely.
Re: spamdoptions ???
On 1/23/07, R Lists06 [EMAIL PROTECTED] wrote: On Redhat or CentOS machines would that be under SPAMDOPTIONS ? Using the RPM install of spamassassin from either the CentOS project or rpmforge, you make changes to the spamd command line in /etc/sysconfig/spamassassin, and yes, you place those switches in the assignment to the SPAMDOPTIONS variable. As far as what switches you use, man spamd. For how many spamd children you can run, you probably just have to experiment with gradually increasing the number.
Re: procmailrc question
On 1/10/07, D Ivago [EMAIL PROTECTED] wrote: :0: * ^Subject:.*\[SPAM]\ /dev/null Square brackets have special meaning: [SPAM] is a character class matching one of any of the characters S, P, A, or M. What you need is: :0 * ^Subject:.*\[SPAM\] /dev/null However, I'd not recommend that. Instead, continue to file the spam in a folder, but set up something to discard the contents of the folder on a regular basis. For example, I configure the logrotate package to move the spam folder to a backup name once a day, and discard the oldest backup every few days; so the spam doesn't pile up forever, but I can recover anything that gets mis-filed (and with a threshold of 3.0 you *will* get something mis-filed eventually, even if you have not yet).
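The difference between the character class and the escaped brackets is easy to demonstrate. The sketch below uses Python's re module as a stand-in for procmail's regex matching, which is an assumption of convenience: procmail also ignores case by default, so there the unescaped form would match even more subjects than it does here.

```python
import re

# Unescaped: ".*[SPAM]" means "anything, then one of S, P, A or M".
unescaped = re.compile(r"^Subject:.*[SPAM]")
# Escaped: matches only the literal tag "[SPAM]".
escaped = re.compile(r"^Subject:.*\[SPAM\]")

tagged = "Subject: [SPAM] cheap pills"
innocent = "Subject: Minutes of Monday's meeting"

def matches(pattern, line):
    """True if the pattern is found in the header line."""
    return pattern.search(line) is not None
```

The unescaped pattern matches both sample subjects, because the innocent one happens to contain a capital M; the escaped pattern matches only the tagged one.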
Re: Salesforce web bug
On 12/19/06, Michael Scheidell [EMAIL PROTECTED] wrote: I noticed an email from salesforce has a 'user tracking' web bug in it but it isn't currently detected by SA or SARES Why do you want to consider this a spam sign? I'm just curious.
Re: Salesforce web bug
On 12/20/06, Loren Wilton [EMAIL PROTECTED] wrote: Why do you want to consider this a spam sign? I'm just curious. Bugs in mail messages are generally a suspicious circumstance, and probably good for a fractional point all by themselves. In general any tracking that will auto-identify without the user at least clicking on something is suspicious. In general I'd agree with you, but here we're talking very specifically about SalesForce. Is there evidence, for example, of someone using SalesForce to send spam?
Re: sa-update is broken
On 12/18/06, Christian Eichert [EMAIL PROTECTED] wrote: server:~# perl -MCPAN -e 'install LWP::UserAgent' Can't locate object method install via package LWP::UserAgent at -e line 1. # perl -MCPAN -e shell cpan> install LWP::UserAgent
Ongoing trusted_networks confusion
Maybe the name of that config option should be changed to truthful_networks.
Re: Easyjet e-mail scoring very high
On 12/1/06, Chris Lear [EMAIL PROTECTED] wrote: In fact, every full stop in the html is represented as &#46; for some reason. In SMTP, a dot all by itself on a line is interpreted as the end of the message. The SMTP client is supposed to double any such dot that is truly present in the message body, and the SMTP server then removes the extra dot for final delivery. My guess would be that (a) they have a crappy SMTP client, probably something written in Java by a junior programmer who doesn't know a protocol from a parsnip, to send mail directly from a web server platform; and (b) they once had a message truncated because there was a dot in the wrong place; so (c) because they don't know how to fix the crappy SMTP client, they encode all the dots instead. Still wondering though... how do you solve a problem like EasyJet? By doing what you don't want to do: whitelisting.
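For reference, the dot handling a well-behaved client and server perform is tiny. Here is a minimal Python sketch of it, following the SMTP transparency rule as I understand it (the doubling applies to any line that begins with a dot, not only to a line that is a lone dot); the function names are made up for the example.

```python
def dot_stuff(lines):
    """Client side: prefix an extra dot to every line that begins with one."""
    return ["." + line if line.startswith(".") else line for line in lines]

def dot_unstuff(lines):
    """Server side: strip the leading dot the client added."""
    return [line[1:] if line.startswith(".") else line for line in lines]

body = ["Hello", ".", "..and more", "The end."]
stuffed = dot_stuff(body)
```

A lone dot in the body goes over the wire as two dots, so it can no longer be mistaken for the end-of-message marker, and the receiving side restores the original text.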
Re: Bayes failure on hi, it's Somebody spam
On 11/16/06, Jon Trulson [EMAIL PROTECTED] wrote: Hmm, that has not been my experience at all... Bayes (99) is still catching every one for me. In this instance, SpamAssassin is running after POP download from gmail, so I'm only seeing the samples that have already made it through google's filters. That may have something to do with it.
Bayes failure on hi, it's Somebody spam
It looks to me as if the recent spate of pump'n'dump spams are deliberately crafted to avoid being Bayes-learned by spamassassin. In spite of all having different subject lines and senders and other minor differences, once you've learned one of them sa-learn ignores all the rest -- and they all still get a BAYES_00 score for me. I thought I had a pretty good understanding of how SA's Bayes training worked, but this is pretty clearly confusing it somehow.
Re: RPM -vs- CPAN install
On 9/6/06, jdow [EMAIL PROTECTED] wrote: The RPM installs do not seem to include the tools that you get with the CPAN install. The rpmforge project packages the tools as a separate RPM, named, surprisingly enough, spamassassin-tools.
Re: RPM -vs- CPAN install
On 9/6/06, jdow [EMAIL PROTECTED] wrote: From: Bart Schaefer [EMAIL PROTECTED] The rpmforge project packages the tools as a separate RPM, named, surprisingly enough, spamassassin-tools. And then one distro spamassassin-tools was no longer present. I'm not sure what you mean. yum list spamassassin shows me: Installed Packages spamassassin.i386 3.1.5-1.el4.rf installed Available Packages spamassassin-tools.i386 3.1.5-1.el4.rf rpmforge Maybe it is in extras now. If you're talking about RedHat, no, it's not in extras. They don't provide it at all, unless as part of the source RPM. However, as they also don't provide anything newer than 3.0.6, I've already gone looking elsewhere, in this case rpmforge.net. This lack has left me distrustful of distros of late. I've noticed that they all leave little things out of what they package. Mostly I suspect they leave out things about which they're concerned there may be even the slightest possibility of licensing or copyright hassles.
Re: Calling Regex Experts
On 8/24/06, D. J. [EMAIL PROTECTED] wrote: I'm expecting these type of strings for sure: cat dog cat dog dog cat But I may get something like this too: cat cat dog dog dog Essentially I want it to match if anything other than cat or dog is in the string. That constraint means you have to construct a regex that can be anchored at both beginning and end of string, e.g. /\A(\s*(cat|dog)\s*)+\Z/. I'm not sure that ever makes sense in the context of a spamassassin rule, except maybe one matching against a specific header.
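To see the anchored pattern at work, here is the same regex tried in Python, whose re module also understands \A and \Z. This is only a sketch for experimenting with the pattern, not a SpamAssassin rule, and the helper name is invented.

```python
import re

# Same pattern as above: the whole string must consist of cat/dog words.
ONLY_CAT_DOG = re.compile(r"\A(\s*(cat|dog)\s*)+\Z")

def only_cats_and_dogs(text: str) -> bool:
    """True when nothing other than 'cat', 'dog' and whitespace appears."""
    return ONLY_CAT_DOG.match(text) is not None
```

Negating the result gives the "something other than cat or dog is present" test. Note that the pattern does not require whitespace between words, so catdog is accepted too.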
Re: What changes would you make to stop spam? - United Nations Paper
On 8/2/06, Marc Perkel [EMAIL PROTECTED] wrote: Here's what I've written so far. Deadline is today. Still working on it. http://wiki.ctyme.com/index.php/UN_Spam_Paper Rather than extend POP/IMAP to send mail, which quite frankly will never happen (contact the author of the IMAP protocol, Mark Crispin, if you want the full rant -- you shouldn't have any trouble finding his email address if you search), please suggest that the SUBMIT protocol be used. RFC 2476 and 4409. See also RFC 4405.
Re: What changes would you make to stop spam? - United Nations Paper
On 8/2/06, Marc Perkel [EMAIL PROTECTED] wrote: doesn't require a separate connection on a separate port. Why use 2 protocols when you can use one? Indeed, why don't we just close all ports except 80 and layer everything atop HTTP? For heaven's sake, Marc. This debate about using IMAP/POP for outbound mail already happened more than a decade ago. If you can't be bothered to look through the archives of the IETF lists that discussed creation of these protocols, at least take the word of those of us who were present at the time: It was a poor idea then, it's still a poor idea, and you'd be much better off spending your time pushing something else. And NONE of this is relevant to SpamAssassin any more. Take it somewhere else.
Re: Why is there so much hype behind Image spam
On 7/16/06, John Andersen [EMAIL PROTECTED] wrote: The comment was off-hand and not researched. One of my earliest ISPs recommended Spamassassin when it was just a bunch of scripts written by some woman whose name escapes me. I suspect you're thinking of SpamBouncer. Catherine A. Hampton. Other than possibly being a source of inspiration, SpamBouncer has nothing to do with SpamAssassin.
Re: The best way to use Spamassassin is to not use Spamassassin
On 7/12/06, Marc Perkel [EMAIL PROTECTED] wrote: Catchy subject line eh? What you really mean is the best way to use SpamAssassin is as an analysis tool. Which of course is what the best way to use it always was. You're just abstracting the analysis rather than applying it directly. The reaso [sic] of spam is rejected before I get to SA through a fairly large number of tricks that allow me to determine with near 100% accuracy things that are spam. There's been a fellow over on the procmail list claiming for well over a year now that he can get better accuracy than SA through message header analysis alone, based on rules he's compiled by analyzing what gets through the rules he already has. Just like you've done so far in this thread, though, all he'll do is claim that without providing any details -- which he says is because he doesn't want to give away all the hours of his work that went into it. It is none mostly through behavior and karma related lists. Being host blacklisted or URI blacklisted. Similarly, I have created a whitelisting system that tracks hosts and other aspects of the message The trick, of course, is to be able to automatically feed back into these lists based on the output of the analysis tool. If someone has to do it by hand, it's a losing proposition.
Re: The best way to use Spamassassin is to not use Spamassassin
On 7/12/06, Marc Perkel [EMAIL PROTECTED] wrote: Bart Schaefer wrote: There's been a fellow over on the procmail list claiming for well over a year now that he can get better accuracy than SA through message header analysis alone His claim might well be true. Oh, I have no doubt that he's speaking truthfully. Problem is that if no one else can look at what he's done, there's no way to confirm or deny my own suspicion, which is that most of his rules are only that accurate in his specific environment. That is, I tend to expect that if you picked up his rules and dropped them on another machine halfway around the world with a different ISP and mail routing chain, their accuracy would plummet.
Re: sa-learn script
On 7/11/06, Nicholas Payne-Roberts [EMAIL PROTECTED] wrote: Does anybody know a good way to script sa-learn to daily check on junk e-mail folders? I use logrotate because it handles automatically removing or renaming the files after learning, but I don't use maildir-format folders so I can't provide a tested configuration. Something like this:

    notifempty
    missingok
    /home/vpopmail/domains/*/*/.Junk E-mail/cur/* {
        rotate 0
        daily
        nomail
        prerotate
            spamc -t 20 -l -L spam $1
        endscript
    }

Be careful of that rotate 0 which means to delete the file. If there's any chance that a false-positive might need to be recovered later, you probably want to increase that and add an olddir directive to tell logrotate where to archive the spam. If you have logrotate running regularly as a system process, that config would go in (for example, may vary by OS distribution) /etc/logrotate.d/sa-learn. If not or if you have to run logrotate as a user other than root, put that in a file somewhere in the correct user's home directory (I like to use a subdirectory named .logrotate and name the file conf) and install a crontab entry for that user, similar to

    1 3 * * * logrotate -f --state $HOME/.logrotate/state $HOME/.logrotate/conf
Re: Warnings in procmail log
On 7/9/06, Geoff Soper [EMAIL PROTECTED] wrote: Apologies, I've little idea of what is traditional and didn't realise my situation was unusual! I didn't say it was unusual ... it's just not the assumed default state of affairs. I mean all users should have the same rules and spam threshold, subject rewriting setting etc. Including sharing a bayes database (if you're using bayes)? In that case all you have to do is make sure that the user who is running spamassassin is not root and has a writable home directory. If your conjecture that procmail is running as the user popuser is correct, make sure that popuser does not have its home directory set to /.
Re: Warnings in procmail log
On 7/8/06, Geoff Soper [EMAIL PROTECTED] wrote: .qmail contains the lines: | true ./Maildir/ Caveat: I don't use qmail, and don't even particularly like qmail, so what I'm about to say are really educated guesses rather than definitive answers. which I've altered to: | true | /usr/bin/procmail -m ./.procmailrc No, don't use the -m option. Just use | /usr/bin/procmail and let procmail figure out where the $HOME/.procmailrc file is on its own. If you want any options to procmail there at all, you want the -d recipient option (where you'll have to get the value for recipient from qmail somehow, I don't know how). Incidentally, I have no idea what the purpose of that pipe to true is, and I suspect you should just remove it. and in that .procmailrc : DIR=./Maildir/ What exactly do you think that's accomplishing? If you never refer to $DIR again anywhere, this is meaningless. If you want to change directories, assign to MAILDIR. If you are trying to force procmail to deliver in maildir format, I think what you want is DEFAULT=$HOME/Maildir/ I'm not sure about the $HOME part, but DEFAULT should never be a relative path (never one starting with ./ or with no directory reference at all). I've no desire to run different configurations for different users or addresses, the single configuration is fine, I just want to solve these errors I'm seeing in the procmail_log file. Where is this ./.procmailrc file that you are trying to read with the -m option? That is, what do you expect the current directory (./) to be at the time procmail runs? If you really want exactly this same config for all users, then you should move that ./.procmailrc file (wherever it is) to /etc/procmailrc (with no dot) and insert DROPPRIVS=yes somewhere before the recipe that runs spamassassin, probably at the very top of the file (unless you want all users to write to the same log file as well). 
If you later add things to /etc/procmailrc, you'll need to research whether they belong above or below the DROPPRIVS (below will usually be safe, but not always correct).
Re: Warnings in procmail log
On 7/8/06, Geoff Soper [EMAIL PROTECTED] wrote:

> Bart Schaefer wrote:
> I think I need to specify the .procmailrc as the .procmailrc file is per e-mail address, not per user or even system-wide

I think we've just uncovered a crucial bit of missing information. You're apparently running procmail in some kind of virtual-user environment, where there is no user login name corresponding to the email address being processed. You need to explain these sorts of things up front. All the answers so far have assumed you have a traditional unix/linux type environment where mail is delivered to individual user accounts that have /etc/passwd file entries, separate home directories, etc. So, forget everything that's been said, and let's start over.

>> Where is this ./.procmailrc file that you are trying to read with the -m option? That is, what do you expect the current directory (./) to be at the time procmail runs?
>
> the .procmailrc file is in /var/qmail/mailnames/domain.tld/test alongside the .qmail file and the Maildir directory

In that case you need to tell spamassassin to look for its configuration files in that location. There may be a way to finagle the options to spamassassin itself to make this work, but the easiest approach is to run spamd:

    spamd --create-prefs --virtual-config-dir=/var/qmail/mailnames/%d/%u

(see man spamd for other options you might want to pass, such as -m to limit the number of simultaneous processes, etc.). This is a daemon that needs to run as a system service; you may already have an /etc/rc.d/init.d/spamassassin or similar script for managing this service. It depends on your OS and whether you built SA yourself or installed it with some kind of package management tool (other than CPAN).
Then in each appropriate .procmailrc file,

    :0fw
    * < 256000
    | /usr/bin/spamc -u [EMAIL PROTECTED]

where you'll need to get the equivalent of [EMAIL PROTECTED] for each virtual address from somewhere; I don't know enough about qmail to tell you how, but if it's not in an environment variable, perhaps you can add it to the procmail command line after the ./.procmailrc and then refer to it here as $1.

> Just to confirm, the .procmailrc file isn't common to all users but the SA setup is.

I'm no longer sure I understand what you consider to be the SA setup.
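To see which directory a given --virtual-config-dir pattern would point at, the substitution can be modelled in a few lines. This is only a sketch and rests on an assumption that should be checked against man spamd: that %u expands to the full user name exactly as spamc -u sent it, %l to the part before the "@", %d to the part after it, and %% to a literal percent sign. The function name is made up for illustration and is not part of SpamAssassin.

```python
def expand_virtual_config_dir(pattern: str, username: str) -> str:
    """Expand %u, %l, %d and %% the way spamd is ASSUMED to (see man spamd)."""
    local, _, domain = username.partition("@")
    substitutions = {"u": username, "l": local, "d": domain, "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # A recognised escape consumes two characters; anything else is copied.
        if ch == "%" and i + 1 < len(pattern) and pattern[i + 1] in substitutions:
            out.append(substitutions[pattern[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for pat in ("/var/qmail/mailnames/%d/%u", "/var/qmail/mailnames/%d/%l"):
        print(pat, "->", expand_virtual_config_dir(pat, "test@domain.tld"))
```

If that reading of the man page is right, and spamc -u is given the full address, then the /var/qmail/mailnames/domain.tld/test layout described in this thread corresponds to %d/%l rather than %d/%u.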
Re: Looking for Turn-key SA solution
On 7/5/06, Burton Windle [EMAIL PROTECTED] wrote: Does anybody know of a vendor that sells boxes with SpamAssassin pre-installed, with a pretty GUI with quarantine ability? (My company won't allow home-brewed solutions, as they want a vendor to call if I get hit by a spam bus). It's not exactly a vendor solution, but: http://www.vmware.com/vmtn/appliances/directory/255
Re: Warnings in procmail log
On 7/5/06, jdow [EMAIL PROTECTED] wrote: You need DROPPRIVS=yes somewhere near the front of your .procmailrc. No, you don't. By the time the .procmailrc is read, privileges have already been dropped. The only place you need DROPPRIVS=yes is in /etc/procmailrc in the event that you want to give up privileges before the end of that file has been reached. You should not have an /etc/procmailrc file at all unless you have carefully studied what belongs there.
Re: spamassassin-3.0.4-1.el4
On 7/3/06, Kaushal Shriyan [EMAIL PROTECTED] wrote:

> I have spamassassin-3.0.4-1.el4 installed by default in RHEL4 Linux

There have been updates since then. Current is spamassassin-3.0.6-1.el4 -- but note that I recently reported that spamd in that package has a problem with whitelist_from_rcvd directives leaking from one user to another. You might want to install the 3.1.3 RPM from rpmforge.net.

> box, How do i configure spamassassin and integrate it with Sendmail

First you need to run (as root)

    chkconfig spamassassin on
    service spamassassin start

The RedHat (and rpmforge) spamassassin packages supply some files

    /etc/mail/spamassassin/spamassassin-default.rc
    /etc/mail/spamassassin/spamassassin-spamc.rc

There's nothing especially magic about these, but the intention is that users who want to pass their mail through SA can insert into $HOME/.procmailrc a line such as

    INCLUDERC=/etc/mail/spamassassin/spamassassin-spamc.rc

and not have to worry about the details. If you as system administrator want to run spamc for all users, you'd place that line in the /etc/procmailrc file. Just *before* that line, you should also have the line

    DROPPRIVS=yes

otherwise spamassassin will run as root rather than as the individual user whose mail is being scanned.
Re: spamassassin-3.0.4-1.el4
On 7/3/06, Kaushal Shriyan [EMAIL PROTECTED] wrote:

> Thanks for the quick turnaround. I installed the latest spamassassin version for RHEL4
>
>   [EMAIL PROTECTED] kaushal]# rpm -qa | grep spam
>   spamass-milter-0.3.0-1.2.el4.rf
>   spamassassin-3.1.3-1

Where did you get that spamassassin RPM?

    schaefer[502] rpm -qf /etc/mail/spamassassin/spamassassin-spamc.rc
    spamassassin-3.1.3-1.el4.rf

The contents of spamassassin-spamc.rc are very simple and have not changed from the 3.0.6-1 RPM:

    # send mail through spamassassin
    :0fw
    | /usr/bin/spamc

> I also dont have procmailrc under /etc.

That's normal, there is no default global procmail configuration. You can just create that file. However, if you are using the milter, then you should NOT run spamc from procmail, so you don't need to make any changes to procmail configuration in that case.

> How do i proceed here

I've not used spamass-milter, so I don't know what is needed to configure that. You will at least need to make sure the spamd process is running if you are using the milter (chkconfig and service start as I mentioned previously), and you probably need to restart sendmail.
Re: spamassassin-3.0.4-1.el4
On 7/3/06, Kaushal Shriyan [EMAIL PROTECTED] wrote:

> I did what you said exactly and its up and running, How do i test all this configurations for SPAM

    sendmail YourLocalEmailAddress < /usr/share/doc/spamassassin-3.1.3/sample-spam.txt
Re: RFC: spamd disables virtual-config when no @ in user name
On 7/2/06, martin f krafft [EMAIL PROTECTED] wrote: On mail systems with virtual and local users, it's not easily possible to run per-user spamc with user configuration. I run two copies of spamd with different -p port options, and point the virtual users' spamc at the port corresponding to the spamd with the --virtual-config-dir option.
Re: From header being added..???
On 7/2/06, [EMAIL PROTECTED] wrote:

> I'm not 100% positive this is even a SA issue but it is driving me up the wall. Some mailings are having this added right after the SA report. It usually isn't an issue except that some users fetch their mail from Exchange and these mailings are showing up munged.

Did you recently change from SA 3.0 or earlier to 3.1? In 3.1 SA began inserting its headers at the TOP of the filtered message header, rather than at the end. This is more closely conformant to the IETF mail standards and helps reduce breakage of message signature schemes such as DKIM. The point is that you may have had a problem with mail processing for a while, and the behavior of earlier versions of SA simply masked it.

> X-Spam-Report:
>  * 1.5 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date
>  * 1.8 FUZZY_AFFORDABLE BODY: Attempt to obfuscate words in spam
> *ALL rules removed for simplicity*
> >From userid Sun Jul 2 23:47:35 2006
> Return-Path: [EMAIL PROTECTED]
> Received: from k5u5r1

See the ">From" header? In all likelihood this was a "From " line (with no leading ">") at the time SA was handed the message, and was rewritten as a ">From " line by the local delivery processing after SA was already finished with the message. The latter part is correct if the mail is going to be placed in a unix-style flat-file mailbox, but the former is wrong: The "From " line, if any, should not be added until final delivery of the message to the mailbox file. Look at the processing that's upstream of spamassassin or spamc. In particular check whether the message is ever written to a file and then read back in to the processing stream. The culprit is likely to be whatever does that write.
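One way to hunt for the stage that introduces the envelope line is to capture the message at each point in the pipeline and check where a "From " or ">From " line sits in the header block. The following is a rough sketch of that check; the function name and the sample text are made up for illustration.

```python
def find_mbox_from_lines(raw_message: str):
    """Return (line_number, line) for each 'From ' or '>From ' line in the headers."""
    hits = []
    for number, line in enumerate(raw_message.splitlines(), start=1):
        if line == "":
            break  # the first blank line ends the header block
        if line.startswith("From ") or line.startswith(">From "):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    # Hypothetical capture taken after the filter has run.
    sample = (
        "X-Spam-Report: (rules omitted)\n"
        ">From userid Sun Jul  2 23:47:35 2006\n"
        "Return-Path: <sender@example.org>\n"
        "\n"
        "body text\n"
    )
    for number, line in find_mbox_from_lines(sample):
        print(number, repr(line))
```

A hit on any line other than the first means something was inserted above an envelope line that had already been written, which points at the step that wrote the message to a file before filtering.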
Re: White List and Yellow List DNS Servers - Proposal
On 6/30/06, Marc Perkel [EMAIL PROTECTED] wrote:

> Who likes this idea?

Evidently habeas.com does, as that's now their business model. Also Bonded Sender (I think they changed the name recently, but I forget to what). And I believe the ISIPP maintains several such lists. Do a Google search on "reputation service".
Re: internal/trusted again, MSA tested for SPF ?
On 6/30/06, Daryl C. W. O'Shea [EMAIL PROTECTED] wrote: OK, I see now that you want to unconditionally trust the MSA *and* all hosts after it. Which is reasonable if the MSA is just an MSA. For whatever reason you don't want to rely on auth tokens, etc. Seems reasonable to me. That would mean that SA must be able to verify the Received: chain as far back as the MSA, wouldn't it? Otherwise forging a Received: for the MSA would bypass all the network checks.
spamd not properly resetting whitelist?
We recently installed a new CentOS4 server, which comes with SA 3.0.6 prepackaged, to serve as our local mail store (runs sendmail, clamassassin, spamd, and an imap server). The perl version is 5.8.5, and it's an x86_64 platform. Since migrating our users to this machine we frequently have spam mis-classified as ham with USER_IN_WHITELIST as the culprit. There are no whitelist_from or whitelist_from_rcvd directives at the /etc/mail/spamassassin/* level, and only one user who has any of these directives in his user_prefs file, but *every* user has at random times had spam mismarked as whitelisted. In every case the spam so mis-marked was forwarded from a role address on another server that matches one of the whitelist_from_rcvd lines in the single aforementioned user_prefs. If the same misclassified message is put through spamassassin rather than spamd, or even if it is run through spamc a second time, USER_IN_WHITELIST disappears (the rest of the rules hit remain unchanged). Looking at the mail logs, there does seem to be a correlation: If the first message scanned by a new spamd child is scanned on behalf of the user who has whitelist lines in his user_prefs, every other message scanned by that child is misclassified. As long as the very first scan is not for this user, the child behaves properly with respect to the whitelist settings. I don't see any bugzilla for this using a search on USER_IN_WHITELIST. Has anyone else encountered this issue? Can anyone verify that it's fixed in 3.1?
Re: White List and Yellow List DNS Servers - Proposal
On 6/30/06, Marc Perkel [EMAIL PROTECTED] wrote:

> Yeah - but what I'm thinking of is something that is automatic and reputation based rather than paying someone to certify you. In other words your server gets whitelisted because you never send spam.

Paid or otherwise, how do you get on the list in the first place? You obviously used some criteria based on your own server logs to determine which IPs never send spam -- but "never" is a long time, and in some cases "spam" is subjective (people report all kinds of stuff as spam for all kinds of reasons).
Re: trusted_networks confusion
On 6/29/06, Daryl C. W. O'Shea [EMAIL PROTECTED] wrote: EVERYTHING after an MX MUST be listed as BOTH trusted and internal networks. Under what circumstances would one list something as internal but not trusted?
Re: trusted_networks confusion
On 6/29/06, Daryl C. W. O'Shea [EMAIL PROTECTED] wrote: Bart Schaefer wrote: Under what circumstances would one list something as internal but not trusted? NEVER. Newer versions of SA won't even allow you to make that misconfiguration. Ah, good. That's as I expected. (So why doesn't SA simply always merge internal_networks into trusted_networks? It seems a waste of effort to have to manually list the internal_networks in both places.)
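Until the two lists are merged automatically, the "internal but not trusted" mistake is easy to check for outside SA. Here is a small sketch using Python's standard ipaddress module. It assumes both lists are written in plain CIDR notation (SpamAssassin also accepts other forms, which this sketch does not parse), and the function name is made up.

```python
import ipaddress


def internal_not_trusted(internal, trusted):
    """Return the internal_networks entries not covered by any trusted_networks entry."""
    trusted_nets = [ipaddress.ip_network(entry) for entry in trusted]
    missing = []
    for entry in internal:
        net = ipaddress.ip_network(entry)
        covered = any(
            net.version == t.version and net.subnet_of(t) for t in trusted_nets
        )
        if not covered:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Example values only; substitute the contents of your own config.
    print(internal_not_trusted(["192.168.1.0/24", "10.0.0.5/32"], ["192.168.0.0/16"]))
```

An empty result means every internal network is also trusted, which is the only arrangement the thread above considers valid.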
Re: Not just use_bayes_rules 0
No one has any comments at all?

-- Forwarded message --
From: Bart Schaefer [EMAIL PROTECTED]
Date: Jun 23, 2006 10:49 PM
Subject: Not just use_bayes_rules 0
To: Spamassassin Users List users@spamassassin.apache.org

I want to make sure I'm not misinterpreting something else before I report this as a bug. I just tried

    use_bayes 1
    use_bayes_rules 0

The effect of this seems to be that NONE of the rules are applied, except whitelist_from and blacklist_from. I had assumed it would just turn off the BAYES_* rules, as if they had all been given zero scores.

    use_bayes_rules ( 0 | 1 ) (default: 1)
        Whether to use rules using the naive-Bayesian-style classifier built into SpamAssassin. This allows you to disable the rules while leaving auto and manual learning enabled.

Under what circumstances would one want to disable ALL the rules while still leaving auto-learning enabled? What good could possibly come of it?
Trouble with UNwhitelist_from_rcvd
The short of it is that I can't get unwhitelist_from_rcvd to unwhitelist anything. Here's the situation: We have a brand-new machine that's going to be swapped in as our mail server. We're trying to test everything thoroughly before we switch over to it. To avoid any loss of mail, I have a test user on the old machine (call it X) with procmail recipes to forward all mail to the same user on the new machine (Y). All mail sent on the LAN is masqueraded so the headers say it is from the brasslantern.com domain. The test user is named untrusted-relay. On both X and Y, the user_prefs file has

    whitelist_from_rcvd [EMAIL PROTECTED] brasslantern.com

Now, the trouble is, that to test SA I forward from X to Y *before* SA processing. Trimmed excerpts from spamassassin -D output on a spam message (removed stuff like SPF query failing to load, etc., and changed machine name to Y):

    debug: SpamAssassin version 3.0.6
    debug: Score set 0 chosen.
    [...]
    debug: is Net::DNS::Resolver available? yes
    debug: Net::DNS version: 0.48
    debug: all '*From' addrs: [EMAIL PROTECTED] [EMAIL PROTECTED]
    debug: Running tests for priority: 0
    debug: running header regexp tests; score so far=0
    [...]
    debug: forged-HELO: from= helo=0630.com by=brasslantern.com
    debug: registering glue method for check_hashcash_value (Mail::SpamAssassin::Plugin::Hashcash=HASH(0x1b9c270))
    debug: all '*To' addrs: [EMAIL PROTECTED] [EMAIL PROTECTED]
    [...]
    debug: running body-text per-line regexp tests; score so far=-99.702
    debug: running uri tests; score so far=-99.702

The reason for the -99.702 is because the first of the two '*From' addresses matches the whitelist_from_rcvd rule. So I tried adding

    unwhitelist_from_rcvd [EMAIL PROTECTED] brasslantern.com

but this does not change anything. In fact I've tried every variant of unwhitelist_from_rcvd that I can think of, to no effect. The only thing that changes the score is removing the whitelist_from_rcvd directive.
I've searched bugzilla without finding anything about unwhitelist for anything more recent than 2.60. Anybody have any clues what's going on, or what I'm doing wrong?
Re: Trouble with UNwhitelist_from_rcvd
On 6/23/06, Daryl C. W. O'Shea [EMAIL PROTECTED] wrote:

> Did you read the Mail::SpamAssassin::Conf perldoc?

Yes ... so what you're saying is, "previously used in" means "written in the config file entry", not "used in spamassassin when matching". The phrase "the address" is what threw me; the strings in whitelist_from_rcvd are patterns that match addresses, they aren't addresses. OK, so that leaves the question: How do I whitelist *all but one* address from a given domain? I don't think I want blacklist_from because I don't actually want to blacklist this address; I just want to NOT whitelist it.
Re: Trouble with UNwhitelist_from_rcvd
On 6/23/06, Daryl C. W. O'Shea [EMAIL PROTECTED] wrote: Well you could s/address/address pattern/. I could, but plainly what I did was s/used in/used in processing/, because it seemed a whole lot more intuitive for it to function that way. Ah, well. This will probably change in a future version to do what you want instead. Does anyone have an alternate suggestion in the meantime? I don't think I can wait for SA3.2 before I roll out this new server.
Re: Black Copy filtering problem
On 5/28/06, Phil (Sphinx) [EMAIL PROTECTED] wrote: I really don't understand. I haven't attempted to figure out what the SARE rule is doing, I'm afraid. Do you think I should ask the exim-users list ? If the goal is to limit the volume of mail that any particular user can cause to be delivered, and exim is your MTA, then yes, the exim list would be the place to ask.
Virtual user config and auto-whitelist (again)
A while ago, I asked about updating the AWL when using spamd --virtual-config-dir. The discussion got sidetracked onto the topic of the obsolete -a option and the AWL plugin, and consequently my original question never got a satisfactory answer. Here it is again: On 4/26/06, Bart Schaefer [EMAIL PROTECTED] wrote: I've recently switched from running spamd on our mail server machine, where all users have direct access to their SA config in their home directory, to running spamd on a second machine and using --virtual-config-dir for user configuration. (SA 3.1.1) The only problem this has posed is that there's no convenient way for users to modify entries in the auto-whitelist file. Some spam (mostly mortgage offers with obfuscated text) that came in before bayes was retrained got scored low, and consequently the AWL scores are pulling the total score for new spam from the same source back down below the 5.0 threshold in spite of it hitting BAYES_90 and above. I've resorted to deleting the auto-whitelist files from the virtual config dir when someone notices this effect, but that's hardly a scalable solution. Is there another approach I don't know about?
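The arithmetic behind that effect is short enough to write down. As documented for the AWL plugin's auto_whitelist_factor setting (default 0.5), the final score is the message's own score moved part of the way toward the sender's long-term mean. The sketch below just restates that formula; the example numbers are invented.

```python
def awl_adjusted_score(score: float, sender_mean: float, factor: float = 0.5) -> float:
    """finalscore = score + (mean - score) * factor, per the AWL plugin documentation."""
    return score + (sender_mean - score) * factor


if __name__ == "__main__":
    threshold = 5.0
    # Invented example: early spam from this source averaged 1.0 before bayes
    # was retrained; a new message from the same source scores 7.0 on its own.
    final = awl_adjusted_score(7.0, 1.0)
    print(final, "spam" if final >= threshold else "not spam")
```

With those numbers a message that would have been caught at 7.0 ends up at 4.0, which is why the stale low average has to be corrected or removed before the AWL stops working against the retrained Bayes results.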
Re: SA and Bayes in a multi-user environment
Bayes is only as accurate as its trainer(s). http://www.jgc.org/blog/2006/05/theres-one-born-every-minute-spam-and.html
sa-update vs. RDJ -- Default rules directory changed somehow?
I think there's some kind of conflict between sa-update and RulesDuJour that has borked my spamassassin installation, but I can't figure out how. This morning after RDJ restarted spamd, spamc started returning messages with ONLY the spamassassin version header added, not the score report. Running spamc -R by hand, I found that spamd was returning "no report template found". So I ran spamassassin -D and discovered that the SARE rules are being loaded out of /etc/mail/spamassassin, but none of the default rules are being loaded.

    [24302] dbg: config: using /var/lib/spamassassin/3.001001 for sys rules pre files
    [24302] dbg: config: using /var/lib/spamassassin/3.001001 for default rules dir

What? The default rules dir is supposed to be /usr/share/spamassassin/. That's where it was before the restart this morning. I've moved the SARE rulesets out of the way, grepped for /var/lib in all the config files that the dbg log says are being loaded, run SA as two different users, etc., and I can't find anything that's changing the default rules directory. There's nothing in /var/lib/spamassassin/3.001001 except the sa-update subdirectories with a single MIRRORED.BY file in one of those, so no wonder I'm not getting any rules loaded. What the heck is going on here? Where am I failing to look?
Re: So, when do we start handling [dot] in a URI
On 5/12/06, jdow [EMAIL PROTECTED] wrote:

> jdow And you propose we do what instead?

Look for other characteristics of the messages that could be filtered. I haven't seen any of these spams, so I don't know what those might be, but this can hardly be the *only* thing the spammer is doing. It's just the one that jumped out as obvious. But it's also the one that's likely to be easiest to mutate rapidly, so it's probably the worst one to attack.
Re: sa-update vs. RDJ -- Default rules directory changed somehow?
On 5/13/06, Bart Schaefer [EMAIL PROTECTED] wrote:

> I think there's some kind of conflict between sa-update and RulesDuJour that has borked my spamassassin installation, but I can't figure out how.

Apparently the conflict is only that RDJ restarts spamd automatically, but sa-update does not.

> What? The default rules dir is supposed to be /usr/share/spamassassin/.

I finally grepped spamassassin itself and found:

    Default configuration data is loaded from the first existing directory in:
        /var/lib/spamassassin/3.001001
        /usr/share/spamassassin
        /usr/share/spamassassin
        /usr/local/share/spamassassin
        /usr/share/spamassassin

(Why is /usr/share/spamassassin in that list three times?) Well, guess what. sa-update creates the /var/lib/spamassassin/3.001001 directory if it does not exist, rather than finding the directory that does exist and using that. I didn't notice this at first because spamd didn't restart after sa-update, and RDJ didn't do anything new until yesterday. (This is a fairly recent SA reinstall, and an even more recent RDJ install.) Yet the CPAN install of spamassassin uses /usr/share/spamassassin for the installation. Surely the install ought to use the same directory that sa-update is going to create, or vice-versa?
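The failure mode follows directly from "first existing directory" semantics: existence is tested, contents are not. The sketch below reproduces that in a throwaway temporary tree so it can be run anywhere. The directory and file names inside it are stand-ins rather than the real paths, and the helpers are written for illustration, not taken from SpamAssassin.

```python
import os
import tempfile


def pick_rules_dir(candidates):
    """Return the first candidate that exists as a directory, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def has_rule_files(path):
    """True if any .cf file exists anywhere under path."""
    for _root, _dirs, files in os.walk(path):
        if any(name.endswith(".cf") for name in files):
            return True
    return False


def demo():
    """Build a stand-in for the situation above; return (chosen dir name, has rules)."""
    with tempfile.TemporaryDirectory() as top:
        update_dir = os.path.join(top, "var_lib")     # stands in for the sa-update area
        install_dir = os.path.join(top, "usr_share")  # stands in for the installed rules
        os.makedirs(os.path.join(update_dir, "updates_spamassassin_org"))
        os.makedirs(install_dir)
        with open(os.path.join(install_dir, "example_rules.cf"), "w") as handle:
            handle.write("# placeholder rule file\n")
        chosen = pick_rules_dir([update_dir, install_dir])
        return os.path.basename(chosen), has_rule_files(chosen)


if __name__ == "__main__":
    print(demo())
```

The empty update directory wins the search even though the install directory is the only one holding rules. Running something like has_rule_files against the real update directory before restarting spamd would catch the problem early.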
Re: sa-update vs. RDJ -- Default rules directory changed somehow?
On 5/13/06, Theo Van Dinter [EMAIL PROTECTED] wrote:

> On Sat, May 13, 2006 at 10:57:11AM -0700, Bart Schaefer wrote:
>> Well, guess what. sa-update creates the /var/lib/spamassassin/3.001001 directory if it does not exist, rather than finding the directory that does exist and using that. I didn't
>
> Of course. sa-update *only* uses the /var/lib area. It doesn't care about what other rules you already have installed or where they are.

But surely there's some kind of disconnect here. sa-update creates an empty directory that spamassassin (and spamd) then uses preferentially to the one that really has the rules in it.

>> Yet the CPAN install of spamassassin uses /usr/share/spamassassin for the installation. Surely the install ought to use the same directory that sa-update is going to create, or vice-versa?
>
> No. sa-update is optional and writes stuff to its own area separate from the installation of SA.

In that case I would argue that either (a) running sa-update should not create a directory when there are no updates to populate it, or (b) running sa-update should copy the existing set of rules into the update directory.

> You may want to check out http://wiki.apache.org/spamassassin/RuleUpdates which talks about sa-update, how it works, etc.

I did, before the first time I ran it. That page explains about setting up channels, gives an example of running sa-update (I didn't bother to restart spamassassin after I ran it, because it reported there were no updates available) and goes on to say:

    * Currently, for 3.1.1 and 3.2.0, to use any channel for updates requires that updates.spamassassin.org also be used. This is because once the update directory exists, the SpamAssassin modules expect to find all rules in that directory.

Nowhere does it say that it creates this directory and leaves it empty when there are no updated rules. Nowhere in man sa-update does it say that either. How was I supposed to realize that running sa-update would leave me with a crippled installation?
Re: sa-update vs. RDJ -- Default rules directory changed somehow?
On 5/13/06, Theo Van Dinter [EMAIL PROTECTED] wrote:

> It's not empty if the download is successful. I believe there's a ticket about changing the behavior so an empty directory isn't left behind if the first attempt to do an update fails.

Sounds good.

>> In that case I would argue that either (a) running sa-update should not create a directory when there are no updates to populate it, or
>
> I'd have to double check, but for (a), I believe that happens already. Having no updates available doesn't create the directory. However, what's more likely is that there's an upgrade available but the download failed.

Was there an update available on May 8? That's when I ran sa-update last. It just happens to have been most of a week before anything else caused spamd to restart. I'm pretty sure that I got the exit code 1 from sa-update; I'm quite sure that I *didn't* get an exit code of 4 or more. I ended up with:

    /var/lib/spamassassin/3.001001/updates_spamassassin_org/ (empty directory)
    /var/lib/spamassassin/3.001001/updates_spamassassin_org.tmp/MIRRORED.BY

Having removed the entire 3.001001 tree, I just re-ran sa-update and now I have what appears to be the correct update:

    /var/lib/spamassassin/3.001001/updates_spamassassin_org/ (lots of .cf files)
    /var/lib/spamassassin/3.001001/updates_spamassassin_org.cf

So I'm confused.

> If you're running 3.1.0, sa-update acts completely differently and there are no updates available for it anyway. If you're running 3.1.1, there are updates available. If you're running 3.2.0, there are updates available. So the only thing that makes sense here is that the download failed, which is documented in the wiki page.

I'm running 3.1.1.
Re: So, when do we start handling [dot] in a URI
On 5/12/06, Bret Miller [EMAIL PROTECTED] wrote: Seems spammers have taken up to doing what many of us have in posting e-mail addresses, putting [dot] instead of the . in the URL and telling people to replace it Gosh, exactly what regular people have been doing on web sites and in news/list postings for years, to prevent spammers from harvesting their addresses. So now that the spammers are using our own defenses against us, you suggest that we should invent the technology to defeat those defenses? And *then* what happens?
Re: So, when do we start handling [dot] in a URI
On 5/12/06, Kai Schaetzl [EMAIL PROTECTED] wrote:

> Bart Schaefer wrote on Fri, 12 May 2006 07:34:05 -0700:
>> So now that the spammers are using our own defenses against us, you suggest that we should invent the technology to defeat those defenses?
>
> What's there to invent? The point is that these need to be identified as URI. So, convert to URI and then lookup in SURBL.

It just seems like a useless rathole to go down.

(1) Website maintainer uses technique X to obscure addresses on his site.
(2) Spammer notices that his harvester failed to decrypt X.
(3) Spammer copies technique X and uses it to obscure his spam.
(4) SA programmer devises a way to decrypt X to block the spam.
(5) Spammer copies algorithm from SA into his address harvester.
(6) Website maintainer starts getting spam, so he devises a new X.
(7) Repeat at (1).
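For a sense of scale, step (4) for the "[dot]" variant is only a few lines, which cuts both ways: it is cheap to add and equally cheap for a harvester to copy at step (5). The sketch below is an illustration only, not SpamAssassin's URI handling, and the bracket styles it accepts are a guess at what spammers might use.

```python
import re

# Matches "[dot]", "(dot)" or "{dot}" in any letter case, with optional
# whitespace around and inside the brackets.
_BRACKETED_DOT = re.compile(r"\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*", re.IGNORECASE)


def deobfuscate_dots(text: str) -> str:
    """Replace bracketed 'dot' tokens with a literal '.' so a URI matcher can see the host."""
    return _BRACKETED_DOT.sub(".", text)


if __name__ == "__main__":
    print(deobfuscate_dots("visit www[dot]example[dot]com today"))
```

The rewritten text could then be handed to whatever extracts host names for a SURBL lookup. A variant that spells the separator any other way slips straight past, which is the mutation problem described above.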
Re: Those Re: good obfupills spams (bayes scores)
Incidentally, the FAQ answer for HowScoresAreAssigned on the SA wiki is out of date.
Re: unpacking spam attachments for sa-learn
On 5/1/06, Jeff Portwine [EMAIL PROTECTED] wrote:

> I tried ripmime, and it does extract the attachments but it throws away all of the header information and gives me only the attachment by itself.

I wrote an extractor in procmail for simple (as in, it doesn't handle nested structure well) MIME body parts. http://www.well.com/user/barts/email/mimepart.txt You'd do something like

    CONTENT_TYPE=message/rfc822
    INCLUDERC=mimepart.txt
    RESULT=`echo $BODY_PART | spamc -L spam`

You probably want to avoid doing this on very large messages, as it does slurp the entire message into a variable.
Re: Those Re: good obfupills spams
On 4/29/06, List Mail User [EMAIL PROTECTED] wrote:

> While SA is quite robust largely because of the design feature that no single reason/cause/rule should by itself mark a message as spam, I have to guess that the FP rate that the majority of users see for BAYES_99 is far below 1%. Anyway, to better address the OP's questions: The system is more robust if instead of changing the weighting of existing rules (assuming that they were correctly established to begin with), you add more possible inputs

Exactly. For example, I find that anything in the subset consisting of messages that don't mention my email address anywhere in the To/Cc headers and also scoring above BAYES_70 has close to 100% likelihood of being spam. However, since I also get quite a lot of mail that doesn't fall into that subset, I can't simply increase the scores for the BAYES rules. In this case I use procmail to examine the headers after SA has scored the message, but I've been considering creating a meta-rule of some kind. Trouble is, SA doesn't know what "my email address" means (it'd need to be a list of addresses), and I'm reluctant to turn on allow_user_rules.
Re: Those Re: good obfupills spams
On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: Besides.. If you want to make a mathematics based argument against me, start by explaining how the perceptron mathematically is flawed. It assigned the original score based on real-world data. Did it? I thought the BAYES_* scores have been fixed values for a while now, to force the perceptron to adapt the other scores to fit.
Re: Those Re: good obfupills spams (bayes scores)
On 4/29/06, Matt Kettler [EMAIL PROTECTED] wrote: In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly the high-end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50. That does make me wonder if: 1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules due to the ham corpus being polluted with spam. My recollection is that there was speculation that the BAYES_9x rules were scored too low not because they FP'd in conjunction with other rules, but because against the corpus they TRUE P'd in conjunction with lots of other rules, and that it therefore wasn't necessary for the perceptron to assign a high score to BAYES_9x in order to push the total over the 5.0 threshold. The trouble with that is that users expect training on their personal spam flow to have a more significant effect on the scoring. I want to train bayes to compensate for the LACK of other rules matching, not just to give a final nudge when a bunch of others already hit. I filed a bugzilla some while ago suggesting that the bayes percentage ought to be used to select a rule set, not to adjust the score as a component of a rule set.
Those Re: good obfupills spams
The largest number of spam messages currently getting through SA at my site are short text-only spams with subject Re: good followed by an obfuscated drug name (so badly mangled as to be unrecognizable in many cases). The body contains a gappy-text list of several other kinds of equally unreadable pharmaceuticals, a single URL which changes daily if not more often, and then several random words and a short excerpt from a novel. They usually hit RCVD_IN_BL_SPAMCOP_NET,URIBL_SBL but those alone aren't scored high enough to classify as spam, and I'm reluctant to crank them up just for this. However, the number of spams getting through SA has tripled in the last four days or so, from around 14 for every thousand trapped, to around 40. I'm testing out RdJ on the SARE_OBFU and SARE_URI rulesets but so far they aren't having any useful effect. Other suggestions?
Re: Those Re: good obfupills spams
On 4/28/06, [EMAIL PROTECTED] wrote:

> I would make a subject "Re: good" rule that scores just high enough to push it to the spam level.

They're only scoring about 3.3, and I'm reluctant to make "Re: good" worth 2 points all by itself. That'd be worse than increasing the spamcop score. A meta rule, though ...
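The appeal of a meta rule is that the extra points only apply when the subject pattern shows up together with the blocklist hits, so an innocent "Re: good" reply gains nothing. A toy model of that scoring follows. The per-rule scores are invented so that the two blocklist rules add up to the 3.3 mentioned above, and __SUBJ_RE_GOOD and LOCAL_RE_GOOD_LISTED are hypothetical rule names, not rules that ship with SA.

```python
THRESHOLD = 5.0

# Invented scores; the unscored "__" rule mirrors how SA treats meta sub-rules.
SCORES = {
    "RCVD_IN_BL_SPAMCOP_NET": 1.5,
    "URIBL_SBL": 1.8,
    "LOCAL_RE_GOOD_LISTED": 2.0,
}

# (meta rule name, rules that must ALL hit for the meta rule to fire)
METAS = [
    ("LOCAL_RE_GOOD_LISTED", ("__SUBJ_RE_GOOD", "RCVD_IN_BL_SPAMCOP_NET", "URIBL_SBL")),
]


def total_score(hits, scores=SCORES, metas=METAS):
    """Sum the scores of the rules that hit, plus any meta rule whose parts all hit."""
    fired = set(hits)
    for name, required in metas:
        if all(rule in fired for rule in required):
            fired.add(name)
    return round(sum(scores.get(rule, 0.0) for rule in fired), 2)


if __name__ == "__main__":
    cases = (
        ["RCVD_IN_BL_SPAMCOP_NET", "URIBL_SBL"],
        ["__SUBJ_RE_GOOD"],
        ["__SUBJ_RE_GOOD", "RCVD_IN_BL_SPAMCOP_NET", "URIBL_SBL"],
    )
    for hits in cases:
        score = total_score(hits)
        print(hits, score, "spam" if score >= THRESHOLD else "ham")
```

Only the third combination crosses the threshold; the subject on its own, or the blocklists on their own, stay where they were.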
Re: Virtual user config and auto-whitelist
On 4/26/06, Rosenbaum, Larry M. [EMAIL PROTECTED] wrote: From: Bart Schaefer [mailto:[EMAIL PROTECTED] ... (Someone remind me why the spamd option to disable the auto-whitelist was dropped?) It hasn't been dropped; they just moved the documentation into Plugin/AWL.pm. Ah, right, duh. So the answer to my question is that the -a option was dropped because it doesn't make sense to have an option to en/disable a plugin.
Re: (OT, but relevant) Playing with AOL?
On 2/23/06, Peter P. Benac [EMAIL PROTECTED] wrote:

> Get enough of those TOS messages in one day and they will still block your IP address and any IP address that you have assigned to you.

FUD. They don't block multiple IPs at once as far as I can tell.

> Furthermore, they have already announced that mail providers who will not send mail to them through goodmail.com will find tightened filters. When the goodmail.com system goes in place it will REPLACE the current whitelist program.

More FUD. Please don't spread rumors. The initial report was that Goodmail would replace the *extended* whitelist, which is not the same as the basic whitelist. Further, they've backed off on that (and said that in fact they never intended that to be the case in the first place). The existing whitelists will continue operating as they have.
Re: Bayes rocks
On 9/16/05, jdow [EMAIL PROTECTED] wrote: You are better off to use a normal SpamAssassin meta rule. How so? SA doesn't know how to interpret not to me (unless I write a plugin) -- it has no built-in knowledge of, for example, all possible sendmail aliases for my personal account -- and individual users can't add their own rules, so the only way I can code a custom expression to match all my personal addresses is to do it outside of SA. I suppose it would be possible to write a blacklist_not_to rule as a plugin, but procmail is doing it just fine, thanks.
Bayes rocks
On 9/16/05, jdow [EMAIL PROTECTED] wrote: Yes indeedy. And I've been looking at Bayes scores here just a wee bit. BAYES_99 just does not hit on ham and hits on high percentages of spam. Even BAYES_95 does not hit ham. I go down to BAYES_80 before I hit 0.05 percent of ham. During a two-week period recently I captured a copy of all mail that (1) did not reach an SA score of 5+ points and (2) did not have my personal email address in the To:/Cc: headers. I then examined the set of SA rules that were triggered by those messages (as recorded in the X-Spam-Status header). 100% of such mail that hit BAYES_80 or more was in fact spam; about 90% of BAYES_70 was spam. However, there were a few BAYES_80 during the same period that *did* have my address in the headers and that were *not* spam (and, also correctly, not tagged by SA), so it wasn't just a matter of cranking up the score for BAYES_80. Instead I added procmail recipes to treat as spam the combination of "not to me" plus BAYES_[89][059] regardless of the SA point score. That was a month ago, and I haven't had a false positive yet. If there are any developers listening ... has anyone given any consideration to Bugzilla #3785?
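For anyone who wants to see the logic spelled out, here is a minimal Python sketch of that decision. It is not the procmail recipe itself. The addresses in MY_ADDRESSES are placeholders, and the X-Spam-Status layout in the usage note is assumed; only the BAYES_[89][059] pattern and the "not in To:/Cc:" condition come from the post.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Placeholder personal addresses (assumption); the real list would include
# every alias that delivers to the personal account.
MY_ADDRESSES = {"me@example.com", "me-alias@example.com"}

# Same character classes as in the post; matches names such as
# BAYES_80, BAYES_95 and BAYES_99 in the list of triggered tests.
BAYES_HIGH = re.compile(r"\bBAYES_[89][059]\b")


def addressed_to_me(msg):
    """True if any To:/Cc: address is one of my personal addresses."""
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return any(addr.lower() in MY_ADDRESSES for _name, addr in recipients)


def treat_as_spam(raw_message):
    """Spam if SA already said so, or if high Bayes and not addressed to me."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    if status.lower().startswith("yes"):
        return True
    return bool(BAYES_HIGH.search(status)) and not addressed_to_me(msg)
```

A message carrying "X-Spam-Status: No, score=3.3 required=5.0 tests=BAYES_80,..." and addressed only to a list would be treated as spam; the same header on mail addressed to me@example.com would not.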
More unintentional spam humor/irony
The choice of anti-bayes-filler below is unfortunate on so many levels ... and on top of that, they spammed our abuse address. (Links to spammer site deleted.) -- Forwarded message -- Date: Sun, 11 Sep 2005 09:45:40 +0500 From: Nadia Joyner [EMAIL PROTECTED] To: abuse Subject: Re: Nadia The Environmental Protection Agency said initial samples of the floodwaters indicated high levels of lead and E. coli and other coliform bacteria. Don't you think it's about time to drop a few pounds? Now you can, without sacrifice or exercise A representative of the Army Corps of Engineers said 23 of the 148 permanent pumps in New Orleans were working, their efforts augmented by three portable pumps.
Re: ANNOUNCE: SpamAssassin 3.1.0-rc2 release candidate available!
On 9/7/05, Loren Wilton [EMAIL PROTECTED] wrote: has this been opened as a bug in BZ yet? I haven't seen a sign of it. I hope the OP does this, I'd hate to have to try to track back through 3 weeks of deleted mail to find the original posting. Especially since I don't remember who posted it! John Rudd, on August 27. Took me about 15 seconds to find it in gmail (well, really, to find the message John sent on August 29 in which he quoted the August 27 mail). The subject was Problems with SpamAssassin 3.1 RC1 and MIMEDefang.
Re: Rather too refreshing: bayes.lock
On 5/19/05, Ben Wylie [EMAIL PROTECTED] wrote: debug: refresh: 3392 refresh F:/DOCUME~1/ADMINI~1/SPAMAS~1/bayes.lock [...] debug: expired old Bayes database entries in 1533 seconds: 485101 entries No real help here, but another data point: I've been seeing occasional problems on my Linux workstation with 3.0 leaving bayes.lock files behind, throughout the 3.0 series. I suspect it does have something to do with syncing the journals and/or token expiry, because although it always happens after spamd has run, it does not happen in any observably predictable way.
Re: Bombarded by German political spam
On 5/15/05, Raymond Dijkxhoorn [EMAIL PROTECTED] wrote: http://mailscanner.prolocation.net/german.cf You've got a bit of duplication in there (rules 02 and 22 are the same, as are 04 and 26).
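One way to spot that kind of duplication mechanically is to group rule names by identical definitions. The sketch below assumes simple one-line definitions of the form "type NAME pattern"; the rule names in the usage note are made up, since the contents of german.cf are not reproduced here.

```python
from collections import defaultdict

# Rule types handled by this sketch; other config lines are ignored.
RULE_TYPES = {"body", "rawbody", "uri", "full", "header"}


def duplicate_patterns(cf_text):
    """Map each repeated (type, pattern) pair to the rule names that share it."""
    seen = defaultdict(list)
    for line in cf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] in RULE_TYPES:
            rule_type, name, pattern = parts
            seen[(rule_type, pattern)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

Fed a file in which the hypothetical rules EXAMPLE_02 and EXAMPLE_22 carry the same body pattern, it returns that pattern with both names, and leaves score and describe lines out of the comparison.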
Re: Confession and rage
On 5/6/05, List Mail User [EMAIL PROTECTED] wrote: Again, there is an unfortunate exception provided by Sec 3.17 which allows Transactional or relationship message- and in particular clause A.iii.I specifically allows notification concerning a change in the terms or features of. This has specifically been taken by courts to allow the sending of email notifying you of a change in price(s) Eh? When? Where? Case reference, please. (iii) a valid physical postal address of the sender. [...] It is unfortunate that both P.O. Boxes and Agents for service of a holding company would clearly qualify under this clause (i.e. a physical postal address != physical address). I've actually spoken to an attorney specifically about this point, and it was his contention that this was not at all clear.
Re: AWL whaaat
On 5/4/05, Matt Kettler [EMAIL PROTECTED] wrote: That's great.. I was trying to think up a good scenario for that acronym, but couldn't. Obviously it's the Automated Spam Sorting Average Scoring System Involving Ninjas
Re: sa-learn issues
On Mar 31, 2005 5:46 PM, AltGrendel [EMAIL PROTECTED] wrote: Matt Kettler wrote: The problem is, it seems that sa-learn is ignoring the -u / --user= flag. sa-learn uses the userid of the user that calls it. Period. Then why does man sa-learn show a -u flag: Is this obsolete? It's not obsolete, but it's incomplete. Changing the userid does not change the location that sa-learn uses for the bayes databases, and although you can specify an alternate config file path, the user-level config file is not allowed to change the bayes database location. So for virtual users you must actually set the HOME environment variable to point to the correct location before running sa-learn, and *also* use the -u option to set the userid; and of course the real user running sa-learn must have write permission for the bayes files and for the $HOME/.spamassassin directory (to create lock files). I ended up making a setuid (to the spamassassin user) copy of sa-learn and a wrapper script for running it.
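A minimal sketch of what such a wrapper has to do, written in Python for illustration (this is not the actual wrapper script, and the paths in the usage note are hypothetical): point HOME at the virtual user's directory and pass -u as well. The setuid part is not shown.

```python
import os
import subprocess


def sa_learn_command(virtual_user, virtual_home, mbox_path, as_spam=True):
    """Build the argv and environment for training one virtual user."""
    env = dict(os.environ)
    # The bayes files and lock files live under $HOME/.spamassassin.
    env["HOME"] = virtual_home
    mode = "--spam" if as_spam else "--ham"
    argv = ["sa-learn", mode, "-u", virtual_user, "--mbox", mbox_path]
    return argv, env


def run_sa_learn(virtual_user, virtual_home, mbox_path, as_spam=True):
    """Run sa-learn; the calling user still needs write access to the bayes files."""
    argv, env = sa_learn_command(virtual_user, virtual_home, mbox_path, as_spam)
    return subprocess.run(argv, env=env, check=True)
```

For example, run_sa_learn("alice", "/var/vmail/alice", "/tmp/missed-spam.mbox") would train alice's database from a mailbox of missed spam, provided the process has write permission on /var/vmail/alice/.spamassassin.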
AWL interaction with Bayes, and sa-learn
First, tell me if there's anything wrong with this summary:
1. A message arrives and is passed to spamassassin and/or spamc+spamd.
2. The score for that message is computed.
3. The AWL score for that sender is updated.
4. The message was mis-classified, so after delivery the user feeds the message to sa-learn.
5. The Bayes score for (the tokens in) that message is updated, *but the AWL score for the sender remains unchanged.*
6. A similar message from the same sender arrives. The net score is moved away from the Bayes-influenced value by the (obsolete, or at least incorrectly recorded) AWL value.
Assuming I've got that right, tell me whether there's anything wrong with this conclusion: The AWL will wrongly influence the score for both spam and non-spam as long as the AWL remains unaffected at step 5, in any case where the initial classification was incorrect. Finally the question: Shouldn't sa-learn retrain the AWL as well? At the least, throw out the entry for that sender and begin recomputing it with the next message?
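To put numbers on step 6, here is a small worked sketch. It assumes the adjustment described in the AWL plugin documentation, where the final score is regressed toward the sender's long-term mean by auto_whitelist_factor (0.5 by default, as I understand it). The scores themselves are invented for illustration.

```python
def awl_adjust(score, sender_mean, factor=0.5):
    """Regress this message's score toward the sender's long-term mean."""
    return score + (sender_mean - score) * factor


# Steps 1-3: the first spam from this sender slips through at 2.0 points,
# so the AWL records a mean of 2.0 for the sender.
recorded_mean = 2.0

# Steps 4-5: the user runs sa-learn; Bayes improves, the AWL entry does not.
# Step 6: a similar message now scores 6.0 before the AWL is applied.
pre_awl_score = 6.0

final_score = awl_adjust(pre_awl_score, recorded_mean)
```

Under these assumptions final_score comes out at 4.0, below a 5.0 threshold, even though the retrained Bayes database alone would have caught the message.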
Re: PERL update broke spamassassin?
It looks as if /usr/bin/spamassassin is being executed by the shell, attempting to interpret perl as shell commands. This probably means that your upgrade changed the path to the perl executable, so the #! line at the top of /usr/bin/spamassassin is no longer pointing to the right thing, but it must have something to do with the way procmail is invoking spamassassin because normally you'd get a no such file or directory error instead in that case.
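A quick way to test the first half of that theory is to read the #! line and check whether the interpreter it names still exists and is executable. The helper names below are made up for this sketch.

```python
import os


def parse_shebang(first_line):
    """Return the interpreter path named on a #! line, or None."""
    first_line = first_line.strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def interpreter_ok(script_path):
    """True if the script's #! line names an existing, executable file."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace")
    interpreter = parse_shebang(first_line)
    return (
        interpreter is not None
        and os.path.isfile(interpreter)
        and os.access(interpreter, os.X_OK)
    )
```

If interpreter_ok("/usr/bin/spamassassin") is False after the upgrade, the #! line is stale. That still leaves the question of why procmail falls back to the shell instead of reporting the missing file.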
Re: Sudden spam volume decrease?
On Fri, 14 Jan 2005 10:36:25 -0800, Bart Schaefer [EMAIL PROTECTED] wrote: Menno van Bennekom wrote: Sorry, that was mis-attributed. I meant to trim that line.
Re: SA 3 - I'm Totally Stuck!
On Fri, 7 Jan 2005 10:27:38 -, bubba [EMAIL PROTECTED] wrote: I'm trying to install Spamassassin 3 on a Linux box w/Ensim control panel installed Meaning you're trying to install it through the control panel rather than using a real login shell? Or only meaning that you're using Ensim to set up the .procmailrc files? but I'm experiencing a variety of errors. I've modified each users' .procmailrc file, but the logs are showing that spamc cannot be found No, they're showing that spamc cannot be *executed*, which is an entirely different thing. This implies to me that procmail is executing on a different machine, with a different binary architecture, from that where spamc was compiled. (regardless of how I address it, and I know it's there - I can run it from the command line). And you're sure there's only one machine involved, and no NFS mounts or the like? Copying spamc to each users' home directory allows it to be run That pretty strongly implies that the mail delivery machine is not the same one where the users have their home directories. Previously, I had version 2.6 working quite happily, so this is confusing the hell out of me! Any help most gratefully received! And did you install 2.6x yourself?
Re: Spamassassin classifying normal mail as SPAM due to RBL tests
The URIBL_* tests are not concerned with where the mail is from; they're examining the message *body* to see if it contains links to websites that are commonly advertised in spam. If you remove the entire message body as private content when posting the sample to the list, you prevent anyone from helping you determine which URLs might be the problem.
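To illustrate what those tests look at, here is a simplified sketch that pulls the host names out of the URLs in a message body. SpamAssassin itself does more (it reduces hosts to registered domains and queries the blocklists over DNS), so treat this only as a way to see which links a message exposes. The regex and function name are assumptions.

```python
import re
from urllib.parse import urlsplit

# Deliberately loose URL matcher: http/https up to whitespace or a
# common closing delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def body_hosts(body):
    """Return the set of lower-cased host names found in URLs in the body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts
```

Posting just the output of body_hosts() for a misclassified message would keep the private content private while still showing which hosts a URIBL test could be matching.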
Re: ver 3.0 opinions
On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey [EMAIL PROTECTED] wrote: Is version 3 really any better at stopping spam that 2.63? Version 3 stops different spam than 2.63, in my experience so far. E.g. it's better at catching the drug spam but not as good at the earn cash for making phone calls spam. If you use full network tests, I suspect 3.001 (did I get enough zeroes? too many?) is actually better than 2.63/2.64. Using it in local only mode, though, I've found it not very different. The spams that get through 3.x that do not get through 2.6x are generally (a) those that match BAYES_99, which by itself in the default configuration is no longer a large enough score to make me happy, or (b) would have been tagged as spam except that the AWL smoothed them down to just below the threshold. I confess with some embarrassment that I haven't yet looked into how to turn off the AWL in spamd. Statement (b) above comes from running the same messages through spamassassin -t and having them marked as spam with the only difference in the latter case being the absence of an AWL hit.