Re: Using spamc--EVERY message has score of zero (including spam)
On Sun, 27 Jan 2008, Don Ireland wrote:
> Can somebody help me figure out WHY? It's returning *0/0*

As far as my experience goes, you get 0/0 only if spamc did not get a connection to spamd! A 'real' score of zero would be 0/min, with 'min' being the minimum spam score (the threshold) of spamd, normally 5.

So you'll have to check how your spamc tries to reach your spamd. Are they on the same host? (Normally they share the same 'unix-domain socket file', which must be accessible to the uid of spamc; you can also try to connect by TCP by adding the option '-d localhost'.) Or do you want to ask spamd on another host? Then you'll have to use '-d somehost'. AND make sure that the spamd on 'localhost/somehost' allows TCP connections! (On the spamd side you'll need the option '--allowed-ips=##.##.##.##', with the IP of the spamc host, localhost or somehost.) The default installation uses ONLY the socket file, for security reasons.

Stucki (using spamd/spamc on different hosts :-)
-- Christoph von Stuckrad * * |nickname |[EMAIL PROTECTED] \ Freie Universitaet Berlin |/_*|'stucki' |Tel(days):+49 30 838-75 459| Mathematik Informatik EDV |\ *|if online|Tel(else):+49 30 77 39 6600| Takustr. 9 / 14195 Berlin * * |on IRCnet|Fax(alle):+49 30 838-75 454/
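If the check is scripted, the 0/0 case can be told apart from a genuine low score. Here is a minimal Python sketch. It assumes spamc is run with `-c`, which prints a single 'score/threshold' line; spamc's manual page documents 0/0 as the error output in that mode. The helper names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class SpamcCheck:
    score: float
    threshold: float


def parse_spamc_check(line: str) -> SpamcCheck:
    """Parse one 'score/threshold' line as printed by `spamc -c`."""
    score_text, threshold_text = line.strip().split("/", 1)
    return SpamcCheck(float(score_text), float(threshold_text))


def interpret(check: SpamcCheck) -> str:
    # Heuristic from the post: a reachable spamd always reports its
    # threshold (normally 5), so a threshold of 0 means no connection.
    if check.threshold == 0:
        return "spamd-unreachable"
    return "spam" if check.score >= check.threshold else "ham"
```

For example, `interpret(parse_spamc_check("0/0"))` yields "spamd-unreachable", while "0.0/5.0" counts as ham with a real score of zero.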
Re: The googolbees are getting craftier
On Mon, 21 Jan 2008, John D. Hardin wrote:
> m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i

If I understand that pattern correctly, both '*' are 'unbounded'??? This might 'break' your spam filter if SpamAssassin gobbles up all memory during analysis. Better replace any unbounded '*' with a reasonable length {0,N}, with N a little more than the length of the strings seen so far.

Stucki
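Here is the same advice applied to the quoted pattern, as a Python `re` sketch. Perl and Python treat these constructs alike. Besides the two '*', the '+' repetitions are unbounded too, so they are capped as well. Every bound below is a guess and should be tuned to the URLs actually seen:

```python
import re

# Bounded rewrite of the quoted pattern (bounds are illustrative guesses):
#   at most 5 host labels of up to 30 chars, 2 to 10 'o's in "google",
#   1 to 5 slashes, and at most 200 chars of path before the '?'.
GOOGLE_REDIRECT = re.compile(
    r"https?://(?:[^./]{1,30}\.){0,5}go{2,10}gle(?:pages)?\."
    r"[a-z]{2,3}(?:\.[a-z]{2})?/{1,5}.{0,200}[?](?:btni|adurl)",
    re.IGNORECASE,
)
```

The worst-case work per match attempt is now limited by the bounds rather than by the length of the message line.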
Re: Paiment Repre sentative spams
On Mon, 26 Nov 2007, Igor Chudov wrote:
> for thieves who are moving stolen money to their real accounts, using

A German radio station in Berlin had a feature about those criminals. They send trojans as spam to people who use home banking and capture their money. To transfer this money to themselves they need those 'helpers', who receive the 'payments' in-country and then transfer them to other countries, where the money 'vanishes'. In Germany doing such a transfer is 'money laundering'. The 'helper' not only falls under this law but also has to pay back the whole sum, while the 'real criminal' is normally already gone ...

It was assumed that millions more 'PINs + TANs' have been grabbed and are 'on hold', while the scams do not recruit 'enough helpers' to get hold of the money in the already trojanized bank accounts. So seemingly lots of people have caught on and are ignoring those scams.

(I hope my largely rusty English comes across :-)

Stucki (getting lots of those all the time)
Re: 'spamc/spamassassin' crashing with overlong blank line spams?
On Wed, 19 Sep 2007, Karsten Bräckelmann wrote:
> How so? Since these mails are killing spamd, what use is it to throw yet another rule at it?

Well, after writing my mail to the list, I circumvented the problem by prefixing my 'spamc' with a little 'awk filter' to get rid of those overlong lines. Since then the spam filter has been OK. And I did hope somebody would write a rule (and somebody did!) while I was working on fixing my spamc scripting ... The meta rule on the list looks promising :-)

Stucki
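The awk filter itself is not shown in the post. A comparable pre-filter in Python might look like the sketch below. It assumes a cut-off of 998 bytes, the RFC 5322 line-length limit excluding CRLF, and it simply drops longer lines, as described above. Both choices are assumptions, and dropping lines does alter the message that gets scored:

```python
import sys
from typing import BinaryIO, Iterable, Iterator

MAX_LINE = 998  # assumed cut-off: the RFC 5322 line limit, without CRLF


def drop_overlong_lines(lines: Iterable[bytes], max_len: int = MAX_LINE) -> Iterator[bytes]:
    """Yield each line unchanged unless its content exceeds max_len bytes."""
    for line in lines:
        if len(line.rstrip(b"\r\n")) <= max_len:
            yield line


def filter_stream(src: BinaryIO, dst: BinaryIO, max_len: int = MAX_LINE) -> None:
    """Copy a message from src to dst, leaving out its overlong lines."""
    dst.writelines(drop_overlong_lines(src, max_len))


def main() -> None:
    # Intended to sit in a pipe in front of spamc: prefilter < msg | spamc
    filter_stream(sys.stdin.buffer, sys.stdout.buffer)
```

A small wrapper script that calls `main()` can then be placed in front of spamc in the delivery pipe.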
'spamc/spamassassin' crashing with overlong blank line spams?
Hi!

Seemingly our spamc (3.1.9, not yet 3.2.*) cannot transfer a special kind of current spam to a remote spamd. Those mails always produce '0/0' instead of usable reports. You can see something like the mail I analyzed at http://page.mi.fu-berlin.de/stucki/mail.txt (I had to change the offending line for the browser too, so at the end you see only a descriptive line).

Is this a known failure of the old spamc? Is the MTA supposed to fix mails with overlong blank (or any) lines? Do I have to switch to 3.2.* to fix it? Would it be possible to 'just take' a newer spamc (3.2.*) to communicate with an old spamd (3.1.*), or did the protocol change somehow?

Thanks for hints ...

Yours
Stucki

PS.: Ideas are welcome for catching the characteristic Subject of those spams, which looks like 'just random tty line noise'!
Re: Number spam (paranoid guess)
On Tue, 07 Aug 2007, John Andersen wrote:
> Ok, what is this stuff. All it contains is 6 digit numbers. What's up with that stuff?

My most paranoid guess is:

- Cause: it is summer vacation time ... so LOTS of people are on holiday. If you send e-mails with totally useless content that get through all filters for a short time, you can trigger LOTS of vacation messages! Then (1) you will know 'who answered'. If you are not only a spammer but also a 'more criminal mind', you (2) might even find typical vacation messages like "I'm away to China for two weeks, try later ...!" So you know somebody is *away*, and you can safely burgle the flat, impersonate the owner of the address, etc.

That's paranoid, I know, but criminals are not always dumb :-) And lazy anyway, and on the internet too :-)

Stucki (who never has vacation [messages :-])
Re: Now its zip attachments ^^
On Mon, 23 Jul 2007, John Scully wrote:
> ... After adding the sanesecurity sigs to clamd last week not one PDF has made it through. And since clamd unpacks and examines every attachment anyway it is no additional load. In fact, due to the messages not hitting SA it probably reduced load slightly.

I have a 'political problem' with that. We 'drop' known viruses into a quarantine directory without further notice, and only once in several years has somebody complained and wanted his virus back :-) We *only* TAG spam with headers; then users decide whether to drop, move, or read it. So if I 'simply insert' those ClamAV sigs, spam would be handled as a virus, not as 'our spam', which I'm not allowed to destroy.

Has anyone here created an extra 'instance' of the clamd filter to fight spam with spam sigs only, without scanning for virus sigs? Does that sound feasible?

Stucki
Re: Spam Du Jour ? *.XLS -- packed into zip now
On Sun, 22 Jul 2007, Robert Schetterer wrote:
> http://sanesecurity.co.uk/clamav/ catches it now

As seen before, they react fast to news on this list :-)

Now I got the same 'XLS' *inside* a *.zip file!

Stucki
Re: Spam Du Jour ? *.XLS
On Sun, 22 Jul 2007, Robert Schetterer wrote:
> > investors news-76212.xls, et all no real challenge
> jep , got 3 xls spams today

Well, here too, but I think we'll soon get the whole mix ... a combinatorial explosion of envelope formats and content variants, meaning 'any Windows-showable file format' * 'all the already known picture tricks embedded'.

Is anybody working on generic detectors yet? (I really would like to plug that (w)hole :-) Something like amavis or ClamAV to unpack first, and then SpamAssassin to analyze the contents?

Stucki
Re: Spam PDF
On Wed, 27 Jun 2007, Wael Shahin wrote:
> I have two servers one is running DCC and one is not, the one that is running DCC didn't pass the message or maybe I am mistaken but it didn't go through (Maybe didn't get there at all from the first place). On the other server that is not running DCC the email went through and it was an empty email body with a PDF attachment

No wonder, I think. DCC will notice/flag spam 'already seen elsewhere', AND that may be the only way to decide whether the PDF(s) are junk or real information. So spamtraps or honeypots may be the first choice.

The spammers' last 'try' was to put the pictures into Word or PowerPoint docs. So I assume they just go through every format of 'embeddable attachment' for which a 'plugin or viewer' exists and which opens automagically in mail readers. (Those must be carelessly configured to show the picture, but that is the default.)

So in the long run we need a generic way to MIME-strip the contents of attachments (as virus filters do!) and recursively feed all parts of the mail into spam scanners (either text or picture scanners). If there is a simple way to program signatures for virus checkers, it might be possible to catch specific pictures that way. Alternatively you could forbid such attachments completely, but that has no chance in a university environment like the one I'm in.

We got two 'waves' of PDFs here. We stopped the first wave by noticing that the spammers had programmed the spambots with a repeated pattern of filenames. But they noticed, and the second wave is only random nonsense plus the PDF. No 'normal' user would ever open a PDF from a mail full of nonsense, though, so they reach only a small fraction of users, which might not be useful for pushing stocks. So I hope that 'fad' will die out soon, like the earlier waves of doubly-packed pictures in RTF, Word, and PowerPoint did.
Stucki
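Here is a minimal Python sketch of the 'unpack first, then scan' idea from the last posts. It walks every MIME leaf part and, when a part is a ZIP archive (such as the .xls packed into a .zip), descends into its members. It yields (name, bytes) pairs that could then be handed to whatever text or picture scanner is in use. The helper names are hypothetical. Real filters such as amavisd-new handle far more archive and document formats and also guard against oversized archives, which this sketch does not:

```python
import email
import io
import zipfile
from email import policy
from typing import Iterator, Tuple


def iter_leaf_payloads(raw: bytes, max_depth: int = 3) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) for every leaf MIME part, unpacking ZIPs recursively."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue
        data = part.get_payload(decode=True) or b""
        name = part.get_filename() or part.get_content_type()
        yield from _expand(name, data, max_depth)


def _expand(name: str, data: bytes, depth: int) -> Iterator[Tuple[str, bytes]]:
    # Descend into ZIP archives, but only `depth` levels deep, so that
    # nested archives cannot make the recursion run forever.
    if depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from _expand(f"{name}/{info.filename}", zf.read(info), depth - 1)
    else:
        yield name, data
```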
Re: TVD_SILLY_URI_OBFU
On Mon, 05 Feb 2007, Bowie Bailey wrote:
> > body Test_01 /remove \*/i | /remove \%/i | /remove \!/i
> > score Test_01 4.0
> > describe Test_01 Test remove asterisk for URL spams
>
> How about this? (untested)
>
> body Test_01 /remove [*%!]/i

Since Sunday, after seeing two new obfuscation chars and two new subdomains in the same mails, I use the following (because I hope it is more specific):

[ For beginners: '\W' is a non-word character and '\S' is 'not space'. Never use '.*'! Instead use a fixed maximum length '.{m,M}', where 'm' is the minimum and 'M' the maximum length. ]

  # Obfuscation: non-word char instead of dot
  body __MEDOBFU1A /http:\/\S{1,25}\Wcom/i
  body __MEDOBFU1B /replace ?\W.{1,30}(?:with|by)\s?\./i
  # Obfuscation: non-word char inserted
  body __MEDOBFU2A /http:\/\/\S{1,30}(?:\W\S{0,10}\.com|\.\Wcom)/i
  body __MEDOBFU2B /remove ?\W/i
  # both in one rule
  meta __MEDOBFU1 ( __MEDOBFU1A && __MEDOBFU1B )
  meta __MEDOBFU2 ( __MEDOBFU2A && __MEDOBFU2B )
  meta MEDOBFU ( __MEDOBFU1 || __MEDOBFU2 )
  score MEDOBFU 3
  describe MEDOBFU Pharma spam with illegal character in hostname of URL

Using \W may be a risk because the class contains too many characters, but so far I have not heard of FPs. The only trouble is that, because I am writing this to the list, tomorrow they will sprout lots of new, adapted versions of the same basic idea all over the place. So what will really be needed is a combination of rules for 'illegal hostname in URL' and something like the URIBLs to catch 'syntactically legal-looking' obfuscations (if such a thing is feasible).

Stucki
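To try the four body patterns outside SpamAssassin, they can be translated to Python `re`. In the sketch below, /.../i becomes `re.IGNORECASE` and the escaped slashes are not needed. Each of the first two meta rules is read as an AND of its two sub-rules, as the 'both in one rule' comment suggests. The helper name is hypothetical. One thing the test cases show: on their own, the 'A' patterns also match ordinary URLs such as http://www.example.com, so the 'replace'/'remove' patterns carry most of the specificity:

```python
import re

# Python translations of the four body patterns from the post.
PATTERNS = {
    "__MEDOBFU1A": r"http:/\S{1,25}\Wcom",
    "__MEDOBFU1B": r"replace ?\W.{1,30}(?:with|by)\s?\.",
    "__MEDOBFU2A": r"http://\S{1,30}(?:\W\S{0,10}\.com|\.\Wcom)",
    "__MEDOBFU2B": r"remove ?\W",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def medobfu_hits(body: str) -> bool:
    """Evaluate MEDOBFU = (1A and 1B) or (2A and 2B) on a body string."""
    hit = {name: bool(rx.search(body)) for name, rx in COMPILED.items()}
    rule1 = hit["__MEDOBFU1A"] and hit["__MEDOBFU1B"]
    rule2 = hit["__MEDOBFU2A"] and hit["__MEDOBFU2B"]
    return rule1 or rule2
```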
Sudden drop in spam-rate, parallel to a surge of new trojans - beware
Hi!

Yesterday our spam percentage suddenly dropped from 80% to nearly 60%. At the same time I got six copies of a new trojan 'exe' by mail that neither NAI nor ClamAV could detect. Do we have to prepare for a new flood from an updated (just now reorganizing) botnet?

Stucki
Re: Greylisting
On Tue, 21 Nov 2006, Vahric MUHTARYAN wrote:
> I'm using SA for a long time without any problem, nowadays spammers are using too much graphical objects and they are tring to change it day by day. I'm tring to use fuzzyocr but it's taking

Same problem here ...

> too much cpu. I think that try greylisting . I wonder are there anybody use greylisting ? Somebody can give me feedback ?

But wouldn't spammers simply send every mail twice to break greylisting? Then, once the automatic whitelisting has kicked in, you get everything twice, which in the long run simply doubles the amount of spam. I'm just curious why I get so many spams twice or three times within a short time. (I have NOT installed greylisting because of that phenomenon; I assumed greylisting would 'go away' or 'be just a fad'. But I'm rethinking it because of the CPU cycles FuzzyOCR needs.)

Stucki
OT/Humor: Do I have to live in fear of spammers?
Today a subject went undetected through the filter and 'made my day' (ROTFL, couldn't resist posting it :-))

Subject: Consequently We must kill you not perhaps.
... Stocks spam ...

Does somebody keep a list of something like 'the best random-generated spam text', without polluting this list?

Yours
Stucki
Re: forwarding email using /etc/aliases and keeping spamassassin headers intact
On Wed, 20 Sep 2006, Larry Starr wrote:
> Are you certain that SA even sees the message before it's forwarded? My first guess, without seeing config files, etc. Would be that your SMTP daemon (sendmail?) is forwarding the message as it's received.

This sounds like 'filtering with procmail' during personal delivery. In this configuration the MTA forwards (via .forward or /etc/aliases) *before* procmail is ever called. In that case the user should forward with a private procmail rule instead of via the MTA, so that his procmail gets a chance to filter spam.

Stucki
Re: Running on Debian stable
On Fri, 18 Aug 2006, Magnus Holmgren wrote:
> You could install just spamassassin (but not spamc) from testing, without having to pull in anything else.

There's also a spamassassin in Debian 'volatile', under 'volatile-sloppy' (from sources.list):

deb http://ftp2.de.debian.org/debian-volatile sarge/volatile main
deb http://ftp2.de.debian.org/debian-volatile sarge/volatile-sloppy main

I have NOT tested it yet; I only use the updated ClamAV from there.

Stucki!
Re: Image spams getting thru
On Tue, 01 Aug 2006, Theo Van Dinter wrote:
> On Tue, Aug 01, 2006 at 09:24:55AM -0700, John D. Hardin wrote:
> > ...
> Well, until greylisting becomes enough of a problem that the spammers change their software to queue and retry, thereby eliminating the benefit completely.

Or they simply send spam unconditionally two or three times, just to be sure of getting through the greylist. All it takes is knowing how soon the same combination of envelope addresses has to be handed to the same zombie again. And THIS would explain why I get lots of spams more than once, in 'chunks' of 3 to 6 copies of the same thing within a few minutes, followed by a long pause. So just by rearranging the (spam) address lists and sending at least twice the amount of spam, spammers may circumvent greylisting.

Just an idea, because we have suddenly been getting over 20% more spam for the last few days.

Stucki
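To make the 'send it twice' argument concrete, here is a toy Python model of triplet-based greylisting. Each (client IP, envelope sender, envelope recipient) combination is deferred on first sight, accepted on a retry that comes after a minimum delay but within a retry window, and then remembered as passed. The class name, the 300 s minimum delay and the 4 h window are assumptions for illustration and are not taken from any particular greylisting implementation:

```python
from typing import Dict, Set, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class ToyGreylist:
    """Defer a triplet's first attempt; accept a retry inside the window."""

    def __init__(self, min_delay: float = 300.0, retry_window: float = 4 * 3600.0):
        self.min_delay = min_delay          # retry must wait at least this long
        self.retry_window = retry_window    # ...but come no later than this
        self.first_seen: Dict[Triplet, float] = {}
        self.passed: Set[Triplet] = set()

    def check(self, triplet: Triplet, now: float) -> str:
        if triplet in self.passed:
            return "accept"  # auto-whitelisted after one successful retry
        first = self.first_seen.get(triplet)
        if first is None or now - first > self.retry_window:
            self.first_seen[triplet] = now
            return "defer"   # i.e. an SMTP 4xx temporary failure
        if now - first < self.min_delay:
            return "defer"   # retried too early
        self.passed.add(triplet)
        return "accept"
```

Suppose a zombie sends the same message twice, ten minutes apart, with the same triplet. The first copy gets 'defer' and the second gets 'accept'. From then on the triplet is let through unchecked, which is the doubling effect described above.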
Re: exim4 + forwarding + spamassassin
On Thu, 27 Jul 2006, jdow wrote:
> From: Loren Wilton [EMAIL PROTECTED]
> ...
> I've never seen the logic of placing SpamAssassin inside the incoming transaction before the termination of the SMTP connection rather than down the pipe in the MDA.

If you want to 'reject spam' (with a score over a given threshold) without generating bounces, you have to check 'inside the transaction', so that you can tell the sending MTA that you do not accept the current mail because it is spam. This only works with site-wide Bayes and a global setup, unless you make sure that you know the (then exactly one?) recipient of the message at the end of the incoming data (the single '.' in the SMTP protocol, the 'acl_smtp_data' in exim4).

Beware of 'overloading the system': if you check incoming mails 'during arrival', you will have to limit the number of concurrent SMTP connections to the maximum number of spam checks your system can handle.

Stucki

PS.: I too prefer 'only to tag' the spams and let the user decide whether to discard them. I tested both ways, and to me the only safe way to never overload the system is to spam-check on the inside, in an exim queue runner. The number of queue runners can then simply be adjusted to the capabilities of the server.
Re: Will bayes-db be 'skewed' by feeding it spam only (one central database)
On Tue, 18 Jul 2006, Dirk Bonengel wrote:
> ... If I was in your position, I'd try to switch over to a system like Maia Mailguard that keeps a copy of each mail in a database and users can confirm and/or correct the underlying SpamAssassin engine's decisions. This system uses a single bayes DB. Works fine at a customer of ours that uses some weird proprietary document managing software

THIS looks *very* interesting. It may directly solve the problems we planned to solve with our *next* MTA (not postfix, but exim4 + cyrus). There we are already 'testing' amavisd-new + ClamAV + nai-uvscan for filtering, and there the users will need access to the filter settings.

Does it really keep *every* mail in the database, or only mail that might be accepted if the user wants it? (50% of the mail coming in here is for useless addresses.)

But *for now* I'm stuck with qmail + the qmail-queue patch and the older amavis-perl (heavily patched). So *for now* the users have no influence except 'telling me' [which they mostly do not] :-)

Stucki
Re: Will bayes-db be 'skewed' by ... autolearning ham?
On Tue, 18 Jul 2006, Dirk Bonengel wrote:
> did you investigate auto-learning? This might let your system learn ham as well as spam. Works fine here (same situation - gateway server to a Lotus Notes system, no feedback loop possible)

Maybe I should change the autolearn thresholds from the defaults? (I have never touched them so far.) I just found *lots* of 'autolearn=ham' entries in my log, and I cannot believe that so many are correct. The current log shows mail classified as:

21805 ham
11493 autolearned as ham (this seems suspiciously high?)
85963 spam
52977 autolearned as spam

So I fear the 'skew' in my database comes from autolearning the spammers' 'Bayes fodder', not from 'skewed explicit learning'. What may make it even worse is that 'in-house mail == ham' is never learned, because it is never spam-checked. (Users complained too much about the slowdown, so only mail from the 'outside' goes through the spam filter.)

Stucki
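The proportions behind 'suspiciously high' can be worked out directly from the counts in the log excerpt. The snippet below shows that about 53% of ham-classified mail was autolearned, against about 62% of spam, and that spam makes up about 80% of the classified mail. The cut-offs the post asks about are the `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` settings (documented defaults in SpamAssassin 3.x: 0.1 and 12.0). Whether other values suit this site is the open question:

```python
# Counts copied from the log excerpt above.
ham, ham_autolearned = 21805, 11493
spam, spam_autolearned = 85963, 52977

ham_rate = ham_autolearned / ham
spam_rate = spam_autolearned / spam
spam_share = spam / (ham + spam)

print(f"ham autolearned:  {ham_rate:.1%}")    # 52.7%
print(f"spam autolearned: {spam_rate:.1%}")   # 61.6%
print(f"spam share:       {spam_share:.1%}")  # 79.8%
```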
Re: Will bayes-db be 'skewed' by feeding it spam only (one central database)
On Mon, 17 Jul 2006, Logan Shaw wrote:
> ... someone carrying a knife, they have been a violent criminal, so knife-carrying correlates perfectly with being a criminal. Now imagine that you see a chef. He is carrying a knife, but

(Good point. [OT: I even know people who react that way to TV news] :-)

> ... by doing that, you will give it a very negative view of the world, where everything looks like spam. (This is all assuming, of course, that your Bayes database is empty when you train it with spam only.)

Assuming this scenario: I ORIGINALLY started the database on a long backlog of MY mail, which THEN had enough spam AND ham to start with, so it's not as bad as it could be. But since the last 'fresh start' I have 'updated' it only with the false negatives. Checking nearly 6000 (low-scoring) spams a week, I found only 'classical false positives' (like mail from this list :-), and for months *I* have not lost (sorted away) anything important. But maybe once every two months one of our power users complains about a real false positive, and if I'm allowed, I feed THAT one in.

> configuration changes that need to be made. Do you have the latest SpamAssassin, and have you enabled some network tests

Not the latest, because Debian 'stable' is slow to take up new versions. Maybe I should move to the volatile packages ...

> like DCC or razor and some RBLs? Those should be carrying some of the load; you shouldn't be relying on Bayes only,

Of course. Razor, Pyzor, DCC, the newer German iX plugin, and RBLs do catch lots of mails, pushing thousands to scores above 20 :-)

> If your Bayes database really is messed up, personally I would ... you *do* have is worthwhile.

Hmm, maybe on one of the next 'maintenance days', when (nearly) everything is down for a while, so that nothing slips through during training ... But this 'keeps' me thinking about the different 'hams' in our department. Some are French and some might even be Chinese.
So if I train again with *my* mail (postmaster problems and a bit of half-private stuff), the fresh database might be skewed 'against' the real hams of other parts of the department! (I think 'my spam' will be fine to train with, though.) The only 'real solution' might be to switch to an SQL database and 'Bayes per user', but then I'd have to 'train' hundreds of students to 'train' their own databases themselves :-))

> ... Well, there are probably several different explanations. The best place to start is by looking at the spams that get through and how they scored, especially comparing that to what scores others get on the same messages or similar ones.

That's one of the problems here. The mail filter host runs on old amavis-perl and does not include the full scoring headers in the mail, only a marking header with the score itself. So when I later check the same mail (cleaned of the previous marking), I get completely different (mostly horrendously higher) scores, without really seeing the differences. Seemingly, the later a 'one-of-a-series spam' comes in, the more of the dynamic systems have learned it and score it.

I almost believe we are often 'at one end' of some 'lists to be spammed', so we get it 'fresh'. Only the first users are hit; the others get it 'after' the filter has dynamically clamped down on it, and so different users complain about different 'slips'. Sometimes it *seems* as if spammers work through their lists alphabetically, so user a* often gets something that w* never sees, and the other way around too :-)

Thanks
Stucki
Will bayes-db be 'skewed' by feeding it spam only (one central database)
Hi!

I'm a postmaster and have been working with SpamAssassin (now on Debian sarge) for the last few years. We have one filter host for all mail, so at the moment we have only one global Bayes database. We are a department of mathematics and computer science, so we get zillions of spams for all addresses 'known on the net'. We also get ham on lots of different 'themes' for different work groups, in diverse languages (mostly German of course, being in Berlin, Germany).

Since I am not allowed to peek into other users' mailboxes, I have no 'representative ham corpus', only my own, which seems to be very postmaster-specific. My spam, on the other hand, seems to be a typical average (because my address was already listed on a 'News' server :-).

Can somebody tell me whether the Bayes database's accuracy deteriorates when I feed it 'only my spam' (my false negatives) and not the (to me unknown) typical hams? Lately it seems to me to be slowly skewing toward letting more and more spam through, instead of 'catching' it. Is this typical? Do I have to recreate the database? Or do I need to get 'ham from a set of typical users' to balance the database? OR are there typical values for bayes_auto_learn_threshold_{non,}spam, different from the defaults, to use in my case?

Just curious why so many spams get through to me ... (i.e. around 10 false negatives for every 90 marked as spam, which is 'relatively bad' compared to many opinions on the list)

Just curious,
Stucki (postmaster of math/inf/mi.fu-berlin.de)
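The 'skew' question can be illustrated with a Robinson-style per-token estimate. As far as I know, SpamAssassin's Bayes code is based on this family of formulas, but the smoothing constants `s` and `x` below are illustrative and not SpamAssassin's exact values, and the function is hypothetical. If no ham has ever been trained, every token seen in spam gets a spam probability close to 1, however innocent the token is. With ham in the database, the same token is pulled back toward neutral:

```python
def token_spam_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int,
                    s: float = 1.0, x: float = 0.5) -> float:
    """Robinson-style f(w): smoothed probability that a token indicates spam.

    spam_hits / ham_hits: trained messages of each class containing the token.
    n_spam / n_ham: total trained messages of each class.
    s, x: strength and neutral prior of the smoothing (illustrative values).
    """
    b = spam_hits / n_spam if n_spam else 0.0
    g = ham_hits / n_ham if n_ham else 0.0
    n = spam_hits + ham_hits
    if n == 0 or b + g == 0:
        return x  # token never seen: fall back to the neutral prior
    p = b / (b + g)
    return (s * x + n * p) / (s + n)
```

A token that occurs in 10 of 100 spams gets about 0.95 when no ham has been trained (`n_ham=0`). If it also occurs in 10 of 100 trained hams, its estimate drops to 0.5.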
Re: iXhash plugin docs updated, version for 3.0.x added.
On Wed, 21 Jun 2006, Dirk Bonengel wrote:
> - added a version that runs under SpamAssassin 3.0.x

Thanks a lot! I had to shorten some of the descriptions (my --lint complains about descriptions longer than 50 chars), and it already caught some spams this evening! My users will like that :-)

Stucki (postmaster at mi.fu-berlin.de)
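Finding the descriptions that trigger this lint complaint can be scripted. Here is a minimal Python sketch. It assumes the 50-character limit mentioned above and rule files in which each description is a single 'describe NAME text' line. The function name is hypothetical:

```python
from typing import Iterable, List, Tuple

MAX_DESCRIBE = 50  # limit reported by --lint in the post above


def long_descriptions(cf_lines: Iterable[str], limit: int = MAX_DESCRIBE) -> List[Tuple[str, int]]:
    """Return (rule name, description length) for each describe line over the limit."""
    found = []
    for line in cf_lines:
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0] == "describe" and len(parts[2]) > limit:
            found.append((parts[1], len(parts[2])))
    return found
```

Run it over the lines of the plugin's .cf file to get a list of the rules whose descriptions need shortening.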
Re: sa-learn not learning with sudo
On Sat, Apr 22, 2006 at 10:55:29AM +0200, Michael Monnerie wrote:
> ...
> # sudo -H -u vscan sa-learn --dump
> ...
> But when I do
> # su -l vscan
> ...
> # sudo -H -u vscan sa-learn --dump
> ...
> Now why is there a diff between sudo as a user or directly logging in as

One of the differences will be all the commands in the user's shell startup files! Those are ignored if you run the command directly via sudo. It also depends on the version of 'sudo', because one of the latest changes *dropped* the HOME variable from the environment (at least if you run the command directly from sudo!). Lots of our automated cron scripts suddenly failed because of this 'security fix', and we had to replace

OLD: sudo command
NEW: sudo env HOME=$HOME command

to 'bridge the gap' and reuse the *current* HOME 'inside of sudo'. Maybe 'su -l vscan' also sets the missing HOME!

Yours
Stucki (postmaster hit by the same? :-)
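When a script builds such a call, the NEW form can be assembled as an argument list, which avoids shell quoting problems. Here is a minimal Python sketch with a hypothetical helper name. It assumes the local sudoers policy permits running `env` as the target user. Note that, like `$HOME` in the shell form, the default passes the *caller's* HOME. For sa-learn running as vscan you would normally pass vscan's own home directory instead:

```python
import os
from typing import List, Optional, Sequence


def sudo_with_home(command: Sequence[str], user: Optional[str] = None,
                   home: Optional[str] = None) -> List[str]:
    """Build 'sudo [-u USER] env HOME=... COMMAND' so HOME survives sudo."""
    if home is None:
        home = os.environ.get("HOME", "/")  # like $HOME in the shell version
    argv = ["sudo"]
    if user:
        argv += ["-u", user]
    return argv + ["env", f"HOME={home}", *command]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from a cron job, for example with `["sa-learn", "--dump", "magic"]` as the command.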
Re: [OT] Amavisd replacement suggestion
On Tue, Mar 07, 2006 at 04:42:31PM +0100, Michael Monnerie wrote:
> Isn't PITA some sort of Greek bread? The one they use for Gyros, I believe. Wait, looking on wikipedia: http://en.wikipedia.org/wiki/Pita So why is it like Greek bread?

Maybe amavisd is best when toasted (as I like pita == pide :-)

But if 'amavisd is a PITA' referred to the old version, which starts 'one perl process per mail': that is enormously slow and CPU-hungry compared to amavisd, which compiles only once, then stays in memory and only forks children. So the old one is more of a pain in the server, a chance to 'toast' the server or your mail :-)

Stucki
Re: pcre
On Thu, Feb 09, 2006 at 03:24:58PM -, John Hall wrote:
> Ronan [EMAIL PROTECTED] wrote in message
> > Anyone have any input on this? What would be the implications? Should it just be a straight translation perl - c , or are there other factors?
> Ronan, Why would using pcre be quicker? Perl's regex engine is written in C as well. Besides, there is more to SA than just matching regexes.

In my opinion, the most important difference between 'grepping' with pcre and with perl is the 'startup time'. Starting and dynamically linking a whole perl interpreter is a lot more work than just starting a pcre pattern engine. So if you 'just grep for text' in a script, pcre(grep) is your friend. BUT if you need lots of dynamic libraries, use loadable modules, and connect to networks, as SpamAssassin does, pcre alone offers nothing comparable. And in the case of spamd the startup phase happens only once; after that it only forks children, so there should be no large startup penalty.

The ONLY thing you should do is avoid 'dangerous/slow perl patterns': avoid ambiguities, avoid capturing brackets (use (?: ) instead), limit match lengths by using .{min,max} instead of '.*', and construct easily decidable, left-factored searches.

As far as I remember, perl 'allows' a few more complicated (not to say convoluted) constructs than pcre does, but you'd better not use them in SpamAssassin patterns anyway.

Stucki
Re: Exim+SA=Server Overloaded!
On Tue, Jan 24, 2006 at 02:01:55PM -0200, Eduardo wrote:
> Hello! Sorry to send another email about the same subject. But my mail server crashed so i couldn't see the answers. I am calling my spamassassin service in SMTP time with some ACL rules in my exim4 configuration file. I start the SA service, start exim4 service

I had what looks like the same problem while working on our new/future exim + amavis + spamassassin MTA. It stems from 'too many SMTP connections in parallel'. In my case it forced me, after a few tests, to move from 'SpamAssassin by ACL' to 'scanning by pipe / in queue runners'.

After you stop and restart the server, all MTAs that were waiting for your server to come up will 'crowd in' to deliver, so the number of parallel incoming connections will be at its maximum. If you do spam checks 'by ACL' (in the SMTP dialog), you need spamd access for each of those connections in parallel. That nearly always either crashes the server through overload, or starts letting spam through (or deferring connections) because of timeouts (too many connections to one spamd).

Therefore we changed our spam check to the other method, 'by pipe', checking for spam only 'in the queue'. This way we can tell exim to accept the mail and put it into the queue ('queue_only'), and then set the number of parallel queue runners exactly to the maximum capacity of the spam filter. This visibly slows down delivery as a whole, but it should never drive the server into thrashing.

The ideal setup, though, would adapt to the load: scan by ACL until it gets too crowded, then switch to 'scan later' in the queue. But I have not yet understood exim THAT far.

Stucki (postmaster of math/inf/mi.fu-berlin.de)
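The 'accept into the queue, then scan with a fixed number of runners' design boils down to bounded concurrency. Here is a toy Python model. The thread-pool size stands in for exim's number of parallel queue runners, and `ConcurrencyProbe` is a hypothetical stand-in for spamd that records how many scans ran at once. It shows only the capping idea, not exim's actual configuration:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def drain_queue(messages: Iterable[str], scan: Callable[[str], str],
                runners: int) -> List[str]:
    """Scan queued messages with at most `runners` scans in flight."""
    with ThreadPoolExecutor(max_workers=runners) as pool:
        return list(pool.map(scan, messages))  # results keep queue order


class ConcurrencyProbe:
    """Fake spam checker that tracks the peak number of parallel scans."""

    def __init__(self, delay: float = 0.01):
        self.delay = delay
        self.active = 0
        self.peak = 0
        self._lock = threading.Lock()

    def scan(self, message: str) -> str:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(self.delay)  # pretend to run the expensive spam check
        with self._lock:
            self.active -= 1
        return f"scanned:{message}"
```

However many messages pile up in the queue, the peak number of simultaneous scans never exceeds `runners`. Only the total time grows, which matches the 'visibly slower, but never thrashing' trade-off described above.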
Re: Gain an extra 25%! (was Purging the Spamassassin Database)
On Mon, Jan 16, 2006 at 04:09:37PM +0100, M.S. Lucas wrote: Could this be made a default, with the small size of the id columns and a note in the installation file for the big users? There are more users of SA with fewer than 65k users than with more. Does that mean '65k is the largest user number' (numerical), like UNIX UIDs, or really '65k different users in the database of setups and tokens'? The latter will really be relatively rare. Stucki
Re: spamcop.net tactics
On Tue, Nov 22, 2005 at 09:24:28AM -0800, Linda Walsh wrote: That doesn't mean it's a moral, an ethical or respectable reason: Spite is reason enough for most people these days. Michele Neylon:: Blacknight.ie wrote: if your IPs end up in there it's usually for a reason. Before we get into 'arguments' or even 'flamewars': we (@{math,inf,mi}.fu-berlin.de) were hit by the same problem. We also could not find *anything* visible which could have put us on their list, so we had to resort to 'circumventing' the assumed problem. Seemingly SpamCop not only counts 'real spam' (explicitly sent to spam traps) but also counts 'any bounce stranding in their spam trap' as coming from a 'spammer or open relay'. So simply by having users run 'vacation', or by viruses/worms sending themselves from faked spam-trap addresses and bouncing at your site, you can be blacklisted for 24 hours (each time?). We reduced bounces by patching qmail with a user check at 'RCPT' time in the SMTP delivery, by making all lists reply to local owner addresses instead of bouncing, and by checking that no auto-answering service ever answers bounces, bulk mail, spam and the like. Having thereby reduced the 'chance' of hitting the spam traps again, we have 'survived' so far without being blocked again (at least not for longer than the lifetime of mails sent to us). Stucki (postmaster)
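The 'never answer bounces, bulk mail or spam' rule for auto-responders can be sketched in Python as below. It is only an illustration, assuming the responder sees the envelope sender and the parsed headers; the function name is hypothetical, and the header set (Precedence, List-Id, RFC 3834 Auto-Submitted, SpamAssassin's X-Spam-Flag) is my choice, not taken from the setup described here.

```python
from email import message_from_string
from email.message import Message

BULK_PRECEDENCE = {"bulk", "list", "junk"}


def should_auto_reply(envelope_sender: str, msg: Message) -> bool:
    """Decide whether a vacation-style responder may answer this message.

    Never answer bounces (null sender), mailing lists, bulk mail, other
    auto-generated mail (RFC 3834 'Auto-Submitted') or mail tagged as spam.
    """
    sender = envelope_sender.strip().strip("<>")
    if not sender or sender.lower().startswith("mailer-daemon@"):
        return False  # a bounce: answering it would create another bounce
    if (msg.get("Precedence") or "").strip().lower() in BULK_PRECEDENCE:
        return False
    if msg.get("List-Id") is not None:
        return False  # mailing-list traffic
    auto = (msg.get("Auto-Submitted") or "no").split(";")[0].strip().lower()
    if auto != "no":
        return False  # generated by another robot
    if (msg.get("X-Spam-Flag") or "").strip().upper() == "YES":
        return False  # SpamAssassin tagged it as spam
    return True


normal = message_from_string("From: friend@example.org\nSubject: hello\n\nHi!\n")
print(should_auto_reply("<friend@example.org>", normal))
print(should_auto_reply("<>", normal))
```

Every check errs on the side of silence: a missed vacation notice is harmless, while a reply to a forged spam-trap sender is exactly what gets a site listed.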
Re: SA 3.1 X-headers prepended instead of appended
On Fri, Oct 21, 2005 at 05:19:40PM -0400, Daryl C. W. O'Shea wrote: No, but here is what the headers look like:
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on domain.com
X-Spam-Status: No, score=-2.4 required=5.2 tests=BAYES_00=-2.599,
 DNS_FROM_AHBL_RHSBL=0.231,HTML_MESSAGE=0.001,SPF_HELO_PASS=-0.001,
 SPF_PASS=-0.001 autolearn=no version=3.1.0
From mailnull Thu Oct 20 23:42:08 2005
Return-Path: [EMAIL PROTECTED]
Received: from ccm08.roving.com (ccm08.roving.com [63.251.135.109])
etcetcetc.
Something in your mail flow is broken. You've got what appears to be an mbox line: From mailnull Thu Oct 20 23:42:08 2005
Suppose I pipe a mail directly from my MUA (mutt, pine, ...) through spamc and back into my mailbox. Will I get the same wrong result (wrong because it destroys my 'From ...' line)? I ask because I have often done that 'piping' via procmail from one mailbox to another, and I cannot test it myself, having no 3.1.0 yet :-) Stucki
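To check whether a message that went through such a pipe came back with its mbox 'From ' line pushed below the added X-Spam headers (the symptom in the headers quoted above), a small Python sketch could scan the header block. It assumes the message is available as plain text and that a 'From ' separator, if present, belongs on the first line; the function name is hypothetical.

```python
from typing import Optional


def misplaced_from_line(raw: str) -> Optional[int]:
    """Return the 1-based line number of an mbox 'From ' separator that sits
    inside the header block below the first line, or None if there is none."""
    for lineno, line in enumerate(raw.splitlines(), start=1):
        if line == "":
            break  # blank line: end of the header block, body follows
        if lineno > 1 and line.startswith("From "):
            return lineno
    return None


broken = (
    "X-Spam-Checker-Version: SpamAssassin 3.1.0\n"
    "X-Spam-Status: No, score=-2.4\n"
    "From mailnull Thu Oct 20 23:42:08 2005\n"
    "Return-Path: <sender@example.org>\n"
    "\n"
    "body\n"
)
print(misplaced_from_line(broken))
```

Lines starting with 'From ' in the body are ignored, since the scan stops at the first blank line.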
Re: missed by AV programs
On Mon, Sep 19, 2005 at 03:55:12PM -0400, Rob McEwen (PowerView Systems) wrote: RE: missed by great AV programs (keeping in mind that the ones I've mentioned may catch up by the time you read this) Right: since you wrote this, NAI (McAfee) first sent an extra ALERT letter, then released a new DAT file (number 4585) earlier than scheduled to catch the new variants! Stucki
Re: Spam with Re[2]: or Re[4]:
On Thu, Sep 15, 2005 at 03:42:42PM -0400, Ronald I. Nutter wrote:
# Check for bad Re: tag
header BAD_RECOLON_TAG Subject =~ /\bRe:\b/i
stopping email with something past the Re:. Is my concern valid and how do I allow the email to get through that has something after Re: ?
I assume you want to catch mails with 'Re:', but 'only without any further contents'? Then you'd need to use '$' (line end) instead of the second '\b' (word boundary), giving:
header BAD_RECOLON_TAG Subject =~ /\bRe:$/i
This will be DANGEROUS IF mail programs automatically add 'Re:' to empty subjects! Then you may get false positives. Oh, by the way, what are the double quotes for? I think they would be searched for literally, so the pattern would not work as assumed? In an exim4 filter (it uses PCRE patterns just like Perl) I just wrote and tested a pattern against the 'Re...' spams; rewritten for SpamAssassin it reads:
header BAD_RECOLON_TAG Subject =~ /^re:?\s*\[\d+\]:?\s*$/i
Which is:
  ^    the start of the Subject
  re   the characters 're' (any case, because of /i)
  :?   the colon (possibly)
  \s*  whitespace (possibly)
  \[   the left bracket (the typical case)
  \d+  one or more digits (I saw random numbers from 2 to 111)
  \]   the closing bracket (all my spams had it)
  :?   another colon (I really saw both Re:[1] and Re[2]:)
  \s*  possibly more whitespace, up to
  $    the end of the Subject
If anything (except more whitespace) follows the tag, this pattern fails, so a subject like 'Re: [2] something' passes without hitting the rule. Stucki
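A quick way to try the bracket-tag Subject pattern against sample subjects is Python's `re` module, whose syntax matches Perl's for this particular pattern. The sample subjects are made up, and the variable name is hypothetical.

```python
import re

# The Subject pattern from the rule above, with /i mapped to re.IGNORECASE.
RE_BRACKET_TAG = re.compile(r"^re:?\s*\[\d+\]:?\s*$", re.IGNORECASE)

for subject in ["Re[2]:", "Re:[1]", "RE: [111] ", "Re: [2] something", "Re: lunch?"]:
    verdict = "HIT" if RE_BRACKET_TAG.search(subject) else "no hit"
    print(f"{subject!r:24} -> {verdict}")
```

The first three subjects hit; 'Re: [2] something' and an ordinary reply like 'Re: lunch?' do not, which is the behaviour described in the breakdown.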
Re: SpamCop listing internal hotmail servers?
On Wed, Sep 07, 2005 at 06:37:54PM -0400, Greg Allen wrote: As a result, she got our server blacklisted several times and affected about 400 users. I went round and round with her telling her to knock it off. You don't even need a user to actively report to SpamCop. A normal user's simple 'vacation' program may be enough! SpamCop sends out 'relay probes' and 'bounce probes', and I was told that if *anything* bounces back to their test server (instead of being stopped in the SMTP dialog), the host is assumed to send spam bounces and goes into the RBL for at least 20 hours. (We had to patch our qmail to get out of this after being listed for a week.) So I'd say SpamCop is 'harmful' rather than 'useful'. Stucki
Re: OT: sa-learn, interfaced with Cyrus mailboxes
On Sun, Aug 21, 2005 at 01:59:00AM -0400, Forrest Aldrich wrote: I just switched over to Cyrus IMAP - and it didn't occur to me I'd need ... I wonder who else is using Cyrus IMAP here, and how you may be handling ... I'm on the way from qmail+UW-IMAP to exim+Cyrus (testing configurations and waiting for our project of generating an account database to be used for mail addressing). I'll let SpamAssassin add a header; later an exim router will redirect the 'tagged' mails to the mailbox named username+spam in Cyrus. The '+spam' tells Cyrus to put the mail into an extra username/spam mailbox. (You can also sort with a Sieve script in Cyrus.) SpamAssassin can be run either in ACLs (works if you limit the number of concurrent SMTP connections) or via exim queue runners filtering the mail later (then you need to limit the number of queue runners). But sorry, Bayes learning is on the agenda for 'later', because we're not yet sure whether we'll keep per-user data (in an SQL database?) or stay with site-wide data as now. Yours Stucki (postmaster at mi.fu-berlin.de)
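The routing decision itself (tagged mail goes to username+spam, everything else to username) can be sketched in a few lines of Python. This is a hypothetical illustration of the logic only, not exim router syntax; it assumes SpamAssassin marks spam with its 'X-Spam-Flag: YES' header, and the function and domain names are made up.

```python
from email import message_from_string


def delivery_address(username: str, domain: str, raw_message: str) -> str:
    """Pick the Cyrus target: 'user+spam@domain' for mail that SpamAssassin
    tagged as spam, plain 'user@domain' otherwise (hypothetical rule)."""
    msg = message_from_string(raw_message)
    is_spam = (msg.get("X-Spam-Flag") or "").strip().upper() == "YES"
    local_part = f"{username}+spam" if is_spam else username
    return f"{local_part}@{domain}"


print(delivery_address("alice", "example.org", "X-Spam-Flag: YES\nSubject: buy now\n\nx\n"))
print(delivery_address("alice", "example.org", "Subject: lunch\n\nx\n"))
```

Keeping the decision on a single header means the scanner and the delivery stage stay decoupled: the router never has to talk to spamd, it only reads what was already added.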
Re: How to use Multilog ?
On Mon, Aug 15, 2005 at 09:09:20AM -0400, Matt Kettler wrote: Perhaps you want something like: spamd -s stdout | multilog {insert multilog options here} This should be exactly what you want. BUT in the manual I only see 'stderr' allowed, as in '... -s stderr'. If 'stdout' does not work, you might need to run /bin/sh -c 'exec spamd -s stderr ... ... ... 2>&1' instead of 'spamd -s stdout ... ... ...'. This way the shell redirects stderr to stdout, and multilog gets the output: multilog (normally started by Bernstein's daemontools via supervise) reads standard input! See: http://cr.yp.to/daemontools/multilog.html Stucki
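What '2>&1' does in that pipeline can be reproduced in Python with two stand-in processes: a producer that, like 'spamd -s stderr', logs only to stderr, and a consumer that, like multilog, only reads standard input. Both child programs are hypothetical stand-ins, not spamd or multilog.

```python
import subprocess
import sys

# Stand-in for 'spamd -s stderr': writes its log lines to stderr only.
PRODUCER = "import sys\nfor i in range(3):\n    print(f'log line {i}', file=sys.stderr, flush=True)"
# Stand-in for multilog: reads standard input and records each line.
CONSUMER = "import sys\nfor line in sys.stdin:\n    print('logged: ' + line, end='')"

producer = subprocess.Popen(
    [sys.executable, "-c", PRODUCER],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # the '2>&1' part: stderr goes into the same pipe
)
consumer = subprocess.run(
    [sys.executable, "-c", CONSUMER],
    stdin=producer.stdout, capture_output=True, text=True,
)
producer.stdout.close()
producer.wait()
print(consumer.stdout)
```

Without stderr=subprocess.STDOUT the log lines would bypass the pipe entirely and the consumer would receive nothing, which is the failure the shell redirection avoids.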
Re: Very long scan times - Finding the culprit rule
On Mon, Aug 15, 2005 at 06:51:48AM -0700, jdow wrote: As soon as you touch swap space you're dead. It's not unusual to see times for processes increase by 10 or even 100 times. (Although about 10 is most common.) This has already happened to us twice. It seems to hit 'just by chance'. I assume it is a 'bunch of too many large mails' hitting 'complicated rules' (especially rules with 'variably long' patterns like '.{1,30}'), bloating *all* spamd children in parallel. Normally only one or two children bloat, and they 'die soon' and are replaced by normally sized ones, but very rarely *all* of them bloat, and the server goes down. Stucki
Re: Very long scan times - Finding the culprit rule
On Mon, Aug 15, 2005 at 07:27:33AM -0700, Loren Wilton wrote: You can stop the first two from being problems by running a manual expire from a cron job every so often and disabling the auto-expire runs. You should have a limit of 250K or so on the mail size to try to keep the third from being a problem. Did that, it works (mostly, see below)... Usually (at least in my experience) the way a rule is written doesn't affect the spamd memory size. Sorry, this is definitely WRONG! If you write (as I once did) a rule containing spurious, arbitrarily long '..*' constructs, the regex automaton goes crazy, and a mail of 250k may need more than 250 MByte of memory per child instead of the currently seen ~80 MByte. Simply 'shortening' the possible evaluation of the expression by replacing '..*' with .{1,N} (with N a 'reasonably short' number) shrank the problem to manageable sizes! Since then I have never again used .+ or .*, but ALWAYS limit the length. Stucki
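A small Python illustration of the difference between '..*' and .{1,N}, with Python's `re` standing in for Perl's engine. It shows how far a single match can reach inside a 250 KB body, not spamd's actual memory use; the patterns are hypothetical examples, not SpamAssassin rules.

```python
import re

UNBOUNDED = re.compile(r"click.*here", re.I)     # gap may span the whole body
BOUNDED = re.compile(r"click.{1,30}here", re.I)  # gap limited to 30 characters

body = "Click " + "x" * 250_000 + " here now"

m = UNBOUNDED.search(body)
print(len(m.group(0)))          # the match stretches across the entire body
print(BOUNDED.search(body))     # None: 'here' is far more than 30 chars away
print(BOUNDED.search("Click right here!").group(0))
```

The bounded form still catches the short, spammy phrasing but gives up after 30 characters instead of dragging the engine across the whole message from every starting point.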