[SAtalk] Re: Consonant and Vowel Pairs or Sequences

2003-10-14 Thread era
single hit in a message.) On the other hand, n-gram search space with n=2 is nicely bounded, whereas if you go to larger values of n, you get a large and sparse search space. But there are methods for coping with that. /* era */ -- formail -s procmail <http://www.iki.fi/era/spam/ >http://ww
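To make the size argument concrete, here is a minimal Python sketch. The function names are hypothetical, and it assumes a plain 26-letter alphabet rather than real message text. The number of possible n-grams grows as 26^n, which is why n=2 stays small while larger n quickly becomes sparse.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_space_size(alphabet_size: int, n: int) -> int:
    """Number of distinct n-grams possible over an alphabet of the given size."""
    return alphabet_size ** n


if __name__ == "__main__":
    print(char_ngrams("consonant and vowel pairs", 2).most_common(3))
    # Over 26 letters: 676 possible bigrams, but 11,881,376 possible 5-grams,
    # most of which will never be observed in a training corpus.
    for n in (2, 3, 5):
        print(n, ngram_space_size(26, n))
```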

[SAtalk] Re: procmail processes rising

2003-10-15 Thread era
that other instances give up on waiting for their turn. If you read what it says in procmailrc.example, this is (apparently) done on purpose this way so as to "keep load down", but it's by no means the only way to skin a cat. Probably the most straightforward change would be to install

[SAtalk] Re: Spamassassin and pre-filtering in .procmailrc

2003-10-19 Thread era
Other than that, if the To: header does not contain exactly one space and the string "[EMAIL PROTECTED]" after the colon, the regex will not match, but you probably knew that already (^; but in any event you might actually want to use [EMAIL PROTECTED] instead). Hope this helps, /* era */

[SAtalk] Re: Consonant and Vowel Pairs or Sequences

2003-10-28 Thread era
.ln ones. Haven't tried that so can't offer advice.) 5. (Optionally, remove some of lm/*.l[mn] -- I think a number of them are probably superfluous in practice.) 6. Dump your new models into lm/ and run lm/build.pl 7. (Maybe hack on SA itself to interpret the results from

[SAtalk] Re: Procmail+Sendmail+SpamAssassin

2003-10-29 Thread era
the spam. See also separate reply in private mail. /* era */ -- The email address era at iki dot fi is heavily spam filtered. If you want to reach me, see instead the contact information link on my home page at <http://www.iki.fi/era/>. Just for kicks, imagine what it's like to get 500 pieces of spam

[SAtalk] Re: Consonant and Vowel Pairs or Sequences

2003-10-30 Thread era
ctive/haw.lm sa/lm/ja.iso-2022-jp.ln sa/lm/tr.iso-8859-9.ln The mappings are attached. /* era */ LM/afrikaans.lm lm/af.lm LM/albanian.lm lm/sq.lm LM/amharic-utf.lm lm/am.utf-8.lm LM/arabic-iso8859_6.lm lm/ar.iso-8859-6.lm LM/arabic-windows1256.lm lm/ar.windows-1256.lm LM/armenia

[SAtalk] Re: [RD] Open source is Naughty!!!

2003-10-30 Thread era
E \xC8 \xC9 \xCA \xCB (E grave, acute, circumflex, dieresis) i 1 l \xEC \xED \xEE \xEF I \xCC \xCD \xCE \xCF (same for i and I) ! \xA1 | \xA6 ... and maybe even L This is for ISO-8859-1; I imagine other character sets are less likely to be targeted because they are less likely to be s
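One way to use a look-alike table like this is to turn it into a regular expression with one character class per letter. A minimal Python sketch, assuming the text has already been decoded from ISO-8859-1 to Unicode; only the E, i and I rows quoted above are used, and the table and function names are hypothetical.

```python
import re

# Look-alike characters transcribed from the ISO-8859-1 list above.
# Only the E, i and I rows are used; everything else is left out.
LOOKALIKES = {
    "e": "eE\xc8\xc9\xca\xcb",
    "i": "iI1l\xec\xed\xee\xef\xcc\xcd\xce\xcf",
}


def obfuscation_pattern(word: str) -> "re.Pattern[str]":
    """Compile a regex that matches word with letters swapped for look-alikes."""
    parts = []
    for ch in word.lower():
        # Letters without a table entry just match either case.
        variants = LOOKALIKES.get(ch, ch + ch.upper())
        parts.append("[" + re.escape(variants) + "]")
    return re.compile(r"\b" + "".join(parts) + r"\b")


if __name__ == "__main__":
    pattern = obfuscation_pattern("video")
    print(bool(pattern.search("fr33 v\xeed\xc8o here")))
```

The character-class approach keeps it to a single pass over the text; a real rule would need the complete table for whatever character set is being targeted.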

[SAtalk] Re: How can I mark all mails with specific words in the subject as spam?

2003-10-30 Thread era
add X-Spam-Status to any messages which don't already have it. :0fhw * ! ^X-Spam-Status: * ^Subject: \/(fr33|h0t|w0m3n) | formail -I "X-Spam-Status: Yes (Subject: contains $MATCH)" Then proceed with delivery just as before. Procmail doesn't care if the destination is an IMA
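For comparison, the same logic as the procmail recipe above, sketched in Python with the standard email package. This is an illustration of what the recipe does, not a replacement for it. The function name is hypothetical; re.IGNORECASE mirrors procmail's default case-insensitive matching, and re.match mirrors the anchoring right after "Subject: ".

```python
import re
from email import message_from_string
from email.message import Message

# Procmail conditions are case-insensitive by default, hence re.IGNORECASE.
# re.match anchors at the start of the Subject value, like "^Subject: \/...".
TRIGGER = re.compile(r"(fr33|h0t|w0m3n)", re.IGNORECASE)


def tag_by_subject(msg: Message) -> Message:
    """Add X-Spam-Status when it is missing and the Subject starts with a trigger."""
    if msg["X-Spam-Status"] is not None:
        return msg  # mirrors the "* ! ^X-Spam-Status:" condition
    match = TRIGGER.match(msg.get("Subject", ""))
    if match:
        msg["X-Spam-Status"] = f"Yes (Subject: contains {match.group(1)})"
    return msg


if __name__ == "__main__":
    m = message_from_string("Subject: h0t offer\n\nbody\n")
    print(tag_by_subject(m)["X-Spam-Status"])
```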

[SAtalk] Re: [RD] Need eval test for message len?

2003-10-30 Thread era
> The problem is that none of the test options (rawbody, body, full) > provide the ability to test the entire message like this. Try adding an /s: full T_FOO //is Normally the . in a Perl regular expression cannot match a newline, but adding the /s changes that. Hope this helps,
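The same behaviour is easy to see in Python, where re.DOTALL plays the role of Perl's /s and re.IGNORECASE that of /i. A minimal sketch on a made-up message string:

```python
import re

text = "Subject: hi\n\nline one\nline two\n"

# "." does not match "\n" by default, so the pattern cannot span lines.
no_flags = re.search(r"hi.*two", text)

# re.DOTALL is Python's equivalent of Perl's /s and re.IGNORECASE of /i.
with_flags = re.search(r"HI.*TWO", text, re.DOTALL | re.IGNORECASE)

print(no_flags, with_flags)
```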

[SAtalk] Re: tok_put atime uninitialized

2003-10-31 Thread era
me up with a test case which triggers the bug, I imagine it won't be hard to fix. /* era */

[SAtalk] Re: spamd processing time excessive

2003-11-03 Thread era
Try turning them off and see if that helps. If that helps, you can turn them off one by one to figure out which one exactly is causing excessive delays. This is under one interpretation of what punctuation you left out. /* era */

[SAtalk] Re: help

2003-11-03 Thread era
rds you tried obviously do not contain \b:s on both sides of the sequence s-e-x which is what your regex requires. Try reducing the punctuation by some other means, like for example by allowing for only a maximum of three non-word characters between the letters, like so: /\b[Ss]\W{0,3}[Ee]\W{0,3}[X

[SAtalk] Re: X-pvkhgmeblyqcmv header

2003-11-03 Thread era
e text was a dead giveaway? /* era */

[SAtalk] Re: help

2003-11-03 Thread era
ex > sussex > asterix > disannex > vasoreflex > HERE sex s::e::x I.e. only the two "sex" and "s::e::x" matched and were printed. /* era */
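The same experiment can be reproduced in Python. This sketch assumes the truncated pattern from the earlier message ends in [Xx]\b, which is a guess; the word list is taken from the output above.

```python
import re

# Assumes the truncated pattern above ends in [Xx]\b. \W{0,3} allows up to
# three non-word characters (punctuation, spaces) between the letters.
PATTERN = re.compile(r"\b[Ss]\W{0,3}[Ee]\W{0,3}[Xx]\b")

words = ["sussex", "asterix", "disannex", "vasoreflex", "sex", "s::e::x"]
matches = [w for w in words if PATTERN.search(w)]
print(matches)
```

The \b anchors are what keep "sussex" and friends out: inside those words there is no word boundary in front of the "s".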

[SAtalk] Re: Parser.pm version >3.24

2003-11-06 Thread era
he generic technique for using your own libraries with Perl. Look in the Perl documentation for more instructions if you need them. (For completeness, you could also change SpamAssassin to either change @INC at BEGIN{} time or, preferably, use lib "path/to/your/lib";.) Hope this hel

[SAtalk] Re: OT: Distributed Spamming Engine?

2003-11-06 Thread era
ediately add the new IP address to your local blocklist, and submit it to your DNSBL of preference. /* era */

[SAtalk] Re: which folder?

2003-11-07 Thread era
on't like the way it is, create your own Debian package, or file a bug and hope that the maintainer will change it to your liking in a future version. /* era */

[SAtalk] Re: 'random' character sets

2003-11-07 Thread era
be "gpl" (think GNU General Public Licence :-) and so this sequence alone would trip the score down below zero. (Real scoring systems tend to add scores for "good" and deduct for "bad" and have a gray area where you don't change the score because you don't kno

[SAtalk] Re: gibberish hook?

2003-11-27 Thread era
maybe whitespace too) with a single punctuation character ... or even strip out all punctuation and whitespace entirely and then look at the resulting n-grams. More generally, I believe it would make sense to define a handful of different "normal forms" for different classes of rules.
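Two of the "normal forms" mentioned above, sketched in Python. The function names are hypothetical, and using "." as the single replacement character is an arbitrary choice for illustration.

```python
import re


def collapse_punct(text: str) -> str:
    """Replace every run of punctuation and whitespace with a single '.'."""
    return re.sub(r"\W+", ".", text)


def strip_punct(text: str) -> str:
    """Remove punctuation, whitespace and underscores altogether."""
    return re.sub(r"[\W_]+", "", text)


def bigrams(text: str) -> list:
    """Overlapping character bigrams of an already normalized string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


if __name__ == "__main__":
    raw = "V-i--a g.r.a"
    print(collapse_punct(raw))
    print(bigrams(strip_punct(raw).lower()))
```

Rules written against the collapsed form still see where the separators were; rules written against the stripped form only see the letters.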

[SAtalk] Re: More .procmailrc

2003-12-04 Thread era
der Hope this helps, /* era */

[SAtalk] Re: paris hilton

2003-12-04 Thread era
mings to back it up, but probably it will be slightly faster as well as more human-readable if you normalize the expressions to use classes wherever you can. Thanks for a useful tool, BTW! I wish I had thought of setting that up. /* era */

[SAtalk] Re: Rule Length

2003-12-05 Thread era
iding factor :-) /* era */

[SAtalk] Re: howto consider as spam any russian mail ?

2003-12-05 Thread era
ation for details. /* era */

[SAtalk] Re: Customise the default alert message

2003-12-05 Thread era
can change the content of that message ? Yes. Conveniently, it's described in the documentation. <http://spamassassin.org/doc/Mail_SpamAssassin_Conf.html>, look for "clear_report_template" and "report", or look in 10_misc.cf for an example. /* era */

[SAtalk] Re: BUGGY_CGI

2003-12-05 Thread era
d probably be tagged as duplicates). /* era */

[SAtalk] Re: More .procmailrc

2003-12-08 Thread era
m >> works for me. > I'm not normally much good at procmail scripts, but wouldn't: > * ^X-Spam-Level: \*{9,} > Be a bit shorter? Yes. But it would also not work. Procmail doesn't grok x{numbers} /* era */

[SAtalk] Re: bigevil.cf + rsync?

2003-12-08 Thread era
ain 7bit ASCII (although that would be kind of pointless). If Mailman dumps the base64 straight into a digest with a different Content-Transfer-Encoding then that's definitely a bug in Mailman. > I've just changed my digest option for the list from plain-text to > MIME - I'll se

[SAtalk] Re: Mailer daemon mail in whitelist

2003-12-08 Thread era
host's canonical name would allow you to make a clean distinction between these. It would seem kind of pointless to call your DNS server "mail" anyhow. I'm not enough of a DNS guru to tell you how exactly this should be set up properly. Anybody? /* era */

[SAtalk] Re: New spammer trick (HTML tables)?

2003-12-08 Thread era
less you actually +want+ your ad to be ugly [1])? The example Martin posted is in Arial, though, and looks quite different from the old example on the TSC site. /* era */ [1] Some of the ad posters you see in town these days would seem to confirm that this is a prevalent advertising trend ..

[SAtalk] Re: One persistent spammer defeating SA.

2003-12-08 Thread era
examples was that the same address would be repeated twice. Also the examples are in the .com domain so the restriction to .org/.net is wrong. I'd go with simply: /^Reply-to:\s+(\S+)\s+\1/i I'm guessing the multi-line appearance was simply due to word wraps in Robert's mai
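The backreference does the work here: \1 has to repeat exactly what (\S+) captured. A minimal Python port for experimenting, with a made-up header block:

```python
import re

# Python port of /^Reply-to:\s+(\S+)\s+\1/i. re.MULTILINE makes ^ match at
# the start of each header line when scanning a whole header block.
REPEATED_REPLY_TO = re.compile(
    r"^Reply-to:\s+(\S+)\s+\1", re.IGNORECASE | re.MULTILINE
)

headers = (
    "From: someone@example.com\n"
    "Reply-To: deals@example.com deals@example.com\n"
)
print(bool(REPEATED_REPLY_TO.search(headers)))
```

Note that, like the Perl original, this also fires when the second token merely starts with the first one, since nothing anchors the end of \1.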

[SAtalk] Re: filtering spam tagged email before hitting exchange 2000

2003-12-09 Thread era
ndmail, by all means listen to her. It's not like Postfix and Sendmail are the only two alternatives. In particular, Qmail and Exim are quite popular, and Courier tends to get mentioned as a commercial alternative. /* era */

[SAtalk] Re: another SpamAssassin ???

2003-12-09 Thread era
On Tue, 9 Dec 2003 09:50:53 -0600, Terry Shows <[EMAIL PROTECTED]> posted to spamassassin-talk: > Has anyone considered changing the spelling? Just changing the last > i to an e (spamassassen) just might be enough. How professionol. /* era */

[SAtalk] Re: Habeas test

2003-12-09 Thread era
AND operations, you could tell it to not look for lines 2 and 3 if line 1 isn't there, but that's apparently not doable at the moment.) /* era */

[SAtalk] Re: Bounce / reject e-mail using Postfix with scores above n

2003-12-09 Thread era
're running Postfix, you'd like to run your own custom SA rather than rely on some upstream entity to run it for you. If you have the luxury of an upstream mail relay whom you trust and who runs SA on all your mail before relaying it to you, then ... what's the point of running Postfix?

[SAtalk] Re: Content Analysis

2003-12-09 Thread era
ons are legit (and for all I know, they probably are -- confirmed opt-in and all that, yes? Yes?), any genuinely useful response could just as easily be read by a spammer. We don't want them to get any ideas, do we? /* era */

[SAtalk] Re: [RD] Help with Subject rule

2003-12-09 Thread era
sis: header T_SBJT_ENC Subject:raw =~ /=\?(us\-ascii|iso\-8859\-[1-9][0-9]?|windows\-1251)\?b\?/i describe T_SBJT_ENC Subject uses RFC2047 base64 encoding score T_SBJT_ENC .01 I guess some variants of ISO-8859 would legitimately use base64 most of the time, but unless you're using
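To try the pattern outside SpamAssassin, here is the same regex in Python, run against raw (undecoded) Subject values, which is what Subject:raw gives the rule. The sample subjects are made up.

```python
import re

# Same pattern as T_SBJT_ENC above: an RFC 2047 encoded-word whose
# encoding letter is B (base64) rather than Q (quoted-printable).
SBJT_ENC = re.compile(
    r"=\?(us\-ascii|iso\-8859\-[1-9][0-9]?|windows\-1251)\?b\?", re.IGNORECASE
)

for subject in ("=?ISO-8859-1?B?SGVsbG8=?=", "=?iso-8859-1?Q?Hello?="):
    print(subject, bool(SBJT_ENC.search(subject)))
```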

[SAtalk] Re: RHSBL Usage

2003-12-10 Thread era
based DNS blocklists). Look at how e.g. dsn.rfc-ignorant.org is being invoked in 20_dnsbl_tests.cf /* era */

[SAtalk] Re: One persistent spammer defeating SA.

2003-12-10 Thread era
Ma Belle <[EMAIL PROTECTED]> Your mileage not included when stirred, etc. >> I'm guessing the multi-line appearance was simply due to word wraps in >> Robert's mail program, and not actually there in the original headers. Oh, and even if the header was spread over

[SAtalk] Re: One persistent spammer defeating SA.

2003-12-10 Thread era
Homestore | Everything Home > <[EMAIL PROTECTED]> For many of these, one can observe that the "user name" in the From: header often also occurs in the Subject line. This could be a useful rule pattern, although there are bound to be false positives, so the scor

[SAtalk] Re: Log Help!

2003-12-10 Thread era
t /etc/default/spamd.conf to set it to use a different syslog facility (or even not log through syslog but use something else instead). I can't think of a scenario where it would make sense to run 2.20 so you should try to find a good backport or build from 2.60 yourself. /* era */

[SAtalk] Re: rule match counting

2003-12-10 Thread era
S can handle). 16,000 SpamAssassin rules doesn't sound very manageable in any event so perhaps you should at least think about other ways to handle this. /* era */

[SAtalk] Re: spamassassin procmail

2003-12-10 Thread era
sensible to me. Have you somehow managed to redefine SENDMAIL to /var/spool/mail/il? Don't touch Procmail variables you don't understand (other than SHELL=/bin/sh if you're on a csh-infested system and are experiencing problems). /* era */

[SAtalk] Re: [RD] raw/rare/folded/plain/alphed body/subject rende ring streams

2003-12-10 Thread era
be really neat would be to have an automaton which recognizes all possible variants at the same time. The obfu script (look back in the archives for a few days) is a nice start, but it could obviously be improved. In the grand scheme of things, I imagine you would have to use another formalism ins

[SAtalk] Re: new user of spamassassin

2003-12-11 Thread era
ies. Here's an example: $ cat >/tmp/some_email_file.txt From: [EMAIL PROTECTED] Subject: furr33 w4r3z 1 dfgjopwetq Date: Thu, 31 Feb 2038 24:00:00 -0700 http://www.hotmail.com:[EMAIL PROTECTED]/%65%72%61/ ^D $ spamassass

[SAtalk] Re: Bayes Corpus Project

2003-12-11 Thread era
contribution -- I don't know of anybody who actually uses NANAS for anything real. Paul Judge started the spam archive project <http://spamarchive.org> roughly a year ago but results so far are less than startling, and they don't seem to be responding to email. (Ah, they h

[SAtalk] Re: Will Recipe work to skip certian messages?

2003-12-11 Thread era
Ignoring point #2, try this: :0fw * ! ^From.*@([^ <>.]+\.)*rose-hulman\.edu\> | spamc -options ... If you want to add more conditions, you can do that: :0fw * ! ^From.*@([^ <>.]+\.)*rose-hulman\.edu\> * ! ^From: <[EMAIL PROTECTED]> | spamc -options ... Oh, and finally
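Roughly the same test in Python, for anyone who wants to check which headers the condition would skip. This is an approximation: procmail's \> token is not identical to \b, and the function name is hypothetical.

```python
import re

# Procmail matches case-insensitively by default; its \> word-boundary
# token is approximated here with \b. ^From covers "From:" and "From ".
SKIP_SENDER = re.compile(
    r"^From.*@([^ <>.]+\.)*rose-hulman\.edu\b", re.IGNORECASE | re.MULTILINE
)


def should_run_spamc(headers: str) -> bool:
    """True unless a From line points at rose-hulman.edu or a subdomain."""
    return SKIP_SENDER.search(headers) is None


if __name__ == "__main__":
    print(should_run_spamc("From: Joe <joe@cs.rose-hulman.edu>\n"))
    print(should_run_spamc("From: someone@example.com\n"))
```

The ([^ <>.]+\.)* part is what lets any number of subdomain labels precede rose-hulman.edu.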

[SAtalk] Re: One persistent spammer defeating SA.

2003-12-12 Thread era
ords TODO: canonicalize to lower case? my %from_words = map { $_ => 1 } @from_words; my @best_of_both = grep { defined $from_words{$_} } @subj_words; print "Found in both From and Subject: \"", join ('", "', @best_of_both), "\"\n";
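A rough Python equivalent of the Perl fragment above. The \w+ word-splitting rule is an assumption, since the original split is cut off, and the lower-casing implements the TODO. The function names are hypothetical.

```python
import re


def words(text: str) -> list:
    """Lower-cased word tokens (the lower-casing follows the TODO above)."""
    return re.findall(r"\w+", text.lower())


def words_in_both(from_header: str, subject: str) -> list:
    """Subject words that also occur in the From header, in Subject order."""
    from_words = set(words(from_header))
    return [w for w in words(subject) if w in from_words]


if __name__ == "__main__":
    best_of_both = words_in_both('"Homestore" <offers@example.com>',
                                 "Homestore | Everything Home")
    print('Found in both From and Subject: "' + '", "'.join(best_of_both) + '"')
```

As with the Perl version, a set (or hash) lookup keeps this linear in the number of words, which matters little for headers but is the natural structure anyway.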

[SAtalk] Re: Will Recipe work to skip certian messages?

2003-12-12 Thread era
he second is a tab. /* era */

[SAtalk] Re: SA Long Process Times / Memory Utilization (Possible Bug?)

2003-12-12 Thread era
nage to process large messages in SA without problems? I'd basically expect any 1Mb-message to effectively hang my machine. (Haven't tested with 2.61 though.) /* era */

[SAtalk] Re: RD: "justified" HTML

2003-12-15 Thread era
want to change it slightly to avoid the braces. Look in the archives for recent postings from Scott A Crosby. <http://search.gmane.org/search.php?group=gmane.mail.spam.spamassassin.general&query=crosby> /* era */

[SAtalk] Re: daily / weekly reports

2003-12-15 Thread era
you'd run this out of a cron job or whatever to periodically summarize recent activity. I'm not familiar with enough of these tools to give an informed recommendation but Analog seems to get a lot of press. See also <http://dmoz.org/Computers/Software/Internet/Site_Management/Log_Anal

[SAtalk] Re: DB_File problem prevents Bayes working

2003-12-15 Thread era
ally means you need to install the -dev version of whatever Berkeley libxxx.deb you already have installed. Probably something vaguely similar can be said of Red Hat/Mandrake/SuSE/what have you. /* era */ (If you're not on Debian, you should be ;^)

[SAtalk] Re: SpamAssassin 2.61 released!

2003-12-15 Thread era
OH I never install directly from CPAN (but use dh-make-perl to create my own .deb from a CPAN module). /* era */

[SAtalk] Re: Clever spam (first of many, I'm afraid...)

2003-12-15 Thread era
ple of how to reimplement the regex engine in Perl (in Perl, sic) which could surely be a good starting point ... here: <http://perl.plover.com/Regex/> /* era */

[SAtalk] Re: Bayes Corpus Project

2003-12-15 Thread era
Right. I'm sure those who could consider contributing wouldn't mind doing this sort of auditing of any sizable collection of tokens. /* era */