single hit in a message.)
On the other hand, n-gram search space with n=2 is nicely bounded,
whereas if you go to larger values of n, you get a large and sparse
search space. But there are methods for coping with that.
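To make the size argument concrete, here is a minimal Python sketch. It assumes a 27-symbol alphabet (a-z plus space) purely for illustration. With that alphabet there are 27^2 = 729 possible bigrams but 27^5 = 14,348,907 possible 5-grams, so any single message only populates a tiny, sparse corner of the larger space. The helper names are made up for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def possible_ngrams(alphabet_size: int, n: int) -> int:
    """Size of the n-gram search space over an alphabet of the given size."""
    return alphabet_size ** n


if __name__ == "__main__":
    sample = "free money free money"
    for n in (2, 5):
        seen = len(char_ngrams(sample, n))
        # Assumed 27-symbol alphabet: a-z plus space.
        print(f"n={n}: {seen} distinct seen, {possible_ngrams(27, n)} possible")
```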
/* era */
--
formail -s procmail <http://www.iki.fi/era/spam/ >http://ww
that other instances give up on
waiting for their turn.
If you read what it says in procmailrc.example, this is (apparently)
done on purpose this way so as to "keep load down", but it's by no
means the only way to skin a cat.
Probably the most straightforward change would be to install
Other than that, if the To: header does not contain exactly one space
and the string "[EMAIL PROTECTED]" after the colon, the regex will
not match, but you probably knew that already (^; but in any event you
might actually want to use [EMAIL PROTECTED] instead).
Hope this helps,
/* era */
.ln ones. Haven't tried
that so can't offer advice.)
5. (Optionally, remove some of lm/*.l[mn] -- I think a number of them
are probably superfluous in practice.)
6. Dump your new models into lm/ and run lm/build.pl
7. (Maybe hack on SA itself to interpret the results from
the spam.
See also separate reply in private mail.
/* era */
--
The email address era at iki dot fi is heavily spam filtered. If you
want to reach me, see instead the contact information link on my home
page at <http://www.iki.fi/era/>. Just for kicks, imagine what it's
like to get 500 pieces of spam for ...
ctive/haw.lm
sa/lm/ja.iso-2022-jp.ln
sa/lm/tr.iso-8859-9.ln
The mappings are attached.
/* era */
LM/afrikaans.lm lm/af.lm
LM/albanian.lm lm/sq.lm
LM/amharic-utf.lm lm/am.utf-8.lm
LM/arabic-iso8859_6.lm lm/ar.iso-8859-6.lm
LM/arabic-windows1256.lm lm/ar.windows-1256.lm
LM/armenia
E \xC8 \xC9 \xCA \xCB (E grave, acute, circumflex, dieresis)
i 1 l \xEC \xED \xEE \xEF I \xCC \xCD \xCE \xCF (same for i and I)
! \xA1 | \xA6 ... and maybe even L
This is for ISO-8859-1; I imagine other character sets are less likely
to be targeted because they are less likely to be s
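A minimal Python sketch of how such a table could be turned into regular expressions: each letter becomes a character class of its lookalikes, and a word becomes the concatenation of those classes. The table only holds the rows quoted above (as ISO-8859-1 code points), with the plain letter added to each row; the helper names are hypothetical.

```python
import re

# Lookalike rows quoted above, as ISO-8859-1 code points in \x notation.
# The plain letter itself is included so unobfuscated text still matches.
LOOKALIKES = {
    "e": "eE\xc8\xc9\xca\xcb",                    # E grave, acute, circumflex, dieresis
    "i": "i1l\xec\xed\xee\xefI\xcc\xcd\xce\xcf",  # same for i and I
    "!": "!\xa1|\xa6",                           # inverted !, pipe, broken bar
}


def lookalike_class(ch: str) -> str:
    """Return a regex character class covering ch and its known lookalikes."""
    variants = LOOKALIKES.get(ch.lower(), ch)
    return "[" + "".join(re.escape(c) for c in variants) + "]"


def obfuscation_pattern(word: str) -> str:
    """Build a pattern that matches word with any lookalike substitutions."""
    return "".join(lookalike_class(c) for c in word)


if __name__ == "__main__":
    pattern = obfuscation_pattern("free!")
    for candidate in ("free!", "fr\xc8\xcb\xa1", "frEE|", "free?"):
        print(repr(candidate), bool(re.fullmatch(pattern, candidate)))
```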
add
X-Spam-Status to any messages which don't already have it.
:0fhw
* ! ^X-Spam-Status:
* ^Subject: \/(fr33|h0t|w0m3n)
| formail -I "X-Spam-Status: Yes (Subject: contains $MATCH)"
Then proceed with delivery just as before. Procmail doesn't care if
the destination is an IMA
> The problem is that none of the test options (rawbody, body, full)
> provide the ability to test the entire message like this.
Try adding an /s:
full T_FOO //is
Normally the . in a Perl regular expression cannot match a newline,
but adding the /s changes that.
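The same behavior can be seen in Python, whose re.DOTALL flag corresponds to Perl's /s (and re.IGNORECASE to /i). This is only an illustration of the flag with a made-up message; SpamAssassin itself evaluates rules with Perl.

```python
import re

message = "Subject: hello\n\nclick here to unsubscribe"

# Without DOTALL (Perl's /s), "." stops at the newline, so there is no match.
without_s = re.search(r"Subject:.*click", message, re.IGNORECASE)

# With DOTALL, "." also matches newlines, like /is in Perl.
with_s = re.search(r"Subject:.*click", message, re.IGNORECASE | re.DOTALL)

print(without_s is None, with_s is not None)
```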
Hope this helps,
me up with a test case which triggers the bug, I imagine
it won't be hard to fix.
/* era */
Try turning them off and see if that
helps. If that helps, you can turn them off one by one to figure out
which one exactly is causing excessive delays.
This is under one interpretation of what punctuation you left out.
/* era */
rds you tried
obviously do not contain \b:s on both sides of the sequence s-e-x
which is what your regex requires.
Try reducing the punctuation by some other means, like for example by
allowing for only a maximum of three non-word characters between the
letters, like so: /\b[Ss]\W{0,3}[Ee]\W{0,3}[X
e text was a dead giveaway?
/* era */
ex
> sussex
> asterix
> disannex
> vasoreflex
> HERE
sex
s::e::x
I.e. only the two "sex" and "s::e::x" matched and were printed.
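The same test can be reproduced in Python. This sketch assumes the truncated pattern above ends in [Xx]\b (word boundaries on both sides, up to three non-word characters between the letters); the sample words are the ones from the test run.

```python
import re

# Assumed completion of the truncated pattern: \b on both ends,
# at most three non-word characters between the letters.
PATTERN = re.compile(r"\b[Ss]\W{0,3}[Ee]\W{0,3}[Xx]\b")

samples = ["ex", "sussex", "asterix", "disannex", "vasoreflex", "HERE", "sex", "s::e::x"]
matches = [s for s in samples if PATTERN.search(s)]
print(matches)
```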
/* era */
he generic technique for using your own libraries with Perl.
Look in the Perl documentation for more instructions if you need them.
(For completeness, you could also change SpamAssassin to either change
@INC at BEGIN{} time or, preferably, use lib "path/to/your/lib";.)
Hope this hel
ediately add the new IP address to your
local blocklist, and submit it to your DNSBL of preference.
/* era */
on't like the
way it is, create your own Debian package, or file a bug and hope that
the maintainer will change it to your liking in a future version.
/* era */
be "gpl" (think GNU General Public Licence :-) and so
this sequence alone would trip the score down below zero.
(Real scoring systems tend to add scores for "good" and deduct for
"bad" and have a gray area where you don't change the score because
you don't kno
maybe whitespace too) with a single
punctuation character ... or even strip out all punctuation and
whitespace entirely and then look at the resulting n-grams.
More generally, I believe it would make sense to define a handful of
different "normal forms" for different classes of rules.
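A rough Python sketch of two such normal forms: collapsing each run of punctuation (optionally whitespace too) into a single marker character, or stripping punctuation and whitespace entirely before taking n-grams. The character classes, the "." marker and the helper names are assumptions for illustration.

```python
import re

# Assumed definitions: "punctuation" is anything that is neither a word
# character nor whitespace; the second class also folds in whitespace.
PUNCT_RUN = re.compile(r"[^\w\s]+")
PUNCT_WS_RUN = re.compile(r"[\W_]+")


def collapse(text: str, include_whitespace: bool = False, marker: str = ".") -> str:
    """Replace each run of punctuation (optionally whitespace too) with one marker."""
    pattern = PUNCT_WS_RUN if include_whitespace else PUNCT_RUN
    return pattern.sub(marker, text)


def strip_all(text: str) -> str:
    """Remove all punctuation and whitespace."""
    return PUNCT_WS_RUN.sub("", text)


def bigrams(text: str) -> set:
    """Distinct overlapping character bigrams."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


if __name__ == "__main__":
    raw = "v!!i@@a g-r-a"
    print(collapse(raw))
    print(collapse(raw, include_whitespace=True))
    print(sorted(bigrams(strip_all(raw))))
```

Different rule classes could then be run against whichever normal form suits them.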
der
Hope this helps,
/* era */
mings to back it up, but probably it will
be slightly faster as well as more human-readable if you normalize the
expressions to use classes wherever you can.
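One mechanical piece of that normalization is rewriting single-character alternations like (i|1|l) as classes like [i1l]. The Python sketch below does only that simple case; the helper is hypothetical, ignores escapes and nesting, and any speed difference depends on the regex engine, so measure before relying on it.

```python
import re

# Hypothetical helper: finds groups whose alternatives are all single
# ASCII letters or digits, e.g. "(i|1|l)" or "(?:a|4)". It does not
# handle escapes, nested groups or multi-character alternatives.
SINGLE_CHAR_ALT = re.compile(r"\((\?:)?((?:[A-Za-z0-9]\|)+[A-Za-z0-9])\)")


def alternations_to_classes(pattern: str) -> str:
    """Rewrite single-character alternations as character classes.

    A capturing "(a|b)" becomes "[ab]", which drops the capture and shifts
    later group numbers, so only do this where nothing uses the group.
    """
    return SINGLE_CHAR_ALT.sub(lambda m: "[" + m.group(2).replace("|", "") + "]", pattern)


if __name__ == "__main__":
    before = "v(i|1|l)agr(a|4)"
    after = alternations_to_classes(before)
    print(after)
    # Both forms should accept and reject the same strings.
    for s in ("viagra", "v1agr4", "vlagra", "vxagra"):
        assert bool(re.fullmatch(before, s)) == bool(re.fullmatch(after, s))
```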
Thanks for a useful tool, BTW! I wish I had thought of setting that up.
/* era */
iding factor :-)
/* era */
ation for details.
/* era */
can change the content of that message ?
Yes. Conveniently, it's described in the documentation.
<http://spamassassin.org/doc/Mail_SpamAssassin_Conf.html>, look for
"clear_report_template" and "report", or look in 10_misc.cf for an
example.
/* era */
d
probably be tagged as duplicates).
/* era */
m
>> works for me.
> I'm not normally much good at procmail scripts, but wouldn't:
> * ^X-Spam-Level: \*{9,}
> Be a bit shorter?
Yes. But it would also not work. Procmail doesn't grok x{numbers}
(interval repetition).
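One way around that is to spell the repetition out by hand. This small Python sketch generates the spelled-out condition and checks, with Python's re (which does understand intervals), that it behaves like \*{9,} on a few sample headers. The helper name spell_out is made up for illustration.

```python
import re


def spell_out(atom: str, count: int) -> str:
    """Repeat a regex atom literally, for dialects without {n,m} intervals."""
    return atom * count


# "\*{9,}" spelled out as nine escaped stars. Because the condition is not
# anchored at the end, it also matches lines with more than nine stars.
NINE_STARS = "^X-Spam-Level: " + spell_out(r"\*", 9)
print("* " + NINE_STARS)  # the line as it would appear in a procmail recipe

# Compare against the interval form on a few headers.
for stars in (8, 9, 12):
    header = "X-Spam-Level: " + "*" * stars
    assert bool(re.search(NINE_STARS, header)) == bool(re.search(r"^X-Spam-Level: \*{9,}", header))
```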
/* era */
ain 7bit ASCII (although that would be kind of pointless). If
Mailman dumps the base64 straight into a digest with a different
Content-Transfer-Encoding then that's definitely a bug in Mailman.
> I've just changed my digest option for the list from plain-text to
> MIME - I'll se
host's canonical name would allow
you to make a clean distinction between these.
It would seem kind of pointless to call your DNS server "mail" anyhow.
I'm not enough of a DNS guru to tell you how exactly this should be
set up properly. Anybody?
/* era */
less you actually
+want+ your ad to be ugly [1])?
The example Martin posted is in Arial, though, and looks quite
different from the old example on the TSC site.
/* era */
[1] Some of the ad posters you see around town these days would seem to
confirm that this is a prevalent advertising trend ..
examples was that the same address would be repeated twice.
Also the examples are in the .com domain so the restriction to .org/.net
is wrong.
I'd go with simply:
/^Reply-to:\s+(\S+)\s+\1/i
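A quick way to sanity-check the backreference is Python's re, whose syntax for this pattern is the same. The addresses below are made-up stand-ins for the (redacted) originals. Note that \s+ also matches a newline plus continuation whitespace, so a header folded over two lines still matches.

```python
import re

# The suggested pattern: the first address repeated right after itself.
DUP_REPLY_TO = re.compile(r"^Reply-to:\s+(\S+)\s+\1", re.IGNORECASE | re.MULTILINE)

headers = [
    "Reply-To: deals@example.com deals@example.com",      # repeated address
    "reply-to: deals@example.com other@example.com",      # two different addresses
    "Reply-To: deals@example.com\n\tdeals@example.com",   # repeated, folded header
]
results = [bool(DUP_REPLY_TO.search(h)) for h in headers]
print(results)
```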
I'm guessing the multi-line appearance was simply due to word wraps in
Robert's mai
ndmail, by all means listen to her.
It's not like Postfix and Sendmail are the only two alternatives. In
particular, Qmail and Exim are quite popular, and Courier tends to get
mentioned as a commercial alternative.
/* era */
On Tue, 9 Dec 2003 09:50:53 -0600, Terry Shows <[EMAIL PROTECTED]> posted to
spamassassin-talk:
> Has anyone considered changing the spelling? Just changing the last
> i to an e (spamassassen) just might be enough.
How professionol.
/* era */
AND operations, you
could tell it to not look for lines 2 and 3 if line 1 isn't there, but
that's apparently not doable at the moment.)
/* era */
're running Postfix, you'd like to run your own
custom SA rather than rely on some upstream entity to run it for you.
If you have the luxury of an upstream mail relay whom you trust and
who runs SA on all your mail before relaying it to you, then ...
what's the point of running Postfix?
ons are legit (and for all I know, they
probably are -- confirmed opt-in and all that, yes? Yes?), any
genuinely useful response could just as easily be read by a spammer.
We don't want them to get any ideas, do we?
/* era */
sis:
header T_SBJT_ENC Subject:raw =~ /=\?(us\-ascii|iso\-8859\-[1-9][0-9]?|windows\-1251)\?b\?/i
describe T_SBJT_ENC Subject uses RFC2047 base64 encoding
score T_SBJT_ENC .01
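To see which encoded Subject: lines the rule's regular expression would hit, here is a Python sketch using the same pattern (with /i as re.IGNORECASE). The sample subjects are invented; SpamAssassin itself runs the rule in Perl. The standard library's email.header.decode_header can decode such an encoded word for inspection.

```python
import re
from email.header import decode_header

# The rule's regular expression; Perl's /i becomes re.IGNORECASE.
T_SBJT_ENC = re.compile(
    r"=\?(us\-ascii|iso\-8859\-[1-9][0-9]?|windows\-1251)\?b\?", re.IGNORECASE
)

subjects = {
    "=?ISO-8859-1?B?SGVsbG8=?=": True,   # Latin-1, base64: hit
    "=?iso-8859-15?b?SGVsbG8=?=": True,  # two-digit ISO-8859 part: hit
    "=?iso-8859-1?Q?Hello?=": False,     # quoted-printable: no hit
    "=?utf-8?B?SGVsbG8=?=": False,       # charset not in the list: no hit
}
for subject, expected in subjects.items():
    assert bool(T_SBJT_ENC.search(subject)) == expected, subject

# Decode one encoded word to see what is behind it.
print(decode_header("=?ISO-8859-1?B?SGVsbG8=?="))
```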
I guess some variants of ISO-8859 would legitimately use base64 most
of the time, but unless you're using
based DNS blocklists).
Look at how e.g. dsn.rfc-ignorant.org is being invoked in 20_dnsbl_tests.cf
/* era */
Ma Belle <[EMAIL PROTECTED]>
Your mileage not included when stirred, etc.
>> I'm guessing the multi-line appearance was simply due to word wraps in
>> Robert's mail program, and not actually there in the original headers.
Oh, and even if the header was spread over
Homestore | Everything Home
> <[EMAIL PROTECTED]>
For many of these, one can observe that the "user name" in the From:
header often also occurs in the Subject line. This could be a useful
rule pattern, although there are bound to be false positives, so the
scor
t /etc/default/spamd.conf to set it
to use a different syslog facility (or even not log through syslog but
use something else instead).
I can't think of a scenario where it would make sense to run 2.20, so
you should try to find a good backport or build from 2.60 yourself.
/* era */
S can handle).
16,000 SpamAssassin rules doesn't sound very manageable in any event
so perhaps you should at least think about other ways to handle this.
/* era */
sensible to me. Have you somehow managed to redefine
SENDMAIL to /var/spool/mail/il? Don't touch Procmail variables you
don't understand (other than SHELL=/bin/sh if you're on a csh-infested
system and are experiencing problems).
/* era */
be really neat would
be to have an automaton which recognizes all possible variants at the
same time. The obfu script (look back in the archives for a few days)
is a nice start, but it could obviously be improved. In the grand
scheme of things, I imagine you would have to use another formalism
ins
ies.
Here's an example:
$ cat >/tmp/some_email_file.txt
From: [EMAIL PROTECTED]
Subject: furr33 w4r3z 1 dfgjopwetq
Date: Thu, 31 Feb 2038 24:00:00 -0700
http://www.hotmail.com:[EMAIL PROTECTED]/%65%72%61/
^D
$ spamassass
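The URL in that test message hides its real host: everything before the @ is userinfo, not the host, and the path is merely percent-encoded. A short Python sketch with urllib.parse shows how a filter might see through it. Because the real host in the archived message is redacted, example.invalid and the password part "bogus" are stand-ins.

```python
from urllib.parse import urlsplit, unquote

# Stand-in URL in the style of the test message (real host redacted).
url = "http://www.hotmail.com:bogus@example.invalid/%65%72%61/"
parts = urlsplit(url)

print(parts.username)       # the "www.hotmail.com" decoy is only userinfo
print(parts.password)
print(parts.hostname)       # the host a browser would actually contact
print(unquote(parts.path))  # %65%72%61 decodes to plain ASCII letters
```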
contribution -- I don't know of
anybody who actually uses NANAS for anything real.
Paul Judge started the spam archive project <http://spamarchive.org>
roughly a year ago but results so far are less than startling, and
they don't seem to be responding to email. (Ah, they h
Ignoring point #2, try this:
:0fw
* ! ^From.*@([^ <>.]+\.)*rose-hulman\.edu\>
| spamc -options ...
If you want to add more conditions, you can do that:
:0fw
* ! ^From.*@([^ <>.]+\.)*rose-hulman\.edu\>
* ! ^From: <[EMAIL PROTECTED]>
| spamc -options ...
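To check which From: lines the condition matches (and which mail would therefore skip spamc because of the !), the regex can be tried in Python. Two translation assumptions: procmail's \> is approximated by \b, and procmail conditions are case-insensitive by default, hence re.IGNORECASE. The sample addresses are made up.

```python
import re

# Procmail's \> approximated by \b; procmail matches case-insensitively by default.
ROSE_HULMAN = re.compile(
    r"^From.*@([^ <>.]+\.)*rose-hulman\.edu\b", re.IGNORECASE | re.MULTILINE
)

tests = {
    "From: Joe <joe@cs.rose-hulman.edu>": True,   # subdomain: matched, skips spamc
    "From: joe@ROSE-HULMAN.EDU": True,            # case does not matter
    "From: spammer@notrose-hulman.edu": False,    # not a label boundary: filtered
    "From: spammer@example.com": False,           # unrelated sender: filtered
}
for line, expected in tests.items():
    assert bool(ROSE_HULMAN.search(line)) == expected, line
```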
Oh, and finally
ords
# TODO: canonicalize to lower case?
my %from_words = map { $_ => 1 } @from_words;
my @best_of_both = grep { defined $from_words{$_} } @subj_words;
print "Found in both From and Subject: \"",
join ('", "', @best_of_both), "\"\n";
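For comparison, a Python sketch of the same From/Subject overlap check. It answers the TODO by lower-casing both sides, and it assumes a "word" is a run of ASCII letters or digits; the tokenization in the original script is not shown, so that part is a guess. The header values are invented examples.

```python
import re

# Assumption: words are runs of ASCII letters/digits, compared in lower case.
WORD = re.compile(r"[A-Za-z0-9]+")


def words(text: str) -> list:
    return [w.lower() for w in WORD.findall(text)]


def best_of_both(from_header: str, subject: str) -> list:
    """Subject words that also occur in the From: header, in Subject order."""
    from_words = set(words(from_header))
    return [w for w in words(subject) if w in from_words]


if __name__ == "__main__":
    found = best_of_both('"Ma Belle" <mabelle@example.com>', "Ma Belle: your order is ready")
    print('Found in both From and Subject: "' + '", "'.join(found) + '"')
```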
he second
is a tab.
/* era */
nage to process large messages in SA without
problems? I'd basically expect any 1Mb-message to effectively hang my
machine. (Haven't tested with 2.61 though.)
/* era */
want to change it slightly to avoid the braces. Look
in the archives for recent postings from Scott A Crosby.
<http://search.gmane.org/search.php?group=gmane.mail.spam.spamassassin.general&query=crosby>
/* era */
you'd run this out of a cron job
or whatever to periodically summarize recent activity.
I'm not familiar with enough of these tools to give an informed
recommendation but Analog seems to get a lot of press. See also
<http://dmoz.org/Computers/Software/Internet/Site_Management/Log_Anal
ally means you need to install the -dev version of whatever
Berkeley libxxx.deb you already have installed. Probably something
vaguely similar can be said of Red Hat/Mandrake/SuSE/what have you.
/* era */
(If you're not on Debian, you should be ;^)
OH I never install directly from CPAN (but
use dh-make-perl to create my own .deb from a CPAN module).
/* era */
ple of how to
reimplement the regex engine in Perl (in Perl, sic) which could surely
be a good starting point ... here:
<http://perl.plover.com/Regex/>
/* era */
Right. I'm sure those who could consider contributing wouldn't mind
doing this sort of auditing of any sizable collection of tokens.
/* era */