Re: SA 3.3.1 bug or mistake in my custom rules?

2011-10-13 Thread Lawrence @ Rogers

On 13/10/2011 1:45 AM, Karsten Bräckelmann wrote:

On Wed, 2011-10-12 at 23:32 -0230, Lawrence @ Rogers wrote:

Starting today, I've noticed that 3 of my rules fire in situations where
they should not. They are simple meta rules that count how many of certain
URIBL rules fire, and then raise the spam score accordingly.
meta LW_URIBL_LO ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL
+ URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL +
URIBL_WS_SURBL) == 1)

URIBL_RHS_DOB is missing here.


meta LW_URIBL_MD ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL
+ URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL +
URIBL_WS_SURBL + URIBL_RHS_DOB) == 2)

meta LW_URIBL_HI [...]
I'm receiving e-mails where both LW_URIBL_LO and LW_URIBL_MD are fired.

That would happen, if URIBL_RHS_DOB and another rule of the LO meta
variant are hit.


The only rule in the message that could trigger them are URIBL_DBL_SPAM
and URIBL_RHS_DOB

DBL is not part of the meta, so I don't get this. Or did you actually
mean to communicate, these are the only URI DNSBL rules triggered? That
would be even more confusing -- a real Status header copied would have
helped...

The above rules are *verbatim*, copy and paste from your rc files, with
no human messing around, right?


In a related note, as per the M::SA::Conf docs for meta rules -- The
value of a hit meta test is that of its arithmetic expression. The value
of a hit eval test is that returned by its method.

The latter means, this style of adding rules is not necessarily safe,
since these are eval tests. However, in this case, I believe they all
should be set to 1 in case of a match.

The former means, you could eliminate such issues due to inconsistencies
and code duplication, by using an additional meta level:

   meta __VALUE  FOO + BAR

   meta ONE  __VALUE == 1
   meta TWO  __VALUE == 2



Hi Karsten,

I don't know how I overlooked that omission in the first rule :)

Thanks, it's working as expected now.

I designed the rules using the information available on 
http://wiki.apache.org/spamassassin/WritingRules


Under Meta rules

It has this rule

meta LOCAL_MULTIPLE_TESTS (( __LOCAL_TEST1 + __LOCAL_TEST2 + 
__LOCAL_TEST3) > 1)


The value of the sub rule in an arithmetic meta rule is the true/false 
(1/0) value for whether or not the rule hit. 


If this is incorrect, perhaps this documentation should be updated.

Regards,
Lawrence


Re: Results of eval and meta rules

2011-10-13 Thread Lawrence @ Rogers

On 13/10/2011 9:01 PM, Karsten Bräckelmann wrote:

On Thu, 2011-10-13 at 03:57 -0230, Lawrence @ Rogers wrote:

On 13/10/2011 1:45 AM, Karsten Bräckelmann wrote:

In a related note, as per the M::SA::Conf docs for meta rules -- The
value of a hit meta test is that of its arithmetic expression. The value
of a hit eval test is that returned by its method.

The latter means, this style of adding rules is not necessarily safe,
since these are eval tests. However, in this case, I believe they all
should be set to 1 in case of a match.

The former means, you could eliminate such issues due to inconsistencies
and code duplication, by using an additional meta level:

meta __VALUE  FOO + BAR

meta ONE  __VALUE == 1
meta TWO  __VALUE == 2

I don't know how I overlooked that omission in the first rule :)

In particular, since these rules are not exactly complex, and seeing
them side by side... ;)

Anyway, that's why I also included a way, to prevent this from ever
happening. Define once, don't duplicate code, simply by adding another
meta rule level.



Thanks, it's working as expected now.

I designed the rules using the information available on
http://wiki.apache.org/spamassassin/WritingRules

Under Meta rules

It has this rule

meta LOCAL_MULTIPLE_TESTS (( __LOCAL_TEST1 + __LOCAL_TEST2 +
__LOCAL_TEST3) > 1)

The value of the sub rule in an arithmetic meta rule is the true/false
(1/0) value for whether or not the rule hit. 

If this is incorrect, perhaps this documentation should be updated.

Well, incorrect... Put into easy terms, I'd say. It's intended as a
quick-start tutorial. After that, I seriously recommend having a look
into the full documentation.

There are two points here:

   The value of a hit eval test is that returned by its method.

Which, I believe (without looking at the code) is generally the boolean
value as mentioned in the wiki. Including the URI DNSBL eval rules you
are using.

However, and that was mostly meant as a heads-up, it MAY NOT hold true
always, since eval rules MAY return something else.

   The value of a hit meta test is that of its arithmetic expression.

This also most likely is generally the boolean value. Definitely in the
example given, since the (non-boolean!) sub-result of the arithmetic
expression then is compared against a number -- either true, or false.

The trick is to keep the duplicated arithmetic sub-expression in a
single meta, and use that result for your comparison -- using a
supported, though generally not used, feature to your benefit.
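Applied to the rules from this thread, that would look roughly like the
sketch below -- an untested illustration, with __LW_URIBL_COUNT being a
made-up name; the score and tflags lines from the original post would
stay as they are:

# count the URIBL hits in one place (a single line in the actual .cf)
meta __LW_URIBL_COUNT (URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL + URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + URIBL_WS_SURBL + URIBL_RHS_DOB)

# the scored rules only compare the shared count
meta LW_URIBL_LO (__LW_URIBL_COUNT == 1)
meta LW_URIBL_MD (__LW_URIBL_COUNT == 2)
meta LW_URIBL_HI (__LW_URIBL_COUNT > 2)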



Thanks for the info :)

Regards,
Lawrence


SA 3.3.1 bug or mistake in my custom rules?

2011-10-12 Thread Lawrence @ Rogers

Hi,

I am using SpamAssassin 3.3.1 (cPanel) with latest rule updates. 
Starting today, I've noticed that 3 of my rules fire in situations where 
they should not. They are simple meta rules that count how many of certain 
URIBL rules fire, and then raise the spam score accordingly.


They are as follows

-

meta LW_URIBL_LO ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL 
+ URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + 
URIBL_WS_SURBL) == 1)


meta LW_URIBL_MD ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL 
+ URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + 
URIBL_WS_SURBL + URIBL_RHS_DOB) == 2)


meta LW_URIBL_HI ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL 
+ URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + 
URIBL_WS_SURBL + URIBL_RHS_DOB) > 2)


score LW_URIBL_LO 1.5
tflags LW_URIBL_LO net

score LW_URIBL_MD 3.0
tflags LW_URIBL_MD net

score LW_URIBL_HI 4.5
tflags LW_URIBL_HI net

-

I'm receiving e-mails where both LW_URIBL_LO and LW_URIBL_MD are fired. 
The only rule in the message that could trigger them are URIBL_DBL_SPAM 
and URIBL_RHS_DOB


Any thoughts?

Regards,
Lawrence


Re: sa users list down due to irene?

2011-08-29 Thread Lawrence @ Rogers

On 29/08/2011 4:03 PM, Michael Scheidell wrote:

On 8/29/11 2:13 PM, David F. Skoll wrote:
Is anyone even maintaining qmail any more?  I thought the project was 
dead.

  I wish it would just go
away.)


I wish ASF  would stop using it for its mailing lists, or just apply 
all the patches that seem to be needed to make it 'play nice' with the 
rest of the world.
(ok, I don't care if it plays nice with aol/hotmail/etc, you get free 
email? you get what you pay for).


What about Yahoo, which is not only freemail, but also used by the 
biggest ISP here in Canada (Rogers)?


Unfortunately, talking about RFC compliance is all well and good, but 
not everybody will be compliant. It's like HTML and CSS support in browsers. 
Everyone has a different level of compliance. Some are average, and some 
are pretty spot on (Firefox and KHTML-based tech such as WebKit).


- Lawrence


Re: Theories on blocking OUTGOING spam

2011-08-19 Thread Lawrence @ Rogers

On 16/08/2011 7:32 PM, Marc Perkel wrote:


When email is coming fast from an account I start tracking the number 
of bad recipients and if the number of bad recipients is high it's 
probably spam.


I also have restrictions on valid domains the from has to match, I 
look for URIBLs, high SA scores, etc.


Just curious what others do to detect outgoing spam.

I use Exim for the MTA because it has the power to do the tricks I 
need done. 

Exim + MailScanner does the job fine here. Just configure MailScanner to 
scan outgoing e-mail as well using SpamAssassin and discard anything 
over a certain point (I usually say 7.0 as that seems to be the point 
where any FPs end).


- Lawrence


Re: exclude from freemail_domains

2011-06-29 Thread Lawrence @ Rogers

On 29/06/2011 8:37 AM, Tom Kinghorn wrote:

Good afternoon list.

is there a way to exclude a domain from the freemail_domains checks in 
the local.cf?


I do not want to have to manually remove our domain (which does not 
offer freemail) from the 20_freemail_domains.cf file every time we 
update.


refer to bug 6542

http://old.nabble.com/-Bug-6542--New%3A-Freemail_domains.cf-FP-td30899122.html 



our own mail is matching:

FREEMAIL_FROM
FROM_MISSP_FREEMAIL

thanks in advance.

Tom



Hi Tom,

FREEMAIL_FROM by itself is harmless. However, if your e-mail is also 
hitting FROM_MISSP_FREEMAIL, it means it has a malformed From: header.


Something like:

From: "Lawrence Williams"<lawrencewilli...@nl.rogers.com>

Notice the missing space between the quotes ending my name, and the 
beginning of the e-mail address.


A proper From header would be:

From: <lawrencewilli...@nl.rogers.com>
or
From: "Lawrence Williams" <lawrencewilli...@nl.rogers.com>

This is most likely a bug in the e-mail system you are using.

Regards,
Lawrence


Re: [Q] Writing rule for career opportunity type messages

2011-06-29 Thread Lawrence @ Rogers

On 29/06/2011 3:59 PM, JKL wrote:

On 06/29/2011 04:59 PM, John Hardin wrote:

On Wed, 29 Jun 2011, J4K wrote:


Over the past few months I noticed an increase in 'Start New Employment
Today | Career Opportunity' style email. The rules I use, that are
pretty much stock rules, correctly tag the email as spam. Usually the
Spam score hovers between 5.5 and 6.9.

Is there some reason you're unwilling or unable to use Bayes? If you
are getting these regularly, then training a few as spam would likely
catch most of the rest.


Hi,

 I thought that Bayes was enabled.  I have fed spam and ham into
sa-learn daily since February 2011.  Of course, I might well have been
feeding data into a black hole if it is not working.

I enabled (I thought) Bayes as per the local.cf below:-
use_bayes 1
bayes_auto_learn 1
bayes_expiry_max_db_size  30
bayes_auto_expire   1


I read somewhere that this might explain what is in the DB.  Not a
lot, really.
# sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0 2147483647  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire
atime delta
0.000  0  0  0  non-token data: last expire
reduction count

nham and nspam = 0. Says it all :(


spamassassin -D --lint confirms:
Jun 29 20:25:17.682 [26298] dbg: plugin: loading
Mail::SpamAssassin::Plugin::Bayes from @INC
Jun 29 20:25:17.847 [26298] dbg: config: fixed relative path:
/var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf
Jun 29 20:25:17.847 [26298] dbg: config: using
/var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf
for included file
Jun 29 20:25:17.848 [26298] dbg: config: read file
/var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf
Jun 29 20:25:19.998 [26298] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670) implements
'learner_new', priority 0
Jun 29 20:25:19.998 [26298] dbg: bayes: learner_new
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670),
bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
Jun 29 20:25:20.010 [26298] dbg: bayes: using username: 
Jun 29 20:25:20.010 [26298] dbg: bayes: learner_new: got
store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x40bfe48)
Jun 29 20:25:20.010 [26298] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670) implements
'learner_is_scan_available', priority 0
Jun 29 20:25:20.012 [26298] dbg: bayes: database connection established
Jun 29 20:25:20.013 [26298] dbg: bayes: found bayes db version 3
Jun 29 20:25:20.013 [26298] dbg: bayes: Using userid: 77
Jun 29 20:25:20.013 [26298] dbg: bayes: not available for scanning, only
0 spam(s) in bayes DB < 200
Jun 29 20:25:20.027 [26298] dbg: bayes: database connection established
Jun 29 20:25:20.027 [26298] dbg: bayes: found bayes db version 3
Jun 29 20:25:20.028 [26298] dbg: bayes: Using userid: 77
Jun 29 20:25:20.028 [26298] dbg: bayes: not available for scanning, only
0 spam(s) in bayes DB < 200


I read the entry on
http://wiki.apache.org/spamassassin/SiteWideBayesSetup, and it looks
like these are missing in my local.cf:

  bayes_path /var/spamassassin/bayes/bayes
  bayes_file_mode 0777

* QUESTION
 Other than defining these entries (bayes_path, bayes_file_mode) into the 
local.cf, and rerunning sa-learn, is there anything else I should do to get 
this to work?






You don't need those entries at all. Most likely, your MTA (probably 
Exim) is running as a user other than root.


Set bayes_sql_override_username to the user name that your MTA is 
running under


Example:
bayes_sql_override_username mailnull

Then access your Bayes MySQL database and open the bayes_vars table. It 
should only contain one record if it's set up properly. Change the user 
name to the same one you used above as well.


If you are using spamd, restart it and restart your MTA.

Regards,
Lawrence


Re: [Q] Writing rule for career opportunity type messages

2011-06-29 Thread Lawrence @ Rogers

On 29/06/2011 4:58 PM, JKL wrote:

select count(spam_count) from bayes_vars

Run this query

SELECT username,spam_count,ham_count FROM bayes_vars

This will give a list of usernames that have been used to learn ham and 
spam into SpamAssassin's Bayes MySQL DB. For a site-wide installation, 
this should only return one result.


To answer your previous question, I meant to simply add the 
bayes_sql_override_username setting to your local.cf and restart 
spamassassin


If you are using Postfix with the postfix username, set it as

bayes_sql_override_username postfix

This ensures that all future e-mails are labeled as being learned from 
the postfix user, regardless of whether you did it manually using 
sa-learn via ssh or another interface, or auto-learning is used. For one 
site-wide Bayes installation, this is what you want.


Regards,
Lawrence



Re: Issuing rollback() due to DESTROY without explicit disconnect() of DBD::mysql::db handle

2011-06-28 Thread Lawrence @ Rogers

On 28/06/2011 11:20 PM, Marc Perkel wrote:

Hi everyone,

Now I'm seeing these error messages in the logs:

Issuing rollback() due to DESTROY without explicit disconnect() of 
DBD::mysql::db handle


I'm beginning to wonder if MySQL bayes actually works. I'm just getting 
too many strange errors.


Thanks in advance for any help.

Works here on a cPanel server with no issues. Are you running 3.3.2 or 
3.3.1?


Re: Migrating bayes to mysql fails with parsing errors

2011-06-21 Thread Lawrence @ Rogers

On 21/06/2011 7:01 PM, Dave Wreski wrote:

Hi,


It looks like that may be my problem too. This is the result with your
patch:

dbg: bayes: database connection established
dbg: bayes: found bayes db version 3
dbg: bayes: Using userid: 2
dbg: bayes: database connection established
dbg: bayes: found bayes db version 3
dbg: bayes: using userid: 3
dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3,
token: 7�OR�
dbg: bayes: error inserting token for line: t 0 1 1308332646 37fc4f52eb
dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3,
token: Y
dbg: bayes: error inserting token for line: t 0 2 1308070890 d2eec4f659

I'll try the suggested my.cnf changes and restart the process.


I thought it would take longer before it started to fail again, but 
trying to change the character set didn't make a difference for me.


Thanks,
Dave


It may be easier to just start from scratch with your Bayes database.

- Lawrence


Re: Migrating bayes to mysql fails with parsing errors

2011-06-21 Thread Lawrence @ Rogers

On 21/06/2011 7:01 PM, Dave Wreski wrote:

Hi,


It looks like that may be my problem too. This is the result with your
patch:

dbg: bayes: database connection established
dbg: bayes: found bayes db version 3
dbg: bayes: Using userid: 2
dbg: bayes: database connection established
dbg: bayes: found bayes db version 3
dbg: bayes: using userid: 3
dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3,
token: 7�OR�
dbg: bayes: error inserting token for line: t 0 1 1308332646 37fc4f52eb
dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3,
token: Y
dbg: bayes: error inserting token for line: t 0 2 1308070890 d2eec4f659

I'll try the suggested my.cnf changes and restart the process.


I thought it would take longer before it started to fail again, but 
trying to change the character set didn't make a difference for me.


Thanks,
Dave



Ignore my last suggestion of starting from scratch. Try commenting out 
these lines (or similar ones) if present in /etc/my.cnf and restarting 
MySQL before attempting again


default-character-set=utf8
character-set-server=utf8
collation-server=utf8_unicode_ci
init_connect='set collation_connection = utf8_unicode_ci;'

Regards,
Lawrence


Re: Migrating bayes to mysql fails with parsing errors

2011-06-21 Thread Lawrence @ Rogers

On 21/06/2011 8:47 PM, Benny Pedersen wrote:

On Tue, 21 Jun 2011 22:16:05 +0300, Panagiotis Christias wrote:


After commenting out the utf8 definitions and reverting back to latin1
sa-learn --restore worked fine.


thanks for this report, but imho this should NOT be fixed in my.cnf



What other option does he have? iconv??

- Lawrence


Re: Migrating bayes to mysql fails with parsing errors

2011-06-20 Thread Lawrence @ Rogers

This one is the current SQL schema and works

http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_current_release_3.3.x/sql/bayes_mysql.sql

- Lawrence

On 20/06/2011 7:34 PM, Dave Wreski wrote:

Hi,

I have an existing v3.3.2 on fedora14 (perl v5.12.3) that I'm trying 
to convert bayes to use mysql. The restore process fails after a few 
minutes due to too many errors:


dbg: bayes: error inserting token for line: t 1 0 1308114254 4fd2b3f2f0
dbg: bayes: _put_token: Updated an unexpected number of rows.
[repeats ...]
bayes: encountered too many errors (20) while parsing token line, 
reverting to empty database and exiting
dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x26b8af8) 
implements 'learner_close', priority 0
ERROR: Bayes restore returned an error, please re-run with -D for more 
information


This was already run with -D, so no further information is available.

I used the sql files from 
spamassassin.apache.org/full/3.0.x/dist/sql/bayes_mysql.sql to create 
the tables. Maybe the format has changed since then and there is a 
more updated file?


I'm using the sa from 
http://kojipkgs.fedoraproject.org/packages/spamassassin/3.3.2/1.fc14/x86_64/


Is there a way to skip these invalid records? Other ideas for 
resolving this?


I can successfully restore back to the normal dbm database.

Thanks,
Dave





Re: Migrating bayes to mysql fails with parsing errors

2011-06-20 Thread Lawrence @ Rogers

On 20/06/2011 10:09 PM, Dave Wreski wrote:


I have an existing v3.3.2 on fedora14 (perl v5.12.3) that I'm trying
to convert bayes to use mysql. The restore process fails after a few
minutes due to too many errors:

dbg: bayes: error inserting token for line: t 1 0 1308114254 4fd2b3f2f0
dbg: bayes: _put_token: Updated an unexpected number of rows.
[repeats ...] 

Did you make the backup using 3.3.2 as well?

Lawrence


Re: Migrating bayes to mysql fails with parsing errors

2011-06-20 Thread Lawrence @ Rogers

On 20/06/2011 11:55 PM, Dave Wreski wrote:

Hi,


I have an existing v3.3.2 on fedora14 (perl v5.12.3) that I'm trying
to convert bayes to use mysql. The restore process fails after a few
minutes due to too many errors:

dbg: bayes: error inserting token for line: t 1 0 1308114254 4fd2b3f2f0
dbg: bayes: _put_token: Updated an unexpected number of rows.
[repeats ...]

Did you make the backup using 3.3.2 as well?


Yes, and the bdb was originally created just recently using a v3.3.2 
pre-release as well. I also made sure the bdb was synced before trying 
to do the backup.


Thanks,
Dave


Was it made using the very same version though?

I don't know what to tell you. I've never seen this issue myself, it 
sounds like a corrupt backup or bug in the restore.


When I did it, I used the instructions found at the end of this file

http://svn.apache.org/repos/asf/spamassassin/branches/3.3/sql/README.bayes

On cPanel servers, exim and such generally run as the mailnull user, so 
I had to set


bayes_sql_override_username mailnull

Once that was done, I restored the backup and it has worked flawlessly 
since.


- Lawrence


Re: Spam not stopped???

2011-06-15 Thread Lawrence @ Rogers

On 15/06/2011 10:00 PM, User for SpamAssassin Mail List wrote:



Hello,

I have something I cannot explain. We blacklisted an email address for 
a client but Spam assassin still let it through. Here are the logs:



Jun 15 08:08:10 mail spamd[20901]: spamd: identified spam (104.0/6.0) 
for client:2130 in 0.2 seconds, 1729 bytes.


Jun 15 08:08:10 mail spamd[20901]: spamd: result: Y 103 - 
BAYES_50,HTML_MESSAGE,MISSING_SUBJECT,SPF_PASS,TVD_SPAC
E_RATIO,USER_IN_BLACKLIST 
scantime=0.2,size=1729,user=client,uid=2130,required_score=6.0,rhost=localhost,raddr=127.
0.0.1,rport=55987,mid=snt117-w309552c1e79d42eb67a294ad...@phx.gbl,bayes=0.479706,autolearn=no 



Jun 15 08:08:10 mail sm-mta[21077]: p5FF86ld021067: 
to=cli...@pcez.com, delay=00:00:03, xdelay=00:00:02, mailer=local, 
pri=31672, dsn=2.0.0, stat=Sent


As you can see the user is in the black list but yet the mail was 
delivered. I checked other email that was over a score of 9 and the 
mail was rejected, but for some reason or another this was not.


Anyone have an idea why this making it through?

Thanks,

Ken

SpamAssassin merely assigns scores and doesn't do any rejections on its 
own. That is handled by whatever is calling SpamAssassin and using the 
score that the e-mail is assigned. This could be something like 
MailScanner, Amavis, or some other third party software.


Also, it would be better to blacklist an e-mail address at the MTA level 
(ex: Exim, Postfix)
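For reference, the USER_IN_BLACKLIST hit in the log above is what a 
SpamAssassin-side blacklist_from entry produces; a minimal sketch with a 
made-up address (it adds roughly 100 points, but the accept/reject 
decision still belongs to whatever glue calls SpamAssassin):

# local.cf -- hypothetical address; blacklist_from is the stock setting
# behind USER_IN_BLACKLIST and contributes around 100 points
blacklist_from spammer@example.com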


Regards,
Lawrence


Re: Spam not stopped???

2011-06-15 Thread Lawrence @ Rogers

On 15/06/2011 11:13 PM, User for SpamAssassin Mail List wrote:


Lawrence,

Thanks for the response. I know SpamAssassin doesn't stop it; we use a 
spamassassin milter for sendmail to reject it. (We've been doing this for 
years.) Anyway here is a log on an email that was rejected:


Jun 15 06:27:33 mail spamd[981]: spamd: identified spam (22.2/6.0) for 
spamass-milter:111 in 2.1 seconds, 5378 bytes.


Jun 15 06:27:33 mail spamd[981]: spamd: result: Y 22 - 
AWL,BAYES_99,HTML_IMAGE_ONLY_12,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,SARE
_SPEC_ROLEX,SARE_SPOOF_COM2COM,SARE_SPOOF_COM2OTH,SPOOF_COM2COM,SPOOF_COM2OTH,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_ 

RHS_DOB,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL 
scantime=2.1,size=5378,user=spamass-milter,uid=111,required_score=6.0,rhost=
localhost,raddr=127.0.0.1,rport=42127,mid=20110615185711.2964.qmail@vsp-6214cbe9e6d,bayes=1.00,autolearn=spam 



Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: Milter: data, 
reject=550 5.7.1 Blocked by SpamAssassin


Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: to=u...@pcez.com, 
delay=00:00:02, pri=35237, stat=Blocked by SpamAssassin



The reason we did not block this at the MTA level is we do not know if 
OTHER users might want email from this email address.


Anyway I'm still looking for a clue why one is blocked and the other 
is not.


Thanks,

Ken


On Wed, 15 Jun 2011, Lawrence @ Rogers wrote:


On 15/06/2011 10:00 PM, User for SpamAssassin Mail List wrote:



Hello,

I have something I cannot explain. We blacklisted an email address 
for a client but Spam assassin still let it through. Here are the logs:



Jun 15 08:08:10 mail spamd[20901]: spamd: identified spam 
(104.0/6.0) for client:2130 in 0.2 seconds, 1729 bytes.


Jun 15 08:08:10 mail spamd[20901]: spamd: result: Y 103 - 
BAYES_50,HTML_MESSAGE,MISSING_SUBJECT,SPF_PASS,TVD_SPAC
E_RATIO,USER_IN_BLACKLIST 
scantime=0.2,size=1729,user=client,uid=2130,required_score=6.0,rhost=localhost,raddr=127.
0.0.1,rport=55987,mid=snt117-w309552c1e79d42eb67a294ad...@phx.gbl,bayes=0.479706,autolearn=no 

Jun 15 08:08:10 mail sm-mta[21077]: p5FF86ld021067: 
to=cli...@pcez.com, delay=00:00:03, xdelay=00:00:02, mailer=local, 
pri=31672, dsn=2.0.0, stat=Sent


As you can see the use is in the black list but yet the mail was 
delivered. I checked other email that was over a score of 9 and 
the mail was rejected, but for some reason or another this was not.


Anyone have an idea why this making it through?

Thanks,

Ken

SpamAssassin merely assigns scores and doesn't do any rejections on 
it's own. That is handled by whatever is calling SpamAssassin and 
using the score that the e-mail is assigned. This could be something 
like MailScanner, Amavis, or some other third party software.


Also, it would be better to blacklist an e-mail address at the MTA 
level (ex: Exim, Postfix)


Regards,
Lawrence



Although you shouldn't be using SARE rules anymore (No longer developed 
and reportedly hit many FPs), this e-mail would be blocked by a 9.0 
limit. That would indicate that your setup is working, at least sometimes.


The first set of headers you posted were as follows

Jun 15 08:08:10 mail spamd[20901]: spamd: result: Y 103 - 
BAYES_50,HTML_MESSAGE,MISSING_SUBJECT,SPF_PASS,TVD_SPAC
E_RATIO,USER_IN_BLACKLIST 
scantime=0.2,size=1729,user=client,uid=2130,required_score=6.0,rhost=localhost,raddr=127.
0.0.1,rport=55987,mid=snt117-w309552c1e79d42eb67a294ad...@phx.gbl,bayes=0.479706,autolearn=no 



BAYES_50 is 0.8
HTML_MESSAGE is 0.001
MISSING_SUBJECT is 0.001
SPF_PASS is -0.001
TVD_SPACE_RATIO is 0.001
USER_IN_BLACKLIST is 100.00

I got this from
http://spamassassin.apache.org/tests_3_3_x.html (except MISSING_SUBJECT 
and TVD_SPACE_RATIO, which are not listed but are present in the current 
3.3 rules available via sa-update)


So the overall score should have been 0.8 + 0.001 + 0.001 - 0.001 + 0.001 + 100 = 100.802

What was the score shown as being returned by SA?

Regards,
Lawrence


Re: Spam not stopped???

2011-06-15 Thread Lawrence @ Rogers

On 16/06/2011 3:13 AM, User for SpamAssassin Mail List wrote:



On Thu, 16 Jun 2011, Lawrence @ Rogers wrote:


On 15/06/2011 11:13 PM, User for SpamAssassin Mail List wrote:


Lawrence,

Thanks for the responce. I know Spam Assassin doesn't stop it we use 
a spamassassin milter for sendmail to reject it. (We been doing this 
for years). Anyway here is a log on a email that was rejected:


Jun 15 06:27:33 mail spamd[981]: spamd: identified spam (22.2/6.0) 
for spamass-milter:111 in 2.1 seconds, 5378 bytes.


Jun 15 06:27:33 mail spamd[981]: spamd: result: Y 22 - 
AWL,BAYES_99,HTML_IMAGE_ONLY_12,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,SARE
_SPEC_ROLEX,SARE_SPOOF_COM2COM,SARE_SPOOF_COM2OTH,SPOOF_COM2COM,SPOOF_COM2OTH,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_ 
RHS_DOB,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL 
scantime=2.1,size=5378,user=spamass-milter,uid=111,required_score=6.0,rhost=
localhost,raddr=127.0.0.1,rport=42127,mid=20110615185711.2964.qmail@vsp-6214cbe9e6d,bayes=1.00,autolearn=spam 

Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: Milter: data, 
reject=550 5.7.1 Blocked by SpamAssassin


Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: 
to=u...@pcez.com, delay=00:00:02, pri=35237, stat=Blocked by 
SpamAssassin



The reason we did not block this at the MTA level is we do not know 
if OTHER users might want email from this email address.


Anyway I'm still looking for a clue why one is blocked and the other 
is not.


Thanks,

Ken


On Wed, 15 Jun 2011, Lawrence @ Rogers wrote:


On 15/06/2011 10:00 PM, User for SpamAssassin Mail List wrote:



Hello,

I have something I cannot explain. We blacklisted an email address 
for a client but Spam assassin still let it through. Here are the 
logs:



Jun 15 08:08:10 mail spamd[20901]: spamd: identified spam 
(104.0/6.0) for client:2130 in 0.2 seconds, 1729 bytes.


Jun 15 08:08:10 mail spamd[20901]: spamd: result: Y 103 - 
BAYES_50,HTML_MESSAGE,MISSING_SUBJECT,SPF_PASS,TVD_SPAC
E_RATIO,USER_IN_BLACKLIST 
scantime=0.2,size=1729,user=client,uid=2130,required_score=6.0,rhost=localhost,raddr=127.
0.0.1,rport=55987,mid=snt117-w309552c1e79d42eb67a294ad...@phx.gbl,bayes=0.479706,autolearn=no 
Jun 15 08:08:10 mail sm-mta[21077]: p5FF86ld021067: 
to=cli...@pcez.com, delay=00:00:03, xdelay=00:00:02, 
mailer=local, pri=31672, dsn=2.0.0, stat=Sent


As you can see the use is in the black list but yet the mail was 
delivered. I checked other email that was over a score of 9 and 
the mail was rejected, but for some reason or another this was not.


Anyone have an idea why this making it through?

Thanks,

Ken

SpamAssassin merely assigns scores and doesn't do any rejections on 
it's own. That is handled by whatever is calling SpamAssassin and 
using the score that the e-mail is assigned. This could be 
something like MailScanner, Amavis, or some other third party 
software.


Also, it would be better to blacklist an e-mail address at the MTA 
level (ex: Exim, Postfix)


Regards,
Lawrence



Although you shouldn't be using SARE rules anymore (No longer 
developed and reportedly hit many FPs), this e-mail would be blocked 
by a 9.0 limit. That would indicate that your setup is working, at 
least sometimes.


The first set of headers you posted were as follows

Jun 15 08:08:10 mail spamd[20901]: spamd: result: Y 103 - 
BAYES_50,HTML_MESSAGE,MISSING_SUBJECT,SPF_PASS,TVD_SPAC
E_RATIO,USER_IN_BLACKLIST 
scantime=0.2,size=1729,user=client,uid=2130,required_score=6.0,rhost=localhost,raddr=127.
0.0.1,rport=55987,mid=snt117-w309552c1e79d42eb67a294ad...@phx.gbl,bayes=0.479706,autolearn=no 


BAYES_50 is 0.8
HTML_MESSAGE is 0.001
MISSING_SUBJECT is 0.001
SPF_PASS is -0.001
TVD_SPACE_RATIO is 0.001
USER_IN_BLACKLIST is 100.00

I got this from
http://spamassassin.apache.org/tests_3_3_x.html (except 
MISSING_SUBJECT and TVD_SPACE_RATIO, which are not listed but are 
present in the current 3.3 rules available via sa-update)


So the overall score should have been 100.802

What was the score shown as being returned by SA?

Regards,
Lawrence




As the log showed:

Jun 15 08:08:10 mail spamd[20901]: spamd: identified spam (104.0/6.0)



spamd is reporting it as spam. sendmail.mc is set up as:

INPUT_MAIL_FILTER(`spamassassin',
 `S=local:/var/run/spamass/spamass.sock, F=,
 T=S:6m;R:9m;E:16m')dnl

As you can see the one message is blocked by MTA:

 Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: Milter: data, 
reject=550 5.7.1 Blocked by SpamAssassin


 Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: to=u...@pcez.com,
 delay=00:00:02, pri=35237, stat=Blocked by SpamAssassin

But the message in question got delivered even though the spamassassin 
said it was spam. So it looked like the milter is working for one 
email but not the other. What would cause this?


Thanks,

Ken



Hi Ken,

It's odd that one spam e-mail is being blocked by the milter, while 
another is not.


It's definitely something with your milter configuration. Unfortunately, 
I cannot

Re: Sought rules

2011-06-10 Thread Lawrence @ Rogers

On 10/06/2011 10:24 PM, Warren Togami Jr. wrote:

On 6/10/2011 2:01 PM, Karsten Bräckelmann wrote:


IFF you use the sought channel with SA 3.3.x, you will need the reorder
hack to bend the alphabet.



It is not entirely clear to me, what exactly are you supposed to 
rename for the reorder hack?  You have to do it every time you sa-update?


Warren

Would renaming 20_sought_fraud.cf to 99_sought_fraud.cf, putting 
20_sought_fraud.cf (from the yerp.org channel) after 72_active.cf (the 
default and presumably older SA rules), solve this problem?


Regards,
Lawrence


Re: Spamassasin - SQLITE as storage database

2011-05-17 Thread Lawrence @ Rogers

On 17/05/2011 12:06 PM, monolit939 wrote:

Hello,

do you have any experience with using an SQLite database as storage for
SpamAssassin? SpamAssassin uses Berkeley DB, but I need to replace it. I
could not find any manual, guide, or even forum discussion about using
SpamAssassin with SQLite. I appreciate any advice.

Thanks a lot

I have no experience with this, but I do have experience with using 
MySQL with InnoDB tables. The performance is actually much better than 
Berkeley DB.
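For what it's worth, switching Bayes to MySQL is done entirely in 
local.cf; a minimal sketch, assuming a pre-created database called 
sa_bayes and a dedicated MySQL user (the names here are made up; the 
schema ships in the sql/ directory of the SpamAssassin distribution):

bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn       DBI:mysql:sa_bayes:localhost
bayes_sql_username  sa_user
bayes_sql_password  sa_password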


Regards,
Lawrence


Re: DKIM_SIGNED postive score

2011-04-13 Thread Lawrence @ Rogers

On 13/04/2011 10:08 PM, Noel Butler wrote:


I've looked high and low and don't seem to be adding this locally,  
shouldn't it be a negative score of 0.1?
Or better still, null, and only get a score if valid which is applied 
(DKIM_VALID=-0.1,), Seems the above only cancels this out and either 
way is not needed, or am I missing something?



Cheers

It appears this is a default score for DKIM_SIGNED, and it is intended to be 
canceled out by DKIM_VALID or DKIM_VALID_AU





Re: ups.com virus has now switched to dhl.com

2011-03-31 Thread Lawrence @ Rogers

On 31/03/2011 1:29 PM, Michael Scheidell wrote:

'from' dhl.com
(come on ups/dhl.. I know SPF is broken, but in this case it would 
sure help us decide if the sending ip is authorized to send on your 
behalf)


with some pretty weird received lines:  is this 'ipv8'? 

Doubtful. IPv8 is still very much a pipe dream. The world hasn't even 
embraced IPv6 yet. I would say most of the Received: headers are just 
messed up to bypass IPv4 and RBL checks.


- Lawrence


Re: Spam

2011-03-29 Thread Lawrence @ Rogers

On 29/03/2011 9:27 PM, Martin Gregorie wrote:

On Wed, 2011-03-30 at 00:58 +0200, mar...@swetech.se wrote:

recently I've been getting A LOT of these mails, with subjects like the
one below containing a link to some scam/Chinese crap factory

I run the latest SpamAssassin along with amavis, but these mails keep
getting through. Any ideas?

Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat

Since the longest (English) word I know has 28 letters
(antidisestablishmentarianism), a private rule like:

header VERY_LONG_WORD  Subject =~ /Re:\s+\S{29}/

should catch that spam.


Martin


We started getting those spams about 6 months ago. What I did was come 
up with a low scoring rule that hits on this


# Rule 1: check if the Subject contains only numbers, letters, or 
common formatting (no spaces) and is 31 or more characters

header LW_SUBJECT_SPAMMY  Subject =~ /^[0-9a-zA-Z,.+_\-'!\\\/]{31,}$/
describe LW_SUBJECT_SPAMMY Subject appears spammy (31 or more characters 
without spaces. Only numbers, letters, and formatting)

score  LW_SUBJECT_SPAMMY 0.2
#tflags LW_SUBJECT_SPAMMY noautolearn

I'm sure this rule could use some improvement.

The ones we saw also always followed 2 possible patterns (sometimes 
containing both in the same e-mail)


1) Hit HTML_MESSAGE, and either FREEMAIL_FROM or TRACKER_ID.
2) Hit MIME_QP_LONG_LINE and a network test.

We have the above 2 in the form of meta rules and scored at 1.0 each.

We also have a 3rd meta rule, with the first rule + the 2 described 
above, scored at 1.5


This has proven to be quite effective at nuking these spams without FPs. 
This is because the likelihood of a ham e-mail setting off all of the 
above rules is quite low.
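A rough sketch of the layering described above (the rule names and the 
__LW_NET_TESTS sub-rule are placeholders; my actual variant is in the 
'Very large subjects in all caps with no spaces' thread elsewhere in 
this archive):

# pattern 1: HTML plus a tracker ID or a freemail sender
meta  LW_SPAMMY_PATTERN1 (HTML_MESSAGE && (FREEMAIL_FROM || TRACKER_ID))
score LW_SPAMMY_PATTERN1 1.0

# pattern 2: long quoted-printable line plus a network test
meta  LW_SPAMMY_PATTERN2 (MIME_QP_LONG_LINE && __LW_NET_TESTS)
score LW_SPAMMY_PATTERN2 1.0

# third meta: the subject rule plus both patterns
meta  LW_SPAMMY_ALL (LW_SUBJECT_SPAMMY && LW_SPAMMY_PATTERN1 && LW_SPAMMY_PATTERN2)
score LW_SPAMMY_ALL 1.5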


Regards,
Lawrence




Re: fake URL's in mail

2011-03-23 Thread Lawrence @ Rogers

On 23/03/2011 4:36 PM, Adam Katz wrote:

On 03/23/2011 11:43 AM, Matus UHLAR - fantomas wrote:

On 03/21/2011 09:37 AM, Matus UHLAR - fantomas wrote:

Does anyone successfully use plugin or at least rules that
catch fake URLs?

On 21.03.11 13:36, Adam Katz wrote:

__SPOOFED_URL, a rule already shipping with SA, does this.

I know about the problem with legit mail and spoofed URLs. That's
why I asked about a plugin that would be able to accept whitelists.

That would require an ENORMOUS whitelist and very close attention to its
upkeep.  I do not see this as practical without using a URIBL-style
mechanism (which would also require high maintenance).  Even with such a
mechanism in place, it unduly penalizes the little guys.

Agreed. It's just one of those impractical things and just ain't worth 
the effort.


Regards,
Lawrence


Re: BUG : all messages rule RP_8BIT

2011-03-22 Thread Lawrence @ Rogers

On 22/03/2011 7:02 PM, Bagnoud Thierry [ezwww.ch] wrote:


until 21 March 2011 after the normal cron.daily/update_spamassassin, 
SpamAssassin reports all messages with the rule RP_8BIT


header RP_8BIT Return-Path:raw =~ /[^\000-\177]/
describe RP_8BIT Return-Path contains 8-bit characters with high bit on
score RP_8BIT 2.8

Thanks to correct this rule.

Thierry Bagnoud 

I looked through our mail logs and don't see any such hits on our 
e-mail. If all of your e-mail is hitting this rule, I would think 
something before SpamAssassin is messing up the Return-Path (perhaps 
another scanner or MTA)


- Lawrence




Re: BUG : all messages rule RP_8BIT

2011-03-22 Thread Lawrence @ Rogers

On 22/03/2011 7:21 PM, Bagnoud Thierry [ezwww.ch] wrote:

oops, since 21 March and not until 21 March, excuse my bad english :-)

the modification from the rule on 2011-03-21

-header   RP_8BIT   Return-Path =~ /[^\000-\177]/
+header   RP_8BIT   Return-Path:raw =~ /[^\000-\177]/

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/mmartinec/20_misc.cf?r1=906046&r2=906045&pathrev=906046 



perhaps the MTA MailScanner is messing up the Return-Path

Thierry Bagnoud


On 3/22/11 5:32 PM, Bagnoud Thierry [ezwww.ch] wrote:

hi,

until 21 March 2011 after the normal cron.daily/update_spamassassin,
SpamAssassin reports all messages with the rule RP_8BIT


I don't see this on any inbound email.

what are you saying? you want the rule to be changed to the below?
I don't see any difference between this rule and your proposed changes
except for the score.

and, I don't recommend changing that score. leave it at 2.866 1.389
2.866 1.389

header RP_8BIT Return-Path:raw =~ /[^\000-\177]/
describe RP_8BIT Return-Path contains 8-bit characters with high bit on
score RP_8BIT 2.8

Thanks to correct this rule.


or are you saying that it is hitting on email that does not have the 8th
bit high?

you might want to post a full email to pastebin.com and send THE LINK
ONLY to this group.

are you saying its a false positive on YOUR system since you like
getting emails with illegal chars in the headers?
then add this to local.cf:
score RP_8BIT 0



Thierry Bagnoud






Something is definitely off. We use SA with MailScanner, and that rule 
never hits anything (less than 1 or 2 messages in several thousand).


- Lawrence


Re: Very large subjects in all caps with no spaces

2011-03-15 Thread Lawrence @ Rogers
I use the following rule that, combined with other meta rules, catches 
the majority of these


header LW_SUBJECT_SPAMMY  Subject =~ /^[0-9a-zA-Z,.+_\-'!\\\/]{31,}$/
describe LW_SUBJECT_SPAMMY Subject appears spammy (31 or more characters 
without spaces. Only numbers, letters, and formatting)

score  LW_SUBJECT_SPAMMY 0.2

The key is to score the actual subject rule low, but bump the SA score 
with meta rules that increase the score as more indicators are hit. I've 
had moderate success with the rules below:


# Rule 2: Message is HTML and has a tracking ID, or comes from a free 
mail address

# Therefore, must hit HTML_MESSAGE, and either TRACKER_ID or FREEMAIL_FROM
meta LW_SPAMMY_EMAIL1  (LW_SUBJECT_SPAMMY && HTML_MESSAGE && (TRACKER_ID 
|| FREEMAIL_FROM))
describe LW_SPAMMY_EMAIL1 Spammy HTML message that has a tracking ID or 
is freemail

score  LW_SPAMMY_EMAIL1 1.0
#tflags LW_SPAMMY_EMAIL1 noautolearn

# Rule 3: Message hits LW_SPAMMY_EMAIL1 and MIME_QP_LONG_LINE
# It's unusual for non-spam HTML messages to have really long Quoted 
Printable lines
meta LW_SPAMMY_EMAIL2  (LW_SPAMMY_EMAIL1 && (MIME_QP_LONG_LINE || 
__LW_NET_TESTS))
describe LW_SPAMMY_EMAIL2 Spammy HTML message also has a Quoted 
Printable line > 76 chars, or hits net check

score  LW_SPAMMY_EMAIL2 1.0
#tflags LW_SPAMMY_EMAIL2 noautolearn

Hope this helps!

Regards,
Lawrence

On 15/03/2011 1:53 AM, jambroo wrote:

Is there a way of filtering emails with very large one-word subjects. They
are also in all caps.

I can see rules that set emails to spam if they contain specific wording but
nothing like this.

Thanks.




Re: The one year anniversary of the Spamhaus DBL brings a new zone

2011-03-08 Thread Lawrence @ Rogers

On 08/03/2011 4:54 PM, dar...@chaosreigns.com wrote:

Looks like that would be something like this?

urirhssub   URIBL_DBL_REDIRECTOR   dbl.spamhaus.org.   A   127.0.1.3
body        URIBL_DBL_REDIRECTOR   eval:check_uridnsbl('URIBL_DBL_SPAM')
describe    URIBL_DBL_REDIRECTOR   Contains a URL listed in the DBL as a 
spammed redirector domain
tflags      URIBL_DBL_REDIRECTOR   net domains_only
score       URIBL_DBL_REDIRECTOR   0.1


Anybody know of a domain that hits this?


Close.

I believe that you should be using this

eval:check_uridnsbl('URIBL_DBL_REDIRECTOR')

Instead of this

eval:check_uridnsbl('URIBL_DBL_SPAM')

So the correct rule would be

urirhssub   URIBL_DBL_REDIRECTOR   dbl.spamhaus.org.   A   127.0.1.3
body        URIBL_DBL_REDIRECTOR   
eval:check_uridnsbl('URIBL_DBL_REDIRECTOR')
describe    URIBL_DBL_REDIRECTOR   Contains a URL listed in the DBL 
as a spammed redirector domain

tflags      URIBL_DBL_REDIRECTOR   net domains_only
score       URIBL_DBL_REDIRECTOR   0.1

Regards,
Lawrence


Re: The one year anniversary of the Spamhaus DBL brings a new zone

2011-03-08 Thread Lawrence @ Rogers

On 08/03/2011 5:12 PM, Yet Another Ninja wrote:

On 2011-03-08 21:24, dar...@chaosreigns.com wrote:

Looks like that would be something like this?

urirhssub   URIBL_DBL_REDIRECTOR   dbl.spamhaus.org.   A   
127.0.1.3
body        URIBL_DBL_REDIRECTOR   
eval:check_uridnsbl('URIBL_DBL_SPAM')
describe    URIBL_DBL_REDIRECTOR   Contains a URL listed in the 
DBL as a spammed redirector domain

tflags      URIBL_DBL_REDIRECTOR   net domains_only
score       URIBL_DBL_REDIRECTOR   0.1


Anybody know of a domain that hits this?



tried to post a list of the domains but Apache's infra rejected it with.

Delivery to the following recipient failed permanently:

 users@spamassassin.apache.org

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the 
recipient domain. We recommend contacting the other email provider for 
further information about the cause of this error. The error that the 
other server returned was: 552 552 spam score (13.3) exceeded 
threshold 
(FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS,T_SURBL_MULTI1,T_SURBL_MULTI2,T_TO_NO_BRKTS_FREEMAIL,T_URIBL_BLACK_OVERLAP,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_PH_SURBL,URIBL_WS_SURBL 
(state 18).



pretty amazing...

How so? You posted a list of spam domains, and SpamAssassin picked up on 
them. Why not try making them a bit mangled like


www dot crappydomain dot com

??

Regards,
Lawrence


Re: Should Emails Have An Expiration Date

2011-03-01 Thread Lawrence @ Rogers

On 28/02/2011 5:12 PM, Matt wrote:

I think this would be a great idea.  Many end users never bother to
delete old emails and on some, such as sales etc, there is no valid
reason for them to continue to waste disk and server space.

http://www.zdnet.com/news/should-emails-have-an-expiration-date/6197888


Dumbest. Idea. Ever.

Regards,
Lawrence


Re: [Q] sa-compile: not compiling; 'spamassassin --lint' check failed!

2011-02-15 Thread Lawrence @ Rogers

On 15/02/2011 9:27 AM, J4K wrote:

spamassassin --lint

This may seem obvious, but did you run spamassassin --lint like 
sa-compile suggested?


I assume DCC is probably not loaded, or disabled in your setup.

Open up /etc/spamassassin/local.cf, find this line

dcc_add_header 1

Comment it out. The line should look like this when you are done

#dcc_add_header 1

Save your changes and run spamassassin --lint again. This time there 
should be no complaints from it. If there are not, try sa-compile again.


Regards,
Lawrence


Re: [Q] sa-compile: not compiling; 'spamassassin --lint' check failed!

2011-02-15 Thread Lawrence @ Rogers

On 15/02/2011 10:07 AM, J4K wrote:


It's pretty moot anyway, because now after running spamassassin --lint, 
sa-compile still fails with the same error.

Hi,

Just because DCC is running doesn't mean SA is configured to use it.

Can you post the following:

- Output of spamassassin --lint
- Contents of /etc/spamassassin/local.cf

Those should help pinpoint the exact problem.

Regards,
Lawrence


Re: eval:html_tag_balance - short tags not accepted?

2011-01-28 Thread Lawrence @ Rogers

On 28/01/2011 4:13 AM, Per Jessen wrote:

If it's firing on <head/> with no content, that's completely valid 
don't you think? It's invalid HTML and contains no content.

Yes, I agree.

All HTML/XHTML tags are required to close (XHTML is supposed to be 
stricter, as it was intended to follow XML structure), but only the 
ones I mentioned earlier are allowed to self-close with a /. The others 
require an explicit closing in order to be considered valid.


Example: <p></p>

OT: I am curious to know why the W3C Validator considers <p/> to be 
valid, when it goes against every bit of documentation from them I've 
ever read.


I agree with John Hardin though. <head/> is more likely to appear in 
spam than ham (ham being legit e-mail that the sender wants to be 
readable by even the most broken of HTML renderers, like Outlook 2010).


- Lawrence


Re: eval:html_tag_balance - short tags not accepted?

2011-01-28 Thread Lawrence @ Rogers

On 28/01/2011 5:28 AM, Per Jessen wrote:

<script type=""/>
<style type=""/>
<fieldset/>
<legend/>

Sounds like it doesn't care whether the actual tag can be shorthand or 
not. It just looks at the structure and decides they're valid, even 
though HTML and XHTML specifications say otherwise.


- Lawrence


Re: Training Bayes on outbound mail

2011-01-28 Thread Lawrence @ Rogers

On 28/01/2011 2:53 PM, David F. Skoll wrote:

On Fri, 28 Jan 2011 18:10:08 +
Dominic Bensondomi...@lenny.cus.org  wrote:


Recently, in order to balance the ham/spam ratio given to sa-learn, I
have started to pass mail submitted by authenticated users to
sa-learn --ham.
I haven't seen any mention of this strategy on-list or on the web, so
I'm interested in whether (a) anyone else does this, and (b) is there
a good reason not to do it that I haven't thought of?

It's possibly a good idea, but you want to be really careful of one
thing: Make sure your users are savvy enough not to have their
accounts phished.  It'll take just one compromised account that blasts
out a spam run to destroy the usefulness of your Bayes data.

Regards,

David.

Agreed. I was considering the same idea at one point, and came to the 
same result. One person could poison the DB completely.


Re: eval:html_tag_balance - short tags not accepted?

2011-01-27 Thread Lawrence @ Rogers

On 27/01/2011 4:15 AM, Per Jessen wrote:

I've just been looking at a mail that got a hit on
HTML_TAG_BALANCE_HEAD due to this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head/>
<body style="width: 800px">

I can't quite figure out whether the short tag syntax is allowed - the HTML
above was generated by XSLT based on this input:

<head></head>

Other popular short tags: <br/>  <div/>  <p/>  - I don't think we should
be judging those to be unbalanced HTML tags.


/Per Jessen, Zürich


As a person who writes HTML/XHTML every single day, there are several 
flaws in your argument:


- <head/> is not valid HTML or XHTML (in any version)
- HTML 4.01 Transitional doesn't allow for an XHTML xmlns attribute, nor 
does it permit short tags
- The only valid short tag that you mentioned is <br />. <div/> and 
<p/> are not
- Using a short tag without a space between the name and the / is also 
not recommended as it causes problems for older browsers and poorly 
written HTML parsers.


You appear to have made a flawed statement based upon a flawed study (no 
HTML e-mail will ever be just a <head></head> combination)


Regards,
Lawrence




Re: eval:html_tag_balance - short tags not accepted?

2011-01-27 Thread Lawrence @ Rogers

On 27/01/2011 4:43 PM, Per Jessen wrote:

Lawrence @ Rogers wrote:


On 27/01/2011 4:15 AM, Per Jessen wrote:

I've just been looking at a mail that got a hit on
HTML_TAG_BALANCE_HEAD due to this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">  <html
xmlns="http://www.w3.org/1999/xhtml">  <head/>
<body style="width: 800px">

I can't quite figure out whether the short tag syntax is allowed -
the HTML above was generated by XSLT based on this input:

<head></head>

Other popular short tags: <br/>   <div/>   <p/>   - I don't think we
should be judging those to be unbalanced HTML tags.


/Per Jessen, Zürich



As a person who writes HTML/XHTML every single day, there are several
flaws in your argument:

- <head/> is not valid HTML or XHTML (in any version)

Ah, because it needs at least <title>. Okay.


- HTML 4.01 Transitional doesn't allow for an XHTML xmlns attribute,
nor does it permit short tags

Irrelevant for this issue. Spamassassin doesn't care about the DTD when
it's evaluating for unbalanced tags.  Use your imagination and put any
suitable DTD instead.


- The only valid short tag that you mentioned is <br />. <div/> and
<p/> are not

They're certainly all valid in XHTML. (the validator at w3c says ok for
both).


- Using a short tag without a space between the name and the / is also
not recommended as it causes problems for older browsers and poorly
written HTML parsers.

Irrelevant for this issue.


You appear to have made a flawed statement based upon a flawed study

Gee, what's with the hostility?  I never made an argument, I asked a
simple question.


(no HTML e-mail will ever be just a <head></head> combination)

I didn't suggest that.


/Per Jessen, Zürich



Hi Per,

I did not intend for my message to be hostile in any way. My apologies 
if my terse tone came across that way.


<div/> and <p/> may pass the validator, but that is most certainly a 
bug. A quick look through the XHTML 1.0 DTDs reveals only ten tags that 
may be closed using the short form, and I am unable to find any 
documentation on the W3C web site to support anything otherwise.


<area />
<base />
<br />
<col />
<hr />
<img />
<input />
<link />
<meta />
<param />

Using any other shorthanded elements would result in HTML rendering 
engines choking and giving unpredictable results.


What I was suggesting is that your belief is flawed because your test 
was flawed itself. No e-mail will ever be just <head></head>. Ignoring 
the fact that a <title> tag is required as a minimum (although many 
e-mails probably omit it), the <head/> form is invalid as well.


SpamAssassin may not care about DTDs and the like, but HTML rendering 
engines such as the one used in Internet Explorer (where people may be 
using webmail clients) and Outlook (which recently reverted from IE's 
engine to a crappy one used in Microsoft Word) do care. Programs that 
send HTML e-mails are going to do at least the bare minimum to ensure 
their messages are displayed and readable, and they will know that 
Internet Explorer's HTML rendering engine is what will most likely be 
parsing the HTML they supply. This almost ensures that a HTML message 
will be at least like this


<html>
<head></head>
<body>
Some content here
</body>
</html>

Even spammers know that using anything less than the above runs a very 
real risk of the message being unable to be displayed, which would make 
the e-mail completely pointless.


I believe that the behavior of HTML_TAG_BALANCE_HEAD is valid in this 
case, as <head/> is invalid HTML (despite what the validator says) and 
should not be used by anyone.


Regards,
Lawrence

(For what it's worth, <div/> and <p/> are not popular. I've never seen 
them used on any legit site)


Re: eval:html_tag_balance - short tags not accepted?

2011-01-27 Thread Lawrence @ Rogers

On 27/01/2011 5:36 PM, Per Jessen wrote:

Lawrence @ Rogers wrote:


<div/> and <p/> may pass the validator, but that is most certainly a
bug. A quick look through the XHTML 1.0 DTD's reveals only ten tags
that may be closed using the short form, and I am unable to find any
documentation on the W3C web site to support anything otherwise.

<area />
<base />
<br />
<col />
<hr />
<img />
<input />
<link />
<meta />
<param />

Using any other shorthandled elements would result in HTML rendering
engines choking and giving unpredictable results.

I'm not so sure - I think relatively modern renderers are quite capable
of dealing with both <div/> and <p/> without causing any problems.
<p/> instead of <p></p> is not unusual.


What I was suggesting is that your belief is flawed because your test
was flawed itself. No e-mail will ever be just <head></head>. Ignoring
the fact that a <title> tag is required as a minimum (although many
e-mails probably omit it), the <head/> form is invalid as well.

Accepted, but it doesn't change the problem in html_eval_tag() - the
code doesn't attempt to validate html, it just does a simple regex
check for a balanced tag, but doesn't accept or ignore the short tag
version with no content.


I believe that the behavior of HTML_TAG_BALANCE_HEAD is valid in this
case, as <head/> is invalid HTML (despite what the validator says) and
should not be used by anyone.

True, but html_eval_tag() will fire on _any_ short tag.


/Per Jessen, Zürich


The problem is that the majority of HTML e-mails out there are being 
handled by older HTML engines (like IE7 or worse).


If it's firing on <head/> with no content, that's completely valid don't 
you think? It's invalid HTML and contains no content. That throws the 
balance off.


No HTML coder will ever assume that people are using programs with 
modern HTML capabilities. Most of the world is only now finally letting 
IE6 die.


Could you provide an example of a site using <div/> or <p/> shorthand 
tags? I've never seen them before anywhere.


Previously, my understanding has always been that shorthanded closing 
was only allowed for tags that didn't have a closing tag before (such as 
<meta>). The HTML recommendations support this.


Perhaps there is further work to be done in SA regarding handling HTML 
balancing, but <head/> is pointless to test for as it has no reason or 
possible use in the real world.


If html_eval_tag() is firing on any short tag, and not just the invalid 
example code, that would signal a possible bug worth investigating.


Cheers,
Lawrence


Re: BlackBerry Email Being Blocked by SpamAssassin

2011-01-13 Thread Lawrence @ Rogers

On 13/01/2011 3:10 PM, Brendan Murtagh wrote:

We are running SpamAssassin 3.2.5 (1.1) with IceWarp Mail Server and
currently the following are whitelisted within IceWarp:

*.bis.na.blackberry.com
*.blackberry.com
*.blackberry.net

A score of 3.0 is much too low for determining if an e-mail is spam. We 
have clients who get e-mails all the time that score between 3.5 and 4.0 
and are non-spam. Anything that scores 5.0 or above is definitely spam in 
our experience.


You may have whitelisted the domains within IceWarp, but all that does 
is ensure they get to SA for scanning. Nothing more.
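If the 3.0 cutoff turns out to be SpamAssassin's own threshold rather 
than something IceWarp applies afterwards, the setting to raise is 
required_score; a minimal local.cf sketch using the stock default:

# raise the spam threshold back to the SpamAssassin default
required_score 5.0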


Cheers,
Lawrence


Re: Spam bot Spam seems to be decreasing

2011-01-11 Thread Lawrence @ Rogers

On 11/01/2011 4:47 PM, Julian Yap wrote:
On Sun, Jan 9, 2011 at 11:42 PM, Jeff Chan je...@surbl.org wrote:


On Sunday, January 9, 2011, 12:50:12 PM, Lawrence Rogers wrote:
 On 09/01/2011 4:41 PM, Jari Fredriksson wrote:
 On 9.1.2011 18:40, Marc Perkel wrote:
 Just wondering if anyone else is noticing this. Spam bot spam
is down to
 1/4 of what it was a year ago. I had noticed my black list
shrinking.
 But here's some raw data from someone who tracks it.

 Now:

 http://www.sdsc.edu/~jeff/spam/cbc.html

 A year ago:

 http://www.sdsc.edu/~jeff/spam/2010/bc-20100109.html

 Are we winning?

 It has been in news also, spam has decreaced since autumn and
then again
 in december. We just have to wait and see if this is permanent.

 It has been since the shutdown of Spamit late last year



http://www.telegraph.co.uk/news/worldnews/europe/russia/8090100/Spam-falls-by-a-fifth-after-Russian-operation-shut-down.html

Rustock is spamming again:

http://www.spamcop.net/spamgraph.shtml?spamweek

http://cbl.abuseat.org/totalflow.html


I concur.  I see a rise again this week.  It really dropped from 
around Christmas time.


- Julian



I guess criminals take Christmas off too lol

- Lawrence


Re: Spam bot Spam seems to be decreasing

2011-01-09 Thread Lawrence @ Rogers

On 09/01/2011 4:41 PM, Jari Fredriksson wrote:

On 9.1.2011 18:40, Marc Perkel wrote:

Just wondering if anyone else is noticing this. Spam bot spam is down to
1/4 of what it was a year ago. I had noticed my black list shrinking.
But here's some raw data from someone who tracks it.

Now:

http://www.sdsc.edu/~jeff/spam/cbc.html

A year ago:

http://www.sdsc.edu/~jeff/spam/2010/bc-20100109.html

Are we winning?


It has been in the news also; spam has decreased since autumn and then again
in December. We just have to wait and see if this is permanent.


It has been since the shutdown of Spamit late last year

http://www.telegraph.co.uk/news/worldnews/europe/russia/8090100/Spam-falls-by-a-fifth-after-Russian-operation-shut-down.html


Re: BOTNET rules question

2011-01-05 Thread Lawrence @ Rogers

On 05/01/2011 6:22 PM, Michael Monnerie wrote:

Dear list,

I received this info from a customer, whose order confirmation from the
londontheatredirect.com got marked as spam because of BOTNET* rules. Are
those rules too old, or is that server in a botnet? How to find out?
Or which rules scores should I tune to optimize?


--  Forwarded message --

Date: Tuesday, 28 December 2010

Preview:  LondonTheatreDirect.com Order confirmation Many thanks for

your order, christian enserer Please print this confirmation for your
reference

[...]



Analysis details:   (6.0 points, 5.0 required)

 Pts  Rule name              Description
 ---- ---------------------- ---------------------------------------------
 -0.5 L_P0F_D7               L_P0F_D7
  0.5 L_P0F_W                Relayed through Windows OS except Windows XP
  0.0 RELAY_UK               Relayed through Brittan
  2.2 BOTNET                 Relay might be a spambot or virusbot
                             [botnet0.8,ip=88.208.245.26,rdns=server88-208-245-26.live-servers.net,maildomain=londontheatredir...
  0.3 BOTNET_IPINHOSTNAME    Hostname contains its own IP address
                             [botnet_ipinhosntame,ip=88.208.245.26,rdns=server88-208-245-26.live-servers.net]
  0.0 BOTNET_CLIENT          Relay has a client-like hostname
                             [botnet_client,ip=88.208.245.26,rdns=server88-208-245-26.live-servers.net,ipinhostname]
 -0.0 BAYES_40               BODY: Bayes spam probability is 20 to 40%
                             [score: 0.3460]
  0.0 HTML_MESSAGE           BODY: HTML included in message
  0.5 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
  0.4 HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
  1.0 RDNS_DYNAMIC           Delivered to internal network by host with dynamic-looking rDNS
  0.0 LOTS_OF_MONEY          Huge... sums of money
  1.6 BOTNET_WIN             Mail from Windows XP which seems to be in a Botnet

I would suspect that you are using non-standard rules. What's most 
concerning is the old p0f rules that look for Windows XP. Scoring on the 
sender's operating system is a dangerous basis for a rule.


I would remove the p0f and botnet rules if I were you. That would solve 
your problem.
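
If you would rather not uninstall the plugin, zeroing the scores in local.cf 
has much the same effect; the rules still run but add nothing. A sketch, 
with the rule names taken from the report above (and likewise for the 
L_P0F_* rules):

score BOTNET               0
score BOTNET_CLIENT        0
score BOTNET_IPINHOSTNAME  0
score BOTNET_WIN           0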


Regards,
Lawrence


Re: BOTNET rules question

2011-01-05 Thread Lawrence @ Rogers

On 05/01/2011 8:38 PM, RW wrote:

Aside from BOTNET_WIN the p0f rules are low-scoring and add-up to zero.

Since BOTNETS are 100% Windows it doesn't seem unreasonable to use p0f
in a metarule. However, you might want to look into this inconsistency:
You are right about the overlap: one rule says it's Windows XP, and the 
other says it's not.


However, as for botnets, there are a number of Linux botnets nowadays as 
well. Remember Psyb0t from 2009? So while you can argue Windows is 90%+, 
it's not alone :)


Regards,
Lawrence


lots of freemail spam

2010-12-30 Thread Lawrence @ Rogers

Hi,

Lately, I've noticed we are getting a fair amount (10-12 per day per client) 
of spam coming from freemail users (FREEMAIL_FROM triggers). Usually the 
Subject is non-existent or empty, and the message is always just a URL.


Is there a good rule for flagging these as possible spam? I understand 
that there may be some legit e-mails that would hit all 3 factors, so I 
would score the rule low.
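
Something along these lines is what I have in mind so far. It's only an 
untested sketch: FREEMAIL_FROM and MISSING_SUBJECT are stock rules (the 
latter only covers a missing Subject, not an empty one), and 
__LW_URLONLY_BODY is a made-up sub-rule I would still need to verify 
against real samples.

body     __LW_URLONLY_BODY   /^\s*https?:\/\/\S+\s*$/
meta     LW_FREEMAIL_URLONLY (FREEMAIL_FROM && MISSING_SUBJECT && __LW_URLONLY_BODY)
describe LW_FREEMAIL_URLONLY Freemail sender, no Subject, body is just a URL
score    LW_FREEMAIL_URLONLY 0.5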


Thoughts?

Regards,
Lawrence


Re: Additional sa-update channels

2010-12-15 Thread Lawrence @ Rogers

On 15/12/2010 1:32 PM, Bowie Bailey wrote:

On 12/15/2010 11:57 AM, Andy Jezierski wrote:

Sorry all,

Been away from the list for quite some time.  Just updated SA from
3.2.5 to 3.3.1.  Have been trying to find a list of sa-update channels
that are still relevant but not with much success.

Does anyone know is such a list exists, or if you know of which
additional channels can still be used. I know a lot of them have been
merged into SA and some are outdated and recommended not to be used.

All of the good SARE rules have been merged into SA.  All of the SARE
update channels should no longer be used (as the rules are no longer
being updated).

The best additional channel to use at the moment is the Sought ruleset.

http://wiki.apache.org/spamassassin/SoughtRules

I have to disagree on the Sought rules. I've seen them give quite a few 
false positives (mostly on e-mail notifications from social networks like 
Facebook and Twitter), and hit hardly any spam at all.


Your best bet is to use the khop rules, along with the one SARE set still 
being updated by Daryl. Below are the channels I recommend:


updates.spamassassin.org
khop-bl.sa.khopesh.com
khop-blessed.sa.khopesh.com
khop-dynamic.sa.khopesh.com
khop-general.sa.khopesh.com
khop-sc-neighbors.sa.khopesh.com
90_2tld.cf.sare.sa-update.dostech.net

Regards,
Lawrence


Re: Additional sa-update channels

2010-12-15 Thread Lawrence @ Rogers

On 15/12/2010 3:51 PM, Bowie Bailey wrote:

The khop rules are good.  I thought the 2tld stuff had been pulled into
SA as 20_aux_tlds.cf?
It has, but the Daryl-edited one has some additional stuff (I think) 
that isn't in there. There is conditional code that enables certain 
rules in the file depending on what version of SA you are running.


Re: facebook phishing, SPF_PASS

2010-11-19 Thread Lawrence @ Rogers

On 19/11/2010 4:43 PM, Michael Scheidell wrote:
Thought you would be interested, a facebook phishing email (yes, it 
is, ) with SPF_PASS

(reminding EVERYONE, SPF IS NOT A SPAM VS HAM INDICATOR AT ALL)
yes, I publish SPF, I used it in meta rules.

this one passed because the sender used an envelope from in the ip range of 
the spf rules.


http://secnap.pastebin.com/zTmkSc6J
ps, scored a 3.5 here.  by now, hopefully, it scores higher with 
razor/dcc/spamcop, urlbl, etc.



I'm not sure how SPF could pass on this one. The sending server doesn't 
have the same domain name, nor is it using an IP authorized in Facebook's 
SPF records. SPF is supposed to confirm that the sending server is 
authorized to send for the domain, but that clearly fails here.


Re: Sought False Positives

2010-11-08 Thread Lawrence @ Rogers

On 08/11/2010 12:06 PM, Ned Slider wrote:


Fair enough - fortunately I've not seen any of those here so assumed a 
genuine facebook mail had maybe slipped through into the corpus by 
mistake.


Either way, it was fixed by the time I'd spotted it.
I've seen it as well, and disabled the Sought rules. They were causing 
too many FPs and not hitting enough spam to be worthwhile.


Re: Reservation scam?

2010-11-07 Thread Lawrence @ Rogers

On 07/11/2010 8:29 PM, Alex wrote:

Hi,

I just noticed a handful of emails similar to this scam:

http://spamdb.vp44.com/emails/feb09/feb09-234.php

I realize it's a scam, but I'm not sure exactly how, and searching
produced nothing useful. Is this another 419 scam? Can someone point
me to where I can find more info on how this scam works, and more
importantly how to stop them?

Are there any individual rules developed that people are finding useful?

Thanks,
Alex

Can you post the full headers and body from the spam message (including 
Received: lines)?


Re: Reservation scam?

2010-11-07 Thread Lawrence @ Rogers

On 07/11/2010 10:37 PM, Alex wrote:

Hi,


Can you post the full headers and body from the spam message (including
Received: lines)?

Okay, I've figured it out. It's the whole scheme where they convince
you to either deposit one of their checks or accept a credit card
purchase, then expect you to send real money to some other person or
account, all the while they are giving you a fake check or credit
card.

Here's the latest example:

http://pastebin.com/ZUxiLjMy

Rules would very much be appreciated.

Thanks,
Alex

It's going to be difficult to help, as you modified the headers before 
posting the message on Pastebin.


Can you put up the full unmodified message?

Cheers,
Lawrence


Re: new headers rule

2010-11-05 Thread Lawrence @ Rogers

On 05/11/2010 10:58 AM, Randy Ramsdell wrote:
X-MB-Message-Source: WebUI 
You appear to have records of the same spam influencing your bayes 
results (it hits BAYES_99, which is good). What are your Bayes threshold 
settings?


Cheers,
Lawrence


Re: new headers rule

2010-11-05 Thread Lawrence @ Rogers

On 05/11/2010 6:00 PM, Randy Ramsdell wrote:

Lawrence @ Rogers wrote:

On 05/11/2010 10:58 AM, Randy Ramsdell wrote:
X-MB-Message-Source: WebUI 
You appear to have records of the same spam influencing your bayes 
results (it hits BAYES_99, which is good). What are your Bayes 
threshold settings?


Cheers,
Lawrence


I am not sure what you are asking me. Our spam cutoff is around 5. 
Note that the above example was from a subject-modified message 
that made it through spamassassin. I simply removed the Subject.


In your SpamAssassin configuration, what do you have the following 
options set to:


bayes_auto_learn_threshold_nonspam
bayes_auto_learn_threshold_spam
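
For reference, the stock defaults are, if I remember right:

bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam    12.0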

Cheers,
Lawrence


new headers rule

2010-11-04 Thread Lawrence @ Rogers

Hi,

I've noticed a bunch of spams coming in recently that have no To: and 
Subject: and have cobbled together the following rule to combat them. 
Any feedback would be appreciated.


# Message has empty To: and Subject: headers
# Likely spam
header __LW_EMPTY_SUBJECT Subject =~ /[[:space:]]$/
meta LW_EMPTY_SUBJECT_TO (__LW_EMPTY_SUBJECT && MISSING_HEADERS)
describe LW_EMPTY_SUBJECT_TO Message has empty To and Subject headers
score LW_EMPTY_SUBJECT_TO 2.5

If anyone would like to test this as part of the mass corpus, please 
feel free to do so. I am curious to know how it performs.


Regards,

Lawrence Williams
LCWSoft
www.lcwsoft.com


Re: new headers rule

2010-11-04 Thread Lawrence @ Rogers

On 04/11/2010 5:56 PM, Karsten Bräckelmann wrote:

On Thu, 2010-11-04 at 15:55 -0230, Lawrence @ Rogers wrote:

I've noticed a bunch of spams coming in recently that have no To: and
Subject: and have cobbled together the following rule to combat them.
Any feedback would be appreciated.

Just as a side note, there is a difference between a missing and an
empty header.


# Message has empty To: and Subject: headers
# Likely spam
header __LW_EMPTY_SUBJECT Subject =~ /[[:space:]]$/

That rule does *not* do what you intend. It matches, if the last char of
the Subject happens to be a whitespace.

By definition, that header is not empty. Moreover, it is not equivalent
to a header that has no printable chars, which seems to be what you
actually tried the RE to match.



How about this, then:

# Message has empty To: and Subject: headers
# Likely spam
header __LW_EMPTY_TO To  =~ /^[[:space:]]$/
header __LW_EMPTY_SUBJECT Subject =~ /^[[:space:]]$/
meta LW_EMPTY_SUBJECT_TO (__LW_EMPTY_SUBJECT && __LW_EMPTY_TO)
describe LW_EMPTY_SUBJECT_TO Message has empty To and Subject headers
score LW_EMPTY_SUBJECT_TO 2.5


Re: new headers rule

2010-11-04 Thread Lawrence @ Rogers

On 04/11/2010 6:35 PM, Randy Ramsdell wrote:
Are the Subject lines blank or missing from the body? And that goes 
for the To also. 

In the spam I am seeing, both are present but empty.

Example

To:
Subject:


Re: new headers rule

2010-11-04 Thread Lawrence @ Rogers

On 04/11/2010 8:11 PM, Karsten Bräckelmann wrote:

Moving back on-list, since it doesn't appear to be personally directed
at me.

On Thu, 2010-11-04 at 19:22 -0230, Lawrence @ Rogers wrote:

On 04/11/2010 7:13 PM, Karsten Bräckelmann wrote:

No, that requires the Subject to consist of exactly one whitespace.

Read it out loud. The ^ beginning of the string, followed by exactly one
whitespace char [2]. Followed by the $ end of the string.

No offense, but I am a C and PHP programmer and Perl's documentation is
lacking, to put it politely. Too much theory and far too few actual real
world examples.

This is not about Perl, but Regular Expressions. The much more feature-
rich (and widely adopted) Perl flavor, out of all the existing variants.
But that's actually irrelevant in this case, cause you would need a very
limited sub-set only, pretty much available in any tool sporting REs.

Any introduction to REs would do, no need to tend to the Perl docs you
don't like. Though it sounds like you didn't even have a look at the docs
I pointed you to.



That is exactly what I am trying to match, and according to my tests, it
works as expected. When the To and Subject are empty, all that's there
(before the newline) is one whitespace.

Are you referring to the whitespace delimiter between the Header: and
its content? It's not part of the content.


What I am looking to check is a situation where both the To: and
Subject: headers contain nothing at all, but are set (I've seen this in
several spam e-mails recently)

Now you're confusing me. Do you want to match a single whitespace, or a
completely empty header?



If there's a better way of doing this, I would appreciate you providing
an example.

Well, better way... One that does what you just described.

Assuming you want to match headers containing nothing at all, as per
your previous paragraph. That would be nothing between the beginning and
end.
   header __FOO  Foo =~ /^$/

Or, negated, not anything.
   header __FOO  Foo !~ /./

Now, since you specifically constrained this, you might want to check
for the header's existence. Probably not worth it, though. The following
is copied from stock 20_head_tests.cf, and documented in SA Conf.
   header __HAS_SUBJECT  exists:Subject


Anyway, in cases like these it's best to provide a *raw* sample, showing
the headers in question completely un-munged and exactly as seen by SA.
(Otherwise our help often is limited to guessing and an informal
description.) This prohibits copy-n-paste from your MUA, which too often
changes subtle but important details.

One easy way to come to a conclusion whether you want to match
whitespace or not, is the following ad-hoc header rule with spamassassin
debug. The matching header's contents are shown in double quotes.

   spamassassin -D --cf='header FOO To =~ /^.*/' < msg 2>&1 | grep FOO

And just for reference, 'grep' uses REs...



Thanks Karsten,

One of these days when I get some free time, I will be sitting down and 
reading up on REs :)


Using your examples, and some hackery, I came up with this. It checks 
for the existence of the To header as well, as SA doesn't seem to have a 
rule for doing this on its own (a grep -r exists:To * on the rules 
pulled in from updates.spamassassin.org produced nothing).


# Message has empty To: and Subject: headers
# Likely spam
header __LW_HAS_TO exists:To
header __LW_EMPTY_TO To =~ /^$/
header __LW_EMPTY_SUBJECT Subject =~ /^$/
meta LW_EMPTY_SUBJECT_TO (__HAS_SUBJECT && __LW_HAS_TO && __LW_EMPTY_SUBJECT && __LW_EMPTY_TO)

describe LW_EMPTY_SUBJECT_TO Message has empty To and Subject headers
score LW_EMPTY_SUBJECT_TO 2.5

I added this to my custom .cf rules file and ran spamassassin --lint and 
got no complaints. I ran it over a sample spam, and it hit. I took 
another spam where both headers had information in them, and it didn't 
hit. Guess it works as expected :)


Cheers,
Lawrence


new rule

2010-11-02 Thread Lawrence @ Rogers

Hi,

Does anyone see anything wrong with this rule I just put together? It is 
a meta rule intended to detect HTML-only spam with a forged freemail 
Reply-To: header.


meta LW_HTML_REPLYTO_FORGED (FREEMAIL_FORGED_REPLYTO && HTML_MESSAGE && MIME_HTML_ONLY)
describe LW_HTML_REPLYTO_FORGED HTML-only message with forged freemail 
reply-to

score  LW_HTML_REPLYTO_FORGED 2.0
tflags LW_HTML_REPLYTO_FORGED noautolearn

I've set it to noautolearn while testing.

Regards,
Lawrence


comparing From and Reply-To:

2010-11-02 Thread Lawrence @ Rogers
As a sort of follow-up to my last message, I was wondering how 
complicated it would be to write a rule that compares the From: and 
Reply-To: headers, and either score it 0.001 or make it a meta rule that 
could be used in conjunction with others?


Would this plugin suffice?

http://wiki.apache.org/spamassassin/FromNotReplyTo

Regards,
Lawrence


Re: comparing From and Reply-To:

2010-11-02 Thread Lawrence @ Rogers

On 02/11/2010 6:43 PM, Chris Conn wrote:

On 2010-11-02 17:01, Lawrence @ Rogers wrote:

As a sort of follow up to my last message, I was wondering how
complicated it is to write a rule that would compare the From: and
Reply-To: headers, and set it to 0.001 or make it a meta rule that could
be used in conjunction with others?

Would this plugin suffice?

http://wiki.apache.org/spamassassin/FromNotReplyTo

Regards,
Lawrence


I use this plugin for precisely that.  We have modified the plugin to 
match particular addresses in order to score highly for phishing and 
whatnot.


Chris


I've gotten it working here and it seems to do exactly what I want: 
compare the 2 e-mail addresses only, and ignore the extra bits like the 
display name.


I've set it to score 0.001 and used it as part of a few meta rules to 
help out with some spam.
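
For example, something like this. It's a rough sketch only: 
FROM_DIFFERS_REPLYTO stands in for whatever rule name the plugin defines 
on your install, while FREEMAIL_REPLYTO and MIME_HTML_ONLY are stock rules.

score FROM_DIFFERS_REPLYTO 0.001
meta  LW_REPLYTO_FREEMAIL_HTML (FROM_DIFFERS_REPLYTO && FREEMAIL_REPLYTO && MIME_HTML_ONLY)
score LW_REPLYTO_FREEMAIL_HTML 1.5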


Re: Rule works in testing, but not hitting live mail

2010-10-29 Thread Lawrence @ Rogers

On 29/10/2010 3:32 PM, NFN Smith wrote:

header LR_OBSC_RECIPS   To =~ /\\\/
Is this rule being used standalone, or as part of a meta rule? Do you 
have a score declared for it? If so, what is it?


Does spamassassin --lint report any errors at the end of its output?

Cheers,
Lawrence


Re: Rule works in testing, but not hitting live mail

2010-10-29 Thread Lawrence @ Rogers

On 29/10/2010 4:06 PM, NFN Smith wrote:

Lawrence @ Rogers wrote:

On 29/10/2010 3:32 PM, NFN Smith wrote:

header LR_OBSC_RECIPS   To =~ /\\\/



Is this rule being used standalone, or as part of a meta rule? Do you
have a score declared for it? If so, what is it?


Right now, I'm scoring at 1.25 points.  Thus, it's not a hidden rule. 
Also, in testing, not only is the rule showing as expected in the 
SpamAssassinReport.txt attachment, the debug log is also showing that 
the rule is firing correctly.


Oct 29 18:30:18.807 [27696] dbg: rules: ran header rule 
LR_OBSC_RECIPS == got hit: 




Does spamassassin --lint report any errors at the end of its output?


I double-checked, and a --lint check comes up clean.

When I apply rules updates to working configurations, I use a script, 
and part of that script includes a --lint check.  If --lint complains, 
then I don't replicate the update to my production servers.


Smith



Are you running it against an e-mail with a known match? Using 
spamassassin -D -t sample-spam.txt and having sample-spam.txt contain 
the complete e-mail including headers?


Are you sure the machine in question doesn't have 2 copies of SA 
installed? (I have seen this before on cPanel servers, with one installed 
via CPAN and the other via RPM.)


Re: Collecting IP reputation data from many people

2010-10-28 Thread Lawrence @ Rogers

On 28/10/2010 1:45 PM, David F. Skoll wrote:

OK,

On a somewhat less sarcastic note:  One reason we didn't use TCP is that
it simply doesn't scale.  If you have clients that open a TCP connection,
do a report, and then close the TCP connection, there's a huge bandwidth
penalty.  On the other hand, if your clients maintain persistent TCP
connections, your server is going to run out of sockets rather quickly.

Remember, our system is designed to scale to tens or hundreds of thousands
of reporting systems sending tens or hundreds of thousands of reports
per second.

Regards,

David.

What reporting system do you use, and how does one make use of the data it 
provides?


Re: rule to catch subject spamming

2010-10-24 Thread Lawrence @ Rogers

On 23/10/2010 5:47 PM, RW wrote:

On Sat, 23 Oct 2010 14:28:38 -0230
Lawrence @ Rogers lawrencewilli...@nl.rogers.com  wrote:


Hello all,

I noticed recently that our users are getting spam with the subject
similar to the following:

SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet


I got some of these a while ago. They were pretty hard to catch because
they came through Hotmail and had little to work with in the body.
I added:


header    SUBJ_LONG_WORD   Subject  =~ /\b[^[:space:][:punct:]]{30}/
describe  SUBJ_LONG_WORD   Longwordinsubjectlikethis
score     SUBJ_LONG_WORD   2.0

header    SUBJ_JOIN_CAP_WORD   Subject  =~ /([[:upper:]]+[[:lower:]]+){5}/
describe  SUBJ_JOIN_CAP_WORD   JoinedCapitalizedWordsRuntogether
score     SUBJ_JOIN_CAP_WORD   1.5


They are missing some ?:, but for single header rules I don't really
care.

Thanks, but some testing showed that your rules FP on URLs in the 
Subject line.


I have settled on the following as it's more specific and less prone to 
FPs (I can't think of any FP scenarios right now).


# Matches a new technique used by spammers in the Subject line
# Running a bunch of pornographic words together (with no spaces) to 
evade spam filters
# The message itself is generally malformed HTML with one or more 
unusually long lines
# This rule is a meta rule that tests for the Subject containing any 
numbers, letters, or common formatting
# Must hit at least 3 SA rules (__LOCAL_SUBJECT_SPAMMY, and 2 others... 
usually HTML_MESSAGE and MIME_QP_LONG_LINE)

# string must be at least 42 characters and contain no spaces

header __LOCAL_SUBJECT_SPAMMY  Subject =~ /^[0-9a-zA-Z,.+]{42,}$/
meta LOCAL_SUBJECT_SPAMMY1  ((__LOCAL_SUBJECT_SPAMMY + HTML_MESSAGE + MIME_QP_LONG_LINE + MPART_ALT_DIFF + TRACKER_ID) > 2)
describe LOCAL_SUBJECT_SPAMMY1  Subject looks spammy (contains a lot of 
characters, and no spaces)

score  LOCAL_SUBJECT_SPAMMY1 5.0
tflags LOCAL_SUBJECT_SPAMMY1 noautolearn

Cheers,
Lawrence Williams
LCWSoft


compare 2 headers

2010-10-24 Thread Lawrence @ Rogers

Hi,

Is there a quick way to compare 2 headers? I am seeing spam lately that 
has an invalid e-mail address (one not hosted by us) set in the To: 
header, but has the intended one in the Envelope-To: header


What I would like to do is take the Envelope-To and run a regex to check 
if the To: header contains it.


Is this possible?

Regards,

Lawrence Williams
LCWSoft


Re: compare 2 headers

2010-10-24 Thread Lawrence @ Rogers

On 24/10/2010 5:44 PM, Karsten Bräckelmann wrote:

On Sun, 2010-10-24 at 16:26 -0230, Lawrence @ Rogers wrote:

Is there a quick way to compare 2 headers? I am seeing spam lately that
has an invalid e-mail address (one not hosted by us) set in the To:
header, but has the intended one in the Envelope-To: header

What I would like to do is take the Envelope-To and run a regex to check
if the To: header contains it.

The To header is merely cosmetic. It does not have any solid meaning, in
particular does not necessarily match the recipient.

There are perfectly valid reasons to not have the actual recipient in
the To header. Ever sent a message with Bcc recipients? Ever received a
post via a mailing list?


I had not thought of that, but you are right :) I see this mailing list 
sets the To: header to users@spamassassin.apache.org, even though the 
e-mail comes to me.


I am writing a rule that deals with spam that claims to be coming from 
AOL's webmail client, where the e-mail has malformed HTML, references to 
remote images, and a high ratio of images to content. I guess I will 
have to find another way to detect them.
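
The direction I'm leaning is something like the sketch below. The X-Mailer 
string is a guess I still have to confirm against real samples, and the 
image-ratio rules are the stock HTML_IMAGE_RATIO_* ones, if I have the 
names right:

header __LW_AOL_WEBMAIL  X-Mailer =~ /AOL/i
meta   LW_FAKE_AOL_IMG   (__LW_AOL_WEBMAIL && (HTML_IMAGE_RATIO_02 || HTML_IMAGE_RATIO_04))
score  LW_FAKE_AOL_IMG   1.0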


Re: compare 2 headers

2010-10-24 Thread Lawrence @ Rogers

On 24/10/2010 9:27 PM, Martin Gregorie wrote:

On Sun, 2010-10-24 at 18:03 -0230, Lawrence @ Rogers wrote:

On 24/10/2010 5:44 PM, Karsten Bräckelmann wrote:

There are perfectly valid reasons to not have the actual recipient in
the To header. Ever sent a message with Bcc recipients? Ever received a
post via a mailing list?



I had not thought of that, but you are right :) I see this mailing list
sets the To: header to users@spamassassin.apache.org, even though the
e-mail comes to me.


You might want to write a very low scoring rule (score 0.01) that fires
on 'List-id' headers for mailing lists you are subscribed to, and use
this in meta rules so you can apply different rules to mail from known
mailing lists versus everything else.


Martin


Thanks, but I decided to go a different route with this one, as Karsten 
was right (it was too risky).


rule to catch subject spamming

2010-10-23 Thread Lawrence @ Rogers

Hello all,

I noticed recently that our users are getting spam with the subject 
similar to the following:


SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet

SpamAssassin seems to be having a hard time determining whether it is 
spam or not because it appears as one long word.


In all cases, the subject contains no spaces (to prevent detection I 
would think) and is longer than 62 characters (not sure why they do 
this, but it is true in every sample I've seen so far).


I would like to create a rule to pick up on this, but I'm having a bit of 
difficulty with the regex for the rule. This is what I've come up with so far:


header CR_SUBJECT_SPAMMY    Subject =~ /.{62}/
describe CR_SUBJECT_SPAMMY  Subject looks spammy (contains a lot of 
characters, and no spaces)

score CR_SUBJECT_SPAMMY 2.5

I just need to modify the regex to check that the Subject contains no 
spaces.
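
Maybe something like this, though I haven't tested it yet (\S would rule 
out any whitespace in the Subject):

header CR_SUBJECT_SPAMMY    Subject =~ /^\S{62,}$/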


I've done some research, and the longest non-coined word in a major 
dictionary is 30 characters long, meaning that if it was used twice in a 
subject, the total length would still only be 60 characters. There may 
be some FPs if the sender used formatting like commas and such, but the 
possibility of them using two such words and then formatting without 
spacing would probably be extremely remote.


Any assistance or advice would be greatly appreciated.

Regards,

Lawrence Williams
LCWSoft


Re: rule to catch subject spamming

2010-10-23 Thread Lawrence @ Rogers

On 23/10/2010 2:28 PM, Lawrence @ Rogers wrote:

Hello all,

I noticed recently that our users are getting spam with the subject 
similar to the following:


SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet

SpamAssassin seems to be having a hard time determining whether it is 
spam or not because it appears as one long word.


In all cases, the subject contains no spaces (to prevent detection I 
would think) and is longer than 62 characters (not sure why they do 
this, but it is true in every sample I've seen so far).


I would like to create a rule to pick up on this, but I'm having a bit of 
difficulty with the regex for the rule. This is what I've come up with 
so far:


header CR_SUBJECT_SPAMMY    Subject =~ /.{62}/
describe CR_SUBJECT_SPAMMY  Subject looks spammy (contains a lot of 
characters, and no spaces)

score CR_SUBJECT_SPAMMY 2.5

I just need to modify the regex to check that the Subject contains no 
spaces.


I've done some research, and the longest non-coined word in a major 
dictionary is 30 characters long, meaning that if it was used twice in 
a subject, the total length would still only be 60 characters. There 
may be some FPs if the sender used formatting like commas and such, 
but the possibility of them using two such words and then formatting 
without spacing would probably be extremely remote.


Any assistance or advice would be greatly appreciated.

Regards,

Lawrence Williams
LCWSoft



This is the rule I've come up with now

# Matches a new technique used by spammers in the Subject line
# Running a bunch of pornographic words together (with no spaces) to evade
# spam filters
# This rule tests for the Subject containing any numbers, letters, or 
common formatting

# string must be at least 42 characters and contain no spaces

header CR_SUBJECT_SPAMMY    Subject =~ /^[0-9a-zA-Z,.+]{42,}$/
describe CR_SUBJECT_SPAMMY  Subject looks spammy (contains a lot of 
characters, and no spaces)

score CR_SUBJECT_SPAMMY 3.5
tflags CR_SUBJECT_SPAMMY noautolearn


prevent rule from being considered for Bayes auto-learning

2010-10-21 Thread Lawrence @ Rogers

Hi,

I recall reading somewhere that there is a way to prevent a rule from 
being considered for Bayes auto-learning. I am trying to create a rule 
that hits upon some obvious spam that I am seeing, yet I want to make 
sure (for now) that any scores it assigns are not used for anything 
Bayes-related. I cannot seem to find any documentation on how to do this 
(Google doesn't help). I think it is something to do with setting a 
tflag, but any guidance would be appreciated.


Regards,

Lawrence Williams
LCWSoft
www.lcwsoft.com


Re: prevent rule from being considered for Bayes auto-learning

2010-10-21 Thread Lawrence @ Rogers

On 21/10/2010 2:17 PM, Karsten Bräckelmann wrote:

On Thu, 2010-10-21 at 18:39 +0200, Karsten Bräckelmann wrote:

See M::SA::Plugin::AutoLearnThreshold. In a nutshell,  (a) there are a
few tflags that will prevent a rule's score from being used for auto-learning
and  (b) the score used is picked from the respective non-bayes
score-set.

With (a) you can make a rule invisible to the auto-learning decision.
And by setting the scores for score-set 0 and 1 both to 0 as per (b),
you can effectively disable a rule unless Bayes is enabled.

... *and* have that rule ignored for the auto-learning decision, if
Bayes and auto-learn is enabled. (Actually not ignored, but adding zero
doesn't influence the result. ;)

The tflags way is much more straight forward, though.



You cannot, however, create a rule to conditionally prevent auto-
learning altogether (which, as I understand isn't what you had in mind
anyway).
Thanks everyone, I have set the rule to noautolearn using the tflags 
directive (this is what I wanted, for the rule to simply not be 
considered when auto-learning).
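
For the archives, a minimal sketch of the two approaches Karsten describes 
(LW_EXAMPLE_RULE is just a placeholder name):

# (a) keep the rule out of the auto-learn decision entirely
tflags LW_EXAMPLE_RULE  noautolearn

# (b) zero score-sets 0 and 1, so the rule only contributes when Bayes is
#     enabled, and adds nothing to the auto-learn calculation otherwise
score  LW_EXAMPLE_RULE  0 0 1.5 1.5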


- Lawrence