Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-13 Thread Matthew Bramble
Markus,

I've found that the subject test is only slightly useful in the 
scheme of things, and while I know false positives will happen, I 
haven't seen any under that configuration in the last day.  I've now 
stopped monitoring that test as a result.  BTW, it's very good to know 
that this isn't picking up FP's from mail written mainly in other 
languages, albeit western ones.  I see very little legitimate mail from 
overseas, so that is hard for me to test.

My feeling is that once you achieve moderate success with Declude, each 
successive step is that much harder to make.  Combined with the body 
gibberish (which often also trips the subject gibberish) and a test for 
obfuscation, this makes a very noticeable impact.  They're all pretty 
much the same test anyway because they're all markers for the same 
school of thought in spamming.  The types of folks that send from open 
relays or wormed machines are also the types of folks that use a lot of 
these techniques.  I'm now able to fail some messages without any header 
errors because they combine subject spaces, obfuscation, gibberish and 
comments.  These guys seem more concerned with masking the content of 
their messages than they are with masking their masking techniques.  I'm 
fine with that because I think looking for techniques produces fewer 
FP's than looking for content.

So in general, I see all of these things as the same test, and most hits 
will score on at least one other test mentioned.  It's hard to say that 
it didn't have an impact when you could say the same about SUBJECTSPACES 
for instance...something often combined with GIBBERISHSUB.

Right now all I am looking for is loose change in the couch, and I found 
a few more pennies.  I've fixed the major problems with the GIBBERISH 
body filter on my machine, and that makes a much bigger impact on 
results than the subject filter because it picks up fake boundaries and 
links that spammers are using even when they don't include gibberish in 
text and comments (I didn't realize that until yesterday, but it 
accounts for a lot of the hits).  FP's are higher, but nothing has 
failed my machine under the new configuration because of that test.  
I'll post the updated filter once I have 1,000 hits and can put together 
some numbers to go along with it.

Thanks,

Matt

Markus Gufler wrote:

Matt, here are my observations about GIBBERISHSUB:

I've tested this now for over a day on our mailserver (which handles
mainly messages written in German and Italian).
I haven't found any FP's, but every spam message that triggered this test
had already received more than 200% of our hold weight.
However: good test!

Markus

 



---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]
---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type unsubscribe Declude.JunkMail.  The archives can be found
at http://www.mail-archive.com.


Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Frederick Samarelli
Do you use this in addition to, or in place of, the test listed earlier?

GibberishSub.txt

- Original Message - 
From: Matthew Bramble [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, September 12, 2003 2:41 PM
Subject: [Declude.JunkMail] Gibberish body detector + inline Base64


 I've been testing this for almost a day and have had very good results
 with this filter as it is catching spam all the time...over 1/3 of my
 total mail volume is being tagged in fact.

 Here's how it works.  Like the Gibberish subject test, this searches for
 strings of characters not found commonly in communications.  Since
 Base64 encoding has to be scanned with text filters at this time, the
 filter will automatically trip on any Base64 content because of how
 common strings with Q are in the encoding.  In order to offset this
 effect, it searches for "attachment;", which is required for any
 non-inline content, and gives back points.  Since this code isn't
 associated with inline Base64 content, it won't get tripped there and
 has the net effect of acting just like Declude's BASE64 test.  If you
 test this out, you are advised to reduce the score of BASE64 by the
 exact score of this test.  Again, this test gets tripped by all
 attachments, but it doesn't change their score.  I've found that inline
 BASE64 only accounts for less than 20% of the hits.
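
The scoring idea described above can be sketched in Python.  This is only an illustration, not Declude's implementation.  It assumes that matching is case-insensitive, that any digraph hit adds the test weight of 5 once, and that the "attachment;" marker gives 5 points back once.  The function and constant names are hypothetical.

```python
# Hypothetical model of the GIBBERISH body filter; NOT Declude's actual engine.
# Digraph list copied from the filter file posted in this thread.
DIGRAPHS = (
    "qb qc qd qf qg qh qi qj qk qm qn qo qp qr qs qt qv qx qy qz "
    "vq wq tq jq xd xj xk xr xz zb zc zf zj zk zm zx"
).split()

BASE_WEIGHT = 5          # from the usage line "... Gibberish.txt x 5 0"
ATTACHMENT_OFFSET = -5   # from "BODY -5 CONTAINS attachment;"


def gibberish_score(body: str) -> int:
    """Return the net weight this sketch would assign to a message body."""
    text = body.lower()  # assumption: CONTAINS ignores case
    score = 0
    # Assumption: the test weight is applied once if any digraph is present.
    if any(digraph in text for digraph in DIGRAPHS):
        score += BASE_WEIGHT
    # The attachment marker hands the points back, so tagged Base64
    # attachments net out to zero while inline Base64 keeps the full score.
    if "attachment;" in text:
        score += ATTACHMENT_OFFSET
    return score


if __name__ == "__main__":
    print(gibberish_score("Meeting at noon tomorrow"))
    print(gibberish_score("buy now xzqkt pills"))
```

Note that under these assumptions a message carrying the marker but no digraph would end up with a negative score, which is one reason to treat the offset with care.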

 If you don't use BASE64 test because of foreign languages or other
 similar issues, that test can be scored negatively in order to offset
 the effects of the inline detection by this filter so that only
 displayable text and HTML will produce a change in score.  That includes
 non-displayable gibberish text in brackets.

 False positives are bound to happen, however their occurrence is fairly
 low.  Since HTML code is also searched, it will find matches in some
 URL's, especially ones with a tracking capability such as those used by
 Yahoo! Groups (in the ad sent with listserv postings) and Buy.com, and
 even less often it will find a match in regular wording, primarily with
 the use of acronyms.  I'm very interested in hearing about more FP's if
 you find them.

 The filter is designed to be used with v1.75 of Declude with
 decoding left on (the default).  It can be modified to work with older
 versions of Declude by changing the "attachment;" offset to "base64", in
 which case it won't detect any Base64 unless it is not appropriately
 tagged (useful).

 I think this is a killer test.  Enjoy.

 Matt









 # GIBBERISH
 # Last Update: 09/12/2003
 #
 # Description:
 # Finds gibberish in the body of the message, including comment blocks.
 # Will be triggered on any Base64 encoding due to how common Q combinations
 # are.  A negative weight for attachments defeats the test, however inline
 # base64 encoded content will receive full scoring.  The BASE64 test should
 # be reduced by the score of this test in order to compensate for this fact.
 #
 # Usage:
 # GIBBERISH filter C:\IMail\Declude\Gibberish.txt x 5 0
 #
 # False Positives
 # Will result primarily from URL's containing random looking strings.
 # Known offenders include Buy.com and Yahoo! Groups.



 # The following defeats the test if it finds an attachment.

 BODY -5 CONTAINS attachment;


 # Small list of letter combinations not found in a basic dictionary.

 BODY 0 CONTAINS qb
 BODY 0 CONTAINS qc
 BODY 0 CONTAINS qd
 BODY 0 CONTAINS qf
 BODY 0 CONTAINS qg
 BODY 0 CONTAINS qh
 BODY 0 CONTAINS qi
 BODY 0 CONTAINS qj
 BODY 0 CONTAINS qk
 BODY 0 CONTAINS qm
 BODY 0 CONTAINS qn
 BODY 0 CONTAINS qo
 BODY 0 CONTAINS qp
 BODY 0 CONTAINS qr
 BODY 0 CONTAINS qs
 BODY 0 CONTAINS qt
 BODY 0 CONTAINS qv
 BODY 0 CONTAINS qx
 BODY 0 CONTAINS qy
 BODY 0 CONTAINS qz

 BODY 0 CONTAINS vq
 BODY 0 CONTAINS wq
 BODY 0 CONTAINS tq
 BODY 0 CONTAINS jq

 BODY 0 CONTAINS xd
 BODY 0 CONTAINS xj
 BODY 0 CONTAINS xk
 BODY 0 CONTAINS xr
 BODY 0 CONTAINS xz

 BODY 0 CONTAINS zb
 BODY 0 CONTAINS zc
 BODY 0 CONTAINS zf
 BODY 0 CONTAINS zj
 BODY 0 CONTAINS zk
 BODY 0 CONTAINS zm
 BODY 0 CONTAINS zx



Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Matthew Bramble
Frederick Samarelli wrote:

Do you use this in addition to, or in place of, the test listed earlier?

It's completely separate from the GIBBERISHSUB filter.  I updated the list 
of keywords in the subject filter so that it is the same as the one I 
just posted, after finding FP's on the acronym 'QE'EG (Quantitative 
Electroencephalogram) and the word bamboo'zl'e.  Depending on how you 
score it, that might not matter all that much.

My latest version of GIBBERISHSUB is attached.  I started dating them 
whenever I make changes, in the event that helps anyone who wants to use 
them.

You also might want to whitelist declude.com if you are using these 
filters :)

Matt
# GIBBERISHSUB
# 09/11/2003
#
# Description:
# Built to look for random strings of text (gibberish) in the subject of a
# message by searching for character combinations that aren't common in
# E-mail communications.  Will be triggered on any Base64 encoding due to
# the code marker used to tell the mail client to display the proper
# character set.  A negative weight for the same code marker defeats the
# test in order to protect from false positives on the encoded content.
#
# Usage:
# GIBBERISHSUB filter C:\IMail\Declude\GibberishSub.txt x 5 0
#
# False Positives:
# Very rare.  Would be primarily attributed to randomly generated codes,
# acronyms and misspellings.


# The following defeats the test if it finds the subject is not sent as ASCII

SUBJECT -5 CONTAINS ?b?


# Small list of letter combinations not found in a basic dictionary.

SUBJECT 0 CONTAINS qb
SUBJECT 0 CONTAINS qc
SUBJECT 0 CONTAINS qd
SUBJECT 0 CONTAINS qf
SUBJECT 0 CONTAINS qg
SUBJECT 0 CONTAINS qh
SUBJECT 0 CONTAINS qi
SUBJECT 0 CONTAINS qj
SUBJECT 0 CONTAINS qk
SUBJECT 0 CONTAINS qm
SUBJECT 0 CONTAINS qn
SUBJECT 0 CONTAINS qo
SUBJECT 0 CONTAINS qp
SUBJECT 0 CONTAINS qr
SUBJECT 0 CONTAINS qs
SUBJECT 0 CONTAINS qt
SUBJECT 0 CONTAINS qv
SUBJECT 0 CONTAINS qx
SUBJECT 0 CONTAINS qy
SUBJECT 0 CONTAINS qz

SUBJECT 0 CONTAINS vq
SUBJECT 0 CONTAINS wq
SUBJECT 0 CONTAINS tq
SUBJECT 0 CONTAINS jq

SUBJECT 0 CONTAINS xd
SUBJECT 0 CONTAINS xj
SUBJECT 0 CONTAINS xk
SUBJECT 0 CONTAINS xr
SUBJECT 0 CONTAINS xz

SUBJECT 0 CONTAINS zb
SUBJECT 0 CONTAINS zc
SUBJECT 0 CONTAINS zf
SUBJECT 0 CONTAINS zj
SUBJECT 0 CONTAINS zk
SUBJECT 0 CONTAINS zm
SUBJECT 0 CONTAINS zx

Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Frederick Samarelli
Thanks


Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Matthew Bramble
Thanks Josh.  I'm sure there are more exceptions to come as well, but 
hopefully only a handful.  BTW, I did whitelist declude.com, so no 
problems here with reading anything just as long as Scott doesn't start 
using these filters with a high score :)  Your message also 
definitively answered the whitelisting question: John was right that 
all it does is defeat the scoring...my capture account still grabbed a 
copy of the message.

Could you post the full headers of that message with PGP, as well as 
any boundary code that might have been above the PGP signature?  I 
could only find one example in 7 years of E-mail :)   Just to be 
responsible with resources, it would be better to search the headers 
rather than the body.  If folks haven't realized this yet, filtering 
the entire body with attachments can pull a lot of processing power, 
and it can be bad with very large files.  My dual 1 GHz machine, which 
generally sits in the low single digits of CPU usage, pulled about 50% 
for several seconds on a 14 MB attachment using a different filter with 
over 1,000 lines of BODY CONTAINS.  I assume that PGP signatures should 
be marked in the headers as an attachment, i.e. application/pgp-signature.  If 
there are exceptions to this, then the BODY makes sense.
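
As a rough illustration of the header-first idea, here is a minimal Python sketch using the standard library's email package.  It is not a Declude feature, and the function name is hypothetical.  It checks the MIME part headers for application/pgp-signature first, and only falls back to looking for the armor line in the body when the message is not multipart, which is the case for a clearsigned message like the one quoted below.

```python
from email import message_from_string

PGP_ARMOR_LINE = "-----BEGIN PGP SIGNATURE-----"


def has_pgp_signature(raw_message: str) -> bool:
    """Return True if the message appears to carry a PGP signature."""
    msg = message_from_string(raw_message)

    # Cheap check first: PGP/MIME puts the signature in its own MIME part,
    # so only the part headers need to be inspected.
    for part in msg.walk():
        if part.get_content_type() == "application/pgp-signature":
            return True

    # Clearsigned (inline) PGP has no MIME marker, so the body has to be
    # searched.  Multipart messages are skipped here to avoid scanning
    # large attachments.
    if not msg.is_multipart():
        payload = msg.get_payload()
        return isinstance(payload, str) and PGP_ARMOR_LINE in payload
    return False
```

The point of the ordering is the one made above: the header check is cheap, and the body scan only runs in the case where no MIME structure exists to consult.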

This is still a filter in progress.  I have another false positive that 
I just caught from an inline image that didn't trip the BASE64 filter 
or contain the attachment marker.  This is accepted behavior for 
E-mail, so I'm going to have to figure out another way to not score 
such content.  It will probably end up necessary to place the exception 
testing in a different filter so that it doesn't hit more than one 
exception at a time.  Spammers use inline images on a rare occasion and 
I would hate to take extra points away from them.

And thanks to Kami for the kind words :)

BTW, both gibberish filters should remove the qo combination due to 
'QO'S.  I'll post another copy of my file when I figure out the PGP and 
inline problems.  If anyone has any pointers on other inline Base64 
stuff, I'd appreciate hearing it.  It's important to exclude everything 
that the BASE64 test doesn't catch, so knowing the strict criteria 
there helps (i.e. what does it look for).  This might also include 
needing to exclude some inline text, I'm not sure yet.  Still works 
pretty good though.

Thanks,

Matt

Joshua Levitsky wrote:

Question:  Below is a PGP signed message. Notice that it will fail 
your gibberish body test. I would suggest that just like you look for 
attachment in the body, that you also give -5 points to BEGIN PGP 
SIGNATURE  because you are for sure going to see gibberish contained 
in a PGP or GPG signature.

Hope this helps in your spam fighting.

-Josh



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Sep 12, 2003, at 5:15 PM, Matthew Bramble wrote:

Frederick Sama

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0.2
iQA/AwUBP2JPAXx8sPj6XQb+EQLuaACgi2cdS7XaOKLfIaVCJ96un+/iGc8AnjBq
DtlxcebkqwzfEpYOzCDFo5CG
=m4KE
-----END PGP SIGNATURE-----




Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Matthew Bramble
Someone pointed me to a problem with PGP that needs to be fixed with 
this filter, and there are still some other issues as well.  This is 
still a filter in progress.

I have another false positive that I just caught from an inline image 
that didn't trip the BASE64 filter or contain the attachment marker.  
This is standard behavior for E-mail, so I'm going to have to figure out 
another way to not score such content.  It will probably end up 
necessary to place the exception testing in a different filter so that 
it doesn't hit more than one exception at a time.  Spammers use inline 
images on a rare occasion and I would hate to take extra points away 
from them.

BTW, both gibberish filters should remove the qo combination due to 
'QO'S, qb because of 'QB', qv because of 'QV'C, and qi because of 'Qi' 
and other Chinese names.  The list of combinations is starting to get 
smaller, however there is a limit to how tight the test should be.  I've 
been using Google as a benchmark for letter combinations: qu, for 
instance, scores 41,500,000 results (allowed), qb scores 2,600,000 
results, qi scores 2,360,000, but jq only scores 838,000.  It seems that 
anything around 1,500,000 or less is about as good as it gets.  These 
counts don't include cases where the letters appear inside a dictionary 
word, though that should be almost nonexistent.  The goal is to find the 
least common of all.  Needless to say, there are enough exceptions to 
justify a low score no matter how refined it is, however it seems to be 
scoring about 98% valid hits on spam even with the obvious limitations.  I'll 
post another copy of my file when I figure out the PGP and inline 
problems.  If anyone has any pointers on other inline Base64 stuff, I'd 
appreciate hearing it.  It's important to exclude everything that the 
BASE64 test doesn't catch, so knowing the strict criteria there helps 
(i.e. what does it look for).  This might also include needing to 
exclude some inline text, I'm not sure yet.  Still works pretty good 
though.
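
The benchmark described above amounts to a simple threshold on result counts.  A minimal Python sketch, using only the figures quoted in this message, follows.  The cutoff of 1,500,000 is the rough number given above, and the names are hypothetical.

```python
# Google result counts as quoted in this message (September 2003).
GOOGLE_HITS = {
    "qu": 41_500_000,
    "qb": 2_600_000,
    "qi": 2_360_000,
    "jq": 838_000,
}

# "Anything around 1,500,000 or less is about as good as it gets."
THRESHOLD = 1_500_000


def rare_enough(hits, threshold=THRESHOLD):
    """Return the combinations uncommon enough to keep in the filter."""
    return sorted(combo for combo, count in hits.items() if count <= threshold)


print(rare_enough(GOOGLE_HITS))  # ['jq']
```

Of the four combinations quoted, only jq falls under the cutoff, which matches the decision to drop qb and qi.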

And thanks to Kami for the kind words :)

Thanks,

Matt





Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Frederick Samarelli
Matt,

How well does this work?

BODY -5 CONTAINS attachment

I noticed it did not counterweight a photo attachment.

Fred


Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Matthew Bramble
Fred,

That was referenced in my last post.  I'm trying to figure out the best 
counterweight method.  That should only happen with an inline attached 
file (images can be sent both ways).  Someone gave me a good 
recommendation for a fix and I'm researching it.  There are other FP's 
that, while rare, could likely also be stopped.

Still though, it's about 98% accurate on files it adds a score to even 
with obvious flaws, and I can only find one E-mail that failed 
improperly because of the added weight out of 1193 caught by the filter 
in the last 24 hours (that E-mail failed multiple other tests as well of 
course).  It's hard to tell, though, how many E-mails were scored out of 
the total, meaning those that didn't have attachments tagged in 
the boundaries, but I'm guessing more than 2/3 didn't.  BTW, I'm not 
counting messages on the topic for obvious reasons.

All in all, the messages most likely to fail even accidentally are still 
spam (having links with random characters, which isn't desired for this 
test but can't be avoided).  The rate at which this is accurate is far 
better than other tests like HELOBOGUS for example, but on the other 
hand, spammers almost always fake the HELO while they don't always 
include gibberish.

I'm probably going to reduce my weight just to be safe, especially from 
FP's in both the subject and the body from the same string of 
characters.  I'm thinking that 3/10 is more appropriate for each.

Add the test with a 0 score, and add another 0-score test for just the 
attachment line, so you can see what would get scored.  If they both 
appear in the headers, the message wouldn't get scored; the remainder 
should either be mostly spam, or very low scoring in the first place.  
Note that Yahoo has boundaries and ads that will trigger this test, and 
that should be counterweighted with the line:

REVDNS -5 ENDSWITH .yahoo.com

I've gotten a lot of good feedback in PM's and when I get it to work 
more accurately, I'll post the configuration.  Nevertheless, it's pretty 
workable as is, though it depends on the entirety of your config.

Matt



Frederick Samarelli wrote:

Matt,

How well does this work?

BODY -5 CONTAINS attachment

I noticed it did not counterweight a photo attachment.

Fred
 





Re: [Declude.JunkMail] Gibberish body detector + inline Base64

2003-09-12 Thread Joshua Levitsky
On Sep 12, 2003, at 10:15 PM, Frederick Samarelli wrote:

Matt,

How well does this work?

BODY -5 CONTAINS attachment

I noticed it did not counterweight a photo attachment.


I think what would help this filter and others like it would be if 
Scott could make it so you could have a line in a filter that read like

BODY PASS CONTAINS attachment;
BODY FAIL CONTAINS this_is_a_bad_word

That way you could have rules that counteract a filter as a safety, but 
they aren't given a value. With the PASS line above if even one case of 
attachment; showed up then the test would pass rather than failing, 
and at the same time it would be better than the current

BODY	-5	CONTAINS	attachment;

Because if multiples are in the email then something could easily gain 
a lot of negative weight which would hurt the effectiveness of the 
test.
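
To make the difference concrete, here is a small Python sketch comparing the two behaviours.  It is an illustration only.  PASS and FAIL are the proposed keywords from this message, not something Declude is known to support, and the additive model assumes, as the concern above suggests, that every occurrence of a pattern adds its weight.  The function names are hypothetical.

```python
def additive_score(body, weighted_rules):
    """Weighted model: each occurrence of a pattern adds that rule's weight.

    Assumption taken from the concern above, not verified against Declude.
    """
    return sum(weight * body.count(pattern) for weight, pattern in weighted_rules)


def flagged_with_pass_fail(body, pass_patterns, fail_patterns):
    """Proposed model: any PASS match wins outright, otherwise any FAIL match flags."""
    if any(pattern in body for pattern in pass_patterns):
        return False
    return any(pattern in body for pattern in fail_patterns)


rules = [(-5, "attachment;"), (5, "this_is_a_bad_word")]
body = "attachment; a\nattachment; b\nthis_is_a_bad_word"

# Two attachment markers outweigh the single hit in the weighted model.
print(additive_score(body, rules))  # -5
# With PASS/FAIL the message simply passes and no weight accumulates.
print(flagged_with_pass_fail(body, ["attachment;"], ["this_is_a_bad_word"]))  # False
```

In the weighted model the message ends up with net negative weight that could offset other tests, while the PASS/FAIL model only decides whether this one test is tripped.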

Scott: is this possible to add? Is it easy?

-Josh
