Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Markus,

I've found that the subject test is only slightly useful in the scheme of things. I know false positives will happen, but I haven't seen any under that configuration in the last day, so I've stopped monitoring that test. BTW, it's very good to know that this isn't picking up FP's from mail written mainly in other languages, albeit western ones. I see very little real mail from overseas, so that is hard for me to test.

My feeling is that once you achieve moderate success with Declude, each successive step is that much harder to make. Combined with the body gibberish test (which often also trips the subject gibberish test) and a test for obfuscation, this makes a very noticeable impact. They're all pretty much the same test anyway because they're all markers for the same school of thought in spamming. The types of folks that send from open relays or wormed machines are also the types of folks that use a lot of these techniques. I'm now able to fail some messages without any header errors because they combine subject spaces, obfuscation, gibberish and comments. These guys seem more concerned with masking the content of their messages than they are with masking their masking techniques. I'm fine with that because I think looking for techniques produces fewer FP's than looking for content.

So in general, I see all of these things as the same test, and most hits will score on at least one other test mentioned. It's hard to say that it didn't have an impact when you could say the same about SUBJECTSPACES for instance...something often combined with GIBBERISHSUB. Right now all I am looking for is loose change in the couch, and I found a few more pennies.

I've fixed the major problems with the GIBBERISH body filter on my machine, and that makes a much bigger impact on results than the subject filter because it picks up fake boundaries and links that spammers are using even when they don't include gibberish in text and comments (I didn't realize that until yesterday, but it accounts for a lot of the hits). FP's are higher, but nothing has failed my machine under the new configuration because of that test. I'll post the updated filter once I have 1,000 hits and can put together some numbers to go along with it.

Thanks,

Matt

Markus Gufler wrote:

Matt,

here are my observations about GIBBERISHSUB: I've tested this now for over a day on our mail server (which handles mainly messages written in German and Italian). I haven't found any FP's, but every spam message triggering this test had already received more than 200% of our hold weight.

However: good test!

Markus

---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]

---
This E-mail came from the Declude.JunkMail mailing list. To unsubscribe, just send an E-mail to [EMAIL PROTECTED], and type unsubscribe Declude.JunkMail. The archives can be found at http://www.mail-archive.com.
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Do you use this in addition to, or in place of, the test listed earlier, GibberishSub.txt?

----- Original Message -----
From: Matthew Bramble [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, September 12, 2003 2:41 PM
Subject: [Declude.JunkMail] Gibberish body detector + inline Base64

I've been testing this for almost a day and have had very good results with this filter as it is catching spam all the time...over 1/3 of my total mail volume is being tagged in fact.

Here's how it works. Like the Gibberish subject test, this searches for strings of characters not commonly found in communications. Since Base64 encoding has to be scanned with text filters at this time, the filter will automatically trip on any Base64 content because of how common strings with Q are in the encoding. In order to offset this effect, it searches for attachment; which is required for any non-inline content, and gives back points. Since this code isn't associated with inline Base64 content, it won't get tripped there and has the net effect of acting just like Declude's BASE64 test.

If you test this out, you are advised to reduce the score of BASE64 by the exact score of this test. Again, this test gets tripped by all attachments, but it doesn't change their score. I've found that inline BASE64 accounts for less than 20% of the hits. If you don't use the BASE64 test because of foreign languages or other similar issues, that test can be scored negatively in order to offset the effects of the inline detection by this filter so that only displayable text and HTML will produce a change in score. That includes non-displayable gibberish text in brackets.

False positives are bound to happen, however their occurrence is fairly low. Since HTML code is also searched, it will find matches in some URL's, especially ones with a tracking capability such as those used by Yahoo! Groups (in the ad sent with listserv postings) and Buy.com, and even less often it will find a match in regular wording, primarily with the use of acronyms. I'm very interested in hearing about more FP's if you find them.

The filter is designed to be used with v1.75 of Declude with decoding left on (the default). It can be modified to work with older versions of Declude by changing the attachment; offset to base64, in which case it won't detect any Base64 unless it is not appropriately tagged (useful).

I think this is a killer test. Enjoy.

Matt

# GIBBERISH
# Last Update: 09/12/2003
#
# Description:
#   Finds gibberish in the body of the message, including comment blocks. Will be triggered on
#   any Base64 encoding due to how common Q combinations are. A negative weight for attachments
#   defeats the test, however inline base64 encoded content will receive full scoring. The BASE64
#   test should be reduced by the score of this test in order to compensate for this fact.
#
# Usage:
#   GIBBERISH filter C:\IMail\Declude\Gibberish.txt x 5 0
#
# False Positives:
#   Will result primarily from URL's containing random looking strings. Known offenders include
#   Buy.com and Yahoo! Groups.

# The following defeats the test if it finds an attachment.
BODY -5 CONTAINS attachment;

# Small list of letter combinations not found in a basic dictionary.
BODY 0 CONTAINS qb
BODY 0 CONTAINS qc
BODY 0 CONTAINS qd
BODY 0 CONTAINS qf
BODY 0 CONTAINS qg
BODY 0 CONTAINS qh
BODY 0 CONTAINS qi
BODY 0 CONTAINS qj
BODY 0 CONTAINS qk
BODY 0 CONTAINS qm
BODY 0 CONTAINS qn
BODY 0 CONTAINS qo
BODY 0 CONTAINS qp
BODY 0 CONTAINS qr
BODY 0 CONTAINS qs
BODY 0 CONTAINS qt
BODY 0 CONTAINS qv
BODY 0 CONTAINS qx
BODY 0 CONTAINS qy
BODY 0 CONTAINS qz
BODY 0 CONTAINS vq
BODY 0 CONTAINS wq
BODY 0 CONTAINS tq
BODY 0 CONTAINS jq
BODY 0 CONTAINS xd
BODY 0 CONTAINS xj
BODY 0 CONTAINS xk
BODY 0 CONTAINS xr
BODY 0 CONTAINS xz
BODY 0 CONTAINS zb
BODY 0 CONTAINS zc
BODY 0 CONTAINS zf
BODY 0 CONTAINS zj
BODY 0 CONTAINS zk
BODY 0 CONTAINS zm
BODY 0 CONTAINS zx
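To see why any Base64 block trips the list, here is a minimal Python sketch (not part of the original post). It encodes an arbitrary run of bytes and checks the result against the letter combinations in the filter above. It assumes the CONTAINS match ignores case, which is what the false positives on acronyms reported in this thread suggest; the helper name hits and the sample payload are illustrative only.

```python
import base64

# Letter combinations copied from the GIBBERISH filter posted above.
COMBOS = """qb qc qd qf qg qh qi qj qk qm qn qo qp qr qs qt qv qx qy qz
vq wq tq jq xd xj xk xr xz zb zc zf zj zk zm zx""".split()


def hits(text):
    """Return the combinations found in text, ignoring case (assumed behaviour)."""
    lowered = text.lower()
    return [combo for combo in COMBOS if combo in lowered]


# Arbitrary sample payload: every byte value once, Base64-encoded.
encoded = base64.b64encode(bytes(range(256))).decode("ascii")

print(hits("Please find the quarterly report below."))  # ordinary wording
print(hits(encoded))                                     # Base64 text
```

Ordinary wording produces no hits here, while the encoded payload matches several combinations, which is the effect the attachment; counterweight is meant to cancel.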
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Frederick Samarelli wrote:

Do you this in addition to or in replace of the tested listed earlier.

It's completely separate from the GIBBERISHSUB filter. I updated the list of keywords in the subject filter so that it is the same as the one I just posted after finding FP's on the acronym 'QE'EG (Quantitative Electroencephalogram) and bamboo'zl'e. Depending on how you score it, that might not matter all that much.

My latest version of GIBBERISHSUB is attached. I started dating them whenever I make changes in the event that helps anyone that wants to use them.

You also might want to whitelist declude.com if you are using these filters :)

Matt

# GIBBERISHSUB
# 09/11/2003
#
# Description:
#   Built to look for random strings of text (gibberish) in the subject of a message by searching
#   for character combinations that aren't common in E-mail communications. Will be triggered on
#   any Base64 encoding due to the code marker used to tell the mail client to display the proper
#   character set. A negative weight for the same code marker defeats the test in order to
#   protect from false positives on the encoded content.
#
# Usage:
#   GIBBERISHSUB filter C:\IMail\Declude\GibberishSub.txt x 5 0
#
# False Positives:
#   Very rare. Would be primarily attributed to randomly generated codes, acronyms and
#   misspellings.

# The following defeats the test if it finds the subject is not sent as ASCII
SUBJECT -5 CONTAINS ?b?

# Small list of letter combinations not found in a basic dictionary.
SUBJECT 0 CONTAINS qb
SUBJECT 0 CONTAINS qc
SUBJECT 0 CONTAINS qd
SUBJECT 0 CONTAINS qf
SUBJECT 0 CONTAINS qg
SUBJECT 0 CONTAINS qh
SUBJECT 0 CONTAINS qi
SUBJECT 0 CONTAINS qj
SUBJECT 0 CONTAINS qk
SUBJECT 0 CONTAINS qm
SUBJECT 0 CONTAINS qn
SUBJECT 0 CONTAINS qo
SUBJECT 0 CONTAINS qp
SUBJECT 0 CONTAINS qr
SUBJECT 0 CONTAINS qs
SUBJECT 0 CONTAINS qt
SUBJECT 0 CONTAINS qv
SUBJECT 0 CONTAINS qx
SUBJECT 0 CONTAINS qy
SUBJECT 0 CONTAINS qz
SUBJECT 0 CONTAINS vq
SUBJECT 0 CONTAINS wq
SUBJECT 0 CONTAINS tq
SUBJECT 0 CONTAINS jq
SUBJECT 0 CONTAINS xd
SUBJECT 0 CONTAINS xj
SUBJECT 0 CONTAINS xk
SUBJECT 0 CONTAINS xr
SUBJECT 0 CONTAINS xz
SUBJECT 0 CONTAINS zb
SUBJECT 0 CONTAINS zc
SUBJECT 0 CONTAINS zf
SUBJECT 0 CONTAINS zj
SUBJECT 0 CONTAINS zk
SUBJECT 0 CONTAINS zm
SUBJECT 0 CONTAINS zx
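The ?b? counterweight in the filter above appears to target the marker that MIME encoded-words (RFC 2047) carry when a subject is Base64-encoded. As a rough illustration only, the Python sketch below builds such a subject by hand and checks for the marker. The sample subject and the function names are made up for the example, and case-insensitive matching is an assumption.

```python
import base64


def encode_subject_b(subject, charset="utf-8"):
    """Build an RFC 2047 'B' (Base64) encoded-word by hand, for illustration."""
    payload = base64.b64encode(subject.encode(charset)).decode("ascii")
    return "=?{}?B?{}?=".format(charset, payload)


def has_b_marker(raw_subject):
    """Check for the ?b? marker, ignoring case (assumed behaviour)."""
    return "?b?" in raw_subject.lower()


raw = encode_subject_b("Grüße aus Südtirol")  # hypothetical non-ASCII subject
print(raw)
print(has_b_marker(raw))                   # encoded subject carries the marker
print(has_b_marker("Meeting on Tuesday"))  # plain ASCII subject does not
```

Because the encoded payload is itself Base64, it can contain the same letter combinations the filter hunts for, which is why the marker is given a negative weight.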
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Thanks

----- Original Message -----
From: Matthew Bramble [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, September 12, 2003 5:15 PM
Subject: Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Thanks Josh. I'm sure there are more exceptions to come as well, but hopefully only a handful. BTW, I did whitelist declude.com, so no problems here with reading anything just as long as Scott doesn't start using these filters with a high score :) Your message also definitively answered the whitelisting question: John was right that all it does is defeat the scoring...my capture account still grabbed a copy of the message.

Could you post the full headers of that message with PGP, as well as any boundary code that might have been above the PGP signature? I could only find one example in 7 years of E-mail :) Just to be responsible with resources, it would be better to search the headers rather than the body. If folks haven't realized this yet, filtering the entire body with attachments can pull a lot of processing power, and it can be bad with very large files. My dual 1 GHz machine, which generally bounces in the low single digits, pulled about 50% for several seconds on a 14 MB attachment using a different filter with over 1,000 lines of BODY CONTAINS. I assume that PGP signatures should be marked in the headers as an attachment, i.e. application/pgp-signature. If there are exceptions to this, then the BODY makes sense.

This is still a filter in progress. I have another false positive that I just caught from an inline image that didn't trip the BASE64 filter or contain the attachment marker. This is accepted behavior for E-mail, so I'm going to have to figure out another way to not score such content. It will probably end up necessary to place the exception testing in a different filter so that it doesn't hit more than one exception at a time. Spammers use inline images on a rare occasion and I would hate to take extra points away from them.

And thanks to Kami for the kind words :)

BTW, both gibberish filters should remove the qo combination due to 'QO'S.

I'll post another copy of my file when I figure out the PGP and inline problems. If anyone has any pointers on other inline Base64 stuff, I'd appreciate hearing it. It's important to exclude everything that the BASE64 test doesn't catch, so knowing the strict criteria there helps (i.e. what does it look for). This might also include needing to exclude some inline text, I'm not sure yet. Still works pretty good though.

Thanks,

Matt

Joshua Levitsky wrote:

Question: Below is a PGP signed message. Notice that it will fail your gibberish body test. I would suggest that, just like you look for attachment in the body, you also give -5 points to BEGIN PGP SIGNATURE, because you are for sure going to see gibberish contained in a PGP or GPG signature.

Hope this helps in your spam fighting.

-Josh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 12, 2003, at 5:15 PM, Matthew Bramble wrote:
Frederick Sama

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0.2

iQA/AwUBP2JPAXx8sPj6XQb+EQLuaACgi2cdS7XaOKLfIaVCJ96un+/iGc8AnjBq
DtlxcebkqwzfEpYOzCDFo5CG
=m4KE
-----END PGP SIGNATURE-----
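The interaction between the gibberish lines and the counterweights discussed here can be sketched in Python. This is a simplified model and not Declude's actual engine: it assumes a line with weight 0 trips the test once at the test's own weight (the 5 on the usage line), that each non-zero line adds its weight once when it matches, and that matching ignores case. The rule subsets and the function name score are illustrative.

```python
TEST_WEIGHT = 5  # the weight on the test definition line (assumed meaning)


def score(body, rules, test_weight=TEST_WEIGHT):
    """Assumed model: 0-weight lines trip the test once; others add their weight."""
    text = body.lower()
    tripped = False
    adjustment = 0
    for weight, needle in rules:
        if needle.lower() in text:
            if weight == 0:
                tripped = True
            else:
                adjustment += weight
    return (test_weight if tripped else 0) + adjustment


# Small subset of the posted filter, plus the PGP exception suggested above.
BASE_RULES = [(-5, "attachment;"), (0, "qb"), (0, "jq"), (0, "zx")]
PGP_RULES = BASE_RULES + [(-5, "BEGIN PGP SIGNATURE")]

signature = (
    "-----BEGIN PGP SIGNATURE-----\n"
    "iQA/AwUBP2JPAXx8sPj6XQb+EQLuaACgi2cdS7XaOKLfIaVCJ96un+/iGc8AnjBq\n"
    "-----END PGP SIGNATURE-----\n"
)
with_attachment = "Content-Disposition: attachment; filename=photo.jpg\n" + signature

print(score(signature, BASE_RULES))       # signature alone trips the test
print(score(signature, PGP_RULES))        # PGP exception cancels it
print(score(with_attachment, PGP_RULES))  # two exceptions stack below zero
```

Under these assumptions the last case ends up with a net negative score, which is the reason given above for moving the exception testing into a separate filter so that no more than one exception applies at a time.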
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Someone pointed me to a problem with PGP that needs to be fixed with this filter, and there are still some other issues as well.

This is still a filter in progress. I have another false positive that I just caught from an inline image that didn't trip the BASE64 filter or contain the attachment marker. This is standard behavior for E-mail, so I'm going to have to figure out another way to not score such content. It will probably end up necessary to place the exception testing in a different filter so that it doesn't hit more than one exception at a time. Spammers use inline images on a rare occasion and I would hate to take extra points away from them.

BTW, both gibberish filters should remove the qo combination due to 'QO'S, qb because of 'QB', qv because of 'QV'C, and qi because of 'Qi' and other Chinese names. The list of combinations is starting to get smaller, however there is a limit to how tight the test should be. I've been using Google as a benchmark for letter combinations: qu for instance scores 41,500,000 results (allowed), qb scores 2,600,000 results, qi scores 2,360,000, but jq only scores 838,000. Seems that anything around 1,500,000 or less is about as good as it gets. This doesn't include, though, when the letters appear inside of a dictionary word, and that should be almost nonexistent. The goal is to find the least common of all. Needless to say, there are enough exceptions to score low no matter how refined it is, however it seems to be scoring about 98% valid hits on spam even with the obvious limitations.

I'll post another copy of my file when I figure out the PGP and inline problems. If anyone has any pointers on other inline Base64 stuff, I'd appreciate hearing it. It's important to exclude everything that the BASE64 test doesn't catch, so knowing the strict criteria there helps (i.e. what does it look for). This might also include needing to exclude some inline text, I'm not sure yet. Still works pretty good though.

And thanks to Kami for the kind words :)

Thanks,

Matt
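The Google benchmark described above amounts to a simple threshold check. A small Python sketch using only the four counts quoted in the message follows; treating 1,500,000 as a hard cutoff is a simplification of "around 1,500,000 or less", and the names are illustrative.

```python
# Result counts quoted in the message above (September 2003).
GOOGLE_HITS = {
    "qu": 41_500_000,
    "qb": 2_600_000,
    "qi": 2_360_000,
    "jq": 838_000,
}
CUTOFF = 1_500_000  # rough ceiling for a combination to stay in the filter


def keep_in_filter(combo, hits=GOOGLE_HITS, cutoff=CUTOFF):
    """A combination is rare enough to keep if its count is at or below the cutoff."""
    return hits[combo] <= cutoff


kept = sorted(c for c in GOOGLE_HITS if keep_in_filter(c))
dropped = sorted(c for c in GOOGLE_HITS if not keep_in_filter(c))
print("keep:", kept)
print("drop:", dropped)
```

With these numbers only jq stays under the cutoff, which lines up with qb and qi being removed from the list for their known false positives.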
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Matt,

How well does this work?

BODY -5 CONTAINS attachment

I noticed it did not counterweight a photo attachment.

Fred

----- Original Message -----
From: Matthew Bramble [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, September 12, 2003 2:41 PM
Subject: [Declude.JunkMail] Gibberish body detector + inline Base64
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
Fred,

That was referenced in my last post. I'm trying to figure out the best counterweight method. That should only happen with an inline attached file (images can be sent both ways). Someone gave me a good recommendation for a fix and I'm researching it. There are other FP's that, while rare, could likely also be stopped.

Still though, it's about 98% accurate on files it adds a score to even with obvious flaws, and I can only find one E-mail that failed improperly because of the added weight out of 1193 caught by the filter in the last 24 hours (that E-mail failed multiple other tests as well of course). It's hard to tell, though, how many of those E-mails actually had their score changed, meaning that they didn't have attachments tagged in the boundaries, but I'm guessing more than 2/3 didn't. BTW, I'm not counting messages on the topic for obvious reasons.

All in all, the messages most likely to fail even accidentally are still spam (having links with random characters, which isn't desired for this test but can't be avoided). The rate at which this is accurate is far better than other tests like HELOBOGUS for example, but on the other hand, spammers almost always fake the HELO while they don't always include gibberish.

I'm probably going to reduce my weight just to be safe, especially from FP's in both the subject and the body from the same string of characters. I'm thinking that 3/10 is more appropriate for each.

Add the test as a 0 score and add another 0 test for just the attachment line so you can see what would get scored. If they both appear in the headers, the message wouldn't get scored; the remainder should either be mostly spam, or very low scoring in the first place. Note that Yahoo has boundaries and ads that will trigger this test, and that should be counterweighted with the line:

REVDNS -5 ENDSWITH .yahoo.com

I've gotten a lot of good feedback in PM's and when I get it to work more accurately, I'll post the configuration. Nevertheless, it's pretty workable as is, though it depends on the entirety of your config.

Matt

Frederick Samarelli wrote:

Matt, How well does this work. BODY -5 CONTAINS attachment I noticed it did not counter weight a photo attachment. Fred
Re: [Declude.JunkMail] Gibberish body detector + inline Base64
On Sep 12, 2003, at 10:15 PM, Frederick Samarelli wrote:

Matt, How well does this work. BODY -5 CONTAINS attachment I noticed it did not counter weight a photo attachment.

I think what would help this filter and others like it would be if Scott could make it so you could have lines in a filter that read like:

BODY PASS CONTAINS attachment;
BODY FAIL CONTAINS this_is_a_bad_word

That way you could have rules that counteract a filter as a safety, but they aren't given a value. With the PASS line above, if even one case of attachment; showed up then the test would pass rather than fail, and at the same time it would be better than the current

BODY -5 CONTAINS attachment;

because if multiples are in the email then something could easily gain a lot of negative weight, which would hurt the effectiveness of the test.

Scott: is this possible to add? Is it easy?

-Josh
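The PASS/FAIL idea proposed above could behave roughly like the Python sketch below. This is purely hypothetical: PASS and FAIL are a feature request in this message, not existing Declude syntax, and the function name, return labels, and case-insensitive matching are assumptions made for the example.

```python
def proposed_result(body, pass_needles, fail_needles):
    """Hypothetical PASS/FAIL semantics: any PASS match wins outright,
    otherwise any FAIL match fails the test. No weights are involved."""
    text = body.lower()
    if any(needle.lower() in text for needle in pass_needles):
        return "pass"
    if any(needle.lower() in text for needle in fail_needles):
        return "fail"
    return "pass"


PASS_NEEDLES = ["attachment;"]
FAIL_NEEDLES = ["this_is_a_bad_word"]

spam = "buy now this_is_a_bad_word"
mixed = "Content-Disposition: attachment;\nthis_is_a_bad_word"
doubled = "attachment; attachment; this_is_a_bad_word"

print(proposed_result(spam, PASS_NEEDLES, FAIL_NEEDLES))
print(proposed_result(mixed, PASS_NEEDLES, FAIL_NEEDLES))
print(proposed_result(doubled, PASS_NEEDLES, FAIL_NEEDLES))
```

The point of the design is that repeated or multiple exception matches cannot pile up negative weight: the outcome is a plain pass or fail regardless of how many exception lines hit.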