I've been receiving a steady stream of spam with the following body
format:

[3 centered images, referenced by URLs]
[A short line of random characters, or sometimes a <style> tag]
[Random_character_string][a random word][2nd_random_string]
Last element repeated many times ....

There will be several hundred lines starting with the same 40- to
80-character Random_character_string, followed by a random word, different in each
line, followed by a consistent, repeated 2nd_random_string. The
repeated strings are different for each spam, but consistent throughout
each one. Occasionally a 2nd similar line with different random
elements is inserted, but the repeated portions of most lines are
frequent enough to identify the spam. 
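
For anyone who wants to exercise a filter without a live sample, here is a
minimal sketch that builds a synthetic body in the layout described above.
Everything in it is an assumption made for illustration: the function name,
the example.invalid image URLs, the word list, and the string lengths are
placeholders, not taken from real spam.

```python
import random
import string

def make_sample_body(n_lines=200, prefix_len=60, suffix_len=30, seed=0):
    """Build a synthetic message body shaped like the spam described above."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    # One random prefix and one random suffix, reused on every gibberish line.
    prefix = ''.join(rng.choice(alphabet) for _ in range(prefix_len))
    suffix = ''.join(rng.choice(alphabet) for _ in range(suffix_len))
    # Stand-in vocabulary for the "random word" in the middle (hypothetical).
    words = ['apple', 'river', 'candle', 'orbit', 'meadow']
    # Three image references, using placeholder URLs.
    lines = ['<img src="http://example.invalid/%d.png">' % i for i in range(3)]
    # The short line of random characters.
    lines.append(''.join(rng.choice(alphabet) for _ in range(12)))
    # The repeated element: same prefix, varying word, same suffix.
    for _ in range(n_lines):
        lines.append(prefix + rng.choice(words) + suffix)
    return lines
```

Writing these lines to a file, one per line, gives a body that a
prefix-repetition check should flag.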

I've been successfully using a filter on my personal mail to sideline
these via a dynamic delivery instruction, and have generalized this to a
Python filter module, attached.

The header and match information isn't returned in the SMTP dialog, for
security reasons, but you can uncomment the line invoking syslog in the
gibDetect method to log more details about the spam.

/etc/pythonfilter.conf contains:

# gibberish: check message for repetitive gibberish lines
gibberish

/etc/pythonfilter-modules.conf contains:

[gibberish.py]
maxMsgSize = 2000000
checkLines = 400
gibLines = 40
gibChars = 10

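These are explained in the module code itself, but basically, for any
email no larger than maxMsgSize bytes the module examines the first
checkLines lines (400 with the settings above; headers are included in
the count) and looks for gibLines consecutive lines starting with the
same gibChars characters. A prefix that contains a space or is shorter
than gibChars is never counted.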
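
To try different thresholds without a running Courier installation, the
check can be sketched as a standalone function over a list of lines. This
is only an illustration of the same logic as the attached module; the
function and parameter names here are hypothetical and not part of
pythonfilter.

```python
def find_repeated_prefix(lines, gib_lines=40, gib_chars=10):
    """Return the shared prefix once enough consecutive lines repeat it, else None."""
    last_prefix = ''
    repeats = 0
    for line in lines:
        prefix = line[:gib_chars]
        # A candidate prefix must be full length and contain no spaces,
        # which keeps ordinary prose and short lines from matching.
        if prefix == last_prefix and len(prefix) == gib_chars and ' ' not in prefix:
            repeats += 1
            if repeats >= gib_lines:
                return prefix
        else:
            # Different prefix: remember it and start counting again.
            last_prefix = prefix
            repeats = 0
    return None
```

Note that the counter tracks repeats of the previous line's prefix, so a
match needs gib_lines + 1 lines in a row with the same prefix.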

Gordon, take a look at this code and if you have any suggestions please
post them.

-- 
Lindsay Haisley       | "UNIX is user-friendly, it just
FMP Computer Services |       chooses its friends."
512-259-1190          |          -- Andreas Bogk
http://www.fmp.com    |

#!/usr/bin/python
# vim: set expandtab ai ts=4:

import sys
import os.path
import courier.config
import courier.control
import courier.xfilter
import syslog as S

maxMsgSize = 2000000
# Maximum message size. Pass if larger.

checkLines = 100
# Number of lines (including headers) to check for repetitive gibberish

gibLines = 40
# Number of consecutive gibberish lines required for rejection

gibChars = 10
# Number of characters to check in each line for repetitive gibberish

def initFilter():
    courier.config.applyModuleConfig('gibberish.py', globals())
    # Record in the system log that this filter was initialized.
    sys.stderr.write('Initialized the "gibberish" python filter\n')

def gibDetect(bf):
    a = []
    bfh = open(bf)
    try:
        for i in range(checkLines):
            a.append(bfh.readline())
    finally:
        # Close the message file even if reading fails.
        bfh.close()
    
    lfcount = 0
    lcount = 0
    lastlf = ''
    subject = ''
    
    for l in a:
        if not subject:
            if l[:8] == "Subject:":
                # Strip the leading space and trailing newline from the value.
                subject = l[8:].strip()
                continue
    
        lf = l[:gibChars]
        if lf == lastlf and len(lf) == gibChars and " " not in lf:
            lfcount += 1
            if lfcount >= gibLines:
#                S.syslog(S.LOG_INFO | S.LOG_MAIL, "gibberish: %s: match: %s" % (subject, lastlf))
                return ("gibberish: %s" % subject)
        else:
            lastlf = lf
            lfcount = 0
    return None
    
def doFilter(bodyFile, controlFileList):
    msgSize = os.path.getsize(bodyFile)
    if msgSize > maxMsgSize:
        return ''

    n = gibDetect(bodyFile)
    if n:
        sender = courier.control.getSendersMta(controlFileList) 
        return "500 gibberish spam from %s" % sender
    return ''

