I had a similar, less expensive thought: checking the global color table (GCT) in the header of each GIF image in a particular message. I tested a couple of spam samples, and the GCTs were identical in every one of my admittedly few test cases.
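
For what it's worth, here is a rough Python sketch of that check (not an SA plugin, just the idea). It assumes the GIF attachments have already been pulled out of the message as raw bytes; the function names are made up for illustration. The header layout comes from the GIF spec: a 6-byte signature, a 7-byte logical screen descriptor whose packed byte carries the GCT flag and size, then the table itself at offset 13.

```python
from typing import List, Optional


def global_color_table(data: bytes) -> Optional[bytes]:
    """Return the raw global color table of a GIF, or None if absent/malformed."""
    # 6-byte signature + 7-byte logical screen descriptor = 13 bytes minimum.
    if len(data) < 13 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    packed = data[10]              # packed fields of the screen descriptor
    if not packed & 0x80:          # bit 7: global color table flag
        return None
    # Bits 0-2 hold N; the table has 2^(N+1) entries of 3 bytes (R, G, B).
    size = 3 * (2 ** ((packed & 0x07) + 1))
    table = data[13:13 + size]
    return table if len(table) == size else None


def share_one_gct(gifs: List[bytes]) -> bool:
    """True if there are at least two GIFs and all carry the same GCT."""
    tables = [global_color_table(g) for g in gifs]
    if len(tables) < 2 or tables[0] is None:
        return False
    return all(t == tables[0] for t in tables)
```

This only reads the first few hundred bytes of each image, so nothing has to be decompressed. Whether identical GCTs are rare enough in legitimate mail to score on is something I have not measured.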

Logan Shaw wrote:
> Looks like people have started to get a grip on the image
> spams that are so popular lately, but here's an additional
> idea I thought I'd toss out.  (I'm not familiar enough with
> SA to easily figure out how to make a plugin.)
>
> Basically, these spams all have a bunch of images which are
> tiles of a larger image.  The tiling thing is, presumably, done
> to avoid checksumming.  Now, here's the thing with tiling:  the
> left edge of one image will be extremely similar to the right
> edge of the one next to it.  And same with top and bottom edges.
>
> So it seems like a useful rule could decompress each of the
> images, take the left and right columns and top and bottom rows
> of each image, and compare those columns and rows to columns
> and rows other images of similar dimensions.  If they correlate
> closely (determined easily enough by subtracting one set of
> pixels from the next), that's a strong indicator they were
> expected to abut, which in turn is a strong indicator of spam.
>
> Of course, this requires decoding the entire image, but the
> analysis after that point should be fairly cheap (compared to
> OCR, for example).
>
>    - Logan
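
For comparison, the edge test Logan describes might look roughly like the Python below. This is only a sketch under some loud assumptions: the images are already decoded into rows of grayscale values (0..255), "correlate closely" is taken to mean a small mean absolute pixel difference, and the threshold of 8.0 is an arbitrary guess, not a tuned value. All names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]  # decoded image: rows of grayscale values 0..255


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average absolute difference between two equal-length pixel strips."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def left_edge(img: Image) -> List[int]:
    return [row[0] for row in img]


def right_edge(img: Image) -> List[int]:
    return [row[-1] for row in img]


def likely_abut(a: Image, b: Image, threshold: float = 8.0) -> bool:
    """True if some edge of `a` closely matches the opposite edge of `b`.

    The threshold is an assumed value for illustration only.
    """
    pairs = []
    if len(a) == len(b):            # same height: could sit side by side
        pairs.append((right_edge(a), left_edge(b)))
        pairs.append((left_edge(a), right_edge(b)))
    if len(a[0]) == len(b[0]):      # same width: could be stacked
        pairs.append((a[-1], b[0]))
        pairs.append((a[0], b[-1]))
    return any(mean_abs_diff(p, q) <= threshold for p, q in pairs)
```

Only four strips per image are compared after decoding, so the extra cost is small next to the decode itself, as Logan notes.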