At 10:26 PM 8/21/2006 -0700, John Rudd wrote:
>I also heard that interlaced gif spam is appearing now.

Yes, I saw that post, however there wasn't a publicly available sample.
Any such would be much appreciated.

>It'd be interesting to see how to counter them.

Should be easy.  One approach is "pixel density".  What I've been doing is
reading JUST enough of the header to calculate the area (just like Dallas'
excellent ImageInfo plugin), then dividing by the total raw file size of
just the image (i.e. what one gets after base64 decoding just the GIF part),
less the size of the obvious parts of the header.  Works well, and is
blindingly fast.
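
A minimal Python sketch of the density calculation described above, not the actual code. It makes two assumptions: that "the obvious parts of the header" means the 13-byte signature plus logical screen descriptor and any global color table, and that `data` is the already base64-decoded GIF. The function name `gif_pixel_density` is made up for illustration.

```python
import struct


def gif_pixel_density(data: bytes) -> float:
    """Return logical-screen pixels per byte of GIF payload.

    Only the first 13 bytes are parsed; everything else is just counted.
    """
    if len(data) < 13 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")

    # Logical screen descriptor: width and height (little-endian 16-bit),
    # followed by a packed flags byte.
    width, height, packed = struct.unpack_from("<HHB", data, 6)

    # Assumption: "obvious header" = 13 fixed bytes + global color table.
    header_len = 13
    if packed & 0x80:  # global color table present
        header_len += 3 * (1 << ((packed & 0x07) + 1))

    payload = len(data) - header_len
    if payload <= 0:
        raise ValueError("truncated GIF")

    return (width * height) / payload
```

Since only the fixed header is parsed, the cost per message is a slice, one `struct` call and a division.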

Ham generally has a much LOWER density, because it's typically clipart,
whereas spam is generally text, which compresses extremely well, resulting
in a much HIGHER density.  It's not foolproof, so I use a sliding scale,
and have had only one FP this month (from an idiot (redundant) recruiter to
one of my testers - the PNG misfiring was only half the points required to
reject, and the able idiot managed to do several other things rare in Ham).
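
One way a sliding scale like this could be expressed is a linear ramp between two density thresholds. This is only an illustration in Python: the function name `density_points` and its parameters `low`, `high` and `max_points` are hypothetical, and no actual threshold values are implied.

```python
def density_points(density: float, low: float, high: float,
                   max_points: float) -> float:
    """Map a pixel density onto a score between 0 and max_points.

    At or below `low` the image scores nothing, at or above `high` it
    scores `max_points`, and in between the score rises linearly.
    All three parameters are placeholders to be tuned against a corpus.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if density <= low:
        return 0.0
    if density >= high:
        return float(max_points)
    return max_points * (density - low) / (high - low)
```

Capping `max_points` below the reject threshold keeps a single misfire from rejecting a message on its own.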

The beauty is that the spammer can "easily" foil this by lowering the
density by adding more complexity, which increases the file size, so more
bandwidth is consumed. :)

Some stock spams do use a fancier font which scores lower, so I'm still 
considering other types of analysis as a backup.


Specifically to address animated GIFs, it would be very easy to "walk" the 
raw image, calculating each frame's pixel density, simply ignoring the 
obvious chaff frames.
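
A rough Python sketch of what such a frame walk might look like, following the GIF block layout (extensions start with 0x21, image descriptors with 0x2C, the trailer is 0x3B). It assumes a frame's density is its area divided by its compressed data bytes, and it treats "chaff" as any frame under a hypothetical `min_area` cutoff. Both are guesses, and `gif_frame_densities` is an invented name.

```python
import struct


def _skip_sub_blocks(data, pos):
    """Skip a chain of data sub-blocks; return (new position, data bytes)."""
    total = 0
    while True:
        size = data[pos]
        pos += 1
        if size == 0:  # zero-length block terminates the chain
            return pos, total
        total += size
        pos += size


def gif_frame_densities(data, min_area=0):
    """Return [(width, height, density), ...] for each frame kept.

    Truncated input raises IndexError or struct.error.
    """
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    pos = 13
    if data[10] & 0x80:  # skip the global color table
        pos += 3 * (1 << ((data[10] & 0x07) + 1))

    frames = []
    while pos < len(data):
        marker = data[pos]
        pos += 1
        if marker == 0x3B:  # trailer
            break
        if marker == 0x21:  # extension: label byte, then sub-blocks
            pos, _ = _skip_sub_blocks(data, pos + 1)
        elif marker == 0x2C:  # image descriptor
            _left, _top, width, height, packed = struct.unpack_from(
                "<HHHHB", data, pos)
            pos += 9
            if packed & 0x80:  # skip the local color table
                pos += 3 * (1 << ((packed & 0x07) + 1))
            pos += 1  # LZW minimum code size byte
            pos, nbytes = _skip_sub_blocks(data, pos)
            # Assumption: frames below min_area count as chaff.
            if nbytes and width * height >= min_area:
                frames.append((width, height, width * height / nbytes))
        else:
            raise ValueError(f"unexpected block 0x{marker:02X}")
    return frames
```

Nothing is decompressed; the walk only reads block lengths, so it should stay cheap even for multi-frame images.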

Tomorrow, I'll write some code to decompose the frames and see what sort of 
numbers I get.

>For interlaced ... I have no idea.  Depends a lot on how the interlaced 
>images are stored, I guess.

Yes, exactly.  Until there are samples, I'm not going to worry about it.

What we also need is a diverse Ham GIF corpus.  Does anyone know of one?
        - "Chip"

P.S.  Dallas:  it never occurred to me to _JUST_ score the area.  My pixel 
density approach fails on multi-GIFs, so you saved my bacon there. ;)

