On Mon, 24 Oct 2005, [EMAIL PROTECTED] whispered secretively:
> I'm not sure what the SA folks think about this now a days. A while
> back, they removed the checks for MS executables as being spam
> indicators even though the test actually is a very good indicator of
> spam.

That's because it didn't work very well. The new AntiVirus plugin does a
much better job, but note that it is *not* an antivirus plugin despite
the name: it's a suspect-extension-and-content-type detector, so if your
users are in the habit of mailing executables, PowerPoint documents, or
things of that nature around, the plugin will cause FPs.

> Instead, SA is detecting email worms via the Bayesian analysis,
> detecting keywords that match MS executables, even though it doesn't
> do anywhere near as good a job.

That's because there aren't many such keywords.

> Email worms are one of the most dangerous and destructive forms of
> UBE. They directly lead to open proxies that are used for "regular"
> spam. IMHO, they should be paid *more* attention to than "regular"
> spam, not less.

The problem is that the properties of worms are totally different to the
properties of spam. Spam is wildly variable but intended to contain
components that are read by human beings, and the vast majority of
SpamAssassin's rules look for things on that basis. Worms are vast lumps
of mostly-invariant binary data: the regex rules, the URIBL system, and
the Bayesian analyzer are mostly useless on them, and that doesn't
really leave very much bar header analysis (and half of those rules are
useless on worms too).

SA has *no* facilities for spotting patterns in big lumps of binary
data, let alone automated partial disassembly and static behavioural
analysis routines, UPX and OLE unpackers, and so on, like many virus
scanners have. There is almost no overlap between the jobs they have to
do, or between the nature of the emails they trap.

Plus, even with the sa-update system, worms change so fast that, with
SA's regex matching and URIBL rendered useless by the binary-lump nature
of worms, SA would never spot most new worms. (The only reason it spots
most spam is that rules that caught old spam often catch new spam too.
Rules meant to catch old worms pretty much *never* catch new ones
unless, like the MICROSOFT_EXECUTABLE rule, they're so general that they
could easily catch lots of stuff that isn't wormy as well.)

Plus, worms are often so large that scanning them with SA is
astonishingly inefficient. SA is many, many times slower than a
dedicated tool like clamav and can never do as good a job as one. SA
would need *tens of thousands* of individually crafted anti-worm rules
to do as good a job as clamav --- and that's *orders of magnitude* more
rules than SA has right now. It'd become unimaginably slow and immensely
bloated, and would *still* do a bad job.

So even though they're UBE, executable lumps aren't something that SA
can efficiently spot.

(Equally, though, sometimes antivirus tools like clamav start attacking
things that perhaps they shouldn't: clamav catches some phishing scams,
so those of us with corpora have had to stop it rejecting such mails
lest it bias the corpora, as SA *is* intended to catch phish.)

-- 
`"Gun-wielding recluse gunned down by local police" isn't the epitaph
I want. I am hoping for "Witnesses reported the sound up to two hundred
kilometers away" or "Last body part finally located".'
                                                    --- James Nicoll