On Sun, 2004-02-22 at 17:29, Mary Gardiner wrote:
>  Most people really care about "spam" vs
> "non-spam" but it sounds from your mail like a "spam"/"virus"/"non-spam"
> categorisation might work.

It would need to be a bit smarter than what I'm using now. It would need
to take the email apart and unzip attachments (like clamav does), and
perform analysis on the contents. I'm pretty sure it could work OK if you
indexed on things like library dependencies (which also means you'd need
ldd versions for a bunch of platforms). As I understand it, the Bayes
algorithm doesn't consider word order, so something would need to be
done about that to match code -- unless series of instructions were
taken as single tokens.
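
To make the "series of instructions as single tokens" idea concrete, here
is a rough sketch in Python. It is not what I'm running; the opcode lists
are made up, and ngram_tokens is a hypothetical helper, not part of any
real filter. It only shows that per-item counts lose ordering while
tokens built from consecutive runs keep the local ordering.

```python
from collections import Counter

def ngram_tokens(items, n):
    """Join each run of n consecutive items into a single token."""
    return ["|".join(items[i:i + n]) for i in range(len(items) - n + 1)]

# Two made-up instruction sequences: same opcodes, different order.
seq_a = ["push", "mov", "call", "ret"]
seq_b = ["call", "ret", "push", "mov"]

# Counted as single-item tokens, the two sequences look identical.
same_as_bag = Counter(seq_a) == Counter(seq_b)

# Counted as 2-item tokens, the ordering within each pair is preserved,
# so the two sequences no longer produce the same token counts.
bigrams_a = Counter(ngram_tokens(seq_a, 2))
bigrams_b = Counter(ngram_tokens(seq_b, 2))
same_as_bigrams = bigrams_a == bigrams_b

print(same_as_bag, same_as_bigrams)
```

The resulting tokens could then be fed to the classifier in place of
individual words. Longer runs capture more ordering but would presumably
need more training data, since each distinct run is seen less often.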

Probably getting a bit off-topic, but it's an interesting idea.

> 
> Of course, at present (and perhaps inevitably) pattern matching for
> viruses is much much more reliable than pattern matching for spam.

Well, yeah. I see my virus scanning as a solved problem. I'm already at
100% accuracy, and downloading new virus signatures each day isn't too
onerous for me.

James.


-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
