On Sun, 2004-02-22 at 17:29, Mary Gardiner wrote:
> Most people really care about "spam" vs
> "non-spam" but it sounds from your mail like a "spam"/"virus"/"non-spam"
> categorisation might work.

It would need to be a bit smarter than what I'm using now. It would need
to take the email apart and unzip attachments (like clamav does), and
analyse the contents. I'm pretty sure it could work OK if you did
things like indexing on library dependencies (which also means you'd
need ldd versions for a bunch of platforms).

As I understand it, the Bayes algorithm doesn't consider word order.
Something would need to be done about that to match code, unless
series of instructions were taken as single tokens.

Probably getting a bit off-topic, but it's an interesting idea.

> Of course, at present (and perhaps inevitably) pattern matching for
> viruses is much much more reliable than pattern matching for spam.

Well, yeah. I see my virus scanning as a solved problem. I'm already at
100% accuracy, and downloading new virus signatures each day isn't too
onerous for me.

James.

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
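
To illustrate the point about word order and instruction tokens: below is
a minimal Python sketch, not anything taken from the filter discussed
above. It assumes an instruction stream has already been extracted as a
list of mnemonic strings; the ngram_tokens helper and the sample opcodes
are hypothetical. It shows that single-instruction tokens give a
bag-of-words classifier identical counts for two differently ordered
streams, while treating each run of three instructions as one token does
not.

```python
from collections import Counter


def ngram_tokens(items, n):
    """Join each run of n consecutive items into a single token."""
    return ["_".join(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hypothetical instruction streams: same opcodes, different order.
seq_a = ["push", "mov", "call", "pop"]
seq_b = ["pop", "call", "mov", "push"]

# Single-instruction tokens: an order-blind model sees identical counts.
bag_a = Counter(ngram_tokens(seq_a, 1))
bag_b = Counter(ngram_tokens(seq_b, 1))
same_as_unigrams = bag_a == bag_b

# Three-instruction tokens: the ordering now changes the counts.
tri_a = Counter(ngram_tokens(seq_a, 3))
tri_b = Counter(ngram_tokens(seq_b, 3))
same_as_trigrams = tri_a == tri_b

print("unigram counts equal:", same_as_unigrams)
print("trigram counts equal:", same_as_trigrams)
```

Either set of counts could be fed to a Bayesian filter in place of word
counts; the longer tokens keep some ordering information at the cost of a
much larger and sparser vocabulary.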
