For those of you running high-volume servers, you no doubt have an abundance of spam to feed into sa-learn, and I suppose that goes for servers of all sizes.
But one question: how do you manage to match that with the same number of hams (real messages)? How do you go about bumping up the numbers to even out the DB? Am I right in saying that basically any mail that's not spam is ham, or is ham only supposed to be mail that was a false positive, i.e. mail that was tagged as spam but isn't really spam?
Here at the university there are 3 admins who, if they wanted to, could read other people's email... Data protection blah blah, but it's simply a side effect of administering the systems.
Putting a random selection of users' ham emails (which could be, and unsurprisingly are, personal) into the filter to balance the DB could be contentious, but it's the only way to get a good selection of emails.
As I said, there are only the 3 of us, but we have around 40000 mailboxes, and the mail of 3 people isn't really a good representation in terms of quality of emails to be feeding into sa-learn as ham. Aside from opening up a mailbox to pleb users and creating more havoc, what are the recommended ways of getting around this?
Thanks,
Ronan
--
Regards
Ronan McGlue
==============
Analyst/Programmer
Information Services
Queens University Belfast
BT7 1NN