On 13-Mar-2009, at 15:24, John Hardin wrote:
On Fri, 13 Mar 2009, decoder wrote:
You create one model file once by feeding it a large corpus of ham + spam.
The problem is that incremental feeding does not work with an SVM algorithm. You have to train on the _whole_ set _always_, so feeding in individual mails is impractical.
That's why you do this process _once_ with a lot of ham and spam. You can repeat it at any time, but it isn't necessary to do so continually.
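To make that concrete, here is a minimal sketch (not SA code; it assumes a scikit-learn-style SVM purely for illustration) of why the model is retrained in batch rather than fed message by message:

  # Sketch only: SVM fitting is batch-only, unlike Bayes token
  # counting; there is no incremental update for a fitted model.
  from sklearn.svm import LinearSVC

  def retrain(vectors, labels):
      """Fit a fresh model on the _entire_ ham+spam corpus."""
      model = LinearSVC()
      model.fit(vectors, labels)   # refit from scratch every time
      return model

  # "Feeding" one new message means calling retrain() again on the
  # full corpus plus that message, not updating the existing model.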
I assume it learns from full message corpora? And all it cares about is the rules that hit?
Per my earlier suggestion of learning off the logs + corpora to fix FP/FN, could there be an option to learn off generated minimal corpus files, with their structure being just the rules hit per message (msgid + hits on one possibly very long line)? e.g.:
<kggbph.617...@localhost> BAYES_99,FORGED_RCVD_HELO,L_SOME_STD_PROBS,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RBL_PSBL_01,RCVD_IN_BRBL,RCVD_IN_NJABL_SPAM,SARE_FROM_SPAM_MONEY2,STOX_30,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL
Then an external tool could generate and maintain these files from the SA log and the maintained training corpora, omitting FP/FN from the log data.
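Something like this could do it; a rough sketch (the spamd "result:" line shape, the mid= field position, and the output file names are my assumptions here, so adjust the regex to the actual log output):

  #!/usr/bin/env python
  # Sketch: turn spamd "result:" log lines into minimal corpus files
  # of the form "<msgid> RULE1,RULE2,...", omitting known FP/FN.
  import re
  import sys

  # Assumed line shape: "result: Y 21 - RULE1,RULE2 ... mid=<...>,..."
  RESULT = re.compile(r'result: ([YN]) \S+ \S+ (\S+) .*mid=(<[^>]+>)')

  def records(log_lines, fp_fn_msgids):
      """Yield (is_spam, "<msgid> hits") pairs, skipping FP/FN."""
      for line in log_lines:
          m = RESULT.search(line)
          if not m:
              continue
          verdict, hits, msgid = m.groups()
          if msgid in fp_fn_msgids:   # omit misclassified messages
              continue
          yield verdict == 'Y', '%s %s' % (msgid, hits)

  if __name__ == '__main__':
      exclude = set()   # hypothetical: load your maintained FP/FN list
      with open('spam.hits', 'w') as spam, open('ham.hits', 'w') as ham:
          for is_spam, rec in records(sys.stdin, exclude):
              (spam if is_spam else ham).write(rec + '\n')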
This is an excellent idea, but it also needs rule hits on ham, right?
--
Though it's cold and lonely in the deep dark night I can see
paradise by the dashboard light.