On Fri, 13 Mar 2009, decoder wrote:

> You create one model file once by feeding it a large corpus of ham+spam.
>
> The problem is that feeding does not work with an SVM algorithm. You have to train on the _whole_ set _always_, so feeding in mails incrementally is impractical.
>
> That's why you do this process _once_ with a lot of ham and spam. You can repeat it at any time, but it isn't necessary to do so continually.
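To make the whole-set constraint concrete, here is a toy sketch (plain Python; `train()` is just a placeholder standing in for the real, expensive SVM optimization, not the actual trainer):

```python
# An SVM has no incremental update step, so "feeding" one new mail
# still means a full training pass over the entire corpus.
def train(corpus):
    """Placeholder for the whole-set SVM training pass."""
    # In reality this is the costly optimization over every message.
    return {'trained_on': len(corpus)}

corpus = ['mail%d' % i for i in range(10000)]
model = train(corpus)        # the once-off batch run

corpus.append('new_mail')    # "feeding" one more message...
model = train(corpus)        # ...forces a full retrain, not an update
```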

I assume it learns from full message corpora? And all it cares about is the rules that hit?

Per my earlier suggestion of learning off the logs + corpora to fix FP/FN, could there be an option to learn from generated minimal corpus files, whose structure is just the rules hit per message (msgid + hits on one possibly very long line)? e.g.:

<kggbph.617...@localhost> 
BAYES_99,FORGED_RCVD_HELO,L_SOME_STD_PROBS,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RBL_PSBL_01,RCVD_IN_BRBL,RCVD_IN_NJABL_SPAM,SARE_FROM_SPAM_MONEY2,STOX_30,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL

Then an external tool could generate and maintain these files from the SA log and the maintained training corpora, omitting FP/FN from the log data.
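Such a tool could be quite small. A rough sketch follows; the log line format and regex here are assumptions for illustration, not the actual spamd output, so the pattern would need adapting to whatever your logger emits:

```python
# Sketch of the proposed external tool: turn SA result-log lines into
# minimal training-corpus lines of the form "<msgid> RULE1,RULE2,...".
import re

# Assumed, simplified log line (NOT the exact spamd format):
#   result: Y 15 - BAYES_99,URIBL_BLACK mid=<kggbph.617...@localhost>
LOG_RE = re.compile(r'result: [YN] \S+ - (?P<hits>\S+) .*mid=(?P<mid><[^>]+>)')

def minimal_corpus_lines(log_lines, exclude_mids=()):
    """Yield 'msgid hits' lines, skipping known-FP/FN message-ids."""
    skip = set(exclude_mids)
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m or m.group('mid') in skip:
            continue
        yield '%s %s' % (m.group('mid'), m.group('hits'))
```

The FP/FN filtering is just a message-id exclusion list here; a real tool would take those ids from whoever maintains the exception corpora.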

This is just intended to include in training the high- and low-scoring (obviously spam/ham) messages, which may not appear in the training corpora if training is mostly exception-based.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  It is not the place of government to make right every tragedy and
  woe that befalls every resident of the nation.
-----------------------------------------------------------------------
 Tomorrow: Albert Einstein's 130th Birthday
