Jim Maul writes:
> Justin Mason wrote:
> > John D. Hardin writes:
> >> On Tue, 22 Jan 2008, George Georgalis wrote:
> >>
> >>> On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
> >>>
> >>>> Neither am I. Another thing to consider is the fraction of defined
> >>>> rules that actually hit and affect the score is rather small. The
> >>>> greatest optimization would be to not test REs you know will fail;
> >>>> but how do you do *that*?
> >>> thanks for all the followups on my inquiry. I'm glad the topic is/was
> >>> considered and it looks like there is some room for development, but
> >>> I now realize it is not as simple as I thought it might have been.
> >>> In answer to above question, maybe the tests need their own scoring?
> >>> eg fast tests and with big spam scores get a higher test score than
> >>> slow tests with low spam scores.
> >>>
> >>> maybe if there was some way to establish a hierarchy at startup
> >>> which groups rule processing into nodes. some nodes finish
> >>> quickly, some have dependencies, some are negative, etc.
> >> Loren mentioned to me in a private email: "common subexpressions".
> >>
> >> It would be theoretically possible to analyze all the rules in a given
> >> set (e.g. body rules) to extract common subexpressions and develop a
> >> processing/pruning tree based on that. You'd probably gain some
> >> performance scanning messages, but at the cost of how much
> >> startup/compiling time?
> >
> > I experimented with this concept in my sa-compile work, but I could
> > achieve any speedup on real-world mixed spam/ham datasets.
> >
> > Feel free to give it a try though ;)
> >
> > --j.
> >
> >
>
> You do mean *couldn't* achieve any speedup, correct?
yep
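
For readers wondering what the "common subexpression" pruning idea discussed above might look like in practice, here is a minimal sketch in Python (SpamAssassin itself is Perl). The rule names and patterns are hypothetical, and the "longest literal run" heuristic is deliberately crude; it assumes each pattern contains a required literal substring, which real rules with alternation would not guarantee.

```python
import re
from collections import defaultdict

# Hypothetical body rules: name -> regex source (not real SpamAssassin rules).
RULES = {
    "VIAGRA_OBFU":   r"v[i1]agra\s+pills",
    "VIAGRA_PRICE":  r"viagra.{0,20}\$\d+",
    "LOTTERY_WIN":   r"you\s+have\s+won\s+.{0,30}lottery",
    "LOTTERY_CLAIM": r"claim\s+your\s+lottery\s+prize",
}

def longest_literal(pattern):
    """Crude common-subexpression extraction: longest run of literal letters.
    Only safe when that run is a required substring of any match."""
    runs = re.findall(r"[a-z]{3,}", pattern)
    return max(runs, key=len) if runs else ""

# Build the pruning index once at startup: shared literal -> rules using it.
INDEX = defaultdict(list)
for name, pat in RULES.items():
    INDEX[longest_literal(pat)].append((name, re.compile(pat, re.I)))

def scan(body):
    """Run only the rules whose cheap literal pre-test actually hits."""
    hits = []
    lowered = body.lower()
    for literal, rules in INDEX.items():
        if literal and literal not in lowered:
            continue  # one substring check prunes the whole group of REs
        for name, rx in rules:
            if rx.search(body):
                hits.append(name)
    return hits

print(scan("You have WON the national lottery! Claim your lottery prize now."))
# ['LOTTERY_WIN', 'LOTTERY_CLAIM']
```

The trade-off the thread describes shows up directly here: the index costs extra work at startup, and whether the per-message substring checks pay for themselves depends on how many rule groups can actually be skipped on real mixed spam/ham traffic.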