On Thu, 2009-04-16 at 21:58 +0100, Matt wrote:
> Karsten Bräckelmann wrote:
> >
> > Wonder why that is -- due to the excessively long metas? The sub-rules'
> > REs are quite trivial.
> >
> > Would constructing it using a binary (or n-ary, with small upper bound
> > of n) tree speed the compilation up?
> 
> Using a slightly different method - using a maximum number of children 
> parser.  The times were taken after deleting the ~/.spamassassin folder 
> before each run.

Err, Matt, just had a very brief look at the code and the resulting
metas, but -- how is that different? :)

The result is exactly the tree structure I mentioned, limiting the
number of children per node.


> Before
> real    21m24.068s

> After using 4 children
> real    13m7.309s

> After using 8 children
> real    12m28.601s

Hmm, interesting, the results aren't linear. The 8-ary tree performs
much better than the flat meta. However, the 4-ary tree with even less
children per node (meta) doesn't improve this further.

Btw, there's a minor issue with the additional nodes not being non-
scoring sub-rules and thus scoring a default 1.0. Just to point it out,
I do realize this is a proof-of-concept hack. :)

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Reply via email to