On Thu, 2009-04-16 at 21:58 +0100, Matt wrote: > Karsten Bräckelmann wrote: > > > > Wonder why that is -- due to the excessively long metas? The sub-rules' > > REs are quite trivial. > > > > Would constructing it using a binary (or n-ary, with small upper bound > > of n) tree speed the compilation up? > > Using a slightly different method - using a maximum number of children > parser. The times were taken after deleting the ~/.spamassassin folder > before each run.
Err, Matt, just had a very brief look at the code and the resulting metas, but -- how is that different? :) The result is exactly the tree structure I mentioned, limiting the number of children per node. > Before > real 21m24.068s > After using 4 children > real 13m7.309s > After using 8 children > real 12m28.601s Hmm, interesting, the results aren't linear. The 8-ary tree performs much better than the flat meta. However, the 4-ary tree with even less children per node (meta) doesn't improve this further. Btw, there's a minor issue with the additional nodes not being non- scoring sub-rules and thus scoring a default 1.0. Just to point it out, I do realize this is a proof-of-concept hack. :) guenther -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}