2009/4/17 Karsten Bräckelmann <guent...@rudersport.de>:
> On Fri, 2009-04-17 at 16:18 +0100, Matt wrote:
>> Karsten Bräckelmann wrote:
>
>> > Err, Matt, just had a very brief look at the code and the resulting
>> > metas, but -- how is that different?  :)
>>
>> Blame that comment on lack of sleep - I read that as limiting the depth
>> of the tree and not being an n-ary tree.
>
> ;)
>
>> > Hmm, interesting, the results aren't linear. The 8-ary tree performs
>> > much better than the flat meta. However, the 4-ary tree with even fewer
>> > children per node (meta) doesn't improve this further.
>>
>> I haven't had a chance to look at the SA compile code yet to see how it
>> works. I am going to run some more tests to see what impact the
>> different values have on the different parts of the sa-compile process.
>> I *think* that the actual compilation of the .c files was quicker
>> using 4 rather than 8 children.
>
> Which also seems more likely, from a naive point of view without me having
> had any closer look at the code.
>
>> I also want to look at the caching code - as with smaller 4-ary trees
>> the chance of keeping the same blocks would increase - assuming there is
>> some intelligence in the groupings.
>
> By looking at the sub-rules' names I got the impression they are just
> random. But maybe they actually are somehow based on the rule's content?
> Never checked. Justin?

yep, they're derived from a hash of the string.

>> > Btw, there's a minor issue with the additional nodes not being
>> > non-scoring sub-rules and thus scoring a default 1.0. Just to point it
>> > out, I do realize this is a proof-of-concept hack. :)
>>
>> Missed that - ta!  Good job I am not running it in production!
>
> Exactly why I pointed it out. :)
>
>   guenther
>
>
> --
> char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
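
For anyone following along, here is a rough sketch of the n-ary grouping discussed above, with the intermediate nodes named so that they stay non-scoring. It is not Matt's script and not what sa-compile does internally. It assumes the flat meta is a plain OR of its sub-rules, and the function name `build_meta_tree` and the `__<NAME>_L<depth>_<index>` node names are made up for illustration (they are not the hash-derived names mentioned above). The one thing it relies on from SpamAssassin itself is that rule names starting with `__` are treated as sub-rules and get no score, which avoids the default 1.0 issue Karsten pointed out.

```python
def build_meta_tree(top_name, subrules, arity=4):
    """Return 'meta' lines that OR `subrules` together through an n-ary tree.

    Assumption: the original flat meta is a plain OR of all sub-rules.
    Intermediate node names are hypothetical; they start with '__' so that
    SpamAssassin treats them as non-scoring sub-rules.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    if not subrules:
        raise ValueError("need at least one sub-rule")

    lines = []
    level = list(subrules)
    depth = 0
    # Keep grouping until the remaining nodes fit into a single meta.
    while len(level) > arity:
        next_level = []
        for i in range(0, len(level), arity):
            chunk = level[i:i + arity]
            name = f"__{top_name}_L{depth}_{i // arity}"
            lines.append(f"meta {name} ({' || '.join(chunk)})")
            next_level.append(name)
        level = next_level
        depth += 1

    # Only the root keeps a scoring name (no '__' prefix).
    lines.append(f"meta {top_name} ({' || '.join(level)})")
    return lines


if __name__ == "__main__":
    subs = [f"__SUB_{i}" for i in range(10)]
    for line in build_meta_tree("TOP", subs, arity=4):
        print(line)
```

With 10 sub-rules and 4 children per node this emits three intermediate `__TOP_L0_*` metas plus the root `TOP`, so only the root picks up a score.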