From: Matt Kettler [mailto:[EMAIL PROTECTED]
> 1) perl has a substantial base of text parsing and utility libraries
> that no other language can match.. Java does have native regex support,
> so it has a leg up over the others,

Right, but neither language is particularly well suited to scoring a message: 
both apply every rule, one after another, to the very same piece of text.

It would be interesting, instead, to "invert" this approach by designing a 
finite state machine (FSM) which is basically a pre-compiled version of the 
whole rule body. You feed the message in once, and you get the results (i.e. 
the fired rules and/or the message score).
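
To make the idea concrete, here is a minimal sketch of such a machine, written 
in Python only for brevity. It uses the Aho-Corasick construction, which covers 
literal-string rules only; real SA rules are regular expressions and would need 
a more general automaton. The rule names, literals and scores are made up for 
illustration, and compile_rules/scan are hypothetical names, not SA functions.

```python
from collections import deque

def compile_rules(rules):
    """Pre-compile {name: (literal, score)} into Aho-Corasick tables."""
    goto, fail, out = [{}], [0], [set()]   # state 0 is the start state
    for name, (literal, _score) in rules.items():
        state = 0
        for ch in literal:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(name)               # this state completes the rule
    # Breadth-first pass: compute failure links and inherit outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def scan(automaton, rules, text):
    """Read the text once; return the fired rule names and the total score."""
    goto, fail, out = automaton
    state, fired = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        fired |= out[state]
    return fired, sum(rules[name][1] for name in fired)

# Made-up rules, for illustration only.
rules = {
    "VIAGRA": ("viagra", 2.5),
    "FREE_MONEY": ("free money", 1.5),
    "MONEY": ("money", 0.5),
}
automaton = compile_rules(rules)
fired, score = scan(automaton, rules, "get free money now")
print(sorted(fired), score)   # ['FREE_MONEY', 'MONEY'] 2.0
```

The point of the sketch is that scan() touches each character of the message 
once, however many rules have been compiled in.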

I believe that this approach would considerably reduce both memory 
consumption and execution time.

It would not be suitable for custom plugins, however. But all the standard 
rules (even the "expensive" ones in terms of computational power and memory 
footprint) would probably perform better this way.

The basic idea in the FSM model is that the pre-compiler would run only 
occasionally, for instance when a rule is changed, added to or deleted from the 
rule body. The pre-compiler could eventually even optimize the resulting FSM, 
perhaps by "merging" paths shared by different rules. The .cf file syntax would 
not even need to change, and this method could even allow a new, pre-compiled 
version of the rule body to be injected into a running spamassassin.
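
A rough sketch of these two points, again in Python and again for literal 
rules only: a prefix tree in which rules sharing a prefix share states, and a 
holder that recompiles only when the rule set changes and swaps the new tables 
in once they are fully built. All the names here (build_trie, CompiledRules, 
refresh) are hypothetical; a real daemon would also need locking around the 
swap, which is left out.

```python
import hashlib

def build_trie(literals):
    """Build a prefix tree; literals with a common prefix share states."""
    root, states = {}, 1                  # count the root as one state
    for lit in literals:
        node = root
        for ch in lit:
            if ch not in node:
                node[ch] = {}
                states += 1
            node = node[ch]
        node[None] = lit                  # end marker: a rule completes here
    return root, states

class CompiledRules:
    """Holds the compiled rule body and rebuilds it only on change."""

    def __init__(self):
        self.digest = None
        self.trie = None
        self.states = 0
        self.compile_count = 0

    def refresh(self, literals):
        """Recompile if the rule set changed; return True if it did."""
        joined = "\n".join(sorted(literals))
        digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        if digest == self.digest:
            return False                  # nothing changed, keep old tables
        trie, states = build_trie(literals)
        # Swap only after the new tables are complete, so the previous ones
        # stay usable while compilation is in progress.
        self.trie, self.states, self.digest = trie, states, digest
        self.compile_count += 1
        return True

# Made-up literals, for illustration only.
literals = ["free money", "free offer", "free"]
engine = CompiledRules()
engine.refresh(literals)
print(engine.states)          # 16 states instead of 25 without merging
```

Here the three literals share the states for "free ", so the merged tree 
needs 16 states where three separate chains plus a root would need 25.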

Optionally, the FSM approach could be implemented within the well-appreciated, 
current perl by means of an external perl module.

Has anybody heard or thought of something like this?

Do you believe that an FSM would really improve SA performance?

What's your opinion?

giampaolo
