From: Matt Kettler [mailto:[EMAIL PROTECTED]

> 1) perl has a substantial base of text parsing and utility libraries
> that no other language can match.. Java does have native regex support,
> so it has a leg up over the others,

Right, but neither language is particularly well suited to scoring a message: they apply every rule, one after another, to the very same piece of text. It would be interesting, instead, to "invert" this approach by designing a finite state machine (FSM) which is basically a pre-compiled version of the whole rule body. You feed the message in once, and you get the results (i.e. the fired rules and/or the message score).

I believe this approach would reduce memory consumption as well as execution time considerably. It would not be suitable for custom plugins, however. But all the standard rules (even the "expensive" ones in terms of computational power and memory footprint) would probably perform better this way.

The basic idea of the FSM model is that the pre-compiler only needs to run occasionally, for instance when a rule is changed, added to or deleted from the rule body. The pre-compiler could even optimize the resulting FSM, perhaps by "merging" paths shared by different rules. The syntax of the .cf files would not need to change, and this method could even allow a new, pre-compiled version of the rule body to be injected into a running SpamAssassin. Optionally, the FSM approach could be implemented within the current, well-appreciated Perl code base by means of an external Perl module.

Has anybody heard or thought of something like this? Do you believe that an FSM would really improve SA performance? What's your opinion?

giampaolo
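
P.S. To make the idea a bit more concrete, here is a rough sketch of the "compile once, scan once" approach. It is only an illustration under strong assumptions: the rules are plain, case-insensitive literal strings rather than real SA regexes, the rule names and scores are made up, and I use the classic Aho-Corasick construction as the FSM (shared prefixes end up on shared paths, which is the "merging" I mentioned). It is written in Python just to keep it short; nothing here is actual SpamAssassin code.

```python
from collections import deque

# Hypothetical rule body: name -> (literal to look for, score).
# Real SA rules are regexes; literals are an assumption of this sketch.
RULES = {
    "FREE_MONEY": ("free money", 2.5),
    "MONEY": ("money", 0.5),
    "CLICK_HERE": ("click here", 1.0),
}

def compile_rules(rules):
    """Pre-compiler: build one Aho-Corasick automaton for all rules."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # names of the rules that fire on reaching each state
    for name, (literal, _score) in rules.items():
        state = 0
        for ch in literal.lower():
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(name)
    # Breadth-first pass: set failure links and inherit their outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def score_message(text, rules, automaton):
    """Feed the message through the automaton once; return (fired, score)."""
    goto, fail, out = automaton
    fired = set()
    state = 0
    for ch in text.lower():
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        fired.update(out[state])
    return fired, sum(rules[name][1] for name in fired)

AUTOMATON = compile_rules(RULES)  # rebuilt only when the rule body changes
```

Calling score_message("Get FREE money now", RULES, AUTOMATON) reads the text a single time and reports both FREE_MONEY and MONEY as fired. Swapping in a newly compiled AUTOMATON would be the equivalent of injecting a new rule body into a running daemon.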