Is version 3 really any better at stopping spam than 2.63? I'm running 2.63 and my friend who owns an ISP just upgraded to ver 3, and he claims that 2.63 stopped more spam.
As far as an "out of the box" configuration goes, I'd say 3.0 is orders of magnitude better than the 2.63 version. (Actually why ANYONE is still running 2.63 is a mystery to me. 2.63 is vulnerable to a DoS attack, and you should be running 2.64 if you want your servers to stay in the 2.6x series.)
I'd also say that if you're a default-rules non-bayes user, 3.0 is going to be a HUGE improvement.
However, for someone who's got bayes going and all the add-on rules and packages they can find (antidrug, surbl, rulesemporium, etc) they're likely to experience some reduced hits when they upgrade to 3.x.
The main reason here is that SA 3.0 has a lot of these add-ons integrated, and some of the scores that came out of the GA are less aggressive than the ones some authors assign when they hand-score their rules.
The less aggressive scoring is going to cause more spam misses, but it's also going to reel in the FP rate. The scores are now balanced against the scores of the other rules, rather than adding score on top of an already balanced system.
Go figure, if you add rules to a balanced system without rebalancing, the average score of all messages, spam and nonspam, goes up. You catch more spam, and you catch more FPs.
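To make that concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the rule names (RULE_A, RULE_B, ADDON_X), the scores, and the six messages are hypothetical, not real SpamAssassin rules or data. The only real value is the 5.0 threshold, which is the SA default. It just sums rule scores per message and compares against the threshold, before and after bolting an add-on rule onto an already balanced set.

```python
# Toy illustration only: rule names, scores, and messages are hypothetical.
THRESHOLD = 5.0  # SpamAssassin's default required score

# Each message is (is_spam, set of rule names it hits).
MESSAGES = [
    (True,  {"RULE_A", "RULE_B", "ADDON_X"}),
    (True,  {"RULE_A", "ADDON_X"}),
    (True,  {"RULE_B"}),
    (False, {"RULE_A"}),
    (False, {"RULE_B", "ADDON_X"}),
    (False, set()),
]

# A "balanced" score set, and the same set with an add-on rule
# dropped in at a hand-picked score without rebalancing the rest.
BALANCED = {"RULE_A": 3.0, "RULE_B": 2.5}
WITH_ADDON = {**BALANCED, "ADDON_X": 3.0}


def score(rules_hit, scores):
    """Sum the scores of the rules a message hits; unknown rules count as 0."""
    return sum(scores.get(rule, 0.0) for rule in rules_hit)


def evaluate(scores):
    """Return (spam caught, false positives, mean score over all messages)."""
    caught = sum(1 for is_spam, hits in MESSAGES
                 if is_spam and score(hits, scores) >= THRESHOLD)
    fps = sum(1 for is_spam, hits in MESSAGES
              if not is_spam and score(hits, scores) >= THRESHOLD)
    mean = sum(score(hits, scores) for _, hits in MESSAGES) / len(MESSAGES)
    return caught, fps, mean


if __name__ == "__main__":
    print("balanced:  ", evaluate(BALANCED))
    print("with addon:", evaluate(WITH_ADDON))
```

With these invented numbers, the balanced set catches 1 of 3 spams with no FPs and a mean score of 2.75. Adding ADDON_X catches 2 of 3 spams, but one nonspam message now crosses the threshold and the mean score rises to 4.25.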
SA 3.0 is also significantly less aggressive in bayes scoring, and I think this is largely a reflection of the increased accuracy of the rules picking up a lot of the slack which 2.5x and 2.6x left for bayes to take care of. Let's face it: 2.5x and 2.6x had pretty lousy default rule sets, but the power of bayes made the system still catch spam pretty well.
2.6x without any bayes or add-on rules is pretty much hopelessly ineffective against current spam. With bayes or add-ons it works pretty well. With both it works very well at catching spam, but also has a much higher FP ratio than the SA devs are willing to accept for a shipping stable release. (The general rule for the scoring system is that FPs are 100 times worse than FNs.)
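As a rough sketch of what that 100-to-1 rule of thumb implies, here is a weighted error cost in Python. The function name and the message counts are hypothetical, picked only to show the trade-off. This is not how the SA scoring run is actually implemented.

```python
# Rule of thumb from above: one FP is as bad as 100 FNs.
FP_WEIGHT = 100


def weighted_cost(false_positives, false_negatives, fp_weight=FP_WEIGHT):
    """Combine both error types into one number, with FPs weighted heavily."""
    return fp_weight * false_positives + false_negatives


# Hypothetical error counts for two setups over the same mail sample.
aggressive = weighted_cost(false_positives=5, false_negatives=100)
conservative = weighted_cost(false_positives=1, false_negatives=300)

if __name__ == "__main__":
    print("aggressive:  ", aggressive)
    print("conservative:", conservative)
```

Under that weighting the aggressive setup costs 600 and the conservative one 400, so the setup that misses three times as much spam still comes out ahead because it has fewer FPs.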