Scanning 100 spam and 100 ham on a well-trained Bayes DB, with auto-learning off.

First, some base data: top 20 routines:

  Total Elapsed Time = 18.36313 Seconds
    User+System Time = 17.22313 Seconds

  %Time ExclSec CumulS #Calls sec/call Csec/c  Name
   9.74   1.677  8.510    804   0.0021 0.0106  ::PerMsgStatus::run_eval_tests
   5.63   0.970  1.449  82037   0.0000 0.0000  ::PerMsgStatus::get
   4.01   0.690  0.671  37988   0.0000 0.0000  DB_File::FETCH
   3.65   0.629  1.071   5134   0.0001 0.0002  ::Bayes::tokenize_line
   2.85   0.491  4.789    201   0.0024 0.0238  ::Bayes::scan
   2.85   0.490  1.596  29737   0.0000 0.0001  ::Bayes::compute_prob_for_token
   2.50   0.430  0.466  35653   0.0000 0.0000  ::Message::Node::header
   2.35   0.404  1.789    201   0.0020 0.0089  ::PerMsgStatus::_head_tests
   2.32   0.400  0.676  35919   0.0000 0.0000  ::Message::Node::get_header
   2.32   0.400  0.935  29737   0.0000 0.0000  ::BayesStoreDBM::tok_unpack
   2.28   0.392  4.351    201   0.0020 0.0216  ::PerMsgStatus::_body_tests
   1.80   0.310  0.458  47786   0.0000 0.0000  ::BayesStoreDBM::is_magic_token
   1.57   0.270  0.226  87237   0.0000 0.0000  UNIVERSAL::can
   1.34   0.230  1.136  29737   0.0000 0.0000  ::BayesStoreDBM::tok_get
   1.28   0.220  0.196  47786   0.0000 0.0000  ::BayesStoreDBM::get_magic_re
   1.22   0.210  1.522    201   0.0010 0.0076  ::Bayes::tokenize
   1.05   0.180  0.175  10050   0.0000 0.0000  ::PerMsgStatus::domain_ratio
   1.05   0.180  0.320    402   0.0004 0.0008  ::PerMsgStatus::get_uri_list
   0.98   0.169  0.209      2   0.0846 0.1043  ::Conf::_parse
   0.87   0.150  0.159    201   0.0007 0.0008  ::PerMsgStatus::_meta_tests

Now, some hand-waving about things someone could look at:

1. It seems like we spend a fair bit of time figuring out whether tokens
   are magic tokens:

  %Time ExclSec CumulS #Calls sec/call Csec/c  Name
   1.80   0.310  0.458  47786   0.0000 0.0000  ::BayesStoreDBM::is_magic_token
   1.28   0.220  0.196  47786   0.0000 0.0000  ::BayesStoreDBM::get_magic_re

2. header, get_header, and get all use a lot of time when added together:

  %Time ExclSec CumulS #Calls sec/call Csec/c  Name
   5.63   0.970  1.449  82037   0.0000 0.0000  ::PerMsgStatus::get
   2.50   0.430  0.466  35653   0.0000 0.0000  ::Message::Node::header
   2.32   0.400  0.676  35919   0.0000 0.0000  ::Message::Node::get_header

3. slow stuff:

  %Time ExclSec CumulS #Calls sec/call Csec/c  Name
   1.05   0.180  0.175  10050   0.0000 0.0000  ::PerMsgStatus::domain_ratio
   1.05   0.180  0.320    402   0.0004 0.0008  ::PerMsgStatus::get_uri_list

   Safe to ignore domain_ratio since it will only be run once per
   message instead of a bazillion times soon enough, but get_uri_list
   is being run twice per message (402 calls for 201 scans) instead of
   once and might be made faster. The URI list should be part of the
   message metadata if it isn't already.

4. something interesting seen in one run:

  %Time ExclSec CumulS #Calls sec/call Csec/c  Name
   22.0   4.418  7.496      1   4.4179 7.4957  ::BayesStoreDBM::calculate_expire_delta

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
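
P.S. To make point 3 concrete, here is a rough sketch of what "make the URI list part of the message metadata" could look like. It is Python purely for illustration and is not SpamAssassin code: the Message class, the URI_RE pattern, and the extract_calls counter are all hypothetical names invented for this example. The idea is simply to compute the list on first request and hand back the stored copy afterwards.

```python
import re

# Hypothetical, deliberately simple URI pattern; real URI extraction is
# considerably more involved than this.
URI_RE = re.compile(r"https?://[^\s<>\"']+")


class Message:
    """Hypothetical stand-in for a per-message object."""

    def __init__(self, body):
        self.body = body
        self._uri_list = None    # cached metadata, filled in on first use
        self.extract_calls = 0   # counts how often the expensive scan runs

    def _extract_uris(self):
        # The expensive part: scan the whole body for URIs.
        self.extract_calls += 1
        return URI_RE.findall(self.body)

    def get_uri_list(self):
        # Scan only once per message; later callers reuse the stored list.
        if self._uri_list is None:
            self._uri_list = self._extract_uris()
        return self._uri_list


msg = Message("see http://example.com/a and https://example.org/b for details")
first = msg.get_uri_list()    # triggers the scan
second = msg.get_uri_list()   # served from the cache
```

With something along these lines, a second caller within the same message costs a single attribute check rather than another pass over the body.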
