Scanning 100 ham and 100 spam on a well-trained Bayes DB with
auto-learning off.

First, some base data:

top 20 routines:

Total Elapsed Time = 18.36313 Seconds
  User+System Time = 17.22313 Seconds
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 9.74   1.677  8.510    804   0.0021 0.0106  ::PerMsgStatus::run_eval_tests
 5.63   0.970  1.449  82037   0.0000 0.0000  ::PerMsgStatus::get
 4.01   0.690  0.671  37988   0.0000 0.0000  DB_File::FETCH
 3.65   0.629  1.071   5134   0.0001 0.0002  ::Bayes::tokenize_line
 2.85   0.491  4.789    201   0.0024 0.0238  ::Bayes::scan
 2.85   0.490  1.596  29737   0.0000 0.0001  ::Bayes::compute_prob_for_token
 2.50   0.430  0.466  35653   0.0000 0.0000  ::Message::Node::header
 2.35   0.404  1.789    201   0.0020 0.0089  ::PerMsgStatus::_head_tests
 2.32   0.400  0.676  35919   0.0000 0.0000  ::Message::Node::get_header
 2.32   0.400  0.935  29737   0.0000 0.0000  ::BayesStoreDBM::tok_unpack
 2.28   0.392  4.351    201   0.0020 0.0216  ::PerMsgStatus::_body_tests
 1.80   0.310  0.458  47786   0.0000 0.0000  ::BayesStoreDBM::is_magic_token
 1.57   0.270  0.226  87237   0.0000 0.0000  UNIVERSAL::can
 1.34   0.230  1.136  29737   0.0000 0.0000  ::BayesStoreDBM::tok_get
 1.28   0.220  0.196  47786   0.0000 0.0000  ::BayesStoreDBM::get_magic_re
 1.22   0.210  1.522    201   0.0010 0.0076  ::Bayes::tokenize
 1.05   0.180  0.175  10050   0.0000 0.0000  ::PerMsgStatus::domain_ratio
 1.05   0.180  0.320    402   0.0004 0.0008  ::PerMsgStatus::get_uri_list
 0.98   0.169  0.209      2   0.0846 0.1043  ::Conf::_parse
 0.87   0.150  0.159    201   0.0007 0.0008  ::PerMsgStatus::_meta_tests

Now, some hand-waving about things someone could look at:

1. It seems like we spend a fair bit of time figuring out whether
   tokens are magic tokens:

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 1.80   0.310  0.458  47786   0.0000 0.0000  ::BayesStoreDBM::is_magic_token
 1.28   0.220  0.196  47786   0.0000 0.0000  ::BayesStoreDBM::get_magic_re
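
Both routines show the same call count (47786), which suggests, though
the code is not shown here, that every magic-token check fetches the
pattern through a separate call. A minimal Python sketch of the general
idea of fetching the pattern once per batch rather than once per token
follows. The marker prefix and the function names are hypothetical and
are not taken from BayesStoreDBM.

```python
import re

# Hypothetical marker prefix, for illustration only; the real store's
# magic-token pattern is not shown in this mail.
MAGIC_RE = re.compile(r"^\x00MAGIC:")


def get_magic_re():
    """Return the precompiled pattern (stands in for the per-call getter)."""
    return MAGIC_RE


def filter_tokens(tokens):
    """Drop magic tokens, looking the pattern up once for the whole batch."""
    is_magic = get_magic_re().match  # one lookup instead of one per token
    return [t for t in tokens if not is_magic(t)]


if __name__ == "__main__":
    print(filter_tokens(["\x00MAGIC:nspam", "hello", "world"]))
```

Whether this helps in practice depends on how much of the 0.220 seconds
is call overhead rather than the match itself, so it would need to be
profiled again.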

2. header, get_header, and get all use a lot of time when added
   together (about 10.5% of profiled time, 1.8 seconds exclusive):

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 5.63   0.970  1.449  82037   0.0000 0.0000  ::PerMsgStatus::get
 2.50   0.430  0.466  35653   0.0000 0.0000  ::Message::Node::header
 2.32   0.400  0.676  35919   0.0000 0.0000  ::Message::Node::get_header
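
The sec/call column rounds to 0.0000 for all three, so the cost comes
from call volume rather than from any single slow call. A small Python
sketch that recovers per-call cost and calls per message from the
figures above; the divisor of 201 is an assumption taken from the call
count of ::Bayes::scan in the top-20 list.

```python
# Assumption: one Bayes::scan call per message, so 201 messages.
MESSAGES = 201

# (name, exclusive seconds, calls) copied from the table above.
ROWS = [
    ("PerMsgStatus::get", 0.970, 82037),
    ("Message::Node::header", 0.430, 35653),
    ("Message::Node::get_header", 0.400, 35919),
]


def per_call_stats(rows, messages):
    """Return microseconds per call and calls per message for each routine."""
    stats = {}
    for name, excl_sec, calls in rows:
        stats[name] = {
            "usec_per_call": excl_sec / calls * 1e6,
            "calls_per_msg": calls / messages,
        }
    return stats


if __name__ == "__main__":
    for name, s in per_call_stats(ROWS, MESSAGES).items():
        print(f"{name:28s} {s['usec_per_call']:6.1f} us/call "
              f"{s['calls_per_msg']:7.1f} calls/msg")
```

Each call is cheap, but there are several hundred of them per message,
so reducing the number of lookups is likely to matter more than making
each one faster.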

3. slow stuff:

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 1.05   0.180  0.175  10050   0.0000 0.0000  ::PerMsgStatus::domain_ratio
 1.05   0.180  0.320    402   0.0004 0.0008  ::PerMsgStatus::get_uri_list

   domain_ratio is safe to ignore: soon enough it will run once per
   message instead of the 50 times per message seen here (10050 calls
   over 201 scans). get_uri_list, however, runs twice per message (402
   calls) instead of once, and might also be made faster. The URI list
   should be part of the message metadata if it isn't already.
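
   A minimal Python sketch of keeping the URI list in per-message
   metadata so the extraction runs once no matter how many tests ask
   for it. The Message class, the "uri_list" key, and the naive regex
   are all hypothetical; this is not how SpamAssassin extracts URIs.

```python
import re

# Deliberately naive pattern, for illustration only.
URI_RE = re.compile(r"https?://[^\s<>\"']+")


class Message:
    """Hypothetical message object carrying a metadata cache."""

    def __init__(self, body):
        self.body = body
        self.metadata = {}
        self.extractions = 0  # counts how often the expensive path runs

    def get_uri_list(self):
        """Return the URI list, extracting it only on the first call."""
        uris = self.metadata.get("uri_list")
        if uris is None:  # an empty list is a valid cached result
            self.extractions += 1
            uris = URI_RE.findall(self.body)
            self.metadata["uri_list"] = uris
        return uris


if __name__ == "__main__":
    msg = Message("see http://example.com/a and https://example.org/b")
    msg.get_uri_list()
    msg.get_uri_list()
    print(msg.extractions)
```

   The second call returns the cached list, so a second caller in the
   same scan costs one dictionary lookup instead of another pass over
   the body.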

4. something interesting seen in one run:

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 22.0   4.418  7.496      1   4.4179 7.4957  ::BayesStoreDBM::calculate_expire_delta

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
