I guess this question is for Michael in particular.

I thought of a very simple optimization, but I can't test it yet because I am
still recovering from having my computer in the shop for repairs. Can you
tell me whether it makes sense and, if it does, try it?


In Bayes.pm, the subroutine scan gathers all the tokens, then uses map to
call compute_prob_for_token once for each token in the message, and each
of those calls results in a call to tok_get. compute_prob_for_token is
written to allow for the data having already been fetched and passed in,
but scan doesn't do that, so there is one call to tok_get per token.

tok_get in BayesStoreSQL.pm contains
"SELECT spam_count, ham_count, atime
   FROM bayes_token
  WHERE username = ?
    AND token = ?";
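A rough sketch of the current one-query-per-token pattern, written in Python with an in-memory SQLite table rather than the actual Perl/DBI code in BayesStoreSQL.pm. The column names follow the query above; make_db, scan_per_token and the sample rows are hypothetical, made up only to show that N tokens cost N round trips.

```python
import sqlite3

def make_db():
    """Hypothetical stand-in for bayes_token (SQLite, not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        " username TEXT, token TEXT,"
        " spam_count INTEGER, ham_count INTEGER, atime INTEGER,"
        " PRIMARY KEY (username, token))"
    )
    conn.executemany(
        "INSERT INTO bayes_token VALUES (?, ?, ?, ?, ?)",
        [("bob", "free", 40, 2, 100), ("bob", "meeting", 1, 30, 200)],
    )
    return conn

def tok_get(conn, username, token):
    """One SELECT per call, same shape as the tok_get query above."""
    return conn.execute(
        "SELECT spam_count, ham_count, atime FROM bayes_token"
        " WHERE username = ? AND token = ?",
        (username, token),
    ).fetchone()  # None when the token has never been seen

def scan_per_token(conn, username, tokens):
    """Current pattern: fetch each token separately; count the queries."""
    queries = 0
    results = {}
    for tok in tokens:
        results[tok] = tok_get(conn, username, tok)
        queries += 1
    return results, queries
```

For a message with three tokens this issues three separate SELECTs; with a real MySQL server each one is a network round trip.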

Would it be a lot more efficient in MySQL or other SQL engines if, once
scan had all the tokens from the message, it could call a tok_get_all
that used token IN ... to fetch all the tokens in the message in a
single SELECT? scan could call tok_get_all first, and then the map could
pass each set of values to compute_prob_for_token when it calls it.
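A minimal sketch of the batched idea, again in Python/SQLite rather than Perl/DBI, so it only illustrates the shape of the query. tok_get_all is the name proposed above, but its signature here, the chunk_size parameter, and the simplified compute_prob_for_token (a plain spam/(spam+ham) ratio, not SpamAssassin's actual formula) are assumptions. Two details matter: the SELECT must also return the token column so rows can be matched back to tokens, and very large messages may need the IN list split into chunks because engines cap the number of bound parameters.

```python
import sqlite3

def make_db():
    """Hypothetical stand-in for bayes_token (SQLite, not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        " username TEXT, token TEXT,"
        " spam_count INTEGER, ham_count INTEGER, atime INTEGER,"
        " PRIMARY KEY (username, token))"
    )
    conn.executemany(
        "INSERT INTO bayes_token VALUES (?, ?, ?, ?, ?)",
        [("bob", "free", 40, 2, 100), ("bob", "meeting", 1, 30, 200)],
    )
    return conn

def tok_get_all(conn, username, tokens, chunk_size=500):
    """Fetch many tokens with one SELECT ... IN (...) per chunk.

    Unknown tokens are simply absent from the returned dict.
    """
    unique = list(dict.fromkeys(tokens))  # drop duplicates, keep order
    found = {}
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        # Only "?" placeholders are interpolated; values are still bound.
        placeholders = ", ".join("?" for _ in chunk)
        sql = (
            "SELECT token, spam_count, ham_count, atime FROM bayes_token"
            f" WHERE username = ? AND token IN ({placeholders})"
        )
        for token, spam, ham, atime in conn.execute(sql, [username, *chunk]):
            found[token] = (spam, ham, atime)
    return found

def compute_prob_for_token(token, data):
    """Hypothetical, simplified: uses pre-fetched (spam, ham, atime) or None."""
    if data is None:
        return None  # never-seen token
    spam, ham, _atime = data
    return spam / (spam + ham)

def scan(conn, username, tokens):
    """Fetch everything up front, then pass each row to the compute step."""
    data = tok_get_all(conn, username, tokens)
    return {tok: compute_prob_for_token(tok, data.get(tok)) for tok in tokens}
```

With the default chunk_size, the three-token message from the earlier example needs a single SELECT instead of three.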

 -- sidney


