Mark> Specifically, this line in addin.py is the culprit:
Mark>     self.stats = bayes_stats.Stats(bayes_options,
Mark>                                    self.classifier_data.message_db)
Mark> I've not even looked inside that module yet, but that seems quite
Mark> extreme, to the point I'm not sure the feature is worth that
Mark> cost... I guess the code is reading each record of my message DB
Mark> (which is 85MB) - but does anyone have any insights?
Yes, it appears to be doing just that. At the end of __init__ it calls
self.CalculatePersistentStats() which loops over all the keys in the
message_db. The author anticipated this in the docstring:
    Calculate the statistics totals (i.e. not this session).
    This is done by running through the messageinfo database and
    adding up the various information. This could get quite time
    consuming if the messageinfo database gets very large, so
    some consideration should perhaps be made about what to do
    then.
It might be worth deferring that call until it's really needed (say, in
GetStats()).
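
A minimal sketch of that deferral, not the actual bayes_stats code: the
LazyStats class, its _totals attribute and the "messages" counter are
hypothetical stand-ins, and the only assumption about message_db is that
it offers a dict-like keys() method. The idea is that __init__ just
stores its arguments, and the full scan runs the first time GetStats()
is called, with the result cached after that.

```python
class LazyStats:
    """Hypothetical stand-in for bayes_stats.Stats (not the real class)."""

    def __init__(self, options, message_db):
        self.options = options
        self.message_db = message_db
        # None means the persistent totals have not been computed yet.
        # Nothing here touches the database, so construction stays cheap.
        self._totals = None

    def CalculatePersistentStats(self):
        # Placeholder for the expensive pass over every key in message_db.
        # The real method adds up more than a simple message count.
        totals = {"messages": 0}
        for _key in self.message_db.keys():
            totals["messages"] += 1
        self._totals = totals

    def GetStats(self):
        # Pay for the full scan only on first use, then reuse the result.
        if self._totals is None:
            self.CalculatePersistentStats()
        return self._totals
```

With this shape the startup cost moves to whenever the statistics are
first requested. If they are never requested in a session, the scan
never happens.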
Skip
_______________________________________________
spambayes-dev mailing list
http://mail.python.org/mailman/listinfo/spambayes-dev