    Mark> Specifically, this line in addin.py is the culprit:

    Mark>         self.stats = bayes_stats.Stats(bayes_options,
    Mark>                                        self.classifier_data.message_db)

    Mark> I've not even looked inside that module yet, but that seems quite
    Mark> extreme, to the point I'm not sure the feature is worth that
    Mark> cost...  I guess the code is reading each record of my message DB
    Mark> (which is 85MB) - but does anyone have any insights?

Yes, it appears to be doing just that.  At the end of __init__ it calls
self.CalculatePersistentStats(), which loops over all the keys in the
message_db.  The author anticipated this in the docstring:

    Calculate the statistics totals (i.e. not this session).

    This is done by running through the messageinfo database and
    adding up the various information.  This could get quite time
    consuming if the messageinfo database gets very large, so
    some consideration should perhaps be made about what to do
    then.

It might be worth deferring that call until it's really needed (say, in
GetStats()).
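
Here is a rough sketch of what I mean by deferring.  This is not the real
bayes_stats.Stats class: LazyStats, the _persistent_totals attribute and the
assumption that each message_db value is a plain "spam"/"ham" label are all
made up for illustration.  Only the method names CalculatePersistentStats and
GetStats come from the module.

```python
class LazyStats:
    """Hypothetical stand-in for bayes_stats.Stats (illustration only)."""

    def __init__(self, message_db):
        self.message_db = message_db
        # Deliberately NOT calling CalculatePersistentStats() here, so
        # construction stays cheap however large the database is.
        self._persistent_totals = None

    def CalculatePersistentStats(self):
        # The expensive part: one full pass over every key in the db.
        totals = {}
        for key in self.message_db.keys():
            label = self.message_db[key]  # assumed: a simple label string
            totals[label] = totals.get(label, 0) + 1
        self._persistent_totals = totals

    def GetStats(self):
        # Pay for the scan on first use only, then reuse the cached totals.
        if self._persistent_totals is None:
            self.CalculatePersistentStats()
        return dict(self._persistent_totals)
```

With something like this the startup cost moves to the first time anyone
actually asks for the statistics, and later calls reuse the cached totals.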

Skip
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev
