On 12/20/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Mark> Specifically, this line in addin.py is the culprit:
>
> Mark> self.stats = bayes_stats.Stats(bayes_options,
> Mark>                                self.classifier_data.message_db)
> [...]
> It might be worth deferring that call until it's really needed (say, in
> GetStats()).
The Stats object tracks two types of statistics: statistics for the current session, and total statistics across all Outlook sessions. The total statistics are calculated as the persistent statistics plus the statistics accumulated during the current session. The persistent statistics need to be totalled up before we start accumulating anything into the session statistics with the RecordClassification or RecordTraining methods. Otherwise, any session stats accumulated before the persistent stats are calculated will be counted twice.

We can probably still defer the call if we are smart about the relationship between the persistent stats and the session stats. Whenever we actually calculate the persistent stats, we need to remember that the session statistics accumulated up to that point are already included in the message db, and subtract those values from the persistent totals.

Of course, this only solves part of the problem, because we would still take a huge performance hit when the statistics are displayed. It might be worth storing the actual statistics values instead of recalculating them at the start of every Outlook session. The stats are calculated from the message db so that the user can reset the starting date for the statistics and still get accurate results. We could recalculate the persistent statistics only when the user changes the start date, and store the summary values either as a separate record in the message db or in a separate statistics db file.

I've been incredibly swamped lately with the work that pays the bills, but I'll try to find some time over the holidays to take a look at this.

--
Kenny Pitt
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev
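For illustration, here is a minimal Python sketch of deferring the persistent totals and subtracting the session stats already present in the db. LazyStats, record_classification, get_stats and the dict-based message_db are hypothetical stand-ins, not the real bayes_stats.Stats API. The sketch assumes two things: every classification is written to the db at the moment it is recorded, and each message is classified only once.

```python
from collections import Counter


class LazyStats:
    """Hypothetical sketch: defer totalling the persistent stats until needed.

    Assumes (for illustration only) that message_db is a dict mapping a
    message id to its classification string, that every recorded
    classification is written to message_db immediately, and that each
    message is classified at most once.
    """

    def __init__(self, message_db):
        self.message_db = message_db
        self.session = Counter()   # counts recorded during this session
        self._persistent = None    # earlier-session totals, computed lazily

    def record_classification(self, msg_id, classification):
        # The db and the session counter are updated together, so the db
        # always already contains everything counted in self.session.
        self.message_db[msg_id] = classification
        self.session[classification] += 1

    def _persistent_totals(self):
        if self._persistent is None:
            # Expensive full scan, done only on the first request.
            scanned = Counter(self.message_db.values())
            # The scan includes this session's records too; subtract them
            # so they are not counted twice in the totals.
            scanned.subtract(self.session)
            self._persistent = scanned
        return self._persistent

    def get_stats(self):
        # Total = persistent (earlier sessions) + current session.
        total = Counter(self._persistent_totals())
        total.update(self.session)
        return dict(total)
```

Usage: with a db holding one earlier spam and one earlier ham, recording a new spam and then asking for the totals gives spam 2 and ham 1, not spam 3. Because of the assumptions above, reclassifying a message from an earlier session would not be handled correctly by this sketch.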
