Benny Pedersen wrote:
> On Fri, January 12, 2007 02:14, Matt Kettler wrote:
>
>
>> form of expiry is one reason why I say the AWL isn't really ready for
>> production use on any servers that have decent mail volume)
>>
>
> if one entry is just deleted when will there be records with 2 ?
>
I don't understand what you're saying here, at all. I'll take a wild
guess at what you might mean.

IMHO, the AWL should use atime based expiry, just like bayes. As it
stands now, the "number of hits" based purge algorithm is an absurdly
cheap hack at best and is a significant downside to the practical
usability of the AWL for anyone with a decent-sized mailserver.

This of course means the format of the AWL database needs to change,
because right now it doesn't store atime.
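
To make the difference concrete, here is a minimal Python sketch that contrasts the two purge rules on a toy in-memory table. The entry layout (`count`, `atime`), the function names, the `min_hits` threshold and the 120-day window are all assumptions made for illustration; none of this is SpamAssassin code.

```python
from datetime import datetime, timedelta

def purge_by_hits(entries, min_hits=2):
    """Hit-count rule (assumed form): drop entries seen fewer than min_hits times."""
    return {k: v for k, v in entries.items() if v["count"] >= min_hits}

def purge_by_atime(entries, now, max_age=timedelta(days=120)):
    """atime rule: drop entries that were not accessed within max_age of now."""
    return {k: v for k, v in entries.items() if now - v["atime"] <= max_age}

# Toy data: one sender first seen yesterday, one untouched for over a year.
now = datetime(2007, 1, 12)
entries = {
    "new-sender@example.com": {"count": 1, "atime": datetime(2007, 1, 11)},
    "stale@example.com": {"count": 40, "atime": datetime(2005, 6, 1)},
}
```

On this toy data the hit-count rule discards the sender seen yesterday and keeps the entry nobody has touched since 2005, while the atime rule does the opposite.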
> awl is tricky but good, we have to live with it or make some changes to how
> its updated, eg if and email adresse is seen just long time ago and newer
> later delete it from avl, just delete the one 1 entrys makes it not work
>
>
I *think* you're in agreement with what I just said. Using last-accessed
time instead of hit-count makes substantially more sense.

Moving the AWL to SQL makes this possible. Here is a sample for MySQL.

Add a new field:

    ALTER TABLE awl ADD lastupdate timestamp(14) NOT NULL;

If you have a small data set, optionally initialize existing records:

    UPDATE awl SET lastupdate = NOW() WHERE lastupdate < 1;

NOTE: If you have a large record set, it is probably better NOT to
initialize existing records, since stamping every old row with lastupdate
data only compounds the problem. Let only new records get time stamped,
then be patient enough to wait a couple of weeks or so before deleting
anything, because the first command below should delete every record that
is not time stamped.
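
Before the first purge it may be worth counting how many rows are still unstamped, since an age-based delete will take all of them. The following is a rough Python sketch using an in-memory SQLite table as a stand-in for MySQL. The reduced table layout, the function name and the `UNSTAMPED` placeholder (modelled on MySQL's zero timestamp, which the `lastupdate < 1` test above suggests) are assumptions.

```python
import sqlite3

# Assumed placeholder carried by rows that were never stamped.
UNSTAMPED = "0000-00-00 00:00:00"

def count_purge_candidates(conn, cutoff):
    """Return (unstamped, stamped) row counts that an age-based DELETE would hit.

    cutoff is a 'YYYY-MM-DD HH:MM:SS' string; timestamps in this format
    compare correctly as plain text.
    """
    unstamped = conn.execute(
        "SELECT COUNT(*) FROM awl WHERE lastupdate = ?", (UNSTAMPED,)
    ).fetchone()[0]
    stamped = conn.execute(
        "SELECT COUNT(*) FROM awl WHERE lastupdate <> ? AND lastupdate <= ?",
        (UNSTAMPED, cutoff),
    ).fetchone()[0]
    return unstamped, stamped
```

If the first number is still large, the waiting period probably has not been long enough yet.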

Then start daily or weekly maintenance:

    DELETE FROM awl WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 4 MONTH);
    DELETE FROM awl WHERE count = 1 AND lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 15 DAY);

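
For anyone who wants to try the two-tier purge without touching a live database, here is a small Python sketch that runs the same two rules against an in-memory SQLite table. SQLite stands in for MySQL, the table is reduced to three columns, 4 months is approximated as 120 days, and the function name `purge_awl` is made up for this example.

```python
import sqlite3
from datetime import datetime, timedelta

def purge_awl(conn, now):
    """Apply both purge rules and return the number of rows deleted."""
    fmt = lambda d: d.isoformat(sep=" ", timespec="seconds")
    old_cutoff = fmt(now - timedelta(days=120))    # roughly 4 months (assumption)
    single_cutoff = fmt(now - timedelta(days=15))  # single-hit entries expire sooner
    cur = conn.cursor()
    # Rule 1: anything not updated in about 4 months goes.
    cur.execute("DELETE FROM awl WHERE lastupdate <= ?", (old_cutoff,))
    deleted = cur.rowcount
    # Rule 2: entries seen only once go after 15 days.
    cur.execute(
        'DELETE FROM awl WHERE "count" = 1 AND lastupdate <= ?', (single_cutoff,)
    )
    deleted += cur.rowcount
    conn.commit()
    return deleted

def make_demo_db():
    """Build a tiny in-memory table with hypothetical rows for experimenting."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE awl (email TEXT, "count" INTEGER, lastupdate TEXT)')
    conn.executemany(
        "INSERT INTO awl VALUES (?, ?, ?)",
        [
            ("old@example.com", 9, "2006-08-01 00:00:00"),
            ("once@example.com", 1, "2006-12-20 00:00:00"),
            ("recent-once@example.com", 1, "2007-01-10 00:00:00"),
            ("regular@example.com", 5, "2006-12-01 00:00:00"),
        ],
    )
    return conn
```

Calling `purge_awl(make_demo_db(), datetime(2007, 1, 12))` removes the long-idle entry and the stale single-hit entry while leaving the recent single-hit sender alone, which is the behaviour a pure hit-count purge cannot give.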
I don't see why this method could not also be used for bayes_seen.
I was not aware bayes_seen would grow forever, so I am going to implement
this on my own system next week.

    ALTER TABLE bayes_seen ADD lastupdate timestamp(14) NOT NULL;

Then wait a few weeks before implementing:

    DELETE FROM bayes_seen WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 2 MONTH);

I am not that familiar with MySQL or Bayes, however, so I would appreciate
it if someone would point out potential problems with this.

Gary V