Larry M. Rosenbaum wrote on Mon, 06 Oct 2008 15:42:53 -0400:

> So I copied
> the database to a non-production MySQL server and tried to convert
> it there.  It has taken 4 days to convert!  I'm thinking something
> must be wrong.

Yes, converting a database with 100 million records will take that long 
or longer.

> So the config file says 500 thousand tokens, but the database has
> 105 million entries.  Have I misunderstood something, or is expiry
> not working correctly?

Maybe. Check the bayes_vars table for the token count and then check how 
many tokens the database actually contains. The expiry code just takes the 
token count from bayes_vars and doesn't check for the real record count of 
bayes_token. So if there's a mismatch, things like this can happen.
For me it happened the other way around. After converting to SQL I removed 
all entries older than a year and then ran expiry without changing the 
token count value in bayes_vars. Since it thought I still had several 
million tokens, it slashed almost the complete database and I had to 
import all the stuff again.
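
A minimal sketch of that check, in case it helps. It assumes the table
layout from the stock SpamAssassin SQL schema, where bayes_vars has one
row per user with a token_count column and bayes_token rows refer to it
through id. The function names and the in-memory SQLite stand-in are
only for illustration; against a real MySQL server you would pass a
connection from your own DB-API driver instead. Counting 100 million
rows will probably take a while.

```python
import sqlite3


def token_count_mismatch(conn):
    """Return (id, recorded, actual) for every bayes_vars row whose
    token_count differs from the real number of rows in bayes_token."""
    cur = conn.cursor()
    cur.execute(
        "SELECT v.id, v.token_count, COUNT(t.id) "
        "FROM bayes_vars v LEFT JOIN bayes_token t ON t.id = v.id "
        "GROUP BY v.id, v.token_count ORDER BY v.id"
    )
    return [(vid, rec, act) for vid, rec, act in cur.fetchall() if rec != act]


def resync_token_count(conn):
    """Overwrite token_count with the real row count (assumption: this is
    what you want after deleting rows by hand; back up first)."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE bayes_vars SET token_count = "
        "(SELECT COUNT(*) FROM bayes_token WHERE bayes_token.id = bayes_vars.id)"
    )
    conn.commit()


def make_demo_db():
    """Build a tiny in-memory stand-in with only the columns used above."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE bayes_vars "
        "(id INTEGER PRIMARY KEY, username TEXT, token_count INTEGER)"
    )
    cur.execute("CREATE TABLE bayes_token (id INTEGER, token BLOB, atime INTEGER)")
    # The recorded count says 500000, but only three tokens really exist.
    cur.execute("INSERT INTO bayes_vars VALUES (1, 'demo', 500000)")
    cur.executemany(
        "INSERT INTO bayes_token VALUES (1, ?, 0)",
        [(b"aaaaa",), (b"bbbbb",), (b"ccccc",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    for vid, recorded, actual in token_count_mismatch(demo):
        print(f"id {vid}: bayes_vars says {recorded}, bayes_token has {actual}")
```

If the two numbers disagree, resync_token_count() is one way to bring
bayes_vars back in line before running the next expire.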

BTW: I'm not seeing output like this when I do an expire:
token frequency: 1-occurrence tokens: 0.13%
token frequency: less than 8 occurrences: 0.06%


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com


