> -----Original Message-----
> From: Federico Giannici [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, 15 November 2006 10:31
> To: users@spamassassin.apache.org
> Subject: Bayes column 'token'
> 
> 
> Last week we migrated our bayes DB from DBM to MySQL.
> Now we have upgraded our MySQL server from version 4.0 to 4.1.
> 
> Today I found a couple of duplicate index values in the 
> "token" column of "bayes_token" table.
> 
> This field is defined as char(5) with default collation
> (that is "latin1_swedish_ci"). Is it the correct one?

Well, bayes_mysql.sql does not specify a collation, so, as you said, the
column gets your MySQL server's default. With a case-insensitive
collation such as "latin1_swedish_ci", string comparisons in MySQL
ignore case, so tokens that differ only in case are treated as equal.
It is probably a good idea to convert the column to "latin1_bin" or
something similar.
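
For what it's worth, a rough sketch of what that conversion could look
like. It assumes the stock bayes_token definition from bayes_mysql.sql
and a 4.1 server; I have not run it against your data, so take a backup
of the table first and treat it as a starting point rather than a
tested recipe.

```sql
-- Sketch only: assumes the stock bayes_token table and MySQL 4.1.

-- Show the collation currently applied to each column.
SHOW FULL COLUMNS FROM bayes_token;

-- Give the token column a binary collation, so values are compared
-- byte by byte and case is no longer ignored.
ALTER TABLE bayes_token
  MODIFY token char(5) CHARACTER SET latin1 COLLATE latin1_bin
  NOT NULL default '';
```

Running the SHOW FULL COLUMNS statement again afterwards should report
latin1_bin for the token column.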

There is, btw, now that I look at it, a small bug in:

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

Since `id` and `token` already make up the PRIMARY KEY, they should not
need INDEX entries added on top of it as well.

- Mark
