Mark wrote:
-----Original Message-----
From: Mark [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 15 November 2006 18:15
To: 'users@spamassassin.apache.org'
Subject: RE: Bayes column 'token'

Well, bayes_mysql.sql does not specify collation; so, like
you said, the collation will be your MySQL server-set default. And
searches in MySQL are case-insensitive by default. Might indeed
perhaps be a good idea to convert to "latin1_bin" or some such.
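
A quick way to see what the collation does to comparisons is sketched below. It assumes the latin1 character set, and latin1_swedish_ci is only an example of a case-insensitive collation; your server default may differ.

```sql
-- Case-insensitive collation: 'a' and 'A' compare as equal (result 1).
SELECT _latin1'a' COLLATE latin1_swedish_ci = _latin1'A' AS ci_equal;

-- Binary collation: the same comparison is case-sensitive (result 0).
SELECT _latin1'a' COLLATE latin1_bin = _latin1'A' AS bin_equal;

-- Show which collation the token column currently uses.
SHOW FULL COLUMNS FROM bayes_token LIKE 'token';
```
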
Will there be any problem if I convert the current data to the new
collation?
I see no indication (or reason) in the code that tokens are to be handled in a case-insensitive manner. Quite the opposite.
So, I'm inclined to say that "latin1_bin" collation is better.
I don't wanna be responsible for messing up your database, though. :)
So I will test this a bit on my Vmware box.

Did the testing, and it works very smoothly with latin1_bin.
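
For an existing table, the conversion could look roughly like the sketch below. It assumes the token column already uses the latin1 character set, so only the collation changes. Because a binary collation distinguishes more values than a case-insensitive one, rows that are unique under the old collation should stay unique under the new one. Take a backup first in any case.

```sql
-- Back up the Bayes database before changing anything.

-- Assumption: the column is already latin1, so only the collation changes.
-- MySQL rebuilds the indexes on this column as part of the ALTER.
ALTER TABLE bayes_token
  MODIFY token char(5) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL default '';

-- Confirm the new collation.
SHOW FULL COLUMNS FROM bayes_token LIKE 'token';
```
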

Since (`id`, `token`) is already PRIMARY, there should not be an extra
INDEX on `id` and `token` as well.
I don't understand what you mean.
The couple (id, token) is PRIMARY, not INDEX...
Where exactly is the problem?
PRIMARY, like UNIQUE, always implies INDEX, too. So, adding an extra INDEX for `id` and `token` basically gives you a double
INDEX for them.
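
To check which indexes a table really has, including the one implied by the primary key, something like this should do:

```sql
-- Lists every index on the table, one row per indexed column.
-- The primary key appears with Key_name = 'PRIMARY'.
SHOW INDEX FROM bayes_token;
```
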

There's a double INDEX for `atime` too. So, I'd say, in
bayes_mysql.sql, replace this:

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

With:

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) COLLATE latin1_bin NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY (id, token),
  INDEX bayes_token_idx1 (atime)
) TYPE=MyISAM;

Those are multi-column indexes not duplicates.

INDEX bayes_token_idx1 (id, atime)

is NOT the same as:

INDEX bayes_token_idx1 (id)
INDEX bayes_token_idx2 (atime)

Unless you've verified that the SQL used by the Bayes modules doesn't need these indexes, you probably shouldn't change these.
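
One way to verify this is to run EXPLAIN on the queries in question and look at the "key" column of the output. The two queries below are hypothetical examples made up for illustration, not taken from the Bayes modules, and the literal values are arbitrary. As a general rule, a multi-column index can serve lookups on its leftmost column(s), so an index on (id, atime) helps with conditions on id, or on id plus atime, but generally not with atime alone.

```sql
-- Hypothetical query: one user id plus an atime range.
-- An index on (id, atime) is a candidate here; compare the "key"
-- column before and after changing the indexes.
EXPLAIN SELECT token FROM bayes_token
 WHERE id = 1 AND atime < 1163600000;

-- Hypothetical query: lookup by token alone.
-- The primary key (id, token) starts with id, so token is not a
-- leftmost prefix of it; a separate index on (token) is a candidate.
EXPLAIN SELECT id FROM bayes_token
 WHERE token = 'abcde';
```
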

(sorry I didn't notice this earlier in the thread)
