Re: Bayes column 'token'

Stuart Johnston Fri, 17 Nov 2006 14:29:20 -0800

Mark wrote:

-----Original Message-----

From: Mark [mailto:[EMAIL PROTECTED]]

Sent: Wednesday, 15 November 2006 18:15

To: 'users@spamassassin.apache.org'
Subject: RE: Bayes column 'token'

Well, bayes_mysql.sql does not specify collation; so, like
you said, the collation will be your MySQL server-set default. And
searches in MySQL are case-insensitive by default. Might indeed
perhaps be a good idea to convert to "latin1_bin" or some such.
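
For reference, a minimal sketch of how the difference shows up in a plain comparison. It assumes a latin1 setup where latin1_swedish_ci is the case-insensitive default, which is MySQL's compiled-in default for latin1 but may not match your server:

```sql
-- Show the server-wide default collation, which applies when neither
-- the database nor the table definition names one.
SHOW VARIABLES LIKE 'collation_server';

-- Case-insensitive collation: 'abc' and 'ABC' compare equal (returns 1).
SELECT _latin1'abc' COLLATE latin1_swedish_ci = _latin1'ABC' AS ci_equal;

-- Binary collation: the same two strings differ (returns 0).
SELECT _latin1'abc' COLLATE latin1_bin = _latin1'ABC' AS bin_equal;
```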

Will there be any problem if I convert the current data to the new
collation?

I see no indication (or reason) in the code that tokens are to be handled in a case-insensitive manner. The opposite, rather.

So, I'm inclined to say that "latin1_bin" collation is better.
I don't wanna be responsible for messing up your database, though. :)
So I will test this a bit on my Vmware box.


Did the testing; and it works very smoothly with latin1_bin.
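
A rough sketch of what such a conversion could look like on an existing table. It assumes the column is already stored as latin1, so only the collation changes; if your column uses another character set, the statement would also convert the data, so take a backup first:

```sql
-- Inspect the current character set and collation of the column.
SHOW FULL COLUMNS FROM bayes_token LIKE 'token';

-- Switch only the collation; the rest of the column definition
-- matches the one in bayes_mysql.sql.
ALTER TABLE bayes_token
  MODIFY token char(5) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL default '';
```

Because the PRIMARY KEY on (id, token) already rejected tokens that differ only by case under the old collation, moving to a stricter binary comparison should not produce duplicate-key errors.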

PRIMARY for `id` and `token` should not have INDEX for `id`
and `token` added, too.
I don't understand what you mean.
The couple (id, token) is PRIMARY, not INDEX...
Where exactly is the problem?
PRIMARY, like UNIQUE, always implies INDEX, too. So, adding an extra INDEX for `id` and `token` basically gives you a double
INDEX for them.
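
To see which indexes a table actually carries, and in what column order, something like the following can help. This is only a sketch against the bayes_token table from bayes_mysql.sql:

```sql
-- One row per indexed column. Key_name identifies the index
-- (PRIMARY for the primary key), Seq_in_index gives the column's
-- position inside a multi-column index.
SHOW INDEX FROM bayes_token;
```

Two indexes are true duplicates only when they list the same columns in the same order.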


There's a double INDEX for `atime` too. So, I'd say, in
bayes_mysql.sql, replace this:

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

With:

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) COLLATE latin1_bin NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY (id, token),
  INDEX bayes_token_idx1 (atime)
) TYPE=MyISAM;


Those are multi-column indexes, not duplicates.

INDEX bayes_token_idx1 (id, atime)

is NOT the same as:

INDEX bayes_token_idx1 (id)
INDEX bayes_token_idx2 (atime)

Unless you've verified that the SQL used by the Bayes modules doesn't need these indexes, you probably shouldn't change these.
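
One way to check is to run EXPLAIN on representative queries and look at the `key` column of the output. The two queries below are hypothetical stand-ins with made-up values, not the statements the Bayes modules actually issue; substitute the real SQL from the code before drawing conclusions:

```sql
-- Lookup of a single token for one user: expected to be served by
-- the PRIMARY key on (id, token).
EXPLAIN SELECT spam_count, ham_count
  FROM bayes_token
 WHERE id = 1 AND token = 'abcde';

-- Range over access times for one user: the kind of query a
-- multi-column index on (id, atime) is meant for.
EXPLAIN SELECT COUNT(*)
  FROM bayes_token
 WHERE id = 1 AND atime < 1163000000;
```

If the second query reports no usable key after an index change, that index was doing real work.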


(sorry I didn't notice this earlier in the thread)
