> -----Original Message-----
> From: Mark [mailto:[EMAIL PROTECTED] 
> Sent: woensdag 15 november 2006 18:15
> To: 'users@spamassassin.apache.org'
> Subject: RE: Bayes column 'token'
> 
> > > Well, bayes_mysql.sql does not specify collation; so, like
> > > you said, the collation will be your MySQL server-set default. And
> > > searches in MySQL are case-insensitive by default. Might indeed
> > > perhaps be a good idea to convert to "latin1_bin" or some such.
> >
> > There will be any problem if I convert the current data to the new
> > collation?
> 
> I see no indication (or reason) in the code that tokens are 
> to be handled in an case-insensitive manner. The opposite, ere.
> So, I'm inclined to say that "latin1_bin" collation is better.
> I don't wanna be responsible for messing up your database, though. :)
> So I will test this a bit on my Vmware box.

Did the testing; and it works very smooth with latin1_bin.

> > > PRIMARY for `id` and `token` should not have INDEX for `id`
> > > and `token` added, too.
> >
> > I don't understand what you mean.
> > The couple (id, token) is PRIMARY, not INDEX...
> > Where exactly is the problem?
> 
> PRIMARY, like UNIQUE, always implies INDEX, too. So, adding 
> an extra INDEX for `id` and `token` basically gives you a double
> INDEX for them.

There's a double INDEX for `atime` too. So, I'd say, in
bayes_mysql.sql, replace this:

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

With:

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) COLLATE latin1_bin NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY (id, token),
  INDEX bayes_token_idx1 (atime)
) TYPE=MyISAM;

- Mark

Reply via email to