Re: Bayes column 'token'
CREATE TABLE bayes_token ( PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; PRIMARY for `id` and `token` should not have INDEX for `id` and `token` added, too. Why not? IIRC the three indexes above makes perfect sense. Like this: WHERE id=xxx AND token=xxx will use the primary index. WHERE token=xxx will use the bayes_token_idx1 index. WHERE id=xxx AND atime=xxx will use the bayes_token_idx2 index. Again IIRC, the clause WHERE token=xxx should be faster with the existance of the bayes_token_idx1 index than without it. Or is it simply that the MySQL bayse store module never queries with token as the first column in a WHERE clause? I might of course completely misremenber this, so rather than trust me I'd suggest reading about index optimization or something like that in the current MySQL documentation. :-) Regards /Jonas -- Jonas Eckerman, FSDB Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: Bayes column 'token'
Michael Alan Dorman wrote: CREATE TABLE bayes_token ( PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; If the primary key was changed to (token, id), it should be able to be used in the second sort of query as well as the first, no? Ah. Yes. I missed that. Yes, if the key was (token, id), there might be no need at all for the bayes_token_idx1 index above. Sorry for my confusion. I'm no SQL or MySQL guru, so maybe I'm missing something though. I have no idea how one best optimizes a MySQL table for using WHERE xxx IN (yyy) for example, and the module does have WHERE id = ? AND token IN (...) ... in a UPDATE statement. Could that be the reason for the separate (token) index? I'm supposing that whoever wrote the MySQL bayes storage module had a reason for doing what they did and that I simply am not good enough at SQL to realize what the reason is. :-) Regards /Jonas -- Jonas Eckerman, FSDB Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: Bayes column 'token'
On Tue, 21 Nov 2006 13:42:09 +0100 Jonas Eckerman [EMAIL PROTECTED] wrote: CREATE TABLE bayes_token ( PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; PRIMARY for `id` and `token` should not have INDEX for `id` and `token` added, too. Why not? IIRC the three indexes above makes perfect sense. Like this: WHERE id=xxx AND token=xxx will use the primary index. WHERE token=xxx will use the bayes_token_idx1 index. WHERE id=xxx AND atime=xxx will use the bayes_token_idx2 index. Again IIRC, the clause WHERE token=xxx should be faster with the existance of the bayes_token_idx1 index than without it. If the primary key was changed to (token, id), it should be able to be used in the second sort of query as well as the first, no? Or is MySQL not smart enough to recognize that it's got an index it could match on a prefix basis? Or is it simply that the MySQL bayse store module never queries with token as the first column in a WHERE clause? The position of a column in the WHERE clause shouldn't make a difference whether an index is used; the nature of SQL is such that WHERE clauses should be reorderable. I'm a PostgreSQL guy myself, but I would still be surprised if MySQL were limited in this way. Mike.
RE: Bayes column 'token'
-Original Message- From: Mark [mailto:[EMAIL PROTECTED] Sent: woensdag 15 november 2006 18:15 To: 'users@spamassassin.apache.org' Subject: RE: Bayes column 'token' Well, bayes_mysql.sql does not specify collation; so, like you said, the collation will be your MySQL server-set default. And searches in MySQL are case-insensitive by default. Might indeed perhaps be a good idea to convert to latin1_bin or some such. There will be any problem if I convert the current data to the new collation? I see no indication (or reason) in the code that tokens are to be handled in an case-insensitive manner. The opposite, ere. So, I'm inclined to say that latin1_bin collation is better. I don't wanna be responsible for messing up your database, though. :) So I will test this a bit on my Vmware box. Did the testing; and it works very smooth with latin1_bin. PRIMARY for `id` and `token` should not have INDEX for `id` and `token` added, too. I don't understand what you mean. The couple (id, token) is PRIMARY, not INDEX... Where exactly is the problem? PRIMARY, like UNIQUE, always implies INDEX, too. So, adding an extra INDEX for `id` and `token` basically gives you a double INDEX for them. There's a double INDEX for `atime` too. So, I'd say, in bayes_mysql.sql, replace this: CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; With: CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) COLLATE latin1_bin NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (atime) ) TYPE=MyISAM; - Mark
Re: Bayes column 'token'
Mark wrote: -Original Message- From: Mark [mailto:[EMAIL PROTECTED] Sent: woensdag 15 november 2006 18:15 To: 'users@spamassassin.apache.org' Subject: RE: Bayes column 'token' Well, bayes_mysql.sql does not specify collation; so, like you said, the collation will be your MySQL server-set default. And searches in MySQL are case-insensitive by default. Might indeed perhaps be a good idea to convert to latin1_bin or some such. There will be any problem if I convert the current data to the new collation? I see no indication (or reason) in the code that tokens are to be handled in an case-insensitive manner. The opposite, ere. So, I'm inclined to say that latin1_bin collation is better. I don't wanna be responsible for messing up your database, though. :) So I will test this a bit on my Vmware box. Did the testing; and it works very smooth with latin1_bin. PRIMARY for `id` and `token` should not have INDEX for `id` and `token` added, too. I don't understand what you mean. The couple (id, token) is PRIMARY, not INDEX... Where exactly is the problem? PRIMARY, like UNIQUE, always implies INDEX, too. So, adding an extra INDEX for `id` and `token` basically gives you a double INDEX for them. There's a double INDEX for `atime` too. So, I'd say, in bayes_mysql.sql, replace this: CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; With: CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) COLLATE latin1_bin NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (atime) ) TYPE=MyISAM; Those are multi-column indexes not duplicates. INDEX bayes_token_idx1 (id, atime) is NOT the same as: INDEX bayes_token_idx1 (id) INDEX bayes_token_idx2 (atime) Unless you've verified that the SQL used by the Bayes modules doesn't need these indexes, you probably shouldn't change these. (sorry I didn't notice this earlier in the thread)
RE: Bayes column 'token'
-Original Message- From: Stuart Johnston [mailto:[EMAIL PROTECTED] Sent: vrijdag 17 november 2006 23:30 To: users@spamassassin.apache.org Subject: Re: Bayes column 'token' CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) COLLATE latin1_bin NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (atime) ) TYPE=MyISAM; Those are multi-column indexes not duplicates. INDEX bayes_token_idx1 (id, atime) is NOT the same as: INDEX bayes_token_idx1 (id) INDEX bayes_token_idx2 (atime) The multi-column INDEX actually *does* appear to be used. My bad. Though it still doesn't justify the existence of: INDEX bayes_token_idx1 (token) - Mark
Re: Bayes column 'token'
Mark wrote: -Original Message- From: Stuart Johnston [mailto:[EMAIL PROTECTED] Sent: vrijdag 17 november 2006 23:30 To: users@spamassassin.apache.org Subject: Re: Bayes column 'token' CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) COLLATE latin1_bin NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (atime) ) TYPE=MyISAM; Those are multi-column indexes not duplicates. INDEX bayes_token_idx1 (id, atime) is NOT the same as: INDEX bayes_token_idx1 (id) INDEX bayes_token_idx2 (atime) The multi-column INDEX actually *does* appear to be used. My bad. Though it still doesn't justify the existence of: INDEX bayes_token_idx1 (token) I haven't looked at the SQL but does it not do something like: WHERE token = ?
RE: Bayes column 'token'
-Original Message- From: Federico Giannici [mailto:[EMAIL PROTECTED] Sent: woensdag 15 november 2006 10:31 To: users@spamassassin.apache.org Subject: Bayes column 'token' Last week we migrated our bayes DB from DBM to MySQL. Now we have upgraded our MySQL server from version 4.0 to 4.1. Today I found a couple of duplicate index values in the token column of bayes_token table. This field is defined as char(5) with default collation (that is latin1_swedish_ci). Is it the correct one? Well, bayes_mysql.sql does not specify collation; so, like you said, the collation will be your MySQL server-set default. And searches in MySQL are case-insensitive by default. Might indeed perhaps be a good idea to convert to latin1_bin or some such. There is, btw, now that I look at it, a small bug in: CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; PRIMARY for `id` and `token` should not have INDEX for `id` and `token` added, too. - Mark
Re: Bayes column 'token'
Mark wrote: -Original Message- From: Federico Giannici [mailto:[EMAIL PROTECTED] Sent: woensdag 15 november 2006 10:31 To: users@spamassassin.apache.org Subject: Bayes column 'token' Last week we migrated our bayes DB from DBM to MySQL. Now we have upgraded our MySQL server from version 4.0 to 4.1. Today I found a couple of duplicate index values in the token column of bayes_token table. This field is defined as char(5) with default collation (that is latin1_swedish_ci). Is it the correct one? Well, bayes_mysql.sql does not specify collation; so, like you said, the collation will be your MySQL server-set default. And searches in MySQL are case-insensitive by default. Might indeed perhaps be a good idea to convert to latin1_bin or some such. There will be any problem if I convert the current data to the new collation? There is, btw, now that I look at it, a small bug in: CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; PRIMARY for `id` and `token` should not have INDEX for `id` and `token` added, too. I don't understand what you mean. The couple (id, token) is PRIMARY, not INDEX... Where exactly is the problem? Thanks. -- ___ __ |- [EMAIL PROTECTED] |ederico Giannici http://www.neomedia.it ___
RE: Bayes column 'token'
-Original Message- From: Federico Giannici [mailto:[EMAIL PROTECTED] Sent: woensdag 15 november 2006 17:42 To: Mark Cc: users@spamassassin.apache.org Subject: Re: Bayes column 'token' Well, bayes_mysql.sql does not specify collation; so, like you said, the collation will be your MySQL server-set default. And searches in MySQL are case-insensitive by default. Might indeed perhaps be a good idea to convert to latin1_bin or some such. There will be any problem if I convert the current data to the new collation? I see no indication (or reason) in the code that tokens are to be handled in an case-insensitive manner. The opposite, ere. So, I'm inclined to say that latin1_bin collation is better. I don't wanna be responsible for messing up your database, though. :) So I will test this a bit on my Vmware box. PRIMARY for `id` and `token` should not have INDEX for `id` and `token` added, too. I don't understand what you mean. The couple (id, token) is PRIMARY, not INDEX... Where exactly is the problem? PRIMARY, like UNIQUE, always implies INDEX, too. So, adding an extra INDEX for `id` and `token` basically gives you a double INDEX for them. - Mark