Josh Berkus wrote:
> Matt,
>
>
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better. Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
>
>
> Well, my first thought is that this is a pretty complicated procedure for
> something you want to peform well. Is all this logic really necessary?
> How does it get done for MySQL?
>
I'm not sure if it's all needed, in mysql they have this simple schema:
===============================================
CREATE TABLE bayes_expire (
id int(11) NOT NULL default '0',
runtime int(11) NOT NULL default '0',
KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;
CREATE TABLE bayes_global_vars (
variable varchar(30) NOT NULL default '',
value varchar(200) NOT NULL default '',
PRIMARY KEY (variable)
) TYPE=MyISAM;
INSERT INTO bayes_global_vars VALUES ('VERSION','3');
CREATE TABLE bayes_seen (
id int(11) NOT NULL default '0',
msgid varchar(200) binary NOT NULL default '',
flag char(1) NOT NULL default '',
PRIMARY KEY (id,msgid)
) TYPE=MyISAM;
CREATE TABLE bayes_token (
id int(11) NOT NULL default '0',
token char(5) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
atime int(11) NOT NULL default '0',
PRIMARY KEY (id, token),
INDEX bayes_token_idx1 (token),
INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;
CREATE TABLE bayes_vars (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(200) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
token_count int(11) NOT NULL default '0',
last_expire int(11) NOT NULL default '0',
last_atime_delta int(11) NOT NULL default '0',
last_expire_reduce int(11) NOT NULL default '0',
oldest_token_age int(11) NOT NULL default '2147483647',
newest_token_age int(11) NOT NULL default '0',
PRIMARY KEY (id),
UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
===============================================
Then they do this to insert the token:
INSERT INTO bayes_token (
id,
token,
spam_count,
ham_count,
atime
) VALUES (
?,
?,
?,
?,
?
) ON DUPLICATE KEY
UPDATE
spam_count = GREATEST(spam_count + ?, 0),
ham_count = GREATEST(ham_count + ?, 0),
atime = GREATEST(atime, ?)
Or update the token:
UPDATE bayes_vars SET
$token_count_update
newest_token_age = GREATEST(newest_token_age, ?),
oldest_token_age = LEAST(oldest_token_age, ?)
WHERE id = ?
I think the reason why the procedure was written for postgres was
because of the greatest and least statements performing poorly.
Honestly, I'm not real up on writing procs, I was hoping the problem
would be obvious to someone.
schu
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster