Matthew Schumacher wrote:
> Tom Lane wrote:
>>I don't really see why you think that this path is going to lead to
>>better performance than where you were before. Manipulation of the
>>temp table is never going to be free, and IN (sub-select) is always
>>inherently not fast, and NOT IN (sub-select) is always inherently
>>awful. Throwing a pile of simple queries at the problem is not
>>necessarily the wrong way ... especially when you are doing it in
>>plpgsql, because you've already eliminated the overhead of network
>>round trips and repeated planning of the queries.
>> regards, tom lane
> The reason why I think this may be faster is because I would avoid
> running an update on data that needs to be inserted which saves
> searching though the table for a matching token.
> Perhaps I should do the insert first, then drop those tokens from the
> temp table, then do my updates in a loop.
> I'll have to do some benchmarking...
Tom, I think your right, whenever I do a NOT IN it does a full table
scan against bayes_token and since that table is going to get very big
doing the simple query in a loop that uses an index seems a bit faster.
John, thanks for your help, it was worth a try, but it looks like the
looping is just faster.
Here is what I have so far in case anyone else has ideas before I
CREATE OR REPLACE FUNCTION put_tokens(inuserid INTEGER,
RETURNS VOID AS '
spam_count = greatest_int(spam_count + inspam_count, 0),
ham_count = greatest_int(ham_count + inham_count , 0),
atime = greatest_int(atime, inatime)
id = inuserid
(token) IN (SELECT bayes_token_tmp FROM bayes_token_tmp(intokenary));
token_count = token_count + (
bayes_token_tmp NOT IN (SELECT token FROM bayes_token)),
newest_token_age = greatest_int(newest_token_age, inatime),
oldest_token_age = least_int(oldest_token_age, inatime)
id = inuserid;
(inspam_count > 0 OR inham_count > 0)
(bayes_token_tmp) NOT IN (SELECT token FROM bayes_token);
' LANGUAGE 'plpgsql';
CREATE OR REPLACE FUNCTION bayes_token_tmp(intokenary BYTEA) RETURNS
SETOF bytea AS
for i in array_lower(intokenary, 1) .. array_upper(intokenary, 1)
return next intokenary[i];
CREATE OR REPLACE FUNCTION greatest_int (integer, integer)
AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
CREATE OR REPLACE FUNCTION least_int (integer, integer)
AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?