Jim C. Nasby wrote:
On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:

Ok, here is the current plan.

Change the spamassassin API to pass a hash of tokens into the storage
module, pass the tokens to the proc as an array, start a transaction,
load the tokens into a temp table using copy, select the tokens distinct
into the token table for new tokens, update the token table for known
tokens, then commit.


You might consider:
UPDATE tokens
    FROM temp_table (this updates existing records)

INSERT INTO tokens
    SELECT ...
    FROM temp_table
    WHERE NOT IN (SELECT ... FROM tokens)

This way you don't do an update to newly inserted tokens, which helps
keep vacuuming needs in check.

The subselect might be quite a big set, so avoiding a full table scan and materialization by

DELETE temp_table
  WHERE key IN (select key FROM tokens JOIN temp_table);
INSERT INTO TOKENS SELECT * FROM temp_table;

or

INSERT INTO TOKENS
SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key)
WHERE tokens.key IS NULL

might be an additional win, assuming that only a small fraction of tokens is inserted and updated.

Regards,
Andreas

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to