On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
> Ok, here is the current plan.
> Change the spamassassin API to pass a hash of tokens into the storage
> module, pass the tokens to the proc as an array, start a transaction,
> load the tokens into a temp table using copy, select the tokens distinct
> into the token table for new tokens, update the token table for known
> tokens, then commit.
You might consider:
FROM temp_table (this updates existing records)
INSERT INTO tokens
WHERE NOT IN (SELECT ... FROM tokens)
This way you don't do an update to newly inserted tokens, which helps
keep vacuuming needs in check.
> This solves the following problems:
> 1. Each email is a transaction instead of each token.
> 2. The update statement is only called when we really need an update
> which avoids all of those searches.
> 3. The looping work is done inside the proc instead of perl calling a
> method a zillion times per email.
> I'm not sure how vacuuming will be done yet, if we vacuum once per email
> that may be too often, so I may do that every 5 mins in cron.
I would suggest leaving an option to have SA vacuum every n emails,
since some people may not want to mess with cron, etc. I suspect that
pg_autovacuum would be able to keep up with things pretty well, though.
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly