I have a stored proc that potentially inserts hundreds of thousands, 
potentially millions, of rows (below).

This stored proc is part of the the sequence of creating an ad campaign and 
links an ad to documents it should be displayed with.

A few of these stored procs can run concurrently as users create ad campaigns.

We have 2 million documents now and linking an ad to all of them takes 5 
minutes on my top-of-the-line SSD MacBook Pro.

Last but not least, the system has to quickly serve ads while documents are 
being linked which is a problem at the moment.

What can I do to make linking documents to ads faster or have less impact on 
the system. I would like the system to be as responsive with serving ads while 
the linking itself is allowed to take a few minutes. 

One thing I'm concerned with, for example, is the whole multi-million row 
insert running within the stored proc transaction. I think inserting rows one 
by one or in small batches may be an improvement. I don't know how to 
accomplish this, though.

        Thanks, Joel

---

CREATE DOMAIN doc_id AS varchar(64);
CREATE DOMAIN id AS int;

CREATE TABLE doc_ads
(
  doc_id    id NOT NULL REFERENCES docs,
  ad_id     id NOT NULL REFERENCES ads,
  distance  float NOT NULL
);

CREATE INDEX doc_ads_idx ON doc_ads(doc_id);

CREATE OR REPLACE FUNCTION link_doc_to_ads(doc id, threshold float) 
RETURNS void AS $$
BEGIN
  INSERT INTO doc_ads (doc_id, ad_id, distance)
  SELECT doc, (t).ad_id, (t).distance
  FROM (SELECT ads_within_distance(topics, threshold) AS t
       FROM docs
       WHERE id = doc) AS x;
  ANALYZE doc_ads;
END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to