[PERFORM] Triggers During COPY

2005-01-27 Thread Thomas F. O'Connell
I'm involved in an implementation of trigger-based counting as a 
substitute for count(*) in real time in an application. My 
trigger-based counts seem to be working fine and dramatically improve 
the performance of the display of the counts in the application layer.

The problem comes in importing new data into the tables for which the 
counts are maintained. The current import process does some 
preprocessing and then does a COPY from the filesystem to one of the 
tables on which counts are maintained. This means that for each row 
being inserted by COPY, a trigger is fired. This didn't seem like a big 
deal to me until testing began on realistic data sets.

For a 5,000-record import, preprocessing plus the COPY took about 5 
minutes. Once the triggers used for maintaining the counts were added, 
this grew to 25 minutes. While I knew there would be a slowdown per row 
affected, I expected something closer to 2x than to 5x.

It's not unrealistic for this system to require data imports on the 
order of 100,000 records. That would previously have taken at most an 
hour and a half (preprocessing takes a couple of minutes, so the 
original COPY itself takes closer to 2-3 minutes, or just over 1,500 
rows per minute), but the new version is likely to take more than 7 
hours, which seems unreasonable to me. Additionally, the process is 
fairly CPU-intensive.
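
Extrapolating linearly from the 5,000-row test:

    25 min / 5,000 rows = 0.005 min per row
    100,000 rows x 0.005 min per row = 500 min, i.e. roughly 8.3 hours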

I've examined the plans, and, as far as I can tell, the trigger 
functions are being prepared and are using the indexes on the involved 
tables, which run to hundreds of thousands of rows in the worst cases. 
The basic structure of the functions is a status-lookup SELECT (to 
determine whether a count needs to be updated, and which one) followed 
by one or two UPDATE statements (depending on whether both an increment 
and a decrement need to be performed). As I said, this basic format 
appears to be using indexes appropriately.
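
For illustration, here's a minimal sketch of that structure as a plpgsql 
trigger (all table, column, and function names here are made up, not my 
actual schema):

    CREATE OR REPLACE FUNCTION maintain_counts() RETURNS trigger AS '
    DECLARE
        v_status integer;
    BEGIN
        -- Status lookup: does this row affect a count, and which one?
        SELECT status INTO v_status
          FROM item_status
         WHERE item_id = NEW.item_id;

        IF v_status IS NOT NULL THEN
            -- Increment the matching count...
            UPDATE item_counts
               SET cnt = cnt + 1
             WHERE status = v_status;
            -- ...and, when a prior count is superseded, a second
            -- UPDATE here performs the decrement.
        END IF;

        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER maintain_counts_trg
        AFTER INSERT ON items
        FOR EACH ROW EXECUTE PROCEDURE maintain_counts();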

Is there anything I could be overlooking that would tweak some more 
performance out of this scenario?

Would it be absurd to drop the triggers during import, recreate them 
afterward, and update the counts in a summary update based on 
information from the import process?
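
Concretely, I'm imagining something like this (a sketch only; it assumes 
the preprocessed batch can be loaded into a staging table first so its 
contribution to the counts can be applied in one set-based UPDATE, and 
all names are hypothetical):

    BEGIN;

    -- No per-row trigger work during the bulk load:
    DROP TRIGGER maintain_counts_trg ON items;

    COPY items_import FROM '/path/to/preprocessed.dat';
    INSERT INTO items SELECT * FROM items_import;

    -- One summary update instead of ~100,000 triggered UPDATEs:
    UPDATE item_counts
       SET cnt = cnt + i.n
      FROM (SELECT status, count(*) AS n
              FROM items_import
             GROUP BY status) i
     WHERE item_counts.status = i.status;

    CREATE TRIGGER maintain_counts_trg
        AFTER INSERT ON items
        FOR EACH ROW EXECUTE PROCEDURE maintain_counts();

    COMMIT;

Doing the drop and recreate inside one transaction would mean no 
concurrent insert could slip through uncounted while the trigger is 
absent, at the cost of an exclusive lock on the table for the duration 
of the import.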

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


Re: [PERFORM] Triggers During COPY

2005-01-27 Thread Josh Berkus
Thomas,

 Would it be absurd to drop the triggers during import, recreate them
 afterward, and update the counts in a summary update based on 
 information from the import process?

That's what I'd do.

Also, might I suggest storing the counts in memcached (see the pgmemcache 
project on pgFoundry) rather than in a table?
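
Roughly like this (function names as I recall them from the pgmemcache 
README; check the exact signatures against the version you install):

    -- Point the backend at a running memcached instance:
    SELECT memcache_server_add('localhost:11211');

    -- Seed the counter once, then have the trigger bump it:
    SELECT memcache_set('items_total', '0');
    SELECT memcache_incr('items_total');

    -- The application layer reads the count directly:
    SELECT memcache_get('items_total');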

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Triggers During COPY

2005-01-27 Thread Mark Kirkwood
Thomas F. O'Connell wrote:
The problem comes in importing new data into the tables for which the 
counts are maintained. The current import process does some 
preprocessing and then does a COPY from the filesystem to one of the 
tables on which counts are maintained. This means that for each row 
being inserted by COPY, a trigger is fired. This didn't seem like a big 
deal to me until testing began on realistic data sets.

For a 5,000-record import, preprocessing plus the COPY took about 5 
minutes. Once the triggers used for maintaining the counts were added, 
this grew to 25 minutes. While I knew there would be a slowdown per row 
affected, I expected something closer to 2x than to 5x.
Is there anything I could be overlooking that would tweak some more 
performance out of this scenario?

I have been seeing similar behavior whilst testing sample code for the 8.0
docs (the summary-table plpgsql trigger example).
I think the nub of the problem is dead-tuple bloat in the summary /
count table, so each additional triggered update becomes more and more
expensive as time goes on. I suspect the performance decrease is
exponential in the number of rows to be processed.
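
A quick way to confirm it (summary-table name hypothetical): VACUUM 
VERBOSE reports the accumulated dead row versions, and vacuuming the 
summary table periodically during a long import keeps each subsequent 
triggered UPDATE from wading through them:

    -- Show how many dead row versions the count table is carrying:
    VACUUM VERBOSE item_counts;

    -- Reclaim them mid-import so later triggered UPDATEs stay cheap:
    VACUUM item_counts;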

Would it be absurd to drop the triggers during import, recreate them 
afterward, and update the counts in a summary update based on information 
from the import process?


That's the conclusion I came to :-)
regards
Mark


Re: [PERFORM] Triggers During COPY

2005-01-27 Thread Thomas F. O'Connell
I forgot to mention that I'm running 7.4.6. The README includes the 
caveat that pgmemcache is designed for use with 8.0. My instinct is to 
be hesitant about using something like that in a production environment 
without some confidence that others have done so reliably, or without 
more extensive testing than I'm likely to have time for, especially 
since support for 7.4.x is never likely to increase. Thanks for the 
tip, though.

For the time being, it sounds like I'll probably try to implement the 
drop/create trigger setup during import.

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Jan 27, 2005, at 11:41 PM, Josh Berkus wrote:
Thomas,
Would it be absurd to drop the triggers during import, recreate them
afterward, and update the counts in a summary update based on
information from the import process?
That's what I'd do.
Also, might I suggest storing the counts in memcached (see the 
pgmemcache project on pgFoundry) rather than in a table?

--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
