I know, I know: I should've done this before I posted. REINDEXing and VACUUMing mostly fixed the problem. That brings me back to where I was yesterday: reviewing a pre-existing import process that populates tables in this system and that seems to let even small data sets bring simple queries like the one below to a crawl. Is there anything about general COPY/INSERT activity that can degrade postgres this badly on small data sets, and that can be prevented by something other than diligent VACUUMing? I was hoping that pg_autovacuum plus a manual VACUUM after each import would be sufficient, but that doesn't necessarily seem to be the case. Granted, I haven't done a methodical and complete review of the process, but I'm still surprised at how quickly it can debilitate postgres with even small amounts of data. I saw a similar slowdown yesterday after a series of COPYs involving only 5 rows!

As in, can I look for something to treat the cause rather than the symptoms?

If not, should I be REINDEXing as well as VACUUMing manually after large data imports (whether via COPY or INSERT)? Or will a VACUUM FULL ANALYZE be enough?
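For concreteness, the post-import maintenance I have in mind would look something like this (using the two tables from the query below; the real list would depend on which tables the import touches):

    VACUUM ANALYZE userdata;
    VACUUM ANALYZE userdata_history;
    -- or, if index bloat turns out to be the problem:
    REINDEX TABLE userdata;
    REINDEX TABLE userdata_history;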

Thanks!

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Oct 21, 2004, at 3:36 PM, Thomas F. O'Connell wrote:

I'm seeing some weird behavior on a repurposed server that was wiped clean and set up to run as a database and application server with postgres and Apache, as well as some command-line PHP scripts.

The box itself is a quad-processor (2.4 GHz Intel Xeon) Debian woody GNU/Linux system running a 2.6.2 kernel.

postgres is crawling on some fairly routine queries. I'm wondering if this could somehow be related to the fact that this isn't a database-only server, but Apache is not really using any resources when postgres slows to a crawl.

Here's an example of analysis of a recent query:

EXPLAIN ANALYZE SELECT COUNT(DISTINCT u.id)
FROM userdata as u, userdata_history as h
WHERE h.id = '18181'
AND h.id = u.id;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual time=298321.421..298321.422 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
         Join Filter: ("inner".id = "outer".id)
         ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
         ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
               Filter: (id = 18181::bigint)
 Total runtime: 298321.926 ms
(7 rows)


userdata.id is both its primary key and a foreign key referencing userdata_history.id, which is that table's primary key.
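In other words, the relevant portion of the schema is roughly this (simplified sketch; other columns omitted):

    CREATE TABLE userdata_history (
        id bigint PRIMARY KEY
        -- ... other columns ...
    );

    CREATE TABLE userdata (
        id bigint PRIMARY KEY REFERENCES userdata_history (id)
        -- ... other columns ...
    );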

At the time of analysis, the userdata table had < 2,500 rows. userdata_history had < 50,000 rows. I can't imagine how even a seq scan could result in a runtime of nearly 5 minutes in these circumstances.
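If it helps with diagnosis, I can also compare those row counts against the on-disk size of the tables, e.g.:

    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname IN ('userdata', 'userdata_history');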

Also, doing a count(*) on each table individually returns nearly instantly.

I can provide details of postgresql.conf and kernel settings if necessary, but I'm using some pretty well-tested settings that I apply to any postgres installation I admin these days, scaled to the box's resources and database size. I'm more interested in knowing whether there are any bird's-eye details I should be checking immediately.
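For a rough idea, the kinds of settings I normally adjust look like this (values here are illustrative only, not necessarily what's on this box, and some parameter names differ between postgres versions):

    # postgresql.conf (illustrative values only)
    shared_buffers = 10000          # 8 kB pages, so roughly 80 MB
    sort_mem = 8192                 # kB per sort (work_mem in newer releases)
    effective_cache_size = 262144   # 8 kB pages, roughly 2 GB
    max_fsm_pages = 200000          # free space map capacity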

Thanks.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005



