[PERFORM] LIMIT confuses the planner (again)
Hello, I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to the one I've already described some time ago: http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php Again, adding a LIMIT clause to a query, which is normally executing very fast thanks to an index, makes it perform slow, because the planner no longer uses the correct index. I have the following table: CREATE TABLE message ( message_sid SERIAL PRIMARY KEY, from_profile_sid INT NOT NULL REFERENCES profile, to_profile_sid INT NOT NULL REFERENCES profile, sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE, receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE, body TEXT, datetime TIMESTAMP NOT NULL DEFAULT NOW() ); With the following conditional index: CREATE INDEX message_to_profile_idx ON message (to_profile_sid) WHERE NOT receiver_has_deleted; The query to obtain the list of received messages of a profile is simple and executes very fast, because of the index above: db=# EXPLAIN ANALYZE SELECT * FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted ORDER BY message_sid DESC; QUERY PLAN --- Sort (cost=11857.09..11866.19 rows=3640 width=277) (actual time=0.317..0.319 rows=15 loops=1) Sort Key: message_sid Sort Method: quicksort Memory: 32kB - Bitmap Heap Scan on message (cost=106.44..11641.78 rows=3640 width=277) (actual time=0.096..0.271 rows=15 loops=1) Recheck Cond: ((to_profile_sid = (-1)) AND (NOT receiver_has_deleted)) - Bitmap Index Scan on message_to_profile_idx (cost=0.00..105.53 rows=3640 width=0) (actual time=0.056..0.056 rows=21 loops=1) Index Cond: (to_profile_sid = (-1)) Total runtime: 0.383 ms (8 rows) Adding a LIMIT clause to exactly the same query slows its execution more than 20'000 times: db=# EXPLAIN ANALYZE SELECT * FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted ORDER BY message_sid DESC LIMIT 20; QUERY PLAN Limit (cost=0.00..6513.60 rows=20 width=277) (actual time=0.617..6576.539 rows=15 loops=1) - Index Scan Backward using message_pkey on message (cost=0.00..1185474.32 rows=3640 width=277) (actual time=0.617..6576.522 rows=15 loops=1) Filter: ((NOT receiver_has_deleted) AND (to_profile_sid = (-1))) Total runtime: 6576.572 ms (4 rows) Just as I was advised in my recent post, I've already increased the statistics of both fields all the way till 1000, analyzed the table and reindexed the index: ALTER TABLE message ALTER COLUMN to_profile_sid SET STATISTICS 1000; ALTER TABLE message ALTER COLUMN receiver_has_deleted SET STATISTICS 1000; ANALYZE message; REINDEX index message_to_profile_idx; This time, however, the steps above didn't affect the planner in any way, it still refuses to use the index message_to_profile_idx when a LIMIT is involved (for this particular value of to_profile_sid). Here's some statistical data: db=# SELECT COUNT(*) FROM message; count - 1312213 (1 row) db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1; count --- 5604 (1 row) db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted; count --- 15 (1 row) db=# SELECT COUNT(DISTINCT to_profile_sid) FROM message; count --- 8596 (1 row) db=# SELECT AVG(length) FROM (SELECT to_profile_sid, COUNT(*) AS length FROM message GROUP BY to_profile_sid) AS freq; avg -- 152.6540251279664960 (1 row) db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND attname='to_profile_sid'; n_distinct 6277 (1 row) Also, the value of -1 for to_profile_sid is second in the list of most_common_vals in pg_stats, but still I don't understand why a simple limit is blinding the planner for the good index. Any ideas? Regards, -- Kouber Saparev http://kouber.saparev.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] LIMIT confuses the planner (again)
On Mon, Sep 28, 2009 at 4:43 AM, Kouber Saparev kou...@saparev.com wrote: Hello, I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to the one I've already described some time ago: http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php Again, adding a LIMIT clause to a query, which is normally executing very fast thanks to an index, makes it perform slow, because the planner no longer uses the correct index. I have the following table: CREATE TABLE message ( message_sid SERIAL PRIMARY KEY, from_profile_sid INT NOT NULL REFERENCES profile, to_profile_sid INT NOT NULL REFERENCES profile, sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE, receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE, body TEXT, datetime TIMESTAMP NOT NULL DEFAULT NOW() ); With the following conditional index: CREATE INDEX message_to_profile_idx ON message (to_profile_sid) WHERE NOT receiver_has_deleted; The query to obtain the list of received messages of a profile is simple and executes very fast, because of the index above: db=# EXPLAIN ANALYZE SELECT * FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted ORDER BY message_sid DESC; QUERY PLAN --- Sort (cost=11857.09..11866.19 rows=3640 width=277) (actual time=0.317..0.319 rows=15 loops=1) Sort Key: message_sid Sort Method: quicksort Memory: 32kB - Bitmap Heap Scan on message (cost=106.44..11641.78 rows=3640 width=277) (actual time=0.096..0.271 rows=15 loops=1) Recheck Cond: ((to_profile_sid = (-1)) AND (NOT receiver_has_deleted)) - Bitmap Index Scan on message_to_profile_idx (cost=0.00..105.53 rows=3640 width=0) (actual time=0.056..0.056 rows=21 loops=1) Index Cond: (to_profile_sid = (-1)) Total runtime: 0.383 ms (8 rows) Adding a LIMIT clause to exactly the same query slows its execution more than 20'000 times: db=# EXPLAIN ANALYZE SELECT * FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted ORDER BY message_sid DESC LIMIT 20; QUERY PLAN Limit (cost=0.00..6513.60 rows=20 width=277) (actual time=0.617..6576.539 rows=15 loops=1) - Index Scan Backward using message_pkey on message (cost=0.00..1185474.32 rows=3640 width=277) (actual time=0.617..6576.522 rows=15 loops=1) Filter: ((NOT receiver_has_deleted) AND (to_profile_sid = (-1))) Total runtime: 6576.572 ms (4 rows) Just as I was advised in my recent post, I've already increased the statistics of both fields all the way till 1000, analyzed the table and reindexed the index: ALTER TABLE message ALTER COLUMN to_profile_sid SET STATISTICS 1000; ALTER TABLE message ALTER COLUMN receiver_has_deleted SET STATISTICS 1000; ANALYZE message; REINDEX index message_to_profile_idx; This time, however, the steps above didn't affect the planner in any way, it still refuses to use the index message_to_profile_idx when a LIMIT is involved (for this particular value of to_profile_sid). Here's some statistical data: db=# SELECT COUNT(*) FROM message; count - 1312213 (1 row) db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1; count --- 5604 (1 row) db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted; count --- 15 (1 row) db=# SELECT COUNT(DISTINCT to_profile_sid) FROM message; count --- 8596 (1 row) db=# SELECT AVG(length) FROM (SELECT to_profile_sid, COUNT(*) AS length FROM message GROUP BY to_profile_sid) AS freq; avg -- 152.6540251279664960 (1 row) db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND attname='to_profile_sid'; n_distinct 6277 (1 row) Also, the value of -1 for to_profile_sid is second in the list of most_common_vals in pg_stats, but still I don't understand why a simple limit is blinding the planner for the good index. Any ideas? It would be good to see what the planner's second choice would be, if it didn't have that other index. BEGIN; DROP INDEX message_pkey; EXPLAIN ANALYZE ... ROLLBACK; However, I suspect what's going on here is as follows. When trying to estimate the cost of LIMIT, the planner takes the startup cost for the subpath and a pro-rata share of the run cost, based on the number of rows being fetched as a fraction of the total number it believes to be present. So if the run cost is estimated to be lower than it really is, some other plan with a lower startup cost can look like a better choice, even if the run cost is much higher (because only a tiny fraction of
Re: [PERFORM] Postgres performance
std pik wrote: Hi all.. please, how can i tune postgres performance? Thanks. Thats a very generic question. Here are some generic answers: You can tune the hardware underneath. Faster hardware = faster pg. You can tune the memory usage, and other postgres.conf setting to match your hardware. See the online manuals. You can tune a single slow query, use explain analyze. -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG 8.3 and large shared buffer settings
On 9/26/09 8:19 AM, Greg Smith wrote: This means that the question you want an answer to is if the OS cache isn't really available, where does giving memory to shared_buffers becomes less efficient than not caching things at all? My guess is that this number is much larger than 10GB, but I don't think anyone has done any tests to try to quantify exactly where it is. Typically when people are talking about systems as large as yours, they're dedicated database servers at that point, so the OS cache gets considered at the same time. If it's effectively out of the picture, the spot where caching still helps even when it's somewhat inefficient due to buffer contention isn't well explored. It also depends on the filesystem. In testing at Sun and on this list, people have found that very large s_b (60% of RAM) plus directIO was actually a win on Solaris UFS, partly because UFS isn't very agressive or smart about readahead and caching. On Linux/Ext3, however, it was never a win. I don't know what AIX's filesystems are like. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres performance
Didn't see the original message so I replied to this one. On Mon, Sep 28, 2009 at 8:11 AM, Andy Colson a...@squeakycode.net wrote: std pik wrote: Hi all.. please, how can i tune postgres performance? Start here: http://www.westnet.com/~gsmith/content/postgresql/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance