[PERFORM] LIMIT confuses the planner (again)

2009-09-28 Thread Kouber Saparev

Hello,

I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to 
the one I've already described some time ago:

http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php

Again, adding a LIMIT clause to a query that normally executes very fast
thanks to an index makes it perform slowly, because the planner no
longer uses the correct index.


I have the following table:

CREATE TABLE message (
  message_sid SERIAL PRIMARY KEY,
  from_profile_sid INT NOT NULL REFERENCES profile,
  to_profile_sid INT NOT NULL REFERENCES profile,
  sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
  receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
  body TEXT,
  datetime TIMESTAMP NOT NULL DEFAULT NOW()
);


With the following partial index:

CREATE INDEX message_to_profile_idx ON message (to_profile_sid) WHERE 
NOT receiver_has_deleted;



The query to obtain the list of received messages for a profile is simple
and executes very fast thanks to the index above:


db=# EXPLAIN ANALYZE SELECT
  *
FROM
  message
WHERE
  to_profile_sid = -1
AND
  NOT receiver_has_deleted
ORDER BY
  message_sid DESC;
                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=11857.09..11866.19 rows=3640 width=277) (actual time=0.317..0.319 rows=15 loops=1)
   Sort Key: message_sid
   Sort Method:  quicksort  Memory: 32kB
   ->  Bitmap Heap Scan on message  (cost=106.44..11641.78 rows=3640 width=277) (actual time=0.096..0.271 rows=15 loops=1)
         Recheck Cond: ((to_profile_sid = (-1)) AND (NOT receiver_has_deleted))
         ->  Bitmap Index Scan on message_to_profile_idx  (cost=0.00..105.53 rows=3640 width=0) (actual time=0.056..0.056 rows=21 loops=1)
               Index Cond: (to_profile_sid = (-1))
 Total runtime: 0.383 ms
(8 rows)


Adding a LIMIT clause to exactly the same query slows its execution down
by a factor of roughly 17,000 (0.383 ms vs. 6576 ms):


db=# EXPLAIN ANALYZE SELECT
  *
FROM
  message
WHERE
  to_profile_sid = -1
AND
  NOT receiver_has_deleted
ORDER BY
  message_sid DESC LIMIT 20;

                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6513.60 rows=20 width=277) (actual time=0.617..6576.539 rows=15 loops=1)
   ->  Index Scan Backward using message_pkey on message  (cost=0.00..1185474.32 rows=3640 width=277) (actual time=0.617..6576.522 rows=15 loops=1)
         Filter: ((NOT receiver_has_deleted) AND (to_profile_sid = (-1)))
 Total runtime: 6576.572 ms
(4 rows)


As I was advised in that earlier thread, I've already increased the
statistics target of both columns to 1000, analyzed the table, and
reindexed the index:


ALTER TABLE message ALTER COLUMN to_profile_sid SET STATISTICS 1000;
ALTER TABLE message ALTER COLUMN receiver_has_deleted SET STATISTICS 1000;
ANALYZE message;
REINDEX index message_to_profile_idx;


This time, however, the steps above didn't affect the planner at all;
it still refuses to use the index message_to_profile_idx when a
LIMIT is involved (for this particular value of to_profile_sid).
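
Two workarounds are commonly suggested for this kind of planner behaviour;
neither is part of the original report, so treat both as sketches. One is a
composite partial index that satisfies the filter and the ORDER BY at the
same time, so the planner can fetch the top rows directly in index order:

CREATE INDEX message_to_profile_ordered_idx   -- name is illustrative only
  ON message (to_profile_sid, message_sid DESC)
  WHERE NOT receiver_has_deleted;

The other is to keep the planner from matching the ORDER BY against the
primary-key index by perturbing the sort expression:

SELECT *
FROM message
WHERE to_profile_sid = -1
  AND NOT receiver_has_deleted
ORDER BY message_sid + 0 DESC
LIMIT 20;

Because "message_sid + 0" is an expression rather than the bare column, a
backward scan over message_pkey no longer produces the requested order for
free, and the planner typically falls back to the partial-index plan plus a
cheap explicit sort of the handful of matching rows.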


Here's some statistical data:

db=# SELECT COUNT(*) FROM message;
  count
---------
 1312213
(1 row)

db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1;
 count
-------
  5604
(1 row)

db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted;
 count
-------
    15
(1 row)

db=# SELECT COUNT(DISTINCT to_profile_sid) FROM message;
 count
-------
  8596
(1 row)

db=# SELECT AVG(length) FROM (SELECT to_profile_sid, COUNT(*) AS length FROM message GROUP BY to_profile_sid) AS freq;
         avg
----------------------
 152.6540251279664960
(1 row)

db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND attname='to_profile_sid';
 n_distinct
------------
       6277
(1 row)
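
To see what frequency the planner is assuming for -1 specifically, the
per-value statistics can be inspected as well (a quick sketch; pg_stats
exposes these columns):

SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'message' AND attname = 'to_profile_sid';

The plans above estimate 3640 rows for the combined condition, about 0.28%
of the 1,312,213-row table, while the true count is 15. Lacking any
cross-column statistics, the planner treats to_profile_sid and
receiver_has_deleted as independent, so it cannot know that almost all of
the -1 messages have already been deleted by their receiver.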


Also, the value of -1 for to_profile_sid is second in the list of
most_common_vals in pg_stats, but I still don't understand why a simple
LIMIT blinds the planner to the good index. Any ideas?


Regards,
--
Kouber Saparev
http://kouber.saparev.com/



Re: [PERFORM] LIMIT confuses the planner (again)

2009-09-28 Thread Robert Haas
On Mon, Sep 28, 2009 at 4:43 AM, Kouber Saparev kou...@saparev.com wrote:
 Hello,

 I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to the
 one I've already described some time ago:
 http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php

 Again, adding a LIMIT clause to a query, which is normally executing very
 fast thanks to an index, makes it perform slow, because the planner no
 longer uses the correct index.

 [...]

 Also, the value of -1 for to_profile_sid is second in the list of
 most_common_vals in pg_stats, but still I don't understand why a simple
 limit is blinding the planner for the good index. Any ideas?

It would be good to see what the planner's second choice would be, if
it didn't have that other index.

BEGIN;
DROP INDEX message_pkey;
EXPLAIN ANALYZE ...
ROLLBACK;
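
Since message_pkey is the index backing the PRIMARY KEY constraint, dropping
it directly will most likely be refused with a hint to drop the constraint
instead. Spelled out with the query from the report (a sketch meant for a
test copy, since dropping the constraint takes an exclusive lock), the
experiment would look roughly like:

BEGIN;
-- drops the backing index too; would need CASCADE if other tables
-- referenced message(message_sid)
ALTER TABLE message DROP CONSTRAINT message_pkey;
EXPLAIN ANALYZE
SELECT *
FROM message
WHERE to_profile_sid = -1
  AND NOT receiver_has_deleted
ORDER BY message_sid DESC
LIMIT 20;
ROLLBACK;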

However, I suspect what's going on here is as follows.  When trying to
estimate the cost of LIMIT, the planner takes the startup cost for the
subpath and a pro-rata share of the run cost, based on the number of
rows being fetched as a fraction of the total number it believes to be
present.  So if the run cost is estimated to be lower than it really
is, some other plan with a lower startup cost can look like a better
choice, even if the run cost is much higher (because only a tiny
fraction of 
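
As a rough check of that arithmetic against the plan in the report: the
backward scan over message_pkey is costed at about 1185474, the planner
believes 3640 rows match the filter, and only 20 are requested, so the
Limit node is costed at roughly 1185474 * 20 / 3640, which is about 6513
and matches the cost=0.00..6513.60 shown. Because only 15 rows actually
match rather than 3640, the scan never collects 20 rows and ends up walking
back through essentially the whole primary-key index, which is where the
6.5 seconds go.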

Re: [PERFORM] Postgres performance

2009-09-28 Thread Andy Colson

std pik wrote:

Hi all..
Please, how can I tune Postgres performance?

Thanks.


That's a very generic question.  Here are some generic answers:

You can tune the hardware underneath.  Faster hardware = faster pg.

You can tune the memory usage and other postgresql.conf settings to match
your hardware.  See the online manuals.
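
For example (generic starting points only, not values recommended anywhere
in this thread), the settings people usually look at first can be checked
from psql before editing postgresql.conf:

SHOW shared_buffers;
SHOW effective_cache_size;
SHOW work_mem;
SHOW maintenance_work_mem;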


You can tune a single slow query; use EXPLAIN ANALYZE.
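
For instance (a purely illustrative query; the table and column names here
are made up):

EXPLAIN ANALYZE
SELECT *
FROM some_table
WHERE some_column = 42;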

-Andy



Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-28 Thread Josh Berkus
On 9/26/09 8:19 AM, Greg Smith wrote:
 This means that the question you want an answer to is: if the OS cache
 isn't really available, where does giving memory to shared_buffers
 become less efficient than not caching things at all?  My guess is
 that this number is much larger than 10GB, but I don't think anyone has
 done any tests to try to quantify exactly where it is.  Typically when
 people are talking about systems as large as yours, they're dedicated
 database servers at that point, so the OS cache gets considered at the
 same time. If it's effectively out of the picture, the spot where
 caching still helps even when it's somewhat inefficient due to buffer
 contention isn't well explored.

It also depends on the filesystem.  In testing at Sun and on this list,
people have found that very large s_b (60% of RAM) plus directIO was
actually a win on Solaris UFS, partly because UFS isn't very aggressive
or smart about readahead and caching.  On Linux/Ext3, however, it was
never a win.

I don't know what AIX's filesystems are like.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [PERFORM] Postgres performance

2009-09-28 Thread Scott Marlowe
Didn't see the original message so I replied to this one.

On Mon, Sep 28, 2009 at 8:11 AM, Andy Colson a...@squeakycode.net wrote:
 std pik wrote:

 Hi all..
 please, how can i tune postgres performance?

Start here:
http://www.westnet.com/~gsmith/content/postgresql/
