Re: [PERFORM] Slow query postgres 8.3

2011-04-09 Thread tv
 Hi,

 I am trying to tune a query that is taking too much time on a large
 dataset (postgres 8.3).


Hi, run ANALYZE on the tables used in the query - the stats are very off,
so the db chooses a really bad execution plan.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why it is using/not using index scan?

2011-04-09 Thread Tomas Vondra
Dne 31.3.2011 19:26, Laszlo Nagy napsal(a):
 For this query:
 
 select pp.id,pp.product_id,pp.selling_site_id,pp.asin
 from product_price pp
 where
 (pp.asin is not null and pp.asin'')
 and (pp.upload_status_id1)
 and pp.selling_site_id in (8,7,35,6,9)
 and (pp.last_od  'now'::timestamp - '1 week'::interval )
 limit 5000
 
 Query plan is:
 
 Limit  (cost=9182.41..77384.80 rows=3290 width=35)
   -  Bitmap Heap Scan on product_price pp  (cost=9182.41..77384.80
 rows=3290 width=35)
 Recheck Cond: ((last_od  '2011-03-24
 13:05:09.540025'::timestamp without time zone) AND (selling_site_id =
 ANY ('{8,7,35,6,9}'::bigint[])))
 Filter: ((asin IS NOT NULL) AND (asin  ''::text) AND
 (upload_status_id  1))
 -  Bitmap Index Scan on idx_product_price_last_od_ss 
 (cost=0.00..9181.59 rows=24666 width=0)
   Index Cond: ((last_od  '2011-03-24
 13:05:09.540025'::timestamp without time zone) AND (selling_site_id =
 ANY ('{8,7,35,6,9}'::bigint[])))
 
 For this query:
 
 select pp.id,pp.product_id,pp.selling_site_id,pp.asin
 from product_price pp
 where
 (pp.asin is not null and pp.asin'')
 and (pp.upload_status_id1)
 and pp.selling_site_id in (8,7,35,6,9)
 and (pp.last_od + '1 week'::interval  'now'::timestamp )
 limit 5000
 
 Query plan is:
 
 Limit  (cost=0.00..13890.67 rows=5000 width=35)
   -  Seq Scan on product_price pp  (cost=0.00..485889.97 rows=174898
 width=35)
 Filter: ((asin IS NOT NULL) AND (asin  ''::text) AND
 (upload_status_id  1) AND ((last_od + '7 days'::interval) 
 '2011-03-31 13:06:17.460013'::timestamp without time zone) AND
 (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))
 
 
 The only difference is this: instead of (pp.last_od  'now'::timestamp -
 '1 week'::interval ) I have used (pp.last_od + '1 week'::interval 
 'now'::timestamp )

That's the only difference as you see it - the planner actually found
out the former query is expected to return 3290 rows while the latter
one is expected to return 174898 rows. That's the reason why the second
query is using seqscan instead of index scan - for a lot of rows, the
index scan tends to be very inefficient.

Next time post EXPLAIN ANALYZE output, as it provides data from the
actual run, so we can see if there are any issues with those estimates.

Anyway, you may try to disable sequential scans (just run 'set
enable_seqacan=off' before running the query) and you'll see if index
scan really would be better.

 First query with index scan opens in 440msec. The second query with seq
 scan opens in about 22 seconds. So the first one is about 50x faster.

Every database/planner has some weaknesses - it may be the case that an
index scan would be faster but postgresql is not able to use it in this
case for some reason.

 My concern is that we are working on a huge set of applications that use
 thousands of different queries on a database. There are programs that we
 wrote years ago. The database structure continuously changing. We are
 adding new indexes and columns, and of course we are upgrading
 PostgreSQL when a new stable version comes out. There are cases when a
 change in a table affects 500+ queries in 50+ programs. I really did not
 think that I have to be THAT CAREFUL with writing conditions in SQL. Do
 I really have to manually analyze all those queries and correct
 conditions like this?

You have to be that careful, and it's not specific to PostgreSQL. I'm
working with other databases and the same holds for them - SQL looks so
simple that a chimp might learn it, but only the best chimps may produce
good queries.

 If so, then at least I would like to know if there is a documentation or
 wiki page where I can learn about how not to write conditions. I just
 figured out that I need to put constant expressions on one side of any
 comparison, if possible. But probably there are other rules I wouldn't
 think of.

I'm not aware of such official document. I've planned to write an
article 10 ways to ruin your query but I somehow forgot about it.
Anyway it's mostly common sense, i.e. once you know how indexes work
you'll immediately see if a condition may benefit from them or not.

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Chris Ruprecht
I have a table that I need to rebuild indexes on from time to time (records get 
loaded before indexes get build).

To build the indexes, I use 'create index ...', which reads the entire table 
and builds the index, one at a time.
I'm wondering if there is a way to build these indexes in parallel while 
reading the table only once for all indexes and building them all at the same 
time. Is there an index build tool that I missed somehow, that can do this?

Thanks,
Chris. 



best regards,
chris
-- 
chris ruprecht
database grunt and bit pusher extraordinaíre


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Tom Lane
Chris Ruprecht ch...@ruprecht.org writes:
 I have a table that I need to rebuild indexes on from time to time (records 
 get loaded before indexes get build).
 To build the indexes, I use 'create index ...', which reads the entire table 
 and builds the index, one at a time.
 I'm wondering if there is a way to build these indexes in parallel while 
 reading the table only once for all indexes and building them all at the same 
 time. Is there an index build tool that I missed somehow, that can do this?

I don't know of any automated tool, but if you launch several CREATE
INDEX operations on the same table at approximately the same time (in
separate sessions), they should share the I/O required to read the
table.  (The synchronized scans feature guarantees this in recent
PG releases, even if you're not very careful about starting them at
the same time.)

The downside of that is that you need N times the working memory and
you will have N times the subsidiary I/O for sort temp files and writes
to the finished indexes.  Depending on the characteristics of your I/O
system it's not hard to imagine this being a net loss ... but it'd be
interesting to experiment.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Chris Ruprecht
I'm running 2 tests now, one, where I'm doing the traditional indexing, in 
sequence. The server isn't doing anything else, so I should get pretty accurate 
results.
Test 2 will win all the create index sessions in separate sessions in parallel 
(echo create index ...|psql ...  ) once the 'serial build' test is done.

Maybe, in a future release, somebody will develop something that can create 
indexes as inactive and have a build tool build and activate them at the same 
time. Food for thought?
 
On Apr 9, 2011, at 13:10 , Tom Lane wrote:

 Chris Ruprecht ch...@ruprecht.org writes:
 I have a table that I need to rebuild indexes on from time to time (records 
 get loaded before indexes get build).
 To build the indexes, I use 'create index ...', which reads the entire table 
 and builds the index, one at a time.
 I'm wondering if there is a way to build these indexes in parallel while 
 reading the table only once for all indexes and building them all at the 
 same time. Is there an index build tool that I missed somehow, that can do 
 this?
 
 I don't know of any automated tool, but if you launch several CREATE
 INDEX operations on the same table at approximately the same time (in
 separate sessions), they should share the I/O required to read the
 table.  (The synchronized scans feature guarantees this in recent
 PG releases, even if you're not very careful about starting them at
 the same time.)
 
 The downside of that is that you need N times the working memory and
 you will have N times the subsidiary I/O for sort temp files and writes
 to the finished indexes.  Depending on the characteristics of your I/O
 system it's not hard to imagine this being a net loss ... but it'd be
 interesting to experiment.
 
   regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance