Some of our queries to the database are taking a long time to return results.

fsync = off (even when we tested with this parameter on, we observed the same slowness)

If your queries take a long time to return results, I suppose you are talking about SELECTs.

fsync = off will not make SELECTs faster (only INSERTs, UPDATEs, DELETEs), and it is not worth it anyway: you risk corrupting your database if the server crashes.

synchronous_commit = off gives about the same speed advantage as fsync = off, but without the corruption risk (at worst you lose the last few transactions if the server crashes), so it is much better!
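
For example, a minimal sketch (you can set it globally in postgresql.conf, or only for the session doing the bulk inserts):

-- postgresql.conf:  synchronous_commit = off
-- or per session:
SET synchronous_commit TO off;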


We have 300k rows in the PolledData table. In each STATSDATA table, we have almost 12 to 13 million rows.

        OK. So you insert 13 million rows per day?
        That is about 150 rows per second.

At every one-minute interval, we insert data into the STATSDATA table.

I assume you are making a single INSERT INTO statsdata VALUES (...... 150 values .....)
        and not 150 separate INSERTs, yes?
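
That is, one multi-row statement like this (a sketch only; the column names pollid, ttime and val are assumptions, your post does not show them):

INSERT INTO statsdata8_21_2009 (pollid, ttime, val)
VALUES (101, 1250838027454, 1.5),
       (102, 1250838027454, 2.7),
       (103, 1250838027454, 0.3);  -- ...one row per polled value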

First Query:
SELECT COUNT(*) FROM (
        SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
        FROM PolledData, STATSDATA8_21_2009 WHERE
        ( ( PolledData.ID=STATSDATA8_21_2009.POLLID)
        AND ( ( TTIME >= 1250838027454)
        AND ( TTIME <=1250838079654) ) ) ) t1;

* You could rewrite it as:

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN STATSDATA8_21_2009 ON ( PolledData.ID = STATSDATA8_21_2009.POLLID)
WHERE TTIME BETWEEN ... AND ...

- It is exactly the same query, but much easier to read.

* Some ANALYZE-ing of your tables would be useful, since the estimates from the planner look suspiciously different from reality.
- ANALYZE is fast, so you can run it often if you INSERT rows all the time (see below).
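
For example:

ANALYZE polleddata;
ANALYZE statsdata8_21_2009;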

* You are joining on POLLID, which is a NUMERIC in one table and a BIGINT in the other table.
- Is there any reason for this type difference?
- Could you use BIGINT in both tables? (See the sketch below.)
- BIGINT is faster than NUMERIC and uses less space.
- Type conversions use CPU cycles too.
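
A sketch (assuming POLLID is the NUMERIC column; note this rewrites the table and holds an exclusive lock while it runs):

ALTER TABLE statsdata8_21_2009 ALTER COLUMN pollid TYPE BIGINT;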

* Should StatsData.POLLID have a foreign key REFERENCES PolledData.ID?
- This won't make the query faster, but if you know all rows in StatsData reference rows in PolledData (because of the FK constraint) and you only want a count(*) like above, you don't need the JOIN at all. A sketch follows.
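
Something like this (constraint name assumed; the FK requires a primary key or unique constraint on PolledData.ID):

ALTER TABLE statsdata8_21_2009
        ADD CONSTRAINT statsdata8_21_2009_pollid_fkey
        FOREIGN KEY (pollid) REFERENCES polleddata (id);

-- with the FK guaranteeing a match for every row, the count needs no join:
SELECT COUNT(*) FROM statsdata8_21_2009
WHERE ttime BETWEEN 1250838027454 AND 1250838079654;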

* TTIME >= 1250838027454 AND TTIME <= 1250838079654
- TTIME should be TIMESTAMP (with or without TIME ZONE) or BIGINT, but certainly not NUMERIC.
- An index on StatsData.TTIME would be useful: it would avoid the Seq Scan, replacing it with a Bitmap Scan, which is much faster (see below).
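
For example (index name assumed):

CREATE INDEX statsdata8_21_2009_ttime_idx ON statsdata8_21_2009 (ttime);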

* work_mem
- Since you have few connections, you could increase work_mem (example below).
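
For example (the value is just an assumption; size it to your RAM and number of connections):

SET work_mem = '64MB';
-- or globally, in postgresql.conf:  work_mem = 64MB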

Second Query:

        Same as the first query.

Third Query:

SELECT COUNT(*) FROM (
        SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
        FROM PolledData, STATSDATA8_21_2009
        WHERE ( ( PolledData.ID=STATSDATA8_21_2009.POLLID)
        AND ( ( TTIME >= 1250838027454) AND ( TTIME <= 1250838027454) ) )
        UNION ALL
        SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
        FROM PolledData, STATSDATA8_20_2009
        WHERE ( ( PolledData.ID=STATSDATA8_20_2009.POLLID)
        AND ( ( TTIME >= 1250767134601) AND ( TTIME <= 1250767134601) ) )
) t1;

Basically this is, again, exactly the same query as above, but run twice (once per daily table) and UNION ALL'ed.

* You could rewrite it like this:

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN (
        SELECT * FROM STATSDATA8_21_2009 WHERE TTIME BETWEEN ... AND ...
        UNION ALL
        SELECT * FROM STATSDATA8_20_2009 WHERE TTIME BETWEEN ... AND ...
) s ON ( PolledData.ID = s.POLLID )

* If TTIME is the current time, and you insert data as it comes, data in the StatsData tables is probably already ordered on TTIME.
- If that is not the case, once a table is filled and becomes read-only, consider CLUSTERing it on the index you created on TTIME (see below).
- That will make range queries on TTIME much faster.
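
A sketch, reusing the TTIME index suggested above (name assumed):

CLUSTER statsdata8_20_2009 USING statsdata8_20_2009_ttime_idx;
-- on PostgreSQL versions before 8.3:
-- CLUSTER statsdata8_20_2009_ttime_idx ON statsdata8_20_2009;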

* Query plan
Seq Scan on statsdata8_21_2009 (cost=0.00..70574.88 rows=1 width=32) (actual time=0.047..29066.227 rows=227 loops=1)
Seq Scan on statsdata8_20_2009 (cost=0.00..382519.60 rows=1 width=32) (actual time=3136.008..93985.540 rows=1 loops=1)

Postgres thinks there is 1 row in those tables... that's probably not the case!
The first one actually returns 227 rows, so the plan chosen is a catastrophe.

I was a bit intrigued by your query, so I made a little test...

BEGIN;
CREATE TABLE test( x INT, y INT );
INSERT INTO test (SELECT n,n FROM generate_series( 1,1000000 ) AS n );
CREATE INDEX test_x ON test( x );
CREATE INDEX test_y ON test( y );
COMMIT;

ANALYZE test;

test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE a.x BETWEEN 0 AND 10000;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=480.53..23759.14 rows=10406 width=16) (actual time=15.614..1085.085 rows=10000 loops=1)
   Hash Cond: (b.x = a.x)
   ->  Seq Scan on test b  (cost=0.00..14424.76 rows=999976 width=8) (actual time=0.013..477.516 rows=1000000 loops=1)
   ->  Hash  (cost=350.46..350.46 rows=10406 width=8) (actual time=15.581..15.581 rows=10000 loops=1)
         ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.062..8.537 rows=10000 loops=1)
               Index Cond: ((x >= 0) AND (x <= 10000))
 Total runtime: 1088.462 ms
(7 rows)

test=> set enable_seqscan TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE a.x BETWEEN 0 AND 10000;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..30671.03 rows=10406 width=16) (actual time=0.075..85.897 rows=10000 loops=1)
   ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.066..8.377 rows=10000 loops=1)
         Index Cond: ((x >= 0) AND (x <= 10000))
   ->  Index Scan using test_x on test b  (cost=0.00..2.90 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=10000)
         Index Cond: (b.x = a.x)
 Total runtime: 90.160 ms
(6 rows)

test=> set enable_nestloop TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE a.x BETWEEN 0 AND 10000;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=0.00..31200.45 rows=10406 width=16) (actual time=0.081..35.735 rows=10000 loops=1)
   Merge Cond: (a.x = b.x)
   ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.059..8.093 rows=10000 loops=1)
         Index Cond: ((x >= 0) AND (x <= 10000))
   ->  Index Scan using test_x on test b  (cost=0.00..28219.98 rows=999976 width=8) (actual time=0.016..7.494 rows=10001 loops=1)
 Total runtime: 40.013 ms
(6 rows)


I wonder why it doesn't choose the merge join at first...




