Some of our queries to the database are taking a long time to return results.

fsync = off (even when we tested with this parameter on, we observed the same slowness)

If your queries take a long time to return results, I suppose you are talking about SELECTs.

fsync = off will not make SELECTs faster (only INSERTs, UPDATEs, DELETEs), and it is not worth it anyway: you risk corrupting your database if the server crashes.

synchronous_commit = off gives about the same speed advantage as fsync = off, but without the corruption risk (at worst you lose the last few transactions if the server crashes), so it is much better!
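
For example, a minimal sketch (you can set it globally in postgresql.conf, or only for the session doing the bulk inserts):

-- postgresql.conf:  synchronous_commit = off
-- or per session:
SET synchronous_commit TO off;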


We have 300k rows in the PolledData table. In each STATSDATA table, we have almost 12 to 13 million rows.

        OK. So you insert 13 million rows per day?
        That is about 150 rows per second.

At every one-minute interval, we insert data into the STATSDATA table.

I assume you are making a single INSERT INTO statsdata VALUES (...... 150 values .....)
        and not 150 separate INSERTs, yes?
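
That is, one multi-row statement like this (a sketch only; the column names pollid, ttime and val are assumptions, your post does not show them):

INSERT INTO statsdata8_21_2009 (pollid, ttime, val)
VALUES (101, 1250838027454, 1.5),
       (102, 1250838027454, 2.7),
       (103, 1250838027454, 0.3);  -- ...one row per polled value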

First Query:
SELECT COUNT(*) FROM (
        SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
        FROM PolledData, STATSDATA8_21_2009 WHERE
        ( ( PolledData.ID=STATSDATA8_21_2009.POLLID)
        AND ( ( TTIME >= 1250838027454)
        AND ( TTIME <=1250838079654) ) ) ) t1;

* You could rewrite it as:

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN STATSDATA8_21_2009 ON ( PolledData.ID = STATSDATA8_21_2009.POLLID)
WHERE TTIME BETWEEN ... AND ...

- It is exactly the same query, but much easier to read.

* Some ANALYZE-ing of your tables would be useful, since the estimates from the planner look suspiciously different from reality.
- ANALYZE is fast, so you can run it often if you INSERT rows all the time (see below).
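
For example:

ANALYZE polleddata;
ANALYZE statsdata8_21_2009;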

* You are joining on POLLID, which is a NUMERIC in one table and a BIGINT in the other table.
- Is there any reason for this type difference?
- Could you use BIGINT in both tables? (See the sketch below.)
- BIGINT is faster than NUMERIC and uses less space.
- Type conversions use CPU cycles too.
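
A sketch (assuming POLLID is the NUMERIC column; note this rewrites the table and holds an exclusive lock while it runs):

ALTER TABLE statsdata8_21_2009 ALTER COLUMN pollid TYPE BIGINT;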

* Should StatsData.POLLID have a foreign key REFERENCES PolledData.ID?
- This won't make the query faster, but if you know all rows in StatsData reference rows in PolledData (because of the FK constraint) and you only want a count(*) like above, you don't need the JOIN at all. A sketch follows.
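
Something like this (constraint name assumed; the FK requires a primary key or unique constraint on PolledData.ID):

ALTER TABLE statsdata8_21_2009
        ADD CONSTRAINT statsdata8_21_2009_pollid_fkey
        FOREIGN KEY (pollid) REFERENCES polleddata (id);

-- with the FK guaranteeing a match for every row, the count needs no join:
SELECT COUNT(*) FROM statsdata8_21_2009
WHERE ttime BETWEEN 1250838027454 AND 1250838079654;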

* TTIME >= 1250838027454 AND TTIME <= 1250838079654
- TTIME should be TIMESTAMP (with or without TIME ZONE) or BIGINT, but certainly not NUMERIC.
- An index on StatsData.TTIME would be useful: it would avoid the Seq Scan, replacing it with a Bitmap Scan, which is much faster (see below).
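
For example (index name assumed):

CREATE INDEX statsdata8_21_2009_ttime_idx ON statsdata8_21_2009 (ttime);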

* work_mem
- Since you have few connections, you could increase work_mem (example below).
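
For example (the value is just an assumption; size it to your RAM and number of connections):

SET work_mem = '64MB';
-- or globally, in postgresql.conf:  work_mem = 64MB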

Second Query:

        Same as the first query.

Third Query:

SELECT COUNT(*) FROM (
        SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
        FROM PolledData, STATSDATA8_21_2009
        WHERE ( ( PolledData.ID=STATSDATA8_21_2009.POLLID)
        AND ( ( TTIME >= 1250838027454) AND ( TTIME <= 1250838027454) ) )
        UNION ALL
        SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
        FROM PolledData, STATSDATA8_20_2009
        WHERE ( ( PolledData.ID=STATSDATA8_20_2009.POLLID)
        AND ( ( TTIME >= 1250767134601) AND ( TTIME <= 1250767134601) ) )
) t1;

Basically this is, again, exactly the same query as above, but run twice (once per daily table) and UNION ALL'ed.

* You could rewrite it like this:

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN (
        SELECT * FROM STATSDATA8_21_2009 WHERE TTIME BETWEEN ... AND ...
        UNION ALL
        SELECT * FROM STATSDATA8_20_2009 WHERE TTIME BETWEEN ... AND ...
) s ON ( PolledData.ID = s.POLLID )

* If TTIME is the current time, and you insert data as it comes, data in the StatsData tables is probably already ordered on TTIME.
- If that is not the case, once a table is filled and becomes read-only, consider CLUSTERing it on the index you created on TTIME (see below).
- That will make range queries on TTIME much faster.
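
A sketch, reusing the TTIME index suggested above (name assumed):

CLUSTER statsdata8_20_2009 USING statsdata8_20_2009_ttime_idx;
-- on PostgreSQL versions before 8.3:
-- CLUSTER statsdata8_20_2009_ttime_idx ON statsdata8_20_2009;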

* Query plan
Seq Scan on statsdata8_21_2009 (cost=0.00..70574.88 rows=1 width=32) (actual time=0.047..29066.227 rows=227 loops=1)
Seq Scan on statsdata8_20_2009 (cost=0.00..382519.60 rows=1 width=32) (actual time=3136.008..93985.540 rows=1 loops=1)

Postgres thinks there is 1 row in those tables... that's probably not the case!
The first one actually returns 227 rows, so the plan chosen is a catastrophe.

I was a bit intrigued by your query, so I made a little test...

BEGIN;
CREATE TABLE test( x INT, y INT );
INSERT INTO test (SELECT n,n FROM generate_series( 1,1000000 ) AS n );
CREATE INDEX test_x ON test( x );
CREATE INDEX test_y ON test( y );
COMMIT;

ANALYZE test;

test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE a.x BETWEEN 0 AND 10000;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=480.53..23759.14 rows=10406 width=16) (actual time=15.614..1085.085 rows=10000 loops=1)
   Hash Cond: (b.x = a.x)
   ->  Seq Scan on test b  (cost=0.00..14424.76 rows=999976 width=8) (actual time=0.013..477.516 rows=1000000 loops=1)
   ->  Hash  (cost=350.46..350.46 rows=10406 width=8) (actual time=15.581..15.581 rows=10000 loops=1)
         ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.062..8.537 rows=10000 loops=1)
               Index Cond: ((x >= 0) AND (x <= 10000))
 Total runtime: 1088.462 ms
(7 rows)

test=> set enable_seqscan TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE a.x BETWEEN 0 AND 10000;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..30671.03 rows=10406 width=16) (actual time=0.075..85.897 rows=10000 loops=1)
   ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.066..8.377 rows=10000 loops=1)
         Index Cond: ((x >= 0) AND (x <= 10000))
   ->  Index Scan using test_x on test b  (cost=0.00..2.90 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=10000)
         Index Cond: (b.x = a.x)
 Total runtime: 90.160 ms
(6 rows)

test=> set enable_nestloop TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE a.x BETWEEN 0 AND 10000;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=0.00..31200.45 rows=10406 width=16) (actual time=0.081..35.735 rows=10000 loops=1)
   Merge Cond: (a.x = b.x)
   ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.059..8.093 rows=10000 loops=1)
         Index Cond: ((x >= 0) AND (x <= 10000))
   ->  Index Scan using test_x on test b  (cost=0.00..28219.98 rows=999976 width=8) (actual time=0.016..7.494 rows=10001 loops=1)
 Total runtime: 40.013 ms
(6 rows)


I wonder why it doesn't choose the merge join at first...




