[PERFORM] database performance and query performance question

2004-01-22 Thread Shea,Dan [CIS]
Our database has slowed right down.  We are not getting any performance out of
our biggest table, forecastelement.
The table has 93,218,671 records in it and climbing.
The index is on 4 columns; originally it was on 3.  I added another to see
if it would improve performance.  It did not.
Should there be fewer columns in the index?
How can we improve database performance?
How should I improve my query?

PWFPM_DEV=# \d forecastelement
           Table "public.forecastelement"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 version        | character varying(99)       | not null
 origin         | character varying(10)       | not null
 timezone       | character varying(99)       | not null
 region_id      | character varying(20)       | not null
 wx_element     | character varying(99)       | not null
 value          | character varying(99)       | not null
 flag           | character(3)                | not null
 units          | character varying(99)       | not null
 valid_time     | timestamp without time zone | not null
 issue_time     | timestamp without time zone | not null
 next_forecast  | timestamp without time zone | not null
 reception_time | timestamp without time zone | not null
Indexes:
    "forecastelement_vrwi_idx" btree (valid_time, region_id, wx_element, issue_time)

explain analyze select DISTINCT ON (valid_time)
       to_char(valid_time,'MMDDHH24MISS') as valid_time, value
from (select valid_time, value, time(valid_time) as hour, reception_time,
             issue_time
      from forecastelement
      where valid_time between '2002-09-02 04:00:00' and '2002-09-07 03:59:59'
        and region_id = 'PU-REG-WTO-00200'
        and wx_element = 'TEMP_VALEUR1'
        and issue_time between '2002-09-02 05:00:00' and '2002-09-06 05:00:00'
        and origin = 'REGIONAL'
        and time(issue_time) = '05:00:00'
      order by issue_time, reception_time DESC, valid_time) as foo
where (date(valid_time) = date(issue_time)+1 -1
       or date(valid_time) = date(issue_time)+1
       or (valid_time between '2002-09-07 00:00:00' and '2002-09-07 03:59:59'
           and issue_time = '2002-09-06 05:00:00'))
order by valid_time, issue_time DESC;

USING INDEX forecastelement_vrwi_idx btree (valid_time, region_id, wx_element, issue_time)
 Unique  (cost=116.75..116.76 rows=1 width=83) (actual time=9469.088..9470.002 rows=115 loops=1)
   ->  Sort  (cost=116.75..116.75 rows=1 width=83) (actual time=9469.085..9469.308 rows=194 loops=1)
         Sort Key: to_char(valid_time, 'MMDDHH24MISS'::text), issue_time
         ->  Subquery Scan foo  (cost=116.72..116.74 rows=1 width=83) (actual time=9465.979..9467.735 rows=194 loops=1)
               ->  Sort  (cost=116.72..116.73 rows=1 width=30) (actual time=9440.756..9440.981 rows=194 loops=1)
                     Sort Key: issue_time, reception_time, valid_time
                     ->  Index Scan using forecastelement_vrwi_idx on forecastelement  (cost=0.00..116.71 rows=1 width=30) (actual time=176.510..9439.470 rows=194 loops=1)
                           Index Cond: ((valid_time >= '2002-09-02 04:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone) AND ((region_id)::text = 'PU-REG-WTO-00200'::text) AND ((wx_element)::text = 'TEMP_VALEUR1'::text) AND (issue_time >= '2002-09-02 05:00:00'::timestamp without time zone) AND (issue_time <= '2002-09-06 05:00:00'::timestamp without time zone))
                           Filter: (((origin)::text = 'REGIONAL'::text) AND (time(issue_time) = '05:00:00'::time without time zone) AND ((date(valid_time) = ((date(issue_time) + 1) - 1)) OR (date(valid_time) = (date(issue_time) + 1)) OR ((valid_time >= '2002-09-07 00:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone) AND (issue_time = '2002-09-06 05:00:00'::timestamp without time zone))))
 Total runtime: 9470.404 ms

We are running postgresql-7.4-0.5PGDG.i386.rpm on a Dell PowerEdge 6650.
System:
  OS: RHAS 3.0
  CPUs: 4
  Memory: 3.6 GB
  Disk: 270 GB RAID 5

postgresql.conf
max_connections = 64 
shared_buffers = 4000 
vacuum_mem = 32768 
effective_cache_size = 312500   
random_page_cost = 2



Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Josh Berkus
Dan,

> Should there be fewer columns in the index?
> How can we improve database performance?
> How should I improve my query?

Your query plan isn't the problem.  It's a good plan, and a reasonably 
efficient query.   Under other circumstances, the SELECT DISTINCT with the 
to_char could be a performance-killer, but it's not in that result set.

Overall, you're taking 9 seconds to scan 93 million records.  Is this the time
for the first run of the query, or for the 2nd and successive runs?

When did you last run VACUUM ANALYZE on the table?   Have you tried increasing 
the ANALYZE statistics on the index columns to, say, 500?
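
For example (just a sketch, using the column names from your \d output; 500 is
only a suggested target, and ANALYZE has to be re-run afterwards for it to take
effect):

ALTER TABLE forecastelement ALTER COLUMN valid_time SET STATISTICS 500;
ALTER TABLE forecastelement ALTER COLUMN region_id  SET STATISTICS 500;
ALTER TABLE forecastelement ALTER COLUMN wx_element SET STATISTICS 500;
ALTER TABLE forecastelement ALTER COLUMN issue_time SET STATISTICS 500;
ANALYZE forecastelement;  -- gather statistics with the new per-column target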

Your disks are RAID 5.  How many drives?  In RAID5, more drives improves the 
speed of large scans.

And what's your sort_mem setting?   You didn't mention it.
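
You can check it and, for testing, raise it per session (the number below is
only illustrative; sort_mem is measured in kB, so 8192 means 8 MB per sort):

SHOW sort_mem;
SET sort_mem = 8192;  -- affects only the current session
-- to make it permanent, set sort_mem = 8192 in postgresql.conf and reload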

Why is your effective cache size only 300mb when you have 3 GB of RAM?  It's 
not affecting this query, but it could affect others.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Josh Berkus
Dan,

> Why is your effective cache size only 300mb when you have 3 GB of RAM?  It's
> not affecting this query, but it could affect others.

Ignore this last question, I dropped a zero from my math.   Sorry!

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Shea,Dan [CIS]

Something that I do not understand is why, if you use valid_time =
'2004-01-22 00:00:00', the query will use the index, but if you use
valid_time > '2004-01-22 00:00:00' it does not use the index.
PWFPM_DEV=# explain analyze select * from forecastelement where valid_time > date '2004-01-23'::date limit 10;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.82 rows=10 width=129) (actual time=199550.388..199550.783 rows=10 loops=1)
   ->  Seq Scan on forecastelement  (cost=0.00..2722898.40 rows=7131102 width=129) (actual time=199550.382..199550.757 rows=10 loops=1)
         Filter: (valid_time > '2004-01-23 00:00:00'::timestamp without time zone)
 Total runtime: 199550.871 ms
(4 rows)

PWFPM_DEV=# explain analyze select * from forecastelement where valid_time = date '2004-01-23'::date limit 10;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..18.76 rows=10 width=129) (actual time=176.141..276.577 rows=10 loops=1)
   ->  Index Scan using forecastelement_vrwi_idx on forecastelement  (cost=0.00..160770.98 rows=85707 width=129) (actual time=176.133..276.494 rows=10 loops=1)
         Index Cond: (valid_time = '2004-01-23 00:00:00'::timestamp without time zone)
 Total runtime: 276.721 ms
(4 rows)



Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Shea,Dan [CIS]

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 22, 2004 3:01 PM
To: Shea,Dan [CIS]; [EMAIL PROTECTED]
Subject: Re: [PERFORM] database performance and query performance
question


> Dan,
>
> > Should there be fewer columns in the index?
> > How can we improve database performance?
> > How should I improve my query?
>
> Your query plan isn't the problem.  It's a good plan, and a reasonably
> efficient query.  Under other circumstances, the SELECT DISTINCT with the
> to_char could be a performance-killer, but it's not in that result set.
>
> Overall, you're taking 9 seconds to scan 93 million records.  Is this the time
> for the first run of the query, or for the 2nd and successive runs?

This is actually the second time.  The first query took more time.
Concerning the number of columns in the index, I switched the index to have
only one column and tried the same query.  The plan is below.

> When did you last run VACUUM ANALYZE on the table?  Have you tried increasing
> the ANALYZE statistics on the index columns to, say, 500?

It is run nightly, but last night's run did not complete.  It was taking quite
some time, over 4 hours, so I cancelled it.  I will try increasing the ANALYZE
statistics to 500.

> Your disks are RAID 5.  How many drives?  In RAID5, more drives improves the
> speed of large scans.

There are 4 drives in this RAID 5.  We are using LVM with an ext3 filesystem.
We will be moving the database to a SAN within the next month.

> And what's your sort_mem setting?  You didn't mention it.

The sort_mem is the default:
PWFPM_DEV=# show sort_mem;
 sort_mem
----------
 1024

> Why is your effective cache size only 300mb when you have 3 GB of RAM?  It's
> not affecting this query, but it could affect others.

Oh, I thought I had it set for 2.5 GB of RAM: 312500 * 8 kB = 2.5 GB.


QUERY PLAN WITH 1 column (valid_time) in the index:

 Unique  (cost=717633.28..717633.29 rows=1 width=83) (actual time=62922.399..62923.334 rows=115 loops=1)
   ->  Sort  (cost=717633.28..717633.29 rows=1 width=83) (actual time=62922.395..62922.615 rows=194 loops=1)
         Sort Key: to_char(valid_time, 'MMDDHH24MISS'::text), issue_time
         ->  Subquery Scan foo  (cost=717633.26..717633.27 rows=1 width=83) (actual time=62918.232..62919.989 rows=194 loops=1)
               ->  Sort  (cost=717633.26..717633.26 rows=1 width=30) (actual time=62902.378..62902.601 rows=194 loops=1)
                     Sort Key: issue_time, reception_time, valid_time
                     ->  Index Scan using forecastelement_v_idx on forecastelement  (cost=0.00..717633.25 rows=1 width=30) (actual time=1454.974..62900.752 rows=194 loops=1)
                           Index Cond: ((valid_time >= '2002-09-02 04:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone))
                           Filter: (((region_id)::text = 'PU-REG-WTO-00200'::text) AND ((wx_element)::text = 'TEMP_VALEUR1'::text) AND (issue_time >= '2002-09-02 05:00:00'::timestamp without time zone) AND (issue_time <= '2002-09-06 05:00:00'::timestamp without time zone) AND ((origin)::text = 'REGIONAL'::text) AND (time(issue_time) = '05:00:00'::time without time zone) AND ((date(valid_time) = ((date(issue_time) + 1) - 1)) OR (date(valid_time) = (date(issue_time) + 1)) OR ((valid_time >= '2002-09-07 00:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone) AND (issue_time = '2002-09-06 05:00:00'::timestamp without time zone))))
 Total runtime: 62923.723 ms
(10 rows)

PWFPM_DEV=# explain analyze select DISTINCT ON (valid_time) to_char(valid_time,'MMDDHH24MISS') as valid_time, value from
PWFPM_DEV-# (select valid_time, value, time(valid_time) as hour, reception_time,
PWFPM_DEV(# issue_time from forecastelement where
PWFPM_DEV(# valid_time between '2002-09-02 04:00:00' and
PWFPM_DEV(# '2002-09-07 03:59:59' and region_id = 'PU-REG-WTO-00200'
PWFPM_DEV(# and wx_element = 'TEMP_VALEUR1' and issue_time between
PWFPM_DEV(# '2002-09-02 05:00:00' and '2002-09-06 05:00:00'
PWFPM_DEV(# and origin = 'REGIONAL' and time(issue_time) = '05:00:00'
PWFPM_DEV(# order by issue_time, reception_time DESC, valid_time) as foo where
PWFPM_DEV-# (date(valid_time) = date(issue_time)+1 -1 or date(valid_time) = date(issue_time)+1 or
PWFPM_DEV(# (valid_time between '2002-09-07 00:00:00' and '2002-09-07 03:59:59'
PWFPM_DEV(# and issue_time = '2002-09-06 05:00:00')) order by valid_time, issue_time DESC;
 
QUERY PLAN

Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Josh Berkus
Dan,

> Something that I do not understand is why, if you use valid_time =
> '2004-01-22 00:00:00', the query will use the index, but if you use
> valid_time > '2004-01-22 00:00:00' it does not use the index.

Because of the expected number of rows to be returned.  Take a look at the row
estimates on the forecastelement scans.

You can improve these estimates by increasing the ANALYZE stats and/or running 
ANALYZE more often.   Of course, increasing the stats makes analyze run 
slower ...

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Hannu Krosing
Shea,Dan [CIS] wrote on Thu, 22.01.2004 at 22:35:
> Something that I do not understand is why, if you use valid_time =
> '2004-01-22 00:00:00', the query will use the index, but if you use
> valid_time > '2004-01-22 00:00:00' it does not use the index.

It probably can't tell whether > is selective enough to justify using the index.

Together with limit 10 it may be.

You could try 

explain analyze select * from forecastelement where valid_time between 
'2004-01-22'::date and '2004-01-22'::date limit 10;

to see if this is considered good enough.

--
Hannu




Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Hannu Krosing
Hannu Krosing wrote on Thu, 22.01.2004 at 22:46:
> Shea,Dan [CIS] wrote on Thu, 22.01.2004 at 22:35:
> > Something that I do not understand is why, if you use valid_time =
> > '2004-01-22 00:00:00', the query will use the index, but if you use
> > valid_time > '2004-01-22 00:00:00' it does not use the index.
>
> It probably can't tell whether > is selective enough to justify using the
> index.
>
> Together with limit 10 it may be.
>
> You could try
>
> explain analyze select * from forecastelement where valid_time between
> '2004-01-22'::date and '2004-01-22'::date limit 10;

Sorry, that should have been:

between '2004-01-22'::date and '2004-01-23'::date


> to see if this is considered good enough.
>
> --
> Hannu
 


Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Shea,Dan [CIS]
This sure sped up the query; it is fast.
PWFPM_DEV=# explain analyze select * from forecastelement where valid_time between '2004-01-12'::date and '2003-01-12'::date;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using forecastelement_v_idx on forecastelement  (cost=0.00..159607.11 rows=466094 width=129) (actual time=49.504..49.504 rows=0 loops=1)
   Index Cond: ((valid_time >= '2004-01-12 00:00:00'::timestamp without time zone) AND (valid_time <= '2003-01-12 00:00:00'::timestamp without time zone))
 Total runtime: 49.589 ms
(3 rows)



Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Shea,Dan [CIS]
The end date in the previous example was actually invalid: between
'2004-01-12'::date and '2003-01-12'::date.
There have been multiple inserts since I recreated the index, but it took
quite some time to complete the following:
PWFPM_DEV=# explain analyze select * from forecastelement where valid_time between '2004-01-12'::date and '2004-01-13'::date;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using forecastelement_v_idx on forecastelement  (cost=0.00..832139.81 rows=2523119 width=129) (actual time=0.519..467159.658 rows=2940600 loops=1)
   Index Cond: ((valid_time >= '2004-01-12 00:00:00'::timestamp without time zone) AND (valid_time <= '2004-01-13 00:00:00'::timestamp without time zone))
 Total runtime: 472627.148 ms
(3 rows)



Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Josh Berkus
Dan,

Of course it took forever.  You're retrieving 2.9 million rows!   

>  Index Scan using forecastelement_v_idx on forecastelement
>    (cost=0.00..832139.81 rows=2523119 width=129)
>    (actual time=0.519..467159.658 rows=2940600 loops=1)

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Hannu Krosing
Shea,Dan [CIS] wrote on Thu, 22.01.2004 at 23:32:
> The end date in the previous example was actually invalid: between
> '2004-01-12'::date and '2003-01-12'::date.
> There have been multiple inserts since I recreated the index, but it took
> quite some time to complete the following:
> PWFPM_DEV=# explain analyze select * from forecastelement where valid_time
> between '2004-01-12'::date and '2004-01-13'::date;

You could try ORDER BY to bias the optimiser towards using an index:

explain analyze
 select *
  from forecastelement
 where valid_time > '2004-01-12'::date
 order by valid_time
 limit 10;

This may also be closer to what you are expecting :)

--
Hannu




Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Greg Stark
Shea,Dan [CIS] <[EMAIL PROTECTED]> writes:

 Indexes:
 forecastelement_vrwi_idx btree (valid_time, region_id, wx_element, issue_time)
 
 explain analyze 
  SELECT DISTINCT ON (valid_time) 
 to_char(valid_time,'MMDDHH24MISS') AS valid_time,
 value
from (
SELECT valid_time,value, time(valid_time) AS hour, reception_time, 
 issue_time 
  FROM forecastelement 
 WHERE valid_time BETWEEN '2002-09-02 04:00:00' AND '2002-09-07 03:59:59' 
   AND region_id = 'PU-REG-WTO-00200'
   AND wx_element = 'TEMP_VALEUR1' 
   AND issue_time BETWEEN '2002-09-02 05:00:00' AND '2002-09-06 05:00:00'
   AND origin = 'REGIONAL' 
   AND time(issue_time) = '05:00:00'
 ORDER BY issue_time,reception_time DESC,valid_time
) AS foo 
WHERE
  (   date(valid_time) = date(issue_time)+1 -1  
   OR date(valid_time) = date(issue_time)+1   
   OR (valid_time BETWEEN '2002-09-07 00:00:00' AND '2002-09-07 03:59:59'
   AND issue_time = '2002-09-06 05:00:00'
  )
  )
ORDER BY valid_time ,issue_time DESC;

Incidentally, I find it easier to analyze queries when they've been formatted
well. This makes what's going on much clearer.

From this it's clear your index doesn't match the query. Adding more columns
will be useless because only the leading column valid_time will be used at
all. Since you're fetching a whole range of valid_times the remaining columns
are all irrelevant. They only serve to bloat the index and require reading a
lot more data.

You could either try creating an index just on valid_time, or create an index
on (region_id,wx_element,valid_time) or (region_id,wx_element,issue_time)
whichever is more selective. You could put wx_element first if it's more
selective than region_id.
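
For instance (a sketch only; the index name is just illustrative, and you would
keep whichever column order turns out to be the most selective):

CREATE INDEX forecastelement_rwv_idx
    ON forecastelement (region_id, wx_element, valid_time);
ANALYZE forecastelement;  -- refresh planner statistics after the change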

Moreover, what purpose does the inner ORDER BY clause serve? It's only going
to be re-sorted again by the outer ORDER BY.
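
If the intent is to keep, for each valid_time, the row with the latest
issue_time (and latest reception_time as a tie-breaker), the usual DISTINCT ON
form needs no subquery at all. Something like this (a sketch only, assuming
that is the intent; the output alias is renamed so it doesn't collide with the
column referenced in ORDER BY):

SELECT DISTINCT ON (valid_time)
       to_char(valid_time,'MMDDHH24MISS') AS valid_time_str,
       value
  FROM forecastelement
 WHERE valid_time BETWEEN '2002-09-02 04:00:00' AND '2002-09-07 03:59:59'
   AND region_id = 'PU-REG-WTO-00200'
   AND wx_element = 'TEMP_VALEUR1'
   AND issue_time BETWEEN '2002-09-02 05:00:00' AND '2002-09-06 05:00:00'
   AND origin = 'REGIONAL'
   AND time(issue_time) = '05:00:00'
   AND (   date(valid_time) = date(issue_time)+1 -1
        OR date(valid_time) = date(issue_time)+1
        OR (valid_time BETWEEN '2002-09-07 00:00:00' AND '2002-09-07 03:59:59'
            AND issue_time = '2002-09-06 05:00:00'))
 ORDER BY valid_time, issue_time DESC, reception_time DESC;

That way there is only one sort, and the sort order itself decides which row
wins for each valid_time.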

-- 
greg

