Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Robert Creager
When grilled further on (Wed, 11 Jan 2006 00:56:55 -0700),
Michael Fuhr [EMAIL PROTECTED] confessed:

 On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote:
  The query is now correct, but still is slow because of lack of
  index usage.  I don't know how to structure the query correctly to
  use the index.
 
 Have you tried adding restrictions on doy in the WHERE clause?
 Something like this, I think:

I cannot.  That's what I thought I would get from the join.  The query shown 
will always have two days involved, and only grows from there.  The data is 
graphed at http://www.logicalchaos.org/weather/index.html, and I'm looking at 
adding historical data to the graphs.

Oops, never mind.  You hit the nail on the head:

weather-# SELECT *, unmunge_time( time_group ) AS time,
weather-# EXTRACT( doy FROM unmunge_time( time_group ) )
weather-# FROM minute.windspeed
weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy 
)
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval ) 
weather-# AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval) 
weather-# AND EXTRACT( doy FROM now() )
weather-# ORDER BY time_group;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=21914.09..21914.10 rows=1 width=48) (actual time=76.595..76.662 rows=286 loops=1)
   Sort Key: windspeed.time_group
   ->  Hash Join  (cost=21648.19..21914.08 rows=1 width=48) (actual time=64.656..75.562 rows=286 loops=1)
         Hash Cond: (date_part('doy'::text, unmunge_time(outer.time_group)) = inner.doy)
         ->  Bitmap Heap Scan on windspeed  (cost=2.27..267.40 rows=74 width=28) (actual time=0.585..1.111 rows=286 loops=1)
               Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
               ->  Bitmap Index Scan on minute_windspeed_unmunge_index  (cost=0.00..2.27 rows=74 width=0) (actual time=0.566..0.566 rows=287 loops=1)
                     Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Hash  (cost=21645.92..21645.92 rows=3 width=20) (actual time=63.849..63.849 rows=2 loops=1)
               ->  HashAggregate  (cost=21645.84..21645.89 rows=3 width=20) (actual time=63.832..63.834 rows=2 loops=1)
                     ->  Bitmap Heap Scan on readings  (cost=59.21..21596.85 rows=6532 width=20) (actual time=15.174..53.249 rows=7613 loops=1)
                           Recheck Cond: ((date_part('doy'::text, when) >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, when) <= date_part('doy'::text, now())))
                           ->  Bitmap Index Scan on readings_doy_index  (cost=0.00..59.21 rows=6532 width=0) (actual time=12.509..12.509 rows=10530 loops=1)
                                 Index Cond: ((date_part('doy'::text, when) >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, when) <= date_part('doy'::text, now())))
 Total runtime: 77.177 ms

What I had thought is that PG would (could?) be smart enough to realize that 
one query was restricted, and apply that restriction to the other based on the 
join.  I know it works in other cases (using indexes on both tables using the 
join)...

 
 Something else occurred to me: do you (or will you) have more than
 one year of data?  If so then matching on doy could be problematic
 unless you also check for the year, or unless you want to match
 more than one year.

Yes and yes.  I'm doing both aggregate by day of the year for all data, and 
aggregate by day of year within each year.  The examples are:

weather=# select * from doy_agg where doy = extract( doy from now() );
 doy |  avg_windspeed   | max_windspeed 
-----+------------------+---------------
  11 | 6.14058239764748 |            69
(1 row)

weather=# select * from doy_day_agg where extract( doy from day ) = extract( 
doy from now() );
         day         |  avg_windspeed   | max_windspeed 
---------------------+------------------+---------------
 2004-01-11 00:00:00 | 5.03991313397539 |            17
 2006-01-11 00:00:00 |  18.532050716667 |            69
 2005-01-11 00:00:00 |  3.6106763448041 |            13
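
(If only the current year's row were wanted, a sketch of the extra restriction the quoted concern points at would be:)

weather=# select * from doy_day_agg
weather-# where extract( doy from day ) = extract( doy from now() )
weather-#   and extract( year from day ) = extract( year from now() );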

Thanks for your help Michael.

Cheers,
Rob

-- 
 07:07:30 up 3 days, 23:34,  9 users,  load average: 2.29, 2.44, 2.43
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006


pgpbEkxD1FaVM.pgp
Description: PGP signature


Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Robert Creager
When grilled further on (Wed, 11 Jan 2006 07:26:59 -0700),
Robert Creager [EMAIL PROTECTED] confessed:

 
 weather-# SELECT *, unmunge_time( time_group ) AS time,
 weather-# EXTRACT( doy FROM unmunge_time( time_group ) )
 weather-# FROM minute.windspeed
 weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = 
 doy )
 weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval ) 
 weather-# AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval) 
 weather-# AND EXTRACT( doy FROM now() )
 weather-# ORDER BY time_group;

The more I think about it, the more I believe PG is missing an opportunity.  
The query is adequately constrained without the BETWEEN clause.  Why doesn't PG 
see that?  I realize I'm a hack and my db organization shows that...

The query is wrong as stated, as it won't work when the interval crosses a year 
boundary, but it's a stop gap for now.
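
(Since the 24-hour window only ever spans two calendar days, one year-boundary-safe variant, untested, is to match the two days explicitly instead of using BETWEEN:)

SELECT *, unmunge_time( time_group ) AS time,
       EXTRACT( doy FROM unmunge_time( time_group ) )
FROM minute.windspeed
JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
  AND doy IN ( EXTRACT( doy FROM now() - '24 hour'::interval ),
               EXTRACT( doy FROM now() ) )
ORDER BY time_group;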

Cheers,
Rob

-- 
 07:58:30 up 4 days, 25 min,  9 users,  load average: 2.13, 2.15, 2.22
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006


pgpvvd8OrY8zM.pgp
Description: PGP signature


Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Robert Creager
When grilled further on (Wed, 11 Jan 2006 10:33:03 -0500),
Tom Lane [EMAIL PROTECTED] confessed:

 The planner understands about transitivity of equality, ie given a = b
 and b = c it can infer a = c.  It doesn't do any such thing for
 inequalities though, nor does it deduce f(a) = f(b) for arbitrary
 functions f.  The addition Michael suggested requires much more
 understanding of the properties of the functions in your query than
 I think would be reasonable to put into the planner.
 

OK.  I think I've reached a point where I need to re-organize how the data is stored,
maybe ridding myself of the schema and switching entirely to views.  At that
point, I could likely rid myself of the function (unmunge_time) I'm using, and
work with times and doy fields.
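
(A rough sketch of the kind of view that could expose the unmunged time and doy directly; the view name is illustrative, the columns come from \d minute.windspeed:)

CREATE VIEW minute.windspeed_times AS
SELECT time_group,
       unmunge_time( time_group )                      AS time,
       EXTRACT( doy FROM unmunge_time( time_group ) )  AS doy,
       min_reading, max_reading, avg_reading
FROM minute.windspeed;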

Thanks,
Rob

-- 
 21:17:00 up 4 days, 13:43,  9 users,  load average: 2.02, 2.18, 2.23
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006


pgpFfPrTCdHVy.pgp
Description: PGP signature


Re: [PERFORM] Index isn't used during a join.

2006-01-10 Thread Robert Creager
When grilled further on (Mon, 9 Jan 2006 22:58:18 -0700),
Michael Fuhr [EMAIL PROTECTED] confessed:

 On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote:
  I'm working with a query to get more info out with a join.  The base
  query works great speed wise because of index usage.  When the join is
  tossed in, the index is no longer used, so the query performance tanks.
 
 The first query you posted returns 285 rows and the second returns
 over one million; index usage aside, that difference surely accounts
 for a performance penalty.  And as is often pointed out, index scans
 aren't always faster than sequential scans: the more of a table a
 query has to fetch, the more likely a sequential scan will be faster.

Thanks for pointing out the obvious that I missed.  Too much data in the second 
query.  It's supposed to match (row wise) what was returned from the first 
query.

Just ignore me for now...

Thanks,
Rob

-- 
 08:15:24 up 3 days, 42 min,  9 users,  load average: 2.07, 2.20, 2.25
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006


pgpQ6nKK3glcj.pgp
Description: PGP signature


Re: [PERFORM] Index isn't used during a join.

2006-01-10 Thread Robert Creager

Ok, I'm back, and in a little better shape.

The query is now correct, but still is slow because of lack of index usage.  I 
don't know how to structure the query correctly to use the index.

Taken individually:

weather=# explain analyze select * from doy_agg where doy = extract( doy from 
now() );
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=13750.67..13750.71 rows=2 width=20) (actual time=123.134..123.135 rows=1 loops=1)
   ->  Bitmap Heap Scan on readings  (cost=25.87..13720.96 rows=3962 width=20) (actual time=6.384..116.559 rows=4175 loops=1)
         Recheck Cond: (date_part('doy'::text, when) = date_part('doy'::text, now()))
         ->  Bitmap Index Scan on readings_doy_index  (cost=0.00..25.87 rows=3962 width=0) (actual time=5.282..5.282 rows=4215 loops=1)
               Index Cond: (date_part('doy'::text, when) = date_part('doy'::text, now()))
 Total runtime: 123.366 ms

produces the data:

weather=# select * from doy_agg where doy = extract( doy from now() );
 doy |  avg_windspeed   | max_windspeed 
-----+------------------+---------------
  10 | 8.53403056583666 |            59

and:

weather=# EXPLAIN ANALYZE
weather-# SELECT *,
weather-# unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=595.33..595.77 rows=176 width=28) (actual time=4.762..4.828 rows=283 loops=1)
   Sort Key: time_group
   ->  Bitmap Heap Scan on windspeed  (cost=2.62..588.76 rows=176 width=28) (actual time=0.901..3.834 rows=283 loops=1)
         Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Bitmap Index Scan on minute_windspeed_unmunge_index  (cost=0.00..2.62 rows=176 width=0) (actual time=0.745..0.745 rows=284 loops=1)
               Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
 Total runtime: 5.108 ms

produces:

 time_group | min_reading | max_reading |    avg_reading    |        time
------------+-------------+-------------+-------------------+---------------------
 1136869500 |         0.8 |           6 |  2.62193548387097 | 2006-01-09 22:05:00
 1136869800 |           0 |           3 | 0.406021505376343 | 2006-01-09 22:10:00
 1136870100 |           0 |           5 |              1.68 | 2006-01-09 22:15:00
...

But I want the composite of the two queries, and I'm stuck on:

weather=# EXPLAIN ANALYZE
weather-# SELECT *,
weather-# unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy 
)
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=153627.67..153628.48 rows=322 width=48) (actual time=10637.681..10637.748 rows=286 loops=1)
   Sort Key: windspeed.time_group
   ->  Merge Join  (cost=153604.82..153614.26 rows=322 width=48) (actual time=10633.375..10636.728 rows=286 loops=1)
         Merge Cond: (outer.?column5? = inner.doy)
         ->  Sort  (cost=594.89..595.33 rows=176 width=28) (actual time=5.539..5.612 rows=286 loops=1)
               Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
               ->  Bitmap Heap Scan on windspeed  (cost=2.62..588.32 rows=176 width=28) (actual time=0.918..4.637 rows=286 loops=1)
                     Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
                     ->  Bitmap Index Scan on minute_windspeed_unmunge_index  (cost=0.00..2.62 rows=176 width=0) (actual time=0.739..0.739 rows=287 loops=1)
                           Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Sort  (cost=153009.93..153010.84 rows=366 width=20) (actual time=10627.699..10627.788 rows=295 loops=1)
               Sort Key: doy_agg.doy
               ->  HashAggregate  (cost=152984.28..152990.69 rows=366 width=20) (actual time=10625.649..10626.601 rows=366 loops=1)
                     ->  Seq Scan on readings  (cost=0.00..145364.93 rows=1015914 width=20) (actual

[PERFORM] Index isn't used during a join.

2006-01-09 Thread Robert Creager


Hey folks,

I'm working with a query to get more info out with a join.  The base query 
works great speed wise because of index usage.  When the join is tossed in, the 
index is no longer used, so the query performance tanks.

Can anyone advise on how to get the index usage back?

weather=# select version();
                                                       version
------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1 (4.0.1-5mdk for Mandriva Linux release 2006.0)
(1 row)

The base query is:

weather=# EXPLAIN ANALYZE
weather-# SELECT min_reading, max_reading, avg_reading, -- doy,
weather-#        unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# --JOIN readings_doy ON EXTRACT( doy FROM unmunge_time( time_group ) ) = doy
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=10995.29..11155.58 rows=64117 width=28) (actual time=4.509..4.574 rows=285 loops=1)
   Sort Key: time_group
   ->  Bitmap Heap Scan on windspeed  (cost=402.42..5876.05 rows=64117 width=28) (actual time=0.784..3.639 rows=285 loops=1)
         Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Bitmap Index Scan on minute_windspeed_index  (cost=0.00..402.42 rows=64117 width=0) (actual time=0.675..0.675 rows=285 loops=1)
               Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
 Total runtime: 4.880 ms
(7 rows)

When I add in the join, the query tosses out the nice quick index in favor of 
sequence scans:

weather=# EXPLAIN ANALYZE
weather-# SELECT min_reading, max_reading, avg_reading, -- doy,
weather-#        unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# JOIN readings_doy ON EXTRACT( doy FROM unmunge_time( time_group ) ) = doy
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=98239590.88..99052623.66 rows=325213113 width=28) (actual time=60136.484..61079.845 rows=1030656 loops=1)
   Sort Key: windspeed.time_group
   ->  Merge Join  (cost=265774.21..8396903.54 rows=325213113 width=28) (actual time=34318.334..47113.277 rows=1030656 loops=1)
         Merge Cond: (outer.?column5? = inner.?column2?)
         ->  Sort  (cost=12997.68..13157.98 rows=64120 width=28) (actual time=2286.155..2286.450 rows=284 loops=1)
               Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
               ->  Seq Scan on windspeed  (cost=0.00..7878.18 rows=64120 width=28) (actual time=2279.275..2285.271 rows=284 loops=1)
                     Filter: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Sort  (cost=252776.54..255312.51 rows=1014389 width=8) (actual time=32001.370..33473.407 rows=1051395 loops=1)
               Sort Key: date_part('doy'::text, readings.when)
               ->  Seq Scan on readings  (cost=0.00..142650.89 rows=1014389 width=8) (actual time=0.053..13759.015 rows=1014448 loops=1)
 Total runtime: 61720.935 ms
(12 rows)

weather=# \d minute.windspeed
  Table minute.windspeed
   Column|   Type   | Modifiers 
-+--+---
 time_group  | integer  | not null
 min_reading | double precision | not null
 max_reading | double precision | not null
 avg_reading | double precision | not null
Indexes:
windspeed_pkey PRIMARY KEY, btree (time_group)
minute_windspeed_index btree (unmunge_time(time_group))

CREATE OR REPLACE FUNCTION unmunge_time( integer )
RETURNS timestamp AS '
DECLARE
   input ALIAS FOR $1;
BEGIN
   RETURN (''epoch''::timestamptz + input * ''1sec''::interval)::timestamp;
END;
' LANGUAGE plpgsql IMMUTABLE STRICT;
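
(For reference, the expression index shown in the \d output above would have been built along these lines; it works because unmunge_time() is declared IMMUTABLE:)

CREATE INDEX minute_windspeed_index
    ON minute.windspeed ( unmunge_time( time_group ) );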

weather=# \d readings
 Table public.readings
Column|Type |  
Modifiers  
--+-+-
 when | timestamp without time zone | not null default 
(timeofday())::timestamp without time zone
 hour_group   | integer | 
 minute_group

Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-23 Thread Robert Creager

I've now backed off to version 7.4.1, which doesn't exhibit the problems that
8.0.3 does.  I guess I'll wait 'till the next version and see if any progress
has occurred.

Rob

When grilled further on (Tue, 19 Jul 2005 22:49:08 -0600),
Robert Creager [EMAIL PROTECTED] confessed:

 When grilled further on (Tue, 19 Jul 2005 12:09:51 -0600),
 Robert Creager [EMAIL PROTECTED] confessed:
 
  On Tue, 19 Jul 2005 12:54:22 -0400
  Tom Lane [EMAIL PROTECTED] wrote:
  
   Hmm, I hadn't thought about the possible impact of multiple concurrent
   vacuums.  Is the problem caused by that, or has performance already gone
   into the tank by the time the cron-driven vacuums are taking long enough
   to overlap?
  
  
  I'll re-start the database, vacuum full analyze and restart the runs without
 the
  cron vacuum running.
  
 
 It took a few hours, but the problem did finally occur with no vacuum running
on
 803.  CS is averaging 72k.  I cannot quantitatively say it took longer to
 reproduce than with the vacuums running, but it seemed like it did.
 
 Can any information be gotten out of this?  Should I try CVS HEAD?
 
 Thoughts?
 
 Thanks,
 Rob
 
 -- 
  22:41:36 up 6 days,  2:16,  6 users,  load average: 0.15, 0.21, 0.30
 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


-- 
 14:35:32 up 9 days, 18:10,  5 users,  load average: 2.17, 2.19, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgp1mUbAAYNgr.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

2005-07-19 Thread Robert Creager
On Mon, 18 Jul 2005 13:52:53 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 
 Start a fresh psql session and SHOW vacuum_cost_delay to verify what
 the active setting is.
 

Alright.  Restarted the 803 database.  Cron based vacuum analyze is running
every 5 minutes.  vacuum_cost_delay is 0.  The problem showed up after about 1/2
hour of running.  I've got vacuum jobs stacked from the last 35 minutes, with 2
vacuums running at the same time.  CS is around 73k.

What do I do now?  I can bring the db back to normal and not run any cron based
vacuum to see if it still happens, but I suspect nothing will happen without the
vacuum.  I'll leave it in its current semi-catatonic state as long as possible
in case there is something to look at?

Cheers,
Rob

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

2005-07-19 Thread Robert Creager
On Tue, 19 Jul 2005 12:54:22 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Hmm, I hadn't thought about the possible impact of multiple concurrent
 vacuums.  Is the problem caused by that, or has performance already gone
 into the tank by the time the cron-driven vacuums are taking long enough
 to overlap?

Don't know just yet.  When I run the vacuums manually on a healthy system on
741, they take less than 30 seconds.  I've stopped the cron vacuum and canceled
all the outstanding vacuum processes, but the 803 is still struggling (1/2 hour
later).

I'll re-start the database, vacuum full analyze and restart the runs without the
cron vacuum running.

Cheers,
Rob

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

2005-07-19 Thread Robert Creager
On Tue, 19 Jul 2005 12:54:22 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Robert Creager [EMAIL PROTECTED] writes:
 
 Hmm, I hadn't thought about the possible impact of multiple concurrent
 vacuums.  Is the problem caused by that, or has performance already gone
 into the tank by the time the cron-driven vacuums are taking long enough
 to overlap?

All statements over 5 seconds are logged.  Vacuums are running on the 5 minute 
mark.

The log file shows the first query starting to go bad at 9:32:15 (7 seconds), although 
the second query started before the first.  The first vacuum statement logged 
shows 1148 seconds, completing at 9:54:09, so it started around 9:35.  Looks like the 
vacuum is an innocent bystander of the problem.

The first problem queries are below.  Additionally, I've attached 5 minutes 
(bzipped) of logs starting at the first event below.

Jul 19 09:32:15 annette postgres[17029]: [2-1] LOG:  duration: 7146.168 ms  
statement:
Jul 19 09:32:15 annette postgres[17029]: [2-2] ^I  SELECT location_id, 
location_type.name AS type, library, rail
Jul 19 09:32:15 annette postgres[17029]: [2-3] ^I  FROM location_lock JOIN 
location USING( location_id )
Jul 19 09:32:15 annette postgres[17029]: [2-4] ^I JOIN 
location_type USING( location_type_id )
Jul 19 09:32:15 annette postgres[17029]: [2-5] ^I  WHERE test_session_id = 
'5264'
Jul 19 09:32:20 annette postgres[17092]: [2-1] LOG:  duration: 13389.730 ms  
statement:
Jul 19 09:32:20 annette postgres[17092]: [2-2] ^I  SELECT 
location_type.name AS location_type_name,
Jul 19 09:32:20 annette postgres[17092]: [2-3] ^I library, rail, 
col, side, row, location_id,
Jul 19 09:32:20 annette postgres[17092]: [2-4] ^I hli_lsm, 
hli_panel, hli_row, hli_col
Jul 19 09:32:20 annette postgres[17092]: [2-5] ^I  FROM location JOIN 
location_type USING( location_type_id )
Jul 19 09:32:20 annette postgres[17092]: [2-6] ^IJOIN 
complex USING( library_id )
Jul 19 09:32:20 annette postgres[17092]: [2-7] ^ILEFT OUTER 
JOIN hli_location USING( location_id )
Jul 19 09:32:20 annette postgres[17092]: [2-8] ^ILEFT OUTER 
JOIN application USING( application_id )
Jul 19 09:32:20 annette postgres[17092]: [2-9] ^I  WHERE  
complex.complex_id = '13'
Jul 19 09:32:20 annette postgres[17092]: [2-10] ^I AND location_id NOT 
IN
Jul 19 09:32:20 annette postgres[17092]: [2-11] ^I(SELECT 
location_id
Jul 19 09:32:20 annette postgres[17092]: [2-12] ^I FROM 
location_lock)
Jul 19 09:32:20 annette postgres[17092]: [2-13] ^I AND location_id NOT 
IN
Jul 19 09:32:20 annette postgres[17092]: [2-14] ^I(SELECT 
location_id
Jul 19 09:32:20 annette postgres[17092]: [2-15] ^I FROM 
cartridge)
Jul 19 09:32:20 annette postgres[17092]: [2-16] ^IAND (location_type.name ~ 
'cell' AND application.name ~ 'hli'  AND hli_lsm = 1 AND col BETWEEN -2 AND 2)
Jul 19 09:32:20 annette postgres[17092]: [2-17] ^I
Jul 19 09:32:20 annette postgres[17092]: [2-18] ^I ORDER BY 
location.usage_count, location.rand LIMIT 1
Jul 19 09:32:20 annette postgres[17092]: [2-19] ^I FOR UPDATE OF 
location

Cheers,
Rob



pg.log.bz2
Description: BZip2 compressed data

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-19 Thread Robert Creager
When grilled further on (Tue, 19 Jul 2005 12:09:51 -0600),
Robert Creager [EMAIL PROTECTED] confessed:

 On Tue, 19 Jul 2005 12:54:22 -0400
 Tom Lane [EMAIL PROTECTED] wrote:
 
  Hmm, I hadn't thought about the possible impact of multiple concurrent
  vacuums.  Is the problem caused by that, or has performance already gone
  into the tank by the time the cron-driven vacuums are taking long enough
  to overlap?
 
 
 I'll re-start the database, vacuum full analyze and restart the runs without
the
 cron vacuum running.
 

It took a few hours, but the problem did finally occur with no vacuum running on
803.  CS is averaging 72k.  I cannot quantitatively say it took longer to
reproduce than with the vacuums running, but it seemed like it did.

Can any information be gotten out of this?  Should I try CVS HEAD?

Thoughts?

Thanks,
Rob

-- 
 22:41:36 up 6 days,  2:16,  6 users,  load average: 0.15, 0.21, 0.30
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpmXr0G8ggMk.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-18 Thread Robert Creager
When grilled further on (Sun, 17 Jul 2005 23:43:29 -0600),
Robert Creager [EMAIL PROTECTED] confessed:

 I've 6 runs going concurrently.  Just saw (vmstat 1) a set of 8 seconds where
 the CS didn't drop below 90k, but right now its at ~300 for over 30 seconds...

 It's bouncing all over the place, but staying reasonably well behaved overall.
 

Against 741 and the same load, CS is steady around 300 with spikes up to 4k, but
it's only been running for about 15 minutes.  All queries are < .2 seconds.

Cheers,
Rob

-- 
 00:03:33 up 4 days,  3:38,  6 users,  load average: 1.67, 0.98, 0.44
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpnjscjDu7zf.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-18 Thread Robert Creager
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 The context swap problem was no worse in 8.0 than in prior versions,
 so that hardly seems like a good explanation.  Have you tried reverting
 to the cron-based vacuuming method you used in 7.4?
 

Ran 7 hours on 741 with VACUUM ANALYZE every 5 minutes.  The largest CS I saw
was 40k, with an average of 500 (via script which monitors vmstat output).

I've done a VACUUM FULL ANALYZE on 803 and have switched the cron based VACUUM
ANALYZE to 803 also.  The tests are now running again.

 Hmm, did you read this thread?
 http://archives.postgresql.org/pgsql-performance/2005-07/msg00088.php

I just glanced at it.  Once I've reproduced (or not) the problem on 803 with the
VACUUM FULL, I'll turn off the vacuum delay.

Cheers,
Rob

-- 
 07:10:06 up 4 days, 10:45,  6 users,  load average: 0.28, 0.40, 0.29
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpVMzW0M2boJ.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-18 Thread Robert Creager
When grilled further on (Mon, 18 Jul 2005 09:23:11 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 It's still far from clear what's going on there, but it might be
 interesting to see if turning off the vacuum delay changes your results
 with 8.0.
 

Can that be affected by hupping the server, or do I need a restart?

Thanks,
Rob

-- 
 07:46:53 up 4 days, 11:21,  6 users,  load average: 0.77, 0.43, 0.27
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpvW15iUwSUq.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

2005-07-18 Thread Robert Creager
On Mon, 18 Jul 2005 13:52:53 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Start a fresh psql session and SHOW vacuum_cost_delay to verify what
 the active setting is.

Thanks.  It does show 0 for 803 in a session that had been up since before I thought I
had HUPed the server with the new value.

This is leading me to believe that 803 doesn't do very well with VACUUM ANALYZE
running often, at least in my particular application...  I will provide a more
definitive statement to that effect, hopefully tonight.

Cheers,
Rob

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

2005-07-17 Thread Robert Creager

Sigh...

I recently upgraded from 7.4.1 to 8.0.3.  The application did not change.  I'm
now running both database concurrently (on different ports, same machine) just
so I could verify the problem really exists.

The application is a custom test application for testing mechanical systems.
The runs in question (4 at a time) each generate 16 queries at a time; the results
are sent to the mechanical system, which processes each request in anywhere from
10 to 120 seconds.  The system is capable of completing between 4 and 8 jobs at
once.  So, once the system is running, at most there will be 8 queries per run
simultaneously.

The entire database fits into RAM (2Gb), as evidenced by no disk activity and
relatively small database size.  pg_xlog is on different disks from the db.

The problem is that on version 8.0.3, once I get 3 or more concurrent runs
going, the query times start tanking (> 20 seconds).  On 7.4.1, the applications
hum along with queries typically below .2 seconds on over 5 concurrent runs.
Needless to say, 7.4.1 behaves as expected...  The only change between runs is
the port connected to.  Both DBs are up at the same time.

For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
analyze every 5 minutes.

The system is Mandrake Linux running a 2.4.22 kernel with dual Intel Xeon CPUs
with HT enabled.  On an 803 run, the context switching is up around 60k.  On
7.4.1, it maxes around 23k and averages < 1k.

I've attached four files.  741 has the query and explain analyze.  803 has the
query and explain analyze during loaded and unloaded times.  I've also attached
the conf files for the two versions running.  I've gone through them and don't
see any explanation for the problem I'm having.

I'm guessing this is the CS problem that reared its head last year?  I had an
e-mail exchange in April of last year about this.  Any reason this would be
worse with 8.0.3?

Thanks,
Rob

-- 
 13:33:43 up 3 days, 17:08,  9 users,  load average: 0.16, 0.59, 0.40
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


741
Description: Binary data


803
Description: Binary data


postgres741.conf
Description: Binary data


postgres803.conf
Description: Binary data


pgpyrJTShV6un.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-17 Thread Robert Creager

I am, and it is.  It's ANALYZING and VACUUM'ing tables every interval (5 minutes
- 8.0.3).  Right now, for the last 4 hours, I'm not VACUUMing the 7.4.1
database and it's still clicking along at < .2 second queries.  Last year
(7.4.1), I noticed that it took about a week of heavy activity (for this DB)
before I'd really need a vacuum.  That's when I put in the 5 min cron.

When I first switched over to 8.0.3, I was still running the cron vacuum.  I got
into big trouble when I had vacuums backed up for 6 hours.  That's when I
started noticing the query problem, and the CS numbers being high.  7.4.1
vacuums every 5 minutes always take < 30 seconds (when I'm watching).

Cheers,
Rob

When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400),
Matthew T. O'Connor matthew@zeut.net confessed:

 Robert Creager wrote:
 
 For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
 analyze every 5 minutes.
   
 
 
 Are you sure that pg_autovacuum is doing it's job?  Meaning are you sure 
 it's vacuuming as often as needed?  Try to run it with -d2 or so and 
 make sure that it is actually doing the vacuuming needed.


-- 
 22:04:10 up 4 days,  1:39,  8 users,  load average: 0.15, 0.15, 0.12
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpxEAthphYZc.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-17 Thread Robert Creager
When grilled further on (Mon, 18 Jul 2005 16:17:54 +1200),
David Mitchell [EMAIL PROTECTED] confessed:

 Maybe you should check who is holding locks.

Hmmm...  The only difference is how the vacuum is run.  One by autovacuum, one
by cron (vacuum analyze every 5 minutes).

Cheers,
Rob

-- 
 23:01:44 up 4 days,  2:36,  6 users,  load average: 0.27, 0.16, 0.10
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpkB3hqiYTXD.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-17 Thread Robert Creager
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 Have you tried reverting
 to the cron-based vacuuming method you used in 7.4?
 

I just stopped autovacuum, ran a manual vacuum analyze on 803 (2064 pages
needed, 800 FSM setting) and re-started the run (with cron vac enabled). 
The query problem has not showed up yet (1/2 hour).  A vacuum on 741 showed 3434
pages needed, 20 FSM setting.

I'll let it run the night and see if it shows up after a couple of hours.  It
has run clean for 1 hour prior.  If this runs 'till morning, I'll re-enable the
autovacuum, disable the cron and see if it reproduces itself (the slowdown).

Cheers,
Rob

-- 
 22:18:40 up 4 days,  1:53,  8 users,  load average: 0.10, 0.20, 0.14
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgphZ4CGPEvpV.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-17 Thread Robert Creager
When grilled further on (Sun, 17 Jul 2005 22:09:11 -0700),
Jeffrey W. Baker [EMAIL PROTECTED] confessed:

 
 Did you build 8.0.3 yourself, or install it from packages?  I've seen in
 the past where pg would build with the wrong kind of mutexes on some
 machines, and that would send the CS through the roof.  If you did build
 it yourself, check your ./configure logs.  If not, try strace.

I always build PG from source.  I did check the config.log command line
(./configure) and they were similar enough.  The system has not changed between
building the two versions (if it ain't broke...).

Cheers,
Rob

-- 
 23:25:21 up 4 days,  3:00,  6 users,  load average: 0.25, 0.15, 0.11
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpl91SSTPMNJ.pgp
Description: PGP signature


Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS

2005-07-17 Thread Robert Creager
When grilled further on (Mon, 18 Jul 2005 00:18:30 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 Robert Creager [EMAIL PROTECTED] writes:
   I am, and it is.  It's ANALYZING and VACUUM'ing tables every interval (5 minutes
   - 8.0.3).  Right now, for the last 4 hours, I'm not VACUUMing the 7.4.1
   database and it's still clicking along at < .2 second queries.
 
 Have you compared physical table sizes?  If the autovac daemon did let
 things get out of hand, you'd need a VACUUM FULL or CLUSTER or TRUNCATE
 to get the table size back down --- plain VACUUM is unlikely to fix it.

Table sizes, no.  Entire DB size is 45Mb for 803 and 29Mb for 741.  Cannot make
a direct comparison between the two as I've run against more machines now with
803 than 741, so I'd expect it to be larger.
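
(For reference, a rough way to compare physical table sizes between the two installs; relpages and reltuples are only as fresh as the last VACUUM or ANALYZE:)

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relkind = 'r'
ORDER BY relpages DESC
LIMIT 10;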

I'm still running relatively clean on 803 with cron vacuum.  The CS are jumping
from 100 to 120k, but it's not steady state like it was before, and queries are
all under 5 seconds (none hitting the logs) and are typically (glancing at test
runs) still under 1 sec, with some hitting ~2 seconds occasionally.

I've 6 runs going concurrently.  Just saw (vmstat 1) a set of 8 seconds where
the CS didn't drop below 90k, but right now it's at ~300 for over 30 seconds... 
It's bouncing all over the place, but staying reasonably well behaved overall.

Whoop.  Spoke too soon.  Just hit the wall.  CS at ~80k constant, queries over
10 seconds and rising (30+ now)...  Looking at ps, the vacuum is currently
running.  Going back in the logs, the CS and vacuum hit at about the same time.

I'm going to go back to 741 with the same load and see what happens by tomorrow
morning...  I'll change the cron vac to hit the 741 db.

Cheers,
Rob

-- 
 23:29:24 up 4 days,  3:04,  6 users,  load average: 0.02, 0.07, 0.08
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpkmkSsQbJPx.pgp
Description: PGP signature


Re: [PERFORM] Insert performance, what should I expect?

2004-10-20 Thread Robert Creager
When grilled further on (Tue, 19 Oct 2004 22:12:28 -0400),
Rod Taylor [EMAIL PROTECTED] confessed:

  I've done some manual benchmarking running my script 'time script.pl'
  I realise my script uses some of the time, bench marking shows that
  %50 of the time is spent in dbd:execute.
  
 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
 compliant) which has functions similar to putline() that allow COPY to
 be used.

COPY can be used with DBD::Pg, per a script I use:

$dbh->do( "COPY temp_obs_$band ( $col_list ) FROM stdin" );
$dbh->func( join( "\t", @data ) . "\n", 'putline' );
$dbh->func( "\\.\n", 'putline' );
$dbh->func( 'endcopy' );

With sets of data from 1000 to 8000 records, my COPY performance is consistent
at ~1 records per second.
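
(A slightly fuller sketch of the same approach, with the COPY wrapped in a loop; the connect string, table, columns, and sample data here are illustrative only, not from the script above:)

use DBI;

my $dbh  = DBI->connect( 'dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 } );
my @rows = ( [ 1.0, 2.0, 12.3 ], [ 4.0, 5.0, 11.8 ] );      # sample records

$dbh->do( 'COPY temp_obs_v ( x, y, imag ) FROM stdin' );
for my $row ( @rows ) {
    $dbh->func( join( "\t", @$row ) . "\n", 'putline' );    # one tab-separated line per record
}
$dbh->func( "\\.\n", 'putline' );                           # end-of-data marker
$dbh->func( 'endcopy' );
$dbh->disconnect;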

Cheers,
Rob

-- 
 10:39:31 up 2 days, 16:25,  2 users,  load average: 2.15, 2.77, 3.06
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgp1zoDKUrU9u.pgp
Description: PGP signature


[PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager

Help?

Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's 
been running for more than 10 hours.

Can it be helped?

UPDATE obs_v
SET mag = obs_v.imag + zp.zero_v + cg.color_v * (obs_v.imag - i.imag),
use = true
FROM color_groups AS cg, zero_pair AS zp, obs_i AS i, files AS f, groups AS g
WHERE  obs_v.star_id = i.star_id
   AND obs_v.file_id = f.file_id
   AND cg.group_id = g.group_id
   AND g.night_id = f.night_id
   AND g.group_id = $group_id
   AND zp.pair_id = f.pair_id

Hash Join  (cost=130079.22..639663.94 rows=1590204 width=63)
  Hash Cond: (outer.star_id = inner.star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=129094.19..129094.19 rows=77211 width=59)
        ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
              ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
                    Hash Cond: (outer.pair_id = inner.pair_id)
                    ->  Seq Scan on zero_pair zp  (cost=0.00..43.32 rows=2532 width=8)
                    ->  Hash  (cost=250.40..250.40 rows=118 width=12)
                          ->  Hash Join  (cost=4.80..250.40 rows=118 width=12)
                                Hash Cond: (outer.night_id = inner.night_id)
                                ->  Seq Scan on files f  (cost=0.00..199.28 rows=9028 width=12)
                                ->  Hash  (cost=4.80..4.80 rows=1 width=8)
                                      ->  Nested Loop  (cost=0.00..4.80 rows=1 width=8)
                                            ->  Seq Scan on color_groups cg  (cost=0.00..2.84 rows=1 width=8)
                                                  Filter: (171 = group_id)
                                            ->  Seq Scan on groups g  (cost=0.00..1.95 rows=1 width=8)
                                                  Filter: (group_id = 171)
              ->  Index Scan using obs_v_file_id_index on obs_v  (cost=0.00..1893.23 rows=2317 width=51)
                    Index Cond: (obs_v.file_id = outer.file_id)

Table definitions:

tassiv=# \d color_groups
  Table public.color_groups
Column|  Type   |   Modifiers  
 
--+-+---
 group_id | integer | not null default nextval('color_groups_group_id_seq'::text)
 color_u  | real| 
 color_b  | real| 
 color_v  | real| 
 color_r  | real| 
 color_i  | real| 
 max_residual | real| 
Indexes:
color_groups_pkey primary key, btree (group_id)
color_group_group_id_index btree (group_id)

tassiv=# \d zero_pair
   Table public.zero_pair
 Column  |  Type   | Modifiers 
-+-+---
 pair_id | integer | not null
 zero_u  | real| default 0
 zero_b  | real| default 0
 zero_v  | real| default 0
 zero_r  | real| default 0
 zero_i  | real| default 0
Indexes:
zero_pair_pkey primary key, btree (pair_id)
Foreign-key constraints:
$1 FOREIGN KEY (pair_id) REFERENCES pairs(pair_id) ON DELETE CASCADE

tassiv=# \d obs_v
Table public.obs_v
 Column  |  Type   |   Modifiers
-+-+
 x   | real| not null
 y   | real| not null
 imag| real| not null
 smag| real| not null
 loc | spoint  | not null
 obs_id  | integer | not null default nextval('obs_id_seq'::text)
 file_id | integer | not null
 use | boolean | default false
 solve   | boolean | default false
 star_id | integer | 
 mag | real| 
Indexes:
obs_v_file_id_index btree (file_id)
obs_v_loc_index gist (loc)
obs_v_obs_id_index btree (obs_id)
obs_v_star_id_index btree (star_id)
obs_v_use_index btree (use)
Foreign-key constraints:
obs_v_files_constraint FOREIGN KEY (file_id) REFERENCES files(file_id) ON DELETE 
CASCADE
obs_v_star_id_constraint FOREIGN KEY (star_id) REFERENCES catalog(star_id) ON 
DELETE SET NULL
Triggers:
obs_v_trig BEFORE INSERT OR DELETE OR UPDATE ON obs_v FOR EACH ROW EXECUTE 
PROCEDURE observations_trigger
()

tassiv=# \d files
  Table public.files
  Column  |Type |   Modifiers  
 
--+-+---
 file_id  | integer | not null default 
nextval('files_file_id_seq'::text)
 night_id | integer | 
 pair_id  | integer | 
 name | character varying   | not null
 date | timestamp without time zone | 
Indexes:
files_pkey primary key, btree (file_id)
files_name_key unique, btree (name)
files_id_index btree (file_id, night_id, pair_id)
Foreign-key constraints:
$1 FOREIGN KEY (night_id) REFERENCES 

Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 09:28:47 -0500),
Kevin Barnard [EMAIL PROTECTED] confessed:

 What does observations_trigger do?
 

The trigger keeps another table (catalog) up to date with the information from the 
obs_v and obs_i tables.  There are no direct insert/update/delete's on the catalog 
table, only though the trigger.

-- 
 19:56:54 up 5 days, 22:23,  2 users,  load average: 2.46, 2.27, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpuGE1HUtiSw.pgp
Description: PGP signature


Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 16:55:13 +0200),
Gaetano Mendola [EMAIL PROTECTED] confessed:

 Robert Creager wrote:
  Help?
  
  Normally, this query takes from 5 minutes to 2 hours to run.  On this
  update, it's been running for more than 10 hours.
  
  Can it be helped?
 
 
 When I see this usually means that tables are full of
 dead rows. Did you vacuum you DB. Which version are you
 using ?
 

Gee, the two questions I realized I forgot to answer going into work ;-)  I  run
pg_autovacuum, and it's working.  Even ran a FULL ANALYZE, no help.  The version
is 7.4.1.

Cheers,
Rob 

-- 
 20:22:11 up 5 days, 22:48,  2 users,  load average: 2.16, 2.18, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpqHwDKGGx6C.pgp
Description: PGP signature


Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 21:41:50 -0500),
Kevin Barnard [EMAIL PROTECTED] confessed:

 On Tue, 28 Sep 2004 20:21:40 -0600, Robert Creager
 [EMAIL PROTECTED] wrote:
  
  The trigger keeps another table (catalog) up to date with the information
  from the obs_v and obs_i tables.  There are no direct insert/update/delete's
  on the catalog table, only though the trigger.
  
 
 It's possible that the update to catalog is what is really taking a
 long time.  You might wish to try and explain that query just to make
 sure.  You might also wish to disable to trigger just to rule it out. 
 Does catalog have any triggers on it?  Does it have any foreign keys?

A select on the catalog is really quick (54ms on a random query - ~1M entries).  The 
updates use the index.  The catalog table has no triggers or foreign keys.  The 
trigger on the obs_? tables manages the catalog table.

tassiv=# \d catalog
Table public.catalog
  Column  |   Type   |Modifiers
--+--+-
 star_id  | integer  | not null default nextval('star_id_seq'::text)
 loc_count| integer  | default 0
 loc  | spoint   | not null
 ra_sum   | double precision | default 0
 ra_sigma | real | default 0
 ra_sum_square| double precision | default 0
 dec_sum  | double precision | default 0
 dec_sigma| real | default 0
 dec_sum_square   | double precision | default 0
 mag_u_count  | integer  | default 0
 mag_u| real | default 99
 mag_u_sum| double precision | default 0
 mag_u_sigma  | real | default 0
 mag_u_sum_square | double precision | default 0
 mag_b_count  | integer  | default 0
 mag_b| real | default 99
 mag_b_sum| double precision | default 0
 mag_b_sigma  | real | default 0
 mag_b_sum_square | double precision | default 0
 mag_v_count  | integer  | default 0
 mag_v| real | default 99
 mag_v_sum| double precision | default 0
 mag_v_sigma  | real | default 0
 mag_v_sum_square | double precision | default 0
 mag_r_count  | integer  | default 0
 mag_r| real | default 99
 mag_r_sum| double precision | default 0
 mag_r_sigma  | real | default 0
 mag_r_sum_square | double precision | default 0
 mag_i_count  | integer  | default 0
 mag_i| real | default 99
 mag_i_sum| double precision | default 0
 mag_i_sigma  | real | default 0
 mag_i_sum_square | double precision | default 0
Indexes:
catalog_pkey primary key, btree (star_id)
catalog_ra_decl_index gist (loc)


-- 
 21:44:49 up 6 days, 11 min,  2 users,  load average: 2.03, 2.17, 2.39
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


pgpDWj4ILoBx5.pgp
Description: PGP signature


Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 11:04:23 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 Robert Creager [EMAIL PROTECTED] writes:
  Normally, this query takes from 5 minutes to 2 hours to run.  On this
  update, it's been running for more than 10 hours.
 
  ...
   ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
         ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
               Hash Cond: (outer.pair_id = inner.pair_id)
  ...
 
 It chose a nested loop here because it was only expecting 67 rows out of
 the next-lower join, and so it thought it would only need 67 repetitions
 of the index probe into obs_v_file_id_index.  I'm suspicious that that
 estimate was way low and so the nestloop is taking forever.  You might
 try SET enable_nestloop = off as a crude way of avoiding that trap.

I tried your suggestion.  It did generate a different plan (below), but the
estimation is still blown and it still used a nested loop.  The query is currently
running (42 minutes so far).  For the query in question, there are 151 different
pair_id's in the pairs table, which equates to 302 entries in the files table
(part of the query), which moves on to 533592 entries in the obs_v table and
533699 entries in the obs_i table.

The groups table has 76 total entries, files 9028, zero_pair 2532, color_groups
147. Only the obs_v and obs_i tables have data of any significant quantities
with 10M rows apiece.  The trigger hitting the catalog table (875499 entries) is
searching for single entries to match (one fire per obs_v/obs_i update) on an
index (took 54ms on the first query of a random id just now).

There is no significant disk activity (read 0), one CPU is pegged, and that
process is consuming 218M Resident memory, 168M Shared (10% available memory
total). All reasonable, except for the fact it doesn't come back...

Hash Join  (cost=100267870.17..100751247.13 rows=1578889 width=63)
  Hash Cond: (outer.star_id = inner.star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=100266886.39..100266886.39 rows=77113 width=59)
        ->  Hash Join  (cost=10307.51..100266886.39 rows=77113 width=59)
              Hash Cond: (outer.file_id = inner.file_id)
              ->  Seq Scan on obs_v  (cost=0.00..213854.50 rows=10390650 width=51)
              ->  Hash  (cost=10307.34..10307.34 rows=67 width=12)
                    ->  Hash Join  (cost=10250.69..10307.34 rows=67 width=12)
                          Hash Cond: (outer.pair_id = inner.pair_id)
                          ->  Seq Scan on zero_pair zp  (cost=0.00..43.32 rows=2532 width=8)
                          ->  Hash  (cost=10250.40..10250.40 rows=118 width=12)
                                ->  Hash Join  (cost=10004.80..10250.40 rows=118 width=12)
                                      Hash Cond: (outer.night_id = inner.night_id)
                                      ->  Seq Scan on files f  (cost=0.00..199.28 rows=9028 width=12)
                                      ->  Hash  (cost=10004.80..10004.80 rows=1 width=8)
                                            ->  Nested Loop  (cost=1.00..10004.80 rows=1 width=8)
                                                  ->  Seq Scan on color_groups cg  (cost=0.00..2.84 rows=1 width=8)
                                                        Filter: (175 = group_id)
                                                  ->  Seq Scan on groups g  (cost=0.00..1.95 rows=1 width=8)
                                                        Filter: (group_id = 175)



-- 
 20:48:23 up 5 days, 23:14,  2 users,  load average: 2.56, 2.91, 2.78
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


color.explain
Description: Binary data


pgp9Z2BOS5FKm.pgp
Description: PGP signature


[PERFORM] Speed up a function?

2004-06-07 Thread Robert Creager

Hey All,

I've implemented a couple of functions ala date_trunc (listed at the bottom).  These
functions are executed every 5 minutes (date_trunc_minute) and every week
(date_trunc_week) across 16 different values.  The problem is that they take way
too long to execute (nearly 7x the 'regular' date_trunc function).  What might
be the best way to fix the problem?  Use a different function language?  Re-write
the functions?  Re-write the queries?  The gist of what I'm doing is filling
schema tables with weather summary information for the time period in question.

Currently I don't have indexes on these tables.  Would indexes on readings.when
and minute.barometer.time be used with date_trunc?  Functional indexes maybe?
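
(A sketch of the kind of functional index that question is about, assuming date_trunc_minute is declared IMMUTABLE like date_trunc_week below; whether the planner would pick it up for the <> filters in these particular queries is a separate question:)

CREATE INDEX readings_minute_index
    ON readings ( date_trunc_minute( 'minute'::text, "when" ) );

CREATE INDEX readings_hour_index
    ON readings ( date_trunc( 'hour'::text, "when" ) );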

CREATE TABLE readings
   (
   when  TIMESTAMP DEFAULT timeofday()::timestamp
  NOT NULL PRIMARY KEY,
   barometer FLOAT DEFAULT NULL,
   );

CREATE SCHEMA minute;
CREATE TABLE minute.barometer
   (
   time   TIMESTAMP NOT NULL,
   min_reading  FLOAT NOT NULL,
   max_reading  FLOAT NOT NULL,
   avg_reading  FLOAT NOT NULL
   );

The hour schema is identical to the minute schema.

weather=# EXPLAIN ANALYZE
weather-# SELECT p.period, p.min, p.max, p.avg
weather-# FROM (SELECT date_trunc_minute( 'minute'::text, when ) AS period,
weather(#  min( barometer ), max( barometer ), avg( barometer )
weather(#   FROM readings
weather(#   WHERE barometer NOTNULL
weather(#   GROUP BY period) AS p
weather-# WHERE p.period
weather-#   NOT IN (SELECT time FROM minute.barometer )
weather-#   AND p.period != date_trunc_minute( 'minute'::text, now()::timestamp );
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan p  (cost=1665.63..2282.47 rows=13708 width=32) (actual time=3318.758..3318.758 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   ->  HashAggregate  (cost=1501.61..1775.76 rows=27415 width=16) (actual time=3227.409..3263.367 rows=13918 loops=1)
         ->  Seq Scan on readings  (cost=0.00..1227.46 rows=27415 width=16) (actual time=1.075..3028.673 rows=69398 loops=1)
               Filter: ((barometer IS NOT NULL) AND (date_trunc_minute('minute'::text, when) <> date_trunc_minute('minute'::text, (now())::timestamp without time zone)))
   SubPlan
     ->  Seq Scan on barometer  (cost=0.00..144.02 rows=8002 width=8) (actual time=0.008..15.406 rows=13918 loops=1)
 Total runtime: 3320.146 ms
(8 rows)

weather=# EXPLAIN ANALYZE
weather-# SELECT p.period, p.min, p.max, p.avg
weather-# FROM (SELECT date_trunc( 'hour'::text, when ) AS period,
weather(#  min( barometer ), max( barometer ), avg( barometer )
weather(#   FROM readings
weather(#   WHERE barometer NOTNULL
weather(#   GROUP BY period) AS p
weather-# WHERE p.period
weather-#   NOT IN (SELECT time FROM hour.barometer )
weather-#   AND p.period != date_trunc( 'hour'::text, now()::timestamp );
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan p  (cost=1524.11..2140.95 rows=13708 width=32) (actual time=551.516..551.516 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   ->  HashAggregate  (cost=1501.61..1775.76 rows=27415 width=16) (actual time=544.859..547.605 rows=1173 loops=1)
         ->  Seq Scan on readings  (cost=0.00..1227.46 rows=27415 width=16) (actual time=0.596..399.344 rows=69353 loops=1)
               Filter: ((barometer IS NOT NULL) AND (date_trunc('hour'::text, when) <> date_trunc('hour'::text, (now())::timestamp without time zone)))
   SubPlan
     ->  Seq Scan on barometer  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.007..1.268 rows=1173 loops=1)
 Total runtime: 552.137 ms


CREATE FUNCTION date_trunc_week( text, timestamp )
RETURNS timestamp AS '
DECLARE
   reading_time ALIAS FOR $2;
   year timestamp;
   dow integer;
   adjust text;
   week text;
BEGIN
   year := date_trunc( ''year''::text, reading_time );
   week := date_part( ''week'', reading_time ) - 1 || '' week'';
   dow := date_part( ''dow'', year );
   -- If the dow is less than Thursday, then the start week is last year
   IF dow <= 4 THEN
  adjust := 1 - dow || '' day'';
   ELSE
  adjust := 8 - dow || '' day'';
   END IF;
   RETURN year + adjust::interval + week::interval;
END;
' LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION date_trunc_minute( text, timestamp )
RETURNS timestamp AS '
DECLARE
   reading_time ALIAS 

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Robert Creager
When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)),
Bruce Momjian [EMAIL PROTECTED] confessed:

 
 Did we ever come to a conclusion about excessive SMP context switching
 under load?
 

I just figured out what was causing the problem on my system Monday.  I'm using
the pg_autovacuum daemon, and it was not vacuuming my db.  I've no idea why and
didn't get a chance to investigate.

This lack of vacuuming was causing a huge number of context switches and query
delays. the queries that normally take .1 seconds were taking 11 seconds, and
the context switches were averaging 160k/s, peaking at 190k/s

Unfortunately, I was under pressure to fix the db at the time so I didn't get a
chance to play with the patch.

I restarted the vacuum daemon, and will keep an eye on it to see if it behaves.

If the problem re-occurs, is it worth while to attempt the different patch
delay settings?

Cheers,
Rob

-- 
 19:45:40 up 21 days,  2:30,  4 users,  load average: 2.03, 2.09, 2.06
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004


pgpmHt8DM7Wii.pgp
Description: PGP signature


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Robert Creager
When grilled further on (Wed, 19 May 2004 22:42:26 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 Robert Creager [EMAIL PROTECTED] writes:
  I just figured out what was causing the problem on my system Monday.
  I'm using the pg_autovacuum daemon, and it was not vacuuming my db.
 
 Do you have the post-7.4.2 datatype fixes for pg_autovacuum?

No.  I'm still running 7.4.1 w/associated contrib.  I guess an upgrade is in
order then.  I'm currently downloading 7.4.2 to see what the change is that I
need.  Is it just the 7.4.2 pg_autovacuum that is needed here?

I've caught a whiff that 7.4.3 is nearing release?  Any idea when?

Thanks,
Rob

-- 
 20:45:52 up 21 days,  3:30,  4 users,  load average: 2.02, 2.05, 2.05
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004


pgp19AJ8EZHJX.pgp
Description: PGP signature


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-02 Thread Robert Creager

Found some co-workers at work yesterday to load up my library...

The sample period is 5 minutes long (vs 2 minutes previously):

Context switches -      avg     max

Default 7.4.1 code :   48784 107354
Default patch - 10 :   20400  28160
patch at 100   :   38574  85372
patch at 1000  :   41188 106569

The reading at 1000 was not produced under the same circumstances as the prior
readings as I had to replace my device under test with a simulated one.  The
real one died.

The previous run with smaller database and 120 second averages:

Context switches -      avg     max

Default 7.4.1 code :   10665  69470
Default patch - 10 :   17297  21929
patch at 100   :   26825  87073
patch at 1000  :   37580 110849

-- 
 20:13:50 up 3 days,  2:58,  4 users,  load average: 2.12, 2.14, 2.10
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004


pgpGe9OZpz0nG.pgp
Description: PGP signature


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-30 Thread Robert Creager
When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700),
Josh Berkus [EMAIL PROTECTED] confessed:

 spins_per_delay was not beneficial.   Instead, try increasing them, one step 
 at a time:
 
 (take baseline measurement at 100)
 250
 500
 1000
 1500
 2000
 3000
 5000
 
 ... until you find an optimal level.   Then report the results to us!
 

Some results.  The patch mentioned is what Dave Cramer posted to the Performance
list on 4/21.

A Perl script monitored vmstat 1 for 120 seconds and generated max and average
values.  Unfortunately, I am not present on site, so I cannot physically change
the device under test to increase the db load to where it hit about 10 days ago.
 That will have to wait till the 'real' work week on Monday.
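
(Roughly the kind of monitor described, as a sketch rather than the actual script; it averages the 'cs' column of vmstat over the sample window:)

use strict;
use warnings;

open my $vm, '-|', 'vmstat 1 120' or die "vmstat: $!";
my ( $sum, $max, $count ) = ( 0, 0, 0 );
while ( my $line = <$vm> ) {
    my @f = split ' ', $line;
    next unless @f and $f[0] =~ /^\d+$/;   # skip the two header lines
    my $cs = $f[11];                       # 'cs' is the 12th column of vmstat output
                                           # (the first sample is a since-boot average)
    $sum += $cs;
    $count++;
    $max = $cs if $cs > $max;
}
printf "avg %d  max %d\n", $count ? $sum / $count : 0, $max;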

Context switches -      avg     max

Default 7.4.1 code :   10665  69470
Default patch - 10 :   17297  21929
patch at 100   :   26825  87073
patch at 1000  :   37580 110849

Now granted, the db isn't showing the CS swap problem in a bad way (at all), but
should the numbers be trending the way they are with the patched code?  Or will
these numbers potentially change dramatically when I can load up the db?

And, presuming I can re-produce what I was seeing previously (200K CS/s), you
folks want me to carry on with more testing of the patch and report the results?
 Or just go away and be quiet...

The information is provided from an HP Proliant DL380 G3 with 2x 2.4 GHz Xeons
(with HT enabled), 2 GB RAM, running a 2.4.22-26mdkenterprise kernel, a RAID
controller w/128 Mb battery-backed cache, RAID 1 on 2x 15K RPM drives for the WAL
drive, and RAID 0+1 on 4x 10K RPM drives for data.  The only job this box has is
running this db.

Cheers,
Rob

-- 
 21:54:48 up 2 days,  4:39,  4 users,  load average: 2.00, 2.03, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004


pgp88T6PR5F9b.pgp
Description: PGP signature


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-28 Thread Robert Creager
When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700),
Josh Berkus [EMAIL PROTECTED] confessed:

 Dave,
 
  After some testing if you use the current head code for s_lock.c which
  has some mods in it to alleviate this situation, and change
  SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
  I am seeing a slight degradation in throughput using pgbench -c 10 -t
  1000 but it might be liveable, considering the alternative is unbearable
  in some situations.
 
  Can anyone else replicate my results?
 
 Can you produce a patch against 7.4.1?   I'd like to test your fix against a 
 real-world database.

I would like to see the same, as I have a system that exhibits the same behavior
on a production db that's running 7.4.1.

Cheers,
Rob


-- 
 18:55:22 up  1:40,  4 users,  load average: 2.00, 2.04, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004


pgpEBmUUlf2Tx.pgp
Description: PGP signature


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Robert Creager
When grilled further on (Mon, 19 Apr 2004 20:53:09 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 I wrote:
  Here is a test case.
 
 Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
 which seems to pretty much let the Xeon per se off the hook.  Anybody
 got a multiple Opteron to try?  Totally non-Intel CPUs?
 
 It would be interesting to see results with non-Linux kernels, too.
 

Same problem on my dual AMD MP with a 2.6.5 kernel using two sessions of your
test, but maybe not quite as severe.  The highest CS value I saw was 102k, with
some non-db number crunching going on in parallel with the test.  'Average' is
about 80k with two instances.  Using the anticipatory scheduler.

A single instance pulls in around 200-300 CS, and no tests running around
200-300 CS (i.e. no CS difference).

A snipet:

procs ---memory-- ---swap-- -io --system-- cpu
 3  0284  90624  93452 145374000 0 0 1075 76548 83 17  0  0
 6  0284 125312  93452 147019600 0 0 1073 87702 78 22  0  0
 3  0284 178392  93460 14202080076   298 1083 67721 77 24  0  0
 4  0284 177120  93460 142150000  1104 0 1054 89593 80 21  0  0
 5  0284 173504  93460 142517200  3584 0 1110 65536 81 19  0  0
 4  0284 169984  93460 142870800  3456 0 1098 66937 81 20  0  0
 6  0284 170944  93460 142870800 8 0 1045 66065 81 19  0  0
 6  0284 167288  93460 142877600 0 8 1097 75560 81 19  0  0
 6  0284 136296  93460 145835600 0 0 1036 80808 75 26  0  0
 5  0284 132864  93460 146168800 0 0 1007 76071 84 17  0  0
 4  0284 132880  93460 146168800 0 0 1079 86903 82 18  0  0
 5  0284 132880  93460 146168800 0 0 1078 79885 83 17  0  0
 6  0284 132648  93460 146168800 0   760 1228 66564 86 14  0  0
 6  0284 132648  93460 146168800 0 0 1047 69741 86 15  0  0
 6  0284 132672  93460 146168800 0 0 1057 79052 84 16  0  0
 5  0284 132672  93460 146168800 0 0 1054 81109 82 18  0  0
 5  0284 132736  93460 146168800 0 0 1043 91725 80 20  0  0


Cheers,
Rob

-- 
 21:33:03 up 3 days,  1:10,  3 users,  load average: 5.05, 4.67, 4.22
Linux 2.6.5-01 #5 SMP Tue Apr 6 21:32:39 MDT 2004


pgp0.pgp
Description: PGP signature


Re: [PERFORM] failures on machines using jfs

2004-01-09 Thread Robert Creager
When grilled further on (Wed, 7 Jan 2004 18:06:08 -0500),
Andrew Sullivan [EMAIL PROTECTED] confessed:

 
 We have lately had a couple of cases where machines either locked up,
 slowed down to the point of complete unusability, or died completely
 while using jfs.  We are _not_ sure that jfs is in fact the culprit. 
 In one case, a kernel panic appeared to be referring to the jfs
 kernel module, but I can't be sure as I lost the output immediately
 thereafter.  Yesterday, we had a problem of data corruption on a
 failed jfs volume.
 
 None of this is to say that jfs is in fact to blame, nor even that,
 if it is, it does not have something to do with the age of our
 installations, c. (these are all RH 8).  In fact, I suspect hardware
 in both cases.  But I thought I'd mention it just in case other
 people are seeing strange behaviour, on the principle of better
 safe than sorry.
 

Interestingly enough, I'm using JFS on a new scsi disk with Mandrake 9.1 and
was having similar problems.  I was generating heavy disk usage through database
and astronomical data reductions.  My machine (dual AMD) would suddenly hang. 
No new jobs would run, they would just increase the load, until I rebooted the machine.

I solved my problems by creating a 128Mb ram disk (using EXT2) for the temp
data produced by my reduction runs.

I believe JFS was to blame, not hardware, but you never know...

Cheers,
Rob

-- 
 20:22:27 up 12 days, 10:13,  4 users,  load average: 2.00, 2.01, 2.03


pgp0.pgp
Description: PGP signature


Re: [PERFORM] Hardware performance

2003-07-17 Thread Robert Creager
On Thu, 17 Jul 2003 16:20:42 +0100
Adam Witney [EMAIL PROTECTED] said something like:

 
 Actually I am going through the same questions myself at the
 moment I would like to have a 2 disk RAID1 and a 4 disk RAID5, so
 need at least 6 disks
 
 Anybody have any suggestions or experience with other hardware
 manufacturers for this size of setup? (2U rack, up to 6 disks, 2
 processors, ~2GB RAM, if possible)
 

We recently bought a couple of Compaq Proliant DL380 units.  They are
2u, and support 6 disks, 2 CPU's, 12Gb max.

We purchased 2 units of 1CPU, 4x72Gb RAID 0+1, 1Gb mem, redundant fans
and power supplies for around $11,000 total.  Unfortunately they are
running Win2K with SQLAnywhere (ClearQuest/Web server) ;-)  So far (5
months), they're real bored...

Cheers,
Rob

-- 
 21:16:04 up  1:19,  1 user,  load average: 2.04, 1.99, 1.38


pgp0.pgp
Description: PGP signature