Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-26 Thread Shoaib Burq (VPAC)
OK ... so just to clarify ... (and pardon my ignorance):

I need to increase the value of the 'default_statistics_target' variable and 
then run VACUUM ANALYZE, right? If so, what should I choose for 
'default_statistics_target'?

BTW I don't do any sub-selection on the View; I only select count(*) from it.

I have attached the view in question and the output of:
SELECT oid , relname, relpages, reltuples 
FROM pg_class ORDER BY relpages DESC;

reg
shoaib

On Sat, 23 Apr 2005, Tom Lane wrote:

 John A Meinel [EMAIL PROTECTED] writes:
  Actually, you probably don't want enable_seqscan=off, you should try:
  SET enable_nestloop TO off.
  The problem is that it is estimating there will only be 44 rows, but in
  reality there are 13M rows. It almost definitely should be doing a
  seqscan with a sort and merge join.
 
 Not nestloops anyway.
 
  I don't understand how postgres could get the number of rows that wrong.
 
 No stats, or out-of-date stats is the most likely bet.
 
  I can't figure out exactly what is where from the formatting, but the query 
  that seems misestimated is:
  ->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1)
        Index Cond: (outer.ClimateId = ClimateChangeModel40.ClimateId)
 
 Yeah, that's what jumped out at me too.  It's not the full explanation
 for the join number being so far off, but this one at least you have a
 chance to fix by updating the stats on ClimateChangeModel40.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

-- 
Shoaib Burq
--
VPAC - Geospatial Applications Developer
Building 91, 110 Victoria Street, 
Carlton South, Vic 3053, Australia
___
w: www.vpac.org  | e: sab_AT_vpac_DOT_org | mob: +61.431-850039


oid| relname | relpages |  reltuples  
---+-+--+-
 16996 | CurrentAusClimate   |   474551 | 8.06736e+07
 16983 | ClimateChangeModel40|   338252 | 5.31055e+07
 157821816 | PK_CurrentAusClimate|   265628 | 8.06736e+07
 157835995 | idx_climateid   |   176645 | 8.06736e+07
 157835996 | idx_ausposnum   |   176645 | 8.06736e+07
 157835997 | idx_climatevalue|   176645 | 8.06736e+07
 157821808 | PK_ClimateModelChange_40|   174858 | 5.31055e+07
 157821788 | IX_iMonth001|   116280 | 5.31055e+07
 157821787 | IX_ClimateId|   116280 | 5.31055e+07
 157821786 | IX_AusPosNumber |   116280 | 5.31055e+07
 17034 | NeighbourhoodTable  |54312 | 1.00476e+07
 157821854 | PK_NeighbourhoodTable   |27552 | 1.00476e+07
 157821801 | IX_NeighbourhoodId  |22002 | 1.00476e+07
 157821800 | IX_NAusPosNumber|22002 | 1.00476e+07
 157821799 | IX_AusPosNumber006  |22002 | 1.00476e+07
 17012 | FutureEvapMonth |12026 | 1.10636e+06
 17014 | FutureMaxTMonth |12026 | 1.10636e+06
 17016 | FutureMinTMonth |12026 | 1.10636e+06
 17018 | FutureRainMonth |12026 | 1.10636e+06
 17000 | CurrentEvapMonth| 8239 | 1.12047e+06
 17002 | CurrentMaxTMonth| 8239 | 1.12047e+06
 17004 | CurrentMinTMonth| 8239 | 1.12047e+06
 17006 | CurrentR_RMonth | 8239 | 1.12047e+06
 17008 | CurrentRadMonth | 8239 | 1.12047e+06
 17010 | CurrentRainMonth| 8239 | 1.12047e+06
 16977 | Aus40_DEM   | 6591 | 1.12047e+06
 16979 | Aus40DemRandom  | 6057 | 1.12047e+06
 16981 | ClimateChange   | 3752 |  543984
 157821780 | IX_Random   | 3075 | 1.12047e+06
 157821832 | PK_FutureEvapMonth  | 3036 | 1.10636e+06
 157821834 | PK_FutureMaxTMonth  | 3036 | 1.10636e+06
 157821836 | PK_FutureMinTMonth  | 3036 | 1.10636e+06
 157821838 | PK_FutureRainMonth  | 3036 | 1.10636e+06
 157821804 | PK_Aus40DemRandom   | 2456 | 1.12047e+06
 157821802 | PK_Aus40_DEM| 2456 | 1.12047e+06
 157821820 | PK_CurrentEvapMonth | 2456 | 1.12047e+06
 157821822 | PK_CurrentMaxTMonth | 2456 | 1.12047e+06
 157821824 | PK_CurrentMinTMonth | 2456 | 1.12047e+06
 157821826 | PK_CurrentR_RMonth  | 2456 | 1.12047e+06
 157821828 | PK_CurrentRadMonth  | 2456 | 1.12047e+06
 157821830 | 

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-26 Thread Dave Held
 -Original Message-
 From: Shoaib Burq (VPAC) [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, April 26, 2005 9:31 AM
 To: Tom Lane
 Cc: John A Meinel; Russell Smith; Jeff; 
 pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] two queries and dual cpu (perplexed)
 
 
 OK ... so just to clarify ... (and pardon my ignorance):
 
 I need to increase the value of 'default_statistics_target' 
 variable and then run VACUUM ANALYZE, right?

Not necessarily.  You can set the statistics for a single
column with ALTER TABLE.

 If so what should I choose for the 'default_statistics_target'?
 [...]

Since you have a decently large table, go for the max setting
which is 1000.
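
For example (just a sketch -- the column name is borrowed from the join key
flagged in the plans elsewhere in this thread):

ALTER TABLE "ClimateChangeModel40" ALTER COLUMN "ClimateId" SET STATISTICS 1000;
ANALYZE "ClimateChangeModel40";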

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-26 Thread John A Meinel
Shoaib Burq (VPAC) wrote:
OK ... so just to clarify ... (and pardon my ignorance):
I need to increase the value of 'default_statistics_target' variable and
then run VACUUM ANALYZE, right? If so what should I choose for the
'default_statistics_target'?
BTW I don't do any sub-selection on the View; I only select count(*) from it.
I have attached the view in question and the output of:
SELECT oid , relname, relpages, reltuples
FROM pg_class ORDER BY relpages DESC;
reg
shoaib
Actually, you only need to alter the statistics for that particular
column, not for all columns in the db.
What you want to do is:
ALTER TABLE "ClimateChangeModel40"
ALTER COLUMN <the column behind IX_ClimateId>
SET STATISTICS 100;
VACUUM ANALYZE "ClimateChangeModel40";
The column is just the column that you have the IX_ClimateId index on;
I don't know which one that is.
The statistics value ranges from 1 to 1000, the default being 10, and for
indexed columns you are likely to want somewhere between 100 and 200.
If you set it to 100 and the planner is still mis-estimating the number
of rows, try 200, etc.
The reason to keep the number low is that a higher number makes the
planner spend more time planning. But especially for queries like
this one, you'd rather the planner spent a little more time
planning and got the right plan.
John
=:-




Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-23 Thread Tom Lane
John A Meinel [EMAIL PROTECTED] writes:
 Actually, you probably don't want enable_seqscan=off, you should try:
 SET enable_nestloop TO off.
 The problem is that it is estimating there will only be 44 rows, but in
 reality there are 13M rows. It almost definitely should be doing a
 seqscan with a sort and merge join.

Not nestloops anyway.

 I don't understand how postgres could get the number of rows that wrong.

No stats, or out-of-date stats is the most likely bet.

 I can't figure out exactly what is where from the formatting, but the query 
 that seems misestimated is:
 ->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1)
       Index Cond: (outer.ClimateId = ClimateChangeModel40.ClimateId)

Yeah, that's what jumped out at me too.  It's not the full explanation
for the join number being so far off, but this one at least you have a
chance to fix by updating the stats on ClimateChangeModel40.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-22 Thread Jeff
On Apr 21, 2005, at 11:33 PM, Shoaib Burq (VPAC) wrote:
BTW I guess I should mention that I am doing the select count(*) on a 
View.

A bit of a silly question...
but are you actually selecting all the rows from this query in 
production or would it be more selective?  ie select * from bigslowview 
where bah = 'snort'?


Ran the Explain analyse with the nestedloop disabled but it was taking
forever... and killed it after 30mins.
If it takes too long you can run just plain explain (no analyze) and it 
will show you the plan.  This is nearly always instant... it'll give 
you a clue as to whether your setting changes did anything.
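
For instance, this prints the plan without actually executing the query (a
sketch, using the view and database from this thread):

ausclimate=# EXPLAIN SELECT count(*) FROM getfutureausclimate;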

You may need to end up breaking some parts of this up into subqueries.  
I've had to do this before.  I had one query that just ran too dang 
slow as a join so I modified it into a subquery type deal.  Worked 
great.  However since you are selecting ALL rows I doubt that will help 
much.

Another option may be to use materialized views.  Not sure how 
dynamic your data model is. It could help.
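
PostgreSQL 8.0 has no built-in materialized views, so this means maintaining a
summary table by hand. A minimal sketch (the table and index names are made up
for illustration, and the indexed column is only assumed from the view's sort
key):

CREATE TABLE mat_futureausclimate AS SELECT * FROM getfutureausclimate;
CREATE INDEX idx_mat_fac_pos ON mat_futureausclimate ("AusPosNumber");
ANALYZE mat_futureausclimate;
-- refresh whenever the underlying tables change:
TRUNCATE mat_futureausclimate;
INSERT INTO mat_futureausclimate SELECT * FROM getfutureausclimate;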

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-22 Thread Daniel Schuchardt
Shoaib Burq (VPAC) wrote:
Hi everybody,
One of our clients was using SQL-Server and decided to switch to
PostgreSQL 8.0.1.
Hardware: Dual processor  Intel(R) Xeon(TM) CPU 3.40GHz
OS: Enterprise Linux with 2.6.9-5 SMP kernel
Filesystem: ext3
SHMMAX: $ cat  /proc/sys/kernel/shmmax
6442450944 --- believe that's 6 GB; total RAM is 8 GB
Database: 15GB in size with a few tables with over 80 million rows.
Here is a snippet from the output of 
SELECT oid , relname, relpages, reltuples 
   FROM pg_class ORDER BY relpages DESC;
   oid| relname | relpages |  reltuples  
---+-+--+-
16996 | CurrentAusClimate   |   474551 | 8.06736e+07
16983 | ClimateChangeModel40|   338252 | 5.31055e+07
157821816 | PK_CurrentAusClimate|   265628 | 8.06736e+07
157835995 | idx_climateid   |   176645 | 8.06736e+07
157835996 | idx_ausposnum   |   176645 | 8.06736e+07
157835997 | idx_climatevalue|   176645 | 8.06736e+07
157821808 | PK_ClimateModelChange_40|   174858 | 5.31055e+07
157821788 | IX_iMonth001|   116280 | 5.31055e+07
157821787 | IX_ClimateId|   116280 | 5.31055e+07
157821786 | IX_AusPosNumber |   116280 | 5.31055e+07
17034 | NeighbourhoodTable  |54312 | 1.00476e+07
157821854 | PK_NeighbourhoodTable   |27552 | 1.00476e+07
157821801 | IX_NeighbourhoodId  |22002 | 1.00476e+07
157821800 | IX_NAusPosNumber|22002 | 1.00476e+07
157821799 | IX_AusPosNumber006  |22002 | 1.00476e+07
[...]

To test the performance of the database we ran one of the most demanding
queries that exist with the following embarrassing results:
Query Execution time on:
SQL-Server (dual processor xeon)  3min 11sec
PostgreSQL (SMP IBM Linux server) 5min 30sec
Now I have not touched $PGDATA/postgresql.conf (as I know very little 
about memory tuning). I have run VACUUM ANALYZE.

The client understands that they may not match the performance for a
single query as there is no multithreading. So they asked me to
demonstrate the benefits of PostgreSQL's multiprocessing capabilities.
To do that I modified the most demanding query to create a second query
and ran them in parallel:
$ time ./run_test1.sh
$ cat ./run_test1.sh
/usr/bin/time -p psql -f ./q1.sql ausclimate > q1.out 2> q1.time &
/usr/bin/time -p psql -f ./q2.sql ausclimate > q2.out 2> q2.time
and the time taken is *twice* that for the original. The modification was 
minor. The queries do make use of both CPUs:

2388 postgres 16 0 79640 15m 11m R 80.9 0.2 5:05.81 postmaster
2389 postgres 16 0 79640 15m 11m R 66.2 0.2 5:04.25 postmaster
But I can't understand why there's no performance improvement and in fact
there seems to be no benefit from multiprocessing.  Any ideas? I don't know
enough about the locking procedures employed by postgres, but one would
think this shouldn't be an issue with read-only queries.
Please don't hesitate to ask me for more info, like the query, the
output of explain, or stats on memory usage. I just wanted to keep this 
short and provide more info as the cogs start turning :-)

Thanks & Regards
Shoaib

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

I think you should post the SQL statement and EXPLAIN ANALYSE output 
here to get a useful answer.
(EXPLAIN ANALYSE SELECT * FROM x WHERE ...)

Daniel
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-22 Thread Kenneth Marshall
On Thu, Apr 21, 2005 at 08:24:15AM -0400, Jeff wrote:
 
 On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:
 
 Now I have not touched $PGDATA/postgresql.conf (as I know very little
 about memory tuning). I have run VACUUM ANALYZE.
 
 You should really, really bump up shared_buffers and given you have 8GB 
 of ram this query would likely benefit from more work_mem.
 
 and the time taken is *twice* that for the original. The modification 
 was
 minor. The queries do make use of both CPUs:
 
 Is this an IO intensive query?  If running both in parallel results in 
 2x the run time and you have sufficient CPUs it would (to me) indicate 
 you don't have enough IO bandwidth to satisfy the query.
 

I would add to Jeff's comments that the default configuration parameters
are fairly-to-very conservative, which tends to produce plans with more I/O.
Bumping your shared_buffers, work_mem, and effective_cache_size should
allow the planner to favor plans that utilize more memory but require
less I/O. Also, with small amounts of work_mem, hash joins cannot be
used and the planner will resort to nested loops.
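
As a very rough illustration only (these numbers are guesses for an 8 GB box,
not recommendations; in 8.0 shared_buffers and effective_cache_size are counted
in 8 kB pages and work_mem in kB):

shared_buffers = 50000           # ~400 MB
work_mem = 102400                # 100 MB per sort/hash operation
effective_cache_size = 524288    # assume ~4 GB of OS cache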

Ken

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)
Hi everybody,

One of our clients was using SQL-Server and decided to switch to
PostgreSQL 8.0.1.

Hardware: Dual processor  Intel(R) Xeon(TM) CPU 3.40GHz
OS: Enterprise Linux with 2.6.9-5 SMP kernel
Filesystem: ext3
SHMMAX: $ cat  /proc/sys/kernel/shmmax
6442450944 --- believe that's 6 GB; total RAM is 8 GB
Database: 15GB in size with a few tables with over 80 million rows.

Here is a snippet from the output of 
SELECT oid , relname, relpages, reltuples 
FROM pg_class ORDER BY relpages DESC;
oid| relname | relpages |  reltuples  
---+-+--+-
 16996 | CurrentAusClimate   |   474551 | 8.06736e+07
 16983 | ClimateChangeModel40|   338252 | 5.31055e+07
 157821816 | PK_CurrentAusClimate|   265628 | 8.06736e+07
 157835995 | idx_climateid   |   176645 | 8.06736e+07
 157835996 | idx_ausposnum   |   176645 | 8.06736e+07
 157835997 | idx_climatevalue|   176645 | 8.06736e+07
 157821808 | PK_ClimateModelChange_40|   174858 | 5.31055e+07
 157821788 | IX_iMonth001|   116280 | 5.31055e+07
 157821787 | IX_ClimateId|   116280 | 5.31055e+07
 157821786 | IX_AusPosNumber |   116280 | 5.31055e+07
 17034 | NeighbourhoodTable  |54312 | 1.00476e+07
 157821854 | PK_NeighbourhoodTable   |27552 | 1.00476e+07
 157821801 | IX_NeighbourhoodId  |22002 | 1.00476e+07
 157821800 | IX_NAusPosNumber|22002 | 1.00476e+07
 157821799 | IX_AusPosNumber006  |22002 | 1.00476e+07
[...]

To test the performance of the database we ran one of the most demanding
queries that exist with the following embarrassing results:

Query Execution time on:
SQL-Server (dual processor xeon)  3min 11sec
PostgreSQL (SMP IBM Linux server) 5min 30sec

Now I have not touched $PGDATA/postgresql.conf (as I know very little 
about memory tuning). I have run VACUUM ANALYZE.

The client understands that they may not match the performance for a
single query as there is no multithreading. So they asked me to
demonstrate the benefits of PostgreSQL's multiprocessing capabilities.

To do that I modified the most demanding query to create a second query
and ran them in parallel:

$ time ./run_test1.sh
$ cat ./run_test1.sh
/usr/bin/time -p psql -f ./q1.sql ausclimate > q1.out 2> q1.time &
/usr/bin/time -p psql -f ./q2.sql ausclimate > q2.out 2> q2.time

and the time taken is *twice* that for the original. The modification was 
minor. The queries do make use of both CPUs:

 2388 postgres 16 0 79640 15m 11m R 80.9 0.2 5:05.81 postmaster
 2389 postgres 16 0 79640 15m 11m R 66.2 0.2 5:04.25 postmaster

But I can't understand why there's no performance improvement and in fact
there seems to be no benefit from multiprocessing.  Any ideas? I don't know
enough about the locking procedures employed by postgres, but one would
think this shouldn't be an issue with read-only queries.

Please don't hesitate to ask me for more info, like the query, the
output of explain, or stats on memory usage. I just wanted to keep this 
short and provide more info as the cogs start turning :-)

Thanks & Regards
Shoaib



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Jeff
On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:
Now I have not touched $PGDATA/postgresql.conf (as I know very little
about memory tuning). I have run VACUUM ANALYZE.
You should really, really bump up shared_buffers and given you have 8GB 
of ram this query would likely benefit from more work_mem.

and the time taken is *twice* that for the original. The modification 
was
minor. The queries do make use of both CPUs:

Is this an IO intensive query?  If running both in parallel results in 
2x the run time and you have sufficient CPUs it would (to me) indicate 
you don't have enough IO bandwidth to satisfy the query.

Can we see an explain analyze of the query?  Could be a bad plan and a 
bad plan will never give good performance.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Gavin Sherry
On Thu, 21 Apr 2005, Jeff wrote:


 On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:

  Now I have not touched $PGDATA/postgresql.conf (as I know very little
  about memory tuning). I have run VACUUM ANALYZE.
 
 You should really, really bump up shared_buffers and given you have 8GB
 of ram this query would likely benefit from more work_mem.

I'd recommend shared_buffers = 10600. It's possible that work_mem in the
hundreds of megabytes might have a good impact, but it's hard to say
without seeing the EXPLAIN ANALYZE output.
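
(With the default 8 kB block size, 10600 buffers works out to about
10600 x 8 kB, i.e. roughly 83 MB of shared memory.)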

Gavin


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

Here's the explain analyze output; sorry about the mess. I can attach it as a 
text file if you like.

ausclimate=# explain ANALYZE  select count(*) from getfutureausclimate;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1069345.85..1069345.85 rows=1 width=0) (actual time=443241.241..443241.242 rows=1 loops=1)
   ->  Subquery Scan getfutureausclimate  (cost=1069345.61..1069345.81 rows=16 width=0) (actual time=411449.034..436165.259 rows=13276368 loops=1)
         ->  Sort  (cost=1069345.61..1069345.65 rows=16 width=58) (actual time=411449.026..426001.199 rows=13276368 loops=1)
               Sort Key: Aus40_DEM.AusPosNumber, CurrentAusClimate.iMonth
               ->  Nested Loop  (cost=2.19..1069345.29 rows=16 width=58) (actual time=135.390..366902.373 rows=13276368 loops=1)
                     ->  Nested Loop  (cost=2.19..1067304.07 rows=44 width=68) (actual time=107.627..186390.137 rows=13276368 loops=1)
                           ->  Nested Loop  (cost=2.19..1067038.94 rows=44 width=52) (actual time=87.255..49743.796 rows=13276368 loops=1)
                                 ->  Nested Loop  (cost=2.19..8.09 rows=1 width=32) (actual time=52.684..52.695 rows=1 loops=1)
                                       ->  Merge Join  (cost=2.19..2.24 rows=1 width=24) (actual time=28.000..28.007 rows=1 loops=1)
                                             Merge Cond: (outer.ClimateId = inner.ClimateId)
                                             ->  Sort  (cost=1.17..1.19 rows=7 width=10) (actual time=10.306..10.307 rows=3 loops=1)
                                                   Sort Key: ClimateVariables.ClimateId
                                                   ->  Seq Scan on ClimateVariables  (cost=0.00..1.07 rows=7 width=10) (actual time=10.277..10.286 rows=7 loops=1)
                                             ->  Sort  (cost=1.02..1.02 rows=1 width=14) (actual time=17.679..17.680 rows=1 loops=1)
                                                   Sort Key: GetFutureClimateParameters.ClimateId
                                                   ->  Seq Scan on GetFutureClimateParameters  (cost=0.00..1.01 rows=1 width=14) (actual time=17.669..17.671 rows=1 loops=1)
                                       ->  Index Scan using PK_ScenarioEmissionLevels on ScenarioEmissionLevels  (cost=0.00..5.83 rows=1 width=18) (actual time=24.676..24.679 rows=1 loops=1)
                                             Index Cond: ((ScenarioEmissionLevels.ScenarioId = outer.ScenarioId) AND (ScenarioEmissionLevels.iYear = outer.iYear) AND (ScenarioEmissionLevels.LevelId = outer.LevelId))
                                 ->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=34.564..19435.855 rows=13276368 loops=1)
                                       Index Cond: (outer.ClimateId = ClimateChangeModel40.ClimateId)
                           ->  Index Scan using PK_Aus40_DEM on Aus40_DEM  (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368)
                                 Index Cond: (outer.AusPosNumber = Aus40_DEM.AusPosNumber)
                     ->  Index Scan using PK_CurrentAusClimate on CurrentAusClimate  (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368)
                           Index Cond: ((CurrentAusClimate.ClimateId = outer.ClimateId) AND (outer.AusPosNumber = CurrentAusClimate.AusPosNum) AND (CurrentAusClimate.iMonth = outer.iMonth))
 Total runtime: 443983.269 ms
(25 rows)


Shsshh...

 You should really, really bump up shared_buffers and given you have 8GB 
 of ram this query would likely benefit from more work_mem.

I actually tried that and there was a decrease in performance. Are
shared_buffers and work_mem the only things I should change to start with? 
If so, what's the reasoning?


 Is this an IO intensive query?  If running both in parallel results in 
 2x the run time and you have sufficient CPUs it would (to me) indicate 
 you don't have enough IO bandwidth to satisfy the query.

Yes I think so too: ... I am just compiling some io stats...

Also will jump on to irc...

 
Whoa! thanks all... I am overwhelmed with the help I am getting... I love 
it!



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

Here are some I/O stats with the unchanged postgresql.conf. Gonna change
it now and have another go.


[EMAIL PROTECTED] MultiCPU_test]$ vmstat 10
procs ---memory-- ---swap-- -io --system-- 
cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy 
id wa
 0  0  25808 710356 6348860 9720522473291 3  1  0 
99  0
 2  0  25808 647636 6348960 103478400  3226  3048 1054 92819 55 19 
25  1
 2  0  25808 585684 6349032 109666000  3203  3057 1053 96375 55 19 
25  1
 2  0  25808 521940 6349112 116036400  3388  2970 1052 95563 54 19 
26  1
 2  0  25808 463636 6349184 121856800  2804  3037 1048 93560 55 19 
25  1
 2  0  25808 405460 6349264 127669600  2794  3047 1046 96971 55 19 
25  1
 2  0  25808 343956 6349340 133816000  3151  3040 1049 96629 55 20 
25  1
 2  0  25808 287252 6349412 139473200  2666  2990 1045 95173 54 20 
25  1
 2  0  25808 230804 6349484 145116800  2678  2966 1044 95577 54 19 
26  1
 2  0  25808 169428 6349560 151242800  3164  3015 1048 98451 55 19 
25  1
 2  0  25808 110484 6349640 157130400  2910  2970 1050 98214 55 20 
25  0
 0  0  25808  50260 6349716 163140800  3049  3015 1049 99830 55 20 
25  1
 1  0  25808   8512 6349788 167315600  2934  2959 1047 95940 54 19 
24  3
 2  1  25808   8768 6349796 167294400  2552  2984 1043 97893 55 19 
18  8
 1  1  25808   8384 6349824 167325600  2596  3032 1051 94646 55 19 
19  6
 2  1  25808   8960 6349856 167268000  2982  3028 1052 94486 55 20 
19  6
 1  1  25808   8960 6349884 167258400  3125  2919 1052 86969 52 20 
19  8
 2  0  25808   6196 6349912 167527600  2809  3064 1046 99147 55 20 
19  5
 1  1  25808   9216 6349976 167215200  2898  3076 1047 93271 55 19 
21  6
 2  0  25808   6580 6349316 166397200  3150  2982 1048 94964 54 22 
20  4
 2  0  25808   7692 6349348 167448000  2742  3006 1045 97488 54 21 
21  4
 2  1  25808   8232 6346244 167670000  2900  3022 1048 92496 54 20 
19  8
 2  0  25808   7104 6346192 167804400  3284  2958 1057 97265 55 20 
18  7
 2  0  25808   8488 6346168 167677600  2609  3031 1047 93965 55 19 
20  7
 2  1  25808   8680 6346184 167648800  3067  3044 1051 96594 55 19 
19  6
 2  0  25808   8576 6346168 167664000  2900  3070 1047 96300 55 19 
20  6
 2  1  25808   9152 6346156 167617600  3010  2993 1049 98271 55 20 
19  7
 2  0  25808   7040 6346172 167820000  3242  3034 1050 97669 55 20 
21  4
 1  1  25808   8900 6346192 167634400  2859  3014 1052 91220 53 19 
21  6
 2  1  25808   8512 6346188 167682400  2737  2960 1049 100609 55 
20 18  6
 2  0  25808   7204 6346236 167800000  2972  3045 1050 94851 55 19 
17  9
 1  0  25808   7116 6346208 167802800  3053  2996 1048 98901 55 19 
20  5
 2  1  25808   9180 6346196 167606800  2857  3067 1047 100629 56 
21 20  3
procs ---memory-- ---swap-- -io --system-- 
cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy 
id wa
 3  1  25808   8896 6346172 167650000  3138  3022 1049 97937 55 20 
20  5
 2  1  25808   9088 6346188 167621200  2844  3022 1047 97664 55 19 
20  5
 1  1  25808   8920 6346248 167628800  3017  3024 1049 99644 55 20 
17  7
 1  1  25808   8064 6346116 167716800  2824  3037 1047 99171 55 20 
19  5
 2  1  25820   8472 6344336 167859600  2969  2957 1047 96396 54 21 
18  7
 2  1  25820   9208 6344300 167788400  3072  3031 1050 95017 54 19 
22  5
 1  0  25820   7848 6344328 167914800  3229  3011 1050 97108 55 19 
20  5
 2  1  25820   8960 6344348 167804000  2701  2954 1046 98485 54 20 
21  5
 2  0  25820   7900 6344368 167924400  2604  2931 1044 97198 54 20 
19  7
 2  0  25820   9240 6344424 167789600  2990  3015 1048 102414 56 
20 19  5
 2  0  25820   8924 6344436 167808800  3256  2991 1049 96709 55 19 
21  5
 1  1  25820   8900 6344456 167820400  2761  3030 1051 96498 55 20 
20  5
 2  0  25820   7628 630 167944400  2952  3012 1053 96534 55 20 
19  6
 2  0  25820   7080 6344472 167995600  2848  3079 1050 95074 56 19 
19  6
 2  0  25820   8928 634 167808000  2985  3021 1049 96806 55 20 
18  7
 2  1  25820   7976 6344976 1676892   110  3429  3062 1083 92817 55 19 
18  8
 2  0  25820   8096 6345080 167665200  2662  2989 1056 91921 54 19 
17 10
 1  0  25820   7424 6345128 167735200  2956  3029 1054 99385 56 19 
20  5
 2  0  25820   6664 6345232 167772400  3358  3030 1064 95929 55 19 
21  5
 1  0  25820   7268 6345320 167695600  2681  3012 1082 97744 54 20 
18  7
 2  0  25820   6944 6345364 167718400  3156  3022 1061 98055 55 19 
22  4
 2  0  25820   8668 6345420 167542800  2990  3018 1050 94734 55 19 
22  5
 2  1  25820   

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

 Is this an IO intensive query?  If running both in parallel results in 
 2x the run time and you have sufficient CPUs it would (to me) indicate 
 you don't have enough IO bandwidth to satisfy the query.

any tips on how to verify this?
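
A couple of quick checks to run while the query is going (vmstat is used
elsewhere in this thread; iostat comes from the sysstat package):

$ vmstat 10        # watch the bi/bo (blocks in/out) and wa (I/O wait) columns
$ iostat -x 10     # per-device utilisation and average wait times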


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

Just tried it with the following changes:

shared_buffers = 10600
work_mem = 102400
enable_seqscan = false

still no improvement

OK, here's the plan with enable_seqscan = false:
ausclimate=# explain ANALYZE  select count(*) from getfutureausclimate;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=101069350.74..101069350.74 rows=1 width=0) (actual time=461651.787..461651.787 rows=1 loops=1)
   ->  Subquery Scan getfutureausclimate  (cost=101069350.50..101069350.70 rows=16 width=0) (actual time=426142.382..454571.397 rows=13276368 loops=1)
         ->  Sort  (cost=101069350.50..101069350.54 rows=16 width=58) (actual time=426142.375..28.278 rows=13276368 loops=1)
               Sort Key: Aus40_DEM.AusPosNumber, CurrentAusClimate.iMonth
               ->  Nested Loop  (cost=10001.02..101069350.18 rows=16 width=58) (actual time=72.740..366588.646 rows=13276368 loops=1)
                     ->  Nested Loop  (cost=10001.02..101067308.96 rows=44 width=68) (actual time=35.788..184032.873 rows=13276368 loops=1)
                           ->  Nested Loop  (cost=10001.02..101067043.83 rows=44 width=52) (actual time=35.753..47971.652 rows=13276368 loops=1)
                                 ->  Nested Loop  (cost=10001.02..10012.98 rows=1 width=32) (actual time=7.433..7.446 rows=1 loops=1)
                                       ->  Merge Join  (cost=10001.02..10007.13 rows=1 width=24) (actual time=7.403..7.412 rows=1 loops=1)
                                             Merge Cond: (outer.ClimateId = inner.ClimateId)
                                             ->  Index Scan using PK_ClimateVariables on ClimateVariables  (cost=0.00..6.08 rows=7 width=10) (actual time=0.011..0.015 rows=3 loops=1)
                                             ->  Sort  (cost=10001.02..10001.03 rows=1 width=14) (actual time=7.374..7.375 rows=1 loops=1)
                                                   Sort Key: GetFutureClimateParameters.ClimateId
                                                   ->  Seq Scan on GetFutureClimateParameters  (cost=1.00..10001.01 rows=1 width=14) (actual time=7.361..7.362 rows=1 loops=1)
                                       ->  Index Scan using PK_ScenarioEmissionLevels on ScenarioEmissionLevels  (cost=0.00..5.83 rows=1 width=18) (actual time=0.021..0.024 rows=1 loops=1)
                                             Index Cond: ((ScenarioEmissionLevels.ScenarioId = outer.ScenarioId) AND (ScenarioEmissionLevels.iYear = outer.iYear) AND (ScenarioEmissionLevels.LevelId = outer.LevelId))
                                 ->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1)
                                       Index Cond: (outer.ClimateId = ClimateChangeModel40.ClimateId)
                           ->  Index Scan using PK_Aus40_DEM on Aus40_DEM  (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368)
                                 Index Cond: (outer.AusPosNumber = Aus40_DEM.AusPosNumber)
                     ->  Index Scan using PK_CurrentAusClimate on CurrentAusClimate  (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368)
                           Index Cond: ((CurrentAusClimate.ClimateId = outer.ClimateId) AND (outer.AusPosNumber = CurrentAusClimate.AusPosNum) AND (CurrentAusClimate.iMonth = outer.iMonth))
 Total runtime: 462218.120 ms
(23 rows)






On Thu, 21 Apr 2005, Russell Smith wrote:

 On Thu, 21 Apr 2005 10:44 pm, Shoaib Burq (VPAC) wrote:
    ->  Nested Loop  (cost=2.19..1069345.29 rows=16 width=58) (actual time=135.390..366902.373 rows=13276368 loops=1)
          ->  Nested Loop  (cost=2.19..1067304.07 rows=44 width=68) (actual time=107.627..186390.137 rows=13276368 loops=1)
                ->  Nested Loop  (cost=2.19..1067038.94 rows=44 width=52) (actual time=87.255..49743.796 rows=13276368 loops=1)
 
 OUCH, OUCH, OUCH.
 
 Most if not all of the time is going on nested loop joins.  The tuple
 estimates are off by a factor of 10^6, which means it's choosing the wrong
 join type.
 
 You could set enable_seqscan to OFF to test what the performance is like
 with a different plan, and then set it back on.
 
 However you really need to get the row count estimates up to something
 comparable, within a factor of 10 at least.
 A number of the other row estimates seem to be off by 

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread John A Meinel
Shoaib Burq (VPAC) wrote:
Just tried it with the following changes:
shared_buffers = 10600
work_mem = 102400
enable_seqscan = false
still no improvement
Ok here's the Plan with the enable_seqscan = false:
ausclimate=# explain ANALYZE  select count(*) from getfutureausclimate;

Actually, you probably don't want enable_seqscan=off, you should try:
SET enable_nestloop TO off.
The problem is that it is estimating there will only be 44 rows, but in
reality there are 13M rows. It almost definitely should be doing a
seqscan with a sort and merge join.
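For example, in a single psql session (just a sketch; getfutureausclimate is
the view being counted elsewhere in this thread):

SET enable_nestloop TO off;
EXPLAIN ANALYZE SELECT count(*) FROM getfutureausclimate;
RESET enable_nestloop;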
Also, please attach your explain analyzes; the wrapping is really hard to
read.
I don't understand how postgres could get the number of rows that wrong.
It seems to be misestimating the number of entries in IX_ClimateId
Here:
->  Index Scan using PK_Aus40_DEM on Aus40_DEM  (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368)
      Index Cond: (outer.AusPosNumber = Aus40_DEM.AusPosNumber)
->  Index Scan using PK_CurrentAusClimate on CurrentAusClimate  (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368)
The first index scan is costing you 0.006*13276368=79s, and the second one is 
119s.
I can't figure out exactly what is where from the formatting, but the query 
that seems misestimated is:
->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1)
      Index Cond: (outer.ClimateId = ClimateChangeModel40.ClimateId)
Is there an unexpected correlation between
ClimateChangeModel40.ClimateId and whatever outer is at this point?
John
=:-




Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Gavin Sherry
On Fri, 22 Apr 2005, Shoaib Burq (VPAC) wrote:

 Please see attached the output from explain analyse. This is with the

   shared_buffers = 10600
   work_mem = 102400
   enable_seqscan = true

 BTW I guess I should mention that I am doing the select count(*) on a View.

 Ran the Explain analyse with the nestedloop disabled but it was taking
 forever... and killed it after 30mins.

Try increasing stats collection on ClimateChangeModel40.ClimateId:

alter table "ClimateChangeModel40" alter column "ClimateId" set statistics 1000;
analyze "ClimateChangeModel40";
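
To see what the planner now knows about that column after the ANALYZE, a quick
look at the standard pg_stats view (just a sketch):

SELECT n_distinct, most_common_vals FROM pg_stats
WHERE tablename = 'ClimateChangeModel40' AND attname = 'ClimateId';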

Gavin

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match