Re: [PERFORM] When are index scans used over seq scans?

2005-04-21 Thread Richard van den Berg
Tom Lane wrote:
 The explain shows no such thing.  What is the *actual* runtime of
 each plan per EXPLAIN ANALYZE, please?

Ok, it took 3.5 hours to complete. :-/

This is with the default cpu_tuple_cost = 0.01:

 Nested Loop  (cost=252.80..233010147.16 rows=1035480320 width=98) (actual time=0.369..12672213.137 rows=6171334 loops=1)
   Join Filter: (("outer".starttimetrunc <= "inner".ts) AND ("outer".finishtimetrunc >= "inner".ts))
   ->  Seq Scan on sessions us  (cost=0.00..26822.36 rows=924536 width=106) (actual time=0.039..5447.349 rows=924536 loops=1)
   ->  Materialize  (cost=252.80..353.60 rows=10080 width=8) (actual time=0.000..2.770 rows=10080 loops=924536)
         ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080 width=8) (actual time=0.019..13.397 rows=10080 loops=1)
 Total runtime: 12674486.670 ms

Once again with cpu_tuple_cost = 0.1:

 Nested Loop  (cost=0.00..667684584.42 rows=1035480320 width=98) (actual time=42.892..39877.928 rows=6171334 loops=1)
   ->  Seq Scan on sessions us  (cost=0.00..110030.60 rows=924536 width=106) (actual time=0.020..917.803 rows=924536 loops=1)
   ->  Index Scan using ix_du_ts on duration du  (cost=0.00..604.46 rows=1120 width=8) (actual time=0.004..0.011 rows=7 loops=924536)
         Index Cond: (("outer".starttimetrunc <= du.ts) AND ("outer".finishtimetrunc >= du.ts))
 Total runtime: 41635.468 ms
(5 rows)

-- 
Richard van den Berg, CISSP
---
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a|  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands|  Fax  : +31 70 3603009
---

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] How can an index be larger than a table

2005-04-21 Thread David Roussel
Hi,

I have a series of tables with identical structure.  Some contain a few
thousand rows and some contain 3,000,000 rows.  Another application writes
the rows and my application reads them, simply by selecting where pk >
last_seen_pk limit 2000.

I've found that selecting from one of the tables is many times slower
than selecting from the others.

For instance, when reading data in batches of 2000 rows, it seems to take
26 seconds to query from dave_data_update_events with 1593600 rows, but only
one or two seconds to query from jane_data_update_events with 310
rows!

This is the SQL used:


|
|select 
|   events.event_id, ctrl.real_name, events.tsds, events.value, 
|   events.lds, events.correction, ctrl.type, ctrl.freq 
|from dave_data_update_events events, dave_control ctrl 
|where events.obj_id = ctrl.obj_id and 
|events.event_id > 32128893::bigint 
|order by events.event_id 
|limit 2000
|

Here is the structure of the tables...

|
|CREATE TABLE dave_control (
|obj_id numeric(6,0) NOT NULL,
|real_name character varying(64) NOT NULL,
|type numeric(2,0) NOT NULL,
|freq numeric(2,0) NOT NULL
|);
|
|CREATE TABLE dave_data_update_events (
|lds numeric(13,0) NOT NULL,
|obj_id numeric(6,0) NOT NULL,
|tsds numeric(13,0) NOT NULL,
|value character varying(22) NOT NULL,
|correction numeric(1,0) NOT NULL,
|delta_lds_tsds numeric(13,0) NOT NULL,
|event_id bigserial NOT NULL
|);
|
|CREATE UNIQUE INDEX dave_control_obj_id_idx ON dave_control USING btree
(obj_id);
|ALTER TABLE dave_control CLUSTER ON dave_control_obj_id_idx;
|
|CREATE UNIQUE INDEX dave_control_real_name_idx ON dave_control USING
btree (real_name);
|
|CREATE INDEX dave_data_update_events_lds_idx ON dave_data_update_events
USING btree (lds);
|
|CREATE INDEX dave_data_update_events_obj_id_idx ON
dave_data_update_events USING btree (obj_id);
|
|ALTER TABLE ONLY dave_control
|ADD CONSTRAINT dave_control_obj_id_key UNIQUE (obj_id);
|
|ALTER TABLE ONLY dave_control
|ADD CONSTRAINT dave_control_real_name_key UNIQUE (real_name);
|
|ALTER TABLE ONLY dave_data_update_events
|ADD CONSTRAINT dave_data_update_events_event_id_key UNIQUE
(event_id);
|

There are several pairs of tables, but with names like rod, jane,
fredie, etc.. instead of dave.
The first thing to note about the schema (not designed by me) is that
the control table is clustered on obj_id, but the data_update_events
table is not clustered.  Does that mean the rows will be stored in order
of insert?  That might be ok, because the data_update_events table is like a
queue and I read it in the order the new rows are inserted.

What also seems weird to me is that the control table has some unique
indexes created on it, but the data_update_events table just has a unique
constraint.  Will postgres use an index in the background to enforce
this constraint?
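
(One quick way to check is to list the table's indexes; if the unique
constraint is enforced by an index, it shows up as an implicitly created
unique index. A minimal sketch in psql, using the table name from above:

    SELECT indexname, indexdef
      FROM pg_indexes
     WHERE tablename = 'dave_data_update_events';

or simply \d dave_data_update_events.)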

When looking at the indexes on the all the tables in DbVisualiser my
colleague noticed that the cardinality of the indexes on the rod, jane
and fredie tables was consistent, but for dave the cardinality was
strange...

|
|SELECT relname, relkind, reltuples, relpages
|FROM pg_class WHERE relname LIKE 'dave_data%';
|
|relname                                relkind reltuples relpages
|====================================== ======= ========= ========
|dave_data_update_events                r       1593600.0 40209
|dave_data_update_events_event_id_key   i       1912320.0 29271
|dave_data_update_events_event_id_seq   S       1.0       1
|dave_data_update_events_lds_idx        i       1593600.0 6139
|dave_data_update_events_obj_id_idx     i       1593600.0 6139
|iso_pjm_data_update_events_obj_id_idx  i       1593600.0 6139
|

Note that there are only 1593600 rows in the table, so why the 1912320
figure?

Of course I checked that the row count was correct...

|
|EXPLAIN ANALYZE 
|select count(*) from iso_pjm_data_update_events
|
|QUERY PLAN
|Aggregate  (cost=60129.00..60129.00 rows=1 width=0) (actual time=35933.292..35933.293 rows=1 loops=1)
|  ->  Seq Scan on iso_pjm_data_update_events  (cost=0.00..56145.00 rows=1593600 width=0) (actual time=0.213..27919.497 rows=1593600 loops=1)
|Total runtime: 35933.489 ms
|

and...

|
|select count(*) from iso_pjm_data_update_events
|
|count
|1593600
|

so it's not that there are any undeleted rows lying around.

So any comments on the index structure?  Any ideas why the cardinality
of the index is greater than the number of rows in the table?  Was it
because the table used to be larger?

Also any ideas on how else to track down the big performance difference
between tables of the same structure?
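
(One cheap experiment, assuming the bloated index really is the
dave_data_update_events_event_id_key reported above, is to rebuild it and
re-check the stats:

    REINDEX INDEX dave_data_update_events_event_id_key;
    ANALYZE dave_data_update_events;
    SELECT relname, reltuples, relpages
      FROM pg_class WHERE relname LIKE 'dave_data%';

If reltuples/relpages for the index drop back in line with the table,
stale index entries were the culprit.)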



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)
Hi everybody,

One of our clients was using SQL-Server and decided to switch to
PostgreSQL 8.0.1.

Hardware: Dual processor  Intel(R) Xeon(TM) CPU 3.40GHz
OS: Enterprise Linux with 2.6.9-5 SMP kernel
Filesystem: ext3
SHMMAX: $ cat /proc/sys/kernel/shmmax
6442450944 --- believe that's 6 GB; total RAM is 8 GB
Database: 15GB in size with a few tables with over 80 million rows.

Here is a snippet from the output of 
SELECT oid , relname, relpages, reltuples 
FROM pg_class ORDER BY relpages DESC;
oid| relname | relpages |  reltuples  
---+-+--+-
 16996 | CurrentAusClimate   |   474551 | 8.06736e+07
 16983 | ClimateChangeModel40|   338252 | 5.31055e+07
 157821816 | PK_CurrentAusClimate|   265628 | 8.06736e+07
 157835995 | idx_climateid   |   176645 | 8.06736e+07
 157835996 | idx_ausposnum   |   176645 | 8.06736e+07
 157835997 | idx_climatevalue|   176645 | 8.06736e+07
 157821808 | PK_ClimateModelChange_40|   174858 | 5.31055e+07
 157821788 | IX_iMonth001|   116280 | 5.31055e+07
 157821787 | IX_ClimateId|   116280 | 5.31055e+07
 157821786 | IX_AusPosNumber |   116280 | 5.31055e+07
 17034 | NeighbourhoodTable  |54312 | 1.00476e+07
 157821854 | PK_NeighbourhoodTable   |27552 | 1.00476e+07
 157821801 | IX_NeighbourhoodId  |22002 | 1.00476e+07
 157821800 | IX_NAusPosNumber|22002 | 1.00476e+07
 157821799 | IX_AusPosNumber006  |22002 | 1.00476e+07
[...]

To test the performance of the database we ran one of the most demanding
queries that exist with the following embarrassing results:

Query Execution time on:
SQL-Server (dual processor xeon)  3min 11sec
PostgreSQL (SMP IBM Linux server) 5min 30sec

Now I have not touched the $PGDATA/postgresql.conf (as I know very little
about memory tuning). Have run VACUUM & ANALYZE.

The client understands that they may not match the performance for a
single query as there is no multithreading. So they asked me to
demonstrate the benefits of Postgresql's multiprocessing capabilities.

To do that I modified the most demanding query to create a second query
and ran them in parallel:

$ time ./run_test1.sh
$ cat ./run_test1.sh
/usr/bin/time -p psql -f ./q1.sql ausclimate > q1.out 2> q1.time &
/usr/bin/time -p psql -f ./q2.sql ausclimate > q2.out 2> q2.time

and the time taken is *twice* that for the original. The modification was 
minor. The queries do make use of both CPUs:

 2388 postgres 16 0 79640 15m 11m R 80.9 0.2 5:05.81 postmaster
 2389 postgres 16 0 79640 15m 11m R 66.2 0.2 5:04.25 postmaster

But I can't understand why there's no performance improvement and in fact
there seems to be no benefit from multiprocessing.  Any ideas? I don't know
enough about the locking procedures employed by postgres, but one would
think this shouldn't be an issue with read-only queries.

Please don't hesitate to ask me for more info like, the query or the
output of explain, or stats on memory usage. I just wanted to keep this 
short and provide more info as the cogs start turning :-)

Thanks & Regards
Shoaib



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] When are index scans used over seq scans?

2005-04-21 Thread Richard van den Berg
Thanks a lot John for the correct search terms. :-)

The suggestion in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00029.php to
add a constraint that checks (finishtime >= starttime) does not make a
difference for me. Still seq scans are used.

The width solution explained in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00027.php
and
http://archives.postgresql.org/pgsql-performance/2005-04/msg00116.php
does make a huge difference when selecting 1 timestamp using a BETWEEN
(2ms vs 2sec), but as soon as I put 2 timestamps in a table and try a
join, everything goes south (7.7sec). I have 10k timestamps in the
duration table. :-(

I'm getting more confused on how the planner decides to use indexes. For
example, if I try:

explain analyze select us.oid from sessions us where '2005-04-10
23:11:00' between us.starttimetrunc and us.finishtimetrunc;

 QUERY PLAN

 Index Scan using sessions_st_ft_idx2 on sessions us  (cost=0.00..18320.73 rows=4765 width=4) (actual time=0.063..2.455 rows=279 loops=1)
   Index Cond: (('2005-04-10 23:11:00'::timestamp without time zone <= finishtimetrunc) AND ('2005-04-10 23:11:00'::timestamp without time zone >= starttimetrunc))
 Total runtime: 2.616 ms

it uses the index! However, if I change the date it does not:

explain analyze select us.oid from sessions us where '2005-04-09
23:11:00' between us.starttimetrunc and us.finishtimetrunc;

   QUERY PLAN

 Seq Scan on sessions us  (cost=0.00..68173.04 rows=41575 width=4) (actual time=553.424..1981.695 rows=64 loops=1)
   Filter: (('2005-04-09 23:11:00'::timestamp without time zone >= starttimetrunc) AND ('2005-04-09 23:11:00'::timestamp without time zone <= finishtimetrunc))
 Total runtime: 1981.802 ms

The times in sessions go from '2005-04-04 00:00:00' to '2005-04-10
23:59:00' so both are valid times to query for, but April 10th is more
towards the end. A little experimenting shows that if I go earlier than
'2005-04-10 13:26:15' seq scans are being used. I was thinking this
timestamp would have something to do with the histogram_bounds in
pg_stats, but I cannot find a match:

 starttimetrunc | {2005-04-04 00:05:00,2005-04-04
11:49:00,2005-04-04 22:03:00,2005-04-05 10:54:00,2005-04-05
21:08:00,2005-04-06 10:28:00,2005-04-07 01:57:00,2005-04-07
15:55:00,2005-04-08 10:18:00,2005-04-08 17:12:00,2005-04-10 23:57:00}
 finishtimetrunc| {2005-04-04 00:05:00.93,2005-04-04
11:53:00.98,2005-04-04 22:35:00.38,2005-04-05
11:13:00.02,2005-04-05 21:31:00.98,2005-04-06
10:45:01,2005-04-07 02:08:08.25,2005-04-07 16:20:00.93,2005-04-08
10:25:00.40,2005-04-08 17:15:00.94,2005-04-11 02:08:19}
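
(For reference, the bounds above come straight out of pg_stats; a sketch
of the query used to pull them:

    SELECT attname, n_distinct, histogram_bounds
      FROM pg_stats
     WHERE tablename = 'sessions'
       AND attname IN ('starttimetrunc', 'finishtimetrunc');
)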

-- 
Richard van den Berg, CISSP
---
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a|  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands|  Fax  : +31 70 3603009
---
   Have you visited our new DNA Portal?
---

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Jeff
On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:
Now I have not touched the $PGDATA/postgresql.conf (as I know very little
about memory tuning). Have run VACUUM & ANALYZE.
You should really, really bump up shared_buffers and given you have 8GB 
of ram this query would likely benefit from more work_mem.

and the time taken is *twice* that for the original. The modification 
was
minor. The queries do make use of both CPUs:

Is this an IO intensive query?  If running both in parallel results in 
2x the run time and you have sufficient cpus it would (to me) indicate 
you don't have enough IO bandwidth to satisfy the query.

Can we see an explain analyze of the query?  Could be a bad plan and a 
bad plan will never give good performance.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Gavin Sherry
On Thu, 21 Apr 2005, Jeff wrote:


 On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:

  Now I have not touched the $PGDATA/postgresql.conf (as I know very little
  about memory tuning). Have run VACUUM & ANALYZE.
 
 You should really, really bump up shared_buffers and given you have 8GB
 of ram this query would likely benefit from more work_mem.

I'd recommend shared_buffers = 10600. It's possible that work_mem in the
hundreds of megabytes might have a good impact, but it's hard to say
without seeing the EXPLAIN ANALYZE output.
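
(For experimenting, work_mem can be tried per session before editing
postgresql.conf; shared_buffers itself needs a postmaster restart. A
sketch; note that in 8.0 work_mem is an integer number of kilobytes:

    SET work_mem = 262144;   -- 256 MB, this session only
    EXPLAIN ANALYZE SELECT count(*) FROM getfutureausclimate;
)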

Gavin


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

here's the explain; sorry about the mess. I can attach it as a text file if 
you like.

ausclimate=# explain ANALYZE  select count(*) from getfutureausclimate;



   
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1069345.85..1069345.85 rows=1 width=0) (actual time=443241.241..443241.242 rows=1 loops=1)
   ->  Subquery Scan getfutureausclimate  (cost=1069345.61..1069345.81 rows=16 width=0) (actual time=411449.034..436165.259 rows=13276368 loops=1)
         ->  Sort  (cost=1069345.61..1069345.65 rows=16 width=58) (actual time=411449.026..426001.199 rows=13276368 loops=1)
               Sort Key: Aus40_DEM.AusPosNumber, CurrentAusClimate.iMonth
               ->  Nested Loop  (cost=2.19..1069345.29 rows=16 width=58) (actual time=135.390..366902.373 rows=13276368 loops=1)
                     ->  Nested Loop  (cost=2.19..1067304.07 rows=44 width=68) (actual time=107.627..186390.137 rows=13276368 loops=1)
                           ->  Nested Loop  (cost=2.19..1067038.94 rows=44 width=52) (actual time=87.255..49743.796 rows=13276368 loops=1)
                                 ->  Nested Loop  (cost=2.19..8.09 rows=1 width=32) (actual time=52.684..52.695 rows=1 loops=1)
                                       ->  Merge Join  (cost=2.19..2.24 rows=1 width=24) (actual time=28.000..28.007 rows=1 loops=1)
                                             Merge Cond: ("outer".ClimateId = "inner".ClimateId)
                                             ->  Sort  (cost=1.17..1.19 rows=7 width=10) (actual time=10.306..10.307 rows=3 loops=1)
                                                   Sort Key: ClimateVariables.ClimateId
                                                   ->  Seq Scan on ClimateVariables  (cost=0.00..1.07 rows=7 width=10) (actual time=10.277..10.286 rows=7 loops=1)
                                             ->  Sort  (cost=1.02..1.02 rows=1 width=14) (actual time=17.679..17.680 rows=1 loops=1)
                                                   Sort Key: GetFutureClimateParameters.ClimateId
                                                   ->  Seq Scan on GetFutureClimateParameters  (cost=0.00..1.01 rows=1 width=14) (actual time=17.669..17.671 rows=1 loops=1)
                                       ->  Index Scan using PK_ScenarioEmissionLevels on ScenarioEmissionLevels  (cost=0.00..5.83 rows=1 width=18) (actual time=24.676..24.679 rows=1 loops=1)
                                             Index Cond: ((ScenarioEmissionLevels.ScenarioId = "outer".ScenarioId) AND (ScenarioEmissionLevels.iYear = "outer".iYear) AND (ScenarioEmissionLevels.LevelId = "outer".LevelId))
                                 ->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=34.564..19435.855 rows=13276368 loops=1)
                                       Index Cond: ("outer".ClimateId = ClimateChangeModel40.ClimateId)
                           ->  Index Scan using PK_Aus40_DEM on Aus40_DEM  (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368)
                                 Index Cond: ("outer".AusPosNumber = Aus40_DEM.AusPosNumber)
                     ->  Index Scan using PK_CurrentAusClimate on CurrentAusClimate  (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368)
                           Index Cond: ((CurrentAusClimate.ClimateId = "outer".ClimateId) AND ("outer".AusPosNumber = CurrentAusClimate.AusPosNum) AND (CurrentAusClimate.iMonth = "outer".iMonth))
 Total runtime: 443983.269 ms
(25 rows)


Shsshh...

 You should really, really bump up shared_buffers and given you have 8GB 
 of ram this query would likely benefit from more work_mem.

I actually tried that and there was a decrease in performance. Are
shared_buffers and work_mem the only things I should change to start with?
If so, what's the reasoning?


 Is this an IO intensive query?  If running both in parallel results in 
 2x the run time and you have sufficient cpus it would (to me) indicate 
 you don't have enough IO bandwidth to satisfy the query.

Yes I think so too: ... I am just compiling some io stats...

Also will jump on to irc...

 
Whoa! thanks all... I am overwhelmed with the help I am getting... I love 
it!



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Merlin Moncure
 John A Meinel [EMAIL PROTECTED] writes:
  Joel Fradkin wrote:
  Postgres was on the second run
  Total query runtime: 17109 ms.
  Data retrieval runtime: 72188 ms.
  331640 rows retrieved.
 
  How were you measuring data retrieval time?
 
 I suspect he's using pgadmin.  We've seen reports before suggesting
that
 pgadmin can be amazingly slow, eg here
 http://archives.postgresql.org/pgsql-performance/2004-10/msg00427.php
 where the *actual* data retrieval time as shown by EXPLAIN ANALYZE
 was under three seconds, but pgadmin claimed the query runtime was 22
 sec and data retrieval runtime was 72 sec.

The problem is that pgAdmin takes your query results and puts them in a
grid.  The grid is not designed to be used that way for large datasets:
the time complexity is not linear and really breaks down around 10k-100k
rows, depending on various factors.  pgAdmin users just have to get used
to it and use LIMIT or the filter feature at appropriate times.

The MS SQL Enterprise Manager uses cursors, which have their own set of
nasty issues (no MVCC).

In fairness, unless you are running with the \a switch, psql adds a fair
amount of time to the query too.

Joel:
Postgres was on the second run
Total query runtime: 17109 ms.
Data retrieval runtime: 72188 ms.
331640 rows retrieved.

The Data retrieval runtime is time spent by pgAdmin formatting, etc.
The query runtime is the actual timing figure you should be concerned
with (you are not comparing apples to apples).  I can send you a utility
I wrote in Delphi which adds only a few seconds of overhead for a 360k
result set.  Or, go into psql, throw the \a switch, and run the query.

or: 
psql -A -c "select * from myview where x" > output.txt

it should finish the above in 16-17 sec plus the time to write out the
file.

Joel, I have a lot of experience with all three databases you are
evaluating, and you are making a huge mistake switching to MySQL.  You
can make a decent case for MS SQL, but it's quite expensive at your
level of play, as you know.

Merlin




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Andreas Pflug
Joel Fradkin wrote:
I did think of something similar, just loading the data tables with junk
records, and I may visit that idea with Josh.
I did just do some comparisons on timing of a plain select * from tbl where
indexed column = x, and it was considerably slower than both MSSQL and MYSQL,
so I am still a bit confused. This still might be a configuration issue (I
ran on my 2 gig desktop and the 8 gig Linux box; comparisons were all run on
the same machines as far as MSSQL, MYSQL, and Postgres are concerned).
I turned off postgres when running MYSQL and turned off MYSQL when running
postgres; MSSQL had one of the two running while I tested it.
For the 360,000 records returned MYSQL did it in 40 seconds first run and 17
seconds second run.
MSSQL did it in 56 seconds first run and 16 seconds second run.
Postgres was on the second run
Total query runtime: 17109 ms.
Data retrieval runtime: 72188 ms.
331640 rows retrieved.
Beware!
From the data, I can see that you're probably using pgAdmin3.
The time to execute your query, including transfer of all data to the
client, is 17s in this example, while displaying it (i.e. pure GUI and
memory alloc stuff) takes 72s. Execute to a file to avoid this.
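
(In psql that would be something like this; the table and filter are
placeholders:

    \a                    -- unaligned output
    \o /tmp/result.txt    -- send results to a file instead of the screen
    SELECT * FROM tbl WHERE indexed_column = 'x';
    \o                    -- back to stdout
)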

Regards,
Andreas
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

here are some i/o stats with the unchanged postgresql.conf. Gonna change
it now and have another go.


[EMAIL PROTECTED] MultiCPU_test]$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa
 0  0  25808 710356 6348860 9720522473291 3  1  0 99  0
 2  0  25808 647636 6348960 103478400  3226  3048 1054 92819 55 19 25  1
 2  0  25808 585684 6349032 109666000  3203  3057 1053 96375 55 19 25  1
 2  0  25808 521940 6349112 116036400  3388  2970 1052 95563 54 19 26  1
 2  0  25808 463636 6349184 121856800  2804  3037 1048 93560 55 19 25  1
 2  0  25808 405460 6349264 127669600  2794  3047 1046 96971 55 19 25  1
 2  0  25808 343956 6349340 133816000  3151  3040 1049 96629 55 20 25  1
 2  0  25808 287252 6349412 139473200  2666  2990 1045 95173 54 20 25  1
 2  0  25808 230804 6349484 145116800  2678  2966 1044 95577 54 19 26  1
 2  0  25808 169428 6349560 151242800  3164  3015 1048 98451 55 19 25  1
 2  0  25808 110484 6349640 157130400  2910  2970 1050 98214 55 20 25  0
 0  0  25808  50260 6349716 163140800  3049  3015 1049 99830 55 20 25  1
 1  0  25808   8512 6349788 167315600  2934  2959 1047 95940 54 19 24  3
 2  1  25808   8768 6349796 167294400  2552  2984 1043 97893 55 19 18  8
 1  1  25808   8384 6349824 167325600  2596  3032 1051 94646 55 19 19  6
 2  1  25808   8960 6349856 167268000  2982  3028 1052 94486 55 20 19  6
 1  1  25808   8960 6349884 167258400  3125  2919 1052 86969 52 20 19  8
 2  0  25808   6196 6349912 167527600  2809  3064 1046 99147 55 20 19  5
 1  1  25808   9216 6349976 167215200  2898  3076 1047 93271 55 19 21  6
 2  0  25808   6580 6349316 166397200  3150  2982 1048 94964 54 22 20  4
 2  0  25808   7692 6349348 167448000  2742  3006 1045 97488 54 21 21  4
 2  1  25808   8232 6346244 167670000  2900  3022 1048 92496 54 20 19  8
 2  0  25808   7104 6346192 167804400  3284  2958 1057 97265 55 20 18  7
 2  0  25808   8488 6346168 167677600  2609  3031 1047 93965 55 19 20  7
 2  1  25808   8680 6346184 167648800  3067  3044 1051 96594 55 19 19  6
 2  0  25808   8576 6346168 167664000  2900  3070 1047 96300 55 19 20  6
 2  1  25808   9152 6346156 167617600  3010  2993 1049 98271 55 20 19  7
 2  0  25808   7040 6346172 167820000  3242  3034 1050 97669 55 20 21  4
 1  1  25808   8900 6346192 167634400  2859  3014 1052 91220 53 19 21  6
 2  1  25808   8512 6346188 167682400  2737  2960 1049 100609 55 20 18  6
 2  0  25808   7204 6346236 167800000  2972  3045 1050 94851 55 19 17  9
 1  0  25808   7116 6346208 167802800  3053  2996 1048 98901 55 19 20  5
 2  1  25808   9180 6346196 167606800  2857  3067 1047 100629 56 21 20  3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa
 3  1  25808   8896 6346172 167650000  3138  3022 1049 97937 55 20 20  5
 2  1  25808   9088 6346188 167621200  2844  3022 1047 97664 55 19 20  5
 1  1  25808   8920 6346248 167628800  3017  3024 1049 99644 55 20 17  7
 1  1  25808   8064 6346116 167716800  2824  3037 1047 99171 55 20 19  5
 2  1  25820   8472 6344336 167859600  2969  2957 1047 96396 54 21 18  7
 2  1  25820   9208 6344300 167788400  3072  3031 1050 95017 54 19 22  5
 1  0  25820   7848 6344328 167914800  3229  3011 1050 97108 55 19 20  5
 2  1  25820   8960 6344348 167804000  2701  2954 1046 98485 54 20 21  5
 2  0  25820   7900 6344368 167924400  2604  2931 1044 97198 54 20 19  7
 2  0  25820   9240 6344424 167789600  2990  3015 1048 102414 56 20 19  5
 2  0  25820   8924 6344436 167808800  3256  2991 1049 96709 55 19 21  5
 1  1  25820   8900 6344456 167820400  2761  3030 1051 96498 55 20 20  5
 2  0  25820   7628 630 167944400  2952  3012 1053 96534 55 20 19  6
 2  0  25820   7080 6344472 167995600  2848  3079 1050 95074 56 19 19  6
 2  0  25820   8928 634 167808000  2985  3021 1049 96806 55 20 18  7
 2  1  25820   7976 6344976 1676892   110  3429  3062 1083 92817 55 19 18  8
 2  0  25820   8096 6345080 167665200  2662  2989 1056 91921 54 19 17 10
 1  0  25820   7424 6345128 167735200  2956  3029 1054 99385 56 19 20  5
 2  0  25820   6664 6345232 167772400  3358  3030 1064 95929 55 19 21  5
 1  0  25820   7268 6345320 167695600  2681  3012 1082 97744 54 20 18  7
 2  0  25820   6944 6345364 167718400  3156  3022 1061 98055 55 19 22  4
 2  0  25820   8668 6345420 167542800  2990  3018 1050 94734 55 19 22  5
 2  1  25820   

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Joel Fradkin
Why is MYSQL returning 360,000 rows, while Postgres is only returning
330,000? This may not be important at all, though.
I also assume you are selecting from a plain table, not a view.

Yes, plain table. The difference in rows is that one of the datasets had
Sears data in it. It (the speed differences found) is much worse on some of
my views, which is what forced me to start looking at other options.

I suppose knowing your work_mem, and shared_buffers settings would be
useful. I have posted my configs, but will add the Tampa one to the bottom
again. My desktop has:
# - Memory -

shared_buffers = 8000   # min 16, at least max_connections*2, 8KB each
work_mem = 8000 #1024   # min 64, size in KB
maintenance_work_mem = 16384    # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 3#2     # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1000        # min 100, ~50 bytes each

# - Planner Cost Constants -

effective_cache_size = 8#1000   # typically 8KB each
random_page_cost = 2    # units are one sequential page fetch cost

How were you measuring data retrieval time? And how does this compare
to what you were measuring on the other machines? It might be possible
that what you are really measuring is just the time it takes psql to
load up all the data into memory, and then print it out. And since psql
defaults to measuring entry lengths for each column, this may not be
truly comparable.
It *looks* like it only takes 18s for postgres to get the data, but then
it is taking 72s to transfer the data to you. That would be network
latency, or something like that, not database latency.
And I would say that 18s is very close to 16 or 17 seconds.
This was run on the machine with the database (as were MYSQL and MSSQL).
The PG timing was from PGADMIN, and the 18 secs was the second run; the
first run was the same time to return the data and 70 secs to do the first
part, like 147 secs all told, compared to the 40 seconds first run of MYSQL
and 56 seconds for MSSQL. MYSQL was done in their query tool (it returns the
rows as well) and MSSQL was done in their query analyzer. All three tools
appear to use a similar approach. Just an FYI: doing an explain analyze of
my problem view took much longer than actually returning the data in MSSQL
and MYSQL. I have done extensive testing with MYSQL (just this table and two
of my problem views). I am not using the transactional version, because I
need the best speed.


I don't know what commands you were issuing, or how you measured,
though. You might be using some other interface (like ODBC), which I
can't say too much about.

John
=:-

This is the Linux box config.
# -
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have 
# to SIGHUP the postmaster for the changes to take effect, or use 
# pg_ctl reload. Some settings, such as listen_address, require
# a postmaster shutdown and restart to take effect.


#---
# FILE LOCATIONS
#---

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.
# data_directory = 'ConfigDir'  # use data in another directory
#data_directory = '/pgdata/data'
# hba_file = 'ConfigDir/pg_hba.conf'# the host-based authentication file
# ident_file = 'ConfigDir/pg_ident.conf'  # the IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
# external_pid_file = '(none)'  # write an extra pid file


#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -

#listen_addresses = 'localhost' # what IP interface(s) to listen on; 
# defaults to localhost, '*' = any

listen_addresses = '*'
port = 5432
max_connections = 100
# note: increasing max_connections costs about 500 

Re: [PERFORM] When are index scans used over seq scans?

2005-04-21 Thread Tom Lane
Richard van den Berg [EMAIL PROTECTED] writes:
 This is with the default cpu_tuple_cost = 0.01:

  Nested Loop  (cost=252.80..233010147.16 rows=1035480320 width=98) (actual time=0.369..12672213.137 rows=6171334 loops=1)
    Join Filter: (("outer".starttimetrunc <= "inner".ts) AND ("outer".finishtimetrunc >= "inner".ts))
    ->  Seq Scan on sessions us  (cost=0.00..26822.36 rows=924536 width=106) (actual time=0.039..5447.349 rows=924536 loops=1)
    ->  Materialize  (cost=252.80..353.60 rows=10080 width=8) (actual time=0.000..2.770 rows=10080 loops=924536)
          ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080 width=8) (actual time=0.019..13.397 rows=10080 loops=1)
  Total runtime: 12674486.670 ms

Hmm, that *is* showing rather a spectacularly large amount of time in
the join itself: if I did the arithmetic right,

regression=# select 12672213.137 - (5447.349 + 2.770*924536 + 13.397);
   ?column?
--------------
 10105787.671
(1 row)

which is almost 80% of the entire runtime.  Which is enormous.
What are those column datatypes exactly?  Perhaps you are incurring a
datatype conversion cost?  Straight timestamp-vs-timestamp comparison
is fairly cheap, but any sort of conversion will cost dearly.

The planner's model for the time spent in the join itself is
(cpu_tuple_cost + 2 * cpu_operator_cost) * n_tuples
(the 2 because you have 2 operators in the join condition)
so you'd have to raise one or the other of these parameters
to model this situation accurately.  But I have a hard time
believing that cpu_tuple_cost is really as high as 0.1.
It seems more likely that the cpu_operator_cost is underestimated,
which leads me to question what exactly is happening in those
comparisons.
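
For concreteness, plugging the plan's own row estimate into that model at
the default settings gives:

    regression=# select (0.01 + 2 * 0.0025) * 1035480320;
       ?column?
    --------------
     15532204.800

i.e. only about 15.5M cost units charged for the join step, far short of
what the observed runtime implies.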

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Mohan, Ross
FWIW, ODBC has variables to tweak, as well. fetch/buffer sizes, and the like. 

Maybe one of the ODBC cognoscenti here can chime in more concretely



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joel Fradkin
Sent: Thursday, April 21, 2005 10:36 AM
To: 'Tom Lane'; 'John A Meinel'
Cc: 'Postgresql Performance'
Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon



I suspect he's using pgadmin.  
Yup I was, but I did try running on the linux box in psql, but it was running 
to the screen and took forever because of that.

The real issue is that returning to my app using ODBC is very slow. (Have not 
tested the ODBC for MYSQL; MSSQL is ok. The two proc Dell is running out of 
steam but has been good until this year, when we about doubled our demand by 
adding Sears as a client.)

Using odbc to postgres on some of the views (Josh from Command is having me do 
some very specific testing) is timing out with a 10 minute time limit. These 
are pages that still respond using MSSQL (this is where production is using 
the dual proc and the test is using the 4 proc).

I have a tool that hooks to all three databases so I can try it with that and 
see if I get different responses.

Joel


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Joel Fradkin

I suspect he's using pgadmin.  
Yup I was, but I did try running on the linux box in psql, but it was
running to the screen and took forever because of that.

The real issue is that returning to my app using ODBC is very slow. (Have not
tested the ODBC for MYSQL; MSSQL is ok. The two proc Dell is running out of
steam but has been good until this year, when we about doubled our demand by
adding Sears as a client.)

Using odbc to postgres on some of the views (Josh from Command is having me
do some very specific testing) is timing out with a 10 minute time limit.
These are pages that still respond using MSSQL (this is where production is
using the dual proc and the test is using the 4 proc).

I have a tool that hooks to all three databases so I can try it with that
and see if I get different responses.

Joel


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

 Is this an IO intensive query?  If running both in parallel results in 
 2x the run time and you have sufficient cpus it would (to me) indicate 
 you don't have enough IO bandwidth to satisfy the query.

any tips on how to verify this?
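
(A couple of common starting points, as a sketch rather than an exhaustive
list: watch the wa and bi/bo columns while both queries run:

    $ vmstat 10        # wa = % CPU time waiting on IO, bi/bo = blocks in/out
    $ iostat -x 10     # per-device utilisation, if sysstat is installed

If wa climbs while us+sy stay low, the disks are the bottleneck.)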


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Joel Fradkin
Here is the connect string I am using.
It could be horrid as I cut it from ODBC program.

Session(StringConn) =
DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0

Joel Fradkin
 

-Original Message-
From: Mohan, Ross [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 21, 2005 9:42 AM
To: [EMAIL PROTECTED]
Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

FWIW, ODBC has variables to tweak, as well. fetch/buffer sizes, and the
like. 

Maybe one of the ODBC cognoscenti here can chime in more concretely






---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Mohan, Ross
Joel, thanks. A couple of things jump out there for
me, not a problem for a routine ODBC connection, but
perhaps in the lotsa stuff context of your current
explorations, it might be relevant?

I am completely shooting from the hip, here, but...if
it were my goose to cook, I'd be investigating

Session(StringConn) = 
DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;

|| Protocol? Is this related to version? Is the driver way old?


FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;
ShowSystemTables=0;ConnSettings=;Fetch=100;

||  Fetch=100 great for OLTP, lousy for batch?


Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;

||  what ARE the datatypes and sizes in your particular case? 

Debug=0;

||  a run with debug=1 probably would spit up something interesting

CommLog=0;Optimizer=1;

||  Optimizer? that's a new one on me

Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0

||  that's about all I can see, prima facie.  I'll be very curious to know
    if ODBC is any part of your performance equation. 


HTH, 

Ross

-Original Message-
From: Joel Fradkin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 21, 2005 10:54 AM
To: Mohan, Ross
Cc: [EMAIL PROTECTED]; PostgreSQL Perform
Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon


Here is the connect string I am using.
It could be horrid as I cut it from ODBC program.

Session(StringConn) = 
DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0

Joel Fradkin
 

-Original Message-
From: Mohan, Ross [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 21, 2005 9:42 AM
To: [EMAIL PROTECTED]
Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

FWIW, ODBC has variables to tweak, as well. fetch/buffer sizes, and the like. 

Maybe one of the ODBC cognoscenti here can chime in more concretely






---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Shoaib Burq (VPAC)

Just tried it with the following changes:

shared_buffers = 10600
work_mem = 102400
enable_seqscan = false

still no improvement

Ok here's the Plan with the enable_seqscan = false:
ausclimate=# explain ANALYZE  select count(*) from getfutureausclimate;

   
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=101069350.74..101069350.74 rows=1 width=0) (actual time=461651.787..461651.787 rows=1 loops=1)
   ->  Subquery Scan getfutureausclimate  (cost=101069350.50..101069350.70 rows=16 width=0) (actual time=426142.382..454571.397 rows=13276368 loops=1)
         ->  Sort  (cost=101069350.50..101069350.54 rows=16 width=58) (actual time=426142.375..28.278 rows=13276368 loops=1)
               Sort Key: Aus40_DEM.AusPosNumber, CurrentAusClimate.iMonth
               ->  Nested Loop  (cost=10001.02..101069350.18 rows=16 width=58) (actual time=72.740..366588.646 rows=13276368 loops=1)
                     ->  Nested Loop  (cost=10001.02..101067308.96 rows=44 width=68) (actual time=35.788..184032.873 rows=13276368 loops=1)
                           ->  Nested Loop  (cost=10001.02..101067043.83 rows=44 width=52) (actual time=35.753..47971.652 rows=13276368 loops=1)
                                 ->  Nested Loop  (cost=10001.02..10012.98 rows=1 width=32) (actual time=7.433..7.446 rows=1 loops=1)
                                       ->  Merge Join  (cost=10001.02..10007.13 rows=1 width=24) (actual time=7.403..7.412 rows=1 loops=1)
                                             Merge Cond: ("outer".ClimateId = "inner".ClimateId)
                                             ->  Index Scan using PK_ClimateVariables on ClimateVariables  (cost=0.00..6.08 rows=7 width=10) (actual time=0.011..0.015 rows=3 loops=1)
                                             ->  Sort  (cost=10001.02..10001.03 rows=1 width=14) (actual time=7.374..7.375 rows=1 loops=1)
                                                   Sort Key: GetFutureClimateParameters.ClimateId
                                                   ->  Seq Scan on GetFutureClimateParameters  (cost=1.00..10001.01 rows=1 width=14) (actual time=7.361..7.362 rows=1 loops=1)
                                       ->  Index Scan using PK_ScenarioEmissionLevels on ScenarioEmissionLevels  (cost=0.00..5.83 rows=1 width=18) (actual time=0.021..0.024 rows=1 loops=1)
                                             Index Cond: ((ScenarioEmissionLevels.ScenarioId = "outer".ScenarioId) AND (ScenarioEmissionLevels.iYear = "outer".iYear) AND (ScenarioEmissionLevels.LevelId = "outer".LevelId))
                                 ->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1)
                                       Index Cond: ("outer".ClimateId = ClimateChangeModel40.ClimateId)
                           ->  Index Scan using PK_Aus40_DEM on Aus40_DEM  (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368)
                                 Index Cond: ("outer".AusPosNumber = Aus40_DEM.AusPosNumber)
                     ->  Index Scan using PK_CurrentAusClimate on CurrentAusClimate  (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368)
                           Index Cond: ((CurrentAusClimate.ClimateId = "outer".ClimateId) AND ("outer".AusPosNumber = CurrentAusClimate.AusPosNum) AND (CurrentAusClimate.iMonth = "outer".iMonth))
 Total runtime: 462218.120 ms
(23 rows)






On Thu, 21 Apr 2005, Russell Smith wrote:

 On Thu, 21 Apr 2005 10:44 pm, Shoaib Burq (VPAC) wrote:
   ->  Nested Loop  (cost=2.19..1069345.29 rows=16 width=58) (actual time=135.390..366902.373 rows=13276368 loops=1)
         ->  Nested Loop  (cost=2.19..1067304.07 rows=44 width=68) (actual time=107.627..186390.137 rows=13276368 loops=1)
               ->  Nested Loop  (cost=2.19..1067038.94 rows=44 width=52) (actual time=87.255..49743.796 rows=13276368 loops=1)

 OUCH, OUCH, OUCH.

 Most if not all of the time is going on nested loop joins.  The tuple
 estimates are off by a factor of 10^6, which means it's choosing the wrong
 join type.

 You could set enable_seqscan to OFF to test what the performance is like
 with a different plan, and then set it back on.

 However you really need to get the row count estimates up to something
 comparable, within a factor of 10 at least.
 A number of the other rows estimates seem to be off by 

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread John A Meinel
Shoaib Burq (VPAC) wrote:
Just tried it with the following changes:
shared_buffers = 10600
work_mem = 102400
enable_seqscan = false
still no improvement
Ok here's the Plan with the enable_seqscan = false:
ausclimate=# explain ANALYZE  select count(*) from getfutureausclimate;

Actually, you probably don't want enable_seqscan=off, you should try:
SET enable_nestloop TO off.
The problem is that it is estimating there will only be 44 rows, but in
reality there are 13M rows. It almost definitely should be doing a
seqscan with a sort and merge join.
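
(That is, something like:

    SET enable_nestloop TO off;
    EXPLAIN ANALYZE SELECT count(*) FROM getfutureausclimate;
    SET enable_nestloop TO on;   -- restore afterwards
)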
Also, please attach your explain analyzes; the wrapping is really hard to
read.
I don't understand how postgres could get the number of rows that wrong.
It seems to be misestimating the number of entries in IX_ClimateId
Here:
->  Index Scan using PK_Aus40_DEM on Aus40_DEM  (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368)
      Index Cond: ("outer".AusPosNumber = Aus40_DEM.AusPosNumber)
->  Index Scan using PK_CurrentAusClimate on CurrentAusClimate  (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368)
The first index scan is costing you 0.006*13276368=79s, and the second one is 
119s.
I can't figure out exactly what is where from the formatting, but the query 
that seems misestimated is:
->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1)
      Index Cond: ("outer".ClimateId = ClimateChangeModel40.ClimateId)
Is there an unexpected correlation between
ClimateChangeModel40.ClimateId and whatever "outer" is at this point?
John
=:-




Re: [PERFORM] When are index scans used over seq scans?

2005-04-21 Thread Richard van den Berg
John A Meinel wrote:
 You might try doing:
 ALTER TABLE us ALTER COLUMN starttimetrunc SET STATISTICS 200;
 ALTER TABLE us ALTER COLUMN finishtimetrunc SET STATISTICS 200;
 VACUUM ANALYZE us;

I've been looking into that. While increasing the statistics makes the
planner use the index for simple selects, it still does not for joins.

Another thing that threw me off is that after a vacuum analyze a
select * from us where 'x' between start and finish uses seq scans,
while after just an analyze it uses the index! I thought both
statements were supposed to update the statistics in the same way? (This
is with 7.4.7.)

 You have 2 tables, a duration, and a from-to table, right? How many
 rows in each? 

Duration: 10k
Sessions: 1M

 Anyway, you can play around with it by using stuff like:
 SET enable_seqscan TO off;

This doesn't help much. Instead of turning seqscans off, this setting
increases their cost by 100M. Since my query already has a cost of about
400M-800M, this doesn't matter much.

For now, the only reliable way of forcing the use of the index is to set
cpu_tuple_cost = 1.
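
(Session-level, for testing only:

    SET cpu_tuple_cost = 1;
    EXPLAIN ANALYZE SELECT ...;   -- the join in question
    RESET cpu_tuple_cost;
)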

-- 
Richard van den Berg, CISSP
---
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a|  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands|  Fax  : +31 70 3603009
---


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] When are index scans used over seq scans?

2005-04-21 Thread Richard van den Berg
Tom Lane wrote:
 which is almost 80% of the entire runtime.  Which is enormous.
 What are those column datatypes exactly? 

         Table "richard.sessions"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
[unrelated columns removed]
 starttimetrunc  | timestamp without time zone |
 finishtimetrunc | timestamp without time zone |
Indexes:
rb_us_st_ft_idx btree (starttimetrunc, finishtimetrunc)
rb_us_st_ft_idx2 btree (finishtimetrunc, starttimetrunc)
Check constraints:
date_check CHECK (finishtimetrunc >= starttimetrunc)

      Table "richard.duration"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 ts     | timestamp without time zone |

 Perhaps you are incurring a datatype conversion cost? 

Not that I can tell.

 It seems more likely that the cpu_operator_cost is underestimated,

As you predicted, increasing cpu_operator_cost from 0.0025 to 0.025 also
causes the planner to use the index on duration.

 which leads me to question what exactly is happening in those
 comparisons.

Your guess is as good as mine (actually, yours is much better). I can
put together a reproducible test case if you like.

-- 
Richard van den Berg, CISSP
---
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a|  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands|  Fax  : +31 70 3603009
---
   Have you visited our new DNA Portal?
---

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] When are index scans used over seq scans?

2005-04-21 Thread Tom Lane
Richard van den Berg [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Perhaps you are incurring a datatype conversion cost? 

 Not that I can tell.

No, apparently not.  Hmm ... timestamp_cmp_internal is just a couple of
isnan() checks and one or two floating-point compares.  Should be pretty
dang cheap.  Unless isnan() is ridiculously expensive on your hardware?
More likely there is some bottleneck that we are not thinking of.

Are the tables in question particularly wide (many columns)?

 which leads me to question what exactly is happening in those
 comparisons.

 Your guess is as good as mine (actually, yours is much better). I can
 put together a reproducible test case if you like.

I'm thinking it would be interesting to look at a gprof profile of the
nestloop case.  If you can rebuild with profiling and get one, that
would be fine, or you can make up a test case that shows the same slow
joining behavior.
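
(If memory serves -- treat this as a sketch, not gospel -- a profiled build
can be made with the PROFILE make variable; each backend then writes a
gmon.out into its working directory, normally $PGDATA:

    $ make PROFILE="-pg" clean all install
    $ # restart the server, run the slow join once, then:
    $ gprof /usr/local/pgsql/bin/postgres $PGDATA/gmon.out > profile.txt
)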

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler
All,

Running PostgreSQL 7.4.2, Solaris.

Client is reporting that the size of an index is
greater than the number of rows in the table (1.9
million vs. 1.5 million).  Index was automatically
created from a 'bigserial unique' column.

Database contains several tables with exactly the same
columns (including 'bigserial unique' column).  This
is the only table where this index is out of line with
the actual # of rows.  

Queries on this table take 40 seconds to retrieve 2000
rows as opposed to 1-2 seconds on the other tables.

We have been running 'VACUUM ANALYZE' very regularly. 
In fact, our vacuum schedule has probably been
overkill.  We have been running on a per-table basis
after every update (many per day, only inserts
occurring) and after every purge (one per day,
deleting a day's worth of data).  

It is theoretically possible that at some time a
process was run that deleted all rows in the table
followed by a VACUUM FULL.  In this case we would have
dropped/recreated our own indexes on the table but not
the index automatically created for the bigserial
column.  If that happened, could that cause these
symptoms?

What about if an out-of-the-ordinary number of rows
were deleted (say 75% of rows in the table, as opposed
to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
things get out of whack because of that situation?

thanks,

Bill

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Bill,

 What about if an out-of-the-ordinary number of rows
 were deleted (say 75% of rows in the table, as opposed
 to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
 things get out of whack because of that situation?

Yes.  You'd want to run REINDEX after an event like that.  As you should now.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] How can an index be larger than a table

2005-04-21 Thread Josh Berkus
David,

 What also seems weird to me is that the control table has some unique
 indexes created on it, but the data_upate_events table just has a unique
 constraint.  Will postgres use an index in the background to enforce
 this constraint?

If you somehow have a unique constraint without a unique index, something is 
seriously broken.   I suspect hacking of system tables.

Otherwise, it sounds like you have index bloat due to mass deletions.  Run 
REINDEX, or, preferably, VACUUM FULL and then REINDEX.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler

--- Josh Berkus josh@agliodbs.com wrote:
 Bill,
 
  What about if an out-of-the-ordinary number of
 rows
  were deleted (say 75% of rows in the table, as
 opposed
  to normal 5%) followed by a 'VACUUM ANALYZE'?
  Could
  things get out of whack because of that situation?
 
 Yes.  You'd want to run REINDEX after an event like
 that.  As you should now.
 
 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 

Thank you.  Though I must say, that is very
discouraging.  REINDEX is a costly operation, timewise
and due to the fact that it locks out other processes
from proceeding.  Updates are constantly coming in and
queries are occurring continuously.  A REINDEX could
potentially bring the whole thing to a halt.

Honestly, this seems like an inordinate amount of
babysitting for a production application.  I'm not
sure if the client will be willing to accept it.  

Admittedly my knowledge of the inner workings of an
RDBMS is limited, but could somebody explain to me why
this would be so?  If you delete a bunch of rows why
doesn't the index get updated at the same time?  Is
this a common issue among all RDBMSs or is it
something that is PostgreSQL specific?  Is there any
way around it?

thanks,

Bill

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Bill,

 Honestly, this seems like an inordinate amount of
 babysitting for a production application.  I'm not
 sure if the client will be willing to accept it.

Well, then, tell them not to delete 75% of the rows in a table at once.  I 
imagine that operation brought processing to a halt, too.

If the client isn't willing to accept the consequences of their own bad data 
management, I'm not really sure what you expect us to do about it.

 Admittedly my knowledge of the inner workings of an
 RDBMS is limited, but could somebody explain to me why
 this would be so?  If you delete a bunch of rows why
 doesn't the index get updated at the same time?  

It does get updated.  What doesn't happen is the space getting reclaimed.  In 
a *normal* data situation, the dead nodes are recycled for new rows.   But 
doing a massive delete operation upsets that, and generally needs to be 
followed by a REINDEX.

 Is 
 this a common issue among all RDBMSs or is it
 something that is PostgreSQL specific?  

Speaking from experience, this sort of thing affects MSSQL as well, although 
the maintenance routines are different.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Alex Turner
Is:

REINDEX DATABASE blah

supposed to rebuild all indices in the database, or must you specify
each table individually? (I'm asking because I just tried it and it
only did system tables.)

Alex Turner
netEconomist

On 4/21/05, Josh Berkus josh@agliodbs.com wrote:
 Bill,
 
  What about if an out-of-the-ordinary number of rows
  were deleted (say 75% of rows in the table, as opposed
  to normal 5%) followed by a 'VACUUM ANALYZE'? Could
  things get out of whack because of that situation?
 
 Yes.  You'd want to run REINDEX after an event like that.  As you should now.
 
 --
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Alex,

 REINDEX DATABASE blah

 supposed to rebuild all indices in the database, or must you specify
 each table individualy? (I'm asking because I just tried it and it
 only did system tables)

DATABASE

 Recreate all system indexes of a specified database. Indexes on user tables 
are not processed. Also, indexes on shared system catalogs are skipped except 
in stand-alone mode (see below). 

http://www.postgresql.org/docs/8.0/static/sql-reindex.html
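
To cover the user tables you have to go table by table; one way is to
generate the commands from the catalog (a sketch -- run the statements
it prints):

  SELECT 'REINDEX TABLE ' || schemaname || '.' || tablename || ';'
  FROM pg_tables
  WHERE schemaname NOT IN ('pg_catalog', 'information_schema');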

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Chris Browne
josh@agliodbs.com (Josh Berkus) writes:
 Bill,

 What about if an out-of-the-ordinary number of rows
 were deleted (say 75% of rows in the table, as opposed
 to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
 things get out of whack because of that situation?

 Yes.  You'd want to run REINDEX after an event like that.  As you should now.

Based on Tom's recent comments, I'd be inclined to handle this via
doing a CLUSTER, which has the triple heroism effect of:

 a) Reorganizing the entire table to conform with the relevant index order,
 b) Having the effect of VACUUM FULL, and
 c) Having the effect of REINDEX

all in one command.
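
A sketch, with hypothetical table and index names:

  -- 8.0 syntax: CLUSTER indexname ON tablename
  CLUSTER big_table_pkey ON big_table;
  ANALYZE big_table;  -- CLUSTER does not refresh the planner's column stats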

It has all of the "oops, that blocked me for 20 minutes" effect of
REINDEX and VACUUM FULL, but at least it doesn't have the effect
twice...
-- 
(format nil [EMAIL PROTECTED] cbbrowne acm.org)
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  I will not tell my Legions of Terror
``And he must be taken alive!'' The command will be:  ``And try to take
him alive if it is reasonably practical.''
http://www.eviloverlord.com/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Michael Guerin

 Is this a common issue among all RDBMSs or is it
 something that is PostgreSQL specific?

 Speaking from experience, this sort of thing affects MSSQL as well,
 although the maintenance routines are different.

Yes, this is true with MSSQL too; however, SQL Server implements an index
defrag that doesn't lock up the table:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_dbcc_30o9.asp
DBCC INDEXDEFRAG can defragment clustered and nonclustered indexes on 
tables and views. DBCC INDEXDEFRAG defragments the leaf level of an 
index so that the physical order of the pages matches the left-to-right 
logical order of the leaf nodes, thus improving index-scanning performance.

Every five minutes, DBCC INDEXDEFRAG will report to the user an 
estimated percentage completed. DBCC INDEXDEFRAG can be terminated at 
any point in the process, and *any completed work is retained.*
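
The invocation looks like this (T-SQL; database, table, and index names
are placeholders):

  -- SQL Server 2000: defragment one index without a long table lock
  DBCC INDEXDEFRAG (mydb, mytable, myindex);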

-michael
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Michael,

 Every five minutes, DBCC INDEXDEFRAG will report to the user an
 estimated percentage completed. DBCC INDEXDEFRAG can be terminated at
 any point in the process, and *any completed work is retained.*

Keen.  Sounds like something for our TODO list.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-21 Thread Richard Plotkin
More info on what is bloating:
It's only in one database (the one that's most used), and after running 
oid2name on the bloated files, the result is (mysteriously) empty.  
Here's the run on the three enormous files:

$ /usr/local/bin/oid2name -d smt -o 160779
From database smt:
  Filenode  Table Name
--
$ /usr/local/bin/oid2name -d smt -o 65782869
From database smt:
  Filenode  Table Name
--
$ /usr/local/bin/oid2name -d smt -o 83345634
From database smt:
  Filenode  Table Name
--
The file list looks like this (with normal-sized files mostly removed):

1.0G    ./106779
1.0G    ./106779.1
1.0G    ./106779.2
1.0G    ./106779.3
978M    ./106779.4
1.0G    ./65782869
248M    ./65782869.1
  0B    ./65782871
8.0K    ./65782873
780M    ./83345634
  0B    ./83345636
8.0K    ./83345638
So does the empty result mean it's a temporary table?  There is one 
temporary table (in the function previously mentioned) that does get 
created and dropped with some regularity.

Thanks again,
Richard
On Apr 20, 2005, at 2:06 PM, Richard Plotkin wrote:
Hi Tom,
Q: what have you got the FSM parameters set to?
Here's from postgresql.conf -- FSM at default settings.
# - Memory -
shared_buffers = 30400          # min 16, at least max_connections*2, 8KB each
work_mem = 32168                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

# - Free Space Map -
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

# - Kernel Resource Usage -
max_files_per_process = 750 #1000       # min 25
#preload_libraries = ''

Q: what exactly is bloating?  Without knowing which tables or indexes
are growing, it's hard to speculate about the exact causes.  Use du and
oid2name, or look at pg_class.relpages after a plain VACUUM.
This I do not know.  I've disabled the cron jobs and will let the 
system bloat, then I will gather statistics (I'll give it 12-24 
hours).
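
The pg_class check Tom suggests can be as simple as (a sketch):

  -- after a plain VACUUM: the largest relations by on-disk pages (8 kB each)
  SELECT relname, relkind, relpages
  FROM pg_class
  ORDER BY relpages DESC
  LIMIT 10;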

It's likely that the real answer is you need to vacuum more often
than every six hours, but I'm trying not to jump to conclusions.
That could be it, except that I would expect the problem to then look 
more like a gradual increase in CPU usage and a gradual increase in 
use of disk space.  Mine could be an invalid assumption, but the 
system here looks like it goes from no problem to 100% problem within 
a minute.

Thanks again!
Richard
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


[PERFORM] index not used

2005-04-21 Thread Enrico Weigelt

Hi folks,


I'm doing a simple lookup in a small table by a unique id, and I'm
wondering why EXPLAIN tells me a seqscan is used instead of the index.

The table looks like:

id  bigint  primary key,
a   varchar,
b   varchar,
c   varchar

and I'm querying: select * from foo where id = 2;

I've got only 15 records in this table, but I want it to be as
fast as possible since it's used (as a map between IDs and names)
for larger queries.


thx
-- 
-
 Enrico Weigelt==   metux IT service

  phone: +49 36207 519931 www:   http://www.metux.de/
  fax:   +49 36207 519932 email: [EMAIL PROTECTED]
  cellphone: +49 174 7066481
-
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
-

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler
--- [EMAIL PROTECTED] wrote:
 I gather you mean, out-of-the-ordinary for most
 apps, but not for this client?

Actually, no.  The normal activity is to delete 3-5%
of the rows per day, followed by a VACUUM ANALYZE. 
Then over the course of the day (in multiple
transactions) about the same amount are INSERTed (each
transaction followed by a VACUUM ANALYZE on just the
updated table).  So 75% deletion is just out of the
ordinary for this app.  However, on occasion, deleting
75% of rows is a legitimate action for the client to
take.  It would be nice if they didn't have to
remember to do things like REINDEX or CLUSTER or
whatever on just those occasions.
 
 In case nobody else has asked: is your max_fsm_pages
 big enough to handle all
 the deleted pages, across ALL tables hit by the
 purge? If not, you're
 haemorrhaging pages, and VACUUM is probably warning
 you about exactly that.

This parameter is most likely set incorrectly.  So
that could be causing problems.  Could that be a
culprit for the index bloat, though?

 If that's not a problem, you might want to consider
 partitioning the data.  Take a look at inherited
 tables.  For me, they're a good approximation of
 clustered indexes (sigh, miss 'em) and equivalent to
 table spaces.

 My app is in a similar boat to yours: up to 1/3 of a
 10M-row table goes away every day.  For each of the
 child tables that is a candidate to be dropped, there
 is a big prologue txn, which moves (INSERT then
 DELETE) the good rows into a child table that is NOT
 to be dropped.  Then BANG, pull the plug on the tables
 you don't want.  MUCH faster than DELETE: the dropped
 tables' files' disk space goes away in one shot, too.

 Just my 2c.
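
The pattern described in the quote, as a sketch (all table and column
names hypothetical):

  -- move the keepers into a surviving child, then drop the doomed child whole
  CREATE TABLE events_keep () INHERITS (events);
  BEGIN;
  INSERT INTO events_keep SELECT * FROM events_old WHERE keep_flag;
  DROP TABLE events_old;  -- its disk space goes away in one shot
  COMMIT;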

Thanks.

Bill 

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Alex Turner
Same thing happens in Oracle

ALTER INDEX blah rebuild

To force a rebuild.  It will mark the free blocks as 'free' below the
PCTFREE value for the tablespace.

Basically, if you build an index with 10,000 entries, and each entry is
1/4 of a block, the database will write 2500 blocks to the disk.  If
you delete a random 75% of the index values, you will now have 2500
blocks that have 75% free space.  The database will reuse that free
space in those blocks as you insert new values, but until then, you
still have 2500 blocks worth of data on a disk, that is only 25% full.
 Rebuilding the index forces the system to physically re-allocate all
that data space, and now you have just 2500 entries, that use 625
blocks.

I'm not sure that 'blocks' is the correct term in postgres, it's
segments in Oracle, but the concept remains the same.

Alex Turner
netEconomist

On 4/21/05, Bill Chandler [EMAIL PROTECTED] wrote:
 
 --- Josh Berkus josh@agliodbs.com wrote:
  Bill,
 
   What about if an out-of-the-ordinary number of rows
   were deleted (say 75% of rows in the table, as opposed
   to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
   things get out of whack because of that situation?
 
  Yes.  You'd want to run REINDEX after an event like
  that.  As you should now.
 
  --
  Josh Berkus
  Aglio Database Solutions
  San Francisco
 
 
 Thank you.  Though I must say, that is very
 discouraging.  REINDEX is a costly operation, both in
 time and because it locks out other processes
 from proceeding.  Updates are constantly coming in and
 queries are occurring continuously.  A REINDEX could
 potentially bring the whole thing to a halt.
 
 Honestly, this seems like an inordinate amount of
 babysitting for a production application.  I'm not
 sure if the client will be willing to accept it.
 
 Admittedly my knowledge of the inner workings of an
 RDBMS is limited, but could somebody explain to me why
 this would be so?  If you delete a bunch of rows why
 doesn't the index get updated at the same time?  Is
 this a common issue among all RDBMSs or is it
 something that is PostgreSQL specific?  Is there any
 way around it?
 
 thanks,
 
 Bill
 
 __
 Do You Yahoo!?
 Tired of spam?  Yahoo! Mail has the best spam protection around
 http://mail.yahoo.com
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] immutable functions vs. join for lookups ?

2005-04-21 Thread Enrico Weigelt
* Jaime Casanova [EMAIL PROTECTED] wrote:

snip
 Even if your data never changes it *can* change so the function should
 be at most stable not immutable.

okay, the planner sees that the table could potentially change.
but - as the dba - I'd like to tell him that this table *never* changes
in practice (or at most there will be an insert once a year).

isn't there any way to force the function to be treated as really immutable?
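
For reference, the difference is a single keyword in the declaration (a
sketch; function and table names assumed):

  -- STABLE is the honest label for a lookup against a table
  CREATE FUNCTION id2name(bigint) RETURNS varchar AS
    'SELECT a FROM foo WHERE id = $1' LANGUAGE sql STABLE;
  -- declaring it IMMUTABLE instead would let the planner fold calls to
  -- constants at plan time, which silently returns stale results if the
  -- table ever does change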


cu
-- 
-
 Enrico Weigelt==   metux IT service

  phone: +49 36207 519931 www:   http://www.metux.de/
  fax:   +49 36207 519932 email: [EMAIL PROTECTED]
  cellphone: +49 174 7066481
-
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
-

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] immutable functions vs. join for lookups ?

2005-04-21 Thread Enrico Weigelt
* Tom Lane [EMAIL PROTECTED] wrote:

snip
 Yeah, I was actually thinking about a two-step process: inline the
 function to produce somethig equivalent to a handwritten scalar
 sub-SELECT, and then try to convert sub-SELECTs into joins.

... back to my original question ... 

What kind of query should I use?
Is a join better than a function?


cu
-- 
-
 Enrico Weigelt==   metux IT service

  phone: +49 36207 519931 www:   http://www.metux.de/
  fax:   +49 36207 519932 email: [EMAIL PROTECTED]
  cellphone: +49 174 7066481
-
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
-

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Dave Chapeskie
On Thu, Apr 21, 2005 at 11:28:43AM -0700, Josh Berkus wrote:
 Michael,
 
  Every five minutes, DBCC INDEXDEFRAG will report to the user an
  estimated percentage completed. DBCC INDEXDEFRAG can be terminated at
  any point in the process, and *any completed work is retained.*
 
 Keen.  Sounds like something for our TODO list.
 
 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco

See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php
for my thoughts on a non-blocking alternative to REINDEX.  I got no
replies to that message. :-(


I've almost got a working solution integrated in the backend that does
correct WAL logging and everything.  (Writing the code to write and
replay WAL logs for complicated operations can be very annoying!)

For now I've gone with a syntax of:

  REINDEX INDEX btree_index_name INCREMENTAL;

(For now it's not a proper index AM (access method); instead the
generic index code knows this is only supported for btrees and directly
calls the btree_compress function.)

It's not actually a REINDEX per-se in that it doesn't rebuild the whole
index.  It holds brief exclusive locks on the index while it shuffles
items around to pack the leaf pages fuller.  There were issues with the
code I attached to the above message that have been resolved with the
new code.  With respect to the numbers provided in that e-mail the new
code also recycles more pages than before.

Once I've finished it up I'll prepare and post a patch.

-- 
Dave Chapeskie
OpenPGP Key ID: 0x3D2B6B34

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] index not used

2005-04-21 Thread Litao Wu
If id is the PK, the query should return 1 row only...
--- Enrico Weigelt [EMAIL PROTECTED] wrote:
 
 Hi folks,
 
 
 I'm doing a simple lookup in a small table by a unique id, and I'm
 wondering why EXPLAIN tells me a seqscan is used instead of the index.
 
 The table looks like:
 
 id  bigint  primary key,
 a   varchar,
 b   varchar,
 c   varchar
 
 and I'm querying: select * from foo where id = 2;
 
 I've got only 15 records in this table, but I want it to be as
 fast as possible since it's used (as a map between IDs and names)
 for larger queries.
 
 
 thx
 -- 
 -
  Enrico Weigelt==   metux IT service
 
   phone: +49 36207 519931 www:   http://www.metux.de/
   fax:   +49 36207 519932 email: [EMAIL PROTECTED]
   cellphone: +49 174 7066481
 -
  -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
 -
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend
 



__ 
Do you Yahoo!? 
Yahoo! Small Business - Try our new resources site!
http://smallbusiness.yahoo.com/resources/ 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-21 Thread Alvaro Herrera
On Thu, Apr 21, 2005 at 11:38:22AM -0700, Richard Plotkin wrote:
 More info on what is bloating:
 
 It's only in one database (the one that's most used), and after running 
 oid2name on the bloated files, the result is (mysteriously) empty.  
 Here's the run on the three enormous files:
 
 $ /usr/local/bin/oid2name -d smt -o 160779
 From database smt:
   Filenode  Table Name
 --

Try -f instead of -o ...

-- 
Alvaro Herrera ([EMAIL PROTECTED])
World domination is proceeding according to plan(Andrew Morton)

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-21 Thread Richard Plotkin
That returned the same result.  I also tried oid2name -d smt -x -i -S 
and, separately -s, and also separately, -d with all other databases, 
and none of the databases turned up any listing, in either oid or 
filenode, for any of these three bloated files.  One thing I've noticed 
is that these oids are all extremely large numbers, whereas the rest of 
the oids in /data/base/* are no higher than 4 or 5.

On Apr 21, 2005, at 1:46 PM, Alvaro Herrera wrote:
On Thu, Apr 21, 2005 at 11:38:22AM -0700, Richard Plotkin wrote:
More info on what is bloating:
It's only in one database (the one that's most used), and after 
running
oid2name on the bloated files, the result is (mysteriously) empty.
Here's the run on the three enormous files:

$ /usr/local/bin/oid2name -d smt -o 160779
From database smt:
  Filenode  Table Name
--
Try -f instead of -o ...
--
Alvaro Herrera ([EMAIL PROTECTED])
World domination is proceeding according to plan(Andrew 
Morton)

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] index not used

2005-04-21 Thread Stephan Szabo
On Thu, 21 Apr 2005, Enrico Weigelt wrote:

 I'm doing a simple lookup in a small table by a unique id, and I'm
 wondering why EXPLAIN tells me a seqscan is used instead of the index.

 The table looks like:

 id  bigint  primary key,
 a   varchar,
 b   varchar,
 c   varchar

 and I'm querying: select * from foo where id = 2;

 I've got only 15 records in this table, but I want it to be as
 fast as possible since it's used (as a map between IDs and names)
 for larger queries.

Two general things:
 For 15 records, an index scan may not be faster.  For simple tests
  you can play with enable_seqscan to see, but for more complicated
  queries it's a little harder to tell.
 If you're using a version earlier than 8.0, you'll need to quote
  or cast the value you're searching for due to problems with
  cross-type comparisons (the 2 would be treated as int4).
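
A sketch of both workarounds:

  -- pre-8.0: quote or cast so the constant matches the int8 column
  SELECT * FROM foo WHERE id = '2';
  SELECT * FROM foo WHERE id = 2::bigint;

  -- for testing only: check whether the index scan is even a win
  SET enable_seqscan = off;
  EXPLAIN SELECT * FROM foo WHERE id = 2::bigint;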

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Mischa Sandberg
Quoting Bill Chandler [EMAIL PROTECTED]:

 ... The normal activity is to delete 3-5% of the rows per day,
 followed by a VACUUM ANALYZE. 
...
 However, on occasion, deleting 75% of rows is a 
 legitimate action for the client to take.  

  In case nobody else has asked: is your max_fsm_pages
  big enough to handle all the deleted pages, 
  across ALL tables hit by the purge?

 This parameter is most likely set incorrectly.  So
 that could be causing problems.  Could that be a
 culprit for the index bloat, though?

Look at the last few lines of vacuum verbose output.
It will say something like:

free space map: 55 relations, 88416 pages stored; 89184 total pages needed
  Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared memory.

1000000 here is max_fsm_pages from my postgresql.conf.
If the total pages needed is bigger than the pages
fsm is allocated for, then you are bleeding.
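
If it is bigger, the fix is to size the FSM up in postgresql.conf (a
sketch; take the numbers from your own VACUUM VERBOSE output, with some
headroom):

  # changing these requires a postmaster restart
  max_fsm_pages = 200000      # at least the "total pages needed" figure
  max_fsm_relations = 1000    # roughly: number of tables plus indexes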
-- 
Dreams come true, not free. -- S.Sondheim, ITW


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler
Mischa,

Thanks.  Yes, I understand that not having a large
enough max_fsm_pages is a problem and I think that it
is most likely the case for the client.  What I wasn't
sure of was if the index bloat we're seeing is the
result of the bleeding you're talking about or
something else.

If I deleted 75% of the rows but had a max_fsm_pages
setting that still exceeded the pages required (as
indicated in VACUUM output), would that solve my
indexing problem or would I still need to REINDEX
after such a purge?

regards,

Bill

--- Mischa Sandberg [EMAIL PROTECTED] wrote:
 Quoting Bill Chandler [EMAIL PROTECTED]:
 
  ... The normal activity is to delete 3-5% of the rows per day,
  followed by a VACUUM ANALYZE.
 ...
  However, on occasion, deleting 75% of rows is a
  legitimate action for the client to take.

   In case nobody else has asked: is your max_fsm_pages
   big enough to handle all the deleted pages,
   across ALL tables hit by the purge?
 
  This parameter is most likely set incorrectly.  So
  that could be causing problems.  Could that be a
  culprit for the index bloat, though?
 
 Look at the last few lines of vacuum verbose output.
 It will say something like:

 free space map: 55 relations, 88416 pages stored; 89184 total pages needed
   Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared memory.

 1000000 here is max_fsm_pages from my postgresql.conf.
 If the total pages needed is bigger than the pages
 fsm is allocated for, then you are bleeding.
 -- 
 Dreams come true, not free. -- S.Sondheim, ITW
 
 

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Bill,

 If I deleted 75% of the rows but had a max_fsm_pages
 setting that still exceeded the pages required (as
 indicated in VACUUM output), would that solve my
 indexing problem or would I still need to REINDEX
 after such a purge?

Depends on the performance you're expecting.  The FSM relates to the re-use
of nodes, not to giving back free space.  So after you've deleted 75% of rows,
the index wouldn't shrink.  It just wouldn't grow when you start adding rows.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread John A Meinel
Bill Chandler wrote:
Mischa,
Thanks.  Yes, I understand that not having a large
enough max_fsm_pages is a problem and I think that it
is most likely the case for the client.  What I wasn't
sure of was if the index bloat we're seeing is the
result of the bleeding you're talking about or
something else.
If I deleted 75% of the rows but had a max_fsm_pages
setting that still exceeded the pages required (as
indicated in VACUUM output), would that solve my
indexing problem or would I still need to REINDEX
after such a purge?
regards,
Bill

I don't believe VACUUM re-packs indexes. It just removes empty index
pages. So if you have 1000 index pages all with 1 entry in them, vacuum
cannot reclaim any pages. REINDEX re-packs the pages to 90% full.
fsm just needs to hold enough pages that all requests have free space
that can be used before your next vacuum. It is just a map letting
postgres know where space is available for a new fill.
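
An easy way to watch this is one index's page count before and after
(index name hypothetical; relpages is refreshed by VACUUM and by REINDEX):

  SELECT relname, relpages FROM pg_class WHERE relname = 'big_table_pkey';
  REINDEX INDEX big_table_pkey;
  SELECT relname, relpages FROM pg_class WHERE relname = 'big_table_pkey';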
John
=:-


[PERFORM] foreign key performance

2005-04-21 Thread Enrico Weigelt

Hi folks,


do foreign keys have any influence on performance (besides slowing
down huge inserts)? do they bring any performance improvement?


thx
-- 
-
 Enrico Weigelt==   metux IT service

  phone: +49 36207 519931 www:   http://www.metux.de/
  fax:   +49 36207 519932 email: [EMAIL PROTECTED]
  cellphone: +49 174 7066481
-
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
-

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Gavin Sherry
On Fri, 22 Apr 2005, Shoaib Burq (VPAC) wrote:

 Please see attached the output from explain analyse. This is with the

   shared_buffers = 10600
   work_mem = 102400
   enable_seqscan = true

 BTW I guess I should mention that I am doing the select count(*) on a View.

 Ran the Explain analyse with the nestedloop disabled but it was taking
 forever... and killed it after 30mins.

Try increasing stats collection on ClimateChangeModel40.ClimateId:

alter table ClimateChangeModel40 alter column ClimateId set statistics 1000;
analyze ClimateChangeModel40;

Gavin

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match