Re: [PERFORM] Delete query takes exorbitant amount of time

2005-03-27 Thread Stephan Szabo

On Sat, 26 Mar 2005, Karim Nassar wrote:

 On Sat, 2005-03-26 at 15:18 -0800, Stephan Szabo wrote:
  On Sat, 26 Mar 2005, Karim Nassar wrote:
 
   On Sat, 2005-03-26 at 07:55 -0800, Stephan Szabo wrote:
That seems like it should be okay, hmm, what does something like:
   
PREPARE test(int) AS SELECT 1 from measurement where
id_int_sensor_meas_type = $1 FOR UPDATE;
EXPLAIN ANALYZE EXECUTE TEST(1);
   
give you as the plan?
  
                                 QUERY PLAN
 ------------------------------------------------------------------------------
  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual
  time=11608.402..11608.402 rows=0 loops=1)
    Filter: (id_int_sensor_meas_type = $1)
  Total runtime: 11608.441 ms
 (3 rows)
 
  Hmm, has measurement been analyzed recently?  You might want to see if
  raising the statistics target on measurement.id_int_sensor_meas_type and
  reanalyzing changes the estimated rows down from 500k.

 orfs=# ALTER TABLE measurement ALTER COLUMN id_int_sensor_meas_type SET 
 STATISTICS 1000;
 ALTER TABLE
 orfs=# VACUUM FULL ANALYZE VERBOSE;
 snip
 INFO:  free space map: 52 relations, 13501 pages stored; 9760 total pages 
 needed
 DETAIL:  Allocated FSM size: 1000 relations + 30 pages = 1864 kB shared 
 memory.
 VACUUM
 orfs=# PREPARE test(int) AS SELECT 1 from measurement where
 orfs-# id_int_sensor_meas_type = $1 FOR UPDATE;
 PREPARE
 orfs=# EXPLAIN ANALYZE EXECUTE TEST(1);
  QUERY PLAN
 ------------------------------------------------------------------------------
  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual 
 time=8948.452..8948.452 rows=0 loops=1)
Filter: (id_int_sensor_meas_type = $1)
  Total runtime: 8948.494 ms
 (3 rows)

 orfs=# EXPLAIN ANALYZE EXECUTE TEST(1);
  QUERY PLAN
 ------------------------------------------------------------------------------
  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual 
 time=3956.616..3956.616 rows=0 loops=1)
Filter: (id_int_sensor_meas_type = $1)
  Total runtime: 3956.662 ms
 (3 rows)



 Some improvement. Even better once it's cached. The row estimate didn't
 change. Is this the best I can expect? Are there any other optimizations
 I am missing?

I'm not sure, really. Running a seq scan for each removed row in the
referenced table doesn't seem like a particularly good plan in general
though, especially if the average number of rows being referenced isn't
on the order of 500k per value. I don't know what to look at next though.
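(One diagnostic worth trying, sketched here as a guess rather than a
recommendation: compare the generic $1 plan above against a plan built for a
literal value, since the prepared FK plan has to be chosen without knowing the
actual key:

    EXPLAIN ANALYZE
    SELECT 1 FROM measurement
     WHERE id_int_sensor_meas_type = 1 FOR UPDATE;

If the literal form switches to an index scan while the $1 form stays a seq
scan, the generic row estimate for the parameter is what's driving the plan
choice.)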


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] How to improve db performance with $7K?

2005-03-27 Thread Alexander Kirpa
1. For the empty PCI-X slot, buy a single- or dual-channel SCSI-320
hardware RAID controller, like the MegaRAID SCSI 320-2X
(don't forget to check driver support for your OS),
plus battery backup,
plus (optionally) expand its RAM to the 256MB maximum - approx $1K.
2. Buy new MAXTOR drives - Atlas 15K II (4x36.7GB) - approx 4x$400.
3. SCSI-320 cable set.
4. Use the old drives (2) for OS (and optionally DB log) files in RAID1 mode,
possibly over one channel of the MegaRAID.
5. Put the new drives (4+) in RAID10 mode for the DB.
6. Start tuning Postgres + OS: more shared RAM etc. - see the rough sketch below.
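
As a rough sketch of what point 6 can mean on the Postgres side (illustrative
8.0-era values only - the right numbers depend on your RAM and workload, so
treat them as assumptions, not a recommendation):

    # postgresql.conf - illustrative starting points for a dedicated DB box
    shared_buffers = 20000           # ~160MB of shared buffer cache
    effective_cache_size = 100000    # ~800MB; tells the planner about OS cache
    checkpoint_segments = 16         # spread out checkpoint I/O
    maintenance_work_mem = 65536     # 64MB for VACUUM and index builds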

Best regards,
 Alexander Kirpa

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] [BUGS] BUG #1552: massive performance hit between 7.4

2005-03-27 Thread Karim Nassar
On Fri, 2005-03-25 at 10:18 +, Simon Riggs wrote:
  When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine 
  with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were 
  able to insert all this data in 5-7 minutes.  It's taken a while to 
  install Postgres 8.0.1 on the same machine, but now I have, and it's 
  taking 40-45 minutes to run the same insert script.  

snip

 OK. Not-a-bug.

 Your situation is covered in the manual with some sage advice
 http://www.postgresql.org/docs/8.0/static/populate.html
 It doesn't go into great detail about all the reasons why those
 recommendations are good ones - but they are clear.
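
(For readers following along, the advice on that page boils down to roughly
the following pattern - sketched here with hypothetical table and file names,
not taken from the original report:

    BEGIN;                               -- one big transaction, not one commit per INSERT
    COPY my_table FROM '/path/to/data';  -- bulk COPY instead of row-by-row INSERTs
    COMMIT;
    ANALYZE my_table;                    -- refresh planner statistics afterwards

plus dropping indexes and foreign keys before the load and recreating them
afterwards, and temporarily raising maintenance_work_mem and
checkpoint_segments.)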


Simon, this raises the question: what changed from 7.4 to 8.0 that requires
him to modify his script?


TIA,
-- 
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University,  Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] pg_autovacuum not having enough suction ?

2005-03-27 Thread Matthew T. O'Connor
Well, the simple answer is that pg_autovacuum didn't see 10,000 inserts,
updates or deletes.
pg_autovacuum saw: 476095 - 471336 = 4759  U/D's relevant for vacuuming, and
  634119 - 629121 = 4998  I/U/D's relevant for performing analyze.
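
(For reference, with the flags Otto lists below (-v 300 -V 0.1 -a 200 -A 0.1)
and reltuples = 106228, the thresholds in the debug output work out to:

    vacuum_threshold  ~= 300 + 0.1 * 106228 ~= 10922
    analyze_threshold ~= 200 + 0.1 * 106228 ~= 10822

so pg_autovacuum needed to see roughly 10,900 updates/deletes before
vacuuming file_92, and the 4759 it counted falls well short of that.)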

The tough question is why pg_autovacuum is not seeing all the updates.
Since autovacuum depends on the stats system for its numbers, the most
likely answer is that the stats system is not able to keep up with the
workload, and is ignoring some of the updates.  Would you check to see
what the stats system is reporting for the numbers of I/U/D's on the
file_92 table?  The query pg_autovacuum uses is:

select a.oid, a.relname, a.relnamespace, a.relpages, a.relisshared, a.reltuples,
       b.schemaname, b.n_tup_ins, b.n_tup_upd, b.n_tup_del
  from pg_class a, pg_stat_all_tables b
 where a.oid = b.relid and a.relkind = 'r';
Take a look at the n_tup_ins, upd, del numbers before and see if they
are keeping up with the actual number of I/U/D's that you are
performing.  If they are, then it's a pg_autovacuum problem that I will
look into further; if they are not, then it's a stats system problem
that I can't really help with.
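
(Concretely, something along these lines - a sketch - run before and after a
stretch of activity on the table should show whether the counters keep up:

    SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_all_tables
     WHERE relname = 'file_92';
)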

Good luck,
Matthew
Otto Blomqvist wrote:
Hello !
I'm running pg_autovacuum on a 1GHz, 80Gig, 512Mhz machine. The database is
about 30MB tarred. We have about 5 Updates/Inserts/Deletes per day. It
runs beautifully for ~4 days. Then the HDD activity and the Postmaster CPU
usage go up A LOT, even though I have plenty (?) of FSM (2 million) pages.
I perform a vacuum and everything is back to normal for another 4 days. I
could schedule a manual vacuum each day but the util is not called
pg_SemiAutoVacuum so I'm hoping this is not necessary. The same user that
ran the manual vacuum is running pg_autovacuum. The normal CPU usage is
about 10% w/ little HD activity.
I'm running autovacuum with the following flags: -d 3 -v 300 -V 0.1 -s 180 -S
0.1 -a 200 -A 0.1
Below are some snippets regarding vacuuming from the busiest table.
This is the last VACUUM ANALYZE performed by pg_autovacuum before I ran the
manual vacuum
[2005-03-24 02:05:43 EST] DEBUG:Performing: VACUUM ANALYZE
public.file_92
[2005-03-24 02:05:52 EST] INFO: table name: secom.public.file_92
[2005-03-24 02:05:52 EST] INFO: relid: 9384219;   relisshared: 0
[2005-03-24 02:05:52 EST] INFO: reltuples: 106228.00;  relpages:
9131
[2005-03-24 02:05:52 EST] INFO: curr_analyze_count: 629121;
curr_vacuum_count: 471336
[2005-03-24 02:05:52 EST] INFO: last_analyze_count: 629121;
last_vacuum_count: 471336
[2005-03-24 02:05:52 EST] INFO: analyze_threshold: 10822;
vacuum_threshold: 10922
This is the last pg_autovacuum debug output before I ran the manual vacuum
[2005-03-24 09:18:44 EST] INFO: table name: secom.public.file_92
[2005-03-24 09:18:44 EST] INFO: relid: 9384219;   relisshared: 0
[2005-03-24 09:18:44 EST] INFO: reltuples: 106228.00;  relpages:
9131
[2005-03-24 09:18:44 EST] INFO: curr_analyze_count: 634119;
curr_vacuum_count: 476095
[2005-03-24 09:18:44 EST] INFO: last_analyze_count: 629121;
last_vacuum_count: 471336
[2005-03-24 09:18:44 EST] INFO: analyze_threshold: 10822;
vacuum_threshold: 10922
file_92 had about 1 Inserts/Deletes between 02:05 and 9:20.
Then I ran a vacuum verbose
23 Mar 05 - 9:20 AM
INFO:  vacuuming public.file_92
INFO:  index file_92_record_number_key now contains 94 row versions in
2720 pages
DETAIL:  107860 index row versions were removed.
2712 index pages have been deleted, 2060 are currently reusable.
CPU 0.22s/0.64u sec elapsed 8.45 sec.
INFO:  file_92: removed 107860 row versions in 9131 pages
DETAIL:  CPU 1.13s/4.27u sec elapsed 11.75 sec.
INFO:  file_92: found 107860 removable, 92 nonremovable row versions in
9131 pages
DETAIL:  91 dead row versions cannot be removed yet.
There were 303086 unused item pointers.
0 pages are entirely empty.
CPU 1.55s/5.00u sec elapsed 20.86 sec.
INFO:  file_92: truncated 9131 to 8423 pages
DETAIL:  CPU 0.65s/0.03u sec elapsed 5.80 sec.
INFO:  free space map: 57 relations, 34892 pages stored; 34464 total pages
needed
DETAIL:  Allocated FSM size: 1000 relations + 2000000 pages = 11784 kB
shared memory.
Also, file_92 is just a temporary storage area, for records waiting to be
processed. Records are in there typically ~10 sec.
Over 100'000 index rows removed, 300'000 unused item pointers? How could
autovacuum let this happen? I would estimate the table had about 1
inserts/deletes between the last pg_autovacuum VACUUM ANALYZE and my
manual vacuum verbose.
It is like the suction is not strong enough ;)
Any ideas? It would be greatly appreciated, as this is taking me one step
closer to the looney bin.
Thanks
/Otto Blomqvist

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

--

Re: [PERFORM] Delete query takes exorbitant amount of time

2005-03-27 Thread Karim Nassar
On Thu, 2005-03-24 at 21:24 -0800, Josh Berkus wrote:
 Karim,
 How about getting some decent disk support?  A single 10K SCSI disk is a bit 
 sub-par for a database with 100's of millions of records.  Too bad you didn't 
 get a v40z ...

Hehe. I have one I am setting up that will be dedicated to postgresql,
hence my question about a week ago about disk partitioning/striping :-)


 Beyond that, you'll want to do the same thing whenever you purge the 
 referencing table; drop keys, delete, re-create keys.  Or think about why it 
 is you need to delete batches of records from this FKed table at all.

The database is for weather data from multiple sources. When adding a
new dataset, I have to create/test/delete/recreate the config in the
FKed table. Users don't have this power, but I need it.
Drop/delete/recreate is a totally acceptable solution for this scenario.
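
(For concreteness, the pattern would look roughly like this - the constraint
name here is hypothetical, so check the real one with \d measurement first:

    ALTER TABLE measurement
      DROP CONSTRAINT measurement_id_int_sensor_meas_type_fkey;
    DELETE FROM int_sensor_meas_type WHERE ...;  -- whatever config rows are being purged
    ALTER TABLE measurement
      ADD CONSTRAINT measurement_id_int_sensor_meas_type_fkey
      FOREIGN KEY (id_int_sensor_meas_type)
      REFERENCES int_sensor_meas_type (id_int_sensor_meas_type);

Re-adding the constraint re-validates measurement against the remaining config
rows, which is the trade-off Josh describes.)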

I guess I was wondering whether there is any other general tuning advice for
such large table indexes, such as increasing statistics targets, etc.

Thanks,
-- 
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University,  Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Hardware questions

2005-03-27 Thread Dave Cramer
Junaili,
I'd suggest you don't buy a Dell. They aren't particularly good performers.
Dave
Junaili Lie wrote:
Hi guys,
We are in the process of buying a new dell server.
Here is what we need to be able to do:
- we need to be able to do queries on tables that have 10-20 million
records (around 40-60 bytes per row) in less than 5-7 seconds.
We also need the hardware to be able to handle up to 50 million
records on a few tables (5 tables in the DB).
Here is what we are thinking:
- Dual Xeon 2.8 Ghz
- 4GB DDR2 400 Mhz Dual Ranked DIMMs (does dual ranked vs. single ranked
make any difference in terms of performance?). Do you guys think 4GB
is reasonably enough?
- 73 GB 15k RPM Ultra 320 SCSI Hard Drive
- Dual on-board NICs (is this enough, or will a Gigabit network adapter help?)
Any input or suggestions are greatly appreciated.
Thank you,
Jun
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq
 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Configuration/Tuning of server/DB

2005-03-27 Thread Reid Thompson
Using information found on the web, I've come up with some configuration
and tuning parameters for a server/db that we will be implementing.  I
was wondering if I could get some feedback on the configuration and
tuning so that I could compare my estimates with those of others.

Host is AIX 5.1 with 4 CPUs and 4 GB RAM.  PostgreSQL will be sharing
this machine with other processes.  Storage is an EMC storage array.
The DB itself is very simple.  Two tables, one with 40-45 columns (the
largest column will likely contain no more than 32 chars of data), the
other with fewer than 5 columns (the largest column will contain no more
than 20 chars of data).  Expected transactions will be on the order of
~600K +- 100K inserts and ~600K +- 200K updates per week.

Thanks
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Delete query takes exorbitant amount of time

2005-03-27 Thread Karim Nassar
On Thu, 2005-03-24 at 20:48 -0500, Tom Lane wrote:
 In that case there's a datatype mismatch between the referencing and
 referenced columns, which prevents the index from being used for the
 FK check.

Can I have more words on this? Here is how I created the tables:

CREATE TABLE int_sensor_meas_type( 
id_int_sensor_meas_type SERIAL PRIMARY KEY,
id_sensor integer NOT NULL REFERENCES sensor,
id_meas_type integer NOT NULL REFERENCES meas_type UNIQUE);


CREATE TABLE measurement (
id_measurement SERIAL PRIMARY KEY,
id_int_sensor_meas_type integer NOT NULL REFERENCES int_sensor_meas_type,
datetime timestamp WITH TIME ZONE NOT NULL,
value numeric(15,5) NOT NULL,
created timestamp with time zone NOT NULL DEFAULT now(),
created_by TEXT NOT NULL REFERENCES public.person(id_person));

CREATE INDEX measurement__id_int_sensor_meas_type_idx ON 
measurement(id_int_sensor_meas_type);

Do I need to cast the id_int_sensor_meas_type column when creating the
index? Both the referencing and referenced columns look like INTEGER to me...

http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-SERIAL
says: "The type names serial and serial4 are equivalent: both create
integer columns."
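
(One way to double-check, sketched here: ask the catalogs what type each side
of the FK actually has:

    SELECT table_name, column_name, data_type
      FROM information_schema.columns
     WHERE column_name = 'id_int_sensor_meas_type'
       AND table_name IN ('measurement', 'int_sensor_meas_type');

Both rows should come back as integer if the SERIAL/integer pairing is what it
looks like.)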

TIA,

-- 
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University,  Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Query Optimizer Failure / Possible Bug

2005-03-27 Thread Hannes Dorbath
The query and the corresponding EXPLAIN is at
http://hannes.imos.net/query.txt
I'd like to use the column q.replaced_serials for multiple calculations
in the SELECT clause, but every time it is referenced there in some way
the whole query in the FROM clause returning q is executed again.
This doesn't make sense to me at all and eats performance.
If this wasn't clear enough: for every
q.replaced_serials insert_random_calculation AS some_column
in the SELECT clause there is a new block of
---
->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
      ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
            ->  Index Scan using ix_rma_ticket_serials_replace on
                rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
                  Index Cond: (replace = false)
            ->  Index Scan using pk_serials on serials s
                  (cost=0.00..3.51 rows=1 width=4)
                  Index Cond: (s.serial_id = "outer".serial_id)
                  Filter: ((article_no = $0) AND (delivery_id = $1))
---
in the EXPLAIN result.
For those who wonder why I do this FROM (SELECT...): I was searching for
a way to use the result of a subselect for multiple calculations in the
SELECT clause and return those calculation results as individual columns.
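
(A cut-down sketch of the shape being described - column names taken from the
plan above; the articles table and join details are guessed for illustration:

    SELECT q.replaced_serials + 1 AS calc_one,
           q.replaced_serials * 2 AS calc_two
      FROM (SELECT a.article_no,
                   a.delivery_id,
                   (SELECT count(*)
                      FROM rma_ticket_serials rts
                      JOIN serials s ON s.serial_id = rts.serial_id
                     WHERE rts.replace   = false
                       AND s.article_no  = a.article_no
                       AND s.delivery_id = a.delivery_id) AS replaced_serials
              FROM articles a) AS q;

Because the subquery in FROM is simple enough to be pulled up, the correlated
count(*) subplan ends up substituted once per reference to q.replaced_serials,
which matches the duplicated Aggregate blocks in the EXPLAIN output.)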
I tested a bit further and found out that PG behaves the same when q
is a view. This makes me wonder how efficiently the optimizer can work
with views - or, even worse, nested views.
Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
Thanks in advance,
Hannes Dorbath
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-27 Thread Kenneth Marshall
On Tue, Mar 22, 2005 at 08:09:40AM -0500, Christopher Browne wrote:
 Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Greg Stark) 
 wrote:
  I don't think it would be very hard at all actually.
 
  It's just a linear algebra problem with a bunch of independent
  variables and a system of equations. Solving for values for all of
  them is a straightforward problem.
 
  Of course in reality these variables aren't actually independent
  because the costing model isn't perfect. But that wouldn't be a
  problem, it would just reduce the accuracy of the results.
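
(To make the proposal concrete - a sketch of the idea, not the planner's exact
cost formulas: for each of N test queries, record the actual runtime t_i and
the resource counts the cost model multiplies by its constants, giving

    t_i ~= seq_pages_i    * c_seq
         + random_pages_i * c_rand
         + tuples_i       * c_tup
         + index_tuples_i * c_itup
         + operators_i    * c_op

then solve the overdetermined system for the c's, e.g. by least squares, and
divide through by c_seq to express random_page_cost, cpu_tuple_cost, etc. in
the planner's units where a sequential page fetch costs 1.0.)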
 
 Are you certain it's a linear system?  I'm not.  If it was a matter of
 minimizing a linear expression subject to some set of linear
 equations, then we could model this as a Linear Program for which
 there are some perfectly good solvers available.  (Few with BSD-style
 licenses, but we could probably get some insight out of running for a
 while with something that's there...)
 
 I think there's good reason to consider it to be distinctly
 NON-linear, which makes it way more challenging to solve the problem.
 
Non-linear optimization works very well in many cases. Issues such
as local minima can be addressed. In a sense, the planner output
can be treated as a blackbox function and the goodness of the
solution is how well it approximates the actual query times. In this
case, it will be imperative to constrain some of the values to prevent
crazy configurations.

Ken

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org