Re: [PERFORM] Delete query takes exorbitant amount of time
On Sat, 26 Mar 2005, Karim Nassar wrote:

> On Sat, 2005-03-26 at 15:18 -0800, Stephan Szabo wrote:
> > On Sat, 26 Mar 2005, Karim Nassar wrote:
> > > On Sat, 2005-03-26 at 07:55 -0800, Stephan Szabo wrote:
> > > > That seems like it should be okay. Hmm, what does something like:
> > > >
> > > >   PREPARE test(int) AS SELECT 1 from measurement where
> > > >     id_int_sensor_meas_type = $1 FOR UPDATE;
> > > >   EXPLAIN ANALYZE EXECUTE TEST(1);
> > > >
> > > > give you as the plan?
> > >
> > >                              QUERY PLAN
> > > ----------------------------------------------------------------------
> > >  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6)
> > >                  (actual time=11608.402..11608.402 rows=0 loops=1)
> > >    Filter: (id_int_sensor_meas_type = $1)
> > >  Total runtime: 11608.441 ms
> > > (3 rows)
> >
> > Hmm, has measurement been analyzed recently? You might want to see if
> > raising the statistics target on measurement.id_int_sensor_meas_type
> > and reanalyzing changes the estimated rows down from 500k.
>
> orfs=# ALTER TABLE measurement ALTER COLUMN id_int_sensor_meas_type
>        SET STATISTICS 1000;
> ALTER TABLE
> orfs=# VACUUM FULL ANALYZE VERBOSE;
> <snip>
> INFO:  free space map: 52 relations, 13501 pages stored; 9760 total pages needed
> DETAIL:  Allocated FSM size: 1000 relations + 300000 pages = 1864 kB shared memory.
> VACUUM
> orfs=# PREPARE test(int) AS SELECT 1 from measurement where
> orfs-#        id_int_sensor_meas_type = $1 FOR UPDATE;
> PREPARE
> orfs=# EXPLAIN ANALYZE EXECUTE TEST(1);
>                              QUERY PLAN
> ----------------------------------------------------------------------
>  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6)
>                  (actual time=8948.452..8948.452 rows=0 loops=1)
>    Filter: (id_int_sensor_meas_type = $1)
>  Total runtime: 8948.494 ms
> (3 rows)
>
> orfs=# EXPLAIN ANALYZE EXECUTE TEST(1);
>                              QUERY PLAN
> ----------------------------------------------------------------------
>  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6)
>                  (actual time=3956.616..3956.616 rows=0 loops=1)
>    Filter: (id_int_sensor_meas_type = $1)
>  Total runtime: 3956.662 ms
> (3 rows)
>
> Some improvement. Even better once it's cached. The row estimate didn't
> change, though. Is this the best I can expect? Are there any other
> optimizations I am missing?

I'm not sure, really. Running a seq scan for each removed row in the
referenced table doesn't seem like a particularly good plan in general
though, especially if the average number of rows being referenced isn't
on the order of 500k per value. I don't know what to look at next,
though.
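(For concreteness, a quick check that would test Stephan's hypothesis —
this sketch is mine, not from the thread: plan the same FK-check query
with a literal value instead of the prepared parameter. If the literal
form picks the index while the prepared form stays on a seq scan, the
500k-row estimate is an artifact of planning without a known $1:

    EXPLAIN ANALYZE
      SELECT 1 FROM measurement
      WHERE id_int_sensor_meas_type = 1 FOR UPDATE;

The FK triggers use a prepared, parameterized query, which is why the
generic plan matters here.)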
Re: [PERFORM] How to improve db performance with $7K?
1. Buy, for the empty PCI-X slot, a single- or dual-channel SCSI-320
   hardware RAID controller, like the MegaRAID SCSI 320-2X (don't forget
   to check driver support for your OS), plus battery backup, plus
   (optionally) expanding the RAM to the 256MB maximum - approx $1K.
2. Buy new MAXTOR drives - Atlas 15K II (4x36.7GB) - approx 4x$400.
3. SCSI-320 cable set.
4. Use the old drives (2) for the OS (and optionally the DB log) files
   in RAID1 mode, possibly over one channel of the MegaRAID.
5. New drives (4+) in RAID10 mode for the DB.
6. Start tuning Postgres + OS: more shared RAM etc.

Best regards,
Alexander Kirpa
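(As a rough sketch of step 6 - these values are not from the post, just
a hypothetical starting point for a write-heavy PostgreSQL 8.0 box with
a battery-backed controller; benchmark before settling on anything:

    # postgresql.conf sketch (8.0; all values are assumptions)
    shared_buffers = 20000          # ~160 MB in 8 kB pages; well above the default
    effective_cache_size = 100000   # ~800 MB; rough estimate of OS cache available
    checkpoint_segments = 16        # spread checkpoint I/O under heavy writes
    maintenance_work_mem = 131072   # 128 MB (in kB) for VACUUM and index builds

With a battery-backed write cache, the controller can safely acknowledge
fsyncs early, which is most of the win for the WAL.)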
Re: [PERFORM] [BUGS] BUG #1552: massive performance hit between 7.4
On Fri, 2005-03-25 at 10:18 +0000, Simon Riggs wrote:
> > When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
> > with a gigabyte of RAM, running Debian Linux (kernel version 2.6.8.1),
> > we were able to insert all this data in 5-7 minutes. It's taken a
> > while to install Postgres 8.0.1 on the same machine, but now I have,
> > and it's taking 40-45 minutes to run the same insert script.
> <snip>
> OK. Not-a-bug. Your situation is covered in the manual with some sage
> advice: http://www.postgresql.org/docs/8.0/static/populate.html
> It doesn't go to great lengths about all the reasons why those
> recommendations are good ones - but they are clear.

Simon, this raises the question: what changed from 7.4 to 8.0 to require
him to modify his script?

TIA,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
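(For readers who don't follow the link, the populate page's advice
reduces to roughly this pattern - the table name and file path here are
placeholders, not from the report:

    -- Also consider dropping indexes and FK constraints before the
    -- load and recreating them afterwards.
    BEGIN;                                  -- one transaction, not a commit per row
    COPY mytable FROM '/path/to/data.txt';  -- COPY beats individual INSERTs
    COMMIT;
    ANALYZE mytable;                        -- refresh planner statistics

Note that server-side COPY FROM a file requires superuser; psql's \copy
is the unprivileged equivalent.)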
Re: [PERFORM] pg_autovacuum not having enough suction ?
Well, the simple answer is that pg_autovacuum didn't see 10,000 inserts,
updates or deletes. pg_autovacuum saw: 476095 - 471336 = 4759 U/D's
relevant for vacuuming, and 634119 - 629121 = 4998 I/U/D's relevant for
performing analyze.

The tough question is why pg_autovacuum is not seeing all the updates.
Since autovacuum depends on the stats system for its numbers, the most
likely answer is that the stats system is not able to keep up with the
workload and is ignoring some of the updates. Would you check to see
what the stats system is reporting for numbers of I/U/D's for the
file_92 table? The query pg_autovacuum uses is:

  select a.oid,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,
         b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del
  from pg_class a, pg_stat_all_tables b
  where a.oid=b.relid and a.relkind = 'r';

Take a look at the n_tup_ins, upd, del numbers before and after, and see
if they are keeping up with the actual number of I/U/D's that you are
performing. If they are, then it's a pg_autovacuum problem that I will
look into further; if they are not, then it's a stats system problem
that I can't really help with.

Good luck,

Matthew

Otto Blomqvist wrote:
> Hello!
>
> I'm running pg_autovacuum on a 1GHz machine with an 80GB disk and 512MB
> of RAM. The database is about 30MB tarred. We have about 50,000
> updates/inserts/deletes per day. It runs beautifully for ~4 days. Then
> the HDD activity and the postmaster CPU usage go up a lot, even though
> I have plenty (?) of FSM pages (2 million). I perform a vacuum and
> everything is back to normal for another 4 days. I could schedule a
> manual vacuum each day, but the util is not called pg_SemiAutoVacuum,
> so I'm hoping this is not necessary. The same user that ran the manual
> vacuum is running pg_autovacuum. The normal CPU usage is about 10% with
> little HD activity.
>
> I'm running autovacuum with the following flags:
>   -d 3 -v 300 -V 0.1 -s 180 -S 0.1 -a 200 -A 0.1
>
> Below are some snippets regarding vacuuming from the busiest table.
>
> This is the last VACUUM ANALYZE performed by pg_autovacuum before I ran
> the manual vacuum:
>
> [2005-03-24 02:05:43 EST] DEBUG: Performing: VACUUM ANALYZE public.file_92
> [2005-03-24 02:05:52 EST] INFO:  table name: secom.public.file_92
> [2005-03-24 02:05:52 EST] INFO:  relid: 9384219; relisshared: 0
> [2005-03-24 02:05:52 EST] INFO:  reltuples: 106228.00; relpages: 9131
> [2005-03-24 02:05:52 EST] INFO:  curr_analyze_count: 629121; curr_vacuum_count: 471336
> [2005-03-24 02:05:52 EST] INFO:  last_analyze_count: 629121; last_vacuum_count: 471336
> [2005-03-24 02:05:52 EST] INFO:  analyze_threshold: 10822; vacuum_threshold: 10922
>
> This is the last pg_autovacuum debug output before I ran the manual
> vacuum:
>
> [2005-03-24 09:18:44 EST] INFO:  table name: secom.public.file_92
> [2005-03-24 09:18:44 EST] INFO:  relid: 9384219; relisshared: 0
> [2005-03-24 09:18:44 EST] INFO:  reltuples: 106228.00; relpages: 9131
> [2005-03-24 09:18:44 EST] INFO:  curr_analyze_count: 634119; curr_vacuum_count: 476095
> [2005-03-24 09:18:44 EST] INFO:  last_analyze_count: 629121; last_vacuum_count: 471336
> [2005-03-24 09:18:44 EST] INFO:  analyze_threshold: 10822; vacuum_threshold: 10922
>
> file_92 had about 10,000 inserts/deletes between 02:05 and 9:20. Then I
> ran a VACUUM VERBOSE (24 Mar 05, 9:20 AM):
>
> INFO:  vacuuming "public.file_92"
> INFO:  index "file_92_record_number_key" now contains 94 row versions in 2720 pages
> DETAIL:  107860 index row versions were removed.
> 2712 index pages have been deleted, 2060 are currently reusable.
> CPU 0.22s/0.64u sec elapsed 8.45 sec.
> INFO:  "file_92": removed 107860 row versions in 9131 pages
> DETAIL:  CPU 1.13s/4.27u sec elapsed 11.75 sec.
> INFO:  "file_92": found 107860 removable, 92 nonremovable row versions in 9131 pages
> DETAIL:  91 dead row versions cannot be removed yet.
> There were 303086 unused item pointers.
> 0 pages are entirely empty.
> CPU 1.55s/5.00u sec elapsed 20.86 sec.
> INFO:  "file_92": truncated 9131 to 8423 pages
> DETAIL:  CPU 0.65s/0.03u sec elapsed 5.80 sec.
> INFO:  free space map: 57 relations, 34892 pages stored; 34464 total pages needed
> DETAIL:  Allocated FSM size: 1000 relations + 2000000 pages = 11784 kB shared memory.
>
> Also, file_92 is just a temporary storage area for records waiting to
> be processed. Records are in there typically ~10 sec.
>
> Over 100,000 index rows removed, 300,000 unused item pointers? How
> could autovacuum let this happen? I would estimate the table had about
> 10,000 inserts/deletes between the last pg_autovacuum VACUUM ANALYZE
> and my manual VACUUM VERBOSE. It is like the suction is not strong
> enough ;)
>
> Any ideas? It would be greatly appreciated, as this is taking me one
> step closer to the looney bin.
>
> Thanks
>
> /Otto Blomqvist
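(A concrete way to run Matthew's check - this is a sketch, though
pg_stat_all_tables and these columns are standard in 8.0, and file_92 is
the table from the thread. Snapshot the counters, run a known batch of
updates, then re-run and compare the deltas against what you actually
performed:

    SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_all_tables
    WHERE relname = 'file_92';

If the deltas fall well short of the known workload, the stats collector
is dropping messages and pg_autovacuum never had a chance.)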
Re: [PERFORM] Delete query takes exorbitant amount of time
On Thu, 2005-03-24 at 21:24 -0800, Josh Berkus wrote:
> Karim,
>
> How about getting some decent disk support? A single 10K SCSI disk is
> a bit sub-par for a database with 100's of millions of records. Too
> bad you didn't get a v40z ...

Hehe. I have one I am setting up that will be dedicated to postgresql,
hence my question about a week ago about disk partitioning/striping :-)

> Beyond that, you'll want to do the same thing whenever you purge the
> referencing table; drop keys, delete, re-create keys. Or think about
> why it is you need to delete batches of records from this FKed table
> at all.

The database is for weather data from multiple sources. When adding a
new dataset, I have to create/test/delete/recreate the config in the
FKed table. Users don't have this power, but I need it.
Drop/delete/recreate is a totally acceptable solution for this scenario.

I guess I was wondering if there is other general tuning advice for such
large table indexes, such as increasing statistics, etc.

Thanks,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
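(A sketch of the drop/delete/recreate pattern Josh describes, using the
table names from this thread; the constraint name and the id value 42
are hypothetical - check \d measurement for the real constraint name:

    BEGIN;
    ALTER TABLE measurement
        DROP CONSTRAINT measurement_id_int_sensor_meas_type_fkey;
    DELETE FROM int_sensor_meas_type WHERE id_int_sensor_meas_type = 42;
    ALTER TABLE measurement
        ADD CONSTRAINT measurement_id_int_sensor_meas_type_fkey
        FOREIGN KEY (id_int_sensor_meas_type)
        REFERENCES int_sensor_meas_type;
    COMMIT;

Re-adding the FK re-validates it in one pass over measurement, which is
still far cheaper than one scan per deleted row.)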
Re: [PERFORM] Hardware questions
Junaili,

I'd suggest you don't buy a Dell. They aren't particularly good
performers.

Dave

Junaili Lie wrote:
> Hi guys,
>
> We are in the process of buying a new Dell server. Here is what we need
> to be able to do:
> - We need to be able to do queries on tables that have 10-20 million
>   records (around 40-60 bytes each row) in less than 5-7 seconds.
> - We also need the hardware to be able to handle up to 50 million
>   records on a few tables (5 tables in the DB).
>
> Here is what we are thinking:
> - Dual Xeon 2.8 GHz
> - 4GB DDR2 400 MHz dual-ranked DIMMs (does dual-ranked vs. single-ranked
>   make any difference in terms of performance?). Do you guys think 4GB
>   is reasonably enough?
> - 73 GB 15k RPM Ultra 320 SCSI hard drive
> - Dual on-board NICs (is this enough, or will a Gigabit network adapter
>   help?)
>
> Any input or suggestions is greatly appreciated.
>
> Thank you,
> Jun
[PERFORM] Configuration/Tuning of server/DB
Using information found on the web, I've come up with some configuration
and tuning parameters for a server/DB that we will be implementing. I
was wondering if I could get some feedback, so that I can compare my
estimates with those of others.

Host is AIX 5.1 with 4 CPUs and 4 GB RAM. PostgreSQL will be sharing
this machine with other processes. Storage is an EMC storage array.

The DB itself is very simple: two tables, one with 40-45 columns (the
largest column will likely contain no more than 32 chars of data), the
other with fewer than 5 columns (the largest column will contain no more
than 20 chars of data). Expected transactions will be on the order of
~600K +/- 100K inserts and ~600K +/- 200K updates per week.

Thanks
Re: [PERFORM] Delete query takes exorbitant amount of time
On Thu, 2005-03-24 at 20:48 -0500, Tom Lane wrote:
> In that case there's a datatype mismatch between the referencing and
> referenced columns, which prevents the index from being used for the
> FK check.

Can I have more words on this? Here is how I created the tables:

CREATE TABLE int_sensor_meas_type (
    id_int_sensor_meas_type SERIAL PRIMARY KEY,
    id_sensor integer NOT NULL REFERENCES sensor,
    id_meas_type integer NOT NULL REFERENCES meas_type UNIQUE);

CREATE TABLE measurement (
    id_measurement SERIAL PRIMARY KEY,
    id_int_sensor_meas_type integer NOT NULL REFERENCES int_sensor_meas_type,
    datetime timestamp WITH TIME ZONE NOT NULL,
    value numeric(15,5) NOT NULL,
    created timestamp with time zone NOT NULL DEFAULT now(),
    created_by TEXT NOT NULL REFERENCES public.person(id_person));

CREATE INDEX measurement__id_int_sensor_meas_type_idx
    ON measurement(id_int_sensor_meas_type);

Do I need to cast the id_int_sensor_meas_type column when creating the
index? Both the referencing and referenced columns look like INTEGER to
me...
http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-SERIAL
says: "The type names serial and serial4 are equivalent: both create
integer columns."

TIA,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
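(To verify there really is no mismatch, a query like this - my sketch,
not from the thread - shows the resolved type on each side of the FK;
both rows should come back as integer:

    SELECT a.attrelid::regclass AS rel, a.attname,
           format_type(a.atttypid, a.atttypmod) AS coltype
    FROM pg_attribute a
    WHERE a.attname = 'id_int_sensor_meas_type'
      AND a.attrelid IN ('measurement'::regclass,
                         'int_sensor_meas_type'::regclass);

SERIAL really is plain integer plus a sequence default, so if both sides
report integer, the mismatch theory is ruled out.)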
[PERFORM] Query Optimizer Failure / Possible Bug
The query and the corresponding EXPLAIN is at
http://hannes.imos.net/query.txt

I'd like to use the column q.replaced_serials for multiple calculations
in the SELECT clause, but every time it is referenced there in some way,
the whole query in the FROM clause returning q is executed again. This
doesn't make sense to me at all and eats performance.

If this wasn't clear enough: for every

    q.replaced_serials <insert random calculation> AS some_column

in the SELECT clause, there is a new block of

    Aggregate  (cost=884.23..884.23 rows=1 width=0)
      ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
            ->  Index Scan using ix_rma_ticket_serials_replace on
                rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
                  Index Cond: (replace = false)
            ->  Index Scan using pk_serials on serials s
                (cost=0.00..3.51 rows=1 width=4)
                  Index Cond: (s.serial_id = "outer".serial_id)
                  Filter: ((article_no = $0) AND (delivery_id = $1))

in the EXPLAIN result.

For those who wonder why I do this FROM (SELECT ...): I was searching
for a way to use the result of a subselect for multiple calculations in
the SELECT clause and return those calculation results as individual
columns.

I tested a bit further and found out that PG behaves the same in case q
is a view. This makes me wonder how efficiently the optimizer can work
with views - or, even worse, nested views.

Tested and reproduced on PG 7.4.1 (Linux) and 8.0.0 (win32).

Thanks in advance,
Hannes Dorbath
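(A heavily simplified, hypothetical version of the shape Hannes
describes, borrowing the table names from his EXPLAIN output; the point
is only the pattern - a scalar subquery inside q referenced more than
once in the outer SELECT list:

    SELECT q.replaced_serials,
           q.replaced_serials * 2 AS doubled  -- second reference
    FROM (SELECT (SELECT count(*)
                  FROM rma_ticket_serials
                  WHERE "replace" = false) AS replaced_serials) AS q;

When the planner pulls q up into the outer query, the scalar subquery
expression appears to be duplicated once per reference, which would
match the repeated Aggregate blocks in the plan above.)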
Re: [PERFORM] What about utility to calculate planner cost constants?
On Tue, Mar 22, 2005 at 08:09:40AM -0500, Christopher Browne wrote:
> Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Greg
> Stark) wrote:
> > I don't think it would be very hard at all, actually. It's just a
> > linear algebra problem with a bunch of independent variables and a
> > system of equations. Solving for values for all of them is a
> > straightforward problem. Of course, in reality these variables aren't
> > actually independent, because the costing model isn't perfect. But
> > that wouldn't be a problem; it would just reduce the accuracy of the
> > results.
>
> Are you certain it's a linear system? I'm not. If it were a matter of
> minimizing a linear expression subject to some set of linear equations,
> then we could model this as a Linear Program, for which there are some
> perfectly good solvers available. (Few with BSD-style licenses, but we
> could probably get some insight out of running for a while with
> something that's there...)
>
> I think there's good reason to consider it to be distinctly NON-linear,
> which makes it way more challenging to solve the problem.

Non-linear optimization works very well in many cases. Issues such as
local minima can be addressed. In a sense, the planner output can be
treated as a black-box function, and the goodness of the solution is how
well it approximates the actual query times. In this case, it will be
imperative to constrain some of the values to prevent crazy
configurations.

Ken
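(To make the disagreement concrete - my framing, not anything from the
thread: if the planner's cost model were taken at face value, fitting
the constants would be ordinary least squares,

    \min_{c} \sum_i \Bigl( t_i - \sum_j c_j \, x_{ij} \Bigr)^2

where t_i is the measured runtime of query i, x_{ij} is how many units
of cost component j (sequential page fetches, random page fetches,
per-tuple CPU work, ...) appear in the plan chosen for query i, and the
c_j are the constants being fitted (random_page_cost, cpu_tuple_cost,
cpu_index_tuple_cost, cpu_operator_cost). The non-linearity Christopher
suspects enters because the x_{ij} are not fixed: change c and the
planner may switch to a different plan, so the objective is only
piecewise smooth - which is exactly why treating the planner as a black
box, as Ken suggests, is attractive.)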