Re: [PERFORM] Takes too long to fetch the data from database
On Thu, Apr 20, 2006 at 11:07:31 +0530, soni de [EMAIL PROTECTED] wrote:
> Please provide me some help regarding how I could use a cursor in the following case:
> I want to fetch 50 records at a time, starting from the largest stime.
> SELECT * FROM wan ORDER BY stime LIMIT 50 OFFSET 81900;

Something like the following may be faster:

SELECT * FROM wan ORDER BY stime DESC LIMIT 50;
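If the result also needs to come back oldest-first, a subselect can re-sort those 50 rows without reintroducing the huge OFFSET. A minimal sketch, assuming an index on stime ("wan" and "stime" are the names from the original question; the alias is invented):

SELECT *
FROM (SELECT * FROM wan ORDER BY stime DESC LIMIT 50) AS newest
ORDER BY stime;

With an index on stime, the inner query can be satisfied by reading just 50 index entries backwards instead of scanning and discarding the first 81900 rows.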
[PERFORM] Performance Problems (7.4.6)
Hi,

I am running on postgres 7.4.6. I did a vacuum analyze on the database but there was no change. I attached here a file with details about the tables, the queries and the Explain analyze plans. Hope this can be helpful to analyze my problem.

10x
Doron

TABLES

Table log.msg_info
   Column    |            Type             |                         Modifiers
-------------+-----------------------------+-----------------------------------------------------------
 msgid       | bigint                      | not null default nextval('log.msg_info_msgid_seq'::text)
 sender      | character varying(255)      |
 subject     | text                        |
 size        | bigint                      |
 entry_time  | timestamp without time zone | default now()
 source_ip   | cidr                        |
 origin      | smallint                    |
Indexes:
    msg_info_pkey primary key, btree (msgid)
    ddindx btree (date(entry_time))
    msg_info_entry_time btree (entry_time)
    msg_info_sender_index btree (sender)
    msg_info_size btree (size)
    msg_info_subject btree (subject)

Table log.msg_fate
   Column     |          Type          |                           Modifiers
--------------+------------------------+---------------------------------------------------------------
 msgid        | bigint                 | not null default nextval('log.msg_fate_msgid_seq'::text)
 grp_fate_id  | bigint                 | not null default nextval('log.msg_fate_grp_fate_id_seq'::text)
 modid        | integer                |
 description  | character varying(255) |
 rule_origin  | bigint                 |
 action       | smallint               |
 ruleid       | integer                |
Indexes:
    msg_fate_pkey primary key, btree (grp_fate_id)
    msg_fate_action btree (action)
    msg_fate_description btree (description)
    msg_fate_modid btree (modid)
    msg_fate_msgid btree (msgid)
Foreign-key constraints:
    msgid_fkey FOREIGN KEY (msgid) REFERENCES log.msg_info(msgid) ON UPDATE CASCADE ON DELETE CASCADE

Table log.msg_fate_recipients
   Column     |            Type             |   Modifiers
--------------+-----------------------------+---------------
 grp_fate_id  | bigint                      |
 recipient    | character varying(255)      |
 update_at    | timestamp without time zone | default now()
 last_action  | integer                     |
 zone_id      | integer                     |
 direction    | smallint                    | default 7
Indexes:
    msg_fate_recipients_grp_fate_id btree (grp_fate_id)
    msg_fate_recipients_last_action_idx btree (last_action)
    msg_fate_recipients_recipient_idx btree (recipient)
    msg_fate_recipients_update_at btree (update_at)
    msg_fate_recipients_zone_id btree (zone_id)
Triggers:
    stats_for_domain AFTER INSERT ON log.msg_fate_recipients FOR EACH ROW EXECUTE PROCEDURE log.collect_stats_for_domain()
    stats_for_object AFTER INSERT ON log.msg_fate_recipients FOR EACH ROW EXECUTE PROCEDURE log.collect_stats_for_object()
    update_timestamp_last_action BEFORE UPDATE ON log.msg_fate_recipients FOR EACH ROW EXECUTE PROCEDURE log.recipients_status_changed_update()

Table pineapp.zones
  Column   |          Type          |                          Modifiers
-----------+------------------------+--------------------------------------------------------------
 zone_id   | integer                | not null default nextval('pineapp.zones_zone_id_seq'::text)
 zone_name | character varying(20)  |
 zone_desc | character varying(255) |
 zone_type | smallint               |
Indexes:
    zones_pkey primary key, btree (zone_id)
    zones_zone_id btree (zone_id)

QUERIES

1) explain analyze
   SELECT date_trunc('hour'::text, i.entry_time) AS datetime,
          COUNT(fr.grp_fate_id), SUM(i.size)
   FROM log.msg_info AS i, log.msg_fate AS f, log.msg_fate_recipients AS fr
   WHERE i.origin = 1
     AND i.msgid = f.msgid
     AND i.entry_time > '2006-01-25'
     AND f.grp_fate_id = fr.grp_fate_id
   GROUP BY datetime
   ORDER BY datetime;

QUERY PLAN
 GroupAggregate (cost=1355984.84..1417243.22 rows=1485233 width=24) (actual time=257433.784..269102.088 rows=623 loops=1)
   -> Sort (cost=1355984.84..1368514.62 rows=5011913 width=24) (actual time=257349.038..261012.595 rows=5160187 loops=1)
         Sort Key: date_trunc('hour'::text, i.entry_time)
         -> Hash Join (cost=256729.52..667400.86 rows=5011913 width=24) (actual time=63133.140..208966.342 rows=5160187 loops=1)
               Hash Cond: ("outer".grp_fate_id = "inner".grp_fate_id)
               -> Seq Scan on msg_fate_recipients fr (cost=0.00..178230.71
Re: [PERFORM] Performance Problems (7.4.6)
I think the problem is the GROUP BY (datetime), which is date_trunc('hour'::text, i.entry_time). You should create an index on this expression (if it's possible):

http://www.postgresql.org/docs/7.4/interactive/indexes-expressional.html

If that is not possible, I would add a column holding the value date_trunc('hour'::text, i.entry_time) for each row and then index it.

Hope this helps :)

Doron Baranes wrote:
> Hi, I am running on postgres 7.4.6. I did a vacuum analyze on the database but there was no change. I attached here a file with details about the tables, the queries and the Explain analyze plans. Hope this can be helpful to analyze my problem. 10x Doron
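For reference, a sketch of the expression index being suggested, using the column names from the posted schema (the index name is invented). date_trunc('hour', ...) on a timestamp without time zone is immutable, so 7.4 should accept it:

CREATE INDEX msg_info_entry_hour
    ON log.msg_info (date_trunc('hour'::text, entry_time));
ANALYZE log.msg_info;

Whether the planner actually uses it for this particular GROUP BY is a separate question; it mainly helps queries that filter or sort on the truncated value.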
Re: [PERFORM] Performance Problems (7.4.6)
Ok. But that means I need a trigger on the original column to update the new column on each insert/update, and that adds overhead.

-Original Message-
From: Ruben Rubio Rey [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 20, 2006 12:49 PM
To: Doron Baranes; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Problems (7.4.6)

> I think the problem is the GROUP BY (datetime), which is date_trunc('hour'::text, i.entry_time). You should create an index on this expression (if it's possible). If that is not possible, I would add a column holding the value date_trunc('hour'::text, i.entry_time) for each row and then index it.
[PERFORM] Quick Performance Poll
Hi, I was just wondering whether anyone has had success with storing more than 1TB of data with PostgreSQL and how they have found the performance. We need a database that can store in excess of this amount and still show good performance. We will probably be implementing several tables with foreign keys and also indexes which will obviously impact on both data size and performance too. Many thanks in advance, Simon Visit our Website at www.rm.com This message is confidential. You should not copy it or disclose its contents to anyone. You may use and apply the information for the intended purpose only. Internet communications are not secure; therefore, RM does not accept legal responsibility for the contents of this message. Any views or opinions presented are those of the author only and not of RM. If this email has come to you in error, please delete it, along with any attachments. Please note that RM may intercept incoming and outgoing email communications. Freedom of Information Act 2000 This email and any attachments may contain confidential information belonging to RM. Where the email and any attachments do contain information of a confidential nature, including without limitation information relating to trade secrets, special terms or prices these shall be deemed for the purpose of the Freedom of Information Act 2000 as information provided in confidence by RM and the disclosure of which would be prejudicial to RM's commercial interests. This email has been scanned for viruses by Trend ScanMail.
Re: [PERFORM] Quick Performance Poll
Simon,

I have many databases over 1T with the largest being ~6T. All of my databases store telecom data, such as call detail records. The access is very fast when looking for a small subset of the data.

For servers, I am using white box intel XEON and P4 systems with SATA disks, 4G of memory. SCSI is out of our price range, but if I had unlimited $ I would go with SCSI / SCSI raid instead.

Jim
[PERFORM] Identical query on two machines, different plans....
I have copied the database from production server to my laptop (pg_dump, etc...) to do some testing. While testing I have found out that one particular query is beeing much slower on my machine than on the server (it's not just because my laptop is much slower than the server), and found out that postgres is using different plan on server than on my laptop. Both on server and on my laptop is postgres-8.1.2, running on Debian (sarge on server, Ubuntu on my laptop), with 2.6 kernel, I compiled postgres with gcc4 on both machines. The query is like this: on the server: pulitzer2=# explain analyze select code_id from ticketing_codes where code_group_id = 1000 and code_value = UPPER('C7ZP2U'); QUERY PLAN --- Index Scan using ticketing_codes_uq_value_group_id on ticketing_codes (cost=0.00..6.02 rows=1 width=4) (actual time=0.104..0.107 rows=1 loops=1) Index Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000)) Total runtime: 0.148 ms (3 rows) And, on my laptop: som_pulitzer2=# explain analyze select code_id from ticketing_codes where code_group_id = 1000 and code_value = UPPER('C7ZP2U'); QUERY PLAN -- Bitmap Heap Scan on ticketing_codes (cost=2.01..1102.05 rows=288 width=4) (actual time=88.164..88.170 rows=1 loops=1) Recheck Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000)) - Bitmap Index Scan on ticketing_codes_uq_value_group_id (cost=0.00..2.01 rows=288 width=0) (actual time=54.397..54.397 rows=1 loops=1) Index Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000)) Total runtime: 88.256 ms (5 rows) This is the table ticketing_codes: som_pulitzer2=# \d ticketing_codes; Table public.ticketing_codes Column | Type | Modifiers ---+---+--- code_id | integer | not null default nextval('ticketing_codes_code_id_seq'::regclass) code_value| character varying(10) | not null code_group_id | integer | not null Indexes: ticketing_codes_pk PRIMARY KEY, btree (code_id) ticketing_codes_uq_value_group_id UNIQUE, btree (code_value, code_group_id) Foreign-key constraints: ticketing_codes_fk__ticketing_code_groups FOREIGN KEY (code_group_id) REFERENCES ticketing_code_groups(group_id) And the \d command produces the same result on both my server and laptop. That query is beeing called from within function, the code is like this: codeId := code_id from ticketing_codes where code_group_id = 1000 and code_value = UPPER('C7ZP2U'); codeId has been declared as int4. When that query is run inside the function, it takes around 20 seconds (compared to 88 miliseconds when I call it from psql). The query is that very same query, just the values 1000 and 'C7ZP2U' are parametars for the function. So, the second question would be why is that query much much slower when run from within function? Is there a way to see an execution plan for the query inside the function? Mike -- Mario Splivalo Mob-Art [EMAIL PROTECTED] I can do it quick, I can do it cheap, I can do it well. Pick any two. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
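One way to approximate the plan a plpgsql function will use is to PREPARE the statement with parameter placeholders and EXPLAIN the EXECUTE, since plpgsql likewise plans its queries without knowing the parameter values. A hedged sketch against the table above (the statement name is arbitrary):

PREPARE code_lookup (integer, character varying) AS
    SELECT code_id FROM ticketing_codes
    WHERE code_group_id = $1 AND code_value = UPPER($2);

EXPLAIN ANALYZE EXECUTE code_lookup (1000, 'C7ZP2U');

DEALLOCATE code_lookup;

If the generic plan turns out to be the slow one, building the query string inside the function and running it with plpgsql's EXECUTE forces planning with the actual values.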
Re: [PERFORM] Identical query on two machines, different plans....
You very likely forgot to run ANALYZE on your laptop after copying the data. Observe the different row count estimates in the 2 plans...

HTH,
Csaba.

QUERY PLAN
 Index Scan using ticketing_codes_uq_value_group_id on ticketing_codes
   (cost=0.00..6.02 rows=1 width=4) (actual time=0.104..0.107 rows=1 loops=1)
                                                          ^^
   Index Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000))
 Total runtime: 0.148 ms
(3 rows)

QUERY PLAN
 Bitmap Heap Scan on ticketing_codes (cost=2.01..1102.05 rows=288 width=4) (actual time=88.164..88.170 rows=1 loops=1)
   Recheck Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000))
   -> Bitmap Index Scan on ticketing_codes_uq_value_group_id (cost=0.00..2.01 rows=288 width=0) (actual time=54.397..54.397 rows=1 loops=1)
         Index Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000))
 Total runtime: 88.256 ms
(5 rows)
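A minimal way to verify this on the laptop copy (table name from the thread):

ANALYZE ticketing_codes;

EXPLAIN ANALYZE
SELECT code_id FROM ticketing_codes
WHERE code_group_id = 1000 AND code_value = UPPER('C7ZP2U');

If stale statistics were the cause, the rows=288 estimate should drop toward 1 and the plain index scan should come back.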
Re: [PERFORM] Identical query on two machines, different plans....
On Thu, 2006-04-20 at 15:59 +0200, Csaba Nagy wrote: You very likely forgot to run ANALYZE on your laptop after copying the data. Observe the different row count estimates in the 2 plans... HTH, Csaba. Sometimes I wish I am Dumbo the Elephant, so I could cover myself with me ears... Thnx :) Mike -- Mario Splivalo Mob-Art [EMAIL PROTECTED] I can do it quick, I can do it cheap, I can do it well. Pick any two. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Identical query on two machines, different plans....
OK, I marked the wrong row counts, but the conclusion is the same.

Cheers,
Csaba.
Re: [PERFORM] Inserts optimization?
On Wed, 2006-04-19 at 20:07, Christopher Kings-Lynne wrote: Scott Marlowe [EMAIL PROTECTED] writes: It's the refusal of people to stop using MyISAM table types that's the real issue. Isn't MyISAM still the default over there? It's hardly likely that the average MySQL user would use anything but the default table type ... Since MySQL 5, InnoDB tables are default I recall. It gets built by default, but when you do a plain create table, it will still default to myisam tables. Note that there is a setting somewhere in my.cnf that will make the default table type anything you want. For Bacula though, what I was suggesting was that they simply declare that you need innodb table type support if you want decent performance, then coding to that, and if someone doesn't have innodb table support, then they have no right to complain about poor performance. Seems a fair compromise to me. The Bacula folks would get to program to a real database model with proper serlialization and all that, and the people who refuse to move up to a later model MySQL get crappy performance. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Quick Performance Poll
Jim, On 4/20/06 6:36 AM, Jim Buttafuoco [EMAIL PROTECTED] wrote: The access is very fast when looking for a small subset of the data. I guess you are not using indexes because building a (non bitmap) index on 6TB on a single machine would take days if not weeks. So if you are using table partitioning, do you have to refer to each child table separately in your queries? - Luke ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Performance Problems (7.4.6)
Did you try to index the expression? Did it work?

Doron Baranes wrote:
> Ok. But that means I need a trigger on the original column to update the new column on each insert/update, and that adds overhead.
Re: [PERFORM] Quick Performance Poll
Jim,

On 4/20/06 7:40 AM, Jim Buttafuoco [EMAIL PROTECTED] wrote:

> First of all this is NOT a single table and yes I am using partitioning and the constaint exclusion stuff. the largest set of tables is over 2T. I have not had to rebuild the biggest database yet, but for a smaller one ~1T the restore takes about 12 hours including many indexes on both large and small tables

You would probably benefit greatly from the new on-disk bitmap index feature in Bizgres Open Source. It's 8.1 plus the sort speed improvement and on-disk bitmap index.

Index creation and sizes for the binary version are in the table below (from a performance report on bizgres network). The version in CVS tip on pgfoundry is much faster on index creation as well.

The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for those operations. For now, you have to drop index, do inserts/updates, rebuild index. We'll have a version that is maintained for insert/update next.

- Luke

                               Create Time (seconds)   Space Used (MBs)
 #  Indexed Columns            BITMAP      BTREE       BITMAP    BTREE
 1  L_SHIPMODE                  454.8     2217.1          58      1804
 2  L_QUANTITY                  547.2      937.8         117      1804
 3  L_LINENUMBER                374.5      412.4          59      1285
 4  L_SHIPMODE, L_QUANTITY      948.7     2933.4         176      2845
 5  O_ORDERSTATUS                83.5      241.3           5       321
 6  O_ORDERPRIORITY             108.5      679.1          11       580
 7  C_MKTSEGMENT                 10.9       51.3           1        45
 8  C_NATIONKEY                   8.3        9.3           2        32
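A sketch of the drop/modify/rebuild cycle described above. This assumes the Bizgres on-disk bitmap access method is exposed to CREATE INDEX as USING bitmap, as in later Bizgres/Greenplum releases; the index, table and column names are illustrative, following the TPC-H-style columns in the table:

DROP INDEX l_shipmode_bm;
-- ... run the batch INSERTs / UPDATEs here ...
CREATE INDEX l_shipmode_bm ON lineitem USING bitmap (l_shipmode);
ANALYZE lineitem;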
Re: [PERFORM] Quick Performance Poll
I have been following your work with great interest. I believe I spoke to someone from Greenplum at linux world in Boston a couple of weeks ago.
Re: [PERFORM] Quick Performance Poll
Hi, Luke, Luke Lonergan wrote: The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for those operations. For now, you have to drop index, do inserts/updates, rebuild index. So they effectively turn the table into a read-only table for now. Are they capable to index custom datatypes like the PostGIS geometries that use the GIST mechanism? This could probably speed up our Geo Databases for Map rendering, containing static data that is updated approx. 2 times per year. Markus -- Markus Schaber | Logical TrackingTracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] IBM pSeries - overrated bucket of crud?
Hi again :) This is a follow-up to the mega thread which made a Friday night more interesting [1] - the summary is various people thought there was some issue with shared memory access on AIX. I then installed Debian (kernel 2.6.11) on the 8-CPU p650 (native - no LPAR) and saw just as woeful performance. Now I've had a chance to try a 2-CPU dualcore Opteron box, and it *FLIES* - the 4-way machine sits churning through our heavy 'hotelsearch' function at ~400ms per call. Basically, this pSeries box is available until Monday lunchtime if any pg devel wants to pop in, run tests, mess around since I am convinced that the hardware itself cannot be this poor - it has to be some failing of pg when mixed with our dataset / load pattern. e.g. If I run 'ab -n 200 -c 4 -k http://localhost/test.php [2] with pg_connect pointed at the pSeries, it turns in search times of ~3500ms with loadavg of 4. The same test with pg_connect pointed at the dual-Opteron turns in ~300ms searches, with loadavg of 3.5 .. something is very very wrong with the pSeries setup :) If I crank up the heat and run apachebench with 10 hammering clients instead of 4, the differences become even more stark.. pSeries: 5000-15000ms, loadavg 9.. Opteron ~3000ms, loadavg 8. 90% of queries on the Opteron conclude in under 4000ms, which maxes out at 6.5 searches per second. The pSeries manages 0.9 searches per second. (!) Databases on both machines have seen a VACUUM FULL and VACUUM ANALYZE before testing, and have near-identical postgresql.conf's. (the pSeries has twice the RAM) This post is not intended to be whining that 'pg is crap on pSeries!' - I'm trying to make a resource available (albeit for a short time) to help fix a problem that will doubtless affect others in future - for certain we're never going midrange again! :O Cheers, Gavin. [1] http://archives.postgresql.org/pgsql-performance/2006-04/msg00143.php [2] Trivial script which does a pg_connect, runs a random hotelsearch and exits. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Quick Performance Poll
Markus, On 4/20/06 8:11 AM, Markus Schaber [EMAIL PROTECTED] wrote: Are they capable to index custom datatypes like the PostGIS geometries that use the GIST mechanism? This could probably speed up our Geo Databases for Map rendering, containing static data that is updated approx. 2 times per year. Should work fine - the other limitation is cardinality, or number of unique values in the column being indexed. A reasonable limit is about 10,000 unique values in the column. We're also going to improve this aspect of the implementation, but the progress might take the useful limit to 300,000 or so. - Luke ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
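To check whether a column fits under that cardinality guideline before building a bitmap index, the planner statistics are usually sufficient, and an exact count is cheap enough to run once. A sketch with illustrative names:

SELECT n_distinct
FROM pg_stats
WHERE tablename = 'lineitem' AND attname = 'l_shipmode';

SELECT count(DISTINCT l_shipmode) FROM lineitem;

(pg_stats reports n_distinct as a negative number when it is stored as a fraction of the row count rather than an absolute value.)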
[PERFORM] Performance decrease
I'm new to PG and I'm testing default PG settings for now.

I have PG 8.1.3 installed with autovacuum=on. My test table has 15830 records with 190 fields. I have different field types (date, numeric, varchar, integer, smallint, ...). I decided to evaluate PG because I need to use schemas.

The first test I did is not very promising. I tried to update one field in the test table several times to see how PG reacts to this. I do it like this:

update table set field = null

After the first execute I get a time of 3 seconds. Then I repeat this update. After each update the time increases. I get 4 sec, 7 sec, 10 sec, 12 sec, 15 sec, 18 sec, 21 sec. Is this normal (default) behaviour or must I do something to prevent this?

Regards,
Radovan Antloga
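One way to see whether dead row versions from the repeated full-table UPDATEs are behind the slowdown, and to reclaim the space between runs — the table and column names below are stand-ins, since the post doesn't give them:

UPDATE test_table SET some_field = NULL;
SELECT relpages, reltuples FROM pg_class WHERE relname = 'test_table';
VACUUM ANALYZE test_table;

relpages grows with every un-vacuumed full-table UPDATE (each one writes a complete new copy of every row); after a VACUUM, later UPDATEs can reuse that space and the times should stop climbing.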
Re: [PERFORM] Performance decrease
Radovan Antloga [EMAIL PROTECTED] writes:
> My test table has 15830 records with 190 fields.

190 fields in a table seems like rather a lot ... is that actually representative of your intended applications?

> I do like this:
> update table set field = null

Again, is that representative of something you'll be doing a lot in practice? Most apps don't often update every row of a table, in my experience.

> After first execute I get time 3 seconds. Then I repeat this update. After each update time increase. I get 4 sec, 7 sec, 10 sec, 12 sec, 15 sec, 18 sec, 21 sec.

There should be some increase because of the addition of dead rows, but both the original 3 seconds and the rate of increase seem awfully high for such a small table. What are you running this on?

For comparison purposes, here's what I see on a full-table UPDATE of a 10000-row table on a rather slow HP box:

regression=# \timing
Timing is on.
regression=# create table t1 as select * from tenk1;
SELECT
Time: 1274.213 ms
regression=# update t1 set unique2 = null;
UPDATE 10000
Time: 565.664 ms
regression=# update t1 set unique2 = null;
UPDATE 10000
Time: 589.839 ms
regression=# update t1 set unique2 = null;
UPDATE 10000
Time: 593.735 ms
regression=# update t1 set unique2 = null;
UPDATE 10000
Time: 615.575 ms
regression=# update t1 set unique2 = null;
UPDATE 10000
Time: 755.456 ms
regression=#

Vacuuming brings the time back down:

regression=# vacuum t1;
VACUUM
Time: 242.406 ms
regression=# update t1 set unique2 = null;
UPDATE 10000
Time: 458.028 ms
regression=#

regards, tom lane
Re: [PERFORM] Takes too long to fetch the data from database
> SELECT * FROM wan ORDER BY stime LIMIT 50 OFFSET 81900;

you need to try and solve the problem without using 'offset'. you could do:

BEGIN;
DECLARE crs CURSOR FOR SELECT * FROM wan ORDER BY stime;
FETCH ABSOLUTE 81900 IN crs;
FETCH 49 IN crs;
CLOSE crs;
COMMIT;

this may be a bit faster but will not solve the fundamental problem. the more interesting question is why you want to query exactly 81900 rows into a set. This type of thinking will always get you into trouble; absolute positioning will not really work in a true sql sense. if you are browsing a table sequentially, there are much better methods.

merlin
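One of the "much better methods" for sequential browsing is to key each page off the last value seen rather than an absolute position, so every page is a cheap index scan. A sketch using the column from the original query; the literal timestamp is only a placeholder for whatever value the previous page ended on:

SELECT *
FROM wan
WHERE stime < '2006-04-19 23:59:59'
ORDER BY stime DESC
LIMIT 50;

If stime is not unique, carrying a tiebreaker column (such as a primary key) in both the ORDER BY and the WHERE clause keeps pages from overlapping or skipping rows.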
Re: [PERFORM] Performance decrease
190 fields in a table seems like rather a lot ... is that actually representative of your intended applications? Test table is like table I use in production with Firebird and Oracle db. Table has a lot of smallint and integer fields. As you can see I have Firebird for low cost projects (small companies) and Oracle medium or large project. Again, is that representative of something you'll be doing a lot in practice? Most apps don't often update every row of a table, in my experience. I agree with you ! I have once or twice a month update on many records (~6000) but not so many. I did not expect PG would have problems with updating 15800 records. My test was on Windows XP SP2. I have AMD 64 2.1 GHz cpu with 1GB ram. Regards, Radovan Antloga ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] mergehashloop
On Wed, Apr 19, 2006 at 01:25:28AM -0400, Tom Lane wrote: Mark Kirkwood [EMAIL PROTECTED] writes: Jim C. Nasby wrote: Good point. :/ I'm guessing there's no easy way to see how many blocks for a given relation are in shared memory, either... contrib/pg_buffercache will tell you this - I think the key word in Jim's comment was easy, ie, cheap. Grovelling through many thousands of buffers to count the matches to a given relation doesn't sound appetizing, especially not if it gets done over again several times during each query-planning cycle. Trying to keep centralized counts somewhere would be even worse (because of locking/ contention issues). Very true. OTOH, it might not be unreasonable to periodically slog through the buffers and store that information, perhaps once a minute, or every X number of transactions. I think a bigger issue is that we currently have no way to really measure the effictiveness of the planner. Without that it's impossible to come up with any real data on whether cost formula A is better or worse than cost formula B. The only means I can think of for doing this would be to measure estimated cost vs actual cost, but with the overhead of EXPLAIN ANALYZE and other variables that might not prove terribly practical. Maybe someone else has some ideas... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
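For the occasional (rather than per-plan) grovelling, the contrib view mentioned above aggregates easily per relation. A sketch written against 8.1's pg_buffercache, joining on relfilenode and allowing reldatabase = 0 for shared catalogs:

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;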
Re: [PERFORM] SELECT FOR UPDATE performance is bad
On Wed, Apr 19, 2006 at 10:20:54AM +0200, Mario Splivalo wrote: This works perfectly, but sometimes the game has no codes, and I still need to know exactley who came first, who was second, and so on... So a locking table as Tom suggested is, I guess, a perfect solution for my situation... Depending on your performance requirements, you should look at contrib/userlock as well, since it will probably be much more performant than locking a row in a table. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
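For comparison, one plausible shape of the locking-table approach mentioned above is a single row per game that every participant locks before claiming a position; contrib/userlock trades that row lock for a lightweight lock that is not tied to any table. Table and column names here are invented:

BEGIN;
SELECT 1 FROM game_lock WHERE game_id = 42 FOR UPDATE;
-- ... read the current position counter and record this player's place ...
COMMIT;

The FOR UPDATE serializes participants of one game while leaving other games untouched.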
Re: [PERFORM] Quick Performance Poll
Interested in doing a case study for the website? On Thu, Apr 20, 2006 at 09:36:25AM -0400, Jim Buttafuoco wrote: Simon, I have many databases over 1T with the largest being ~6T. All of my databases store telecom data, such as call detail records. The access is very fast when looking for a small subset of the data. For servers, I am using white box intel XEON and P4 systems with SATA disks, 4G of memory. SCSI is out of our price range, but if I had unlimited $ I would go with SCSI /SCSI raid instead. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Performance decrease
On Thu, Apr 20, 2006 at 06:10:21PM +0200, Radovan Antloga wrote: I have once or twice a month update on many records (~6000) but not so many. I did not expect PG would have problems with updating 15800 records. And generally speaking, it doesn't. But you do need to ensure that you're vacuuming the database frequently enough. Autovacuum is a good way to do that. My test was on Windows XP SP2. I have AMD 64 2.1 GHz cpu with 1GB ram. One think to keep in mind is that the windows code is rather new, so it is possible to find some performance issues there. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Hardware: HP StorageWorks MSA 1500
We're going to get one for evaluation next week (equipped with dual 2Gbit HBA:s and 2x14 disks, iirc). Anyone with experience from them, performance wise? Regards, Mikael ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware: HP StorageWorks MSA 1500
Hmmm. We use an MSA 1000 with Fibre Channel interconnects. No real complaints, although I was a little bit disappointed by the RAID controller's battery-backed write cache performance; tiny random writes are only about 3 times as fast with write caching enabled as with it disabled, I had (perhaps naively) hoped for more. Sequential scans from our main DB (on a 5-pair RAID 10 set with 15k RPM drives) get roughly 80MB/sec. Getting the redundant RAID controllers to fail over correctly on Linux was a big headache and required working the tech support phone all day until we finally got to the deep guru who knew the proper undocumented incantations. -- Mark Lewis On Thu, 2006-04-20 at 20:00 +0200, Mikael Carneholm wrote: We're going to get one for evaluation next week (equipped with dual 2Gbit HBA:s and 2x14 disks, iirc). Anyone with experience from them, performance wise? Regards, Mikael ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Recovery will take 10 hours
Hi,

We had a database issue today that caused us to have to restore to our most recent backup. We are using PITR so we have 3120 WAL files that need to be applied to the database.

After 45 minutes, it has restored only 230 WAL files. At this rate, it's going to take about 10 hours to restore our database. Most of the time, the server is not using very much CPU time or I/O time. So I'm wondering what can be done to speed up the process?

The database is about 20 GB. The WAL files are compressed with gzip to about 4 MB. Expanded, the WAL files would take 48 GB. We are using PostgreSQL 8.1.3 on OS X Server 10.4.6 connected to an XServe RAID. The pg_xlog is on its own separate RAID and so are the table spaces.

Here's a representative sample of doing iostat:

hulk1:/Library/PostgreSQL admin$ iostat 5
          disk1            disk2            disk0        cpu
  KB/t  tps  MB/s    KB/t tps  MB/s    KB/t tps  MB/s   us sy id
 19.31  101  1.91   14.39  51  0.71   37.37   4  0.13   15 10 76
  8.00   21  0.16    0.00   0  0.00   90.22   2  0.16    0  2 98
  8.00   32  0.25    0.00   0  0.00    0.00   0  0.00    0  1 98
  8.00   76  0.60    0.00   0  0.00    0.00   0  0.00    0  1 99
  8.00  587  4.59 1024.00   4  4.00    0.00   0  0.00    4  7 88
  8.00  675  5.27  956.27   6  5.60    0.00   0  0.00    6  6 88
 11.32 1705 18.84    5.70   1  0.01   16.36   7  0.12    1  6 93
  8.00   79  0.62 1024.00   3  3.20    0.00   0  0.00    2  2 96
  8.00   68  0.53    0.00   0  0.00    0.00   0  0.00    0  2 98
  8.00   76  0.59    0.00   0  0.00    0.00   0  0.00    0  1 99
  8.02   89  0.69    0.00   0  0.00    0.00   0  0.00    1  1 98
  8.00  572  4.47  911.11   4  3.20    0.00   0  0.00    5  5 91
 13.53 1227 16.21  781.55   4  3.21   12.14   2  0.03    3  6 90
  8.00   54  0.42    0.00   0  0.00   90.22   2  0.16    1  1 98
  8.00   68  0.53    0.00   0  0.00    0.00   0  0.00    0  1 99
  8.00  461  3.60 1024.00   3  3.20    0.00   0  0.00    3  6 91
  8.00  671  5.24  964.24   7  6.40    0.00   0  0.00    6  8 86
  7.99  248  1.94    0.00   0  0.00    0.00   0  0.00    1  3 96
 15.06 1050 15.44  911.11   4  3.20   12.12   3  0.03    2  5 93
 19.84  176  3.41    5.70   1  0.01    0.00   0  0.00    0  1 99

disk1 is the RAID volume that has the table spaces on it. disk2 is the pg_xlog and disk0 is the boot disk. So you can see the CPU is idle much of the time and the IO only occurs in short bursts. Each line in the iostat results is 5 seconds apart.

If there were something we could do to speed up the process, would it be possible to kill the postgres process, tweak some parameter somewhere and then start it up again? Or would we have to restore our base backup again and start over?

How can I make this go faster?

Thanks,

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9
http://www.clickspace.com
Re: [PERFORM] Inserts optimization?
On Apr 13, 2006, at 2:59 PM, Francisco Reyes wrote: This particular server is pretty much what I inherited for now for this project.and its Raid 5. There is a new server I am setting up soon... 8 disks which we are planning to setup 6 disks in RAID 10 2 Hot spares In RAID 10 would it matter that WALL is in the same RAID set? Would it be better: 4 disks in RAID10 Data 2 disks RAID 1 WALL 2 hot spares why do you need two hot spares? I'd go with 6 disk RAID10 for data 2 disk RAID1 for WAL (and OS if you don't have other disks from which to boot) and run nothing else but Postgres on that box. bump up checkpoint_segments to some huge number like 256 and use the bg writer process. if a disk fails, just replace it quickly with a cold spare. and if your RAID controller has two channels, pair the mirrors across channels. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Inserts optimization?
On Apr 14, 2006, at 8:00 AM, Marc Cousin wrote: So, you'll probably end up being slowed down by WAL fsyncs ... and you won't have a lot of solutions. Maybe you should start with trying to set fsync=no as a test to confirm that (you should have a lot of iowaits right now if you haven't disabled fsync). Instead of doing that, why not use commit_delay to some nominal value to try and group the fsyncs. If they're coming in at 30 per second, this should help a bit, I suspect. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
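Both settings can be changed per session, so the effect on the insert stream is easy to try before touching postgresql.conf. The values below are only illustrative:

SET commit_delay = 100;    -- microseconds to wait so other committers can share the flush
SET commit_siblings = 5;   -- only wait if at least this many other transactions are active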
Re: [PERFORM] Quick Performance Poll
Hi Luke,

I (still) haven't tried Bizgres, but what do you mean with "The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for those operations"?

Do you mean that INSERT/UPDATE operations against bitmap indexes perform poorly? If yes, to what extent? Or do you mean that bitmap index corruption is possible when issuing DML against BMP indexes? Or that BMP indexes grow too fast as a result of DML?

I am asking this question because Oracle needed 3 years to solve its BMP index problems (BMP index corruption / space usage explosion when several processes are performing DML operations). Is the Bizgres implementation suffering from this kind of childhood disease?

Regards,
Milen
Re: [PERFORM] Recovery will take 10 hours
Brendan Duddridge [EMAIL PROTECTED] writes: We had a database issue today that caused us to have to restore to our most recent backup. We are using PITR so we have 3120 WAL files that need to be applied to the database. After 45 minutes, it has restored only 230 WAL files. At this rate, it's going to take about 10 hours to restore our database. Most of the time, the server is not using very much CPU time or I/O time. So I'm wondering what can be done to speed up the process? That seems a bit odd --- should be eating one or the other, one would think. Try strace'ing the recovery process to see what it's doing. If there were something we could do to speed up the process, would it be possible to kill the postgres process, tweak some parameter somewhere and then start it up again? Or would we have to restore our base backup again and start over? You could start it up again, but it'd want to read through all the WAL it's already looked at, so I'd not recommend this until/unless you're pretty sure you've fixed the performance issue. Right at the moment, I think this is a golden opportunity to study the performance of WAL recovery --- it's not something we've tried to optimize particularly. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Recovery will take 10 hours
Hi Tom, Do you mean do a kill -QUIT on the postgres process in order to generate a stack trace? Will that affect the currently running process in any bad way? And where would the output go? stdout? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 20, 2006, at 2:17 PM, Tom Lane wrote: Brendan Duddridge [EMAIL PROTECTED] writes: We had a database issue today that caused us to have to restore to our most recent backup. We are using PITR so we have 3120 WAL files that need to be applied to the database. After 45 minutes, it has restored only 230 WAL files. At this rate, it's going to take about 10 hours to restore our database. Most of the time, the server is not using very much CPU time or I/O time. So I'm wondering what can be done to speed up the process? That seems a bit odd --- should be eating one or the other, one would think. Try strace'ing the recovery process to see what it's doing. If there were something we could do to speed up the process, would it be possible to kill the postgres process, tweak some parameter somewhere and then start it up again? Or would we have to restore our base backup again and start over? You could start it up again, but it'd want to read through all the WAL it's already looked at, so I'd not recommend this until/unless you're pretty sure you've fixed the performance issue. Right at the moment, I think this is a golden opportunity to study the performance of WAL recovery --- it's not something we've tried to optimize particularly. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Recovery will take 10 hours
Brendan Duddridge [EMAIL PROTECTED] writes: Do you mean do a kill -QUIT on the postgres process in order to generate a stack trace? Not at all! I'm talking about tracing the kernel calls it's making. Depending on your platform, the tool for this is called strace, ktrace, truss, or maybe even just trace. With strace you'd do something like strace -p PID-of-process 2outfile ... wait 30 sec or so ... control-C Not sure about the APIs for the others but they're probably roughly similar ... read the man page ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance decrease
On 20.04.2006, at 18:10 Uhr, Radovan Antloga wrote: I have once or twice a month update on many records (~6000) but not so many. I did not expect PG would have problems with updating 15800 records. It has no problems with that. We have a database where we often update/insert rows with about one hundred columns. No problem so far. Performance is in the sub 10ms range. The whole table has about 10 records. Do you wrap every update in a separate transaction? I do commits every 200 updates for bulk updates. cug -- PharmaLine, Essen, GERMANY Software and Database Development smime.p7s Description: S/MIME cryptographic signature
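A sketch of that batching, with invented table and column names — with fsync on, each COMMIT forces a WAL flush, so grouping a couple of hundred statements per transaction cuts the number of flushes accordingly:

BEGIN;
UPDATE orders SET status = 3 WHERE order_id BETWEEN 1 AND 200;
UPDATE orders SET status = 3 WHERE order_id BETWEEN 201 AND 400;
-- ... roughly 200 statements per COMMIT ...
COMMIT;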
Re: [PERFORM] Recovery will take 10 hours
Brendan,

strace -p pid -c

Then do a CTRL-C after a minute to get the stats of system calls.

- Luke

On 4/20/06 2:13 PM, Brendan Duddridge [EMAIL PROTECTED] wrote:
> Do you mean do a kill -QUIT on the postgres process in order to generate a stack trace? Will that affect the currently running process in any bad way? And where would the output go? stdout?
Re: [PERFORM] Quick Performance Poll
Milen,

On 4/20/06 12:45 PM, Milen Kulev [EMAIL PROTECTED] wrote:

> I (still) haven't tried Bizgres, but what do you mean with "The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for those operations"?

Yes.

> Do you mean that INSERT/UPDATE operations against bitmap indexes perform poorly? If yes, to what extent?

Insert/Update (but not delete) operations will often invalidate a bitmap index in our current implementation because we have not implemented a maintenance method for them when insertions re-use TIDs. We are in the planning stages for an update that will fix this.

> Or do you mean that bitmap index corruption is possible when issuing DML against BMP indexes?

We check for the case of an insertion that causes a re-used TID and issue an error that indicates the index should be removed before the operation is retried. This isn't particularly useful for cases where inserts occur frequently, so the current use-case is for tables where DML should be done in batches after removing the index, then the index re-applied.

> I am asking this question because Oracle needed 3 years to solve its BMP index problems (BMP index corruption / space usage explosion when several processes are performing DML operations).

We will be much faster than that! Concurrency will be less than ideal with our maintenance approach initially, but there shouldn't be a corruption problem.

> Is the Bizgres implementation suffering from this kind of childhood disease?

Sneeze, cough.

- Luke
Re: [PERFORM] Recovery will take 10 hours
Hi Jeff, The WAL files are stored on a separate server and accessed through an NFS mount located at /wal_archive. However, the restore failed about 5 hours in after we got this error: [2006-04-20 16:41:28 MDT] LOG: restored log file 0001018F0034 from archive [2006-04-20 16:41:35 MDT] LOG: restored log file 0001018F0035 from archive [2006-04-20 16:41:38 MDT] LOG: restored log file 0001018F0036 from archive sh: line 1: /wal_archive/0001018F0037.gz: No such file or directory [2006-04-20 16:41:46 MDT] LOG: could not open file pg_xlog/ 0001018F0037 (log file 399, segment 55): No such file or directory [2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254 sh: line 1: /wal_archive/0001018F0036.gz: No such file or directory [2006-04-20 16:41:46 MDT] PANIC: could not open file pg_xlog/ 0001018F0036 (log file 399, segment 54): No such file or directory [2006-04-20 16:41:46 MDT] LOG: startup process (PID 9190) was terminated by signal 6 [2006-04-20 16:41:46 MDT] LOG: aborting startup due to startup process failure [2006-04-20 16:41:46 MDT] LOG: logger shutting down The /wal_archive/0001018F0037.gz is there accessible on the NFS mount. Is there a way to continue the restore process from where it left off? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 20, 2006, at 3:19 PM, Jeff Frost wrote: On Thu, 20 Apr 2006, Brendan Duddridge wrote: Hi, We had a database issue today that caused us to have to restore to our most recent backup. We are using PITR so we have 3120 WAL files that need to be applied to the database. After 45 minutes, it has restored only 230 WAL files. At this rate, it's going to take about 10 hours to restore our database. Most of the time, the server is not using very much CPU time or I/ O time. So I'm wondering what can be done to speed up the process? Brendan, Where are the WAL files being stored and how are they being read back? -- Jeff Frost, Owner [EMAIL PROTECTED] Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Recovery will take 10 hours
Hi Tom, I found it... it's called ktrace on OS X Server. However, as I just finished posting to the list, the process died with a PANIC error: [2006-04-20 16:41:28 MDT] LOG: restored log file 0001018F0034 from archive [2006-04-20 16:41:35 MDT] LOG: restored log file 0001018F0035 from archive [2006-04-20 16:41:38 MDT] LOG: restored log file 0001018F0036 from archive sh: line 1: /wal_archive/0001018F0037.gz: No such file or directory [2006-04-20 16:41:46 MDT] LOG: could not open file pg_xlog/ 0001018F0037 (log file 399, segment 55): No such file or directory [2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254 sh: line 1: /wal_archive/0001018F0036.gz: No such file or directory [2006-04-20 16:41:46 MDT] PANIC: could not open file pg_xlog/ 0001018F0036 (log file 399, segment 54): No such file or directory [2006-04-20 16:41:46 MDT] LOG: startup process (PID 9190) was terminated by signal 6 [2006-04-20 16:41:46 MDT] LOG: aborting startup due to startup process failure [2006-04-20 16:41:46 MDT] LOG: logger shutting down Would turning off fsync make it go faster? Maybe it won't take 10 hours again if we start from scratch. Also, what if we did just start it up again? Will postgres realize that the existing wal_archive files have already been processed and just skip along until it finds one it hasn't processed yet? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 20, 2006, at 3:19 PM, Tom Lane wrote: Brendan Duddridge [EMAIL PROTECTED] writes: Do you mean do a kill -QUIT on the postgres process in order to generate a stack trace? Not at all! I'm talking about tracing the kernel calls it's making. Depending on your platform, the tool for this is called strace, ktrace, truss, or maybe even just trace. With strace you'd do something like strace -p PID-of-process 2outfile ... wait 30 sec or so ... control-C Not sure about the APIs for the others but they're probably roughly similar ... read the man page ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Recovery will take 10 hours
Oops... forgot to mention that both files that postgres said were missing are in fact there. A partial listing from our wal_archive directory:

-rw------- 1 postgres staff 4971129 Apr 19 20:08 0001018F0036.gz
-rw------- 1 postgres staff 4378284 Apr 19 20:09 0001018F0037.gz

There didn't seem to be any issues with the NFS mount. Perhaps it briefly disconnected and came back right away.

Thanks!

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com

On Apr 20, 2006, at 5:11 PM, Brendan Duddridge wrote:
Hi Jeff, The WAL files are stored on a separate server and accessed through an NFS mount located at /wal_archive. However, the restore failed about 5 hours in after we got this error:
[2006-04-20 16:41:28 MDT] LOG: restored log file 0001018F0034 from archive
[2006-04-20 16:41:35 MDT] LOG: restored log file 0001018F0035 from archive
[2006-04-20 16:41:38 MDT] LOG: restored log file 0001018F0036 from archive
sh: line 1: /wal_archive/0001018F0037.gz: No such file or directory
[2006-04-20 16:41:46 MDT] LOG: could not open file pg_xlog/0001018F0037 (log file 399, segment 55): No such file or directory
[2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254
sh: line 1: /wal_archive/0001018F0036.gz: No such file or directory
[2006-04-20 16:41:46 MDT] PANIC: could not open file pg_xlog/0001018F0036 (log file 399, segment 54): No such file or directory
[2006-04-20 16:41:46 MDT] LOG: startup process (PID 9190) was terminated by signal 6
[2006-04-20 16:41:46 MDT] LOG: aborting startup due to startup process failure
[2006-04-20 16:41:46 MDT] LOG: logger shutting down
The /wal_archive/0001018F0037.gz file is there and accessible on the NFS mount. Is there a way to continue the restore process from where it left off?
Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com
On Apr 20, 2006, at 3:19 PM, Jeff Frost wrote:
On Thu, 20 Apr 2006, Brendan Duddridge wrote:
Hi, We had a database issue today that caused us to have to restore to our most recent backup. We are using PITR so we have 3120 WAL files that need to be applied to the database. After 45 minutes, it has restored only 230 WAL files. At this rate, it's going to take about 10 hours to restore our database. Most of the time, the server is not using very much CPU time or I/O time. So I'm wondering what can be done to speed up the process?
Brendan, Where are the WAL files being stored and how are they being read back?
-- Jeff Frost, Owner [EMAIL PROTECTED] Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Recovery will take 10 hours
Brendan Duddridge [EMAIL PROTECTED] writes:
However, as I just finished posting to the list, the process died with a PANIC error:
[2006-04-20 16:41:28 MDT] LOG: restored log file 0001018F0034 from archive
[2006-04-20 16:41:35 MDT] LOG: restored log file 0001018F0035 from archive
[2006-04-20 16:41:38 MDT] LOG: restored log file 0001018F0036 from archive
sh: line 1: /wal_archive/0001018F0037.gz: No such file or directory
[2006-04-20 16:41:46 MDT] LOG: could not open file pg_xlog/0001018F0037 (log file 399, segment 55): No such file or directory
[2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254
sh: line 1: /wal_archive/0001018F0036.gz: No such file or directory
[2006-04-20 16:41:46 MDT] PANIC: could not open file pg_xlog/0001018F0036 (log file 399, segment 54): No such file or directory

This looks to me like a bug in your archive restore command. It had just finished providing 0001018F0036 at 16:41:38, why was it not able to do so again at 16:41:46?

regards, tom lane

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Recovery will take 10 hours
Brendan,

Is your NFS share mounted hard or soft? Do you have space to copy the files locally? I suspect you're seeing NFS slowness in your restore since you aren't using much in the way of disk IO or CPU.

-Jeff

On Thu, 20 Apr 2006, Brendan Duddridge wrote:
Oops... forgot to mention that both files that postgres said were missing are in fact there. A partial listing from our wal_archive directory:
-rw------- 1 postgres staff 4971129 Apr 19 20:08 0001018F0036.gz
-rw------- 1 postgres staff 4378284 Apr 19 20:09 0001018F0037.gz
There didn't seem to be any issues with the NFS mount. Perhaps it briefly disconnected and came back right away.
Thanks!
Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com
On Apr 20, 2006, at 5:11 PM, Brendan Duddridge wrote:
Hi Jeff, The WAL files are stored on a separate server and accessed through an NFS mount located at /wal_archive. However, the restore failed about 5 hours in after we got this error:
[2006-04-20 16:41:28 MDT] LOG: restored log file 0001018F0034 from archive
[2006-04-20 16:41:35 MDT] LOG: restored log file 0001018F0035 from archive
[2006-04-20 16:41:38 MDT] LOG: restored log file 0001018F0036 from archive
sh: line 1: /wal_archive/0001018F0037.gz: No such file or directory
[2006-04-20 16:41:46 MDT] LOG: could not open file pg_xlog/0001018F0037 (log file 399, segment 55): No such file or directory
[2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254
sh: line 1: /wal_archive/0001018F0036.gz: No such file or directory
[2006-04-20 16:41:46 MDT] PANIC: could not open file pg_xlog/0001018F0036 (log file 399, segment 54): No such file or directory
[2006-04-20 16:41:46 MDT] LOG: startup process (PID 9190) was terminated by signal 6
[2006-04-20 16:41:46 MDT] LOG: aborting startup due to startup process failure
[2006-04-20 16:41:46 MDT] LOG: logger shutting down
The /wal_archive/0001018F0037.gz file is there and accessible on the NFS mount. Is there a way to continue the restore process from where it left off?
Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com
On Apr 20, 2006, at 3:19 PM, Jeff Frost wrote:
On Thu, 20 Apr 2006, Brendan Duddridge wrote:
Hi, We had a database issue today that caused us to have to restore to our most recent backup. We are using PITR so we have 3120 WAL files that need to be applied to the database. After 45 minutes, it has restored only 230 WAL files. At this rate, it's going to take about 10 hours to restore our database. Most of the time, the server is not using very much CPU time or I/O time. So I'm wondering what can be done to speed up the process?
Brendan, Where are the WAL files being stored and how are they being read back?
-- Jeff Frost, Owner [EMAIL PROTECTED] Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
---(end of broadcast)--- TIP 6: explain analyze is your friend

-- Jeff Frost, Owner [EMAIL PROTECTED] Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
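The hard-vs-soft distinction matters here because a soft NFS mount returns an I/O error to the caller when the server is slow to respond, which is presumably what turned a brief hiccup into a failed restore_command. Two rough options, sketched only (the option syntax varies between Linux and OS X, and the server name and paths are placeholders, not anything from the thread):

  # remount the archive share "hard" so slow NFS responses block instead of erroring out
  umount /wal_archive
  mount -o hard,intr nfsserver:/wal_archive /wal_archive

  # or sidestep NFS entirely by staging the compressed segments on local disk first
  cp /wal_archive/*.gz /local_wal_archive/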
Re: [PERFORM] Recovery will take 10 hours
Well, our restore command is pretty basic:

restore_command = 'gunzip < /wal_archive/%f.gz > %p'

I'm not sure why that would succeed then fail.

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com

On Apr 20, 2006, at 5:20 PM, Tom Lane wrote:
Brendan Duddridge [EMAIL PROTECTED] writes: However, as I just finished posting to the list, the process died with a PANIC error:
[2006-04-20 16:41:28 MDT] LOG: restored log file 0001018F0034 from archive
[2006-04-20 16:41:35 MDT] LOG: restored log file 0001018F0035 from archive
[2006-04-20 16:41:38 MDT] LOG: restored log file 0001018F0036 from archive
sh: line 1: /wal_archive/0001018F0037.gz: No such file or directory
[2006-04-20 16:41:46 MDT] LOG: could not open file pg_xlog/0001018F0037 (log file 399, segment 55): No such file or directory
[2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254
sh: line 1: /wal_archive/0001018F0036.gz: No such file or directory
[2006-04-20 16:41:46 MDT] PANIC: could not open file pg_xlog/0001018F0036 (log file 399, segment 54): No such file or directory
This looks to me like a bug in your archive restore command. It had just finished providing 0001018F0036 at 16:41:38, why was it not able to do so again at 16:41:46?
regards, tom lane
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Recovery will take 10 hours
Brendan Duddridge [EMAIL PROTECTED] writes: Oops... forgot to mention that both files that postgres said were missing are in fact there: Please place the blame where it should fall: it's your archive restore command that's telling postgres that. There didn't seem to be any issues with the NFS mount. Perhaps it briefly disconnected and came back right away. Unstable NFS mounts are Really Bad News. You shouldn't be expecting to run a stable database atop such a thing. If it's not the database but only the WAL archive that's NFS'd, it might be possible to live with it, but you'll need to put some defenses into your archive restore script to cope with such events. As far as restarting goes: I think you can restart from here without first redoing your base-backup restore, but as previously noted it'll still read through the same WAL files it looked at before. You won't save much except the time to redo the base restore. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
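One way to add the sort of defenses Tom suggests is to point restore_command at a small wrapper script that retries the fetch a few times before reporting failure, so a momentary NFS dropout doesn't immediately end recovery with a PANIC. This is a rough sketch only; the script path, retry count, and sleep interval are illustrative, not anything from this thread.

  #!/bin/sh
  # /usr/local/bin/restore_wal.sh
  # called from recovery as: restore_command = '/usr/local/bin/restore_wal.sh %f %p'
  f="$1"      # archived WAL file name (%f)
  dest="$2"   # path PostgreSQL wants the segment written to (%p)
  for try in 1 2 3 4 5; do
      if gunzip < "/wal_archive/$f.gz" > "$dest"; then
          exit 0              # segment restored successfully
      fi
      sleep 5                 # give a flapping NFS mount a moment to come back
  done
  exit 1                      # nonzero exit tells PostgreSQL the segment isn't available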
Re: [PERFORM] Recovery will take 10 hours
Thanks Tom,

We are storing only the WAL archives on the NFS volume. It must have been a hiccup in the NFS mount. Jeff Frost asked if we were using hard or soft mounts. We were using soft mounts, so that may be where the problem lies with the PANIC.

Is it better to use the boot volume of the database machine for archiving our WAL files instead of going over the NFS mount? I'm sure it's probably not a good idea to archive to the same volume as the pg_xlog directory, so that's why I thought maybe using the boot drive would be better. We'll just have to make sure we don't fill up the drive. PostgreSQL does often write to the /data directory located on the boot drive, though, so it might not be good to start archiving there. Our table spaces are on a separate RAID.

If we need to restore in the future, we'll just have to copy the WAL files from the boot drive of our database machine over NFS to the restore machine.

Thanks,

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com

On Apr 20, 2006, at 5:29 PM, Tom Lane wrote:
Brendan Duddridge [EMAIL PROTECTED] writes: Oops... forgot to mention that both files that postgres said were missing are in fact there:
Please place the blame where it should fall: it's your archive restore command that's telling postgres that.
There didn't seem to be any issues with the NFS mount. Perhaps it briefly disconnected and came back right away.
Unstable NFS mounts are Really Bad News. You shouldn't be expecting to run a stable database atop such a thing. If it's not the database but only the WAL archive that's NFS'd, it might be possible to live with it, but you'll need to put some defenses into your archive restore script to cope with such events.
As far as restarting goes: I think you can restart from here without first redoing your base-backup restore, but as previously noted it'll still read through the same WAL files it looked at before. You won't save much except the time to redo the base restore.
regards, tom lane

---(end of broadcast)--- TIP 6: explain analyze is your friend
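If the archive does end up on a local volume, the later copy to a restore machine can happen entirely outside PostgreSQL; something along these lines would do it (the directory names and hostname are placeholders, and this is only a sketch of the approach described above):

  # push the locally archived, compressed WAL segments to the machine doing the restore
  rsync -av /local_wal_archive/ restorehost:/wal_archive/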
Re: [PERFORM] Recovery will take 10 hours
Hi Tom,

Well, we started the restore back up with the WAL archives copied to our local disk. It's going at about the same pace as with the restore over NFS.

So I tried ktrace -p PID and it created a really big file. I had to do 'ktrace -p PID -c' to get it to stop. The ktrace.out file is read using kdump, but there's a lot of binary data in there intermixed with some system calls. For example:

15267 postgres RET read 8192/0x2000
15267 postgres CALL lseek(153,0,2)
15267 postgres RET lseek 0
15267 postgres CALL lseek(127,0,2)
15267 postgres RET lseek 0
15267 postgres CALL lseek(138,0,2)
15267 postgres RET lseek 0
15267 postgres CALL lseek(153,0,2)
15267 postgres RET lseek 0
15267 postgres CALL lseek(127,0,2)
15267 postgres RET lseek 0
15267 postgres CALL read(5,25225728,8192)
15267 postgres GIO fd 5 read 8192 bytes \M-P]\0\^A\0\0\0\^A\0\0\^A\M^H,\M-5`\0\0\0\^C\M-6r fill, polyester has a subtle sheen, machine wash\0\0\0Xreverses to solid colour, polyester fill, polyester has a subtle sheen, machine wash\^_\^Y7\M-3\0\0\0\0\0\0\0\0\0\0\0\0\0\0oG\0\ \b\0\^[)\^C \M^Or\M-#\^B\0\0\0\0\0A\M-\M-] ... lots of data \M^K$\0\0\0\fcomplete\0\0\0HCustom-width Valanceless Aluminum Mini Blinds 37 1/4-44 w. x 48 l.\0\0\0\M-P1 aluminum slats, valanceless headrail and matching bottom rail, hidden brackets, clear acrylic tilt wand, extra slats with route holes in the back, can be cut down to minimum width of 14, hardware. . .\0\0\^Aq1 aluminum slats, valanceless headrail and matching bottom rail, hidden brackets, clear acrylic tilt wand, extra slats with route holes in the back, can be cut down to minimum width of 14, hardware and instructions included, wipe with a dam
15267 postgres RET read 8192/0x2000
15267 postgres CALL lseek(138,0,2)
15267 postgres RET lseek 0
15267 postgres CALL lseek(158,317251584,0)
15267 postgres RET lseek 0
15267 postgres CALL write(158,35286464,8192)
15267 postgres GIO fd 158 wrote 8192 bytes [EMAIL PROTECTED],[EMAIL PROTECTED] \^C?\M^X [EMAIL PROTECTED]@[EMAIL PROTECTED][EMAIL PROTECTED]@$[EMAIL PROTECTED][EMAIL PROTECTED]@$= [EMAIL PROTECTED]@$[EMAIL PROTECTED]@[EMAIL PROTECTED][EMAIL PROTECTED][EMAIL PROTECTED]@$8 [EMAIL PROTECTED];[EMAIL PROTECTED];\M-([EMAIL PROTECTED];[EMAIL PROTECTED];[EMAIL PROTECTED]:[EMAIL PROTECTED]:\M^H etc...

I'm not sure that really tells me anything, though, other than that the WAL archives don't actually store SQL, but only the database changes.

Thanks,

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com

On Apr 20, 2006, at 3:19 PM, Tom Lane wrote:
Brendan Duddridge [EMAIL PROTECTED] writes: Do you mean do a kill -QUIT on the postgres process in order to generate a stack trace?
Not at all! I'm talking about tracing the kernel calls it's making. Depending on your platform, the tool for this is called strace, ktrace, truss, or maybe even just trace. With strace you'd do something like
strace -p PID-of-process 2>outfile
... wait 30 sec or so ... control-C
Not sure about the APIs for the others but they're probably roughly similar ... read the man page ...
regards, tom lane

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Recovery will take 10 hours
Hi Tomas, Hmm... ktrace -p PID -c returns immediately without doing anything unless I've previously done a ktrace -p PID. According to the man page for ktrace's -c flag: -c Clear the trace points associated with the specified file or processes. When I run ktrace on OS X Server 10.4.6 it returns to the console immediately, however the ktrace.out file gets larger and larger until I issue another ktrace command with the -c flag. It never sits waiting for keyboard input. I haven't been able to find any way of generating the stats yet. The man page for ktrace or kdump doesn't mention anything about stats. Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 20, 2006, at 6:51 PM, Tomas Vondra wrote: So I tried ktrace -p PID and it created a really big file. I had to do 'ktrace -p PID -c' to get it to stop. The ktrace.out file is read using kdump, but there's a lot of binary data in there intermixed with some system calls. Yes, that's what (s|k)trace does - it attaches to the process, and prints out all the system calls, parameters, return values etc. That gives you exact overview of what's going on in the program, but it's a little bit confusing if you are not familiar with that and/or you're in a hurry. But Luke Lonergan offered a '-c' switch, which gives you a statistics of the used system calls. This way you can see number of calls for individual syscalls and time spent in them. That could give you a hint why the process is so slow (for example there can be an I/O bottleneck or something like that). Just do 'ktrace -p PID -c' for about 30 seconds, then 'Ctrl-C' and post the output to this mailing list. t.v. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
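Since ktrace/kdump on OS X apparently has no built-in per-syscall summary, one workaround is to post-process the kdump output instead. The field positions below are taken from the trace lines quoted earlier in the thread, and the one-liner is only a sketch:

  # count how many times each system call appears in the decoded trace
  kdump -f ktrace.out | awk '$3 == "CALL" { split($4, name, "("); count[name[1]]++ }
      END { for (c in count) print count[c], c }' | sort -rn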
Re: [PERFORM] Recovery will take 10 hours
On Thu, 20 Apr 2006, Brendan Duddridge wrote: Hi Tomas, Hmm... ktrace -p PID -c returns immediately without doing anything unless I've previously done a ktrace -p PID. According to the man page for ktrace's -c flag: -c Clear the trace points associated with the specified file or processes. On other systems, strace/truss with -c produces a list of sys calls with the number of times they've been called in the elapsed period. To answer your other question, temporarily disabling fsync during the recovery should speed it up. For future reference, processing thousands of WAL files for recovery is not ideal. You should be doing a base backup much more often. Gavin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
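The fsync suggestion amounts to a temporary one-line change in postgresql.conf on the recovering server: set it before starting the postmaster for the restore and put it back afterwards. A sketch only; the data directory path is a placeholder.

  # postgresql.conf, only while replaying WAL from the archive
  fsync = off

  # once recovery completes, set fsync = on again and restart
  pg_ctl -D /path/to/data restart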
[PERFORM] Introducing a new linux readahead framework
Greetings,

I'd like to introduce a new readahead framework for the linux kernel: http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/1021.html

HOW IT WORKS

In adaptive readahead, the context-based method may be of particular interest to postgresql users. It works by peeking into the file cache and checking whether there are any history pages present or accessed. In this way it can detect almost all forms of sequential / semi-sequential read patterns, e.g.
- parallel / interleaved sequential scans on one file
- sequential reads across file open/close
- mixed sequential / random accesses
- sparse / skimming sequential reads

It also has methods to detect some less common cases:
- reading backward
- seeking all over while reading N pages

WAYS TO BENEFIT FROM IT

As we know, postgresql relies on the kernel to do proper readahead. The adaptive readahead might help performance in the following cases:
- concurrent sequential scans
- sequential scan on a fragmented table (some DBs suffer from this problem; not sure about pgsql)
- index scan with clustered matches
- index scan over the majority of rows (in case the planner goes wrong)

TUNABLE PARAMETERS

There are two parameters, which are described in this email: http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/1024.html

Here are guidelines oriented toward postgresql users:

- /proc/sys/vm/readahead_ratio
Since most DB servers have plenty of memory, the danger of readahead thrashing is near zero. In this case, you can set readahead_ratio to 100 (or even 200 :), which helps the readahead window scale up rapidly.

- /proc/sys/vm/readahead_hit_rate
Sparse sequential reads are read patterns like {0, 2, 4, 5, 8, 11, ...}. In this case we might prefer to do readahead to get good I/O performance at the cost of some useless pages. If you prefer not to do so, setting readahead_hit_rate to 1 will disable this feature.

- /sys/block/sdX/queue/read_ahead_kb
Set it to a large value (e.g. 4096), as you are used to doing. RAID users might want to use an even bigger number.

TRYING IT OUT

The latest patch for stable kernels can be downloaded here: http://www.vanheusden.com/ara/

Before compiling, make sure that the following options are enabled:
Processor type and features - Adaptive file readahead
Processor type and features - Readahead debug and accounting

HELPING AND CONTRIBUTING

The patch is open to fine-tuning advice :) Comments and benchmarking results are highly appreciated.

Thanks,
Wu

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
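Taken together, the tunables above translate into something like the following on a kernel carrying this patch. This is a sketch only: the sysctl paths and values are the ones given in the post, and the device name sdX is a placeholder for the actual block device.

  # let the readahead window scale up quickly; thrashing risk is low with plenty of RAM
  echo 200 > /proc/sys/vm/readahead_ratio

  # optional: disable readahead for sparse sequential patterns like {0, 2, 4, 5, 8, 11, ...}
  echo 1 > /proc/sys/vm/readahead_hit_rate

  # per-device readahead size in KB; RAID users might want an even bigger number
  echo 4096 > /sys/block/sdX/queue/read_ahead_kb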
Re: [PERFORM] Introducing a new linux readahead framework
On Fri, Apr 21, 2006 at 09:38:26AM +0800, Wu Fengguang wrote:
Greetings, I'd like to introduce a new readahead framework for the linux kernel: http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/1021.html
HOW IT WORKS
In adaptive readahead, the context-based method may be of particular interest to postgresql users. It works by peeking into the file cache and checking whether there are any history pages present or accessed. In this way it can detect almost all forms of sequential / semi-sequential read patterns, e.g.
- parallel / interleaved sequential scans on one file
- sequential reads across file open/close
- mixed sequential / random accesses
- sparse / skimming sequential reads
It also has methods to detect some less common cases:
- reading backward
- seeking all over while reading N pages

Are there any ways to inform the kernel that you either are or aren't doing a sequential read? It seems that in some cases it would be better to bypass a bunch of tricky logic trying to determine that it's doing a sequential read. A sequential scan in PostgreSQL would be such a case.

The opposite example would be an index scan of a highly uncorrelated index, which would produce mostly random reads from the table. In that case, reading ahead probably makes very little sense, though your logic might have a better idea of the access pattern than PostgreSQL does.

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly