Re: [PERFORM] bad performance on Solaris 10
Jignesh K. Shah wrote: Bruce, Hard to answer that... People like me who know and love PostgreSQL and Solaris find this an opportunity to make their favorite database work best on their favorite operating system. Many things in PostgreSQL are based on the assumption that it will run on Linux, and it is left to Solaris to emulate that behavior. That said, there are ways to improve performance even on UFS on Solaris; it just requires more tweaks. Hopefully this will lead to a few Solaris-friendly default values like fsync/odatasync :-) Yes, if someone wants to give us a clear answer on which wal_sync_method is best on all versions of Solaris, we can easily make that change. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Better index strategy for many fields with few values
Hi, Jim, Jim C. Nasby wrote: I was also thinking about using a functional index. If there's a logical relation between those values so that they can easily be combined, that may be a good alternative. How would that be any better than just doing a multi-column index? 10 different values per column and 20 columns give 10^20 value combinations. Partitioning on the first column gives 10^19 combinations, which is smaller than 2^64 and thus fits into a long value. And I just guess that a 10-partition functional index on a long value could perform better than a multi-column index on 20 columns of character(10), if only because it is approx. 1/25th in size. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
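Markus's packing idea can be sketched roughly as follows. Everything here is hypothetical: the table, the `paramXX` column names, and the `digit_of()` helper (an assumed IMMUTABLE function mapping each column's 10 possible values onto 0..9) are illustrations, not anything from the thread.

```sql
-- Sketch (names hypothetical): pack 19 of the 20 columns into one value by
-- treating each column as one decimal digit; 10^19 combinations still fit
-- inside a 64-bit range. One partial index per value of the leading column
-- gives the 10-way "partitioning" Markus describes.
CREATE INDEX mytable_packed_v1_idx ON mytable (
  ( digit_of(param02)
  + digit_of(param03) * 10
  + digit_of(param04) * 100
  -- ... one power of ten per remaining column ...
  )
) WHERE param01 = 'value1';   -- repeat for each of the 10 param01 values
```

The expression must be built only from IMMUTABLE functions for PostgreSQL to accept it as an index expression; queries then have to repeat the same expression (and the `param01` equality) verbatim for the planner to use the index.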
Re: [PERFORM] Better index strategy for many fields with few values
Hi, Jim, Jim Nasby wrote: Adding -performance back in: I would like to try it. However, in another post I added that, contrary to what I stated initially, all the paramXX columns are not mandatory in the query. So it seems that requirement makes the problem more complex. Okay, this rules out my functional index over 19 columns. Doesn't this new requirement rule out this solution? No, just group the columns logically. Yes, that's the solution. If you have common groups of columns that appear and disappear synchronously, pack those together in a (possibly partitioned and/or functional) index. Then rely on the query planner to combine the appropriate indices via an index bitmap scan. By the way, I have tested indexing each column individually to check what happens with bitmap scans. My test table is 1 million rows. The explain analyze command shows that a bitmap scan is sometimes used, but I still end up with queries that can take up to 10s, which is way too much. Is it on the first query, or on repeated queries? It might be that you're I/O bound, and the backend has to fetch indices and rows from disk into RAM. I currently don't know whether the order of indices in a multi-index bitmap scan is relevant, but I could imagine that it may be useful to have the most selective index scanned first. And keep in mind that, assuming an equal distribution of your parameters, every index bitmap hits 1/10th of the whole table on average, so the selectivity generally is low. The selectivity of a partitioned 3-column index will be much better (about 1/1000th of the whole table), and fewer index scans and bitmaps have to be generated. A functional index also makes it possible to CLUSTER the table to optimize the locality of search results (and so reduce disk I/O). In case your table has low write activity but high read-only activity, the overhead that comes with the additional index is negligible compared to the performance improvement proper CLUSTERing can generate. 
Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
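The CLUSTER suggestion above can be sketched like this, using the pre-8.3 `CLUSTER index ON table` syntax current at the time of the thread; the table and index names are hypothetical:

```sql
-- Physically reorder the table by the chosen (possibly functional) index so
-- that rows matching the same search land in nearby heap blocks. CLUSTER is
-- a one-time reordering, not maintained on later writes, so re-run it
-- periodically on a mostly-read table, then refresh planner statistics.
CLUSTER mytable_packed_v1_idx ON mytable;
ANALYZE mytable;
```

Note that CLUSTER takes an exclusive lock and rewrites the whole table, which is why Markus frames it as a fit for low-write, high-read workloads.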
[PERFORM] Slow query - possible bug?
laterooms=# explain analyze select allocation0_.ID as y1_, allocation0_.RoomID as y2_, allocation0_.StatusID as y4_, allocation0_.Price as y3_, allocation0_.Number as y5_, allocation0_.Date as y6_ from Allocation allocation0_ where (allocation0_.Date between '2006-06-09 00:00:00.00' and '2006-06-09 00:00:00.00') and (allocation0_.RoomID in (4300591));

QUERY PLAN
Index Scan using ix_date on Allocation allocation0_ (cost=0.00..4.77 rows=1 width=34) (actual time=1411.325..1689.860 rows=1 loops=1)
  Index Cond: ((Date >= '2006-06-09'::date) AND (Date <= '2006-06-09'::date))
  Filter: (RoomID = 4300591)
Total runtime: 1689.917 ms
(4 rows)

Yep, the two dates are identical - yep, I would change the client software to do where Date = '2006-06-09 00:00:00.00' if I could... However, it's clear to see why this simple query is taking so long - the plan is selecting /all/ dates after 2006-06-09 and /all/ dates before then, and only returning the union of the two - a large waste of effort, surely? VACUUM ANALYZE hasn't improved matters... The schema for the table is ID int8 NOT NULL DEFAULT nextval(('public.allocation_id_seq'::text)::regclass), RoomID int4, Price numeric(10,2), StatusID int4, Number int4, Date date, and there are indexes kept for 'RoomID' and 'Date' in this 4.3-million row table. Is this a bug or a hidden feature in pg 8.1.3? :) Cheers, Gavin. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Slow query - possible bug?
On 4/13/06, Gavin Hamill [EMAIL PROTECTED] wrote: laterooms=# explain analyze select allocation0_.ID as y1_, allocation0_.RoomID as y2_, allocation0_.StatusID as y4_, allocation0_.Price as y3_, allocation0_.Number as y5_, allocation0_.Date as y6_ from Allocation allocation0_ where (allocation0_.Date between '2006-06-09 00:00:00.00' and '2006-06-09 00:00:00.00') and (allocation0_.RoomID in (4300591));

QUERY PLAN
Index Scan using ix_date on Allocation allocation0_ (cost=0.00..4.77 rows=1 width=34) (actual time=1411.325..1689.860 rows=1 loops=1)
  Index Cond: ((Date >= '2006-06-09'::date) AND (Date <= '2006-06-09'::date))
  Filter: (RoomID = 4300591)
Total runtime: 1689.917 ms
(4 rows)

1.6secs isn't too bad on 4.3mill rows... How many entries are there for that date range? -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Slow query - possible bug?
chris smith wrote: 1.6secs isn't too bad on 4.3mill rows... How many entries are there for that date range? 1.7 secs /is/ good - it typically takes 5 or 6 seconds, which isn't so good. My question is 'why does the planner choose such a bizarre range request when both elements of the 'between' are identical? :)' If I replace the (allocation0_.Date between '2006-06-09 00:00:00.00' and '2006-06-09 00:00:00.00') with allocation0_.Date = '2006-04-09 00:00:00.00' then the query comes back in a few milliseconds (as I'd expect :) - and yup, I've been using different dates for each test to avoid the query being cached. For ref, there are typically 35000 rows per date :) Cheers, Gavin. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
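For reference, the two predicate forms being compared in this thread are logically equivalent when both bounds are equal, but are estimated differently by the planner:

```sql
-- Range form: the index scan on ix_date carries both a lower and an
-- upper bound (Date >= X AND Date <= X).
SELECT * FROM Allocation allocation0_
 WHERE allocation0_.Date BETWEEN '2006-06-09' AND '2006-06-09'
   AND allocation0_.RoomID IN (4300591);

-- Equality form: a single Date = X condition, which Gavin reports
-- comes back in milliseconds.
SELECT * FROM Allocation allocation0_
 WHERE allocation0_.Date = '2006-06-09'
   AND allocation0_.RoomID IN (4300591);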
Re: [PERFORM] Slow query - possible bug?
Gavin Hamill wrote: chris smith wrote: 1.6secs isn't too bad on 4.3mill rows... How many entries are there for that date range? 1.7 secs /is/ good - it typically takes 5 or 6 seconds, which isn't so good. My question is 'why does the planner choose such a bizarre range request when both elements of the 'between' are identical? :)' What's bizarre about the range request, and are you sure it's actually doing the union of both conditions separately? It looks to me like it's doing a standard range-search. If it was trying to fetch 4.3 million rows via that index, I'd expect it to use a different index instead. If you've got stats turned on, look in pg_stat_user_indexes/tables before and after the query to see. Here's an example of a similar query against one of my log tables. It's small, but the clause is the same, and I don't see any evidence of the whole table being selected.

lamp=> SELECT * FROM pg_stat_user_indexes WHERE relname LIKE 'act%';
  relid  | indexrelid | schemaname | relname | indexrelname   | idx_scan | idx_tup_read | idx_tup_fetch
---------+------------+------------+---------+----------------+----------+--------------+---------------
 6124993 |    7519044 | public     | act_log | act_log_ts_idx |       23 |           18 |            18
 6124993 |    7371115 | public     | act_log | act_log_pkey   |        0 |            0 |             0
(2 rows)

lamp=> EXPLAIN ANALYSE SELECT * FROM act_log WHERE al_ts BETWEEN '2006-04-05 14:10:23+00'::timestamptz AND '2006-04-05 14:10:23+00'::timestamptz;
QUERY PLAN
Index Scan using act_log_ts_idx on act_log (cost=0.00..3.02 rows=1 width=102) (actual time=0.116..0.131 rows=1 loops=1)
  Index Cond: ((al_ts >= '2006-04-05 15:10:23+01'::timestamp with time zone) AND (al_ts <= '2006-04-05 15:10:23+01'::timestamp with time zone))
Total runtime: 0.443 ms
(3 rows)

lamp=> SELECT * FROM pg_stat_user_indexes WHERE relname LIKE 'act%';
  relid  | indexrelid | schemaname | relname | indexrelname   | idx_scan | idx_tup_read | idx_tup_fetch
---------+------------+------------+---------+----------------+----------+--------------+---------------
 6124993 |    7519044 | public     | act_log | act_log_ts_idx |       24 |           19 |            19
 6124993 |    7371115 | public     | act_log | act_log_pkey   |        0 |            0 |             0
(2 rows)

1. vacuum full verbose your table (and post the output please)
2. perhaps reindex?
3. Try the explain analyse again and see what happens.

-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pgmemcache
Hi, Tom, Tom Lane wrote: Why are AFTER COMMIT triggers impossible? What happens if such a trigger gets an error? You can't un-commit. Then it must be specified that those triggers run in their own transaction, and cannot abort the transaction. Or use the 2-phase-commit infrastructure for them. Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Tom Lane wrote: Gavin Hamill [EMAIL PROTECTED] writes: would a simple #define LWLOCK_PADDED_SIZE 128 be sufficient? Yeah, that's fine. OK I tried that but noticed no real improvement... in the interim I've installed Debian on the pSeries (using http://debian.gonicus.de/debian/dists/sarge/main/disks-powerpc/current/pseries/install.txt ) and using a simple load-test script - it picks a 'hotelsearch' select at random from a big file and just does a pg_query on that via PHP... Using apachebench with 10 clients gave a loadavg of about 10 after a few minutes, and the logs showed typical query times of 8 seconds. Again, no disk activity, normal context-switching, just full-out CPU usage... We're improving the quality + efficiency of the hotelsearch function all the time (Simon will certainly be able to vouch for its complexity) - am really uncertain what to do next tho! :/ Cheers, Gavin. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] index is not used if I include a function that returns current time in my query
Hello, postgresql 7.4.8 on SuSE Linux here. I have a table called DMO with a column called ORA_RIF defined as timestamp without time zone; I created an index on this table based on this column only. If I run a query against a text literal the index is used:

explain select * from dmo where ora_rif > '2006-01-01';
QUERY PLAN
Index Scan using dmo_ndx02 on dmo (cost=0.00..1183.23 rows=736 width=156)
  Index Cond: (ora_rif > '2006-01-01 00:00:00'::timestamp without time zone)

If I try to use a function that returns the current time instead, a sequential scan is always performed:

explain select * from dmo where ora_rif > localtimestamp;
QUERY PLAN
Seq Scan on dmo (cost=0.00..1008253.22 rows=2703928 width=156)
  Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone)

explain select * from dmo where ora_rif > localtimestamp::timestamp without time zone;
QUERY PLAN
Seq Scan on dmo (cost=0.00..1008253.22 rows=2703928 width=156)
  Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone)

... etc. ... (tried with all datetime functions, with and without cast) I even tried to write a function that explicitly returns a timestamp without time zone value:

create or replace function f () returns timestamp without time zone as '
declare
  x timestamp without time zone;
begin
  x := ''2006-01-01 00:00:00'';
  return x;
end;
' language plpgsql;

But the result is the same:

explain select * from dmo where ora_rif > f();
QUERY PLAN
Seq Scan on dmo (cost=0.00..987973.76 rows=2703928 width=156)
  Filter: (ora_rif > f())

Any suggestion? Kind regards, -- Cristian Veronesi - C.R.P.A. S.p.A. - Reggio Emilia, Italy The first thing you need to learn about databases is that they are not just a fancy file system for storing data. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Slow query - possible bug?
Gavin Hamill [EMAIL PROTECTED] writes: If I replace the (allocation0_.Date between '2006-06-09 00:00:00.00' and '2006-06-09 00:00:00.00') with allocation0_.Date = '2006-04-09 00:00:00.00' then the query comes back in a few milliseconds (as I'd expect :)

Could we see EXPLAIN ANALYZE for
* both forms of the date condition, with the roomid condition;
* both forms of the date condition, WITHOUT the roomid condition;
* just the roomid condition

I'm thinking the planner is misestimating something, but it's hard to tell what without breaking it down. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Better index strategy for many fields with few values
Hi Markus, Markus Schaber [EMAIL PROTECTED] wrote: Hi, Oscar, Please reply to the list and not privately, so others can learn from your replies, and possibly have better ideas than me. That was my intention. I made a mistake. Oscar Picasso wrote: I cannot group the columns logically. Any column may or may not appear in a query. That's suboptimal. Summarizing what I have learned: - I cannot use multicolumn indexes because I cannot group the columns logically. - I cannot use functional indexes. - I cannot use clustering. You still can have a set of partitioned multi-column indices, overlapping enough that every combination of columns is covered (or risk a sequential sub-scan for the last two or three columns; this should not hurt too much if the first 17 columns were selective enough). The main problem with indices is that they also decrease write performance. If disk costs are not limited, it makes sense to have WAL, table and indices on different disks / RAID arrays, to parallelize writes. Btw, I guess you have multiple, concurrent users? Yes I do. I have just made other tests with only the individual indexes and performance is much better than previously. Obviously there was an I/O problem during my initial test. Something interesting though: if I use few columns in the query the results come very quickly and pg does a sequential scan. When it reaches some threshold (4 or 5 columns) pg switches to bitmap scans. It then takes an almost constant time (~2500 ms) no matter how many more columns I add to the where clause. Interestingly enough, queries with many columns are less common. They also return fewer results, and many times no result at all. From the user's point of view it would be nice to have a waiting time lower than 2500 ms for these queries. Maybe I could achieve that goal simply by tuning postgresql. In such a case, where should I look first in order to speed up bitmap scanning? Maybe I could, that way, avoid the use of partitioned multi-column indexes. Oscar
Re: [PERFORM] index is not used if I include a function that returns current time in my query
Cristian Veronesi [EMAIL PROTECTED] writes: If I try to use a function that returns the current time instead, a sequential scan is always performed: ... Any suggestion? 1. Use something newer than 7.4 ;-) 2. Set up a dummy range constraint, ie select ... where ora_rif > localtimestamp and ora_rif < 'infinity'; The problem you have is that the planner doesn't know the value of the function and falls back to a default assumption about the selectivity of the '>' condition --- and that default discourages indexscans. (Note the very large estimate of number of rows returned.) In the range-constraint situation, the planner still doesn't know the value of the function, but its default assumption for a range constraint is tighter and it (probably) will choose an indexscan. Since PG 8.0, the planner understands that it's reasonable to pre-evaluate certain functions like localtimestamp to obtain better-than-guess values about selectivity, so updating would be a better fix. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
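Tom's dummy-range workaround, written out against the table from the original post:

```sql
-- A one-sided '>' against an unknown function value gets a pessimistic
-- default selectivity estimate, which discourages the index scan:
EXPLAIN SELECT * FROM dmo WHERE ora_rif > localtimestamp;

-- Adding a dummy upper bound turns it into a range constraint; the
-- planner's default estimate for a range is much tighter, so it will
-- (probably) pick the index on ora_rif instead:
EXPLAIN SELECT * FROM dmo
 WHERE ora_rif > localtimestamp
   AND ora_rif < 'infinity';
```

The result set is identical, since nothing is later than 'infinity'; only the planner's selectivity guess changes.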
Re: [PERFORM] multi column query
You need to run EXPLAIN ANALYZE. Also, what's random_page_cost set to? And the output of \d chkpfw_tr_dy_dimension. The cost for that index scan looks way too high. And please reply-all so that the list is included. -Original Message- From: Sriram Dandapani [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 12, 2006 7:48 PM To: Jim Nasby Subject: RE: [PERFORM] multi column query I executed enable_seqscan=off and then ran an explain plan on the query

UPDATE chkpfw_tr_dy_dimension SET summcount = a.summcount + b.summcount, bytes = a.bytes + b.bytes, duration = a.duration + b.duration from chkpfw_tr_dy_dimension a, c_chkpfw_dy_tr_updates b WHERE a.firstoccurrence = b.firstoccurrence AND a.customerid_id = b.customerid_id AND a.sentryid_id = b.sentryid_id AND a.node_id = b.node_id AND a.interface_id = b.interface_id AND a.source_id = b.source_id AND a.destination_id = b.destination_id AND a.sourceport_id = b.sourceport_id AND a.destinationport_id = b.destinationport_id AND a.inoutbound_id = b.inoutbound_id AND a.action_id = b.action_id AND a.protocol_id = b.protocol_id AND a.service_id = b.service_id AND a.sourcezone_id = b.sourcezone_id AND a.destinationzone_id = b.destinationzone_id;

Here is the query plan

Nested Loop (cost=20036.18..221851442.39 rows=1 width=166)
  -> Merge Join (cost=10036.18..121620543.75 rows=1 width=96)
       Merge Cond: (("outer".firstoccurrence = "inner".firstoccurrence) AND ("outer".sentryid_id = "inner".sentryid_id) AND ("outer".node_id = "inner".node_id))
       Join Filter: (("outer".customerid_id = "inner".customerid_id) AND ("outer".interface_id = "inner".interface_id) AND ("outer".source_id = "inner".source_id) AND ("outer".destination_id = "inner".destination_id) AND ("outer".sourceport_id = "inner".s (..)
       -> Index Scan using chkpfw_tr_dy_idx1 on chkpfw_tr_dy_dimension a (cost=0.00..21573372.84 rows=6281981 width=88)
       -> Sort (cost=10036.18..10037.38 rows=480 width=136)
            Sort Key: b.firstoccurrence, b.sentryid_id, b.node_id
            -> Seq Scan on c_chkpfw_dy_tr_updates b (cost=1.00..10014.80 rows=480 width=136)
  -> Seq Scan on chkpfw_tr_dy_dimension (cost=1.00..100168078.81 rows=6281981 width=70)

-Original Message- From: Jim C. Nasby [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 12, 2006 5:44 PM To: Sriram Dandapani Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] multi column query On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote: Hi When I update a table that has 20 columns and the where clause includes 16 of the columns (this is a data warehousing type update on aggregate fields), the bitmap scan is not used by the optimizer. The table is indexed on 3 of the 20 fields. The update takes really long to finish (on a 6 million row table). Do I need to do some magic with configuration to turn on bitmap scans? No. What's explain analyze of the query show? What's it doing now? Seqscan? You might try set enable_seqscan=off and see what that does. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] pgmemcache
Christian Storm [EMAIL PROTECTED] writes: Not sure if I follow why this is a problem. Seems like it would be beneficial to have both BEFORE and AFTER COMMIT triggers. With the BEFORE COMMIT trigger you would have the ability to 'un- commit' (rollback) the transaction. With the AFTER COMMIT trigger you wouldn't have that option because the commit has already been successful. However, with an AFTER COMMIT you would be able to trigger other downstream events that rely on a transaction successfully committing. An AFTER COMMIT trigger would have to be in a separate transaction. What happens if there's more than one, and one of them fails? Even more to the point, if it's a separate transaction, don't you have to fire all these triggers again when you commit that transaction? The idea seems circular. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] bad performance on Solaris 10
Bruce Momjian wrote On 04/13/06 01:39 AM,: Yes, if someone wants to give us a clear answer on which wal_sync method is best on all versions of Solaris, we can easily make that change. We're doing tests to see how various parameters in postgresql.conf affect performance on Solaris and will share the results shortly. Regards, -Robert ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Blocks read for index scans
While working on determining a good stripe size for a database, I realized it would be handy to know what the average request size is. Getting this info is a simple matter of joining pg_stat_all_tables and pg_statio_all_tables and doing some math, but there's one issue I've found; it appears that there's no information on how many heap blocks were read in by an index scan. Is there any way to get that info? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
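The join Jim describes might look something like the sketch below. The exact arithmetic one wants is open to debate (and, as the follow-up notes, heap blocks read on behalf of index scans are not broken out separately), so this is only an illustration of the stats-view join, not a definitive metric:

```sql
-- Rough "average blocks read per scan" per table, from the stats views.
SELECT s.relname,
       io.heap_blks_read + COALESCE(io.idx_blks_read, 0) AS blks_read,
       s.seq_scan + s.idx_scan                           AS scans,
       (io.heap_blks_read + COALESCE(io.idx_blks_read, 0))::numeric
         / NULLIF(s.seq_scan + s.idx_scan, 0)            AS avg_blks_per_scan
  FROM pg_stat_all_tables   s
  JOIN pg_statio_all_tables io USING (relid)
 WHERE s.seq_scan + s.idx_scan > 0;
```

This only counts blocks read from disk (the `_read` columns); adding the `_hit` columns would give total block requests including buffer-cache hits.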
Re: [PERFORM] pgmemcache
On Apr 13, 2006, at 12:38 PM, Tom Lane wrote: Christian Storm [EMAIL PROTECTED] writes: Not sure if I follow why this is a problem. Seems like it would be beneficial to have both BEFORE and AFTER COMMIT triggers. With the BEFORE COMMIT trigger you would have the ability to 'un-commit' (rollback) the transaction. With the AFTER COMMIT trigger you wouldn't have that option because the commit has already been successful. However, with an AFTER COMMIT you would be able to trigger other downstream events that rely on a transaction successfully committing. An AFTER COMMIT trigger would have to be in a separate transaction. What happens if there's more than one, and one of them fails? Even more to the point, if it's a separate transaction, don't you have to fire all these triggers again when you commit that transaction? The idea seems circular. I suspect that in reality you'd probably want each on-commit trigger to be its own transaction, but it depends on what you're doing. Also, I can't see any use for them where you'd actually be interacting with the database, only if you were calling something externally via a function. One example would be sending an email out when a certain table changes; in many cases it's better to let the change happen even if the email can't be sent, and you'd rather not send an email if the transaction just ends up rolling back for some reason. And yes, you'd have to ensure you didn't code yourself up a trigger loop. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Inserts optimization?
Chris writes: If you can, use copy instead: http://www.postgresql.org/docs/8.1/interactive/sql-copy.html I am familiar with copy. Can't use it in this scenario. The data is coming from a program called Bacula (Backup server). It is not static data. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Blocks read for index scans
Adding -performance back in... From: Steve Poe [mailto:[EMAIL PROTECTED] Jim, I could be way off, but doesn't pg_statio_user_tables contain this information? http://www.postgresql.org/docs/8.1/interactive/monitoring-stats.html#MONITORING-STATS-VIEWS states: numbers of disk blocks read and buffer hits in all indexes of that table That leads me to believe that it's only tracking index blocks read, and not heap blocks read. One could presume that each index row read as reported by pg_stat_all_tables would represent a heap block read, but a large number of those would (hopefully) have already been in shared_buffers. On Thu, 2006-04-13 at 13:00 -0500, Jim Nasby wrote: While working on determining a good stripe size for a database, I realized it would be handy to know what the average request size is. Getting this info is a simple matter of joining pg_stat_all_tables and pg_statio_all_tables and doing some math, but there's one issue I've found; it appears that there's no information on how many heap blocks were read in by an index scan. Is there any way to get that info? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Inserts optimization?
Tom Lane writes: Or at least try to do multiple inserts per transaction. Will see if the program has an option like that. Also, increasing checkpoint_segments and possibly wal_buffers helps a lot. Will try those. Try to get the WAL onto a separate disk spindle if you can. (These things don't matter for SELECTs, but they do matter for writes.) This particular server is pretty much what I inherited for now for this project, and it's RAID 5. There is a new server I am setting up soon... 8 disks, which we are planning to set up as 6 disks in RAID 10 plus 2 hot spares. In RAID 10, would it matter that the WAL is in the same RAID set? Would it be better: 4 disks in RAID 10 for data, 2 disks in RAID 1 for WAL, 2 hot spares - all on the same RAID controller? ---(end of broadcast)--- TIP 6: explain analyze is your friend
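Tom's first suggestion - multiple inserts per transaction - is the simplest of the three to apply; the table and values below are hypothetical. Each standalone INSERT is its own transaction and pays for its own WAL flush at commit, so batching amortizes that cost:

```sql
BEGIN;
INSERT INTO backup_files (jobid, path) VALUES (1, '/etc/hosts');
INSERT INTO backup_files (jobid, path) VALUES (1, '/etc/passwd');
-- ... many more rows ...
COMMIT;  -- one WAL flush covers the whole batch
```

The trade-off is that an error in any statement rolls back the whole batch, so batch sizes are usually kept moderate.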
Re: [PERFORM] bad performance on Solaris 10
On 4/12/06, Josh Berkus josh@agliodbs.com wrote: People, Lately I find people are not so receptive to VxFS, and Sun is promoting ZFS, and we don't have a reasonable near-term option for raw I/O in Postgres, so we need to work to find a reasonable path for Solaris users IMO. The long delays in ZFS production haven't helped us there, as the problems with UFS are severe. I just recently worked with Sun Solaris 10 and found it to be reasonably performant without much tuning. This was on a dual-SPARC Sun Blade workstation which I felt was very well engineered. I was able (with zero Solaris experience) to get postgresql up and crunching away at some really data-intensive tasks while running an application compiled with their very excellent Fortran compiler. In the enterprise world I am finding that the only Linux distributions supported are Red Hat and SUSE, meaning if you have a problem with your SAN running against your Gentoo box you have a serious problem. Solaris, OTOH, is generally very well supported (especially on Sun hardware) and is free. So I give Sun great credit for providing a free, if not necessarily completely open, platform for developing open source applications in an enterprise environment. Merlin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] multi column query
Hi Jim, The problem is fixed. The destination table that was being updated had 3 separate indexes. I combined them into a multi-column index and the effect was amazing. Thanks for your input. Sriram -Original Message- From: Jim Nasby [mailto:[EMAIL PROTECTED] Sent: Thursday, April 13, 2006 9:42 AM To: Sriram Dandapani Cc: Pgsql-Performance (E-mail) Subject: RE: [PERFORM] multi column query You need to run EXPLAIN ANALYZE. Also, what's random_page_cost set to? And the output of \d chkpfw_tr_dy_dimension. The cost for that index scan looks way too high. And please reply-all so that the list is included. -Original Message- From: Sriram Dandapani [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 12, 2006 7:48 PM To: Jim Nasby Subject: RE: [PERFORM] multi column query I executed enable_seqscan=off and then ran an explain plan on the query

UPDATE chkpfw_tr_dy_dimension SET summcount = a.summcount + b.summcount, bytes = a.bytes + b.bytes, duration = a.duration + b.duration from chkpfw_tr_dy_dimension a, c_chkpfw_dy_tr_updates b WHERE a.firstoccurrence = b.firstoccurrence AND a.customerid_id = b.customerid_id AND a.sentryid_id = b.sentryid_id AND a.node_id = b.node_id AND a.interface_id = b.interface_id AND a.source_id = b.source_id AND a.destination_id = b.destination_id AND a.sourceport_id = b.sourceport_id AND a.destinationport_id = b.destinationport_id AND a.inoutbound_id = b.inoutbound_id AND a.action_id = b.action_id AND a.protocol_id = b.protocol_id AND a.service_id = b.service_id AND a.sourcezone_id = b.sourcezone_id AND a.destinationzone_id = b.destinationzone_id;

Here is the query plan

Nested Loop (cost=20036.18..221851442.39 rows=1 width=166)
  -> Merge Join (cost=10036.18..121620543.75 rows=1 width=96)
       Merge Cond: (("outer".firstoccurrence = "inner".firstoccurrence) AND ("outer".sentryid_id = "inner".sentryid_id) AND ("outer".node_id = "inner".node_id))
       Join Filter: (("outer".customerid_id = "inner".customerid_id) AND ("outer".interface_id = "inner".interface_id) AND ("outer".source_id = "inner".source_id) AND ("outer".destination_id = "inner".destination_id) AND ("outer".sourceport_id = "inner".s (..)
       -> Index Scan using chkpfw_tr_dy_idx1 on chkpfw_tr_dy_dimension a (cost=0.00..21573372.84 rows=6281981 width=88)
       -> Sort (cost=10036.18..10037.38 rows=480 width=136)
            Sort Key: b.firstoccurrence, b.sentryid_id, b.node_id
            -> Seq Scan on c_chkpfw_dy_tr_updates b (cost=1.00..10014.80 rows=480 width=136)
  -> Seq Scan on chkpfw_tr_dy_dimension (cost=1.00..100168078.81 rows=6281981 width=70)

-Original Message- From: Jim C. Nasby [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 12, 2006 5:44 PM To: Sriram Dandapani Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] multi column query On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote: Hi When I update a table that has 20 columns and the where clause includes 16 of the columns (this is a data warehousing type update on aggregate fields), the bitmap scan is not used by the optimizer. The table is indexed on 3 of the 20 fields. The update takes really long to finish (on a 6 million row table). Do I need to do some magic with configuration to turn on bitmap scans? No. What's explain analyze of the query show? What's it doing now? Seqscan? You might try set enable_seqscan=off and see what that does. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
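The fix Sriram describes - replacing the three single-column indexes with one multi-column index - would look roughly like the following. The index name and the choice of leading columns are hypothetical, chosen to match the merge-join keys in the plan above:

```sql
-- One composite index whose leading columns match the join's equality
-- conditions lets a single index scan do the work that previously required
-- combining three separate single-column indexes.
CREATE INDEX chkpfw_tr_dy_multi_idx
    ON chkpfw_tr_dy_dimension (firstoccurrence, sentryid_id, node_id);
ANALYZE chkpfw_tr_dy_dimension;
```

Column order matters: the most selective (or always-constrained) columns should lead, since the index can only be range-scanned on a leading prefix.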
Re: [PERFORM] pgmemcache
> An AFTER COMMIT trigger would have to be in a separate transaction.

I guess AFTER COMMIT triggers would be like a NOTIFY, but more powerful. While NOTIFY can't transmit information to another process, this trigger could, and the other process could then view the results of the committed transaction. Also, implementing a long process (something involving network round-trips, for instance) in a BEFORE COMMIT trigger would delay the transaction and any locks it holds, with no benefit.

> What happens if there's more than one, and one of them fails?

Each one in its own transaction?

> Even more to the point, if it's a separate transaction, don't you
> have to fire all these triggers again when you commit that
> transaction? The idea seems circular.

I guess AFTER COMMIT triggers are most useful when coupled to a trigger on a modification to a table. So, the before/after commit could be an attribute of an AFTER INSERT/UPDATE/DELETE trigger. If the AFTER COMMIT trigger doesn't do any modifications to the target table, there will be no infinite loop.

The before/after commit could also be implemented not via triggers, but via deferred actions, by telling postgres to execute a specific query just before/after the transaction commits. This could be used to implement the triggers, but would also be more generic: a trigger on INSERT could then defer a call to a memcache update once the transaction is committed. It gets lisp-ish, but it would be really cool.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
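The NOTIFY-based variant alluded to above can be sketched in plpgsql. The table name, channel name, and function name here are made up for illustration; the point is only that NOTIFY is delivered when (and only when) the transaction commits, so a listening process effectively gets after-commit behavior:

```sql
-- Hypothetical sketch: a row trigger queues a NOTIFY; the notification
-- reaches listeners only if the transaction commits, so the listener
-- can safely refresh memcache from the committed rows.
CREATE FUNCTION flag_cache_dirty() RETURNS trigger AS $$
BEGIN
    NOTIFY cache_dirty;   -- listener re-reads the committed data
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_cache_notify
    AFTER INSERT OR UPDATE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE flag_cache_dirty();
```

A separate process that has issued LISTEN cache_dirty then performs the cache update after the commit becomes visible, which sidesteps the circularity problem raised above because the listener runs outside any trigger machinery.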
Re: [PERFORM] pgmemcache
Tom Lane [EMAIL PROTECTED] writes:
> Christian Storm [EMAIL PROTECTED] writes:
>> Not sure if I follow why this is a problem. Seems like it would be
>> beneficial to have both BEFORE and AFTER COMMIT triggers. With the
>> BEFORE COMMIT trigger you would have the ability to 'un-commit'
>> (roll back) the transaction. With the AFTER COMMIT trigger you
>> wouldn't have that option because the commit has already been
>> successful. However, with an AFTER COMMIT you would be able to
>> trigger other downstream events that rely on a transaction
>> successfully committing.
>
> An AFTER COMMIT trigger would have to be in a separate transaction.
> What happens if there's more than one, and one of them fails? Even
> more to the point, if it's a separate transaction, don't you have to
> fire all these triggers again when you commit that transaction? The
> idea seems circular.

Maybe it just means they would have to be limited to not making any database modifications. I.e., all they can do is notify the outside world that the transaction committed. Presumably if you wanted to make any database modifications you would just do them in the transaction anyway, since they wouldn't show up until the transaction commits.

-- greg

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Blocks read for index scans
Jim Nasby [EMAIL PROTECTED] writes:
> While working on determining a good stripe size for a database, I
> realized it would be handy to know what the average request size is.
> Getting this info is a simple matter of joining pg_stat_all_tables
> and pg_statio_all_tables and doing some math, but there's one issue
> I've found; it appears that there's no information on how many heap
> blocks were read in by an index scan. Is there any way to get that
> info?

If the table is otherwise idle, the change in the table's entry in pg_statio_all_tables should do, no? (This is as of 8.1 ... older versions acted a bit differently IIRC.)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend
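Under Tom's caveat (table otherwise idle, 8.1 statistics semantics), the join Jim describes might look something like the sketch below. The column arithmetic is an illustration of "doing some math," not an official view, and assumes the counters were reset or noted before the workload of interest:

```sql
-- Rough sketch: blocks touched via indexes vs. heap, per table,
-- alongside the number of index scans that drove them.
SELECT s.schemaname,
       s.relname,
       s.idx_scan,
       io.idx_blks_read + io.idx_blks_hit   AS index_blocks,
       io.heap_blks_read + io.heap_blks_hit AS heap_blocks
  FROM pg_stat_all_tables s
  JOIN pg_statio_all_tables io USING (relid)
 WHERE s.idx_scan > 0;
```

Dividing the block counts by idx_scan then gives an average request size per index scan, which is the figure relevant to choosing a stripe size.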
Re: [PERFORM] pgmemcache
PFC [EMAIL PROTECTED] writes:
> I guess AFTER COMMIT triggers would be like a NOTIFY, but more
> powerful.

I'll let you in on a secret: NOTIFY is actually a before-commit operation. This is good enough because it never, or hardly ever, fails. I would argue that anything you want to do in an AFTER COMMIT trigger could just as well be done in a BEFORE COMMIT trigger; if that's not reliable enough then you need to fix your trigger.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [PERFORM] pg 7.4.x - pg_restore impossibly slow
Hi Tom, et al.,

So I changed the following settings in postgresql.conf and restarted PostgreSQL, then proceeded with pg_restore:

# new changes for this test-run
log_statement = true
sort_mem = 10240       # default 1024
vacuum_mem = 20480     # default 8192
# from before
checkpoint_segments = 10
log_pid = true
log_timestamp = true

With these settings and running:

pg_restore -vaOd dbname dbname.DUMP

things seemed to progress better. The first of the large tables got COPY'ed within 1 hr 40 min:

start:  2006-04-13 11:44:19
finish: 2006-04-13 13:25:36

I ended up ctrl-C'ing out of the pg_restore as the second large table was taking over 3 hours, and the last PostgreSQL log entry was from over 2.5 hrs ago, with the message:

2006-04-13 14:09:29 [3049] LOG: recycled transaction log file 0006006B

Time for something different. Before attempting the same procedure with fsync off, I ran the following sequence of commands:

$ dropdb dbname
$ createdb dbname
$ pg_restore -vsOd dbname dbname.DUMP
$ date > db.restore ; pg_restore -vcOd dbname \
    dbname.DUMP ; date >> db.restore
$ cat db.restore
Thu Apr 13 18:02:51 PDT 2006
Thu Apr 13 18:17:16 PDT 2006

That's just over 14 minutes! Ideas? Is this because the -c option drops all foreign keys and so the restore goes faster? Should this be the preferred, recommended and documented method to run pg_restore? Any drawbacks to this method?

Thanks,
--patrick

On 4/12/06, Tom Lane [EMAIL PROTECTED] wrote:
> patrick keshishian [EMAIL PROTECTED] writes:
>> My dev box is much slower hardware than the customer's server. Even
>> with that difference I expected to be able to pg_restore the
>> database within one day. But no.
>
> Seems a bit odd. Can you narrow down more closely which step of the
> restore is taking the time? (Try enabling log_statement.) One
> thought is that kicking up work_mem and vacuum_mem is likely to help
> for some steps (esp. CREATE INDEX and foreign-key checking). And be
> sure you've done the usual tuning for write-intensive activity, such
> as bumping up checkpoint_segments. Turning off fsync wouldn't be a
> bad idea either.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] pg 7.4.x - pg_restore impossibly slow
patrick keshishian [EMAIL PROTECTED] writes:
> With these settings and running:
> pg_restore -vaOd dbname dbname.DUMP

If you had mentioned you were using random nondefault switches, we'd have told you not to. -a in particular is a horrid idea performancewise --- a standard schema-plus-data restore goes way faster because it's doing index builds and foreign key checks wholesale instead of incrementally.

> Is this because the -c option drops all foreign keys and so the
> restore goes faster? Should this be the preferred, recommended and
> documented method to run pg_restore?

It is documented in recent versions of the documentation: see http://www.postgresql.org/docs/8.1/static/populate.html particularly the last section.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
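Putting Tom's advice together, the fast path patrick stumbled onto amounts to a plain schema-plus-data restore into a fresh database. A sketch of that procedure (database and dump-file names are placeholders, and this assumes the bulk-load tuning above is already in postgresql.conf):

```
# Sketch of the fast restore path: drop and recreate the database,
# then restore schema and data together so index builds and foreign
# key checks happen wholesale, not row by row.
dropdb dbname
createdb dbname
pg_restore -vOd dbname dbname.DUMP    # note: no -a (data-only), no -s (schema-only)
```

The separate -vsOd (schema-only) step followed by -vcOd in patrick's timing run achieves much the same effect, since -c drops and recreates the objects; starting from an empty database just makes the intent explicit.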