Re: [PERFORM] Slow query - possible bug?
Tom Lane wrote:
> Gavin Hamill <[EMAIL PROTECTED]> writes:
>> If I replace the (allocation0_.Date between '2006-06-09 00:00:00.00'
>> and '2006-06-09 00:00:00.00') with allocation0_.Date = '2006-04-09
>> 00:00:00.00' then the query comes back in a few milliseconds (as I'd
>> expect :)
>
> Could we see EXPLAIN ANALYZE for
>
> * both forms of the date condition, with the roomid condition;
> * both forms of the date condition, WITHOUT the roomid condition;
> * just the roomid condition
>
> I'm thinking the planner is misestimating something, but it's hard to
> tell what without breaking it down.

Of course. In each case, I have changed the date by two weeks to try and
minimise the effect of any query caching. The base query is

explain analyse
select allocation0_.ID as y1_, allocation0_.RoomID as y2_,
       allocation0_.StatusID as y4_, allocation0_.Price as y3_,
       allocation0_.Number as y5_, allocation0_.Date as y6_
from Allocation allocation0_
where

now with both forms of the Date condition:

a) (allocation0_.Date between '2006-04-25 00:00:00.00' and '2006-04-25 00:00:00.00') and (allocation0_.RoomID in (211800));

                              QUERY PLAN
---------------------------------------------------------------------
 Index Scan using ix_date on Allocation allocation0_
     (cost=0.00..4.77 rows=1 width=34)
     (actual time=3253.340..48040.396 rows=1 loops=1)
   Index Cond: ((Date >= '2006-04-25'::date) AND (Date <= '2006-04-25'::date))
   Filter: (RoomID = 211800)
 Total runtime: 48040.451 ms

(ouch!)

b) (allocation0_.Date = '2006-05-10 00:00:00.00'::date) and (allocation0_.RoomID in (211800));

                              QUERY PLAN
---------------------------------------------------------------------
 Index Scan using ix_dateroom on Allocation allocation0_
     (cost=0.00..5.01 rows=1 width=34)
     (actual time=0.033..0.035 rows=1 loops=1)
   Index Cond: ((RoomID = 211800) AND (Date = '2006-05-10'::date))
 Total runtime: 0.075 ms

(whoosh!)
And now without the RoomID condition:

a) (allocation0_.Date between '2006-06-10 00:00:00.00' and '2006-06-10 00:00:00.00');

                              QUERY PLAN
---------------------------------------------------------------------
 Index Scan using ix_date on Allocation allocation0_
     (cost=0.00..4.77 rows=1 width=34)
     (actual time=0.035..6706.467 rows=34220 loops=1)
   Index Cond: ((Date >= '2006-06-10'::date) AND (Date <= '2006-06-10'::date))
 Total runtime: 6728.743 ms

b) (allocation0_.Date = '2006-05-25 00:00:00.00'::date);

                              QUERY PLAN
---------------------------------------------------------------------
 Bitmap Heap Scan on Allocation allocation0_
     (cost=87.46..25017.67 rows=13845 width=34)
     (actual time=207.674..9702.656 rows=34241 loops=1)
   Recheck Cond: (Date = '2006-05-25'::date)
   ->  Bitmap Index Scan on ix_date
         (cost=0.00..87.46 rows=13845 width=0)
         (actual time=185.086..185.086 rows=42705 loops=1)
         Index Cond: (Date = '2006-05-25'::date)
 Total runtime: 9725.470 ms

Wow, I'm not really sure what that tells me...

Cheers,
Gavin.

---(end of broadcast)---
TIP 6: explain analyze is your friend
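A minimal sketch of the workaround visible in the plans above: when both BETWEEN bounds are equal, rewriting the range as a plain equality lets the planner use the composite (RoomID, Date) index. Table and column names are taken from the query above; the trimmed select list is only illustrative.

```sql
-- Slow form: the degenerate BETWEEN is planned as two inequalities on
-- the single-column ix_date index, so RoomID is applied only as a filter.
SELECT ID, RoomID, Price
FROM Allocation
WHERE Date BETWEEN '2006-04-25' AND '2006-04-25'
  AND RoomID IN (211800);

-- Fast form: an equality on Date can be combined with the RoomID
-- condition in the composite ix_dateroom (RoomID, Date) index.
SELECT ID, RoomID, Price
FROM Allocation
WHERE Date = '2006-04-25'::date
  AND RoomID = 211800;
```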
Re: [PERFORM] Migration study, step 2: rewriting queries
> This should be fixed by the changes I made recently in
> choose_bitmap_and --- it wasn't being aggressive about pruning
> overlapping AND conditions when a sub-OR was involved. It's possible
> the new coding is *too* aggressive, and will reject indexes that it'd
> be profitable to include; but at least it won't make this particular
> mistake.

Ok, cool. I don't have time to test this right now as the project has to
move on (and I guess testing the fix would require a dump + build CVS
version + restore), but as a temporary workaround I simply dropped the
xda_dat index (all queries on that table include the
person_information__id column anyway).

- Mikael

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Inserts optimization?
Hi, Francisco,

Francisco Reyes wrote:
> I only wonder what is safer.. using a second or two in commit_delay or
> using fsync = off.. Anyone cares to comment?

It might be that you misunderstood commit_delay. It will not only delay
the disk write, but also block your connection until the write is
actually performed.

It will raise the throughput in multi-client scenarios, but it will also
raise the latency, and it brings absolutely no speedup in single-client
scenarios.

It does not decrease safety (unlike fsync=off): data will stay
consistent, and any application that has successfully finished a commit
can be sure its data is on the platters.[1]

HTH,
Markus

[1] As long as the platters don't lie, but that's another subject.

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
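To make the distinction concrete, here is a postgresql.conf sketch (8.1-era parameter names; the values are illustrative, not recommendations):

```
# Safe: wait briefly before flushing WAL, so several concurrent
# commits can share a single fsync. Each committing backend blocks for
# the delay, so this helps only with many simultaneous clients.
commit_delay = 10000        # microseconds
commit_siblings = 5         # delay only if >= 5 other tx are active

# Unsafe: skip WAL flushes entirely. Much faster, but a crash or power
# loss can lose committed data or corrupt the database.
#fsync = off
```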
Re: [PERFORM] mergehashloop
Hi, Tom,

Tom Lane wrote:
> Well, the other thing that's going on here is that we know we are
> overestimating the cost of nestloop-with-inner-indexscan plans. The
> current estimation for that is basically outer scan cost plus N times
> inner scan cost, where N is the estimated number of outer tuples; in
> other words the repeated indexscan probes are each assumed to happen
> from a cold start. In reality, caching of the upper levels of the
> index means that the later index probes are much cheaper than this
> model thinks. We've known about this for some time but no one's yet
> proposed a more reasonable cost model.

My spontaneous guess would be to use log(N)*inner instead of N*inner. I
don't have any backing for that; it's just what my intuition tells me
as a first shot.

> In my mind this is tied into another issue, which is that the planner
> always costs on the basis of each query starting from zero. In a real
> environment it's much cheaper to use heavily-used indexes than this
> cost model suggests, because they'll already be swapped in due to use
> by previous queries. But we haven't got any infrastructure to keep
> track of what's been heavily used, let alone a cost model that could
> make use of the info.

An easy first approach would be to add a user-tunable cache probability
value to each index (and possibly table) between 0 and 1, and then
simply multiply random_page_cost by (1 - that value) for each scan.
Later, this value could be tuned automatically by stats analysis or
other means.

> I think part of the reason that people commonly reduce
> random_page_cost to values much lower than physical reality would
> suggest is that it provides a crude way of partially compensating for
> this basic problem.

I totally agree with this; it's just what we did here from time to
time. :-)

Hmm, how does effective_cache_size correspond with it? Shouldn't a high
effective_cache_size have a similar effect?

Thanks,
Markus

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Inserts optimization?
> For now, I only could get good performance with bacula and postgresql
> when disabling fsync...

> Isn't that less safe?

Most definitely. FWIW, I'm getting pretty good speeds with Bacula and
PostgreSQL on a reasonably small db (file table about 40 million rows,
filename about 5.2 million and path 1.5 million).

Config changes: increasing shared mem and work mems, fsm pages,
wal_sync_method=fdatasync, wal_buffers=16, checkpoint_segments=8,
default_with_oids=off (before creating the bacula tables, so they don't
use oids). Used to run with full_page_writes=off, but not anymore since
it's not safe. Also planning to check commit_delay and see if that
helps.

> I will try to avoid 2 or more machines backing up at the same time..
> plus in a couple of weeks I should have a better machine for the DB
> anyways..

Bacula already serializes access to the database (they have to support
mysql/myisam), so this shouldn't help. Actually, it might well hurt by
introducing extra delays.

> I only wonder what is safer.. using a second or two in commit_delay or
> using fsync = off.. Anyone cares to comment?

Absolutely a commit_delay.

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
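The changes described above, as a postgresql.conf sketch (8.1-era parameter names). The message gives no exact sizes for the memory settings, so those numbers are placeholders:

```
# Sketch of the configuration described above; memory sizes are
# placeholders, not values from the original message.
shared_buffers = 50000            # "increasing shared mem"
work_mem = 16384                  # "work mems", in kB
max_fsm_pages = 200000            # "fsm pages"
wal_sync_method = fdatasync
wal_buffers = 16
checkpoint_segments = 8
default_with_oids = off           # set before creating the Bacula tables
full_page_writes = on             # was off; re-enabled since that's unsafe
```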
[PERFORM] SELECT FOR UPDATE performance is bad
For the purpose of the application I need to establish some form of
serialization, therefore I use FOR UPDATE. The query, inside the
function, is like this:

pulitzer2=# explain analyze select id FROM messages JOIN ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

                              QUERY PLAN
---------------------------------------------------------------------
 Hash Join (cost=32131.04..34281.86 rows=627 width=16) (actual time=742.806..1491.864 rows=58005 loops=1)
   Hash Cond: ("outer".message_id = "inner".id)
   ->  Seq Scan on ticketing_codes_played (cost=0.00..857.17 rows=57217 width=10) (actual time=0.024..209.331 rows=58005 loops=1)
   ->  Hash (cost=32090.60..32090.60 rows=16177 width=10) (actual time=742.601..742.601 rows=65596 loops=1)
         ->  Bitmap Heap Scan on messages (cost=4153.51..32090.60 rows=16177 width=10) (actual time=160.555..489.459 rows=65596 loops=1)
               Recheck Cond: ((service_id = 1102) AND (receiving_time >= '2006-03-01 00:00:00+01'::timestamp with time zone) AND (receiving_time <= '2006-06-30 00:00:00+02'::timestamp with time zone))
               ->  BitmapAnd (cost=4153.51..4153.51 rows=16177 width=0) (actual time=156.900..156.900 rows=0 loops=1)
                     ->  Bitmap Index Scan on idx_service_id (cost=0.00..469.31 rows=68945 width=0) (actual time=16.661..16.661 rows=66492 loops=1)
                           Index Cond: (service_id = 1102)
                     ->  Bitmap Index Scan on idx_messages_receiving_time (cost=0.00..3683.95 rows=346659 width=0) (actual time=137.526..137.526 rows=360754 loops=1)
                           Index Cond: ((receiving_time >= '2006-03-01 00:00:00+01'::timestamp with time zone) AND (receiving_time <= '2006-06-30 00:00:00+02'::timestamp with time zone))
 Total runtime: 6401.954 ms
(12 rows)

Now, this query takes between 8 and 30 seconds, which is a lot, since
during the day we have almost 20 requests per minute. I notice that
during the execution of the above mentioned query the i/o goes berserk;
iostat tells me that the load is around 60%. I tried playing with the
WAL configuration parameters, even put the log on separate disk
spindles, but it did nothing.
Shall I reconsider the need for the exact lock I developed, or is there
something more I could do to speed things up?

Mario
-- 
Mario Splivalo
Mob-Art
[EMAIL PROTECTED]

"I can do it quick, I can do it cheap, I can do it well. Pick any two."

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] SELECT FOR UPDATE performance is bad
Mario Splivalo <[EMAIL PROTECTED]> writes:
> For the purpose of the application I need to establish some form of
> serialization, therefore I use FOR UPDATE. The query, inside the
> function, is like this:

> pulitzer2=# explain analyze select id FROM messages JOIN
> ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND
> receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

> Hash Join (cost=32131.04..34281.86 rows=627 width=16) (actual time=742.806..1491.864 rows=58005 loops=1)
                                                                                          ^^^^^
> Now, this query takes between 8 and 30 seconds, which is a lot, since
> during the day we have almost 20 requests per minute.

Acquiring a row lock separately for each of 58000 rows is not going to
be a cheap operation, especially not if anyone else is locking any of
the same rows and thereby blocking you. If there is concurrent locking,
you're also running a big risk of deadlock, because two processes might
try to lock the same rows in different orders.

Are you really intending to update all 58000 rows? If not, what is the
serialization requirement exactly (i.e., what are you trying to
accomplish)? Seems like something about this app needs to be
redesigned.

regards, tom lane
Re: [PERFORM] Migration study, step 2: rewriting queries
Mikael Carneholm <[EMAIL PROTECTED]> writes:
> Ok, cool. I don't have time to test this right now as the project has
> to move on (and I guess testing the fix would require a dump + build
> CVS version + restore), but as a temporary workaround I simply dropped
> the xda_dat index (all queries on that table include the
> person_information__id column anyway).

The patch is in the 8.1 branch, so you don't need dump/restore anyway...

regards, tom lane
[PERFORM] Problem with LIKE-Performance
Hi!

I am having trouble with LIKE statements on one of my tables. I already
tried a VACUUM and ANALYZE, but with no success. The database is
PostgreSQL Database Server 8.1.3 on i686-pc-mingw32.

I get the following explain and I am troubled by the very high
startup_cost ... does anyone have any idea why that value is so high?

{SEQSCAN :startup_cost 100000000.00 :total_cost 100021432.33 :plan_rows 1 :plan_width 1311 :targetlist (
  {TARGETENTRY :expr {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 } :resno 1 :resname image_id :ressortgroupref 0 :resorigtbl 29524 :resorigcol 1 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2 } :resno 2 :resname customer_id :ressortgroupref 0 :resorigtbl 29524 :resorigcol 2 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 3 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3 } :resno 3 :resname theme_id :ressortgroupref 0 :resorigtbl 29524 :resorigcol 3 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 4 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 4 } :resno 4 :resname gallery_id :ressortgroupref 0 :resorigtbl 29524 :resorigcol 4 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 5 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 5 } :resno 5 :resname event_id :ressortgroupref 0 :resorigtbl 29524 :resorigcol 5 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 6 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 6 } :resno 6 :resname width :ressortgroupref 0 :resorigtbl 29524 :resorigcol 6 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 7 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 7 } :resno 7 :resname height :ressortgroupref 0 :resorigtbl 29524 :resorigcol 7 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 8 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 8 } :resno 8 :resname filesize :ressortgroupref 0 :resorigtbl 29524 :resorigcol 8 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 9 :vartype 1114 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 9 } :resno 9 :resname uploadtime :ressortgroupref 0 :resorigtbl 29524 :resorigcol 9 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 10 :vartype 1043 :vartypmod 259 :varlevelsup 0 :varnoold 1 :varoattno 10 } :resno 10 :resname filename :ressortgroupref 0 :resorigtbl 29524 :resorigcol 10 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 11 :vartype 1043 :vartypmod 259 :varlevelsup 0 :varnoold 1 :varoattno 11 } :resno 11 :resname originalfilename :ressortgroupref 0 :resorigtbl 29524 :resorigcol 11 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 12 :vartype 1043 :vartypmod 259 :varlevelsup 0 :varnoold 1 :varoattno 12 } :resno 12 :resname thumbname :ressortgroupref 0 :resorigtbl 29524 :resorigcol 12 :resjunk false }
  {TARGETENTRY :expr {VAR :varno 1 :varattno 13 :vartype 1043 :vartypmod 259 :varlevelsup 0 :varnoold 1 :varoattno 13 }
Re: [PERFORM] Problem with LIKE-Performance
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:pgsql-performance-[EMAIL PROTECTED]] On Behalf Of Tarabas (Manuel Rorarius)
Subject: [PERFORM] Problem with LIKE-Performance

> Hi!
> I am having trouble with like statements on one of my tables.

It looks like you are getting a sequential scan instead of an index
scan. What is your locale setting? As far as I know, Postgres doesn't
support using indexes with LIKE unless you are using the C locale.

Also, in the future you only need to post EXPLAIN ANALYZE, not EXPLAIN
ANALYZE VERBOSE.

Dave
Re: [PERFORM] Problem with LIKE-Performance
Hi Dave,

DD> It looks like you are getting a sequential scan instead of an index
DD> scan. What is your locale setting? As far as I know Postgres doesn't
DD> support using indexes with LIKE unless you are using the C locale.

Actually no, I am using de_DE as the locale, because I need the German
ORDER BY support. But even for a seq-scan it seems pretty slow - though
that's just a feeling. The table currently has ~172,000 rows and is
supposed to grow to about 1 million or more.

Is there any way to speed up the LIKEs with a locale other than C, or
to get an ORDER BY in a different locale while using the default C
locale?

DD> Also, in the future you only need to post EXPLAIN ANALYZE, not
DD> EXPLAIN ANALYZE VERBOSE.

Ok, I will keep that in mind :-) I didn't know how verbose you would
need it *smile*

Best regards,
Manuel

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Problem with LIKE-Performance
Tarabas (Manuel Rorarius) <[EMAIL PROTECTED]> writes:
> I get the following explain and I am troubled by the very high
> startup_cost ... does anyone have any idea why that value is so high?

> {SEQSCAN :startup_cost 100000000.00

You have enable_seqscan = off, no?

Please refrain from posting EXPLAIN VERBOSE unless it's specifically
requested ...

regards, tom lane
Re: [PERFORM] SELECT FOR UPDATE performance is bad
Mario Splivalo <[EMAIL PROTECTED]> writes:
>> If there is concurrent locking, you're also running a big risk of
>> deadlock because two processes might try to lock the same rows in
>> different orders.

> I think there is no risk of a deadlock, since that particular function
> is called from the middleware (functions are used as an interface to
> the database), and the lock order is always the same.

No, you don't even know what the order is, let alone that it's always
the same.

> Now, I just need to have serialization, I need to have clients 'line
> up' in order to perform something in the database. Actually, users are
> sending codes from the newspaper, beer-cans, Cola-cans, and stuff, and
> the database needs to check whether the code has already been played.
> Since the system is designed so that it could run multiple code-games
> (and then similar codes could exist for the coke-game and the
> beer-game), I'm using the messages table to see what code-game
> (i.e. service) a particular code belongs to.

I'd suggest using a table that has exactly one row per code-game, and
doing a SELECT FOR UPDATE on that row to establish the lock you need.
This need not have anything to do with the tables/rows you are actually
intending to update --- although obviously such a convention is pretty
fragile if you have updates coming from a variety of code. I think it's
reasonably safe when you're funneling all the operations through a bit
of middleware.

regards, tom lane
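A minimal sketch of the suggestion above. The game_lock table and its column are hypothetical names invented for illustration:

```sql
-- Hypothetical lock table: exactly one row per code-game.
CREATE TABLE game_lock (
    service_id integer PRIMARY KEY
);
INSERT INTO game_lock VALUES (1102);

-- In the middleware function, serialize all work for one game:
BEGIN;
SELECT service_id FROM game_lock WHERE service_id = 1102 FOR UPDATE;
-- ...check and record the played code here; concurrent callers for the
-- same game block on the row lock above until this transaction commits...
COMMIT;
```

This acquires one row lock per transaction instead of 58000, and transactions for different games don't block each other.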
Re: [PERFORM] Problem with LIKE-Performance
Hi Tom,

TL> Tarabas (Manuel Rorarius) <[EMAIL PROTECTED]> writes:
>> I get the following explain and I am troubled by the very high
>> startup_cost ... does anyone have any idea why that value is so high?
>> {SEQSCAN :startup_cost 100000000.00

TL> You have enable_seqscan = off, no?

You were right. I was testing this and had it removed, but somehow I
must have hit the wrong button in pgAdmin and it was not successfully
removed from the database.

After removing the enable_seqscan = off and making sure it was gone, it
is a lot faster again. Now it takes about 469.841 ms for the select.

TL> Please refrain from posting EXPLAIN VERBOSE unless it's specifically
TL> requested ...

Mea culpa, I will not do that again :-)

Best regards,
Manuel
Re: [PERFORM] Problem with LIKE-Performance
Hi,

I remember something: you need a special index for locales other than C.
You need a different operator class for this index, something like:

CREATE INDEX idx_image_title
  ON image
  USING btree
  (title varchar_pattern_ops);

You can find the details here:
http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

Best regards,

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln
Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: [EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Tarabas (Manuel Rorarius)
Sent: Tuesday, April 18, 2006 4:35 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Problem with LIKE-Performance

> Hi!
> I am having trouble with like statements on one of my tables. I
> already tried a vacuum and analyze but with no success. The database
> is PostgreSQL Database Server 8.1.3 on i686-pc-mingw32. I get the
> following explain and I am troubled by the very high startup_cost ...
> does anyone have any idea why that value is so high?
> [quoted EXPLAIN VERBOSE output snipped - see the original message]
Re: [PERFORM] Problem with LIKE-Performance
Dave Dutcher a écrit :
> It looks like you are getting a sequential scan instead of an index
> scan. What is your locale setting? As far as I know Postgres doesn't
> support using indexes with LIKE unless you are using the C locale.

It does if you create your index this way:

CREATE INDEX idx_image_title
  ON image
  USING btree
  (title varchar_pattern_ops);

Please see http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

Thomas
Re: [bulk] RE: [PERFORM] Problem with LIKE-Performance
Hi Hakan,

HK> I remember something: you need a special index for locales other
HK> than C. You need a different operator class for this index,
HK> something like:

HK> CREATE INDEX idx_image_title
HK>   ON image
HK>   USING btree
HK>   (title varchar_pattern_ops);

I had also forgotten that, thanks a lot for the hint. That sped up my
searches a lot!

Best regards,
Manuel
Re: [PERFORM] Problem with LIKE-Performance
On 18.04.2006, at 17:16 Uhr, Tarabas (Manuel Rorarius) wrote:
> Is there any way to speed the like's up with a different locale than C
> or to get an order by in a different Locale although using the default
> C locale?

Sure. Just create the index with

create index tabname_column_index on tabname (column varchar_pattern_ops);

Then you can use something like

select * from table where column like 'Something%';

Remember that an index can't be used for queries with '%pattern%'.

cug
Re: [bulk] Re: [PERFORM] Problem with LIKE-Performance
Hi Tom,

TL> As already noted, it might be worth your while to add an index using
TL> the pattern-ops opclass to help with queries like this.

I have done that now and it works very fine, as supposed. The problem
with the high startup_costs disappeared somehow after the removal of
enable_seqscan = off and a restart of pgAdmin.

The first time I ran the statement, it showed 13 sec execution time:

 Seq Scan on image image0_ (cost=0.00..21414.21 rows=11 width=1311) (actual time=10504.138..12857.127 rows=119 loops=1)
   Filter: ((title)::text ~~ '%Davorka%'::text)
 Total runtime: 12857.372 ms

The second time I ran the statement, it dropped to ~500 msec, which is
pretty ok. :-)

 Seq Scan on image image0_ (cost=0.00..21414.21 rows=11 width=1311) (actual time=270.289..552.144 rows=119 loops=1)
   Filter: ((title)::text ~~ '%Davorka%'::text)
 Total runtime: 552.708 ms

Best regards,
Manuel Rorarius
Re: [PERFORM] Problem with LIKE-Performance
Tarabas (Manuel Rorarius) <[EMAIL PROTECTED]> writes:
> After removing the enable_seqscan = off and making sure it was gone,
> it is a lot faster again. Now it takes about 469.841 ms for the
> select.

Um, no, enable_seqscan would certainly not have had any effect on the
*actual* runtime of this query. All that enable_seqscan = off really
does is to add a large constant to the estimated cost of any seqscan,
so as to prevent the planner from selecting it unless there is no other
alternative plan available. But that has nothing to do with how long
the seqscan will really run. If you are seeing a speedup in repeated
executions of the same seqscan plan, it's probably just a caching
effect.

As already noted, it might be worth your while to add an index using
the pattern-ops opclass to help with queries like this.

regards, tom lane
Re: [PERFORM] creating of temporary table takes very long
Sriram Dandapani <[EMAIL PROTECTED]> writes:
> Got an explain analyze output.. Here it is:

> Seq Scan on c_chkpfw_hr_tr a (cost=0.00..225975659.89 rows=11000 width=136) (actual time=2.345..648070.474 rows=22001 loops=1)
>   Filter: (subplan)
>   SubPlan
>     ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439 rows=1 loops=22001)
>           Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id))
>           Filter: (($3 = customerid_id) AND (COALESCE($4, 0) = COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id, 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)
>           ->  Bitmap Index Scan on chkpfw_tr_hr_idx1 (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144 rows=33026 loops=22001)
>                 Index Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id))
> Total runtime: 648097.800 ms

That's probably about as good a query plan as you can hope for given
the way the query is written. Those COALESCE comparisons are all
unindexable (unless you make functional indexes on the COALESCE
expressions). You might get somewhere by converting the EXISTS to an
IN, though.

regards, tom lane
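A hedged sketch of both suggestions. The table and column names are taken from the plan above, but the original query text isn't shown, so the exact shape of the subquery is an assumption:

```sql
-- A functional index, so one of the COALESCE comparisons becomes
-- indexable (repeat for the other COALESCE'd columns as needed):
CREATE INDEX chkpfw_tr_hr_dim_iface_idx
    ON chkpfw_tr_hr_dimension (COALESCE(interface_id, 0));

-- Rewriting the correlated EXISTS as an uncorrelated IN, which the
-- planner can execute as one hashed subplan instead of re-probing the
-- inner table once per outer row:
SELECT a.*
FROM c_chkpfw_hr_tr a
WHERE (a.firstoccurrence, a.sentryid_id, a.node_id) IN
      (SELECT b.firstoccurrence, b.sentryid_id, b.node_id
       FROM chkpfw_tr_hr_dimension b);
```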
Re: [PERFORM] creating of temporary table takes very long
Thx Tom.

I guess I have to abandon the bulk update. The columns in the WHERE
clause comprise 80% of the table columns, so indexing them all may not
help. The target table will have on average 60-180 million rows.

I will attempt IN instead of EXISTS and let you know the result.

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 18, 2006 9:10 AM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: Re: [PERFORM] creating of temporary table takes very long

> [quoted EXPLAIN ANALYZE output snipped]

> That's probably about as good a query plan as you can hope for given
> the way the query is written. Those COALESCE comparisons are all
> unindexable (unless you make functional indexes on the COALESCE
> expressions). You might get somewhere by converting the EXISTS to an
> IN, though.

> regards, tom lane
Re: [bulk] Re: [PERFORM] Problem with LIKE-Performance
Tarabas (Manuel Rorarius) wrote:
> Hi Tom,
> TL> As already noted, it might be worth your while to add an index
> TL> using the pattern-ops opclass to help with queries like this.
> I have done that now and it works very fine as supposed. The problem
> with the high startup_costs disappeared somehow after the change of
> the enable_seqscan = off and a restart of pg-admin.

I'm not sure restarting pgAdmin would have had any effect.

> first Time I ran the statement it showed 13 sec execution time.
> Seq Scan on image image0_ (cost=0.00..21414.21 rows=11 width=1311) (actual time=10504.138..12857.127 rows=119 loops=1)
>   Filter: ((title)::text ~~ '%Davorka%'::text)
> Total runtime: 12857.372 ms
> second time I ran the statement it dropped to ~500 msec, which is
> pretty ok. :-)

This will be because all the data is cached in the server's memory.

> Seq Scan on image image0_ (cost=0.00..21414.21 rows=11 width=1311) (actual time=270.289..552.144 rows=119 loops=1)
>   Filter: ((title)::text ~~ '%Davorka%'::text)
> Total runtime: 552.708 ms

As you can see, the plan is still scanning all the rows. In any case,
you've changed the query - this one has % at the beginning and end,
which no index will help you with.

-- 
Richard Huxton
Archonet Ltd
Re: [bulk] Re: [bulk] Re: [PERFORM] Problem with LIKE-Performance
Hi Richard,

RH> As you can see, the plan is still scanning all the rows. In any
RH> case, you've changed the query - this one has % at the beginning and
RH> end, which no index will help you with.

I realize that; the index definitely helped a lot with the query where
the % is just at the end. The time went down to 0.203 ms after I
changed the index to varchar_pattern_ops.

 Index Scan using idx_image_title on image (cost=0.00..6.01 rows=1 width=1311) (actual time=0.027..0.108 rows=33 loops=1)
   Index Cond: (((title)::text ~>=~ 'Davorka'::character varying) AND ((title)::text ~<~ 'Davorkb'::character varying))
   Filter: ((title)::text ~~ 'Davorka%'::text)
 Total runtime: 0.203 ms

Although 13 sec for the first select seems a bit odd, I think once the
database cache on the table kicks in, it should be fine with ~500 ms.

Best regards,
Manuel
Re: [PERFORM] SELECT FOR UPDATE performance is bad
Suppose you have a table codes: ( game_id INT, code TEXT, used BOOL NOT NULL DEFAULT 'f', prize ... ... PRIMARY KEY (game_id, code) ) Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND code=... Then check the rowcount: if one row was updated, the code was not used yet. If no row was updated, the code either did not exist, or was already used. Another option: create a table used_codes like this: ( game_id INT, code TEXT, ... PRIMARY KEY (game_id, code) ) Then, when trying to use a code, INSERT into this table. If you get a constraint violation on the uniqueness of the primary key, your code has already been used. Both solutions have a big advantage: they don't require messing with locks and are extremely simple. The one with UPDATE is IMHO better, because it doesn't abort the current transaction (although you could use a savepoint in the INSERT case to intercept the error). On Tue, 18 Apr 2006 17:33:06 +0200, Tom Lane [EMAIL PROTECTED] wrote: Mario Splivalo [EMAIL PROTECTED] writes: If there is concurrent locking, you're also running a big risk of deadlock because two processes might try to lock the same rows in different orders. I think there is no risk of a deadlock, since that particular function is called from the middleware (functions are used as interface to the database), and the lock order is always the same. No, you don't even know what the order is, let alone that it's always the same. Now, I just need to have serialization, I need to have clients 'line up' in order to perform something in the database. Actually, users are sending codes from the newspaper, beer-cans, Cola-cans, and stuff, and the database needs to check whether the code has already been played. Since the system is designed so that it could run multiple code-games (and then similar codes could exist for the coke-game and the beer-game), I'm using the messages table to see what code-game (i.e. service) a particular code belongs to. 
I'd suggest using a table that has exactly one row per code-game, and doing a SELECT FOR UPDATE on that row to establish the lock you need. This need not have anything to do with the tables/rows you are actually intending to update --- although obviously such a convention is pretty fragile if you have updates coming from a variety of code. I think it's reasonably safe when you're funneling all the operations through a bit of middleware. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
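PFC's UPDATE-then-check-rowcount idiom above can be sketched in a few lines. This is a hedged illustration only: it uses Python's sqlite3 module as a stand-in for PostgreSQL, and the table, column names, and sample code value are made up, not taken from the original schema.

```python
import sqlite3

# In-memory stand-in database; in PostgreSQL the same UPDATE works the
# same way, with the rowcount reported by the driver.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE codes (
    game_id INTEGER,
    code    TEXT,
    used    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (game_id, code))""")
conn.execute("INSERT INTO codes (game_id, code) VALUES (1, 'ABC123')")

def try_use_code(conn, game_id, code):
    # Atomically mark the code used; the rowcount tells us whether it
    # was still unused. No explicit locking is needed, and a losing
    # concurrent caller simply sees rowcount 0.
    cur = conn.execute(
        "UPDATE codes SET used = 1 WHERE used = 0 AND game_id = ? AND code = ?",
        (game_id, code))
    conn.commit()
    return cur.rowcount == 1

print(try_use_code(conn, 1, "ABC123"))  # True: first redemption wins
print(try_use_code(conn, 1, "ABC123"))  # False: already used
print(try_use_code(conn, 1, "NOPE"))    # False: no such code
```

As noted in the thread, the attraction of this form over INSERT-and-catch-violation is that a failed attempt does not abort the surrounding transaction.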
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
Thanks for all responses! I agree with most of you, and say that RI is best maintained by the database! Performance must be improved in other ways (indexes, hardware, etc.)! - Original Message - From: Jim C. Nasby [EMAIL PROTECTED] To: Craig A. James [EMAIL PROTECTED] Cc: PFC [EMAIL PROTECTED]; Michael Glaesemann [EMAIL PROTECTED]; Rodrigo Sakai [EMAIL PROTECTED]; pgsql-performance@postgresql.org Sent: Wednesday, April 12, 2006 5:59 PM Subject: Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE On Wed, Apr 12, 2006 at 10:36:28AM -0700, Craig A. James wrote: Jim C. Nasby wrote: 1. You have only one application that modifies the data. (Otherwise, you have to duplicate the rules across many applications, leading to a code-maintenance nightmare). You forgot something: 1a: You know that there will never, ever, ever, ever, be any other application that wants to talk to the database. I know tons of people that get burned because they go with something that's good enough for now, and then regret that decision for years to come. No, I don't agree with this. Too many people waste time designing for what if... scenarios that never happen. You don't want to be dumb and design something that locks out a foreseeable and likely future need, but referential integrity doesn't meet this criterion. There's nothing to keep you from changing from app-managed to database-managed referential integrity if your needs change. In this case your argument makes no sense, because you will spend far more time re-creating RI capability inside an application than if you just use what the database offers natively. It's certainly true that you don't want to over-engineer for no reason, but many times choices are made to save a very small amount of time or hassle up-front, and those choices become extremely painful later. -- Jim C. Nasby, Sr. 
Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Slow query - possible bug?
Gavin Hamill [EMAIL PROTECTED] writes: Tom Lane wrote: I'm thinking the planner is misestimating something, but it's hard to tell what without breaking it down. (allocation0_.Date between '2006-06-10 00:00:00.00' and '2006-06-10 00:00:00.00'); QUERY PLAN --- Index Scan using ix_date on Allocation allocation0_ (cost=0.00..4.77 rows=1 width=34) (actual time=0.035..6706.467 rows=34220 loops=1) Index Cond: ((Date >= '2006-06-10'::date) AND (Date <= '2006-06-10'::date)) Total runtime: 6728.743 ms Bingo, there's our misestimation: estimated 1 row, actual 34220 :-( That's why it's choosing the wrong index: it thinks the condition on RoomID isn't going to reduce the number of rows fetched any further, and so the smaller index ought to be marginally cheaper to use. In reality, it works way better when using the two-column index. I think this is the same problem recently discussed about how the degenerate case for a range comparison is making an unreasonably small estimate, where it probably ought to fall back to some equality estimate instead. With the simple-equality form of the date condition, it does get a reasonable estimate, and so it picks the right index. There should be a fix for this by the time PG 8.2 comes out, but in the meantime you might find that it helps to write the range check in a way that doesn't have identical bounds, eg date >= '2006-06-10'::date AND date < '2006-06-11'::date regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Slow query - possible bug?
On Tue, 18 Apr 2006 13:31:48 -0400 Tom Lane [EMAIL PROTECTED] wrote: There should be a fix for this by the time PG 8.2 comes out, but in the meantime you might find that it helps to write the range check in a way that doesn't have identical bounds, eg date >= '2006-06-10'::date AND date < '2006-06-11'::date OK coolies - we've already had a code release for this (and other stuff) planned for tomorrow morning: checking on the client side if a single date has been chosen, then doing an equality test on that... otherwise leave the between in place - seems to work like a charm, and hopefully it'll mean we don't have a loadavg of 15 on our main pg server tomorrow (!) :)) Basically, as long as I know it's a pg issue rather than something daft I've done (or not done) then I'm happy enough. Cheers, Gavin. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
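Gavin's client-side workaround can be sketched as a small helper that builds the date condition: emit an equality test when the range collapses to a single day, and a half-open range otherwise, so the planner never sees identical BETWEEN bounds. This is an illustrative Python sketch only; the function and column names are hypothetical, not from his code.

```python
from datetime import date, timedelta

def date_condition(start: date, end: date):
    """Return (sql_fragment, params) for a date-range filter.

    Single-day ranges become an equality test; multi-day ranges use a
    half-open interval, avoiding the identical-bounds misestimation.
    """
    if start == end:
        return "Date = ?", [start.isoformat()]
    # Half-open upper bound: strictly less than the day after 'end'.
    return "Date >= ? AND Date < ?", [start.isoformat(),
                                      (end + timedelta(days=1)).isoformat()]

print(date_condition(date(2006, 6, 10), date(2006, 6, 10)))
print(date_condition(date(2006, 6, 10), date(2006, 6, 12)))
```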
Re: [PERFORM] Slow query - possible bug?
Gavin Hamill [EMAIL PROTECTED] writes: On Tue, 18 Apr 2006 13:31:48 -0400 Tom Lane [EMAIL PROTECTED] wrote: There should be a fix for this by the time PG 8.2 comes out, but in the meantime you might find that it helps to write the range check in a way that doesn't have identical bounds, eg date >= '2006-06-10'::date AND date < '2006-06-11'::date OK coolies - we've already had a code release for this (and other stuff) planned for tomorrow morning checking on the client side if a single date has been chosen, then do an equality test on that... Fair enough, no reason to replace one workaround with another. But would you try it on your test case, just to verify the diagnosis? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] index is not used if I include a function that returns current time in my query
Interesting - what does EXPLAIN ANALYZE show if you SET enable_seqscan=off; ? You should also consider upgrading to 8.1... On Thu, Apr 13, 2006 at 12:25:02PM +0200, Cris Carampa wrote: Hello, postgresql 7.4.8 on SuSE Linux here. I have a table called DMO with a column called ORA_RIF defined as timestamp without time zone; I created an index on this table based on this column only. If I run a query against a text literal the index is used: explain select * from dmo where ora_rif > '2006-01-01'; QUERY PLAN - Index Scan using dmo_ndx02 on dmo (cost=0.00..1183.23 rows=736 width=156) Index Cond: (ora_rif > '2006-01-01 00:00:00'::timestamp without time zone) If I try to use a function that returns the current time instead, a sequential scan is always performed: explain select * from dmo where ora_rif > localtimestamp; QUERY PLAN -- Seq Scan on dmo (cost=0.00..1008253.22 rows=2703928 width=156) Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone) explain select * from dmo where ora_rif > localtimestamp::timestamp without time zone; QUERY PLAN -- Seq Scan on dmo (cost=0.00..1008253.22 rows=2703928 width=156) Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone) ... etc. ... (tried with all datetime functions with and without cast) I even tried to write a function that explicitly returns a timestamp without time zone value: create or replace function f () returns timestamp without time zone as ' declare x timestamp without time zone ; begin x := ''2006-01-01 00:00:00''; return x ; end ; ' language plpgsql ; But the result is the same: explain select * from dmo where ora_rif > f(); QUERY PLAN - Seq Scan on dmo (cost=0.00..987973.76 rows=2703928 width=156) Filter: (ora_rif > f()) Any suggestion? Kind regards, -- Cris Carampa (spamto:[EMAIL PROTECTED]) "I could ask what your dog is called; mine has been called Libero for a while now" ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Jim C. Nasby, Sr. 
Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Blocks read for index scans
On Fri, Apr 14, 2006 at 11:12:55AM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: In my case it would be helpful to break the heap access numbers out between seqscans and index scans, since each of those represents very different access patterns. Would adding that be a mess? Yes; it'd require more counters-per-table than we now keep, thus nontrivial bloat in the stats collector's tables. ISTM it would only require two additional columns, which doesn't seem unreasonable, especially considering the value of the information collected. Not to mention incompatible changes in the pgstats views and the underlying functions (which some apps probably use directly). There's certainly ways around that issue, especially since this would only be adding new information (though we would probably want to consider the old info as deprecated and eventually remove it). -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Blocks read for index scans
On Fri, Apr 14, 2006 at 08:05:39AM +0200, Terje Elde wrote: Jim Nasby wrote: While working on determining a good stripe size for a database, I realized it would be handy to know what the average request size is. Getting this info is a simple matter of joining pg_stat_all_tables and pg_statio_all_tables and doing some math, but there's one issue I've found; it appears that there's no information on how many heap blocks were read in by an index scan. Is there any way to get that info? snip Knowing what the average stripe size is can be a good place to start, but the real question is; which stripe size will allow the majority of your transactions to be satisfied without having to go to two spindles? And of course right now there's not a very good way to know that... granted, I can look at the average request size on the machine, but that will include any seqscans that are happening, and for stripe sizing I think it's better to leave that out of the picture unless your workload is heavily based on seqscans. That said, it's the transactions against disk that typically matter. On FreeBSD, you can get an impression of this using 'systat -vmstat', and watch the KB/t column for your drives. On a related note, do you know of any way to determine the breakdown between read activity and write activity on FreeBSD? vmstat, systat, iostat all only return aggregate info. :( -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Slow query - possible bug?
On Tue, 18 Apr 2006 15:51:44 -0400 Tom Lane [EMAIL PROTECTED] wrote: Fair enough, no reason to replace one workaround with another. But would you try it on your test case, just to verify the diagnosis? Yup I can confirm it from testing earlier today - as soon as the two dates are non-equal, an index scan is correctly selected and returns results in just a few milliseconds: laterooms=# explain analyse select allocation0_.ID as y1_, allocation0_.RoomID as y2_, allocation0_.StatusID as y4_, allocation0_.Price as y3_, allocation0_.Number as y5_, allocation0_.Date as y6_ from Allocation allocation0_ where (allocation0_.Date between '2006-04-25 00:00:00.00' and '2006-04-26 00:00:00.00')and(allocation0_.RoomID in(211800)); QUERY PLAN --- Index Scan using ix_dateroom on Allocation allocation0_ (cost=0.00..14.02 rows=4 width=34) (actual time=16.799..21.804 rows=2 loops=1) Index Cond: ((RoomID = 211800) AND (Date >= '2006-04-25'::date) AND (Date <= '2006-04-26'::date)) Total runtime: 21.910 ms which I ran first, versus the identical-date equivalent which turned in a whopping... Index Scan using ix_date on Allocation allocation0_ (cost=0.00..4.77 rows=1 width=34) (actual time=6874.272..69541.064 rows=1 loops=1) Index Cond: ((Date >= '2006-04-25'::date) AND (Date <= '2006-04-25'::date)) Filter: (RoomID = 211800) Total runtime: 69541.113 ms (4 rows) Cheers, Gavin. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Inserts optimization?
On Fri, Apr 14, 2006 at 03:15:33PM -0500, Scott Marlowe wrote: On Fri, 2006-04-14 at 15:09, Francisco Reyes wrote: Michael Stone writes: I still don't follow that. Why would the RAID level matter? IOW, are you actually wanting 2 spares, or are you just stuck with that because you need a factor of two disks for your mirrors? RAID 10 needs pairs.. so we can either have no spares or 2 spares. Spares are placed in service one at a time. You don't need 2 spares for RAID 10, trust me. Sadly, 3ware doesn't produce any controllers with the ability to do an odd number of channels, so you end up burning through 2 slots to get a hot spare (unless you spend substantially more money and go with the next model up). -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Inserts optimization?
On Tue, Apr 18, 2006 at 01:56:44PM +0200, Magnus Hagander wrote: Bacula already serializes access to the database (they have to support mysql/myisam), so this shouldn't help. Actually, it might well hurt by introducing extra delays. You have any contact with the developers? Maybe they're a possibility for our summer of code... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg_toast size
On Fri, Apr 14, 2006 at 03:13:43PM +0200, Julien Drouard wrote: Hi everyone, I've seen my pg_toast tables are becoming bigger and bigger. After googling I would like to modify my max_fsm_pages parameter to prevent that kind of problem. So I'm wondering if changing this parameter is enough and after that how can I reduce the size of these tables? By doing a full vacuum? A full vacuum would do it. CLUSTERing the table might rewrite the toast tables as well. As for toast, if you do a vacuum verbose over the entire cluster, it will tell you at the end how much space you need in the FSM. See also http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10087 and http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10116 -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] mergehashloop
On Tue, Apr 18, 2006 at 12:51:59PM +0200, Markus Schaber wrote: In my mind this is tied into another issue, which is that the planner always costs on the basis of each query starting from zero. In a real environment it's much cheaper to use heavily-used indexes than this cost model suggests, because they'll already be swapped in due to use by previous queries. But we haven't got any infrastructure to keep track of what's been heavily used, let alone a cost model that could make use of the info. An easy first approach would be to add a user tunable cache probability value to each index (and possibly table) between 0 and 1. Then simply multiply random_page_cost with (1-that value) for each scan. Later, this value could be automatically tuned by stats analysis or other means. Actually, if you run with stats_block_level turned on you have a first-order approximation of what is and isn't cached. Perhaps the planner could make use of this information if it's available. I think part of the reason that people commonly reduce random_page_cost to values much lower than physical reality would suggest is that it provides a crude way of partially compensating for this basic problem. I totally agree with this, it's just what we did here from time to time. :-) Hmm, how does effective_cache_size correspond with it? Shouldn't a high effective_cache_size have a similar effect? Generally, effective_cache_size is used to determine the likelihood that something will be in-cache. random_page_cost tells us how expensive it will be to get that information if it isn't in cache. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] mergehashloop
Jim C. Nasby [EMAIL PROTECTED] writes: Actually, if you run with stats_block_level turned on you have a first-order approximation of what is and isn't cached. Only if those stats decayed (pretty fast) with time; which they don't. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] mergehashloop
Markus Schaber [EMAIL PROTECTED] writes: Hmm, how does effective_cache_size correspond with it? Shouldn't a high effective_cache_size have a similar effect? It seems reasonable to suppose that effective_cache_size ought to be used as a number indicating how much stuff would hang around from query to query. Right now it's not used that way... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] creating of temporary table takes very long
You might try rewriting the coalesces into a row comparison... WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...) See http://www.postgresql.org/docs/8.1/interactive/functions-comparisons.html#AEN13408 Note that the docs only show IS DISTINCT FROM, so you might have to do WHERE NOT row(...) IS DISTINCT FROM row(...) On Tue, Apr 18, 2006 at 09:13:04AM -0700, Sriram Dandapani wrote: Thx Tom I guess I have to abandon the bulk update. The columns in the where clause comprise 80% of the table columns, so indexing all may not help. The target table will have on average 60-180 million rows. I will attempt the IN instead of EXISTS and let you know the result -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 18, 2006 9:10 AM To: Sriram Dandapani Cc: Pgsql-Performance (E-mail) Subject: Re: [PERFORM] creating of temporary table takes very long Sriram Dandapani [EMAIL PROTECTED] writes: Got an explain analyze output. Here it is Seq Scan on c_chkpfw_hr_tr a (cost=0.00..225975659.89 rows=11000 width=136) (actual time=2.345..648070.474 rows=22001 loops=1) Filter: (subplan) SubPlan -> Bitmap Heap Scan on chkpfw_tr_hr_dimension b (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439 rows=1 loops=22001) Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id)) Filter: (($3 = customerid_id) AND (COALESCE($4, 0) = COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id, 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..) -> Bitmap Index Scan on chkpfw_tr_hr_idx1 (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144 rows=33026 loops=22001) Index Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id)) Total runtime: 648097.800 ms That's probably about as good a query plan as you can hope for given the way the query is written. 
Those COALESCE comparisons are all unindexable (unless you make functional indexes on the COALESCE expressions). You might get somewhere by converting the EXISTS to an IN, though. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
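The NULL-handling pitfall that forces those COALESCE comparisons can be shown in a few lines. This is a hedged sketch using Python's sqlite3 module, where the `IS` operator plays the role of PostgreSQL's `IS NOT DISTINCT FROM` (both treat two NULLs as equal); the table and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (interface_id INTEGER, source_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 10), (None, 10), (1, None)])

# Plain equality never matches a NULL: "x = NULL" evaluates to
# unknown, so this finds nothing even though a NULL row exists.
eq = conn.execute(
    "SELECT COUNT(*) FROM t WHERE interface_id = NULL").fetchone()[0]

# A NULL-safe comparison does match: NULL IS NULL is true, and
# "x IS 10" behaves like ordinary equality for non-NULL values.
ns = conn.execute(
    "SELECT COUNT(*) FROM t WHERE interface_id IS NULL AND source_id IS 10"
).fetchone()[0]

print(eq, ns)  # 0 1
```

This is why the original query wraps every nullable column in COALESCE(col, 0): it forces NULLs to compare equal, at the price of making the condition unindexable unless functional indexes on the COALESCE expressions are added, as Tom notes.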
[PERFORM] Multicolumn order by
Hi Apologies if this has already been raised... PostgreSQL 8.1.3 and prior versions. Vacuum done. Assuming a single table with columns named c1 to cn and a requirement to select from a particular position in multiple column order. The column values in my simple example below denoted by 'cnv' a typical query would look as follows select * from mytable where (c1 = 'c1v' and c2 = 'c2v' and c3 >= 'c3v') or (c1 = 'c1v' and c2 > 'c2v') or (c1 > 'c1v') order by c1, c2, c3; In real life with the table containing many rows (9 Million) and a single multicolumn index on the required columns existing I get the following explain analyse SELECT tran_subledger, tran_subaccount, tran_mtch, tran_self, tran_Rflg FROM tran WHERE ((tran_subledger = 2 AND tran_subaccount = 'ARM ' AND tran_mtch = 0 AND tran_self >= 0 ) OR (tran_subledger = 2 AND tran_subaccount = 'ARM ' AND tran_mtch > 0 ) OR (tran_subledger = 2 AND tran_subaccount > 'ARM ' ) OR (tran_subledger > 2 )) ORDER BY tran_subledger, tran_subaccount, tran_mtch, tran_self limit 10; Limit (cost=0.00..25.21 rows=10 width=36) (actual time=2390271.832..2390290.305 rows=10 loops=1) -> Index Scan using tran_mtc_idx on tran (cost=0.00..13777295.04 rows=5465198 width=36) (actual time=2390271.823..2390290.252 rows=10 loops=1) Filter: (((tran_subledger = 2) AND (tran_subaccount = 'ARM '::bpchar) AND (tran_mtch = 0) AND (tran_self >= 0)) OR ((tran_subledger = 2) AND (tran_subaccount = 'ARM '::bpchar) AND (tran_mtch > 0)) OR ((tran_subledger = 2) AND (tran_subaccount > 'ARM '::bpchar)) OR (tran_subledger > 2)) Total runtime: 2390290.417 ms Any suggestions/comments/ideas appreciated. -- Regards Theo ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] creating of temporary table takes very long
Jim C. Nasby [EMAIL PROTECTED] writes: You might try rewriting the coalesces into a row comparison... WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...) That would be notationally nicer, but no help performance-wise; I'm fairly sure that IS DISTINCT doesn't get optimized in any fashion whatsoever :-( What might be worth trying is functional indexes on the COALESCE(foo,0) expressions. Or if possible, consider revising your data schema to avoid using NULLs in a way that requires assuming that NULL = NULL. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Multicolumn order by
Theo Kramer [EMAIL PROTECTED] writes: select * from mytable where (c1 = 'c1v' and c2 = 'c2v' and c3 >= 'c3v') or (c1 = 'c1v' and c2 > 'c2v') or (c1 > 'c1v') order by c1, c2, c3; Yeah ... what you really want is the SQL-spec row comparison operator select ... where (c1,c2,c3) >= ('c1v','c2v','c3v') order by c1,c2,c3; This does not work properly in any current PG release :-( but it does work and is optimized well in CVS HEAD. See eg this thread http://archives.postgresql.org/pgsql-hackers/2006-02/msg00209.php regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Multicolumn order by
Assuming stats are accurate, you're reading through 5.5M index rows in order to run that limit query. You didn't say what the index was actually on, but you might want to try giving each column its own index. That might make a bitmap scan feasible. I know this doesn't help right now, but 8.2 will also allow you to do this using a row comparator. You might want to compile cvs HEAD and see how that does with this query (specifically if using a row comparator performs better than the query below). On Wed, Apr 19, 2006 at 12:07:55AM +0200, Theo Kramer wrote: Hi Apologies if this has already been raised... PostgreSQL 8.1.3 and prior versions. Vacuum done. Assuming a single table with columns named c1 to cn and a requirement to select from a particular position in multiple column order. The column values in my simple example below denoted by 'cnv' a typical query would look as follows select * from mytable where (c1 = 'c1v' and c2 = 'c2v' and c3 >= 'c3v') or (c1 = 'c1v' and c2 > 'c2v') or (c1 > 'c1v') order by c1, c2, c3; In real life with the table containing many rows (9 Million) and a single multicolumn index on the required columns existing I get the following explain analyse SELECT tran_subledger, tran_subaccount, tran_mtch, tran_self, tran_Rflg FROM tran WHERE ((tran_subledger = 2 AND tran_subaccount = 'ARM ' AND tran_mtch = 0 AND tran_self >= 0 ) OR (tran_subledger = 2 AND tran_subaccount = 'ARM ' AND tran_mtch > 0 ) OR (tran_subledger = 2 AND tran_subaccount > 'ARM ' ) OR (tran_subledger > 2 )) ORDER BY tran_subledger, tran_subaccount, tran_mtch, tran_self limit 10; Limit (cost=0.00..25.21 rows=10 width=36) (actual time=2390271.832..2390290.305 rows=10 loops=1) -> Index Scan using tran_mtc_idx on tran (cost=0.00..13777295.04 rows=5465198 width=36) (actual time=2390271.823..2390290.252 rows=10 loops=1) Filter: (((tran_subledger = 2) AND (tran_subaccount = 'ARM '::bpchar) AND (tran_mtch = 0) AND (tran_self >= 0)) OR ((tran_subledger = 2) AND (tran_subaccount = 'ARM '::bpchar) 
AND (tran_mtch > 0)) OR ((tran_subledger = 2) AND (tran_subaccount > 'ARM '::bpchar)) OR (tran_subledger > 2)) Total runtime: 2390290.417 ms Any suggestions/comments/ideas appreciated. -- Regards Theo ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
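The equivalence between the verbose OR chain and the SQL-spec row comparison Tom recommends can be demonstrated with SQLite (which supports row values) standing in for PostgreSQL 8.2's CVS HEAD; the table and data here are illustrative, not Theo's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (c1 TEXT, c2 TEXT, c3 TEXT)")
rows = [("a", "a", "a"), ("b", "a", "z"), ("b", "b", "a"), ("c", "a", "a")]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# The verbose OR form from the thread, positioned at ('b', 'b', 'a')...
or_form = conn.execute("""
    SELECT c1, c2, c3 FROM mytable
    WHERE (c1 = 'b' AND c2 = 'b' AND c3 >= 'a')
       OR (c1 = 'b' AND c2 > 'b')
       OR (c1 > 'b')
    ORDER BY c1, c2, c3""").fetchall()

# ...is equivalent to a single lexicographic row comparison, which an
# optimizer can turn into one index range scan:
row_form = conn.execute("""
    SELECT c1, c2, c3 FROM mytable
    WHERE (c1, c2, c3) >= ('b', 'b', 'a')
    ORDER BY c1, c2, c3""").fetchall()

print(or_form == row_form)  # True
print(row_form)             # [('b', 'b', 'a'), ('c', 'a', 'a')]
```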
Re: [PERFORM] mergehashloop
On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Actually, if you run with stats_block_level turned on you have a first-order approximation of what is and isn't cached. Only if those stats decayed (pretty fast) with time; which they don't. Good point. :/ I'm guessing there's no easy way to see how many blocks for a given relation are in shared memory, either... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] mergehashloop
On Tue, Apr 18, 2006 at 06:26:48PM -0400, Tom Lane wrote: Markus Schaber [EMAIL PROTECTED] writes: Hmm, how does effective_cache_size correspond with it? Shouldn't a high effective_cache_size have a similar effect? It seems reasonable to suppose that effective_cache_size ought to be used as a number indicating how much stuff would hang around from query to query. Right now it's not used that way... Maybe it would be a reasonable first pass to have estimators calculate the cost if a node found everything it wanted in cache and then do a linear interpolation between that and the costs we currently come up with? Something like pg_class.relpages / sum(pg_class.relpages) would give an idea of how much of a relation is likely to be cached, which could be used for the linear interpolation. Of course having *any* idea as to how much of a relation was actually in shared_buffers (or better yet, the OS cache) would be a lot more accurate, but this simple method might be a good enough first-pass. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
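Jim's linear-interpolation idea might look something like this in toy form. The function, numbers, and clamping are illustrative only, not PostgreSQL's actual cost model: the point is just blending an all-in-cache cost estimate with an all-from-disk estimate by an assumed cache fraction.

```python
def interpolated_cost(cached_cost, uncached_cost, cache_fraction):
    """Blend two cost estimates by the fraction assumed cached.

    cache_fraction might come from something like
    relpages / sum(relpages), clamped to [0, 1] to stay sane.
    """
    p = max(0.0, min(1.0, cache_fraction))
    return cached_cost * p + uncached_cost * (1.0 - p)

# A scan costing 100 fully cached and 4000 fully uncached, for a
# relation estimated to be 75% resident:
print(interpolated_cost(100.0, 4000.0, 0.75))  # 1075.0
```

The open question raised in the thread remains where the cache fraction comes from: a user-tunable knob per relation, decayed block-level stats, or direct inspection of shared_buffers.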
Re: [PERFORM] mergehashloop
Markus Schaber [EMAIL PROTECTED] writes: An easy first approach would be to add a user tunable cache probability value to each index (and possibly table) between 0 and 1. Then simply multiply random_page_cost with (1-that value) for each scan. That's not the way you'd need to use it. But on reflection I do think there's some merit in a cache probability parameter, ranging from zero (giving current planner behavior) to one (causing the planner to assume everything is already in cache from prior queries). We'd have to look at exactly how such an assumption should affect the cost equations ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Planner doesn't chose Index - (slow select)
Hi all, I've been struggling with some performance issues with certain SQL queries. I was prepping a long-ish overview of my problem to submit, but I think I'll start out with a simple case of the problem first; hopefully the answers I receive will help me solve my initial issue.

Consider the following two queries, which yield drastically different run-times:

db=# select count(*) from pk_c2 b0 where b0.offer_id=7141;
 count
-------
     1
(1 row)

Time: 5139.004 ms

db=# select count(*) from pk_c2 b0 where b0.pending=true and b0.offer_id=7141;
 count
-------
     1
(1 row)

Time: 1.828 ms

That's 2811 times faster!

Just to give you an idea of the size of the pk_c2 table:

db=# select count(*) from pk_c2;
  count
---------
 2158094
(1 row)

Time: 5275.782 ms

db=# select count(*) from pk_c2 where pending=true;
 count
-------
    51
(1 row)

Time: 5073.699 ms

db=# explain select count(*) from pk_c2 b0 where b0.offer_id=7141;
                             QUERY PLAN
---------------------------------------------------------------------
 Aggregate  (cost=44992.78..44992.78 rows=1 width=0)
   ->  Seq Scan on pk_c2 b0  (cost=0.00..44962.50 rows=12109 width=0)
         Filter: (offer_id = 7141)
(3 rows)

Time: 1.350 ms

db=# explain select count(*) from pk_c2 b0 where b0.pending=true and b0.offer_id=7141;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=45973.10..45973.10 rows=1 width=0)
   ->  Index Scan using pk_boidx on pk_c2 b0  (cost=0.00..45973.09 rows=1 width=0)
         Index Cond: (offer_id = 7141)
         Filter: (pending = true)
(4 rows)

Time: 1.784 ms

The table has indexes on both 'offer_id' and '(pending = true)':

Indexes:
    pk_boidx btree (offer_id)
    pk_bpidx btree (((pending = true)))

So, why would the planner choose to use the index on the second query and not on the first?
Note that I am able to fool the planner into using an index scan on offer_id by adding a silly new condition (oid > 1) to the where clause of the first form of the query:

db=# explain select count(*) from pk_c2 b0 where b0.offer_id=7141 and oid > 1;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=45983.19..45983.19 rows=1 width=0)
   ->  Index Scan using pk_boidx on pk_c2 b0  (cost=0.00..45973.09 rows=4037 width=0)
         Index Cond: (offer_id = 7141)
         Filter: (oid > 1::oid)
(4 rows)

Time: 27.301 ms

db=# select count(*) from pk_c2 b0 where b0.offer_id=7141 and oid > 1;
 count
-------
     1
(1 row)

Time: 1.900 ms

What gives? This seems just too hokey for my taste.

--patrick

db=# select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 7.4.12 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6
Re: [PERFORM] Planner doesn't chose Index - (slow select)
patrick keshishian [EMAIL PROTECTED] writes:
> I've been struggling with some performance issues with certain SQL
> queries. I was prepping a long-ish overview of my problem to submit,
> but I think I'll start out with a simple case of the problem first,
> hopefully answers I receive will help me solve my initial issue.

Have you ANALYZEd this table lately?

> db=# select count(*) from pk_c2 b0 where b0.offer_id=7141;
>  count
> -------
>      1
> (1 row)

The planner is evidently estimating that there are 12109 such rows, not 1, which is the reason for its reluctance to use an indexscan. Generally the only reason for it to be off that far on such a simple statistical issue is if you haven't updated the stats in a long time. (If you've got a really skewed data distribution for offer_id, you might need to raise the statistics target for it.)

> The table has indexes for both 'offer_id' and '(pending=true)':
> Indexes:
>     pk_boidx btree (offer_id)
>     pk_bpidx btree (((pending = true)))

The expression index on (pending = true) won't do you any good, unless you spell your query in a weird way like

    ... WHERE (pending = true) = true

I'd suggest a plain index on pending instead.

> db=# select version();
> PostgreSQL 7.4.12 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6

You might want to think about an update, too. 7.4 is pretty long in the tooth.

regards, tom lane
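Tom's suggestions translate into something like the following (a sketch using the names from the original post; the replacement index name is made up):

```sql
-- Refresh the statistics the planner is working from:
ANALYZE pk_c2;

-- If offer_id has a badly skewed distribution, collect finer-grained
-- stats for it (the default target in 7.4 is 10), then re-analyze:
ALTER TABLE pk_c2 ALTER COLUMN offer_id SET STATISTICS 100;
ANALYZE pk_c2;

-- Replace the expression index on (pending = true) with a plain
-- index on the column itself:
CREATE INDEX pk_bpidx_plain ON pk_c2 (pending);
DROP INDEX pk_bpidx;
```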
Re: [PERFORM] SELECT FOR UPDATE performance is bad
Suppose you have a table codes:

( game_id  INT,
  code     TEXT,
  used     BOOL NOT NULL DEFAULT 'f',
  prize    ...
  ...
  PRIMARY KEY (game_id, code)
)

Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND code=...

Then check the rowcount: if one row was updated, the code was not used yet. If no row was updated, the code either did not exist, or was already used.

You can use a stored procedure with exceptions, no? Try this:
http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

Chris
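The check-the-rowcount idea can be wrapped in a function; here is a minimal sketch (the function name and return convention are hypothetical, using GET DIAGNOSTICS to read the rowcount):

```sql
CREATE OR REPLACE FUNCTION claim_code(p_game_id INT, p_code TEXT)
RETURNS BOOLEAN AS $$
DECLARE
    n INTEGER;
BEGIN
    -- The UPDATE row-locks the matching row, so two concurrent
    -- callers cannot both see used = 'f' for the same code.
    UPDATE codes SET used = 't'
     WHERE used = 'f' AND game_id = p_game_id AND code = p_code;
    GET DIAGNOSTICS n = ROW_COUNT;
    RETURN n = 1;  -- true: code existed, was unused, and is now claimed
END;
$$ LANGUAGE plpgsql;
```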
Re: [PERFORM] Blocks read for index scans
Jim C. Nasby wrote:
> > That said, it's the transactions against disk that typically matter.
> > On FreeBSD, you can get an impression of this using 'systat -vmstat',
> > and watch the KB/t column for your drives.
>
> On a related note, you know of any way to determine the breakdown
> between read activity and write activity on FreeBSD? vmstat, systat,
> iostat all only return aggregate info. :(

Can't think of a right way to do this ATM, but for a lab-type setup to get an idea, you could set up a gmirror volume, then choose a balancing algorithm to only read from one of the disks. The effect should be that writes go to both, while reads only go to one. Activity on the write-only disk would give you an idea of the write activity, and (read/write disk - write-only disk) would give you an idea of the reads.

I have to admit though, seems like quite a bit of hassle, and I'm not sure how good the numbers would be; given that at least some of the info (KB/transaction) are totals, it'd require a bit of math to get decent numbers. But at least it's something.

Terje
Re: [PERFORM] mergehashloop
Jim C. Nasby wrote:
> On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:
> > Jim C. Nasby [EMAIL PROTECTED] writes:
> > > Actually, if you run with stats_block_level turned on you have a
> > > first-order approximation of what is and isn't cached.
> >
> > Only if those stats decayed (pretty fast) with time; which they don't.
>
> Good point. :/ I'm guessing there's no easy way to see how many blocks
> for a given relation are in shared memory, either...

contrib/pg_buffercache will tell you this - what buffers from what relation are in shared_buffers (if you want to interrogate the OS file buffer cache, that's a different story - tho I've been toying with doing a utility for FreeBSD that would do this).

Cheers
Mark
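For the archives, a typical query against the 8.1-era contrib/pg_buffercache view looks something like this (joining on relfilenode, as the contrib README of that vintage does; the ordering and LIMIT are just for illustration):

```sql
-- Count shared_buffers pages per relation in the current database.
SELECT c.relname, count(*) AS buffered_pages
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffered_pages DESC
LIMIT 10;
```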
Re: [PERFORM] Blocks read for index scans
On Wed, Apr 19, 2006 at 04:35:11AM +0200, Terje Elde wrote:
> Jim C. Nasby wrote:
> > > That said, it's the transactions against disk that typically matter.
> > > On FreeBSD, you can get an impression of this using 'systat -vmstat',
> > > and watch the KB/t column for your drives.
> >
> > On a related note, you know of any way to determine the breakdown
> > between read activity and write activity on FreeBSD? vmstat, systat,
> > iostat all only return aggregate info. :(
>
> Can't think of a right way to do this ATM, but for a lab-type setup to
> get an idea, you could set up a gmirror volume, then choose a balancing
> algorithm to only read from one of the disks. The effect should be that
> writes go to both, while reads only go to one. Activity on the
> write-only disk would give you an idea of the write activity, and
> (read/write disk - write-only disk) would give you an idea of the
> reads.
>
> I have to admit though, seems like quite a bit of hassle, and I'm not
> sure how good the numbers would be, given that at least some of the
> info (KB/transaction) are totals, it'd require a bit of math to get
> decent numbers. But at least it's something.

Yeah... not gonna happen... It's completely mind-boggling that FBSD doesn't track writes and reads separately.
Re: [PERFORM] mergehashloop
On Wed, Apr 19, 2006 at 04:47:40PM +1200, Mark Kirkwood wrote:
> Jim C. Nasby wrote:
> > On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:
> > > Jim C. Nasby [EMAIL PROTECTED] writes:
> > > > Actually, if you run with stats_block_level turned on you have a
> > > > first-order approximation of what is and isn't cached.
> > >
> > > Only if those stats decayed (pretty fast) with time; which they don't.
> >
> > Good point. :/ I'm guessing there's no easy way to see how many blocks
> > for a given relation are in shared memory, either...
>
> contrib/pg_buffercache will tell you this - what buffers from what
> relation are in shared_buffers

So theoretically with that code we could make the cost estimator functions more intelligent about actual query costs. Now, how you'd actually see how those estimates improved...

> (if you want to interrogate the OS file buffer cache, that's a
> different story - tho I've been toying with doing a utility for
> FreeBSD that would do this).

Well, the problem is that I doubt anything that OS-specific would be accepted into core. What we really need is some method that's OS-agnostic...
Re: [PERFORM] mergehashloop
Mark Kirkwood [EMAIL PROTECTED] writes:
> Jim C. Nasby wrote:
> > Good point. :/ I'm guessing there's no easy way to see how many blocks
> > for a given relation are in shared memory, either...
>
> contrib/pg_buffercache will tell you this -

I think the key word in Jim's comment was "easy", ie, cheap. Grovelling through many thousands of buffers to count the matches to a given relation doesn't sound appetizing, especially not if it gets done over again several times during each query-planning cycle. Trying to keep centralized counts somewhere would be even worse (because of locking/contention issues).

regards, tom lane
Re: [PERFORM] Blocks read for index scans
Jim C. Nasby wrote:
> Yeah... not gonna happen... It's completely mind-boggling that FBSD
> doesn't track writes and reads separately.

'iostat' does not tell you this, but 'gstat' does - it's the GEOM system monitor (a bit annoying that the standard tool is lacking in this regard...).

Cheers
Mark
Re: [PERFORM] mergehashloop
Tom Lane wrote:
> I think the key word in Jim's comment was "easy", ie, cheap.
> Grovelling through many thousands of buffers to count the matches to a
> given relation doesn't sound appetizing, especially not if it gets done
> over again several times during each query-planning cycle. Trying to
> keep centralized counts somewhere would be even worse (because of
> locking/contention issues).

Yeah - not sensible for a transaction oriented system - might be ok for DSS tho.

Cheers
Mark