Re: [PERFORM] When are index scans used over seq scans?
Tom Lane wrote:
> The explain shows no such thing. What is the *actual* runtime of
> each plan per EXPLAIN ANALYZE, please?

Ok, it took 3.5 hours to complete. :-/

This is with the default cpu_tuple_cost = 0.01:

 Nested Loop  (cost=252.80..233010147.16 rows=1035480320 width=98) (actual time=0.369..12672213.137 rows=6171334 loops=1)
   Join Filter: (("outer".starttimetrunc <= "inner".ts) AND ("outer".finishtimetrunc >= "inner".ts))
   ->  Seq Scan on sessions us  (cost=0.00..26822.36 rows=924536 width=106) (actual time=0.039..5447.349 rows=924536 loops=1)
   ->  Materialize  (cost=252.80..353.60 rows=10080 width=8) (actual time=0.000..2.770 rows=10080 loops=924536)
         ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080 width=8) (actual time=0.019..13.397 rows=10080 loops=1)
 Total runtime: 12674486.670 ms

Once again with cpu_tuple_cost = 0.1:

 Nested Loop  (cost=0.00..667684584.42 rows=1035480320 width=98) (actual time=42.892..39877.928 rows=6171334 loops=1)
   ->  Seq Scan on sessions us  (cost=0.00..110030.60 rows=924536 width=106) (actual time=0.020..917.803 rows=924536 loops=1)
   ->  Index Scan using ix_du_ts on duration du  (cost=0.00..604.46 rows=1120 width=8) (actual time=0.004..0.011 rows=7 loops=924536)
         Index Cond: (("outer".starttimetrunc <= du.ts) AND ("outer".finishtimetrunc >= du.ts))
 Total runtime: 41635.468 ms
(5 rows)

--
Richard van den Berg, CISSP
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a    | www.trust-factory.com
2518AK The Hague   | Phone: +31 70 3620684
The Netherlands    | Fax:   +31 70 3603009

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
       joining column's datatypes do not match
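A quick sanity check on the two plans above: same query, same result set (6,171,334 rows), but wildly different runtimes. A minimal sketch of the arithmetic, with values copied from the quoted EXPLAIN ANALYZE output (note the raw cost numbers are not comparable across the two runs, since cpu_tuple_cost itself changed):

```python
# Runtimes copied from the two EXPLAIN ANALYZE plans quoted above.
seq_plan_runtime_ms = 12_674_486.670    # materialized seq-scan nested loop
index_plan_runtime_ms = 41_635.468      # nested loop with inner index scan

speedup = seq_plan_runtime_ms / index_plan_runtime_ms
print(f"index plan is ~{speedup:.0f}x faster")   # ~304x
```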
[PERFORM] How can an index be larger than a table
Hi, I have a series of tables with identical structure. Some contain a few
thousand rows and some contain 3,000,000 rows. Another application writes the
rows, and my application reads them just by selecting where pk > last_seen_pk
limit 2000.

I've found that one of the tables is many times slower to select from than
the others. For instance, when reading data in batches of 2000 rows, it seems
to take 26 seconds to query from dave_data_update_events with 1593600 rows,
but only one or two seconds to query from jane_data_update_events with 310
rows!

This is the SQL used:

|select
|   events.event_id, ctrl.real_name, events.tsds, events.value,
|   events.lds, events.correction, ctrl.type, ctrl.freq
|from dave_data_update_events events, dave_control ctrl
|where events.obj_id = ctrl.obj_id and
|   events.event_id > 32128893::bigint
|order by events.event_id
|limit 2000

Here is the structure of the tables...

|CREATE TABLE dave_control (
|    obj_id numeric(6,0) NOT NULL,
|    real_name character varying(64) NOT NULL,
|    type numeric(2,0) NOT NULL,
|    freq numeric(2,0) NOT NULL
|);
|
|CREATE TABLE dave_data_update_events (
|    lds numeric(13,0) NOT NULL,
|    obj_id numeric(6,0) NOT NULL,
|    tsds numeric(13,0) NOT NULL,
|    value character varying(22) NOT NULL,
|    correction numeric(1,0) NOT NULL,
|    delta_lds_tsds numeric(13,0) NOT NULL,
|    event_id bigserial NOT NULL
|);
|
|CREATE UNIQUE INDEX dave_control_obj_id_idx ON dave_control USING btree (obj_id);
|ALTER TABLE dave_control CLUSTER ON dave_control_obj_id_idx;
|
|CREATE UNIQUE INDEX dave_control_real_name_idx ON dave_control USING btree (real_name);
|
|CREATE INDEX dave_data_update_events_lds_idx ON dave_data_update_events USING btree (lds);
|
|CREATE INDEX dave_data_update_events_obj_id_idx ON dave_data_update_events USING btree (obj_id);
|
|ALTER TABLE ONLY dave_control
|    ADD CONSTRAINT dave_control_obj_id_key UNIQUE (obj_id);
|
|ALTER TABLE ONLY dave_control
|    ADD CONSTRAINT dave_control_real_name_key UNIQUE (real_name);
|
|ALTER TABLE ONLY dave_data_update_events
|    ADD CONSTRAINT dave_data_update_events_event_id_key UNIQUE (event_id);

There are several pairs of tables, with names like rod, jane, fredie, etc.
instead of dave.

The first thing to note about the schema (not designed by me) is that the
control table is clustered on obj_id, but the data_update_events table is not
clustered. Does that mean the rows will be stored in order of insert? That
might be ok, because the data_update_events table is like a queue and I read
it in the order the new rows are inserted.

What also seems weird to me is that the control table has some unique indexes
created on it, but the data_update_events table just has a unique constraint.
Will postgres use an index in the background to enforce this constraint?

When looking at the indexes on all the tables in DbVisualiser, my colleague
noticed that the cardinality of the indexes on the rod, jane and fredie
tables was consistent, but for dave the cardinality was strange...

|SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'dave_data%';
|
|relname                                relkind  reltuples  relpages
|=====================================  =======  =========  ========
|dave_data_update_events                r        1593600.0     40209
|dave_data_update_events_event_id_key   i        1912320.0     29271
|dave_data_update_events_event_id_seq   S              1.0         1
|dave_data_update_events_lds_idx        i        1593600.0      6139
|dave_data_update_events_obj_id_idx     i        1593600.0      6139
|iso_pjm_data_update_events_obj_id_idx  i        1593600.0      6139

Note that there are only 1593600 rows in the table, so why the 1912320
figure? Of course I checked that the row count was correct...

|EXPLAIN ANALYZE
|select count(*) from iso_pjm_data_update_events
|
|QUERY PLAN
|Aggregate  (cost=60129.00..60129.00 rows=1 width=0) (actual time=35933.292..35933.293 rows=1 loops=1)
|  ->  Seq Scan on iso_pjm_data_update_events  (cost=0.00..56145.00 rows=1593600 width=0) (actual time=0.213..27919.497 rows=1593600 loops=1)
|Total runtime: 35933.489 ms

and...
|select count(*) from iso_pjm_data_update_events
|
|count
|1593600

so it's not that there are any undeleted rows lying around.

So any comments on the index structure? Any ideas why the cardinality of the
index is greater than the number of rows in the table? Was it because the
table used to be larger?

Also, any ideas on how else to track down the big performance difference
between tables of the same structure?
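One plausible reading of those numbers (an illustrative sketch, not a diagnosis): the event_id index is carrying entries for rows that no longer exist, i.e. index bloat that a REINDEX or VACUUM FULL would reclaim. Comparing the pg_class figures quoted above:

```python
# Rough bloat check using the pg_class figures quoted above.
table_reltuples = 1_593_600.0    # dave_data_update_events
index_reltuples = 1_912_320.0    # dave_data_update_events_event_id_key
index_relpages = 29_271
lds_idx_relpages = 6_139         # a healthy single-column index on the same table

extra_entries = index_reltuples - table_reltuples
print(f"{extra_entries:.0f} more index entries than table rows "
      f"({index_reltuples / table_reltuples:.2f}x)")   # 318720 more, 1.20x

# Page counts tell the same story: nearly 5x the pages of a healthy
# index on the same table (the columns differ in width, so this is
# only a rough comparison).
print(f"pages vs lds_idx: {index_relpages / lds_idx_relpages:.1f}x")
```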
[PERFORM] two queries and dual cpu (perplexed)
Hi everybody,

One of our clients was using SQL-Server and decided to switch to PostgreSQL
8.0.1.

Hardware: Dual processor Intel(R) Xeon(TM) CPU 3.40GHz
OS: Enterprise Linux with 2.6.9-5 SMP kernel
Filesystem: ext3
SHMMAX:
$ cat /proc/sys/kernel/shmmax
6442450944   <-- believe that's ~6 GB; total ram is 8GB
Database: 15GB in size, with a few tables with over 80 million rows.

Here is a snippet from the output of

SELECT oid, relname, relpages, reltuples FROM pg_class ORDER BY relpages DESC;

    oid    |         relname          | relpages |  reltuples
-----------+--------------------------+----------+-------------
     16996 | CurrentAusClimate        |   474551 | 8.06736e+07
     16983 | ClimateChangeModel40     |   338252 | 5.31055e+07
 157821816 | PK_CurrentAusClimate     |   265628 | 8.06736e+07
 157835995 | idx_climateid            |   176645 | 8.06736e+07
 157835996 | idx_ausposnum            |   176645 | 8.06736e+07
 157835997 | idx_climatevalue         |   176645 | 8.06736e+07
 157821808 | PK_ClimateModelChange_40 |   174858 | 5.31055e+07
 157821788 | IX_iMonth001             |   116280 | 5.31055e+07
 157821787 | IX_ClimateId             |   116280 | 5.31055e+07
 157821786 | IX_AusPosNumber          |   116280 | 5.31055e+07
     17034 | NeighbourhoodTable       |    54312 | 1.00476e+07
 157821854 | PK_NeighbourhoodTable    |    27552 | 1.00476e+07
 157821801 | IX_NeighbourhoodId       |    22002 | 1.00476e+07
 157821800 | IX_NAusPosNumber         |    22002 | 1.00476e+07
 157821799 | IX_AusPosNumber006       |    22002 | 1.00476e+07
[...]

To test the performance of the database we ran one of the most demanding
queries that exist, with the following embarrassing results:

Query execution time on:
SQL-Server (dual processor xeon)     3min 11sec
PostgreSQL (SMP IBM Linux server)    5min 30sec

Now I have not touched the $PGDATA/postgresql.conf (as I know very little
about memory tuning) and have run VACUUM ANALYZE.

The client understands that they may not match the performance for a single
query, as there is no multithreading. So they asked me to demonstrate the
benefits of Postgresql's multiprocessing capabilities.
To do that I modified the most demanding query to create a second query, and
ran them in parallel:

$ time ./run_test1.sh
$ cat ./run_test1.sh
/usr/bin/time -p psql -f ./q1.sql ausclimate > q1.out 2> q1.time &
/usr/bin/time -p psql -f ./q2.sql ausclimate > q2.out 2> q2.time &

and the time taken is *twice* that for the original. The modification was
minor. The queries do make use of both CPUs:

 2388 postgres  16   0 79640  15m  11m R 80.9  0.2   5:05.81 postmaster
 2389 postgres  16   0 79640  15m  11m R 66.2  0.2   5:04.25 postmaster

But I can't understand why there's no performance improvement; in fact there
seems to be no benefit of multiprocessing at all. Any ideas? I don't know
enough about the locking procedures employed by postgres, but one would think
this shouldn't be an issue with read-only queries.

Please don't hesitate to ask me for more info like the query, the output of
explain, or stats on memory usage. I just wanted to keep this short and
provide more info as the cogs start turning :-)

Thanks
Regards
Shoaib
Re: [PERFORM] When are index scans used over seq scans?
Thanks a lot John for the correct search terms. :-)

The suggestion in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00029.php
to add a constraint that checks (finishtime >= starttime) does not make a
difference for me. Still seq scans are used.

The width solution explained in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00027.php and
http://archives.postgresql.org/pgsql-performance/2005-04/msg00116.php
does make a huge difference when selecting 1 timestamp using a BETWEEN
(2ms vs 2sec), but as soon as I put 2 timestamps in a table and try a join,
everything goes south (7.7sec). I have 10k timestamps in the duration
table. :-(

I'm getting more confused about how the planner decides to use indexes. For
example, if I try:

explain analyze select us.oid from sessions us where '2005-04-10 23:11:00'
between us.starttimetrunc and us.finishtimetrunc;

 Index Scan using sessions_st_ft_idx2 on sessions us  (cost=0.00..18320.73 rows=4765 width=4) (actual time=0.063..2.455 rows=279 loops=1)
   Index Cond: (('2005-04-10 23:11:00'::timestamp without time zone <= finishtimetrunc) AND ('2005-04-10 23:11:00'::timestamp without time zone >= starttimetrunc))
 Total runtime: 2.616 ms

it uses the index! However, if I change the date it does not:

explain analyze select us.oid from sessions us where '2005-04-09 23:11:00'
between us.starttimetrunc and us.finishtimetrunc;

 Seq Scan on sessions us  (cost=0.00..68173.04 rows=41575 width=4) (actual time=553.424..1981.695 rows=64 loops=1)
   Filter: (('2005-04-09 23:11:00'::timestamp without time zone >= starttimetrunc) AND ('2005-04-09 23:11:00'::timestamp without time zone <= finishtimetrunc))
 Total runtime: 1981.802 ms

The times in sessions go from '2005-04-04 00:00:00' to '2005-04-10 23:59:00',
so both are valid times to query for, but April 10th is more towards the end.
A little experimenting shows that if I go earlier than '2005-04-10 13:26:15',
seq scans are being used.
I was thinking this timestamp would have something to do with the
histogram_bounds in pg_stats, but I cannot find a match:

starttimetrunc  | {2005-04-04 00:05:00,2005-04-04 11:49:00,2005-04-04 22:03:00,2005-04-05 10:54:00,2005-04-05 21:08:00,2005-04-06 10:28:00,2005-04-07 01:57:00,2005-04-07 15:55:00,2005-04-08 10:18:00,2005-04-08 17:12:00,2005-04-10 23:57:00}
finishtimetrunc | {2005-04-04 00:05:00.93,2005-04-04 11:53:00.98,2005-04-04 22:35:00.38,2005-04-05 11:13:00.02,2005-04-05 21:31:00.98,2005-04-06 10:45:01,2005-04-07 02:08:08.25,2005-04-07 16:20:00.93,2005-04-08 10:25:00.40,2005-04-08 17:15:00.94,2005-04-11 02:08:19}
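For intuition on how those bounds feed the row estimates: the planner treats the 11 histogram bounds as 10 equal-population buckets and estimates the fraction of rows passing a range condition from where the constant falls among them. A simplified sketch of that idea (the real planner also interpolates linearly inside the bucket, so this is bucket-granularity only):

```python
import bisect
from datetime import datetime

# starttimetrunc histogram bounds quoted above: 11 bounds = 10 buckets.
bounds = [datetime(2005, 4, d, h, m) for d, h, m in [
    (4, 0, 5), (4, 11, 49), (4, 22, 3), (5, 10, 54), (5, 21, 8),
    (6, 10, 28), (7, 1, 57), (7, 15, 55), (8, 10, 18), (8, 17, 12),
    (10, 23, 57),
]]

def frac_below(value):
    """Estimated fraction of rows with starttimetrunc <= value, counting
    only whole buckets below it (the planner interpolates further)."""
    i = bisect.bisect_right(bounds, value)          # bounds <= value
    full_buckets_below = max(0, min(i - 1, len(bounds) - 1))
    return full_buckets_below / (len(bounds) - 1)

# '2005-04-09 23:11' lies in the last bucket: at least 9 of the 10
# buckets of starttimetrunc values sit below it, so the planner expects
# the >= starttimetrunc condition to pass ~90% of the table, which is
# what pushes it toward a seq scan even though only 64 rows match.
print(frac_below(datetime(2005, 4, 9, 23, 11)))     # 0.9
```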
Re: [PERFORM] two queries and dual cpu (perplexed)
On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:
> Now I have not touched the $PGDATA/postgresql.conf (as I know very little
> about memory tuning) and have run VACUUM ANALYZE.

You should really, really bump up shared_buffers, and given you have 8GB of
ram this query would likely benefit from more work_mem.

> and the time taken is *twice* that for the original. The modification was
> minor. The queries do make use of both CPUs:

Is this an IO intensive query? If running both in parallel results in 2x the
run time and you have sufficient cpus, it would (to me) indicate you don't
have enough IO bandwidth to satisfy the query.

Can we see an explain analyze of the query? Could be a bad plan, and a bad
plan will never give good performance.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] two queries and dual cpu (perplexed)
On Thu, 21 Apr 2005, Jeff wrote:
> On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:
>> Now I have not touched the $PGDATA/postgresql.conf (as I know very little
>> about memory tuning) and have run VACUUM ANALYZE.
>
> You should really, really bump up shared_buffers, and given you have 8GB
> of ram this query would likely benefit from more work_mem.

I'd recommend shared_buffers = 10600. It's possible that work_mem in the
hundreds of megabytes might have a good impact, but it's hard to say without
seeing the EXPLAIN ANALYZE output.

Gavin
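For reference, shared_buffers in this era of PostgreSQL is counted in 8 KB pages, so the suggested figure is modest in absolute terms. The conversion:

```python
# shared_buffers is expressed in 8 KB pages (PostgreSQL 8.0-era units).
page_kb = 8
shared_buffers_pages = 10_600

mb = shared_buffers_pages * page_kb / 1024
print(f"shared_buffers = {shared_buffers_pages} pages is about {mb:.1f} MB")  # ~82.8 MB
```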
Re: [PERFORM] two queries and dual cpu (perplexed)
here's explain, sorry about the mess: I can attach it as a text file if you
like.

ausclimate=# explain ANALYZE select count(*) from getfutureausclimate;

 Aggregate  (cost=1069345.85..1069345.85 rows=1 width=0) (actual time=443241.241..443241.242 rows=1 loops=1)
   ->  Subquery Scan getfutureausclimate  (cost=1069345.61..1069345.81 rows=16 width=0) (actual time=411449.034..436165.259 rows=13276368 loops=1)
         ->  Sort  (cost=1069345.61..1069345.65 rows=16 width=58) (actual time=411449.026..426001.199 rows=13276368 loops=1)
               Sort Key: Aus40_DEM.AusPosNumber, CurrentAusClimate.iMonth
               ->  Nested Loop  (cost=2.19..1069345.29 rows=16 width=58) (actual time=135.390..366902.373 rows=13276368 loops=1)
                     ->  Nested Loop  (cost=2.19..1067304.07 rows=44 width=68) (actual time=107.627..186390.137 rows=13276368 loops=1)
                           ->  Nested Loop  (cost=2.19..1067038.94 rows=44 width=52) (actual time=87.255..49743.796 rows=13276368 loops=1)
                                 ->  Nested Loop  (cost=2.19..8.09 rows=1 width=32) (actual time=52.684..52.695 rows=1 loops=1)
                                       ->  Merge Join  (cost=2.19..2.24 rows=1 width=24) (actual time=28.000..28.007 rows=1 loops=1)
                                             Merge Cond: ("outer".ClimateId = "inner".ClimateId)
                                             ->  Sort  (cost=1.17..1.19 rows=7 width=10) (actual time=10.306..10.307 rows=3 loops=1)
                                                   Sort Key: ClimateVariables.ClimateId
                                                   ->  Seq Scan on ClimateVariables  (cost=0.00..1.07 rows=7 width=10) (actual time=10.277..10.286 rows=7 loops=1)
                                             ->  Sort  (cost=1.02..1.02 rows=1 width=14) (actual time=17.679..17.680 rows=1 loops=1)
                                                   Sort Key: GetFutureClimateParameters.ClimateId
                                                   ->  Seq Scan on GetFutureClimateParameters  (cost=0.00..1.01 rows=1 width=14) (actual time=17.669..17.671 rows=1 loops=1)
                                       ->  Index Scan using PK_ScenarioEmissionLevels on ScenarioEmissionLevels  (cost=0.00..5.83 rows=1 width=18) (actual time=24.676..24.679 rows=1 loops=1)
                                             Index Cond: ((ScenarioEmissionLevels.ScenarioId = "outer".ScenarioId) AND (ScenarioEmissionLevels.iYear = "outer".iYear) AND (ScenarioEmissionLevels.LevelId = "outer".LevelId))
                                 ->  Index Scan using IX_ClimateId on ClimateChangeModel40  (cost=0.00..1063711.75 rows=265528 width=20) (actual time=34.564..19435.855 rows=13276368 loops=1)
                                       Index Cond: ("outer".ClimateId = ClimateChangeModel40.ClimateId)
                           ->  Index Scan using PK_Aus40_DEM on Aus40_DEM  (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368)
                                 Index Cond: ("outer".AusPosNumber = Aus40_DEM.AusPosNumber)
                     ->  Index Scan using PK_CurrentAusClimate on CurrentAusClimate  (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368)
                           Index Cond: ((CurrentAusClimate.ClimateId = "outer".ClimateId) AND ("outer".AusPosNumber = CurrentAusClimate.AusPosNum) AND (CurrentAusClimate.iMonth = "outer".iMonth))
 Total runtime: 443983.269 ms
(25 rows)

Shsshh...

> You should really, really bump up shared_buffers and given you have 8GB
> of ram this query would likely benefit from more work_mem.

I actually tried that and there was a decrease in performance. Are
shared_buffers and work_mem the only things I should change to start with?
If so, what's the reasoning?

> Is this an IO intensive query? If running both in parallel results in 2x
> the run time and you have sufficient cpus it would (to me) indicate you
> don't have enough IO bandwidth to satisfy the query.

Yes, I think so too: I am just compiling some io stats... Also will jump on
to irc...

Whoa! thanks all... I am overwhelmed with the help I am getting... I love it!
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
John A Meinel [EMAIL PROTECTED] writes:
> Joel Fradkin wrote:
>> Postgres was on the second run Total query runtime: 17109 ms. Data
>> retrieval runtime: 72188 ms. 331640 rows retrieved.
> How were you measuring data retrieval time?

I suspect he's using pgadmin. We've seen reports before suggesting that
pgadmin can be amazingly slow, eg here
http://archives.postgresql.org/pgsql-performance/2004-10/msg00427.php
where the *actual* data retrieval time as shown by EXPLAIN ANALYZE was under
three seconds, but pgadmin claimed the query runtime was 22 sec and data
retrieval runtime was 72 sec.

The problem is that pgAdmin takes your query results and puts them in a
grid. The grid is not designed to be used that way for large datasets. The
time complexity is not linear and really breaks down around 10k-100k rows,
depending on various factors. pgAdmin users just have to get used to it and
use limit or the filter feature at appropriate times. The ms sql enterprise
manager uses cursors, which have their own set of nasty issues (no mvcc).

In fairness, unless you are running with the \a switch, psql adds a fair
amount of time to the query too.

Joel:
> Postgres was on the second run Total query runtime: 17109 ms. Data
> retrieval runtime: 72188 ms. 331640 rows retrieved.

The "Data retrieval runtime" is time spent by pgAdmin formatting, etc. The
"query runtime" is the actual timing figure you should be concerned with
(you are not comparing apples to apples). I can send you a utility I wrote
in Delphi which adds only a few seconds overhead for a 360k result set. Or,
go into psql, throw the \a switch, and run the query. Or:

psql -A -c "select * from myview where x" > output.txt

it should finish the above in 16-17 sec plus the time to write out the file.

Joel, I have a lot of experience with all three databases you are evaluating,
and you are making a huge mistake switching to mysql. You can make a decent
case for ms sql, but it's quite expensive at your level of play, as you know.
Merlin
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel Fradkin wrote:
> I did think of something similar, just loading the data tables with junk
> records, and I may visit that idea with Josh.
>
> I did just do some comparisons on timing of a plain select * from tbl
> where indexed column = x, and it was considerably slower than both MSSQL
> and MYSQL, so I am still a bit confused. This still might be a
> configuration issue (I ran on my 2gig desktop and the 8 gig Linux box;
> comparisons were all run on the same machines for MSSQL, MYSQL, and
> Postgres. I turned off postgres when running MYSQL and turned off MYSQL
> when running postgres; MSSQL had one of the two running while I tested it).
>
> For the 360,000 records returned, MYSQL did it in 40 seconds first run and
> 17 seconds second run. MSSQL did it in 56 seconds first run and 16 seconds
> second run. Postgres was on the second run Total query runtime: 17109 ms.
> Data retrieval runtime: 72188 ms. 331640 rows retrieved.

Beware! From the data, I can see that you're probably using pgAdmin3. The
time to execute your query including transfer of all data to the client is
17s in this example, while displaying it (i.e. pure GUI and memory alloc
stuff) takes 72s. Execute to a file to avoid this.

Regards,
Andreas
Re: [PERFORM] two queries and dual cpu (perplexed)
here are some i/o stats with the unchanged postgresql.conf. Gonna change it
now and have another go.

[EMAIL PROTECTED] MultiCPU_test]$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free    buff   cache    si  so    bi    bo    in     cs us sy id wa
 0  0  25808 710356 6348860  9720522473291 3 1 0 99 0
 2  0  25808 647636 6348960  103478400  3226 3048 1054  92819 55 19 25  1
 2  0  25808 585684 6349032  109666000  3203 3057 1053  96375 55 19 25  1
 2  0  25808 521940 6349112  116036400  3388 2970 1052  95563 54 19 26  1
 2  0  25808 463636 6349184  121856800  2804 3037 1048  93560 55 19 25  1
 2  0  25808 405460 6349264  127669600  2794 3047 1046  96971 55 19 25  1
 2  0  25808 343956 6349340  133816000  3151 3040 1049  96629 55 20 25  1
 2  0  25808 287252 6349412  139473200  2666 2990 1045  95173 54 20 25  1
 2  0  25808 230804 6349484  145116800  2678 2966 1044  95577 54 19 26  1
 2  0  25808 169428 6349560  151242800  3164 3015 1048  98451 55 19 25  1
 2  0  25808 110484 6349640  157130400  2910 2970 1050  98214 55 20 25  0
 0  0  25808  50260 6349716  163140800  3049 3015 1049  99830 55 20 25  1
 1  0  25808   8512 6349788  167315600  2934 2959 1047  95940 54 19 24  3
 2  1  25808   8768 6349796  167294400  2552 2984 1043  97893 55 19 18  8
 1  1  25808   8384 6349824  167325600  2596 3032 1051  94646 55 19 19  6
 2  1  25808   8960 6349856  167268000  2982 3028 1052  94486 55 20 19  6
 1  1  25808   8960 6349884  167258400  3125 2919 1052  86969 52 20 19  8
 2  0  25808   6196 6349912  167527600  2809 3064 1046  99147 55 20 19  5
 1  1  25808   9216 6349976  167215200  2898 3076 1047  93271 55 19 21  6
 2  0  25808   6580 6349316  166397200  3150 2982 1048  94964 54 22 20  4
 2  0  25808   7692 6349348  167448000  2742 3006 1045  97488 54 21 21  4
 2  1  25808   8232 6346244  167670000  2900 3022 1048  92496 54 20 19  8
 2  0  25808   7104 6346192  167804400  3284 2958 1057  97265 55 20 18  7
 2  0  25808   8488 6346168  167677600  2609 3031 1047  93965 55 19 20  7
 2  1  25808   8680 6346184  167648800  3067 3044 1051  96594 55 19 19  6
 2  0  25808   8576 6346168  167664000  2900 3070 1047  96300 55 19 20  6
 2  1  25808   9152 6346156  167617600  3010 2993 1049  98271 55 20 19  7
 2  0  25808   7040 6346172  167820000  3242 3034 1050  97669 55 20 21  4
 1  1  25808   8900 6346192  167634400  2859 3014 1052  91220 53 19 21  6
 2  1  25808   8512 6346188  167682400  2737 2960 1049 100609 55 20 18  6
 2  0  25808   7204 6346236  167800000  2972 3045 1050  94851 55 19 17  9
 1  0  25808   7116 6346208  167802800  3053 2996 1048  98901 55 19 20  5
 2  1  25808   9180 6346196  167606800  2857 3067 1047 100629 56 21 20  3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free    buff   cache    si  so    bi    bo    in     cs us sy id wa
 3  1  25808   8896 6346172  167650000  3138 3022 1049  97937 55 20 20  5
 2  1  25808   9088 6346188  167621200  2844 3022 1047  97664 55 19 20  5
 1  1  25808   8920 6346248  167628800  3017 3024 1049  99644 55 20 17  7
 1  1  25808   8064 6346116  167716800  2824 3037 1047  99171 55 20 19  5
 2  1  25820   8472 6344336  167859600  2969 2957 1047  96396 54 21 18  7
 2  1  25820   9208 6344300  167788400  3072 3031 1050  95017 54 19 22  5
 1  0  25820   7848 6344328  167914800  3229 3011 1050  97108 55 19 20  5
 2  1  25820   8960 6344348  167804000  2701 2954 1046  98485 54 20 21  5
 2  0  25820   7900 6344368  167924400  2604 2931 1044  97198 54 20 19  7
 2  0  25820   9240 6344424  167789600  2990 3015 1048 102414 56 20 19  5
 2  0  25820   8924 6344436  167808800  3256 2991 1049  96709 55 19 21  5
 1  1  25820   8900 6344456  167820400  2761 3030 1051  96498 55 20 20  5
 2  0  25820   7628    630   167944400  2952 3012 1053  96534 55 20 19  6
 2  0  25820   7080 6344472  167995600  2848 3079 1050  95074 56 19 19  6
 2  0  25820   8928    634   167808000  2985 3021 1049  96806 55 20 18  7
 2  1  25820   7976 6344976  1676892 110 3429 3062 1083  92817 55 19 18  8
 2  0  25820   8096 6345080  167665200  2662 2989 1056  91921 54 19 17 10
 1  0  25820   7424 6345128  167735200  2956 3029 1054  99385 56 19 20  5
 2  0  25820   6664 6345232  167772400  3358 3030 1064  95929 55 19 21  5
 1  0  25820   7268 6345320  167695600  2681 3012 1082  97744 54 20 18  7
 2  0  25820   6944 6345364  167718400  3156 3022 1061  98055 55 19 22  4
 2  0  25820   8668 6345420  167542800  2990 3018 1050  94734 55 19 22  5
 2  1  25820
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
> Why is MYSQL returning 360,000 rows, while Postgres is only returning
> 330,000? This may not be important at all, though. I also assume you are
> selecting from a plain table, not a view.

Yes, a plain table. The difference in rows is that one of the datasets had
Sears data in it. The speed difference is much worse on some of my views,
which is what forced me to start looking at other options.

> I suppose knowing your work_mem, and shared_buffers settings would be
> useful.

I have posted my configs, but will add the Tampa one to the bottom again. My
desktop has:

# - Memory -
shared_buffers = 8000       # min 16, at least max_connections*2, 8KB each
work_mem = 8000#1024        # min 64, size in KB
maintenance_work_mem = 16384    # min 1024, size in KB
#max_stack_depth = 2048     # min 100, size in KB

# - Free Space Map -
max_fsm_pages = 3#2         # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1000    # min 100, ~50 bytes each

# - Planner Cost Constants -
effective_cache_size = 8#1000   # typically 8KB each
random_page_cost = 2        # units are one sequential page fetch cost

> How were you measuring data retrieval time? And how does this compare to
> what you were measuring on the other machines? It might be possible that
> what you are really measuring is just the time it takes psql to load up
> all the data into memory, and then print it out. And since psql defaults
> to measuring entry lengths for each column, this may not be truly
> comparable. It *looks* like it only takes 18s for postgres to get the
> data, but then it is taking 72s to transfer the data to you. That would
> be network latency, or something like that, not database latency. And I
> would say that 18s is very close to 16 or 17 seconds.

This was run on the machine with the database (as were MYSQL and MSSQL). The
PG timing was from PGADMIN, and the 18 secs was the second run; the first
run took the same time to return the data plus 70 secs to do the first part,
like 147 secs all told, compared to the 40 seconds first run of MYSQL and 56
seconds of MSSQL.
MYSQL was done in their query tool; it returns the rows as well, and MSSQL
was done in their query analyzer. All three tools appear to use a similar
approach. Just an FYI: doing an explain analyze of my problem view took much
longer than actually returning the data in MSSQL and MYSQL.

I have done extensive testing with MYSQL (just this table and two of my
problem views). I am not using the transactional version, because I need the
best speed.

> I don't know what commands you were issuing, or how you measured, though.
> You might be using some other interface (like ODBC), which I can't say
> too much about.
>
> John
> =:->

This is the Linux box config.

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload". Some settings, such as listen_address, require
# a postmaster shutdown and restart to take effect.

#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

# data_directory = 'ConfigDir'      # use data in another directory
#data_directory = '/pgdata/data'
# hba_file = 'ConfigDir/pg_hba.conf'    # the host-based authentication file
# ident_file = 'ConfigDir/pg_ident.conf'    # the IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
# external_pid_file = '(none)'      # write an extra pid file

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost'     # what IP interface(s) to listen on;
                                    # defaults to localhost, '*' = any
listen_addresses = '*'
port = 5432
max_connections = 100
# note: increasing max_connections costs about 500
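For scale, the desktop settings quoted earlier in this message are small in absolute terms: shared_buffers is counted in 8 KB pages and work_mem in KB on this release. The conversion:

```python
# Unit conversion for the quoted desktop settings (PostgreSQL 8.0-era
# units: shared_buffers in 8 KB pages, work_mem in KB).
shared_buffers_pages = 8000
work_mem_kb = 8000

shared_buffers_mb = shared_buffers_pages * 8 / 1024   # 62.5 MB
work_mem_mb = work_mem_kb / 1024                      # ~7.8 MB
print(shared_buffers_mb, round(work_mem_mb, 1))
```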
Re: [PERFORM] When are index scans used over seq scans?
Richard van den Berg [EMAIL PROTECTED] writes:
> This is with the default cpu_tuple_cost = 0.01:
>  Nested Loop  (cost=252.80..233010147.16 rows=1035480320 width=98) (actual time=0.369..12672213.137 rows=6171334 loops=1)
>    Join Filter: (("outer".starttimetrunc <= "inner".ts) AND ("outer".finishtimetrunc >= "inner".ts))
>    ->  Seq Scan on sessions us  (cost=0.00..26822.36 rows=924536 width=106) (actual time=0.039..5447.349 rows=924536 loops=1)
>    ->  Materialize  (cost=252.80..353.60 rows=10080 width=8) (actual time=0.000..2.770 rows=10080 loops=924536)
>          ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080 width=8) (actual time=0.019..13.397 rows=10080 loops=1)
>  Total runtime: 12674486.670 ms

Hmm, that *is* showing rather a spectacularly large amount of time in the
join itself: if I did the arithmetic right,

regression=# select 12672213.137 - (5447.349 + 2.770*924536 + 13.397);
   ?column?
--------------
 10105787.671
(1 row)

which is almost 80% of the entire runtime. Which is enormous.

What are those column datatypes exactly? Perhaps you are incurring a
datatype conversion cost? Straight timestamp-vs-timestamp comparison is
fairly cheap, but any sort of conversion will cost dearly.

The planner's model for the time spent in the join itself is

    (cpu_tuple_cost + 2 * cpu_operator_cost) * n_tuples

(the 2 because you have 2 operators in the join condition), so you'd have to
raise one or the other of these parameters to model this situation
accurately. But I have a hard time believing that cpu_tuple_cost is really
as high as 0.1. It seems more likely that cpu_operator_cost is
underestimated, which leads me to question what exactly is happening in
those comparisons.

regards, tom lane
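Tom's cost model can be evaluated directly against this plan. A sketch, assuming n_tuples is the 924536 x 10080 row pairs the nested loop actually pushed through the join filter, and using the stock default cpu_operator_cost of 0.0025:

```python
# Tom's per-tuple join cost model, with default planner parameters.
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025           # PostgreSQL default
n_tuples = 924_536 * 10_080          # row pairs fed through the join filter

modeled_cost = (cpu_tuple_cost + 2 * cpu_operator_cost) * n_tuples
print(f"modeled join cost: {modeled_cost:,.0f}")  # ~139.8M cost units

# Observed: ~10,105,788 ms of join-filter time for those pairs, i.e.
# roughly a microsecond per comparison pair.
observed_ms = 10_105_787.671
print(f"{observed_ms / n_tuples * 1000:.2f} us per tuple pair")
```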
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
FWIW, ODBC has variables to tweak as well: fetch/buffer sizes, and the like.
Maybe one of the ODBC cognoscenti here can chime in more concretely.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joel Fradkin
Sent: Thursday, April 21, 2005 10:36 AM
To: 'Tom Lane'; 'John A Meinel'
Cc: 'Postgresql Performance'
Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

> I suspect he's using pgadmin.

Yup I was, but I did try running on the linux box in psql; it was running to
the screen and took forever because of that. The real issue is that
returning to my app using ODBC is very slow. (I have not tested the ODBC for
MYSQL; MSSQL is ok, though the two proc Dell is running out of steam; it had
been good until this year, when we about doubled our demand by adding Sears
as a client.)

Using ODBC to postgres on some of the views (Josh from Command is having me
do some very specific testing) is timing out with a 10 minute time limit.
These are pages that still respond using MSSQL (this is where production is
using the dual proc and the test is using the 4 proc). I have a tool that
hooks to all three databases, so I can try it with that and see if I get
different responses.

Joel
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
I suspect he's using pgadmin. Yup I was, but I did try running on the linux box in psql, but it was running to the screen and took forever because of that. The real issue is that returning to my app using ODBC is very slow. (Have not tested the ODBC for MySQL; MSSQL is OK. The two-proc Dell is running out of steam, but it has been good until this year, when we about doubled our demand by adding Sears as a client.) Using ODBC to Postgres on some of the views (Josh from Command is having me do some very specific testing) is timing out with a 10 minute time limit. These are pages that still respond using MSSQL (this is where production is using the dual proc and the test is using the 4 proc). I have a tool that hooks to all three databases, so I can try it with that and see if I get different responses. Joel ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] two queries and dual cpu (perplexed)
Is this an IO-intensive query? If running both in parallel results in 2x the run time and you have sufficient cpus, it would (to me) indicate you don't have enough IO bandwidth to satisfy the query. Any tips on how to verify this? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Here is the connect string I am using. It could be horrid as I cut it from ODBC program. Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0 Joel Fradkin -Original Message- From: Mohan, Ross [mailto:[EMAIL PROTECTED] Sent: Thursday, April 21, 2005 9:42 AM To: [EMAIL PROTECTED] Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon FWIW, ODBC has variables to tweak, as well: fetch/buffer sizes, and the like. Maybe one of the ODBC cognoscenti here can chime in more concretely ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel, thanks. A couple of things jump out there for me; not a problem for a routine ODBC connection, but perhaps in the lotsa stuff context of your current explorations, it might be relevant? I am completely shooting from the hip here, but...if it were my goose to cook, I'd be investigating Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=; PWD=;ReadOnly=0;Protocol=6.4; || Protocol? Is this related to version? is the driver way old? FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0; ShowSystemTables=0;ConnSettings=;Fetch=100; || Fetch: great for OLTP, lousy for batch? Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190; || what ARE the datatypes and sizes in your particular case? Debug=0; || a run with Debug=1 probably would spit up something interesting CommLog=0;Optimizer=1; || Optimizer? that's a new one on me Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0 || that's about all I can see, prima facie. I'll be very curious to know if ODBC is any part of your performance equation. HTH, Ross -Original Message- From: Joel Fradkin [mailto:[EMAIL PROTECTED] Sent: Thursday, April 21, 2005 10:54 AM To: Mohan, Ross Cc: [EMAIL PROTECTED]; PostgreSQL Perform Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon Here is the connect string I am using. It could be horrid as I cut it from ODBC program.
Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0 Joel Fradkin -Original Message- From: Mohan, Ross [mailto:[EMAIL PROTECTED] Sent: Thursday, April 21, 2005 9:42 AM To: [EMAIL PROTECTED] Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon FWIW, ODBC has variables to tweak, as well: fetch/buffer sizes, and the like. Maybe one of the ODBC cognoscenti here can chime in more concretely ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] two queries and dual cpu (perplexed)
Just tried it with the following changes: shared_buffers = 10600 work_mem = 102400 enable_seqscan = false still no improvement Ok here's the Plan with the enable_seqscan = false: ausclimate=# explain ANALYZE select count(*) from getfutureausclimate; QUERY PLAN Aggregate (cost=101069350.74..101069350.74 rows=1 width=0) (actual time=461651.787..461651.787 rows=1 loops=1) - Subquery Scan getfutureausclimate (cost=101069350.50..101069350.70 rows=16 width=0) (actual time=426142.382..454571.397 rows=13276368 loops=1) - Sort (cost=101069350.50..101069350.54 rows=16 width=58) (actual time=426142.375..28.278 rows=13276368 loops=1) Sort Key: Aus40_DEM.AusPosNumber, CurrentAusClimate.iMonth - Nested Loop (cost=10001.02..101069350.18 rows=16 width=58) (actual time=72.740..366588.646 rows=13276368 loops=1) - Nested Loop (cost=10001.02..101067308.96 rows=44 width=68) (actual time=35.788..184032.873 rows=13276368 loops=1) - Nested Loop (cost=10001.02..101067043.83 rows=44 width=52) (actual time=35.753..47971.652 rows=13276368 loops=1) - Nested Loop (cost=10001.02..10012.98 rows=1 width=32) (actual time=7.433..7.446 rows=1 loops=1) - Merge Join (cost=10001.02..10007.13 rows=1 width=24) (actual time=7.403..7.412 rows=1 loops=1) Merge Cond: (outer.ClimateId = inner.ClimateId) - Index Scan using PK_ClimateVariables on ClimateVariables (cost=0.00..6.08 rows=7 width=10) (actual time=0.011..0.015 rows=3 loops=1) - Sort (cost=10001.02..10001.03 rows=1 width=14) (actual time=7.374..7.375 rows=1 loops=1) Sort Key: GetFutureClimateParameters.ClimateId - Seq Scan on GetFutureClimateParameters (cost=1.00..10001.01 rows=1 width=14) (actual time=7.361..7.362 rows=1 loops=1) - Index Scan using PK_ScenarioEmissionLevels on ScenarioEmissionLevels (cost=0.00..5.83 rows=1 width=18) (actual time=0.021..0.024 rows=1 loops=1) Index Cond: ((ScenarioEmissionLevels.ScenarioId = outer.ScenarioId) AND (ScenarioEmissionLevels.iYear = outer.iYear) AND (ScenarioEmissionLevels.LevelId = outer.LevelId)) - Index 
Scan using IX_ClimateId on ClimateChangeModel40 (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1) Index Cond: (outer.ClimateId = ClimateChangeModel40.ClimateId) - Index Scan using PK_Aus40_DEM on Aus40_DEM (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368) Index Cond: (outer.AusPosNumber = Aus40_DEM.AusPosNumber) - Index Scan using PK_CurrentAusClimate on CurrentAusClimate (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368) Index Cond: ((CurrentAusClimate.ClimateId = outer.ClimateId) AND (outer.AusPosNumber = CurrentAusClimate.AusPosNum) AND (CurrentAusClimate.iMonth = outer.iMonth)) Total runtime: 462218.120 ms (23 rows) On Thu, 21 Apr 2005, Russell Smith wrote: On Thu, 21 Apr 2005 10:44 pm, Shoaib Burq (VPAC) wrote: - Nested Loop (cost=2.19..1069345.29 rows=16 width=58) (actual time=135.390..366902.373 rows=13276368 loops=1) - Nested Loop (cost=2.19..1067304.07 rows=44 width=68) (actual time=107.627..186390.137 rows=13276368 loops=1) - Nested Loop (cost=2.19..1067038.94 rows=44 width=52) (actual time=87.255..49743.796 rows=13276368 loops=1) OUCH, OUCH, OUCH. Most if not all of the time is going on nested loop joins. The tuple estimates are off by a factor of 10^6, which means it's choosing the wrong join type. You could set enable_seqscan to OFF to test what the performance is like with a different plan, and then set it back on. However, you really need to get the row count estimates up to something comparable, within a factor of 10 at least. A number of the other row estimates seem to be off by
Re: [PERFORM] two queries and dual cpu (perplexed)
Shoaib Burq (VPAC) wrote: Just tried it with the following changes: shared_buffers = 10600 work_mem = 102400 enable_seqscan = false still no improvement Ok here's the Plan with the enable_seqscan = false: ausclimate=# explain ANALYZE select count(*) from getfutureausclimate; Actually, you probably don't want enable_seqscan=off, you should try: SET enable_nestloop TO off. The problem is that it is estimating there will only be 44 rows, but in reality there are 13M rows. It almost definitely should be doing a seqscan with a sort and merge join. Also, please attach your explain analyzes; the wrapping is really hard to read. I don't understand how postgres could get the number of rows that wrong. It seems to be misestimating the number of entries in IX_ClimateId Here: - Index Scan using PK_Aus40_DEM on Aus40_DEM (cost=0.00..6.01 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=13276368) Index Cond: (outer.AusPosNumber = Aus40_DEM.AusPosNumber) - Index Scan using PK_CurrentAusClimate on CurrentAusClimate (cost=0.00..46.20 rows=11 width=14) (actual time=0.007..0.009 rows=1 loops=13276368) The first index scan is costing you 0.006*13276368=79s, and the second one is 119s. I can't figure out exactly what is where from the formatting, but the query that seems misestimated is: - Index Scan using IX_ClimateId on ClimateChangeModel40 (cost=0.00..1063711.75 rows=265528 width=20) (actual time=28.311..17212.703 rows=13276368 loops=1) Index Cond: (outer.ClimateId = ClimateChangeModel40.ClimateId) Is there an unexpected correlation between ClimateChangeModel40.ClimateId and whatever outer is at this point? John =:-
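John's suggestion can be tried per-session without touching postgresql.conf; a minimal sketch:

```sql
SET enable_nestloop TO off;   -- discourage nested-loop joins for this session only
EXPLAIN ANALYZE SELECT count(*) FROM getfutureausclimate;
RESET enable_nestloop;        -- restore the default afterwards
```

If the merge- or hash-join plan this forces is dramatically faster, the real fix is improving the row estimates (e.g. raising the statistics targets) rather than leaving the planner switch off.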
Re: [PERFORM] When are index scans used over seq scans?
John A Meinel wrote: You might try doing: ALTER TABLE us ALTER COLUMN starttimetrunc SET STATISTICS 200; ALTER TABLE us ALTER COLUMN finishtimetrunc SET STATISTICS 200; VACUUM ANALYZE us; I've been looking into that. While increasing the statistics makes the planner use the index for simple selects, it still does not for joins. Another thing that threw me off is that after a vacuum analyze a select * from us where 'x' between start and finish uses seq scans, while after just an analyze it uses the index! I thought both statements were supposed to update the statistics in the same way? (This is with 7.4.7.) You have 2 tables, a duration, and a from-to table, right? How many rows in each? Duration: 10k Sessions: 1M Anyway, you can play around with it by using stuff like: SET enable_seqscan TO off; This doesn't help much. Instead of turning seqscans off, this setting increases their cost by 100M. Since my query already has a cost of about 400M-800M this doesn't matter much. For now, the only reliable way of forcing the use of the index is to set cpu_tuple_cost = 1. -- Richard van den Berg, CISSP --- Trust Factory B.V. | www.dna-portal.net Bazarstraat 44a| www.trust-factory.com 2518AK The Hague | Phone: +31 70 3620684 The Netherlands| Fax : +31 70 3603009 --- ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] When are index scans used over seq scans?
Tom Lane wrote: which is almost 80% of the entire runtime. Which is enormous. What are those column datatypes exactly? Table richard.sessions Column |Type | Modifiers +-+--- [unrelated columns removed] starttimetrunc | timestamp without time zone | finishtimetrunc| timestamp without time zone | Indexes: rb_us_st_ft_idx btree (starttimetrunc, finishtimetrunc) rb_us_st_ft_idx2 btree (finishtimetrunc, starttimetrunc) Check constraints: date_check CHECK (finishtimetrunc = starttimetrunc) Table richard.duration Column |Type | Modifiers +-+--- ts | timestamp without time zone | Perhaps you are incurring a datatype conversion cost? Not that I can tell. It seems more likely that the cpu_operator_cost is underestimated, As you predicted, increasing cpu_operator_cost from 0.0025 to 0.025 also causes the planner to use the index on duration. which leads me to question what exactly is happening in those comparisons. Your guess is as good as mine (actually, yours is much better). I can put together a reproducible test case if you like.. -- Richard van den Berg, CISSP --- Trust Factory B.V. | www.dna-portal.net Bazarstraat 44a| www.trust-factory.com 2518AK The Hague | Phone: +31 70 3620684 The Netherlands| Fax : +31 70 3603009 --- Have you visited our new DNA Portal? --- ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] When are index scans used over seq scans?
Richard van den Berg [EMAIL PROTECTED] writes: Tom Lane wrote: Perhaps you are incurring a datatype conversion cost? Not that I can tell. No, apparently not. Hmm ... timestamp_cmp_internal is just a couple of isnan() checks and one or two floating-point compares. Should be pretty dang cheap. Unless isnan() is ridiculously expensive on your hardware? More likely there is some bottleneck that we are not thinking of. Are the tables in question particularly wide (many columns)? which leads me to question what exactly is happening in those comparisons. Your guess is as good as mine (actually, yours is much better). I can put together a reproducable test case if you like.. I'm thinking it would be interesting to look at a gprof profile of the nestloop case. If you can rebuild with profiling and get one, that would be fine, or you can make up a test case that shows the same slow joining behavior. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[PERFORM] Index bloat problem?
All, Running PostgreSQL 7.4.2, Solaris. Client is reporting that the size of an index is greater than the number of rows in the table (1.9 million vs. 1.5 million). Index was automatically created from a 'bigserial unique' column. Database contains several tables with exactly the same columns (including 'bigserial unique' column). This is the only table where this index is out of line with the actual # of rows. Queries on this table take 40 seconds to retrieve 2000 rows as opposed to 1-2 seconds on the other tables. We have been running 'VACUUM ANALYZE' very regularly. In fact, our vacuum schedule has probably been overkill. We have been running on a per-table basis after every update (many per day, only inserts occurring) and after every purge (one per day, deleting a day's worth of data). It is theoretically possible that at some time a process was run that deleted all rows in the table followed by a VACUUM FULL. In this case we would have dropped/recreated our own indexes on the table but not the index automatically created for the bigserial column. If that happened, could that cause these symptoms? What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? thanks, Bill __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
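One way to see whether the index itself has bloated is to compare pages and tuples in pg_class after a plain VACUUM. The names below are placeholders for Bill's table and its automatically created bigserial-unique index (PostgreSQL names such indexes tablename_columnname_key by default):

```sql
VACUUM ANALYZE my_events;
-- relpages/reltuples are updated by VACUUM; an index whose relpages keeps
-- growing while reltuples stays flat has accumulated dead space
SELECT relname, relkind, relpages, reltuples
FROM pg_class
WHERE relname IN ('my_events', 'my_events_id_key');
```

Comparing the index's relpages against the other, same-shaped tables' indexes would show quickly whether this one is the outlier.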
Re: [PERFORM] Index bloat problem?
Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should know. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How can an index be larger than a table
David, What also seems weird to me is that the control table has some unique indexes created on it, but the data_update_events table just has a unique constraint. Will postgres use an index in the background to enforce this constraint? If you somehow have a unique constraint without a unique index, something is seriously broken. I suspect hacking of system tables. Otherwise, it sounds like you have index bloat due to mass deletions. Run REINDEX, or, preferably, VACUUM FULL and then REINDEX. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Index bloat problem?
--- Josh Berkus josh@agliodbs.com wrote: Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should know. -- Josh Berkus Aglio Database Solutions San Francisco Thank you. Though I must say, that is very discouraging. REINDEX is a costly operation, timewise, and due to the fact that it locks out other processes from proceeding. Updates are constantly coming in and queries are occurring continuously. A REINDEX could potentially bring the whole thing to a halt. Honestly, this seems like an inordinate amount of babysitting for a production application. I'm not sure if the client will be willing to accept it. Admittedly my knowledge of the inner workings of an RDBMS is limited, but could somebody explain to me why this would be so? If you delete a bunch of rows why doesn't the index get updated at the same time? Is this a common issue among all RDBMSs or is it something that is PostgreSQL specific? Is there any way around it? thanks, Bill __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Index bloat problem?
Bill, Honestly, this seems like an inordinate amount of babysitting for a production application. I'm not sure if the client will be willing to accept it. Well, then, tell them not to delete 75% of the rows in a table at once. I imagine that operation brought processing to a halt, too. If the client isn't willing to accept the consequences of their own bad data management, I'm not really sure what you expect us to do about it. Admittedly my knowledge of the inner workings of an RDBMS is limited, but could somebody explain to me why this would be so? If you delete a bunch of rows why doesn't the index get updated at the same time? It does get updated. What doesn't happen is the space getting reclaimed. In a *normal* data situation, the dead nodes are recycled for new rows. But doing a massive delete operation upsets that, and generally needs to be followed by a REINDEX. Is this a common issue among all RDBMSs or is it something that is PostgreSQL specific? Speaking from experience, this sort of thing affects MSSQL as well, although the maintenance routines are different. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Index bloat problem?
Bill, Honestly, this seems like an inordinate amount of babysitting for a production application. I'm not sure if the client will be willing to accept it. Well, then, tell them not to delete 75% of the rows in a table at once. I imagine that operation brought processing to a halt, too. Admittedly my knowledge of the inner workings of an RDBMS is limited, but could somebody explain to me why this would be so? If you delete a bunch of rows why doesn't the index get updated at the same time? It does get updated. What doesn't happen is the space getting reclaimed. In a *normal* data situation, those dead nodes would be replaced with new index nodes. However, a mass-delete-in-one-go messes that system up. Is this a common issue among all RDBMSs or is it something that is PostgreSQL specific? Speaking from experience, this sort of thing affects MSSQL as well, although the maintenance routines are different. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Index bloat problem?
Is: REINDEX DATABASE blah supposed to rebuild all indices in the database, or must you specify each table individually? (I'm asking because I just tried it and it only did system tables) Alex Turner netEconomist On 4/21/05, Josh Berkus josh@agliodbs.com wrote: Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should know. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index bloat problem?
Alex, REINDEX DATABASE blah supposed to rebuild all indices in the database, or must you specify each table individualy? (I'm asking because I just tried it and it only did system tables) DATABASE Recreate all system indexes of a specified database. Indexes on user tables are not processed. Also, indexes on shared system catalogs are skipped except in stand-alone mode (see below). http://www.postgresql.org/docs/8.0/static/sql-reindex.html -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Index bloat problem?
josh@agliodbs.com (Josh Berkus) writes: Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should know. Based on Tom's recent comments, I'd be inclined to handle this via doing a CLUSTER, which has the triple heroism effect of: a) Reorganizing the entire table to conform with the relevant index order, b) Having the effect of VACUUM FULL, and c) Having the effect of REINDEX all in one command. It has all of the oops, that blocked me for 20 minutes effect of REINDEX and VACUUM FULL, but at least it doesn't have the effect twice... -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror "And he must be taken alive!" The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
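The CLUSTER route described above looks like this on the 7.4/8.0 releases discussed in this thread (table and index names are placeholders). Note that CLUSTER in these versions does not refresh planner statistics, so an ANALYZE afterwards is wise:

```sql
-- rewrite the table in index order; also compacts it and rebuilds its
-- indexes, taking an exclusive lock for the duration
CLUSTER my_events_pkey ON my_events;
ANALYZE my_events;   -- CLUSTER of this era did not update statistics
```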
Re: [PERFORM] Index bloat problem?
Is this a common issue among all RDBMSs or is it something that is PostgreSQL specific? Speaking from experience, this sort of thing affects MSSQL as well, although the maintenance routines are different. Yes, this is true with MSSQL too, however sql server implements a defrag index that doesn't lock up the table.. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_dbcc_30o9.asp DBCC INDEXDEFRAG can defragment clustered and nonclustered indexes on tables and views. DBCC INDEXDEFRAG defragments the leaf level of an index so that the physical order of the pages matches the left-to-right logical order of the leaf nodes, thus improving index-scanning performance. Every five minutes, DBCC INDEXDEFRAG will report to the user an estimated percentage completed. DBCC INDEXDEFRAG can be terminated at any point in the process, and *any completed work is retained.* -michael ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index bloat problem?
Michael, Every five minutes, DBCC INDEXDEFRAG will report to the user an estimated percentage completed. DBCC INDEXDEFRAG can be terminated at any point in the process, and *any completed work is retained.* Keen. Sounds like something for our TODO list. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?
More info on what is bloating: It's only in one database (the one that's most used), and after running oid2name on the bloated files, the result is (mysteriously) empty. Here's the run on the three enormous files: $ /usr/local/bin/oid2name -d smt -o 160779 From database smt: Filenode Table Name -- $ /usr/local/bin/oid2name -d smt -o 65782869 From database smt: Filenode Table Name -- $ /usr/local/bin/oid2name -d smt -o 83345634 From database smt: Filenode Table Name -- The file list looks like this (with normal sized files mostly removed): 1.0G ./106779 1.0G ./106779.1 1.0G ./106779.2 1.0G ./106779.3 978M ./106779.4 1.0G ./65782869 248M ./65782869.1 0B ./65782871 8.0K ./65782873 780M ./83345634 0B ./83345636 8.0K ./83345638 So does the empty result mean it's a temporary table? There is one temporary table (in the function previously mentioned) that does get created and dropped with some regularity. Thanks again, Richard On Apr 20, 2005, at 2:06 PM, Richard Plotkin wrote: Hi Tom, Q: what have you got the FSM parameters set to? Here's from postgresql.conf -- FSM at default settings. # - Memory - shared_buffers = 30400 # min 16, at least max_connections*2, 8KB each work_mem = 32168 # min 64, size in KB #maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB # - Free Space Map - #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~50 bytes each # - Kernel Resource Usage - max_files_per_process = 750 #1000 # min 25 #preload_libraries = '' Q: what exactly is bloating? Without knowing which tables or indexes are growing, it's hard to speculate about the exact causes. Use du and oid2name, or look at pg_class.relpages after a plain VACUUM. This I do not know. I've disabled the cron jobs and will let the system bloat, then I will gather statistics (I'll give it 12-24 hours).
It's likely that the real answer is you need to vacuum more often than every six hours, but I'm trying not to jump to conclusions. That could be it, except that I would expect the problem to then look more like a gradual increase in CPU usage and a gradual increase in use of disk space. Mine could be an invalid assumption, but the system here looks like it goes from no problem to 100% problem within a minute. Thanks again! Richard ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
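To check whether the default free space map is simply too small for this database, the settings can be compared against what a database-wide VACUUM VERBOSE reports at the end of its output on releases of this era; a sketch:

```sql
SHOW max_fsm_pages;      -- 20000 by default on 7.4/8.0
SHOW max_fsm_relations;  -- 1000 by default
-- the trailing INFO lines of a database-wide verbose vacuum summarize how
-- much free space actually needs tracking; if that exceeds the settings
-- above, dead space is being lost and files will keep growing
VACUUM VERBOSE;
```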
[PERFORM] index not used
Hi folks, I'm doing a simple lookup in a small table by a unique id, and I'm wondering why explain tells me a seqscan is used instead of the key. The table looks like: id bigint primary key, a varchar, b varchar, c varchar and I'm querying: select * from foo where id = 2; I've got only 15 records in this table, but I want it to be as fast as possible since it's used (as a map between IDs and names) for larger queries. thx -- - Enrico Weigelt== metux IT service phone: +49 36207 519931 www: http://www.metux.de/ fax: +49 36207 519932 email: [EMAIL PROTECTED] cellphone: +49 174 7066481 - -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops -- - ---(end of broadcast)--- TIP 8: explain analyze is your friend
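With only 15 rows a seq scan is genuinely the cheapest plan, so the behaviour above is expected. But there is also a version-specific trap worth noting: before PostgreSQL 8.0 the planner would not consider a bigint index for a bare integer literal at all, because the constant is typed int4 and the datatypes don't match (see TIP 9 elsewhere in this digest). If the index should ever be usable, the constant must match the column type:

```sql
-- pre-8.0: "WHERE id = 2" cannot use the bigint primary key index;
-- cast the literal, or quote it so the type is resolved late:
SELECT * FROM foo WHERE id = 2::bigint;
SELECT * FROM foo WHERE id = '2';
```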
Re: [PERFORM] Index bloat problem?
--- [EMAIL PROTECTED] wrote: I gather you mean, out-of-the-ordinary for most apps, but not for this client? Actually, no. The normal activity is to delete 3-5% of the rows per day, followed by a VACUUM ANALYZE. Then over the course of the day (in multiple transactions) about the same amount are INSERTed (each transaction followed by a VACUUM ANALYZE on just the updated table). So 75% deletion is just out of the ordinary for this app. However, on occasion, deleting 75% of rows is a legitimate action for the client to take. It would be nice if they didn't have to remember to do things like REINDEX or CLUSTER or whatever on just those occasions. In case nobody else has asked: is your max_fsm_pages big enough to handle all the deleted pages, across ALL tables hit by the purge? If not, you're haemorrhaging pages, and VACUUM is probably warning you about exactly that. This parameter is most likely set incorrectly. So that could be causing problems. Could that be a culprit for the index bloat, though? If that's not a problem, you might want to consider partitioning the data. Take a look at inherited tables. For me, they're a good approximation of clustered indexes (sigh, miss 'em) and equivalent to table spaces. My app is in a similar boat to yours: up to 1/3 of a 10M-row table goes away every day. For each of the child tables that is a candidate to be dropped, there is a big prologue txn, which moves (INSERT then DELETE) the good rows into a child table that is NOT to be dropped. Then BANG pull the plug on the tables you don't want. MUCH faster than DELETE: the dropped tables' files' disk space goes away in one shot, too. Just my 2c. Thanks. Bill __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
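The inherited-table purge pattern described in that message, as a sketch with hypothetical names (the event_id cutoff is taken from the query earlier in this thread):

```sql
CREATE TABLE events (event_id bigint NOT NULL, payload text);
CREATE TABLE events_old  () INHERITS (events);  -- child slated for dropping
CREATE TABLE events_keep () INHERITS (events);  -- surviving child
-- prologue transaction: move the good rows out of the doomed child
BEGIN;
INSERT INTO events_keep SELECT * FROM events_old WHERE event_id > 32128893;
DELETE FROM events_old WHERE event_id > 32128893;
COMMIT;
-- then pull the plug: the child's files (and their disk space) go away
-- in one shot, with no dead tuples or index bloat left behind
DROP TABLE events_old;
```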
Re: [PERFORM] Index bloat problem?
Same thing happens in Oracle. ALTER INDEX blah REBUILD to force a rebuild. It will mark the free blocks as 'free' below the PCTFREE value for the tablespace. Basically, if you build an index with entries, and each entry is 1/4 of a block, the database will write 2500 blocks to the disk. If you delete a random 75% of the index values, you will now have 2500 blocks that have 75% free space. The database will reuse that free space in those blocks as you insert new values, but until then, you still have 2500 blocks worth of data on a disk that is only 25% full. Rebuilding the index forces the system to physically re-allocate all that data space, and now you have just 2499 entries, which use 625 blocks. I'm not sure that 'blocks' is the correct term in postgres, it's segments in Oracle, but the concept remains the same. Alex Turner netEconomist On 4/21/05, Bill Chandler [EMAIL PROTECTED] wrote: --- Josh Berkus josh@agliodbs.com wrote: Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should know. -- Josh Berkus Aglio Database Solutions San Francisco Thank you. Though I must say, that is very discouraging. REINDEX is a costly operation, timewise, and due to the fact that it locks out other processes from proceeding. Updates are constantly coming in and queries are occurring continuously. A REINDEX could potentially bring the whole thing to a halt. Honestly, this seems like an inordinate amount of babysitting for a production application. I'm not sure if the client will be willing to accept it. Admittedly my knowledge of the inner workings of an RDBMS is limited, but could somebody explain to me why this would be so? If you delete a bunch of rows why doesn't the index get updated at the same time?
Is this a common issue among all RDBMSs or is it something that is PostgreSQL specific? Is there any way around it? thanks, Bill __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] immutable functions vs. join for lookups ?
* Jaime Casanova [EMAIL PROTECTED] wrote: snip Even if your data never changes it *can* change, so the function should be at most stable, not immutable. okay, the planner sees that the table could potentially change. but - as the dba - I'd like to tell him, this table *never* changes in practice (or at most there will be an insert once a year) isn't there any way to force the function to be really immutable ? cu -- - Enrico Weigelt== metux IT service phone: +49 36207 519931 www: http://www.metux.de/ fax: +49 36207 519932 email: [EMAIL PROTECTED] cellphone: +49 174 7066481 - -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops -- - ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
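Concretely, STABLE is the strongest volatility marking that is actually safe for a lookup against a table that *can* change, however rarely. A minimal sketch, with hypothetical table and function names:

```sql
-- Hypothetical lookup table: maps IDs to names, updated perhaps once a year.
CREATE TABLE id_name_map (
    id   bigint PRIMARY KEY,
    name varchar NOT NULL
);

-- STABLE promises the result cannot change within a single statement,
-- which is all the planner may safely assume for a table lookup.
-- IMMUTABLE would let the planner fold calls to constants at plan
-- time; a prepared plan would then silently return stale names after
-- the table's rare insert.
CREATE FUNCTION lookup_name(bigint) RETURNS varchar AS
    'SELECT name FROM id_name_map WHERE id = $1'
LANGUAGE sql STABLE;
```

If the data truly never changed, IMMUTABLE would be legal, but there is no supported way to make the planner treat a table-reading function as immutable *and* stay correct across updates.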
Re: [PERFORM] immutable functions vs. join for lookups ?
* Tom Lane [EMAIL PROTECTED] wrote: snip Yeah, I was actually thinking about a two-step process: inline the function to produce something equivalent to a handwritten scalar sub-SELECT, and then try to convert sub-SELECTs into joins. ... back to my original question ... What kind of query should I use ? Is a join better than a function ? cu -- - Enrico Weigelt== metux IT service phone: +49 36207 519931 www: http://www.metux.de/ fax: +49 36207 519932 email: [EMAIL PROTECTED] cellphone: +49 174 7066481 - -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops -- - ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
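For comparison, here are the two forms of the lookup side by side (table, column, and function names are hypothetical):

```sql
-- Function form: the planner treats lookup_name() as a black box and
-- effectively runs one sub-lookup per output row.
SELECT s.session_id, lookup_name(s.user_id) AS user_name
FROM sessions s;

-- Join form: the planner sees both tables and their statistics, so it
-- can choose a hash or merge join, reorder the join, or use an index.
SELECT s.session_id, m.name AS user_name
FROM sessions s
JOIN id_name_map m ON m.id = s.user_id;
```

In general the explicit join is at least as fast as the function call, and often faster, precisely because the planner can optimize it as a join rather than executing an opaque function once per row.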
Re: [PERFORM] Index bloat problem?
On Thu, Apr 21, 2005 at 11:28:43AM -0700, Josh Berkus wrote: Michael, Every five minutes, DBCC INDEXDEFRAG will report to the user an estimated percentage completed. DBCC INDEXDEFRAG can be terminated at any point in the process, and *any completed work is retained.* Keen. Sounds like something for our TODO list. -- Josh Berkus Aglio Database Solutions San Francisco See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php for my thoughts on a non-blocking alternative to REINDEX. I got no replies to that message. :-( I've almost got a working solution integrated in the backend that does correct WAL logging and everything. (Writing the code to write and replay WAL logs for complicated operations can be very annoying!) For now I've gone with a syntax of: REINDEX INDEX btree_index_name INCREMENTAL; (For now it's not a proper index AM (access method); instead, the generic index code knows this is only supported for btrees and directly calls the btree_compress function.) It's not actually a REINDEX per se in that it doesn't rebuild the whole index. It holds brief exclusive locks on the index while it shuffles items around to pack the leaf pages fuller. There were issues with the code I attached to the above message that have been resolved with the new code. With respect to the numbers provided in that e-mail the new code also recycles more pages than before. Once I've finished it up I'll prepare and post a patch. -- Dave Chapeskie OpenPGP Key ID: 0x3D2B6B34 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] index not used
If id is PK, the query should return 1 row only... --- Enrico Weigelt [EMAIL PROTECTED] wrote: Hi folks, I'm doing a simple lookup in a small table by a unique id, and I'm wondering why explain tells me a seqscan is used instead of the key. The table looks like: id bigint primary key, a varchar, b varchar, c varchar and I'm querying: select * from foo where id = 2; I've got only 15 records in this table, but I wanna have it as fast as possible since it's used (as a map between IDs and names) for larger queries. thx -- - Enrico Weigelt== metux IT service phone: +49 36207 519931 www: http://www.metux.de/ fax: +49 36207 519932 email: [EMAIL PROTECTED] cellphone: +49 174 7066481 - -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops -- - ---(end of broadcast)--- TIP 8: explain analyze is your friend __ Do you Yahoo!? Yahoo! Small Business - Try our new resources site! http://smallbusiness.yahoo.com/resources/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?
On Thu, Apr 21, 2005 at 11:38:22AM -0700, Richard Plotkin wrote: More info on what is bloating: It's only in one database (the one that's most used), and after running oid2name on the bloated files, the result is (mysteriously) empty. Here's the run on the three enormous files: $ /usr/local/bin/oid2name -d smt -o 160779 From database smt: Filenode Table Name -- Try -f instead of -o ... -- Alvaro Herrera ([EMAIL PROTECTED]) World domination is proceeding according to plan(Andrew Morton) ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?
That returned the same result. I also tried oid2name -d smt -x -i -S and, separately -s, and also separately, -d with all other databases, and none of the databases turned up any listing, in either oid or filenode, for any of these three bloated files. One thing I've noticed is that these oids are all extremely large numbers, whereas the rest of the oids in /data/base/* are no higher than 4 or 5. On Apr 21, 2005, at 1:46 PM, Alvaro Herrera wrote: On Thu, Apr 21, 2005 at 11:38:22AM -0700, Richard Plotkin wrote: More info on what is bloating: It's only in one database (the one that's most used), and after running oid2name on the bloated files, the result is (mysteriously) empty. Here's the run on the three enormous files: $ /usr/local/bin/oid2name -d smt -o 160779 From database smt: Filenode Table Name -- Try -f instead of -o ... -- Alvaro Herrera ([EMAIL PROTECTED]) World domination is proceeding according to plan(Andrew Morton) ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] index not used
On Thu, 21 Apr 2005, Enrico Weigelt wrote: I'm doing a simple lookup in a small table by a unique id, and I'm wondering why explain tells me a seqscan is used instead of the key. The table looks like: id bigint primary key, a varchar, b varchar, c varchar and I'm querying: select * from foo where id = 2; I've got only 15 records in this table, but I wanna have it as fast as possible since it's used (as a map between IDs and names) for larger queries. Two general things: For 15 records, an index scan may not be faster. For simple tests you can play with enable_seqscan to see, but for more complicated queries it's a little harder to tell. If you're using a version earlier than 8.0, you'll need to quote or cast the value you're searching for due to problems with cross-type comparisons (the 2 would be treated as int4). ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
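Spelled out, the cross-type issue on pre-8.0 servers looks like this (using the `foo` table from the original question):

```sql
-- Before 8.0: the bare literal 2 is typed int4, and an int8 column
-- compared against int4 could not use the bigint primary key index,
-- so the planner falls back to a seq scan.
SELECT * FROM foo WHERE id = 2;

-- Either of these makes the comparison bigint-to-bigint, so the
-- primary key index is at least considered:
SELECT * FROM foo WHERE id = 2::bigint;
SELECT * FROM foo WHERE id = '2';
```

Whether the planner then actually picks the index over a seq scan on a 15-row table is a separate cost question, as noted above.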
Re: [PERFORM] Index bloat problem?
Quoting Bill Chandler [EMAIL PROTECTED]: ... The normal activity is to delete 3-5% of the rows per day, followed by a VACUUM ANALYZE. ... However, on occasion, deleting 75% of rows is a legitimate action for the client to take. In case nobody else has asked: is your max_fsm_pages big enough to handle all the deleted pages, across ALL tables hit by the purge? This parameter is most likely set incorrectly. So that could be causing problems. Could that be a culprit for the index bloat, though? Look at the last few lines of vacuum verbose output. It will say something like: free space map: 55 relations, 88416 pages stored; 89184 total pages needed Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared memory. 1000000 here is [max_fsm_pages] from my postgresql.conf. If the total pages needed is bigger than the pages fsm is allocated for, then you are bleeding. -- Dreams come true, not free. -- S.Sondheim, ITW ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
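The check described above, spelled out as a repeatable procedure (the letters below stand for the figures in your own output, which will differ):

```sql
-- Run database-wide as a superuser so the totals cover ALL tables:
VACUUM VERBOSE;

-- The final lines of the output report two figures to compare:
--   free space map: N relations, P pages stored; T total pages needed
--   Allocated FSM size: R relations + M pages = ... shared memory
--
-- M is your max_fsm_pages setting. If T (total pages needed) exceeds
-- M, free space from deleted rows is being lost between vacuums.
-- Raise max_fsm_pages in postgresql.conf (a server restart is
-- required in this era), e.g.:
--   max_fsm_pages = 1000000   -- hypothetical value; size it above T
```

Re-running VACUUM VERBOSE after the change confirms whether the allocated FSM now covers the pages needed.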
Re: [PERFORM] Index bloat problem?
Mischa, Thanks. Yes, I understand that not having a large enough max_fsm_pages is a problem and I think that it is most likely the case for the client. What I wasn't sure of was if the index bloat we're seeing is the result of the bleeding you're talking about or something else. If I deleted 75% of the rows but had a max_fsm_pages setting that still exceeded the pages required (as indicated in VACUUM output), would that solve my indexing problem or would I still need to REINDEX after such a purge? regards, Bill --- Mischa Sandberg [EMAIL PROTECTED] wrote: Quoting Bill Chandler [EMAIL PROTECTED]: ... The normal activity is to delete 3-5% of the rows per day, followed by a VACUUM ANALYZE. ... However, on occasion, deleting 75% of rows is a legitimate action for the client to take. In case nobody else has asked: is your max_fsm_pages big enough to handle all the deleted pages, across ALL tables hit by the purge? This parameter is most likely set incorrectly. So that could be causing problems. Could that be a culprit for the index bloat, though? Look at the last few lines of vacuum verbose output. It will say something like: free space map: 55 relations, 88416 pages stored; 89184 total pages needed Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared memory. 1000000 here is [max_fsm_pages] from my postgresql.conf. If the total pages needed is bigger than the pages fsm is allocated for, then you are bleeding. -- Dreams come true, not free. -- S.Sondheim, ITW __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index bloat problem?
Bill, If I deleted 75% of the rows but had a max_fsm_pages setting that still exceeded the pages required (as indicated in VACUUM output), would that solve my indexing problem or would I still need to REINDEX after such a purge? Depends on the performance you're expecting. The FSM relates to the re-use of nodes, not to giving back free space. So after you've deleted 75% of rows, the index wouldn't shrink. It just wouldn't grow when you start adding rows. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index bloat problem?
Bill Chandler wrote: Mischa, Thanks. Yes, I understand that not having a large enough max_fsm_pages is a problem and I think that it is most likely the case for the client. What I wasn't sure of was if the index bloat we're seeing is the result of the bleeding you're talking about or something else. If I deleted 75% of the rows but had a max_fsm_pages setting that still exceeded the pages required (as indicated in VACUUM output), would that solve my indexing problem or would I still need to REINDEX after such a purge? regards, Bill I don't believe VACUUM re-packs indexes. It just removes empty index pages. So if you have 1000 index pages all with 1 entry in them, vacuum cannot reclaim any pages. REINDEX re-packs the pages to 90% full. fsm just needs to hold enough pages that all requests have free space that can be used before your next vacuum. It is just a map letting postgres know where space is available for a new fill. John =:-
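Putting the two points above together, a minimal sketch of the maintenance sequence after a mass delete (table name hypothetical); note that REINDEX takes an exclusive lock for its duration, so it belongs in a quiet window:

```sql
-- Remove dead tuples and record the freed heap space in the FSM so it
-- can be re-used by later inserts; this does NOT re-pack index pages.
VACUUM ANALYZE big_table;

-- Rebuild the table's indexes from scratch, packing leaf pages ~90%
-- full and returning the bloated space to the filesystem. This locks
-- out concurrent writers until it completes.
REINDEX TABLE big_table;
```

For the routine 3-5%-per-day deletes, the VACUUM ANALYZE alone suffices as long as max_fsm_pages covers the pages freed; the REINDEX is only needed after the occasional 75% purge.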
[PERFORM] foreign key performance
Hi folks, do foreign keys have any influence on performance (besides slowing down huge inserts) ? do they bring any performance improvement ? thx -- - Enrico Weigelt== metux IT service phone: +49 36207 519931 www: http://www.metux.de/ fax: +49 36207 519932 email: [EMAIL PROTECTED] cellphone: +49 174 7066481 - -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops -- - ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] two queries and dual cpu (perplexed)
On Fri, 22 Apr 2005, Shoaib Burq (VPAC) wrote: Please see attached the output from explain analyse. This is with the shared_buffers = 10600 work_mem = 102400 enable_seqscan = true BTW I guess should mention that I am doing the select count(*) on a View. Ran the Explain analyse with the nestedloop disabled but it was taking forever... and killed it after 30mins. Try increasing stats collection on ClimateChangeModel40.ClimateId: alter table ClimateChangeModel40 alter column ClimateId set statistics 1000; analyze ClimateChangeModel40; Gavin ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match