Re: [PERFORM] shared_buffers advice
2010/5/26 Cédric Villemain cedric.villemain.deb...@gmail.com:

> At the moment a block is requested for the first time (usually 8kb from postgres, so in fact 2 blocks in the OS), you have 'duplicate' buffers. But, depending on your workload, it is not so bad, because those 2 blocks should not be requested again for some time (since they are in postgresql shared buffers) and should be evicted by the OS in favor of newly requested blocks.

Since pg_buffercache is 4-8 times smaller, that would seem to be extremely rare to me. And when PG requests a block, it also needs to evict something from shared_buffers.

> You can try the pgfincore extension to grab stats from the OS cache, and/or patch postgresql if you want real stats ;)

Thank you! It seems to be the tool I was looking for. It could help me locate and troubleshoot the hogs in the page cache. I also find the snapshot/restore function promising. Every morning our cache is cold or filled with irrelevant data left by nightly batch jobs, which severely impacts performance. This seems to be exactly what the tool is for. How does it work? How stable is it? Can we use it in production on a daily basis?

> pg_buffercache is provided with postgresql and delivers very useful information: http://www.postgresql.org/docs/8.4/interactive/pgbuffercache.html

Thank you. I am already using it, and I've already found a few hogs with it.

--
Konrad Garus

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
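For reference, the pg_buffercache view linked above can be queried to rank relations by how many shared buffers they hold; a sketch along the lines of the example query in the module's documentation (run it in the database you want to inspect):

```sql
-- Top 10 relations by number of shared buffers held in the current
-- database (requires the pg_buffercache contrib module installed).
SELECT c.relname, count(*) AS buffers
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
JOIN pg_database d ON b.reldatabase = d.oid
                  AND d.datname = current_database()
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```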
Re: [PERFORM] Random Page Cost and Planner
Hi, Bryan.

I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds! Here's what I did:

1. Created a new set of tables that matched the old set, with statistics of 1000 on the station and taken (date) columns.
2. Inserted the data from the old hierarchy into the new set, ordered by station id then by date (same seven child tables as before: one per category).
   - I wanted to ensure a strong correlation between primary key and station id.
3. Added three indexes per table: (a) station id; (b) date taken; and (c) station-taken-category.
4. Set the station-taken-category index as CLUSTER.
5. Vacuumed the new tables.
6. Dropped the old tables.
7. Set the following configuration values:
   - shared_buffers = 1GB
   - temp_buffers = 32MB
   - work_mem = 32MB
   - maintenance_work_mem = 64MB
   - seq_page_cost = 1.0
   - random_page_cost = 2.0
   - cpu_index_tuple_cost = 0.001
   - effective_cache_size = 512MB

I ran a few more reports (no reboots, but reading vastly different data sets):

- Vancouver: 4.2s
- Yellowknife: 1.7s
- Montreal: 6.5s
- Trois-Riviers: 2.8s

No full table scans. I imagine some indexes are not strictly necessary and will test to see which can be removed (my guess: the station and taken indexes). The problem was that the station ids were scattered, so PostgreSQL presumed a full table scan would be faster. Physically ordering the data by station id triggers index use every time.

Next week's hardware upgrade should halve those times -- unless anyone has further suggestions to squeeze more performance out of PG. ;-)

Dave
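The steps above can be sketched in SQL; the table and column names here (measurement, station_id, taken, category_id) are stand-ins for the real schema described in the thread, which has seven child tables:

```sql
-- Per child table: raise statistics targets, build the three indexes,
-- physically order the heap by the composite index, and refresh stats.
ALTER TABLE measurement ALTER COLUMN station_id SET STATISTICS 1000;
ALTER TABLE measurement ALTER COLUMN taken      SET STATISTICS 1000;

CREATE INDEX measurement_station_idx ON measurement (station_id);
CREATE INDEX measurement_taken_idx   ON measurement (taken);
CREATE INDEX measurement_stc_idx     ON measurement (station_id, taken, category_id);

ALTER TABLE measurement CLUSTER ON measurement_stc_idx;
CLUSTER measurement;   -- rewrites the table in index order
ANALYZE measurement;
```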
Re: [PERFORM] shared_buffers advice
2010/5/27 Konrad Garus konrad.ga...@gmail.com:

> Since pg_buffercache is 4-8 times smaller, it would seem to be extremely rare to me. And when PG requests a block, it also needs to evict something from shared_buffers.

3 very important things:
* postgresql shared buffers are database oriented;
* OS shared buffers are *more* complex and will not evict the same buffers as postgres;
* the OS page cache can handle tens of GB, whereas postgres usually has no gain in performance over 10GB.

> Thank you! It seems to be the tool I was looking for. Could help me locate and troubleshoot the hogs in page cache. I also find the snapshot/restore function promising. Every morning our cache is cold or filled with irrelevant data left by nightly batch jobs, thus severely impacting the performance. Seems to be exactly what this tool is for. How does it work? How stable is it? Can we use it in production on a daily basis?

It works thanks to mincore/posix_fadvise stuff: you need linux. It is stable enough in my own experiments. I did use it for debugging purposes on production servers with success. BUT:
* snapshot/restore is done via a flat file (one per segment or table/index) and *it is not removed* when you drop a table;
* there may be corner cases not yet handled (like snapshotting a database, changing things -- drop table, truncate table -- then restoring).
It needs some polish to be totally production ready, but the job can be done.
> pg_buffercache is provided with postgresql and delivers very useful information: http://www.postgresql.org/docs/8.4/interactive/pgbuffercache.html

> Thank you. I already am using it. I've already found a few hogs with it.
>
> --
> Konrad Garus

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Random Page Cost and Planner
2010/5/27 David Jarvis thanga...@gmail.com:

> Hi, Bryan. I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds! [...]
> No full table scans. I imagine some indexes are not strictly necessary and will test to see which can be removed (my guess: the station and taken indexes). The problem was that the station ids were scattered and so PostgreSQL presumed a full table scan would be faster. Physically ordering the data by station ids triggers index use every time. Next week's hardware upgrade should halve those times -- unless anyone has further suggestions to squeeze more performance out of PG. ;-)

I wonder what the plan will be if you replace sc.taken_* in:

  m.taken BETWEEN sc.taken_start AND sc.taken_end

with literal values. It might help the planner...

Also, I'd consider an explicit ordered join, but I admit I haven't read the whole thread (in particular the table size).
Oh, and I'd set statistics to a higher value for column category_id of table station_category (seeing the same request and EXPLAIN ANALYZE without the date in the query would help).

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] shared_buffers advice
2010/5/27 Cédric Villemain cedric.villemain.deb...@gmail.com:

> It works thanks to mincore/posix_fadvise stuff: you need linux. It is stable enough in my own experiments. I did use it for debugging purposes on production servers with success.

What impact does it have on performance? Does it do anything (is there any interaction between it and PG/OS) when it's not executing a command explicitly invoked by me?

--
Konrad Garus

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] shared_buffers advice
2010/5/27 Konrad Garus konrad.ga...@gmail.com:

> What impact does it have on performance?

pgmincore() and pgmincore_snapshot() both are able to mmap up to 1GB. I didn't measure a performance impact, but I don't have enough benchmarks/tests yet.

> Does it do anything, is there any interaction between it and PG/OS, when it's not executing a command explicitly invoked by me?

pgfincore does nothing until you call one of its functions. Reducing the mmap window is feasible, and I had started something to use effective_io_concurrency in order to improve prefetch (for restore), but that part of the code is not yet finished.

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
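For readers following along: usage looks roughly like the following. The function names are the ones mentioned in this thread, but the exact signatures and return columns are an assumption on my part; check the pgfincore README for the real API:

```sql
-- Hypothetical invocations based on the function names in this thread;
-- verify against the pgfincore documentation before using.
SELECT * FROM pgmincore('my_table');      -- which pages of my_table sit in OS cache
SELECT pgmincore_snapshot('my_table');    -- dump the current cache map to a flat file
```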
[PERFORM] Query causing explosion of temp space with join involving partitioning
Hello,

Sorry for the re-post; I'm not sure which list is the relevant one. I included a slightly changed query in the previous message, sent to the bugs list.

I have ORM-generated queries where parent table keys are used to fetch the records from the child table (with relevant FK indices), where the child table is partitioned. My understanding is that Postgres is unable to properly use constraint exclusion to query only the relevant partition: half of the join condition is propagated down, while the other half is not. Table sources has pk (surveyid, srcid); ts has fk (survey_pk, source_pk) referencing sources (surveyid, srcid), and another index on (survey_pk, source_pk, tstype) (not used in the query). This is very unfortunate, as the queries are auto-generated and I cannot move the predicate to apply it directly to the partitioned table. The plan includes all the partitions; the next snippet shows that exclusion works when the condition is used directly on the partitioned table.

surveys- SELECT t1.SURVEY_PK, t1.SOURCE_PK, t1.TSTYPE, t1.VALS
surveys- FROM sources t0, TS t1 where
surveys- (t0.SURVEYID = 16 AND t0.SRCID >= 203510110032281 AND t0.SRCID <= 203520107001677 and t0.SURVEYID = t1.SURVEY_PK AND t0.SRCID = t1.SOURCE_PK)
surveys- ORDER BY t0.SURVEYID ASC, t0.SRCID ASC;

QUERY PLAN
Merge Join (cost=11575858.83..11730569.40 rows=3448336 width=60)
  Merge Cond: (t0.srcid = t1.source_pk)
  -> Index Scan using sources_pkey on sources t0 (cost=0.00..68407.63 rows=37817 width=12)
       Index Cond: ((surveyid = 16) AND (srcid >= 203510110032281::bigint) AND (srcid <= 203520107001677::bigint))
  -> Materialize (cost=11575858.83..11618963.03 rows=3448336 width=48)
       -> Sort (cost=11575858.83..11584479.67 rows=3448336 width=48)
            Sort Key: t1.source_pk
            -> Append (cost=0.00..11049873.18 rows=3448336 width=48)
                 -> Index Scan using ts_pkey on ts t1 (cost=0.00..8.27 rows=1 width=853)
                      Index Cond: (survey_pk = 16)
                 -> Index Scan using ts_part_bs3000l0_ts_pkey on ts_part_bs3000l0 t1 (cost=0.00..8.27 rows=1 width=48)
                      Index Cond: (survey_pk = 16)
                 -> Bitmap Heap Scan on ts_part_bs3000l1_cg0346l0 t1 (cost=5760.36..1481735.21 rows=462422 width=48)
                      Recheck Cond: (survey_pk = 16)
                      -> Bitmap Index Scan on ts_part_bs3000l1_cg0346l0_ts_pkey (cost=0.00..5644.75 rows=462422 width=0)
                           Index Cond: (survey_pk = 16)
                 -> Bitmap Heap Scan on ts_part_cg0346l1_cg0816k0 t1 (cost=5951.07..1565423.79 rows=488582 width=48)
                      Recheck Cond: (survey_pk = 16)
                      -> Bitmap Index Scan on ts_part_cg0346l1_cg0816k0_ts_pkey (cost=0.00..5828.93 rows=488582 width=0)
                           Index Cond: (survey_pk = 16)
                 -> Bitmap Heap Scan on ts_part_cg0816k1_cg1180k0 t1 (cost=5513.54..1432657.90 rows=447123 width=48)
                      Recheck Cond: (survey_pk = 16)
                      -> Bitmap Index Scan on ts_part_cg0816k1_cg1180k0_ts_pkey (cost=0.00..5401.75 rows=447123 width=0)
                           Index Cond: (survey_pk = 16)
                 -> Bitmap Heap Scan on ts_part_cg1180k1_cg6204k0 t1 (cost=5212.63..1329884.46 rows=415019 width=48)
                      Recheck Cond: (survey_pk = 16)
                      -> Bitmap Index Scan on ts_part_cg1180k1_cg6204k0_ts_pkey (cost=0.00..5108.87 rows=415019 width=0)
                           Index Cond: (survey_pk = 16)
                 -> Bitmap Heap Scan on ts_part_cg6204k1_lm0022n0 t1 (cost=5450.37..1371917.76 rows=428113 width=48)
                      Recheck Cond: (survey_pk = 16)
                      -> Bitmap Index Scan on ts_part_cg6204k1_lm0022n0_ts_pkey (cost=0.00..5343.35 rows=428113 width=0)
                           Index Cond: (survey_pk = 16)
                 -> Bitmap Heap Scan on ts_part_lm0022n1_lm0276m0 t1 (cost=5136.71..1298542.32 rows=405223 width=48)
                      Recheck Cond: (survey_pk = 16)
                      -> Bitmap Index Scan on ts_part_lm0022n1_lm0276m0_ts_pkey (cost=0.00..5035.40 rows=405223 width=0)
                           Index Cond: (survey_pk = 16)
                 -> Bitmap Heap Scan on ts_part_lm0276m1_lm0584k0 t1 (cost=5770.98..1525737.42 rows=476204 width=48)
                      Recheck Cond: (survey_pk = 16)
                      -> Bitmap Index Scan on ts_part_lm0276m1_lm0584k0_ts_pkey (cost=0.00..5651.93 rows=476204 width=0)
                           Index Cond: (survey_pk = 16)
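For readers who can edit their queries (the poster cannot, since they are ORM-generated), the usual workaround is to repeat the restriction on the partitioned table itself, so constraint exclusion can prune partitions at plan time. A sketch using the columns from the query above (column list trimmed for brevity):

```sql
-- Duplicating the parent-table predicates onto ts directly lets the
-- planner exclude partitions whose CHECK constraints cannot match.
SELECT t1.survey_pk, t1.source_pk, t1.tstype, t1.vals
FROM sources t0
JOIN ts t1
  ON t0.surveyid = t1.survey_pk
 AND t0.srcid    = t1.source_pk
WHERE t0.surveyid = 16
  AND t0.srcid >= 203510110032281
  AND t0.srcid <= 203520107001677
  AND t1.survey_pk = 16                    -- duplicated by hand
  AND t1.source_pk >= 203510110032281      -- duplicated by hand
  AND t1.source_pk <= 203520107001677      -- duplicated by hand
ORDER BY t0.surveyid ASC, t0.srcid ASC;
```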
[PERFORM] Autovacuum in postgres.
Hi All,

In my application we are using postgres, which runs on an embedded box. I have configured autovacuum to run once every hour. The cluster has 5 different databases in it. When I looked at the log messages, I found that autovacuum runs on one database each hour; as a result, each of my databases is autovacuumed only once in 5 hours. Is there any way to make it run every hour on every database?

Thank you,
Venu
Re: [PERFORM] Autovacuum in postgres.
One more question: is this expected?

On Fri, May 21, 2010 at 3:08 PM, venu madhav venutaurus...@gmail.com wrote:

> Hi All, In my application we are using postgres which runs on an embedded box. I have configured autovacuum to run once for every one hour. It has 5 different databases in it. When I saw the log messages, I found that it is running autovacuum on one database every hour. As a result, on my database autovacuum is run once in 5 hours. Is there any way to make it run it every hour.
>
> Thank you,
> Venu
[PERFORM] Query timing increased from 3s to 55s when used as function instead of select
We're using a function that, when run as a select statement outside of the function, takes roughly 1.5s to complete, whereas running an identical query within a function takes around 55s to complete. We are lost as to why placing this query within a function, as opposed to substituting the variables in a select statement, is so drastically different.

The timings posted here are from a 512MB memory virtual machine and are not of major concern on their own, but we are finding the same issue in our production environment with far superior hardware.

The function can be found here:
http://campbell-lange.net/media/files/fn_medirota_get_staff_leave_summary.sql

---

Timings for the individual components on their own are as follows:

select * from fn_medirota_validate_rota_master(6);
Time: 0.670 ms

select to_date(EXTRACT (YEAR FROM current_date)::text, '');
Time: 0.749 ms

select * from fn_medirota_people_template_generator(2, 6, date'2009-01-01', date'2009-12-31', TRUE) AS templates;
Time: 68.004 ms

select * from fn_medirota_people_template_generator(2, 6, date'2010-01-01', date'2010-12-31', TRUE) AS templates;
Time: 1797.323 ms

Copying the exact same FOR loop SELECT statement from the query above into the psql query buffer and running it with variable substitution yields the following:

Running FOR loop SELECT with variable substitution:
Time: 3150.585 ms

Whereas invoking the function yields:

select * from fn_medirota_get_staff_leave_summary(6);
Time: 57375.477 ms

We have tried using explain analyse to update the query optimiser, dropped and recreated the function, and have restarted both the machine and the postgres server multiple times. Any help or advice would be greatly appreciated.
Kindest regards, Tyler Hildebrandt --- EXPLAIN ANALYSE VERBOSE SELECT * FROM fn_medirota_get_staff_leave_summary(6); QUERY PLAN - {FUNCTIONSCAN :startup_cost 0.00 :total_cost 260.00 :plan_rows 1000 :plan_width 85 :targetlist ( {TARGETENTRY :expr {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 } :resno 1 :resname id :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 1043 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2 } :resno 2 :resname t_full_name :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 3 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3 } :resno 3 :resname b_enabled :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 4 :vartype 1043 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 4 } :resno 4 :resname t_anniversary :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 5 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 5 } :resno 5 :resname n_last_year_annual :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 6 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 6 } :resno 6 :resname n_last_year_other :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 7 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 7 } :resno 7 :resname n_this_year_annual :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 8 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 8 } :resno 8 :resname n_this_year_other :ressortgroupref 0 :resorigtbl 0 :resorigcol 0
[PERFORM] Certain query eating up all free memory (out of memory error)
Hi group,

I could really use your help with this one. I don't have all the details right now (I can provide more descriptions tomorrow, and logs if needed), but maybe this will be enough.

I have written a PG (8.3.8) module which uses the Flex lexical analyser. It takes text from a database field and finds matches for defined rules. It returns a set of two text fields (value found and value type). When I run a query like this:

SELECT * FROM flex_me(SELECT some_text FROM some_table WHERE id = 1);

it works perfectly fine. Memory never reaches more than 1% (usually it's below 0.5% of system mem). But when I run a query like this:

SELECT flex_me(some_text_field) FROM some_table WHERE id = 1;

memory usage goes through the roof, and if the result is over about 10k matches (rows) it eats up all memory and I get an out of memory error. I try to free all allocated memory, and I even did a version with a doubly linked list of results, but the same behaviour persists.

I tried to track it down on my own, and from my own trials it seems that the problem lies directly in the set-returning function flex_me() in File 2, as even with 40k of results in a 2-column array it shouldn't take more than 1MB of memory. Also, when I run it just to the point of SRF_IS_FIRSTCALL() (the whole bit) the memory usage doesn't go up, but when the subsequent SRF_PERCALL calls are made is when the memory usage goes through the roof.

Btw, if the following code contains some nasty errors (and I'm pretty sure it does), please know that I'm just learning PG and C programming. Any help or tips would be greatly appreciated.
Simplified (but still relevant) code below:

File 1 (Flex parser template, compiled with flex):

%{
#include <stdio.h>
extern void *addToken(int type);
extern char ***flexme(char *ptr);
#define T_NUM 1
#define S_NUM "number"
#define T_FLO 2
#define S_FLO "float"
#define T_DAT 3
#define S_DAT "date"
#define T_WRD 7
#define S_WRD "word"
char ***vals;
int cnt = 0, mem_cnt = 64;
%}
DGT [0-9]
NUMBER (-)?{DGT}+
FLOAT ((-)?{DGT}+[\.,]{DGT}+)|{NUMBER}
DATE_S1 "-"
DATE_S2 ","
DATE_S3 "."
DATE_S4 "/"
DATE_S5 " "
DATE_YY ([0-9]|([0-9][0-9])|([0-1][0-9][0-9][0-9])|(2[0-4][0-9][0-9]))
DATE_DD ([1-9]|(([0-2][0-9])|(3[0-1])))
DATE_MM ([1-9]|((0[1-9])|(1[0-2])))
DATE_YMD_S1 ({DATE_YY}{DATE_S1}{DATE_MM}{DATE_S1}{DATE_DD})
DATE_YMD_S2 ({DATE_YY}{DATE_S2}{DATE_MM}{DATE_S2}{DATE_DD})
DATE_YMD_S3 ({DATE_YY}{DATE_S3}{DATE_MM}{DATE_S3}{DATE_DD})
DATE_YMD_S4 ({DATE_YY}{DATE_S4}{DATE_MM}{DATE_S4}{DATE_DD})
DATE_YMD_S5 ({DATE_YY}{DATE_S5}{DATE_MM}{DATE_S5}{DATE_DD})
DATE_YMD ({DATE_YMD_S1}|{DATE_YMD_S2}|{DATE_YMD_S3}|{DATE_YMD_S4}|{DATE_YMD_S5})
WORD ([a-zA-Z0-9]+)
%%
{FLOAT}     addToken(T_FLO);
{DATE_YMD}  addToken(T_DAT);
{WORD}      addToken(T_WRD);
.|\n        /* eat up any unmatched character */
%%

void *
addToken(int type)
{
    int x = 0;
    //elog(NOTICE, "W[%d] %s", type, yytext);
    // check if we need to add more mem
    if (mem_cnt - 1 <= cnt)
    {
        mem_cnt *= 2;
        vals = repalloc(vals, mem_cnt * sizeof(char *));
        //elog(NOTICE, "mem increased to: %d", mem_cnt * sizeof(char *));
    }
    vals[cnt] = palloc(2 * sizeof(char *));
    // types
    switch (type)
    {
        case T_FLO: // float
            x = strlen(S_FLO);
            vals[cnt][1] = palloc((x + 1) * sizeof(char));
            strncpy(vals[cnt][1], S_FLO, x);
            vals[cnt][1][x] = '\0';
            break;
        case T_DAT: // date
            x = strlen(S_DAT);
            vals[cnt][1] = palloc((x + 1) * sizeof(char));
            strncpy(vals[cnt][1], S_DAT, x);
            vals[cnt][1][x] = '\0';
            break;
        case T_WRD: // word
            x = strlen(S_WRD);
            vals[cnt][1] = palloc((x + 1) * sizeof(char));
            strncpy(vals[cnt][1], S_WRD, x);
            vals[cnt][1][x] = '\0';
            break;
        default:
            elog(ERROR, "Unknown flexme type: %d", type);
            break;
    }
    // value
    vals[cnt][0] = palloc((yyleng + 1) * sizeof(char));
    strncpy(vals[cnt][0], yytext, yyleng);
    vals[cnt][0][yyleng] = '\0';
    cnt++;
    //elog(NOTICE, "i: %d", cnt);
    return 0;
}

char ***
flexme(char *ptr)
{
    YY_BUFFER_STATE bp;
    int yyerr = 0;
    cnt = 0;
    // initial table size
    vals = palloc(mem_cnt * sizeof(char *));
    bp = yy_scan_string(ptr);
    yy_switch_to_buffer(bp);
    yyerr = yylex();
    yy_delete_buffer(bp);
    if (yyerr != 0)
    {
        elog(ERROR, "Flex parser error code: %d", yyerr);
    }
    return vals;
}

File 2 (PG function, which includes the flex output analyser of compiled File 1, lex.yy.c):

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "lex.yy.c"

char *text_to_cstring(const text *t);

// this is copied directly from PG sources
char *
text_to_cstring(const text *t)
{
    /* must cast away the const, unfortunately */
    text *tunpacked =
[PERFORM] Re: [BUGS] Query causing explosion of temp space with join involving partitioning
Hello,

Thank you for the clarifications. The plan as run from psql looks OK, and I did not notice any specific locks for this particular query. The logs of the system running the queries are not utterly clear, so chasing the parameters for the explosive query is not that simple (logs are shared between multiple threads), but from what I see there is no difference between them, and the plan looks like the following (without removal of irrelevant parameters this time; most of them are float8, but also bytea):

explain SELECT t0.surveyid, t0.srcid, t1.survey_pk, t1.source_pk, t1.tstype, t1.homoscedasticitytest, t1.ljungboxrandomnesstest, t1.maxvalue, t1.meanobstime, t1.meanvalue, t1.median, t1.minvalue, t1.range, t1.robustweightedstddev, t1.symmetrytest, t1.trimmedweightedmean, t1.trimmedweightedrange, t1.variabilityflag, t1.weightedkurtosis, t1.weightedmean, t1.weightedmeanconfidenceinterval, t1.weightedmeanobstime, t1.weightednormalizedp2pscatter, t1.weightedskewness, t1.weightedstddevdf, t1.weightedstddevwdf, t1.vals, t1.ccdids, t1.flags, t1.obstime, t1.len, t1.valueerrors
FROM sources t0 INNER JOIN ts t1 ON t0.surveyid = t1.survey_pk AND t0.srcid = t1.source_pk
WHERE (t0.surveyid = 16 AND t0.srcid >= 200210107009116 AND t0.srcid <= 200210107009991)
ORDER BY t0.surveyid ASC, t0.srcid ASC;

QUERY PLAN
Nested Loop (cost=0.00..2363.21 rows=835683 width=1527)
  Join Filter: (t0.srcid = t1.source_pk)
  -> Index Scan using sources_pkey on sources t0 (cost=0.00..17.88 rows=1 width=12)
       Index Cond: ((surveyid = 16) AND (srcid >= 200210107009116::bigint) AND (srcid <= 200210107009991::bigint))
  -> Append (cost=0.00..2325.93 rows=1552 width=1053)
       -> Index Scan using ts_pkey on ts t1 (cost=0.00..4.27 rows=1 width=1665)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_bs3000l0_ts_pkey on ts_part_bs3000l0 t1 (cost=0.00..6.30 rows=2 width=327)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_sm0073k1_ts_pkey on ts_part_sm0073k1 t1
            (cost=0.00..1232.63 rows=608 width=327)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_bs3000l1_cg0346l0_ts_pkey on ts_part_bs3000l1_cg0346l0 t1 (cost=0.00..145.41 rows=127 width=1556)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_cg0346l1_cg0816k0_ts_pkey on ts_part_cg0346l1_cg0816k0 t1 (cost=0.00..147.64 rows=127 width=1669)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_cg0816k1_cg1180k0_ts_pkey on ts_part_cg0816k1_cg1180k0 t1 (cost=0.00..138.09 rows=119 width=1615)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_cg1180k1_cg6204k0_ts_pkey on ts_part_cg1180k1_cg6204k0 t1 (cost=0.00..125.69 rows=109 width=1552)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_cg6204k1_lm0022n0_ts_pkey on ts_part_cg6204k1_lm0022n0 t1 (cost=0.00..133.23 rows=116 width=1509)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_lm0022n1_lm0276m0_ts_pkey on ts_part_lm0022n1_lm0276m0 t1 (cost=0.00..131.08 rows=115 width=1500)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_lm0276m1_lm0584k0_ts_pkey on ts_part_lm0276m1_lm0584k0 t1 (cost=0.00..158.11 rows=135 width=1471)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
       -> Index Scan using ts_part_lm0584k1_sm0073k0_ts_pkey on ts_part_lm0584k1_sm0073k0 t1 (cost=0.00..103.47 rows=93 width=1242)
            Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))

I could increase the debug level on the server, but I am not sure if the plan printed there is of any help. Could this be caused by some race where there is too much activity? The DB box is at around 10% CPU load, with small I/O wait, before the query starts to overload the machine.

For the sake of clarity, this is the plan for the non-joined parameters, to show which partition would be used (i.e.
a single one):

explain select * from ts t0 where t0.survey_pk = 16 AND t0.source_pk >= 200210107009116 AND t0.source_pk <= 200210107009991;

QUERY PLAN
[PERFORM] About Tom Lane's Xeon CS test case
Dear all and Tom,

Recently my company's postgres DB server suddenly became sluggish, with a high context-switching value as shown below (vmstat output, columns reconstructed):

2010-04-07 04:03:15  r  b swpd    free   buff   cache  si so bi   bo   in     cs  us sy id wa
2010-04-07 14:04:27  3  0    0 2361272 272684 3096148   0  0  3 1445  973  14230   7  8 84  0
2010-04-07 14:05:27  2  0    0 2361092 272684 3096220   0  0  3 1804 1029  31852   8 10 81  1
2010-04-07 14:06:27  1  0    0 2362236 272684 3096564   0  0  3 1865 1135  19689   9  9 81  0
2010-04-07 14:07:27  1  0    0 2348400 272720 3101836   0  0  3 1582 1182 149461  15 17 67  0
2010-04-07 14:08:27  3  0    0 2392028 272840 3107600   0  0  3 3093 1275 203196  24 23 53  1
2010-04-07 14:09:27  3  1    0 2386224 272916 3107960   0  0  3 2486 1331 193299  26 22 52  0
2010-04-07 14:10:27 34  0    0 2332320 272980 3107944   0  0  3 1692 1082 214309  24 22 54  0
2010-04-07 14:11:27  1  0    0 2407432 273028 3108092   0  0  6 2770 1540  76643  29 13 57  1
2010-04-07 14:12:27  9  0    0 2358968 273104 3108388   0  0  7 2639 1466  10603  22  6 72  1

I have read about this "Tom Lane workload" context-switch problem, and I found that my company's DB is a Xeon MP server. I am going to run a test to confirm it. Does anybody have the test case ("Tom Lane's Xeon CS test case")? Thank you!

My postgres version: 8.1.3
My OS version: Linux version 2.4.21-47.ELsmp (Red Hat Linux 3.2.3-54)
My CPU:
processor  : 7
vendor_id  : GenuineIntel
cpu family : 15
model      : 6
model name : Intel(R) Xeon(TM) CPU 3.40GHz
stepping   : 8
cpu MHz    : 3400.262
cache size : 1024 KB
physical id: 1

Best regards,
Ray Huang
Re: [PERFORM] Autovacuum in postgres.
venu madhav wrote:
> Hi All, In my application we are using postgres which runs on an embedded box. I have configured autovacuum to run once for every one hour. It has 5 different databases in it. When I saw the log messages, I found that it is running autovacuum on one database every hour. As a result, on my database autovacuum is run once in 5 hours. Is there any way to make it run it every hour.

What settings did you change to make it run every hour? Also, it will only vacuum tables that need vacuuming. What version of Postgres are you using?

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Autovacuum in postgres.
venu madhav wrote:
> Thanks for the reply. I am using postgres 8.01 and since it runs on a client box, I can't upgrade it. I've set the autovacuum nap time to 3600 seconds.

That is an older version of autovacuum that wasn't very capable.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
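For what it's worth: in the old integrated autovacuum (pre-8.3), the daemon processes a single database per cycle and then sleeps autovacuum_naptime seconds, so with 5 databases each one is visited every 5 × naptime. Assuming that is the behaviour being observed here, a postgresql.conf sketch of the arithmetic:

```ini
# Sketch for the old one-database-per-cycle autovacuum (pre-8.3):
# with 5 databases, visit each roughly once per hour.
autovacuum = on
autovacuum_naptime = 720   # seconds; 3600 / 5 databases
```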
Re: [PERFORM] shared_buffers advice
2010/5/27 Cédric Villemain cedric.villemain.deb...@gmail.com:

> pgmincore() and pgmincore_snapshot() both are able to mmap up to 1GB.

Does that mean they can occupy 1 GB of RAM? How does it relate to the amount of page buffers mapped by the OS?

--
Konrad Garus

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] shared_buffers advice
2010/5/27 Konrad Garus konrad.ga...@gmail.com:

> Does it mean they can occupy 1 GB of RAM? How does it relate to amount of page buffers mapped by OS?

Well, that is the projection of the file in memory. Only a projection, but the memory is still acquired. It would be OK to rework this part and project something like 128MB and loop. (In fact that code is needed for 9.0, because a segment can be 1GB; I didn't check what the optimum projection size is yet.) So yes to both of your questions :)

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query timing increased from 3s to 55s when used as function instead of select
On 21/05/2010 9:54 PM, Tyler Hildebrandt wrote: We're using a function that when run as a select statement outside of the function takes roughly 1.5s to complete whereas running an identical query within a function is taking around 55s to complete. We are lost as to why placing this query within a function as opposed to substituting the variables in a select statement is so drastically different. This is a frequently asked question. It's the same issue as with prepared queries, where the planner has to pick a more general plan when it doesn't know the value of a parameter. The short answer is to work around it by using EXECUTE ... USING to invoke your query dynamically. (Oddly, this FAQ doesn't seem to be on the FAQ list at http://wiki.postgresql.org/wiki/FAQ) -- Craig Ringer
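[Editor's note] A minimal sketch of the EXECUTE ... USING workaround Craig describes. The table and column names here are invented for illustration, not from the original poster's schema:

```sql
-- Hypothetical illustration of EXECUTE ... USING inside PL/pgSQL.
CREATE OR REPLACE FUNCTION count_events(p_type text)
RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    -- A static "SELECT ... WHERE type = p_type" may be planned with a
    -- generic parameter; EXECUTE plans the query with the actual value.
    EXECUTE 'SELECT count(*) FROM events WHERE type = $1'
        INTO result
        USING p_type;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```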
Re: [PERFORM] shared_buffers advice
2010/5/27 Cédric Villemain cedric.villemain.deb...@gmail.com: Well, that is the projection of the file in memory. Only a projection, but the memory is still acquired. It is OK to rework this part and project something like 128MB and loop. (In fact the code is needed for 9.0 because a segment can be 1GB; I haven't checked what the optimum projection size is yet.) So yes to both of your questions :) So when I map 12 GB, this process will consume 1 GB and the time needed to browse through the whole 12 GB buffer? -- Konrad Garus
Re: [PERFORM] Random Page Cost and Planner
Salut, Cédric. I wonder what the plan will be if you replace sc.taken_* in : m.taken BETWEEN sc.taken_start AND sc.taken_end by values. It might help the planner... That is a fairly important restriction. I will try making it (year1||'-01-01')::date, but I have no constant value for it -- it is a user-supplied parameter. And then there's the year wrapping problem, too, where the ending year will differ from the starting year in certain cases. (Like querying rows between Dec 22, 1900 to Mar 22 1901 rather than Mar 22 1900 to Dec 22 1900. The first query is the winter season and the second query is all seasons except winter.) Also, I'll consider an explicit ordered join, but I admit I haven't read the whole thread (in particular the table size). It's a big table: almost 40 million rows, and there are seven tables like it. I tried an explicit join in the past: it did not help much. But that was before everything was running this fast, so now that the system performs differently, maybe it will help? Dave
Re: [PERFORM] Query timing increased from 3s to 55s when used as function instead of select
On 27/05/2010 11:33 PM, Craig Ringer wrote: On 21/05/2010 9:54 PM, Tyler Hildebrandt wrote: We're using a function that when run as a select statement outside of the function takes roughly 1.5s to complete whereas running an identical query within a function is taking around 55s to complete. We are lost as to why placing this query within a function as opposed to substituting the variables in a select statement is so drastically different. This is a frequently asked question. It's the same issue as with prepared queries, where the planner has to pick a more general plan when it doesn't know the value of a parameter. The short answer is work around it by using EXECUTE ... USING to invoke your query dynamically. ( Oddly, this FAQ doesn't seem to be on the FAQ list at http://wiki.postgresql.org/wiki/FAQ ) Added as: http://wiki.postgresql.org/wiki/FAQ#Why_is_my_query_much_slower_when_run_as_a_prepared_query.3F and the subsequent entry too. Comments, edits, clarification appreciated. I know it's not as well written as it could be, could use archive links, etc; it's just pass 1. -- Craig Ringer
Re: [PERFORM] Autovacuum in postgres.
Excerpts from venu madhav's message of vie may 21 05:38:43 -0400 2010: Hi All, In my application we are using postgres which runs on an embedded box. I have configured autovacuum to run once for every one hour. It has 5 different databases in it. When I saw the log messages, I found that it is running autovacuum on one database every hour. As a result, on my database autovacuum is run once in 5 hours. Is there any way to make it run it every hour. If you set naptime to 12 mins, it will run on one database every 12 minutes, so once per hour for your database. This is not really the intended usage though. You will have to adjust the time if another database is created. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
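[Editor's note] Álvaro's arithmetic can be made concrete with a configuration sketch. The 12-minute figure comes from the thread; whether a plain reload suffices (rather than a restart) depends on the server version:

```sql
-- In postgresql.conf (integrated autovacuum, 8.1+):
--   autovacuum         = on
--   autovacuum_naptime = 12min   -- 5 databases x 12 min => each database visited ~hourly
--
-- Then signal the server to re-read its configuration (a restart also works):
SELECT pg_reload_conf();
```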
Re: [PERFORM] [BUGS] Query causing explosion of temp space with join involving partitioning
Krzysztof Nienartowicz krzysztof.nienartow...@unige.ch writes: Logs of the system running queries are not utterly clear, so chasing the parameters for the explosive query is not that simple (shared logs between multiple threads), but from what I see there is no difference between them and the plan looks like (without removal of irrelevant parameters this time, most of them are float8, but also bytea) [ nestloop with inner index scans over the inherited table ] Well, that type of plan isn't going to consume much memory or disk space. What I suspect is happening is that sometimes, depending on the specific parameter values called out in the query, the planner is switching to another plan type that does consume lots of space (probably via sort or hash temp files). The most obvious guess is that that will happen when the range limits on srcid get far enough apart to make a nestloop not look cheap. You could try experimenting with EXPLAIN and different constant values to see what you get. regards, tom lane
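[Editor's note] Tom's suggested EXPLAIN experiment can be sketched like this; the table name and srcid ranges are placeholders, not the poster's actual schema:

```sql
-- Compare plans as the srcid range widens (placeholder names).
EXPLAIN SELECT *
FROM   sources
WHERE  srcid BETWEEN 1000 AND 1010;    -- narrow range: expect nested loop + index scans

EXPLAIN SELECT *
FROM   sources
WHERE  srcid BETWEEN 1000 AND 900000;  -- wide range: planner may switch to a hash or
                                       -- merge join that spills to temp files
```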
Re: [PERFORM] shared_buffers advice
2010/5/27 Konrad Garus konrad.ga...@gmail.com: 2010/5/27 Cédric Villemain cedric.villemain.deb...@gmail.com: Well, that is the projection of the file in memory. Only a projection, but the memory is still acquired. It is OK to rework this part and project something like 128MB and loop. (In fact the code is needed for 9.0 because a segment can be 1GB; I haven't checked what the optimum projection size is yet.) So yes to both of your questions :) So when I map 12 GB, this process will consume 1 GB and the time needed to browse through the whole 12 GB buffer? Exactly. And the time to browse depends on the number of blocks already in core memory. I am interested in test results and benchmarks if you are going to do some :) -- Konrad Garus -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
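[Editor's note] Based only on the function names mentioned in this thread (the early pgfincore API), usage presumably looks something like the sketch below. The argument form and result columns are assumptions -- check the extension's README before relying on this:

```sql
-- Assumed usage sketch for the pgfincore functions named above.
-- Which OS pages of the relation are resident in the kernel page cache?
SELECT * FROM pgmincore('my_big_table');

-- Snapshot the OS cache state so it can be restored after a reboot or
-- after nightly batch jobs have evicted the working set:
SELECT * FROM pgmincore_snapshot('my_big_table');
```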
Re: [PERFORM] Random Page Cost and Planner
2010/5/27 David Jarvis thanga...@gmail.com: Salut, Cédric. I wonder what the plan will be if you replace sc.taken_* in : m.taken BETWEEN sc.taken_start AND sc.taken_end by values. It might help the planner... That is a fairly important restriction. I will try making it (year1||'-01-01')::date, but I have no constant value for it -- it is a user-supplied parameter. And then there's the year wrapping problem, too, where the ending year will differ from the starting year in certain cases. (Like querying rows between Dec 22, 1900 to Mar 22 1901 rather than Mar 22 1900 to Dec 22 1900. The first query is the winter season and the second query is all seasons except winter.) Ah, I thought you had a start and an end provided (so you could put them in the query). Also, I'll consider an explicit ordered join, but I admit I haven't read the whole thread (in particular the table size). It's a big table: almost 40 million rows, and there are seven tables like it. I tried an explicit join in the past: it did not help much. But that was before everything was running this fast, so now that the system performs differently, maybe it will help? Yes. The documentation covers this topic: http://www.postgresql.org/docs/8.4/interactive/explicit-joins.html Consider the parameter that controls explicit join order (you can set it per SQL session). You know your data, so you know which tables return the fewest rows and should be joined first. ;) Dave -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
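[Editor's note] The per-session parameter Cédric alludes to is presumably join_collapse_limit; a sketch of forcing the written join order (table and column names are illustrative only):

```sql
-- With join_collapse_limit = 1 the planner honors the textual JOIN order,
-- so the table expected to return the fewest rows can be listed first.
SET join_collapse_limit = 1;  -- per-session setting

SELECT m.taken, m.amount
FROM   station s                        -- most selective table first
JOIN   measurement m ON m.station_id = s.id
WHERE  s.name = 'Toronto';
```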
[PERFORM] Does FILTER in SEQSCAN short-circuit AND?
Sample code: SELECT * FROM MyTable WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar' Let's say this required a SEQSCAN because there were no indexes to support column foo. For every row where foo <> 'bar', would the filter on the SEQSCAN short-circuit the AND and return false right away, or would it still execute MySlowFunc('foo')? Thanks! Carlo
Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?
Carlo Stonebanks stonec.regis...@sympatico.ca wrote: SELECT * FROM MyTable WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar' Let's say this required a SEQSCAN because there were no indexes to support column foo. For every row where foo <> 'bar', would the filter on the SEQSCAN short-circuit the AND and return false right away, or would it still execute MySlowFunc('foo')? For that example, I'm pretty sure it will skip the slow function for rows which fail the first test. A quick test confirmed that for me. If you create a sufficiently slow function, you shouldn't have much trouble testing that yourself. :-) -Kevin
Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?
On 5/18/10 3:28 PM, Carlo Stonebanks wrote: Sample code: SELECT * FROM MyTable WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar' Let's say this required a SEQSCAN because there were no indexes to support column foo. For every row where foo <> 'bar', would the filter on the SEQSCAN short-circuit the AND and return false right away, or would it still execute MySlowFunc('foo')? I asked a similar question a few years back, and the answer is that the planner just makes a guess and applies it to all functions. It has no idea whether your function is super fast or incredibly slow; they're all assigned the same cost. In this fairly simple case, the planner might reasonably guess that foo = 'bar' will always be faster than AnyFunc(foo) = 'bar'. But for real queries, that might not be the case. In my case, I have a function that is so slow that it ALWAYS is good to avoid it. Unfortunately, there's no way to explain that to Postgres, so I have to use other tricks to force the planner not to use it. select * from (select * from MyTable where foo = 'bar' offset 0) as t where MySlowFunc(foo) = 'bar'; The offset 0 prevents the planner from collapsing this query back into your original syntax. It will only apply MySlowFunc() to rows where you already know that foo = 'bar'. It would be nice if Postgres had a way to assign a cost to every function. Until then, you have to use convoluted SQL if you have a really slow function. Craig
Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?
Craig James wrote on 27.05.2010 23:13: It would be nice if Postgres had a way to assign a cost to every function. Isn't that what the COST parameter is intended to be: http://www.postgresql.org/docs/current/static/sql-createfunction.html Thomas
Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?
Craig James craig_ja...@emolecules.com wrote: It would be nice if Postgres had a way to assign a cost to every function. The COST clause of CREATE FUNCTION doesn't do what you want? http://www.postgresql.org/docs/8.4/interactive/sql-createfunction.html -Kevin
Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?
On 5/27/10 2:28 PM, Kevin Grittner wrote: Craig James craig_ja...@emolecules.com wrote: It would be nice if Postgres had a way to assign a cost to every function. The COST clause of CREATE FUNCTION doesn't do what you want? http://www.postgresql.org/docs/8.4/interactive/sql-createfunction.html Cool ... I must have missed it when this feature was added. Nice! Craig
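[Editor's note] The COST clause Kevin points to can be used roughly like this; the function body is a toy stand-in for a genuinely slow function:

```sql
-- COST tells the planner how expensive a function call is (the default is
-- 100 for PL/pgSQL). A high value encourages cheaper quals to run first.
CREATE FUNCTION my_slow_func(t text) RETURNS text AS $$
BEGIN
    PERFORM pg_sleep(0.1);  -- stand-in for expensive work
    RETURN t;
END;
$$ LANGUAGE plpgsql COST 10000;

-- The planner should now prefer to check foo = 'bar' before calling it:
-- SELECT * FROM MyTable WHERE foo = 'bar' AND my_slow_func(foo) = 'bar';
```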
Re: [PERFORM] Random Page Cost and Planner
Agree with Tom on his point about avoidance of cost param adjustments to fit specific test cases. A few suggestions, as I assume you own this database: - check out pg_statio_user_tables - optimize your cache hit ratio on blocks read, over different time durations: pg_stat_bgwriter (read from a script or something and snapshot) - pg_buffercache in contrib/ - /proc/meminfo on linux - find out exactly what is going on with your kernel buffer cache (size, how it is buffering) and whether your controller or drive is using a read-ahead cache. - might want to play around with partial indexes vs. and/or range partitioning with exclusion constraints, etc. - define I/O characteristics of the dataset - taking into account index clustering and index order on in-memory pages (i.e. re-cluster?), why the need for multiple indexes if clustering indexes on the heap? - solidify the referential integrity constraints between those tables on paper; define the use cases before modifying the database tables... I assume this is a dev database - linux fs mount options to explore - i.e. noatime, writeback, etc. - maybe look at prepared statements if you are running a lot of similar queries from a single session? assuming a web front end for your db - with, say, frequently queried region/category/dates for a large read-only dataset with multiple join conditions? There are some good presentations on pgcon.org from PGCon 2010 that was held last week: http://www.pgcon.org/2010/schedule/events/218.en.html If you take everything into account and model it correctly (not too loose, not too tight), your solution will be reusable and will save time and hardware expenses. Regards - Bryan On Thu, May 27, 2010 at 2:43 AM, David Jarvis thanga...@gmail.com wrote: Hi, Bryan. I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds! Here's what I did: 1. Created a new set of tables that matched the old set, with statistics of 1000 on the station and taken (date) columns. 2. Inserted the data from the old hierarchy into the new set, ordered by station id then by date (same seven child tables as before: one per category). - I wanted to ensure a strong correlation between primary key and station id. 3. Added three indexes per table: (a) station id; (b) date taken; and (c) station-taken-category. 4. Set the station-taken-category index as CLUSTER. 5. Vacuumed the new tables. 6. Dropped the old tables. 7. Set the following configuration values: - shared_buffers = 1GB - temp_buffers = 32MB - work_mem = 32MB - maintenance_work_mem = 64MB - seq_page_cost = 1.0 - random_page_cost = 2.0 - cpu_index_tuple_cost = 0.001 - effective_cache_size = 512MB I ran a few more reports (no reboots, but reading vastly different data sets): - Vancouver: 4.2s - Yellowknife: 1.7s - Montreal: 6.5s - Trois-Riviers: 2.8s No full table scans. I imagine some indexes are not strictly necessary and will test to see which can be removed (my guess: the station and taken indexes). The problem was that the station ids were scattered and so PostgreSQL presumed a full table scan would be faster. Physically ordering the data by station ids triggers index use every time. Next week's hardware upgrade should halve those times -- unless anyone has further suggestions to squeeze more performance out of PG. ;-) Dave
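[Editor's note] The clustering recipe in David's steps 3-5 can be sketched as follows; the table and column names are illustrative, not his actual schema:

```sql
-- Step 3: three indexes per child table (placeholder names).
CREATE INDEX idx_m_station       ON measurement (station_id);
CREATE INDEX idx_m_taken         ON measurement (taken);
CREATE INDEX idx_m_station_taken ON measurement (station_id, taken, category_id);

-- Step 4: physically rewrite the table in index order so rows for one
-- station are contiguous on disk -- this correlation is what made the
-- planner choose the index over a full table scan.
ALTER TABLE measurement CLUSTER ON idx_m_station_taken;
CLUSTER measurement;

-- Step 5: refresh planner statistics after the rewrite.
ANALYZE measurement;
```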
Re: [PERFORM] merge join killing performance
Scott Marlowe scott.marl...@gmail.com writes: So, Tom, so you think it's possible that the planner isn't noticing all those nulls and thinks it'll just take a row or two to get to the value it needs to join on? I've committed a patch for this, if you're interested in testing that it fixes your situation. regards, tom lane
[PERFORM] hp hpsa vs cciss driver
Hi all, Are there any HP Smart Array disk controller users running linux that have experimented with the new scsi based hpsa driver over the block based cciss driver? I have a p800 controller that I'll try out soon. (I hope.) Regards, Mark
Re: [PERFORM] merge join killing performance
On Thu, May 27, 2010 at 7:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Marlowe scott.marl...@gmail.com writes: So, Tom, so you think it's possible that the planner isn't noticing all those nulls and thinks it'll just take a row or two to get to the value it needs to join on? I've committed a patch for this, if you're interested in testing that it fixes your situation. Cool, do we have a snapshot build somewhere or do I need to get all the extra build bits like flex or yacc or bison or whatnot?
Re: [PERFORM] merge join killing performance
Scott Marlowe scott.marl...@gmail.com writes: On Thu, May 27, 2010 at 7:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I've committed a patch for this, if you're interested in testing that it fixes your situation. Cool, do we have a snapshot build somewhere or do I need to get all the extra build bits like flex or yacc or bison or whatnot? There's a nightly snapshot tarball of HEAD on the ftp server. I don't believe there's any snapshots for back branches though. Alternatively, you could grab the latest release tarball for whichever branch you want and just apply that patch --- it should apply cleanly. regards, tom lane
Re: [PERFORM] Random Page Cost and Planner
Hi, Bryan. Thanks for the notes. I thought about using a prepared statement, but I cannot find any examples of using a PREPARE statement from within a function, and don't really feel like tinkering around to figure it out. Performance is at the point where the Java/PHP bridge and JasperReports are bottlenecks. The run_time variable seldom goes beyond 2.6s now. The reports take about 5 - 6 seconds to appear. At this point I'm into diminishing returns. I can perform a 60-minute hardware upgrade or spend 12 hours profiling to get less than the same net effect (and there is no guarantee I can improve the performance in fewer than 12 hours -- it took me 17 days and countless e-mails to this mailing group just to get this far -- *thank you again for all the help*, by the way). (If I was a PostgreSQL guru like most people on this list, it might take me 2 hours of profiling to optimize away the remaining bottlenecks, but even then the gain would only be a second or two in the database arena; the other system components will also gain by a hardware upgrade.) Dave
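[Editor's note] For what it's worth, a sketch of the two options Dave weighs; all names are invented for illustration. PL/pgSQL already prepares and caches the plan of any static SQL inside a function, so an explicit PREPARE is mostly useful at the session level:

```sql
-- Session-level prepared statement (outside a function):
PREPARE station_report(int, date, date) AS
    SELECT count(*)
    FROM   measurement
    WHERE  station_id = $1
    AND    taken BETWEEN $2 AND $3;

EXECUTE station_report(42, '1900-12-22', '1901-03-22');

-- Inside PL/pgSQL no PREPARE is needed: a static statement like the
-- SELECT below is planned once per session and the plan is cached.
CREATE FUNCTION station_count(p_station int, p_start date, p_end date)
RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    SELECT count(*) INTO n
    FROM   measurement
    WHERE  station_id = p_station
    AND    taken BETWEEN p_start AND p_end;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```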