Re: [PERFORM] Performance Issue -- "Materialize"
On Sat, Aug 19, 2017 at 10:37 AM, anand086 wrote:

Your email is very hard to read; the formatting and line wrapping are heavily mangled. You might want to attach the plans as file attachments instead of, or in addition to, putting them in the body.

> ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..2.03 rows=1 width=8)
>       Index Cond: ((import_num = '4520460'::numeric) AND (login IS NOT NULL))

It looks like the statistics for your table are desperately out of date: a later query showed there are 762599 rows (unless login is null for all of them), but the plan above is estimating there is only one. When was the table last analyzed?

Cheers, Jeff

On Sat, Aug 19, 2017 at 10:37 AM, anand086 wrote:
> I am a Postgres newbie and trying to learn :) We have a scenario wherein
> one SQL shows a different execution plan for different input values of
> import_num. As an example, with import_num = '4520440' the execution
> plan uses a Nested Loop and takes ~12 secs. With import_num = '4520460'
> the execution plan used a "Materialize" and never completed. After I set
> enable_material to off, the execution plan changed to a Hash Semi Join
> and completes in less than 3 secs.
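If stale statistics are the suspicion, a quick check and fix might look like this (a sketch; the table name is taken from the thread):

```sql
-- When did a manual ANALYZE or autoanalyze last touch the table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'test_tab';

-- Refresh the statistics the planner uses for its row estimates
ANALYZE test_tab;
```

If last_analyze and last_autoanalyze are old (or null), the one-row estimate above is explained, and re-running the query after ANALYZE should produce a more realistic plan.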
SELECT count(*) FROM test_tab
WHERE login IN (SELECT DISTINCT login FROM test_tab
                WHERE import_num = '4520440' AND login IS NOT NULL
                EXCEPT
                SELECT DISTINCT login FROM test_tab
                WHERE import_num = '0' AND login IS NOT NULL)
  AND import_num = '4520440';

 count
--------
 746982
(1 row)

Time: 12054.274 ms

QUERY PLAN
Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
  ->  Nested Loop  (cost=349846.23..350366.17 rows=415562 width=0)
        ->  HashAggregate  (cost=349845.67..349847.67 rows=200 width=96)
              Group Key: ("ANY_subquery".login)::text
              ->  Subquery Scan on "ANY_subquery"  (cost=340828.23..348557.47 rows=515282 width=96)
                    ->  SetOp Except  (cost=340828.23..343404.65 rows=515282 width=100)
                          ->  Sort  (cost=340828.23..342116.44 rows=515283 width=100)
                                Sort Key: "*SELECT* 1".login
                                ->  Append  (cost=0.56..275836.74 rows=515283 width=100)
                                      ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..275834.70 rows=515282 width=12)
                                            ->  Unique  (cost=0.56..270681.88 rows=515282 width=8)
                                                  ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..268604.07 rows=831125 width=8)
                                                        Index Cond: ((import_num = '4520440'::numeric) AND (login IS NOT NULL))
                                      ->  Subquery Scan on "*SELECT* 2"  (cost=0.56..2.04 rows=1 width=12)
                                            ->  Unique  (cost=0.56..2.03 rows=1 width=8)
                                                  ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_2  (cost=0.56..2.03 rows=1 width=8)
                                                        Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
        ->  Index Only Scan using
Re: [PERFORM] Performance Issue -- "Materialize"
I think your query is a bit confusing and has many subqueries, so I tried to simplify it.

If you can't have more than one import_num = 0 row for the same login, try this:

SELECT count(*)
FROM test_tab tab1
LEFT JOIN test_tab tab2
       ON tab1.login = tab2.login AND tab2.import_num = '0'
WHERE tab2.login IS NULL
  AND import_num = '4520440'

otherwise try this:

SELECT count(*)
FROM test_tab tab1
LEFT JOIN (
    SELECT DISTINCT login FROM test_tab WHERE import_num = '0'
) tab2 ON tab1.login = tab2.login
WHERE tab2.login IS NULL
  AND import_num = '4520440'

On Mon, 21 Aug 2017 at 15:47, Carlos Augusto Machado <caugus...@gmail.com> wrote:
>
> Do you have an index on the login column?
>
> If not, try creating an index and taking off those DISTINCTs.
>
> On Mon, 21 Aug 2017 at 15:33, Justin Pryzby wrote:
>
>> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
>> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>>
>> Would you send EXPLAIN ANALYZE and not just EXPLAIN?
>>
>> Justin
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance Issue -- "Materialize"
Do you have an index on the login column?

If not, try creating an index and taking off those DISTINCTs.

On Mon, 21 Aug 2017 at 15:33, Justin Pryzby wrote:
> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>
> Would you send EXPLAIN ANALYZE and not just EXPLAIN?
>
> Justin
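A minimal sketch of that suggestion (the index names are made up for illustration; the composite variant is an assumption, since the thread's queries filter on import_num together with login):

```sql
-- Plain index on the semi-join column
CREATE INDEX idx_test_tab_login ON test_tab (login);

-- Or, because every subquery filters on import_num AND login,
-- a composite index may serve both predicates at once (assumption)
CREATE INDEX idx_test_tab_import_login ON test_tab (import_num, login);
```

With an index available, the DISTINCTs inside the IN/EXCEPT subqueries become unnecessary work, since the semi-join already deduplicates.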
Re: [PERFORM] Performance Issue -- "Materialize"
On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> Aggregate  (cost=351405.08..351405.09 rows=1 width=8)

Would you send EXPLAIN ANALYZE and not just EXPLAIN?

Justin
Re: [PERFORM] Performance Issue -- "Materialize"
Any thoughts on this?
Re: [PERFORM] Performance issue in PostgreSQL server...
On Mon, Mar 06, 2017 at 12:17:22PM +, Dinesh Chandra 12108 wrote:
> Below is the output of the query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id';
>
> schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
> "evidence" | "point" | "domain_class_id" | f | 0 | 8 | 10 | "{7,9,2,11,43,3,1,10,4,17}" | "{0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}" | "" | 0.889078

On Fri, Mar 03, 2017 at 12:44:07PM +, Dinesh Chandra 12108 wrote:
> ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id. In which case, why isn't it just faster?

I missed your response until now, and can't see that anybody else responded, but I suspect the issue is that the *table* is highly correlated with respect to this column, while the index may not be, probably due to duplicated index keys. Postgres only stores statistics on expression indexes, and for a simple index it falls back to the correlation of the table column. If you're still fighting this, would you send the result of:

SELECT domain_class_id, count(1) FROM point GROUP BY 1 ORDER BY 2 DESC LIMIT 22;

or,

SELECT count(1) FROM point GROUP BY domain_class_id ORDER BY 1 DESC LIMIT 22;

If there's much repetition in the index keys, then PG's planner thinks an index scan has a low random_page_cost (effective_cache_size has little effect on large tables), and it never uses a bitmap scan, which blows up if the index is fragmented and has duplicate keys.
The table reads end up costing something like 1454751*random_page_cost in nonsequential reads and fseek() calls, when the planner thinks it will cost only 1454751*16*seq_page_cost. Is the query much faster if you first REINDEX point_domain_class_id_index?

This has come up before; see:
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520d6610.8040...@emulex.com
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com
> https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou...@4ax.com

Justin
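The suggested experiment, spelled out (index and column names from the thread; note REINDEX takes an exclusive lock on writes to the table, so this is a sketch for a test system, not a production recipe):

```sql
-- Rebuild the index so that duplicate keys end up stored contiguously
REINDEX INDEX point_domain_class_id_index;

-- The planner's notion of clustering comes from the table column's
-- correlation, not from the physical order of the index itself
SELECT correlation
FROM pg_stats
WHERE tablename = 'point' AND attname = 'domain_class_id';
```

If the query is much faster after the REINDEX while the pg_stats correlation is unchanged, that supports the theory that the index, not the table, was the fragmented part.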
Re: [PERFORM] Performance issue after upgrading from 9.4 to 9.6
On Mon, Mar 6, 2017 at 7:20 AM, Piotr Gasidło wrote:
> We are having some performance issues after we upgraded to the newest
> version of PostgreSQL; before that everything was fast and smooth.
>
> The upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. We have
> since upgraded to 9.6.2 with no improvement.
>
> Some information about our setup: FreeBSD, Solaris (SmartOS), simple
> master-slave using streaming replication.
>
> Problem:
> Very high system CPU when the master is streaming replication data; CPU
> goes up to 77%. Only one process is generating this load: the postgresql
> startup process. When I attached truss to this process I saw a lot of
> read calls with almost the same number of errors (EAGAIN).
>
> root@d8:~ # truss -c -p 38091
> ^C
> syscall        seconds      calls  errors
> semop      0.001611782        198       0
> write      0.000074404          2       0
> read       2.281535100      17266   12375
> openat     0.000683532         48       0
> lseek      0.177612479      20443       0
> close      0.000395549         48       0
>          -------------    -------  ------
>            2.461912846      38005   12375
>
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
> lseek(778,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
> read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
> lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
> read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
> read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
> lseek(818,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
> lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
> read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
>
> Descriptor 6 is a pipe.
>
> The read call tries to read one byte over and over. I looked at the
> source code and I think this file is responsible for this behavior:
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

Is a git bisect out of the question?

merlin
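For reference, a bisect over the 9.4-to-9.6 development range might look roughly like this (a sketch; the chosen good/bad tags are assumptions, and the per-step test is whatever script reproduces the high system CPU under streaming replication):

```shell
# Bisect the PostgreSQL source between the last known-good and the
# first known-bad release tags
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git bisect start
git bisect bad REL9_6_1     # behavior is bad here (assumption)
git bisect good REL9_4_0    # ...and was fine here (assumption)

# At each step: build, install, replay the replication workload,
# watch system CPU on the startup process, then report:
#   git bisect good    # or: git bisect bad
# Repeat until git prints the first bad commit.
```

Each step requires a full build and a workload replay, so with roughly a dozen steps this is a day-scale effort, but it pinpoints the exact commit (plausibly in the latch.c rework noted above).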
Re: [PERFORM] Performance issue in PostgreSQL server...
Dear Justin,

Below is the output of the query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id';

schemaname        | "evidence"
tablename         | "point"
attname           | "domain_class_id"
inherited         | f
null_frac         | 0
avg_width         | 8
n_distinct        | 10
most_common_vals  | "{7,9,2,11,43,3,1,10,4,17}"
most_common_freqs | "{0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}"
histogram_bounds  | ""
correlation       | 0.889078

Regards,
Dinesh Chandra | Database administrator (Oracle/PostgreSQL) | Cyient Ltd. Noida.

-----Original Message-----
From: Justin Pryzby [mailto:pry...@telsasoft.com]
Sent: 06 March, 2017 10:54 AM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: Nur Agus <nuragus.li...@gmail.com>; Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108
> <dinesh.chan...@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN
> > evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
> > WHERE p.domain_class_id IN (11) AND (p.modification_time >
> > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
> > 00:06:56.056 IST') ORDER BY feature_id
> ...
> > ->  Index Scan using point_domain_class_id_index on
> > point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> > time=27.265..142101.159 rows=1607491 loops=1)
> >       Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is
> well clustered on domain_class_id. In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;
.. or, if that's too verbose or you don't want to share the histogram or MCV list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin

DISCLAIMER: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.
Re: [PERFORM] Performance issue in PostgreSQL server...
On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> > oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> > (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> > oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
> ...
> > ->  Index Scan using point_domain_class_id_index on
> > point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> > time=27.265..142101.159 rows=1607491 loops=1)
> >       Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id. In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or, if that's too verbose or you don't want to share the histogram or MCV list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin
Re: [PERFORM] Performance issue in PostgreSQL server...
On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <dinesh.chan...@cyient.com> wrote:
> Dear Nur,
>
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
>
> ...
>            ->  Index Scan using point_domain_class_id_index on
> point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> time=27.265..142101.159 rows=1607491 loops=1)
>                  Index Cond: (domain_class_id = 11)

Why wouldn't this be using a bitmap scan rather than a regular index scan? It seems like it should prefer the bitmap scan, unless the table is well clustered on domain_class_id. In which case, why isn't it just faster?

You could try repeating the EXPLAIN ANALYZE after setting enable_indexscan = off to see what that gives. If it gives a seq scan, then repeat with enable_seqscan also turned off. Or if it gives the bitmap scan, then repeat with enable_bitmapscan turned off.

How many rows are in point, and how big is it?

The best bet for making this better might be to have an index on (domain_class_id, modification_time) and hope for an index-only scan. Except that you are on 9.1, so first you would have to upgrade. Upgrading would also allow you to use BUFFERS in the EXPLAIN ANALYZE, as well as track_io_timing, both of which would be pretty nice to see. Using 9.1 is like having one hand tied behind your back.

Also, any idea why this execution of the query is 15 times faster than the execution you found in the log file? Was the top output you showed in the first email captured while the really slow query was running, or was that from a different period?

Cheers, Jeff
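Spelled out, the suggested experiments might look like this (a sketch; the SET statements only affect the current session, and the index name is an arbitrary choice for Jeff's suggested composite index):

```sql
-- Step 1: force the planner away from the plain index scan
SET enable_indexscan = off;
-- ...re-run the EXPLAIN ANALYZE from the thread here...

-- Step 2a: if that produced a seq scan, disable it too and re-run
SET enable_seqscan = off;

-- Step 2b: or, if it produced the bitmap scan, reset and disable that instead
RESET enable_indexscan;
SET enable_bitmapscan = off;

-- The suggested composite index for an index-only scan (requires 9.2+)
CREATE INDEX point_dcid_modtime_idx
    ON evidence.point (domain_class_id, modification_time);
```

Comparing the three resulting plans (and their actual times) shows whether the planner's cost model or the physical layout of the index is the real problem.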
Re: [PERFORM] Performance issue in PostgreSQL server...
Dinesh Chandra 12108 writes:
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id
> FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON
> p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time
> > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

I think that's a fundamentally slow query and you're not going to be able to make it better without rethinking your requirements and/or data representation.

As written, that requires the server to form the entire join of p to oe on feature_id, with the only filter before the join being the evidently-none-too-selective domain_class_id condition. Only after joining can it apply the OR condition. So this is inherently processing a lot of rows.

If the OR arms were individually pretty selective you could rewrite this into a UNION of two joins, a la the discussion at

https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873...@bluetreble.com

but given the dates involved I'm betting that won't help very much. Or maybe you could try

select feature_id from p
  where domain_class_id IN (11)
    and p.modification_time > '2015-05-10 00:06:56.056 IST'
intersect
select feature_id from oe
  where oe.modification_time > '2015-05-10 00:06:56.056 IST'
order by feature_id

although I'm not entirely certain that that has exactly the same semantics (-ENOCAFFEINE), and it might still be none too quick.

regards, tom lane
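A sketch of the UNION rewrite mentioned above (my own expansion, not from the thread; each arm applies one side of the OR before the join, and UNION deduplicates, which is assumed safe here only because feature_id is the lone output column and the query already does DISTINCT):

```sql
SELECT feature_id
FROM evidence.point p
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND p.modification_time > '2015-05-10 00:06:56.056 IST'
UNION
SELECT feature_id
FROM evidence.point p
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND oe.modification_time > '2015-05-10 00:06:56.056 IST'
ORDER BY feature_id;
```

This only wins if each timestamp predicate is selective enough that its arm can use an index; with cutoffs as old as 2015 (as Tom notes), both arms likely match most of the table and the rewrite buys little.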
Re: [PERFORM] Performance issue in PostgreSQL server...
Dear Nur,

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

QUERY PLAN
----------
Unique  (cost=1679730.32..1679837.46 rows=21428 width=8) (actual time=154753.528..155657.818 rows=1607489 loops=1)
  ->  Sort  (cost=1679730.32..1679783.89 rows=21428 width=8) (actual time=154753.514..155087.734 rows=4053270 loops=1)
        Sort Key: p.feature_id
        Sort Method: quicksort  Memory: 288302kB
        ->  Hash Join  (cost=1501657.09..1678188.87 rows=21428 width=8) (actual time=144146.620..152050.311 rows=4053270 loops=1)
              Hash Cond: (oe.evd_feature_id = p.feature_id)
              Join Filter: ((p.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone) OR (oe.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone))
              ->  Seq Scan on observation_evidence oe  (cost=0.00..121733.18 rows=5447718 width=16) (actual time=0.007..1534.905 rows=5434406 loops=1)
              ->  Hash  (cost=1483472.70..1483472.70 rows=1454751 width=16) (actual time=144144.653..144144.653 rows=1607491 loops=1)
                    Buckets: 262144  Batches: 1  Memory Usage: 75352kB
                    ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)
                          Index Cond: (domain_class_id = 11)
Total runtime: 155787.379 ms
(13 rows)

Regards,
Dinesh Chandra | Database administrator (Oracle/PostgreSQL) | Cyient Ltd. Noida.
Mobile: +91-9953975849 | Ext 1078 | dinesh.chan...@cyient.com
Plot No. 7, NSEZ, Phase-II, Noida-Dadri Road, Noida - 201 305, India.
From: Nur Agus [mailto:nuragus.li...@gmail.com]
Sent: 03 March, 2017 5:54 PM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

Hello Dinesh,

You can try the EXPLAIN tool:

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

Then paste the result here.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <dinesh.chan...@cyient.com> wrote:

Dear Experts,

I need your suggestions to resolve a performance issue reported on our PostgreSQL 9.1 production database, 1.5 TB in size. I have observed that some SELECT queries with an ORDER BY clause are taking a lot of time to execute, forcing applications to respond slowly.
The configuration of the database server is:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
CPUs:                  8
Core(s) per socket:    4
Socket(s):             2
Model name:            Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM:  32 GB
SWAP: 8 GB

Kernel parameters:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824

Values of postgresql.conf parameters:

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1    # min 32kB, -1 sets based on shared_buffers

Queries taking a lot of time:
=============================

2017-03-02 00:46:50 IST LOG: duration: 2492951.927 ms execute: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

2017-03-02 01:05:16 IST LOG: duration: 516250.512 ms execute: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

Top command output:

top - 15:13:15 up 66 days, 3:45, 8 users, load average: 1.8
Re: [PERFORM] Performance issue in PostgreSQL server...
Hello Dinesh,

You can try the EXPLAIN tool:

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

Then paste the result here.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <dinesh.chan...@cyient.com> wrote:
> Dear Experts,
>
> I need your suggestions to resolve the performance issue reported on our
> *PostgreSQL 9.1* production database having 1.5 TB *size*. I have observed
> that some select queries with an order by clause are taking a lot of time
> in execution and forcing applications to give slow response.
>
> The configuration of the database server is:
>
> Architecture:       x86_64
> CPU op-mode(s):     32-bit, 64-bit
> CPUs:               8
> Core(s) per socket: 4
> Socket(s):          2
> Model name:         Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
>
> *RAM*:  32 GB
> *SWAP*: 8 GB
>
> *Kernel parameters*:
>
> kernel.shmmax = 32212254720
> kernel.shmall = 1073741824
>
> Values of postgresql.conf parameters:
>
> shared_buffers = 10GB
> temp_buffers = 32MB
> work_mem = 512MB
> maintenance_work_mem = 2048MB
> max_files_per_process = 2000
> checkpoint_segments = 200
> max_wal_senders = 5
> wal_buffers = -1    # min 32kB, -1 sets based on shared_buffers
>
> *Queries taking a lot of time:*
> ==============================
>
> 2017-03-02 00:46:50 IST LOG: duration: 2492951.927 ms execute: SELECT
> DISTINCT feature_id FROM evidence.point p INNER JOIN
> evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE
> p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10
> 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST')
> ORDER BY feature_id
>
> 2017-03-02 01:05:16 IST LOG: duration: 516250.512 ms execute: SELECT
> DISTINCT
feature_id FROM evidence.point p INNER JOIN
> evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE
> p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10
> 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST')
> ORDER BY feature_id
>
> *Top command output*:
>
> top - 15:13:15 up 66 days, 3:45, 8 users, load average: 1.84, 1.59, 1.57
> Tasks: 830 total, 1 running, 828 sleeping, 0 stopped, 1 zombie
> Cpu(s): 3.4%us, 0.7%sy, 0.0%ni, 81.7%id, 14.2%wa, 0.0%hi, 0.0%si, 0.0%st
> *Mem:*  32830016k total, *32142596k* used, *687420k* free,   77460k buffers
> Swap:    8190972k total,   204196k used, 7986776k free, 27981268k cached
>
>   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 30639 postgres 20  0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres
> 18185 postgres 20  0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres
> 16543 postgres 20  0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres
> 14710 postgres 20  0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres
>  1214 root     20  0 15668 1848  896 S  1.0  0.0 130:46.43 top
> 13462 postgres 20  0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres
> 20081 root     20  0 15668 1880  936 R  1.0  0.0   0:00.12 top
> 13478 postgres 20  0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres
> 41107 root     20  0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3
>  2680 root     20  0     0    0    0 S  0.3  0.0 103:38.54 nfsiod
>  3558 root     20  0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server
> 15576 root     20  0     0    0    0 S  0.3  0.0   0:01.16 flush-253:1
> 18430 postgres 20  0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres
> 20083 postgres 20  0  105m 1852 1416 S  0.3  0.0   0:00.01 bash
> 24188 postgres 20  0  102m 1856  832 S  0.3  0.0   0:23.39 sshd
> 28250 postgres 20  0  156m 1292  528 S  0.3  0.0   0:46.86 postgres
>     1 root     20  0 19356 1188  996 S  0.0  0.0   0:05.00 init
>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)|*
>
Re: [PERFORM] Performance issue with castings args of the function
Thanks all for the explanation!

On Mon, Jan 2, 2017 at 9:36 PM, Tom Lane wrote:
> Andrey Khozov writes:
> > create table t1 (id serial, str char(32));
>
> > create function f1(line text) returns void as $$
> > begin
> >   perform * from t1 where str = line;
> > end;
> > $$ language plpgsql;
>
> This query is specifying a text comparison (text = text operator).
> Since the table column isn't text, a char-to-text conversion must
> happen at each line.
>
> > create function f2(line char) returns void as $$
> > begin
> >   perform * from t1 where str = line;
> > end;
> > $$ language plpgsql;
>
> This query is specifying a char(n) comparison (char = char operator).
> No type conversion step needed, so it's faster.
>
> regards, tom lane

--
Andrey Khozov
Re: [PERFORM] Performance issue with castings args of the function
Andrey Khozov writes:
> create table t1 (id serial, str char(32));

> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a text comparison (text = text operator). Since the table column isn't text, a char-to-text conversion must happen at each line.

> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a char(n) comparison (char = char operator). No type conversion step needed, so it's faster.

regards, tom lane
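The practical takeaway can be sketched like this (my own suggestion rather than something from the thread; the one-time cast inside f1_fixed is an assumption about how one would avoid the per-row conversion):

```sql
-- The column is char(32), so keep the comparison char-vs-char:
-- either declare the parameter with the matching type...
create function f2_fixed(line char(32)) returns void as $$
begin
  perform * from t1 where str = line;   -- char = char, no per-row conversion
end;
$$ language plpgsql;

-- ...or, if the caller really hands over text, cast it once up front
-- so the comparison inside the scan is still char = char
create function f1_fixed(line text) returns void as $$
begin
  perform * from t1 where str = line::char(32);
end;
$$ language plpgsql;
```

Either way the predicate stays in the column's native type, so an index on str also remains usable.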
Re: [PERFORM] Performance issue with castings args of the function
Hi

2017-01-02 15:34 GMT+01:00 Андрей Хозов:
> Hello there!
>
> I have a performance issue with functions and argument types.
>
> Table and data:
> create table t1 (id serial, str char(32));
> insert into t1 (str) select md5(s::text) from generate_series(1, 100) as s;
>
> And simple functions:
> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> Query:
> test=> explain analyze select f2('2b00042f7481c7b056c4b410d28f33cf');
> Result  (cost=0.00..0.26 rows=1 width=0) (actual time=189.008..189.010 rows=1 loops=1)
> Planning time: 0.039 ms
> Execution time: 189.039 ms
> (3 rows)
>
> Time: 189.524 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf');
> Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.734..513.735 rows=1 loops=1)
> Planning time: 0.024 ms
> Execution time: 513.757 ms
> (3 rows)
>
> Time: 514.125 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf'::char(32));
> Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.507..513.509 rows=1 loops=1)
> Planning time: 0.074 ms
> Execution time: 513.535 ms
> (3 rows)

This explain shows nothing useful; you need to see the nested plans. Look at auto_explain:

https://www.postgresql.org/docs/current/static/auto-explain.html

Maybe the index was not used due to the different types.

Regards

Pavel

> Time: 514.104 ms
> test=>
>
> It seems that casting the param from text to char(32) needs to be done only
> once, so f1 and f2 should be identical in performance. But f1, with the
> text param, is significantly slower, even when the arg is cast while
> passing it to the function.
>
> Tested PostgreSQL versions 9.5.5 and 9.6.1 on Ubuntu 16.04. Is this normal
> behavior, or can it be fixed?
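A minimal auto_explain setup for seeing the plan of the SELECT inside the function might look like this (a sketch; session-level settings, and logging every statement with a zero threshold is only sensible for a test session):

```sql
-- Load the module into this session (no server restart needed)
LOAD 'auto_explain';

SET auto_explain.log_min_duration = 0;       -- log every statement's plan
SET auto_explain.log_nested_statements = on; -- include SQL run inside functions
SET auto_explain.log_analyze = on;           -- include actual times/row counts

-- The plan of the PERFORM inside f1()/f2() now appears in the server log
SELECT f1('2b00042f7481c7b056c4b410d28f33cf');
```

The logged nested plan will show whether the text-vs-char comparison forced a per-row type conversion or defeated an index.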
Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke wrote:
> Hi,
>
> I have a performance issue with bitmap index scans on huge amounts of big
> jsonb documents.
>
> = Background =
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
> = Example =
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
>> 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
>   i,
>   (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text)))
>    from generate_series(0, 100) j) big_jsonb
> from
>   generate_series(0, 10) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
> -- query with bitmap scan
> explain analyze
> select count(*)
> from bitmap_scan_test
> where big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
>   ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=5 loops=1)
>         Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
>         Heap Blocks: exact=637
>         ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=5 loops=1)
>               Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
> perf top -p ... shows heavy usage of pglz_decompress:
>
> Overhead  Shared Object  Symbol
>   51.06%  postgres       [.] pglz_decompress
>    7.33%  libc-2.12.so   [.] memcpy

Another thing to possibly look at is configuring the column not to compress; over half the time is spent decompressing the data.
See:

    ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this is a huge tradeoff, so do some careful analysis before making the change.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
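Against the example table from the report, merlin's suggestion would look roughly like this (a sketch, not a drop-in fix; EXTERNAL keeps large values out-of-line in TOAST but skips pglz compression):

```sql
-- Store big_jsonb uncompressed from now on; note that existing rows keep
-- their old (compressed) representation until they are rewritten.
ALTER TABLE bitmap_scan_test ALTER COLUMN big_jsonb SET STORAGE EXTERNAL;
```

This trades disk space (and cache hit rate) for the CPU currently burned in pglz_decompress, which is why careful analysis is advised first.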
Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents
Thanks for the explanation!

Best Regards,
Marc-Olaf

Marc-Olaf Jaschke · Softwareentwickler
shopping24 GmbH
Werner-Otto-Straße 1-7 · 22179 Hamburg
Telefon: +49 (0) 40 6461 5830 · Fax: +49 (0) 40 64 61 7879
marc-olaf.jasc...@s24.com · www.s24.com
AG Hamburg HRB 63371
vertreten durch Dr. Björn Schäfers und Martin Mildner

2016-12-05 3:28 GMT+01:00 Jeff Janes:
>>> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>>
>> I wonder why bitmap heap scan adds such a big amount of time on top of
>> the plain bitmap index scan.
>> It seems to me, that the recheck is active although all blocks are exact
>> [1] and that pg is loading the jsonb for the recheck.
>>
>> Is this an expected behavior?
>
> Yes, this is expected. The gin index is lossy. It knows that all the
> elements are present (except when it doesn't--large elements might get
> hashed down and suffer hash collisions), but it doesn't know what the
> recursive structure between them is, and has to do a recheck.
>
> For example, if you change your example where clause to:
>
> big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> You will see that the index still returns 50,000 rows, but now all of them
> get rejected upon the recheck.
>
> You could try changing the type of index to jsonb_path_ops. In your given
> example, it won't make a difference, because you are actually counting half
> the table and so half the table needs to be rechecked. But in my example,
> jsonb_path_ops successfully rejects all the rows at the index stage.
>
> Cheers,
>
> Jeff
Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> I wonder why bitmap heap scan adds such a big amount of time on top of the
> plain bitmap index scan.
> It seems to me, that the recheck is active although all blocks are exact
> [1] and that pg is loading the jsonb for the recheck.
>
> Is this an expected behavior?

Yes, this is expected. The gin index is lossy. It knows that all the elements are present (except when it doesn't--large elements might get hashed down and suffer hash collisions), but it doesn't know what the recursive structure between them is, and has to do a recheck.

For example, if you change your example where clause to:

    big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';

You will see that the index still returns 50,000 rows, but now all of them get rejected upon the recheck.

You could try changing the type of index to jsonb_path_ops. In your given example, it won't make a difference, because you are actually counting half the table and so half the table needs to be rechecked. But in my example, jsonb_path_ops successfully rejects all the rows at the index stage.

Cheers,

Jeff
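For reference, the jsonb_path_ops variant Jeff mentions would be created like this against the example table (the index name here is illustrative):

```sql
-- jsonb_path_ops indexes hashes of complete key paths rather than
-- individual keys and values, so structurally wrong matches like the
-- {"filler": 1, "x": "..."} example can be rejected at the index stage.
CREATE INDEX bitmap_scan_test_big_jsonb_path_idx
    ON bitmap_scan_test USING gin (big_jsonb jsonb_path_ops);
```

Note that jsonb_path_ops only supports containment-style operators such as `@>`, which is what this workload uses.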
Re: [PERFORM] Performance issue with NestedLoop query
On Tue, Aug 4, 2015 at 8:40 PM, Ram N yrami...@gmail.com wrote:
> Thanks much for responding guys. I have tried both, building multi column
> indexes and GIST, with no improvement. I have reduced the window from 180
> days to 30 days and below are the numbers
>
> Composite index - takes 30 secs
> With Btree indexing - takes 9 secs
> With GIST - takes 30 secs with kind of materialize plan in explain
>
> Any other ideas I can do for window based joins.

From this query:

    select sum(a), count(id), a.ts, st
    from table1 a, table2 b
    where a.ts > b.start_date and a.ts < b.end_date
      and a.ts > '2015-01-01 20:50:44.00 +00:00:00'
      and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
    group by a.ts, st
    order by a.ts

We can actually derive that b.start_date < '2015-07-01 19:50:44.00 +00:00:00' and b.end_date > '2015-01-01 20:50:44.00 +00:00:00'. If we add these two predicates to the original query, does it help?

Thanks,
Qingqing
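Qingqing's suggestion, written out (a sketch; the comparison operators were stripped by the archive and are restored here from the join conditions shown in the EXPLAIN output later in the thread):

```sql
SELECT sum(a), count(id), a.ts, st
FROM table1 a, table2 b
WHERE a.ts > b.start_date
  AND a.ts < b.end_date
  AND a.ts > '2015-01-01 20:50:44.00 +00:00:00'
  AND a.ts < '2015-07-01 19:50:44.00 +00:00:00'
  -- derived bounds: they are implied by the predicates above, but
  -- stating them explicitly lets the planner narrow the scan of table2
  AND b.start_date < '2015-07-01 19:50:44.00 +00:00:00'
  AND b.end_date   > '2015-01-01 20:50:44.00 +00:00:00'
GROUP BY a.ts, st
ORDER BY a.ts;
```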
Re: [PERFORM] Performance issue with NestedLoop query
Thanks much for responding guys. I have tried both, building multi column indexes and GIST, with no improvement. I have reduced the window from 180 days to 30 days and below are the numbers

Composite index - takes 30 secs
With Btree indexing - takes 9 secs
With GIST - takes 30 secs with kind of materialize plan in explain

Any other ideas I can do for window based joins.

--yr

On Fri, Jul 31, 2015 at 11:37 AM, Qingqing Zhou zhouqq.postg...@gmail.com wrote:
> On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:
>> Thanks Qingqing for responding. That didn't help. It in fact increased
>> the scan time. Looks like a lot of time is being spent on the NestedLoop
>> Join than index lookups though I am not sure how to optimize the join.
>
> Good news is that optimizer is right this time :-). The NLJ here does
> almost nothing but schedule each outer row to probing the inner index. So
> the index seek is the major cost.
>
> Have you tried build a two column index on (b.start_date, b.end_date)?
>
> Regards,
> Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Thu, Jul 30, 2015 at 4:51 AM, Ram N yrami...@gmail.com wrote:
> select sum(a), count(id), a.ts, st
> from table1 a, table2 b
> where a.ts > b.start_date and a.ts < b.end_date
>   and a.ts > '2015-01-01 20:50:44.00 +00:00:00'
>   and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
> group by a.ts, st
> order by a.ts

You could try to use a range type:

    CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date, '()'));

Then:

    select sum(a), count(id), a.ts, st
    from table1 a, table2 b
    where tstzrange(b.start_date, b.end_date, '()') @> a.ts
      and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
    group by a.ts, st
    order by a.ts

Regards,
--
Matheus de Oliveira
Re: [PERFORM] Performance issue with NestedLoop query
On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:
> Thanks Qingqing for responding. That didn't help. It in fact increased the
> scan time. Looks like a lot of time is being spent on the NestedLoop Join
> than index lookups though I am not sure how to optimize the join.

Good news is that optimizer is right this time :-). The NLJ here does almost nothing but schedule each outer row to probing the inner index. So the index seek is the major cost.

Have you tried build a two column index on (b.start_date, b.end_date)?

Regards,
Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Fri, Jul 31, 2015 at 3:06 PM, Matheus de Oliveira matioli.math...@gmail.com wrote: CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date, '()')); The index should be USING GIST, not GIN. Sorry. -- Matheus de Oliveira
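Putting the correction together with the earlier suggestion, the index definition becomes:

```sql
-- GiST supports range containment (@>) natively for range types,
-- which is why GIST rather than GIN is needed here:
CREATE INDEX ON table2 USING gist (tstzrange(start_date, end_date, '()'));
```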
Re: [PERFORM] Performance issue with NestedLoop query
Thanks Qingqing for responding. That didn't help. It in fact increased the scan time. Looks like a lot of time is being spent on the NestedLoop Join than index lookups though I am not sure how to optimize the join. I am assuming it's an in-memory join, so I am not sure why it should take such a lot of time. Increasing work_mem has helped in reducing the processing time, but it's still > 1 min.

--yr

On Thu, Jul 30, 2015 at 1:24 PM, Qingqing Zhou zhouqq.postg...@gmail.com wrote:
> On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
>> ->  Index Scan using end_date_idx on public.table2 b
>>     (cost=0.43..23181.37 rows=345833 width=52)
>>     (actual time=0.063..622.274 rows=403936 loops=181)
>>       Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date
>>       Index Cond: (a.ts < b.end_date)
>>       Filter: (a.ts > b.start_date)
>>       Rows Removed by Filter: 392642
>
> In your case, do you have index built for both b.end_date and
> b.start_date? If so, can you try set enable_indexscan=off to see if
> bitmap heap scan helps?
>
> Regards,
> Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
> ->  Index Scan using end_date_idx on public.table2 b
>     (cost=0.43..23181.37 rows=345833 width=52)
>     (actual time=0.063..622.274 rows=403936 loops=181)
>       Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date
>       Index Cond: (a.ts < b.end_date)
>       Filter: (a.ts > b.start_date)
>       Rows Removed by Filter: 392642

In your case, do you have index built for both b.end_date and b.start_date? If so, can you try set enable_indexscan=off to see if bitmap heap scan helps?

Regards,
Qingqing
Re: [PERFORM] Performance issue: index not used on GROUP BY...
Thanks for the feedback, everybody. I spent a couple of days trying to optimise this. As mentioned, the increased memory is not an option for me, as this query is part of a report that can be run by any user on an ad hoc basis. Allocating the required memory to any session on demand is not feasible in this environment. In the end, it seems to me that a more sustainable solution will be to introduce an additional table to carry the summarized values and look up on that table in this type of scenario.

Regards

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-issue-index-not-used-on-GROUP-BY-tp5816702p5817622.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: [PERFORM] Performance issue: index not used on GROUP BY...
2014-08-28 11:50 GMT+03:00 gmb gmbou...@gmail.com: It seems as if the planner is not using the PRIMARY KEY as index which was my assumption. Can you send `EXPLAIN (analyze, buffers)` for your query instead? It'll show exactly what's going on. -- Victor Y. Yegorov
Re: [PERFORM] Performance issue: index not used on GROUP BY...
> Can you send `EXPLAIN (analyze, buffers)` for your query instead? It'll
> show exactly what's going on.

GroupAggregate (cost=303425.31..339014.43 rows=136882 width=48) (actual time=4708.181..6688.699 rows=287268 loops=1)
  Buffers: shared read=23899, temp read=30974 written=30974
  ->  Sort (cost=303425.31..306847.34 rows=1368812 width=48) (actual time=4708.170..5319.429 rows=1368744 loops=1)
        Sort Key: co_id, client_id, doc_no,
        Sort Method: external merge  Disk: 80304kB
        Buffers: shared read=23899, temp read=30974 written=30974
        ->  Seq Scan on ddetail (cost=0.00..37587.12 rows=1368812 width=48) (actual time=0.122..492.964 rows=1368744 loops=1)
              Buffers: shared read=23899
Total runtime: 6708.244 ms

My initial attempt was this (this is what I actually need):

    SELECT co_id, client_id, doc_no, line_id, batch_no, sum( amount )
    FROM ddetail
    GROUP BY co_id, client_id, doc_no, line_id, batch_no;

but I removed column batch_no from the query because I thought this was the cause of the problem (batch_no is not part of my PK).

Thanks
Re: [PERFORM] Performance issue: index not used on GROUP BY...
2014-08-28 12:08 GMT+03:00 gmb gmbou...@gmail.com:
> GroupAggregate (cost=303425.31..339014.43 rows=136882 width=48) (actual
> time=4708.181..6688.699 rows=287268 loops=1)
>   Buffers: shared read=23899, temp read=30974 written=30974
>   ->  Sort (cost=303425.31..306847.34 rows=1368812 width=48) (actual
>       time=4708.170..5319.429 rows=1368744 loops=1)
>         Sort Key: co_id, client_id, doc_no,
>         Sort Method: external merge  Disk: 80304kB
>         Buffers: shared read=23899, temp read=30974 written=30974
>         ->  Seq Scan on ddetail (cost=0.00..37587.12 rows=1368812
>             width=48) (actual time=0.122..492.964 rows=1368744 loops=1)
>               Buffers: shared read=23899
> Total runtime: 6708.244 ms
>
> My initial attempt was this (this is what I actually need):
>
> SELECT co_id, client_id, doc_no, line_id, batch_no, sum( amount )
> FROM ddetail
> GROUP BY co_id, client_id, doc_no, line_id, batch_no;

I think index will be of no help here, as (1) you're reading whole table anyway and (2) `amount` is not part of your index.

Try to avoid disk-based sort by increasing `work_mem` for your session, I think value in the range 120MB-150MB should work:

    SET work_mem TO '150MB';

Check `EXPLAIN` output after the change.

--
Victor Y. Yegorov
Re: [PERFORM] Performance issue: index not used on GROUP BY...
On Thu, Aug 28, 2014 at 11:50 AM, gmb gmbou...@gmail.com wrote:
> Can somebody please confirm whether aggregate functions such as GROUP BY
> should use indexes ?

Yes, if the planner deems it faster than other approaches. It can make wrong choices for many reasons, but usually when your planner tunables like random_page_cost, effective_cache_size aren't set appropriately. There's some advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Just for the purpose of testing, you could try set enable_sort=false in your session and see if that makes it faster.

On Thu, Aug 28, 2014 at 12:08 PM, gmb gmbou...@gmail.com wrote:
> Sort Key: co_id, client_id, doc_no,

Something went missing from this line...

> Sort Method: external merge  Disk: 80304kB

Depends on your hardware and workloads, but more work_mem may also improve queries to avoid sorts and hashes needing to use disk. But beware, setting it too high may result in your server running out of memory.

Regards,
Marti
Re: [PERFORM] Performance issue: index not used on GROUP BY...
Thanks for these suggestions. Unfortunately, I don't have a lot of memory available (65 connections, work_mem = 64MB in pg conf).

> I think index will be of no help here, as (1) you're reading whole table
> anyway and (2) `amount` is not part of your index.

I did not think that the field being used in the agg function should also be part of the index. I'll try this and check the result. My problem is that dropping / adding indexes on this table takes a LOT of time, so I'm stuck with doing the tests using the indexes as is, or doing the tests on a smaller dataset. On the smaller dataset (1.5 mill records on that table) the planner did not take the index into account, even when I omit the amount column:

    CREATE INDEX ix_1 ON ddetail USING btree
        (co_id, client_id, doc_no, line_id, batch_no);

    SELECT co_id, client_id, doc_no, line_id, batch_no
    FROM ddetail
    GROUP BY co_id, client_id, doc_no, line_id, batch_no;

    HashAggregate (cost=54695.74..56064.49 rows=136875 width=22)
      ->  Seq Scan on debfdetail (cost=0.00..37586.44 rows=1368744 width=22)

still does a seq scan instead of the index scan. I guess it is possible that on the 1.4 million records, it is faster to do a seq scan? So I guess I'll have to try and do this on the 10 mill table and check the result there.
Re: [PERFORM] Performance issue: index not used on GROUP BY...
2014-08-28 14:29 GMT+03:00 gmb gmbou...@gmail.com: Unfortunately , I don't have a lot of memory available ( 65 connections , work_mem = 64MB in pg conf ). You don't have to change cluster-wide settings here. You can issue `SET` command from your client right before running your query, only your session will be affected. -- Victor Y. Yegorov
Re: [PERFORM] Performance issue: index not used on GROUP BY...
On 08/28/2014 01:50 AM, gmb wrote:
> Can somebody please confirm whether aggregate functions such as GROUP BY
> should use indexes ?

Sometimes. In your case, the index has one more column than the GROUP BY, which makes it less likely that Postgres will use it (since depending on the cardinality of ddet_id, it might actually be slower to use the index).

In addition, other folks on this thread have already pointed out the memory settings issues to you.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Performance issue with Insert
Jenish jenishv...@gmail.com wrote:

> I am using postgres 8.4.x

With x being what? On what OS and hardware?

> Table is having 3 before insert trigger and one after insert trigger.
> With all triggers enable it is inserting only 4-5 record per second.
> But if I disable after insert trigger it is able to insert 667 records
> per second.

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> After insert trigger is recursive trigger.

So are you counting only the top level inserts or also the ones generated by the recursive inserts?

> My question. How to avoid the bottleneck?

First you need to find out what the bottleneck is.

> Parallel processing is possible in Postgres? How?

To achieve parallel processing in PostgreSQL you need to use multiple connections.

-Kevin
Re: [PERFORM] Performance issue with Insert
Hi,

DB : POSTGRES 8.4.8
OS : Debian
HD : SAS 10k rpm

Shared_buffer is 4096 (25% of RAM), effective_cache is 8GB (75% of RAM)

After insert trigger is again calling 2 more trigger and insert record in another table depends on condition. with all trigger enable there are 8 insert and 32 updates (approx. update is depends on hierarchy)

Plz explain multiple connections. Current scenario application server is sending all requests.

--
Thanks & regards,
JENISH VYAS

On Mon, Jun 27, 2011 at 5:37 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> Jenish jenishv...@gmail.com wrote:
>> I am using postgres 8.4.x
>
> With x being what? On what OS and hardware?
>
>> Table is having 3 before insert trigger and one after insert trigger.
>> With all triggers enable it is inserting only 4-5 record per second.
>> But if I disable after insert trigger it is able to insert 667 records
>> per second.
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>
>> After insert trigger is recursive trigger.
>
> So are you counting only the top level inserts or also the ones
> generated by the recursive inserts?
>
>> My question. How to avoid the bottleneck?
>
> First you need to find out what the bottleneck is.
>
>> Parallel processing is possible in Postgres? How?
>
> To achieve parallel processing in PostgreSQL you need to use multiple
> connections.
>
> -Kevin
Re: [PERFORM] Performance issue with Insert
> Hi,
>
> DB : POSTGRES 8.4.8
> OS : Debian
> HD : SAS 10k rpm
>
> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM
>
> After insert trigger is again calling 2 more trigger and insert record in
> another table depends on condition. with all trigger enable there are 8
> insert and 32 updates(approx. update is depends on hierarchy)

Hi,

it's very difficult to give you reliable recommendations with this little info, but the triggers are obviously the bottleneck. We have no idea what queries are executed in them, but I guess there are some slow queries.

Find out what queries are executed in the triggers, benchmark each of them and make them faster. Just don't forget that those SQL queries are executed as prepared statements, so they may behave a bit differently than plain queries. So use 'PREPARE' and 'EXPLAIN EXECUTE' to tune them.

> Plz explain multiple connections. Current scenario application server is
> sending all requests.

PostgreSQL does not support parallel queries (i.e. a query distributed on multiple CPUs) so each query may use just a single CPU. If you're CPU bound (one CPU is 100% utilized but the other CPUs are idle), you can usually parallelize the workload on your own - just use multiple connections. But if you're using an application server and there are multiple connections used, this is not going to help you.

How many connections are active at the same time? Are the CPUs idle or utilized?

Tomas
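The PREPARE / EXPLAIN EXECUTE workflow Tomas describes would look like this (the table and predicate here are illustrative stand-ins for whatever the triggers actually run):

```sql
-- Benchmark a trigger's query the way the trigger executes it:
-- as a prepared statement with parameter placeholders.
PREPARE trig_lookup (int) AS
    SELECT count(*) FROM child_table WHERE parent_id = $1;

-- Shows the plan the prepared statement would actually use,
-- which may differ from EXPLAIN on the plain query:
EXPLAIN EXECUTE trig_lookup (42);

DEALLOCATE trig_lookup;
```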
Re: [PERFORM] Performance issue with Insert
Hi,

I have already checked all the statements present in the trigger, no one is taking more than 20 ms.

I am using 8-Processor, Quad-Core Server, CPU utilization is more than 90-95% for all. (htop result)

DB has 960 concurrent users.

io: writing 3-4 MB per second or less (iotop result).

Scenario: All insert are waiting for previous insert to complete. Cant we avoid this situation? What is the max_connections postgresql support?

Plz help

--
Thanks & regards,
JENISH VYAS

On Mon, Jun 27, 2011 at 6:32 PM, t...@fuzzy.cz wrote:
>> Hi,
>>
>> DB : POSTGRES 8.4.8
>> OS : Debian
>> HD : SAS 10k rpm
>>
>> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM
>>
>> After insert trigger is again calling 2 more trigger and insert record
>> in another table depends on condition. with all trigger enable there are
>> 8 insert and 32 updates(approx. update is depends on hierarchy)
>
> Hi,
>
> it's very difficult to give you reliable recommendations with this little
> info, but the triggers are obviously the bottleneck. We have no idea what
> queries are executed in them, but I guess there are some slow queries.
>
> Find out what queries are executed in the triggers, benchmark each of
> them and make them faster. Just don't forget that those SQL queries are
> executed as prepared statements, so they may behave a bit differently
> than plain queries. So use 'PREPARE' and 'EXPLAIN EXECUTE' to tune them.
>
>> Plz explain multiple connections. Current scenario application server is
>> sending all requests.
>
> PostgreSQL does not support parallel queries (i.e. a query distributed on
> multiple CPUs) so each query may use just a single CPU. If you're CPU
> bound (one CPU is 100% utilized but the other CPUs are idle), you can
> usually parallelize the workload on your own - just use multiple
> connections. But if you're using an application server and there are
> multiple connections used, this is not going to help you.
>
> How many connections are active at the same time? Are the CPUs idle or
> utilized?
>
> Tomas
Re: [PERFORM] Performance issue with Insert
On Mon, Jun 27, 2011 at 9:22 AM, Jenish jenishv...@gmail.com wrote:
> Hi All,
>
> I am facing some performance issue with insert into some table.
>
> I am using postgres 8.4.x
>
> Table is having 3 before insert trigger and one after insert trigger.
>
> With all triggers enable it is inserting only 4-5 record per second.
> But if I disable after insert trigger it is able to insert 667 records
> per second.
>
> After insert trigger is recursive trigger.
>
> My question. How to avoid the bottleneck?
> Parallel processing is possible in Postgres? How?
>
> Please give you suggestion.

this sounds like a coding issue -- to get to the bottom of this we are going to need to see the table and the triggers.

merlin
Re: [PERFORM] Performance issue with Insert
Dne 27.6.2011 17:58, Jenish napsal(a):
> Hi,
>
> I have already checked all the statements present in the trigger, no one
> is taking more than 20 ms.
>
> I am using 8-Processor, Quad-Core Server, CPU utilization is more than
> 90-95 % for all. (htop result)

So all cores are 95% utilized? That means you're CPU bound and you need to fix that somehow.

How much of that belongs to postgres? Are there other processes consuming significant portion of CPU? And what do you mean by 'utilized'? Does that mean user/sys time, or wait time?

> DB has 960 concurrent users.

What does that mean? Does that mean there are 960 active connections?

> io : writing 3-4 MB per second or less (iotop result).

Sequential or random? Post a few lines of 'iostat -x 1' and a few lines of 'vmstat 1' (collected when the database is busy).

> Scenario : All insert are waiting for previous insert to complete. Cant
> we avoid this situation ?

What do you mean by 'previous'? Does that mean another insert in the same session (connection), or something performed in another session?

> What is the max_connections postgresql support?

That limits number of background processes - each connection is served by a dedicated postgres process. You can see that in top / ps output. High values usually mean you need some kind of pooling (you probably already have one as you're using application server). And if the connections are really active (doing something all the time), this should not be significantly higher than the number of cores.

See, you have 8 cores, which means 8 seconds of CPU time each second. No matter how many connections you allow, you still have just those 8 seconds. So if you need to perform 100x something that takes 1 second, you need to spend 100 seconds of CPU time. So with those 8 cores, you can do that in about 12.5 seconds. Actually if you create too many connections, you'll notice it takes much more - there's an overhead with process management, context switching, locking etc.
regards
Tomas
Re: [PERFORM] Performance issue with Insert
Dne 27.6.2011 17:01, Jenish napsal(a):
> Hi,
>
> DB : POSTGRES 8.4.8
> OS : Debian
> HD : SAS 10k rpm
>
> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM

How much data are we talking about? Does that fit into the shared buffers or is it significantly larger? Do the triggers touch the whole database or just a small part of it (active part)?

regards
Tomas
Re: [PERFORM] Performance issue with Insert
Hi,

This server is the dedicated database server. And I am testing the limit for the concurrent active users. When I am running my test for 400 concurrent users, i.e. active connections, I am getting good performance, but when I am running the same test for 950 concurrent users I am getting very bad performance.

> Scenario : All insert are waiting for previous insert to complete.

I don't know whether it is the same session or different session.

DB is huge but triggers are not touching the whole database.

I'll provide the result set of vmstat and iostat tomorrow.

--
Thanks & regards,
JENISH VYAS

On Mon, Jun 27, 2011 at 10:48 PM, Tomas Vondra t...@fuzzy.cz wrote:
> Dne 27.6.2011 17:01, Jenish napsal(a):
>> Hi,
>>
>> DB : POSTGRES 8.4.8
>> OS : Debian
>> HD : SAS 10k rpm
>>
>> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM
>
> How much data are we talking about? Does that fit into the shared buffers
> or is it significantly larger? Do the triggers touch the whole database
> or just a small part of it (active part)?
>
> regards
> Tomas
Re: [PERFORM] Performance issue with Insert
Dne 27.6.2011 22:14, Jenish napsal(a):
> And I am testing the limit for the concurrent active users. When I am
> running my test for 400 concurrent user ie. Active connection. I am
> getting good performance but when I am running the same test for 950
> concurrent users I am getting very bad performance.

This is typical behaviour - the performance is good up until some point, then it degrades much faster. Why do you even need such number of connections? Does that really improve performance (e.g. how many inserts do you do with 100 and 400 connections)? Such number of active connections is not going to give you any advantage I guess ...

regards
Tomas
Re: [PERFORM] Performance issue with Insert
Jenish jenishv...@gmail.com wrote:
> This server is the dedicated database server. And I am testing the limit
> for the concurrent active users. When I am running my test for 400
> concurrent user ie. Active connection. I am getting good performance but
> when I am running the same test for 950 concurrent users I am getting
> very bad performance.

To serve a large number of concurrent users you need to use a connection pooler which limits the number of database connections to a small number. Typically the most effective number of database connections is somewhere between the number of actual cores on your server and twice that plus the number of disk drives. (It depends on the details of your hardware and your load.) The connection pooler should queue requests which arrive when all database connections are busy and release them for execution as transactions complete.

Restricting the active database connections in this way improves both throughput and latency and will allow you to serve a much larger number of users without getting into bad performance; and when you do hit the wall performance will degrade more gracefully.

-Kevin
Re: [PERFORM] performance issue in the fields.
> I have created two tables. In the first table i am using many fields to
> store 3 address. as well as in b table, i am using array data type to
> store 3 address. is there any issue would face in performance related
> things which one will cause the performance issue.

The array is interesting:
- if you put a gist index on it and do searches like "array contains values X and Y and Z", the gist index has some special optimizations for this
- if you might store a variable number of integers, and for some reason you don't want a normalized one-line-per-value approach
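The gist-index idea above, sketched against the example table (this relies on the intarray contrib module for the integer-array GiST opclass; the CREATE EXTENSION syntax shown is from PostgreSQL 9.1 and later):

```sql
CREATE EXTENSION IF NOT EXISTS intarray;

-- GiST index using intarray's opclass:
CREATE INDEX ON b USING gist (address gist__int_ops);

-- "array contains values X and Y and Z":
SELECT * FROM b WHERE address @> ARRAY[1, 2, 3];
```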
Re: [PERFORM] performance issue in the fields.
On Mon, Feb 14, 2011 at 5:36 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
> Hello
>
> 2011/2/14 dba dba...@gmail.com:
>> create table a( address1 int, address2 int, address3 int)
>> create table b(address int[3])
>>
>> I have created two tables. In the first table i am using many fields to
>> store 3 address. as well as in b table, i am using array data type to
>> store 3 address. is there any issue would face in performance related
>> things which one will cause the performance issue.
>
> yes, there is. Planner can not to work well with foreign keys stored in
> array.

also the array variant is going to be bigger on disk. This is because as fields, all the important info about the fields is stored in the table header (inside the system catalogs). But with the array, various header information specific to the array has to be stored with each row. This is largely due to some questionable design decisions made in early array implementation that we are stuck with :-).

merlin
Re: [PERFORM] performance issue in the fields.
Hello 2011/2/14 dba dba...@gmail.com: create table a(address1 int, address2 int, address3 int) create table b(address int[3]) I have created two tables. In the first table I am using many fields to store 3 addresses; in table b I am using an array data type to store the 3 addresses. Is there any performance issue I would face with either approach? yes, there is. The planner cannot work well with foreign keys stored in an array. Regards Pavel Stehule
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
2009/6/17 Mark Steben mste...@autorevenue.com: A few details – I can always provide more

Could you send: 1. Exact text of query. 2. EXPLAIN ANALYZE output on each machine. 3. VACUUM VERBOSE output on each machine, or at least the last 10 lines. ...Robert
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
We have two machines. Both running Linux Redhat, both running postgres 8.2.5. Both have nearly identical 125 GB databases. In fact we use PITR Recovery to replicate from one to the other.

I have to ask the obvious question. Do you regularly analyze the machine you replicate to? Dave
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
Yes, I analyze after each replication. Mark Steben│Database Administrator│@utoRevenue 95 Ashley Ave. West Springfield, MA, 01089 413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)
Re: [PERFORM] Performance issue using Tsearch2
On 2008-02-05 Viviane Lestic wrote:

QUERY PLAN
Sort (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 rows=7106 loops=1)
  Sort Key: rank(tab_ocr.zoneindex_test, q.q)
  ->  Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual time=40886.553..270619.730 rows=7106 loops=1)
        ->  Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual time=38209.193..173932.313 rows=272414 loops=1)
              ->  Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
              ->  Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=276) (actual time=38209.180..173507.052 rows=272414 loops=1)
                    Filter: (tab_ocr.zoneindex_test @@ q.q)
                    ->  Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38204.261..38204.261 rows=283606 loops=1)
                          Index Cond: (tab_ocr.zoneindex_test @@ q.q)
        ->  Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=272414)
              Index Cond: (tab_ocr.idstruct = tab_chemin.label)
              Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 270647.946 ms

Could someone help me analyze this problem?

Your planner estimates are way off. Try increasing the statistics target for the columns used in this query and re-analyze the tables after doing so.

Regards Ansgar Wiechers -- The Mac OS X kernel should never panic because, when it does, it seriously inconveniences the user. --http://developer.apple.com/technotes/tn2004/tn2118.html ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
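For reference, raising the statistics target for one column and re-analyzing looks like this (column and table names taken from the plan above; 1000 is an aggressive but common choice):

```sql
ALTER TABLE tab_ocr ALTER COLUMN zoneindex_test SET STATISTICS 1000;
ANALYZE tab_ocr;
```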
Re: [PERFORM] Performance issue using Tsearch2
On Feb 5, 2008 12:47 PM, Viviane Lestic [EMAIL PROTECTED] wrote: Could someone help me analyze this problem? I don't manage to see if the problem comes from bad tsearch tuning, postgresql configuration, or something else...

Can you try to replace zoneindex_test @@ q with zoneindex_test @@ to_tsquery('partir')? Increasing the statistics for zoneindex_test may be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I'm surprised you have the word partir in so many documents? Do you use real data? -- Guillaume
Re: [PERFORM] Performance issue using Tsearch2
2008/2/5, Ansgar -59cobalt- Wiechers wrote: Your planner estimates are way off. Try increasing the statistics target for the columns used in this query and re-analyze the tables after doing so.

I first set STATISTICS to 1000 for column zoneindex_test and saw no significant improvement (with a vacuum full analyze in between). Then I set default_statistics_target to 1000: there is now an improvement, but the overall time is still way too long... (and the estimated costs didn't change...) Here are the results with default_statistics_target set to 1000:

explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q)
FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q
WHERE tab_chemin.chemin like '%;2;%'
  AND tab_ocr.idstruct = tab_chemin.label
  AND zoneindex_test @@ q
ORDER BY rank(zoneindex_test, q) DESC;

QUERY PLAN
Sort (cost=2345.30..2345.32 rows=8 width=327) (actual time=229913.715..229918.172 rows=7106 loops=1)
  Sort Key: rank(tab_ocr.zoneindex_test, q.q)
  ->  Nested Loop (cost=80.04..2345.18 rows=8 width=327) (actual time=28159.626..229892.957 rows=7106 loops=1)
        ->  Nested Loop (cost=80.04..1465.76 rows=392 width=327) (actual time=26084.558..130979.395 rows=272414 loops=1)
              ->  Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
              ->  Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=295) (actual time=26084.544..130562.220 rows=272414 loops=1)
                    Filter: (tab_ocr.zoneindex_test @@ q.q)
                    ->  Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=26073.315..26073.315 rows=283606 loops=1)
                          Index Cond: (tab_ocr.zoneindex_test @@ q.q)
        ->  Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.040..0.040 rows=0 loops=272414)
              Index Cond: (tab_ocr.idstruct = tab_chemin.label)
              Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 229922.864 ms

2008/2/5, Guillaume Smet wrote: Can you try to replace zoneindex_test @@ q with zoneindex_test @@ to_tsquery('partir')?

The improvement seems negligible (with default_statistics_target back to 10, its default value):

explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q)
FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q
WHERE tab_chemin.chemin like '%;2;%'
  AND tab_ocr.idstruct = tab_chemin.label
  AND zoneindex_test @@ to_tsquery('partir')
ORDER BY rank(zoneindex_test, q) DESC;

QUERY PLAN
Sort (cost=4358.91..4358.95 rows=16 width=308) (actual time=266489.667..266494.132 rows=7106 loops=1)
  Sort Key: rank(tab_ocr.zoneindex_test, q.q)
  ->  Nested Loop (cost=80.04..4358.59 rows=16 width=308) (actual time=42245.881..266469.644 rows=7106 loops=1)
        ->  Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=1)
        ->  Nested Loop (cost=80.04..4358.34 rows=16 width=276) (actual time=42239.570..178496.761 rows=7106 loops=1)
              ->  Bitmap Heap Scan on tab_ocr (cost=80.04..1461.83 rows=392 width=276) (actual time=38317.423..174188.779 rows=272414 loops=1)
                    Filter: (zoneindex_test @@ '''partir'''::tsquery)
                    ->  Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38289.289..38289.289 rows=283606 loops=1)
                          Index Cond: (zoneindex_test @@ '''partir'''::tsquery)
              ->  Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..7.38 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=272414)
                    Index Cond: (tab_ocr.idstruct = tab_chemin.label)
                    Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 266498.704 ms

Increasing the statistics for zoneindex_test may be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I posted the results above.

I'm surprised you have the word partir in so many documents? Do you use real data? I'm using real data. The indexed documents are extracted from newspapers, and partir (and its derivatives) is quite a common verb in the French language, so I'm not that surprised to see it show up in many documents.
Re: [PERFORM] Performance issue
Hi Guys, Following Tom Lane's advice I upgraded to 8.2, and that solved all my problems. :D Thank you so much for your input, I really appreciate it. Kind regards Willo van der Merwe
Re: [PERFORM] Performance issue
In response to Willo van der Merwe [EMAIL PROTECTED]:

Hi Guys, I have something odd. I have Gallery2 running on PostgreSQL 8.1, and recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is how do I get PostgreSQL to work with their horrible code. The queries they generate look something like: SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND id IN (here a list of 42000+ IDs are listed) On the previous version (which I can't recall what it was, but it was a version 8.1) the queries executed fine, but suddenly now, these queries are taking up to 4 minutes to complete. I am convinced it's the parsing/handling of the IN clause.

It could, of course, be that the list has grown so large that it can't fit into a buffer anymore.

For obvious reasons I can't run an EXPLAIN ANALYZE from a prompt.

Those reasons are not obvious to me. The explain analyze output is going to be key to working this out -- unless it's something like your postgresql.conf isn't properly tuned.

I vacuum and reindex the database daily. I'd prefer not to have to rewrite the code, so any suggestions would be very welcome.

-- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023
Re: [PERFORM] Performance issue
Willo van der Merwe [EMAIL PROTECTED] writes: I have something odd. I have Gallery2 running on PostgreSQL 8.1, and recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is how do I get PostgreSQL to work with their horrible code. The queries they generate look something like : SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND id IN (here a list of 42000+ IDs are listed) On the previous version (which I can't recall what it was, but it was a version 8.1) the queries executed fine, but suddenly now, these queries are taking up-to 4 minutes to complete. I am convinced it's the parsing/handling of the IN clause.

You're wrong about that, because we have not done anything to change IN planning in 8.1.x. You might need to re-ANALYZE or something; it sounds to me more like the planner has changed strategies in the wrong direction. FWIW, 8.2 should be vastly more efficient than 8.1 for this sort of query --- any chance of an upgrade? regards, tom lane
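One common workaround for giant IN lists on servers of that vintage (not suggested in the thread; the table and column names below are placeholders from the quoted query) is to load the IDs into a temporary table and join against it, which gives the planner real statistics to work with:

```sql
-- hypothetical rewrite: join against a temp table instead of a
-- 42000-element IN (...) list
CREATE TEMP TABLE wanted_ids (id integer PRIMARY KEY);
-- populate wanted_ids with COPY or batched INSERTs, then:
ANALYZE wanted_ids;

SELECT t1.blah, t2.blah
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id   -- stands in for "some relational stuff"
JOIN wanted_ids w ON w.id = t1.id;
```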
Re: [PERFORM] Performance issue with 8.2.3 - C application
Karl Denninger wrote: I've got an interesting issue here that I'm running into with 8.2.3. This is an application that has run quite well for a long time, and has been operating without significant changes (other than recompilation) since back in the early 7.x Postgres days. But now we're seeing a LOT more load than we used to with it, and suddenly, we're seeing odd performance issues. It APPEARS that the problem isn't query performance per-se. That is, while I can find a few processes here and there in a run state when I look with a PS, I don't see them consistently churning. But here's the query that has a habit of taking the most time:

select forum, * from post where toppost = 1
and (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number is null)) is not false
AND (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number=post.number)) is not false
order by pinned desc, replied desc offset 0 limit 20

Since I can do little to help you with anything else, here is a little help from a guy with a hammer. It seems you may be able to convert the subqueries into a left join. Not sure whether this helps, nor whether I got some bits of the logic wrong, but something like this might help the planner find a better plan:

SELECT forum, *
FROM post
LEFT JOIN forumlog ON post.forum = forumlog.forum
  AND forumlog.login = 'theuser'
  AND (post.number = forumlog.number OR forumlog.number IS NULL)
  AND post.replied = lastview
WHERE forumlog.forum IS NULL
  AND post.toppost = 1
ORDER BY pinned DESC, replied DESC
OFFSET 0 LIMIT 20;

Nis
Re: [PERFORM] Performance issue with 8.2.3 - C application
Karl Denninger [EMAIL PROTECTED] writes: Not sure where to start here. It appears that I'm CPU limited and the problem may be that this is a web-served application that must connect to the Postgres backend for each transaction, perform its queries, and then close the connection down - in other words the load may be coming not from Postgres but rather from places I can't fix at the application layer (e.g. fork() overhead, etc). The DBMS and Apache server are on the same machine, so there's no actual network overhead involved. If that's the case the only solution is to throw more hardware at it. I can do that, but before I go tossing more CPU at the problem I'd like to know I'm not just wasting money.

I know you found the proximate cause of your current problems, but it sounds like you have something else you should consider looking at here. There are techniques for avoiding separate database connections for each request. If you're using Apache you can reduce the CPU usage a lot by writing your module as an Apache module instead of a CGI or whatever type of program it is now. Then your module would live as long as a single Apache instance which you can configure to be hours or days instead of a single request. It can keep around the database connection for that time. If that's impossible there are still techniques that can help. You can set up PGPool or PGBouncer or some other connection aggregating tool to handle the connections. This is a pretty low-impact change which shouldn't require making any application changes aside from changing the database connection string. Effectively this is a just a connection pool that lives in a separate process. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Performance issue with 8.2.3 - C application
Looks like that was the problem - got a day under the belt now with the 8.2.4 rev and all is back to normal. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Karl Denninger wrote: Aha! BIG difference. I won't know for sure until the biz day tomorrow but the first blush look is that it makes a HUGE difference in system load, and I no longer have the stats collector process on the top of the top list..

Tom Lane wrote: Karl Denninger [EMAIL PROTECTED] writes: Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. I think Merlin might have nailed it --- the stats collector bug is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - C application
Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when it's bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. It's GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high. This is why I'm wondering if what I'm taking here is a hit on the fork/exec inside the postmaster, in the setup internally in there, in the IPC between my process via libPQ, etc - and how I can profile what's going on. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Tom Lane wrote: Karl Denninger [EMAIL PROTECTED] writes: But here's the query that has a habit of taking the most time:

select forum, * from post where toppost = 1
and (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number is null)) is not false
AND (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number=post.number)) is not false
order by pinned desc, replied desc offset 0 limit 20

Did that ever perform well for you? It's the sub-selects that are likely to hurt ... in particular,

->  Index Scan using post_top on post (cost=0.00..57266.37 rows=113 width=757)
      Index Cond: (toppost = 1)
      Filter: (((replied (subplan)) IS NOT FALSE) AND ((replied (subplan)) IS NOT FALSE))

versus

->  Index Scan using post_top on post (cost=0.00..632.03 rows=1013 width=11)
      Index Cond: (toppost = 1)

The planner thinks that the two subplan filter conditions will eliminate about 90% of the rows returned by the bare indexscan (IIRC this is purely a rule of thumb, not based on any statistics) and that testing them 1013 times will add over 5 cost units to the basic indexscan. That part I believe --- correlated subqueries are expensive. regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - C application
On 7/25/07, Karl Denninger [EMAIL PROTECTED] wrote: Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when its bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. Its GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high.

8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased load in high concurrency conditions. On a client's machine, after patching the postmaster the load dropped from the 4-5 range to the 1-2 range on a 500 tps server. Maybe this is biting you? Symptoms are high load average and high CPU usage of the stats collector process. merlin
Re: [PERFORM] Performance issue with 8.2.3 - C application
Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Merlin Moncure wrote: On 7/25/07, Karl Denninger [EMAIL PROTECTED] wrote: Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when its bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. Its GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high. 8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased load in high concurrency conditions. on a client's machine after patching the postmaster load drop from the 4-5 range to 1-2 range on a 500 tps server. maybe this is biting you? symptoms are high load avg and high cpu usage of stats collector process. merlin
Re: [PERFORM] Performance issue with 8.2.3 - C application
Karl Denninger [EMAIL PROTECTED] writes: Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4.

I think Merlin might have nailed it --- the stats collector bug is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - C application
Aha! BIG difference. I won't know for sure until the biz day tomorrow but the first blush look is that it makes a HUGE difference in system load, and I no longer have the stats collector process on the top of the top list.. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Tom Lane wrote: Karl Denninger [EMAIL PROTECTED] writes: Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. I think Merlin might have nailed it --- the stats collector bug is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane
Re: [PERFORM] performance issue with a specific query
On Thu, 2006-07-27 at 09:23, Eliott wrote: Hi! I hope I'm sending my question to the right list, please don't flame if it's the wrong one. I have noticed that while a query runs in about 1.5 seconds on an 8.x version PostgreSQL server, on our 7.4.13 it takes around 15-20 minutes. Since we are using RHEL4 on our server we are stuck with 7.4.13. The enormous time difference between the different builds drives me crazy. Can you please help me identify the bottleneck or suggest anything to improve the dismal performance?

You are absolutely on the right list. A couple of points.

1: Which 8.x? 8.0.x or 8.1.x? 8.1.x is literally light years ahead of 7.4 in terms of performance. 8.0 is somewhere between them. The performance difference you're seeing is pretty common.

2: Looking at your query, there are places where you're joining on things like date_trunc(...). In 7.4 the database will not, and cannot, use a normal index on the date field for those kinds of things. It can, however, use a functional index on some of them. Try creating an index on date_trunc('day', yourfieldhere) and see if that helps.

3: You are NOT stuck on 7.4.13. I have a RHEL server that will be running 8.1.4 or so pretty soon as a data warehouse. It may get updated to RHEL4, may not. You can either compile from the .tar.[gz|bz2] files or download the PGDG rpms for your distro.

4: You are fighting an uphill battle. There were a LOT of improvements made all over in the march from 7.4 to 8.1. Not all of them were simple planner tweaks and shortcuts, but honest to goodness changes to the way things happen. No amount of tuning can make 7.4 run as fast as 8.1.
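Point 2 above, as a sketch (table and column names are hypothetical; this works when the column is timestamp without time zone, since date_trunc is then immutable and indexable):

```sql
-- expression index that 7.4 can use where a plain index on the column can't
CREATE INDEX events_day_idx ON events (date_trunc('day', event_time));

-- a query that spells the expression exactly the same way can use the index
SELECT count(*) FROM events
WHERE date_trunc('day', event_time) = timestamp '2006-07-01 00:00:00';
```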
Re: [PERFORM] performance issue with a specific query
On 7/27/06, Eliott [EMAIL PROTECTED] wrote: Hi! I hope I'm sending my question to the right list, please don't flame if it's the wrong one. I have noticed that while a query runs in about 1.5seconds on a 8.xx version postgresql server on our 7.4.13 it takes around 15-20 minutes. Since we are using RHEL4 on our server we are stuck with 7.4.13. The enormous time difference between the different builds drives me crazy. Can you please help me identifying the bottleneck or suggest anything to improve the dismal performance. The query is the following:

try turning off genetic query optimization. regarding the rhel4 issue...does rhel not come with a c compiler? :) merlin
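Merlin's first suggestion, spelled out (a session-level setting; geqo is on by default):

```sql
SET geqo TO off;
-- then run EXPLAIN ANALYZE on the slow query again and compare plans
```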
Re: [PERFORM] performance issue with a specific query
try turning off genetic query optimization. regarding the rhel4 issue...does rhel not come with a c compiler? :)

Enterprises are not going to compile. They are going to accept the latest supported-by-vendor release. Redhat has a tendency to be incredibly stupid about this particular area of their packaging. Joshua D. Drake

-- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Re: [PERFORM] performance issue with a specific query
Joshua D. Drake wrote: try turning off genetic query optimization. regarding the rhel4 issue...does rhel not come with a c compiler? :) Enterprises are not going to compile. They are going to accept the latest supported-by-vendor release. Redhat has a tendency to be incredibly stupid about this particular area of their packaging.

Stupid how?

-- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] performance issue with a specific query
Alvaro Herrera [EMAIL PROTECTED] writes: Joshua D. Drake wrote: Enterprises are not going to compile. They are going to accept the latest support by vendor release. Redhat has a tendency to be incredibly stupid about this particular area of their packaging. Stupid how?

Red Hat feels (apparently accurately, judging by their subscription revenue ;-)) that what RHEL customers want is a platform that's stable over multi-year application lifespans. So major incompatible changes in the system software are not looked on with favor. That's why RHEL4 is still shipping PG 7.4.*. You can call it a stupid policy if you like, but it's hard to argue with success. However, there will be an RH-supported release of PG 8.1.* as an optional add-on for RHEL4. Real Soon Now, I hope --- the release date has been pushed back a couple times already. regards, tom lane
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
Hi! My company is evaluating porting our system (developed in C++) to PostgreSQL. Our programmer made a lot of tests and he informed me that the performance using ODBC is very similar to using libpq, even with a big number of simultaneous connections/queries. Of course for us it is simpler to use ODBC, because it will be easier to maintain, as we already support a lot of other databases using ODBC (MySQL, DB2, etc). Has someone already had this experience? What are the key benefits of using libpq instead of ODBC? Our application has a heavy load and around 150 concurrent users.

The ODBC driver for postgresql implements its own protocol stack. Unfortunately, it is still on protocol revision 2 (out of 3). Also, IMO libpq is a little better tested and more durable than the odbc driver. This naturally follows from the fact that libpq is more widely used and more actively developed than odbc. If you are heavily C++ invested you can consider wrapping libpq yourself if you want absolute maximum performance. If you happen to be developing on a Borland platform, give strong consideration to the Zeos connection library, which is very well designed (it wraps libpq). You might want to consider posting your question to the odbc list. Merlin
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
I would take a peek at the psqlodbc-8.0 drivers; I wouldn't battle with other versions you might find, such as the unixODBC ones. -elz
Re: [PERFORM] Performance issue
Peter,

One possibility is to drop all the indexes, do the inserts, and re-add the indexes. The more indexes and the more rows that exist, the more costly each insert becomes.

Regards, Joseph

At 05:48 PM 9/24/2003 +1200, peter wrote:

Hello, I have been trying to get my Postgres database to do faster inserts. The environment is basically a single-user situation. The part that I would like to speed up is when a user copies a Project. A Project consists of a number of Rooms (say 60). Each room contains a number of items. A project will contain say 20,000 records. Anyway, the copying process gets slower and slower as more projects are added to the database.

My statistics (Athlon 1.8GHz):
20,000 items: takes on average 0.078 seconds/room
385,000 items: takes on average 0.11 seconds/room
690,000 items: takes on average 0.270 seconds/room
1,028,000 items: takes on average 0.475 seconds/room

As can be seen, the time taken to process each room increases. A commit occurs when a room has been copied. The hard drive is not being driven very hard; the hard drive light only flashes about twice a second when there are a million records in the database. I thought that the problem could have been my plpgsql procedure, because I assume the code is interpreted. However, I have just rewritten the code using straight SQL (with some temp fields), and the times turn out to be almost exactly the same as the plpgsql version. The read speed for the application is fine, and the SQL planner seems to be doing a good job. There has been only one problem that I have found with one huge select, which was fixed by a cross join.

I am running Red Hat 8. Some of my conf entries that I have changed follow:
shared_buffers = 3700
effective_cache_size = 4000
sort_mem = 32168

Are the increasing times reasonable? The times themselves might look slow, but that's because there are a number of tables involved in a Copy. I can increase the shared buffer size above 32M, but would this really help?
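Joseph's drop-then-rebuild suggestion can be sketched with Python's built-in sqlite3 module (used here only as a convenient stand-in for the database; the table and index names are made up for illustration): drop the index, bulk-load the rows, then re-create the index once at the end so it is built in a single pass rather than maintained row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (room_id INTEGER, name TEXT)")
cur.execute("CREATE INDEX idx_items_room ON items (room_id)")

# Drop the index before the bulk load, as suggested above, so each
# INSERT does not pay the cost of an incremental index update.
cur.execute("DROP INDEX idx_items_room")
rows = [(i % 60, f"item-{i}") for i in range(20000)]
cur.executemany("INSERT INTO items (room_id, name) VALUES (?, ?)", rows)

# Re-create the index in one pass after the load.
cur.execute("CREATE INDEX idx_items_room ON items (room_id)")
conn.commit()

count = cur.execute("SELECT count(*) FROM items").fetchone()[0]
```

The same pattern applies in PostgreSQL with its own DROP INDEX / CREATE INDEX statements; whether it pays off depends on how many rows are loaded relative to the table size.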
TIA peter Mcgregor ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Performance issue
Get rid of any unnecessary indexes? I've found that dropping indexes and re-creating them isn't usually worth the effort.

Mount the disk with the noatime option, which saves you the time involved in updating the last-access time on files.

Make sure you're doing all the inserts in one transaction; wrapping a bunch of INSERTs in BEGIN ... COMMIT speeds them up loads.

At 05:48 PM 9/24/2003 +1200, peter wrote:

[snip]

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
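The single-transaction tip can be sketched the same way (Python's built-in sqlite3 module as a stand-in; the table name is illustrative). The point is that committing after every row forces a flush per row, while one BEGIN ... COMMIT around the whole batch pays that cost once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (room_id INTEGER, name TEXT)")
rows = [(i % 60, f"item-{i}") for i in range(5000)]

# Run the whole batch inside a single transaction instead of
# committing after every row; the context manager opens a
# transaction and commits it on success.
with conn:
    conn.executemany("INSERT INTO items (room_id, name) VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM items").fetchone()[0]
```

In Peter's case, committing once per project (or per batch of rooms) rather than once per room would be the equivalent change.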
Re: [PERFORM] Performance issue
My statistics (Athlon 1.8GHz):
20,000 items: takes on average 0.078 seconds/room
385,000 items: takes on average 0.11 seconds/room
690,000 items: takes on average 0.270 seconds/room
1,028,000 items: takes on average 0.475 seconds/room

[snip]

I am running Red Hat 8. Some of my conf entries that I have changed follow:
shared_buffers = 3700
effective_cache_size = 4000
sort_mem = 32168

Have you twiddled with your wal_buffers or checkpoint_segments? Might be something to look at.

-sc

-- Sean Chittenden

---(end of broadcast)--- TIP 8: explain analyze is your friend
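As a hedged sketch of the settings Sean mentions, they would appear in postgresql.conf roughly as below. The values are illustrative starting points only, not recommendations; the right numbers depend on the workload and available disk space.

```
# postgresql.conf -- illustrative values only
wal_buffers = 16            # WAL buffer pages held in shared memory before a flush
checkpoint_segments = 10    # more WAL segments between checkpoints = less
                            # frequent checkpoint I/O during heavy insert load
```

Raising checkpoint_segments spaces checkpoints further apart, which smooths out the bursts of write activity that heavy insert loads otherwise trigger, at the cost of more disk used for WAL.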