Re: Specific query taking time to process
On Mon, Dec 09, 2019 at 10:39:38PM +, Fahiz Mohamed wrote:
> Thank you very much for your prompt responses.
>
> I have analysed more regarding this and found the long-running query.
>
> I ran "explain analyse" on this query and I got the following result. (We
> have 2 identical DB instances and they contain the same data. Instance 1
> took 20+ seconds to process and instance 2 took less than a second.)
>
> Instance 1: (This is used by regular users - more than 600,000 requests a
> day) - The result is the same even when there is no user on the server.
>
> EXPLAIN ANALYZE
> Nested Loop Semi Join  (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
>   ->  Hash Semi Join  (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
>         Hash Cond: (node.id = prop.node_id)
>         ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
>               Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
>               Rows Removed by Index Recheck: 54239131
>               Filter: (NOT (hashed SubPlan 1))
>               Rows Removed by Filter: 2816
>               Heap Blocks: exact=24301 lossy=1875383
> ...

This is doing a bitmap scan on alf_node where the second instance is doing an
index scan. Is alf_node well correlated on store_id or type_qname_id on the
2nd server? If you CLUSTER and ANALYZE on the 1st server, maybe it would
perform similarly. (But that could hurt other queries.)

> We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance).
> With a fresh DB restored from the data, without any indexes, the search
> query performs relatively quickly, most of the time in less than a second.
> But after 3 weeks of use the DB suddenly started to slow down, only for
> this particular query, and it takes 20+ seconds to respond. If I restore
> the DB again then it continues to work fine, and the symptom pops up again
> after about 3 weeks.
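The CLUSTER experiment suggested here can be sketched in SQL. This is a
hypothetical sketch, not something to run blindly: it assumes the index
idx_alf_node_mdq from the plan is a sensible clustering index, and CLUSTER
takes an ACCESS EXCLUSIVE lock while it rewrites the whole table, so it needs
a maintenance window.

```sql
-- Sketch only: index name taken from the plan above; CLUSTER locks and
-- rewrites the table, so schedule it carefully.
CLUSTER alf_node USING idx_alf_node_mdq;
ANALYZE alf_node;

-- Afterwards, check the physical correlation of the filter columns;
-- values close to 1 or -1 mean the heap is well ordered on that column.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'alf_node'
  AND attname IN ('store_id', 'type_qname_id');
```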
> Instance 2: (Only by testers - 250 requests a day)
>
> Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
>   ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
>         ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
>               Group Key: prop.node_id
>               ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
>                     Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
>                     Heap Fetches: 18
>         ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
>               Index Cond: (id = prop.node_id)
>               Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
>               Rows Removed by Filter: 0
>               SubPlan 1
Re: Specific query taking time to process
On Mon, Dec 9, 2019 at 3:39 PM Fahiz Mohamed wrote:
> I ran "explain analyse" on this query and I got the following result. (We
> have 2 identical DB instances and they contain the same data. Instance 1
> took 20+ seconds to process and instance 2 took less than a second.)
>
> Instance 1: (This is used by regular users - more than 600,000 requests a
> day) - The result is the same even when there is no user on the server.
>
> ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
>       Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
>       Rows Removed by Index Recheck: 54239131
>       Filter: (NOT (hashed SubPlan 1))
>       Rows Removed by Filter: 2816
>       Heap Blocks: exact=24301 lossy=1875383
>
> Planning time: 0.639 ms
> Execution time: 22946.036 ms

https://paquier.xyz/postgresql-2/postgres-9-4-feature-highlight-lossyexact-pages-for-bitmap-heap-scan/

That seems like a lot of lossy blocks. As I understand it, that means the
system didn't have enough work_mem to fit all the references to the
individual rows - which perhaps isn't surprising when it estimates it needs
4.5 million rows and ends up with 41 million.

Do both DB instances have the same data? I ask because the two plans are
rather different, which makes me think that the statistics about the data
are not very similar. Are both configured the same, particularly for
shared_buffers and work_mem, as well as the various planner cost parameters
like random_page_cost?

Can you provide these plans again with EXPLAIN (ANALYZE, BUFFERS) this time?
Did you check the last time autovacuum ran in pg_stat_user_tables?
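The checks requested here can be sketched as SQL. Table names are taken from
the plans in the thread; the settings list is the one mentioned above.

```sql
-- When did (auto)vacuum and (auto)analyze last touch the tables in the plan?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('alf_node', 'alf_node_properties', 'alf_node_aspects');

-- Compare the memory and planner settings on both instances.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem',
               'random_page_cost', 'effective_cache_size');
```

The problem query itself would then be re-run with the `EXPLAIN (ANALYZE,
BUFFERS)` prefix to show buffer hit/read counts alongside the timings.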
Re: unexpected result for wastedbytes query after vacuum full
On Tue, Dec 10, 2019 at 11:43 AM Guillaume Lelarge wrote:
> This query uses the column statistics to estimate bloat. AFAIK, json
> columns don't have statistics, so the estimation can't be relied on (for
> this specific table at least).

This was true prior to 9.5 (for xml at least; I don't know about json), but
should not be true from that release onward.

But still, given that the difference is only between 74440704 and 74506240,
this does seem to me to be straining at a gnat to swallow a camel.

Cheers,

Jeff
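When the statistics-based estimate from the wiki query is in doubt, the
pgstattuple contrib module reports exact figures by scanning the table. A
minimal sketch, assuming the extension is available on the server and using
the (anonymized) table name from the thread:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Unlike the wiki's estimate, this scans the heap and reports exact
-- tuple, dead-tuple, and free-space byte counts.
SELECT table_len, tuple_len, dead_tuple_len, free_space, free_percent
FROM pgstattuple('myschema.mytableA');
```

This is slower than the estimator (it reads the whole table), which is the
usual trade-off: run it only when the estimate looks suspicious.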
Re: unexpected result for wastedbytes query after vacuum full
On Fri, Dec 6, 2019 at 18:18, Mike Schanne wrote:
> Hi all,
>
> This question is somewhat related to my previous question:
>
> https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com
>
> I was attempting to measure the benefit of doing a VACUUM FULL on my
> database. I was using the query found here:
>
> https://wiki.postgresql.org/wiki/Show_database_bloat
>
> However, I got an unexpected result in that the “wastedbytes” value
> actually increased for some tables after doing the vacuum.
>
> Before VACUUM FULL:
>
>  current_database | schemaname | tablename | tbloat | wastedbytes |  iname   | ibloat | wastedibytes
> ------------------+------------+-----------+--------+-------------+----------+--------+--------------
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex1 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex2 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex3 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex4 |    0.2 |            0
>  postgres         | myschema   | mytableB  |    1.0 |    63324160 | myindex5 |    0.0 |            0
> ...
>
> After VACUUM FULL:
>
>  current_database | schemaname | tablename | tbloat | wastedbytes |  iname   | ibloat | wastedibytes
> ------------------+------------+-----------+--------+-------------+----------+--------+--------------
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex4 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex3 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex2 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex1 |    0.2 |            0
>  postgres         | myschema   | mytableB  |    1.0 |    63332352 | myindex5 |    0.0 |            0
> ...
> This is the schema for mytableA above:
>
>  Column |            Type             |                      Modifiers
> --------+-----------------------------+------------------------------------------------------
>  colA   | integer                     | not null default nextval('myschema.myseq'::regclass)
>  colB   | integer                     |
>  colC   | integer                     |
>  colD   | timestamp without time zone |
>  colE   | json                        |
>  colF   | integer                     |
>  colG   | integer                     |
>
> I was wondering if the fact that we use a json column could be interfering
> with the wastedbytes calculation. Can anyone explain how wastedbytes could
> increase from a vacuum?

This query uses the column statistics to estimate bloat. AFAIK, json columns
don't have statistics, so the estimation can't be relied on (for this
specific table at least).

-- 
Guillaume.
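Whether the planner actually has statistics for the json column can be
checked directly in pg_stats. A sketch, assuming the schema and table names
shown above:

```sql
-- If colE (json) is missing from this result, ANALYZE collected no
-- statistics for it, and the bloat query has no avg_width to work with
-- for that column.
SELECT attname, null_frac, avg_width
FROM pg_stats
WHERE schemaname = 'myschema'
  AND tablename  = 'mytableA'
ORDER BY attname;
```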
Re: Specific query taking time to process
Thank you very much for your prompt responses.

I have analysed more regarding this and found the long-running query.

I ran "explain analyse" on this query and I got the following result. (We have
2 identical DB instances and they contain the same data. Instance 1 took 20+
seconds to process and instance 2 took less than a second.)

Instance 1: (This is used by regular users - more than 600,000 requests a day)
- The result is the same even when there is no user on the server.

EXPLAIN ANALYZE
Nested Loop Semi Join  (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
  ->  Hash Semi Join  (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
        Hash Cond: (node.id = prop.node_id)
        ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
              Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
              Rows Removed by Index Recheck: 54239131
              Filter: (NOT (hashed SubPlan 1))
              Rows Removed by Filter: 2816
              Heap Blocks: exact=24301 lossy=1875383
              ->  Bitmap Index Scan on idx_alf_node_mdq  (cost=0.00..646144.01 rows=20047144 width=0) (actual time=3232.067..3232.067 rows=44246360 loops=1)
                    Index Cond: ((store_id = 6) AND (type_qname_id = 240))
              SubPlan 1
                ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2503.51..347403.58 rows=128379 width=8) (actual time=25.447..65.392 rows=5635 loops=1)
                      Recheck Cond: (qname_id = 251)
                      Heap Blocks: exact=40765
                      ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2471.41 rows=128379 width=0) (actual time=18.835..18.835 rows=239610 loops=1)
                            Index Cond: (qname_id = 251)
        ->  Hash  (cost=3526.11..3526.11 rows=871 width=8) (actual time=0.045..0.045 rows=23 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3526.11 rows=871 width=8) (actual time=0.021..0.042 rows=23 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 23
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..2.01 rows=15 width=8) (actual time=0.011..0.011 rows=1 loops=22)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 22
Planning time: 0.639 ms
Execution time: 22946.036 ms

Instance 2: (Only by testers - 250 requests a day)

Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
  ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
        ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
              Group Key: prop.node_id
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 18
        ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
              Index Cond: (id = prop.node_id)
              Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
              Rows Removed by Filter: 0
              SubPlan 1
                ->  Materialize  (cost=2962.65..397912.89 rows=158204 width=8) (actual time=0.001..0.009 rows=85 loops=17)
                      ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2962.65..396503.87 rows=158204 width=8) (actual time=0.021..0.082 rows=85 loops=1)
                            Recheck Cond: (qname_id = 251)
                            Heap Blocks: exact=55
                            ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2923.10 rows=158204 width=0) (actual time=0.015..0.015 rows=87 loops=1)
                                  Index Cond: (qname_id = 251)
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..34.32 rows=12 width=8) (actual time=0.004..0.004 rows=1 loops=17)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 17
Planning time: 0.623 ms
Execution time: 0.540 ms

Configurations are the same on both servers. Please advise me on this. Is
there any configuration I specifically need to look at, like “work_mem”,
“shared_buffers”, “checkpoint_segments”, “effective_cache_size”,
“enable_seqscan” or “checkpoint_completion_target”?

Thanks in advance.

Fahiz

On 9 Dec 2019, 19:03 +, Michael Lewis wrote:
> > > There is a specific search query I am running to get a list of documents
> > > and their metadata from several tables in the DB.
> > > We are running
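One way to confirm that two instances really are configured identically is
to list every setting that differs from its default on each server and diff
the two outputs. A minimal sketch:

```sql
-- Run on both instances and compare the results line by line; 'source'
-- shows where each non-default value came from (config file, ALTER SYSTEM,
-- etc.).
SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;
```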