Re: Specific query taking time to process

2019-12-10 Thread Justin Pryzby
On Mon, Dec 09, 2019 at 10:39:38PM +, Fahiz Mohamed wrote:
> Thank you very much for your prompt responses.
> 
> I have analysed this further and found the long-running query.
> 
> I ran "explain analyse" on this query and got the following result. (We have 2
> identical DB instances and they contain the same data. Instance 1 took 20+
> seconds to process and instance 2 took less than a second.)
> 
> Instance 1: (This is used by regular users - more than 600,000 requests a day)
> - The result is the same even when there are no users on the server.
> EXPLAIN ANALYZE
> Nested Loop Semi Join  (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
>   ->  Hash Semi Join  (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
>         Hash Cond: (node.id = prop.node_id)
>         ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
>               Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
>               Rows Removed by Index Recheck: 54239131
>               Filter: (NOT (hashed SubPlan 1))
>               Rows Removed by Filter: 2816
>               Heap Blocks: exact=24301 lossy=1875383
...

This plan is doing a Bitmap Heap Scan on alf_node where the second one is doing an Index Scan.
Is alf_node well correlated on store_id or type_qname_id on the 2nd server?
If you CLUSTER and ANALYZE on the 1st server, it might perform similarly.
(But that could hurt other queries.)
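
A rough sketch of that check and the CLUSTER step (the index name
idx_alf_node_mdq is taken from the full plan later in the thread; adjust
names to your schema):

    -- How well the table's physical order matches the filtered columns.
    -- Correlation near 1 or -1 means well ordered; near 0 means scattered.
    SELECT tablename, attname, correlation
    FROM pg_stats
    WHERE tablename = 'alf_node'
      AND attname IN ('store_id', 'type_qname_id');

    -- Rewrite the table in index order, then refresh planner statistics.
    -- Note: CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the whole table.
    CLUSTER alf_node USING idx_alf_node_mdq;
    ANALYZE alf_node;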

> We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance).
> With a fresh DB restored from the data, without any indexes, the search query
> performs relatively quickly and most of the time takes less than a second.
> But after about 3 weeks of use the DB suddenly started to slow down, only for
> this particular query, and it takes 20+ seconds to respond. If I restore the
> DB again it works fine, and the symptom reappears after about 3 weeks.

> Instance 2: (Only used by testers - 250 requests a day)
> 
> Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
>   ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
>         ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
>               Group Key: prop.node_id
>               ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
>                     Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
>                     Heap Fetches: 18
>         ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
>               Index Cond: (id = prop.node_id)
>               Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
>               Rows Removed by Filter: 0
>               SubPlan 1




Re: Specific query taking time to process

2019-12-10 Thread Michael Lewis
On Mon, Dec 9, 2019 at 3:39 PM Fahiz Mohamed  wrote:

> I ran "explain analyse" on this query and got the following result. (We have
> 2 identical DB instances and they contain the same data. Instance 1 took 20+
> seconds to process and instance 2 took less than a second.)
>
> Instance 1: (This is used by regular users - more than 600,000 requests a
> day) - The result is the same even when there are no users on the server.
>
>  ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
>        Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
>        Rows Removed by Index Recheck: 54239131
>        Filter: (NOT (hashed SubPlan 1))
>        Rows Removed by Filter: 2816
>        Heap Blocks: exact=24301 lossy=1875383
>
>
> Planning time: 0.639 ms
> Execution time: 22946.036 ms
>
>
https://paquier.xyz/postgresql-2/postgres-9-4-feature-highlight-lossyexact-pages-for-bitmap-heap-scan/


That seems like a lot of lossy blocks. As I understand it, that means the
system didn't have enough work_mem to fit all the references to the
individual rows, which perhaps isn't surprising when it estimates it needs
4.5 million rows and ends up with 41 million.
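
One minimal way to test that theory, assuming the memory is available, is to
raise work_mem for a single session and re-run the query (the 256MB value is
only an illustration, not a recommendation):

    SET work_mem = '256MB';      -- session-local; other backends are unaffected
    EXPLAIN (ANALYZE, BUFFERS)   -- paste the slow search query below
    SELECT ...;                  -- (query text not shown in this thread)
    RESET work_mem;

If the heap blocks then show up as exact instead of lossy and "Rows Removed by
Index Recheck" drops away, the bitmap really was overflowing work_mem.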

Do both DB instances have the same data? I ask because the two plans are
rather different, which makes me think that the statistics about the data are
not very similar. Are both configured the same, particularly for
shared_buffers and work_mem, as well as the various planning cost parameters
like random_page_cost? Can you provide these plans again with
EXPLAIN (ANALYZE, BUFFERS) this time? Did you check when autovacuum last ran
in pg_stat_user_tables?
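
As a sketch of that last check (the table names are taken from the plans in
this thread):

    -- Last (auto)vacuum / (auto)analyze times and current dead-tuple counts.
    SELECT relname, n_live_tup, n_dead_tup,
           last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname IN ('alf_node', 'alf_node_properties', 'alf_node_aspects');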


Re: unexpected result for wastedbytes query after vacuum full

2019-12-10 Thread Jeff Janes
On Tue, Dec 10, 2019 at 11:43 AM Guillaume Lelarge wrote:

> This query uses the column statistics to estimate bloat. AFAIK, json
> columns don't have statistics, so the estimation can't be relied on (for
> this specific table at least).
>

This was true prior to 9.5 (for xml at least, I don't know about json), but
should not be true from that release onward.  But still, given that the
difference is only between 74440704 and 74506240, this does seem to me to be
straining at a gnat to swallow a camel.
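
For a difference that small, an exact measurement may be more telling than the
statistics-based estimate. A sketch using the pgstattuple contrib extension
(it scans the whole table, so it can be slow, and on 9.6 it requires superuser;
the schema/table names are the ones from the quoted message):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- Exact tuple, dead-tuple and free-space accounting; compare before and
    -- after VACUUM FULL.
    SELECT * FROM pgstattuple('myschema.mytableA');

    -- Plain on-disk size of the table's main fork (excludes indexes and TOAST).
    SELECT pg_relation_size('myschema.mytableA');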

Cheers,

Jeff


Re: unexpected result for wastedbytes query after vacuum full

2019-12-10 Thread Guillaume Lelarge
On Fri, Dec 6, 2019 at 18:18, Mike Schanne wrote:

> Hi all,
>
> This question is somewhat related to my previous question:
>
> https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com
>
> I was attempting to measure the benefit of doing a VACUUM FULL on my
> database.  I was using the query found here:
>
> https://wiki.postgresql.org/wiki/Show_database_bloat
>
> However, I got an unexpected result in that the "wastedbytes" value
> actually increased for some tables after doing the vacuum.
>
> Before VACUUM FULL:
>
>  current_database | schemaname | tablename | tbloat | wastedbytes |  iname   | ibloat | wastedibytes
> ------------------+------------+-----------+--------+-------------+----------+--------+--------------
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex1 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex2 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex3 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74440704 | myindex4 |    0.2 |            0
>  postgres         | myschema   | mytableB  |    1.0 |    63324160 | myindex5 |    0.0 |            0
>  ...
>
> After VACUUM FULL:
>
>  current_database | schemaname | tablename | tbloat | wastedbytes |  iname   | ibloat | wastedibytes
> ------------------+------------+-----------+--------+-------------+----------+--------+--------------
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex4 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex3 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex2 |    0.2 |            0
>  postgres         | myschema   | mytableA  |    1.1 |    74506240 | myindex1 |    0.2 |            0
>  postgres         | myschema   | mytableB  |    1.0 |    63332352 | myindex5 |    0.0 |            0
>  ...
>
>
>
> This is the schema for mytableA above:
>
>  Column |            Type             |                      Modifiers
> --------+-----------------------------+-------------------------------------------------------
>  colA   | integer                     | not null default nextval('myschema.myseq'::regclass)
>  colB   | integer                     |
>  colC   | integer                     |
>  colD   | timestamp without time zone |
>  colE   | json                        |
>  colF   | integer                     |
>  colG   | integer                     |
>
> I was wondering if the fact that we use a json column could be interfering
> with the wastedbytes calculation.  Can anyone explain how wastedbytes could
> increase from a vacuum?
>
>
>

This query uses the column statistics to estimate bloat. AFAIK, json
columns don't have statistics, so the estimation can't be relied on (for
this specific table at least).
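
A quick way to check what statistics actually exist for that table's columns
(a sketch; the schema and table names come from the quoted message):

    SELECT attname, null_frac, avg_width, n_distinct
    FROM pg_stats
    WHERE schemaname = 'myschema'
      AND tablename  = 'mytableA';

If the json column is missing from the result, or only carries a width
estimate, the bloat query's per-row width calculation for this table is
largely a guess.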


-- 
Guillaume.


Re: Specific query taking time to process

2019-12-10 Thread Fahiz Mohamed
Thank you very much for your prompt responses.

I have analysed this further and found the long-running query.

I ran "explain analyse" on this query and got the following result. (We have 2
identical DB instances and they contain the same data. Instance 1 took 20+
seconds to process and instance 2 took less than a second.)

Instance 1: (This is used by regular users - more than 600,000 requests a day) -
The result is the same even when there are no users on the server.
EXPLAIN ANALYZE
Nested Loop Semi Join  (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
  ->  Hash Semi Join  (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
        Hash Cond: (node.id = prop.node_id)
        ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
              Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
              Rows Removed by Index Recheck: 54239131
              Filter: (NOT (hashed SubPlan 1))
              Rows Removed by Filter: 2816
              Heap Blocks: exact=24301 lossy=1875383
              ->  Bitmap Index Scan on idx_alf_node_mdq  (cost=0.00..646144.01 rows=20047144 width=0) (actual time=3232.067..3232.067 rows=44246360 loops=1)
                    Index Cond: ((store_id = 6) AND (type_qname_id = 240))
              SubPlan 1
                ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2503.51..347403.58 rows=128379 width=8) (actual time=25.447..65.392 rows=5635 loops=1)
                      Recheck Cond: (qname_id = 251)
                      Heap Blocks: exact=40765
                      ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2471.41 rows=128379 width=0) (actual time=18.835..18.835 rows=239610 loops=1)
                            Index Cond: (qname_id = 251)
        ->  Hash  (cost=3526.11..3526.11 rows=871 width=8) (actual time=0.045..0.045 rows=23 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 9kB
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3526.11 rows=871 width=8) (actual time=0.021..0.042 rows=23 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 23
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..2.01 rows=15 width=8) (actual time=0.011..0.011 rows=1 loops=22)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 22
Planning time: 0.639 ms
Execution time: 22946.036 ms

Instance 2: (Only used by testers - 250 requests a day)

Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
  ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
        ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
              Group Key: prop.node_id
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 18
        ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
              Index Cond: (id = prop.node_id)
              Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
              Rows Removed by Filter: 0
              SubPlan 1
                ->  Materialize  (cost=2962.65..397912.89 rows=158204 width=8) (actual time=0.001..0.009 rows=85 loops=17)
                      ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2962.65..396503.87 rows=158204 width=8) (actual time=0.021..0.082 rows=85 loops=1)
                            Recheck Cond: (qname_id = 251)
                            Heap Blocks: exact=55
                            ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2923.10 rows=158204 width=0) (actual time=0.015..0.015 rows=87 loops=1)
                                  Index Cond: (qname_id = 251)
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..34.32 rows=12 width=8) (actual time=0.004..0.004 rows=1 loops=17)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 17
Planning time: 0.623 ms
Execution time: 0.540 ms

Configurations are the same on both servers.

Please advise me on this. Is there any configuration I specifically need to
look at, such as work_mem, shared_buffers, checkpoint_segments,
effective_cache_size, enable_seqscan or checkpoint_completion_target?
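
As a sketch, those settings can be compared on both instances with a query
like this (note that checkpoint_segments was removed in 9.5 and replaced by
max_wal_size, so on 9.6 max_wal_size is the one to check):

    SELECT name, setting, unit, source
    FROM pg_settings
    WHERE name IN ('work_mem', 'shared_buffers', 'effective_cache_size',
                   'enable_seqscan', 'random_page_cost',
                   'checkpoint_completion_target', 'max_wal_size');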

Thanks in advance.

Fahiz

On 9 Dec 2019, 19:03 +, Michael Lewis , wrote:
> > > There is a specific search query I am running to get a list of Documents
> > > and their metadata from several tables in the DB.
> > > We are running