Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-05 Thread Justin Pryzby
On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 
>  wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> > oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> > (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> > oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
> ...
> 
> >  ->  Index Scan using point_domain_class_id_index on 
> > point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual 
> > time=27.265..142101.1 59 rows=1607491 loops=1)
> >Index Cond: (domain_class_id = 11)
> 
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND 
attname='domain_class_id' ;

Justin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Speeding up JSON + TSQUERY + GIN

2017-03-05 Thread Jeff Janes
On Thu, Mar 2, 2017 at 1:19 PM, Sven R. Kunze  wrote:

> On 01.03.2017 18:04, Jeff Janes wrote:
>
> On Wed, Mar 1, 2017 at 6:02 AM, Sven R. Kunze  wrote:
>
>> On 28.02.2017 17:49, Jeff Janes wrote:
>>
>> Oh.  In my hands, it works very well.  I get 70 seconds to do the {age:
>> 20} query from pure cold caches, versus 1.4 seconds from cold caches which
>> was followed by pg_prewarm('docs','prefetch').
>>
>> How much RAM do you have?  Maybe you don't have enough to hold the table
>> in RAM.  What kind of IO system?  And what OS?
>>
>>
>> On my test system:
>>
>> RAM: 4GB
>> IO: SSD (random_page_cost = 1.0)
>> OS: Ubuntu 16.04
>>
>
>
> 4GB is not much RAM to be trying to pre-warm this amount of data into.
> Towards the end of the pg_prewarm, it is probably evicting data read in by
> the earlier part of it.
>
> What is shared_buffers?
>
>
> 942MB.
>
> But I see where you are coming from. How come that these queries need a
> Recheck Cond? I gather that this would require reading not only the index
> data but also the table itself which could be huge, right?
>

Bitmaps can overflow and drop the row-level information, tracking only the
blocks which need to be inspected.  So it has to have a recheck in case
that happens (although in your case it is not actually overflowing--but it
still needs to be prepared for that).  Also, I think that jsonb_path_ops
indexes the hashes of the paths, so it can deliver false positives which
need to be rechecked.  And you are selecting `id`, which is not in the
index so it would have to consult the table anyway to retrieve that.  Even
if it could get all the data from the index itself, I don't think GIN
indexes support that feature.

Cheers,

Jeff


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-05 Thread Jeff Janes
On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <
dinesh.chan...@cyient.com> wrote:

> Dear Nur,
>
>
>
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
>
>
>
>
...


>  ->  Index Scan using point_domain_class_id_index on
> point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> time=27.265..142101.1
>
> 59 rows=1607491 loops=1)
>
>Index Cond: (domain_class_id = 11)
>

Why wouldn't this be using a bitmap scan rather than a regular index scan?
It seems like it should prefer the bitmap scan, unless the table is well
clustered on domain_class_id.  In which case, why isn't it just faster?

You could try repeating the explain analyze after setting enable_indexscan
=off to see what that gives.  If it gives a seq scan, then repeat with
enable_seqscan also turned off.  Or If it gives the bitmap scan, then
repeat with enable_bitmapscan turned off.

How many rows is in point, and how big is it?

The best bet for making this better might be to have an index on
(domain_class_id, modification_time) and hope for an index only scan.
Except that you are on 9.1, so first you would have to upgrade.  Which
would allow you to use BUFFERS in the explain analyze, as well as
track_io_timings, both of which would also be pretty nice to see.  Using
9.1 is like having one hand tied behind your back.

Also, any idea why this execution of this query 15 is times faster than the
execution you found in the log file?  Was the top output you showed in the
first email happening at the time the really slow query was running, or was
that from a different period?

Cheers,

Jeff