That makes a lot of sense, thanks for taking a look.  An index like you
suggest would probably further improve the query.  Is that suggestion
sidestepping the original problem, though, that production is evaluating the
landing_id check with the partition index while qa is doing a sequential scan
instead?

AND exists (select 1 from t_sap where e.landing_id = t_sap.landing_id)) as
rankings;

Based on the difference in row estimates, I am re-running ANALYZE with a
higher default_statistics_target (currently 100) to see if that helps.
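As an alternative to raising default_statistics_target globally, the target can
be raised for just the affected column, which keeps ANALYZE cheap elsewhere. A
minimal sketch, assuming the partition name from the plan below (adjust to the
actual table/column names in your schema):

```sql
-- Raise the sample size for this one column only; default_statistics_target
-- stays at 100 for everything else. Then re-gather statistics.
ALTER TABLE event__99999999 ALTER COLUMN validation_status_code
    SET STATISTICS 1000;
ANALYZE event__99999999;
```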

On Fri, Sep 15, 2017 at 3:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Mike Broers <mbro...@gmail.com> writes:
> > If Im reading this correctly postgres thinks the partition will return
> 6.5
> > million matching rows but actually comes back with 162k.  Is this a case
> > where something is wrong with the analyze job?
>
> You've got a lot of scans there that're using conditions like
>
> > │                           ->  Seq Scan on event__99999999 e_1
> (cost=0.00..2527828.05 rows=11383021 width=778) (actual
> time=25522.389..747238.885 rows=42 loops=1)
> > │                                 Filter: (((body ->> 'SID'::text) IS
> NOT NULL) AND (validation_status_code = 'P'::bpchar))
> > │                                 Rows Removed by Filter: 12172186
>
> While I'd expect the planner to be pretty solid on estimating the
> validation_status_code condition, it's not going to have any idea about
> that JSON field test.  That's apparently very selective, but you're just
> getting a default estimate, which is not going to think that a NOT NULL
> test will exclude lots of rows.
>
> One thing you could consider doing about this is creating an index
> on (body ->> 'SID'::text), which would prompt ANALYZE to gather statistics
> about that expression.  Even if the index weren't actually used in the
> plan, this might improve the estimates and the resulting planning choices
> enough to make it worth maintaining such an index.
>
> Or you could think about pulling that field out and storing it on its own.
> JSON columns are great for storing random unstructured data, but they are
> less great when you want to do relational-ish things on subfields.
>
>                         regards, tom lane
>
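
For reference, the expression-index suggestion above might look something like
this sketch (index name is hypothetical; the partition name is taken from the
quoted plan, so adjust to your schema):

```sql
-- Index on the JSON expression so ANALYZE gathers statistics for it.
-- Note the double parentheses required around an expression.
CREATE INDEX CONCURRENTLY event__99999999_sid_idx
    ON event__99999999 ((body ->> 'SID'));
ANALYZE event__99999999;  -- collect stats for the indexed expression
```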
