On Wed, May 6, 2020 at 12:20 PM Steve Pritchard <steve.pritch...@bto.org>
wrote:

> Version: Postgres 9.6.3 production system (but also tested on Postgres 12)
>
> For my query the Planner is sometimes choosing an execution plan that uses
> "Bitmap And" (depending on the parameters):
>
> ->  Bitmap Heap Scan on observation  (cost=484.92..488.93 rows=1
> width=203) (actual time=233.129..330.886 rows=15636 loops=1)
>   Recheck Cond: (((user_id)::text = 'USER123'::text) AND ((loc_id)::text =
> ANY ('{LOC12345678}'::text[])))
>

If you change " = ANY(array_of_one)" to " = scalar", does that change
anything?  You might be able to fix this (in v12) using CREATE STATISTICS,
but I don't know if that mechanism can see through the ANY(array_of_one)
wrapper.


> Note that in cases where the Planner selects a single Index Scan for this
> query (with different parameters), the Planner makes an accurate estimate
> of the number of rows and then makes sensible selections of joins (i.e.
> quick).
> i.e. the issue seems to be with the "Bitmap And".
>


I don't know if this nitpick matters, but I don't think that that is how
the planner works.  The row estimates work from the top down, not the
bottom up.  The row estimate of 1 is based on what conditions the bitmap
heap scan implements, it is not arrived at by combining the estimates from
the index scans below it.  If it were to change to a different type of node
but implemented the same conditions, I think it would have the same row
estimate.


>
> I don't have an index with both user_id & loc_id, as this is one of
> several different combinations that can arise (it would require quite a few
> indexes to cover all the possible combinations).
>

Are you actually experiencing problems with those other combinations as
well?  If not, I wouldn't worry about solving hypothetical problems.  If
those other combinations are actually problems and you go with CREATE
STATISTICS, then you would have to be creating a lot of different
statistics.  That would still be ugly, but at least the overhead for
statistics is lower than for indexes.


> However if I did have such an index, the planner would presumably be able
> to use the statistics for user_id and loc_id to estimate the number of rows.
>

Indexes on physical columns do not have statistics, so making that index
would not help with the estimation.  (Expressional indexes do have
statistics, but I don't see that helping you here).   So while this node
would execute faster with that index, it would still be kicking the unshown
nested loop left join 15,636 times when it thinks it will be doing it
once, and so would still be slow.  The most robust solution might be to
make the outer part of that nested loop left join faster, so that your
system would be more tolerant of statistics problems.


>
> So why can't it make an accurate estimate of the rows with a "Bitmap And"
> & " Bitmap Heap Scan"? (as above)
>

In the absence of custom statistics, it assumes the selectivities of user_id
= 'USER123', of loc_id = ANY ('{LOC12345678}'::text[]), and of taxa =
'Birds' are all independent of each other and can be multiplied to arrive
at the overall selectivity.  But clearly that is not the case.  Bird
watchers mostly watch near where they live, not in random other places.

Cheers,

Jeff

Reply via email to