Thanks Tom. Collecting full statistics on the tables involved corrected the execution plan.
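For anyone hitting the same issue after a migration: the fix above amounts to re-gathering planner statistics, which a freshly loaded database won't have until autovacuum catches up. A minimal sketch (table name taken from the plan quoted below; the statistics-target bump is optional and just increases the sample size):

```sql
-- Recompute planner statistics for the migrated table.
-- Optionally raise the per-column sample size first for a skewed column.
ALTER TABLE marketing_app_leadhistory
    ALTER COLUMN cancel_event_type SET STATISTICS 1000;

ANALYZE marketing_app_leadhistory;
```

Running a plain database-wide `ANALYZE;` (or `vacuumdb --analyze-in-stages`) right after a migration avoids this class of problem entirely.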
On Tue, Aug 13, 2024 at 9:57 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Siraj G <tosira...@gmail.com> writes:
> > We migrated a PgSQL database from Cloud SQL to compute engine and since
> > then there is a SQL we observed taking a long time. After some study, I
> > found that the SQL is using NESTED LOOP where the cost is too high.
>
> The core of your problem seems to be here:
>
> >   ->  Index Scan using marketing_a_cancel__55ffff_idx on
> > marketing_app_leadhistory w0  (cost=0.57..4274.30 rows=1 width=8) (actual
> > time=46.678..51.232 rows=44 loops=1)
> >         Index Cond: ((cancel_event_id IS NOT NULL) AND
> > (cancel_event_type = 1))
> >         Filter: ((status_id = 93) AND
> > ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date <=
> > '2024-08-07'::date))
> >         Rows Removed by Filter: 22268
> >         Buffers: shared hit=9170 read=19
>
> If the planner had estimated 40-some rows out of this step, rather
> than one, it would certainly not have chosen to use nestloop joins
> atop this.  So the big problem to focus on is making that estimate
> better.
>
> A secondary problem is that the choice of index seems poor: the
> index itself is selecting 44+22268 = 22312 rows and then the filter
> condition is throwing away 99.8% of those rows.  Probably, using
> an index on (status_id, followup_date) would have worked better.
>
> I suspect that both of these things are tied to the non-normalization
> of your "cancel" condition.  The planner probably believes that
> "cancel_event_id IS NOT NULL" is statistically independent of
> "cancel_event_type = 1"; but I'll bet it isn't, and thus the index
> condition selects many more rows than the planner guessed.  You might
> be able to improve that estimate by creating extended stats on both of
> those columns, but really a better idea would be to take a step back
> and figure out if those two columns can't be merged into one.
>
>                         regards, tom lane
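The two concrete suggestions in Tom's reply could be sketched as follows (column and table names come from the quoted plan; the statistics object name is made up for illustration). Note that `dependencies` extended statistics, available since PostgreSQL 10, help the planner mainly with correlated clauses, so they may or may not improve the `IS NOT NULL` estimate here — that is exactly why Tom suggests merging the two columns instead:

```sql
-- Extended statistics so the planner can see the correlation between
-- the two "cancel" columns instead of assuming independence.
CREATE STATISTICS leadhistory_cancel_stats (dependencies)
    ON cancel_event_id, cancel_event_type
    FROM marketing_app_leadhistory;

ANALYZE marketing_app_leadhistory;

-- Index matching the filter that was discarding 99.8% of fetched rows.
CREATE INDEX ON marketing_app_leadhistory (status_id, followup_date);
```

After creating either, re-run the query with `EXPLAIN (ANALYZE, BUFFERS)` to confirm the row estimate and index choice actually changed.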