Thanks Tom. Collecting full statistics on the tables involved corrected the execution plan.
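For anyone hitting the same issue after a migration: the fix above amounts to re-gathering planner statistics, which a freshly loaded database won't have until autovacuum catches up. A minimal sketch (table name taken from the plan quoted below; the statistics-target bump is optional and just increases the sample size):

```sql
-- Recompute planner statistics for the migrated table.
-- Optionally raise the per-column sample size first for a skewed column.
ALTER TABLE marketing_app_leadhistory
    ALTER COLUMN cancel_event_type SET STATISTICS 1000;

ANALYZE marketing_app_leadhistory;
```

Running a plain database-wide `ANALYZE;` (or `vacuumdb --analyze-in-stages`) right after a migration avoids this class of problem entirely.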
On Tue, Aug 13, 2024 at 9:57 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Siraj G <tosira...@gmail.com> writes:
> > We migrated a PgSQL database from Cloud SQL to compute engine and since
> > then there is a SQL we observed taking a long time. After some study, I
> > found that the SQL is using NESTED LOOP where the cost is too high.
>
> The core of your problem seems to be here:
>
> >   ->  Index Scan using marketing_a_cancel__55ffff_idx on
> > marketing_app_leadhistory w0  (cost=0.57..4274.30 rows=1 width=8) (actual
> > time=46.678..51.232 rows=44 loops=1)
> >         Index Cond: ((cancel_event_id IS NOT NULL) AND
> > (cancel_event_type = 1))
> >         Filter: ((status_id = 93) AND
> > ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date <=
> > '2024-08-07'::date))
> >         Rows Removed by Filter: 22268
> >         Buffers: shared hit=9170 read=19
>
> If the planner had estimated 40-some rows out of this step, rather
> than one, it would certainly not have chosen to use nestloop joins
> atop this.  So the big problem to focus on is making that estimate
> better.
>
> A secondary problem is that the choice of index seems poor: the
> index itself is selecting 44+22268 = 22312 rows and then the filter
> condition is throwing away 99.8% of those rows.  Probably, using
> an index on (status_id, followup_date) would have worked better.
>
> I suspect that both of these things are tied to the non-normalization
> of your "cancel" condition.  The planner probably believes that
> "cancel_event_id IS NOT NULL" is statistically independent of
> "cancel_event_type = 1"; but I'll bet it isn't, and thus the index
> condition selects many more rows than the planner guessed.  You might
> be able to improve that estimate by creating extended stats on both of
> those columns, but really a better idea would be to take a step back
> and figure out if those two columns can't be merged into one.
>
>                         regards, tom lane
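The two concrete suggestions in Tom's reply could be sketched as follows (column and table names come from the quoted plan; the statistics object name is made up for illustration). Note that `dependencies` extended statistics, available since PostgreSQL 10, help the planner mainly with correlated clauses, so they may or may not improve the `IS NOT NULL` estimate here — that is exactly why Tom suggests merging the two columns instead:

```sql
-- Extended statistics so the planner can see the correlation between
-- the two "cancel" columns instead of assuming independence.
CREATE STATISTICS leadhistory_cancel_stats (dependencies)
    ON cancel_event_id, cancel_event_type
    FROM marketing_app_leadhistory;

ANALYZE marketing_app_leadhistory;

-- Index matching the filter that was discarding 99.8% of fetched rows.
CREATE INDEX ON marketing_app_leadhistory (status_id, followup_date);
```

After creating either, re-run the query with `EXPLAIN (ANALYZE, BUFFERS)` to confirm the row estimate and index choice actually changed.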