I want to clean a large log table in chunks. I wrote the following query:

    delete from categorization.log
    where ctid in (
        select ctid
        from categorization.log
        where timestamp < now() - interval '2 month'
        limit 1000
    )
But I am getting this weird plan:

[Plan 1]
    Delete on log  (cost=74988058.17..77101421.77 rows=211334860 width=36)
      ->  Merge Semi Join  (cost=74988058.17..77101421.77 rows=211334860 width=36)
            Merge Cond: (log.ctid = "ANY_subquery".ctid)
            ->  Sort  (cost=74987967.33..76044641.63 rows=422669720 width=6)
                  Sort Key: log.ctid
                  ->  Seq Scan on log  (cost=0.00..8651368.20 rows=422669720 width=6)
            ->  Sort  (cost=90.83..93.33 rows=1000 width=36)
                  Sort Key: "ANY_subquery".ctid
                  ->  Subquery Scan on "ANY_subquery"  (cost=0.00..41.00 rows=1000 width=36)
                        ->  Limit  (cost=0.00..31.00 rows=1000 width=6)
                              ->  Seq Scan on log log_1  (cost=0.00..11821391.10 rows=381284367 width=6)
                                    Filter: ("timestamp" < (now() - '2 mons'::interval))

It takes forever to complete, with any LIMIT from 1 to 1000.

However, if I extract the CTIDs manually:

    select array_agg(ctid)
    from (
        select ctid
        from s.log
        where timestamp < now() - interval '2 month'
        limit 5
    ) v

and substitute the result into the DELETE query, it does a plain TID scan and completes in milliseconds:

    explain
    delete from s.log
    where ctid = any('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[])

[Plan 2]
    Delete on log  (cost=0.01..20.06 rows=5 width=6)
      ->  Tid Scan on log  (cost=0.01..20.06 rows=5 width=6)
            TID Cond: (ctid = ANY ('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[]))

In case it helps, the table's definition:

    CREATE TABLE s.log
    (
        article_id bigint NOT NULL,
        topic_id integer NOT NULL,
        weight double precision NOT NULL,
        cat_system character varying(50) NOT NULL,
        lang character varying(5) NOT NULL,
        is_final boolean NOT NULL,
        comment character varying(50),
        "timestamp" timestamp without time zone DEFAULT now()
    )

Number of rows: ~423M
n_live_tup = 422426725
last_vacuum = 2018-10-22

version(): PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit

Why does this query want to use a Seq Scan and Sort on a 423M-row table? How can I fix it, i.e. reduce it to Plan 2?

-- Vlad
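
P.S. To show where I am trying to get to: below is a sketch that folds the two manual steps above into a single statement, using the same filter and LIMIT as the original query. I have not verified whether the planner keeps the TID scan from Plan 2 for this form.

    -- sketch: collect one batch of ctids with ARRAY(...) and delete them in one statement;
    -- untested whether this still gets a TID scan like Plan 2
    delete from s.log
    where ctid = any(array(
        select ctid
        from s.log
        where timestamp < now() - interval '2 month'
        limit 1000
    ));

The idea would be to run this from the client in a loop until it reports DELETE 0.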