> Unfortunately it's still 9.6.

And what is your "*version()*"?
For example:

postgres=# select version();
                                                             version
---------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.22 on x86_64-pc-linux-gnu (Debian 9.6.22-1.pgdg110+1), compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

Imre

On Fri, 14 May 2021 at 14:11, Marcin Gozdalik <goz...@gmail.com> wrote:

> Unfortunately it's still 9.6. An upgrade to the latest 13 is planned for
> this year.
>
> On Fri, 14 May 2021 at 12:08, Imre Samu <pella.s...@gmail.com> wrote:
>
>> > Apart from the above hack of filtering out live tuples to a separate
>> > table is there anything I could do?
>>
>> Is this the latest PG 13.3 version?
>>
>> IMHO: if not, it may be worth updating to the latest patch release as
>> soon as possible.
>>
>> https://www.postgresql.org/docs/release/13.3/
>> Release date: 2021-05-13
>> *"Disable the vacuum_cleanup_index_scale_factor parameter and storage
>> option (Peter Geoghegan)*
>> *The notion of tracking “stale” index statistics proved to interact badly
>> with the autovacuum_vacuum_insert_threshold parameter, resulting in
>> unnecessary full-index scans and consequent degradation of autovacuum
>> performance. The latter mechanism seems superior, so remove the
>> stale-statistics logic. The control parameter for that,
>> vacuum_cleanup_index_scale_factor, will be removed entirely in v14. In
>> v13, it remains present to avoid breaking existing configuration files,
>> but it no longer does anything."*
>>
>> best,
>> Imre
>>
>> On Fri, 14 May 2021 at 13:20, Marcin Gozdalik <goz...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to use `pgmetrics` on a big (10TB+), busy (1GB/s RW)
>>> database. It takes around 5 minutes for pgmetrics to run. I traced the
>>> problem to the "bloat query" (a version of
>>> https://wiki.postgresql.org/wiki/Show_database_bloat) spinning on CPU,
>>> doing no I/O.
>>>
>>> I have traced the problem to a bloated `pg_class` (the irony:
>>> `pgmetrics` does not collect bloat on `pg_catalog`).
>>> `vacuum (full, analyze, verbose) pg_class;` reports:
>>>
>>> ```
>>> INFO: vacuuming "pg_catalog.pg_class"
>>> INFO: "pg_class": found 1 removable, 7430805 nonremovable row versions in 158870 pages
>>> DETAIL: 7429943 dead row versions cannot be removed yet.
>>> CPU 1.36s/6.40u sec elapsed 9.85 sec.
>>> INFO: analyzing "pg_catalog.pg_class"
>>> INFO: "pg_class": scanned 60000 of 158869 pages, containing 295 live rows and 2806547 dead rows; 295 rows in sample, 781 estimated total rows
>>> VACUUM
>>> ```
>>>
>>> `pg_class` has so many dead rows because the workload is temp-table
>>> heavy (creating/destroying 1M+ temporary tables per day) and has
>>> long-running analytics queries running for 24h+.
>>>
>>> The PostgreSQL query planner assumes that an index scan on `pg_class`
>>> will be very quick and plans a nested loop with an index scan. However,
>>> the index scan has 7M dead tuples to filter out and the query takes
>>> more than 200 seconds (https://explain.depesz.com/s/bw2G).
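>>>
>>> The churn pattern itself is simple: each iteration of something like
>>> the following (table name is just an illustration) inserts and later
>>> deletes a `pg_class` row, and those deleted rows stay "nonremovable"
>>> for as long as any 24h+ query holds an old snapshot:
>>>
>>> ```
>>> CREATE TEMPORARY TABLE scratch_1 (id int, payload text);
>>> -- ... populate and use the table ...
>>> DROP TABLE scratch_1;
>>> -- the dead pg_class/pg_attribute rows left behind cannot be
>>> -- vacuumed away until the oldest open transaction finishes
>>> ```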
>>>
>>> If I create a temp table from `pg_class` to contain only the live
>>> tuples:
>>>
>>> ```
>>> CREATE TEMPORARY TABLE pg_class_alive AS SELECT oid, * FROM pg_class;
>>> CREATE UNIQUE INDEX pg_class_alive_oid_index ON pg_class_alive(oid);
>>> CREATE UNIQUE INDEX pg_class_alive_relname_nsp_index ON pg_class_alive(relname, relnamespace);
>>> CREATE INDEX pg_class_tblspc_relfilenode_index ON pg_class_alive(reltablespace, relfilenode);
>>> ANALYZE pg_class_alive;
>>> ```
>>>
>>> and run the bloat query on `pg_class_alive` instead of `pg_class`:
>>>
>>> ```
>>> SELECT
>>>   nn.nspname AS schemaname,
>>>   cc.relname AS tablename,
>>>   COALESCE(cc.reltuples, 0) AS reltuples,
>>>   COALESCE(cc.relpages, 0) AS relpages,
>>>   COALESCE(CEIL((cc.reltuples*((datahdr+8-
>>>     (CASE WHEN datahdr%8=0 THEN 8 ELSE datahdr%8 END))+nullhdr2+4))/(8192-20::float)), 0) AS otta
>>> FROM
>>>   pg_class_alive cc
>>>   JOIN pg_namespace nn ON cc.relnamespace = nn.oid
>>>     AND nn.nspname <> 'information_schema'
>>>   LEFT JOIN (
>>>     SELECT
>>>       foo.nspname, foo.relname,
>>>       (datawidth+32)::numeric AS datahdr,
>>>       (maxfracsum*(nullhdr+8-(CASE WHEN nullhdr%8=0 THEN 8 ELSE nullhdr%8 END))) AS nullhdr2
>>>     FROM (
>>>       SELECT
>>>         ns.nspname, tbl.relname,
>>>         SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
>>>         MAX(coalesce(null_frac,0)) AS maxfracsum,
>>>         23+(
>>>           SELECT 1+count(*)/8
>>>           FROM pg_stats s2
>>>           WHERE null_frac<>0 AND s2.schemaname = ns.nspname
>>>             AND s2.tablename = tbl.relname
>>>         ) AS nullhdr
>>>       FROM pg_attribute att
>>>       JOIN pg_class_alive tbl ON att.attrelid = tbl.oid
>>>       JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
>>>       LEFT JOIN pg_stats s ON s.schemaname = ns.nspname
>>>         AND s.tablename = tbl.relname
>>>         AND s.inherited = false
>>>         AND s.attname = att.attname
>>>       WHERE att.attnum > 0 AND tbl.relkind = 'r'
>>>       GROUP BY 1, 2
>>>     ) AS foo
>>>   ) AS rs
>>>     ON cc.relname = rs.relname AND nn.nspname = rs.nspname
>>>   LEFT JOIN pg_index i ON indrelid = cc.oid
>>>   LEFT JOIN
>>> pg_class_alive c2 ON c2.oid = i.indexrelid
>>> ```
>>>
>>> it runs in 10 s, 20x faster (https://explain.depesz.com/s/K4SH).
>>>
>>> The rabbit hole probably goes deeper (e.g. I should do the same for
>>> `pg_statistic` and `pg_attribute` and create a new `pg_stats` view).
>>>
>>> I am not able (at least not quickly) to change the number of temporary
>>> tables created or to make the analytics queries finish quicker. Apart
>>> from the above hack of filtering the live tuples out into a separate
>>> table, is there anything I could do?
>>>
>>> Thank you,
>>> Marcin Gozdalik
>>>
>>> --
>>> Marcin Gozdalik
>>>
>>
>
> --
> Marcin Gozdalik
>
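ps: to watch how many dead rows the catalog tables are carrying without
running VACUUM VERBOSE, something like this may help (a sketch; it relies
on the statistics collector, so it assumes track_counts is on and the
numbers are estimates):

```
SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_sys_tables
 WHERE schemaname = 'pg_catalog'
 ORDER BY n_dead_tup DESC
 LIMIT 10;
```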