[PERFORM] Optimizing around retained tuples
Hi pgsql-performance!

So I have a Postgresql database -- version "PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit", specifically. In it, I have essentially two categories of tables:

- small tables that are updated frequently, and tend to be queried in their entirety (seq scan)
- large tables that are updated infrequently, and tend to be queried via an index

Let us assume that I have a table "A" that falls in the small category, and a table "B" that falls in the large category.

The problem I'm having is that it is very difficult to do any amount of maintenance on tables like B without effectively denying service to queries that reference table A. The reason, as far as I can tell, is that any long-running statement against a table like B forces every obsoleted row version in table A to be kept around until the statement on B completes (as per the READ COMMITTED transaction isolation level -- statements against B must only see rows committed before they started). This makes sense -- it's required to keep ACID.

However, there are times when I need to run large operations against B -- and these operations can be literally anything, but I'll focus on my most recent need: running a "pg_dump" against table B. I should add that table B is never involved in any query that touches table A -- in this case, it is an append-only table that records changes to a table C, which is likewise never involved with table A.

So, on to the data from which I base the above claims.

Let table A have 43 thousand rows:

database=> select count(*) from a;
-[ RECORD 1 ]
count | 43717

Time: 10447.681 ms

Let table B have 21 million rows:

meraki_shard_production=> select count(id) from b;
-[ RECORD 1 ]---
count | 21845610

Time: 116873.051 ms

Assume a pg_dump operation is copying table B, i.e. there's a currently running query that looks like "COPY public.b (id, ...) TO STDOUT".

Then this is what I get for running a verbose vacuum against A:

database=> vacuum verbose a;
INFO:  vacuuming "public.a"
INFO:  index "a_pkey" now contains 2119583 row versions in 9424 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.04s/0.03u sec elapsed 0.49 sec.
INFO:  "a": found 0 removable, 2112776 nonremovable row versions in 185345 out of 186312 pages
DETAIL:  2069676 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 1.28s/1.15u sec elapsed 22.93 sec.
INFO:  vacuuming "pg_toast.pg_toast_18889"
INFO:  index "pg_toast_18889_index" now contains 31 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_18889": found 0 removable, 31 nonremovable row versions in 7 out of 7 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 23035.282 ms
... and here's how long it takes to read all of the rows:

database=> select max(an unindexed bigint column) from a;
-[ RECORD 1 ]
max |

Time: 10624.368 ms

Running this another time immediately afterward (to show the cached speed) returns:

Time: 13782.363 ms

If I go to a separate database cluster that has an equivalent schema, and a roughly equivalent table a (+- 2% on the number of rows), the above queries look more like this:

meraki_shard_production=> vacuum verbose a;
INFO:  vacuuming "public.a"
INFO:  index "a_pkey" now contains 42171 row versions in 162 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "a": found 487 removable, 42286 nonremovable row versions in 7809 out of 7853 pages
DETAIL:  373 dead row versions cannot be removed yet.
There were 42436 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.02u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_19037"
INFO:  index "pg_toast_19037_index" now contains 57 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_19037": found 0 removable, 57 nonremovable row versions in 12 out of 12 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 32.890 ms

database=> select max(the same unindexed bigint column) from a;
 max
-----

(1 row)

Time: 16.696 ms

(The second iteration takes 15.320 ms)

So, the way I see it, my problem boils down to table "A" getting roughly 100-1000x slower when it gets roughly 20-50x bigger (depending on whether you measure in pages or tuples). Unfortunately, in my use case, table "A" acts as a join table for a lot of aspects of our company's webapp. Every 10 minutes, the table is queried for 35 million rows via sequential scan (~800 seq scans per minute, ~1.3 per second on average), and 6.5 million rows via index lookup. When a sequential scan over 40k rows takes less than 1 second, everything is fine -- when it takes 10+ seconds the database starts to slow down significantly. Thankfully, queries can share sequential scans, but you can imagine how the responsiveness of the webapp might suffer as a consequence. There's also the secondary effect that, should the query on B complete, there now exist many queries against A (and other related tables) that are slow enough to potentially increase the size of A even further. It is not uncommon for queries involving A to start taking upwards of 30 minutes to complete, when they usually complete in roughly 300ms, after some maintenance query against B has completed.

Our go-to solution has been to detect and stop these maintenance queries if they take too long, and then to CLUSTER table A. This puts a cap on how long any maintenance query can take -- down to somewhere around 1 hour.

And thus my query to you guys:

What can I do to keep running long maintenance operations on large tables (SELECTing significant fractions of B, DELETEing significant fractions of B, running VACUUM FULL on B) without denying other Postgresql backends their ability to efficiently query table A? Or, in other words, how do I avoid incurring the cost of transaction isolation for queries against B on a case-by-case basis?

Anything is on the table for implementation:
- moving tables to a different database / cluster / completely different DBMS system
- designing an extension to tune either set of queries
- partitioning tables
- etc.
... although the simpler the better. If you were in this position, what would you do?

Regards,
James
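P.S. For anyone wanting to reproduce the diagnosis on their own cluster: the retained tuples correlate with whichever backend holds the oldest xmin. A minimal sketch, assuming 9.4+ (where pg_stat_activity exposes backend_xmin) and a table literally named "a":

-- Which backends are holding back the xid horizon, oldest first?
SELECT pid, datname, xact_start, age(backend_xmin) AS xmin_age, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Watch dead tuples accumulate on the small table while the big query runs
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'a';

While the pg_dump's COPY on B is running, its backend should sit at the top of the first query's output, and n_dead_tup on A should climb without ever being reclaimed by vacuum.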
Re: [PERFORM] Optimizing around retained tuples
On Tue, Mar 21, 2017 at 4:24 PM, James Parks wrote:
> [...]
> What can I do to keep running long maintenance operations on large tables
> (SELECTing significant fractions of B, DELETEing significant fractions of B,
> running VACUUM FULL on B) without denying other Postgresql backends their
> ability to efficiently query table A? Or, in other words, how do I avoid
> incurring the cost of transaction isolation for queries against B on a
> case-by-case basis?
> [...]
> ... although the simpler the better. If you were in this position, what
> would you do?
>
> Regards,
> James

You're experiencing bloat because the long-running transaction on B prevents the xid horizon from moving forward, so dead tuples from A cannot be reclaimed, in case the transaction on B decides to query them.

There's only one "easy" solution for this as far as I know, and it is to run your long-running queries on a hot standby. That certainly works for most read-only workloads, especially pg_dump.
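For pg_dump specifically, that can be as simple as pointing the dump at the standby instead of the primary (the host name below is made up; the options are standard pg_dump flags):

# Dump only the big table, reading from the hot standby
pg_dump -h standby.example.com -p 5432 -t public.b -Fc -f b.dump database

The dump's snapshot then pins the xmin horizon on the standby only, so vacuum on the primary can keep reclaiming dead tuples in A.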
Re: [PERFORM] Optimizing around retained tuples
On Tue, Mar 21, 2017 at 10:56 PM, Claudio Freire wrote:
> On Tue, Mar 21, 2017 at 4:24 PM, James Parks wrote:
>> [...]
>> ... although the simpler the better. If you were in this position, what
>> would you do?
>
> You're experiencing bloat because the long-running transaction on B prevents
> the xid horizon from moving forward, so dead tuples from A cannot be
> reclaimed, in case the transaction on B decides to query them.
>
> There's only one "easy" solution for this as far as I know, and it is
> to run your long-running queries on a hot standby. That certainly
> works for most read-only workloads, especially pg_dump.

Forgot to clarify... for your use case, make sure you *don't* enable hot_standby_feedback on the standby.
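Something like this on the standby, as a sketch (the GUC names are real; whether unbounded delays are acceptable depends on how much replay lag you can tolerate on that standby):

# postgresql.conf on the standby
hot_standby = on                    # allow read-only queries during recovery
hot_standby_feedback = off          # do NOT report the standby's xmin to the
                                    # primary, so standby queries can't bloat it
max_standby_streaming_delay = -1    # let long standby queries win over WAL replay
max_standby_archive_delay = -1      # same, for WAL restored from archive

The trade-off: with feedback off, a long query on the standby can conflict with WAL replay, and the -1 delays resolve that conflict in favor of the query by pausing replay -- the standby simply falls behind until the query (e.g. the pg_dump) finishes.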