[PERFORM] Optimizing around retained tuples
Hi pgsql-performance!

So I have a Postgresql database -- version "PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit", specifically. In it, I have essentially two categories of tables:

- small tables that are updated frequently, and tend to be queried in their entirety (seq scan)
- large tables that are updated infrequently, and tend to be queried via an index

Let us assume that I have a table "A" that falls in the small category, and a table "B" that falls in the large category.

The problem I'm having is that it is very difficult to do any amount of maintenance on tables like B without effectively denying service to queries that reference table A. The reason, as far as I can tell, is that any long-running statement against a table like B forces every obsoleted row version in table A to be kept around until the statement on B completes (as per the READ COMMITTED transaction isolation level -- statements against B must only see rows committed before they started). This makes sense -- it's required to keep ACID.

However, there are times when I need to run large operations against B -- and these operations can be literally anything, but I'll focus on my most recent need: running a "pg_dump" against table B. I should add that table B is never involved in any query that touches table A -- in this case, it is an append-only table that records changes to a table C, which is likewise never involved with table A.

So, on to the data from which I base the above claims.

Let table A have 43 thousand rows:

database=> select count(*) from a;
-[ RECORD 1 ]
count | 43717

Time: 10447.681 ms

Let table B have 21 million rows:

meraki_shard_production=> select count(id) from b;
-[ RECORD 1 ]---
count | 21845610

Time: 116873.051 ms

Assume a pg_dump operation is copying table B, i.e. there's a currently running query that looks like "COPY public.b (id, ...) TO STDOUT".

Then this is what I get for running a verbose vacuum against A:

database=> vacuum verbose a;
INFO:  vacuuming "public.a"
INFO:  index "a_pkey" now contains 2119583 row versions in 9424 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.04s/0.03u sec elapsed 0.49 sec.
INFO:  "a": found 0 removable, 2112776 nonremovable row versions in 185345 out of 186312 pages
DETAIL:  2069676 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 1.28s/1.15u sec elapsed 22.93 sec.
INFO:  vacuuming "pg_toast.pg_toast_18889"
INFO:  index "pg_toast_18889_index" now contains 31 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_18889": found 0 removable, 31 nonremovable row versions in 7 out of 7 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 23035.282 ms
... and here's how long it takes to read all of the rows:

database=> select max(an unindexed bigint column) from a;
-[ RECORD 1 ]
max |

Time: 10624.368 ms

Running this another time immediately afterward (to show the cached speed) returns:

Time: 13782.363 ms

If I go to a separate database cluster that has an equivalent schema, and a roughly equivalent table a (+- 2% on the number of rows), the above queries look more like this:

meraki_shard_production=> vacuum verbose a;
INFO:  vacuuming "public.a"
INFO:  index "a_pkey" now contains 42171 row versions in 162 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "a": found 487 removable, 42286 nonremovable row versions in 7809 out of 7853 pages
DETAIL:  373 dead row versions cannot be removed yet.
There were 42436 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.02u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_19037"
INFO:  index "pg_toast_19037_index" now contains 57 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_19037": found 0 removable, 57 nonremovable row versions in 12 out of 12 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 32.890 ms

database=> select max(the same unindexed bigint column) from a;
 max
-----

(1 row)

Time: 16.696 ms

(The second iteration takes 15.320 ms)

So, the way I see it, my problem boils down to table "A" getting roughly 100-1000x slower when it gets roughly 20-50x bigger (depending on whether you measure in pages or tuples). Unfortunately, in my use case, table "A" acts as a join table for a lot of aspects of our company's webapp. Every 10 minutes, the table is queried for 35 million rows via sequential scan (~800 seq scans per minute, ~1.3 per second on average), and 6.5 million rows via index lookup. When a sequential scan over 40k rows takes less than 1 second, everything is fine -- when it takes 10+ seconds the database starts to slow down significantly. Thankfully, queries can share sequential scans, but you can imagine how the responsiveness of the webapp might suffer as a consequence. There's also the secondary effect that, should the query on B complete, there now exist many queries against A (and other related tables) that are slow enough to potentially increase the size of A even further. It is not uncommon for queries involving A to start taking upwards of 30 minutes to complete, when they usually complete in roughly 300ms, after some maintenance query against B has completed.

Our go-to solution has been to detect and stop these maintenance queries if they take too long, and then to CLUSTER table A. This puts a cap on how long any maintenance query can take -- down to somewhere around 1 hour.

And thus my query to you guys:

What can I do to keep running long maintenance operations on large tables (SELECTing significant fractions of B, DELETEing significant fractions of B, running VACUUM FULL on B) without denying other Postgresql backends their ability to efficiently query table A? Or, in other words, how do I avoid incurring the cost of transaction isolation for queries against B on a case-by-case basis?

Anything is on the table for implementation:
- moving tables to a different database / cluster / completely different DBMS system
- designing an extension to tune either set of queries
- partitioning tables
- etc.
... although the simpler the better. If you were in this position, what would you do?

Regards,
James
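P.S. For anyone wanting to reproduce the diagnosis on their own cluster: the retained tuples correlate with whichever backend holds the oldest xmin. A minimal sketch, assuming 9.4+ (where pg_stat_activity exposes backend_xmin) and a table literally named "a":

-- Which backends are holding back the xid horizon, oldest first?
SELECT pid, datname, xact_start, age(backend_xmin) AS xmin_age, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Watch dead tuples accumulate on the small table while the big query runs
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'a';

While the pg_dump's COPY on B is running, its backend should sit at the top of the first query's output, and n_dead_tup on A should climb without ever being reclaimed by vacuum.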
Re: [PERFORM] Optimizing around retained tuples
On Tue, Mar 21, 2017 at 4:24 PM, James Parks wrote:
> [...]
> What can I do to keep running long maintenance operations on large tables
> (SELECTing significant fractions of B, DELETEing significant fractions of B,
> running VACUUM FULL on B) without denying other Postgresql backends their
> ability to efficiently query table A? Or, in other words, how do I avoid
> incurring the cost of transaction isolation for queries against B on a
> case-by-case basis?
> [...]
> ... although the simpler the better. If you were in this position, what
> would you do?
>
> Regards,
> James

You're experiencing bloat because the long-running transaction on B prevents the xid horizon from moving forward, so dead tuples from A cannot be reclaimed, in case the transaction on B decides to query them.

There's only one "easy" solution for this as far as I know, and it is to run your long-running queries on a hot standby. That certainly works for most read-only workloads, especially pg_dump.
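For pg_dump specifically, that can be as simple as pointing the dump at the standby instead of the primary (the host name below is made up; the options are standard pg_dump flags):

# Dump only the big table, reading from the hot standby
pg_dump -h standby.example.com -p 5432 -t public.b -Fc -f b.dump database

The dump's snapshot then pins the xmin horizon on the standby only, so vacuum on the primary can keep reclaiming dead tuples in A.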
Re: [PERFORM] Optimizing around retained tuples
On Tue, Mar 21, 2017 at 10:56 PM, Claudio Freire wrote:
> On Tue, Mar 21, 2017 at 4:24 PM, James Parks wrote:
>> [...]
>> ... although the simpler the better. If you were in this position, what
>> would you do?
>
> You're experiencing bloat because the long-running transaction on B prevents
> the xid horizon from moving forward, so dead tuples from A cannot be
> reclaimed, in case the transaction on B decides to query them.
>
> There's only one "easy" solution for this as far as I know, and it is
> to run your long-running queries on a hot standby. That certainly
> works for most read-only workloads, especially pg_dump.

Forgot to clarify... for your use case, make sure you *don't* enable hot_standby_feedback on the standby.
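Something like this on the standby, as a sketch (the GUC names are real; whether unbounded delays are acceptable depends on how much replay lag you can tolerate on that standby):

# postgresql.conf on the standby
hot_standby = on                    # allow read-only queries during recovery
hot_standby_feedback = off          # do NOT report the standby's xmin to the
                                    # primary, so standby queries can't bloat it
max_standby_streaming_delay = -1    # let long standby queries win over WAL replay
max_standby_archive_delay = -1      # same, for WAL restored from archive

The trade-off: with feedback off, a long query on the standby can conflict with WAL replay, and the -1 delays resolve that conflict in favor of the query by pausing replay -- the standby simply falls behind until the query (e.g. the pg_dump) finishes.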