Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables

2015-02-05 Thread Matthew Kelly
 That's assuming that toasting is evenly spread between tables. In my 
 experience, that's not a great bet...

Time to create a test:
SELECT chunk_id::bigint/100000 AS id_range, count(*), count(*)/(100000::float) AS density
FROM (SELECT chunk_id FROM pg_toast.pg_toast_39000165
      WHERE chunk_id > 1 AND chunk_seq = 0) f
GROUP BY id_range ORDER BY id_range;
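
(If you want to run this against your own data, the TOAST relation for a given table can be looked up first; the table name below is just a placeholder:)

-- Locate the TOAST relation backing a given table; 'my_big_table' is a
-- hypothetical name, substitute your own.
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE oid = 'my_big_table'::regclass;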

The machine in question was restored in parallel in Sept 2013 as part of an upgrade from 8.4.  It has about 2000 tables, so it's definitely not dominated by a couple of tables.  Progress towards OID wraparound is about 25.6%.
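
(For anyone who wants to check their own progress: on 9.6 and later the next OID is exposed via pg_control_checkpoint(); on the versions discussed here you have to read "Latest checkpoint's NextOID" out of pg_controldata instead.  A rough sketch:)

-- Rough check of OID-counter progress toward the 2^32 wraparound point.
-- pg_control_checkpoint() only exists from PostgreSQL 9.6 on.
SELECT next_oid,
       round(100.0 * next_oid::bigint / 4294967296, 1) AS pct_toward_wraparound
FROM pg_control_checkpoint();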

With minimal effort, I found 2 bad examples, and I’m sure I can easily find 
more. I attached the results for those two.

There were runs of 1,100,000+ and 600,000+ chunk_ids where more than 99% of the chunk_ids are taken.  After the restore completed, OID densities averaged less than 20 per 100,000 and 400 per 100,000 respectively.  The only reason those runs are so short is that the tables were much smaller back then.  I expect that next time I dump-restore (necessary for upgrading OS versions due
to the collation issue), I'm going to have runs closer to 20,000,000.

 ... this fix would actually make things enormously worse.  With the
 single counter feeding all tables, you at least have a reasonable
 probability that there are not enormously long runs of consecutive OIDs in
 any one toast table.  With a sequence per table, you are nearly guaranteed
 that there are such runs, because inserts into other tables don't create a
 break.

It makes each toast table independent (and far less likely to wrap).  It would wrap when sum(mods on THIS toast table) > 2^32.  Right now the function looks like:

sum(mods on ALL toast tables in cluster) + sum(created normal tables in cluster * k) + sum(created temp tables in cluster * k) + [...] > 2^32,
WHERE k = average number of OIDs consumed for pg_class, pg_type, etc...

In the case of an insert-only table (which is a common use case for partitions), the id would only wrap when the TOAST table was “full”.  Currently, on the other hand, it would wrap into its pg_restored section as soon as the combined OID-consuming operations on the cluster surpassed 4 billion.
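
(To put a rough number on the insert-only case: assuming the usual ~2kB per toasted value, a per-table counter would not wrap until that one table held on the order of 8TB of toasted data:)

-- Back-of-envelope: size of a single TOAST table at the point its own
-- 32-bit counter would wrap, assuming ~2kB per toasted value.
SELECT pg_size_pretty(4294967296::bigint * 2048) AS size_at_per_table_wraparound;
-- ≈ 8192 GB, the same arithmetic as the "8192 1GB segments" later in the thread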

That being said, I’m certainly not attached to that solution.  My real argument is that although it's not a problem today, we are only about 5 years from it being a problem for large installs, and the first time you'll hear about it is after someone has a 5-minute production outage on a database that's been taking traffic for 2 years.

- Matt K.


 id_range | count | density 
----------+-------+---------
 390 | 92188 | 0.92188
 391 | 99186 | 0.99186
 392 | 99826 | 0.99826
 393 | 99101 | 0.99101
 394 | 99536 | 0.99536
 395 | 99796 | 0.99796
 396 | 99321 | 0.99321
 397 | 99768 | 0.99768
 398 | 99744 | 0.99744
 399 | 99676 | 0.99676
 400 | 98663 | 0.98663
 401 | 40690 |  0.4069
 403 |92 | 0.00092
 404 |   491 | 0.00491
 407 |74 | 0.00074
 408 |54 | 0.00054
 415 |   152 | 0.00152
 416 |47 | 0.00047
 419 |59 | 0.00059
 422 | 2 |   2e-05
 423 |14 | 0.00014
 424 | 5 |   5e-05
 425 |11 | 0.00011
 426 | 7 |   7e-05
 427 | 5 |   5e-05
 428 | 6 |   6e-05
 517 | 5 |   5e-05
 518 | 9 |   9e-05
 519 | 6 |   6e-05
 520 |12 | 0.00012
 521 |17 | 0.00017
 522 | 5 |   5e-05
 588 |15 | 0.00015
 589 |10 |  0.0001
 590 |19 | 0.00019
 591 |12 | 0.00012
 592 |12 | 0.00012
 593 | 2 |   2e-05
 617 | 4 |   4e-05
 618 | 9 |   9e-05
 619 | 7 |   7e-05
 620 |14 | 0.00014
 621 | 5 |   5e-05
 622 |11 | 0.00011
 682 | 8 |   8e-05
 683 |13 | 0.00013
 684 |17 | 0.00017
 685 | 6 |   6e-05
 686 |17 | 0.00017
 687 | 4 |   4e-05
 767 | 5 |   5e-05
 768 |10 |  0.0001
 769 | 9 |   9e-05
 770 | 2 |   2e-05
 771 |14 | 0.00014
 772 | 2 |   2e-05
 773 |11 | 0.00011
 774 |13 | 0.00013
 775 |10 |  0.0001
 776 | 3 |   3e-05
 914 | 7 |   7e-05
 915 | 7 |   7e-05
 916 | 1 |   1e-05
 917 | 3 |   3e-05
 918 | 3 |   3e-05
 919 | 5 |   5e-05
 920 | 4 |   4e-05
 921 | 9 |   9e-05
 922 | 9 |   9e-05
 923 | 1 |   1e-05
(70 rows)

 id_range | count | density 
----------+-------+---------
 402 | 96439 | 0.96439
 403 | 99102 | 0.99102
 404 | 98787 | 0.98787
 405 | 99351 

Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables

2015-02-03 Thread Tom Lane
Matthew Kelly mke...@tripadvisor.com writes:
 However, I do have active databases where the current oid is between 1 
 billion and 2 billion.  They were last dump-restored for a hardware upgrade a 
 couple years ago and were a bit more than half the size.  I therefore can 
 imagine that I have tables which are keyed by ~8,000,000 consecutive oids.

 I would argue that when it wraps there will be a single insert that will 
 probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index 
 scans inside of GetNewOidWithIndex.  Even partitioning doesn’t protect you 
 from this potential problem.

That may be a hazard, but ...

 That being said I’d be perfectly happy merely giving each TOAST table its 
 own sequence, as that almost entirely mitigates the risk of an unexpected lock-up 
 on reasonably sized tables/partitions, and provides a functional workaround 
 for those of us with larger-than-average installs.

... this fix would actually make things enormously worse.  With the
single counter feeding all tables, you at least have a reasonable
probability that there are not enormously long runs of consecutive OIDs in
any one toast table.  With a sequence per table, you are nearly guaranteed
that there are such runs, because inserts into other tables don't create a
break.

(This effect is also why you're wrong to claim that partitioning can't fix
it.)

regards, tom lane




Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables

2015-02-03 Thread Matthew Kelly
 Hmm 2^32 times approx. 2kB (as per usual heuristics, ~4 rows per heap 
 page) is 8,796,093,022,208 (~9e13) bytes
 ... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB; it's not _that_ large.  At TripAdvisor we’ve been using a NoSQL solution to do session storage.  We are looking at probably swapping that out for Postgres (every other db backing the site is Postgres).  Essentially, what I’m building is a system with 1 logical table that maps a session id to a 2KB+ grab bag of ever-changing session attributes, which is partially normalized, partially JSON.  315 million uniques a month multiplied by the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones).  Additionally, most HTTP calls can update the session, so between maintenance windows I expect to take around 20 billion 'upserts'.  Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test that demonstrated that a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload.  Well, it can for 38 hours… until you wrap xids on the toast table.  :P  I’ll be the first to admit that isn’t the normal use case though.  I’m happy to have found this thread, however, because I’m going to have to build around the global oid counter, explicitly to prevent the problem I explain below regarding clustering.
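
(Purely for illustration, the shape of table I have in mind looks roughly like the sketch below; the names are made up, and the upsert syntax shown is 9.5's INSERT ... ON CONFLICT rather than what a 9.4 deployment would actually use:)

-- Hypothetical session store: one wide, frequently updated row per session.
-- The jsonb grab bag is what ends up in the TOAST table.
CREATE TABLE session_store (
    session_id   uuid PRIMARY KEY,
    attributes   jsonb NOT NULL,
    last_seen_at timestamptz NOT NULL DEFAULT now()
);

-- Each HTTP call becomes roughly this (PostgreSQL 9.5+ syntax):
INSERT INTO session_store (session_id, attributes)
VALUES ('00000000-0000-0000-0000-000000000000', '{"page": "/"}')
ON CONFLICT (session_id)
DO UPDATE SET attributes = EXCLUDED.attributes,
              last_seen_at = now();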

 Anybody actually reaching this limit out there?

Well, it's not the 4 billion row limit that concerns me; it's the global shared counter in conjunction with pg_restore/clustering that is actually pretty concerning.

Just checked through all of TripAdvisor’s normal databases, and the max tuples I see in a single toast table is 17,000,000, so that is still a couple of orders of magnitude too small (however, close enough that it’ll be a concern in a few years).
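
(That check is easy to reproduce from the catalogs; reltuples is only the planner's estimate, so treat the counts as approximate:)

-- Largest TOAST tables by estimated tuple count.
SELECT c.relname           AS table_name,
       t.relname           AS toast_table,
       t.reltuples::bigint AS approx_toast_tuples
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
ORDER BY t.reltuples DESC
LIMIT 10;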

However, I do have active databases where the current oid is between 1 billion 
and 2 billion.  They were last dump-restored for a hardware upgrade a couple 
years ago and were a bit more than half the size.  I therefore can imagine that 
I have tables which are keyed by ~8,000,000 consecutive oids.

I would argue that when it wraps there will be a single insert that will 
probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index 
scans inside of GetNewOidWithIndex.  Even partitioning doesn’t protect you from 
this potential problem.
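
(Very roughly, each of those index scans is the SQL-level equivalent of the probe below; GetNewOidWithIndex has to repeat it, bumping the candidate OID each time, until it finds a value that isn’t already in the toast index.  The table name is from my earlier test; the literal is just an example candidate:)

-- Hand-written analogue of the uniqueness probe done for each candidate OID;
-- a hit means "taken, advance the counter and try again".
SELECT 1
FROM pg_toast.pg_toast_39000165
WHERE chunk_id = 12345   -- example candidate value only
LIMIT 1;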

What's even more weird is that this issue can be triggered by consuming too many OIDs in a different database in the same cluster (e.g. by creating large numbers of temp tables).
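
(Easy to see for yourself: any catalog object creation draws from the same instance-wide counter that toast chunk_ids come from, regardless of which database it happens in:)

-- Quick demonstration that plain DDL advances the cluster-wide OID counter:
-- the two temp tables get OIDs a few apart (the gap also covers the pg_type
-- rows created alongside each table).
CREATE TEMP TABLE oid_probe_a (x int);
CREATE TEMP TABLE oid_probe_b (x int);
SELECT 'oid_probe_b'::regclass::oid::bigint
     - 'oid_probe_a'::regclass::oid::bigint AS oids_consumed_between;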

 The problem with changing the id from 32 to 64 bits is that the storage *for 
 everybody else* doubles, making the implementation slower for most, though 
 this might actually not be that important.

Well, you aren’t doubling the storage.  Even if you have to store the key in 4 places, you are adding 16 bytes per TOAST tuple.  If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by about 0.8%.  I’m sure there are more hidden costs, but we are really only talking about a low single-digit percent increase.  In exchange, you get to drop one index scan per toast insert; an index scan looking in the only hot part of the index.
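
(The arithmetic behind that estimate, spelled out:)

-- 16 extra bytes against a ~2kB average TOAST tuple:
SELECT round(16.0 / 2048 * 100, 2) AS pct_increase;   -- ≈ 0.78, i.e. under 1%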

That being said I’d be perfectly happy merely giving each TOAST table its own sequence, as that almost entirely mitigates the risk of an unexpected lock-up on reasonably sized tables/partitions, and provides a functional workaround for those of us with larger-than-average installs.

- Matt K


Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables

2015-02-03 Thread Jim Nasby

On 2/3/15 9:01 AM, Tom Lane wrote:

Matthew Kelly mke...@tripadvisor.com writes:

However, I do have active databases where the current oid is between 1 billion 
and 2 billion.  They were last dump-restored for a hardware upgrade a couple 
years ago and were a bit more than half the size.  I therefore can imagine that 
I have tables which are keyed by ~8,000,000 consecutive oids.



I would argue that when it wraps there will be a single insert that will 
probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index 
scans inside of GetNewOidWithIndex.  Even partitioning doesn’t protect you 
from this potential problem.


That may be a hazard, but ...


That being said I’d be perfectly happy merely giving each TOAST table its own sequence, as that almost entirely mitigates the risk of an unexpected lock-up on reasonably sized tables/partitions, and provides a functional workaround for those of us with larger-than-average installs.


... this fix would actually make things enormously worse.  With the
single counter feeding all tables, you at least have a reasonable
probability that there are not enormously long runs of consecutive OIDs in
any one toast table.  With a sequence per table, you are nearly guaranteed
that there are such runs, because inserts into other tables don't create a
break.

(This effect is also why you're wrong to claim that partitioning can't fix
it.)


That's assuming that toasting is evenly spread between tables. In my 
experience, that's not a great bet...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

