Re: [PERFORM] Millions of tables

2016-11-25 Thread Robert Klemme
Greg, sorry for the resend: I had forgotten to include the list. On Wed, Oct 5, 2016 at 2:34 PM, Greg Spiegelberg wrote: > Data is not static. The 4M tables fall into one of two groups. > > Group A contains 2M tables. INSERT will occur ~100 times/day and maximum >

Re: [PERFORM] Millions of tables

2016-10-08 Thread Jim Nasby
On 10/5/16 7:34 AM, Greg Spiegelberg wrote: When you say "must do a vacuum of the entire database", are you saying the entire database must be vacuum'd as a whole per 2B transactions or all tables must be vacuum'd eventually at least once? All tables at least once. Prior to 9.6, that had to
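For context, the per-database distance to the ~2B-transaction wraparound limit can be watched with a standard catalog query (a minimal sketch; the catalogs below are stock PostgreSQL):

  SELECT datname, age(datfrozenxid) AS xid_age
  FROM pg_database
  ORDER BY xid_age DESC;

Once age(datfrozenxid) approaches autovacuum_freeze_max_age (200M by default), autovacuum forces anti-wraparound vacuums regardless of other settings.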

Re: [PERFORM] Millions of tables

2016-10-05 Thread Greg Spiegelberg
On Fri, Sep 30, 2016 at 4:49 PM, Jim Nasby wrote: > On 9/29/16 6:11 AM, Alex Ignatov (postgrespro) wrote: > >> With millions of tables you have to set autovacuum_max_workers >> sky-high =). We have seen situations where, with thousands of tables, >> autovacuum can’t

Re: [PERFORM] Millions of tables

2016-10-01 Thread Jeff Janes
On Thu, Sep 29, 2016 at 4:11 AM, Alex Ignatov (postgrespro) < a.igna...@postgrespro.ru> wrote: > > > *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] *On Behalf Of * > > Thank you Terry. You get the gold star. :) I was waiting for that to >

Re: [PERFORM] Millions of tables

2016-09-30 Thread Jim Nasby
On 9/29/16 6:11 AM, Alex Ignatov (postgrespro) wrote: With millions of tables you have to set autovacuum_max_workers sky-high =). We have seen situations where, with thousands of tables, autovacuum can’t vacuum all the tables that need it. It simply vacuums some of the most modified tables and never reaches
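For reference, the settings under discussion can be raised like so (values are illustrative only, not a recommendation):

  -- Illustrative only: more autovacuum workers, shorter naptime
  ALTER SYSTEM SET autovacuum_max_workers = 10;  -- default 3; requires a server restart
  ALTER SYSTEM SET autovacuum_naptime = '10s';   -- default 1min; takes effect on reload
  SELECT pg_reload_conf();

Note that the workers share autovacuum_vacuum_cost_limit, so adding workers without raising the cost limit makes each worker proportionally slower.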

Re: [PERFORM] Millions of tables

2016-09-29 Thread Alex Ignatov (postgrespro)
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Greg Spiegelberg Sent: Tuesday, September 27, 2016 7:28 PM To: Terry Schmitt <tschm...@schmittworks.com> Cc: pgsql-performa. <pgsql-performance@postgresql.org> Subject:

Re: [PERFORM] Millions of tables

2016-09-29 Thread Simon Riggs
On 26 September 2016 at 05:19, Greg Spiegelberg wrote: > I did look at PostgresXL and CitusDB. Both are admirable however neither > could support the need to read a random record consistently under 30ms. > It's a similar problem Cassandra and others have: network latency.

Re: [PERFORM] Millions of tables

2016-09-28 Thread Stephen Frost
Greg, * Greg Spiegelberg (gspiegelb...@gmail.com) wrote: > On Wed, Sep 28, 2016 at 11:27 AM, Stephen Frost wrote: > > * Greg Spiegelberg (gspiegelb...@gmail.com) wrote: > > > Bigger buckets mean a wider spread of response times. Some buckets > > > may contain 140k

Re: [PERFORM] Millions of tables

2016-09-28 Thread Greg Spiegelberg
On Wed, Sep 28, 2016 at 11:27 AM, Stephen Frost wrote: > Greg, > > * Greg Spiegelberg (gspiegelb...@gmail.com) wrote: > > Bigger buckets mean a wider spread of response times. Some buckets > > may contain 140k records and some 100X more. > > Have you analyzed the depth

Re: [PERFORM] Millions of tables

2016-09-28 Thread Stephen Frost
Greg, * Greg Spiegelberg (gspiegelb...@gmail.com) wrote: > Bigger buckets mean a wider spread of response times. Some buckets > may contain 140k records and some 100X more. Have you analyzed the depth of the btree indexes to see how many more pages need to be read to handle finding a row
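One way to measure the btree depth in question is the pgstattuple extension (assuming it is installed; the index name below is hypothetical):

  CREATE EXTENSION IF NOT EXISTS pgstattuple;
  -- tree_level = levels below the root; each extra level is roughly
  -- one more page read on the way to a row
  SELECT tree_level, index_size, avg_leaf_density
  FROM pgstatindex('bucket_000042_pkey');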

Re: [PERFORM] Millions of tables

2016-09-28 Thread Greg Spiegelberg
On Wed, Sep 28, 2016 at 9:39 AM, Vitalii Tymchyshyn wrote: > Have you considered having many databases (e.g. 100) and possibly many > postgresql servers (e.g. 10) started on different ports? > This would give you 1000x fewer tables per db. > The system design already allows for many

Re: [PERFORM] Millions of tables

2016-09-28 Thread Richard Albright
If going that route, why not just use plproxy? On Wed, Sep 28, 2016 at 11:39 AM, Vitalii Tymchyshyn wrote: > Have you considered having many databases (e.g. 100) and possibly many > postgresql servers (e.g. 10) started on different ports? > This would give you 1000x fewer tables per
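For the archives: PL/Proxy makes that routing a one-function affair. A sketch, with the cluster and table names invented for illustration:

  -- Hypothetical sketch: route a key lookup to one of N partitions by hash
  CREATE FUNCTION get_record(p_key text)
  RETURNS SETOF records AS $$
      CLUSTER 'bucket_cluster';  -- resolved via the plproxy configuration functions
      RUN ON hashtext(p_key);    -- hash picks the target partition
      SELECT * FROM records WHERE key = p_key;
  $$ LANGUAGE plproxy;

The application keeps calling one function on one host while the data lives on however many servers the cluster defines.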

Re: [PERFORM] Millions of tables

2016-09-28 Thread Vitalii Tymchyshyn
Have you considered having many databases (e.g. 100) and possibly many postgresql servers (e.g. 10) started on different ports? This would give you 1000x fewer tables per db.

Re: [PERFORM] Millions of tables

2016-09-27 Thread Greg Spiegelberg
On Tue, Sep 27, 2016 at 10:15 AM, Terry Schmitt wrote: > > > On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg > wrote: > >> Hey all, >> >> Obviously everyone who's been in PostgreSQL or almost any RDBMS for a >> time has said not to have

Re: [PERFORM] Millions of tables

2016-09-27 Thread Terry Schmitt
On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg wrote: > Hey all, > > Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time > has said not to have millions of tables. I too had long believed it until > recently. > > AWS d2.8xlarge instance with

Re: [PERFORM] Millions of tables

2016-09-27 Thread Greg Spiegelberg
On Tue, Sep 27, 2016 at 9:42 AM, Mike Sofen wrote: > *From:* Mike Sofen *Sent:* Tuesday, September 27, 2016 8:10 AM > > *From:* Greg Spiegelberg *Sent:* Monday, September 26, 2016 7:25 AM > I've gotten more responses than anticipated and have answered some > questions and

Re: [PERFORM] Millions of tables

2016-09-27 Thread Greg Spiegelberg
On Tue, Sep 27, 2016 at 8:30 AM, Craig James wrote: > On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg > wrote: > >> Hey all, >> >> Obviously everyone who's been in PostgreSQL or almost any RDBMS for a >> time has said not to have millions of

Re: [PERFORM] Millions of tables

2016-09-27 Thread Mike Sofen
From: Mike Sofen Sent: Tuesday, September 27, 2016 8:10 AM From: Greg Spiegelberg Sent: Monday, September 26, 2016 7:25 AM I've gotten more responses than anticipated and have answered some questions and gotten some insight but my challenge again is what should I capture along the way to

Re: [PERFORM] Millions of tables

2016-09-27 Thread Mike Sofen
From: Greg Spiegelberg Sent: Monday, September 26, 2016 7:25 AM I've gotten more responses than anticipated and have answered some questions and gotten some insight but my challenge again is what should I capture along the way to prove or disprove this storage pattern? Alternatives to

Re: [PERFORM] Millions of tables

2016-09-27 Thread Craig James
On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg wrote: > Hey all, > > Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time > has said not to have millions of tables. I too had long believed it until > recently. > > AWS d2.8xlarge instance with

Re: [PERFORM] Millions of tables

2016-09-26 Thread Tom Lane
Jeff Janes writes: > A problem is that those statistics are stored in one file (per database; it > used to be one file per cluster). With 8 million tables, that is going to > be a pretty big file. But the code pretty much assumes the file is going > to be pretty small, and

Re: [PERFORM] Millions of tables

2016-09-26 Thread Jeff Janes
On Mon, Sep 26, 2016 at 5:53 AM, Greg Spiegelberg wrote: > >> >> I may need to understand autovacuum better. My impression was it > consulted statistics and performed vacuums one table at a time based on the > vacuum threshold formula on https://www.postgresql.org/ >
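For reference, the formula behind that impression (from the autovacuum documentation): a table becomes a vacuum candidate once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, i.e. 50 + 0.2 * reltuples with default settings. The per-table dead-tuple counts it consults are visible directly:

  SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;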

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Sun, Sep 25, 2016 at 8:50 PM, Greg Spiegelberg wrote: > Hey all, > > Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time > has said not to have millions of tables. I too had long believed it until > recently. > > AWS d2.8xlarge instance with

Re: [PERFORM] Millions of tables

2016-09-26 Thread Stuart Bishop
On 26 September 2016 at 20:51, Greg Spiegelberg wrote: > > An alternative if you exhaust or don't trust other options: use a foreign >> data wrapper to access your own custom storage. A single table at the PG >> level; you can shard the data yourself into 8 bazillion
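To make the suggestion concrete, the SQL face of such a setup is small. Everything below is hypothetical (the wrapper itself would have to be implemented, typically in C):

  -- Hypothetical custom FDW presenting self-sharded storage as one table
  CREATE EXTENSION my_shard_fdw;                         -- invented wrapper name
  CREATE SERVER shard_store FOREIGN DATA WRAPPER my_shard_fdw;
  CREATE FOREIGN TABLE records (key text, payload bytea)
      SERVER shard_store
      OPTIONS (shard_count '8000000');                   -- invented option

Queries then hit a single PG-level relation while the wrapper fans out to the underlying shards.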

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Mon, Sep 26, 2016 at 4:23 AM, Rick Otten wrote: > Are the tables constantly being written to, or is this a mostly read > scenario? One architecture possibility, if the writes are not so > frequent, is to create just a handful of very big tables for writing, and >

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Mon, Sep 26, 2016 at 7:05 AM, Mike Sofen wrote: > *From:* Rick Otten *Sent:* Monday, September 26, 2016 3:24 AM > Are the tables constantly being written to, or is this a mostly read > scenario? > > > > With regards to consistent query performance, I think you need to get

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
Consider the problem, though. Random access to trillions of records with no guarantee any one will be fetched twice in a short time frame nullifies the effectiveness of a cache unless the cache is enormous. If such a cache were that big, 100s of TBs, I wouldn't be looking at on-disk storage

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Mon, Sep 26, 2016 at 3:43 AM, Stuart Bishop wrote: > On 26 September 2016 at 11:19, Greg Spiegelberg > wrote: > >> I did look at PostgresXL and CitusDB. Both are admirable however neither >> could support the need to read a random record

Re: [PERFORM] Millions of tables

2016-09-26 Thread Mike Sofen
From: Rick Otten Sent: Monday, September 26, 2016 3:24 AM Are the tables constantly being written to, or is this a mostly read scenario? With regards to consistent query performance, I think you need to get out of AWS. That environment is terrible if you are going for consistency unless

Re: [PERFORM] Millions of tables

2016-09-26 Thread Yves Dorfsman
Something that has not been talked about at all in this thread is caching. A bunch of memcache servers in front of the DB should be able to help with the 30ms constraint (it doesn't have to be memcache; any caching technology would do). -- http://yves.zioup.com gpg: 4096R/32B0F416

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
Following list etiquette, responses are inline ;) On Mon, Sep 26, 2016 at 2:28 AM, Álvaro Hernández Tortosa wrote: > > > On 26/09/16 05:50, Greg Spiegelberg wrote: > >> Hey all, >> >> Obviously everyone who's been in PostgreSQL or almost any RDBMS for a >> time has said not to have

Re: [PERFORM] Millions of tables

2016-09-26 Thread Rick Otten
Are the tables constantly being written to, or is this a mostly read scenario? One architecture possibility, if the writes are not so frequent, is to create just a handful of very big tables for writing, and then make smaller tables as materialized views for reading. The vacuum and bloat
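A minimal sketch of that write-table/read-view split (all names invented for illustration):

  -- Hypothetical: one big append-only table, small read-optimized views
  CREATE MATERIALIZED VIEW bucket_000042_mv AS
      SELECT key, payload FROM big_ingest_table WHERE bucket = 42;
  CREATE UNIQUE INDEX ON bucket_000042_mv (key);  -- required for CONCURRENTLY
  REFRESH MATERIALIZED VIEW CONCURRENTLY bucket_000042_mv;

REFRESH ... CONCURRENTLY (9.4+) lets reads continue during the refresh, at the cost of the unique index and a slower refresh.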

Re: [PERFORM] Millions of tables

2016-09-26 Thread Stuart Bishop
On 26 September 2016 at 11:19, Greg Spiegelberg wrote: > I did look at PostgresXL and CitusDB. Both are admirable however neither > could support the need to read a random record consistently under 30ms. > It's a similar problem Cassandra and others have: network

Re: [PERFORM] Millions of tables

2016-09-26 Thread Álvaro Hernández Tortosa
On 26/09/16 05:50, Greg Spiegelberg wrote: Hey all, Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time has said not to have millions of tables. I too had long believed it until recently. AWS d2.8xlarge instance with 9.5 is my test rig using XFS on EBS (io1) for

Re: [PERFORM] Millions of tables

2016-09-26 Thread julyanto SUTANDANG
(Sorry for my last email, which also wasn't bottom-posted.) Hi Greg, On Mon, Sep 26, 2016 at 11:19 AM, Greg Spiegelberg wrote: > I did look at PostgresXL and CitusDB. Both are admirable however neither > could support the need to read a random record consistently under

Re: [PERFORM] Millions of tables

2016-09-26 Thread Jeff Janes
On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg wrote: > Hey all, > > Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time > has said not to have millions of tables. I too had long believed it until > recently. > > AWS d2.8xlarge instance with

Re: [PERFORM] Millions of tables

2016-09-25 Thread Gavin Flower
Hi Greg, Please follow the conventions of this mailing list to avoid confusion - see the bottom of this posting for further comments. On 26/09/16 17:05, Greg Spiegelberg wrote: Precisely why I shared with the group. I must understand the risks involved. I need to explore whether it can be stable

Re: [PERFORM] Millions of tables

2016-09-25 Thread Greg Spiegelberg
I did look at PostgresXL and CitusDB. Both are admirable however neither could support the need to read a random record consistently under 30ms. It's a similar problem Cassandra and others have: network latency. At this scale, to provide the ability to access any given record amongst trillions

Re: [PERFORM] Millions of tables

2016-09-25 Thread Greg Spiegelberg
Precisely why I shared with the group. I must understand the risks involved: I need to explore whether it can be stable at this size and, if not, at what point it becomes unstable. Aside from locking down user access to superuser, is there a way to prohibit database-wide VACUUM & ANALYZE? Certainly putting my trust
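On the VACUUM question: as far as I know there is no privilege that blocks a plain, database-wide VACUUM (it simply skips tables the role cannot vacuum). What can be controlled is autovacuum per table, a related knob rather than a full answer (table name invented):

  -- Disable autovacuum for one table; manual VACUUM remains possible
  ALTER TABLE bucket_000001 SET (autovacuum_enabled = off);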

Re: [PERFORM] Millions of tables

2016-09-25 Thread Mike Sofen
From: Greg Spiegelberg Sent: Sunday, September 25, 2016 7:50 PM … Over the weekend, I created 8M tables with 16M indexes on those tables. … A system or database crash could potentially take hours to days to recover. There are likely other issues ahead. You may wonder, "why is Greg

Re: [PERFORM] Millions of tables

2016-09-25 Thread julyanto SUTANDANG
Dear Greg, Have you checked PostgresXL? With millions of tables, how do the apps choose which table is appropriate? In my opinion, at that scale it should use parallel query with data sharding, like what PostgresXL does. Thanks, Julyanto SUTANDANG Equnix Business Solutions, PT (An Open