Re: [PERFORM] Millions of tables

2016-09-26 Thread Jeff Janes
On Mon, Sep 26, 2016 at 5:53 AM, Greg Spiegelberg wrote: > I may need to understand autovacuum better. My impression was it consulted statistics and performed vacuums one table at a time based on the vacuum threshold formula on https://www.postgresql.org/
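
For reference, the threshold formula those docs describe boils down to: vacuum a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A rough per-table check against the stats views, assuming the default settings of 50 and 0.2:

  -- tables closest to (or past) their autovacuum trigger point
  SELECT relname,
         n_dead_tup,
         50 + 0.2 * n_live_tup AS vacuum_threshold
  FROM   pg_stat_user_tables
  ORDER  BY n_dead_tup DESC
  LIMIT  10;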

Re: [PERFORM] Millions of tables

2016-09-26 Thread Tom Lane
Jeff Janes writes: > A problem is that those statistics are stored in one file (per database; it used to be one file per cluster). With 8 million tables, that is going to be a pretty big file. But the code pretty much assumes the file is going to be pretty small, and
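
For the curious: on a 9.x-era cluster the per-database statistics file lives under the stats temp directory, so its size is easy to eyeball. A minimal check, assuming shell access to the data directory:

  SHOW stats_temp_directory;  -- default is pg_stat_tmp under $PGDATA
  -- then, from a shell: ls -lh $PGDATA/pg_stat_tmp/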

Re: [PERFORM] [HACKERS] temporary table vs array performance

2016-09-26 Thread David G. Johnston
It's considered bad form to post to multiple lists. Please pick the most relevant one - in this case I'd suggest -general. On Mon, Sep 26, 2016 at 8:39 AM, dby...@163.com wrote: > Arrays are not convenient to use in a function; are there other methods that could replace

[PERFORM] temporary table vs array performance

2016-09-26 Thread dby...@163.com
test: create type h3 as (id int,name char(10)); CREATE or replace FUNCTION proc17() RETURNS SETOF h3 AS $$ DECLARE v_rec h3; BEGIN create temp table abc(id int,name varchar) on commit drop; insert into abc select 1,'lw'; insert into abc select 2,'lw2'; for v_rec in select * from abc loop
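
The archive cuts the function off mid-loop. A complete version of the temp-table variant, reconstructed along the same lines (the RETURN NEXT ending is an assumption, since the original body is truncated):

  create type h3 as (id int, name char(10));

  CREATE OR REPLACE FUNCTION proc17()
  RETURNS SETOF h3 AS $$
  DECLARE
      v_rec h3;
  BEGIN
      -- the temp table is created on every call and dropped at commit,
      -- so calling this twice in one transaction would fail
      create temp table abc(id int, name varchar) on commit drop;
      insert into abc select 1, 'lw';
      insert into abc select 2, 'lw2';
      FOR v_rec IN SELECT * FROM abc LOOP
          RETURN NEXT v_rec;
      END LOOP;
      RETURN;
  END;
  $$ LANGUAGE plpgsql;

  -- usage: SELECT * FROM proc17();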

Re: [PERFORM] Problem with performance using query with unnest after migrating from V9.1 to V9.2 and higher

2016-09-26 Thread Knels, Udo
Hi, Thank you very much for your answers. Yes, 50 rows aren't enough, but the original table has about 14 million rows, and after analyzing the table I got the same result. We changed our functions to use string_to_array instead of unnest, and it's OK now. It was not only a problem with one
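
For anyone following along, the swap described is roughly this shape (the table and values here are invented purely for illustration):

  -- before: expanding values to rows with unnest
  SELECT * FROM t WHERE id IN (SELECT unnest(ARRAY[1, 2, 3]));

  -- after: comparing against the array directly
  SELECT * FROM t WHERE id = ANY (string_to_array('1,2,3', ',')::int[]);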

Re: [PERFORM] [HACKERS] temporary table vs array performance

2016-09-26 Thread Pavel Stehule
2016-09-26 17:39 GMT+02:00 dby...@163.com: > test: create type h3 as (id int,name char(10)); CREATE or replace FUNCTION proc17() RETURNS SETOF h3 AS $$ DECLARE v_rec h3; BEGIN create temp table abc(id int,name varchar) on commit drop; insert into abc
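
The array-based counterpart being compared never survives the truncation; here is a sketch of what it presumably looks like, using a plpgsql array in place of the temp table (proc18 is an assumed name):

  CREATE OR REPLACE FUNCTION proc18()
  RETURNS SETOF h3 AS $$
  DECLARE
      v_arr h3[];
      v_rec h3;
  BEGIN
      -- build the rows in an array instead of a temp table
      v_arr := ARRAY[(1, 'lw')::h3, (2, 'lw2')::h3];
      FOREACH v_rec IN ARRAY v_arr LOOP
          RETURN NEXT v_rec;
      END LOOP;
      RETURN;
  END;
  $$ LANGUAGE plpgsql;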

Re: [PERFORM] Storing large documents - one table or partition by doc?

2016-09-26 Thread Jim Nasby
Please CC the mailing list so others can chime in or learn... On 9/26/16 3:26 AM, Dev Nop wrote: What I would look into at this point is using int ranges and arrays to greatly reduce your overhead: CREATE TABLE ...( document_version_id int NOT NULL REFERENCES document_version
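
The quoted DDL is truncated; it appears to be heading toward something like this (every column past document_version_id is a guess, purely to illustrate the range-plus-array idea):

  CREATE TABLE document_version_chunk (
      document_version_id int NOT NULL REFERENCES document_version,
      -- one row covers a contiguous range of element positions...
      element_range int4range NOT NULL,
      -- ...with the payloads packed into a single array,
      -- amortizing the per-row overhead across many elements
      elements      text[]    NOT NULL
  );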

Re: [PERFORM] Millions of tables

2016-09-26 Thread Álvaro Hernández Tortosa
On 26/09/16 05:50, Greg Spiegelberg wrote: Hey all, Obviously everyone who's worked with PostgreSQL, or almost any RDBMS, for a while has said not to have millions of tables. I too had long believed it, until recently. AWS d2.8xlarge instance with 9.5 is my test rig, using XFS on EBS (io1) for

Re: [PERFORM] Storing large documents - one table or partition by doc?

2016-09-26 Thread Dev Nop
> If GUIDs *stored in a binary format* are too large, then you won't be terribly happy with the 24-byte per-row overhead in Postgres. Heh. In this case the ids have a life outside the database in various text formats. > What I would look into at this point is using int ranges and arrays
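
The size point is easy to verify with a quick check of on-disk value sizes (the UUID literal below is arbitrary):

  SELECT pg_column_size('123e4567-e89b-12d3-a456-426655440000'::uuid) AS as_uuid,  -- 16 bytes
         pg_column_size('123e4567-e89b-12d3-a456-426655440000'::text) AS as_text;  -- 37 bytes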

Re: [PERFORM] Millions of tables

2016-09-26 Thread Jeff Janes
On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg wrote: > Hey all, Obviously everyone who's worked with PostgreSQL, or almost any RDBMS, for a while has said not to have millions of tables. I too had long believed it, until recently. AWS d2.8xlarge instance with

Re: [PERFORM] Millions of tables

2016-09-26 Thread Rick Otten
Are the tables constantly being written to, or is this a mostly read scenario? One architecture possibility, if the writes are not so frequent, is to create just a handful of very big tables for writing, and then make smaller tables as materialized views for reading. The vacuum and bloat
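
A minimal sketch of that write-table-plus-materialized-view split (the names are invented; REFRESH ... CONCURRENTLY requires a unique index and PostgreSQL 9.4 or later):

  -- one big append-only table takes all the writes
  CREATE TABLE sensor_log (
      sensor_id bigint      NOT NULL,
      ts        timestamptz NOT NULL DEFAULT now(),
      reading   double precision
  );

  -- a smaller read-optimized view, refreshed on a schedule
  CREATE MATERIALIZED VIEW sensor_latest AS
      SELECT DISTINCT ON (sensor_id) sensor_id, ts, reading
      FROM sensor_log
      ORDER BY sensor_id, ts DESC;
  CREATE UNIQUE INDEX ON sensor_latest (sensor_id);

  -- readers keep seeing a consistent snapshot while this rebuilds
  REFRESH MATERIALIZED VIEW CONCURRENTLY sensor_latest;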

Re: [PERFORM] Millions of tables

2016-09-26 Thread Stuart Bishop
On 26 September 2016 at 11:19, Greg Spiegelberg wrote: > I did look at PostgresXL and CitusDB. Both are admirable however neither > could support the need to read a random record consistently under 30ms. > It's a similar problem Cassandra and others have: network

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
Following list etiquette, response inline ;) On Mon, Sep 26, 2016 at 2:28 AM, Álvaro Hernández Tortosa wrote: > On 26/09/16 05:50, Greg Spiegelberg wrote: >> Hey all, Obviously everyone who's worked with PostgreSQL, or almost any RDBMS, for a while has said not to have

Re: [PERFORM] Millions of tables

2016-09-26 Thread Mike Sofen
From: Rick Otten, Sent: Monday, September 26, 2016 3:24 AM: > Are the tables constantly being written to, or is this a mostly read scenario? With regards to consistent query performance, I think you need to get out of AWS. That environment is terrible if you are going for consistency unless

Re: [PERFORM] Millions of tables

2016-09-26 Thread Yves Dorfsman
Something that has not been talked about at all in this thread is caching. A bunch of memcache servers in front of the DB should be able to help with the 30ms constraint (it doesn't have to be memcache; any caching technology would do).

Re: [PERFORM] Millions of tables

2016-09-26 Thread julyanto SUTANDANG
(Sorry for my last email, which was also not bottom-posted.) Hi Greg, On Mon, Sep 26, 2016 at 11:19 AM, Greg Spiegelberg wrote: > I did look at PostgresXL and CitusDB. Both are admirable, however neither could support the need to read a random record consistently under

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Mon, Sep 26, 2016 at 3:43 AM, Stuart Bishop wrote: > On 26 September 2016 at 11:19, Greg Spiegelberg wrote: >> I did look at PostgresXL and CitusDB. Both are admirable however neither could support the need to read a random record

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Mon, Sep 26, 2016 at 4:23 AM, Rick Otten wrote: > Are the tables constantly being written to, or is this a mostly read scenario? One architecture possibility, if the writes are not so frequent, is to create just a handful of very big tables for writing, and

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
Consider the problem, though. Random access to trillions of records, with no guarantee any one will be fetched twice in a short time frame, nullifies the effectiveness of a cache unless the cache is enormous. If such a cache were that big, hundreds of TBs, I wouldn't be looking at on-disk storage

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Mon, Sep 26, 2016 at 7:05 AM, Mike Sofen wrote: > From: Rick Otten, Sent: Monday, September 26, 2016 3:24 AM: Are the tables constantly being written to, or is this a mostly read scenario? > With regards to consistent query performance, I think you need to get

Re: [PERFORM] Millions of tables

2016-09-26 Thread Stuart Bishop
On 26 September 2016 at 20:51, Greg Spiegelberg wrote: > An alternative, if you exhaust or don't trust other options: use a foreign data wrapper to access your own custom storage. A single table at the PG level, and you can shard the data yourself into 8 bazillion
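
For readers unfamiliar with the FDW route, the shape of it looks like this, using the stock file_fdw as a stand-in (the actual suggestion is a custom wrapper over your own storage engine; the names and CSV path here are illustrative):

  CREATE EXTENSION file_fdw;
  CREATE SERVER shard_store FOREIGN DATA WRAPPER file_fdw;

  -- one logical table at the PG level; the wrapper decides
  -- how and where the underlying data is actually stored
  CREATE FOREIGN TABLE readings (
      sensor_id bigint,
      reading   double precision
  ) SERVER shard_store
    OPTIONS (filename '/path/to/shard0.csv', format 'csv');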

Re: [PERFORM] Millions of tables

2016-09-26 Thread Greg Spiegelberg
On Sun, Sep 25, 2016 at 8:50 PM, Greg Spiegelberg wrote: > Hey all, Obviously everyone who's worked with PostgreSQL, or almost any RDBMS, for a while has said not to have millions of tables. I too had long believed it, until recently. AWS d2.8xlarge instance with