Re: [PERFORM] Number of tables

2009-08-31 Thread Mike Ivanov

Fabio La Farcioli wrote:

Hi to all,

I am developing a web app for thousands of users (1,000/2,000).

Each user has 2 work tables... so in the end I have 2,000 (users) x 2
tables = 4,000 tables!


As someone with a ~50K-table database, I can tell you it's definitely
possible to survive with such a layout :-)


However, expect pg_dump, \dt, and everything else that has to read
schema information for the whole DB to be very slow (hours).
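
With tens of thousands of entries in pg_class, anything that has to read
and format the whole catalog turns from instant into a coffee break,
e.g. (illustrative commands):

  \dt *.*                       -- list every table, from psql
  pg_dump --schema-only mydb    -- dump just the schema

Queries against your own tables are unaffected; it's the whole-schema
operations that hurt.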



Mike




Re: [PERFORM] Number of tables

2009-08-31 Thread Greg Stark
On Tue, Sep 1, 2009 at 1:19 AM, Mike Ivanov mi...@activestate.com wrote:
 I am developing a web app for thousands of users (1,000/2,000).

 Each user has 2 work tables... so in the end I have 2,000 (users) x 2
 tables = 4,000 tables!

 As someone with a ~50K-table database, I can tell you it's definitely
 possible to survive with such a layout :-)

The usual recommendation is to have a single table (or two tables in
this case) with userid forming part of the primary key, in addition to
whatever identifies the records within the user's set of data. You may
not expect to need to run queries which combine multiple users'
data now, but eventually you will.
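
For illustration, a minimal sketch of that layout (table and column
names here are invented):

CREATE TABLE user_work (
    user_id integer NOT NULL,  -- the owning user
    item_id integer NOT NULL,  -- identifies the record within that
                               -- user's set of data
    payload text,
    PRIMARY KEY (user_id, item_id)
);

One user's rows are then just WHERE user_id = ..., and cross-user
queries need nothing special.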

This doesn't work so great when each user is going to be specifying
their own custom schema on the fly, but that's not really what
relational databases were designed for. For that you might want to
look into the hstore contrib module or something like CouchDB (which
can be combined with Postgres, I hear).
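
A minimal sketch of the hstore route, assuming the contrib module is
installed (all names invented):

CREATE TABLE user_data (
    user_id integer NOT NULL,
    item_id serial,
    attrs   hstore,  -- free-form per-user key/value fields
    PRIMARY KEY (user_id, item_id)
);

INSERT INTO user_data (user_id, attrs)
    VALUES (42, 'color => red, size => XL');

SELECT attrs -> 'color' FROM user_data WHERE user_id = 42;

Each user can then store whatever keys they like without any DDL.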


-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Number of tables

2009-08-31 Thread Mike Ivanov

Greg Stark wrote:

You may not expect to need to run queries which combine multiple users'
data now, but eventually you will.


We store cross-user data in a separate schema, which solves all *our* 
problems.
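
Something along these lines, sketched with invented names:

CREATE SCHEMA shared;

CREATE TABLE shared.activity (
    user_id    integer,
    event      text,
    created_at timestamptz DEFAULT now()
);

Per-user objects stay in their own schemas; anything that has to be
queried across users lives in shared.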



This doesn't work so great when each user is going to be specifying
their own custom schema on the fly 


This works fine; at least, we didn't encounter any issues with it.


but that's not really what
relational databases were designed for. 


Sometimes you have to do... you know, unusual things to meet some
specific requirements, like independent user schemas. It's not a
conventional web app we run :-)


I won't argue that it's a somewhat extreme approach, but if one is
forced down this path, it's quite walkable ;-)


Mike




Re: [PERFORM] Number of tables

2009-08-24 Thread Kevin Grittner
Greg Stark gsst...@mit.edu wrote:
 
 Creating new catalog entries for [temp tables] gives up -- what I
 think is the whole point of their design -- their lack of DDL
 overhead.
 
As long as we're brainstorming...  Would it make any sense for temp
tables to be created as in-memory tuplestores up to the point that we
hit the temp_buffers threshold?  Creating and deleting a whole set of
disk files per temp table is part of what makes them so heavy. 
(There's still the issue of dealing with the catalogs, of course)
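
For reference, temp_buffers is the existing per-session knob, and it
can only be changed before the first use of temp tables within a
session:

SET temp_buffers = '64MB';

so spilling from an in-memory tuplestore to disk at that threshold
would fit the current knobs naturally.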
 
-Kevin



Re: [PERFORM] Number of tables

2009-08-22 Thread Robert Haas
On Thu, Aug 20, 2009 at 8:38 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Greg Stark wrote:

 It would be nice to have a solution to that where you could create
 lightweight temporary objects which belong to an application session
 which can be picked up by a different database connection each go
 around.

 It would be useful:

 CREATE SCHEMA session1234 UNLOGGED
  CREATE TABLE hitlist ( ... );

 Each table in the session1234 schema would not be WAL-logged, and
 would be automatically dropped on crash recovery (actually the whole
 schema would be).  But while the server is live it behaves like a
 regular schema/table and can be seen by all backends (i.e. not temp)

+1.  In fact, I don't even see why the unlogged property needs to be
a schema property.  I think you could just add a table reloption.
(There are some possible foot-gun scenarios if the option were changed
subsequent to table creation, so we'd either need to decide how to
deal with those, or decide not to allow it.)
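
Purely hypothetical syntax, just to make the idea concrete:

CREATE TABLE hitlist ( ... ) WITH (unlogged = true);

-- the foot-gun: what happens to existing, never-logged data here?
ALTER TABLE hitlist SET (unlogged = false);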

...Robert



Re: [PERFORM] Number of tables

2009-08-21 Thread Jerry Champlin
I think this requirement can be lumped into the "right hammer, right
nail" category instead of the "one hammer, all nails" category. There
are many memory-only or disk-backed in-memory key/value stores which
meet your requirements, like Redis and memcached.


-Jerry

Jerry Champlin | Absolute Performance Inc.

On Aug 20, 2009, at 5:52 PM, Greg Stark gsst...@mit.edu wrote:

On Thu, Aug 20, 2009 at 11:18 PM, Craig James
craig_ja...@emolecules.com wrote:

 Greg Stark wrote:

  What you want is a multi-column primary key where userid is part of
  the key. You don't want to have a separate table for each user unless
  each user has their own unique set of columns.

 Not always true.

 ...

 The primary difference is between
   delete from big_table where userid = xx
 versus
   truncate user_table



This is a valid point, but it's a fairly special case. For most
applications the overhead of deleting records and having to run vacuum
will be manageable, a small contribution to the normal vacuum
traffic. Assuming up front that the above is necessary is premature
optimization.


 There are also significant differences in performance for large
 inserts, because a single-user table almost never needs indexes at
 all, whereas a big table for everyone has to have at least one
 user-id column that's indexed.


Maintaining indexes isn't free, but one index is hardly going to be a
dealbreaker.

 Once the hitlist is populated, the user can page through it quickly
 with no further searching, e.g. using a web app.


The traditional approach to this would be a temporary table. However,
in the modern world of web applications, where the user session does
not map directly to a database session, that no longer works (well, it
never really worked in Postgres, where temporary tables are not so
lightweight :( ).

It would be nice to have a solution to that where you could create
lightweight temporary objects which belong to an application session
which can be picked up by a different database connection each go
around.

--
greg
http://mit.edu/~gsstark/resume.pdf






[PERFORM] Number of tables

2009-08-20 Thread Fabio La Farcioli

Hi to all,

I am developing a web app for thousands of users (1,000/2,000).

Each user has 2 work tables... so in the end I have 2,000 (users) x 2
tables = 4,000 tables!


Does Postgres support such a high number of tables?
Will I have performance problems?


Thanks

Sorry for my English



Re: [PERFORM] Number of tables

2009-08-20 Thread Fabio La Farcioli

Craig Ringer wrote:

On Thu, 2009-08-20 at 09:01 +0200, Fabio La Farcioli wrote:

Each user has 2 work tables... so in the end I have 2,000 (users) x 2
tables = 4,000 tables!


Hmm, ok. Does each user really need two tables each? Why?

Does the set of tables for each user have a different structure? Or are
you separating them so you can give each user a separate database role
and ownership of their own tables?


No no...


Will I have performance problems?


Yes, you probably will. There is a cost to having _lots_ of tables in
PostgreSQL in terms of maintaining table statistics, autovacuum work,
etc. I doubt it'll be too bad at 4000 tables, but if your user numbers
keep growing it could become a problem.


The number of users will probably increase over time...


Other concerns are that it'll also be hard to maintain your design,
difficult to write queries that read data from more than one user, etc.
If you need to change the schema of your user tables you're going to
have to write custom tools to automate it. It could get very clumsy.


That's true... I hadn't thought about that problem.



Note that whether this is a good idea DOES depend on how much data
you're going to have. If each user table will have _lots_ of data, then
individual tables might be a better approach after all. It's also a
benefit if you do intend to give each user their own database role.


Every table has between 1,000 and 100,000 (max) records...

Do you think I won't have performance problems?
Each user only views the records with their own user_id.

I am thinking of redesigning the DB.




Re: [PERFORM] Number of tables

2009-08-20 Thread Jochen Erwied
Thursday, August 20, 2009, 9:01:30 AM you wrote:

 I am developing a web app for thousands of users (1,000/2,000).

 Each user has 2 work tables... so in the end I have 2,000 (users) x 2
 tables = 4,000 tables!

If all the tables are created equal, I would rethink the design. Instead
of using 2 tables per user, I'd use 2 tables with one column specifying
the user(-id).

In particular, any change in table layout would require you to alter up
to 2,000 tables, which is prone to errors...

-- 
Jochen Erwied |   home: joc...@erwied.eu +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 |   work: j...@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erw...@vodafone.de   +49-173-5404164




Re: [PERFORM] Number of tables

2009-08-20 Thread Craig Ringer
On Thu, 2009-08-20 at 09:01 +0200, Fabio La Farcioli wrote:

 Each user has 2 work tables... so in the end I have 2,000 (users) x 2
 tables = 4,000 tables!

Hmm, ok. Does each user really need two tables each? Why?

Does the set of tables for each user have a different structure? Or are
you separating them so you can give each user a separate database role
and ownership of their own tables?


 Does Postgres support such a high number of tables?

Thousands? Sure.

 Will I have performance problems?
Yes, you probably will. There is a cost to having _lots_ of tables in
PostgreSQL in terms of maintaining table statistics, autovacuum work,
etc. I doubt it'll be too bad at 4000 tables, but if your user numbers
keep growing it could become a problem.

Other concerns are that it'll also be hard to maintain your design,
difficult to write queries that read data from more than one user, etc.
If you need to change the schema of your user tables you're going to
have to write custom tools to automate it. It could get very clumsy.

Instead of one or two tables per user, perhaps you should keep the data
in just a few tables, with a composite primary key that includes the
user ID. E.g., given the users table:

CREATE TABLE users (  -- "user" is a reserved word, hence "users"
  id SERIAL PRIMARY KEY,
  name text
);

instead of:

CREATE TABLE user1_tablea(
  id INTEGER PRIMARY KEY,
  blah text,
  blah2 integer
);

CREATE TABLE user2_tablea(
  id INTEGER PRIMARY KEY,
  blah text,
  blah2 integer
);

... etc ...


you might write:

CREATE TABLE tablea (
  user_id INTEGER REFERENCES users(id),
  id INTEGER,
  PRIMARY KEY(user_id, id),
  blah text,
  blah2 integer
);


You can, of course, partition this table into blocks of user-IDs behind
the scenes, but your partitioning is invisible to your web app and can
be done solely for performance reasons. You don't have to try juggling
all these little tables.
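
For instance, with the inheritance-based partitioning available today
(ranges invented for illustration):

CREATE TABLE tablea_p0 ( CHECK (user_id BETWEEN 1    AND 1000) )
    INHERITS (tablea);
CREATE TABLE tablea_p1 ( CHECK (user_id BETWEEN 1001 AND 2000) )
    INHERITS (tablea);
-- each child needs its own indexes; with constraint_exclusion on,
-- a query with WHERE user_id = 1234 scans only tablea_p1

The web app keeps talking to tablea either way.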


Note that whether this is a good idea DOES depend on how much data
you're going to have. If each user table will have _lots_ of data, then
individual tables might be a better approach after all. It's also a
benefit if you do intend to give each user their own database role.

--
Craig Ringer





Re: [PERFORM] Number of tables

2009-08-20 Thread Craig James

Fabio La Farcioli wrote:

I am developing a web app for thousands of users (1,000/2,000).

Each user has 2 work tables... so in the end I have 2,000 (users) x 2
tables = 4,000 tables!


Does Postgres support such a high number of tables?
Will I have performance problems?


We have run databases with over 100,000 tables with no problems.

However, we found that it's not a good idea to have a table-per-user design.  
As you get more users, it is hard to maintain the database.  Most of the time 
there are only a few users active.

So, we create a single large archive table, identical to the per-user table except that 
it also has a user-id column.  When a user hasn't logged in for a few hours, a cron process copies 
their tables into the large archive table, and returns their personal tables to a pool 
of available tables.

When the user logs back in, a hidden part of the login process gets a table 
from the pool of available tables, assigns it to this user, and copies the 
user's  data from the archive into this personal table.  They are now ready to 
work. This whole process takes just a fraction of a second for most users.

We keep a pool of about 200 tables, which automatically will expand (create 
more tables) if needed, but we've never had more than 200 users active at one 
time.
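
A hypothetical sketch of that cycle (schema and names invented here,
since the original isn't shown):

-- login: claim a free table from the pool for user 42
UPDATE table_pool SET user_id = 42
 WHERE table_name = (SELECT table_name FROM table_pool
                      WHERE user_id IS NULL LIMIT 1);

-- restore the user's rows into the claimed table (say it was work_017)
INSERT INTO work_017 (row_id, data)
    SELECT row_id, data FROM archive WHERE user_id = 42;

-- hours after logout, via cron: archive the rows, recycle the table
INSERT INTO archive (user_id, row_id, data)
    SELECT 42, row_id, data FROM work_017;
TRUNCATE work_017;
UPDATE table_pool SET user_id = NULL WHERE table_name = 'work_017';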

Craig



Re: [PERFORM] Number of tables

2009-08-20 Thread Greg Stark
On Thu, Aug 20, 2009 at 9:16 PM, Craig James craig_ja...@emolecules.com wrote:
 Fabio La Farcioli wrote:

  I am developing a web app for thousands of users (1,000/2,000).

  Each user has 2 work tables... so in the end I have 2,000 (users) x 2
  tables = 4,000 tables!

  Does Postgres support such a high number of tables?
  Will I have performance problems?

What you want is a multi-column primary key where userid is part of
the key. You don't want to have a separate table for each user unless
each user has their own unique set of columns.


 When the user logs back in, a hidden part of the login process gets a table
 from the pool of available tables, assigns it to this user, and copies the
 user's  data from the archive into this personal table.  They are now ready
 to work. This whole process takes just a fraction of a second for most
 users.

And what does all this accomplish?


-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Number of tables

2009-08-20 Thread Craig James

Greg Stark wrote:

What you want is a multi-column primary key where userid is part of
the key. You don't want to have a separate table for each user unless
each user has their own unique set of columns.


Not always true.


When the user logs back in, a hidden part of the login process gets a table
from the pool of available tables, assigns it to this user, and copies the
user's  data from the archive into this personal table.  They are now ready
to work. This whole process takes just a fraction of a second for most
users.


And what does all this accomplish?


The primary difference is between

 delete from big_table where userid = xx

versus

 truncate user_table

There are also significant differences in performance for large inserts, 
because a single-user table almost never needs indexes at all, whereas a big 
table for everyone has to have at least one user-id column that's indexed.

In our application, the per-user tables are hitlists -- scratch lists
that are populated something like this:

  create sequence hitlist_seq_xxx;  -- one sequence per hitlist table

  create table hitlist_xxx (
    row_id integer,
    sortorder integer default nextval('hitlist_seq_xxx')
  );


  truncate table hitlist_xxx;
  select setval('hitlist_seq_xxx', 1, false);  -- restart numbering at 1
  insert into hitlist_xxx (row_id)
    (select some_id from ... where ... order by ...);

Once the hitlist is populated, the user can page through it quickly with no 
further searching, e.g. using a web app.

We tested the performance using a single large table in Postgres, and it was 
not nearly what we needed.  These hitlists tend to be transitory, and the 
typical operation is to discard the entire list and create a new one.  
Sometimes the user will sort the entire list based on some criterion, which 
also requires a copy/delete/re-insert using a new order-by.

With both Oracle and Postgres, truncate is MUCH faster than delete, and
the added index needed for a single large table only makes it worse.
With Postgres, the repeated large delete/insert cycle leaves tables that
need a lot of vacuuming and causes index bloat, further hurting
performance.

Craig



Re: [PERFORM] Number of tables

2009-08-20 Thread Greg Stark
On Thu, Aug 20, 2009 at 11:18 PM, Craig James craig_ja...@emolecules.com wrote:
 Greg Stark wrote:

 What you want is a multi-column primary key where userid is part of
 the key. You don't want to have a separate table for each user unless
 each user has their own unique set of columns.
 Not always true.
...
 The primary difference is between
  delete from big_table where userid = xx
 versus
  truncate user_table


This is a valid point, but it's a fairly special case. For most
applications the overhead of deleting records and having to run vacuum
will be manageable, a small contribution to the normal vacuum
traffic. Assuming up front that the above is necessary is premature
optimization.


 There are also significant differences in performance for large inserts,
 because a single-user table almost never needs indexes at all, whereas a big
 table for everyone has to have at least one user-id column that's indexed.

Maintaining indexes isn't free, but one index is hardly going to be a
dealbreaker.

 Once the hitlist is populated, the user can page through it quickly with no
 further searching, e.g. using a web app.

The traditional approach to this would be a temporary table. However,
in the modern world of web applications, where the user session does
not map directly to a database session, that no longer works (well, it
never really worked in Postgres, where temporary tables are not so
lightweight :( ).
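
For reference, the traditional shape is simply:

CREATE TEMP TABLE hitlist (
    row_id    integer,
    sortorder serial
);

but it is visible only to the connection that created it and vanishes
with the database session, which is exactly what doesn't fit a
pooled-connection web app.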

It would be nice to have a solution to that where you could create
lightweight temporary objects which belong to an application session
which can be picked up by a different database connection each go
around.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Number of tables

2009-08-20 Thread Alvaro Herrera
Greg Stark wrote:

 It would be nice to have a solution to that where you could create
 lightweight temporary objects which belong to an application session
 which can be picked up by a different database connection each go
 around.

It would be useful:

CREATE SCHEMA session1234 UNLOGGED
  CREATE TABLE hitlist ( ... );

Each table in the session1234 schema would not be WAL-logged, and
would be automatically dropped on crash recovery (actually the whole
schema would be).  But while the server is live it behaves like a
regular schema/table and can be seen by all backends (i.e. not temp)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Number of tables

2009-08-20 Thread Greg Stark
On Fri, Aug 21, 2009 at 1:38 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Greg Stark wrote:

 It would be nice to have a solution to that where you could create
 lightweight temporary objects which belong to an application session
 which can be picked up by a different database connection each go
 around.

 It would be useful:

 CREATE SCHEMA session1234 UNLOGGED
  CREATE TABLE hitlist ( ... );

 Each table in the session1234 schema would not be WAL-logged, and
 would be automatically dropped on crash recovery (actually the whole
 schema would be).  But while the server is live it behaves like a
 regular schema/table and can be seen by all backends (i.e. not temp)

I don't think unlogged is the only, and perhaps not even the most
important, desirable property.

I would want these objects not to cause catalog churn. I might have
thousands of sessions being created all the time, and creating new rows
and index pointers which then have to be vacuumed would be a headache.

I would actually want the objects to be invisible to other sessions,
at least by default. You would have to have the handle for the
application session to bring them into your scope, and then you would
get them all en masse. This isn't so much for security -- I would be
fine if there was a back door if you have the right privileges -- but
for application design, so that application queries could use prepared
plans without having to hard-code the session information into the
query and be replanned.

I'm not sure if they should use shared buffers or local buffers. As
long as only one backend at a time could access them it would be
possible to use local buffers and evict them all when the handle is
given up. But that means giving up any caching benefit across
sessions. On the other hand it means they'll be much lighter weight
and easier to make safely unlogged than if they lived in shared
buffers.

These are just some brainstorming ideas, I don't have a clear vision
of how to achieve all this yet. This does sound a lot like the SQL
standard temp table discussion and I think Tom and I are still at odds
on that. Creating new catalog entries for them gives up -- what I
think is the whole point of their design -- their lack of DDL
overhead. But my design above means problems for transactional
TRUNCATE and other DDL.


-- 
greg
http://mit.edu/~gsstark/resume.pdf
