Re: [PERFORM] Databases vs Schemas

2009-10-10 Thread Merlin Moncure
On Fri, Oct 9, 2009 at 10:50 PM, Scott Carey sc...@richrelevance.com wrote:
 On 10/9/09 2:02 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis scott.o...@intand.com wrote:
 Over the next couple of months we will be creating an instance of our
 solution for each public school district in the US which is around 18,000.
 That means currently we would be creating 18,000 databases (all on one 
 server
 right now ­ which is running 8.4).  I am assuming this is probably not the
 best way of doing things.

 Schema advantages:
 *) maintenance advantages; all functions/trigger functions can be
 shared.  HUGE help if you use them
 *) can query shared data between schemas without major headaches
 *) a bit more efficiency especially if private data areas are small.
 kinda analogous to processes vs threads
 *) Can manage the complete system without changing database sessions.
 This is the kicker IMO.

 Database Advantages:
 *) More discrete.  Easier to distinctly create, dump, drop, or move to
 separate server
 *) Smaller system catalogs might give efficiency benefits


 I'm concerned how a system with 57 * 18000  1M tables will function.

 I've got 200,000 tables in one db (8.4), and some tools barely work.  The
 system catalogs get inefficient when large and psql especially has trouble.
 Tab completion takes forever, even if I make a schema s with one table in
 it and type s. and try and tab complete -- its as if its scanning all
 without a schema qualifier or using an index.  Sometimes it does not match
 valid tables at all, and sometimes regex matching fails too ('\dt
 schema.*_*_*' intermittently flakes out if it returns a lot of matches).
 Other than that the number of tables doesn't seem to cause much performance
 trouble.  The only exception is constraint exclusion which is fundamentally
 broken with too many tables on the performance and memory consumption side.

 Having a lot of tables really makes me wish VACUUM, ANALYZE, and other
 maintenance tools could partially matched object names with regex though.

 On the other hand, lots of databases probably has performance drawbacks too.
 And its maintenance drawbacks are even bigger.

 I certainly don't see any reason at all to try and put all of these in one
 schema.  The only useful choices are schemas vs databases.  I'd go for
 schemas unless the performance issues there are a problem.   Schemas can be
 dumped/restored/backed up independent of one another easily too.

They can, but: drop schema foo cascade; is a different operation than:
drop database foo;  The first is kinda surgical and the second is a
rocket launcher.  What would you rather have in battle?

For the record, just about every database I've ever designed has had
some of what I call 'de facto table partitioning' using
schemas/search_path tricks.  I'm working on a system right now that is
going to get very large and if I started to run into psql problems I'd
probably look at patching it, maybe \set an option  to simplify some
of the queries.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Databases vs Schemas

2009-10-10 Thread Chris Kratz
On Fri, Oct 9, 2009 at 11:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Scott Carey sc...@richrelevance.com writes:
  I've got 200,000 tables in one db (8.4), and some tools barely work.  The
  system catalogs get inefficient when large and psql especially has
 trouble.
  Tab completion takes forever, even if I make a schema s with one table
 in
  it and type s. and try and tab complete -- its as if its scanning all
  without a schema qualifier or using an index.

 The tab-completion queries have never been vetted for performance
 particularly :-(

 Just out of curiosity, how much does this help?

 alter function pg_table_is_visible(oid) cost 10;

 (You'll need to do it as superuser --- if it makes things worse, just
 set the cost back to 1.)

  Sometimes it does not match
  valid tables at all, and sometimes regex matching fails too ('\dt
  schema.*_*_*' intermittently flakes out if it returns a lot of matches).

 There are some arbitrary LIMIT 1000 clauses in those queries, which
 probably explains this ... but taking them out would likely cause
 libreadline to get indigestion ...

regards, tom lane


We ran into this exact situation with a pg 8.3 database and a very large
number of tables.  psql would wait for 20 to 30 seconds if the user was
unlucky enough to hit the tab key.  After doing some research with query
logging, explain analyze and some trial and error, we came to the same
conclusion.  Altering the cost for the pg_table_is_visible function to 10
fixed our performance problem immediately.  It appears that when the cost
was set to 1, that the query optimizer first ran the function over the
entire pg_class table.  By increasing the cost, it now only runs the
function over the rows returned by the other items in the where clause.

-chris


Re: [PERFORM] Databases vs Schemas

2009-10-10 Thread Scott Marlowe
On Sat, Oct 10, 2009 at 8:44 AM, Chris Kratz chris.kr...@vistashare.com wrote:

 alter function pg_table_is_visible(oid) cost 10;

 (You'll need to do it as superuser --- if it makes things worse, just
 set the cost back to 1.)

  Sometimes it does not match
  valid tables at all, and sometimes regex matching fails too ('\dt
  schema.*_*_*' intermittently flakes out if it returns a lot of matches).

 There are some arbitrary LIMIT 1000 clauses in those queries, which
 probably explains this ... but taking them out would likely cause
 libreadline to get indigestion ...

                        regards, tom lane


 We ran into this exact situation with a pg 8.3 database and a very large
 number of tables.  psql would wait for 20 to 30 seconds if the user was
 unlucky enough to hit the tab key.  After doing some research with query
 logging, explain analyze and some trial and error, we came to the same
 conclusion.  Altering the cost for the pg_table_is_visible function to 10
 fixed our performance problem immediately.  It appears that when the cost
 was set to 1, that the query optimizer first ran the function over the
 entire pg_class table.  By increasing the cost, it now only runs the
 function over the rows returned by the other items in the where clause.

We have a large number of objects in our db and this worked for me
too!  Thanks a lot.  As a side note, it also makes slony create set
stuff run really really slow as well, and I'm guessing there's a
similar trick for the slony functions I can add and see if it helps.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Databases vs Schemas

2009-10-09 Thread Scott Otis
I am seeking advice on what the best setup for the following would be.

 

My company provides a hosted web calendaring solution for school
districts.  For each school district we have a separate database.  Each
database has 57 tables.  There are a total of 649 fields in those
tables.  Here is a table of the different kinds of field and how many
there are:

 

time without time zone

bytea

date

smallint

boolean

integer

timestamp without time zone

numeric

text

9

4

8

1

79

195

36

8

309

 

 

Over the next couple of months we will be creating an instance of our
solution for each public school district in the US which is around
18,000.  That means currently we would be creating 18,000 databases (all
on one server right now - which is running 8.4).  I am assuming this is
probably not the best way of doing things.

 

I have read up on schemas and it looks like a good change to make would
be to create 1 database with 18,000 schemas.

 

Would that be a good idea?  What sort of issues should I be aware of
(administrative, management, performance, etc...)?  Is that too many
schemas to put into 1 database?  What are the limits on the number of
databases and schemas you can create?

 

Should I try to re-engineer things so that all 18,000 instances only use
1 database and 1 schema?

 

Let me know if you need any more info.

 

Any advice and information would be greatly appreciated.

 

Regards,

 

Scott Otis

CIO / Lead Developer

Intand

www.intand.com

 



Re: [PERFORM] Databases vs Schemas

2009-10-09 Thread Ben Chobot

Scott Otis wrote:


I am seeking advice on what the best setup for the following would be.

 

My company provides a hosted web calendaring solution for school 
districts.  For each school district we have a separate database.  
Each database has 57 tables.






Over the next couple of months we will be creating an instance of our 
solution for each public school district in the US which is around 
18,000. 




Why are you trying to keep all this information on one server? It seems 
like you have such perfectly independent silos of data, why not take the 
opportunity to scale out horizontally? It's usually a lot cheaper to buy 
4 machines of power x than one machine of power (4x).


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Databases vs Schemas

2009-10-09 Thread Merlin Moncure
On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis scott.o...@intand.com wrote:
 Over the next couple of months we will be creating an instance of our 
 solution for each public school district in the US which is around 18,000.  
 That means currently we would be creating 18,000 databases (all on one server 
 right now – which is running 8.4).  I am assuming this is probably not the 
 best way of doing things.

Schema advantages:
*) maintenance advantages; all functions/trigger functions can be
shared.  HUGE help if you use them
*) can query shared data between schemas without major headaches
*) a bit more efficiency especially if private data areas are small.
kinda analogous to processes vs threads
*) Can manage the complete system without changing database sessions.
This is the kicker IMO.

Database Advantages:
*) More discrete.  Easier to distinctly create, dump, drop, or move to
separate server
*) Smaller system catalogs might give efficiency benefits

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Databases vs Schemas

2009-10-09 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 I've got 200,000 tables in one db (8.4), and some tools barely work.  The
 system catalogs get inefficient when large and psql especially has trouble.
 Tab completion takes forever, even if I make a schema s with one table in
 it and type s. and try and tab complete -- its as if its scanning all
 without a schema qualifier or using an index.

The tab-completion queries have never been vetted for performance
particularly :-(

Just out of curiosity, how much does this help?

alter function pg_table_is_visible(oid) cost 10;

(You'll need to do it as superuser --- if it makes things worse, just
set the cost back to 1.)

 Sometimes it does not match
 valid tables at all, and sometimes regex matching fails too ('\dt
 schema.*_*_*' intermittently flakes out if it returns a lot of matches).

There are some arbitrary LIMIT 1000 clauses in those queries, which
probably explains this ... but taking them out would likely cause
libreadline to get indigestion ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Databases Vs. Schemas

2004-03-26 Thread CoL
hi

Josh Berkus wrote:

Stalin,


We are evaluating the options for having multiple databases vs. schemas on a
single database cluster for a custom grown app that we developed. Each app
installs same set of tables for each service. And the service could easily
be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a
database cluster. What are the performance overhead of having multiple
databases vs. schemas (if any). I'm leaning towards having schemas rather
than databases but i would like to get others opinion on this. Appreciate
your reply.


No performance difference AFAIK.   The real question is whether you have to 
have queries joining several databases.   If yes, use Schema; if no, use 
databases.
don't forget the pg_hba.conf :) You need 1000 declaration. Was a thread 
before, title: performance problem - 10.000 databases
Check this:
http://groups.google.com/groups?hl=enlr=ie=UTF-8threadm=1068039213.28814.116.camel%40franki-laptop.tpi.plrnum=10prev=/groups%3Fq%3D1000%2Bdatabase%2Bgroup:comp.databases.postgresql.*%26hl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.databases.postgresql.*%26selm%3D1068039213.28814.116.camel%2540franki-laptop.tpi.pl%26rnum%3D10

C.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Databases Vs. Schemas

2004-03-26 Thread Subbiah, Stalin
Hi All,

We are evaluating the options for having multiple databases vs. schemas on a
single database cluster for a custom grown app that we developed. Each app
installs same set of tables for each service. And the service could easily
be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a
database cluster. What are the performance overhead of having multiple
databases vs. schemas (if any). I'm leaning towards having schemas rather
than databases but i would like to get others opinion on this. Appreciate
your reply.

Thanks,
Stalin




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Databases Vs. Schemas

2004-03-22 Thread Josh Berkus
Stalin,

 We are evaluating the options for having multiple databases vs. schemas on a
 single database cluster for a custom grown app that we developed. Each app
 installs same set of tables for each service. And the service could easily
 be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a
 database cluster. What are the performance overhead of having multiple
 databases vs. schemas (if any). I'm leaning towards having schemas rather
 than databases but i would like to get others opinion on this. Appreciate
 your reply.

No performance difference AFAIK.   The real question is whether you have to 
have queries joining several databases.   If yes, use Schema; if no, use 
databases.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match