Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-14 Thread Josh Berkus

 So the question I would ask goes more like "do you really need 32K
 databases in one installation?  Have you considered using schemas
 instead?"  Databases are, by design, pretty heavyweight objects.
 
 I agree, but at the same time, we might: a) update our documentation to
 indicate it depends on the filesystem, and b) consider how we might
 work around this limit (and if we feel the effort to be worth it).

I don't feel it's worth the effort.

I can think of lots of hosted application configurations where one might
need 33K tables.  Note that PostgreSQL *already* handles this better
than Oracle or MySQL do -- I know at least one case where our ability to
handle large numbers of tables was a reason for migration from Oracle to
PostgreSQL.

However, I can think of no legitimate reason to need 33K active
databases in a single instance.  I think someone has confused databases
with schemas ... or even with tables.  Filemaker developer, maybe?  Or
maybe it's 10 active databases and 32.99K archive ones ... in which case
they should be dumped to compressed backup and dropped.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 Ehhh, it's likely to be cached..  Sounds like a stretch to me that this
 would actually be a performance hit.  If it turns out to really be one,
 we could just wait to move to subdirectories until some threshold (eg-
 30k) is hit.

Thinking this through a bit more, I realized that I didn't explain my
thought here very well.  My idea would be-
do everything as we do now, until we hit a threshold (perhaps an easy
one would be '1').
Once we hit the threshold, create a subdirectory first and then the new
database directory in that.  eg:

0/
1/
2/
3/
4/
[...]
<threshold>/
1/0/
1/1/
1/2/
1/3/
1/[...]
1/<threshold>/
2/0/
2/1/
2/2/
2/[...]
2/<threshold>/
3/0/
3/1/
[...]
0/0/
0/1/
0/[...]
0/<threshold>/

This would allow for 220M+ databases.  I'm not sure how bad it'd be to
introduce another field to pg_database which provides the directory (as
it'd now be distinct from the oid..) or if that might require a lot of
changes.  Not sure how easy it'd be to implement something to address
this problem while we continue to tie the directory name to the oid.
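
To make the mapping concrete, here's a rough sketch in C (the threshold,
the per-directory cap, and the function name are all invented for
illustration -- this is not how the backend derives database paths today):

    /*
     * Rough sketch of the threshold scheme described above.  OIDs up to
     * DIR_THRESHOLD keep the current flat layout ("base/<oid>"); later
     * OIDs are spread across numbered subdirectories so no directory
     * ever needs more than MAX_PER_DIR entries.  All names and constants
     * here are invented for illustration -- a real implementation would
     * also have to keep shard directory names from colliding with
     * pre-threshold database directories, e.g. by recording the
     * directory in pg_database as suggested above.
     */
    #include <stdio.h>

    #define DIR_THRESHOLD 30000u    /* switch to nested layout past this OID */
    #define MAX_PER_DIR   30000u    /* cap on entries per subdirectory */

    static void
    database_path(unsigned int db_oid, char *buf, size_t buflen)
    {
        if (db_oid <= DIR_THRESHOLD)
            snprintf(buf, buflen, "base/%u", db_oid);
        else
        {
            unsigned int rest = db_oid - DIR_THRESHOLD - 1;

            snprintf(buf, buflen, "base/%u/%u",
                     rest / MAX_PER_DIR, rest % MAX_PER_DIR);
        }
    }

    int
    main(void)
    {
        unsigned int oids[] = {1, 16384, 30000, 30001, 65000, 200000000};
        char path[64];

        for (size_t i = 0; i < sizeof(oids) / sizeof(oids[0]); i++)
        {
            database_path(oids[i], path, sizeof(path));
            printf("oid %u -> %s\n", oids[i], path);
        }
        return 0;
    }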

Thanks,

Stephen




Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Mark Mielke

On 09/12/2009 03:33 PM, Stephen Frost wrote:

* Mark Mielke (m...@mark.mielke.cc) wrote:
   

No matter what scheme PostgreSQL uses for storing the data, there can be
underlying file system limitations.
 

This is true, but there's a reason we only create 1GB files too.  I
wouldn't be against a scheme such as described to minimize the impact to
PG of these limitations.
   


Ok - but minimizing the impact does not necessarily mean "keep doing 
what we are doing, but work around the issue."  One interpretation is 
that the problem is that PostgreSQL is trying to use too many 
sub-directories in the same containing directory. I would argue that the 
real problem is that PostgreSQL requires so many sub-directories in the 
first place.


There are many database designs that do not require one file per 
database. Berkeley DB JE, for instance, treats each database as one 
root in a larger tree. The entire database is stored in one set of 
files, where the files are created due to database volume, not database 
quantity. Tables can be thought of similarly.



There are many ways PostgreSQL could work around this problem - your
suggestion of using sub-directories being one of them - but what happens
if this causes performance degradation for existing users, due to the
extra file system lookups required on every access?
 

Ehhh, it's likely to be cached..  Sounds like a stretch to me that this
would actually be a performance hit.  If it turns out to really be one,
we could just wait to move to subdirectories until some threshold (eg-
30k) is hit.
   


Cached does not eliminate the cost. It just means it doesn't have to go 
to disk. It still needs to traverse an additional level of the VFS tree. 
Sure, this is designed to be cheap - but it leaves the real cost out of 
consideration: that of having so many subdirectories in the first place.



Another solution would be to store everything in the same file.
 

eh?
   


There is no technical requirement for PostgreSQL to separate data in 
databases or tables on subdirectory or file boundaries. Nothing wrong 
with having one or more large files that contain everything. PostgreSQL 
doesn't happen to do this today - but it's bothered me at times that it 
has so many files in the database directory - even very small tables 
require their own files.



In any case, I think this would be a significant architecture change for
something that sounds like a bad idea. I would expect having 32k
databases to have significant performance degradations in other ways.
 

Actually, I think some of the changes to remove flatfiles might improve
our performance with large numbers of databases.  I also don't see how
this would be a significant architecture change at all.  If there are
still issues that make having lots of databases slow, we might want to
look into fixing those issues rather than saying "well, just don't do
that."
   


I guess I'm not seeing how using 32k tables is a sensible model. So yes, 
things can be done to reduce the cost - but it seems like something is 
wrong if this is truly a requirement. There are alternative models of 
storage that would not require 32k tables, that likely perform better. 
Although, I don't know your requirements, so perhaps I am missing something.



In
particular, I am thinking about having to open a file descriptor for
each of these files. What sort of database architecture requires 32k
databases or tables for the same PostgreSQL instance? Have you
considered having an additional field for your primary key and combining
several tables into one?
 

I've got a ton of instances that have >32K tables.  My approach is
generally to keep the number of databases low, while having lots of
schemas, but there are distinct downsides to that (specifically related
to hiding information..  something a lot of people care about, but
thankfully I don't have to).
   


Do you agree with me that having 32k open file descriptors (or worse, 
open on demand file descriptors that need to be re-opened many times) is 
a problem?


Looking at PostgreSQL today - I don't think it's designed to scale to 
this. Looking at SQL today, I think I would find it difficult to justify 
creating a solution that requires this capability.


Honestly - it seems a bit insane. Sorry. :-) Maybe I'm just naive...

Cheers,
mark

--
Mark Mielke <m...@mielke.cc>




Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Mark Mielke

On 09/12/2009 03:48 PM, Stephen Frost wrote:

This would allow for 220M+ databases.  I'm not sure how bad it'd be to
introduce another field to pg_database which provides the directory (as
it'd now be distinct from the oid..) or if that might require a lot of
changes.  Not sure how easy it'd be to implement something to address
this problem while we continue to tie the directory name to the oid.
   


Other than bragging rights - what part of this would be a GOOD thing? :-)

My God - I thought 32k databases in the same directory was insane. 
220M+???


Hehehe...

If you can patch PostgreSQL to support such extremes without hurting my 
performance - I'll shut up and leave you be. :-)


Cheers,
mark

--
Mark Mielke <m...@mielke.cc>




Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Tom Lane
Stephen Frost <sfr...@snowman.net> writes:
 * Mark Mielke (m...@mark.mielke.cc) wrote:
 No matter what scheme PostgreSQL uses for storing the data, there can be  
 underlying file system limitations.

 This is true, but there's a reason we only create 1GB files too.  I
 wouldn't be against a scheme such as described to minimize the impact to
 PG of these limitations.

There are plenty of filesystems available that do not have this silly
limitation, so I don't see a reason for us to work around it.  If the
OP is on a platform that only offers UFS and ZFS, and he doesn't like
either of those, maybe he should find another platform.

regards, tom lane



Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Tom Lane
Mark Mielke <m...@mark.mielke.cc> writes:
 My God - I thought 32k databases in the same directory was insane. 
 220M+???

Considering that the system catalogs alone occupy about 5MB per
database, that would require an impressive amount of storage...
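
Back-of-the-envelope, using that ~5MB-per-database figure against the
counts being discussed:

    32,766 databases (the UFS limit)  x ~5 MB  =  ~160 GB of system catalogs
    220 million databases             x ~5 MB  =  ~1.1 PB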

In practice I think users would be complaining about our choice
to instantiate the catalogs per-database a lot sooner than they'd
hit the subdirectory-count limit.

BTW, there is another avenue that the OP could look into if he
really wants this many databases on a UFS filesystem: split them up
into multiple tablespaces.

regards, tom lane



Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Stephen Frost
* Mark Mielke (m...@mark.mielke.cc) wrote:
 There is no technical requirement for PostgreSQL to separate data in  
 databases or tables on subdirectory or file boundaries. Nothing wrong  
 with having one or more large files that contain everything.

Uhh, except where you run into system limitations on file size (eg- a 2G
max file size..).  You'll note PG creates files up to 1G and then splits
them into separate files.  It's not done just because it's fun.
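
For reference, with the default build options (8kB blocks, 1GB segment
files) block N of a relation lands in segment N / 131072; segment 0 is the
bare <relfilenode> file and later segments get a .1, .2, ... suffix.  A toy
sketch of that mapping (illustration only, not actual backend code):

    /*
     * Toy illustration of the 1GB splitting: with the default build
     * (BLCKSZ = 8kB, 1GB segment files) there are 131072 blocks per
     * segment.  Segment 0 is the bare relfilenode; later segments get
     * ".1", ".2", ... suffixes.  Sketch only, not backend code.
     */
    #include <stdio.h>

    #define BLCKSZ          8192u
    #define SEGMENT_BYTES   (1024u * 1024u * 1024u)     /* 1 GB */
    #define BLOCKS_PER_SEG  (SEGMENT_BYTES / BLCKSZ)    /* 131072 */

    static void
    segment_file(unsigned int relfilenode, unsigned int blocknum,
                 char *buf, size_t buflen)
    {
        unsigned int segno = blocknum / BLOCKS_PER_SEG;

        if (segno == 0)
            snprintf(buf, buflen, "%u", relfilenode);
        else
            snprintf(buf, buflen, "%u.%u", relfilenode, segno);
    }

    int
    main(void)
    {
        char name[64];

        segment_file(16384, 131071, name, sizeof(name));
        printf("block 131071 -> %s\n", name);    /* last block of segment 0 */

        segment_file(16384, 131072, name, sizeof(name));
        printf("block 131072 -> %s\n", name);    /* first block of segment 1 */
        return 0;
    }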

 I guess I'm not seeing how using 32k tables is a sensible model.

For one thing, there's partitioning.  For another, there's a large user
base.  32K tables is, to be honest, not all that many, especially for
some of these databases which reach into the multi-TB range..

 So yes,  
 things can be done to reduce the cost - but it seems like something is  
 wrong if this is truly a requirement.

I have no idea what you've been working with, but I hardly think it
makes sense for PG to consider over 32k tables as not worth supporting.

 There are alternative models of  
 storage that would not require 32k tables, that likely perform better.  

Eh?  You would advocate combining tables for no reason other than you
think it's bad to have a lot?

 Do you agree with me that having 32k open file descriptors (or worse,  
 open on demand file descriptors that need to be re-opened many times) is  
 a problem?

Nope.

 Looking at PostgreSQL today - I don't think it's designed to scale to  
 this. Looking at SQL today, I think I would find it difficult to justify  
 creating a solution that requires this capability.

Actually, I find that PG handles it pretty well.  And we used to be an
Oracle shop.

Thanks,

Stephen




Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Tom Lane
Stephen Frost <sfr...@snowman.net> writes:
 * Mark Mielke (m...@mark.mielke.cc) wrote:
 I guess I'm not seeing how using 32k tables is a sensible model.

 For one thing, there's partitioning.  For another, there's a large user
 base.  32K tables is, to be honest, not all that many, especially for
 some of these databases which reach into the multi-TB range..

I believe the filesystem limit the OP is hitting is on the number of
*subdirectories* per directory, not on the number of plain files.
If we had a hard limit at 32K tables many people would have hit it
before now.

So the question I would ask goes more like "do you really need 32K
databases in one installation?  Have you considered using schemas
instead?"  Databases are, by design, pretty heavyweight objects.

regards, tom lane



Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 I believe the filesystem limit the OP is hitting is on the number of
 *subdirectories* per directory, not on the number of plain files.

Right, I'm not entirely sure how we got onto the question of number of
tables.

 So the question I would ask goes more like "do you really need 32K
 databases in one installation?  Have you considered using schemas
 instead?"  Databases are, by design, pretty heavyweight objects.

I agree, but at the same time, we might: a) update our documentation to
indicate it depends on the filesystem, and b) consider how we might
work around this limit (and if we feel the effort to be worth it).

Thanks,

Stephen




Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Mark Mielke

On 09/12/2009 04:17 PM, Stephen Frost wrote:

* Mark Mielke (m...@mark.mielke.cc) wrote:
   

There is no technical requirement for PostgreSQL to separate data in
databases or tables on subdirectory or file boundaries. Nothing wrong
with having one or more large files that contain everything.
 

Uhh, except where you run into system limitations on file size (eg- a 2G
max file size..).  You'll note PG creates files up to 1G and then splits
them into separate files.  It's not done just because it's fun.
   


This becomes a bit of a side thread - but note that I carefully didn't 
say "exactly one file". I said "one or more large files that contain 
everything". That is, if we have 3 databases each of size 50 Mbytes, 
there is no technical reason why they cannot be stored within a single 
150 Mbyte data file. Sure, if it goes beyond 2G, we can break it into a 
set of files, and treat each file as a 2G block in a virtual larger 
storage pool. VMware does this for storing virtual drives.
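
As a toy sketch of what I mean by a virtual storage pool (names and sizes
invented; a real design would also need an allocation map from (database,
relation, block) to logical offsets, which is exactly the bookkeeping
PostgreSQL avoids today by giving each relation its own files):

    /*
     * Toy sketch of a "virtual larger storage pool": one big logical
     * address space carved into fixed-size chunk files, so a logical
     * offset maps to (chunk file, offset within that chunk).  Everything
     * here is invented for illustration.
     */
    #include <stdio.h>
    #include <stdint.h>

    #define CHUNK_BYTES  (2ULL * 1024 * 1024 * 1024)    /* 2 GB per chunk file */

    struct pool_location
    {
        uint64_t chunk_no;      /* which chunk file, e.g. "pool.000017" */
        uint64_t chunk_offset;  /* byte offset inside that chunk */
    };

    static struct pool_location
    pool_locate(uint64_t logical_offset)
    {
        struct pool_location loc;

        loc.chunk_no = logical_offset / CHUNK_BYTES;
        loc.chunk_offset = logical_offset % CHUNK_BYTES;
        return loc;
    }

    int
    main(void)
    {
        /* 150 MB (three 50 MB "databases") still fits entirely in chunk 0 */
        struct pool_location loc = pool_locate(150ULL * 1024 * 1024);

        printf("150 MB   -> chunk %llu, offset %llu\n",
               (unsigned long long) loc.chunk_no,
               (unsigned long long) loc.chunk_offset);

        /* anything past 2 GB spills over into the next chunk file */
        loc = pool_locate(CHUNK_BYTES + 4096);
        printf("2GB + 4k -> chunk %llu, offset %llu\n",
               (unsigned long long) loc.chunk_no,
               (unsigned long long) loc.chunk_offset);
        return 0;
    }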


If we assume that 32k *databases* is reasonable for a single instance, 
for 32k databases to *require* 32k immediate sub-directories is the real 
problem. This can be solved either by: 1) adding additional depth to the 
directory hierarchy to work around this limit (what the OP and you are 
proposing), or 2) storing multiple databases within the same files or 
sub-directories. If you really must have this amount of scalability, I 
am suggesting that you consider all of the resources required to access 
32k worth of sub-directories in the file system, specifically including 
file descriptors, inodes, the backing bitmaps or extent mappings that 
allocate from the file system free space, the rather inefficient 
directory layouts of many file systems (many file systems still do 
LINEAR searches for filenames, making file lookups linearly slower as 
the directory becomes larger), and the kernel memory caches that track 
all of these little details. The POSIX guarantees required are certainly 
more heavyweight than the requirements that PostgreSQL has, and I am 
certain it is possible to create a targeted solution to this problem 
that is simpler and faster. For only a few databases and a few files, 
the effort isn't worth it. But, if supporting 32k+ *databases*, or even 
32k+ tables and indexes, is a major requirement and a major design 
target, then PostgreSQL should do this stuff itself.


Modern file systems don't have the 2G problem. ext2/ext3 with 4Kbyte 
blocks (the standard) supports files up to 2Tbytes. This also matches the 
practical limit on addressing a single physical disk, at least on the 
platforms I am familiar with. The requirement to stay under 2G for a 
single file is a bit outdated.




I guess I'm not seeing how using 32k tables is a sensible model.
 

For one thing, there's partitioning.  For another, there's a large user
base.  32K tables is, to be honest, not all that many, especially for
some of these databases which reach into the multi-TB range..
   


Talking philosophically - the need to use table-based partitioning to 
achieve acceptable performance or storage requirements is somewhat of a 
hacky workaround. It effectively moves the database query logic back 
into the application space, where the application must know which tables 
contain which data. Inherited tables and automatic constraint-based 
query planning help out, but it's still an elaborate hack. It exposes 
data that the application should not need to care about, and then makes 
it possible to hide some of it again. Table partitioning should be far 
more automatic. I don't want to break my theoretical table containing 
every call made on my network into per-hour tables, each with a 
constraint for the time range it includes data for. I want to create a 
table with a timestamp column, fill it with billions of records, 
provide a few hints, and the database engine should be smart enough to 
partition the table such that my queries just work.


Back to reality - maybe things have not reached this level of maturity 
yet, and people with practical requirements today have found that they 
need to use very complex manual partitioning schemes that chew up 
thousands of tables.




So yes,
things can be done to reduce the cost - but it seems like something is
wrong if this is truly a requirement.
 

I have no idea what you've been working with, but I hardly think it
makes sense for PG to consider over 32k tables as not worth supporting.
   


I don't advocate any limits. However, I also don't advocate designing 
PostgreSQL specifically for the case of 32k tables. If you want to use 
32k tables, then you'd better have a file system that supports 32k+ files 
in a single directory, and a kernel that is able to work efficiently 
when postgres has thousands or more file descriptors open and in use at 
the same time. The system *supports* 32k tables, but if you look at the 
design, you'll see that it is not optimal for 32k tables. Even with

Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Andrew Dunstan



Tom Lane wrote:

So the question I would ask goes more like "do you really need 32K
databases in one installation?  Have you considered using schemas
instead?"  Databases are, by design, pretty heavyweight objects.


  


That's a fair question. OTOH, devising a scheme to get around it would 
not be terribly difficult, would it? I can imagine a scheme where the 
subdir for a database was lo/hi for some division of the database oid. I 
guess it could make matters ugly for pg_migrator, though.
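
For concreteness, the kind of mapping I have in mind (the divisor and the
path format are picked out of the air, purely to illustrate):

    /*
     * Sketch of the lo/hi idea: put each database directory at
     * base/<oid / N>/<oid % N>, spreading the entries across two levels
     * instead of one.  The divisor and the path format are invented
     * purely for illustration.
     */
    #include <stdio.h>

    #define OID_DIVISOR 16384u

    static void
    db_subdir(unsigned int db_oid, char *buf, size_t buflen)
    {
        snprintf(buf, buflen, "base/%u/%u",
                 db_oid / OID_DIVISOR, db_oid % OID_DIVISOR);
    }

    int
    main(void)
    {
        char path[64];

        db_subdir(16385, path, sizeof(path));
        printf("oid 16385 -> %s\n", path);    /* base/1/1 */
        return 0;
    }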


cheers

andrew



Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

2009-09-12 Thread Tom Lane
Andrew Dunstan <and...@dunslane.net> writes:
 Tom Lane wrote:
 So the question I would ask goes more like "do you really need 32K
 databases in one installation?  Have you considered using schemas
 instead?"  Databases are, by design, pretty heavyweight objects.

 That's a fair question. OTOH, devising a scheme to get around it would 
 not be terribly difficult, would it? I can imagine a scheme where the 
 subdir for a database was lo/hi for some division of the database oid. I 
 guess it could make matters ugly for pg_migrator, though.

As I said earlier, the number-of-subdirectories issue is not the
important thing.  The OP was already up to 160GB worth of system
catalogs before his filesystem wimped out, and would be needing
terabytes if he wanted to go significantly past the filesystem limit.
So there is no point in devising some clever workaround for the
limitations of one filesystem unless you want to reconsider our system
catalog representation --- and that will carry actual user-visible
functional costs; it's not just a cute hack somewhere in the guts of
the system.

regards, tom lane
