On 6/19/11 4:37 AM, Samuel Gendler wrote:
On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa 
<ildefonso.cama...@gmail.com> wrote:

    Greetings,

    I have been thinking a lot about pgsql performance when it is dealing
    with tables with lots of rows on one table (several millions, maybe
    thousands of millions).  Say, the Large Object use case:

    one table has the large objects (has a pointer to one object).
    The large object table stores each large object in 2048-byte chunks
    (LOBLKSIZE), so, if we have something like 1TB of data stored in large
    objects, the large objects table would have something like 540M rows;
    if we get to 8TB, we will have about 4300M rows.

    I have read in several places that huge tables should be partitioned
    to improve performance.... now, my first question: does the
    large objects system automatically partition itself? If not, will
    Large Objects system performance degrade as we add more data? (I guess
    it would.)
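A quick back-of-the-envelope check of the chunk arithmetic above, assuming the default LOBLKSIZE of 2048 bytes (BLCKSZ/4); a build with a non-default block size would change these figures:

```python
LOBLKSIZE = 2048  # bytes per pg_largeobject chunk (default build)

def lo_chunk_rows(total_bytes: int) -> int:
    """Approximate row count in pg_largeobject for total_bytes of data."""
    return -(-total_bytes // LOBLKSIZE)  # ceiling division

TB = 1024 ** 4
print(lo_chunk_rows(1 * TB))  # 536870912  (~537 million rows for 1 TB)
print(lo_chunk_rows(8 * TB))  # 4294967296 (~4.3 billion rows for 8 TB)
```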

You should consider "partitioning" your data in a different way: Separate the 
relational/searchable data from the bulk data that is merely being stored.

Relational databases are just that: relational.  The thing they do well is to 
store relationships between various objects, and they are very good at finding 
objects using relational queries and logical operators.

But when it comes to storing bulk data, a relational database is no better than 
a file system.

In our system, each "object" is represented by a big text object of a few 
kilobytes.  Searching that text is essentially useless -- the only reason it's there 
is for visualization and to pass on to other applications.  So it's separated out into 
its own table, which holds only the text record and a primary key.

We then use other tables to hold extracted fields and computed data about the primary 
object, and the relationships between the objects.  That means we've effectively 
"partitioned" our data into searchable relational data and non-searchable bulk 
data.  The result is that we have around 50 GB of bulk data that's never searched, and 
about 1GB of relational, searchable data in a half-dozen other tables.
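A minimal sketch of that split, using Python's sqlite3 as a stand-in for PostgreSQL. The table and column names (bulk_data, object_meta) are hypothetical; the point is the shape: bulk text lives in a table with nothing but a primary key, and all searchable fields live in separate relational tables that reference it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Non-searchable bulk data: just a primary key and the raw text.
    CREATE TABLE bulk_data (
        id   INTEGER PRIMARY KEY,
        body TEXT NOT NULL
    );
    -- Extracted, searchable fields referencing the bulk record.
    CREATE TABLE object_meta (
        id    INTEGER PRIMARY KEY REFERENCES bulk_data(id),
        name  TEXT,
        size  INTEGER
    );
    CREATE INDEX object_meta_name ON object_meta(name);
""")
conn.execute("INSERT INTO bulk_data VALUES (1, 'several kilobytes of raw text ...')")
conn.execute("INSERT INTO object_meta VALUES (1, 'obj-1', 4096)")

# Queries search only the small relational table, then fetch bulk by key.
row = conn.execute("""
    SELECT b.body
    FROM object_meta m JOIN bulk_data b ON b.id = m.id
    WHERE m.name = 'obj-1'
""").fetchone()
print(row[0])
```

Indexes and full scans touch only the small object_meta table; the bulk table is ever accessed by primary key.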

With this approach, there's no need for table partitioning, and full table 
scans are quite reasonable.

Craig
