On 01/12/2014, at 19:26, Andy Colson wrote:

> On 12/1/2014 11:14 AM, Herouth Maoz wrote:
>> I am currently in the process of creating a huge archive database that
>> contains data from all of our systems, going back for almost a decade.
>> 
>> Most of the tables fall into one of two categories:
>> 
>> 1. Static tables, which are rarely updated, such as lookup tables or
>> user lists. I don't intend to partition these; I'll just refresh them
>> periodically from production.
>> 2. Transaction tables, which have a timestamp field and for which I have
>> the data archived in COPY format by month. Of course, a monolithic table
>> spanning a decade is not feasible, so I am partitioning these by month.
>> 
>> (I don't mean "transaction" in the database sense, but in the sense that
>> the data represents historical activity, e.g. message sent, file
>> downloaded etc.)
>> 
>> I have one table, though, that doesn't fall into this pattern. It's a
>> many-to-one table relating to one of the transaction tables. So on one
>> hand it doesn't have a timestamp field, and on the other hand it has
>> accumulated so much data over the last decade that I can't keep it
>> unpartitioned.
>> 
> 
> Let's stop here.  One big table with lots of rows (and a good index) isn't a 
> problem.  As long as you are not table-scanning everything, there isn't a 
> reason to partition the table.
> 
> Lots and lots of rows isn't a problem, except for a few usage patterns:
> 1) delete from bigtable where (some huge percent of the rows)
> 2) select * from bigtable where (lots and lots of table scanning and can't 
> really index)
> 
> If your index is selective enough, you'll be fine.


Hmm. I suppose you're right. I planned the whole partitioning thing in the first 
place because most of my "transaction" tables are still live, so I'll need to 
keep bulk-inserting data every month, and inserting into a fresh partition is 
better than inserting into one huge table.
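
For the live tables, the monthly load is roughly along these lines (table and 
column names here are made up, and since this predates declarative partitioning 
it's the usual inheritance-plus-CHECK-constraint setup):

-- create the child table for the new month, constrained to its date range
CREATE TABLE messages_2014_12 (
    CHECK (sent_at >= DATE '2014-12-01' AND sent_at < DATE '2015-01-01')
) INHERITS (messages);

-- index the timestamp so constraint exclusion and index scans stay useful
CREATE INDEX ON messages_2014_12 (sent_at);

-- bulk-load the month's archived COPY file straight into the child table
COPY messages_2014_12 FROM '/archive/messages_2014_12.copy';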

But in this case, since we stopped working on this application in January, 
there will be no fresh inserts, so partitioning matters less. We just need the 
archive for legal purposes.
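
That many-to-one table can therefore probably stay in one piece, as long as the 
column that points back at its parent transaction table is well indexed. 
Something like this (names invented):

-- hypothetical detail table, keyed by the parent message's id
CREATE INDEX ON message_recipients (message_id);

-- lookups by parent id then use the index instead of scanning the whole table
SELECT * FROM message_recipients WHERE message_id = 123456;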

One thing, though: I noticed on another system of mine (a reports system that 
holds a year's worth of data) that after I partitioned the largest tables, 
backup time dropped. I suppose a pg_dump of one huge table is not as fast as a 
pg_dump of several smaller ones.
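
If the backup runs as a parallel directory-format dump (available since 9.3), 
that would explain it: pg_dump assigns whole tables to workers, so one giant 
table is always dumped by a single process, while a set of monthly partitions 
can be dumped concurrently. Something like (paths and database name made up):

pg_dump -Fd -j 4 -f /backups/reports reports_db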

Herouth
