Re: [PERFORM] multi billion row tables: possible or insane?

John Arbash Meinel Tue, 01 Mar 2005 07:26:02 -0800

Ramon Bastiaans wrote:

Hi all,

I am doing research for a project of mine where I need to store
several billion values for a monitoring and historical tracking system
for a big computer system. My currect estimate is that I have to store
(somehow) around 1 billion values each month (possibly more).

If you have that 1 billion perfectly distributed over all hours of the
day, then you need 1e9/30/24/3600 = 385 transactions per second.

Which I'm pretty sure is possible with postgres, you just need pretty
beefy hardware. And like Jeff said, lots of disks for lots of IO.
Like a quad opteron, with 16GB of ram, and around 14-20 very fast disks.
raid10 not raid5, etc. To improve query performance, you can do some
load balancing by having replication machines by using Slony.

Or if you can do batch processing, you could split up the work into a
few update machines, which then do bulk updates on the master database.
This lets you get more machines into the job, since you can't share a
database across multiple machines.

I was wondering if anyone has had any experience with these kind of
big numbers of data in a postgres sql database and how this affects
database design and optimization.

Well, one of the biggest things is if you can get bulk updates, or if
clients can handle data being slightly out of date, so you can use
cacheing. Can you segregate your data into separate tables as much as
possible? Are your clients okay if aggregate information takes a little
while to update?

One trick is to use semi-lazy materialized views to get your updates to
be fast.

What would be important issues when setting up a database this big,
and is it at all doable? Or would it be a insane to think about
storing up to 5-10 billion rows in a postgres database.


I think you if you can design the db properly, it is doable. But if you
have a clients saying "I need up to the second information on 1 billion
rows", you're never going to get it.


The database's performance is important. There would be no use in
storing the data if a query will take ages. Query's should be quite
fast if possible.

Again, it depends on the queries being done.
There are some nice tricks you can use, like doing a month-by-month
partitioning (if you are getting 1G inserts, you might want week-by-week
partitioning), and then with a date column index, and a union all view
you should be able to get pretty good insert speed, and still keep fast
*recent* queries. Going through 1billion rows is always going to be
expensive.

I would really like to hear people's thoughts/suggestions or "go see a
shrink, you must be mad" statements ;)

Kind regards,

Ramon Bastiaans


I think it would be possible, but there are a lot of design issues with
a system like this. You can't go into it thinking that you can design a
multi billion row database the same way you would design a million row db.

John
=:->

signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] multi billion row tables: possible or insane?

Reply via email to