Sven Willenberger wrote:
Trying to determine the best overall approach for the following
scenario:
Each month our primary table accumulates some 30 million rows (which
could very well hit 60+ million rows per month by year's end). Basically
there will end up being a lot of historical data with little value
beyond archival.
The question arises then as the best approach of which I have enumerated
three:
I just thought of another possibility. You could create each table
month-by-month, and then use a view to combine them, and possibly a rule
to keep things clean.
So you would do something like:
I will assume you already have the data in one big table to show the
easiest way to create the small tables.
create table tblname-2005-01 as select * from orig_tbl where day >=
'2005-01-01' and day < '2005-02-01';
create table tblname-2005-02 as select * from orig_tbl where day >=
'2005-02-01' and day < '2005-03-01';
create table tblname-2005-03 as select * from orig_tbl where day >=
'2005-03-01' and day < '2005-04-01';
-- create appropriate indicies, rules, constraints on these tables
Then you create a view which includes all of these tables.
create or replace view tblname as
select * from tblname-2005-01
union all select * from tblname-2005-02
union all select * from tblname-2005-03
;
Then create insert and update rules which fixe which table gets the new
data.
create rule up_tblname as on update to tblname do instead
update tblname-2005-03 set
col1 = NEW.col1,
col2 = NEW.col2,
...
where id = NEW.id;
-- This assumes that you have a unique id on your tables. This is just
whatever your
-- primary key is, so it should be a decent assumption.
create rule ins_tblname as on insert to tblname do instead
insert into tblname-2005-03 (col1, col2, ...)
values (new.col1, new.col2, ...);
Now the downside of this method, is that every month you need to create
a new table, and then update the views and the rules. The update rules
are pretty straightforward, though.
The nice thing is that it keeps your data partitioned, and you don't
ever have a large select/delete step. You probably will want a small one
each month to keep the data exactly aligned by month. You don't really
have to have exact alignments, but as humans, we tend to like that stuff. :)
Probably this is more overhead than you would like to do. Especially if
you know that you can get away with method 2 (keep 1 big table, and just
remove old rows out of it every month.)
But this method means that all of your data stays live, but queries with
appropriate restrictions should stay fast. You also have the ability
(with v8.0) to move the individual tables onto separate disks.
One more time, though, if you can get away with removing old data and
just archiving it, do so. But if you want to keep the data live, there
are a couple of alternatives.