[GENERAL] Garbage Collecting

2009-03-23 Thread Joshua Berry
I'm a postgresql newbie that's inherited eight production servers running
Postgresql 8.2.5 as the backend. I have many questions covering topics such
as administration of the database (upgrading, maintaining conf files, etc),
improving the schema of the system (many tables don't currently have primary
keys; to do anything useful you must join at least 5 tables), optimizing
poor performing queries that can take hours, and knowing where functionality
of the system should reside (curenly as PL/SQL functions, as external c
code, external php code, and external perl code).

Please refer me to appropriate documentation/FAQs/books. I've read
http://www.postgresql.org/docs/faqs.FAQ.html If anyone knows of writeups for
newbies that touches upon the things I mentioned, that would probably be
really helpful for me.

I have one specific question about Garbage Collecting within the database.
The database system I'm working with has data that is no longer needed after
a period of time. For example: transaction records only need to be kept
around for the last 31 days; php web sessions that don't need to persist
longer than a day. Could I create some function in the database that would
act a bit like a daily cron job that deletes old records from tables (and
then performs the appropriate VACUUM to regain the space)?

If yes, how does one impliment something like that? As a trigger function
written in PL/SQL? Can I hook the function into something that executes once
per day?
If no, why? Should the external scripts/code that puts the data into the
database be responsible for removing the old data?

Thanks in advance for any/all pointers!

-Joshua

-- 
Joshua Berry
Software Engineer
Opentech, S.A.
+(595 21) 282557 Work
+(595) 981 330 701 Mobile


Re: [GENERAL] Garbage Collecting

2009-03-23 Thread justin

Joshua Berry wrote:
I'm a postgresql newbie that's inherited eight production servers 
running Postgresql 8.2.5 as the backend. I have many questions 
covering topics such as administration of the database (upgrading, 
maintaining conf files, etc), improving the schema of the system (many 
tables don't currently have primary keys; to do anything useful you 
must join at least 5 tables), optimizing poor performing queries that 
can take hours, and knowing where functionality of the system should 
reside (curenly as PL/SQL functions, as external c code, external php 
code, and external perl code).
Indexing can be a win once you figure out which quires are run the most 
and what the common where clauses look like.  Indexes can eat up allot 
of disk space and slow performance in other places.  Its a double edge 
sword. 


http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html

The database I maintain has far more Reads by a factor of 10,000 times 
to inserts, updates, and deletes combine  so having many indexes is a 
win in my case.  This is the part of tuning the database to the load. 


http://wiki.postgresql.org/wiki/Performance_Optimization
http://wiki.postgresql.org/wiki/Database_Administration_and_Maintenance



Please refer me to appropriate documentation/FAQs/books. I've read 
http://www.postgresql.org/docs/faqs.FAQ.html If anyone knows of 
writeups for newbies that touches upon the things I mentioned, that 
would probably be really helpful for me.

http://wiki.postgresql.org/wiki/Main_Page
and the help files with comments has lots of helpful information.  Just 
make sure that you are reading information related to the version of 
Postgresql you are running.


http://www.postgresql.org/docs/8.2/interactive/index.html


I have one specific question about Garbage Collecting within the 
database. The database system I'm working with has data that is no 
longer needed after a period of time. For example: transaction records 
only need to be kept around for the last 31 days; php web sessions 
that don't need to persist longer than a day. Could I create some 
function in the database that would act a bit like a daily cron job 
that deletes old records from tables (and then performs the 
appropriate VACUUM to regain the space)?

Yes  you can do this with pgAgent it comes with pgAdmin
http://www.pgadmin.org/docs/1.4/pgagent.html
http://www.pgadmin.org/


If the records have a date when added  then a command like so can be 
scheduled


Delete from Mytable where DateAdded = (Current_date - '31 day 
'::interval' )::date


If yes, how does one impliment something like that? As a trigger 
function written in PL/SQL? Can I hook the function into something 
that executes once per day?
Yes it can be written in PL/SQL function then have pgAgent call it at 
midnight or at sometime thats off peak load times.  


Also you want to make sure autovaccum is turned to your needs

http://www.postgresql.org/docs/8.2/static/routine-vacuuming.html
this can have big impacts on performance.

If no, why? Should the external scripts/code that puts the data into 
the database be responsible for removing the old data?


It really does not matter. 


Thanks in advance for any/all pointers!

-Joshua

--
Joshua Berry
Software Engineer
Opentech, S.A.
+(595 21) 282557 Work
+(595) 981 330 701 Mobile


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general