Sorry for the delayed response. We had some hardware/configuration issues that appear to be resolved now, so now we're starting to actually play with modifying the database.
On Sat, Dec 7, 2013 at 1:29 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Thu, Dec 5, 2013 at 7:36 AM, Dave Johansen <davejohan...@gmail.com> wrote:
>> I'm managing a database that is adding about 10-20M records per day to a
>> table and time is a core part of most queries,
>
> What is the nature of how the time column is used in the queries?
> Depending on how it is used, you might not get much improvement at all, or
> you might get N fold improvement, or you might find that re-designing your
> indexes could get you the same query improvement that partitioning would,
> but with less headache.

The time column is usually used to calculate statistics, find/analyze
duplicates, analyze data contents, etc. on a specific time window, so there
will be queries with GROUP BY and WINDOWs with a specific time filter in
the WHERE clause.

>> so I've been looking into seeing if I need to start using partitioning
>> based on the time column and I've found these general guidelines:
>>
>> Don't use more than about 50 partitions (
>> http://www.postgresql.org/message-id/17147.1271696...@sss.pgh.pa.us )
>> Use triggers to make the interface easier (
>> https://wiki.postgresql.org/wiki/Table_partitioning#Trigger-based and
>> http://stackoverflow.com/questions/16049396/postgres-partition-by-week )
>
> Using triggers slows INSERTs down by a lot (unless they were already slow
> due to the need to hit disk to maintain the indexes or something like
> that). Are you sure you can handle that slowdown, given your insertion
> rate? You could get the best of both worlds by having your bulk loaders
> target the correct partition directly, but also have the triggers on the
> parent table for any programs that don't get the message.

Inserting directly into the correct partition whenever possible and leaving
the trigger on the parent table seems like the best option.

>> The only data I found fell in line with what you'd expect (i.e. speeds up
>> selects but slows down inserts/updates
>> http://www.if-not-true-then-false.com/2009/performance-testing-between-partitioned-and-non-partitioned-postgresql-tables-part-3/ )
>
> One of the big benefits of partitioning can be to speed up insertions
> tremendously, by keeping the hot part of the indices that need to be
> maintained upon insertion together in shared_buffers.

We insert lots of new data but rarely modify existing data once it's in the
database, so it sounds like this would be a big benefit for us.

>> So I was thinking that partitioning based on month to keep the number of
>> partitions low, so that would mean about 0.5G records in each table. Does
>> that seem like a reasonable number of records in each partition? Is there
>> anything else that I should consider or be aware of?
>
> How will data be expired? How does the size of one of your intended
> partitions compare to your RAM and shared_buffers?

We add about 10-20 million records per day, with each record being about
200 bytes in size (there's a bytea in there with that being the average
size), and there's 64 GB of RAM on the machine.

> Cheers,
>
> Jeff

On Sat, Dec 7, 2013 at 1:37 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> I'll add that you can use asymmetric partitioning if you tend to do a
> lot of more fine grained queries on recent data and more big roll up
> on older ones. I.e. partition by month except for the last 30 days, do
> it by day etc. Then at the end of the month roll all the days into a
> month partition and delete them.

This sounds like a great solution for us. Is there some trick to roll the
records from one partition to another? Or is the only way just a SELECT
INTO followed by a DELETE?

Thanks,
Dave
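For reference, on PostgreSQL 9.1 and later a writable CTE can do the
DELETE and the INSERT in a single statement, so the rows are moved
atomically without a separate SELECT INTO / DELETE pair. A minimal sketch,
assuming hypothetical day and month partitions (events_2013_12_01,
events_2013_12) of an "events" parent table with identical column layouts:

```sql
BEGIN;

-- Move one day's rows into the month partition: the CTE deletes from the
-- day partition and hands the deleted rows to the INSERT in one statement
-- (writable CTEs require PostgreSQL 9.1+).
WITH moved AS (
    DELETE FROM events_2013_12_01
    RETURNING *
)
INSERT INTO events_2013_12
SELECT * FROM moved;

-- Once every day of the month has been rolled up this way, the
-- now-empty day partition can simply be dropped.
DROP TABLE events_2013_12_01;

COMMIT;
```

Wrapping the move and the DROP in one transaction means readers of the
parent table never see the day's rows duplicated or missing.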