Sorry for the delayed response. We had some hardware/configuration issues
that appear to be resolved now, so we're finally starting to actually
experiment with modifying the database.

On Sat, Dec 7, 2013 at 1:29 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Thu, Dec 5, 2013 at 7:36 AM, Dave Johansen <davejohan...@gmail.com>wrote:
>
>> I'm managing a database that is adding about 10-20M records per day to a
>> table and time is a core part of most queries,
>>
>
>
> What is the nature of how the time column is used in the queries?
> Depending on how it is used, you might not get much improvement at all, or
> you might get N fold improvement, or you might find that re-designing your
> indexes could get you the same query improvement that partitioning would,
> but with less headache.
>

The time column is usually used to calculate statistics, find/analyze
duplicates, analyze data contents, etc., over a specific time window. So
there will be queries with GROUP BY and window functions, with a time
filter in the WHERE clause.
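As a concrete sketch (the table and column names here are made up for
illustration, not our actual schema), a typical query looks something like:

```sql
-- Hypothetical table "events" with a "recorded_at" timestamp column.
-- Aggregate over a one-day window; a range filter like this is also
-- what constraint exclusion could prune partitions on.
SELECT source_id,
       count(*)          AS n,
       avg(payload_size) AS avg_size
FROM events
WHERE recorded_at >= '2013-12-01'
  AND recorded_at <  '2013-12-02'
GROUP BY source_id
ORDER BY n DESC;
```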


>
> so I've been looking into seeing if I need to start using partitioning
>> based on the time column and I've found these general guidelines:
>>
Don't use more than about 50 partitions (
>> http://www.postgresql.org/message-id/17147.1271696...@sss.pgh.pa.us )
>>  Use triggers to make the interface easier (
>> https://wiki.postgresql.org/wiki/Table_partitioning#Trigger-based and
>> http://stackoverflow.com/questions/16049396/postgres-partition-by-week )
>>
>
> Using triggers slows INSERTs down by a lot (unless they were already slow
> due to the need to hit disk to maintain the indexes or something like
> that).  Are you sure you can handle that slow down, given your insertion
> rate?  You could get the best of both worlds by having your bulk loaders
> target the correct partition directly, but also have the triggers on the
> parent table for any programs that don't get the message.
>

Inserting directly into the correct partition whenever possible and leaving
the trigger on the parent table seems like the best option.
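For reference, a minimal version of the routing setup along the lines of the
wiki example (table, column, and function names here are hypothetical; our
real schema differs) would look something like:

```sql
-- Hypothetical monthly child table; the CHECK constraint is what lets
-- constraint exclusion skip this partition for out-of-range queries.
CREATE TABLE events_2013_12 (
    CHECK (recorded_at >= DATE '2013-12-01'
       AND recorded_at <  DATE '2014-01-01')
) INHERITS (events);

-- Routing function: bulk loaders can bypass it by inserting into the
-- child table directly; anything hitting the parent gets routed here.
CREATE OR REPLACE FUNCTION events_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW.recorded_at >= DATE '2013-12-01'
       AND NEW.recorded_at <  DATE '2014-01-01' THEN
        INSERT INTO events_2013_12 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for timestamp %', NEW.recorded_at;
    END IF;
    RETURN NULL;  -- the row was already inserted into the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_events_trigger
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();
```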


>
>> The only data I found fell inline with what you'd expect (i.e. speeds up
>> selects but slows down inserts/updates
>> http://www.if-not-true-then-false.com/2009/performance-testing-between-partitioned-and-non-partitioned-postgresql-tables-part-3/)
>>
>
>
> One of the big benefits of partitioning can be to speed up insertions
> tremendously, by keeping the hot part of the indices that need to be
> maintained upon insertion together in shared_buffers.
>

We insert lots of new data, but rarely modify existing data once it's in
the database, so it sounds like this would be a big benefit for us.


>
>
>> So I was thinking that partitioning based on month to keep the number of
>> partitions low, so that would mean about 0.5G records in each table. Does
>> that seem like a reasonable number of records in each partition? Is there
>> anything else that I should consider or be aware of?
>>
>
> How will data be expired?  Hows does the size of one of your intended
> partitions compare to your RAM and shared_buffers.
>

We add about 10-20 million records per day, each about 200 bytes in size
(there's a bytea in there, with 200 bytes being the average row size), and
the machine has 64 GB of RAM. So from those numbers, a monthly partition
would be roughly 60-120 GB of table data, i.e. comparable to or larger than
RAM.


>
> Cheers,
>
> Jeff
>

On Sat, Dec 7, 2013 at 1:37 PM, Scott Marlowe <scott.marl...@gmail.com>wrote:
> I'll add that you can use asymmetric partitioning if you tend to do a
> lot of more fine grained queries on recent data and more big roll up
> on older ones. I.e. partition by month except for the last 30 days, do
> it by day etc. Then at the end of the month roll all the days into a
> month partition and delete them.

This sounds like a great solution for us. Is there some trick to roll the
records from one partition to another? Or is the only way just a SELECT
INTO followed by a DELETE?
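For what it's worth, one way the monthly roll-up could go (a sketch only,
assuming daily child tables named like the wiki example, and doing it in one
transaction so readers never see the rows missing or duplicated):

```sql
BEGIN;

-- Move one day's rows into the monthly partition
-- (hypothetical table names).
INSERT INTO events_2013_12
SELECT * FROM events_2013_12_01;

-- Dropping the emptied daily partition is much cheaper than DELETE:
-- no dead tuples to clean up, no vacuum needed afterward.
DROP TABLE events_2013_12_01;

COMMIT;
```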

Thanks,
Dave
