Re: Performance opportunities for write-only audit tables?

2018-05-24 Thread Andrew Bartley
2. Use  UNLOGGED in the log table creates.

Seriously?  For audit tables?


I guess that depends on what the log tables are used for Also in this
case the logs are written to via a trigger I all within one
transaction.  So it may not matter.

We use UNLOGGED because the audit logs are not financial, and used only
very rarely, and only for analysis of access and performance in a batch
setting.  Also certainly not via a trigger, as we don't want the logging
bound to the transaction.

On Fri, 25 May 2018 at 07:17 David G. Johnston 
wrote:

> On Thursday, May 24, 2018, Andrew Bartley  wrote:
>
>> Hi,
>>
>> The two main techniques we use are.
>>
>> The idea here is to backup the rest of your DB to one backup regime and
>> the log tables to another. We set it up so at the end of the day the
>> current log table is backed up and loaded into an backup archive, then we
>> vacuum freeze the log table.
>>
>
> Yeah, doing logging in-database for immediate performance while
> periodically copying or moving said data to external storage seems like the
> best option.  Depends on the definition of an acceptable process and
> response time should the audit data be needed though.
>
>
>>
>> 2. Use  UNLOGGED in the log table creates.
>>
>
> Seriously?  For audit tables?
>
> David J.
>


Re: Performance opportunities for write-only audit tables?

2018-05-24 Thread David G. Johnston
On Thursday, May 24, 2018, Andrew Bartley  wrote:

> Hi,
>
> The two main techniques we use are.
>
> The idea here is to backup the rest of your DB to one backup regime and
> the log tables to another. We set it up so at the end of the day the
> current log table is backed up and loaded into an backup archive, then we
> vacuum freeze the log table.
>

Yeah, doing logging in-database for immediate performance while
periodically copying or moving said data to external storage seems like the
best option.  Depends on the definition of an acceptable process and
response time should the audit data be needed though.


>
> 2. Use  UNLOGGED in the log table creates.
>

Seriously?  For audit tables?

David J.


Re: Performance opportunities for write-only audit tables?

2018-05-24 Thread Andrew Bartley
Sorry that should be a "vacuum full freeze"... not just a  "vacuum freeze"

On Fri, 25 May 2018 at 07:07 Andrew Bartley  wrote:

> Hi,
>
> The two main techniques we use are.
>
> 1. Create a script to pass the -t param to pg_dump to exclude the log
> tables.  The idea here is to backup the rest of your DB to one backup
> regime and the log tables to another. We set it up so at the end of the day
> the current log table is backed up and loaded into an backup archive, then
> we vacuum freeze the log table.  The benefits are that once each log table
> is "Closed" (meaning you will no longer be writing to that log table ever
> again), is backed up once only ever...   It speeds housekeeping up, and
> your daily backups are much smaller.
>
> 2. Use  UNLOGGED in the log table creates.
>
> Have fun.
>
> Andrew
>
> On Fri, 25 May 2018 at 02:55 Paul Jungwirth 
> wrote:
>
>> Hi,
>>
>> A project of mine uses a trigger-based approach to record changes to an
>> audit table. The audit table is partitioned by month (pg 9.5, so
>> old-fashioned partitioning). These tables are write-heavy but
>> append-only and practically write-only: we never UPDATE or DELETE, and
>> we seem to consult them only a few times a year. But they are enormous:
>> bigger than the rest of the database in fact. They slow down our
>> backups, they increase WAL size and streaming replication, they add to
>> recovery time, they make upgrades more time-consuming, and I suppose
>> they compete for RAM.
>>
>> This is all on an AWS EC2 instance with EBS storage. We also run a warm
>> standby with streaming replication.
>>
>> Since these tables are so different from everything else, I'm wondering
>> what opportunities we have to reduce their performance cost. I'm
>> interested both in practical high-bang-for-buck changes, but also in
>> harder just-interesting-to-think-about last-resort approaches. Here are
>> a few ideas of my own, but I'm curious what others think:
>>
>> We already have no indexes or foreign keys on these tables, so at least
>> there's no cost there.
>>
>> Since they are already partitioned, we could move old data to offline
>> storage and drop those tables. This feels like the biggest, easiest win,
>> and something we should have done a long time ago. Probably it's all we
>> need.
>>
>> Put them on a different tablespace. This one is also pretty obvious, but
>> aside from using a separate disk, I'm curious what other crazy things we
>> could do. Is there any per-tablespace tuning possible? (I think the
>> answer within Postgres is no, but I wish we could change the settings
>> for wal_level, or exclude them from replication, or something, so I'm
>> wondering if we could achieve the same effect by exploiting being on a
>> separate filesystem.) Maybe put the tablespace on some FUSE filesystem
>> to get async writes? Or just pick different mount options, e.g. on ext4
>> lazytime,dealloc,data=writeback? I don't know. Or at a different level:
>> change the triggers so they call a custom function that uses a new
>> thread to store the audit records elsewhere. Maybe these ideas are all
>> too risky, but I think the organization is fine with slightly relaxed
>> durability guarantees for this data, and anyway I'm just curious to have
>> a list of possibilities before I categorize anything as too crazy or
>> not. :-)
>>
>> If we upgraded to pg 10 we could use logical replication and leave out
>> the audit tables. That is appealing. Even without upgrading, I guess we
>> could replace those tables with postgres_fdw ones, so that they are not
>> replicated? Has anyone else used that trick?
>>
>> Thanks!
>>
>> --
>> Paul  ~{:-)
>> p...@illuminatedcomputing.com
>>
>>


Re: Performance opportunities for write-only audit tables?

2018-05-24 Thread Andrew Bartley
Hi,

The two main techniques we use are.

1. Create a script to pass the -t param to pg_dump to exclude the log
tables.  The idea here is to backup the rest of your DB to one backup
regime and the log tables to another. We set it up so at the end of the day
the current log table is backed up and loaded into an backup archive, then
we vacuum freeze the log table.  The benefits are that once each log table
is "Closed" (meaning you will no longer be writing to that log table ever
again), is backed up once only ever...   It speeds housekeeping up, and
your daily backups are much smaller.

2. Use  UNLOGGED in the log table creates.

Have fun.

Andrew

On Fri, 25 May 2018 at 02:55 Paul Jungwirth 
wrote:

> Hi,
>
> A project of mine uses a trigger-based approach to record changes to an
> audit table. The audit table is partitioned by month (pg 9.5, so
> old-fashioned partitioning). These tables are write-heavy but
> append-only and practically write-only: we never UPDATE or DELETE, and
> we seem to consult them only a few times a year. But they are enormous:
> bigger than the rest of the database in fact. They slow down our
> backups, they increase WAL size and streaming replication, they add to
> recovery time, they make upgrades more time-consuming, and I suppose
> they compete for RAM.
>
> This is all on an AWS EC2 instance with EBS storage. We also run a warm
> standby with streaming replication.
>
> Since these tables are so different from everything else, I'm wondering
> what opportunities we have to reduce their performance cost. I'm
> interested both in practical high-bang-for-buck changes, but also in
> harder just-interesting-to-think-about last-resort approaches. Here are
> a few ideas of my own, but I'm curious what others think:
>
> We already have no indexes or foreign keys on these tables, so at least
> there's no cost there.
>
> Since they are already partitioned, we could move old data to offline
> storage and drop those tables. This feels like the biggest, easiest win,
> and something we should have done a long time ago. Probably it's all we
> need.
>
> Put them on a different tablespace. This one is also pretty obvious, but
> aside from using a separate disk, I'm curious what other crazy things we
> could do. Is there any per-tablespace tuning possible? (I think the
> answer within Postgres is no, but I wish we could change the settings
> for wal_level, or exclude them from replication, or something, so I'm
> wondering if we could achieve the same effect by exploiting being on a
> separate filesystem.) Maybe put the tablespace on some FUSE filesystem
> to get async writes? Or just pick different mount options, e.g. on ext4
> lazytime,dealloc,data=writeback? I don't know. Or at a different level:
> change the triggers so they call a custom function that uses a new
> thread to store the audit records elsewhere. Maybe these ideas are all
> too risky, but I think the organization is fine with slightly relaxed
> durability guarantees for this data, and anyway I'm just curious to have
> a list of possibilities before I categorize anything as too crazy or
> not. :-)
>
> If we upgraded to pg 10 we could use logical replication and leave out
> the audit tables. That is appealing. Even without upgrading, I guess we
> could replace those tables with postgres_fdw ones, so that they are not
> replicated? Has anyone else used that trick?
>
> Thanks!
>
> --
> Paul  ~{:-)
> p...@illuminatedcomputing.com
>
>


Performance opportunities for write-only audit tables?

2018-05-24 Thread Paul Jungwirth

Hi,

A project of mine uses a trigger-based approach to record changes to an 
audit table. The audit table is partitioned by month (pg 9.5, so 
old-fashioned partitioning). These tables are write-heavy but 
append-only and practically write-only: we never UPDATE or DELETE, and 
we seem to consult them only a few times a year. But they are enormous: 
bigger than the rest of the database in fact. They slow down our 
backups, they increase WAL size and streaming replication, they add to 
recovery time, they make upgrades more time-consuming, and I suppose 
they compete for RAM.


This is all on an AWS EC2 instance with EBS storage. We also run a warm 
standby with streaming replication.


Since these tables are so different from everything else, I'm wondering 
what opportunities we have to reduce their performance cost. I'm 
interested both in practical high-bang-for-buck changes, but also in 
harder just-interesting-to-think-about last-resort approaches. Here are 
a few ideas of my own, but I'm curious what others think:


We already have no indexes or foreign keys on these tables, so at least 
there's no cost there.


Since they are already partitioned, we could move old data to offline 
storage and drop those tables. This feels like the biggest, easiest win, 
and something we should have done a long time ago. Probably it's all we 
need.


Put them on a different tablespace. This one is also pretty obvious, but 
aside from using a separate disk, I'm curious what other crazy things we 
could do. Is there any per-tablespace tuning possible? (I think the 
answer within Postgres is no, but I wish we could change the settings 
for wal_level, or exclude them from replication, or something, so I'm 
wondering if we could achieve the same effect by exploiting being on a 
separate filesystem.) Maybe put the tablespace on some FUSE filesystem 
to get async writes? Or just pick different mount options, e.g. on ext4 
lazytime,dealloc,data=writeback? I don't know. Or at a different level: 
change the triggers so they call a custom function that uses a new 
thread to store the audit records elsewhere. Maybe these ideas are all 
too risky, but I think the organization is fine with slightly relaxed 
durability guarantees for this data, and anyway I'm just curious to have 
a list of possibilities before I categorize anything as too crazy or 
not. :-)


If we upgraded to pg 10 we could use logical replication and leave out 
the audit tables. That is appealing. Even without upgrading, I guess we 
could replace those tables with postgres_fdw ones, so that they are not 
replicated? Has anyone else used that trick?


Thanks!

--
Paul  ~{:-)
p...@illuminatedcomputing.com