On Wed, Aug 17, 2016 at 5:18 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 8/17/16 2:51 PM, Simon Riggs wrote:
>> On 17 August 2016 at 12:19, Greg Stark <st...@mit.edu> wrote:
>>> Yes, this is exactly what it should be doing and exactly why it's
>>> useful. Physical replication accurately replicates the data from the
>>> master including "corruption" whereas a logical replication system
>>> will not, causing divergence and possible issues during a failover.
>> Yay! Completely agree.
>> Physical replication, as used by DRBD and all other block-level HA
>> solutions, and also used by other databases, such as Oracle.
>> Corruption on the master would often cause errors that would prevent
>> writes and therefore those changes wouldn't even be made, let alone be
>> replicated.
> My experience has been that you discover corruption after it's already
> safely on disk, and more than once I've been able to recover by using data
> on a londiste replica.
> As I said originally, it's critical to understand the different solutions
> and the pros and cons of each. There is no magic bullet.

Data point: in the half dozen or so cases where I've experienced
corruption on replicated systems, the standby was clean in all cases
but one. The 'unclean' case was actually an 8.2 warm standby; the
source of the corruption was a very significant bug where prepared
statements would write back corrupted data if the table definitions
changed under the statement (fixed in 8.3). In that particular case
the corruption was, very unfortunately, quite widespread and was passed
directly along to the standby server. This bug nearly cost us a user
as well, although not nearly so famous as Uber :-).

In the few modern cases I've seen, I've not been able to trace the
corruption back to any bug in postgres (in particular, multixact was
ruled out), and I've chalked it up to media or (more likely, I think)
filesystem problems in the face of a -9 reset.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)