On Mon, Aug 15, 2016 at 7:23 PM, James Sewell
wrote:
> Those are all good questions.
>
> Essentially this is a situation where DR is network separated from Prod -
> so I would expect the archive command to fail.
>
archive_command or restore_command? I thought it was restore_command.
> I'll
Hi,
No, this was a one off in a network split situation.
I'll check the startup when I get a chance - thanks for the help.
Cheers,
James Sewell,
Solutions Architect
Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P* (+61) 2 8099 9000 *W* www.jirot
On 16 August 2016 at 08:11, James Sewell wrote:
> As per the logs there was a crash of one standby, which seems to have
> corrupted that standby and the two cascading standbys.
>
>- No backups
>- Full page writes enabled
>- Fsync enabled
>
> WAL records are CRC checked, so it may just
Hey Sameer,
As per the logs there was a crash of one standby, which seems to have
corrupted that standby and the two cascading standbys.
- No backups
- Full page writes enabled
- Fsync enabled
Cheers,
James Sewell,
Solutions Architect
Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, P
On Tue, Aug 16, 2016 at 1:10 PM James Sewell
wrote:
> Hey,
>
> I understand that.
>
> But a hot standby should always be ready to promote (given it originally
> caught up) right?
>
> I think it's a moot point really as some sort of corruption has been
> introduced, the machines still wouldn
Hey,
I understand that.
But a hot standby should always be ready to promote (given it originally
caught up) right?
I think it's a moot point really, as some sort of corruption has been
introduced; the machines still wouldn't start after they could see
the archive destination again.
Cheers,
On 8/15/2016 7:23 PM, James Sewell wrote:
Those are all good questions.
Essentially this is a situation where DR is network separated from
Prod - so I would expect the archive command to fail. I'll have to
check the script; it must not be passing the error back through to
PostgreSQL.
This st
Those are all good questions.
Essentially this is a situation where DR is network separated from Prod -
so I would expect the archive command to fail. I'll have to check the
script; it must not be passing the error back through to PostgreSQL.
This still shouldn't cause database corruption though r
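On the "script must not be passing the error back" point, a minimal sketch of what matters (the wrapper, file names, and directories here are hypothetical stand-ins, not the actual script in use): whatever archive_command runs must exit non-zero when the copy fails, otherwise PostgreSQL believes the segment was archived and recycles it.

```shell
#!/bin/sh
# Toy stand-in for an archive_command wrapper. The key point: let the
# copy's real exit status become the script's exit status, rather than
# swallowing it with something like '|| true'. All paths are illustrative.
archive() {
    # $1 = full path to the WAL segment (%p), $2 = archive directory
    cp "$1" "$2/$(basename "$1")"   # cp's status is returned to the caller
}

tmp=$(mktemp -d)
printf 'fake wal' > "$tmp/000000010000000300000088"

mkdir "$tmp/archive"
if archive "$tmp/000000010000000300000088" "$tmp/archive"; then
    echo "archived ok"
fi

# Simulate the DR link being down: the destination is unreachable, so the
# non-zero status must surface for PostgreSQL to retry the segment later.
if ! archive "$tmp/000000010000000300000088" "$tmp/missing" 2>/dev/null; then
    echo "archive failed: status propagated"
fi
```

A wrapper that logs and then unconditionally exits 0 would show exactly the symptom described: archiving "succeeds" during the network split.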
On Thu, Aug 11, 2016 at 10:39 PM, James Sewell
wrote:
> Hello,
>
> We recently experienced a critical failure when failing over to a DR
> environment.
>
> This is in the following environment:
>
>
>- 3 x PostgreSQL machines in Prod in a sync replication cluster
>- 3 x PostgreSQL machines in DR
Hello All,
The thing which I find a little worrying is that this 'corruption' was
introduced somewhere on the network path from PROD -> DR, and then also
cascaded to both other DR servers (either via replication or via
archive_command).
Is WAL corruption checked for in any way on standby servers?
Here
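(On the CRC question, a toy illustration of why a per-record checksum catches bit flips. `cksum` here is only a stand-in for the CRC PostgreSQL actually stores in each WAL record header, and the "record" contents are made up:)

```shell
#!/bin/sh
# Toy demo: any byte change alters the checksum, which is how a reader
# that verifies a per-record CRC (as PostgreSQL does when replaying WAL)
# notices damage. The "record" payload is invented for the demo.
tmp=$(mktemp -d)
printf 'wal record payload' > "$tmp/record"
good=$(cksum < "$tmp/record")

printf 'wal recorD payload' > "$tmp/record"   # simulate one flipped byte
bad=$(cksum < "$tmp/record")

if [ "$good" != "$bad" ]; then
    echo "corruption detected"
fi
```

Note this only detects damage introduced after the CRC was computed; if the sender writes a valid CRC over already-bad data, every downstream CRC check passes.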
Hello,
I double posted this (posted once from an unregistered email and assumed it
would be junked).
I'm continuing all discussion on the other thread now.
Cheers,
James Sewell,
PostgreSQL Team Lead / Solutions Architect
Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P* (+
(from other thread)
- 9.5.3
- Redhat 7.2 on VMWare
- Single PostgreSQL instance on each machine
- Every machine in DR became corrupt, so interestingly the corruption must
have been sent to the two cascading nodes via WAL before the crash on the
hub DR node
- No OS logs indicating anyt
James Sewell wrote:
> 2016-08-12 04:43:53 GMT [23614]: [5-1] user=,db=,client= (0:0)LOG:
> consistent recovery state reached at 3/8811DFF0
> 2016-08-12 04:43:53 GMT [23614]: [6-1] user=,db=,client= (0:XX000)FATAL:
> invalid memory alloc request size 3445219328
> 2016-08-12 04:43:53 GMT [
On Fri, Aug 12, 2016 at 1:39 AM, James Sewell
wrote:
> Hello,
>
> We recently experienced a critical failure when failing over to a DR
> environment.
>
> This is in the following environment:
>
>
>- 3 x PostgreSQL machines in Prod in a sync replication cluster
>- 3 x PostgreSQL machines in DR,
Hello,
We recently experienced a critical failure when failing over to a DR environment.
This is in the following environment:
- 3 x PostgreSQL machines in Prod in a sync replication cluster
- 3 x PostgreSQL machines in DR, with a single machine async and the
other two cascading from the fi
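For reference, a sketch of what the recovery.conf on one of the cascading DR standbys might look like on 9.5. This is an assumption about the topology described above, not the actual configuration; host names, the application_name, and the archive path are illustrative:

```
standby_mode = 'on'
# cascade from the async DR hub rather than directly from Prod
primary_conninfo = 'host=dr-hub port=5432 application_name=dr_cascade1'
restore_command = 'cp /archive/%f %p'
```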