On Mon, Aug 15, 2016 at 7:23 PM, James Sewell
wrote:
> Those are all good questions.
>
> Essentially this is a situation where DR is network separated from Prod -
> so I would expect the archive command to fail.
>
archive_command or restore_command? I thought it was restore_command.
> I'll
Hi,
No, this was a one off in a network split situation.
I'll check the startup when I get a chance - thanks for the help.
Cheers,
James Sewell,
Solutions Architect
Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P* (+61) 2 8099 9000 *W* www.jirot
On 16 August 2016 at 08:11, James Sewell wrote:
> As per the logs there was a crash of one standby, which seems to have
> corrupted that standby and the two cascading standbys.
>
>- No backups
>- Full page writes enabled
>- Fsync enabled
>
> WAL records are CRC checked, so it may just
Hey Sameer,
As per the logs there was a crash of one standby, which seems to have
corrupted that standby and the two cascading standbys.
- No backups
- Full page writes enabled
- Fsync enabled
Cheers,
James Sewell,
Solutions Architect
Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, P
On Tue, Aug 16, 2016 at 1:10 PM James Sewell
wrote:
> Hey,
>
> I understand that.
>
> But a hot standby should always be ready to promote (given it originally
> caught up) right?
>
> I think it's a moot point really as some sort of corruption has been
> introduced, the machines still wouldn
Hey,
I understand that.
But a hot standby should always be ready to promote (given it originally
caught up) right?
I think it's a moot point really, as some sort of corruption has been
introduced; the machines still wouldn't start after they could see
the archive destination again.
Cheers,
On 8/15/2016 7:23 PM, James Sewell wrote:
Those are all good questions.
Essentially this is a situation where DR is network separated from
Prod - so I would expect the archive command to fail. I'll have to
check the script; it must not be passing the error back through to
PostgreSQL.
This st
Those are all good questions.
Essentially this is a situation where DR is network separated from Prod -
so I would expect the archive command to fail. I'll have to check the
script; it must not be passing the error back through to PostgreSQL.
This still shouldn't cause database corruption though r
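On the "script must not be passing the error back" point, a minimal sketch of what matters (the wrapper, file names, and directories here are hypothetical stand-ins, not the actual script in use): whatever archive_command runs must exit non-zero when the copy fails, otherwise PostgreSQL believes the segment was archived and recycles it.

```shell
#!/bin/sh
# Toy stand-in for an archive_command wrapper. The key point: let the
# copy's real exit status become the script's exit status, rather than
# swallowing it with something like '|| true'. All paths are illustrative.
archive() {
    # $1 = full path to the WAL segment (%p), $2 = archive directory
    cp "$1" "$2/$(basename "$1")"   # cp's status is returned to the caller
}

tmp=$(mktemp -d)
printf 'fake wal' > "$tmp/000000010000000300000088"

mkdir "$tmp/archive"
if archive "$tmp/000000010000000300000088" "$tmp/archive"; then
    echo "archived ok"
fi

# Simulate the DR link being down: the destination is unreachable, so the
# non-zero status must surface for PostgreSQL to retry the segment later.
if ! archive "$tmp/000000010000000300000088" "$tmp/missing" 2>/dev/null; then
    echo "archive failed: status propagated"
fi
```

A wrapper that logs and then unconditionally exits 0 would show exactly the symptom described: archiving "succeeds" during the network split.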
On Thu, Aug 11, 2016 at 10:39 PM, James Sewell
wrote:
> Hello,
>
> We recently experienced a critical failure when failing over to a DR
> environment.
>
> This is in the following environment:
>
>
>- 3 x PostgreSQL machines in Prod in a sync replication cluster
>- 3 x PostgreSQL machines in DR
Hello All,
The thing which I find a little worrying is that this 'corruption' was
introduced somewhere on the network path from PROD -> DR, and then also
cascaded to both other DR servers (either via replication or via
archive_command).
Is WAL corruption checked for in any way on standby servers?
Here
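(On the CRC question, a toy illustration of why a per-record checksum catches bit flips. `cksum` here is only a stand-in for the CRC PostgreSQL actually stores in each WAL record header, and the "record" contents are made up:)

```shell
#!/bin/sh
# Toy demo: any byte change alters the checksum, which is how a reader
# that verifies a per-record CRC (as PostgreSQL does when replaying WAL)
# notices damage. The "record" payload is invented for the demo.
tmp=$(mktemp -d)
printf 'wal record payload' > "$tmp/record"
good=$(cksum < "$tmp/record")

printf 'wal recorD payload' > "$tmp/record"   # simulate one flipped byte
bad=$(cksum < "$tmp/record")

if [ "$good" != "$bad" ]; then
    echo "corruption detected"
fi
```

Note this only detects damage introduced after the CRC was computed; if the sender writes a valid CRC over already-bad data, every downstream CRC check passes.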
Hello,
I double posted this (posted once from an unregistered email and assumed it
would be junked).
I'm continuing all discussion on the other thread now.
Cheers,
James Sewell,
PostgreSQL Team Lead / Solutions Architect
Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P* (+
(from other thread)
- 9.5.3
- Redhat 7.2 on VMWare
- Single PostgreSQL instance on each machine
- Every machine in DR became corrupt, so interestingly the corruption must
have been sent to the two cascading nodes via WAL before the crash on the
hub DR node
- No OS logs indicating anyt
James Sewell wrote:
> 2016-08-12 04:43:53 GMT [23614]: [5-1] user=,db=,client= (0:0)LOG:
> consistent recovery state reached at 3/8811DFF0
> 2016-08-12 04:43:53 GMT [23614]: [6-1] user=,db=,client= (0:XX000)FATAL:
> invalid memory alloc request size 3445219328
> 2016-08-12 04:43:53 GMT [
On Fri, Aug 12, 2016 at 1:39 AM, James Sewell
wrote:
> Hello,
>
> We recently experienced a critical failure when failing over to a DR
> environment.
>
> This is in the following environment:
>
>
>- 3 x PostgreSQL machines in Prod in a sync replication cluster
>- 3 x PostgreSQL machines in DR,
Hello,
We recently experienced a critical failure when failing over to a DR environment.
This is in the following environment:
- 3 x PostgreSQL machines in Prod in a sync replication cluster
- 3 x PostgreSQL machines in DR, with a single machine async and the
other two cascading from the fi
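For reference, a sketch of what the recovery.conf on one of the cascading DR standbys might look like on 9.5. This is an assumption about the topology described above, not the actual configuration; host names, the application_name, and the archive path are illustrative:

```
standby_mode = 'on'
# cascade from the async DR hub rather than directly from Prod
primary_conninfo = 'host=dr-hub port=5432 application_name=dr_cascade1'
restore_command = 'cp /archive/%f %p'
```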