Re: [HACKERS] Funny WAL corruption issue

2017-08-11 Thread Chris Travers
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark wrote: > On 10 August 2017 at 15:26, Chris Travers wrote: > > > > > > The bitwise comparison is interesting. Remember the error was: > > > > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8:

Re: [HACKERS] Funny WAL corruption issue

2017-08-11 Thread Greg Stark
On 10 August 2017 at 15:26, Chris Travers wrote: > > > The bitwise comparison is interesting. Remember the error was: > > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset >

Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
On Thu, Aug 10, 2017 at 3:17 PM, Vladimir Rusinov wrote: > > > On Thu, Aug 10, 2017 at 1:48 PM, Aleksander Alekseev < > a.aleks...@postgrespro.ru> wrote: > >> I just wanted to point out that a hardware issue or third party software >> issues (bugs in FS, software RAID, ...)

Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Vladimir Rusinov
On Thu, Aug 10, 2017 at 1:48 PM, Aleksander Alekseev < a.aleks...@postgrespro.ru> wrote: > I just wanted to point out that a hardware issue or third party software > issues (bugs in FS, software RAID, ...) could not be fully excluded from > the list of suspects. According to the talk by

Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Aleksander Alekseev
Hi Chris, > I ran into a funny situation today regarding PostgreSQL replication and > WAL corruption and wanted to go over what I think happened and what I > wonder about as a possible solution. Sad story. Unfortunately, I have no idea what the cause could be, nor can I suggest a good way to find

Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
> Yes. Exactly the same output until a certain point where pg_xlogdump dies > with an error. That is the same LSN where PostgreSQL died with an error on > restart. > One other thing that is possibly relevant after reading through the last bug report is the error pg_xlogdump gives: pg_xlogdump:

Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
Sorry, meant to reply all. On Thu, Aug 10, 2017 at 2:19 PM, Vladimir Borodin wrote: > Hi, Chris. > > On 10 Aug 2017, at 15:09, Chris Travers > wrote: > > Hi; > > I ran into a funny situation today regarding PostgreSQL replication and > WAL

Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Vladimir Borodin
Hi, Chris. > On 10 Aug 2017, at 15:09, Chris Travers wrote: > > Hi; > > I ran into a funny situation today regarding PostgreSQL replication and WAL > corruption and wanted to go over what I think happened and what I wonder > about as a possible solution. > >

[HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
Hi; I ran into a funny situation today regarding PostgreSQL replication and WAL corruption and wanted to go over what I think happened and what I wonder about as a possible solution. Basic information: a custom-built PostgreSQL 9.6.3 on Gentoo, with a ~5TB database under variable load. Master