Re: 'Invalid lp' during heap_xlog_delete

2019-12-10 Daniel Wood
> On December 6, 2019 at 3:06 PM Andres Freund wrote:
...
> > crash
> > smgrtruncate - Not reached
>
> This seems like a somewhat confusing description to me, because
> smgrtruncate() is what calls DropRelFileNodeBuffers(). I assume what you
> mean by "smgrtruncate" is not the function, but the

Re: 'Invalid lp' during heap_xlog_delete

2019-12-06 Andres Freund
Hi,

On 2019-11-08 12:46:51 -0800, Daniel Wood wrote:
> Page on disk has empty lp 1
> * Insert into page lp 1
>
> checkpoint START. Redo eventually starts here.
> ** Delete all rows on page.

It's worthwhile to note that this part cannot happen without full page writes disabled. By dint of a
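
Andres's point that this sequence requires full page writes to be disabled can be illustrated with a toy model. This is a hypothetical sketch, not PostgreSQL code: `redo_delete` and `InvalidLp` are invented names, and a page is reduced to the set of its used line pointers. It assumes, as in the timeline, that the delete is the first change to the page after the checkpoint's redo point. With full_page_writes on, such a record carries a full-page image, so redo rebuilds the page from WAL instead of trusting the stale copy on disk.

```python
# Hypothetical sketch: how a full-page image (FPI) changes redo of a heap
# delete. Pages are modelled as sets of used line pointers (lp numbers).


class InvalidLp(Exception):
    """Stands in for the 'invalid lp' failure during heap delete redo."""


def redo_delete(disk_page, lp, full_page_image=None):
    """Replay a delete of `lp` against `disk_page` and return the new page.

    With an image attached, redo overwrites the page from WAL and never
    reads the (possibly stale) on-disk copy; without one, it must trust disk.
    """
    if full_page_image is not None:
        return set(full_page_image)
    if lp not in disk_page:
        raise InvalidLp(f"invalid lp {lp}")
    return set(disk_page)  # a delete marks the tuple dead but keeps the lp


stale_disk_page = set()  # the insert of lp 1 was never written out

# full_page_writes on: the first change after the redo point carries an image.
print(redo_delete(stale_disk_page, 1, full_page_image={1}))  # {1}

# full_page_writes off: redo has only the stale on-disk page to work with.
try:
    redo_delete(stale_disk_page, 1)
except InvalidLp as err:
    print("redo failed:", err)  # redo failed: invalid lp 1
```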

Re: 'Invalid lp' during heap_xlog_delete

2019-11-18 Daniel Wood
> I'll try to look into that with lower page sizes for relation and WAL
> pages.

The page size is totally unrelated to this bug. When you repro the redo failure it is because the log record is being applied to an old page version. The correct newer page version never got written because

Re: 'Invalid lp' during heap_xlog_delete

2019-11-18 Michael Paquier
On Thu, Nov 14, 2019 at 07:38:19PM -0800, Daniel Wood wrote:
> Sorry I missed one thing. Turn off full page writes.

Hmm. Linux FSes typically use 4kB pages. I'll try to look into that with lower page sizes for relation and WAL pages.

> I'm running in an env. with atomic 8K writes.

What's

Re: 'Invalid lp' during heap_xlog_delete

2019-11-14 Daniel Wood
Sorry, I missed one thing. Turn off full page writes. I'm running in an env. with atomic 8K writes.

> On November 12, 2019 at 6:23 PM Daniel Wood wrote:
>
> It's been tedious to get it exactly right but I think I got it. FYI, I
> was delayed because today we had yet another customer hit

Re: 'Invalid lp' during heap_xlog_delete

2019-11-12 Daniel Wood
It's been tedious to get it exactly right but I think I got it. FYI, I was delayed because today we had yet another customer hit this: 'redo max offset' error. The system crashed as a number of autovacuums and a checkpoint happened and then the REDO failure. Two tiny code changes:

Re: 'Invalid lp' during heap_xlog_delete

2019-11-10 Michael Paquier
On Fri, Nov 08, 2019 at 06:44:08PM -0800, Daniel Wood wrote:
> I repro'ed on PG11 and PG10 STABLE but several months old.
> I looked at 6d05086 but it doesn't address the core issue.
>
> DropRelFileNodeBuffers prevents the checkpoint from writing all
> needed dirty pages for any REDO's that exist

Re: 'Invalid lp' during heap_xlog_delete

2019-11-08 Daniel Wood
I repro'ed on PG11 and PG10 STABLE but several months old. I looked at 6d05086 but it doesn't address the core issue. DropRelFileNodeBuffers prevents the checkpoint from writing all needed dirty pages for any REDO's that exist BEFORE the truncate. If we crash after a checkpoint but before the

'Invalid lp' during heap_xlog_delete

2019-11-08 Daniel Wood
Page on disk has empty lp 1
* Insert into page lp 1

checkpoint START. Redo eventually starts here.
** Delete all rows on page.
autovac truncate
DropRelFileNodeBuffers - dirty page NOT written. lp 1 on disk still empty
checkpoint completes
crash
smgrtruncate - Not reached
heap_xlog_delete
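
The timeline in the original report can be replayed in a small toy model to make the failure window concrete. This is a sketch under stated assumptions, not PostgreSQL code: all names (`simulate`, `InvalidLp`, and the dictionaries standing in for disk, shared buffers and WAL) are hypothetical, pages are reduced to sets of used line pointers, full_page_writes is assumed off, and vacuum's own WAL records are omitted for brevity.

```python
# Toy replay of the reported timeline with full_page_writes off.
# Nothing here is PostgreSQL code: every name is hypothetical. A page is
# modelled as the set of its used line pointers (lp numbers).


class InvalidLp(Exception):
    """Stands in for the 'invalid lp' failure during heap delete redo."""


def simulate():
    disk = {0: set()}  # block 0 on disk: lp 1 is empty
    buffers = {}       # shared buffers: block -> used line pointers
    wal = []           # WAL records: (lsn, operation, block, lp)

    def log(op, block, lp):
        wal.append((len(wal) + 1, op, block, lp))

    # * Insert into page lp 1: only the buffer changes, so it is now dirty.
    buffers[0] = disk[0] | {1}
    log("insert", 0, 1)

    # checkpoint START: crash recovery will begin redo at the next LSN.
    redo_lsn = len(wal) + 1

    # ** Delete all rows on page: a heap delete keeps lp 1 in use and,
    # with full_page_writes off, logs no full-page image.
    log("delete", 0, 1)

    # autovacuum truncate, first step: discard the doomed buffers without
    # writing them out (the DropRelFileNodeBuffers step).
    del buffers[0]

    # checkpoint completes: it can flush only buffers that still exist,
    # so the insert never reaches disk.
    for block, page in buffers.items():
        disk[block] = set(page)

    # crash before the file is truncated: block 0 survives on disk in its
    # pre-insert state. Recovery then replays WAL from the redo point.
    for lsn, op, block, lp in wal:
        if lsn < redo_lsn:
            continue  # the insert precedes the redo point: not replayed
        if op == "delete" and lp not in disk[block]:
            raise InvalidLp(f"lsn {lsn}: {op} on block {block}: invalid lp {lp}")


try:
    simulate()
except InvalidLp as err:
    print("redo failed:", err)
```

Running it prints `redo failed: lsn 2: delete on block 0: invalid lp 1`. In this model, the delete is the first record at or after the redo point, but the page version it expects was discarded before the checkpoint could write it, and the crash came before the truncation that would have removed the block.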