On Thu, Apr 22, 2021 at 8:07 AM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> On 4/21/21 6:30 PM, Tom Lane wrote:
> > Thomas Munro <thomas.mu...@gmail.com> writes:
> >> Yeah, it would have been nice to include that but it'll have to be for
> >> v15 due to lack of time to convince myself that it was correct. I do
> >> intend to look into more concurrency of that kind for v15. I have
> >> pushed these patches, updated to be disabled by default.
> >
> > I have a fairly bad feeling about these patches. I've already fixed
> > one critical bug (see 9e4114822), but I am still seeing random, hard
> > to reproduce failures in WAL replay testing. It looks like sometimes
> > the "decoded" version of a WAL record doesn't match what I see in
> > the on-disk data, which I'm having no luck tracing down.

Ugh. Looking into this now. Also, this week I have been researching
a possible problem with e.g. ALTER TABLE SET TABLESPACE in the higher
level patch, which I'll write about soon.

> > I am not sure whether the checksum failure itself is real or a variant
> > of the seeming bad-reconstruction problem, but what I'm on about right
> > at this moment is that the error handling logic for this case seems
> > quite broken. Why is a checksum failure only worthy of a LOG message?
> > Why is ValidXLogRecord() issuing a log message for itself, rather than
> > being tied into the report_invalid_record() mechanism? Why are we
> > evidently still trying to decode records afterwards?
>
> Yeah, that seems suspicious.

I may have invited trouble by deciding to rebase on the other proposal
late in the cycle. That changed the interfaces around there.

> > In general, I'm not too pleased with the apparent attitude in this
> > thread that it's okay to push a patch that only mostly works on the
> > last day of the dev cycle and plan to stabilize it later.
>
> Was there such an attitude? I don't think people were arguing for pushing
> a patch that's not working correctly. The discussion was mostly about
> getting it committed and leaving some optimizations for v15.

That wasn't my plan, but I admit that the timing was non-ideal. In
any case, I'll dig into these failures and then consider options.
More soon.