I wrote:
> I suppose that if we're unable to reproduce it on at least one other box,
> we have to write it off as hardware flakiness.

BTW, that conclusion shouldn't distract us from the very real bug
that Andres identified.  I was just scraping the buildfarm logs
for recent failures, and I found several cases that match the
symptom he reported:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2021-04-23%2022%3A27%3A41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-04-21%2005%3A15%3A24
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-20%2002%3A03%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-05-04%2004%3A07%3A41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-04-20%2021%3A08%3A59

They all show the standby in recovery/019_replslot_limit.pl failing
with symptoms like

2021-05-04 07:42:00.968 UTC [24707406:1] LOG:  database system was shut down in recovery at 2021-05-04 07:41:39 UTC
2021-05-04 07:42:00.968 UTC [24707406:2] LOG:  entering standby mode
2021-05-04 07:42:01.050 UTC [24707406:3] LOG:  redo starts at 0/1C000D8
2021-05-04 07:42:01.079 UTC [24707406:4] LOG:  consistent recovery state reached at 0/1D00000
2021-05-04 07:42:01.079 UTC [24707406:5] FATAL:  invalid memory alloc request size 1476397045
2021-05-04 07:42:01.080 UTC [13238274:3] LOG:  database system is ready to accept read only connections
2021-05-04 07:42:01.082 UTC [13238274:4] LOG:  startup process (PID 24707406) exited with exit code 1

(BTW, the behavior seen here, where the failure occurs *immediately*
after reporting "consistent recovery state reached", also shows up in
the other reports, including Andres' version.  I wonder if that
means anything.)
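
For anyone wanting to check other logs for the same pattern, here is a
minimal sketch in Python.  It is not part of the buildfarm tooling; the
function name is hypothetical, and it assumes each log entry sits on a
single line (as in the raw log file) with the
"timestamp [pid:seq] LEVEL:  message" prefix shown in the excerpt.
It reports a process only when the alloc failure is that process's
very next message after "consistent recovery state reached".

```python
import re

# Log prefix as in the excerpt, e.g.
# "2021-05-04 07:42:01.079 UTC [24707406:5] FATAL:  message"
LINE_RE = re.compile(
    r"^\S+ \S+ \S+ \[(?P<pid>\d+):(?P<seq>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
ALLOC_RE = re.compile(r"invalid memory alloc request size (\d+)")


def find_alloc_failures_after_consistency(log_text):
    """Return (pid, size) pairs where a process hits the alloc failure
    immediately after reporting that it reached consistency."""
    last_msg = {}  # pid -> most recent message logged by that process
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not carry the expected prefix
        pid, msg = m.group("pid"), m.group("msg")
        fail = ALLOC_RE.match(msg)
        previous = last_msg.get(pid, "")
        if (
            fail
            and m.group("level") == "FATAL"
            and previous.startswith("consistent recovery state reached")
        ):
            hits.append((int(pid), int(fail.group(1))))
        last_msg[pid] = msg
    return hits
```

Tracking the previous message per PID keeps interleaved postmaster
lines from hiding the adjacency.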

                        regards, tom lane

