Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Tom Lane
Peter Eisentraut writes: > On tor, 2009-11-12 at 10:45 -0500, Tom Lane wrote: >> In practice the code path isn't sufficiently used or critical >> enough to be worth trying to make that bulletproof. > Well, the subject line is "recovery is stuck". Not critical enough? The particular case looks l

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Marko Kreen
On 11/12/09, Tom Lane wrote: > Marko Kreen writes: > > You talked about blocking in quickdie(), but you'd need > > to block in elog(). > > I'm not really particularly worried about that case. By that logic, > we could not use quickdie at all, because any part of the system > might be doing s

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Peter Eisentraut
On tor, 2009-11-12 at 10:45 -0500, Tom Lane wrote: > In practice the code path isn't sufficiently used or critical > enough to be worth trying to make that bulletproof. Well, the subject line is "recovery is stuck". Not critical enough?

[ADMIN] recovery lag question

2009-11-12 Thread John Lister
Hi, I've set up a warm standby box with postgresql 8.3.8 and pg_standby. Everything seems to be ok, the wal files are being copied across and being processed during the recovery as you'd expect but I have one question. The recovery seems to be processing the final wal files at the same rate as t

[ADMIN] recovery lag question

2009-11-12 Thread John Lister
Hi, I've set up a warm standby box with postgresql 8.3.8 and pg_standby. Everything seems to be ok, and the wal files are being copied across and being processed during the recovery as you'd expect, but I have one question. The recovery seems to be processing the final wal files (about 30)

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Tom Lane
Marko Kreen writes: > You talked about blocking in quickdie(), but you'd need > to block in elog(). I'm not really particularly worried about that case. By that logic, we could not use quickdie at all, because any part of the system might be doing something that wouldn't survive being interrupte

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Marko Kreen
On 11/12/09, Tom Lane wrote: > Marko Kreen writes: > > On 11/12/09, Tom Lane wrote: > >> The other thought is that quickdie should block signals before > >> starting to do anything. > > > There would still be possibility of recursive syslog() calls. > > Shouldn't we fix that too? > > > That

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Tom Lane
Marko Kreen writes: > On 11/12/09, Tom Lane wrote: >> The other thought is that quickdie should block signals before >> starting to do anything. > There would still be possibility of recursive syslog() calls. > Shouldn't we fix that too? That's what the signal block would do.
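For readers following the thread, the idea of blocking signals at the start of the handler could look roughly like the sketch below. It is a minimal illustration in plain POSIX C, not PostgreSQL's actual quickdie(): the names block_all_signals, quickdie_sketch and install_quickdie_sketch are hypothetical, and a single-threaded process is assumed (sigprocmask is unspecified in multithreaded programs). Once every signal is blocked, no other handler can interrupt the SIGQUIT handler and re-enter a logging call that holds a lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: block every blockable signal for this process.
 * Returns 0 on success and -1 on failure, like sigprocmask() itself. */
static int block_all_signals(void)
{
    sigset_t all;

    sigfillset(&all);
    return sigprocmask(SIG_SETMASK, &all, NULL);
}

/* Hypothetical SIGQUIT handler: block signals first, then exit at once.
 * This is a sketch of the idea discussed above, not PostgreSQL code. */
static void quickdie_sketch(int signo)
{
    (void) signo;

    /* From here on, no other signal handler can run in this process. */
    block_all_signals();

    /* Any last-gasp reporting would go here. It still runs in signal
     * context, so it cannot help if the interrupted code was itself
     * inside the logging call. */

    _exit(2);   /* async-signal-safe; skips atexit handlers */
}

/* Hypothetical installer: register the handler with all signals masked
 * for the duration of the handler. Returns 0 on success, -1 on failure. */
static int install_quickdie_sketch(void)
{
    struct sigaction sa;

    sa.sa_handler = quickdie_sketch;
    sigfillset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGQUIT, &sa, NULL);
}
```

Filling sa_mask in sigaction() already masks signals while the handler runs; the explicit sigprocmask() call mirrors the "block signals before starting to do anything" wording and also covers a handler installed without such a mask. As noted in the replies, neither step helps when SIGQUIT arrives while the interrupted code is already inside syslog().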

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Marko Kreen
On 11/12/09, Tom Lane wrote: > The other thought is that quickdie should block signals before > starting to do anything. There would still be possibility of recursive syslog() calls. Shouldn't we fix that too? I'm not sure how exactly. If the recursive elog() must stay, then perhaps simple 'v

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Tom Lane
Peter Eisentraut writes: >>> strace on the backend processes all showed them waiting at >>> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL >>> Notably, the first argument was the same for all of them. > Looks like a race condition or lockup in the syslog code. Hm, why are there two calls in

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Marko Kreen
On 11/12/09, Peter Eisentraut wrote: > On lör, 2009-09-26 at 12:19 -0400, Tom Lane wrote: > > Peter Eisentraut writes: > > > strace on the backend processes all showed them waiting at > > > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL > > > Notably, the first argument was the same for al

Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

2009-11-12 Thread Peter Eisentraut
On lör, 2009-09-26 at 12:19 -0400, Tom Lane wrote: > Peter Eisentraut writes: > > strace on the backend processes all showed them waiting at > > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL > > Notably, the first argument was the same for all of them. > > Probably means they are blocked on s