Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Stefan Kaltenbrunner wrote:
> >
> > > well - i now have a core file but it does not seem to be much worth
> > > except to prove that autovacuum seems to be the culprit:
> > >
> > > Core was generated by `postgres: autovacuum worker process
> > > '.
> > > Program terminated with signal 6, Aborted.
> > >
> > > [...]
> > >
> > > #0 0x00000ed9 in ?? ()
> > > warning: GDB can't find the start of the function at 0xed9.
>
> I just noticed an ugly bug in the worker code which I'm fixing. I think
> this one would also throw SIGSEGV, not SIGABRT.
Nailed it -- this is the actual bug that causes the abort. But I am
surprised that it doesn't print the error message on Stefan's machine;
here it outputs:
TRAP: FailedAssertion("!((((unsigned long)(elem)) > ShmemBase))", File: "/pgsql/source/00head/src/backend/storage/ipc/shmqueue.c", Line: 107)
16496 2007-05-02 11:30:31 CLT DEBUG: server process (PID 16540) was terminated by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG: server process (PID 16540) was terminated by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG: terminating any other active server processes
16496 2007-05-02 11:30:31 CLT DEBUG: sending SIGQUIT to process 16541
16496 2007-05-02 11:30:31 CLT DEBUG: sending SIGQUIT to process 16498
16496 2007-05-02 11:30:31 CLT DEBUG: sending SIGQUIT to process 16500
16496 2007-05-02 11:30:31 CLT DEBUG: sending SIGQUIT to process 16499
16541 2007-05-02 11:30:33 CLT WARNING: terminating connection because of crash of another server process
Maybe stderr is going somewhere else? That would be strange, I think.
I'll commit the fix shortly; attached.
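
For anyone who doesn't want to read the diff, the idea of the fix can be
sketched in isolation. This is only an illustration, not the autovacuum.c
code: SharedArea, WorkerSlot, slot_from_offset and claim_starting_slot are
made-up names, INVALID_OFFSET is redefined locally, and the locking is left
out. The point is just that the "starting" offset has to be checked against
the invalid marker before it is turned into a pointer, because the launcher
may already have cleared it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL definitions. */
#define INVALID_OFFSET ((size_t) -1)

typedef struct WorkerSlot
{
    unsigned int dboid;       /* database the worker should process */
    int          workerpid;   /* set by the worker once it attaches */
} WorkerSlot;

typedef struct SharedArea
{
    size_t     starting_worker;   /* offset of the slot being started,
                                   * or INVALID_OFFSET if there is none */
    WorkerSlot slots[4];
} SharedArea;

/* Turn an offset inside the shared area into a pointer. */
static WorkerSlot *
slot_from_offset(SharedArea *shm, size_t off)
{
    /* Converting the invalid marker would yield a garbage pointer. */
    assert(off != INVALID_OFFSET);
    return (WorkerSlot *) ((char *) shm + off);
}

/*
 * Claim the "starting" slot if there still is one.  Returns the slot, or
 * NULL when the launcher has already given up on this worker.
 * (A real implementation would hold a lock around all of this.)
 */
static WorkerSlot *
claim_starting_slot(SharedArea *shm, int mypid)
{
    WorkerSlot *slot;

    if (shm->starting_worker == INVALID_OFFSET)
        return NULL;            /* no worker entry for me, go away */

    slot = slot_from_offset(shm, shm->starting_worker);
    slot->workerpid = mypid;

    /* clear the marker so the launcher can start another worker */
    shm->starting_worker = INVALID_OFFSET;
    return slot;
}
```

A caller that gets NULL back should simply skip the work and exit, which is
what the else branch in the patch amounts to.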
--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"The first law of live demos is: don't try to use the system.
Write a script that touches nothing, so as to cause no damage." (Jakob Nielsen)
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.42
diff -c -p -r1.42 autovacuum.c
*** src/backend/postmaster/autovacuum.c 18 Apr 2007 16:44:18 -0000 1.42
--- src/backend/postmaster/autovacuum.c 2 May 2007 15:25:27 -0000
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1407,1431 ****
* Get the info about the database we're going to work on.
*/
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! dbid = MyWorkerInfo->wi_dboid;
! MyWorkerInfo->wi_workerpid = MyProcPid;
!
! /* insert into the running list */
! SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers,
! &MyWorkerInfo->wi_links);
/*
! * remove from the "starting" pointer, so that the launcher can start a new
! * worker if required
*/
! AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! LWLockRelease(AutovacuumLock);
! on_shmem_exit(FreeWorkerInfo, 0);
! /* wake up the launcher */
! if (AutoVacuumShmem->av_launcherpid != 0)
! kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);
if (OidIsValid(dbid))
{
--- 1407,1442 ----
* Get the info about the database we're going to work on.
*/
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
!
/*
! * beware of startingWorker being INVALID; this could happen if the
! * launcher thinks we're taking too long to start.
*/
! if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
! {
! MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! dbid = MyWorkerInfo->wi_dboid;
! MyWorkerInfo->wi_workerpid = MyProcPid;
!
! /* insert into the running list */
! SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers,
! &MyWorkerInfo->wi_links);
! /*
! * remove from the "starting" pointer, so that the launcher can start a new
! * worker if required
! */
! AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! LWLockRelease(AutovacuumLock);
! on_shmem_exit(FreeWorkerInfo, 0);
! /* wake up the launcher */
! if (AutoVacuumShmem->av_launcherpid != 0)
! kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);
! }
! else
! /* no worker entry for me, go away */
! LWLockRelease(AutovacuumLock);
if (OidIsValid(dbid))
{
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1466,1473 ****
}
/*
! * FIXME -- we need to notify the launcher when we are gone. But this
! * should be done after our PGPROC is released, in ProcKill.
*/
/* All done, go away */
--- 1477,1484 ----
}
/*
! * The launcher will be notified of my death in ProcKill, *if* we managed
! * to get a worker slot at all
*/
/* All done, go away */