Re: [HACKERS] How to handle waitingForLock in LockWaitCancel()

2001-03-08 Thread Hiroshi Inoue
Tom Lane wrote:
 
 Hiroshi Inoue [EMAIL PROTECTED] writes:
  Is it OK to commit the change before 7.1 release ?
  I want to do it before forgetting this issue.
 
 If that fixes the problem for you, then commit it.  I was waiting to
 hear back whether you still saw a crash or not...
 

I see no crash in my test case.
I would commit the change.

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] How to handle waitingForLock in LockWaitCancel()

2001-03-05 Thread Hiroshi Inoue
Tom Lane wrote:
 
 Hiroshi Inoue [EMAIL PROTECTED] writes:
  I sometimes encountered SEGV errors in my test case
  when I canceled the execution.
 
 Can you provide backtraces from these SEGVs?
 

ISTM the backtrace isn't sufficient to figure out
how the error occured. As far as I see the error
was caused by other backends which terimnated
safely with the similar sequence of signals.
I have a (about 7MB) postmaster log with the
trace_locks option on. I could send it to you
if you want. Note that I applied the following
patch to debug my suspicion. In addition I added
an SegvException Handler to postgres.c.

diff -c -r1.35 proc.c
*** storage/lmgr/proc.c 2001/01/29 10:00:58 1.35
--- storage/lmgr/proc.c 2001/03/05 07:56:56
***
*** 327,332 
--- 327,334 
if (!waitingForLock)
return false;
  
+   if (MyProc-links.next != INVALID_OFFSET)
+   fprintf(stderr, "LockWaitCancel pid=%d must
RemoveFromWaitQueue\n", MyProc-pid);
waitingForLock = false;
  
/* Turn off the deadlock timer, if it's still running (see
ProcSleep) */
 
And the backtrace is as follows.

#0  0x40130db7 in __libc_pause ()
#1  0x811981c in SegvExceptionHandler (postgres_signal_arg=11)
at postgres.c:966
#2  signal handler called
#3  0x8482 in ?? ()
#4  0x811547e in ProcLockWakeup (lockMethodTable=0x8204260,
lock=0x409d6bdc)
at proc.c:771
#5  0x8114500 in LockReleaseAll (lockmethod=1, proc=0x409d9d20, 
allxids=1 '\001', xid=0) at lock.c:1438
#6  0x81150b4 in ProcKill () at proc.c:446
#7  0x810df3b in shmem_exit (code=0) at ipc.c:187
#8  0x810dead in proc_exit (code=0) at ipc.c:146
#9  0x815891e in elog (lev=1, fmt=0x81aee71 "The system is shutting
down")
at elog.c:465
#10 0x8119927 in ProcessInterrupts () at postgres.c:1039
#11 0x810c526 in s_lock (lock=0x40195016 "\001", file=0x81ac4ea
"spin.c", 
line=156) at s_lock.c:140
#12 0x810fec7 in SpinAcquire (lockid=6) at spin.c:156
#13 0x8114f74 in LockWaitCancel () at proc.c:349
#14 0x81197d8 in die (postgres_signal_arg=15) at postgres.c:947
#15 signal handler called
#16 0x8482 in ?? ()
#17 0x400ffcd4 in _IO_old_file_xsputn (f=0x81ce3a8, data=0xbfffc288,
n=49)
at oldfileops.c:294
#18 0x400f1acf in buffered_vfprintf (s=0x81ce3a8, 
format=0x81adba0 "LockWaitCancel pid=%d must RemoveFromWaitQueue\n", 
args=0xbfffe8c8) at vfprintf.c:1767
#19 0x400ed5cc in _IO_vfprintf (s=0x81ce3a8, 
format=0x81adba0 "LockWaitCancel pid=%d must RemoveFromWaitQueue\n", 
ap=0xbfffe8c8) at vfprintf.c:1029
#20 0x400f4f03 in fprintf (stream=0x81ce3a8, 
format=0x81adba0 "LockWaitCancel pid=%d must RemoveFromWaitQueue\n")
at fprintf.c:32
#21 0x8114f25 in LockWaitCancel () at proc.c:331
#22 0x811988c in QueryCancelHandler (postgres_signal_arg=2) at
postgres.c:993
#23 signal handler called
#24 0x8482 in ?? ()
#25 0x810e22b in IpcSemaphoreLock (semId=32768, sem=8, interruptOK=1
'\001')
at ipc.c:423
#26 0x811530f in ProcSleep (lockMethodTable=0x8204260, lockmode=4, 
lock=0x409d726c, holder=0x409d7e08) at proc.c:669
#27 0x8112eb3 in WaitOnLock (lockmethod=1, lockmode=4, lock=0x409d726c, 
holder=0x409d7e08) at lock.c:995
#28 0x81124d8 in LockAcquire (lockmethod=1, locktag=0xbfffeb5c,
xid=373273, 
lockmode=4) at lock.c:779
#29 0x8111372 in XactLockTableWait (xid=373282) at lmgr.c:309
#30 0x80cf5ef in EvalPlanQual (estate=0x8260a0c, rti=1, tid=0xbfffebe0)
at execMain.c:1751
#31 0x80ceebd in ExecReplace (slot=0x8260bec, tupleid=0xbfffec3c, 
estate=0x8260a0c) at execMain.c:1449
#32 0x80ceb2e in ExecutePlan (estate=0x8260a0c, plan=0x8260964, 
operation=CMD_UPDATE, numberTuples=0,
direction=ForwardScanDirection, 
destfunc=0x82611c0) at execMain.c:1119
#33 0x80cdf73 in ExecutorRun (queryDesc=0x82609f0, estate=0x8260a0c, 
feature=3, count=0) at execMain.c:233
#34 0x811ac2b in ProcessQuery (parsetree=0x825379c, plan=0x8260964, 
dest=Remote) at pquery.c:297
#35 0x811964b in pg_exec_query_string (
query_string=0x8253400 "update branches set bbalance = bbalance +
687 where bid = 1", dest=Remote, parse_context=0x8237640) at
postgres.c:810
#36 0x811a712 in PostgresMain (argc=9, argv=0xbfffee9c, real_argc=12, 
real_argv=0xb774, username=0x82104b9 "reindex") at
postgres.c:1900
#37 0x80ffb03 in DoBackend (port=0x8210250) at postmaster.c:2080
#38 0x80ff6fa in BackendStartup (port=0x8210250) at postmaster.c:1863
#39 0x80fea96 in ServerLoop () at postmaster.c:963
#40 0x80fe527 in PostmasterMain (argc=12, argv=0xb774) at
postmaster.c:662
#41 0x80df8e9 in main (argc=12, argv=0xb774) at main.c:149

Regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] How to handle waitingForLock in LockWaitCancel()

2001-03-05 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:
 [ backtrace snipped ]

Hmm, this is definitely not operating as intended: LockWaitCancel is
getting interrupted, because ProcessInterrupts may be called when it's
trying to acquire the lockmanager spinlock, and ProcessInterrupts will
see the ProcDiePending flag already set.  I think the correct fix (or
at least part of it) is in postgres.c's die():

/*
 * If it's safe to interrupt, and we're waiting for input or a lock,
 * service the interrupt immediately
 */
if (ImmediateInterruptOK  InterruptHoldoffCount == 0 
CritSectionCount == 0)
{
+   /* bump holdoff count to make ProcessInterrupts() a no-op */
+   /* until we are done getting ready for it */
+   InterruptHoldoffCount++;
DisableNotifyInterrupt();
/* Make sure HandleDeadLock won't run while shutting down... */
LockWaitCancel();
+   InterruptHoldoffCount--;
ProcessInterrupts();
}

QueryCancelHandler probably needs similar additions.

I suspect you will find that these crashes occur during the window just
after the semop() call in IpcSemaphoreLock() --- see the comment
beginning at line 399 of ipc.c.  You could probably make the crash
easier to reproduce by inserting a delay there, if you want to test
more.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] How to handle waitingForLock in LockWaitCancel()

2001-03-05 Thread Hiroshi Inoue
Tom Lane wrote:
 
 Hiroshi Inoue [EMAIL PROTECTED] writes:
  [ backtrace snipped ]
 
 Hmm, this is definitely not operating as intended: LockWaitCancel is
 getting interrupted, because ProcessInterrupts may be called when it's
 trying to acquire the lockmanager spinlock, and ProcessInterrupts will
 see the ProcDiePending flag already set.  I think the correct fix (or
 at least part of it) is in postgres.c's die():
 
 /*
  * If it's safe to interrupt, and we're waiting for input or a lock,
  * service the interrupt immediately
  */
 if (ImmediateInterruptOK  InterruptHoldoffCount == 0 
 CritSectionCount == 0)
 {
 +   /* bump holdoff count to make ProcessInterrupts() a no-op */
 +   /* until we are done getting ready for it */
 +   InterruptHoldoffCount++;
 DisableNotifyInterrupt();
 /* Make sure HandleDeadLock won't run while shutting down... */
 LockWaitCancel();
 +   InterruptHoldoffCount--;
 ProcessInterrupts();
 }
 
 QueryCancelHandler probably needs similar additions.
 

Agreed. Adding similar code to QueryCancelHandler seems
sufficient.

 I suspect you will find that these crashes occur during the window just
 after

Sorry what does 'just after' mean ?
Isn't it during the semop() ?

 the semop() call in IpcSemaphoreLock() --- see the comment

Regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] How to handle waitingForLock in LockWaitCancel()

2001-03-05 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:
 I suspect you will find that these crashes occur during the window just
 after

 Sorry what does 'just after' mean ?
 Isn't it during the semop() ?

 the semop() call in IpcSemaphoreLock() --- see the comment

If an interrupt during the semop led to a crash, it would be easy to
reproduce.  I suspect that the crash condition arises only when the
interrupt occurs in a narrow time window ... such as the few
instructions just before or just after the semop call.  It's just a
hunch though.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] How to handle waitingForLock in LockWaitCancel()

2001-03-04 Thread Hiroshi Inoue
Hi,

I sometimes encountered SEGV errors in my test case
when I canceled the execution.
Probably it's due to the almost simultaneous arrival
of multiple signals and the following patch seems to
fix the bug. However I'm afraid that the change should
cause another bug.

Comments ?

Regards,
Hiroshi Inoue


Index: proc.c
===
RCS file:
/home/projects/pgsql/cvsroot/pgsql/src/backend/storage/lmgr/proc.c,v
retrieving revision 1.98
diff -c -c -r1.98 proc.c
*** proc.c  2001/01/26 18:23:12 1.98
--- proc.c  2001/03/05 02:28:09
***
*** 327,334 
if (!waitingForLock)
return false;

-   waitingForLock = false;
-
/* Turn off the deadlock timer, if it's still running (see
ProcSleep) */
  #ifndef __BEOS__
{
--- 327,332 
***
*** 345,350 
--- 343,349 

/* Unlink myself from the wait queue, if on it (might not be
anymore!) *
/
LockLockTable();
+   waitingForLock = false;
if (MyProc-links.next != INVALID_OFFSET)
RemoveFromWaitQueue(MyProc);
UnlockLockTable();

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] How to handle waitingForLock in LockWaitCancel()

2001-03-04 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:
 I sometimes encountered SEGV errors in my test case
 when I canceled the execution.

Can you provide backtraces from these SEGVs?

 Probably it's due to the almost simultaneous arrival
 of multiple signals and the following patch seems to
 fix the bug. However I'm afraid that the change should
 cause another bug.

I do not like that change at *all*.  In the first place, how could it
stop whatever is causing the SEGV?  The waitingForLock flag is not
examined anywhere else, so unless things are already broken this cannot
have any effect.  In the second place, postponing the reset of the
flag has the potential for an infinite loop, because this routine is
called during error exit.  Suppose LockLockTable causes an elog(ERROR)?

I think we need to look harder to find the cause of the SEGVs you are
seeing.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly