Re: [HACKERS] Quite strange crash

2001-01-09 Thread Vadim Mikheev
Well, it's not a good idea because SIGTERM is used for ABORT + EXIT (pg_ctl -m fast stop), but shouldn't ABORT clean up everything? Er, shouldn't ABORT leave the system in the exact state that it's in so that one can get a crashdump/traceback on a wedged process without it trying to

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
[EMAIL PROTECTED] (Nathan Myers) writes: The relevance to the issue at hand is that processes dying during heavy memory load is a documented feature of our supported platforms. Ugh. Do you know anything about *how* they get killed --- ie, with what signal? regards,

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
Denis Perchine [EMAIL PROTECTED] writes: Didn't you get my mail with a piece of Linux kernel code? I think all is clear there. That was implementing CPU-time-exceeded kill, which is a different issue. regards, tom lane

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Denis Perchine
Didn't you get my mail with a piece of Linux kernel code? I think all is clear there. That was implementing CPU-time-exceeded kill, which is a different issue. Oops... You are talking about the OOM killer. /* This process has hardware access, be more careful. */ if (cap_t(p->cap_effective)

RE: [HACKERS] Quite strange crash

2001-01-09 Thread Mikheev, Vadim
START_/END_CRIT_SECTION is mostly CritSectionCount++/--. Recording could be made as LockedSpinLocks[LockedSpinCounter++] = spinlock in pre-allocated array. Yeah, I suppose. We already do record locking of all the fixed spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks
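
The bookkeeping described in this message can be sketched in C as follows. This is a minimal illustration under stated assumptions, not PostgreSQL's actual code: the `slock_t` stand-in type, the array size, and the helper functions `record_spin_acquire`, `record_spin_release`, and `release_all_recorded_spinlocks` are hypothetical names. Only `CritSectionCount`, `LockedSpinLocks`, and `LockedSpinCounter` come from the message itself.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capacity of the pre-allocated array (hypothetical value). */
#define MAX_LOCKED_SPINLOCKS 8

/* Stand-in for a real spinlock type: 0 = free, 1 = held (assumption). */
typedef volatile int slock_t;

static int CritSectionCount = 0;
static slock_t *LockedSpinLocks[MAX_LOCKED_SPINLOCKS];
static int LockedSpinCounter = 0;

/* As the message says, these are mostly CritSectionCount++ / --. */
#define START_CRIT_SECTION() (CritSectionCount++)
#define END_CRIT_SECTION()   (assert(CritSectionCount > 0), CritSectionCount--)

/* Take the lock and remember it in the pre-allocated array. */
void record_spin_acquire(slock_t *lock)
{
    assert(LockedSpinCounter < MAX_LOCKED_SPINLOCKS);
    *lock = 1;
    LockedSpinLocks[LockedSpinCounter++] = lock;
}

/* Release the lock and drop its entry from the array. */
void record_spin_release(slock_t *lock)
{
    for (int i = LockedSpinCounter - 1; i >= 0; i--)
    {
        if (LockedSpinLocks[i] == lock)
        {
            /* Fill the hole with the last recorded entry. */
            LockedSpinLocks[i] = LockedSpinLocks[--LockedSpinCounter];
            *lock = 0;
            return;
        }
    }
}

/* Error-path cleanup: release every lock still recorded; return the count. */
int release_all_recorded_spinlocks(void)
{
    int released = 0;

    while (LockedSpinCounter > 0)
    {
        *LockedSpinLocks[--LockedSpinCounter] = 0;
        released++;
    }
    return released;
}
```

With a record like this, an abort path could walk the array and release whatever is still held, which is the gap the thread identifies for the per-buffer spinlocks.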

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
"Mikheev, Vadim" [EMAIL PROTECTED] writes: Yeah, I suppose. We already do record locking of all the fixed spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that are missing from that (and CRIT_SECTION calls). Would it be reasonable to assume that only one buffer spinlock could

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
Denis Perchine [EMAIL PROTECTED] writes: You will get SIGKILL in most cases. Well, a SIGKILL will cause the postmaster to shut down and restart the other backends, so we should be safe if that happens. (Annoyed as heck, maybe, but safe.) Anyway, this is looking more and more like the SIGTERM

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Nathan Myers
On Wed, Jan 10, 2001 at 12:46:50AM +0600, Denis Perchine wrote: Didn't you get my mail with a piece of Linux kernel code? I think all is clear there. That was implementing CPU-time-exceeded kill, which is a different issue. Oops... You are talking about the OOM killer. /* This

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
[EMAIL PROTECTED] (Nathan Myers) writes: If a backend dies while holding a lock, doesn't that imply that the shared memory may be in an inconsistent state? Yup. I had just come to the realization that we'd be best off to treat the *entire* period from SpinAcquire to SpinRelease as a critical

RE: [HACKERS] Quite strange crash

2001-01-09 Thread Mikheev, Vadim
Yup. I had just come to the realization that we'd be best off to treat the *entire* period from SpinAcquire to SpinRelease as a critical section for the purposes of die(). That is, response to SIGTERM will be held off until we have released the spinlock. Most of the places where we grab
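
One way to picture the proposal above (holding off the response to SIGTERM until the spinlock is released) is the C sketch below. It is an illustration under assumptions, not the code that went into PostgreSQL: `DiePending`, `DieHandled`, `process_die`, `die_handler`, `spin_acquire_guarded`, and `spin_release_guarded` are hypothetical names, and the actual lock operations are left out. A real backend would exit in `process_die` rather than set a flag.

```c
#include <assert.h>
#include <signal.h>

/* Nonzero while the process is between acquire and release. */
static volatile sig_atomic_t CritSectionCount = 0;
/* Set when SIGTERM arrived but had to be held off (hypothetical flag). */
static volatile sig_atomic_t DiePending = 0;
/* Stands in for "the backend has shut down" so the sketch can be observed. */
static volatile sig_atomic_t DieHandled = 0;

/* Placeholder for the real shutdown path. */
void process_die(void)
{
    DiePending = 0;
    DieHandled = 1;
}

/* SIGTERM handler: act at once only when no spinlock is held. */
void die_handler(int signo)
{
    (void) signo;
    if (CritSectionCount > 0)
        DiePending = 1;         /* remember the request for later */
    else
        process_die();
}

/* The whole acquire-to-release span counts as a critical section. */
void spin_acquire_guarded(void)
{
    CritSectionCount++;
    /* ... take the spinlock here ... */
}

void spin_release_guarded(void)
{
    /* ... release the spinlock here ... */
    CritSectionCount--;
    if (CritSectionCount == 0 && DiePending)
        process_die();          /* service the held-off SIGTERM */
}
```

Usage would be to install the handler with `signal(SIGTERM, die_handler)` and bracket every spinlock-protected region with the two guarded calls, so that a SIGTERM arriving in between is only noted and then acted on at release time.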

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine [EMAIL PROTECTED] writes: On Monday 08 January 2001 00:08, Tom Lane wrote: FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? No... Just clean log (I redirect log from stderr/out to a file, and all other to syslog). The

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine
FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? No... Just clean log (I redirect log from stderr/out to a file, and all other to syslog). The error messages would be in the syslog then, not in stderr. Hmmm... The only strange

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine [EMAIL PROTECTED] writes: FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? Actually you can have a look on the logs yourself. Well, I found a smoking gun: Jan 7 04:27:51 mx postgres[2501]: FATAL 1: The system is

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Nathan Myers
On Mon, Jan 08, 2001 at 12:21:38PM -0500, Tom Lane wrote: Denis Perchine [EMAIL PROTECTED] writes: FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? Actually you can have a look on the logs yourself. Well, I found a smoking gun:

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine
Well, I found a smoking gun: ... What seems to have happened is that 2501 curled up and died, leaving one or more buffer spinlocks locked. ... There is something pretty fishy about this. You aren't by any chance running the postmaster under a ulimit setting that might cut off

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine [EMAIL PROTECTED] writes: It's worth noting here that modern Unixes run around killing user-level processes more or less at random when free swap space (and sometimes just RAM) runs low. That's not the case for sure. There are 512Mb on the machine, and when I had this

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine
On Monday 08 January 2001 23:21, Tom Lane wrote: Denis Perchine [EMAIL PROTECTED] writes: FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? Actually you can have a look on the logs yourself. Well, I found a smoking gun: Jan 7

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine [EMAIL PROTECTED] writes: Hmmm... actually this is a real problem with vacuum lazy. Sometimes it just does something for an enormous amount of time (I have mailed a sample database to Vadim, but did not get any response yet). It is possible that it was me who killed the backend.

RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim
Killing an individual backend with SIGTERM is bad luck. The backend will assume that it's being killed by the postmaster, and will exit without a whole lot of concern for cleaning up shared memory --- the What code will be returned to postmaster in this case? Vadim

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
"Mikheev, Vadim" [EMAIL PROTECTED] writes: Killing an individual backend with SIGTERM is bad luck. The backend will assume that it's being killed by the postmaster, and will exit without a whole lot of concern for cleaning up shared memory --- the What code will be returned to postmaster in

RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim
Killing an individual backend with SIGTERM is bad luck. The backend will assume that it's being killed by the postmaster, and will exit without a whole lot of concern for cleaning up shared memory --- the SIGTERM -- die() -- elog(FATAL) Is it true that elog(FATAL) doesn't clean up

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
"Mikheev, Vadim" [EMAIL PROTECTED] writes: Killing an individual backend with SIGTERM is bad luck. SIGTERM -- die() -- elog(FATAL) Is it true that elog(FATAL) doesn't clean up shmem etc? This would be very bad... It tries, but I don't think it's possible to make a complete guarantee

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Alfred Perlstein
* Mikheev, Vadim [EMAIL PROTECTED] [010108 23:08] wrote: Killing an individual backend with SIGTERM is bad luck. The backend will assume that it's being killed by the postmaster, and will exit without a whole lot of concern for cleaning up shared memory --- the SIGTERM -- die()

[HACKERS] Quite strange crash

2001-01-07 Thread Denis Perchine
Hi, Has anyone seen this on PostgreSQL 7.0.3? FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Server process (pid 1008) exited with status 6 at Sun Jan 7 04:29:07 2001 Terminating any active server

Re: [HACKERS] Quite strange crash

2001-01-07 Thread Tom Lane
Denis Perchine [EMAIL PROTECTED] writes: Has anyone seen this on PostgreSQL 7.0.3? FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? I've been suspicious for a while that the system might neglect to release buffer cntx_lock spinlocks if

Re: [HACKERS] Quite strange crash

2001-01-07 Thread Denis Perchine
On Monday 08 January 2001 00:08, Tom Lane wrote: Denis Perchine [EMAIL PROTECTED] writes: Has anyone seen this on PostgreSQL 7.0.3? FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? No... Just clean log (I redirect log from