Re: [HACKERS] "stuck spinlock"

2013-12-26 Thread Andres Freund
On 2013-12-12 20:45:17 -0500, Tom Lane wrote: > Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that > most systems dump core files with process IDs embedded in the names. > What would be more useful today is an option to send SIGABRT, or some > other signal that would force core
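As a rough illustration of the idea quoted above (sending SIGABRT so that a process terminates in a way that can produce a core file), here is a minimal standalone C sketch. It is not PostgreSQL code: `spawn_idle_child` and `abort_child` are hypothetical helpers invented for this example, and whether a core file is actually written still depends on system settings such as the core size limit.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child that just idles until it is signalled. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        for (;;)
            pause();            /* child: wait for a signal */
    }
    return pid;                 /* parent: child's PID, or -1 on failure */
}

/*
 * Hypothetical helper: send SIGABRT to the child and reap it.
 * Returns the signal number that terminated the child, or -1 if the
 * signal could not be sent or the child did not die from a signal.
 */
int abort_child(pid_t pid)
{
    int status;

    assert(pid > 0);            /* refuse to signal process groups or "everyone" */
    if (kill(pid, SIGABRT) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A caller would use `abort_child(spawn_idle_child())` and expect `SIGABRT` back; unlike SIGSTOP, the target does not linger for a debugger to attach.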

Re: [HACKERS] "stuck spinlock"

2013-12-26 Thread Martijn van Oosterhout
On Thu, Dec 26, 2013 at 03:18:23PM -0800, Robert Haas wrote: > On Thu, Dec 26, 2013 at 11:54 AM, Peter Eisentraut wrote: > > On 12/12/13, 8:45 PM, Tom Lane wrote: > >> Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that > >> most systems dump core files with process IDs embedded

Re: [HACKERS] "stuck spinlock"

2013-12-26 Thread Robert Haas
On Thu, Dec 26, 2013 at 11:54 AM, Peter Eisentraut wrote: > On 12/12/13, 8:45 PM, Tom Lane wrote: >> Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that >> most systems dump core files with process IDs embedded in the names. > > Which systems are those? MacOS X dumps core files

Re: [HACKERS] "stuck spinlock"

2013-12-26 Thread Peter Eisentraut
On 12/12/13, 8:45 PM, Tom Lane wrote: > Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that > most systems dump core files with process IDs embedded in the names. Which systems are those? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to

Re: [HACKERS] "stuck spinlock"

2013-12-17 Thread bricklen
On Mon, Dec 16, 2013 at 6:46 AM, Tom Lane wrote: > Andres Freund writes: > > Hard to say, the issues fixed in the release are quite important as > > well. I'd tend to say they are more important. I think we just need to > > release 9.3.3 pretty soon. > > Yeah. > Has there been any talk about wh

Re: [HACKERS] "stuck spinlock"

2013-12-16 Thread Alvaro Herrera
Tom Lane escribió: > Andres Freund writes: > > On 2013-12-16 09:46:19 -0500, Tom Lane wrote: > >> Are they complete now? > > > Hm. There's two issues I know of left, both discovered in #8673: > > - slru.c:SlruScanDirectory() doesn't support long enough > > filenames. Afaics that should be a fai

Re: [HACKERS] "stuck spinlock"

2013-12-16 Thread Tom Lane
Andres Freund writes: > On 2013-12-16 09:46:19 -0500, Tom Lane wrote: >> Are they complete now? > Hm. There's two issues I know of left, both discovered in #8673: > - slru.c:SlruScanDirectory() doesn't support long enough > filenames. Afaics that should be a fairly easy fix. > - multixact/membe

Re: [HACKERS] "stuck spinlock"

2013-12-16 Thread Andres Freund
On 2013-12-16 09:46:19 -0500, Tom Lane wrote: > Andres Freund writes: > > The multixact fixes in 9.3.2 weren't complete either... (see recent push) > > Are they complete now? Hm. There's two issues I know of left, both discovered in #8673: - slru.c:SlruScanDirectory() doesn't support long enough

Re: [HACKERS] "stuck spinlock"

2013-12-16 Thread Tom Lane
Andres Freund writes: > Hard to say, the issues fixed in the release are quite important as > well. I'd tend to say they are more important. I think we just need to > release 9.3.3 pretty soon. Yeah. > The multixact fixes in 9.3.2 weren't complete either... (see recent push) Are they complete n

Re: [HACKERS] "stuck spinlock"

2013-12-16 Thread Andres Freund
On 2013-12-16 08:36:51 -0600, Merlin Moncure wrote: > On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund wrote: > > On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote: > >> Is this an edge case or something that will hit a lot of users? > >> Arbitrary server panics seems pretty serious... > > > > Is y

Re: [HACKERS] "stuck spinlock"

2013-12-16 Thread Merlin Moncure
On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund wrote: > On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote: >> On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas wrote: >> > On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote: >> >> And while we're on the subject ... isn't bgworker_die() utterly and >>

Re: [HACKERS] "stuck spinlock"

2013-12-14 Thread Andres Freund
On 2013-12-13 13:39:42 -0500, Robert Haas wrote: > On Fri, Dec 13, 2013 at 1:15 PM, Andres Freund wrote: > > Agreed on not going forward like now, but I don't really see how they > > could usefully use die(). I think we should just mandate that every > > bgworker connected to shared memory register

Re: [HACKERS] "stuck spinlock"

2013-12-14 Thread Andres Freund
Hi, On 2013-12-13 15:57:14 -0300, Alvaro Herrera wrote: > If there was a way for raising an #error at compile time whenever a > worker relies on the existing signal handler, I would vote for doing > that. (But then I have no idea how to do such a thing.) I don't see a way either given how discon

Re: [HACKERS] "stuck spinlock"

2013-12-14 Thread Andres Freund
On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote: > On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas wrote: > > On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote: > >> And while we're on the subject ... isn't bgworker_die() utterly and > >> completely broken? That unconditional elog(FATAL) means t

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Christophe Pettus
On Dec 13, 2013, at 8:52 AM, Tom Lane wrote: > Please apply commit 478af9b79770da43a2d89fcc5872d09a2d8731f8 and see > if that doesn't fix it for you. It appears to fix it. Thanks! -- -- Christophe Pettus x...@thebuild.com

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Christophe Pettus
On Dec 13, 2013, at 1:49 PM, Merlin Moncure wrote: > Is this an edge case or something that will hit a lot of users? My understanding (Tom can correct me if I'm wrong, I'm sure) is that it is an issue for servers on 9.3.2 where there are a lot of query cancellations due to facilities like stat

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Merlin Moncure
On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas wrote: > On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote: >> And while we're on the subject ... isn't bgworker_die() utterly and >> completely broken? That unconditional elog(FATAL) means that no process >> using that handler can do anything remotel

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
Robert Haas writes: > It seems to me that we should change every place that temporarily > changes ImmediateInterruptOK to restore the original value instead of > making assumptions about what it must have been. No, that's backwards. The problem isn't that it could be sane to enter, say, PGSemaph

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Alvaro Herrera
Robert Haas escribió: > On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote: > > And while we're on the subject ... isn't bgworker_die() utterly and > > completely broken? That unconditional elog(FATAL) means that no process > > using that handler can do anything remotely interesting, like say touch

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Robert Haas
On Fri, Dec 13, 2013 at 1:15 PM, Andres Freund wrote: > On 2013-12-13 12:54:09 -0500, Tom Lane wrote: >> Andres Freund writes: >> > I wonder what to do about bgworker's bgworker_die()? I don't really see >> > how that can be fixed without breaking the API? >> >> IMO it should be flushed and bgwor

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Robert Haas
On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote: > And while we're on the subject ... isn't bgworker_die() utterly and > completely broken? That unconditional elog(FATAL) means that no process > using that handler can do anything remotely interesting, like say touch > shared memory. Yeah, but f

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Andres Freund
On 2013-12-13 12:54:09 -0500, Tom Lane wrote: > Andres Freund writes: > > I wonder what to do about bgworker's bgworker_die()? I don't really see > > how that can be fixed without breaking the API? > > IMO it should be flushed and bgworkers should use the same die() handler > as every other backe

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
Andres Freund writes: > I wonder what to do about bgworker's bgworker_die()? I don't really see > how that can be fixed without breaking the API? IMO it should be flushed and bgworkers should use the same die() handler as every other backend, or else one like the one in worker_spi, which just set

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Andres Freund
On 2013-12-13 12:19:56 -0500, Tom Lane wrote: > Andres Freund writes: > > Shouldn't the HOLD_INTERRUPTS() in handle_sig_alarm() prevent any > > eventual ProcessInterrupts() in the timeout handlers from doing anything > > harmful? > > Sorry, I misspoke there. The case I'm worried about is doing s

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
Christophe Pettus writes: > On Dec 13, 2013, at 8:52 AM, Tom Lane wrote: >> Please apply commit 478af9b79770da43a2d89fcc5872d09a2d8731f8 and see >> if that doesn't fix it for you. > Great, thanks. Would the statement_timeout firing invoke this path? (I'm > wondering why this particular instal

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
Andres Freund writes: > On 2013-12-13 11:26:44 -0500, Tom Lane wrote: >> On closer inspection, I'm thinking that actually it'd be a good idea if >> handle_sig_alarm did what we do in, for example, HandleCatchupInterrupt: >> it should save, clear, and restore ImmediateInterruptOK, so as to make >>

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Christophe Pettus
On Dec 13, 2013, at 8:52 AM, Tom Lane wrote: > Please apply commit 478af9b79770da43a2d89fcc5872d09a2d8731f8 and see > if that doesn't fix it for you. Great, thanks. Would the statement_timeout firing invoke this path? (I'm wondering why this particular installation was experiencing this.) -

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
Christophe Pettus writes: > Yes, that's what is happening there (I had to check with the client's > developers). It's possible that the one-minute repeat is due to the > application reissuing the query, rather than specifically related to the > spinlock issue. What this does reveal is that al

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Andres Freund
On 2013-12-13 11:26:44 -0500, Tom Lane wrote: > On closer inspection, I'm thinking that actually it'd be a good idea if > handle_sig_alarm did what we do in, for example, HandleCatchupInterrupt: > it should save, clear, and restore ImmediateInterruptOK, so as to make > the world safe for timeout ha

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
On closer inspection, I'm thinking that actually it'd be a good idea if handle_sig_alarm did what we do in, for example, HandleCatchupInterrupt: it should save, clear, and restore ImmediateInterruptOK, so as to make the world safe for timeout handlers to do things that might include a CHECK_FOR_INT
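The save, clear, and restore pattern described in this message can be sketched in standalone C as follows. This is only an illustration under assumptions, not the actual PostgreSQL handler: `immediate_interrupt_ok` is a stand-in for `ImmediateInterruptOK`, and `example_timeout_callback`, `sketch_handle_sig_alarm`, and `demo_save_clear_restore` are hypothetical names.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for the backend's ImmediateInterruptOK flag. */
static volatile sig_atomic_t immediate_interrupt_ok = 0;

/* For illustration only: the flag value the timeout callback observed. */
static volatile sig_atomic_t seen_by_callback = 1;

/* Hypothetical timeout callback; real ones might reach an interrupt check. */
static void example_timeout_callback(void)
{
    seen_by_callback = immediate_interrupt_ok;
}

/* Sketch of a SIGALRM-style handler that shields its callbacks. */
void sketch_handle_sig_alarm(int signo)
{
    sig_atomic_t save_ok = immediate_interrupt_ok;   /* save */

    (void) signo;
    immediate_interrupt_ok = 0;                       /* clear */
    example_timeout_callback();
    immediate_interrupt_ok = save_ok;                 /* restore */
}

/* Call the handler directly, as if SIGALRM arrived while the flag was set. */
void demo_save_clear_restore(void)
{
    immediate_interrupt_ok = 1;
    sketch_handle_sig_alarm(SIGALRM);
    assert(seen_by_callback == 0);        /* callback ran with the flag cleared */
    assert(immediate_interrupt_ok == 1);  /* caller's value was restored */
}
```

Restoring the saved value, rather than assuming what it was on entry, keeps the handler correct whichever state the interrupted code was in.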

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Andres Freund
On 2013-12-13 10:30:48 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2013-12-13 09:52:06 -0500, Tom Lane wrote: > >> I think you're probably right: > >> what should be in the interrupt handler is something like > >> "if (ImmediateInterruptOK) CHECK_FOR_INTERRUPTS();" > > > Yea, that sound

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
Andres Freund writes: > On 2013-12-13 09:52:06 -0500, Tom Lane wrote: >> I think you're probably right: >> what should be in the interrupt handler is something like >> "if (ImmediateInterruptOK) CHECK_FOR_INTERRUPTS();" > Yea, that sounds right. Or just don't set process interrupts there, it > do
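The guarded check quoted here, `if (ImmediateInterruptOK) CHECK_FOR_INTERRUPTS();`, can be modelled with a small standalone C sketch. All names below are hypothetical stand-ins rather than PostgreSQL's own definitions. The point it tries to show: a signal handler only records that an interrupt is pending, and acts on it immediately only when the interrupted code has declared that safe. Otherwise (for example while a spinlock is held) the work is deferred to the next explicit check.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-ins for the backend's interrupt state. */
static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t immediate_ok = 0;
static int interrupts_processed = 0;

/* In a real server this is where the error exit (a longjmp) would happen. */
static void process_interrupts(void)
{
    if (interrupt_pending)
    {
        interrupt_pending = 0;
        interrupts_processed++;
    }
}

/* Handler: always note the interrupt, act on it only if explicitly allowed. */
void sketch_signal_handler(int signo)
{
    (void) signo;
    interrupt_pending = 1;
    if (immediate_ok)
        process_interrupts();
}

/* Safe point in normal code, reached after any spinlock has been released. */
void sketch_check_for_interrupts(void)
{
    process_interrupts();
}

/* Signal arrives inside a critical region; the work is deferred, not lost. */
void demo_deferred_interrupt(void)
{
    immediate_ok = 0;                    /* e.g. holding a spinlock */
    sketch_signal_handler(SIGALRM);
    assert(interrupts_processed == 0);   /* nothing happened in the handler */
    assert(interrupt_pending == 1);

    sketch_check_for_interrupts();       /* later, at a safe point */
    assert(interrupts_processed == 1);
    assert(interrupt_pending == 0);
}
```

An unconditional check inside the handler would be equivalent to always taking the `immediate_ok` branch, which is how code holding a spinlock could be abandoned mid-section.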

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Andres Freund
On 2013-12-13 09:52:06 -0500, Tom Lane wrote: > Andres Freund writes: > > Tom, could this be caused by c357be2cd9434c70904d871d9b96828b31a50cc5? > > Specifically the added CHECK_FOR_INTERRUPTS() in handle_sig_alarm()? > > ISTM nothing is preventing us from jumping out of code holding a > > spinloc

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Tom Lane
Andres Freund writes: > Tom, could this be caused by c357be2cd9434c70904d871d9b96828b31a50cc5? > Specifically the added CHECK_FOR_INTERRUPTS() in handle_sig_alarm()? > ISTM nothing is preventing us from jumping out of code holding a > spinlock? Hm ... what should stop it is that ImmediateInterrup

Re: [HACKERS] "stuck spinlock"

2013-12-13 Thread Andres Freund
Hi, On 2013-12-12 19:35:36 -0800, Christophe Pettus wrote: > On Dec 12, 2013, at 6:41 PM, Andres Freund wrote: > > > Christophe: are there any "unusual" ERROR messages preceding the crash, > > possibly some minutes before? > > Interestingly, each spinlock PANIC is *followed*, about one minute l

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 7:40 PM, Peter Geoghegan wrote: > Couldn't that just be the app setting it locally? Yes, that's what is happening there (I had to check with the client's developers). It's possible that the one-minute repeat is due to the application reissuing the query, rather than specif

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Peter Geoghegan
On Thu, Dec 12, 2013 at 7:35 PM, Christophe Pettus wrote: > There are a *lot* of "canceling statement due to statement timeout" messages, > which is interesting, because: > > postgres=# show statement_timeout; > statement_timeout > --- > 0 > (1 row) Couldn't that just be the ap

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 6:41 PM, Andres Freund wrote: > Christophe: are there any "unusual" ERROR messages preceding the crash, > possibly some minutes before? Interestingly, each spinlock PANIC is *followed*, about one minute later (+/- five seconds) by a "canceling statement due to statement tim

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 6:24 PM, Andres Freund wrote: > Is it really a regular pattern like hourly? What's your > checkpoint_segments? No, it's not a pattern like that; that's an approximation. Sometimes, they come in clusters, sometimes, 2-3 hours pass without one. They don't happen exclusively

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Andres Freund
On 2013-12-12 21:15:29 -0500, Tom Lane wrote: > Christophe Pettus writes: > > On Dec 12, 2013, at 5:45 PM, Tom Lane wrote: > >> Presumably, we are seeing the victim rather than the perpetrator of > >> whatever is going wrong. > > > This is probing about a bit blindly, but the only thing I can se

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Peter Geoghegan
On Thu, Dec 12, 2013 at 5:45 PM, Tom Lane wrote: > Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that > most systems dump core files with process IDs embedded in the names. > What would be more useful today is an option to send SIGABRT, or some > other signal that would force c

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Andres Freund
Hi, On 2013-12-12 13:50:06 -0800, Christophe Pettus wrote: > Immediately after an upgrade from 9.3.1 to 9.3.2, we have a client getting > frequent (hourly) errors of the form: Is it really a regular pattern like hourly? What's your checkpoint_segments? Could you, around the time of a crash, c

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 6:15 PM, Tom Lane wrote: > Are you possibly using any nonstandard extensions? No, totally stock PostgreSQL. -- -- Christophe Pettus x...@thebuild.com

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Tom Lane
Christophe Pettus writes: > On Dec 12, 2013, at 5:45 PM, Tom Lane wrote: >> Presumably, we are seeing the victim rather than the perpetrator of >> whatever is going wrong. > This is probing about a bit blindly, but the only thing I can see about this > system that is in some way unique (and thi

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 5:45 PM, Tom Lane wrote: > Presumably, we are seeing the victim rather than the perpetrator of > whatever is going wrong. This is probing about a bit blindly, but the only thing I can see about this system that is in some way unique (and this is happening on multiple machin

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Tom Lane
Christophe Pettus writes: > On Dec 12, 2013, at 4:23 PM, Andres Freund wrote: >> Could you install the -dbg package and regenerate? > Here's another, same system, different crash: Both of these look like absolutely run-of-the-mill buffer access attempts. Presumably, we are seeing the victim rat

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 4:23 PM, Andres Freund wrote: > Could you install the -dbg package and regenerate? Here's another, same system, different crash: #0 0x7fa03faf5425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x7fa03faf8b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 4:23 PM, Andres Freund wrote: > Could you install the -dbg package and regenerate? Of course! #0 0x7f699a4fa425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x7f699a4fdb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x7f699c81991b in errfinish (

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Andres Freund
On 2013-12-12 16:22:28 -0800, Christophe Pettus wrote: > > On Dec 12, 2013, at 4:04 PM, Tom Lane wrote: > > If you aren't getting a core file for a PANIC, then core > > files are disabled. > > And just like that, we get one. Stack trace: > > #0 0x7f699a4fa425 in raise () from /lib/x86_64-

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 4:04 PM, Tom Lane wrote: > If you aren't getting a core file for a PANIC, then core > files are disabled. And just like that, we get one. Stack trace: #0 0x7f699a4fa425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x7f699a4fa425 in raise () from /l

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Tom Lane
Christophe Pettus writes: > On Dec 12, 2013, at 3:18 PM, Tom Lane wrote: >> Hm, a PANIC really ought to result in a core file. You sure you don't >> have that disabled (perhaps via a ulimit setting)? > Since it's using the Ubuntu packaging, we have pg_ctl_options = '-c' in > /etc/postgresql/9.

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 3:18 PM, Tom Lane wrote: > Hm, a PANIC really ought to result in a core file. You sure you don't > have that disabled (perhaps via a ulimit setting)? Since it's using the Ubuntu packaging, we have pg_ctl_options = '-c' in /etc/postgresql/9.3/main/pg_ctl.conf. > As for the

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 3:33 PM, Andres Freund wrote: > Any other changes but the upgrade? Maybe a different compiler version? Just the upgrade; they're using the Ubuntu packages from apt.postgresql.org. > Also, could you share some details about the workload? Highly > concurrent? Standby? ... Th

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Christophe Pettus
On Dec 12, 2013, at 3:37 PM, Peter Geoghegan wrote: > Show pg_config output. Below; it's the Ubuntu package. BINDIR = /usr/lib/postgresql/9.3/bin DOCDIR = /usr/share/doc/postgresql-doc-9.3 HTMLDIR = /usr/share/doc/postgresql-doc-9.3 INCLUDEDIR = /usr/include/postgresql PKGINCLUDEDIR = /usr/incl

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Peter Geoghegan
On Thu, Dec 12, 2013 at 3:33 PM, Andres Freund wrote: > Any other changes but the upgrade? Maybe a different compiler version? Show pg_config output. -- Peter Geoghegan

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Andres Freund
On 2013-12-12 13:50:06 -0800, Christophe Pettus wrote: > Immediately after an upgrade from 9.3.1 to 9.3.2, we have a client getting > frequent (hourly) errors of the form: > > /var/lib/postgresql/9.3/main/pg_log/postgresql-2013-12-12_211710.csv:2013-12-12 > 21:40:10.328 > UTC,"n","n",32376,"10.

Re: [HACKERS] "stuck spinlock"

2013-12-12 Thread Tom Lane
Christophe Pettus writes: > Immediately after an upgrade from 9.3.1 to 9.3.2, we have a client getting > frequent (hourly) errors of the form: > /var/lib/postgresql/9.3/main/pg_log/postgresql-2013-12-12_211710.csv:2013-12-12 > 21:40:10.328 > UTC,"n","n",32376,"10.2.1.142:52451",52aa24eb.7e78,5

Re: [HACKERS] stuck spinlock

2001-02-28 Thread Tom Lane
Interesting numbers --- thanks for sending them along. Looks like I was mistaken to think that most platforms would allow tv_usec >= 1 sec. Ah well, another day, another bug... regards, tom lane
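One portable way to avoid depending on how a platform treats tv_usec >= 1 sec is to normalize the delay before calling select(). The sketch below is an illustration only, not the fix that was applied to PostgreSQL; `make_delay` and `sleep_usec` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Build a timeval from a non-negative microsecond count, keeping
 * tv_usec in the range [0, 999999] so every platform accepts it.
 */
struct timeval make_delay(long total_usec)
{
    struct timeval tv;

    assert(total_usec >= 0);
    tv.tv_sec = total_usec / 1000000L;    /* whole seconds */
    tv.tv_usec = total_usec % 1000000L;   /* leftover microseconds */
    return tv;
}

/* Sleep using select() with no file descriptors; returns select()'s result. */
int sleep_usec(long total_usec)
{
    struct timeval tv = make_delay(total_usec);

    return select(0, NULL, NULL, NULL, &tv);
}
```

With this helper, a one-second delay is passed as tv_sec = 1, tv_usec = 0 rather than as a tv_usec value of one million, which some platforms may reject as an invalid timeout.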

Re: [HACKERS] stuck spinlock

2001-02-28 Thread Peter Schindler
Tom Lane wrote: > Judging from the line number, this is in CreateCheckPoint. I'm > betting that your platform (Solaris 2.7, you said?) has the same odd > behavior that I discovered a couple days ago on HPUX: a select with > a delay of tv_sec = 0, tv_usec = 100 doesn't delay 1 second like > a

Re: [HACKERS] stuck spinlock

2001-02-26 Thread Tom Lane
Peter Schindler <[EMAIL PROTECTED]> writes: > FATAL: s_lock(fcc01067) at xlog.c:2088, stuck spinlock. Aborting. Judging from the line number, this is in CreateCheckPoint. I'm betting that your platform (Solaris 2.7, you said?) has the same odd behavior that I discovered a couple days ago on HPUX

Re: [HACKERS] Stuck Spinlock (fwd) - m68k architecture, 7.0.3

2001-02-05 Thread Tom Lane
"Oliver Elphick" <[EMAIL PROTECTED]> writes: > Has anyone got PostgreSQL 7.0.3 working on m68k architecture? > Russell is trying to install it on m68k and is consistently getting a > stuck spinlock in initdb. He used to have 6.3.2 working. Both 6.5.3 > and 7.0.3 fail. > His message shows that th