Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-31 Thread Stefan Kaltenbrunner
Tom Lane wrote: Seneca Cunningham writes: I don't have a core, but here's the CrashReporter output for both of jackal's failed runs: Wow, some actual data, rather than just noodling about how to get it ... thanks! ... 11 postgres 0x0022b2e3

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-31 Thread Tom Lane
Stefan Kaltenbrunner writes: fwiw - I can trigger that issue now pretty reliably on a fast Opteron box (running Debian Sarge/AMD64) with make regress in a loop - I seem to be able to trigger it in about 20-25% of the runs. the resulting core however looks totally stack

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-31 Thread Seneca Cunningham
On Sun, Dec 31, 2006 at 05:43:45PM +0100, Stefan Kaltenbrunner wrote: Tom Lane wrote: What you seem to have here is infinite recursion during relcache initialization. That's surely not hard to believe, considering I just whacked that code around, and indeed changed some of the tests that

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-30 Thread Tom Lane
Seneca Cunningham writes: I don't have a core, but here's the CrashReporter output for both of jackal's failed runs: Wow, some actual data, rather than just noodling about how to get it ... thanks! ... 11 postgres 0x0022b2e3 RelationIdGetRelation + 110 (relcache.c:1496)

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-29 Thread Andrew Dunstan
Tom Lane wrote: Alvaro Herrera writes: Andrew Dunstan wrote: here's a quick untested patch for buildfarm that Stefan might like to try. Note that not all core files are named core. On some Linux distros, it's configured to be core.PID by default.

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-29 Thread Tom Lane
Andrew Dunstan writes: I'm actually wondering if unlimiting core might not be a useful switch to provide on pg_ctl, as long as the platform has setrlimit(). Not a bad thought; that's actually one of the reasons that I still usually use a handmade script rather than pg_ctl for
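As a rough illustration of what "unlimiting core" via setrlimit() means, here is a minimal Python sketch that raises the soft core-file size limit to the hard limit for the current process, so that child processes started afterwards inherit it. The function name `raise_core_limit` is hypothetical; this is not the pg_ctl switch discussed in the thread, only the underlying system call shown through Python's standard `resource` module on a POSIX platform.

```python
import resource


def raise_core_limit():
    """Raise the soft RLIMIT_CORE to the hard limit for this process.

    Hypothetical helper for illustration only. Child processes started
    afterwards inherit the raised limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may always raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    unlimited = soft == resource.RLIM_INFINITY
    print("core limit:", "unlimited" if unlimited else soft)
```

If the hard limit itself is zero, this changes nothing; raising the hard limit needs privileges the sketch does not assume.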

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-29 Thread Stefan Kaltenbrunner
Tom Lane wrote: Andrew Dunstan writes: I'm actually wondering if unlimiting core might not be a useful switch to provide on pg_ctl, as long as the platform has setrlimit(). Not a bad thought; that's actually one of the reasons that I still usually use a handmade script

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Stefan Kaltenbrunner
Tom Lane wrote: Several of the buildfarm machines are exhibiting repeatable signal 11 crashes in what seem perfectly ordinary queries. This started about four days ago so I suppose it's got something to do with my operator-families patch :-( ... but I dunno what, and none of my own machines

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Tom Lane
Stefan Kaltenbrunner writes: Tom Lane wrote: Several of the buildfarm machines are exhibiting repeatable signal 11 crashes in what seem perfectly ordinary queries. no stack trace yet however impala at least seems to be running out of memory (!) with 380MB of RAM and some

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Stefan Kaltenbrunner
Tom Lane wrote: Stefan Kaltenbrunner writes: Tom Lane wrote: Several of the buildfarm machines are exhibiting repeatable signal 11 crashes in what seem perfectly ordinary queries. no stack trace yet however impala at least seems to be running out of memory (!) with 380MB

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Tom Lane
Stefan Kaltenbrunner writes: Tom Lane wrote: Stefan Kaltenbrunner writes: ... Maybe something is causing a dramatic increase in memory usage that is causing the random failures (in impala's case the OOM-killer actually decides to terminate the postmaster) ?

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Stefan Kaltenbrunner
Tom Lane wrote: Stefan Kaltenbrunner writes: Tom Lane wrote: Stefan Kaltenbrunner writes: ... Maybe something is causing a dramatic increase in memory usage that is causing the random failures (in impala's case the OOM-killer actually decides to terminate

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Alvaro Herrera
Tom Lane wrote: Actually ... one way that a memory overconsumption bug could manifest as sig11 would be if it's a runaway-recursion issue: usually you get sig11 when the machine's stack size limit is exceeded. This doesn't put us any closer to localizing the problem, but at least it's a

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Andrew Dunstan
Alvaro Herrera wrote: Tom Lane wrote: I wonder whether there's any way to get the buildfarm script to report a stack trace automatically if it finds a core file left behind in the $PGDATA directory after running the tests. Would something like this be adequately portable? if [ -f
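The idea of reporting a stack trace automatically from a leftover core file can be sketched as follows. This is not the buildfarm patch or the shell snippet quoted above (both are cut off in this index); it is a hypothetical Python illustration that assumes gdb is the debugger and uses gdb's `-batch` and `-ex` options to print a backtrace non-interactively. The function names and example paths are invented for the sketch.

```python
import shutil
import subprocess


def backtrace_command(binary, core_file):
    """Build a gdb command line that prints a backtrace and exits.

    Hypothetical helper: `binary` is the executable that crashed and
    `core_file` is the core dump it left behind.
    """
    return ["gdb", "-batch", "-ex", "bt", binary, core_file]


def report_backtrace(binary, core_file):
    """Run gdb on the core file and return its output, or None if gdb is absent."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        backtrace_command(binary, core_file),
        capture_output=True,
        text=True,
        check=False,  # a failing gdb run should not abort the caller
    )
    return result.stdout


if __name__ == "__main__":
    # Example paths only; adjust to the installation under test.
    print(backtrace_command("/usr/local/pgsql/bin/postgres", "data/core"))
```

Portability is the open question raised in the thread: platforms without gdb would need a different debugger invocation, which this sketch does not attempt.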

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Alvaro Herrera
Andrew Dunstan wrote: here's a quick untested patch for buildfarm that Stefan might like to try. Note that not all core files are named core. On some Linux distros, it's configured to be core.PID by default. And you can even change it to weirder names, but I haven't seen those anywhere by
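Because core files may be named either core or core.PID, a search for them has to match on a pattern rather than one fixed name. A minimal Python sketch, assuming only those two naming schemes (the function name `find_core_files` and the default patterns are illustrative; other configured names would need extra patterns):

```python
import fnmatch
import os


def find_core_files(root, patterns=("core", "core.*")):
    """Return sorted paths under `root` whose file name matches a core pattern.

    Hypothetical helper: the default patterns cover "core" and "core.PID"
    only; pass additional patterns for other naming schemes.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_core_files("."):
        print(path)
```

Matching on the name alone can pick up unrelated files that happen to be called core, so a real script would probably also check that the file is an actual core dump.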

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-28 Thread Tom Lane
Alvaro Herrera writes: Andrew Dunstan wrote: here's a quick untested patch for buildfarm that Stefan might like to try. Note that not all core files are named core. On some Linux distros, it's configured to be core.PID by default. And on some platforms, cores don't drop in

[HACKERS] Recent SIGSEGV failures in buildfarm HEAD

2006-12-26 Thread Tom Lane
Several of the buildfarm machines are exhibiting repeatable signal 11 crashes in what seem perfectly ordinary queries. This started about four days ago so I suppose it's got something to do with my operator-families patch :-( ... but I dunno what, and none of my own machines show the failure.