Re: [HACKERS] Gerbil build farm failure
On Mon, Sep 26, 2005 at 06:58:16PM -0400, Tom Lane wrote:
> Michael Fuhr [EMAIL PROTECTED] writes:
> > Gerbil's looking better lately:
> > http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE
>
> Yeah. We've been poking at it off-list, and it seems that the problem
> was a local build failure due to not having a clean copy of the
> repository (ye olde junk-in-the-supposedly-clean-vpath-tree problem).

Well, just to be clear, I first logged into that box after the problem started. It's possible that someone else had mucked with the install, but unlikely. I suspect that there was a real build issue of some kind to start with. Since it's working now I guess it doesn't matter, but I'd still suspect code from back when the problem started.
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Gerbil build farm failure
Gerbil's looking better lately:

http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE

--
Michael Fuhr
Re: [HACKERS] Gerbil build farm failure
Michael Fuhr [EMAIL PROTECTED] writes:
> Gerbil's looking better lately:
> http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE

Yeah. We've been poking at it off-list, and it seems that the problem was a local build failure due to not having a clean copy of the repository (ye olde junk-in-the-supposedly-clean-vpath-tree problem).

regards, tom lane
Re: [HACKERS] Gerbil build farm failure
Now that we have backtrace, does anyone have a clue about the cause/fix?

---

Jim C. Nasby wrote:
> On Tue, Sep 20, 2005 at 01:17:10PM -0400, Bruce Momjian wrote:
> > I worked with Jim Nasby and we found this is the line that is failing
> > on Gerbil in the build farm during initdb:
> >
> > tqual.c, line 844 in 8.0.X
> >
> >     if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
> >
> > This particular line was last modified in 2002. However, this was a
> > file that was changed as part of the VACUUM tuple chain commit:
> >
> >     revision 1.81.4.2
> >     date: 2005/08/25 19:45:01;  author: tgl;  state: Exp;  lines: +7 -4
> >     Back-patch fixes for problems with VACUUM destroying t_ctid chains
> >     too soon, and with insufficient paranoia in code that follows
> >     t_ctid links. This patch covers the 8.0 branch.
> >
> > and the date of the commit to 8.0.X corresponds to the date that
> > failures started to happen:
> >
> > http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE
>
> BTW, I want to point out for others that when initdb dumps core, trying
> to get a stack trace out of the initdb binary will probably be useless,
> because initdb is just calling other binaries. In this case we had
> success with the postgres binary. Had I known this, I would have had
> this stack trace available a couple weeks ago. :(
>
> http://lnk.nu/developer.postgresql.org/3zx.c is the annotated version
> of tqual. As Bruce mentioned, the line referenced in the core file
> probably isn't the culprit.
>
> http://lnk.nu/pgbuildfarm.org/3zz.pl has the list of files that changed
> to break gerbil.
>
> Here's the output from gdb:
>
> #0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
> 844     tqual.c: No such file or directory.
>         in tqual.c
> (gdb) bt
> #0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
> #1  0x0004bdd0 in heap_update ()
> #2  0x000ec4b0 in ExecutorRun (queryDesc=0x0, direction=-4198192, count=16) at execMain.c:1592
> (gdb)
>
> I'm in the process of trying to get this machine moved someplace where
> I could give a developer ssh access. That should hopefully happen by
> the end of the week.
> --
> Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] Gerbil build farm failure
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Now that we have backtrace, does anyone have a clue about the cause/fix?

The backtrace suggests a garbage snapshot value, but doesn't provide nearly enough info to guess where it's coming from. I'm waiting for the promised ssh access...

regards, tom lane
Re: [HACKERS] Gerbil build farm failure
On Fri, Sep 23, 2005 at 12:56:33AM -0400, Tom Lane wrote:
> Jim C. Nasby [EMAIL PROTECTED] writes:
> > Fire lit under IT dept. Their initial plan was everything outbound
> > but SSH would be cut-off, which I nixed, but would that suffice in
> > the short term if it means getting the box on the net faster?
>
> AFAICS, an ssh connection to an unprivileged account should be enough.
> I just need to be able to duplicate your build environment.

Ok, if that greases the wheels I'll have them do that. Hopefully they can get it done tomorrow.
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] Gerbil build farm failure
Jim C. Nasby [EMAIL PROTECTED] writes:
> Fire lit under IT dept. Their initial plan was everything outbound but
> SSH would be cut-off, which I nixed, but would that suffice in the
> short term if it means getting the box on the net faster?

AFAICS, an ssh connection to an unprivileged account should be enough. I just need to be able to duplicate your build environment.

regards, tom lane
Re: [HACKERS] Gerbil build farm failure
On Thu, Sep 22, 2005 at 08:03:43PM -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Now that we have backtrace, does anyone have a clue about the
> > cause/fix?
>
> The backtrace suggests a garbage snapshot value, but doesn't provide
> nearly enough info to guess where it's coming from. I'm waiting for
> the promised ssh access...

Fire lit under IT dept. Their initial plan was everything outbound but SSH would be cut-off, which I nixed, but would that suffice in the short term if it means getting the box on the net faster?
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] Gerbil build farm failure
On Tue, Sep 20, 2005 at 01:17:10PM -0400, Bruce Momjian wrote:
> I worked with Jim Nasby and we found this is the line that is failing
> on Gerbil in the build farm during initdb:
>
> tqual.c, line 844 in 8.0.X
>
>     if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
>
> This particular line was last modified in 2002. However, this was a
> file that was changed as part of the VACUUM tuple chain commit:
>
>     revision 1.81.4.2
>     date: 2005/08/25 19:45:01;  author: tgl;  state: Exp;  lines: +7 -4
>     Back-patch fixes for problems with VACUUM destroying t_ctid chains
>     too soon, and with insufficient paranoia in code that follows
>     t_ctid links. This patch covers the 8.0 branch.
>
> and the date of the commit to 8.0.X corresponds to the date that
> failures started to happen:
>
> http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE

BTW, I want to point out for others that when initdb dumps core, trying to get a stack trace out of the initdb binary will probably be useless, because initdb is just calling other binaries. In this case we had success with the postgres binary. Had I known this, I would have had this stack trace available a couple weeks ago. :(

http://lnk.nu/developer.postgresql.org/3zx.c is the annotated version of tqual. As Bruce mentioned, the line referenced in the core file probably isn't the culprit.

http://lnk.nu/pgbuildfarm.org/3zz.pl has the list of files that changed to break gerbil.

Here's the output from gdb:

#0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
844     tqual.c: No such file or directory.
        in tqual.c
(gdb) bt
#0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
#1  0x0004bdd0 in heap_update ()
#2  0x000ec4b0 in ExecutorRun (queryDesc=0x0, direction=-4198192, count=16) at execMain.c:1592
(gdb)

I'm in the process of trying to get this machine moved someplace where I could give a developer ssh access. That should hopefully happen by the end of the week.
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461