Re: [HACKERS] Gerbil build farm failure

2005-09-27 Thread Jim C. Nasby
On Mon, Sep 26, 2005 at 06:58:16PM -0400, Tom Lane wrote:
 Michael Fuhr [EMAIL PROTECTED] writes:
  Gerbil's looking better lately:
  http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE
 
 Yeah.  We've been poking at it off-list, and it seems that the problem
 was a local build failure due to not having a clean copy of the
 repository (ye olde junk-in-the-supposedly-clean-vpath-tree problem).

Well, just to be clear, I first logged into that box after the problem
started. It's possible that someone else had mucked with the install,
but unlikely. I suspect that there was a real build issue of some kind
to start with. Since it's working now I guess it doesn't matter, but I'd
still suspect code from back when the problem started.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Gerbil build farm failure

2005-09-26 Thread Michael Fuhr
Gerbil's looking better lately:

http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE

-- 
Michael Fuhr


Re: [HACKERS] Gerbil build farm failure

2005-09-26 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 Gerbil's looking better lately:
 http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbilbr=REL8_0_STABLE

Yeah.  We've been poking at it off-list, and it seems that the problem
was a local build failure due to not having a clean copy of the
repository (ye olde junk-in-the-supposedly-clean-vpath-tree problem).

regards, tom lane


Re: [HACKERS] Gerbil build farm failure

2005-09-22 Thread Bruce Momjian

Now that we have a backtrace, does anyone have a clue about the cause/fix?

---

Jim C. Nasby wrote:
 On Tue, Sep 20, 2005 at 01:17:10PM -0400, Bruce Momjian wrote:
  I worked with Jim Nasby and we found this is the line that is failing on
  Gerbil in the build farm during initdb: tqual.c, line 844 in 8.0.X
  
  if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
  
  This particular line was last modified in 2002.  However, this was a
  file that was changed as part of the VACUUM tuple chain commit:
  
  revision 1.81.4.2
  date: 2005/08/25 19:45:01;  author: tgl;  state: Exp;  lines: +7 -4
  Back-patch fixes for problems with VACUUM destroying t_ctid chains too soon,
  and with insufficient paranoia in code that follows t_ctid links.
  This patch covers the 8.0 branch.
  
  and the date of the commit to 8.0.X corresponds to the date that
  failures started to happen:
  
  http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE
 
 BTW, I want to point out for others that when initdb dumps core, trying
 to get a stack trace out of the initdb binary will probably be useless,
 because initdb is just calling other binaries. In this case we had
 success with the postgres binary. Had I known this I would have had this
 stack trace available a couple weeks ago. :(
 
 http://lnk.nu/developer.postgresql.org/3zx.c is the annotated version of
 tqual. As Bruce mentioned, the line referenced in the core file probably
 isn't the culprit. http://lnk.nu/pgbuildfarm.org/3zz.pl has the list of
 files that changed to break gerbil.
 
 Here's the output from gdb:
 #0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
 844 tqual.c: No such file or directory.
 in tqual.c
 (gdb) bt
 #0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
 #1  0x0004bdd0 in heap_update ()
 #2  0x000ec4b0 in ExecutorRun (queryDesc=0x0, direction=-4198192, count=16) at execMain.c:1592
 (gdb)
 
 I'm in the process of trying to get this machine moved someplace where I
 could give a developer ssh access. That should hopefully happen by the
 end of the week.
 -- 
 Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
 Pervasive Software  http://pervasive.com    work: 512-231-6117
 vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Gerbil build farm failure

2005-09-22 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Now that we have a backtrace, does anyone have a clue about the cause/fix?

The backtrace suggests a garbage snapshot value, but doesn't provide
nearly enough info to guess where it's coming from.  I'm waiting for the
promised ssh access...

regards, tom lane


Re: [HACKERS] Gerbil build farm failure

2005-09-22 Thread Jim C. Nasby
On Fri, Sep 23, 2005 at 12:56:33AM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
 Fire lit under IT dept. Their initial plan was that everything outbound
 except SSH would be cut off, which I nixed, but would that suffice in the
 short term if it means getting the box on the net faster?
 
 AFAICS, an ssh connection to an unprivileged account should be enough.
 I just need to be able to duplicate your build environment.

Ok, if that greases the wheels I'll have them do that. Hopefully they
can get it done tomorrow.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461


Re: [HACKERS] Gerbil build farm failure

2005-09-22 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 Fire lit under IT dept. Their initial plan was that everything outbound
 except SSH would be cut off, which I nixed, but would that suffice in the
 short term if it means getting the box on the net faster?

AFAICS, an ssh connection to an unprivileged account should be enough.
I just need to be able to duplicate your build environment.

regards, tom lane


Re: [HACKERS] Gerbil build farm failure

2005-09-22 Thread Jim C. Nasby
On Thu, Sep 22, 2005 at 08:03:43PM -0400, Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Now that we have a backtrace, does anyone have a clue about the cause/fix?
 
 The backtrace suggests a garbage snapshot value, but doesn't provide
 nearly enough info to guess where it's coming from.  I'm waiting for the
 promised ssh access...

Fire lit under IT dept. Their initial plan was that everything outbound
except SSH would be cut off, which I nixed, but would that suffice in the
short term if it means getting the box on the net faster?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461


Re: [HACKERS] Gerbil build farm failure

2005-09-20 Thread Jim C. Nasby
On Tue, Sep 20, 2005 at 01:17:10PM -0400, Bruce Momjian wrote:
 I worked with Jim Nasby and we found this is the line that is failing on
 Gerbil in the build farm during initdb: tqual.c, line 844 in 8.0.X
 
   if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
 
 This particular line was last modified in 2002.  However, this was a
 file that was changed as part of the VACUUM tuple chain commit:
 
   revision 1.81.4.2
   date: 2005/08/25 19:45:01;  author: tgl;  state: Exp;  lines: +7 -4
   Back-patch fixes for problems with VACUUM destroying t_ctid chains too soon,
   and with insufficient paranoia in code that follows t_ctid links.
   This patch covers the 8.0 branch.
 
 and the date of the commit to 8.0.X corresponds to the date that
 failures started to happen:
 
   http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=gerbil&br=REL8_0_STABLE

BTW, I want to point out for others that when initdb dumps core, trying
to get a stack trace out of the initdb binary will probably be useless,
because initdb is just calling other binaries. In this case we had
success with the postgres binary. Had I known this I would have had this
stack trace available a couple weeks ago. :(

http://lnk.nu/developer.postgresql.org/3zx.c is the annotated version of
tqual. As Bruce mentioned, the line referenced in the core file probably
isn't the culprit. http://lnk.nu/pgbuildfarm.org/3zz.pl has the list of
files that changed to break gerbil.

Here's the output from gdb:
#0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
844 tqual.c: No such file or directory.
in tqual.c
(gdb) bt
#0  HeapTupleSatisfiesSnapshot (tuple=0xfe28fc78, snapshot=0xd7, buffer=295) at tqual.c:844
#1  0x0004bdd0 in heap_update ()
#2  0x000ec4b0 in ExecutorRun (queryDesc=0x0, direction=-4198192, count=16) at execMain.c:1592
(gdb)

I'm in the process of trying to get this machine moved someplace where I
could give a developer ssh access. That should hopefully happen by the
end of the week.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
