[HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Lamar Owen

Ok, thanks to our snowstorm :-0 I have been working on the beta 6 RPM situation
on my _slow_ notebook today (power outages for ten minutes at a time happening
at hour or so intervals due to 45mph+ winds and a foot of snow).

Well, I have preliminary RPM's built -- just need to work on the contrib tree
situation.  I ran regression the usual RPM way (which I am fully aware is not
the normally approved method, but it _would_ be the method any RPM beta testers
would use), and got a different failure, one that is not locale related
(LC_ALL=C both for the initdb and the postmaster startup in the newest
initscript).  See attached regression.diffs for details of the temptest failure
I experienced.

Regression run with CWD=/usr/share/test/regress, user=postgres.
./pg_regress --schedule=parallel_schedule

This is the only regression test failure I have found thus far. I have never
seen this failure before, so I'm not sure where to proceed.

Now to attack the contrib tree (looking forward to my new notebook, as this old
P133 takes an hour and twenty minutes to slog through a full build).

Seeing that RC1 is in prep, is there a pressing need to upload and release beta
6 RPM's, or will it be a day or two before RC1?
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

*** ./expected/temp.out Sat Jan  8 22:48:39 2000
--- ./results/temp.out  Tue Mar 20 16:06:10 2001
***************
*** 23,32 ****
  (1 row)
  
  DROP TABLE temptest;
  SELECT * FROM temptest;
   col 
  -----
!    1
  (1 row)
  
  DROP TABLE temptest;
--- 23,34 ----
  (1 row)
  
  DROP TABLE temptest;
+ NOTICE:  FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
+ ERROR:  heap_drop_with_catalog: FlushRelationBuffers returned -2
  SELECT * FROM temptest;
   col 
  -----
!    2
  (1 row)
  
  DROP TABLE temptest;
***************
*** 34,37 ****
  -- test temp table deletion
  \c regression
  SELECT * FROM temptest;
! ERROR:  Relation 'temptest' does not exist
--- 36,43 ----
  -- test temp table deletion
  \c regression
  SELECT * FROM temptest;
!  col 
! -----
!    1
! (1 row)
! 

======================================================================




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Tom Lane

Lamar Owen [EMAIL PROTECTED] writes:
   DROP TABLE temptest;
 + NOTICE:  FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
 + ERROR:  heap_drop_with_catalog: FlushRelationBuffers returned -2
   SELECT * FROM temptest;

Hoo, that's interesting ...  Exactly what fileset were you using again?

 Seeing that RC1 is in prep, is there a pressing need to upload and
 release beta 6 RPM's, or will it be a day or two before RC1?

I think you might as well wait for RC1 as far as actually making RPMs
goes.  But do you want to let anyone else check out the RPM build
process?  For instance, I've been wondering what you did about the
which-set-of-headers-to-install issue.

regards, tom lane




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread The Hermit Hacker

On Tue, 20 Mar 2001, Lamar Owen wrote:

 Seeing that RC1 is in prep, is there a pressing need to upload and release beta
 6 RPM's, or will it be a day or two before RC1?

I'm going to do RC1 tonight ... so no pressing need :)






Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Lamar Owen

On Tue, 20 Mar 2001, Tom Lane wrote:
 Lamar Owen [EMAIL PROTECTED] writes:
DROP TABLE temptest;
  + NOTICE:  FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
  + ERROR:  heap_drop_with_catalog: FlushRelationBuffers returned -2
SELECT * FROM temptest;
 
 Hoo, that's interesting ...  Exactly what fileset were you using again?

When you say 'fileset', I'm assuming you are referring to the --schedule
parameter -- I am invoking the following command:
./pg_regress --schedule=parallel_schedule  

7.1beta6 distribution tarball.  LC_ALL=C.  Compiled on RedHat 7 as shipped.

I'm rerunning to see if it is intermittent. Second run -- no error.  Running a
third time..no error.  Now I'm confused.  What would cause such an error,
Tom?  I'm going to check on my desktop, once power gets more stable (and it
quits lightning -- yes, a snowstorm with lightning :-0  I certainly got what I
wanted.).  So, more to come later.

  Seeing that RC1 is in prep, is there a pressing need to upload and
  release beta 6 RPM's, or will it be a day or two before RC1?
 
 I think you might as well wait for RC1 as far as actually making RPMs
 goes.  But do you want to let anyone else check out the RPM build
 process?  For instance, I've been wondering what you did about the
 which-set-of-headers-to-install issue.

Oh, ok.  Spec file attached.  All other files needed are the beta6 tarball and
the contents of the beta4-1 source rpm, with names changed to match the beta6
version number.  There are some other changes I have to merge in --
particularly a set from Karl for the optional PL/Perl build, as well as others,
so this is a preliminary spec file.

But I was just getting the basic build done and tested.

To directly answer your question, I'm using 'make install-all-headers' and
stuffing it into the devel rpm in one piece at this time.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

Summary: PostgreSQL client programs and libraries.
Name: postgresql
Version: 7.1beta6
Release: 0.2
License: BSD
Group: Applications/Databases
Source0: ftp://ftp.postgresql.org/pub/source/v%{version}/postgresql-%{version}.tar.gz
Source3: postgresql.init-%{version}
Source4: file-lists-pgsql-%{version}.tar.gz
Source5: ftp://ftp.postgresql.org/pub/source/v%{version}/postgresql-%{version}.tar.gz.md5
Source6: README.rpm-dist.postgresql-%{version}
Source7: pg-migration-scripts-%{version}.tar.gz
Source8: logrotate.postgresql-%{version}
Source10: http://www.retep.org.uk/postgres/jdbc7.0-1.1.jar
Source11: http://www.retep.org.uk/postgres/jdbc7.0-1.2.jar
Source12: postgresql-dump.1.gz
Source14: rh-pgdump.sh
Patch1: rpm-pgsql-%{version}.patch
Requires: perl
Prereq: /sbin/chkconfig /sbin/ldconfig /usr/sbin/useradd initscripts
BuildPrereq: python-devel perl tcl /lib/cpp
Url: http://www.postgresql.org/ 
Obsoletes: postgresql-clients
Buildroot: %{_tmppath}/%{name}-%{version}-root


# This is the PostgreSQL Global Development Group Official RPMset spec file.
# Copyright 2000 Lamar Owen [EMAIL PROTECTED] [EMAIL PROTECTED]
# and others listed.

# Major Contributors:
# ---
# Lamar Owen
# Trond Eivind Glomsrød [EMAIL PROTECTED]
# Thomas Lockhart

# This spec file and ancillary files are licensed in accordance with 
# The PostgreSQL license.

# Below are the default build package list macros.  These can be overridden by defining
# on the rpm command line:
# rpm --define 'packagename 1'  to force the package to build.
# rpm --define 'packagename 0'  to force the package NOT to build.
# The base package, the lib package, the devel package, and the server package always get built.

%{!?perl:%define perl 1}
%{!?tcl:%define tcl 1}
%{!?tkpkg:%define tkpkg %{expand:tcl}}
%{!?odbc:%define odbc 1}
%{!?jdbc:%define jdbc 1}
%{!?test:%define test 1}
%{!?python:%define python 1}
%{!?pltcl:%define pltcl 1}
%{!?plperl:%define plperl 1}

# Utility feature defines.
%{!?enable_mb:%define enable_mb 1}
%{!?pgaccess:%define pgaccess 1}

%dump
%description
PostgreSQL is an advanced Object-Relational database management system
(DBMS) that supports almost all SQL constructs (including
transactions, subselects and user-defined types and functions). The
postgresql package includes the client programs and libraries that
you'll need to access a PostgreSQL DBMS server.  These PostgreSQL
client programs are programs that directly manipulate the internal
structure of PostgreSQL databases on a PostgreSQL server. These client
programs can be located on the same machine with the PostgreSQL
server, or may be on a remote machine which accesses a PostgreSQL
server over a network connection. This package contains the client
libraries for C and C++, as well as command-line utilities for
managing PostgreSQL databases on a PostgreSQL server. 

If you want to manipulate a PostgreSQL database on a remote PostgreSQL
server, you need this package. You also need to install this package
if you're installing the postgresql-server package.

%package libs
Summary: The 
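The default-package block in the spec file uses RPM's `%{!?name:...}` conditional expansion. The text after the colon is expanded only when `name` is not already defined, and that is what lets `rpm --define 'packagename 0'` override the spec's default. Below is a minimal Python model of that rule. The function `apply_default` is hypothetical, and this is not how rpm itself is implemented. It shows one consequence: the macro tested in the guard must be the same macro the `%define` sets. Otherwise a command-line override is silently replaced.

```python
from typing import Dict

def apply_default(macros: Dict[str, str], guard: str, name: str, value: str) -> None:
    """Toy model of the spec line %{!?guard:%define name value}.

    The %define runs only when `guard` is undefined; whether its value is
    "0" or "1" does not matter, only whether it exists.
    """
    if guard not in macros:
        macros[name] = value  # the spec's %define shadows any earlier value

# Command-line override, as in: rpm --define 'pgaccess 0' ...
macros = {"pgaccess": "0"}
apply_default(macros, guard="pgaccess", name="pgaccess", value="1")
print(macros["pgaccess"])  # prints 0: guard matches, the override survives

macros = {"pgaccess": "0"}
apply_default(macros, guard="pgacess", name="pgaccess", value="1")
print(macros["pgaccess"])  # prints 1: misspelled guard, the override is lost
```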

Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Tom Lane

Lamar Owen [EMAIL PROTECTED] writes:
 DROP TABLE temptest;
 + NOTICE:  FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
 + ERROR:  heap_drop_with_catalog: FlushRelationBuffers returned -2
 SELECT * FROM temptest;
 
 Hoo, that's interesting ...  Exactly what fileset were you using again?

 When you say 'fileset', I'm assuming you are referring to the --schedule
 parameter --

No, I was wondering about whether you had an inconsistent set of source
files, or had managed to not do a complete rebuild, or something like
that.  The above error should be entirely impossible considering that
the table in question is a temp table that's not been touched by any
other backend.  If you did manage to get this from a clean build then
I think we have a serious problem to look at.

 I think you might as well wait for RC1 as far as actually making RPMs
 goes.  But do you want to let anyone else check out the RPM build
 process?  For instance, I've been wondering what you did about the
 which-set-of-headers-to-install issue.

 Oh, ok.  Spec file attached.  All other files needed are the beta6 tarball and
 the contents of the beta4-1 source rpm, with names changed to match the beta6
 version number.

OK, I will pull the files and try to replicate this on my own laptop.
Does anyone else have time to try to duplicate the problem tonight?
If it's replicatable at all, I think it's a release stopper.

 To directly answer your question, I'm using 'make install-all-headers' and
 stuffing it into the devel rpm in one piece at this time.

Works for me.

regards, tom lane




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Lamar Owen

On Tue, 20 Mar 2001, Tom Lane wrote:
 Lamar Owen [EMAIL PROTECTED] writes:
  DROP TABLE temptest;
  + NOTICE:  FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
  + ERROR:  heap_drop_with_catalog: FlushRelationBuffers returned -2
  SELECT * FROM temptest;

  Hoo, that's interesting ...  Exactly what fileset were you using again?
 
  When you say 'fileset', I'm assuming you are referring to the --schedule
  parameter --
 
 No, I was wondering about whether you had an inconsistent set of source
 files, or had managed to not do a complete rebuild, or something like
 that.  The above error should be entirely impossible considering that
 the table in question is a temp table that's not been touched by any
 other backend.  If you did manage to get this from a clean build then
 I think we have a serious problem to look at.

Standard RPM rebuild -- always wipes the whole build tree out and re-expands
from the tarball, reapplies patches, and rebuilds from scratch every time I
change even the smallest detail in the spec file -- which is why it takes so
long to get these things out.  So, no, this is a scratch build from a fresh
tarball.

 Does anyone else have time to try to duplicate the problem tonight?
 If it's replicatable at all, I think it's a release stopper.

I have not yet been able to repeat the problem.  I am running my fifth
regression test run (which takes a long time on this P133) with a freshly
initdb'ed PGDATA -- the previous regression runs were done on the same PGDATA
tree as the first run was done on.  Took 12 minutes 40 seconds, but I can't
repeat the error. I'm hoping it was a problem on my machine -- educate me on
what caused the error so I can see if something in my setup did something not
so nice.  So, the score is one error out of six test runs, thus far.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Tom Lane

Lamar Owen [EMAIL PROTECTED] writes:
 I'm hoping it was a problem on my machine -- educate me on
 what caused the error

Well, that's exactly what I'd like to know.  The direct cause of the
error is that DROP TABLE is finding that some other backend has a
reference-count hold on a page of the temp table it's trying to drop.
Since no other backend should be trying to touch this temp table,
there's something pretty fishy here.
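The check Tom describes can be pictured with a toy buffer pool. This is a Python sketch, not the backend's C code. `BufferDesc`, `flush_relation_buffers` and the `private_refcounts` map are hypothetical names, and the second argument is assumed to be the first block number to flush. The sketch reproduces only the decision visible in the regression diff. Any pin on one of the relation's pages, even one held by a different backend (private 0, global 1), makes the flush give up with -2, and DROP TABLE then fails.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class BufferDesc:
    """Toy shared-buffer header (hypothetical, not the real struct)."""
    relname: str
    blocknum: int
    global_refcount: int = 0  # pins held by all backends together

def flush_relation_buffers(pool: List[BufferDesc], relname: str,
                           first_block: int,
                           private_refcounts: Dict[int, int]) -> int:
    """Return 0 if every buffer of relname from first_block on is unpinned;
    otherwise print a NOTICE like the one in the diff and return -2.

    private_refcounts maps a pool index to the pins held by *this* backend.
    """
    for i, buf in enumerate(pool):
        if buf.relname != relname or buf.blocknum < first_block:
            continue
        if buf.global_refcount > 0:
            print(f"NOTICE:  FlushRelationBuffers({relname}, {first_block}): "
                  f"block {buf.blocknum} is referenced "
                  f"(private {private_refcounts.get(i, 0)}, "
                  f"global {buf.global_refcount})")
            return -2
    return 0

# One page of temptest is pinned, but not by the backend running DROP TABLE.
pool = [BufferDesc("temptest", 0, global_refcount=1)]
rc = flush_relation_buffers(pool, "temptest", 0, private_refcounts={})
print(rc)
```

With the pool above, this prints the same NOTICE text as the failing run, followed by -2.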

Given that this is a parallel test, you may be looking at a
low-probability timing-dependent failure.  I'd say set up the machine
and run repeat tests for an hour or three ... that's what I plan to do
here.
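Repeated runs like these are easy to script. Below is a minimal Python sketch. It assumes the script is started from the regress directory as the postgres user (as in the original report), that `pg_regress` exits with a nonzero status when any test fails, and that each run rewrites `regression.diffs`. The function name `repeat_regress` and the `failed-diffs` directory are hypothetical. After each failing run, the diff is copied aside, so an intermittent failure is not overwritten by the next clean run.

```python
import shutil
import subprocess
import time
from pathlib import Path
from typing import List, Optional, Tuple

def repeat_regress(cmd: List[str], hours: Optional[float] = None,
                   runs: Optional[int] = None,
                   diffs: str = "regression.diffs",
                   keep_dir: str = "failed-diffs") -> Tuple[int, int]:
    """Run cmd repeatedly until `runs` runs or `hours` hours have passed.

    Returns (runs attempted, runs that exited nonzero). After each failing
    run, a copy of `diffs` (if present) is saved as keep_dir/run-N.diffs.
    """
    if hours is None and runs is None:
        raise ValueError("give hours, runs, or both")
    deadline = None if hours is None else time.monotonic() + hours * 3600
    attempted = failed = 0
    while runs is None or attempted < runs:
        if deadline is not None and time.monotonic() >= deadline:
            break
        attempted += 1
        if subprocess.run(cmd).returncode != 0:
            failed += 1
            src = Path(diffs)
            if src.exists():
                Path(keep_dir).mkdir(exist_ok=True)
                shutil.copy(src, Path(keep_dir) / f"run-{attempted}.diffs")
    return attempted, failed

# Example (not run here): two hours of the parallel schedule.
# attempted, failed = repeat_regress(
#     ["./pg_regress", "--schedule=parallel_schedule"], hours=2)
```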

BTW, what postmaster parameters are you using --- -B and so forth?

regards, tom lane




RE: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Mikheev, Vadim

 I'm rerunning to see if it is intermittent. Second run -- no 
 error.  Running a third time..no error.  Now I'm confused.
  What would cause such an error, Tom? I'm going to check on my

Hmm, concurrent checkpoint? Probably we could simplify dirty test
in BufferSync() - ie test bufHdr->cntxDirty without holding
shlock (and pin!) on buffer: should be good as long as we set
cntxDirty flag *before* XLogInsert in access methods. Have to
look more...
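Vadim's idea can be sketched as a checkpoint loop that reads the dirty flag first and pins only the buffers that actually need writing. This is a toy Python model with hypothetical names. `BufferHeader` and `buffer_sync` are invented, and an `on_pinned` callback stands in for the window in which other backends can see the pin count. The model does not include WAL. The ordering condition Vadim mentions (set cntxDirty before XLogInsert) is therefore an assumption the sketch relies on, not something it checks.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BufferHeader:
    """Toy buffer header; not the real C struct."""
    name: str
    cntx_dirty: bool = False  # assumed set before the WAL record is inserted
    refcount: int = 0         # shared pin count seen by other backends
    written: bool = False

def buffer_sync(pool: List[BufferHeader],
                on_pinned: Callable[[BufferHeader], None]) -> int:
    """Write out dirty buffers, pinning only those that need it."""
    flushed = 0
    for buf in pool:
        # Unlocked read of the flag: a clean buffer is skipped without being
        # pinned, so FlushRelationBuffers never sees a pin from this loop on it.
        if not buf.cntx_dirty:
            continue
        buf.refcount += 1          # pin: only now does the buffer look busy
        try:
            on_pinned(buf)         # other backends may inspect the pin here
            buf.cntx_dirty = False
            buf.written = True     # stands in for writing the page
            flushed += 1
        finally:
            buf.refcount -= 1      # unpin even if the write fails
    return flushed
```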

Vadim




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Lamar Owen

On Tue, 20 Mar 2001, Tom Lane wrote:
 Since no other backend should be trying to touch this temp table,
 there's something pretty fishy here.

I see.
 
 Given that this is a parallel test, you may be looking at a
 low-probability timing-dependent failure.  I'd say set up the machine
 and run repeat tests for an hour or three ... that's what I plan to do
 here.

As a broadcast engineer, I'm a little too familiar with such things.  But this
isn't an engineer list, so I'll spare you the war stories. :-)

 BTW, what postmaster parameters are you using --- -B and so forth?

Default.  To be changed before RPM release, but currently it is the default.
The only option that postmaster.opts records is -D, and I'm not passing
anything else. 
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread The Hermit Hacker

On Tue, 20 Mar 2001, Tom Lane wrote:

 Lamar Owen [EMAIL PROTECTED] writes:
  I'm hoping it was a problem on my machine -- educate me on
  what caused the error

 Well, that's exactly what I'd like to know.  The direct cause of the
 error is that DROP TABLE is finding that some other backend has a
 reference-count hold on a page of the temp table it's trying to drop.
 Since no other backend should be trying to touch this temp table,
 there's something pretty fishy here.

 Given that this is a parallel test, you may be looking at a
 low-probability timing-dependent failure.  I'd say set up the machine
 and run repeat tests for an hour or three ... that's what I plan to do
 here.

Okay, I roll'd an RC1 but haven't put it up for FTP yet ... I'll wait for
a few hours to see if anyone can reproduce this, and, if not, put out what
I've rolled ...

say, 00:00AST ...





Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 Hmm, concurrent checkpoint? Probably we could simplify dirty test
 in BufferSync() - ie test bufHdr->cntxDirty without holding
 shlock (and pin!) on buffer: should be good as long as we set
 cntxDirty flag *before* XLogInsert in access methods. Have to
 look more...

Yes, I'm wondering if some other backend is trying to write/flush
the buffer (maybe as part of a checkpoint, maybe not).  But seems
like we should have seen this before, if so; that's not a low-
probability scenario, particularly with just 64 buffers...

regards, tom lane




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Tom Lane

The Hermit Hacker [EMAIL PROTECTED] writes:
 Okay, I roll'd an RC1 but haven't put it up for FTP yet ... I'll wait for
 a few hours to see if anyone can reproduce this, and, if not, put out what
 I've rolled ...

This will not be RC1 :-(

I've been running one backend doing repeated iterations of

CREATE TABLE temptest(col int);
INSERT INTO temptest VALUES (1);

CREATE TEMP TABLE temptest(col int);
INSERT INTO temptest VALUES (2);
SELECT * FROM temptest;
DROP TABLE temptest;

SELECT * FROM temptest;
DROP TABLE temptest;

and another one doing repeated CHECKPOINTs.  I've already gotten a
couple occurrences of Lamar's failure.

I think the problem is that BufferSync unconditionally does PinBuffer
on each buffer, and holds the pin during intervals where it's released
BufMgrLock, even if there's not really anything for it to do on that
buffer.  If someone else is running FlushRelationBuffers then it's
possible for that routine to see a nonzero pin count when it looks.
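The interleaving Tom describes can be replayed deterministically. This Python sketch uses hypothetical names throughout. A generator `yield` stands for the point where BufferSync has released BufMgrLock but still holds its pin. The checkpoint pins a clean temp-table buffer it has nothing to write. A DROP TABLE that checks pin counts at that moment gets -2, even though no other backend ever used the table.

```python
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class Buffer:
    rel: str
    refcount: int = 0   # shared pin count across all backends
    dirty: bool = False

def buffer_sync_steps(pool: List[Buffer]) -> Iterator[str]:
    """Checkpoint loop as described: pin every buffer unconditionally,
    then release BufMgrLock (modelled as a yield) before unpinning."""
    for buf in pool:
        buf.refcount += 1               # PinBuffer, even if there is no work
        yield f"pinned {buf.rel}"       # BufMgrLock released here
        if buf.dirty:
            buf.dirty = False           # stands in for writing the page
        buf.refcount -= 1               # unpin

def flush_relation_buffers(pool: List[Buffer], rel: str) -> int:
    """Return -2 if any buffer of rel is pinned, else 0."""
    for buf in pool:
        if buf.rel == rel and buf.refcount > 0:
            return -2
    return 0

def drop_during_checkpoint(pool: List[Buffer], rel: str) -> int:
    """Advance the checkpoint until it sits pinned on rel's buffer, then let
    the dropping backend call flush_relation_buffers."""
    checkpoint = buffer_sync_steps(pool)
    for step in checkpoint:
        if step == f"pinned {rel}":
            rc = flush_relation_buffers(pool, rel)
            for _ in checkpoint:        # let the checkpoint finish
                pass
            return rc
    return flush_relation_buffers(pool, rel)
```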

Vadim, what do you think about how to change this?  I think this is
BufferSync's fault not FlushRelationBuffers's ...

regards, tom lane




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Lamar Owen

On Tue, 20 Mar 2001, Tom Lane wrote:
 This will not be RC1 :-(
 I've already gotten a
 couple occurrences of Lamar's failure.

Well, I was at least hoping it was a problem here -- particularly since I
haven't been able to reproduce it.  But, since it is not a local problem, I'm
glad I caught it -- on the first regression test run, no less.  I've run a
dozen tests since without duplication.

Although, like you, Tom, I'm curious as to why it hadn't shown up before -- is
the fact that this is a slow machine a factor, possibly?

Although I am now much more leery of our regression suite -- this issue isn't
even tested, in reality.  Do we have _any_ WAL-related tests?  The parallel
testing is a good thing -- but I wonder what boundary conditions aren't getting
tested.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Tom Lane

Lamar Owen [EMAIL PROTECTED] writes:
 Although I am now much more leery of our regression suite

The regression tests are not at all designed to test concurrent
behavior, and never have been.  The parallel form runs some tests
in parallel, true, but those tests are deliberately designed not to
interact.  So I don't put any faith in the regression tests as a means
to catch bugs like this.  We need some thought and work on better
concurrent tests...
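One possible shape for such tests, offered only as a sketch: describe each session as an ordered list of steps and drive the sessions through an explicit schedule. Every interleaving is enumerated, rather than hoping the scheduler happens to hit the bad one. `interleavings`, `run_schedule` and `count_failures` are hypothetical helpers. In a real test each step would send SQL on its own connection, which is not shown here. The toy demo uses a pin counter in place of the database.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Tuple

Step = Callable[[], str]

def interleavings(n_a: int, n_b: int) -> Iterator[Tuple[str, ...]]:
    """Yield every ordering of n_a steps of session "A" and n_b steps of
    session "B" that keeps each session's own steps in order."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        chosen = set(a_slots)
        yield tuple("A" if i in chosen else "B" for i in range(total))

def run_schedule(sessions: Dict[str, List[Step]],
                 schedule: Tuple[str, ...]) -> List[str]:
    """Run the sessions' steps in exactly the order given by schedule."""
    next_step = {name: 0 for name in sessions}
    log = []
    for name in schedule:
        step = sessions[name][next_step[name]]
        next_step[name] += 1
        log.append(f"{name}: {step()}")
    return log

def count_failures() -> int:
    """Toy example: A is a checkpoint (pin, unpin), B is DROP TABLE."""
    failures = 0
    for schedule in interleavings(2, 1):
        pins = {"n": 0}  # fresh state for every schedule

        def pin() -> str:
            pins["n"] += 1
            return "pin"

        def unpin() -> str:
            pins["n"] -= 1
            return "unpin"

        def drop() -> str:
            return "FlushRelationBuffers returned -2" if pins["n"] else "ok"

        log = run_schedule({"A": [pin, unpin], "B": [drop]}, schedule)
        failures += any(line.endswith("returned -2") for line in log)
    return failures

print(count_failures())  # prints 1: only the schedule A, B, A fails
```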

regards, tom lane




RE: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Mikheev, Vadim

 I think the problem is that BufferSync unconditionally does PinBuffer
 on each buffer, and holds the pin during intervals where it's released
 BufMgrLock, even if there's not really anything for it to do on that
 buffer.  If someone else is running FlushRelationBuffers then it's
 possible for that routine to see a nonzero pin count when it looks.
 
 Vadim, what do you think about how to change this?  I think this is
 BufferSync's fault not FlushRelationBuffers's ...

I'm looking there right now...

Vadim




Re: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Tom Lane

 I think the problem is that BufferSync unconditionally does PinBuffer
 on each buffer, and holds the pin during intervals where it's released
 BufMgrLock, even if there's not really anything for it to do on that
 buffer.  If someone else is running FlushRelationBuffers then it's
 possible for that routine to see a nonzero pin count when it looks.

Further note: this bug does not arise in 7.0.* because in that code,
BufferSync will only pin buffers that have been dirtied in the current
transaction.  This cannot affect a concurrent FlushRelationBuffers,
which should be holding exclusive lock on the table it's flushing.

Or can it?  The above is safe enough for user tables, but on system
tables we have a bad habit of releasing locks early.  It seems possible
that a VACUUM on a system table might see pins due to BufferSyncs
running in concurrent transactions that have altered that system table.
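Tom's argument here can be reduced to a small predicate. This Python sketch uses hypothetical names. It assumes, as the message does, that a 7.0 BufferSync pins only the buffers its own transaction dirtied. Under that assumption, the exclusive lock held by the flushing backend rules out an overlap only while the other transaction still holds its own lock on the relation. A lock released early, as happens on system tables, removes that protection.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Buffer:
    rel: str

@dataclass
class Transaction:
    dirtied: Set[int] = field(default_factory=set)     # pool indexes it dirtied
    held_locks: Set[str] = field(default_factory=set)  # relations still locked

def buffers_70_sync_would_pin(xact: Transaction) -> List[int]:
    """7.0-style rule: only buffers dirtied by this transaction are pinned."""
    return sorted(xact.dirtied)

def flush_may_see_pin(other: Transaction, pool: List[Buffer], rel: str) -> bool:
    """Can FlushRelationBuffers(rel), run under an exclusive lock on rel,
    overlap with `other` pinning one of rel's buffers?

    While `other` still holds a lock on rel, the exclusive lock cannot be
    granted, so the two cannot run at the same time.
    """
    pins_rel = any(pool[i].rel == rel for i in buffers_70_sync_would_pin(other))
    return pins_rel and rel not in other.held_locks

pool = [Buffer("user_table"), Buffer("pg_class")]
# A writer that keeps its table lock until it finishes: no overlap possible.
writer = Transaction(dirtied={0}, held_locks={"user_table"})
print(flush_may_see_pin(writer, pool, "user_table"))  # prints False
# A transaction that altered pg_class but released that lock early.
early = Transaction(dirtied={1})
print(flush_may_see_pin(early, pool, "pg_class"))     # prints True
```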

Perhaps this issue does explain some of the reports of
FlushRelationBuffers failure that we've seen from the field.

regards, tom lane




RE: [HACKERS] Beta 6 Regression results on RedHat 7.0.

2001-03-20 Thread Mikheev, Vadim

 Further note: this bug does not arise in 7.0.* because in that code,
 BufferSync will only pin buffers that have been dirtied in the current
 transaction.  This cannot affect a concurrent FlushRelationBuffers,
 which should be holding exclusive lock on the table it's flushing.
 
 Or can it?  The above is safe enough for user tables, but on system
 tables we have a bad habit of releasing locks early. It seems possible
 that a VACUUM on a system table might see pins due to BufferSyncs
 running in concurrent transactions that have altered that system table.
 
 Perhaps this issue does explain some of the reports of
 FlushRelationBuffers failure that we've seen from the field.

Another possible source of this problem (in 7.0.X) is BufferReplace..?

Vadim
