Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-12 Thread Tom Lane
John Naylor  writes:
> On Tue, Mar 12, 2019 at 5:36 AM Tom Lane  wrote:
>> This seems committable from my end --- any further comments?

> I gave it a read and it looks good to me, but I haven't tried to run it.

Thanks for checking.  I've pushed both patches now.

I noticed while looking at the pg_class data that someone had stuck in a
hack to make genbki.pl substitute for "PGHEAPAM", which AFAICS is just
following the bad old precedent of PGNSP and PGUID.  I got rid of that
in favor of using the already-existing BKI_LOOKUP(pg_am) mechanism.
Maybe someday we should try to get rid of PGNSP and PGUID too, although
there are stumbling blocks in the way of both:

* PGNSP is also substituted for in the bodies of some SQL procedures.

* Replacing PGUID with the actual name of the bootstrap superuser is a
bit problematic because that name isn't necessarily "postgres".  We
could probably make it work, but I'm not convinced it'd be any less
confusing than the existing special-case behavior is.

Anyway I think we're basically done here.  There's some additional
cleanup that could possibly be done, like removing the hard-wired
references to OID 1 in initdb.c.  But I'm having a hard time convincing
myself that it's worth the trouble, except maybe for the question of
information_schema.sql's hard-wired type OIDs.  Even there, it's
certainly possible for a patch to use a regtype constant even if
the existing code doesn't.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-12 Thread John Naylor
On Tue, Mar 12, 2019 at 5:36 AM Tom Lane  wrote:
> This seems committable from my end --- any further comments?

I gave it a read and it looks good to me, but I haven't tried to run it.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-11 Thread Tom Lane
I wrote:
> I've successfully done check-world after renumbering every OID above
> 4000 to somewhere else.  I also tried renumbering everything below
> 4000, which unsurprisingly blew up because there are various catalog
> columns we haven't fixed to use symbolic OIDs.  (The one that initdb
> immediately trips over is pg_database.dattablespace.)  I'm not sure
> if it's worth the trouble to make that totally clean, but I suspect
> we ought to at least mop up text-search references to be symbolic.
> That's material for a separate patch though.

So I couldn't resist poking at that, and after a couple hours' work
I have the attached patch, which removes all remaining hard-wired
OID references in the .dat files.

Using this, I renumbered all the OIDs in include/catalog, and behold
things pretty much worked.  I got through check-world after hacking
up these points:

* Unsurprisingly, there are lots of regression tests that have object
OIDs hard-wired in queries and/or expected output.

* initdb.c has a couple of places that know that template1 has OID 1.

* information_schema.sql has several SQL-language functions that
hard-wire the OIDs of assorted built-in types.

I'm not particularly fussed about the first two points, but the
last is a bit worrisome.  It's not too hard to imagine somebody
adding knowledge of their new type to those functions, and the
code getting broken by a renumbering pass, and us not noticing
if the point isn't stressed by a regression test (which mostly
those functions aren't).

We could imagine fixing those functions along the lines of

   CASE WHEN $2 = -1 /* default typmod */
THEN null
-   WHEN $1 IN (1042, 1043) /* char, varchar */
+   WHEN $1 IN ('pg_catalog.bpchar'::pg_catalog.regtype,
+   'pg_catalog.varchar'::pg_catalog.regtype)
THEN $2 - 4

which would add some parsing overhead, but I'm not sure if anyone
would notice that.

regards, tom lane

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 10c2b24..ce159ae 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -174,6 +174,20 @@ foreach my $row (@{ $catalog_data{pg_am} })
 	$amoids{ $row->{amname} } = $row->{oid};
 }
 
+# class (relation) OID lookup (note this only covers bootstrap catalogs!)
+my %classoids;
+foreach my $row (@{ $catalog_data{pg_class} })
+{
+	$classoids{ $row->{relname} } = $row->{oid};
+}
+
+# collation OID lookup
+my %collationoids;
+foreach my $row (@{ $catalog_data{pg_collation} })
+{
+	$collationoids{ $row->{collname} } = $row->{oid};
+}
+
 # language OID lookup
 my %langoids;
 foreach my $row (@{ $catalog_data{pg_language} })
@@ -243,6 +257,41 @@ foreach my $row (@{ $catalog_data{pg_proc} })
 	}
 }
 
+# tablespace OID lookup
+my %tablespaceoids;
+foreach my $row (@{ $catalog_data{pg_tablespace} })
+{
+	$tablespaceoids{ $row->{spcname} } = $row->{oid};
+}
+
+# text search configuration OID lookup
+my %tsconfigoids;
+foreach my $row (@{ $catalog_data{pg_ts_config} })
+{
+	$tsconfigoids{ $row->{cfgname} } = $row->{oid};
+}
+
+# text search dictionary OID lookup
+my %tsdictoids;
+foreach my $row (@{ $catalog_data{pg_ts_dict} })
+{
+	$tsdictoids{ $row->{dictname} } = $row->{oid};
+}
+
+# text search parser OID lookup
+my %tsparseroids;
+foreach my $row (@{ $catalog_data{pg_ts_parser} })
+{
+	$tsparseroids{ $row->{prsname} } = $row->{oid};
+}
+
+# text search template OID lookup
+my %tstemplateoids;
+foreach my $row (@{ $catalog_data{pg_ts_template} })
+{
+	$tstemplateoids{ $row->{tmplname} } = $row->{oid};
+}
+
 # type lookups
 my %typeoids;
 my %types;
@@ -287,14 +336,21 @@ close $ef;
 
 # Map lookup name to the corresponding hash table.
 my %lookup_kind = (
-	pg_am   => \%amoids,
-	pg_language => \%langoids,
-	pg_opclass  => \%opcoids,
-	pg_operator => \%operoids,
-	pg_opfamily => \%opfoids,
-	pg_proc => \%procoids,
-	pg_type => \%typeoids,
-	encoding=> \%encids);
+	pg_am  => \%amoids,
+	pg_class   => \%classoids,
+	pg_collation   => \%collationoids,
+	pg_language=> \%langoids,
+	pg_opclass => \%opcoids,
+	pg_operator=> \%operoids,
+	pg_opfamily=> \%opfoids,
+	pg_proc=> \%procoids,
+	pg_tablespace  => \%tablespaceoids,
+	pg_ts_config   => \%tsconfigoids,
+	pg_ts_dict => \%tsdictoids,
+	pg_ts_parser   => \%tsparseroids,
+	pg_ts_template => \%tstemplateoids,
+	pg_type=> \%typeoids,
+	encoding   => \%encids);
 
 
 # Open temp files
@@ -730,8 +786,8 @@ sub morph_row_for_pgattr
 	$row->{attndims} = $type->{typcategory} eq 'A' ? '1' : '0';
 
 	# collation-aware catalog columns must use C collation
-	$row->{attcollation} = $type->{typcollation} != 0 ?
-	$C_COLLATION_OID : 0;
+	$row->{attcollation} =
+	  $type->{typcollation} ne '0' ? $C_COLLATION_OID : 0;
 
 	if (defined $attr->{forcenotnull})
 	{
diff --git a/src/include/catalog/pg_class.dat b/src/include/catalog/pg_class.dat
index 5a1f3aa..511d876 100644
--- 

Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-11 Thread Tom Lane
John Naylor  writes:
> Now it looks like:
> perl  renumber_oids.pl  --first-mapped-oid 8000 --last-mapped-oid 8999
>  --first-target-oid 2000  *.dat
> To prevent a maintenance headache, I didn't copy any of the formatting
> logic over. You'll also have to run reformat_dat_files.pl afterwards
> to restore that. It seems to work, but I haven't tested thoroughly.

I didn't like the use of Data::Dumper, because it made it quite impossible
to check what the script had done by eyeball.  After some thought
I concluded that we could probably just apply the changes via
search-and-replace, which is pretty ugly and low-tech but it leads to
easily diffable results, whether or not the initial state is exactly
what reformat_dat_files would produce.

I also changed things so that the OID mapping is computed before we start
changing any files, because as it stood the objects would get renumbered
in a pretty random order; and I renamed one of the switches so they all
have unique one-letter abbreviations.

Experimenting with this, I realized that it couldn't renumber OIDs that
are defined in .h files rather than .dat files, which is a serious
deficiency, but given the search-and-replace implementation it's not too
hard to fix up the .h files as well.  So I did that, and removed the
expectation that the target files would be listed on the command line;
that seems more likely to be a foot-gun than to do anything useful.

I've successfully done check-world after renumbering every OID above
4000 to somewhere else.  I also tried renumbering everything below
4000, which unsurprisingly blew up because there are various catalog
columns we haven't fixed to use symbolic OIDs.  (The one that initdb
immediately trips over is pg_database.dattablespace.)  I'm not sure
if it's worth the trouble to make that totally clean, but I suspect
we ought to at least mop up text-search references to be symbolic.
That's material for a separate patch though.

This seems committable from my end --- any further comments?

regards, tom lane

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 3c5830b..b6038b9 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -388,8 +388,25 @@
 to see which ones do not appear.  You can also use
 the duplicate_oids script to check for mistakes.
 (genbki.pl will assign OIDs for any rows that
-didn't get one hand-assigned to them and also detect duplicate OIDs
-at compile time.)
+didn't get one hand-assigned to them, and it will also detect duplicate
+OIDs at compile time.)
+   
+
+   
+When choosing OIDs for a patch that is not expected to be committed
+immediately, best practice is to use a group of more-or-less
+consecutive OIDs starting with some random choice in the range
+8000.  This minimizes the risk of OID collisions with other
+patches being developed concurrently.  To keep the 8000
+range free for development purposes, after a patch has been committed
+to the master git repository its OIDs should be renumbered into
+available space below that range.  Typically, this will be done
+near the end of each development cycle, moving all OIDs consumed by
+patches committed in that cycle at the same time.  The script
+renumber_oids.pl can be used for this purpose.
+If an uncommitted patch is found to have OID conflicts with some
+recently-committed patch, renumber_oids.pl may
+also be useful for recovering from that situation.

 

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 3bf308f..368b1de 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -87,6 +87,8 @@ sub ParseHeader
 		}
 
 		# Push the data into the appropriate data structure.
+		# Caution: when adding new recognized OID-defining macros,
+		# also update src/include/catalog/renumber_oids.pl.
 		if (/^DECLARE_TOAST\(\s*(\w+),\s*(\d+),\s*(\d+)\)/)
 		{
 			push @{ $catalog{toasting} },
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 833fad1..f253613 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -4,6 +4,9 @@
  *	  This file provides some definitions to support indexing
  *	  on system catalogs
  *
+ * Caution: all #define's with numeric values in this file had better be
+ * object OIDs, else renumber_oids.pl might change them inappropriately.
+ *
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
diff --git a/src/include/catalog/renumber_oids.pl b/src/include/catalog/renumber_oids.pl
new file mode 100755
index 000..8a07340
--- /dev/null
+++ b/src/include/catalog/renumber_oids.pl
@@ -0,0 +1,291 @@
+#!/usr/bin/perl
+#--
+#
+# renumber_oids.pl
+#Perl script that shifts a range of OIDs in the Postgres 

Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-10 Thread John Naylor
On Sat, Mar 9, 2019 at 1:14 AM Tom Lane  wrote:
> I took a quick look at this.  I went ahead and pushed the parts that
> were just code cleanup in reformat_dat_file.pl, since that seemed
> pretty uncontroversial.  As far as the rest of it goes:

Okay, thanks.

> * I'm really not terribly happy with sticking this functionality into
> reformat_dat_file.pl.  First, there's an issue of discoverability:
> it's not obvious that a script named that way would have such an
> ability.  Second, it clutters the script in a way that seems to me
> to hinder its usefulness as a basis for one-off hacks.  So I'd really
> rather have a separate script named something like "renumber_oids.pl",
> even if there's a good deal of code duplication between it and
> reformat_dat_file.pl.

> * In my vision of what this might be good for, I think it's important
> that it be possible to specify a range of input OIDs to renumber, not
> just "everything above N".  I agree the output range only needs a
> starting OID.

Now it looks like:

perl  renumber_oids.pl  --first-mapped-oid 8000 --last-mapped-oid 8999
 --first-target-oid 2000  *.dat

To prevent a maintenance headache, I didn't copy any of the formatting
logic over. You'll also have to run reformat_dat_files.pl afterwards
to restore that. It seems to work, but I haven't tested thoroughly.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c6f5a1f6016feeb7f7ea94e504be6d23a264ca07 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 11 Mar 2019 01:05:15 +0800
Subject: [PATCH v2] Provide script to renumber OIDs.

Historically, It's been considered good practice to choose an OID that's
at the beginning of the range shown by the unused_oids script, but
nowadays that space is getting cramped, leading to merge conflicts.

Instead, allow developers to use high-numbered OIDs, and remap them
to a lower range at some future date.
---
 doc/src/sgml/bki.sgml|  16 +++
 src/include/catalog/renumber_oids.pl | 193 +++
 2 files changed, 209 insertions(+)
 create mode 100644 src/include/catalog/renumber_oids.pl

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 3c5830bc8a..7e553dc30d 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -392,6 +392,22 @@
 at compile time.)

 
+   
+For a variety of reasons, newly assigned OIDs should occupy the lower
+end of the reserved range.  Because multiple patches considered for
+inclusion into PostgreSQL could be using
+the same new OIDs, there is the possibilty of conflict.  To mitigate
+this, there is a convention that OIDs 8000 and over are reserved for
+development.  This reduces the effort in rebasing over the HEAD branch.
+To enable committers to move these OIDs to the lower range easily,
+renumber_oids.pl can map OIDs as in the following,
+following, which maps any OIDs between 8000 and 8999, inclusive, to
+unused OIDs starting at 2000:
+
+$ perl  renumber_oids.pl  --first-mapped-oid 8000 --last-mapped-oid 8999  --first-target-oid 2000  *.dat
+
+   
+

 The OID counter starts at 1 at the beginning of a bootstrap run.
 If a row from a source other than postgres.bki
diff --git a/src/include/catalog/renumber_oids.pl b/src/include/catalog/renumber_oids.pl
new file mode 100644
index 00..79537465bc
--- /dev/null
+++ b/src/include/catalog/renumber_oids.pl
@@ -0,0 +1,193 @@
+#!/usr/bin/perl
+#--
+#
+# renumber_oids.pl
+#Perl script that shifts a range of high-numbered OIDs in the given
+#catalog data file(s) to a lower range.
+#
+#Note: This does not format the output, so you will still need to
+#run reformat_dat_file.pl.
+#
+# Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/include/catalog/renumber_oids.pl
+#
+#--
+
+use strict;
+use warnings;
+
+use FindBin;
+use Getopt::Long;
+use Data::Dumper;
+$Data::Dumper::Terse = 1;
+
+# If you copy this script to somewhere other than src/include/catalog,
+# you'll need to modify this "use lib" or provide a suitable -I switch.
+use lib "$FindBin::RealBin/../../backend/catalog/";
+use Catalog;
+
+# Must run in src/include/catalog
+chdir $FindBin::RealBin or die "could not cd to $FindBin::RealBin: $!\n";
+
+my $FirstGenbkiObjectId =
+  Catalog::FindDefinedSymbol('access/transam.h', '..',
+	'FirstGenbkiObjectId');
+
+# Process command line switches.
+my $output_path = '';
+my $first_mapped_oid;
+my $last_mapped_oid = $FirstGenbkiObjectId - 1;
+my $first_target_oid = 1;
+
+GetOptions(
+	'output:s'   => \$output_path,
+	'first-mapped-oid:i' => \$first_mapped_oid,
+	'last-mapped-oid:i'  => \$last_mapped_oid,
+	'first-target-oid:i' => 

Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-08 Thread Tom Lane
John Naylor  writes:
>> On 2/8/19, Tom Lane  wrote:
>>> A script such as you suggest might be a good way to reduce the temptation
>>> to get lazy at the last minute.  Now that the catalog data is pretty
>>> machine-readable, I suspect it wouldn't be very hard --- though I'm
>>> not volunteering either.  I'm envisioning something simple like "renumber
>>> all OIDs in range - into range -", perhaps with the
>>> ability to skip any already-used OIDs in the target range.

>> This might be something that can be done inside reformat_dat_files.pl.
>> It's a little outside it's scope, but better than the alternatives.

> Along those lines, here's a draft patch to do just that. It handles
> array type oids as well. Run it like this:

> perl  reformat_dat_file.pl  --map-from 9000  --map-to 2000  *.dat

I took a quick look at this.  I went ahead and pushed the parts that
were just code cleanup in reformat_dat_file.pl, since that seemed
pretty uncontroversial.  As far as the rest of it goes:

* I'm really not terribly happy with sticking this functionality into
reformat_dat_file.pl.  First, there's an issue of discoverability:
it's not obvious that a script named that way would have such an
ability.  Second, it clutters the script in a way that seems to me
to hinder its usefulness as a basis for one-off hacks.  So I'd really
rather have a separate script named something like "renumber_oids.pl",
even if there's a good deal of code duplication between it and
reformat_dat_file.pl.

* In my vision of what this might be good for, I think it's important
that it be possible to specify a range of input OIDs to renumber, not
just "everything above N".  I agree the output range only needs a
starting OID.

BTW, I changed the CF entry's target back to v12; I don't see a
reason not to get this done this month, and indeed kind of wish
it was available right now ;-)

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-01 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Mar 1, 2019 at 11:56 AM Robert Haas  wrote:
>> It would be neat if there were a tool you could run to somehow tell
>> you whether catversion needs to be changed for a given patch.

> That seems infeasible because of stored rules. A lot of things bleed
> into that. We could certainly do better at documenting this on the
> "committing checklist" page, though.

A first approximation to that is "did you touch readfuncs.c", though
that rule will give a false positive if you only changed Plan nodes.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-01 Thread Peter Geoghegan
On Fri, Mar 1, 2019 at 11:56 AM Robert Haas  wrote:
> It would be neat if there were a tool you could run to somehow tell
> you whether catversion needs to be changed for a given patch.

That seems infeasible because of stored rules. A lot of things bleed
into that. We could certainly do better at documenting this on the
"committing checklist" page, though.

> Indeed, Simon got complaints a number of years ago (2010, it looks
> like) when he had the temerity to change the magic number to some
> other unrelated value instead of just incrementing it by one.

Off with his head!

> Although I think that the criticism was to a certain extent
> well-founded -- why deviate from previous practice? -- there is at the
> same time something a little crazy about somebody getting excited
> about the particular value that has been chosen for a number that is
> described in the very name of the constant as a MAGIC number.  And
> especially because there is absolutely zip in the way of code comments
> or a README that explain to you how to do it "right."

I have learned to avoid ambiguity more than anything else, because
ambiguity causes patches to flounder indefinitely, whereas it's
usually not that hard to fix something that's broken. I agree -
anything that adds ambiguity rather than taking it away is a big
problem.

> Well, perhaps I'm proposing some additional code, but I don't think of
> that as making the mechanism more complicated.  I want to make it
> simpler for patch submitters and reviewers and committers to not make
> mistakes that they have to run around and fix.

Right. So do I. I just don't think that it's that bad to ask the final
committer to do something once, rather than getting everyone else
(including committers) to do it multiple times. If we can avoid even
this burden, and totally centralize the management of the OID space,
then so much the better.

> If there are fewer
> kinds of things that qualify as mistakes, as in Tom's latest proposal,
> then we are moving in the right direction IMO regardless of anything
> else.

I'm glad that we now have a plan that is a clear step forward.

> I think Peter and I are more agreeing than we are at the opposite ends
> of a spectrum, but more importantly, I think it is worth having a
> discussion first about what people like and dislike, and what goals
> they have, and then only if necessary, counting the votes afterwards.

I agree that that's totally worthwhile.

> I don't like having the feeling that because I have a different view
> of something and want to write an email about that, I am somehow an
> impediment to progress.  I think if we reduce discussions to
> you're-for-it-or-your-against-it, that's not that helpful.

That was not my intention. The way that you brought the issue of the
difficulty of being a contributor in general into it was unhelpful,
though. It didn't seem useful or fair to link Tom's position to a big,
well known controversy.

We now have a solution that everyone is happy with, or can at least
live with, which suggests to me that Tom wasn't being intransigent or
insensitive to the concerns of contributors.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-01 Thread Robert Haas
On Thu, Feb 28, 2019 at 5:36 PM Peter Geoghegan  wrote:
> I have attempted to institute some general guidelines for what the
> thicket of rules are by creating the "committing checklist" page. This
> is necessarily imperfect, because the rules are in many cases open to
> interpretation, often for good practical reasons. I don't have any
> sympathy for committers that find it hard to remember to do a
> catversion bump with any kind of regularity. That complexity seems
> inherent, not incidental, since it's often convenient to ignore
> catalog incompatibilities during development.

Bumping catversion is something of a special case, because one does it
so often that one gets used to remembering it.  The rules are usually
not too hard to remember, although they are trickier when you don't
directly change anything in src/include/catalog but just change the
definition of some node that may be serialized in a catalog someplace.
It would be neat if there were a tool you could run to somehow tell
you whether catversion needs to be changed for a given patch.

But you know there are a log of other version numbers floating around,
like XLOG_PAGE_MAGIC or the pg_dump archive version, and it is not
really that easy to know -- as a new contributor or sometimes even as
an experienced one -- whether your work requires any changes to that
stuff, or even that that stuff *exists*.  Indeed, XLOG_PAGE_MAGIC is a
particularly annoying case, both because the constant name doesn't
contain VERSION and because the comment just says /* can be used as
WAL version indicator */ which does not exactly make it clear that if
you fail to bump it when you touch the WAL format you will Make People
Unhappy.

Indeed, Simon got complaints a number of years ago (2010, it looks
like) when he had the temerity to change the magic number to some
other unrelated value instead of just incrementing it by one.
Although I think that the criticism was to a certain extent
well-founded -- why deviate from previous practice? -- there is at the
same time something a little crazy about somebody getting excited
about the particular value that has been chosen for a number that is
described in the very name of the constant as a MAGIC number.  And
especially because there is absolutely zip in the way of code comments
or a README that explain to you how to do it "right."

> > So, I suspect that for every
> > unit of work it saves somebody, it's probably going to generate about
> > one unit of extra work for somebody else.
>
> Maybe so. I think that you're jumping to conclusions, though.

I did say "I suspect..." which was intended as a concession that I
don't know for sure.

> But you seem to want to make the mechanism itself even more
> complicated, not less complicated (based on your remarks about making
> OID assignment happen during the build). In order to make the use of
> the mechanism easier. That seems worth considering, but ISTM that this
> is talking at cross purposes. There are far simpler ways of making it
> unlikely that a committer is going to miss this step. There is also a
> simple way of noticing that they do quickly (e.g. a simple buildfarm
> test).

Well, perhaps I'm proposing some additional code, but I don't think of
that as making the mechanism more complicated.  I want to make it
simpler for patch submitters and reviewers and committers to not make
mistakes that they have to run around and fix.  If there are fewer
kinds of things that qualify as mistakes, as in Tom's latest proposal,
then we are moving in the right direction IMO regardless of anything
else.

> > OK.  Well, I think that doing nothing is superior to this proposal,
> > for reasons similar to what Peter Eisentraut has already articulated.
> > And I think rather than blasting forward with your own preferred
> > alternative in the face of disagreement, you should be willing to
> > discuss other possible options.  But if you're not willing to do that,
> > I can't make you.
>
> Peter seemed to not want to do this on the grounds that it isn't
> necessary at all, whereas you think that it doesn't go far enough. If
> there is a consensus against what Tom has said, it's a cacophonous one
> that cannot really be said to be in favor of anything.

I think Peter and I are more agreeing than we are at the opposite ends
of a spectrum, but more importantly, I think it is worth having a
discussion first about what people like and dislike, and what goals
they have, and then only if necessary, counting the votes afterwards.
I don't like having the feeling that because I have a different view
of something and want to write an email about that, I am somehow an
impediment to progress.  I think if we reduce discussions to
you're-for-it-or-your-against-it, that's not that helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-01 Thread Robert Haas
On Thu, Feb 28, 2019 at 6:40 PM Tom Lane  wrote:
> 1. Encourage people to develop new patches using chosen-at-random
> high OIDs, in the 7K-9K range.  They do this already, it'd just
> be encouraged instead of discouraged.
>
> 2. Commit patches as received.
>
> 3. Once each devel cycle, after feature freeze, somebody uses the
> renumbering tool to shove all the new OIDs down to lower numbers,
> freeing the high-OID range for the next devel cycle.  We'd have
> to remember to do that, but it could be added to the RELEASE_CHANGES
> checklist.

Sure, that sounds nice.  It seems like it might be slightly less
convenient for non-committers than what I was proposing, but still
more convenient than what they're doing right now.  And it's also more
convenient for committers, because they're not being asked to manually
fiddle patches at the last moment, something that I at least find
rather error-prone.  It also, and I think this is really good, moves
in the direction of fewer things for both patch authors and patch
committers to worry about doing wrong.  Instead of throwing rocks at
people whose OID assignments are "wrong," we just accept what people
do and adjust it later if it makes sense to do so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Peter Geoghegan
On Thu, Feb 28, 2019 at 3:40 PM Tom Lane  wrote:
> Robert Haas  writes:
> >>> just as a thought, what if we stopped assigning manual OIDs for new
> >>> catalog entries altogether, except for once at the end of each release
> >>> cycle?
>
> Actually ... that leads to an idea that wouldn't add any per-commit
> overhead, or really much change at all to existing processes.  Given
> the existence of a reliable OID-renumbering tool, we could:

> In this scheme, OID collisions are a problem for in-progress patches
> only if two patches are unlucky enough to choose the same random
> high OIDs during the same devel cycle.  That's unlikely, or at least
> a good bit less likely than collisions are today.

That sounds like a reasonable compromise. Perhaps the unused_oids
script could give specific guidance on using a randomly determined
small range of contiguous OIDs that fall within the current range for
that devel cycle. That would prevent collisions caused by the natural
human tendency to prefer a round number. Having contiguous OIDs for
the same patch seems worth preserving.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tomas Vondra



On 3/1/19 12:41 AM, Peter Geoghegan wrote:
> On Thu, Feb 28, 2019 at 3:09 PM Tom Lane  wrote:
>> The only thing that's really clear is that some senior committers don't
>> want to be bothered because they don't think there's a problem here that
>> justifies any additional expenditure of their time.  Perhaps they are
>> right, because I'd expected some comments from non-committer developers
>> confirming that they see a problem, and the silence is deafening.
> 
> I don't think that you can take that as too strong a signal. The
> incentives are different for non-committers.
> 
>> I'm inclined to commit some form of Naylor's tool improvement anyway,
>> because I have use for it; I remember times when I've renumbered OIDs
>> manually in patches, and it wasn't much fun.  But I can't force a
>> process change if there's not consensus for it among the committers.
> 
> I think that that's a reasonable thing to do, provided there is
> obvious feedback that makes it highly unlikely that the committer will
> make an error at the last moment. I have a hard time coming up with a
> suggestion that won't be considered annoying by at least one person,
> though.
> 

FWIW I personally would not mind if such tool / process was added. But I
have a related question - do we have some sort of list of such processes
that I could check? That is, a list of stuff that is expected to be done
by a committer before a commit?

I do recall we have [1], but perhaps we have something else.

https://wiki.postgresql.org/wiki/Committing_checklist


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Peter Geoghegan
On Thu, Feb 28, 2019 at 3:09 PM Tom Lane  wrote:
> The only thing that's really clear is that some senior committers don't
> want to be bothered because they don't think there's a problem here that
> justifies any additional expenditure of their time.  Perhaps they are
> right, because I'd expected some comments from non-committer developers
> confirming that they see a problem, and the silence is deafening.

I don't think that you can take that as too strong a signal. The
incentives are different for non-committers.

> I'm inclined to commit some form of Naylor's tool improvement anyway,
> because I have use for it; I remember times when I've renumbered OIDs
> manually in patches, and it wasn't much fun.  But I can't force a
> process change if there's not consensus for it among the committers.

I think that that's a reasonable thing to do, provided there is
obvious feedback that makes it highly unlikely that the committer will
make an error at the last moment. I have a hard time coming up with a
suggestion that won't be considered annoying by at least one person,
though.

Would it be awful if there was a #warning directive that kicked in
when the temporary OID range is in use? It should be possible to do
that without breaking -Werror builds, which I believe Robert uses (I
am reminded of the Flex bug that we used to have to work around). It's
not like there are that many patches that need to assign OIDs to new
catalog entries. I would suggest that we put the warning in the
regression tests if I didn't know that that could be missed by the use
of parallel variants, where the output flies by. There is no precedent
for using #warning for something like that, but offhand it seems like
the only thing that would work consistently.

I don't really mind having to do slightly more work when the issue
crops up, especially if that means less work for everyone involved in
aggregate, which is the cost that I'm concerned about the most.
However, an undocumented or under-documented process that requires a
fixed amount of extra mental effort when committing *anything* is
another matter.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tom Lane
Robert Haas  writes:
>>> just as a thought, what if we stopped assigning manual OIDs for new
>>> catalog entries altogether, except for once at the end of each release
>>> cycle?

Actually ... that leads to an idea that wouldn't add any per-commit
overhead, or really much change at all to existing processes.  Given
the existence of a reliable OID-renumbering tool, we could:

1. Encourage people to develop new patches using chosen-at-random
high OIDs, in the 7K-9K range.  They do this already, it'd just
be encouraged instead of discouraged.

2. Commit patches as received.

3. Once each devel cycle, after feature freeze, somebody uses the
renumbering tool to shove all the new OIDs down to lower numbers,
freeing the high-OID range for the next devel cycle.  We'd have
to remember to do that, but it could be added to the RELEASE_CHANGES
checklist.

In this scheme, OID collisions are a problem for in-progress patches
only if two patches are unlucky enough to choose the same random
high OIDs during the same devel cycle.  That's unlikely, or at least
a good bit less likely than collisions are today.  If/when it does
happen we'd have a couple of alternatives for ameliorating the problem
--- either the not-yet-committed patch could use the renumbering tool
on their own OIDs, or we could do an off-schedule run of step 3 to get
the already-committed OIDs out of their way.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Feb 28, 2019 at 7:59 AM Robert Haas  wrote:
>> OK.  Well, I think that doing nothing is superior to this proposal,
>> for reasons similar to what Peter Eisentraut has already articulated.
>> And I think rather than blasting forward with your own preferred
>> alternative in the face of disagreement, you should be willing to
>> discuss other possible options.  But if you're not willing to do that,
>> I can't make you.

> Peter seemed to not want to do this on the grounds that it isn't
> necessary at all, whereas you think that it doesn't go far enough. If
> there is a consensus against what Tom has said, it's a cacophonous one
> that cannot really be said to be in favor of anything.

The only thing that's really clear is that some senior committers don't
want to be bothered because they don't think there's a problem here that
justifies any additional expenditure of their time.  Perhaps they are
right, because I'd expected some comments from non-committer developers
confirming that they see a problem, and the silence is deafening.

I'm inclined to commit some form of Naylor's tool improvement anyway,
because I have use for it; I remember times when I've renumbered OIDs
manually in patches, and it wasn't much fun.  But I can't force a
process change if there's not consensus for it among the committers.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Peter Geoghegan
On Thu, Feb 28, 2019 at 7:59 AM Robert Haas  wrote:
> I don't think this is the worst proposal ever.  However, I also think
> that it's not unreasonable to raise the issue that writing OR
> reviewing OR committing a patch already involves adhering to a thicket
> of undocumented rules.  When somebody fails to adhere to one of those
> rules, they get ignored or publicly shamed.  Now you want to add yet
> another step to the process - really two.

There does seem to be a real problem with undocumented processes. For
example, I must confess that it came as news to me that we already had
a reserved OID range. However, I don't think that there is that much
of an issue with adding new mechanisms like this, provided it makes it
easy to do the right thing and hard to do the wrong thing. What Tom
has proposed so far is not something that self-evidently meets that
standard, but it's also not something that self-evidently fails to
meet that standard.

I have attempted to institute some general guidelines for what the
thicket of rules are by creating the "committing checklist" page. This
is necessarily imperfect, because the rules are in many cases open to
interpretation, often for good practical reasons. I don't have any
sympathy for committers that find it hard to remember to do a
catversion bump with any kind of regularity. That complexity seems
inherent, not incidental, since it's often convenient to ignore
catalog incompatibilities during development.

> So, I suspect that for every
> unit of work it saves somebody, it's probably going to generate about
> one unit of extra work for somebody else.

Maybe so. I think that you're jumping to conclusions, though.

> A lot of projects have a much less painful process for getting patches
> integrated than we do.  I don't know how those projects maintain
> adequate code quality, but I do know that making it easy to get a
> patch accepted makes people more likely to contribute patches, and
> increases overall development velocity.  It is not even vaguely
> unreasonable to worry about whether making this more complicated is
> going to hurt more than it helps, and I don't know why you think
> otherwise.

But you seem to want to make the mechanism itself even more
complicated, not less complicated (based on your remarks about making
OID assignment happen during the build). In order to make the use of
the mechanism easier. That seems worth considering, but ISTM that this
is talking at cross purposes. There are far simpler ways of making it
unlikely that a committer is going to miss this step. There is also a
simple way of noticing that they do quickly (e.g. a simple buildfarm
test).

> Right now every entry in pg_proc.dat includes an OID assignment.  What
> I'm proposing is that we would also allow entries that did not have
> one, and the build process would assign one while processing the .dat
> files.  Then later, somebody could use a script that went through and
> rewrote the .dat file to add OID assignments to any entries that
> lacked them.  Since the topic of having tools for automated rewrite of
> those files has been discussed at length, and since we already have a
> script called reformat_dat_file.pl in the tree which  contains
> comments indicating that it could be modified for bulk editing, said
> script having been committed BY YOU, I don't understand why you think
> that bulk editing is infeasible.

I'm also curious to hear what Tom thinks about this.

> OK.  Well, I think that doing nothing is superior to this proposal,
> for reasons similar to what Peter Eisentraut has already articulated.
> And I think rather than blasting forward with your own preferred
> alternative in the face of disagreement, you should be willing to
> discuss other possible options.  But if you're not willing to do that,
> I can't make you.

Peter seemed to not want to do this on the grounds that it isn't
necessary at all, whereas you think that it doesn't go far enough. If
there is a consensus against what Tom has said, it's a cacophonous one
that cannot really be said to be in favor of anything.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 10:27 AM Tom Lane  wrote:
> Robert Haas  writes:
> > This seems like a big piece of new mechanism being invented
> > to solve an occasional annoyance. Your statistics are not convincing
> > at all: you're arguing that this is a big problem because 2-3% of
> > pending patches current have an issue here, and some others have in
> > the past, but that's a really small percentage, and the time spent
> > doing OID renumbering must be a tiny percentage of the total time
> > anyone spends hacking on PostgreSQL.
>
> TBH, I find this position utterly baffling.  It's true that only a
> small percentage of patches have an issue here, because only a small
> percentage of patches dabble in manually-assigned OIDs at all.  But
> *among those that do*, there is a huge problem.  I had not actually
> realized how bad it is until I gathered those stats, but it's bad.
>
> I don't understand the objection to inventing a mechanism that will
> help those patches and has no impact whatever when working on patches
> that don't involve manually-assigned OIDs.
>
> And, yeah, I'd like us not to have patches hanging around for years
> either, but that's a reality that's not going away.

I don't think this is the worst proposal ever.  However, I also think
that it's not unreasonable to raise the issue that writing OR
reviewing OR committing a patch already involves adhering to a thicket
of undocumented rules.  When somebody fails to adhere to one of those
rules, they get ignored or publicly shamed.  Now you want to add yet
another step to the process - really two.  If you want to submit a
patch that requires new catalog entries, you must know that you're
supposed to put those OIDs in this new range that we're going to set
aside for such things, and if you want to commit one, you must know
that you're suppose to renumber those OID assignments into some other
range.  And people are going to screw it up - submitters are going to
fail to know about this new policy (which will probably be documented
nowhere, just like all the other ones) - and committers are going to
fail to remember to renumber things.  So, I suspect that for every
unit of work it saves somebody, it's probably going to generate about
one unit of extra work for somebody else.

A lot of projects have a much less painful process for getting patches
integrated than we do.  I don't know how those projects maintain
adequate code quality, but I do know that making it easy to get a
patch accepted makes people more likely to contribute patches, and
increases overall development velocity.  It is not even vaguely
unreasonable to worry about whether making this more complicated is
going to hurt more than it helps, and I don't know why you think
otherwise.

> > but
> > just as a thought, what if we stopped assigning manual OIDs for new
> > catalog entries altogether, except for once at the end of each release
> > cycle?
>
> And that's another idea without any basis in reality.  What are you
> going to do instead?  What mechanism will you use to track these
> OIDs so you can clean up later?

Right now every entry in pg_proc.dat includes an OID assignment.  What
I'm proposing is that we would also allow entries that did not have
one, and the build process would assign one while processing the .dat
files.  Then later, somebody could use a script that went through and
rewrote the .dat file to add OID assignments to any entries that
lacked them.  Since the topic of having tools for automated rewrite of
those files has been discussed at length, and since we already have a
script called reformat_dat_file.pl in the tree which  contains
comments indicating that it could be modified for bulk editing, said
script having been committed BY YOU, I don't understand why you think
that bulk editing is infeasible.

> Who's going to write the code that
> will support this?  Not me.  I think the proposal that is on the
> table is superior.

OK.  Well, I think that doing nothing is superior to this proposal,
for reasons similar to what Peter Eisentraut has already articulated.
And I think rather than blasting forward with your own preferred
alternative in the face of disagreement, you should be willing to
discuss other possible options.  But if you're not willing to do that,
I can't make you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tom Lane
Robert Haas  writes:
> This seems like a big piece of new mechanism being invented
> to solve an occasional annoyance. Your statistics are not convincing
> at all: you're arguing that this is a big problem because 2-3% of
> pending patches current have an issue here, and some others have in
> the past, but that's a really small percentage, and the time spent
> doing OID renumbering must be a tiny percentage of the total time
> anyone spends hacking on PostgreSQL.

TBH, I find this position utterly baffling.  It's true that only a
small percentage of patches have an issue here, because only a small
percentage of patches dabble in manually-assigned OIDs at all.  But
*among those that do*, there is a huge problem.  I had not actually
realized how bad it is until I gathered those stats, but it's bad.

I don't understand the objection to inventing a mechanism that will
help those patches and has no impact whatever when working on patches
that don't involve manually-assigned OIDs.

And, yeah, I'd like us not to have patches hanging around for years
either, but that's a reality that's not going away.

> We could fix that problem by caring less about keeping all the numbers
> gapless and increasing the size of the reserved space to say 100k,

We already had this discussion.  Moving FirstNormalObjectId is infeasible
without forcing a dump/reload, which I don't think anyone wants to do.

> but
> just as a thought, what if we stopped assigning manual OIDs for new
> catalog entries altogether, except for once at the end of each release
> cycle?

And that's another idea without any basis in reality.  What are you
going to do instead?  What mechanism will you use to track these
OIDs so you can clean up later?  Who's going to write the code that
will support this?  Not me.  I think the proposal that is on the
table is superior.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Robert Haas
On Wed, Feb 27, 2019 at 10:41 PM Tom Lane  wrote:
> In short, this situation may look fine from the perspective of a committer
> with a relatively short timeline to commit, but it's pretty darn awful for
> everybody else.  The only way to avoid a ~ 50% failure rate is to choose
> OIDs above 6K, and once everybody starts doing it like that, things are
> going to get very unpleasant very quickly.

The root problem here from the perspective of a non-committer is not
that they might have to renumber OIDs a few times over the year or two
it takes to get their patch merged, but rather that it takes a year or
two to get their patch merged.  That's not to say that I have no
sympathy with people in that situation or don't want to make their
lives easier, but I'm not really convinced that burdening committers
with additional manual steps is the right way to get patches merged
faster.  This seems like a big piece of new mechanism being invented
to solve an occasional annoyance. Your statistics are not convincing
at all: you're arguing that this is a big problem because 2-3% of
pending patches current have an issue here, and some others have in
the past, but that's a really small percentage, and the time spent
doing OID renumbering must be a tiny percentage of the total time
anyone spends hacking on PostgreSQL.

I think that the problem here is have a very limited range of OIDs
(10k) which can be used for this purpose, and the number of OIDs that
are used in that space is now a significant fraction of the total
(>4.5k), and the problem is further complicated by the desire to keep
the OIDs assigned near the low end of the available numbering space
and/or near to other OIDs used for similar purposes.  The sheer fact
that the address space is nearly half-used means that conflicts are
likely even if people choose OIDs at random, and when people choose
OIDs non-randomly -- lowest, round numbers, near to other OIDs -- the
chances of conflicts just go up.

We could fix that problem by caring less about keeping all the numbers
gapless and increasing the size of the reserved space to say 100k, but
just as a thought, what if we stopped assigning manual OIDs for new
catalog entries altogether, except for once at the end of each release
cycle?  Make a way that people can add an entry to pg_proc.dat or
whatever without fixing an OID, and let the build scripts generate
one.  As many patches as happen during a release cycle will add new
such entries and they'll just all get some OID or other.  Then, at the
end of the release cycle, we'll run a script that finds all of those
catalog entries and rewrites the .dat files, adding a permanent OID
assignment to each one, so that those OIDs will then be fixed for all
future releases (unless we drop the entries or explicitly change
something).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-02-27 22:27, Tom Lane wrote:
>>> OID collision doesn't seem to be a significant problem (for me).

>> Um, I beg to differ.  It's not at all unusual for pending patches to
>> bit-rot for no reason other than suddenly getting an OID conflict.
>> I don't have to look far for a current example:

> I'm not saying it doesn't happen, but that it's not a significant
> problem overall.

I do not think you are correct.  It may not be a big problem across
all our incoming patches, but that's only because most of them don't
have anything to do with hand-assigned OIDs.  Among those that do,
I think there is a significant problem.

To try to quantify this a bit, I looked through v12-cycle and pending
patches that touch the catalog data.

We've only committed 12 patches adding new hand-assigned OIDs since v11
was branched off.  (I suspect that's lower than in a typical cycle,
but have not attempted to quantify things further back.)  Of those,
only two seem to have needed OID adjustments after initial posting,
but that's mostly because most of them were committer-originated patches
that got pushed within a week or two.  That's certainly not the typical
wait time for a patch submitted by anybody else.  Also, a lot of these
patches recycled OIDs that'd recently been freed by patches such as the
abstime-ectomy, which means that the amount of OID conflict created for
pending patches is probably *really* low in this cycle-so-far, compared
to our historical norms.

Of what's in the queue to be reviewed right now, there are just
20 (out of 150-plus) patches that touch the catalog/*.dat files.
I got this number by groveling through the cfbot's reports of
patch applications, to see which patches touched those files.
It might omit some patches that the cfbot failed to make sense of.
Also, I'm pretty sure that a few of these patches don't actually
assign any new OIDs but just change existing entries, or create
only entries with auto-assigned OIDs.  I did not try to separate
those out, however, since the point here is to estimate for how
many patches a committer would even need to think about this.

Of those twenty patches, three have unresolved OID conflicts
right now:

multivariate MCV lists and histograms
commontype polymorphics
log10/hyper functions

Another one has recently had to resolve an OID conflict:

SortSupport implementation on inet/cdir 

which is notable considering that that thread is less than three weeks
old.  (The log10 and commontype threads aren't really ancient either.)

I spent some extra effort looking at the patches that both create more
than a few new OIDs and have been around for awhile:

Generic type subscripting
KNN for btree
Custom compression methods
SQL/JSON: functions
SQL/JSON: jsonpath
Generated columns
BRIN bloom indexes

The first four of those have all had to reassign OIDs during their
lifetime.  jsonpath has avoided doing so by choosing fairly high-numbered
OIDs (6K range) to begin with; which I trust you will agree is a solution
that doesn't scale for long.  I'm not entirely sure that the last two
haven't had to renumber OIDs; I ran out of energy before poking through
their history in detail.

In short, this situation may look fine from the perspective of a committer
with a relatively short timeline to commit, but it's pretty darn awful for
everybody else.  The only way to avoid a ~ 50% failure rate is to choose
OIDs above 6K, and once everybody starts doing it like that, things are
going to get very unpleasant very quickly.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Feb 27, 2019 at 2:44 PM Peter Eisentraut
>  wrote:
>> If this is the problem (although I think we'd find that OID collisions
>> are rather rare compared to other gratuitous cfbot failures), why not
>> have the cfbot build with a flag that ignores OID collisions?

> How would that work?

It could work for conflicting OIDs in different system catalogs (so that
the "conflict" is an artifact of our assignment rules rather than an
intrinsic problem).  But I think the majority of new hand-assigned OIDs
are in pg_proc, so that this kind of hack would not help as much as one
might wish.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Peter Geoghegan
On Wed, Feb 27, 2019 at 2:44 PM Peter Eisentraut
 wrote:
> If this is the problem (although I think we'd find that OID collisions
> are rather rare compared to other gratuitous cfbot failures), why not
> have the cfbot build with a flag that ignores OID collisions?

How would that work?

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Peter Eisentraut
On 2019-02-27 22:50, Peter Geoghegan wrote:
> However, the continuous
> integration stuff has created an expectation that your patch shouldn't
> be left to bitrot for long. Silly mechanical bitrot now seems like a
> much bigger problem than it was before these developments. It unfairly
> puts reviewers off engaging.

If this is the problem (although I think we'd find that OID collisions
are rather rare compared to other gratuitous cfbot failures), why not
have the cfbot build with a flag that ignores OID collisions?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Peter Geoghegan
On Wed, Feb 27, 2019 at 2:39 PM Peter Eisentraut
 wrote:
> The changes of a patch (a) allocating a new OID, (b) a second patch
> allocating a new OID, (c) both being in flight at the same time, (d)
> actually picking the same OID, are small.

But...they are. Most patches don't create new system catalog entries
at all. Of those that do, the conventions around assigning new OIDs
make it fairly likely that problems will emerge.

> I guess the overall time lost
> to this issue is perhaps 2 hours per year.  On the other hand, with
> about 2000 commits to master per year, if this renumbering business only
> adds 2 seconds of overhead to committing, we're coming out behind.

The time spent on the final commit is not the cost we're concerned
about, though. It isn't necessary to do that more than once, whereas
all but the most trivial of patches receive multiple rounds of review
and revision.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Peter Eisentraut
On 2019-02-27 22:27, Tom Lane wrote:
>> OID collision doesn't seem to be a significant problem (for me).
> 
> Um, I beg to differ.  It's not at all unusual for pending patches to
> bit-rot for no reason other than suddenly getting an OID conflict.
> I don't have to look far for a current example:

I'm not saying it doesn't happen, but that it's not a significant
problem overall.

The changes of a patch (a) allocating a new OID, (b) a second patch
allocating a new OID, (c) both being in flight at the same time, (d)
actually picking the same OID, are small.  I guess the overall time lost
to this issue is perhaps 2 hours per year.  On the other hand, with
about 2000 commits to master per year, if this renumbering business only
adds 2 seconds of overhead to committing, we're coming out behind.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Feb 27, 2019 at 1:27 PM Tom Lane  wrote:
>>> OID collision doesn't seem to be a significant problem (for me).

>> Um, I beg to differ.  It's not at all unusual for pending patches to
>> bit-rot for no reason other than suddenly getting an OID conflict.
>> I don't have to look far for a current example:
>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/498955351

> Patch authors shouldn't be left with any excuse for leaving their
> patch to bitrot for long. And, more casual patch reviewers shouldn't
> have any excuse for not downloading a patch and applying it locally,
> so that they can spend a spare 10 minutes kicking the tires.

Yeah, that latter point is really the killer argument.  We don't want
to make people spend valuable review time on fixing uninteresting OID
conflicts.  It's even more annoying that several people might have to
duplicate the same work, if they're testing a patch independently.

Given a convention that under-development patches use OIDs in the 9K
range, the only time anybody would have to resolve OID conflicts for
testing would be if they were trying to test the combination of two
or more patches.  Even then, an OID-renumbering script would make it
pretty painless: apply patch 1, renumber its OIDs to someplace else,
apply patch 2, repeat as needed.

> Why not have unused_oids reference the convention as a "tip"?

Hmm, could be helpful.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Tom Lane
I wrote:
> We do need a couple of pieces of new infrastructure to make this idea
> conveniently workable.  One is a tool to allow automatic OID renumbering
> instead of having to do it by hand; Naylor has a draft for that upthread.

Oh: arguably, something else we'd need to do to ensure that OID
renumbering is trouble-free is to institute a strict rule that OID
references in the *.dat files must be symbolic.  We had not bothered
to convert every single reference type before, reasoning that some
of them were too little-used to be worth the trouble; but someday
that'll rise up to bite us, if semi-automated renumbering becomes
a thing.

It looks to me like the following OID columns remain unconverted:

pg_class.reltype
pg_database.dattablespace
pg_ts_config.cfgparser
pg_ts_config_map.mapcfg, mapdict
pg_ts_dict.dicttemplate
pg_type.typcollation
pg_type.typrelid

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Peter Geoghegan
On Wed, Feb 27, 2019 at 1:27 PM Tom Lane  wrote:
> > OID collision doesn't seem to be a significant problem (for me).
>
> Um, I beg to differ.  It's not at all unusual for pending patches to
> bit-rot for no reason other than suddenly getting an OID conflict.
> I don't have to look far for a current example:
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/498955351

OID conflicts are not that big of a deal when building a patch
locally, because almost everyone knows what the exact problem is
immediately, and because you probably have more than a passing
interest in the patch to even do that much. However, the continuous
integration stuff has created an expectation that your patch shouldn't
be left to bitrot for long. Silly mechanical bitrot now seems like a
much bigger problem than it was before these developments. It unfairly
puts reviewers off engaging.

Patch authors shouldn't be left with any excuse for leaving their
patch to bitrot for long. And, more casual patch reviewers shouldn't
have any excuse for not downloading a patch and applying it locally,
so that they can spend a spare 10 minutes kicking the tires.

> Perhaps it'd be useful for genbki.pl to spit out a warning (NOT an
> error) if it sees OIDs in the reserved range.  I'm not sure that that'd
> really be worth the trouble though, since one could easily forget
> about it while reviewing/testing just before commit, and it'd just be
> useless noise up until it was time to commit.

My sense is that we should err on the side of being informative.

> Another issue, as Robert pointed out, is that this does need to be
> a formal convention not something undocumented.  Naylor's patch adds
> a mention of it in bki.sgml, but I wonder if anyplace else should
> talk about it.

Why not have unused_oids reference the convention as a "tip"?

> I concede your point that a prudent committer would do a rebuild and
> retest rather than just trusting the tool.  But really, how much
> extra work is that?  If you've spent any time optimizing your workflow,
> a full rebuild and check-world should be under five minutes on any
> hardware anyone would be using for development today.

If you use the "check-world parallel" recipe on the committing
checklist Wiki page, and if you use ccache, ~2 minutes is attainable
for optimized builds (though the recipe doesn't work on all release
branches). I don't think that a committer should be a committing
anything if they're not willing to do this much. It's not just prudent
-- it is the *bare minimum* when committing a patch that creates
system catalog entries.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-02-08 19:14, Tom Lane wrote:
>> Quite a few people have used OIDs up around 8000 or 9000 for this purpose;
>> I doubt we need a formally reserved range for it.  The main problem with
>> doing it is the hazard that the patch'll get committed just like that,
>> suddenly breaking things for everyone else doing likewise.

> For that reason, I'm not in favor of this.  Forgetting to update the
> catversion is already common enough (for me).  Adding another step
> between having a seemingly final patch and being able to actually commit
> it doesn't seem attractive.  Moreover, these "final adjustments" would
> tend to require a full rebuild and retest, adding even more overhead.

> OID collision doesn't seem to be a significant problem (for me).

Um, I beg to differ.  It's not at all unusual for pending patches to
bit-rot for no reason other than suddenly getting an OID conflict.
I don't have to look far for a current example:
https://travis-ci.org/postgresql-cfbot/postgresql/builds/498955351

We do need a couple of pieces of new infrastructure to make this idea
conveniently workable.  One is a tool to allow automatic OID renumbering
instead of having to do it by hand; Naylor has a draft for that upthread.

Perhaps it'd be useful for genbki.pl to spit out a warning (NOT an
error) if it sees OIDs in the reserved range.  I'm not sure that that'd
really be worth the trouble though, since one could easily forget
about it while reviewing/testing just before commit, and it'd just be
useless noise up until it was time to commit.

Another issue, as Robert pointed out, is that this does need to be
a formal convention not something undocumented.  Naylor's patch adds
a mention of it in bki.sgml, but I wonder if anyplace else should
talk about it.

I concede your point that a prudent committer would do a rebuild and
retest rather than just trusting the tool.  But really, how much
extra work is that?  If you've spent any time optimizing your workflow,
a full rebuild and check-world should be under five minutes on any
hardware anyone would be using for development today.

And, yeah, we probably will make mistakes like this, just like we
sometimes forget the catversion bump.  As long as we have a tool
for OID renumbering, I don't think that's the end of the world.
Fixing it after the fact isn't going to be a big deal, any more
than it is for catversion.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-27 Thread Peter Eisentraut
On 2019-02-08 19:14, Tom Lane wrote:
> Quite a few people have used OIDs up around 8000 or 9000 for this purpose;
> I doubt we need a formally reserved range for it.  The main problem with
> doing it is the hazard that the patch'll get committed just like that,
> suddenly breaking things for everyone else doing likewise.

For that reason, I'm not in favor of this.  Forgetting to update the
catversion is already common enough (for me).  Adding another step
between having a seemingly final patch and being able to actually commit
it doesn't seem attractive.  Moreover, these "final adjustments" would
tend to require a full rebuild and retest, adding even more overhead.

OID collision doesn't seem to be a significant problem (for me).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-23 Thread John Naylor
I wrote:

> Along those lines, here's a draft patch to do just that. It handles
> array type oids as well. Run it like this:
>
> perl  reformat_dat_file.pl  --map-from 9000  --map-to 2000  *.dat
>
> There is some attempt at documentation. So far it doesn't map by
> default, but that could be changed if we agreed on the convention of
> 9000 or whatever.

In case we don't want to lose track of this, I added it to the March
commitfest with a target of v13. (I didn't see a way to  add it to the
July commitfest)

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-14 Thread John Naylor
I wrote:

> On 2/8/19, Tom Lane  wrote:
>> A script such as you suggest might be a good way to reduce the temptation
>> to get lazy at the last minute.  Now that the catalog data is pretty
>> machine-readable, I suspect it wouldn't be very hard --- though I'm
>> not volunteering either.  I'm envisioning something simple like "renumber
>> all OIDs in range - into range -", perhaps with the
>> ability to skip any already-used OIDs in the target range.
>
> This might be something that can be done inside reformat_dat_files.pl.
> It's a little outside it's scope, but better than the alternatives.

Along those lines, here's a draft patch to do just that. It handles
array type oids as well. Run it like this:

perl  reformat_dat_file.pl  --map-from 9000  --map-to 2000  *.dat

There is some attempt at documentation. So far it doesn't map by
default, but that could be changed if we agreed on the convention of
9000 or whatever.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 3c5830bc8a..27b9b52e33 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -392,6 +392,22 @@
 at compile time.)

 
+   
+For a variety of reasons, newly assigned OIDs should occupy the lower
+end of the reserved range.  Because multiple patches considered for
+inclusion into PostgreSQL could be using
+the same new OIDs, there is the possibilty of conflict.  To mitigate
+this, there is a convention that OIDs 9000 and over are reserved for
+development.  This reduces the effort in rebasing over the HEAD branch.
+To enable committers to move these OIDs to the lower range easily,
+reformat_dat_file.pl can map OIDs as in the
+following, which maps any OIDs 9000 and over to unused OIDs starting
+at 2000:
+
+$ perl  reformat_dat_file.pl  --map-from 9000  --map-to 2000  *.dat
+
+   
+

 The OID counter starts at 1 at the beginning of a bootstrap run.
 If a row from a source other than postgres.bki
diff --git a/src/include/catalog/Makefile b/src/include/catalog/Makefile
index ae797d2465..33fbcf7677 100644
--- a/src/include/catalog/Makefile
+++ b/src/include/catalog/Makefile
@@ -13,19 +13,16 @@ subdir = src/include/catalog
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-# location of Catalog.pm
-catalogdir = $(top_srcdir)/src/backend/catalog
-
 # 'make reformat-dat-files' is a convenience target for rewriting the
 # catalog data files in our standard format.  This includes collapsing
 # out any entries that are redundant with a BKI_DEFAULT annotation.
 reformat-dat-files:
-	$(PERL) -I $(catalogdir) $(srcdir)/reformat_dat_file.pl -o $(srcdir) $(srcdir)/pg_*.dat
+	$(PERL) $(srcdir)/reformat_dat_file.pl --output $(srcdir) $(srcdir)/pg_*.dat
 
 # 'make expand-dat-files' is a convenience target for expanding out all
 # default values in the catalog data files.  This should be run before
 # altering or removing any BKI_DEFAULT annotation.
 expand-dat-files:
-	$(PERL) -I $(catalogdir) $(srcdir)/reformat_dat_file.pl -o $(srcdir) $(srcdir)/pg_*.dat --full-tuples
+	$(PERL) $(srcdir)/reformat_dat_file.pl --output $(srcdir) $(srcdir)/pg_*.dat --full-tuples
 
 .PHONY: reformat-dat-files expand-dat-files
diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl
index 4835c2e41b..05c157ed48 100755
--- a/src/include/catalog/reformat_dat_file.pl
+++ b/src/include/catalog/reformat_dat_file.pl
@@ -19,10 +19,14 @@
 
 use strict;
 use warnings;
+use Getopt::Long;
+
+# Must run in src/include/catalog
+use FindBin;
+chdir $FindBin::RealBin or die "could not cd to $FindBin::RealBin: $!\n";
 
 # If you copy this script to somewhere other than src/include/catalog,
 # you'll need to modify this "use lib" or provide a suitable -I switch.
-use FindBin;
 use lib "$FindBin::RealBin/../../backend/catalog/";
 use Catalog;
 
@@ -34,35 +38,27 @@ use Catalog;
 my @METADATA =
   ('oid', 'oid_symbol', 'array_type_oid', 'descr', 'autogenerated');
 
-my @input_files;
+my $FirstGenbkiObjectId =
+  Catalog::FindDefinedSymbol('access/transam.h', '..',
+	'FirstGenbkiObjectId');
+
 my $output_path = '';
 my $full_tuples = 0;
 
-# Process command line switches.
-while (@ARGV)
-{
-	my $arg = shift @ARGV;
-	if ($arg !~ /^-/)
-	{
-		push @input_files, $arg;
-	}
-	elsif ($arg =~ /^-o/)
-	{
-		$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	elsif ($arg eq '--full-tuples')
-	{
-		$full_tuples = 1;
-	}
-	else
-	{
-		usage();
-	}
-}
+# Don't map any OIDs by default
+my $min_reserved_oid = $FirstGenbkiObjectId;
+my $first_target_oid = 1;
+
+GetOptions(
+	'output:s'=> \$output_path,
+	'full-tuples' => \$full_tuples,
+	'map-from:i'  => \$min_reserved_oid,
+	'map-to:i'=> \$first_target_oid) || usage();
 
 # Sanity check arguments.
-die "No input files.\n"
-  if !@input_files;
+die 

Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-11 Thread John Naylor
On 2/8/19, Tom Lane  wrote:
> A script such as you suggest might be a good way to reduce the temptation
> to get lazy at the last minute.  Now that the catalog data is pretty
> machine-readable, I suspect it wouldn't be very hard --- though I'm
> not volunteering either.  I'm envisioning something simple like "renumber
> all OIDs in range - into range -", perhaps with the
> ability to skip any already-used OIDs in the target range.

This might be something that can be done inside reformat_dat_files.pl.
It's a little outside it's scope, but better than the alternatives.
And we already have a function in Catalog.pm to get the currently used
oids. I'll volunteer to look into it but I don't know when that will
be.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-09 Thread Robert Haas
On Fri, Feb 8, 2019 at 11:59 PM Tom Lane  wrote:
> To the extent that this works at all, OIDs in the 9000 range ought
> to be enough of a flag already, I think.

A "flag" that isn't documented anywhere outside of a mailing list
discussion and that isn't checked by any code anywhere is not much of
a flag, IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Peter Geoghegan
On Fri, Feb 8, 2019 at 10:29 AM Tom Lane  wrote:
> Um.  That would not be just an add-on script but something that
> genbki.pl would have to accept.  I'm not excited about that; it would
> complicate what's already complex, and if it works enough for test
> purposes then it wouldn't really stop a committer who wasn't paying
> attention from committing the patch un-revised.
>
> To the extent that this works at all, OIDs in the 9000 range ought
> to be enough of a flag already, I think.

I tend to agree that this isn't enough of a problem to justify making
genbki.pl significantly more complicated.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Feb 8, 2019 at 10:14 AM Tom Lane  wrote:
>> A script such as you suggest might be a good way to reduce the temptation
>> to get lazy at the last minute.  Now that the catalog data is pretty
>> machine-readable, I suspect it wouldn't be very hard --- though I'm
>> not volunteering either.  I'm envisioning something simple like "renumber
>> all OIDs in range - into range -", perhaps with the
>> ability to skip any already-used OIDs in the target range.

> I imagined that the machine-readable catalog data would allow us to
> assign non-numeric identifiers to this OID range. Perhaps there'd be a
> textual symbol with a number in the range of 0-20 at the end. Those
> would stick out like a sore thumb, making it highly unlikely that
> anybody would forget about it at the last minute.

Um.  That would not be just an add-on script but something that
genbki.pl would have to accept.  I'm not excited about that; it would
complicate what's already complex, and if it works enough for test
purposes then it wouldn't really stop a committer who wasn't paying
attention from committing the patch un-revised.

To the extent that this works at all, OIDs in the 9000 range ought
to be enough of a flag already, I think.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Peter Geoghegan
On Fri, Feb 8, 2019 at 10:14 AM Tom Lane  wrote:
> A script such as you suggest might be a good way to reduce the temptation
> to get lazy at the last minute.  Now that the catalog data is pretty
> machine-readable, I suspect it wouldn't be very hard --- though I'm
> not volunteering either.  I'm envisioning something simple like "renumber
> all OIDs in range - into range -", perhaps with the
> ability to skip any already-used OIDs in the target range.

I imagined that the machine-readable catalog data would allow us to
assign non-numeric identifiers to this OID range. Perhaps there'd be a
textual symbol with a number in the range of 0-20 at the end. Those
would stick out like a sore thumb, making it highly unlikely that
anybody would forget about it at the last minute.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Tom Lane
Peter Geoghegan  writes:
> Why don't we provide a small reserved OID range that can be used by
> patch authors temporarily, with the expectation that they'll be
> replaced by "real" OIDs at the point the patch gets committed? This
> would be similar the situation with catversion bumps -- we don't
> expect patches that will eventually need them to have them.

Quite a few people have used OIDs up around 8000 or 9000 for this purpose;
I doubt we need a formally reserved range for it.  The main problem with
doing it is the hazard that the patch'll get committed just like that,
suddenly breaking things for everyone else doing likewise.

(I would argue, in fact, that the reason we have any preassigned OIDs
above perhaps 6000 is that exactly this has happened before.)

A script such as you suggest might be a good way to reduce the temptation
to get lazy at the last minute.  Now that the catalog data is pretty
machine-readable, I suspect it wouldn't be very hard --- though I'm
not volunteering either.  I'm envisioning something simple like "renumber
all OIDs in range - into range -", perhaps with the
ability to skip any already-used OIDs in the target range.

regards, tom lane



Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Peter Geoghegan
Why don't we provide a small reserved OID range that can be used by
patch authors temporarily, with the expectation that they'll be
replaced by "real" OIDs at the point the patch gets committed? This
would be similar the situation with catversion bumps -- we don't
expect patches that will eventually need them to have them.

It's considered good practice to choose an OID that's at the beginning
of the range shown by the unused_oids script, so naturally there is a
good chance that any patch that adds a system catalog entry will bit
rot prematurely. This seems totally unnecessary to me. You could even
have a replace_oids script under this system. That would replace the
known-temporary OIDs with mapped contiguous real values at the time of
commit (maybe it would just print out which permanent OIDs to use in
place of the temporary ones, and leave the rest up to the committer).
I don't do Perl, so I'm not volunteering for this.

-- 
Peter Geoghegan