On 4/26/18, Tom Lane <t...@sss.pgh.pa.us> wrote:
> John Naylor <jcnay...@gmail.com> writes:
>> For those following along, these scripts still assume we're in the
>> catalog directory. I can hack on that part tomorrow if no one else
>> has.
>
> I didn't touch this point.

Patch 0002 is my attempt at this. I'm not sure how portable it is.
It's also clunkier than before, in that you need to tell the Perl
interpreter where to find Catalog.pm, since it seems "use lib" won't
compile unless it's given a literal string rather than an expression.
Suggestions welcome. I also edited the boilerplate comments.

> I notice that duplicate_oids is now a good factor of 10 slower than it was
> before (~0.04 sec to ~0.4 sec, on my machine).  While this doesn't seem
> like a problem for manual use, it seems annoying as part of the standard
> build process, especially on slower buildfarm critters.

If you're concerned about build speed, I think the generated *_d.h
headers cause a lot more files to be rebuilt than before. I'll think
about how to recover that.

> I think we should
> do what you said upthread and teach genbki.pl to complain about duplicate
> oids, so that we can remove duplicate_oids from the build process.
>
> I went ahead and pushed things as-is so we can confirm that duplicate_oids
> has no portability issues that the buildfarm could find.  But let's try
> to get the build process change in place after a day or so.

Patch 0001
-moves the compile-time duplicate check into genbki.pl (an example of
the resulting build failure is sketched below)
-removes a usability quirk from unused_oids (it no longer exits on the
first duplicate it finds)
-removes duplicate_oids (not sure if you intended this, but since
unused_oids, as already committed, reports duplicates, we may as well
standardize on that) and cleans up the remaining references
-updates the documentation.
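
To illustrate the genbki.pl part: with 0001 applied, a doubly-assigned
OID (say 9876, a made-up number for the example) should now stop the
build during the bki step with output along the lines of

    Duplicate oids detected:
    9876

after which genbki.pl dies and the bki-stamp rule fails.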

-John Naylor
From 9cf5c517b6a9359fb1863e3315f5412860db45ff Mon Sep 17 00:00:00 2001
From: John Naylor <jcnay...@gmail.com>
Date: Thu, 26 Apr 2018 15:54:53 +0700
Subject: [PATCH 1/2] Remove standalone script duplicate_oids

Commit 5602265f770 added duplicate checking to unused_oids, so use
that script for manual checks. Also change it so that it doesn't
exit when encountering a duplicate -- doing so slows down fixing
multiple duplicates.

This left unused_oids as the only caller of FindAllOidsFromHeaders().
Since that function no longer offers any useful abstraction, remove
it from Catalog.pm and inline it into unused_oids.

Make genbki.pl responsible for compile-time duplicate checks. This
simplifies the Makefile a bit and avoids having to read the catalog
files twice.
---
 doc/src/sgml/bki.sgml              |  9 +++----
 src/backend/catalog/Catalog.pm     | 51 --------------------------------------
 src/backend/catalog/Makefile       |  3 +--
 src/backend/catalog/genbki.pl      | 31 ++++++++++++++++++++++-
 src/include/catalog/duplicate_oids | 29 ----------------------
 src/include/catalog/unused_oids    | 49 +++++++++++++++++++++++++++++++++---
 6 files changed, 80 insertions(+), 92 deletions(-)
 delete mode 100755 src/include/catalog/duplicate_oids

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 5a4cd39..087272c 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -377,13 +377,12 @@
     script <filename>src/include/catalog/unused_oids</filename>.
     It prints inclusive ranges of unused OIDs (e.g., the output
     line <quote>45-900</quote> means OIDs 45 through 900 have not been
-    allocated yet).  Currently, OIDs 1-9999 are reserved for manual
+    allocated yet), as well as any duplicate OIDs found.
+    Currently, OIDs 1-9999 are reserved for manual
     assignment; the <filename>unused_oids</filename> script simply looks
     through the catalog headers and <filename>.dat</filename> files
-    to see which ones do not appear.  You can also use
-    the <filename>duplicate_oids</filename> script to check for mistakes.
-    (That script is run automatically at compile time, and will stop the
-    build if a duplicate is found.)
+    to see which ones do not appear.  Also, <filename>genbki.pl</filename>
+    will stop the build if it finds any duplicate OIDs at compile time.
    </para>
 
    <para>
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 0b057a8..823e09a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -384,55 +384,4 @@ sub FindDefinedSymbolFromData
 	die "no definition found for $symbol\n";
 }
 
-# Extract an array of all the OIDs assigned in the specified catalog headers
-# and their associated data files (if any).
-sub FindAllOidsFromHeaders
-{
-	my @input_files = @_;
-
-	my @oids = ();
-
-	foreach my $header (@input_files)
-	{
-		$header =~ /(.+)\.h$/
-		  or die "Input files need to be header files.\n";
-		my $datfile = "$1.dat";
-
-		my $catalog = Catalog::ParseHeader($header);
-
-		# We ignore the pg_class OID and rowtype OID of bootstrap catalogs,
-		# as those are expected to appear in the initial data for pg_class
-		# and pg_type.  For regular catalogs, include these OIDs.
-		if (!$catalog->{bootstrap})
-		{
-			push @oids, $catalog->{relation_oid}
-			  if ($catalog->{relation_oid});
-			push @oids, $catalog->{rowtype_oid} if ($catalog->{rowtype_oid});
-		}
-
-		# Not all catalogs have a data file.
-		if (-e $datfile)
-		{
-			my $catdata =
-			  Catalog::ParseData($datfile, $catalog->{columns}, 0);
-
-			foreach my $row (@$catdata)
-			{
-				push @oids, $row->{oid} if defined $row->{oid};
-			}
-		}
-
-		foreach my $toast (@{ $catalog->{toasting} })
-		{
-			push @oids, $toast->{toast_oid}, $toast->{toast_index_oid};
-		}
-		foreach my $index (@{ $catalog->{indexing} })
-		{
-			push @oids, $index->{index_oid};
-		}
-	}
-
-	return \@oids;
-}
-
 1;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index d25d98a..a54197d 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -84,8 +84,7 @@ generated-header-symlinks: $(top_builddir)/src/include/catalog/header-stamp
 # configure run, even in distribution tarballs.  So depending on configure.in
 # instead is cheating a bit, but it will achieve the goal of updating the
 # version number when it changes.
-bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in $(top_srcdir)/src/include/catalog/duplicate_oids
-	cd $(top_srcdir)/src/include/catalog && $(PERL) ./duplicate_oids
+bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in
 	$(PERL) -I $(catalogdir) $< --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)
 	touch $@
 
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 83b6158..199a068 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -78,6 +78,7 @@ my %catalogs;
 my %catalog_data;
 my @toast_decls;
 my @index_decls;
+my %oidcounts;		# number of times we see an OID
 foreach my $header (@input_files)
 {
 	$header =~ /(.+)\.h$/
@@ -94,10 +95,27 @@ foreach my $header (@input_files)
 		$catalogs{$catname} = $catalog;
 	}
 
+	# We ignore the pg_class OID and rowtype OID of bootstrap catalogs,
+	# as those are expected to appear in the initial data for pg_class
+	# and pg_type.  For regular catalogs, include these OIDs.
+	if (!$catalog->{bootstrap})
+	{
+		$oidcounts{ $catalog->{relation_oid} }++
+		  if ($catalog->{relation_oid});
+		$oidcounts{ $catalog->{rowtype_oid} }++
+		  if ($catalog->{rowtype_oid});
+	}
+
 	# Not all catalogs have a data file.
 	if (-e $datfile)
 	{
-		$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 0);
+		my $data = Catalog::ParseData($datfile, $schema, 0);
+		$catalog_data{$catname} = $data;
+		foreach my $row (@$data)
+		{
+			$oidcounts{ $row->{oid} }++ if defined $row->{oid};
+		}
+
 	}
 
 	# If the header file contained toast or index info, build BKI
@@ -119,6 +137,17 @@ foreach my $header (@input_files)
 	}
 }
 
+# Exit if we have any duplicate OIDs.
+my $found = 0;
+foreach my $oid (keys %oidcounts)
+{
+	next unless $oidcounts{$oid} > 1;
+	print "Duplicate oids detected:\n" if !$found;
+	$found = 1;
+	print "$oid\n";
+}
+die if $found;
+
 # Fetch some special data that we will substitute into the output file.
 # CAUTION: be wary about what symbols you substitute into the .bki file here!
 # It's okay to substitute things that are expected to be really constant
diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
deleted file mode 100755
index 0d7aa15..0000000
--- a/src/include/catalog/duplicate_oids
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/usr/bin/perl
-
-use lib '../../backend/catalog/';
-use Catalog;
-
-use strict;
-use warnings;
-
-my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));
-
-my $oids = Catalog::FindAllOidsFromHeaders(@input_files);
-
-my %oidcounts;
-
-foreach my $oid (@{$oids})
-{
-	$oidcounts{$oid}++;
-}
-
-my $found = 0;
-
-foreach my $oid (sort { $a <=> $b } keys %oidcounts)
-{
-	next unless $oidcounts{$oid} > 1;
-	$found = 1;
-	print "$oid\n";
-}
-
-exit $found;
diff --git a/src/include/catalog/unused_oids b/src/include/catalog/unused_oids
index a727225..333b547 100755
--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -23,16 +23,58 @@ use warnings;
 
 my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));
 
-my $oids = Catalog::FindAllOidsFromHeaders(@input_files);
+# Extract an array of all the OIDs assigned in the specified catalog headers
+# and their associated data files (if any).
+my @oids;
+foreach my $header (@input_files)
+{
+	$header =~ /(.+)\.h$/
+	  or die "Input files need to be header files.\n";
+	my $datfile = "$1.dat";
+
+	my $catalog = Catalog::ParseHeader($header);
+
+	# We ignore the pg_class OID and rowtype OID of bootstrap catalogs,
+	# as those are expected to appear in the initial data for pg_class
+	# and pg_type.  For regular catalogs, include these OIDs.
+	if (!$catalog->{bootstrap})
+	{
+		push @oids, $catalog->{relation_oid}
+		  if ($catalog->{relation_oid});
+		push @oids, $catalog->{rowtype_oid}
+		  if ($catalog->{rowtype_oid});
+	}
+
+	# Not all catalogs have a data file.
+	if (-e $datfile)
+	{
+		my $catdata =
+		  Catalog::ParseData($datfile, $catalog->{columns}, 0);
+
+		foreach my $row (@$catdata)
+		{
+			push @oids, $row->{oid} if defined $row->{oid};
+		}
+	}
+
+	foreach my $toast (@{ $catalog->{toasting} })
+	{
+		push @oids, $toast->{toast_oid}, $toast->{toast_index_oid};
+	}
+	foreach my $index (@{ $catalog->{indexing} })
+	{
+		push @oids, $index->{index_oid};
+	}
+}
 
 # Also push FirstBootstrapObjectId to serve as a terminator for the last gap.
 my $FirstBootstrapObjectId =
   Catalog::FindDefinedSymbol('access/transam.h', [".."],
 	'FirstBootstrapObjectId');
-push @{$oids}, $FirstBootstrapObjectId;
+push @oids, $FirstBootstrapObjectId;
 
 my $prev_oid = 0;
-foreach my $oid (sort { $a <=> $b } @{$oids})
+foreach my $oid (sort { $a <=> $b } @oids)
 {
 	if ($oid > $prev_oid + 1)
 	{
@@ -48,7 +90,6 @@ foreach my $oid (sort { $a <=> $b } @{$oids})
 	elsif ($oid == $prev_oid)
 	{
 		print "Duplicate oid detected: $oid\n";
-		exit 1;
 	}
 	$prev_oid = $oid;
 }
-- 
2.7.4

From 337d1eee0473dd5742993161256692bed25097fa Mon Sep 17 00:00:00 2001
From: John Naylor <jcnay...@gmail.com>
Date: Thu, 26 Apr 2018 17:50:28 +0700
Subject: [PATCH 2/2] Remove requirement to call unused_oids from its directory

No doc change, since the documentation never mentioned that requirement.

Also do an editing pass over the boilerplate comments.
---
 src/include/catalog/unused_oids | 42 +++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/src/include/catalog/unused_oids b/src/include/catalog/unused_oids
index 333b547..120f744 100755
--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -1,33 +1,47 @@
-#!/usr/bin/perl
+#!/usr/bin/perl -w
+#----------------------------------------------------------------------
 #
 # unused_oids
+#    Finds blocks of manually-assignable OIDs that have not already been
+#    claimed by previous hackers.  The main use is for finding available
+#    OIDs for new internal functions.  The numbers printed are inclusive
+#    ranges of unused OIDs.  It also reports any duplicate OIDs found.
 #
-# src/include/catalog/unused_oids
+#    Before using a large empty block, make sure you aren't about
+#    to take over what was intended as expansion space for something
+#    else.
 #
-#	finds blocks of manually-assignable oids that have not already been
-#	claimed by post_hackers.  primarily useful for finding available
-#	oids for new internal functions.  the numbers printed are inclusive
-#	ranges of unused oids.
+#    Note: You must tell the Perl interpreter where to find Catalog.pm,
+#    i.e. pass the directory that contains it via -I:
+#    perl -I /path/to/src/backend/catalog /path/to/unused_oids
 #
-#	before using a large empty block, make sure you aren't about
-#	to take over what was intended as expansion space for something
-#	else.
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
 #
-#	run this script in src/include/catalog.
+# src/include/catalog/unused_oids
 #
-use lib '../../backend/catalog/';
+#----------------------------------------------------------------------
+
 use Catalog;
+use File::Basename qw(dirname);
+my $catdir = dirname(__FILE__);
 
 use strict;
 use warnings;
 
-my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));
+
+opendir(my $cd, $catdir)
+  or die "Can't opendir $catdir: $!";
+my @filenames = grep { /(pg_\w+|toasting|indexing)\.h$/ } readdir($cd);
+closedir $cd;
 
 # Extract an array of all the OIDs assigned in the specified catalog headers
 # and their associated data files (if any).
 my @oids;
-foreach my $header (@input_files)
+foreach my $filename (@filenames)
 {
+	my $header = "$catdir/$filename";
+
 	$header =~ /(.+)\.h$/
 	  or die "Input files need to be header files.\n";
 	my $datfile = "$1.dat";
@@ -69,7 +83,7 @@ foreach my $header (@input_files)
 
 # Also push FirstBootstrapObjectId to serve as a terminator for the last gap.
 my $FirstBootstrapObjectId =
-  Catalog::FindDefinedSymbol('access/transam.h', [".."],
+  Catalog::FindDefinedSymbol('access/transam.h', ["$catdir/.."],
 	'FirstBootstrapObjectId');
 push @oids, $FirstBootstrapObjectId;
 
-- 
2.7.4
