Re: WIP: a way forward on bootstrap data

John Naylor Tue, 27 Mar 2018 07:33:54 -0700

Tom Lane wrote:

>> -Maybe document examples of how to do bulk-editing of data files?
>
> +1.  In the end, that's the reason we're doing all this work, so showing
> people how to benefit seems like a good thing.


I'll hold off on posting a new patchset until I add this to the
documentation, but I wanted to report on a couple of other things:

While adjusting to the 80-column limit, I encountered a separation of
concerns violation between Catalog.pm and reformat_dat_files.pl that I
hadn't noticed before. Fixing that made things easier to read, with
fewer lines of code.

Speaking of bulk editing, that would be done via adopting
reformat_dat_files.pl to the task at hand. I did this myself for two
of the conversion helper scripts. However, enough bitrot has now
occurred that to make the relationship murky. Since I had to adopt
them to the 80-column limit as well, I shaved all the irrelevant
differences away, and now they're just a small diff away from the
reformat script. I also added block comments to help developers find
where they need to edit the script. Since reformat_dat_files.pl has
been substantially altered, I'll attach it here, along with the diffs
to the the helper scripts.

I wrote:

> I’ll see about a new Makefile.

I've attached a draft of this. I thought about adding a call to
duplicate_oids here, but this won't run unless you've run configure
first, and if you've done that, you've likely built already, running
duplicate_oids in the process.

I think I'll consolidate all documentation patches into one, at the
end of the series for maximum flexibility. I liked the idea of
spreading the doc changes over the patches, but there is not a huge
amount of time left.

-John Naylor

--- /home/john/pgdev/postgresql/src/include/catalog/reformat_dat_files.pl       
2018-03-27 18:04:54.698464144 +0700
+++ remove_pg_type_oid_symbols.pl       2018-03-27 18:13:42.270611897 +0700
@@ -1,18 +1,12 @@
 #!/usr/bin/perl -w
 #----------------------------------------------------------------------
 #
-# reformat_dat_files.pl
-#    Perl script that reads in a catalog data file and writes out
-#    a functionally equivalent file in a standard format.
-#
-#    Metadata entries (if any) come first, with normal attributes
-#    starting on the following line, in the same order they would be in
-#    the actual table.
+# remove_pg_type_oid_symbols.pl
 #
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 #
-# /src/include/catalog/reformat_dat_files.pl
+# /src/include/catalog/remove_pg_type_oid_symbols.pl
 #
 #----------------------------------------------------------------------
 
@@ -85,22 +79,6 @@
        $catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 1);
 }
 
-########################################################################
-# At this point, we have read all the data. If you are modifying this
-# script for bulk editing, this is a good place to build lookup tables,
-# if you need to. In the following example, the "next if !ref $row"
-# check below is a hack to filter out non-hash objects. This is because
-# we build the lookup tables from data that we read using the
-# "preserve_formatting" parameter.
-#
-##Index access method lookup.
-#my %amnames;
-#foreach my $row (@{ $catalog_data{pg_am} })
-#{
-#      next if !ref $row;
-#      $amnames{$row->{oid}} = $row->{amname};
-#}
-########################################################################
 
 # Write the data.
 foreach my $catname (@catnames)
@@ -131,10 +109,15 @@
                        my %values = %$data;
 
                        
############################################################
-                       # At this point we have the full tuple in memory as a 
hash
-                       # and can do any operations we want. As written, it only
-                       # removes default values, but this script can be 
adopted to
-                       # do one-off bulk-editing.
+                       # Remove pg_type OID symbols if they can match the rule
+                       # we use to generate them.
+                       if ($catname eq 'pg_type' and exists 
$values{oid_symbol})
+                       {
+                               my $symbol = 
form_pg_type_symbol($values{typname});
+                               delete $values{oid_symbol}
+                                 if defined $symbol
+                                       and $values{oid_symbol} eq $symbol;
+                       }
                        
############################################################
 
                        if (!$full_tuples)
@@ -181,6 +164,26 @@
        }
 }
 
+########################################################################
+# Determine canonical pg_type OID #define symbol from the type name.
+sub form_pg_type_symbol
+{
+       my $typename = shift;
+
+       # Skip for rowtypes of bootstrap tables.
+       return
+         if $typename eq 'pg_type'
+           or $typename eq 'pg_proc'
+           or $typename eq 'pg_attribute'
+           or $typename eq 'pg_class';
+
+       $typename =~ /(_)?(.+)/;
+       my $arraystr = $1 ? 'ARRAY' : '';
+       my $name = uc $2;
+       return $name . $arraystr . 'OID';
+}
+########################################################################
+
 # Leave values out if there is a matching default.
 sub strip_default_values
 {

--- /home/john/pgdev/postgresql/src/include/catalog/reformat_dat_files.pl       
2018-03-27 18:41:30.097479755 +0700
+++ convert_oid2name.pl 2018-03-27 18:17:16.154549365 +0700
@@ -1,18 +1,14 @@
 #!/usr/bin/perl -w
 #----------------------------------------------------------------------
 #
-# reformat_dat_files.pl
-#    Perl script that reads in a catalog data file and writes out
-#    a functionally equivalent file in a standard format.
-#
-#    Metadata entries (if any) come first, with normal attributes
-#    starting on the following line, in the same order they would be in
-#    the actual table.
+# convert_oid2name.pl
+#    Perl script that replaces some numeric OIDs with human readable
+#    macros.
 #
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 #
-# /src/include/catalog/reformat_dat_files.pl
+# /src/include/catalog/convert_oid2name.pl
 #
 #----------------------------------------------------------------------
 
@@ -85,22 +81,68 @@
        $catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 1);
 }
 
-########################################################################
-# At this point, we have read all the data. If you are modifying this
-# script for bulk editing, this is a good place to build lookup tables,
-# if you need to. In the following example, the "next if !ref $row"
-# check below is a hack to filter out non-hash objects. This is because
-# we build the lookup tables from data that we read using the
-# "preserve_formatting" parameter.
-#
-##Index access method lookup.
-#my %amnames;
-#foreach my $row (@{ $catalog_data{pg_am} })
-#{
-#      next if !ref $row;
-#      $amnames{$row->{oid}} = $row->{amname};
-#}
-########################################################################
+# Build lookup tables.
+# Note: the "next if !ref $row" checks below are a hack to filter out
+# non-hash objects. This is because we build the lookup tables from data
+# that we read using the "preserve_formatting" switch.
+
+# Index access method lookup.
+my %amnames;
+foreach my $row (@{ $catalog_data{pg_am} })
+{
+       next if !ref $row;
+       $amnames{$row->{oid}} = $row->{amname};
+}
+
+# Type oid lookup.
+my %typenames;
+$typenames{'0'} = '0';  # Easier than adding a check at every type lookup
+foreach my $row (@{ $catalog_data{pg_type} })
+{
+       next if !ref $row;
+       $typenames{$row->{oid}} = $row->{typname};
+}
+
+# Opfamily oid lookup.
+my %opfnames;
+foreach my $row (@{ $catalog_data{pg_opfamily} })
+{
+       next if !ref $row;
+       $opfnames{$row->{oid}} = $amnames{$row->{opfmethod}} . '/' . 
$row->{opfname};
+}
+
+# Opclass oid lookup.
+my %opcnames;
+foreach my $row (@{ $catalog_data{pg_opclass} })
+{
+       next if !ref $row;
+       $opcnames{$row->{oid}} = $amnames{$row->{opcmethod}} . '/' . 
$row->{opcname}
+         if exists $row->{oid};
+}
+
+# Operator oid lookup.
+my %opernames;
+foreach my $row (@{ $catalog_data{pg_operator} })
+{
+       next if !ref $row;
+       $opernames{$row->{oid}} = sprintf "%s(%s,%s)",
+         $row->{oprname}, $typenames{$row->{oprleft}}, 
$typenames{$row->{oprright}};
+}
+
+# Proc oid lookup.
+my %procoids;
+foreach my $row (@{ $catalog_data{pg_proc} })
+{
+       next if !ref $row;
+       if (defined($procoids{ $row->{proname} }))
+       {
+               $procoids{ $row->{proname} } = 'MULTIPLE';
+       }
+       else
+       {
+               $procoids{ $row->{oid} } = $row->{proname};
+       }
+}
 
 # Write the data.
 foreach my $catname (@catnames)
@@ -131,17 +173,102 @@
                        my %values = %$data;
 
                        
############################################################
-                       # At this point we have the full tuple in memory as a 
hash
-                       # and can do any operations we want. As written, it only
-                       # removes default values, but this script can be 
adopted to
-                       # do one-off bulk-editing.
-                       
############################################################
 
+                       # We strip default values first because at the time it 
seemed
+                       # easier to check for existence rather than add 
sentinel values
+                       # to the lookups.
                        if (!$full_tuples)
                        {
                                strip_default_values(\%values, $schema, 
$catname);
                        }
 
+                       # Replace OIDs with names
+
+                       if ($catname eq 'pg_proc')
+                       {
+                               $values{prorettype} = 
$typenames{$values{prorettype}};
+                               if ($values{proargtypes})
+                               {
+                                       my @argtypeoids = split /\s+/, 
$values{proargtypes};
+                                       my @argtypenames;
+                                       foreach my $argtypeoid (@argtypeoids)
+                                       {
+                                               push @argtypenames, 
$typenames{$argtypeoid};
+                                       }
+                                       $values{proargtypes} = join(' ', 
@argtypenames);
+                               }
+                               if ($values{proallargtypes})
+                               {
+                                       $values{proallargtypes} =~ s/[{}]//g;
+                                       my @argtypeoids = split /,/, 
$values{proallargtypes};
+                                       my @argtypenames;
+                                       foreach my $argtypeoid (@argtypeoids)
+                                       {
+                                               push @argtypenames, 
$typenames{$argtypeoid};
+                                       }
+                                       $values{proallargtypes} = '{' . 
join(',', @argtypenames) . '}';
+                               }
+                       }
+                       elsif ($catname eq 'pg_aggregate')
+                       {
+                               $values{aggsortop}     = 
$opernames{$values{aggsortop}}
+                                 if exists $values{aggsortop};
+                               $values{aggtranstype}  = 
$typenames{$values{aggtranstype}};
+                               $values{aggmtranstype} = 
$typenames{$values{aggmtranstype}}
+                                 if exists $values{aggmtranstype};
+                       }
+                       elsif ($catname eq 'pg_amop')
+                       {
+                               $values{amoplefttype}   = 
$typenames{$values{amoplefttype}};
+                               $values{amoprighttype}  = 
$typenames{$values{amoprighttype}};
+                               $values{amopmethod}     = 
$amnames{$values{amopmethod}};
+                               $values{amopfamily}     = 
$opfnames{$values{amopfamily}};
+                               $values{amopopr}        = 
$opernames{$values{amopopr}};
+                               $values{amopsortfamily} = 
$opfnames{$values{amopsortfamily}}
+                                 if exists $values{amopsortfamily};
+                       }
+                       elsif ($catname eq 'pg_amproc')
+                       {
+                               $values{amprocfamily}    = 
$opfnames{$values{amprocfamily}};
+                               $values{amproclefttype}  = 
$typenames{$values{amproclefttype}};
+                               $values{amprocrighttype} = 
$typenames{$values{amprocrighttype}};
+                       }
+                       elsif ($catname eq 'pg_cast')
+                       {
+                               $values{castsource} = 
$typenames{$values{castsource}};
+                               $values{casttarget} = 
$typenames{$values{casttarget}};
+                       }
+                       elsif ($catname eq 'pg_opclass')
+                       {
+                               $values{opcmethod}  = 
$amnames{$values{opcmethod}};
+                               $values{opcfamily}  = 
$opfnames{$values{opcfamily}};
+                               $values{opcintype}  = 
$typenames{$values{opcintype}};
+                               $values{opckeytype} = 
$typenames{$values{opckeytype}}
+                                 if exists $values{opckeytype};
+                       }
+                       elsif ($catname eq 'pg_operator')
+                       {
+                               $values{oprleft}   = 
$typenames{$values{oprleft}};
+                               $values{oprright}  = 
$typenames{$values{oprright}};
+                               $values{oprresult} = 
$typenames{$values{oprresult}};
+                               $values{oprcom}    = $opernames{$values{oprcom}}
+                                 if exists $values{oprcom};
+                               $values{oprnegate} = 
$opernames{$values{oprnegate}}
+                                 if exists $values{oprnegate};
+                       }
+                       elsif ($catname eq 'pg_opfamily')
+                       {
+                               $values{opfmethod}  = 
$amnames{$values{opfmethod}};
+                       }
+                       elsif ($catname eq 'pg_range')
+                       {
+                               $values{rngtypid}   = 
$typenames{$values{rngtypid}};
+                               $values{rngsubtype} = 
$typenames{$values{rngsubtype}};
+                               $values{rngsubopc}  = 
$opcnames{$values{rngsubopc}};
+                       }
+
+                       
############################################################
+
                        print $dat "{";
 
                        # Separate out metadata fields for readability.
@@ -286,7 +413,7 @@
 sub usage
 {
        die <<EOM;
-Usage: rewrite_dat.pl [options] datafile...
+Usage: convert_oid2name.pl [options] datafile...
 
 Options:
     -o               output path

v13alpha-reformat_dat_files.pl
Description: Perl program

v13alpha-Makefile
Description: Binary data

Re: WIP: a way forward on bootstrap data

Reply via email to