On Fri, Sep 2, 2022 at 11:35 PM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2022-09-02 14:17:26 +0700, John Naylor wrote:
> > On Thu, Sep 1, 2022 at 1:12 AM Andres Freund <and...@anarazel.de> wrote:
> > > [v12]
> >
> > +# Build a small utility static lib for the parser. This makes it easier to 
> > not
> > +# depend on gram.h already having been generated for most of the other code
> > +# (which depends on generated headers having been generated). The 
> > generation
> > +# of the parser is slow...
> >
> > It's not obvious whether this is intended to be a Meson-only
> > optimization or a workaround for something awkward to specify.
>
> It is an optimization. The parser generation is by far the slowest part of a
> build. If other files can only be compiled once gram.h is generated, there's a
> long initial period where little can happen. So instead of having all .c files
> have a dependency on gram.h having been generated, the above makes only
> scan.c, gram.c compilation depend on gram.h.  It only matters for the first
> compilation, because such dependencies are added as order-only dependencies,
> supplanted by more precise compiler generated dependencies after.

Okay, I think the comment could include some of this info for clarity.

> It's still pretty annoying that so much of the build is initially idle,
> waiting for genbki.pl to finish.
>
> Part of that is due to some ugly dependencies of src/common on backend headers
> that IMO probably shouldn't exist (e.g. src/common/relpath.c includes
> catalog/pg_tablespace_d.h).

Technically, *_d.h headers are not backend, that's why it's safe to
include them anywhere. relpath.c in its current form has to know the
tablespace OIDs, which I guess is what you think is ugly. (I agree
it's not great)

> Looks like it'd not be hard to get at least the
> _shlib version of src/common and libpq build without waiting for that. But for
> all the backend code I don't really see a way, so it'd be nice to make genbki
> faster at some point.

The attached gets me a ~15% reduction in clock time by having
Catalog.pm parse the .dat files in one sweep, when we don't care about
formatting, i.e. most of the time:

master:
User time (seconds): 0.48
Maximum resident set size (kbytes): 36112

patch:
User time (seconds): 0.41
Maximum resident set size (kbytes): 35808

That's pretty simple -- I think going beyond that would require some
perl profiling.

-- 
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e91a8e10a8..9dd932e30a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -287,6 +287,8 @@ sub ParseData
 	my $catname = $1;
 	my $data    = [];
 
+	if ($preserve_formatting)
+	{
 	# Scan the input file.
 	while (<$ifd>)
 	{
@@ -346,11 +348,24 @@ sub ParseData
 		{
 			push @$data, $hash_ref if !$hash_ref->{autogenerated};
 		}
-		elsif ($preserve_formatting)
+		else
 		{
 			push @$data, $_;
 		}
 	}
+	}
+	else
+	{
+		# When we only care about the contents, it's faster to read and eval
+		# the whole file at once.
+		my $full_file = do { local(@ARGV, $/) = $input_file; <> };
+		eval '$data = ' . $full_file;
+		foreach my $hash_ref (@{$data})
+		{
+			AddDefaultValues($hash_ref, $schema, $catname);
+		}
+	}
+
 	close $ifd;
 
 	# If this is pg_type, auto-generate array types too.

Reply via email to