On Fri, Sep 2, 2022 at 11:35 PM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2022-09-02 14:17:26 +0700, John Naylor wrote:
> > On Thu, Sep 1, 2022 at 1:12 AM Andres Freund <and...@anarazel.de> wrote:
> > > [v12]
> >
> > +# Build a small utility static lib for the parser. This makes it easier to not
> > +# depend on gram.h already having been generated for most of the other code
> > +# (which depends on generated headers having been generated). The generation
> > +# of the parser is slow...
> >
> > It's not obvious whether this is intended to be a Meson-only
> > optimization or a workaround for something awkward to specify.
>
> It is an optimization. The parser generation is by far the slowest part of a
> build. If other files can only be compiled once gram.h is generated, there's a
> long initial period where little can happen. So instead of having all .c files
> have a dependency on gram.h having been generated, the above makes only
> scan.c, gram.c compilation depend on gram.h. It only matters for the first
> compilation, because such dependencies are added as order-only dependencies,
> supplanted by more precise compiler generated dependencies after.
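For anyone wanting to apply the same trick in another project, the mechanism can be sketched roughly as follows. This is a hypothetical fragment with made-up target and file names, not the actual postgres meson.build:

```meson
# Hypothetical sketch, not the real postgres build files.  bison emits
# gram.c and gram.h together; only the small parser static library lists
# the custom_target as a source, so only its objects must wait for the
# generated header to exist.
bison = find_program('bison')

gram = custom_target('gram',
  input: 'gram.y',
  output: ['gram.c', 'gram.h'],
  command: [bison, '-d', '-o', '@OUTPUT0@', '@INPUT@'],
)

# scan.c and gram.c are the only files compiled against gram.h.
parser_lib = static_library('parser', 'scan.c', gram)

# Everything else just links the library and can start compiling
# immediately; after the first build, ninja's compiler-generated depfiles
# supply the precise header dependencies.
backend = executable('backend', 'main.c', link_with: parser_lib)
```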
Okay, I think the comment could include some of this info for clarity.

> It's still pretty annoying that so much of the build is initially idle,
> waiting for genbki.pl to finish.
>
> Part of that is due to some ugly dependencies of src/common on backend headers
> that IMO probably shouldn't exist (e.g. src/common/relpath.c includes
> catalog/pg_tablespace_d.h).

Technically, *_d.h headers are not backend headers; that's why it's safe to include them anywhere. relpath.c in its current form has to know the tablespace OIDs, which I guess is what you think is ugly. (I agree it's not great.)

> Looks like it'd not be hard to get at least the
> _shlib version of src/common and libpq build without waiting for that. But for
> all the backend code I don't really see a way, so it'd be nice to make genbki
> faster at some point.

The attached gets me a ~15% reduction in clock time by having Catalog.pm parse the .dat files in one sweep, when we don't care about formatting, i.e. most of the time:

master:
User time (seconds): 0.48
Maximum resident set size (kbytes): 36112

patch:
User time (seconds): 0.41
Maximum resident set size (kbytes): 35808

That's pretty simple -- I think going beyond that would require some Perl profiling.

--
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e91a8e10a8..9dd932e30a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -287,6 +287,8 @@ sub ParseData
 	my $catname = $1;
 	my $data = [];
 
+	if ($preserve_formatting)
+	{
 	# Scan the input file.
 	while (<$ifd>)
 	{
@@ -346,11 +348,24 @@ sub ParseData
 			{
 				push @$data, $hash_ref if !$hash_ref->{autogenerated};
 			}
-			elsif ($preserve_formatting)
+			else
 			{
 				push @$data, $_;
 			}
 		}
+	}
+	else
+	{
+		# When we only care about the contents, it's faster to read and eval
+		# the whole file at once.
+		my $full_file = do { local(@ARGV, $/) = $input_file; <> };
+		eval '$data = ' . $full_file;
+		foreach my $hash_ref (@{$data})
+		{
+			AddDefaultValues($hash_ref, $schema, $catname);
+		}
+	}
+
 	close $ifd;
 
 	# If this is pg_type, auto-generate array types too.
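To make the parsing strategy concrete for readers less familiar with the idiom, here is a minimal standalone sketch of the slurp-and-eval approach. The function name and structure are hypothetical, not the actual Catalog.pm code; it relies on the fact that the .dat files are literal Perl lists of hash references:

```perl
use strict;
use warnings;

# Hypothetical sketch (not Catalog.pm itself): read a .dat-style file in
# one go and eval it as a single Perl expression, instead of scanning and
# accumulating it line by line.
sub parse_data_fast
{
	my ($input_file) = @_;

	# Slurp the whole file by locally undefining the record separator.
	open(my $ifd, '<', $input_file)
	  or die "could not open $input_file: $!";
	my $full_file = do { local $/; <$ifd> };
	close $ifd;

	# The file body is a Perl list of hash references, so it evals
	# directly to the desired data structure.
	my $data = eval $full_file;
	die "eval of $input_file failed: $@" if $@;
	return $data;
}
```

The `local(@ARGV, $/) = $input_file; <>` in the patch is the same slurp written against the magic ARGV filehandle rather than an explicitly opened one.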