Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-18 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Andres Freund  writes:
>> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
>
>> I'm a bit doubtful about improving the performance of genbki at the cost
>> of any sort of complication - it's only executed during the actual
>> build, not during initdb...  I don't see much point in doing things like
>> 3) and 4), it's just not worth it imo.
>
> Yeah, it's only run once per build, or probably even less often than that
> considering we only re-run it when the input files change.  I could get
> interested in another 20% reduction in initdb's time, but not genbki's.

Fair enough.  I still think 1), 2) and 5) are worthwile code cleanups
regardless of the performance impact.  In fact, if we don't care that
much about the performance, we can move the duplicated code in
Gen_fmgrmtab.pl and genbki.pl that turns bki_values into a hash into
Catalogs.pm.  That regresses genbki.pl time by ~30ms on my machine.

Revised patch series attached.

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl

>From f7b75bcbe993d4ddbc92da85c7148bf7fee143ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 17 Apr 2017 13:26:35 +0100
Subject: [PATCH 1/4] Avoid unnecessary regex captures in Catalog.pm

---
 src/backend/catalog/Catalog.pm | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index fa8de04..e7b647a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -54,7 +54,7 @@ sub Catalogs
 		{
 
 			# Strip C-style comments.
-			s;/\*(.|\n)*\*/;;g;
+			s;/\*(?:.|\n)*\*/;;g;
 			if (m;/\*;)
 			{
 
@@ -80,12 +80,12 @@ sub Catalogs
 			{
 $catalog{natts} = $1;
 			}
-			elsif (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
+			elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
 			{
-check_natts($filename, $catalog{natts}, $3,
+check_natts($filename, $catalog{natts}, $2,
 			$input_file, $input_line_number);
 
-push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
+push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
 			}
 			elsif (/^DESCR\(\"(.*)\"\)$/)
 			{
-- 
2.7.4

>From decd1b8c171cc508da5f79f01d2f0779a569a963 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 17 Apr 2017 14:04:20 +0100
Subject: [PATCH 2/4] Avoid repeated calls to Catalog::SplitDataLine

Both check_natts and the callers of Catalog::Catalogs were calling
Catalog::SplitDataLines, the former discarding the result.

Instead, have Catalog::Catalogs store the split fields directly and pass
the count to check_natts
---
 src/backend/catalog/Catalog.pm   | 14 +++---
 src/backend/catalog/genbki.pl|  3 +--
 src/backend/utils/Gen_fmgrtab.pl |  3 +--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e7b647a..81513c7 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -82,10 +82,12 @@ sub Catalogs
 			}
 			elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
 			{
-check_natts($filename, $catalog{natts}, $2,
+my @bki_values = SplitDataLine($2);
+
+check_natts($filename, $catalog{natts}, scalar(@bki_values),
 			$input_file, $input_line_number);
 
-push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
+push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
 			}
 			elsif (/^DESCR\(\"(.*)\"\)$/)
 			{
@@ -254,17 +256,15 @@ sub RenameTempFile
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
-	my ($catname, $natts, $bki_val, $file, $line) = @_;
+	my ($catname, $natts, $bki_vals, $file, $line) = @_;
 
 	die "Could not find definition for Natts_${catname} before start of DATA() in $file\n"
 		unless defined $natts;
 
-	my $nfields = scalar(SplitDataLine($bki_val));
-
 	die sprintf
 		"Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-		$file, $line, $natts, $nfields
-	  unless $natts == $nfields;
+		$file, $line, $natts, $bki_vals
+	  unless $natts == $bki_vals;
 }
 
 1;
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 198e072..8875f6c 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,9 +161,8 @@ foreach my $catname (@{ $catalogs->{names} })
 		foreach my $row (@{ $catalog->{data} })
 		{
 
-			# Split line into tokens without interpreting their meaning.
 			my %bki_values;
-			@bki_values{@attnames} 

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-17 Thread Tom Lane
Andres Freund  writes:
> Btw, I think Tom's "more that could be done" was referring more to doing
> more upfront work, checking, easier input format, whatnot in the genbki,
> not so much performance work...  Tom, correct me if I'm wrong.

Yeah, as per what we were just saying, performance of that code isn't
really an issue right now.  Supporting a nicer input format is probably
the biggest step forward that could be taken, but that will require some
major work :-(.  We've also talked about things like supporting
regprocedure rather than only regproc (ie disambiguating overloaded
functions), likewise regoperator, and so on, with an eye to reducing
the number of places where people have to write numeric OIDs in the
input.  There's some threads on this in the archives, but I'm too
lazy to look.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-17 Thread Andres Freund
On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
> 
> > There's certainly lots more that could be done in the genbki code,
> > but I think all we can justify at this stage of the development
> > cycle is to get the low-hanging fruit for testing speedups.
> 
> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
> Attached are separate patches for each change, and here are the runtimes
> of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
> (averages of 5 runs, in millseconds):

Btw, I think Tom's "more that could be done" was referring more to doing
more upfront work, checking, easier input format, whatnot in the genbki,
not so much performance work...  Tom, correct me if I'm wrong.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-17 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.

> I'm a bit doubtful about improving the performance of genbki at the cost
> of any sort of complication - it's only executed during the actual
> build, not during initdb...  I don't see much point in doing things like
> 3) and 4), it's just not worth it imo.

Yeah, it's only run once per build, or probably even less often than that
considering we only re-run it when the input files change.  I could get
interested in another 20% reduction in initdb's time, but not genbki's.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-17 Thread Andres Freund
On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
> 
> > There's certainly lots more that could be done in the genbki code,
> > but I think all we can justify at this stage of the development
> > cycle is to get the low-hanging fruit for testing speedups.
> 
> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
> Attached are separate patches for each change, and here are the runtimes
> of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
> (averages of 5 runs, in millseconds):
> 
> master (b6dd1271): 355, 182
> 
> 1: Avoid unnecessary regex captures: 349, 183
> 2: Avoid repeated calls to SplitDataLine: 316, 158
> 3: Inline SplitDataLine: 291, 141
> 4: Inline check_natts: 287, 141
> 
> Together they shave 68ms or 19.2% off the runtime of genbki.pl and 41ms
> or 22.5% off the runtime of Gen_fmgrtab.pl

I'm a bit doubtful about improving the performance of genbki at the cost
of any sort of complication - it's only executed during the actual
build, not during initdb...  I don't see much point in doing things like
3) and 4), it's just not worth it imo.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-17 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> There's certainly lots more that could be done in the genbki code,
> but I think all we can justify at this stage of the development
> cycle is to get the low-hanging fruit for testing speedups.

I threw Devel::NYTProf at it and picked some more low-hanging fruit.
Attached are separate patches for each change, and here are the runtimes
of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
(averages of 5 runs, in millseconds):

master (b6dd1271): 355, 182

1: Avoid unnecessary regex captures: 349, 183
2: Avoid repeated calls to SplitDataLine: 316, 158
3: Inline SplitDataLine: 291, 141
4: Inline check_natts: 287, 141

Together they shave 68ms or 19.2% off the runtime of genbki.pl and 41ms
or 22.5% off the runtime of Gen_fmgrtab.pl

Finally, one non-performance patch, which just removes the use of
Exporter in Catalog.pm, since none of the users actually import anything
from it.

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen

>From 5d7f4b1e78243daa6939501b88b2644bfcf9bd77 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 17 Apr 2017 13:26:35 +0100
Subject: [PATCH 1/8] Avoid unnecessary regex captures in Catalog.pm

---
 src/backend/catalog/Catalog.pm | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index fa8de04..e7b647a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -54,7 +54,7 @@ sub Catalogs
 		{
 
 			# Strip C-style comments.
-			s;/\*(.|\n)*\*/;;g;
+			s;/\*(?:.|\n)*\*/;;g;
 			if (m;/\*;)
 			{
 
@@ -80,12 +80,12 @@ sub Catalogs
 			{
 $catalog{natts} = $1;
 			}
-			elsif (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
+			elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
 			{
-check_natts($filename, $catalog{natts}, $3,
+check_natts($filename, $catalog{natts}, $2,
 			$input_file, $input_line_number);
 
-push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
+push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
 			}
 			elsif (/^DESCR\(\"(.*)\"\)$/)
 			{
-- 
2.7.4

>From 92402e0306eda209b1a2811acc41325d75add0cb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 17 Apr 2017 14:04:20 +0100
Subject: [PATCH 2/8] Avoid repeated calls to Catalog::SplitDataLine

Both check_natts and the callers of Catalog::Catalogs were calling
Catalog::SplitDataLines, the former discarding the result.

Instead, have Catalog::Catalogs store the split fields directly and pass
the count to check_natts
---
 src/backend/catalog/Catalog.pm   | 14 +++---
 src/backend/catalog/genbki.pl|  3 +--
 src/backend/utils/Gen_fmgrtab.pl |  3 +--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e7b647a..0c508b0 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -82,10 +82,12 @@ sub Catalogs
 			}
 			elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
 			{
-check_natts($filename, $catalog{natts}, $2,
+my @bki_values = SplitDataLine($2);
+
+check_natts($filename, $catalog{natts}, scalar(@bki_values),
 			$input_file, $input_line_number);
 
-push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
+push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
 			}
 			elsif (/^DESCR\(\"(.*)\"\)$/)
 			{
@@ -254,17 +256,15 @@ sub RenameTempFile
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
-	my ($catname, $natts, $bki_val, $file, $line) = @_;
+	my ($catname, $natts, $bki_vals, $file, $line) = @_;
 
 	die "Could not find definition for Natts_${catname} before start of DATA() in $file\n"
 		unless defined $natts;
 
-	my $nfields = scalar(SplitDataLine($bki_val));
-
 	die sprintf
 		"Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-		$file, $line, $natts, $nfields
-	  unless $natts == $nfields;
+		$file, $line, $natts, $bki_vals
+	unless $natts == $bki_vals;
 }
 
 1;
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 198e072..8875f6c 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,9 +161,8 @@ foreach my $catname (@{ $catalogs->{names} })
 		foreach my $row (@{ $catalog->{data} })
 		{
 
-			# Split line into tokens without interpreting their meaning.
 			my %bki_values;
-			@bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+			@bki_values{@attnames} = @{$row->{bki_values}};
 
 			# Perform required substitutions on fields
 			foreach my $att (keys %bki_values)
diff --git 

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-16 Thread Andrew Dunstan


On 04/16/2017 07:43 PM, Peter Eisentraut wrote:
> On 4/15/17 12:33, Andrew Dunstan wrote:
>> Sure. Just means putting this code a bit later in the file. "make check"
>> is only one initdb, so it won't cost much. I'm still inclined to force a
>> TAP test for initdb with no TZ set, though.
> How much is this going to buy overall?  Is it worth the complications?
>


I don't know, but it's a very small change.

I am going to spend some time instrumenting where the time goes in
various tests.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-16 Thread Peter Eisentraut
On 4/15/17 12:33, Andrew Dunstan wrote:
> Sure. Just means putting this code a bit later in the file. "make check"
> is only one initdb, so it won't cost much. I'm still inclined to force a
> TAP test for initdb with no TZ set, though.

How much is this going to buy overall?  Is it worth the complications?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-15 Thread Andrew Dunstan


On 04/15/2017 02:13 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Sure. Just means putting this code a bit later in the file. "make check"
>> is only one initdb, so it won't cost much. I'm still inclined to force a
>> TAP test for initdb with no TZ set, though.
> I'd like that to be an additional tweak for some existing test case
> rather than expending a whole initdb run just for that --- but otherwise,
> no objection.
>
>   


That's what my patch did.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-15 Thread Tom Lane
Andrew Dunstan  writes:
> Sure. Just means putting this code a bit later in the file. "make check"
> is only one initdb, so it won't cost much. I'm still inclined to force a
> TAP test for initdb with no TZ set, though.

I'd like that to be an additional tweak for some existing test case
rather than expending a whole initdb run just for that --- but otherwise,
no objection.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-15 Thread Andrew Dunstan


On 04/15/2017 12:13 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> What I had in mind was the attached plus roughly this in the buildfarm
>> client:
>> $ENV{TZ} ||= 'US/Eastern';
>> or whatever zone we choose to use.
> How about letting the first "make check" run with whatever is in the
> environment, and then forcing TZ to become set (much as above) for
> all the remaining tests?  I'm afraid what you've got here might
> encourage a certain sameness of the test environments.
>
>   


Sure. Just means putting this code a bit later in the file. "make check"
is only one initdb, so it won't cost much. I'm still inclined to force a
TAP test for initdb with no TZ set, though.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-15 Thread Tom Lane
Andrew Dunstan  writes:
> What I had in mind was the attached plus roughly this in the buildfarm
> client:
> $ENV{TZ} ||= 'US/Eastern';
> or whatever zone we choose to use.

How about letting the first "make check" run with whatever is in the
environment, and then forcing TZ to become set (much as above) for
all the remaining tests?  I'm afraid what you've got here might
encourage a certain sameness of the test environments.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-15 Thread Andrew Dunstan


On 04/15/2017 11:44 AM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Alternatively, we could have an initdb TAP test that explicitly removed
>> the environment setting so we'd get coverage of select_default_timezone,
>> and have the buildfarm set TZ to something if it's not already set.
> What about having an initdb option that runs select_default_timezone
> only and reports the result, so that it can be used in the buildfarm
> script to set TZ in all the regular initdb calls?
>


Seems like more work.

What I had in mind was the attached plus roughly this in the buildfarm
client:

$ENV{TZ} ||= 'US/Eastern';

or whatever zone we choose to use.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 372865d..11d05c4 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -34,8 +34,16 @@ command_fails(
 	'role names cannot begin with "pg_"');
 
 mkdir $datadir;
-command_ok([ 'initdb', '-N', '-T', 'german', '-X', $xlogdir, $datadir ],
-	'successful creation');
 
+# make sure we run one successful test without a TZ setting so we test
+# initdb's time zone setting code
+{
+	# delete local only works from perl 5.12, so use the older way to do this
+	local (%ENV) = %ENV;
+	delete $ENV{TZ};
+
+	command_ok([ 'initdb', '-N', '-T', 'german', '-X', $xlogdir, $datadir ],
+		'successful creation');
+}
 command_ok([ 'initdb', '-S', $datadir ], 'sync only');
 command_fails([ 'initdb', $datadir ], 'existing data directory');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-15 Thread Alvaro Herrera
Andrew Dunstan wrote:

> Alternatively, we could have an initdb TAP test that explicitly removed
> the environment setting so we'd get coverage of select_default_timezone,
> and have the buildfarm set TZ to something if it's not already set.

What about having an initdb option that runs select_default_timezone
only and reports the result, so that it can be used in the buildfarm
script to set TZ in all the regular initdb calls?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-15 Thread Andrew Dunstan


On 04/15/2017 12:07 AM, Tom Lane wrote:
> Andreas Karlsson  writes:
>> Looked some at this and what take time now for me seems to mainly be 
>> these four things (out of a total runtime of 560 ms).
>> 1. setup_conversion:140 ms
>> 2. select_default_timezone:  90 ms
>> 3. bootstrap_template1:  80 ms
>> 4. setup_schema: 65 ms
> FWIW, you can bypass select_default_timezone by setting environment
> variable TZ to a valid timezone name before calling initdb.  On
> my workstation, that currently cuts the time for "initdb --no-sync"
> from about 1.25 sec to just about exactly 1.0 sec.
>
> I doubt that we want all the buildfarm members to switch over to
> doing that, since then we'd lose coverage of select_default_timezone.
> But it's interesting to speculate about having the buildfarm script
> extract the selected timezone name out of postgresql.conf after the
> first initdb it does, and then set TZ for remaining tests.  We surely
> don't need to test select_default_timezone more than once per
> buildfarm run.
>
>   



Yeah. The obvious place to do this would be the "make check" stage,
which runs very early. Unfortunately, pg_regress carefully removes its
data directory on success - kind of a pity that's not switchable.

Probably for now the simplest thing for buildfarm animals whose owners
are trying to squeeze every last second would be to put something like

TZ => 'Us/Eastern',

in the build_env section of the config file. But as you say we don't
want everyone doing that.

Alternatively, we could have an initdb TAP test that explicitly removed
the environment setting so we'd get coverage of select_default_timezone,
and have the buildfarm set TZ to something if it's not already set.

cheers

andrew

-- 

Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-14 Thread Tom Lane
Andreas Karlsson  writes:
> Looked some at this and what take time now for me seems to mainly be 
> these four things (out of a total runtime of 560 ms).

> 1. setup_conversion:140 ms
> 2. select_default_timezone:  90 ms
> 3. bootstrap_template1:  80 ms
> 4. setup_schema: 65 ms

FWIW, you can bypass select_default_timezone by setting environment
variable TZ to a valid timezone name before calling initdb.  On
my workstation, that currently cuts the time for "initdb --no-sync"
from about 1.25 sec to just about exactly 1.0 sec.

I doubt that we want all the buildfarm members to switch over to
doing that, since then we'd lose coverage of select_default_timezone.
But it's interesting to speculate about having the buildfarm script
extract the selected timezone name out of postgresql.conf after the
first initdb it does, and then set TZ for remaining tests.  We surely
don't need to test select_default_timezone more than once per
buildfarm run.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-14 Thread Gavin Flower

On 15/04/17 13:44, Andreas Karlsson wrote:

On 04/14/2017 11:54 PM, Tom Lane wrote:

I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki.  So I've pushed a patch that does
that.  We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.


Looked some at this and what take time now for me seems to mainly be 
these four things (out of a total runtime of 560 ms).


1. setup_conversion:140 ms
2. select_default_timezone:  90 ms
3. bootstrap_template1:  80 ms
4. setup_schema: 65 ms

These four take up about two thirds of the total runtime, so it seems 
likely that we may still have relatively low hanging fruit (but not 
worth committing for PostgreSQL 10).


I have not done profiling of these functions yet, so am not sure how 
they best would be fixed but maybe setup_conversion could be converted 
into bki entries to speed it up.


Andreas



How much could be done concurrently?


Cheers.
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-14 Thread Andreas Karlsson

On 04/14/2017 11:54 PM, Tom Lane wrote:

I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki.  So I've pushed a patch that does
that.  We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.


Looked some at this and what take time now for me seems to mainly be 
these four things (out of a total runtime of 560 ms).


1. setup_conversion:140 ms
2. select_default_timezone:  90 ms
3. bootstrap_template1:  80 ms
4. setup_schema: 65 ms

These four take up about two thirds of the total runtime, so it seems 
likely that we may still have relatively low hanging fruit (but not 
worth committing for PostgreSQL 10).


I have not done profiling of these functions yet, so am not sure how 
they best would be fixed but maybe setup_conversion could be converted 
into bki entries to speed it up.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
>> Hm.  That ties into something I was looking at yesterday.  The only
>> reason that function is called so much is that bootstrap mode runs a
>> separate transaction for *each line of the bki file* (cf do_start,
>> do_end in bootparse.y).  Which seems pretty silly.

> Indeed.

I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki.  So I've pushed a patch that does
that.  We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-14 Thread Andreas Karlsson

On 04/13/2017 06:13 PM, Tom Lane wrote:

I've pushed this with some mostly-cosmetic adjustments:


Thanks! I like your adjustments.


There's certainly lots more that could be done in the genbki code,
but I think all we can justify at this stage of the development
cycle is to get the low-hanging fruit for testing speedups.


Yeah, I also noticed that the genbki code seems to have gotten little 
love and that much more can be done here.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-13 Thread Andres Freund
On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
> >>> cross-checks - the resowner replacement has been in place for a while,
> >>> and seems robust enough.  They're now the biggest user of time.
> 
> >> Hm, biggest user of time in what workload?  I've not noticed that
> >> function particularly.
> 
> > Just initdb.  I presume it's because the catcaches will frequently be
> > relatively big there.
> 
> Hm.  That ties into something I was looking at yesterday.  The only
> reason that function is called so much is that bootstrap mode runs a
> separate transaction for *each line of the bki file* (cf do_start,
> do_end in bootparse.y).  Which seems pretty silly.

Indeed.


> On the whole, though, we may be looking at diminishing returns here.
> I just did some "perf" measurement of the overall "initdb" cycle,
> and what I'm seeing suggests that bootstrap mode as such is now a
> pretty small fraction of the overall cycle:
> 
> +   51.07% 0.01%28  postgres postgres 
>  [.] PostgresMain  #
> ...
> +   13.52% 0.00% 0  postgres postgres 
>  [.] AuxiliaryProcessMain  #
> 
> That says that the post-bootstrap steps are now the bulk of the time,
> which agrees with naked-eye observation.

The AtEOXact_CatCache weren't only in bootstrap mode, but yea, it's by
far smaller wins in comparison to the regprocin thing.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-13 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
>>> cross-checks - the resowner replacement has been in place for a while,
>>> and seems robust enough.  They're now the biggest user of time.

>> Hm, biggest user of time in what workload?  I've not noticed that
>> function particularly.

> Just initdb.  I presume it's because the catcaches will frequently be
> relatively big there.

Hm.  That ties into something I was looking at yesterday.  The only
reason that function is called so much is that bootstrap mode runs a
separate transaction for *each line of the bki file* (cf do_start,
do_end in bootparse.y).  Which seems pretty silly.  I experimented
with collapsing all the transactions for consecutive DATA lines into
one transaction, but couldn't immediately make it work due to memory
management issues.  I didn't try collapsing the entire run into a
single transaction, but maybe that would actually be easier, though
no doubt more wasteful of memory.

>> I agree that it doesn't seem like we need to spend a lot of time
>> cross-checking there, though.  Maybe keep the code but #ifdef it
>> under some nondefault debugging symbol.

> Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
> so it gets compiled at least sometimes? Not a great fit, but ...

Don't like that, because CCA is by definition not the normal cache
behavior.  It would make a bit of sense to tie it to CACHEDEBUG,
but as you say, it'd never get tested normally if we do that.

On the whole, though, we may be looking at diminishing returns here.
I just did some "perf" measurement of the overall "initdb" cycle,
and what I'm seeing suggests that bootstrap mode as such is now a
pretty small fraction of the overall cycle:

+   51.07% 0.01%28  postgres postgres   
   [.] PostgresMain  #
...
+   13.52% 0.00% 0  postgres postgres   
   [.] AuxiliaryProcessMain  #

That says that the post-bootstrap steps are now the bulk of the time,
which agrees with naked-eye observation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-13 Thread Andres Freund
On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
> > cross-checks - the resowner replacement has been in place for a while,
> > and seems robust enough.  They're now the biggest user of time.
> 
> Hm, biggest user of time in what workload?  I've not noticed that
> function particularly.

Just initdb.  I presume it's because the catcaches will frequently be
relatively big there.


> I agree that it doesn't seem like we need to spend a lot of time
> cross-checking there, though.  Maybe keep the code but #ifdef it
> under some nondefault debugging symbol.

Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
so it gets compiled at least sometimes? Not a great fit, but ...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-13 Thread Tom Lane
Andres Freund  writes:
> Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
> cross-checks - the resowner replacement has been in place for a while,
> and seems robust enough.  They're now the biggest user of time.

Hm, biggest user of time in what workload?  I've not noticed that
function particularly.

I agree that it doesn't seem like we need to spend a lot of time
cross-checking there, though.  Maybe keep the code but #ifdef it
under some nondefault debugging symbol.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-13 Thread Andres Freund
On 2017-04-13 12:13:30 -0400, Tom Lane wrote:
> Andreas Karlsson  writes:
> > Here is my proof of concept patch. It does basically the same thing as 
> > Andres's patch except that it handles quoted values a bit better and 
> > does not try to support anything other than the regproc type.
> 
> > The patch speeds up initdb without fsync from 0.80 seconds to 0.55 
> > seconds, which is a nice speedup, while adding a negligible amount of 
> > extra work on compilation.
> 
> I've pushed this with some mostly-cosmetic adjustments:

Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough.  They're now the biggest user of time.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-13 Thread Tom Lane
Andreas Karlsson  writes:
> Here is my proof of concept patch. It does basically the same thing as 
> Andres's patch except that it handles quoted values a bit better and 
> does not try to support anything other than the regproc type.

> The patch speeds up initdb without fsync from 0.80 seconds to 0.55 
> seconds, which is a nice speedup, while adding a negligible amount of 
> extra work on compilation.

I've pushed this with some mostly-cosmetic adjustments:

* created a single subroutine that understands how to split DATA lines,
rather than having several copies of the regex

* rearranged the code so that the data structure returned by
Catalog::Catalogs() isn't scribbled on (which was already
happening before your patch, but it seemed pretty ugly to me)

* stripped out the bootstrap-time name lookup code from all of reg*
not just regproc.

There's certainly lots more that could be done in the genbki code,
but I think all we can justify at this stage of the development
cycle is to get the low-hanging fruit for testing speedups.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-12 Thread Andreas Karlsson

On 04/12/2017 05:00 PM, Andreas Karlsson wrote:

Looked at this an option 1 seems simple enough if I am not missing
something. I might hack something up later tonight. Either way I think
this improvement can be done separately from the proposed replacement of
the catalog header files. Trying to fix everything at once often leads
to nothing being fixed at all.


Here is my proof of concept patch. It does basically the same thing as 
Andres's patch except that it handles quoted values a bit better and 
does not try to support anything other than the regproc type.


The patch speeds up initdb without fsync from 0.80 seconds to 0.55 
seconds, which is a nice speedup, while adding a negligible amount of 
extra work on compilation.


Two things which could be improved in this patch if people deem it 
important:


- Refactor the code to be more generic, like Andres patch, so it is 
easier to add lookups for other data types.


- Write something closer to a real .bki parser to actually understand 
quoted values and _null_ when parsing the data. Right now this patch 
only splits each row into its fields (while being aware of quotes) but 
does not attempt to remove quotes. The PoC patch treats "foo" and foo as 
different.


Andreas
commit bc8ede661b12407c0b6a9e4c0932d21028f87fc1
Author: Andreas Karlsson 
Date:   Wed Apr 12 21:00:49 2017 +0200

WIP: Resolve regproc entires when generating .bki

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 6e9d57aa8d..f918c9ef8a 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -102,6 +102,7 @@ print $bki "# PostgreSQL $major_version\n";
 # vars to hold data needed for schemapg.h
 my %schemapg_entries;
 my @tables_needing_macros;
+my %procs;
 our @types;
 
 # produce output, one catalog at a time
@@ -164,20 +165,41 @@ foreach my $catname (@{ $catalogs->{names} })
 			$row->{bki_values} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
 			$row->{bki_values} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
 
+			# Split values into tokens without interpreting their meaning.
+			my %bki_values;
+			@bki_values{@attnames} = $row->{bki_values} =~ /"[^"]*"|\S+/g;
+
+			# Substitute regproc entires with oids
+			foreach my $att (keys %bki_values)
+			{
+next if $bki_attr{$att}->{type} ne 'regproc';
+next if $bki_values{$att} =~ /\A(\d+|-|_null_)\z/;
+
+$bki_values{$att} = $procs{$bki_values{$att}};
+			}
+
+			# Save pg_proc oids for use by later catalogs. This relies on the
+			# order we process the files, but the bootstrap code also relies on
+			# pg_proc being loaded first.
+			if ($catname eq 'pg_proc')
+			{
+$procs{$bki_values{proname}} = $row->{oid};
+			}
+
 			# Save pg_type info for pg_attribute processing below
 			if ($catname eq 'pg_type')
 			{
-my %type;
+my %type = %bki_values;
 $type{oid} = $row->{oid};
-@type{@attnames} = split /\s+/, $row->{bki_values};
 push @types, \%type;
 			}
 
 			# Write to postgres.bki
 			my $oid = $row->{oid} ? "OID = $row->{oid} " : '';
-			printf $bki "insert %s( %s)\n", $oid, $row->{bki_values};
+			printf $bki "insert %s( %s)\n", $oid,
+			  join(' ', @bki_values{@attnames});
 
-		   # Write comments to postgres.description and postgres.shdescription
+			# Write comments to postgres.description and postgres.shdescription
 			if (defined $row->{descr})
 			{
 printf $descr "%s\t%s\t0\t%s\n", $row->{oid}, $catname,
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index 2af9b355e7..10d9c6abc7 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -58,16 +58,9 @@ foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
 my $data = $catalogs->{pg_proc}->{data};
 foreach my $row (@$data)
 {
-
-	# To construct fmgroids.h and fmgrtab.c, we need to inspect some
-	# of the individual data fields.  Just splitting on whitespace
-	# won't work, because some quoted fields might contain internal
-	# whitespace.  We handle this by folding them all to a simple
-	# "xxx". Fortunately, this script doesn't need to look at any
-	# fields that might need quoting, so this simple hack is
-	# sufficient.
-	$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
-	@{$row}{@attnames} = split /\s+/, $row->{bki_values};
+	# Split values into tokens without interpreting their meaning.
+	# This is safe becease we are going to use the values as-is.
+	@{$row}{@attnames} = $row->{bki_values} =~ /"[^"]*"|\S+/g;
 
 	# Select out just the rows for internal-language procedures.
 	# Note assumption here that INTERNALlanguageId is 12.
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 702924a958..a4237b0661 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -87,51 +87,11 @@ regprocin(PG_FUNCTION_ARGS)
 	/* Else it's a name, possibly schema-qualified */
 
 	/*
-	 * In bootstrap mode we assume the given name is not schema-qualified, and
-	 * just 

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-12 Thread Andres Freund
On 2017-04-12 10:12:47 -0400, Tom Lane wrote:
> Andres mentioned, and I've confirmed locally, that a large chunk of
> initdb's runtime goes into regprocin's brute-force lookups of function
> OIDs from function names.  The recent discussion about cutting TAP test
> time prompted me to look into that question again.  We had had some
> grand plans for getting genbki.pl to perform the name-to-OID conversion
> as part of a big rewrite, but since that project is showing few signs
> of life, I'm thinking that a more localized performance fix would be
> a good thing to look into.  There seem to be a couple of plausible
> routes to a fix:
> 
> 1. The best thing would still be to make genbki.pl do the conversion,
> and write numeric OIDs into postgres.bki.  The core stumbling block
> here seems to be that for most catalogs, Catalog.pm and genbki.pl
> never really break down a DATA line into fields --- and we certainly
> have got to do that, if we're going to replace the values of regproc
> fields.  The places that do need to do that approximate it like this:
> 
>   # To construct fmgroids.h and fmgrtab.c, we need to inspect some
>   # of the individual data fields.  Just splitting on whitespace
>   # won't work, because some quoted fields might contain internal
>   # whitespace.  We handle this by folding them all to a simple
>   # "xxx". Fortunately, this script doesn't need to look at any
>   # fields that might need quoting, so this simple hack is
>   # sufficient.
>   $row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
>   @{$row}{@attnames} = split /\s+/, $row->{bki_values};
> 
> We would need a bullet-proof, non-hack, preferably not too slow way to
> split DATA lines into fields properly.  I'm one of the world's worst
> Perl programmers, but surely there's a way?

I've done something like 1) before:
http://archives.postgresql.org/message-id/20150221230839.GE2037%40awork2.anarazel.de

I don't think the speeds matters all that much, because we'll only do it
when generating the .bki file - a couple ms more or less won't matter
much.

I IIRC spent some more time to also load the data files from a different
format:
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/sane-catalog
although that's presumably heavily outdated now.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-12 Thread Andreas Karlsson

On 04/12/2017 04:12 PM, Tom Lane wrote:

1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki.  The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields.  The places that do need to do that approximate it like this:

# To construct fmgroids.h and fmgrtab.c, we need to inspect some
# of the individual data fields.  Just splitting on whitespace
# won't work, because some quoted fields might contain internal
# whitespace.  We handle this by folding them all to a simple
# "xxx". Fortunately, this script doesn't need to look at any
# fields that might need quoting, so this simple hack is
# sufficient.
$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
@{$row}{@attnames} = split /\s+/, $row->{bki_values};

We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly.  I'm one of the world's worst
Perl programmers, but surely there's a way?

2. Alternatively, we could teach bootstrap mode to build a hashtable
mapping proname to OID while it reads pg_proc data from postgres.bki,
and then make regprocin's bootstrap path consult the hashtable instead
of looking directly at the pg_proc file.  That I'm quite sure is do-able,
but it seems like it's leaving money on the table compared to doing
the transformation earlier.

Thoughts?


Looked at this an option 1 seems simple enough if I am not missing 
something. I might hack something up later tonight. Either way I think 
this improvement can be done separately from the proposed replacement of 
the catalog header files. Trying to fix everything at once often leads 
to nothing being fixed at all.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-12 Thread Tom Lane
Andres mentioned, and I've confirmed locally, that a large chunk of
initdb's runtime goes into regprocin's brute-force lookups of function
OIDs from function names.  The recent discussion about cutting TAP test
time prompted me to look into that question again.  We had had some
grand plans for getting genbki.pl to perform the name-to-OID conversion
as part of a big rewrite, but since that project is showing few signs
of life, I'm thinking that a more localized performance fix would be
a good thing to look into.  There seem to be a couple of plausible
routes to a fix:

1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki.  The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields.  The places that do need to do that approximate it like this:

# To construct fmgroids.h and fmgrtab.c, we need to inspect some
# of the individual data fields.  Just splitting on whitespace
# won't work, because some quoted fields might contain internal
# whitespace.  We handle this by folding them all to a simple
# "xxx". Fortunately, this script doesn't need to look at any
# fields that might need quoting, so this simple hack is
# sufficient.
$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
@{$row}{@attnames} = split /\s+/, $row->{bki_values};

We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly.  I'm one of the world's worst
Perl programmers, but surely there's a way?

2. Alternatively, we could teach bootstrap mode to build a hashtable
mapping proname to OID while it reads pg_proc data from postgres.bki,
and then make regprocin's bootstrap path consult the hashtable instead
of looking directly at the pg_proc file.  That I'm quite sure is do-able,
but it seems like it's leaving money on the table compared to doing
the transformation earlier.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers