Re: initdb's -c option behaves wrong way?

2024-03-04 Thread Kyotaro Horiguchi
At Mon, 4 Mar 2024 09:39:39 +0100, Daniel Gustafsson  wrote in 
> 
> 
> > On 4 Mar 2024, at 02:01, Tom Lane  wrote:
> > 
> > Daniel Gustafsson  writes:
> >> I took the liberty to add this to the upcoming CF to make sure we don't 
> >> lose
> >> track of it.
> > 
> > Thanks for doing that, because the cfbot pointed out a problem:
> > I should have written pg_strncasecmp not strncasecmp.  If this
> > version tests cleanly, I'll push it.
> 
> +1, LGTM.

Thank you for fixing this, Tom!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: initdb's -c option behaves wrong way?

2024-03-04 Thread Daniel Gustafsson



> On 4 Mar 2024, at 02:01, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> I took the liberty to add this to the upcoming CF to make sure we don't lose
>> track of it.
> 
> Thanks for doing that, because the cfbot pointed out a problem:
> I should have written pg_strncasecmp not strncasecmp.  If this
> version tests cleanly, I'll push it.

+1, LGTM.

--
Daniel Gustafsson





Re: initdb's -c option behaves wrong way?

2024-03-03 Thread Tom Lane
Daniel Gustafsson  writes:
> I took the liberty to add this to the upcoming CF to make sure we don't lose
> track of it.

Thanks for doing that, because the cfbot pointed out a problem:
I should have written pg_strncasecmp not strncasecmp.  If this
version tests cleanly, I'll push it.

regards, tom lane

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ac409b0006..200b2e8e31 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -484,6 +484,7 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
 	for (i = 0; lines[i]; i++)
 	{
 		const char *where;
+		const char *namestart;
 
 		/*
 		 * Look for a line assigning to guc_name.  Typically it will be
@@ -494,15 +495,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
 		where = lines[i];
 		while (*where == '#' || isspace((unsigned char) *where))
 			where++;
-		if (strncmp(where, guc_name, namelen) != 0)
+		if (pg_strncasecmp(where, guc_name, namelen) != 0)
 			continue;
+		namestart = where;
 		where += namelen;
 		while (isspace((unsigned char) *where))
 			where++;
 		if (*where != '=')
 			continue;
 
-		/* found it -- append the original comment if any */
+		/* found it -- let's use the canonical casing shown in the file */
+		memcpy(>data[mark_as_comment ? 1 : 0], namestart, namelen);
+
+		/* now append the original comment if any */
 		where = strrchr(where, '#');
 		if (where)
 		{
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 03376cc0f7..413a5eca67 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -199,4 +199,17 @@ command_fails(
 command_fails([ 'initdb', '--no-sync', '--set', 'foo=bar', "$tempdir/dataX" ],
 	'fails for invalid --set option');
 
+# Make sure multiple invocations of -c parameters are added case insensitive
+command_ok(
+	[
+		'initdb', '-cwork_mem=128', '-cWork_Mem=256', '-cWORK_MEM=512',
+		"$tempdir/dataY"
+	],
+	'multiple -c options with different case');
+
+my $conf = slurp_file("$tempdir/dataY/postgresql.conf");
+ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured");
+ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured");
+ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config");
+
 done_testing();


Re: initdb's -c option behaves wrong way?

2024-02-29 Thread Daniel Gustafsson
> On 22 Jan 2024, at 11:09, Daniel Gustafsson  wrote:

> Feel free to go ahead with that
> version, or let me know if you want me to deal with it.

I took the liberty to add this to the upcoming CF to make sure we don't lose
track of it.

--
Daniel Gustafsson





Re: initdb's -c option behaves wrong way?

2024-01-22 Thread Daniel Gustafsson
> On 19 Jan 2024, at 17:33, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> I'll give some more time for opinions, then I'll go ahead with one of the
>> patches with a backpatch to v16.
> 
> OK, I take back my previous complaint that just using strncasecmp
> would be enough --- I was misremembering how the code worked, and
> you're right that it would use the spelling from the command line
> rather than that from the file.
> 
> However, the v3 patch is flat broken.  You can't assume you have
> found a match until you've verified that whitespace and '='
> appear next --- otherwise, you'll be fooled by a guc_name that
> is a prefix of one that appears in the file.  I think the simplest
> change that does it correctly is as attached.

The attached v4 looks good to me, I don't think it moves the needle wrt
readability (ie, no need to move the search).  Feel free to go ahead with that
version, or let me know if you want me to deal with it.

--
Daniel Gustafsson





Re: initdb's -c option behaves wrong way?

2024-01-19 Thread Tom Lane
Daniel Gustafsson  writes:
> I'll give some more time for opinions, then I'll go ahead with one of the
> patches with a backpatch to v16.

OK, I take back my previous complaint that just using strncasecmp
would be enough --- I was misremembering how the code worked, and
you're right that it would use the spelling from the command line
rather than that from the file.

However, the v3 patch is flat broken.  You can't assume you have
found a match until you've verified that whitespace and '='
appear next --- otherwise, you'll be fooled by a guc_name that
is a prefix of one that appears in the file.  I think the simplest
change that does it correctly is as attached.

Now, since I was the one who wrote the existing code, I freely
concede that I may have too high an opinion of its readability.
Maybe some larger refactoring is appropriate.  But I didn't
find that what you'd done improved the readability.  I'd still
rather keep the newline-assembly code together as much as possible.
Maybe we should do the search part before we build any of newline?

regards, tom lane

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ac409b0006..6a620c9d5f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -484,6 +484,7 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
 	for (i = 0; lines[i]; i++)
 	{
 		const char *where;
+		const char *namestart;
 
 		/*
 		 * Look for a line assigning to guc_name.  Typically it will be
@@ -494,15 +495,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
 		where = lines[i];
 		while (*where == '#' || isspace((unsigned char) *where))
 			where++;
-		if (strncmp(where, guc_name, namelen) != 0)
+		if (strncasecmp(where, guc_name, namelen) != 0)
 			continue;
+		namestart = where;
 		where += namelen;
 		while (isspace((unsigned char) *where))
 			where++;
 		if (*where != '=')
 			continue;
 
-		/* found it -- append the original comment if any */
+		/* found it -- let's use the canonical casing shown in the file */
+		memcpy(>data[mark_as_comment ? 1 : 0], namestart, namelen);
+
+		/* now append the original comment if any */
 		where = strrchr(where, '#');
 		if (where)
 		{
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 03376cc0f7..413a5eca67 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -199,4 +199,17 @@ command_fails(
 command_fails([ 'initdb', '--no-sync', '--set', 'foo=bar', "$tempdir/dataX" ],
 	'fails for invalid --set option');
 
+# Make sure multiple invocations of -c parameters are added case insensitive
+command_ok(
+	[
+		'initdb', '-cwork_mem=128', '-cWork_Mem=256', '-cWORK_MEM=512',
+		"$tempdir/dataY"
+	],
+	'multiple -c options with different case');
+
+my $conf = slurp_file("$tempdir/dataY/postgresql.conf");
+ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured");
+ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured");
+ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config");
+
 done_testing();


Re: initdb's -c option behaves wrong way?

2024-01-19 Thread Daniel Gustafsson
> On 18 Jan 2024, at 05:49, Kyotaro Horiguchi  wrote:
> 
> Thank you for upicking this up.
> 
> At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson  wrote 
> in 
>>> On 17 Jan 2024, at 21:33, Tom Lane  wrote:
>>> 
>>> I wrote:
 However ... I don't like the patch much.  It seems to have left
 the code in a rather random state.  Why, for example, didn't you
 keep all the code that constructs the "newline" value together?
>>> 
>>> After thinking about it a bit more, I don't see why you didn't just
>>> s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
>>> a result of your choice to replace the GUC's case as shown in the
>>> file with the case used on the command line, which is not better
>>> IMO.  We don't change our mind about the canonical spelling of a
>>> GUC because somebody varied the case in a SET command.
>> 
>> The original patch was preserving the case of the file throwing away the case
>> from the commandline.  The attached is a slimmed down version which only 
>> moves
>> the assembly of newline to the different cases (replace vs.  new) keeping the
>> rest of the code intact.  Keeping it in one place gets sort of messy too 
>> since
>> it needs to use different values for a replacement and a new variable.  Not
>> sure if there is a cleaner approach?
> 
> Just to clarify, the current code breaks out after processing the
> first matching line. I haven't changed that behavior.

Yup.

> The reason I
> moved the rewrite processing code out of the loop was I wanted to
> avoid adding more lines that are executed only once into the
> loop. However, it is in exchange of additional complexity to remember
> the original spelling of the parameter name. Personally, I believe
> separating the search and rewrite processing is better, but I'm not
> particularly attached to the approach, so either way is fine with me.

I'll give some more time for opinions, then I'll go ahead with one of the
patches with a backpatch to v16.

--
Daniel Gustafsson





Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Kyotaro Horiguchi
Thank you for upicking this up.

At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson  wrote 
in 
> > On 17 Jan 2024, at 21:33, Tom Lane  wrote:
> > 
> > I wrote:
> >> However ... I don't like the patch much.  It seems to have left
> >> the code in a rather random state.  Why, for example, didn't you
> >> keep all the code that constructs the "newline" value together?
> > 
> > After thinking about it a bit more, I don't see why you didn't just
> > s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
> > a result of your choice to replace the GUC's case as shown in the
> > file with the case used on the command line, which is not better
> > IMO.  We don't change our mind about the canonical spelling of a
> > GUC because somebody varied the case in a SET command.
> 
> The original patch was preserving the case of the file throwing away the case
> from the commandline.  The attached is a slimmed down version which only moves
> the assembly of newline to the different cases (replace vs.  new) keeping the
> rest of the code intact.  Keeping it in one place gets sort of messy too since
> it needs to use different values for a replacement and a new variable.  Not
> sure if there is a cleaner approach?

Just to clarify, the current code breaks out after processing the
first matching line. I haven't changed that behavior.  The reason I
moved the rewrite processing code out of the loop was I wanted to
avoid adding more lines that are executed only once into the
loop. However, it is in exchange of additional complexity to remember
the original spelling of the parameter name. Personally, I believe
separating the search and rewrite processing is better, but I'm not
particularly attached to the approach, so either way is fine with me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Daniel Gustafsson
> On 17 Jan 2024, at 21:33, Tom Lane  wrote:
> 
> I wrote:
>> However ... I don't like the patch much.  It seems to have left
>> the code in a rather random state.  Why, for example, didn't you
>> keep all the code that constructs the "newline" value together?
> 
> After thinking about it a bit more, I don't see why you didn't just
> s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
> a result of your choice to replace the GUC's case as shown in the
> file with the case used on the command line, which is not better
> IMO.  We don't change our mind about the canonical spelling of a
> GUC because somebody varied the case in a SET command.

The original patch was preserving the case of the file throwing away the case
from the commandline.  The attached is a slimmed down version which only moves
the assembly of newline to the different cases (replace vs.  new) keeping the
rest of the code intact.  Keeping it in one place gets sort of messy too since
it needs to use different values for a replacement and a new variable.  Not
sure if there is a cleaner approach?

--
Daniel Gustafsson



v3-0001-Make-initdb-c-option-case-insensitive.patch
Description: Binary data


Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Tom Lane
I wrote:
> However ... I don't like the patch much.  It seems to have left
> the code in a rather random state.  Why, for example, didn't you
> keep all the code that constructs the "newline" value together?

After thinking about it a bit more, I don't see why you didn't just
s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
a result of your choice to replace the GUC's case as shown in the
file with the case used on the command line, which is not better
IMO.  We don't change our mind about the canonical spelling of a
GUC because somebody varied the case in a SET command.

regards, tom lane




Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 17, 2024 at 2:31 PM Daniel Gustafsson  wrote:
>> Agreed, I think the patch as it stands now where it replaces case 
>> insensitive,
>> while keeping the original casing, is the best path forward.  The issue exist
>> in 16 as well so I propose a backpatch to there.

> I like that approach, too. I could go either way on back-patching. It
> doesn't seem important, but it probably won't break anything, either.

We just added this switch in 16, so I think backpatching to keep all
the branches consistent is a good idea.

However ... I don't like the patch much.  It seems to have left
the code in a rather random state.  Why, for example, didn't you
keep all the code that constructs the "newline" value together?
I'm also unconvinced by the removal of the "assume there's only
one match" break --- if we need to support multiple matches
I doubt that's sufficient.

regards, tom lane




Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Robert Haas
On Wed, Jan 17, 2024 at 2:31 PM Daniel Gustafsson  wrote:
> Agreed, I think the patch as it stands now where it replaces case insensitive,
> while keeping the original casing, is the best path forward.  The issue exist
> in 16 as well so I propose a backpatch to there.

I like that approach, too. I could go either way on back-patching. It
doesn't seem important, but it probably won't break anything, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Daniel Gustafsson
> On 17 Jan 2024, at 18:05, Tom Lane  wrote:
> 
> Alvaro Herrera  writes:
>> Hmm, how about raising an error if multiple options are given targetting
>> the same GUC?
> 
> I don't see any reason to do that.  The underlying configuration
> files don't complain about duplicate entries, they just take the
> last setting.

Agreed, I think the patch as it stands now where it replaces case insensitive,
while keeping the original casing, is the best path forward.  The issue exist
in 16 as well so I propose a backpatch to there.

--
Daniel Gustafsson





Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm, how about raising an error if multiple options are given targetting
> the same GUC?

I don't see any reason to do that.  The underlying configuration
files don't complain about duplicate entries, they just take the
last setting.

regards, tom lane




Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Alvaro Herrera
On 2024-Jan-16, Daniel Gustafsson wrote:

> > On 28 Sep 2023, at 09:49, Kyotaro Horiguchi  wrote:
> 
> > I noticed that -c option of initdb behaves in an unexpected
> > manner. Identical variable names with variations in letter casing are
> > treated as distinct variables.
> > 
> > $ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000
> 
> > The original intention was apparently to overwrite the existing
> > line. Furthermore, I surmise that preserving the original letter
> > casing is preferable.
> 
> Circling back to an old thread, I agree that this seems odd and the original
> thread [0] makes no mention of it being intentional.

Hmm, how about raising an error if multiple options are given targetting
the same GUC?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: initdb's -c option behaves wrong way?

2024-01-16 Thread Daniel Gustafsson
> On 28 Sep 2023, at 09:49, Kyotaro Horiguchi  wrote:

> I noticed that -c option of initdb behaves in an unexpected
> manner. Identical variable names with variations in letter casing are
> treated as distinct variables.
> 
> $ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000

> The original intention was apparently to overwrite the existing
> line. Furthermore, I surmise that preserving the original letter
> casing is preferable.

Circling back to an old thread, I agree that this seems odd and the original
thread [0] makes no mention of it being intentional.

The patch seems fine to me, the attached version is rebased, pgindented and has
a test case added.

--
Daniel Gustafsson

[0] 
https://www.postgresql.org/message-id/flat/2844176.1674681919%40sss.pgh.pa.us



v2-0001-Make-initdb-c-option-case-insensitive.patch
Description: Binary data


initdb's -c option behaves wrong way?

2023-09-28 Thread Kyotaro Horiguchi
Hello.

I noticed that -c option of initdb behaves in an unexpected
manner. Identical variable names with variations in letter casing are
treated as distinct variables.

$ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000
...
$ grep -i 'work_mem ' $PGDATA/postgresql.conf
work_mem = 100  # min 64kB
#maintenance_work_mem = 64MB# min 1MB
#autovacuum_work_mem = -1   # min 1MB, or -1 to use 
maintenance_work_mem
#logical_decoding_work_mem = 64MB   # min 64kB
WORK_MEM = 1000
Work_mem = 2000


The original intention was apparently to overwrite the existing
line. Furthermore, I surmise that preserving the original letter
casing is preferable.

Attached is a patch to address this issue.  To retrieve the variable
name from the existing line, the code is slightly restructured.
Alternatively, should we just down-case the provided variable names?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 0c6f5ceb0a..04419840f4 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -470,20 +470,14 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
 	int			namelen = strlen(guc_name);
 	PQExpBuffer newline = createPQExpBuffer();
 	int			i;
+	const char *where;
+	const char *pname;
 
-	/* prepare the replacement line, except for possible comment and newline */
 	if (mark_as_comment)
 		appendPQExpBufferChar(newline, '#');
-	appendPQExpBuffer(newline, "%s = ", guc_name);
-	if (guc_value_requires_quotes(guc_value))
-		appendPQExpBuffer(newline, "'%s'", escape_quotes(guc_value));
-	else
-		appendPQExpBufferStr(newline, guc_value);
 
 	for (i = 0; lines[i]; i++)
 	{
-		const char *where;
-
 		/*
 		 * Look for a line assigning to guc_name.  Typically it will be
 		 * preceded by '#', but that might not be the case if a -c switch
@@ -493,15 +487,32 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
 		where = lines[i];
 		while (*where == '#' || isspace((unsigned char) *where))
 			where++;
-		if (strncmp(where, guc_name, namelen) != 0)
+		if (strncasecmp(where, guc_name, namelen) != 0)
 			continue;
+
+		pname = where;
 		where += namelen;
 		while (isspace((unsigned char) *where))
 			where++;
-		if (*where != '=')
-			continue;
 
-		/* found it -- append the original comment if any */
+		/* assume there's only one match */
+		if (*where == '=')
+			break;
+	}
+
+	if (lines[i])
+	{
+		/* found it, rewrite the line preserving the original comment if any */
+		appendPQExpBuffer(newline, "%.*s = ", namelen, pname);
+		if (guc_value_requires_quotes(guc_value))
+			appendPQExpBuffer(newline, "'%s'", escape_quotes(guc_value));
+		else
+			appendPQExpBufferStr(newline, guc_value);
+
+		/*
+		 * completed body of line, now continue with potential indentation and
+		 * comment
+		 */
 		where = strrchr(where, '#');
 		if (where)
 		{
@@ -548,16 +559,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
 
 		free(lines[i]);
 		lines[i] = newline->data;
-
-		break;	/* assume there's only one match */
 	}
-
-	if (lines[i] == NULL)
+	else
 	{
 		/*
 		 * No match, so append a new entry.  (We rely on the bootstrap server
 		 * to complain if it's not a valid GUC name.)
 		 */
+		appendPQExpBuffer(newline, "%s = ", guc_name);
+		if (guc_value_requires_quotes(guc_value))
+			appendPQExpBuffer(newline, "'%s'", escape_quotes(guc_value));
+		else
+			appendPQExpBufferStr(newline, guc_value);
+
 		appendPQExpBufferChar(newline, '\n');
 		lines = pg_realloc_array(lines, char *, i + 2);
 		lines[i++] = newline->data;