Re: initdb's -c option behaves wrong way?
At Mon, 4 Mar 2024 09:39:39 +0100, Daniel Gustafsson wrote in > > > > On 4 Mar 2024, at 02:01, Tom Lane wrote: > > > > Daniel Gustafsson writes: > >> I took the liberty to add this to the upcoming CF to make sure we don't > >> lose > >> track of it. > > > > Thanks for doing that, because the cfbot pointed out a problem: > > I should have written pg_strncasecmp not strncasecmp. If this > > version tests cleanly, I'll push it. > > +1, LGTM. Thank you for fixing this, Tom! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: initdb's -c option behaves wrong way?
> On 4 Mar 2024, at 02:01, Tom Lane wrote: > > Daniel Gustafsson writes: >> I took the liberty to add this to the upcoming CF to make sure we don't lose >> track of it. > > Thanks for doing that, because the cfbot pointed out a problem: > I should have written pg_strncasecmp not strncasecmp. If this > version tests cleanly, I'll push it. +1, LGTM. -- Daniel Gustafsson
Re: initdb's -c option behaves wrong way?
Daniel Gustafsson writes: > I took the liberty to add this to the upcoming CF to make sure we don't lose > track of it. Thanks for doing that, because the cfbot pointed out a problem: I should have written pg_strncasecmp not strncasecmp. If this version tests cleanly, I'll push it. regards, tom lane diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index ac409b0006..200b2e8e31 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -484,6 +484,7 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value, for (i = 0; lines[i]; i++) { const char *where; + const char *namestart; /* * Look for a line assigning to guc_name. Typically it will be @@ -494,15 +495,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value, where = lines[i]; while (*where == '#' || isspace((unsigned char) *where)) where++; - if (strncmp(where, guc_name, namelen) != 0) + if (pg_strncasecmp(where, guc_name, namelen) != 0) continue; + namestart = where; where += namelen; while (isspace((unsigned char) *where)) where++; if (*where != '=') continue; - /* found it -- append the original comment if any */ + /* found it -- let's use the canonical casing shown in the file */ + memcpy(>data[mark_as_comment ? 1 : 0], namestart, namelen); + + /* now append the original comment if any */ where = strrchr(where, '#'); if (where) { diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl index 03376cc0f7..413a5eca67 100644 --- a/src/bin/initdb/t/001_initdb.pl +++ b/src/bin/initdb/t/001_initdb.pl @@ -199,4 +199,17 @@ command_fails( command_fails([ 'initdb', '--no-sync', '--set', 'foo=bar', "$tempdir/dataX" ], 'fails for invalid --set option'); +# Make sure multiple invocations of -c parameters are added case insensitive +command_ok( + [ + 'initdb', '-cwork_mem=128', '-cWork_Mem=256', '-cWORK_MEM=512', + "$tempdir/dataY" + ], + 'multiple -c options with different case'); + +my $conf = slurp_file("$tempdir/dataY/postgresql.conf"); +ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured"); +ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured"); +ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config"); + done_testing();
Re: initdb's -c option behaves wrong way?
> On 22 Jan 2024, at 11:09, Daniel Gustafsson wrote: > Feel free to go ahead with that > version, or let me know if you want me to deal with it. I took the liberty to add this to the upcoming CF to make sure we don't lose track of it. -- Daniel Gustafsson
Re: initdb's -c option behaves wrong way?
> On 19 Jan 2024, at 17:33, Tom Lane wrote: > > Daniel Gustafsson writes: >> I'll give some more time for opinions, then I'll go ahead with one of the >> patches with a backpatch to v16. > > OK, I take back my previous complaint that just using strncasecmp > would be enough --- I was misremembering how the code worked, and > you're right that it would use the spelling from the command line > rather than that from the file. > > However, the v3 patch is flat broken. You can't assume you have > found a match until you've verified that whitespace and '=' > appear next --- otherwise, you'll be fooled by a guc_name that > is a prefix of one that appears in the file. I think the simplest > change that does it correctly is as attached. The attached v4 looks good to me, I don't think it moves the needle wrt readability (ie, no need to move the search). Feel free to go ahead with that version, or let me know if you want me to deal with it. -- Daniel Gustafsson
Re: initdb's -c option behaves wrong way?
Daniel Gustafsson writes: > I'll give some more time for opinions, then I'll go ahead with one of the > patches with a backpatch to v16. OK, I take back my previous complaint that just using strncasecmp would be enough --- I was misremembering how the code worked, and you're right that it would use the spelling from the command line rather than that from the file. However, the v3 patch is flat broken. You can't assume you have found a match until you've verified that whitespace and '=' appear next --- otherwise, you'll be fooled by a guc_name that is a prefix of one that appears in the file. I think the simplest change that does it correctly is as attached. Now, since I was the one who wrote the existing code, I freely concede that I may have too high an opinion of its readability. Maybe some larger refactoring is appropriate. But I didn't find that what you'd done improved the readability. I'd still rather keep the newline-assembly code together as much as possible. Maybe we should do the search part before we build any of newline? regards, tom lane diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index ac409b0006..6a620c9d5f 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -484,6 +484,7 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value, for (i = 0; lines[i]; i++) { const char *where; + const char *namestart; /* * Look for a line assigning to guc_name. Typically it will be @@ -494,15 +495,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value, where = lines[i]; while (*where == '#' || isspace((unsigned char) *where)) where++; - if (strncmp(where, guc_name, namelen) != 0) + if (strncasecmp(where, guc_name, namelen) != 0) continue; + namestart = where; where += namelen; while (isspace((unsigned char) *where)) where++; if (*where != '=') continue; - /* found it -- append the original comment if any */ + /* found it -- let's use the canonical casing shown in the file */ + memcpy(>data[mark_as_comment ? 1 : 0], namestart, namelen); + + /* now append the original comment if any */ where = strrchr(where, '#'); if (where) { diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl index 03376cc0f7..413a5eca67 100644 --- a/src/bin/initdb/t/001_initdb.pl +++ b/src/bin/initdb/t/001_initdb.pl @@ -199,4 +199,17 @@ command_fails( command_fails([ 'initdb', '--no-sync', '--set', 'foo=bar', "$tempdir/dataX" ], 'fails for invalid --set option'); +# Make sure multiple invocations of -c parameters are added case insensitive +command_ok( + [ + 'initdb', '-cwork_mem=128', '-cWork_Mem=256', '-cWORK_MEM=512', + "$tempdir/dataY" + ], + 'multiple -c options with different case'); + +my $conf = slurp_file("$tempdir/dataY/postgresql.conf"); +ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured"); +ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured"); +ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config"); + done_testing();
Re: initdb's -c option behaves wrong way?
> On 18 Jan 2024, at 05:49, Kyotaro Horiguchi wrote: > > Thank you for upicking this up. > > At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson wrote > in >>> On 17 Jan 2024, at 21:33, Tom Lane wrote: >>> >>> I wrote: However ... I don't like the patch much. It seems to have left the code in a rather random state. Why, for example, didn't you keep all the code that constructs the "newline" value together? >>> >>> After thinking about it a bit more, I don't see why you didn't just >>> s/strncmp/strncasecmp/ and call it good. The messiness seems to be >>> a result of your choice to replace the GUC's case as shown in the >>> file with the case used on the command line, which is not better >>> IMO. We don't change our mind about the canonical spelling of a >>> GUC because somebody varied the case in a SET command. >> >> The original patch was preserving the case of the file throwing away the case >> from the commandline. The attached is a slimmed down version which only >> moves >> the assembly of newline to the different cases (replace vs. new) keeping the >> rest of the code intact. Keeping it in one place gets sort of messy too >> since >> it needs to use different values for a replacement and a new variable. Not >> sure if there is a cleaner approach? > > Just to clarify, the current code breaks out after processing the > first matching line. I haven't changed that behavior. Yup. > The reason I > moved the rewrite processing code out of the loop was I wanted to > avoid adding more lines that are executed only once into the > loop. However, it is in exchange of additional complexity to remember > the original spelling of the parameter name. Personally, I believe > separating the search and rewrite processing is better, but I'm not > particularly attached to the approach, so either way is fine with me. I'll give some more time for opinions, then I'll go ahead with one of the patches with a backpatch to v16. -- Daniel Gustafsson
Re: initdb's -c option behaves wrong way?
Thank you for upicking this up. At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson wrote in > > On 17 Jan 2024, at 21:33, Tom Lane wrote: > > > > I wrote: > >> However ... I don't like the patch much. It seems to have left > >> the code in a rather random state. Why, for example, didn't you > >> keep all the code that constructs the "newline" value together? > > > > After thinking about it a bit more, I don't see why you didn't just > > s/strncmp/strncasecmp/ and call it good. The messiness seems to be > > a result of your choice to replace the GUC's case as shown in the > > file with the case used on the command line, which is not better > > IMO. We don't change our mind about the canonical spelling of a > > GUC because somebody varied the case in a SET command. > > The original patch was preserving the case of the file throwing away the case > from the commandline. The attached is a slimmed down version which only moves > the assembly of newline to the different cases (replace vs. new) keeping the > rest of the code intact. Keeping it in one place gets sort of messy too since > it needs to use different values for a replacement and a new variable. Not > sure if there is a cleaner approach? Just to clarify, the current code breaks out after processing the first matching line. I haven't changed that behavior. The reason I moved the rewrite processing code out of the loop was I wanted to avoid adding more lines that are executed only once into the loop. However, it is in exchange of additional complexity to remember the original spelling of the parameter name. Personally, I believe separating the search and rewrite processing is better, but I'm not particularly attached to the approach, so either way is fine with me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: initdb's -c option behaves wrong way?
> On 17 Jan 2024, at 21:33, Tom Lane wrote: > > I wrote: >> However ... I don't like the patch much. It seems to have left >> the code in a rather random state. Why, for example, didn't you >> keep all the code that constructs the "newline" value together? > > After thinking about it a bit more, I don't see why you didn't just > s/strncmp/strncasecmp/ and call it good. The messiness seems to be > a result of your choice to replace the GUC's case as shown in the > file with the case used on the command line, which is not better > IMO. We don't change our mind about the canonical spelling of a > GUC because somebody varied the case in a SET command. The original patch was preserving the case of the file throwing away the case from the commandline. The attached is a slimmed down version which only moves the assembly of newline to the different cases (replace vs. new) keeping the rest of the code intact. Keeping it in one place gets sort of messy too since it needs to use different values for a replacement and a new variable. Not sure if there is a cleaner approach? -- Daniel Gustafsson v3-0001-Make-initdb-c-option-case-insensitive.patch Description: Binary data
Re: initdb's -c option behaves wrong way?
I wrote: > However ... I don't like the patch much. It seems to have left > the code in a rather random state. Why, for example, didn't you > keep all the code that constructs the "newline" value together? After thinking about it a bit more, I don't see why you didn't just s/strncmp/strncasecmp/ and call it good. The messiness seems to be a result of your choice to replace the GUC's case as shown in the file with the case used on the command line, which is not better IMO. We don't change our mind about the canonical spelling of a GUC because somebody varied the case in a SET command. regards, tom lane
Re: initdb's -c option behaves wrong way?
Robert Haas writes: > On Wed, Jan 17, 2024 at 2:31 PM Daniel Gustafsson wrote: >> Agreed, I think the patch as it stands now where it replaces case >> insensitive, >> while keeping the original casing, is the best path forward. The issue exist >> in 16 as well so I propose a backpatch to there. > I like that approach, too. I could go either way on back-patching. It > doesn't seem important, but it probably won't break anything, either. We just added this switch in 16, so I think backpatching to keep all the branches consistent is a good idea. However ... I don't like the patch much. It seems to have left the code in a rather random state. Why, for example, didn't you keep all the code that constructs the "newline" value together? I'm also unconvinced by the removal of the "assume there's only one match" break --- if we need to support multiple matches I doubt that's sufficient. regards, tom lane
Re: initdb's -c option behaves wrong way?
On Wed, Jan 17, 2024 at 2:31 PM Daniel Gustafsson wrote: > Agreed, I think the patch as it stands now where it replaces case insensitive, > while keeping the original casing, is the best path forward. The issue exist > in 16 as well so I propose a backpatch to there. I like that approach, too. I could go either way on back-patching. It doesn't seem important, but it probably won't break anything, either. -- Robert Haas EDB: http://www.enterprisedb.com
Re: initdb's -c option behaves wrong way?
> On 17 Jan 2024, at 18:05, Tom Lane wrote: > > Alvaro Herrera writes: >> Hmm, how about raising an error if multiple options are given targetting >> the same GUC? > > I don't see any reason to do that. The underlying configuration > files don't complain about duplicate entries, they just take the > last setting. Agreed, I think the patch as it stands now where it replaces case insensitive, while keeping the original casing, is the best path forward. The issue exist in 16 as well so I propose a backpatch to there. -- Daniel Gustafsson
Re: initdb's -c option behaves wrong way?
Alvaro Herrera writes: > Hmm, how about raising an error if multiple options are given targetting > the same GUC? I don't see any reason to do that. The underlying configuration files don't complain about duplicate entries, they just take the last setting. regards, tom lane
Re: initdb's -c option behaves wrong way?
On 2024-Jan-16, Daniel Gustafsson wrote: > > On 28 Sep 2023, at 09:49, Kyotaro Horiguchi wrote: > > > I noticed that -c option of initdb behaves in an unexpected > > manner. Identical variable names with variations in letter casing are > > treated as distinct variables. > > > > $ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000 > > > The original intention was apparently to overwrite the existing > > line. Furthermore, I surmise that preserving the original letter > > casing is preferable. > > Circling back to an old thread, I agree that this seems odd and the original > thread [0] makes no mention of it being intentional. Hmm, how about raising an error if multiple options are given targetting the same GUC? -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
Re: initdb's -c option behaves wrong way?
> On 28 Sep 2023, at 09:49, Kyotaro Horiguchi wrote: > I noticed that -c option of initdb behaves in an unexpected > manner. Identical variable names with variations in letter casing are > treated as distinct variables. > > $ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000 > The original intention was apparently to overwrite the existing > line. Furthermore, I surmise that preserving the original letter > casing is preferable. Circling back to an old thread, I agree that this seems odd and the original thread [0] makes no mention of it being intentional. The patch seems fine to me, the attached version is rebased, pgindented and has a test case added. -- Daniel Gustafsson [0] https://www.postgresql.org/message-id/flat/2844176.1674681919%40sss.pgh.pa.us v2-0001-Make-initdb-c-option-case-insensitive.patch Description: Binary data
initdb's -c option behaves wrong way?
Hello. I noticed that -c option of initdb behaves in an unexpected manner. Identical variable names with variations in letter casing are treated as distinct variables. $ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000 ... $ grep -i 'work_mem ' $PGDATA/postgresql.conf work_mem = 100 # min 64kB #maintenance_work_mem = 64MB# min 1MB #autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem #logical_decoding_work_mem = 64MB # min 64kB WORK_MEM = 1000 Work_mem = 2000 The original intention was apparently to overwrite the existing line. Furthermore, I surmise that preserving the original letter casing is preferable. Attached is a patch to address this issue. To retrieve the variable name from the existing line, the code is slightly restructured. Alternatively, should we just down-case the provided variable names? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 0c6f5ceb0a..04419840f4 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -470,20 +470,14 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value, int namelen = strlen(guc_name); PQExpBuffer newline = createPQExpBuffer(); int i; + const char *where; + const char *pname; - /* prepare the replacement line, except for possible comment and newline */ if (mark_as_comment) appendPQExpBufferChar(newline, '#'); - appendPQExpBuffer(newline, "%s = ", guc_name); - if (guc_value_requires_quotes(guc_value)) - appendPQExpBuffer(newline, "'%s'", escape_quotes(guc_value)); - else - appendPQExpBufferStr(newline, guc_value); for (i = 0; lines[i]; i++) { - const char *where; - /* * Look for a line assigning to guc_name. Typically it will be * preceded by '#', but that might not be the case if a -c switch @@ -493,15 +487,32 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value, where = lines[i]; while (*where == '#' || isspace((unsigned char) *where)) where++; - if (strncmp(where, guc_name, namelen) != 0) + if (strncasecmp(where, guc_name, namelen) != 0) continue; + + pname = where; where += namelen; while (isspace((unsigned char) *where)) where++; - if (*where != '=') - continue; - /* found it -- append the original comment if any */ + /* assume there's only one match */ + if (*where == '=') + break; + } + + if (lines[i]) + { + /* found it, rewrite the line preserving the original comment if any */ + appendPQExpBuffer(newline, "%.*s = ", namelen, pname); + if (guc_value_requires_quotes(guc_value)) + appendPQExpBuffer(newline, "'%s'", escape_quotes(guc_value)); + else + appendPQExpBufferStr(newline, guc_value); + + /* + * completed body of line, now continue with potential indentation and + * comment + */ where = strrchr(where, '#'); if (where) { @@ -548,16 +559,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value, free(lines[i]); lines[i] = newline->data; - - break; /* assume there's only one match */ } - - if (lines[i] == NULL) + else { /* * No match, so append a new entry. (We rely on the bootstrap server * to complain if it's not a valid GUC name.) */ + appendPQExpBuffer(newline, "%s = ", guc_name); + if (guc_value_requires_quotes(guc_value)) + appendPQExpBuffer(newline, "'%s'", escape_quotes(guc_value)); + else + appendPQExpBufferStr(newline, guc_value); + appendPQExpBufferChar(newline, '\n'); lines = pg_realloc_array(lines, char *, i + 2); lines[i++] = newline->data;