Re: Inconsistent printf placeholders
Thank you for looking this. At Tue, 19 Mar 2024 10:50:23 +0100, Peter Eisentraut wrote in > On 15.03.24 08:20, Kyotaro Horiguchi wrote: > > diff --git a/src/backend/access/transam/twophase.c > > b/src/backend/access/transam/twophase.c > > @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool > > missing_ok) > > errmsg("could not read file \"%s\": > > %m", path))); > > else > > ereport(ERROR, > > - (errmsg("could not read file \"%s\": > > - read %d of %lld", > > - path, r, (long long > > - int) stat.st_size))); > > + (errmsg("could not read file \"%s\": read %zd of %zu", > > + path, r, (Size) stat.st_size))); > > } > > pgstat_report_wait_end(); > > This might be worse, because stat.st_size is of type off_t, which > could be smaller than Size/size_t. I think you were trying to mention that off_t could be wider than size_t and you're right in that point. I thought that it is safe since we are trying to read the whole content of the file at once here into palloc'ed memory. However, on second thought, if st_size is out of the range of ssize_t, and palloc accepts that size, at least on Linux, read(2) reads only 0x7000 bytes and raches the error reporting. Addition to that, this size was closer to the memory allocation size limit than I had thought. As the result, I removed the change. However, I kept the change of the type of variable "r" and corresponding placeholder %zd. > > diff --git a/src/backend/libpq/be-secure-gssapi.c > > b/src/backend/libpq/be-secure-gssapi.c > > index bc04e78abb..68645b4519 100644 > > --- a/src/backend/libpq/be-secure-gssapi.c > > +++ b/src/backend/libpq/be-secure-gssapi.c > > @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port) > > if (input.length > PQ_GSS_RECV_BUFFER_SIZE) > > { > > ereport(COMMERROR, > > - (errmsg("oversize GSSAPI packet sent > > - by the client (%zu > %d)", > > + (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)", > > (size_t) input.length, > > - > > PQ_GSS_RECV_BUFFER_SIZE))); > > + (size_t) PQ_GSS_RECV_BUFFER_SIZE))); > > return -1; > > } > > > > Might be better to add that cast to the definition of > PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit. As far as I see, the only exceptional uses I found were a comparison with int values, and being passed as an OM_uint32 (to one of the parameters of gss_wrap_size_limit()). Therefore, I agree that it is beneficial. By the way, we currently define Size as the same as size_t (since 1998). Is it correct to consider Size as merely for backward compatibility and we should use size_t for new code? I used size_t in the modified part in the attached patch. > > diff --git a/src/backend/replication/repl_gram.y > > b/src/backend/replication/repl_gram.y > > index 7474f5bd67..baa76280b9 100644 > > --- a/src/backend/replication/repl_gram.y > > +++ b/src/backend/replication/repl_gram.y > > @@ -312,11 +312,6 @@ timeline_history: > > { > > TimeLineHistoryCmd *cmd; > > - if ($2 <= 0) > > - ereport(ERROR, > > - > > (errcode(ERRCODE_SYNTAX_ERROR), > > -errmsg("invalid > > -timeline %u", > > -$2))); > > - ... > I don't think this is correct. It loses the check for == 0. Ugh. It's my mistake. So we need to consider unifying the messages again. In walsummaryfuncs.c, %lld is required, but it's silly for the uses in repl_gram.y. Finally, I chose not to change anything here. > > diff --git a/src/backend/tsearch/to_tsany.c > > b/src/backend/tsearch/to_tsany.c > > index 88cba58cba..9d21178107 100644 > > --- a/src/backend/tsearch/to_tsany.c > > +++ b/src/backend/tsearch/to_tsany.c > > @@ -191,7 +191,8 @@ make_tsvector(ParsedText *prs) > > if (lenstr > MAXSTRPOS) > > ereport(ERROR, > > (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), > > -errmsg("string is too long for tsvector (%d > > -bytes, max %d bytes)", lenstr, MAXSTRPOS))); > > + /* cast values to avoid extra translatable messages */ > > + errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)", > > (long) lenstr, (long)
Re: Inconsistent printf placeholders
On 15.03.24 08:20, Kyotaro Horiguchi wrote: diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok) errmsg("could not read file \"%s\": %m", path))); else ereport(ERROR, - (errmsg("could not read file \"%s\": read %d of %lld", - path, r, (long long int) stat.st_size))); + (errmsg("could not read file \"%s\": read %zd of %zu", + path, r, (Size) stat.st_size))); } pgstat_report_wait_end(); This might be worse, because stat.st_size is of type off_t, which could be smaller than Size/size_t. diff --git a/src/backend/libpq/be-secure-gssapi.c b/src/backend/libpq/be-secure-gssapi.c index bc04e78abb..68645b4519 100644 --- a/src/backend/libpq/be-secure-gssapi.c +++ b/src/backend/libpq/be-secure-gssapi.c @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port) if (input.length > PQ_GSS_RECV_BUFFER_SIZE) { ereport(COMMERROR, - (errmsg("oversize GSSAPI packet sent by the client (%zu > %d)", + (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)", (size_t) input.length, - PQ_GSS_RECV_BUFFER_SIZE))); + (size_t) PQ_GSS_RECV_BUFFER_SIZE))); return -1; } Might be better to add that cast to the definition of PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit. diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y index 7474f5bd67..baa76280b9 100644 --- a/src/backend/replication/repl_gram.y +++ b/src/backend/replication/repl_gram.y @@ -312,11 +312,6 @@ timeline_history: { TimeLineHistoryCmd *cmd; - if ($2 <= 0) - ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), -errmsg("invalid timeline %u", $2))); - cmd = makeNode(TimeLineHistoryCmd); cmd->timeline = $2; @@ -352,13 +347,7 @@ opt_slot: opt_timeline: K_TIMELINE UCONST - { - if ($2 <= 0) - ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), -errmsg("invalid timeline %u", $2))); - $$ = $2; - } + { $$ = $2; } | /* EMPTY */ { $$ = 0; } ; I don't think this is correct. It loses the check for == 0. diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c index 88cba58cba..9d21178107 100644 --- a/src/backend/tsearch/to_tsany.c +++ b/src/backend/tsearch/to_tsany.c @@ -191,7 +191,8 @@ make_tsvector(ParsedText *prs) if (lenstr > MAXSTRPOS) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), -errmsg("string is too long for tsvector (%d bytes, max %d bytes)", lenstr, MAXSTRPOS))); +/* cast values to avoid extra translatable messages */ +errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)", (long) lenstr, (long) MAXSTRPOS))); totallen = CALCDATASIZE(prs->curwords, lenstr); in = (TSVector) palloc0(totallen); I think it would be better instead to change the message in tsvectorin() to *not* use long. The size of long is unportable, so I would rather avoid using it at all. diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 8d28dd42ce..5de490b569 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -3217,8 +3217,9 @@ byteaGetByte(PG_FUNCTION_ARGS) if (n < 0 || n >= len) ereport(ERROR, (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), -errmsg("index %d out of valid range, 0..%d", - n, len - 1))); +/* cast
Re: Inconsistent printf placeholders
At Fri, 15 Mar 2024 16:20:27 +0900 (JST), Kyotaro Horiguchi wrote in > I checked for that kind of msgids in a bit more intensive way. The > numbers are the line numbers in ja.po of backend. I didn't check the > same for other modules. > > > ###: invalid timeline %lld > >@ backup/walsummaryfuncs.c:95 > > invalid timeline %u > >@ repl_gram.y:318 repl_gram.y:359 "The numbers are the line numbers in ja.po" is wrong. Correctly, it should be written as: The information consists of two lines: the first of them is the msgid, and the second indicates the locations where they appear. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Inconsistent printf placeholders
At Fri, 15 Mar 2024 16:01:28 +1300, David Rowley wrote in > On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi > wrote: > > I have considered only the two messages. Actually, buffile.c and md.c > > are already like that. The attached aligns the messages in > > pg_combinebackup.c and reconstruct.c with the precedents. > > This looks like a worthy cause to make translator work easier. > > I don't want to widen the goalposts or anything, but just wondering if > you'd searched for any others that could get similar treatment? > > I only just had a quick look at the following. > > $ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E > 's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c > 31 msgid "" > 2 msgid "could not accept SSL connection: %" > 2 msgid "could not initialize LDAP: %" > 2 msgid "could not look up local user ID %: %" ... > I've not looked at how hard it would be with any of the above to > determine how hard it would be to make the formats consistent. The > 3rd last one seems similar enough that it might be worth doing > together with this? I checked for that kind of msgids in a bit more intensive way. The numbers are the line numbers in ja.po of backend. I didn't check the same for other modules. > ###: invalid timeline %lld >@ backup/walsummaryfuncs.c:95 > invalid timeline %u >@ repl_gram.y:318 repl_gram.y:359 In the first case, the %lld can be more than 2^31. In the second case, %u is uint32. However, the bigger issue is checking if the uint32 value is negative: repl_gram.c: 147 > if ((yyvsp[0].uintval) <= 0) > ereport(ERROR, > (errcode(ERRCODE_SYNTAX_ERROR), >errmsg("invalid timeline %u", > (yyvsp[0].uintval; which cannot happen. I think we can simply remove the useless error check. > ###: could not read file \"%s\": read %d of %zu >@ ../common/controldata_utils.c:116 ../common/controldata_utils.c:119 > access/transam/xlog.c:3417 access/transam/xlog.c:4278 > replication/logical/origin.c:750 replication/logical/origin.c:789 > replication/logical/snapbuild.c:2040 replication/slot.c:2218 > replication/slot.c:2259 replication/walsender.c:660 > utils/cache/relmapper.c:833 > could not read file \"%s\": read %d of %lld >@ access/transam/twophase.c:1372 > could not read file \"%s\": read %zd of %zu >@ backup/basebackup.c:2102 > ###: oversize GSSAPI packet sent by the client (%zu > %zu) >@ libpq/be-secure-gssapi.c:351 > oversize GSSAPI packet sent by the client (%zu > %d) >@ libpq/be-secure-gssapi.c:575 > ###: compressed segment file \"%s\" has incorrect uncompressed size %d, > skipping >@ pg_receivewal.c:362 > compressed segment file \"%s\" has incorrect uncompressed size %zu, > skipping >@ pg_receivewal.c:448 > ###: could not read file \"%s\": read only %zd of %lld bytes >@ pg_combinebackup.c:1304 > could not read file \"%s\": read only %d of %u bytes >@ reconstruct.c:514 We can "fix" them the same way. I debated whether to use ssize_t for read() and replace all instances of size_t with Size. However, in the end, I decided to only keep it consistent with the surrounding code. > ###: index %d out of valid range, 0..%d >@ utils/adt/varlena.c:3220 utils/adt/varlena.c:3287 > index %lld out of valid range, 0..%lld >@ utils/adt/varlena.c:3251 utils/adt/varlena.c:3323 > ###: string is too long for tsvector (%d bytes, max %d bytes) >@ tsearch/to_tsany.c:194 utils/adt/tsvector.c:277 > utils/adt/tsvector_op.c:1126 > string is too long for tsvector (%ld bytes, max %ld bytes) >@ utils/adt/tsvector.c:223 We can unify them and did in the attached, but I'm not sure that's sensible.. > ###: could not look up local user ID %d: %s >@ ../port/user.c:43 ../port/user.c:79 > could not look up local user ID %ld: %s >@ libpq/auth.c:1886 Both of the above use uid_t (defined as int) as user ID and it is explicitly cast to "int" in the first case and to long in the second, which seems mysterious. Although I'm not sure if there's a possibility of uid_t being widened in the future, I unified them to the latter way for robustness. > ###: error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %m >@ file.c:44 > error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %s >@ file.c:65 > > The latter seems to be changed to %m by reassiging save_errno to errno, as > done in other places. > > ###: could not get data directory using %s: %m >@ option.c:448 > could not get data directory using %s: %s >@ option.c:452 > ###: could not get control data using %s: %m >@ controldata.c:129 controldata.c:199 > could not get control data using %s: %s >@ controldata.c:175 controldata.c:507 > ###: %s: %m >@ copy.c:401 psqlscanslash.l:805 psqlscanslash.l:817 >
Re: Inconsistent printf placeholders
On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi wrote: > I have considered only the two messages. Actually, buffile.c and md.c > are already like that. The attached aligns the messages in > pg_combinebackup.c and reconstruct.c with the precedents. This looks like a worthy cause to make translator work easier. I don't want to widen the goalposts or anything, but just wondering if you'd searched for any others that could get similar treatment? I only just had a quick look at the following. $ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E 's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c 31 msgid "" 2 msgid "could not accept SSL connection: %" 2 msgid "could not initialize LDAP: %" 2 msgid "could not look up local user ID %: %" 2 msgid "could not open file \"%\": %" 2 msgid "could not read file \"%\": read % of %" 2 msgid "could not read from log segment %, offset %: %" 2 msgid "could not read from log segment %, offset %: read % of %" 2 msgid "index % out of valid range, 0..%" 2 msgid "invalid value for parameter \"%\": %" 2 msgid "%%% is outside the valid range for parameter \"%\" (% .. %)" 2 msgid "must be owner of large object %" 2 msgid "oversize GSSAPI packet sent by the client (% > %)" 2 msgid "permission denied for large object %" 2 msgid "string is too long for tsvector (% bytes, max % bytes)" 2 msgid "timestamp out of range: \"%\"" 2 msgid "Valid values are between \"%\" and \"%\"." I've not looked at how hard it would be with any of the above to determine how hard it would be to make the formats consistent. The 3rd last one seems similar enough that it might be worth doing together with this? David
Re: Inconsistent printf placeholders
Thank you for the suggestions. At Thu, 14 Mar 2024 13:45:41 +0100, Daniel Gustafsson wrote in > I've only skimmed this so far but +1 on keeping the messages the same where > possible to reduce translation work. Adding a comment on the message where > the > casting is done to indicate that it is for translation might reduce the risk > of > it "getting fixed" down the line. Added a comment "/* cast xxx to avoid extra translatable messages */. At Thu, 14 Mar 2024 14:02:46 +0100, Peter Eisentraut wrote in > If you want to make them uniform, then I suggest the error messages Yeah. Having the same messages with only the placeholders changed is not very pleasing during translation. If possible, I would like to align them. > should both be "%zd of %zu bytes", which are the actual types read() > deals with. I have considered only the two messages. Actually, buffile.c and md.c are already like that. The attached aligns the messages in pg_combinebackup.c and reconstruct.c with the precedents. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c index 6f0814d9ac..feb4d5dcf4 100644 --- a/src/bin/pg_combinebackup/pg_combinebackup.c +++ b/src/bin/pg_combinebackup/pg_combinebackup.c @@ -1301,8 +1301,9 @@ slurp_file(int fd, char *filename, StringInfo buf, int maxlen) if (rb < 0) pg_fatal("could not read file \"%s\": %m", filename); else - pg_fatal("could not read file \"%s\": read only %zd of %lld bytes", - filename, rb, (long long int) st.st_size); + /* cast st_size to avoid extra translatable messages */ + pg_fatal("could not read file \"%s\": read only %zd of %zu bytes", + filename, rb, (size_t) st.st_size); } /* Adjust buffer length for new data and restore trailing-\0 invariant */ diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c index 41f06bb26b..a4badb90e2 100644 --- a/src/bin/pg_combinebackup/reconstruct.c +++ b/src/bin/pg_combinebackup/reconstruct.c @@ -504,15 +504,16 @@ make_rfile(char *filename, bool missing_ok) static void read_bytes(rfile *rf, void *buffer, unsigned length) { - int rb = read(rf->fd, buffer, length); + ssize_t rb = read(rf->fd, buffer, length); if (rb != length) { if (rb < 0) pg_fatal("could not read file \"%s\": %m", rf->filename); else - pg_fatal("could not read file \"%s\": read only %d of %u bytes", - rf->filename, rb, length); + /* cast length to avoid extra translatable messages */ + pg_fatal("could not read file \"%s\": read only %zd of %zu bytes", + rf->filename, rb, (size_t) length); } }
Re: Inconsistent printf placeholders
On 14.03.24 05:20, Kyotaro Horiguchi wrote: A recent commit 6612185883 introduced two error messages that are identical in text but differ in their placeholders. - pg_fatal("could not read file \"%s\": read only %d of %d bytes", -filename, (int) rb, (int) st.st_size); + pg_fatal("could not read file \"%s\": read only %zd of %lld bytes", +filename, rb, (long long int) st.st_size); ... - pg_fatal("could not read file \"%s\": read only %d of %d bytes", + pg_fatal("could not read file \"%s\": read only %d of %u bytes", rf->filename, rb, length); I'd be happy if the two messages kept consistency. I suggest aligning types instead of making the messages different, as attached. If you want to make them uniform, then I suggest the error messages should both be "%zd of %zu bytes", which are the actual types read() deals with.
Re: Inconsistent printf placeholders
> On 14 Mar 2024, at 05:20, Kyotaro Horiguchi wrote: > I'd be happy if the two messages kept consistency. I suggest aligning > types instead of making the messages different, as attached. I've only skimmed this so far but +1 on keeping the messages the same where possible to reduce translation work. Adding a comment on the message where the casting is done to indicate that it is for translation might reduce the risk of it "getting fixed" down the line. -- Daniel Gustafsson
Inconsistent printf placeholders
Hello. A recent commit 6612185883 introduced two error messages that are identical in text but differ in their placeholders. - pg_fatal("could not read file \"%s\": read only %d of %d bytes", -filename, (int) rb, (int) st.st_size); + pg_fatal("could not read file \"%s\": read only %zd of %lld bytes", +filename, rb, (long long int) st.st_size); ... - pg_fatal("could not read file \"%s\": read only %d of %d bytes", + pg_fatal("could not read file \"%s\": read only %d of %u bytes", rf->filename, rb, length); I'd be happy if the two messages kept consistency. I suggest aligning types instead of making the messages different, as attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c index 41f06bb26b..e3b8e84289 100644 --- a/src/bin/pg_combinebackup/reconstruct.c +++ b/src/bin/pg_combinebackup/reconstruct.c @@ -504,15 +504,15 @@ make_rfile(char *filename, bool missing_ok) static void read_bytes(rfile *rf, void *buffer, unsigned length) { - int rb = read(rf->fd, buffer, length); + ssize_t rb = read(rf->fd, buffer, length); if (rb != length) { if (rb < 0) pg_fatal("could not read file \"%s\": %m", rf->filename); else - pg_fatal("could not read file \"%s\": read only %d of %u bytes", - rf->filename, rb, length); + pg_fatal("could not read file \"%s\": read only %zd of %lld bytes", + rf->filename, rb, (long long int) length); } }