Re: Inconsistent printf placeholders

2024-03-21 Thread Kyotaro Horiguchi
Thank you for looking this.

At Tue, 19 Mar 2024 10:50:23 +0100, Peter Eisentraut  
wrote in 
> On 15.03.24 08:20, Kyotaro Horiguchi wrote:
> > diff --git a/src/backend/access/transam/twophase.c
> > b/src/backend/access/transam/twophase.c
> > @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool
> > missing_ok)
> >  errmsg("could not read file \"%s\":
> >  %m", path)));
> > else
> > ereport(ERROR,
> > -   (errmsg("could not read file \"%s\":
> > -   read %d of %lld",
> > -   path, r, (long long
> > -   int) stat.st_size)));
> > + (errmsg("could not read file \"%s\": read %zd of %zu",
> > + path, r, (Size) stat.st_size)));
> > }
> > pgstat_report_wait_end();
> 
> This might be worse, because stat.st_size is of type off_t, which
> could be smaller than Size/size_t.

I think you were trying to mention that off_t could be wider than
size_t and you're right in that point. I thought that it is safe since
we are trying to read the whole content of the file at once here into
palloc'ed memory.

However, on second thought, if st_size is out of the range of ssize_t,
and palloc accepts that size, at least on Linux, read(2) reads only
0x7000 bytes and raches the error reporting. Addition to that,
this size was closer to the memory allocation size limit than I had
thought.

As the result, I removed the change. However, I kept the change of the
type of variable "r" and corresponding placeholder %zd.

> > diff --git a/src/backend/libpq/be-secure-gssapi.c
> > b/src/backend/libpq/be-secure-gssapi.c
> > index bc04e78abb..68645b4519 100644
> > --- a/src/backend/libpq/be-secure-gssapi.c
> > +++ b/src/backend/libpq/be-secure-gssapi.c
> > @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port)
> > if (input.length > PQ_GSS_RECV_BUFFER_SIZE)
> > {
> > ereport(COMMERROR,
> > -   (errmsg("oversize GSSAPI packet sent
> > -   by the client (%zu > %d)",
> > + (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)",
> > (size_t) input.length,
> > -   
> > PQ_GSS_RECV_BUFFER_SIZE)));
> > + (size_t) PQ_GSS_RECV_BUFFER_SIZE)));
> > return -1;
> > }
> >   
> 
> Might be better to add that cast to the definition of
> PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit.

As far as I see, the only exceptional uses I found were a comparison
with int values, and being passed as an OM_uint32 (to one of the
parameters of gss_wrap_size_limit()). Therefore, I agree that it is
beneficial. By the way, we currently define Size as the same as size_t
(since 1998). Is it correct to consider Size as merely for backward
compatibility and we should use size_t for new code?  I used size_t in
the modified part in the attached patch.

> > diff --git a/src/backend/replication/repl_gram.y
> > b/src/backend/replication/repl_gram.y
> > index 7474f5bd67..baa76280b9 100644
> > --- a/src/backend/replication/repl_gram.y
> > +++ b/src/backend/replication/repl_gram.y
> > @@ -312,11 +312,6 @@ timeline_history:
> > {
> > TimeLineHistoryCmd *cmd;
> >   - if ($2 <= 0)
> > -   ereport(ERROR,
> > -   
> > (errcode(ERRCODE_SYNTAX_ERROR),
> > -errmsg("invalid
> > -timeline %u",
> > -$2)));
> > -
...
> I don't think this is correct.  It loses the check for == 0.

Ugh. It's my mistake. So we need to consider unifying the messages
again. In walsummaryfuncs.c, %lld is required, but it's silly for the
uses in repl_gram.y. Finally, I chose not to change anything here.

> > diff --git a/src/backend/tsearch/to_tsany.c
> > b/src/backend/tsearch/to_tsany.c
> > index 88cba58cba..9d21178107 100644
> > --- a/src/backend/tsearch/to_tsany.c
> > +++ b/src/backend/tsearch/to_tsany.c
> > @@ -191,7 +191,8 @@ make_tsvector(ParsedText *prs)
> > if (lenstr > MAXSTRPOS)
> > ereport(ERROR,
> > (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> > -errmsg("string is too long for tsvector (%d
> > -bytes, max %d bytes)", lenstr, MAXSTRPOS)));
> > + /* cast values to avoid extra translatable messages */
> > + errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)",
> > (long) lenstr, (long) 

Re: Inconsistent printf placeholders

2024-03-19 Thread Peter Eisentraut

On 15.03.24 08:20, Kyotaro Horiguchi wrote:

diff --git a/src/backend/access/transam/twophase.c 
b/src/backend/access/transam/twophase.c
@@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 errmsg("could not read file \"%s\": 
%m", path)));
else
ereport(ERROR,
-   (errmsg("could not read file \"%s\": read %d 
of %lld",
-   path, r, (long long 
int) stat.st_size)));
+   (errmsg("could not read file \"%s\": read 
%zd of %zu",
+   path, r, (Size) 
stat.st_size)));
}
  
  	pgstat_report_wait_end();


This might be worse, because stat.st_size is of type off_t, which could 
be smaller than Size/size_t.



diff --git a/src/backend/libpq/be-secure-gssapi.c 
b/src/backend/libpq/be-secure-gssapi.c
index bc04e78abb..68645b4519 100644
--- a/src/backend/libpq/be-secure-gssapi.c
+++ b/src/backend/libpq/be-secure-gssapi.c
@@ -572,9 +572,9 @@ secure_open_gssapi(Port *port)
if (input.length > PQ_GSS_RECV_BUFFER_SIZE)
{
ereport(COMMERROR,
-   (errmsg("oversize GSSAPI packet sent by the 
client (%zu > %d)",
+   (errmsg("oversize GSSAPI packet sent by the 
client (%zu > %zu)",
(size_t) input.length,
-   
PQ_GSS_RECV_BUFFER_SIZE)));
+   (size_t) 
PQ_GSS_RECV_BUFFER_SIZE)));
return -1;
}
  


Might be better to add that cast to the definition of 
PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit.



diff --git a/src/backend/replication/repl_gram.y 
b/src/backend/replication/repl_gram.y
index 7474f5bd67..baa76280b9 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -312,11 +312,6 @@ timeline_history:
{
TimeLineHistoryCmd *cmd;
  
-	if ($2 <= 0)

-   ereport(ERROR,
-   
(errcode(ERRCODE_SYNTAX_ERROR),
-errmsg("invalid 
timeline %u", $2)));
-
cmd = makeNode(TimeLineHistoryCmd);
cmd->timeline = $2;
  
@@ -352,13 +347,7 @@ opt_slot:
  
  opt_timeline:

K_TIMELINE UCONST
-   {
-   if ($2 <= 0)
-   ereport(ERROR,
-   
(errcode(ERRCODE_SYNTAX_ERROR),
-errmsg("invalid 
timeline %u", $2)));
-   $$ = $2;
-   }
+   { $$ = $2; }
| /* EMPTY */   { $$ = 0; }
;
  


I don't think this is correct.  It loses the check for == 0.


diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 88cba58cba..9d21178107 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -191,7 +191,8 @@ make_tsvector(ParsedText *prs)
if (lenstr > MAXSTRPOS)
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-errmsg("string is too long for tsvector (%d bytes, 
max %d bytes)", lenstr, MAXSTRPOS)));
+/* cast values to avoid extra translatable 
messages */
+errmsg("string is too long for tsvector (%ld bytes, 
max %ld bytes)", (long) lenstr, (long) MAXSTRPOS)));
  
  	totallen = CALCDATASIZE(prs->curwords, lenstr);

in = (TSVector) palloc0(totallen);


I think it would be better instead to change the message in tsvectorin() 
to *not* use long.  The size of long is unportable, so I would rather 
avoid using it at all.



diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 8d28dd42ce..5de490b569 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3217,8 +3217,9 @@ byteaGetByte(PG_FUNCTION_ARGS)
if (n < 0 || n >= len)
ereport(ERROR,
(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
-errmsg("index %d out of valid range, 0..%d",
-   n, len - 1)));
+/* cast 

Re: Inconsistent printf placeholders

2024-03-15 Thread Kyotaro Horiguchi
At Fri, 15 Mar 2024 16:20:27 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> I checked for that kind of msgids in a bit more intensive way. The
> numbers are the line numbers in ja.po of backend. I didn't check the
> same for other modules.
> 
> > ###: invalid timeline %lld
> >@ backup/walsummaryfuncs.c:95
> >  invalid timeline %u
> >@ repl_gram.y:318 repl_gram.y:359

"The numbers are the line numbers in ja.po" is wrong. Correctly, it
should be written as:

The information consists of two lines: the first of them is the msgid,
and the second indicates the locations where they appear.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Inconsistent printf placeholders

2024-03-15 Thread Kyotaro Horiguchi
At Fri, 15 Mar 2024 16:01:28 +1300, David Rowley  wrote 
in 
> On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi  
> wrote:
> > I have considered only the two messages. Actually, buffile.c and md.c
> > are already like that. The attached aligns the messages in
> > pg_combinebackup.c and reconstruct.c with the precedents.
> 
> This looks like a worthy cause to make translator work easier.
> 
> I don't want to widen the goalposts or anything, but just wondering if
> you'd searched for any others that could get similar treatment?
> 
> I only just had a quick look at the following.
> 
> $ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E
> 's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c
>  31 msgid ""
>   2 msgid "could not accept SSL connection: %"
>   2 msgid "could not initialize LDAP: %"
>   2 msgid "could not look up local user ID %: %"
...
> I've not looked at how hard it would be with any of the above to
> determine how hard it would be to make the formats consistent.  The
> 3rd last one seems similar enough that it might be worth doing
> together with this?


I checked for that kind of msgids in a bit more intensive way. The
numbers are the line numbers in ja.po of backend. I didn't check the
same for other modules.

> ###: invalid timeline %lld
>@ backup/walsummaryfuncs.c:95
>  invalid timeline %u
>@ repl_gram.y:318 repl_gram.y:359

In the first case, the %lld can be more than 2^31.

In the second case, %u is uint32. However, the bigger issue is
checking if the uint32 value is negative:

repl_gram.c: 147
>   if ((yyvsp[0].uintval) <= 0)
>   ereport(ERROR,
>   (errcode(ERRCODE_SYNTAX_ERROR),
>errmsg("invalid timeline %u", 
> (yyvsp[0].uintval;

which cannot happen. I think we can simply remove the useless error
check.

> ###: could not read file \"%s\": read %d of %zu
>@ ../common/controldata_utils.c:116 ../common/controldata_utils.c:119 
> access/transam/xlog.c:3417 access/transam/xlog.c:4278 
> replication/logical/origin.c:750 replication/logical/origin.c:789 
> replication/logical/snapbuild.c:2040 replication/slot.c:2218 
> replication/slot.c:2259 replication/walsender.c:660 
> utils/cache/relmapper.c:833
>  could not read file \"%s\": read %d of %lld
>@ access/transam/twophase.c:1372
>  could not read file \"%s\": read %zd of %zu
>@ backup/basebackup.c:2102
> ###: oversize GSSAPI packet sent by the client (%zu > %zu)
>@ libpq/be-secure-gssapi.c:351
>  oversize GSSAPI packet sent by the client (%zu > %d)
>@ libpq/be-secure-gssapi.c:575
> ###: compressed segment file \"%s\" has incorrect uncompressed size %d, 
> skipping
>@ pg_receivewal.c:362
>  compressed segment file \"%s\" has incorrect uncompressed size %zu, 
> skipping
>@ pg_receivewal.c:448
> ###: could not read file \"%s\": read only %zd of %lld bytes
>@ pg_combinebackup.c:1304
>  could not read file \"%s\": read only %d of %u bytes
>@ reconstruct.c:514


We can "fix" them the same way.  I debated whether to use ssize_t for
read() and replace all instances of size_t with Size. However, in the
end, I decided to only keep it consistent with the surrounding code.

> ###: index %d out of valid range, 0..%d
>@ utils/adt/varlena.c:3220 utils/adt/varlena.c:3287
>  index %lld out of valid range, 0..%lld
>@ utils/adt/varlena.c:3251 utils/adt/varlena.c:3323
> ###: string is too long for tsvector (%d bytes, max %d bytes)
>@ tsearch/to_tsany.c:194 utils/adt/tsvector.c:277 
> utils/adt/tsvector_op.c:1126
>  string is too long for tsvector (%ld bytes, max %ld bytes)
>@ utils/adt/tsvector.c:223

We can unify them and did in the attached, but I'm not sure that's
sensible..

> ###: could not look up local user ID %d: %s
>@ ../port/user.c:43 ../port/user.c:79
>  could not look up local user ID %ld: %s
>@ libpq/auth.c:1886

Both of the above use uid_t (defined as int) as user ID and it is
explicitly cast to "int" in the first case and to long in the second,
which seems mysterious. Although I'm not sure if there's a possibility
of uid_t being widened in the future, I unified them to the latter way
for robustness.

> ###: error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %m
>@ file.c:44
>  error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %s
>@ file.c:65
> 
> The latter seems to be changed to %m by reassiging save_errno to errno, as 
> done in other places.
> 
> ###: could not get data directory using %s: %m
>@ option.c:448
>  could not get data directory using %s: %s
>@ option.c:452
> ###: could not get control data using %s: %m
>@ controldata.c:129 controldata.c:199
>  could not get control data using %s: %s
>@ controldata.c:175 controldata.c:507
> ###: %s: %m
>@ copy.c:401 psqlscanslash.l:805 psqlscanslash.l:817 
> 

Re: Inconsistent printf placeholders

2024-03-14 Thread David Rowley
On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi  wrote:
> I have considered only the two messages. Actually, buffile.c and md.c
> are already like that. The attached aligns the messages in
> pg_combinebackup.c and reconstruct.c with the precedents.

This looks like a worthy cause to make translator work easier.

I don't want to widen the goalposts or anything, but just wondering if
you'd searched for any others that could get similar treatment?

I only just had a quick look at the following.

$ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E
's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c
 31 msgid ""
  2 msgid "could not accept SSL connection: %"
  2 msgid "could not initialize LDAP: %"
  2 msgid "could not look up local user ID %: %"
  2 msgid "could not open file \"%\": %"
  2 msgid "could not read file \"%\": read % of %"
  2 msgid "could not read from log segment %, offset %: %"
  2 msgid "could not read from log segment %, offset %: read % of %"
  2 msgid "index % out of valid range, 0..%"
  2 msgid "invalid value for parameter \"%\": %"
  2 msgid "%%% is outside the valid range for parameter \"%\" (% .. %)"
  2 msgid "must be owner of large object %"
  2 msgid "oversize GSSAPI packet sent by the client (% > %)"
  2 msgid "permission denied for large object %"
  2 msgid "string is too long for tsvector (% bytes, max % bytes)"
  2 msgid "timestamp out of range: \"%\""
  2 msgid "Valid values are between \"%\" and \"%\"."

I've not looked at how hard it would be with any of the above to
determine how hard it would be to make the formats consistent.  The
3rd last one seems similar enough that it might be worth doing
together with this?

David




Re: Inconsistent printf placeholders

2024-03-14 Thread Kyotaro Horiguchi
Thank you for the suggestions.

At Thu, 14 Mar 2024 13:45:41 +0100, Daniel Gustafsson  wrote 
in 
> I've only skimmed this so far but +1 on keeping the messages the same where
> possible to reduce translation work.  Adding a comment on the message where 
> the
> casting is done to indicate that it is for translation might reduce the risk 
> of
> it "getting fixed" down the line.

Added a comment "/* cast xxx to avoid extra translatable messages */.

At Thu, 14 Mar 2024 14:02:46 +0100, Peter Eisentraut  
wrote in 
> If you want to make them uniform, then I suggest the error messages

Yeah. Having the same messages with only the placeholders changed is
not very pleasing during translation. If possible, I would like to
align them.

> should both be "%zd of %zu bytes", which are the actual types read()
> deals with.

I have considered only the two messages. Actually, buffile.c and md.c
are already like that. The attached aligns the messages in
pg_combinebackup.c and reconstruct.c with the precedents.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index 6f0814d9ac..feb4d5dcf4 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -1301,8 +1301,9 @@ slurp_file(int fd, char *filename, StringInfo buf, int maxlen)
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
-	 filename, rb, (long long int) st.st_size);
+			/* cast st_size to avoid extra translatable messages */
+			pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
+	 filename, rb, (size_t) st.st_size);
 	}
 
 	/* Adjust buffer length for new data and restore trailing-\0 invariant */
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index 41f06bb26b..a4badb90e2 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -504,15 +504,16 @@ make_rfile(char *filename, bool missing_ok)
 static void
 read_bytes(rfile *rf, void *buffer, unsigned length)
 {
-	int			rb = read(rf->fd, buffer, length);
+	ssize_t		rb = read(rf->fd, buffer, length);
 
 	if (rb != length)
 	{
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", rf->filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %d of %u bytes",
-	 rf->filename, rb, length);
+			/* cast length to avoid extra translatable messages */
+			pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
+	 rf->filename, rb, (size_t) length);
 	}
 }
 


Re: Inconsistent printf placeholders

2024-03-14 Thread Peter Eisentraut

On 14.03.24 05:20, Kyotaro Horiguchi wrote:

A recent commit 6612185883 introduced two error messages that are
identical in text but differ in their placeholders.

-   pg_fatal("could not read file \"%s\": read only %d of %d 
bytes",
-filename, (int) rb, (int) st.st_size);
+   pg_fatal("could not read file \"%s\": read only %zd of %lld 
bytes",
+filename, rb, (long long int) 
st.st_size);
...
-   pg_fatal("could not read file \"%s\": read only %d of %d 
bytes",
+   pg_fatal("could not read file \"%s\": read only %d of %u 
bytes",
 rf->filename, rb, length);

I'd be happy if the two messages kept consistency. I suggest aligning
types instead of making the messages different, as attached.


If you want to make them uniform, then I suggest the error messages 
should both be "%zd of %zu bytes", which are the actual types read() 
deals with.






Re: Inconsistent printf placeholders

2024-03-14 Thread Daniel Gustafsson
> On 14 Mar 2024, at 05:20, Kyotaro Horiguchi  wrote:

> I'd be happy if the two messages kept consistency. I suggest aligning
> types instead of making the messages different, as attached.

I've only skimmed this so far but +1 on keeping the messages the same where
possible to reduce translation work.  Adding a comment on the message where the
casting is done to indicate that it is for translation might reduce the risk of
it "getting fixed" down the line.

--
Daniel Gustafsson





Inconsistent printf placeholders

2024-03-13 Thread Kyotaro Horiguchi
Hello.

A recent commit 6612185883 introduced two error messages that are
identical in text but differ in their placeholders.

-   pg_fatal("could not read file \"%s\": read only %d of 
%d bytes",
-filename, (int) rb, (int) st.st_size);
+   pg_fatal("could not read file \"%s\": read only %zd of 
%lld bytes",
+filename, rb, (long long int) 
st.st_size);
...
-   pg_fatal("could not read file \"%s\": read only %d of 
%d bytes",
+   pg_fatal("could not read file \"%s\": read only %d of 
%u bytes",
 rf->filename, rb, length);

I'd be happy if the two messages kept consistency. I suggest aligning
types instead of making the messages different, as attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index 41f06bb26b..e3b8e84289 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -504,15 +504,15 @@ make_rfile(char *filename, bool missing_ok)
 static void
 read_bytes(rfile *rf, void *buffer, unsigned length)
 {
-	int			rb = read(rf->fd, buffer, length);
+	ssize_t		rb = read(rf->fd, buffer, length);
 
 	if (rb != length)
 	{
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", rf->filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %d of %u bytes",
-	 rf->filename, rb, length);
+			pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
+	 rf->filename, rb, (long long int) length);
 	}
 }