Re: POSIX xgettext and dgettext() calls

2022-06-13 Thread Geoff Clare via austin-group-l at The Open Group
Bruno Haible wrote, on 12 May 2022:
>
> https://posix.rhansen.org/p/gettext_draft
> Lines 1173..1179
> 
> > on Solaris, the resulting .po file is called "foobar.po" and contains the 
> > msgid "test".
> 
> Confirmed; it's like this on OmniOS and OpenIndiana.
> 
> > Running it on GNU, the resulting .po file is called "messages.po" and there 
> > is no indication that the msgid belongs to "foobar".
> 
> Confirmed as well. It has been like this since at least version 0.10.40
> from 2001.
> 
> > According to the Li18nux specification, the Solaris behavior is intended.
> 
> Confirmed: LI18NUX 2000 says "msgid strings in dgettext() calls are written
> to the output file domainname.po where domainname is the first parameter to
> the dgettext() call."
> 
> > Why does GNU xgettext deviate?
> 
> I think there are three reasons:
> 
[...]
> 
> Suggestion:
> Mark this case as unspecified.

In today's teleconference we made changes to allow both behaviours,
although we made it implementation-defined rather than unspecified.
(I.e. implementations have to document which way they behave.)

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: POSIX xgettext and dgettext() calls

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft
Lines 1173..1179

> on Solaris, the resulting .po file is called "foobar.po" and contains the 
> msgid "test".

Confirmed; it's like this on OmniOS and OpenIndiana.

> Running it on GNU, the resulting .po file is called "messages.po" and there 
> is no indication that the msgid belongs to "foobar".

Confirmed as well. It has been like this since at least version 0.10.40
from 2001.

> According to the Li18nux specification, the Solaris behavior is intended.

Confirmed: LI18NUX 2000 says "msgid strings in dgettext() calls are written
to the output file domainname.po where domainname is the first parameter to
the dgettext() call."
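
For illustration, the two naming rules can be sketched as below. This
is only a sketch, not code from either implementation: the function
po_file_name() and its per_domain flag are made up for this example,
and "messages" is simply the file name observed with GNU xgettext in
the test above.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from any xgettext implementation.
 * Writes into buf the name of the .po file that would receive a msgid
 * found in a dgettext(domain, msgid) call.
 *   per_domain != 0: LI18NUX 2000 / Solaris rule, "<domain>.po"
 *   per_domain == 0: GNU rule, always "messages.po"
 * Returns 0 on success, -1 if buf is too small. */
int po_file_name(char *buf, size_t size, const char *domain, int per_domain)
{
    int n = snprintf(buf, size, "%s.po", per_domain ? domain : "messages");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With domain "foobar" this yields "foobar.po" under the first rule and
"messages.po" under the second.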

> Why does GNU xgettext deviate?

I think there are three reasons:

(1) Premature standardization: At that time, there was no established
practice regarding how to deal with multiple domains.

The old Uniforum specification pushed for the idea of a multi-domain
PO file, with 'domain' directives; this approach made it hard to
concatenate and manipulate the files.

The LI18NUX 2000 specification pushed for extracting the msgids of
dgettext() calls into a separate .po file per domain.

This did not attain wide use either, because programmers want to
minimize the number of domains: ideally one domain per package. Then
it makes no sense to mention the domain name in hundreds of places in
the source code. The programmer would instead write
  #define _(msgid) dgettext("mydomain", msgid)
and use the _() macro throughout the source code.
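
A minimal sketch of that pattern follows. "mydomain" is a placeholder
domain name, and greeting() is a hypothetical function that exists
only for this example.

```c
#include <libintl.h>

/* "mydomain" is a placeholder; a real package would use its own name. */
#define _(msgid) dgettext("mydomain", msgid)

/* Hypothetical function: every translatable string goes through _(),
 * so the domain name is written only once in the whole package. */
const char *greeting(void)
{
    return _("Hello, world");
}
```

When no message catalog for the domain is found, dgettext() returns
its msgid argument, so greeting() returns the untranslated string.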

(2) Integration into a build system.

The xgettext utility is, in 99% of cases, used as part of a build
system. In a build system, a maintainer wants to have control over the
file names; that is, they don't want files with arbitrary names to
appear. For comparison, have you ever seen a C/C++ compiler create a
separate file for each function/class/template/whatever? No, because
the build systems that people devise for C/C++, with Makefile rules
etc., don't cope well with files with arbitrary names appearing in the
current directory.

(3) Security: When your test program is changed to

#include <stdio.h>
#include <libintl.h>
int main(){
  printf("%s\n",dgettext("../../../../../../tmp/foobar","test"));
}

it does indeed create a file /tmp/foobar.po. In the same way, it can
be made to write into any writable directory on the disk. This is
nowadays considered a security issue, which is why e.g. GNU tar
prohibits extracting files outside of the current directory, since
version 1.30.
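
An implementation that does derive file names from the domain argument
could guard against this with a check along the following lines. This
is a sketch under my own assumptions, not code from any existing
xgettext; the function name domain_is_safe() is made up.

```c
#include <string.h>

/* Hypothetical check, not taken from any xgettext implementation.
 * Returns 1 if domain can be used as the base name of a .po file that
 * stays in the current directory, 0 otherwise. */
int domain_is_safe(const char *domain)
{
    if (domain == NULL || domain[0] == '\0')
        return 0;                 /* no usable file name */
    if (strchr(domain, '/') != NULL)
        return 0;                 /* would name a file in another directory */
    return 1;
}
```

Rejecting every '/' is enough here: without a directory separator,
"<domain>.po" cannot refer to a file outside the current directory.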

Suggestion:
Mark this case as unspecified.

Rationale: I don't think the Li18nux + Solaris behaviour should be
standardized, because of points (2) and (3) above. And I don't think
it's worth standardizing any particular behaviour at all, because of
what I wrote in (1). It's a fringe case no one uses.

Bruno