Re: POSIX gettext() and uselocale()
Geoff Clare wrote:
> The current draft says:
>
> The returned string may be invalidated by a subsequent call to
> bind_textdomain_codeset(), bindtextdomain(), setlocale(), or
> textdomain() in the same process, or a subsequent call to
> uselocale() in the same thread, except for calls that only query
> values.
>
> [...]
>
> > I think that specifying gettext() to be so restricted is not useful.
> > It would make more sense to allow concurrent uselocale() calls.
>
> The current draft text allows concurrent uselocale() calls.

This is better; thanks. Still, I don't think it is sufficient or consistent.

OBJECTION 1: It requires applications to delegate some calls to separate threads.

For example, take an application that regularly updates some UI and also occasionally writes a JSON file.

For the UI updates, it will need to call gettext(). Let's assume that the UI caches the strings that the application passes to it, e.g. for fast re-rendering. This is the typical way a UI is built. E.g.

Gtk+:
    label1 = gtk_label_new (gettext ("Hello, world!"));

Qt:
    label1 = new QLabel (gettext ("Hello, world!"), panel);

For writing data in JSON format [1], it needs to convert
  - strings to UTF-8 encoding,
  - numbers to decimal representation, with '.' as decimal separator.

For converting numbers to decimal, since the standard has strtod() but no strtod_l() [2], the most immediate implementation is to use uselocale() with a "C" locale argument, then call strtod(), then switch back to the previous locale using uselocale().

With the current wording, converting a number to decimal like this may invalidate many of the strings that the UI is holding. Thus, the application will need to move its JSON file writing to a separate thread. This is a big architectural requirement.

OBJECTION 2: It is inconsistent with other parts of POSIX.

For localeconv() [3] the wording is "... might be overwritten by subsequent calls to setlocale() with the categories LC_ALL, LC_MONETARY, or LC_NUMERIC, or by calls to uselocale() which change the categories LC_MONETARY or LC_NUMERIC."

To make things consistent, you would need to change the text for gettext from
  "call to uselocale() in the same thread"
to
  "call to uselocale() in the same thread which changes the category LC_MESSAGES (for gettext(), gettext_l(), dgettext(), dgettext_l()) or, respectively, the category passed to dcgettext(), dcgettext_l()"

Bruno

[1] https://datatracker.ietf.org/doc/html/rfc8259
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/strtod.html
[3] https://pubs.opengroup.org/onlinepubs/9699919799/functions/localeconv.html
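A minimal sketch of the number-conversion step in Objection 1, assuming a POSIX.1-2008 system that provides newlocale(), uselocale() and freelocale(). The helper name format_c_double() and the "%.17g" format are illustrative choices, not part of any proposal. Since the JSON case is about writing numbers, the sketch uses snprintf() for the output direction, whereas the message mentions strtod(); the save/switch/restore pattern is the same. The two uselocale() calls in it are exactly the kind that, under the draft wording, could invalidate strings previously returned by gettext() on the same thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper (not from any standard): format D into BUF the way
   snprintf() would in the "C" locale, i.e. always with '.' as the decimal
   separator, whatever LC_NUMERIC the calling thread currently uses.
   Returns snprintf()'s result, or -1 if the locale switch fails.  */
static int format_c_double(char *buf, size_t size, double d)
{
    /* LC_NUMERIC from "C"; with base (locale_t) 0, the remaining
       categories are taken from the POSIX locale as well.  */
    locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    if (c_loc == (locale_t) 0)
        return -1;

    /* Switch only the calling thread and remember its previous locale
       (possibly LC_GLOBAL_LOCALE).  */
    locale_t old = uselocale(c_loc);
    if (old == (locale_t) 0) {
        freelocale(c_loc);
        return -1;
    }

    /* %.17g prints enough significant digits to round-trip a double.  */
    int n = snprintf(buf, size, "%.17g", d);

    /* Switch back; c_loc is then no longer in use and can be freed.  */
    uselocale(old);
    freelocale(c_loc);
    return n;
}
```

Creating the locale object on every call keeps the sketch short; a program writing many numbers would more likely create it once and reuse it. NaN and infinities have no JSON representation and would need separate handling.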
Re: POSIX gettext() and uselocale()
Bruno Haible wrote, on 16 Jan 2022:
>
> [First sent on 2021-05-03. Resending because it has not been handled.]

It has been handled. This is how I reported the change to austin-group-l on 25th May 2021 (in a reply to Jilles Tjoelker):

| In yesterday's teleconference we updated the proposed text to say
| that the returned string may be invalidated by a subsequent call to
| uselocale() in the same thread (and clarified that for the other
| functions it's a subsequent call in the same process).

> https://posix.rhansen.org/p/gettext_draft
> says (line 358):
>
> "The returned string may be invalidated by a subsequent call to
> bind_textdomain_codeset(), bindtextdomain(), setlocale(),
> textdomain(), or uselocale()."

The current draft says:

    The returned string may be invalidated by a subsequent call to
    bind_textdomain_codeset(), bindtextdomain(), setlocale(), or
    textdomain() in the same process, or a subsequent call to
    uselocale() in the same thread, except for calls that only query
    values.

[...]

> I think that specifying gettext() to be so restricted is not useful.
> It would make more sense to allow concurrent uselocale() calls.

The current draft text allows concurrent uselocale() calls.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: POSIX gettext() and uselocale()
Historically, gettext domains are process-wide, which makes their use in multithreaded applications problematic to begin with. The *_l versions only partially address this.

uselocale() is included in the list for the case where a locale is used both via uselocale() and by one or more of the *_l functions: a second uselocale() call after the retrievals, with a different locale, may cause the memory mapping that many implementations use for .mo files to be released on the next *_l call. Admittedly, it is not the uselocale() call itself that causes these releases (or at least it shouldn't be), but as the root cause, IMHO, it should stay in the list.

On Sun, Jan 16, 2022 at 4:11 PM, Bruno Haible via austin-group-l at The Open Group wrote:
> [First sent on 2021-05-03. Resending because it has not been handled.]
> [...]
> Proposed wording:
>
> "The returned string may be invalidated by a subsequent call to
> bind_textdomain_codeset(), bindtextdomain(), setlocale(), or
> textdomain()."
POSIX gettext() and uselocale()
[First sent on 2021-05-03. Resending because it has not been handled.]

https://posix.rhansen.org/p/gettext_draft
says (line 358):

  "The returned string may be invalidated by a subsequent call to
   bind_textdomain_codeset(), bindtextdomain(), setlocale(),
   textdomain(), or uselocale()."

While in most programs setlocale(), textdomain(), bindtextdomain(), and bind_textdomain_codeset() are called at the beginning of program execution, before any call to gettext(), the situation is very different for uselocale():

1) uselocale() is meant to have effects ONLY on the thread in which it is called.

2) uselocale() is a helper function to implement *_l functions where the POSIX standard does not specify them or the system does not have them.
   For example, when a program wants a function that parses a number, recognizing only the ASCII digits and only '.' as decimal separator, a reliable way to implement it is to call uselocale() with the "C" locale, then strtod(), and then uselocale() again to switch the thread back to the previous locale.
   If POSIX did not have uselocale(), it would need to provide many more *_l functions.

If the gettext() result may be invalidated by a uselocale() call (in any other thread!), this would mean that

  ** Programs can use gettext() or uselocale() but not both. **

and - more or less -

  ** Multithreaded programs that use libraries (that may use uselocale()) cannot use gettext(). **

I think that specifying gettext() to be so restricted is not useful. It would make more sense to allow concurrent uselocale() calls.

Proposed wording:

  "The returned string may be invalidated by a subsequent call to
   bind_textdomain_codeset(), bindtextdomain(), setlocale(), or
   textdomain()."
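A minimal sketch of the parsing function described in point 2), assuming a POSIX.1-2008 system; the name parse_c_double() is hypothetical. Note that strtod() in the "C" locale still accepts forms such as "inf", "nan" and hexadecimal floating constants, so a parser that must recognize only ASCII digits and '.' would need additional checks; the sketch only pins down the decimal separator.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse a number from S as strtod() does in the "C"
   locale, so that '.' is the only decimal separator accepted.  Only the
   calling thread's locale is switched; other threads are unaffected.
   If the locale switch fails, *ENDPTR (if non-null) is set to S and 0.0
   is returned, as for a string that contains no number.  */
static double parse_c_double(const char *s, char **endptr)
{
    locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    if (c_loc != (locale_t) 0) {
        locale_t old = uselocale(c_loc);   /* switch this thread to "C" */
        if (old != (locale_t) 0) {
            double d = strtod(s, endptr);
            uselocale(old);                /* switch back to the previous locale */
            freelocale(c_loc);
            return d;
        }
        freelocale(c_loc);
    }
    if (endptr != NULL)
        *endptr = (char *) s;
    return 0.0;
}
```

For example, parse_c_double("3,25", &end) returns 3.0 and leaves end pointing at the ',', regardless of the thread's LC_NUMERIC. Creating and freeing the locale object on every call keeps the sketch short; a program calling it often would more likely create the "C" locale object once and reuse it.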
Re: POSIX gettext() and uselocale()
Jilles Tjoelker wrote, on 24 May 2021:
>
> On Tue, May 04, 2021 at 01:07:39AM +0200, Bruno Haible via
> austin-group-l at The Open Group wrote:
> > https://posix.rhansen.org/p/gettext_split
> > says (line 92):
>
> > "The returned string may be invalidated by a subsequent call to
> > bind_textdomain_codeset(), bindtextdomain(), setlocale(),
> > textdomain(), or uselocale()."
> [...]
>
> > I think that specifying gettext() to be so restricted is not useful.
> > It would make more sense to allow concurrent uselocale() calls.
>
> > Proposed wording:
>
> > "The returned string may be invalidated by a subsequent call to
> > bind_textdomain_codeset(), bindtextdomain(), setlocale(),
> > or textdomain()."
>
> This may be a bit too weak. Now the implementation can never free a
> string that was returned by a gettext call on a thread with uselocale()
> active, [...]

In yesterday's teleconference we updated the proposed text to say that the returned string may be invalidated by a subsequent call to uselocale() in the same thread (and clarified that for the other functions it's a subsequent call in the same process).

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: POSIX gettext() and uselocale()
On Tue, May 04, 2021 at 01:07:39AM +0200, Bruno Haible via austin-group-l at The Open Group wrote:
> https://posix.rhansen.org/p/gettext_split
> says (line 92):
>
> "The returned string may be invalidated by a subsequent call to
> bind_textdomain_codeset(), bindtextdomain(), setlocale(),
> textdomain(), or uselocale()."
>
> While in most programs setlocale(), textdomain(), bindtextdomain(),
> bind_textdomain_codeset() are being called at the beginning of the
> program execution, before any call to gettext(), the situation is
> very different for uselocale().
>
> 1) uselocale() is meant to have effects ONLY on the thread in which it
>    is called.
>
> 2) uselocale() is a helper function to implement *_l functions where
>    the POSIX standard does not specify them or the system does not have
>    them.
>    For example, when a program wants to have a function to parse
>    a number, recognizing only the ASCII digits and only '.' as decimal
>    separator, a reliable way to implement such a function is by calling
>    uselocale of the "C" locale, strtod(), and then uselocale() again
>    to switch the thread back to the previous locale.
>    If POSIX did not have uselocale(), it would need to provide many
>    more *_l functions.
>
> If the gettext() result may be invalidated by a uselocale() call (in
> any other thread!), this would mean that
>   ** Programs can use gettext() or uselocale() but not both. **
> and - more or less -
>   ** Multithreaded programs that use libraries (that may use uselocale())
>      cannot use gettext(). **
>
> I think that specifying gettext() to be so restricted is not useful.
> It would make more sense to allow concurrent uselocale() calls.
>
> Proposed wording:
>
> "The returned string may be invalidated by a subsequent call to
> bind_textdomain_codeset(), bindtextdomain(), setlocale(),
> or textdomain()."

This may be a bit too weak. Now the implementation can never free a string that was returned by a gettext call on a thread with uselocale() active, while logically the string may be owned by the locale and could be freed if that locale is no longer set on any thread and freelocale() has been called on it as needed.

--
Jilles Tjoelker
POSIX gettext() and uselocale()
https://posix.rhansen.org/p/gettext_split
says (line 92):

  "The returned string may be invalidated by a subsequent call to
   bind_textdomain_codeset(), bindtextdomain(), setlocale(),
   textdomain(), or uselocale()."

While in most programs setlocale(), textdomain(), bindtextdomain(), and bind_textdomain_codeset() are called at the beginning of program execution, before any call to gettext(), the situation is very different for uselocale():

1) uselocale() is meant to have effects ONLY on the thread in which it is called.

2) uselocale() is a helper function to implement *_l functions where the POSIX standard does not specify them or the system does not have them.
   For example, when a program wants a function that parses a number, recognizing only the ASCII digits and only '.' as decimal separator, a reliable way to implement it is to call uselocale() with the "C" locale, then strtod(), and then uselocale() again to switch the thread back to the previous locale.
   If POSIX did not have uselocale(), it would need to provide many more *_l functions.

If the gettext() result may be invalidated by a uselocale() call (in any other thread!), this would mean that

  ** Programs can use gettext() or uselocale() but not both. **

and - more or less -

  ** Multithreaded programs that use libraries (that may use uselocale()) cannot use gettext(). **

I think that specifying gettext() to be so restricted is not useful. It would make more sense to allow concurrent uselocale() calls.

Proposed wording:

  "The returned string may be invalidated by a subsequent call to
   bind_textdomain_codeset(), bindtextdomain(), setlocale(), or
   textdomain()."

Bruno
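Point 1) can be checked with a small C sketch, assuming POSIX threads are available (some systems need -pthread when linking). The function name uselocale_is_per_thread() is hypothetical. It installs a "C" locale object in the calling thread, starts a second thread, and has that thread query its own current locale with uselocale((locale_t) 0), which only queries and changes nothing. A newly created thread is expected to start with the global locale, so it should see LC_GLOBAL_LOCALE rather than the object installed by its creator.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: record this thread's current locale.  Passing
   (locale_t) 0 to uselocale() only queries; it changes nothing.  */
static void *query_locale(void *arg)
{
    locale_t *seen = arg;
    *seen = uselocale((locale_t) 0);
    return NULL;
}

/* Hypothetical check: returns 1 if a thread created while the caller has
   a thread-specific locale installed still sees LC_GLOBAL_LOCALE,
   0 if it sees something else, -1 on setup failure.  */
static int uselocale_is_per_thread(void)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (c_loc == (locale_t) 0)
        return -1;

    locale_t old = uselocale(c_loc);     /* affects this thread only */
    if (old == (locale_t) 0) {
        freelocale(c_loc);
        return -1;
    }

    locale_t seen = (locale_t) 0;
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, query_locale, &seen);
    if (rc == 0)
        pthread_join(tid, NULL);         /* join also makes 'seen' visible */

    uselocale(old);                      /* restore, then release */
    freelocale(c_loc);

    if (rc != 0)
        return -1;
    return seen == LC_GLOBAL_LOCALE ? 1 : 0;
}
```

This only illustrates the per-thread scope of the locale setting; it says nothing about how long gettext() results remain valid, which is the question the proposed wording has to settle.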