Re: Unicode string literals

2020-05-01 Daniel Richard G.
Hi everyone, I've been watching this discussion. On Fri, 2020 May 1 18:52-04:00, Bruno Haible wrote: > > Yes, this is unlikely. In a world where people routinely do a "git pull" from > upstream repositories and send patches or pull requests upstream, every > automated downstream manipulation of

2020-05-01 Bruno Haible
Paul Eggert wrote: > I was thinking about the case where one develops and normally builds on > systems > that assume UTF-8 source code (perhaps because a build system is old and just > compiles the bytes unchecked), but that on occasion a builder might translate > all the source code to (say)
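One way to make the failure Paul describes loud rather than silent is a compile-time check that a non-ASCII literal still has its UTF-8 length. This is only a sketch under stated assumptions: the file contains the character ü stored as UTF-8, and the compiler copies those bytes into the narrow literal unchanged. If a builder transcodes the file to a single-byte encoding (ISO-8859-1 is used here purely as an example) without telling the compiler, `sizeof "ü"` becomes 2 instead of 3 and compilation fails. The names `source_must_be_utf8` and `source_literal_is_utf8` are hypothetical, and the check cannot catch every possible transcoding.

```c
/* Compile-time guard that works even before C11's _Static_assert:
   an array type with negative size is a constraint violation.
   Assumption: the literal holds U+00FC written in UTF-8, i.e. the bytes
   0xC3 0xBC plus the terminating NUL, giving sizeof == 3. */
typedef char source_must_be_utf8[(sizeof "ü" == 3) ? 1 : -1];

/* Runtime view of the same bytes, convenient for a test suite. */
int source_literal_is_utf8(void)
{
  static unsigned char const u[] = "ü";
  return u[0] == 0xC3 && u[1] == 0xBC && u[2] == 0;
}
```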

2020-05-01 Paul Eggert
On 5/1/20 2:01 AM, Bruno Haible wrote: > Did you mean (1) that the programmer shall define a macro, that indicates that > their source code is UTF-8 encoded? > > Or did you mean (2) that gnulib shall define a macro, that shall _assume_ that > the source code is UTF-8 encoded, and then expand to

2020-05-01 Bruno Haible
Hi Paul, > >> Could we have a macro to be used only in source code encoded via UTF-8? > >> Presumably the older compilers would process the bytes of the string as if > >> they > >> were individual 8-bit characters and would pass them through unchanged, so > >> the > >> run-time string would be

2020-04-30 Paul Eggert
On 4/30/20 2:05 PM, Marc Nieper-Wißkirchen wrote: >> Could we have a macro to be used only in source code encoded via UTF-8? >> Presumably the older compilers would process the bytes of the string as if >> they >> were individual 8-bit characters and would pass them through unchanged, so >> the

2020-04-30 Marc Nieper-Wißkirchen
On Thu, 30 Apr 2020 at 22:54, Paul Eggert wrote: > > On 4/30/20 6:08 AM, Bruno Haible wrote: > > These not-so-new compilers don't perform > > character set conversion; you have to provide the numeric value of each > > byte yourself (either as escapes, or by enumerating the bytes of the > >

2020-04-30 Paul Eggert
On 4/30/20 6:08 AM, Bruno Haible wrote: > These not-so-new compilers don't perform > character set conversion; you have to provide the numeric value of each > byte yourself (either as escapes, or by enumerating the bytes of the > string one by one). Could we have a macro to be used only in source
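A minimal sketch of the macro Paul has in mind could look like this. `UTF8_LITERAL` and `sample_utf8` are hypothetical names, not existing Gnulib identifiers. The C11 branch lets the compiler guarantee UTF-8 by pasting the `u8` prefix onto the literal; the fallback branch is correct only under Paul's assumption that the file is UTF-8 encoded and that the older compiler passes the bytes of the literal through unchanged.

```c
#include <stdint.h>

#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
/* C11 and later: u8 ## "..." forms a u8"..." literal, so the compiler
   itself encodes the string as UTF-8. */
# define UTF8_LITERAL(s) ((uint8_t const *) u8 ## s)
#else
/* Older compilers (assumption): the source file is UTF-8 and the bytes are
   copied unchanged. This silently breaks if the file was transcoded. */
# define UTF8_LITERAL(s) ((uint8_t const *) (s))
#endif

/* For the fallback branch, the literal must be typed in UTF-8 in this file. */
uint8_t const *sample_utf8(void)
{
  return UTF8_LITERAL("ü");
}
```

Passing only string literals to the macro keeps both branches valid, because the `##` paste needs a literal as its right-hand operand.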

2020-04-30 Bruno Haible
Hi Marc, > I was hoping that compilers not supporting enough of C11 > would have some other way to translate from the source file encoding > to UTF-8, which could be exploited by Gnulib. No, that's not the case. These not-so-new compilers don't perform character set conversion; you have to
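On such compilers the route Bruno mentions is to spell out every byte. The sketch below writes the UTF-8 encoding of "ü" (U+00FC, bytes 0xC3 0xBC) once with hexadecimal escapes and once as an enumerated byte array; the variable and function names are made up for illustration.

```c
#include <string.h>

/* Two byte-exact spellings of U+00FC in UTF-8. Neither relies on the
   compiler converting between character sets, so C89 compilers accept both. */
char const u_umlaut_escaped[] = "\xC3\xBC";               /* hex escapes */
unsigned char const u_umlaut_bytes[] = { 0xC3, 0xBC, 0 }; /* enumerated; NUL added by hand */

/* Hex escapes are greedy: "\xC3\xBCber" would swallow the hex digits "be".
   Splitting the literal ends the escape; adjacent literals are concatenated
   after escape sequences have been processed. */
char const ueber[] = "\xC3\xBC" "ber";

int u_umlaut_spellings_agree(void)
{
  return sizeof u_umlaut_escaped == sizeof u_umlaut_bytes
         && memcmp(u_umlaut_escaped, u_umlaut_bytes, sizeof u_umlaut_bytes) == 0;
}
```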

2020-04-30 Marc Nieper-Wißkirchen
Hi Bruno, thank you very much for your reply. On Thu, 30 Apr 2020 at 12:06, Bruno Haible wrote: [...] > Unfortunately, we cannot provide such macros. The reason is that the > translation from the source file's encoding to UTF-8/UTF-16/UTF-32 must > be done by the compiler, if you want
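To illustrate that this encoding step belongs to a C11 compiler, the sketch below writes the same character, U+00FC, as UTF-8, UTF-16 and UTF-32 literals. It assumes C11 with `<uchar.h>`; the universal character name `\u00fc` is used so the result does not depend on the source file's own encoding, and the array names are hypothetical.

```c
#include <uchar.h>

/* Each prefix asks the compiler for a different encoding of U+00FC:
   u8 -> UTF-8 code units, u -> char16_t (UTF-16 when __STDC_UTF_16__ is
   defined), U -> char32_t (UTF-32 when __STDC_UTF_32__ is defined).
   All sizes include the terminating NUL unit. */
unsigned char const utf8[] = u8"\u00fc"; /* 0xC3 0xBC 0x00 */
char16_t const utf16[] = u"\u00fc";      /* 0x00FC 0x0000 */
char32_t const utf32[] = U"\u00fc";      /* 0x000000FC 0x00000000 */
```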

2020-04-30 Bruno Haible
Hi Marc, Marc Nieper-Wißkirchen wrote: > On a system that supports at least C11, I can create a UTF-8-encoded > literal string through: > > (uint8_t const *) u8"..." > > Could Gnulib abstract this into a macro so that substitutes for > systems that do not have u8 string literals can be
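For reference, the C11 idiom Marc quotes can be exercised like this. It is a minimal sketch assuming a C11 (or later) compiler; `utf8_greeting` and `utf8_greeting_len` are hypothetical helpers, and the universal character names `\u00fc` (ü) and `\u00df` (ß) keep the example independent of how the file itself is encoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns "grüße" as UTF-8 bytes. A u8"..." literal is always UTF-8
   encoded in C11, whatever the execution character set is. */
uint8_t const *utf8_greeting(void)
{
  return (uint8_t const *) u8"gr\u00fc\u00dfe";
}

/* Length in bytes, not characters: ü and ß each take two bytes in UTF-8
   (0xC3 0xBC and 0xC3 0x9F), so the five characters occupy seven bytes. */
size_t utf8_greeting_len(void)
{
  return strlen((char const *) utf8_greeting());
}
```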