Re: Unicode string literals
On 4/30/20 2:05 PM, Marc Nieper-Wißkirchen wrote:

>> Could we have a macro to be used only in source code encoded via UTF-8?
>> Presumably the older compilers would process the bytes of the string as if
>> they were individual 8-bit characters and would pass them through unchanged,
>> so the run-time string would be UTF-8 too.
>
> This would allow writing a macro that prefixes "u8" to strings in
> compilers supporting enough of C11, skipping the prefix in compilers
> that pass UTF-8 encoded bytes in strings unchanged

Yes, that was the idea.

> and signal an error
> in all other cases (hopefully only very exotic platforms), right?

I wasn't thinking of requiring a diagnostic of that case, at least not reliably.
Not sure it's worth worrying about.
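The macro discussed above might look roughly like the following. This is only a sketch: SKETCH_UTF8 is a hypothetical name, not an existing Gnulib macro, and the fallback branch assumes both that the source file is UTF-8 encoded and that the older compiler passes string bytes through unchanged. As in the message above, no diagnostic is attempted for other platforms.

```c
#include <stdint.h>

/* SKETCH_UTF8 is a hypothetical name used for illustration only.
   Assumption: this source file is encoded in UTF-8.  */
#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
/* C11 or later: paste the u8 prefix so the compiler emits UTF-8.  */
# define SKETCH_UTF8(s) ((uint8_t const *) u8 ## s)
#else
/* Older compiler: assume the bytes of the literal are passed through
   unchanged, so the run-time string is UTF-8 as well.  */
# define SKETCH_UTF8(s) ((uint8_t const *) (s))
#endif

/* The argument must be a plain string literal, because of the ## paste.  */
static uint8_t const *sketch_name = SKETCH_UTF8 ("Wißkirchen");
```

In either branch the letter ß (U+00DF) should end up as the two bytes 0xC3 0x9F at run time.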
Re: xsize and flexmember
On 4/30/20 2:01 PM, Marc Nieper-Wißkirchen wrote:

>> #define XFLEXSIZEOF_XSIZE(type, member, n) \
>>   (((n) <= FLEXSIZEOF (type, member, n) \
>>     && FLEXSIZEOF (type, member, n) <= (size_t) -1) \
>>    ? (size_t) FLEXSIZEOF (type, member, n) : (size_t) -1)
>
> Why do you write "(n) <= FLEXSIZEOF (type, member, n)" and not "n <
> FLEXSIZEOF (type, member, n)"? In case MEMBER is the first element of
> TYPE, this would not indicate an overflow, would it?

If n == FLEXSIZEOF (type, member, n) then overflow has not occurred, yes. And in
that case the function should yield n. (Admittedly this case would be rare.)

> My idea was:
>
> #define XFLEXSIZEOF_XSIZE(type, member, n) xflexsizeof_xsize_bound(
> FLEXSIZEOF (type, member, n), n)
> static _GL_INLINE size_t xflexsizeof_xsize_bound (uintmax_t m, size_t n)
> {
>   if (n < m && m <= (size_t) -1)
>     return m;
>   else
>     return (size_t) -1;
> }

This would require including stdint.h to get uintmax_t, which adds a dependency.
Also, xflexsizeof_xsize_bound shouldn't be a static function since extern inline
functions can't call static functions, though that should be easy to fix. There's
also the theoretical problem that INTMAX_MAX might be greater than UINTMAX_MAX,
but perhaps we needn't worry about that.

I can see going either way on this. As a macro, FLEXSIZEOF_XSIZE could insist
that its last argument be free of side effects, and that would be simpler on the
implementation. It's an annoying restriction, though.

> maybe FLEXSIZEOF_XSIZE, which would at least drop the
> leading "x" as no error is signaled. :)

Yes, good point.
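To make the "<=" versus "<" point concrete, here is a minimal sketch of the function-based variant using "<=", so that m == n is treated as "no overflow". All SKETCH_ and sketch_ names are hypothetical. SKETCH_FLEXSIZEOF is a simplified stand-in that ignores the alignment rounding done by the real FLEXSIZEOF in Gnulib's flexmember module, and the sketch assumes N has type size_t (or narrower), so it does not address the case where n exceeds SIZE_MAX. N is still expanded twice by the macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example type with a flexible array member.  */
struct sketch_msg
{
  size_t len;
  char data[];
};

/* Simplified stand-in for FLEXSIZEOF: no alignment rounding.
   The sum is computed in size_t, so it wraps around on overflow.  */
#define SKETCH_FLEXSIZEOF(type, member, n) (offsetof (type, member) + (n))

/* Yield M if it is a plausible, representable size for N elements,
   otherwise saturate to SIZE_MAX.  A wrapped sum shows up as M < N.  */
static inline size_t
sketch_flexsizeof_bound (uintmax_t m, size_t n)
{
  return (n <= m && m <= SIZE_MAX) ? (size_t) m : SIZE_MAX;
}

/* N appears twice here, so it must be free of side effects.  */
#define SKETCH_FLEXSIZEOF_XSIZE(type, member, n) \
  sketch_flexsizeof_bound (SKETCH_FLEXSIZEOF (type, member, n), n)
```

With this stand-in, a request for SIZE_MAX elements wraps inside SKETCH_FLEXSIZEOF, fails the n <= m check, and saturates to SIZE_MAX rather than yielding a small bogus size.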
Re: Unicode string literals
On Thu, 30 Apr 2020 at 22:54, Paul Eggert wrote:
>
> On 4/30/20 6:08 AM, Bruno Haible wrote:
> > These not-so-new compilers don't perform
> > character set conversion; you have to provide the numeric value of each
> > byte yourself (either as escapes, or by enumerating the bytes of the
> > string one by one).
>
> Could we have a macro to be used only in source code encoded via UTF-8?
> Presumably the older compilers would process the bytes of the string as if
> they were individual 8-bit characters and would pass them through unchanged,
> so the run-time string would be UTF-8 too.

This would allow writing a macro that prefixes "u8" to strings in
compilers supporting enough of C11, skipping the prefix in compilers
that pass UTF-8 encoded bytes in strings unchanged and signal an error
in all other cases (hopefully only very exotic platforms), right?
Re: Unicode string literals
On 4/30/20 6:08 AM, Bruno Haible wrote:
> These not-so-new compilers don't perform
> character set conversion; you have to provide the numeric value of each
> byte yourself (either as escapes, or by enumerating the bytes of the
> string one by one).

Could we have a macro to be used only in source code encoded via UTF-8?
Presumably the older compilers would process the bytes of the string as if they
were individual 8-bit characters and would pass them through unchanged, so the
run-time string would be UTF-8 too.
Re: xsize and flexmember
On 4/29/20 11:39 PM, Marc Nieper-Wißkirchen wrote:

>> #define XFLEXSIZEOF_XSIZE(type, member, n) \
>>   (((n) <= FLEXSIZEOF (type, member, n) \
>>     && FLEXSIZEOF (type, member, n) <= (size_t) -1) \
>>    ? (size_t) FLEXSIZEOF (type, member, n) : (size_t) -1)
>>
>> A couple of problems with this approach:
>>
>> * It evaluates N more than once.
>
> Couldn't this be solved by calling a static function that would be
> subject to be inlined?

I don't offhand see how to get that to work if n exceeds SIZE_MAX.

> Why would you prefer the (longer) name XFLEXSIZEOF_XSIZE vs XFLEXSIZEOF?

It's specialized for size_t computations, and is not in general suitable for
ptrdiff_t or other types. Also, elsewhere in Gnulib a leading "x" means the
function signals an error if overflow occurs, and that's not what's happening
here. I realize we have dueling conventions here, but would prefer that
saturated size_t arithmetic have a longer prefix or suffix than just "x".
Re: pure and const function attributes
On Wed, 29 Apr 2020 at 18:05, Marc Nieper-Wißkirchen wrote:
>
> Paul Eggert wrote on Wed, 29 Apr 2020, 18:01:
>>
>> On 4/29/20 7:28 AM, Marc Nieper-Wißkirchen wrote:
>> > I am wondering whether it makes sense to add two new modules, named
>> > pure and const that define macros GL_PURE and GL_CONST, respectively
>>
>> There's already _GL_ATTRIBUTE_PURE and _GL_ATTRIBUTE_CONST. Presumably you
>> just want them exposed? (I confess that Emacs already uses the latter)
>
> That would be perfect!

P.S.: It would also be helpful so that warnings coming from
"-Wsuggest-attribute=pure" can be handled for GCC without affecting
other compilers.
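For readers unfamiliar with the two attributes, the following sketch shows the general shape of such macros. The SKETCH_ names are hypothetical and the compiler test is deliberately crude; this is not the definition used by Gnulib's _GL_ATTRIBUTE_PURE and _GL_ATTRIBUTE_CONST, only an illustration of how the attributes can be confined to compilers that understand them.

```c
#include <stddef.h>

/* Hypothetical macros: expand to the GCC attribute where it is
   understood, and to nothing elsewhere.  */
#if defined __GNUC__
# define SKETCH_ATTRIBUTE_PURE __attribute__ ((__pure__))
# define SKETCH_ATTRIBUTE_CONST __attribute__ ((__const__))
#else
# define SKETCH_ATTRIBUTE_PURE
# define SKETCH_ATTRIBUTE_CONST
#endif

/* Pure: no side effects; the result depends only on the arguments
   and on memory that the function reads.  */
static size_t sketch_count_char (char const *s, char c) SKETCH_ATTRIBUTE_PURE;

/* Const: no side effects; the result depends only on the argument
   values, not on any memory.  */
static int sketch_square (int x) SKETCH_ATTRIBUTE_CONST;

static size_t
sketch_count_char (char const *s, char c)
{
  size_t count = 0;
  for (; *s; s++)
    count += (*s == c);
  return count;
}

static int
sketch_square (int x)
{
  return x * x;
}
```

Annotating functions this way is what silences the suggestions from "-Wsuggest-attribute=pure" on GCC, while other compilers see ordinary declarations.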
Re: Unicode string literals
Hi Marc,

> I was hoping that compilers not supporting enough of C11
> would have some other way to translate from the source file encoding
> to UTF-8, which could be exploited by Gnulib.

No, that's not the case. These not-so-new compilers don't perform
character set conversion; you have to provide the numeric value of each
byte yourself (either as escapes, or by enumerating the bytes of the
string one by one).

> > Your best bet is
> > 1) For UTF-8 encoded strings, ensure that your source code is UTF-8
> > encoded, or use escapes, like in gnulib/tests/uniwidth/test-u8-width.c.
>
> Using escapes for non-ASCII characters, it will work whenever the
> execution character set of the compiler is compatible with ASCII,
> right?

The only system where the execution character set is not compatible with
ASCII is z/OS. Daniel Richard G. is our expert regarding this platform.
My understanding is that
  - there are some facilities in the compiler, but we cannot make use of
    them in gnulib,
  - there are some facilities in the run-time library, and Daniel knows
    how to make use of them with gnulib,
  - overall it's case-by-case coding; there's no simple magic wand for it.

> > > for pre-C2x systems would be nice so that ASCII("c") expands into the
> > > ASCII code point of the character `c'.
> >
> > What's the point of this one? Why not just write 'c'?
>
> I was thinking of a system whose execution character set is not
> compatible with ASCII.

You can have a statically allocated translation table from EBCDIC to ASCII
and write a macro that expands to ebcdic_to_ascii['c']. But that will not
be a constant expression. So, e.g. you cannot use this in a 'switch'
statement. And you cannot build a getopt option string from it either.
And so on.

Bruno
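The constant-expression limitation can be illustrated as follows. This is a sketch only: sketch_to_ascii is a hypothetical, deliberately partial table (a real EBCDIC-to-ASCII table would cover all 256 entries), and SKETCH_ASCII is a made-up macro name.

```c
/* Hypothetical partial translation table, indexed by the execution
   character set's code for a character and yielding its ASCII code.
   Only three entries are filled in for illustration.  */
static unsigned char const sketch_to_ascii[256] =
  {
    ['a'] = 0x61,
    ['b'] = 0x62,
    ['c'] = 0x63
  };

/* Expands to an array element, which is NOT an integer constant
   expression in C, even though the array is const.  */
#define SKETCH_ASCII(ch) (sketch_to_ascii[(unsigned char) (ch)])

static int
sketch_classify (unsigned char ascii_code)
{
  /* This works: an ordinary run-time comparison.  */
  if (ascii_code == SKETCH_ASCII ('c'))
    return 1;

  switch (ascii_code)
    {
    /* "case SKETCH_ASCII ('a'):" would be rejected by the compiler,
       because a case label requires an integer constant expression.
       A literal code has to be written instead.  */
    case 0x61:
      return 2;
    default:
      return 0;
    }
}
```

The same restriction is what prevents building a string literal, such as a getopt option string, from such a macro.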
Re: Unicode string literals
Hi Bruno,

thank you very much for your reply.

On Thu, 30 Apr 2020 at 12:06, Bruno Haible wrote:

[...]

> Unfortunately, we cannot provide such macros. The reason is that the
> translation from the source file's encoding to UTF-8/UTF-16/UTF-32 must
> be done by the compiler, if you want to be able to write
>   static uint8_t my_string[] = u8"Wißkirchen";

For a compiler that supports the "u8" prefix, which is defined by C11,
the compiler should do the translation from the source file encoding
to UTF-8. I was hoping that compilers not supporting enough of C11
would have some other way to translate from the source file encoding
to UTF-8, which could be exploited by Gnulib.

> Your best bet is
> 1) For UTF-8 encoded strings, ensure that your source code is UTF-8
> encoded, or use escapes, like in gnulib/tests/uniwidth/test-u8-width.c.

Using escapes for non-ASCII characters, it will work whenever the
execution character set of the compiler is compatible with ASCII,
right?

> 2) For UTF-16 encoded strings, which you'll need only on Windows,
> write L"Wißkirchen". Or use hex codes, like in
> gnulib/tests/uniwidth/test-u16-width.c.
> 3) Don't use UTF-32 encoded strings. Or use hex codes, like in
> gnulib/tests/uniwidth/test-u32-width.c.

These two are less important for me; I mentioned them to have a full
set of macros.

> > Similarly, something like
> >
> > #define ASCII(s) (u8 ## s [0])
> >
> > for pre-C2x systems would be nice so that ASCII("c") expands into the
> > ASCII code point of the character `c'.
>
> What's the point of this one? Why not just write 'c'?

I was thinking of a system whose execution character set is not
compatible with ASCII. Or are those excluded in general by Gnulib?

Thanks again,

Marc
Re: Unicode string literals
Hi Marc,

Marc Nieper-Wißkirchen wrote:
> On a system that supports at least C11, I can create a UTF-8-encoded
> literal string through:
>
> (uint8_t const *) u8"..."
>
> Could Gnulib abstract this into a macro so that substitutes for
> systems that do not have u8 string literals can be provided?
>
> On a C11 system, we would have
>
> #define UTF8(s) ((uint8_t const *) u8 ## s)
>
> and similar definitions for UTF16 and UTF32.

Unfortunately, we cannot provide such macros. The reason is that the
translation from the source file's encoding to UTF-8/UTF-16/UTF-32 must
be done by the compiler, if you want to be able to write
  static uint8_t my_string[] = u8"Wißkirchen";

Your best bet is
1) For UTF-8 encoded strings, ensure that your source code is UTF-8
   encoded, or use escapes, like in gnulib/tests/uniwidth/test-u8-width.c.
2) For UTF-16 encoded strings, which you'll need only on Windows,
   write L"Wißkirchen". Or use hex codes, like in
   gnulib/tests/uniwidth/test-u16-width.c.
3) Don't use UTF-32 encoded strings. Or use hex codes, like in
   gnulib/tests/uniwidth/test-u32-width.c.

> Similarly, something like
>
> #define ASCII(s) (u8 ## s [0])
>
> for pre-C2x systems would be nice so that ASCII("c") expands into the
> ASCII code point of the character `c'.

What's the point of this one? Why not just write 'c'?

Bruno
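As an illustration of option 1 with escapes, the string from the example can be written without any non-ASCII bytes in the source. This is a sketch in the spirit of that advice, not an excerpt from the Gnulib test file; sketch_name_utf8 is a hypothetical name, and the ASCII letters are only correct at run time if the execution character set is ASCII-compatible.

```c
#include <stdint.h>

/* U+00DF (ß) is encoded in UTF-8 as the two bytes 0xC3 0x9F.
   The literal is split right after the escapes so that a following
   character can never be absorbed into the hexadecimal escape.  */
static uint8_t const sketch_name_utf8[] = "Wi\xc3\x9f" "kirchen";
```

The array holds 11 bytes of text plus the terminating NUL, independent of how the source file itself is encoded.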
Unicode string literals
On a system that supports at least C11, I can create a UTF-8-encoded
literal string through:

(uint8_t const *) u8"..."

Could Gnulib abstract this into a macro so that substitutes for
systems that do not have u8 string literals can be provided?

On a C11 system, we would have

#define UTF8(s) ((uint8_t const *) u8 ## s)

and similar definitions for UTF16 and UTF32.

Similarly, something like

#define ASCII(s) (u8 ## s [0])

for pre-C2x systems would be nice so that ASCII("c") expands into the
ASCII code point of the character `c'.
Re: xsize and flexmember
Thank you very much for your quick response!

On Thu, 30 Apr 2020 at 00:39, Paul Eggert wrote:
>
> On 4/29/20 12:29 PM, Marc Nieper-Wißkirchen wrote:
> > It would be great if the flexmember exported another macro, say
> > XFLEXSIZEOF, which returned SIZE_MAX in case of arithmetic overflow.
>
> Something like this?
>
> /* Like FLEXSIZEOF, except yield SIZE_MAX on arithmetic overflow,
>    and N might be evaluated more than once.  */
>
> #define XFLEXSIZEOF_XSIZE(type, member, n) \
>   (((n) <= FLEXSIZEOF (type, member, n) \
>     && FLEXSIZEOF (type, member, n) <= (size_t) -1) \
>    ? (size_t) FLEXSIZEOF (type, member, n) : (size_t) -1)
>
> A couple of problems with this approach:
>
> * It evaluates N more than once.

Couldn't this be solved by calling a static function that would be
subject to be inlined?

> * If the FLEXSIZEOF call appears in a ptrdiff_t context it might not
>   return the right value. ptrdiff_t is also a popular way
>   to compute sizes.

Maybe a warning in the comment above the macro's definition would be enough.

> But perhaps it's good enough.

Why would you prefer the (longer) name XFLEXSIZEOF_XSIZE vs XFLEXSIZEOF?