Re: [PATCH 0/3]: C N2653 char8_t implementation

2021-06-13 Thread Tom Honermann via Gcc-patches

On 6/11/21 1:27 PM, Joseph Myers wrote:
> On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
> > The option is needed because it impacts core language backward compatibility
> > (for both C and C++, the type of u8 string literals; for C++, the type of u8
> > character literals and the new char8_t fundamental type).
>
> Lots of new features in new standard versions can affect backward
> compatibility.  We generally bundle all of those up into a single -std
> option rather than having an explosion of different language variants with
> different features enabled or disabled.  I don't think this feature, for
> C, reaches the threshold that would justify having a separate option to
> control it, especially given that people can use -Wno-pointer-sign or
> pointer casts or their own local char8_t typedef as an intermediate step
> if they want code using u8"" strings to work for both old and new standard
> versions.

Ok, I'm happy to defer to your experience.  My perspective is likely 
biased by the C++20 changes being more disruptive for that language.

> I don't think u8"" strings are widely used in C library headers in a way
> where the choice of type matters.  (Use of a feature in library headers is
> a key thing that can justify options such as -fgnu89-inline, because it
> means the choice of language version is no longer fully under control of a
> single project.)

That aligns with my expectations.

> The only feature proposed for C2x that I think is likely to have
> significant compatibility implications in practice for a lot of code is
> making bool, true and false into keywords.  I still don't think a separate
> option makes sense there.  (If that feature is accepted for C2x, what
> would be useful is for people to do distribution rebuilds with -std=gnu2x
> as the default to find and fix code that breaks, in advance of the default
> actually changing in GCC.  But the workaround for not-yet-fixed code would
> be -std=gnu11, not a separate option for that one feature.)

Ok, that comparison is helpful.

> > > I think the whole patch series would best wait until after the proposal
> > > has been considered by a WG14 meeting, in addition to not increasing the
> > > number of language dialects supported.
> >
> > As an opt-in feature, this is useful to gain implementation and deployment
> > experience for WG14.
>
> I think this feature is one of the cases where experience in C++ is
> sufficiently relevant for C (although there are certainly cases of other
> language features where the languages are sufficiently different that
> using C++ experience like that can be problematic).
>
> E.g. we didn't need -fdigit-separators for C before digit separators were
> added to C2x, and we don't need -fno-digit-separators now they are in C2x
> (the feature is just enabled or disabled based on the language version),
> although that's one of many features that do affect compatibility in
> corner cases.

Got it, thanks again, that comparison is helpful.

Per this and prior messages, I'll revise the gcc patch series as follows 
(I'll likewise revise the glibc changes, but will detail that in the 
corresponding glibc mailing list thread):


1. Remove the proposed use of -fchar8_t and -fno-char8_t for C code.
2. Remove the updated documentation for the -fchar8_t option since it
   won't be applicable to C code.
3. Remove the _CHAR8_T_SOURCE macro.
4. Enable the change of u8 string literal type based on -std=[gnu|c]2x
   (by setting flag_char8_t if flag_isoc2x is set).
5. Condition the declarations of atomic_char8_t and
   __GCC_ATOMIC_CHAR8_T_LOCK_FREE on _GNU_SOURCE or _ISOC2X_SOURCE.
6. Remove the char8 data member from cpp_options that I had added and
   forgot to remove.
7. Revise the tests and rename them for consistency with other C2x tests.

If I've forgotten anything, please let me know.

Thank you for the thorough review!

Tom.



Re: [PATCH 0/3]: C N2653 char8_t implementation

2021-06-11 Thread Joseph Myers
On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote:

> The option is needed because it impacts core language backward compatibility
> (for both C and C++, the type of u8 string literals; for C++, the type of u8
> character literals and the new char8_t fundamental type).

Lots of new features in new standard versions can affect backward 
compatibility.  We generally bundle all of those up into a single -std 
option rather than having an explosion of different language variants with 
different features enabled or disabled.  I don't think this feature, for 
C, reaches the threshold that would justify having a separate option to 
control it, especially given that people can use -Wno-pointer-sign or 
pointer casts or their own local char8_t typedef as an intermediate step 
if they want code using u8"" strings to work for both old and new standard 
versions.
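
As a rough illustration of those intermediate steps (not taken from the patch series), the sketch below combines a project-local typedef with a pointer cast so that the same source builds whether u8"" has type char[] (C11/C17) or unsigned char[] (as N2653 proposes).  The names utf8_unit, greeting and utf8_byte_length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical project-local stand-in for char8_t.  N2653 proposes
   char8_t as a typedef of unsigned char, so this matches it. */
typedef unsigned char utf8_unit;

/* The cast is needed in C11/C17, where u8"" has type char[], and is a
   harmless no-op if u8"" already has type unsigned char[]. */
const utf8_unit *greeting(void)
{
    return (const utf8_unit *)u8"h\u00e9llo";
}

/* strlen() takes const char *, so cast back at the library boundary. */
size_t utf8_byte_length(const utf8_unit *s)
{
    assert(s != NULL);
    return strlen((const char *)s);
}
```

Without the casts, the old dialect would diagnose the first conversion and the new one the second, which is what -Wno-pointer-sign would otherwise have to silence.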

I don't think u8"" strings are widely used in C library headers in a way 
where the choice of type matters.  (Use of a feature in library headers is 
a key thing that can justify options such as -fgnu89-inline, because it 
means the choice of language version is no longer fully under control of a 
single project.)

The only feature proposed for C2x that I think is likely to have 
significant compatibility implications in practice for a lot of code is 
making bool, true and false into keywords.  I still don't think a separate 
option makes sense there.  (If that feature is accepted for C2x, what 
would be useful is for people to do distribution rebuilds with -std=gnu2x 
as the default to find and fix code that breaks, in advance of the default 
actually changing in GCC.  But the workaround for not-yet-fixed code would 
be -std=gnu11, not a separate option for that one feature.)
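
To make the kind of breakage concrete, here is a minimal sketch (an assumed example, not code from any particular project) of the fix such a rebuild would typically prompt: replacing a home-grown bool definition with <stdbool.h>.  The function name all_ascii is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A pre-C99 idiom such as
       typedef int bool;
       enum { false, true };
   stops compiling once bool, true and false are keywords.  Including
   <stdbool.h> instead works in C99 through C17 and stays valid if the
   names become keywords. */
bool all_ascii(const unsigned char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s >= 0x80)
            return false;   /* byte outside the ASCII range */
    }
    return true;
}
```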

> > I think the whole patch series would best wait until after the proposal
> > has been considered by a WG14 meeting, in addition to not increasing the
> > number of language dialects supported.
> 
> As an opt-in feature, this is useful to gain implementation and deployment
> experience for WG14.

I think this feature is one of the cases where experience in C++ is 
sufficiently relevant for C (although there are certainly cases of other 
language features where the languages are sufficiently different that 
using C++ experience like that can be problematic).

E.g. we didn't need -fdigit-separators for C before digit separators were 
added to C2x, and we don't need -fno-digit-separators now they are in C2x 
(the feature is just enabled or disabled based on the language version), 
although that's one of many features that do affect compatibility in 
corner cases.
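
A small sketch of what "enabled or disabled based on the language version" looks like from the user's side.  The version test is an assumption: it uses the __STDC_VERSION__ value 202311L of the eventually published standard, which postdates this thread, and compilers of the time reported other values for -std=c2x.  The function name one_million is hypothetical.

```c
#include <assert.h>

/* Returns the same value either way; only the spelling of the
   constant depends on the selected language version. */
long one_million(void)
{
    long value;
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    value = 1'000'000L;     /* digit separators accepted by the dialect */
#else
    value = 1000000L;       /* older dialects: no separators */
#endif
    assert(value > 0);
    return value;
}
```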

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 0/3]: C N2653 char8_t implementation

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/7/21 5:03 PM, Joseph Myers wrote:
> On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
> > These changes do not impact default gcc behavior.  The existing -fchar8_t
> > option is extended to C compilation to enable the N2653 changes, and
> > -fno-char8_t is extended to explicitly disable them.  N2653 has not yet been
> > accepted by WG14, so no changes are made to handling of the C2X language
> > dialect.
>
> Why is that option needed?  Normally I'd expect features to be enabled or
> disabled based on the selected language version, rather than having
> separate options to adjust the configuration for one very specific feature
> in a language version.  Adding extra language dialects not corresponding
> to any standard version but to some peculiar mix of versions (such as C17
> with a changed type for u8"", or C2X with a changed type for u8'') needs a
> strong reason for those language dialects to be useful (for example, the
> -fgnu89-inline option was justified by widespread use of GNU-style extern
> inline in headers).


The option is needed because it impacts core language backward 
compatibility (for both C and C++, the type of u8 string literals; for 
C++, the type of u8 character literals and the new char8_t fundamental 
type).


The ability to opt in to or out of the feature eases migration by 
enabling source code compatibility.  C and C++ standards are not 
published at the same cadence.  A project that targets C++20 and C17 may 
therefore need either to opt out of char8_t support on the C++ 
side (already possible via -fno-char8_t) or to opt in to char8_t 
support on the C side until its targets change to C++20(+) 
and C23(+), assuming WG14 approves the proposal at some point.




> I think the whole patch series would best wait until after the proposal
> has been considered by a WG14 meeting, in addition to not increasing the
> number of language dialects supported.


As an opt-in feature, this is useful to gain implementation and 
deployment experience for WG14.


It would be appropriate to document this as an experimental feature 
pending WG14 approval.  If WG14 declines it or approves it with 
different behavior, the feature can then be removed or changed.


The option could also be introduced as -fexperimental-char8_t if that 
eases concerns, though I do not favor that approach due to misalignment 
with the existing option for C++.


Tom.



Re: [PATCH 0/3]: C N2653 char8_t implementation

2021-06-07 Thread Joseph Myers
On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:

> These changes do not impact default gcc behavior.  The existing -fchar8_t
> option is extended to C compilation to enable the N2653 changes, and
> -fno-char8_t is extended to explicitly disable them.  N2653 has not yet been
> accepted by WG14, so no changes are made to handling of the C2X language
> dialect.

Why is that option needed?  Normally I'd expect features to be enabled or 
disabled based on the selected language version, rather than having 
separate options to adjust the configuration for one very specific feature 
in a language version.  Adding extra language dialects not corresponding 
to any standard version but to some peculiar mix of versions (such as C17 
with a changed type for u8"", or C2X with a changed type for u8'') needs a 
strong reason for those language dialects to be useful (for example, the 
-fgnu89-inline option was justified by widespread use of GNU-style extern 
inline in headers).

I think the whole patch series would best wait until after the proposal 
has been considered by a WG14 meeting, in addition to not increasing the 
number of language dialects supported.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 0/3]: C N2653 char8_t implementation

2021-06-06 Thread Tom Honermann via Gcc-patches
This series of patches implements the core language features for the 
WG14 N2653 [1] proposal to provide char8_t support in C.  These changes 
are intended to align char8_t support in C with the support provided in 
C++20 via WG21 P0482R6 [2].


These changes do not impact default gcc behavior.  The existing 
-fchar8_t option is extended to C compilation to enable the N2653 
changes, and -fno-char8_t is extended to explicitly disable them.  N2653 
has not yet been accepted by WG14, so no changes are made to handling of 
the C2X language dialect.
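
For readers who want to see the core change from user code, the following sketch (not part of the patches) uses _Generic to report whether u8 string literals currently have plain char elements, as in C11/C17, or unsigned char elements, as N2653 proposes.  The macro U8_LITERAL_KIND and the function u8_literal_kind are hypothetical names.

```c
#include <assert.h>

/* Evaluates to 1 if u8 string literals have unsigned char elements
   (the N2653 behaviour), 0 if they have plain char elements
   (C11/C17), and -1 for anything else. */
#define U8_LITERAL_KIND             \
    _Generic(&u8""[0],              \
             unsigned char *: 1,    \
             char *: 0,             \
             default: -1)

int u8_literal_kind(void)
{
    int kind = U8_LITERAL_KIND;
    assert(kind == 0 || kind == 1);
    return kind;
}
```

Only the element type differs between the two dialects; the bytes stored in the literal are the same.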


Patch 1: Language support
Patch 2: New tests
Patch 3: Documentation updates

Tom.

[1]: WG14 N2653
 "char8_t: A type for UTF-8 characters and strings (Revision 1)"
 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

[2]: WG21 P0482R6
 "char8_t: A type for UTF-8 characters and strings (Revision 6)"
 https://wg21.link/p0482r6