[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2023-01-03 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman abandoned this revision.
aaron.ballman added a comment.

In D132877#4023269 , @cor3ntin wrote:

> @aaron.ballman I don't think this patch is no longer necessary given that we 
> merged the math profile

Agreed, I'm going to abandon this for now. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2023-01-03 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann added a comment.

> I don't think this patch is no longer necessary given that we merged the math 
> profile

I agree; we can revisit this if complaints from users due to use of characters 
not in the math profile materializes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2023-01-03 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment.

@aaron.ballman I don't think this patch is no longer necessary given that we 
merged the math profile


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment.

@aaron.ballman i am not worried about performance. The old table was fairly 
compact - as it described large ranges - so a drop in the ocean in terms of 
binary size. And we only need to check them for the case where we are only 
going to emit a warning or an error anyway.

See
https://github.com/llvm/llvm-project/commit/4e80636db71a1b6123d15ed1f9eda3979b4292de#diff-c95475d750149050c46892d3a9731fd839737a51a6f19f8e0580acee052e9b26L43


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman marked an inline comment as done.
aaron.ballman added a comment.

In D132877#3756458 , @cor3ntin wrote:

> I am strongly against this change. If we do want to do this, we need to 
> restore the original tables from c++11 and check against them not to extend 
> the set of characters to something much wider than was supported before 
> c++23. It should be fairly easy to extract those tables from the git history. 
> (Sorry for the very short comment, I'm currently traveling). A warning for 
> non characters, controls, pua characters and everything not previously 
> accepted isn't sufficient. A warning for things in the old set but not 23 is 
> fine as long as, as other have said, we have a depreciation period.

Thanks for this feedback! Ugh, I didn't realize we were going to need to keep 
around both sets of tables, that potentially adds more expense to this option 
than I originally realized, depending on how much extra binary size the tables 
add.

In D132877#3756475 , @efriedma wrote:

> I'm not really a fan of the argument of "this was accepted as a DR, therefore 
> new versions of clang have to reject code that old versions accepted".  
> Regardless of what the C++ committee does, our commitment to our users is to 
> ensure that code written against "-std=c++11" continues to compile in newer 
> versions of clang.

This is the status quo in Clang. When one of the committees says something is a 
DR, the expectation users have is that the feature was always specified to 
behave in the fixed way. This case is a bit special though because the old 
rules were approximately 30+ years old and were "use what you like so long as 
it's not punctuation, yolo" and the new rules are more restrictive. I think our 
commitment  is a bit weaker than "ensure code continues to compile." We've 
never promised we wouldn't fix bugs against older standards, and fixing bugs 
can break otherwise working code. (And WG21 considers this a bug, hence the DR 
designation.) However, you're definitely right that we're not *obligated* to 
make the change in older language modes just because the committee said it's a 
DR. We are obligated to implement the functionality in a conforming manner in 
newer language modes though.

It's worth noting that the code which is broken by this is largely within one 
domain (use of math symbols), so the number of users impacted is believed to be 
relatively small (small enough that we were considering closing the issue as 
Won't Fix instead of making this change). The reason I decided to propose a 
relaxation here was to give the handful of impacted users an easier upgrade 
path.

> I don't want to relitigate the whole discussion from the bug, though; just 
> wanted to make sure you considered the possibilities. If you think this is 
> best, I won't object.

Your opinions are very much appreciated on the topic, it's not one with a slam 
dunk answer (at least that we've found yet).

In D132877#3756415 , @thieta wrote:

> Hmm. 15.0.0 is just a week away - I am not planning any more RCs unless we 
> hit something critical. What's the risk of taking this specific change at 
> this point? Would it make more sense to wait for 15.0.1? (I am guessing it's 
> better if it goes into 15.0.0 or not in 15.x at all).
>
> I am just pushing back because I don't want to introduce to much risk this 
> late in the cycle - but I will defer to you guys since you are the domain 
> experts.

Given the feedback from Corentin above, I think the window for getting this 
into Clang 15 is effectively closed. It's not critical (it doesn't fix a crash 
or a regression); I was mostly hoping it would make it in 15.0 because it 
doesn't seem appropriate to introduce a new warning flag in a dot release (that 
would seem to make build systems harder to work with in practice).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann added a comment.

I don't have a strong opinion regarding when, or if, the diagnostic is reverted 
to an always-error. It looks like gcc is not even planning to diagnose 
identifiers that are ill-formed according to the new rules by default.

With regard to Corentin's opposition to the patch as is, I think it would be 
acceptable to put this patch in Clang 15 as is and then tighten things up as 
Corentin suggests for Clang 16. The patch as is is very low risk (it barely 
even qualifies as a code change). any change to address Corentin's concerns 
would add some additional risk this late in the release, and the only people 
impacted are those that opt-in to the new option. I do agree with Corentin's 
concern that suppressing the error does look like it would have the effect of 
allowing identifiers that were not previously allowed under the immutable 
identifier syntax; I think we do want to ensure that identifiers that are in 
neither immutable identifier syntax nor default identifier syntax are diagnosed 
(at least in Clang 16).




Comment at: clang/docs/ReleaseNotes.rst:119-121
+- Added the ``-Winvalid-identifier-character`` warning group to identify
+  identifier characters which are invalid according to UAX31 but were
+  previously allowed. This warning defaults to an error because these

Most people reading this won't know what UAX31 is. Linking to it would help 
some, but the document is fairly inscrutable to the uninitiated. How about 
something like the following?
  The ``-Winvalid-identifier-character`` warning group was added to manage 
diagnostics
  regarding use of invalid identifiers following the adoption of N2836 and 
P1949 by the
  C and C++ committees respectively. This warning default to an error because 
...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added a comment.

I'm not really a fan of the argument of "this was accepted as a DR, therefore 
new versions of clang have to reject code that old versions accepted".  
Regardless of what the C++ committee does, our commitment to our users is to 
ensure that code written against "-std=c++11" continues to compile in newer 
versions of clang.

I don't want to relitigate the whole discussion from the bug, though; just 
wanted to make sure you considered the possibilities.  If you think this is 
best, I won't object.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment.

I am strongly against this change. If we do want to do this, we need to restore 
the original tables from c++11 and check against them not to extend the set to 
something much wider than was supported before c++23. It should be fairly easy 
to extract those tables from the git history.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Tobias Hieta via Phabricator via cfe-commits
thieta added a comment.

Hmm. 15.0.0 is just a week away - I am not planning any more RCs unless we hit 
something critical. What's the risk of taking this specific change at this 
point? Would it make more sense to wait for 15.0.1? (I am guessing it's better 
if it goes into 15.0.0 or not in 15.x at all).

I am just pushing back because I don't want to introduce to much risk this late 
in the cycle - but I will defer to you guys since you are the domain experts.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman marked an inline comment as done.
aaron.ballman added a comment.

In D132877#3756328 , @efriedma wrote:

> If we don't specifically call out the deprecation period in the diagnostic, 
> usage will grow beyond what we expect.  (Most people don't read the release 
> notes; they'll just see it's possible to turn off the error message, and turn 
> it off.)

I'm okay with that approach (I definitely agree putting the info in the 
diagnostic will get more eyeballs on it than putting it in the release notes), 
but at the same time, we use downgradable errors with the intent to turn them 
into a hard error in the future somewhat frequently and don't add any notice of 
pending doom for them. These diagnostics are all exposed in a way that makes it 
clear the code is an error, so anyone who blanket disables the error should not 
be too terribly surprised when it becomes a hard error later. (Putting the 
deprecation message into the warning itself has the same problem -- users who 
disable the warning entirely won't remember what the text of the warning was 
anyway, and users who inherit build system flags from may never even see the 
diagnostic messages at all.) So personally, I'm also fine not rewording the 
diagnostic.

> If we're okay with people deciding their own rules for what they want to 
> allow in identifiers, we can just make the flag permanently available, 
> though, and just drop the whole "deprecation period" discussion.  Or 
> alternatively, we could add a separate diagnostic specifically for older 
> standard modes: allow only the characters that were allowed by the older 
> standards, and don't emit it for C++23 and newer.  That way, usage naturally 
> goes away as people upgrade their code to newer standard modes.

There's quite a bit of discussion on the issue thread that goes into this, but 
there's pretty strong sentiment from Clang maintainers against making the flag 
permanently available. This behavior is controlled by standards bodies and 
allowing users to ignore that effectively makes a custom language subset (goes 
against our usual policy for language extensions). As Erich mentions, this was 
adopted as a DR in C++ so there's no older standards there. WG14 doesn't have 
the notion of DRs any longer and our replacement mechanism wasn't in place when 
the paper was adopted into C23. Technically that makes it a C23-only change, 
but I don't see any value to making this a DR in C++ but not in C given that 
both languages implement the same identifier rules and there's a reasonable 
user expectation that a valid identifier in C is also valid in C++ and vice 
versa (otherwise you run into problems linking C and C++ together).




Comment at: clang/docs/ReleaseNotes.rst:128
+  advised because we intend to turn the warning back into a hard error in Clang
+  18 after giving users a chance to update their code.
 

efriedma wrote:
> If this makes it into clang 15, we don't want to relnote it for clang 16.
Agreed -- I added the release note so folks could review it, but the plan is to 
either add it directly to the 15.x release notes or if we don't want to 
backport they'll be typical Clang 16 release notes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added a comment.

In D132877#3756328 , @efriedma wrote:

> If we don't specifically call out the deprecation period in the diagnostic, 
> usage will grow beyond what we expect.  (Most people don't read the release 
> notes; they'll just see it's possible to turn off the error message, and turn 
> it off.)
>
> If we're okay with people deciding their own rules for what they want to 
> allow in identifiers, we can just make the flag permanently available, 
> though, and just drop the whole "deprecation period" discussion.  Or 
> alternatively, we could add a separate diagnostic specifically for older 
> standard modes: allow only the characters that were allowed by the older 
> standards, and don't emit it for C++23 and newer.  That way, usage naturally 
> goes away as people upgrade their code to newer standard modes.

IIRC, this was accepted as DR, so there are NO 'older standards' of C++ in 
which allow these characters.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added a comment.

If we don't specifically call out the deprecation period in the diagnostic, 
usage will grow beyond what we expect.  (Most people don't read the release 
notes; they'll just see it's possible to turn off the error message, and turn 
it off.)

If we're okay with people deciding their own rules for what they want to allow 
in identifiers, we can just make the flag permanently available, though, and 
just drop the whole "deprecation period" discussion.  Or alternatively, we 
could add a separate diagnostic specifically for older standard modes: allow 
only the characters that were allowed by the older standards, and don't emit it 
for C++23 and newer.  That way, usage naturally goes away as people upgrade 
their code to newer standard modes.




Comment at: clang/docs/ReleaseNotes.rst:128
+  advised because we intend to turn the warning back into a hard error in Clang
+  18 after giving users a chance to update their code.
 

If this makes it into clang 15, we don't want to relnote it for clang 16.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added subscribers: thieta, tstellar.
aaron.ballman added a comment.

CC @tstellar and @thieta for any early backporting concerns.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132877/new/

https://reviews.llvm.org/D132877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132877: Downgrade the UAX31 diagnostic to a warning which defaults to an error

2022-08-29 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman created this revision.
aaron.ballman added reviewers: tahonermann, cor3ntin, clang-language-wg.
Herald added a project: All.
aaron.ballman requested review of this revision.
Herald added a project: clang.

WG21 P1949  and WG14 N2836 were both adopted 
without any deprecation period for users who were making use of now-invalid 
characters in their identifiers. This caused some amount of pain for people, 
especially folks using mathematical symbols in their identifiers, which makes 
it hard to upgrade to newer Clang versions with this diagnostic as an error.

This patch downgrades the error to be a warning which defaults to an error. 
This continues to signal to users that the identifiers are in fact invalid, but 
it gives a grace period for people to update their code bases to the new rules 
(or, alternatively, time for the Unicode consortium to determine whether some 
of these characters should be allowed in an identifier in the immutable set). I 
am tentatively documenting that grace period as being until we ship Clang 18 
(which should give users a year or two to migrate their code).

Note, because we implemented the UAX31 rules in Clang 14, I would like to try 
to land this patch and backport it to Clang 15 so that we avoid having *two* 
releases with no easy migration path for this subset of users. I think this is 
reasonable because we're not altering the implementation logic at all, just 
modifying what diagnostic kind is emitted by default. If someone has a concern 
about wanting to cherry pick this so late in the cycle, please share those 
concerns ASAP.

Fixes #54732


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D132877

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/test/Preprocessor/utf8-allowed-chars.c

Index: clang/test/Preprocessor/utf8-allowed-chars.c
===
--- clang/test/Preprocessor/utf8-allowed-chars.c
+++ clang/test/Preprocessor/utf8-allowed-chars.c
@@ -1,58 +1,32 @@
-// RUN: %clang_cc1 %s -fsyntax-only -std=c99 -verify
-// RUN: %clang_cc1 %s -fsyntax-only -std=c11 -Wc99-compat -verify
-// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++03 -Wc++11-compat -verify
-// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++11 -Wc++98-compat -verify
+// RUN: %clang_cc1 %s -fsyntax-only -std=c99 -verify=expected,c99
+// RUN: %clang_cc1 %s -fsyntax-only -std=c11 -Wc99-compat -verify=expected,c11
+// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++03 -Wc++11-compat -verify=expected,cpp
+// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++11 -Wc++98-compat -verify=expected,cpp
+// RUN: %clang_cc1 %s -fsyntax-only -Wno-invalid-identifier-character -std=c2x -verify=off
+// RUN: %clang_cc1 %s -fsyntax-only -Wno-invalid-identifier-character -x c++ -std=c++2b -verify=off
 
 // Note: This file contains Unicode characters; please do not remove them!
 
 // Identifier characters
-extern char aǶ; // C11, C++11
+extern char aǶ; // c11-warning {{using this character in an identifier is incompatible with C99}} \
+   c99-error {{not allowed in an identifier}}
 extern char aª; // C99, C11, C++11
-extern char a΄; // C++03, C11, C++11
+extern char a΄; // cpp-error {{not allowed in an identifier}} \
+   c11-warning {{using this character in an identifier is incompatible with C99}} \
+   c99-error {{not allowed in an identifier}}
 extern char a๐; // C99, C++03, C11, C++11
-extern char a﹅; // none
-extern char x̀; // C11, C++11. Note that this does not have a composed form.
-
-
+extern char a﹅; // expected-error {{not allowed in an identifier}}
 
+// Note that this does not have a composed form.
+extern char x̀; // c11-warning {{using this character in an identifier is incompatible with C99}} \
+   c99-error {{not allowed in an identifier}}
 
 // Identifier initial characters
-extern char ๐; // C++03, C11, C++11
-extern char ̀; // disallowed initially in C11/C++, always in C99
-
-
-
-
-#if __cplusplus
-// expected-error@11 {{not allowed in an identifier}}
-// expected-error@13 {{not allowed in an identifier}}
-// expected-error@20 {{character  not allowed at the start of an identifier}}
-// expected-error@21 {{character  not allowed at the start of an identifier}}
-// expected-warning@20 {{declaration does not declare anything}}
-// expected-warning@21 {{declaration does not declare anything}}
-
-#else
-# if __STDC_VERSION__ >= 201112L
-// C11
-// expected-warning@9 {{using this character in an identifier is incompatible with C99}}
-// expected-warning@11 {{using this character in an identifier is incompatible with C99}}
-// expected-error@13 {{not allowed in an identifier}}
-// expected-warning@14 {{using this character in an identifier is incompatible with C99}}
-// expected-warning@20 {{starting an identifier with this character is incompatible with C99}}
-//