[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-22 Thread PoYao Chang via Phabricator via cfe-commits
rZhBoYao added a comment.

OK, will do it by the end of this week.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-22 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann requested changes to this revision.
tahonermann added a comment.
This revision now requires changes to proceed.

Given Richard's comments, it seems that changes are needed.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-21 Thread Richard Smith - zygoloid via Phabricator via cfe-commits
rsmith added inline comments.



Comment at: clang/docs/ReleaseNotes.rst:97
 ^
-- Implemented `CWG1473 `_ which allows spaces after 
``operator""``.
-  Clang used to err on the lack of space when the literal suffix identifier 
was invalid in
-  all the language modes, which contradicted the deprecation of the 
whitespaces.
-  Also turn on ``-Wdeprecated-literal-operator`` by default in all the 
language modes.
+- Implemented `CWG1473 `_ allowing lack of space 
after ``operator""``.
+  Clang used to err on the lack of space when the literal suffix identifier 
was invalid,

I don't think this is accurate. Clang supported CWG1473 before these changes, 
as far as I can see: all valid code under CWG1473 was accepted, and invalid 
code was diagnosed (by default). Rather, what has changed is the behavior for 
invalid code: instead of treating an invalid `""blah` as two tokens always, in 
order to accept as much old code as possible, we now treat it as two tokens 
only when `blah` is defined as a macro name.

This is still a breaking change in some cases, for users of 
`-Wno-deprecated-literal-operator`, eg:

```
#define FOO(xyz) ""xyz
```

...  now will be lexed as a single invalid token rather than two tokens.

I'm not sure what the motivation for making changes here was, and D153156's 
description doesn't really help me understand it. Is the goal to improve the 
diagnostic quality for these kinds of errors on invalid code? Is there some 
example for which Clang's behavior with regard to CWG1473 was non-conforming? 
Something else?



Comment at: clang/lib/Lex/Lexer.cpp:2020
+  bool IsLiteralOperator =
+  StringRef(BufferPtr, 2).equals("\"\"") && BufferPtr + 2 == TokStart;
+  if (unsigned TokLen = CurPtr - TokStart;

Reverse the order of these checks to do the cheaper check first and to avoid 
the possibility of reading off the end of the input.



Comment at: clang/lib/Lex/Lexer.cpp:2027-2029
+// However, don't diagnose operator""E(...) even if E is a macro as it
+// results in confusing error messages. Hence, ""E would not be treated as
+// string concat; instead it's a single PP token (as it should be).

That's still a breaking change compared to what we designed 
`-Wno-reserved-literal-operator` to do. How often does it happen in practice 
that someone has both an ill-formed literal operator *and* a macro defined to 
the same name as the suffix?



Comment at: clang/lib/Lex/Lexer.cpp:2035
+IdentifierInfo *II = PP->LookUpIdentifierInfo(Result);
+if (II->hasMacroDefinition()) {
+  Diag(TokStart, LangOpts.MSVCCompat

This doesn't check whether the identifier is currently defined as a macro, in 
the presence of modules. It also won't be correct if the lexer has got 
substantially ahead of the preprocessor, and the `#define` has been lexed but 
not yet preprocessed.

Overall, it's not really possible to tell from here whether an identifier is 
defined as a macro. To do this properly, you should instead produce a single 
token here and teach the preprocessor to consider splitting it into two tokens 
if the suffix is a reserved ud-suffix naming a defined macro. In principle you 
could also check from the preprocessor whether the previous produced token was 
`operator` and use that as part of the decision...



Comment at: clang/test/CXX/drs/dr14xx.cpp:487
 
 namespace dr1473 { // dr1473: 18
// NB: sup 1762, test reused there

I don't think this is correct. As far as I can tell, Clang has correctly 
implemented CWG1473 since version 3.2.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-21 Thread PoYao Chang via Phabricator via cfe-commits
rZhBoYao added inline comments.



Comment at: clang/lib/Lex/Lexer.cpp:1994-2009
   if (!isAsciiIdentifierStart(C)) {
 if (C == '\\' && tryConsumeIdentifierUCN(CurPtr, Size, Result))
   Consumed = true;
 else if (!isASCII(C) && tryConsumeIdentifierUTF8Char(CurPtr))
   Consumed = true;
 else
   return CurPtr;

cor3ntin wrote:
> I missed that in the previous review, is the FIX-IT here still relevant?
Yes, I also put back the fixit test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-21 Thread PoYao Chang via Phabricator via cfe-commits
rZhBoYao updated this revision to Diff 552060.
rZhBoYao marked 5 inline comments as done.
rZhBoYao edited the summary of this revision.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/IdentifierTable.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Sema/SemaDeclCXX.cpp
  clang/test/CXX/drs/dr14xx.cpp
  clang/test/CXX/drs/dr17xx.cpp
  clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
  clang/test/FixIt/fixit-c++11.cpp

Index: clang/test/FixIt/fixit-c++11.cpp
===
--- clang/test/FixIt/fixit-c++11.cpp
+++ clang/test/FixIt/fixit-c++11.cpp
@@ -68,9 +68,9 @@
 }
 
 #define bar "bar"
-const char *p = "foo" bar;
+const char *p = "foo"bar; // expected-error {{requires a space between}}
 #define ord - '0'
-int k = '4' ord;
+int k = '4'ord; // expected-error {{requires a space between}}
 
 void operator"x" _y(char); // expected-error {{must be '""'}}
 void operator L"" _z(char); // expected-error {{encoding prefix}}
Index: clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
===
--- clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
+++ clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
@@ -1,8 +1,8 @@
 // RUN: %clang_cc1 -std=c++11 -verify %s
 
 using size_t = decltype(sizeof(int));
-void operator ""wibble(const char *); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
-void operator ""wibble(const char *, size_t); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
+void operator ""wibble(const char *); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved}}
+void operator ""wibble(const char *, size_t); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved}}
 
 template
 void f() {
Index: clang/test/CXX/drs/dr17xx.cpp
===
--- clang/test/CXX/drs/dr17xx.cpp
+++ clang/test/CXX/drs/dr17xx.cpp
@@ -141,9 +141,14 @@
 namespace dr1762 { // dr1762: 14
// NB: reusing 1473 test
 #if __cplusplus >= 201103L
-  float operator ""_E(const char *);
-  float operator ""E(const char *);
-  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
+#define E "!"
+const char
+  *operator""_E(const char*),
+  *operator""E(const char*), // don't err on the lack of spaces even when the literal suffix identifier is invalid
+  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved}}
+  *s = "not empty"E;
+  // expected-error@-1 {{invalid suffix on literal; C++11 requires a space between literal and a macro}}
+#undef E
 #endif
 }
 
Index: clang/test/CXX/drs/dr14xx.cpp
===
--- clang/test/CXX/drs/dr14xx.cpp
+++ clang/test/CXX/drs/dr14xx.cpp
@@ -487,9 +487,14 @@
 namespace dr1473 { // dr1473: 18
// NB: sup 1762, test reused there
 #if __cplusplus >= 201103L
-  float operator ""_E(const char *);
-  float operator ""E(const char *); // don't err on the lack of spaces even when the literal suffix identifier is invalid
-  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
+#define E "!"
+const char
+  *operator""_E(const char*),
+  *operator""E(const char*), // don't err on the lack of spaces even when the literal suffix identifier is invalid
+  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved}}
+  *s = "not empty"E;
+  // expected-error@-1 {{invalid suffix on literal; C++11 requires a space between literal and a macro}}
+#undef E
 #endif
 }
 
Index: clang/lib/Sema/SemaDeclCXX.cpp
===
--- clang/lib/Sema/SemaDeclCXX.cpp
+++ clang/lib/Sema/SemaDeclCXX.cpp
@@ -16571,8 +16571,7 @@
 //   contain a double underscore __ are reserved for use by C++
 //   implementations.
 Diag(FnDecl->getLocation(), diag::warn_user_literal_reserved)
-<< static_cast(Status)
-<< StringLiteralParser::isValidUDSuffix(getLangOpts(), II->getName());
+<< static_cast(Status);
   }
 
   return false;
Index: clang/lib/Lex/Lexer.cpp
===
--- clang/lib/Lex/Lexer.cpp
+++ clang/lib/Lex/Lexer.cpp
@@ -1986,36 +1986,26 @@
   assert(LangOpts.CPlusPlus);
 
   // Maximally munch an identifier.
+  const char *const TokStart = CurPtr;
   unsigned Size;
   char C = getCharAndSize(CurPtr, Size);
-  bool Consumed = 

[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-21 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:283-284
+def ext_ms_reserved_user_defined_literal : ExtWarn<
+  "invalid suffix on literal; C++11 requires a space between literal and "
+  "a macro">, InGroup;
 def err_unsupported_string_concat : Error<




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-21 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added inline comments.



Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:9322
 def warn_user_literal_reserved : Warning<
-  "user-defined literal suffixes %select{|not starting with 
'_'|containing '__'}0 are reserved"
-  "%select{; no literal will invoke this operator|}1">,
+  "user-defined literal suffixes %select{|not starting with 
'_'|containing '__'}0 are reserved">,
   InGroup;

Can you remove `` and adapt the calling code to adjust the index?



Comment at: clang/lib/Lex/Lexer.cpp:1994-2009
   if (!isAsciiIdentifierStart(C)) {
 if (C == '\\' && tryConsumeIdentifierUCN(CurPtr, Size, Result))
   Consumed = true;
 else if (!isASCII(C) && tryConsumeIdentifierUTF8Char(CurPtr))
   Consumed = true;
 else
   return CurPtr;

I missed that in the previous review, is the FIX-IT here still relevant?



Comment at: clang/lib/Lex/Lexer.cpp:2024
+  // the 'operator""if' defining a numeric literal operator).
+  const unsigned MaxStandardSuffixLength = 3;
+  char Buffer[MaxStandardSuffixLength] = {C};

This sounds brittle. I think we are better off looking ahead to the next 
non-identifier character rather than assuming a size here



Comment at: clang/lib/Lex/Lexer.cpp:2031
+char Next = getCharAndSizeNoWarn(CurPtr + Consumed, NextSize, 
LangOpts);
+if (!isAsciiIdentifierContinue(Next)) {
+  // End of suffix. Check whether this is on the allowed list.

This is also sort of brittle, it assumes standard UDL don't have unicode... but 
that sounds more reasonable today.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

2023-08-20 Thread PoYao Chang via Phabricator via cfe-commits
rZhBoYao created this revision.
rZhBoYao added reviewers: clang-language-wg, aaron.ballman, jyknight.
Herald added a project: All.
rZhBoYao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

As a language extension, if an invalid a UDL's suffix can result in macro
expansion, it is treated as if whitespace preceded it. This allows legacy
code to co-exist with new code via -Wno-reserved-user-defined-literal.
The following code results in string concat not calling literal operator.

  const char* s = "FOO"BAR;

Address comments in D153156 


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158372

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/Lex/Lexer.cpp
  clang/lib/Sema/SemaDeclCXX.cpp
  clang/test/CXX/drs/dr14xx.cpp
  clang/test/CXX/drs/dr17xx.cpp
  clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp

Index: clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
===
--- clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
+++ clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
@@ -1,8 +1,8 @@
 // RUN: %clang_cc1 -std=c++11 -verify %s
 
 using size_t = decltype(sizeof(int));
-void operator ""wibble(const char *); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
-void operator ""wibble(const char *, size_t); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
+void operator ""wibble(const char *); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved}}
+void operator ""wibble(const char *, size_t); // expected-warning {{user-defined literal suffixes not starting with '_' are reserved}}
 
 template
 void f() {
Index: clang/test/CXX/drs/dr17xx.cpp
===
--- clang/test/CXX/drs/dr17xx.cpp
+++ clang/test/CXX/drs/dr17xx.cpp
@@ -143,7 +143,7 @@
 #if __cplusplus >= 201103L
   float operator ""_E(const char *);
   float operator ""E(const char *);
-  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
+  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved}}
 #endif
 }
 
Index: clang/test/CXX/drs/dr14xx.cpp
===
--- clang/test/CXX/drs/dr14xx.cpp
+++ clang/test/CXX/drs/dr14xx.cpp
@@ -489,7 +489,13 @@
 #if __cplusplus >= 201103L
   float operator ""_E(const char *);
   float operator ""E(const char *); // don't err on the lack of spaces even when the literal suffix identifier is invalid
-  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
+  // expected-warning@-1 {{user-defined literal suffixes not starting with '_' are reserved}}
+  const char* s0 = "FOO"BAR;
+  // expected-error@-1 {{no matching literal operator for call to 'operator""BAR' with arguments of types 'const char *' and 'unsigned long', and no matching literal operator template}}
+#define BAR "BAZ"
+  const char* s1 = "FOO"BAR;
+  // expected-error@-1 {{invalid suffix on literal; C++11 requires a space between literal and a macro}}
+#undef BAR
 #endif
 }
 
Index: clang/lib/Sema/SemaDeclCXX.cpp
===
--- clang/lib/Sema/SemaDeclCXX.cpp
+++ clang/lib/Sema/SemaDeclCXX.cpp
@@ -16571,8 +16571,7 @@
 //   contain a double underscore __ are reserved for use by C++
 //   implementations.
 Diag(FnDecl->getLocation(), diag::warn_user_literal_reserved)
-<< static_cast(Status)
-<< StringLiteralParser::isValidUDSuffix(getLangOpts(), II->getName());
+<< static_cast(Status);
   }
 
   return false;
Index: clang/lib/Lex/Lexer.cpp
===
--- clang/lib/Lex/Lexer.cpp
+++ clang/lib/Lex/Lexer.cpp
@@ -1986,6 +1986,7 @@
   assert(LangOpts.CPlusPlus);
 
   // Maximally munch an identifier.
+  const char *TokStart = CurPtr;
   unsigned Size;
   char C = getCharAndSize(CurPtr, Size);
   bool Consumed = false;
@@ -2012,10 +2013,43 @@
   // that does not start with an underscore is ill-formed. We assume a suffix
   // beginning with a UCN or UTF-8 character is more likely to be a ud-suffix
   // than a macro, however, and accept that.
+  bool IsUDSuffix = false;
+  if (!Consumed) {
+if (C == '_')
+  IsUDSuffix = true;
+else if (IsStringLiteral && LangOpts.CPlusPlus14) {
+  // In C++1y, we need to look ahead a few characters to see if this is a
+  // valid suffix for a string literal or a numeric literal (this could be
+  // the 'operator""if' defining a numeric