[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-10-25 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added inline comments.



Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:46
+  auto SecondSemiPos = Line.find(';', FirstSemiPos + 1);
+  if (FirstSemiPos == std::string::npos)
+continue;

RKSimon wrote:
> @cor3ntin Should this be SecondSemiPos ?
> 
> Report here: https://pvs-studio.com/en/blog/posts/cpp/1003/ (N39)
Yes. Nice catch.
There is no case of that happening in the file so it never manifested.
I'll push a commit fixing it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-10-25 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added inline comments.



Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:46
+  auto SecondSemiPos = Line.find(';', FirstSemiPos + 1);
+  if (FirstSemiPos == std::string::npos)
+continue;

@cor3ntin Should this be SecondSemiPos ?

Report here: https://pvs-studio.com/en/blog/posts/cpp/1003/ (N39)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-25 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment.

@aaron.ballman @tahonermann Thanks for the review. I landed the change after 
confirming with Aaron he was happy with it


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-25 Thread Corentin Jabot via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc92056d03881: [Clang][C++23] P2071 Named universal character 
escapes (authored by cor3ntin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-24 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann accepted this revision as: tahonermann.
tahonermann added a comment.
This revision is now accepted and ready to land.

> @tahonermann gentle ping (Aaron told me you might have further comments)

I'm sorry for the delay. I ran out of time to do the thorough review I would 
have liked to do, but I did scroll through everything now and did not find 
anything concerning; Aaron clearly conducted a thorough review already. It 
looks great to me, really nice work!




Comment at: clang/lib/Lex/Lexer.cpp:3255-3260
+if (!isAlphanumeric(C) && C != '_' && C != '-' && C != ' ')
+  break;
+
+if ((C < 'A' || C > 'Z') && !llvm::isDigit(C) && C != ' ' && C != '-') {
+  Invalid = true;
+}

cor3ntin wrote:
> tahonermann wrote:
> > It isn't clear to me why there are two separate `if` statements here. I 
> > would expect the following to suffice. If I'm misunderstanding something, 
> > then comments might be helpful.
> >   if (!isUppercase(C) && !isDigit(C) && C != '-' && C != ' ') {
> > Invalid = true;
> > break;
> >   }
> > 
> > I'm not sure why there is a test for '_' since that character is not 
> > present in the grammar for `n-char` in P2071.
> > 
> > Is it intentional that there is no `break` statement in the second `if` 
> > statement?
> I improved that, what it does should be more clear now. More importantly, I 
> added a diagnostic note when we detect a loose match.
> 
> We allow `_` because  Unicode does.
> We first perform a strict match - which fails as no Unicode name contains an 
> underscore, we emit a diagnostic, and then we try a loose matching which does 
> allow `_`.
> This enable us to produces better diagnostics
> 
> ```
> :2:18: error: 'GREEK_SMALL_LETTER-OMICRON' is not a valid Unicode 
> character name
> const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break 
> space}"; 
>  ^
> :2:20: note: characters names in Unicode escape sequences are 
> sensitive to case and whitespaces
> const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break 
> space}"; 
>^~
>GREEK SMALL LETTER OMICRON
> :2:54: error: 'zero width no break space' is not a valid Unicode 
> character name
> const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break 
> space}"; 
>  ^
> :2:54: note: characters names in Unicode escape sequences are 
> sensitive to case and whitespaces
> const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break 
> space}"; 
>  ^
>  ZERO WIDTH NO-BREAK 
> SPACE```
> 
That's great! Very nice!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-23 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment.

@tahonermann gentle ping (Aaron told me you might have further comments)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-15 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment.

In D123064#3585074 , @aaron.ballman 
wrote:

> I had a discussion about the license file on IRC (but not with @tstellar) and 
> the thinking there was: add the license file. It seems to either be required 
> or would be harmless to add. So if we don't hear something different from 
> @tstellar on this, I think you should add the contents of 
> https://www.unicode.org/license.txt to the top of the generated content for 
> `UnicodeNameToCodepointGenerated.cpp` so that the license information stays 
> with the derived data to which it applies.

After more chat with Aaron, I added the license in the generated file, in 
addition of the llvm license header so there is no confusion, This seems to be 
consistent with other third party files like the regex support files.
In the long run, i think we should consolidate all the tools handling unicode 
data in a consistent place, at which point we could commit a license file 
there, along with the data files themselves.




Comment at: clang/lib/Lex/LiteralSupport.cpp:509-511
+if (std::max(Distance, Match.Distance) -
+std::min(Distance, Match.Distance) >
+3)

aaron.ballman wrote:
> 
They are unsigned so that wouldn't work!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-15 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 437186.
cor3ntin added a comment.

Fix wrapping in file header


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-15 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 437184.
cor3ntin added a comment.

Add unicode license notice to generated file derived fromn the unicode data


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-15 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 437160.
cor3ntin marked 17 inline comments as done.
cor3ntin added a comment.

Address more style issues found by Aaron


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-15 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

I had a discussion about the license file on IRC (but not with @tstellar) and 
the thinking there was: add the license file. It seems to either be required or 
would be harmless to add. So if we don't hear something different from 
@tstellar on this, I think you should add the contents of 
https://www.unicode.org/license.txt to the top of the generated content for 
`UnicodeNameToCodepointGenerated.cpp` so that the license information stays 
with the derived data to which it applies.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-15 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: clang/lib/Lex/LiteralSupport.cpp:509-511
+if (std::max(Distance, Match.Distance) -
+std::min(Distance, Match.Distance) >
+3)




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-15 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

Some likely final coding style nits. The only thing of substance I'm waiting on 
is an answer to whether we need to update a license file in order to comply 
with the Unicode license requirements. @tstellar, any chance you can help there?




Comment at: clang/lib/Lex/Lexer.cpp:3237-3239
+  if (C != '{') {
+if (Diagnose)
+  Diag(StartPtr, diag::warn_ucn_escape_incomplete);

cor3ntin wrote:
> aaron.ballman wrote:
> > Is this a case where we might want a fixit because the user did `\N` when 
> > they meant to do `\n`?
> > 
> > (Should we look for `\N(` and fixit that to `\N{`?)
> I think if we wanted to diagnose \n we should also diagnose \U, which we 
> don't do, Maybe a follow up patch, what do you think?
> I can't imagine trying to be smart about `\N(` would be exercised  by many 
> users
Follow-up (if at all) is fine by me!



Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:41
+  constexpr bool isValid() const {
+return Name.size() != 0 || Value == 0x;
+  }





Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:45
+
+  std::string FullName() const {
+std::string s;





Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:120
+
+static bool StartsWith(StringRef Name, StringRef Needle, bool Strict,
+   std::size_t , char ,





Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:131
+  }
+  if (Needle.size() == 0)
+return true;





Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:183
+BufferType , const Node *Parent = nullptr) {
+  auto N = readNode(Offset, Parent);
+  std::size_t Consummed = 0;





Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:222
+// clang-format off
+constexpr const char *const hangul_syllables[][3] = {
+{ "G",  "A",   ""   },





Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:303
+  Name = Name.substr(findSyllable(Name, Strict, NameStart, T, 2));
+  if (L != -1 && V != -1 && T != -1 && Name.size() == 0) {
+if (!Strict) {





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:1-3
+//===- utils/UnicodeData/UnicodeNameMappingGenerator.cpp - Unicode name data
+// generator -===//
+//-*- C++ -*-===//

Something looks amiss here -- no idea how we usually handle > 80 col file 
names, but maybe removing the `utils/UnicodeData/` prefix will suffice?



Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:31
+
+static const llvm::StringRef LETTERS =
+" _-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:56
+
+  auto Name =
+  Line.substr(FirstSemiPos + 1, SecondSemiPos - FirstSemiPos - 1);





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:59
+
+  if (Name.size() && Name[0] == '<') {
+// Ignore ranges of characters, as their name is either absent or





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:67
+  if (IsAliasFile) {
+auto Kind = Line.substr(SecondSemiPos + 1);
+if (Kind != "control" && Kind != "correction" && Kind != "alternate")





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:206
+
+  assert(N->Name.size() != 0);
+  Offsets[N] = Offset;





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:349
+
+  auto Out = fopen(argv[3], "w");
+  if (!Out) {





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:360-361
+  for (const std::pair  : Entries) {
+const auto  = Entry.first;
+const auto  = Entry.second;
+// Ignore names which are not valid.





Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:379-380
+  std::pair> Data = T.serialize();
+  const auto  = Data.first;
+  const auto  = Data.second;
+




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-14 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 436703.
cor3ntin added a comment.

Fix more style issue (variable casing)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-14 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin marked an inline comment as done.
cor3ntin added inline comments.



Comment at: clang/lib/Lex/Lexer.cpp:3237-3239
+  if (C != '{') {
+if (Diagnose)
+  Diag(StartPtr, diag::warn_ucn_escape_incomplete);

aaron.ballman wrote:
> Is this a case where we might want a fixit because the user did `\N` when 
> they meant to do `\n`?
> 
> (Should we look for `\N(` and fixit that to `\N{`?)
I think if we wanted to diagnose \n we should also diagnose \U, which we don't 
do, Maybe a follow up patch, what do you think?
I can't imagine trying to be smart about `\N(` would be exercised  by many users


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-14 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 436682.
cor3ntin marked 12 inline comments as done.
cor3ntin added a comment.

Regenerate UnicodeNameToCodepointGenerated.cpp to fix the formatting of its
header.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-14 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 436661.
cor3ntin marked an inline comment as done.
cor3ntin added a comment.

Address Aaron's comments

- `{}` => `llvm::None` in Lexer.cpp
- Fix casing in UnicodeNameToCodepoint.cpp to match the style, and a couple I 
missed in UnicodeNameMappingGenerator.cpp
- Fix an underflow in `LiteralSupport.cpp`
- Add an assert in `ProcessNamedUCNEscape`
- Reserve less space in `fullName()` + add comment in 
`UnicodeNameToCodepoint.cpp`
- remove unused jamo constants in `UnicodeNameToCodepoint.cpp`
- Add license + unicode url to `UnicodeNameMappingGenerator.cpp`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-13 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:49
+std::string s;
+s.reserve(64);
+auto n = this;

cor3ntin wrote:
> aaron.ballman wrote:
> > aaron.ballman wrote:
> > > Any particular reason for 64?
> > Still wondering why 64 bytes specifically.
> It's semi arbitrary (aka a nice power of two that fits in a cacheline) - but 
> it's large enough that it fits the 99% of names (the 99th percentile is 
> actually around 46 byte)
Okay, 64 is fine by me then, but if you wanted to bump down to 46 I also 
wouldn't be sad, it's your call.



Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:17-18
+// List of generated names
+// Should be kept in sync with Unicode
+// "Name Derivation Rule Prefix String".
+static bool generated(char32_t c) {

cor3ntin wrote:
> aaron.ballman wrote:
> > aaron.ballman wrote:
> > > Do we have something more direct to point users towards?
> > Unanswered question here. May be a good place for a link like Tom had 
> > mentioned.
> This comment should no longer apply as I got rid of the `generated` method - 
> instead only relying on info we find when parsing the file (see line 44).
> It doesn't mean that we don't need more reference to the UnicodeData.txt url 
> though
Ah, okay -- basically, I'm hoping to have a comment somewhere near the top of 
this file with a link to the Unicode data.

Btw, I noticed this file is missing its license header, you should add that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-13 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added inline comments.



Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:49
+std::string s;
+s.reserve(64);
+auto n = this;

aaron.ballman wrote:
> aaron.ballman wrote:
> > Any particular reason for 64?
> Still wondering why 64 bytes specifically.
It's semi arbitrary (aka a nice power of two that fits in a cacheline) - but 
it's large enough that it fits the 99% of names (the 99th percentile is 
actually around 46 byte)



Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:17-18
+// List of generated names
+// Should be kept in sync with Unicode
+// "Name Derivation Rule Prefix String".
+static bool generated(char32_t c) {

aaron.ballman wrote:
> aaron.ballman wrote:
> > Do we have something more direct to point users towards?
> Unanswered question here. May be a good place for a link like Tom had 
> mentioned.
This comment should no longer apply as I got rid of the `generated` method - 
instead only relying on info we find when parsing the file (see line 44).
It doesn't mean that we don't need more reference to the UnicodeData.txt url 
though


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-13 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: clang/lib/Lex/Lexer.cpp:3139
   Diag(SlashLoc, diag::warn_ucn_not_valid_in_c89);
-return 0;
+return {};
   }

aaron.ballman wrote:
> FWIW, I think using `llvm::None` instead of `{}` is more clear to readers (I 
> doubt we're consistent with this in the code base though).
> 
> (Same comment applies other places that we're making an empty `Optional`.)
I think this comment was missed.



Comment at: clang/lib/Lex/LiteralSupport.cpp:509
+  Distance = Match.Distance;
+if (Distance - Match.Distance > 3)
+  break;

Can the subtraction of two `unsigned` values cause a wrapping issue here? (If 
not, can we add an assertion?)



Comment at: clang/lib/Lex/LiteralSupport.cpp:537
+  const char *UcnBegin = ThisTokBuf;
+  assert(UcnBegin[1] == 'N');
+  ThisTokBuf += 2;

Should we assert `UcnBegin[0] == '\\'` as well?



Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:31-44
+struct node {
+  bool is_root = false;
+  char32_t value = 0x;
+  uint32_t children_offset = 0;
+  bool has_sibling = false;
+  uint32_t size = 0;
+  StringRef name = {};

aaron.ballman wrote:
> You should rename things to match the usual coding conventions.
It looks like this comment was missed (it applies to this whole file).



Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:49
+std::string s;
+s.reserve(64);
+auto n = this;

aaron.ballman wrote:
> Any particular reason for 64?
Still wondering why 64 bytes specifically.



Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:257
+constexpr const char32_t SBase = 0xAC00;
+// constexpr const char32_t LBase = 0x1100;
+// constexpr const char32_t VBase = 0x1161;

aaron.ballman wrote:
> Plan to remove the commented out code?
Unanswered question here.



Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:17-18
+// List of generated names
+// Should be kept in sync with Unicode
+// "Name Derivation Rule Prefix String".
+static bool generated(char32_t c) {

aaron.ballman wrote:
> Do we have something more direct to point users towards?
Unanswered question here. May be a good place for a link like Tom had mentioned.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-13 Thread Tom Stellard via Phabricator via cfe-commits
tstellar added a comment.

In D123064#3575503 , @cor3ntin wrote:

> @tstellar I saw you say in another path that you got confirmation from 
> lawyers that it's okay to include UnicodeData.txt (which this patch doesn't 
> do) and derived data (which this patch does). Can you confirm? Thanks

This is fine, please provide a link to the original data being used.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 436142.
cor3ntin added a comment.

Do not hardcode the list of generated code points ranges in the generator
code to ease maintainance burden.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 436137.
cor3ntin added a comment.

Use utohexstr and revert the changes that put to_hexString in StringExtras.
I'll remove to_hexString entierly in a separate NFC change to avoid
duplication and further confusion.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment.

@tstellar I saw you say in another path that you got confirmation from lawyers 
that it's okay to include UnicodeData.txt (which this patch doesn't do) and 
derived data (which this patch does). Can you confirm? Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-06-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 436133.
cor3ntin added a comment.

- Rebase
- The generator code is more consistent with LLVM style guides.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-11 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: llvm/include/llvm/ADT/StringExtras.h:329
 
+std::string to_hexString(uint64_t Value, bool UpperCase = true);
+

cor3ntin wrote:
> aaron.ballman wrote:
> > `utohexstr()` already exists on line 152 -- any reason we can't reuse that?
> I guess I missed that. Why do we have 2 functions doing the same thing?
> How do you want me to clean that up?
> I think we should have a separate nfc patch to clean that afterwards
In this patch, I'd replace your new calls to `to_hexString()` with calls to 
`utohexstr()` and then in an NFC change, I'd get rid of `to_hexString()` 
entirely.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added inline comments.



Comment at: clang/lib/Lex/LiteralSupport.cpp:238
+ diag::err_delimited_escape_missing_brace)
+<< std::string(1, 'o');
 

aaron.ballman wrote:
> Can you use `"o"` instead of constructing a `std::string` for it?
Why do simple when you can do complicated?



Comment at: clang/lib/Lex/LiteralSupport.cpp:501
+  auto Matches = llvm::sys::unicode::nearestMatchesForCodepointName(Name, 5);
+  assert(!Matches.empty() && "No unicode characters found");
+

aaron.ballman wrote:
> Just double checking that the assertion here is valid and the function can 
> never return an empty set.
you should always get some result yes, at it find all the characters with the 
smallest edit distance



Comment at: llvm/include/llvm/ADT/StringExtras.h:329
 
+std::string to_hexString(uint64_t Value, bool UpperCase = true);
+

aaron.ballman wrote:
> `utohexstr()` already exists on line 152 -- any reason we can't reuse that?
I guess I missed that. Why do we have 2 functions doing the same thing?
How do you want me to clean that up?
I think we should have a separate nfc patch to clean that afterwards


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 421958.
cor3ntin added a comment.

Cleanup generated code header comment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 421956.
cor3ntin marked 17 inline comments as done.
cor3ntin added a comment.

Address Aaron's comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-11 Thread Chris Lattner via Phabricator via cfe-commits
lattner added a comment.

Yeah, we should discuss this, thanks for raising this Aaron.  I'm not sure 
exactly what is being pulled in: @cor3ntin can you please email a summary of 
the situation to bo...@llvm.org and we'll discuss it and run it by Heather as 
needed?  Thanks

-Chris


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-11 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added reviewers: aaron.ballman, tahonermann, clang-language-wg.
aaron.ballman added subscribers: tstellar, lattner.
aaron.ballman added a comment.

Thank you for this functionality! I did a pretty quick pass over it and have 
some comments.




Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:130-131
 
 def ext_delimited_escape_sequence : Extension<
-  "delimited escape sequences are a Clang extension">,
+  "%select{delimited|named}0 escape sequences are a Clang extension">,
   InGroup>;

cor3ntin wrote:
> tahonermann wrote:
> > I don't see much value in combining these diagnostics since these are 
> > distinct features. The `ext_delimited_escape_sequence` name seems odd for 
> > named escape sequences too (even if both features use `{` and `}` as 
> > delimiters).
> I'm used to @aaron.ballman encouraging me to keep the number of diagnostics 
> under control, but I'm fine keeping them separate 
While Tom's right that they're distinct features, diagnostics aren't tied 
directly to a feature in Clang; we try to reuse existing diagnostic text as 
much as possible. My preference is to keep them combined so we reduce the 
number of strings we build into the executable (and we make life easier for 
eventual localization of the diagnostics).



Comment at: clang/lib/Lex/Lexer.cpp:3139
   Diag(SlashLoc, diag::warn_ucn_not_valid_in_c89);
-return 0;
+return {};
   }

FWIW, I think using `llvm::None` instead of `{}` is more clear to readers (I 
doubt we're consistent with this in the code base though).

(Same comment applies other places that we're making an empty `Optional`.)



Comment at: clang/lib/Lex/Lexer.cpp:3226
+llvm::Optional Lexer::tryReadNamedUCN(const char *,
+const char *, Token *Result) {
+  unsigned CharSize;





Comment at: clang/lib/Lex/Lexer.cpp:3237-3239
+  if (C != '{') {
+if (Diagnose)
+  Diag(StartPtr, diag::warn_ucn_escape_incomplete);

Is this a case where we might want a fixit because the user did `\N` when they 
meant to do `\n`?

(Should we look for `\N(` and fixit that to `\N{`?)



Comment at: clang/lib/Lex/Lexer.cpp:3290
+// and we should not make invalid suggestions.
+  }
+

I think it'd be more clear to set `Res = LooseMatch->CodePoint;` before exiting 
here so that after this block, you can always assume there's a valid code point 
(this helps if we later want to add more logic after this point, too).



Comment at: clang/lib/Lex/Lexer.cpp:3292-3294
+  if (Diagnose && PP && !LooseMatch) {
+Diag(BufferPtr, diag::ext_delimited_escape_sequence) << /*named*/ 1;
+  }





Comment at: clang/lib/Lex/Lexer.cpp:3305
+  }
+  return Res ? *Res : LooseMatch->CodePoint;
+}

Then this can become `return *Res;`



Comment at: clang/lib/Lex/Lexer.cpp:3315-3317
+CodePointOpt = tryReadNumericUCN(StartPtr, SlashLoc, Result);
+
+  else if (Kind == 'N')





Comment at: clang/lib/Lex/Lexer.cpp:3325-3327
+  if (Result) {
+Result->setFlag(Token::HasUCN);
+  }

I think this should live inside the other `tryRead*` functions once we know the 
token is definitely a UCN, in case someone finds a reason to want to call one 
of those concretely but not call `tryReadUCN()` to do so.



Comment at: clang/lib/Lex/LiteralSupport.cpp:238
+ diag::err_delimited_escape_missing_brace)
+<< std::string(1, 'o');
 

Can you use `"o"` instead of constructing a `std::string` for it?



Comment at: clang/lib/Lex/LiteralSupport.cpp:362
+  assert(Delim != Input.end());
+  auto Res = llvm::sys::unicode::nameToCodepointLooseMatching(
+  StringRef(I, std::distance(I, Delim)));

Please spell out the type for `Res` as it's not obvious from context what it is.



Comment at: clang/lib/Lex/LiteralSupport.cpp:400-406
   const char *UcnBegin = ThisTokBuf;
-
   // Skip the '\u' char's.
   ThisTokBuf += 2;
-
-  bool Delimited = false;
-  bool EndDelimiterFound = false;
   bool HasError = false;
 
+  Delimited = false;
+  bool EndDelimiterFound = false;

Might as well clean this up a bit more



Comment at: clang/lib/Lex/LiteralSupport.cpp:488
+
+  auto Res = llvm::sys::unicode::nameToCodepointLooseMatching(Name);
+  if (Res) {

Please spell the type out.



Comment at: clang/lib/Lex/LiteralSupport.cpp:500
+  unsigned Distance = 0;
+  auto Matches = llvm::sys::unicode::nearestMatchesForCodepointName(Name, 5);
+  assert(!Matches.empty() && "No unicode characters found");

Same here.



Comment at: clang/lib/Lex/LiteralSupport.cpp:501
+  auto 

[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 421939.
cor3ntin added a comment.

Add a test for `\o'


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-10 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 421787.
cor3ntin added a comment.

A non-existing name could return an engaged value if the
whole string matched the node's name, even if that node had no
attached value.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-10 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 421771.
cor3ntin added a comment.

Formatting


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-10 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 421769.
cor3ntin added a comment.

Fix linking on windows.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

2022-04-10 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 421768.
cor3ntin added a comment.

Add missing header in generated code to fix windows CI.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064

Files:
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/include/clang/Lex/Lexer.h
  clang/lib/Lex/Lexer.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/FixIt/fixit-unicode-named-escape-sequences.c
  clang/test/Lexer/char-escapes-delimited.c
  clang/test/Lexer/unicode.c
  clang/test/Parser/cxx11-user-defined-literals.cpp
  clang/test/Preprocessor/ucn-pp-identifier.c
  clang/test/Sema/ucn-identifiers.c
  llvm/CMakeLists.txt
  llvm/include/llvm/ADT/StringExtras.h
  llvm/include/llvm/Support/ScopedPrinter.h
  llvm/include/llvm/Support/Unicode.h
  llvm/lib/Support/CMakeLists.txt
  llvm/lib/Support/ScopedPrinter.cpp
  llvm/lib/Support/StringExtras.cpp
  llvm/lib/Support/UnicodeNameToCodepoint.cpp
  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
  llvm/unittests/Support/UnicodeTest.cpp
  llvm/utils/UnicodeData/CMakeLists.txt
  llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits