[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2023-07-09 Thread Fako via Phabricator via cfe-commits
Fako432 added a comment. Herald added a subscriber: wangpc. In D128059#3640564 , @cor3ntin wrote: > In D128059#3640544 , > @hubert.reinterpretcast wrote: > >> In D128059#3640424

Re: [PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2023-06-09 Thread Corentin via cfe-commits
I'll be unavailable the next 2 weeks, feel free to do it if you want! On Fri, Jun 9, 2023, 21:10 Tom Honermann via Phabricator < revi...@reviews.llvm.org> wrote: > tahonermann added a comment. > > @cor3ntin, sorry for failing to keep up with reviews; I know this has > already been committed. I

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2023-06-09 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann added a comment. @cor3ntin, sorry for failing to keep up with reviews; I know this has already been committed. I did spot a couple of typos should you feel inclined to address them. Comment at: clang/lib/Lex/Lexer.cpp:2695 + // diagnostic only once per entire

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2023-05-31 Thread Wanda Golden via Phabricator via cfe-commits
marginstrawberries added a comment. Herald added a subscriber: pcwang-thead. In D128059#3640424 , @cor3ntin wrote: > @aaron.ballman Thanks for the review. I landed the changes and got a bunch of > bots screaming at me for changes that are completely

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-13 Thread Corentin Jabot via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGd4892a168f51: [Clang] Add a warning on invalid UTF-8 in comments. (authored by cor3ntin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-12 Thread Jonas Devlieghere via Phabricator via cfe-commits
JDevlieghere added a comment. In D128059#3646953 , @cor3ntin wrote: > In D128059#3646695 , @JDevlieghere > wrote: > >> I had to revert this because it breaks a bunch of LLDB tests: >>

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-12 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 444118. cor3ntin added a comment. This turned out to be an interesting bug. The SSE code tried to be clever and skip over valid ascii code units when finding invalid UTF-8. In doing so, it could run over the end of a comment entirely if - there was a short

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-12 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. In D128059#3646695 , @JDevlieghere wrote: > I had to revert this because it breaks a bunch of LLDB tests: > https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/#showFailuresLink. > It looks like this causes an

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-12 Thread Jonas Devlieghere via Phabricator via cfe-commits
JDevlieghere added a comment. I had to revert this because it breaks a bunch of LLDB tests: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/#showFailuresLink. It looks like this causes an error when building some SDK modules. Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-12 Thread Corentin Jabot via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGcc309721d20c: [Clang] Add a warning on invalid UTF-8 in comments. (authored by cor3ntin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-12 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman accepted this revision. aaron.ballman added a comment. In D128059#3643081 , @cor3ntin wrote: > Fix crash on PowerPC > > (I forgot to removed a non-sense line that > could cause the CurPtr to move incorrectly > past a / in the ALTIVEC code

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-11 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 443699. cor3ntin added a comment. Fix crash on PowerPC (I forgot to removed a non-sense line that could cause the CurPtr to move incorrectly past a / in the ALTIVEC code path). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-09 Thread Hubert Tong via Phabricator via cfe-commits
hubert.reinterpretcast added a comment. In D128059#3640564 , @cor3ntin wrote: > Good point. The error was a bit misleading but i guess what's happening is a > segfault when running `clang-ast-dump`. > I'm reverting for now and I don't really know how

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-09 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. In D128059#3640544 , @hubert.reinterpretcast wrote: > In D128059#3640424 , @cor3ntin > wrote: > >> @aaron.ballman Thanks for the review. I landed the changes and got a bunch >> of

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-09 Thread Hubert Tong via Phabricator via cfe-commits
hubert.reinterpretcast added a comment. In D128059#3640424 , @cor3ntin wrote: > @aaron.ballman Thanks for the review. I landed the changes and got a bunch of > bots screaming at me for changes that are completely unrelated > >

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-09 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. @aaron.ballman Thanks for the review. I landed the changes and got a bunch of bots screaming at me for changes that are completely unrelated https://lab.llvm.org/buildbot/#/builders/21/builds/45146 https://lab.llvm.org/buildbot/#/builders/36/builds/22838

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-09 Thread Corentin Jabot via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG355532a1499a: [Clang] Add a warning on invalid UTF-8 in comments. (authored by cor3ntin). Changed prior to commit:

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-08 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 443225. cor3ntin marked 19 inline comments as done. cor3ntin added a comment. Remove superfuous braces. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files:

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-08 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman accepted this revision. aaron.ballman added a comment. I only spotted one thing I think is actually an issue, the rest is style related. LGTM with the one issue fixed. Comment at: clang/lib/Lex/Lexer.cpp:2707-2709 +if (!isASCII(C)) { + goto

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-08 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 443222. cor3ntin added a comment. Fix bound check in the non vectorized case Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files: clang/docs/ReleaseNotes.rst

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. @aaron.ballman Ok, this didn't turn out great, and given the changes + the fact it was completely broken (twice!), I'd love if you could take a look at it again. Thanks :) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 442702. cor3ntin added a comment. Deploying that turned out to reveal a few critical issues - `getUTF8SequenceSize` never reported a non-zero length for valid UTF-8 sequences. - In *some* circumstances (depending on the size of comment), Unicode

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. The bots found a further issue which i committed a fix for directly https://reviews.llvm.org/D129223 - I should have caught that sooner Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Corentin Jabot via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG4174f0ca618b: [Clang] Add a warning on invalid UTF-8 in comments. (authored by cor3ntin). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 442642. cor3ntin added a comment. Fix the test of a newly landed commit Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files: clang/docs/ReleaseNotes.rst

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. In D128059#3633343 , @thakis wrote: > As far as I can tell, this breaks check-clang everywhere: http://45.33.8.238/ > > Please take a look and revert for now if it takes a while to fix. Thanks for letting me know, i hadn't

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Nico Weber via Phabricator via cfe-commits
thakis added a comment. As far as I can tell, this breaks check-clang everywhere: http://45.33.8.238/ Please take a look and revert for now if it takes a while to fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Corentin Jabot via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGe3dc56805f10: [Clang] Add a warning on invalid UTF-8 in comments. (authored by cor3ntin). Changed prior to commit:

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-06 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman accepted this revision. aaron.ballman added a comment. This revision is now accepted and ready to land. The changes here LGTM, thank you for this! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/

Re: [PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-05 Thread Corentin via cfe-commits
My plan is to do that for all papers once past plenary to keep consistency with cxx_status. On Tue, Jul 5, 2022 at 6:13 PM Aaron Ballman via Phabricator < revi...@reviews.llvm.org> wrote: > aaron.ballman added reviewers: tahonermann, clang-language-wg. > aaron.ballman added inline comments. > >

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-05 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added reviewers: tahonermann, clang-language-wg. aaron.ballman added inline comments. Comment at: clang/docs/ReleaseNotes.rst:271 +- Added ``-Winvalid-utf8`` which diagnoses invalid UTF-8 code unit sequences in + comments. Should we mention

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-01 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 441748. cor3ntin added a comment. Fix typo. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files: clang/docs/ReleaseNotes.rst

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-01 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added inline comments. Comment at: clang/docs/ReleaseNotes.rst:270 This fixes `Issue 55962 `_. +- Added ``-Winvalid-utf8`` which diagnose invalid UTF-8 code unit sequences in + comments.

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-07-01 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. @aaron.ballman FYI this is now core-approved Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 ___ cfe-commits mailing list

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-23 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 439276. cor3ntin added a comment. - Remove explicit load instruction - cleanup extra braces Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files:

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 439177. cor3ntin added a comment. - Add a comment referencing the (to be) C++ wording and unicode discussion on replacemet characters - Do not try to skip utf8 checks when the warning is not enabled as checking whether the warning is enabled is more

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added inline comments. Comment at: clang/lib/Lex/Lexer.cpp:2717 +__m128i Bytes = +_mm_loadu_si128(reinterpret_cast(CurPtr)); +int Mask = _mm_movemask_epi8(Bytes); aaron.ballman wrote: > aaron.ballman wrote: > > cor3ntin

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments. Comment at: clang/lib/Lex/Lexer.cpp:2717 +__m128i Bytes = +_mm_loadu_si128(reinterpret_cast(CurPtr)); +int Mask = _mm_movemask_epi8(Bytes); aaron.ballman wrote: > cor3ntin wrote: > > This crashes

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments. Comment at: clang/lib/Lex/Lexer.cpp:2717 +__m128i Bytes = +_mm_loadu_si128(reinterpret_cast(CurPtr)); +int Mask = _mm_movemask_epi8(Bytes); cor3ntin wrote: > This crashes when using

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann added inline comments. Comment at: clang/lib/Lex/Lexer.cpp:2405-2406 // Skip over characters in the fast loop. -while (C != 0 &&// Potentially EOF. - C != '\n' && C != '\r') // Newline or DOS-style newline. +// Warn on invalid

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added inline comments. Comment at: clang/lib/Lex/Lexer.cpp:2717 +__m128i Bytes = +_mm_loadu_si128(reinterpret_cast(CurPtr)); +int Mask = _mm_movemask_epi8(Bytes); This crashes when using `_mm_load_si128` which suprises me

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment. It looks like precommit CI caught some issues: Failed Tests (2): Clangd :: utf8.test Clangd Unit Tests :: ./ClangdTests.exe/CompletionStringTest/GetDeclCommentBadUTF8 Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:116-117

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added a comment. @aaron.ballman I created a 1.3GB file containing 50 comments of random size and content, with the following script import string import random for i in range(0, 50): print("/*{}*/".format(''.join(random.choices('\n' + string.ascii_uppercase +

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 439057. cor3ntin added a comment. Further optimize Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files: clang/docs/ReleaseNotes.rst

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 439047. cor3ntin added a comment. Vectorizes comment skipping even when -Winvalid-utf8 is enabled. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files:

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann added inline comments. Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:116-117 "source file is not valid UTF-8">; +def warn_invalid_utf8_in_comment : Warning< + "invalid UTF-8 in comment">, InGroup>, DefaultIgnore; def err_character_not_allowed :

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 438990. cor3ntin added a comment. Make sure the warning is off by default. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files: clang/docs/ReleaseNotes.rst

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 438984. cor3ntin added a comment. s/UnicodeDecodeFailed/UnicodeDecodingAlreadyDiagnosed/ Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 Files:

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments. Comment at: clang/lib/Lex/Lexer.cpp:2702-2703 // (probably ending) '/' character. -if (CurPtr + 24 < BufferEnd && +// When diagnosing invalid UTF-8 sequences we always skip the fast +// vectorized path. +if

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin added inline comments. Comment at: clang/lib/Lex/Lexer.cpp:2399 + getSourceLocation(CurPtr)); + bool UnicodeDecodeFailed = false; + aaron.ballman wrote: > It looks like this can move into the `while` loop and we can

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-22 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin updated this revision to Diff 438926. cor3ntin marked 3 inline comments as done. cor3ntin added a comment. - Address style comments - Improve commit message - Enable the warning in -pedantic Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-21 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments. Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:116-117 "source file is not valid UTF-8">; +def warn_invalid_utf8_in_comment : Warning< + "invalid UTF-8 in comment">, InGroup>, DefaultIgnore; def err_character_not_allowed :

[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

2022-06-17 Thread Corentin Jabot via Phabricator via cfe-commits
cor3ntin created this revision. Herald added a subscriber: hiraditya. Herald added a project: All. cor3ntin requested review of this revision. Herald added projects: clang, LLVM. Herald added subscribers: llvm-commits, cfe-commits. Introduce an off-by default `-Winvalid-utf8` warning that detects