[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
alexfh added inline comments. Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:42 char const *const HexNonPrintable("\\\x03"); char const *const Delete("\\\177"); +char const *const MultibyteSnowman("\xE2\x98\x83"); zinovy.nis wrote: > By the way, AFAIK the lines above are not checked for fixes (and fixes > absence) at all! Is it correct? These test cases without corresponding CHECK-FIXES/CHECK-MESSAGES lines will work (that is ensure no fixes are applied to the code) under the assumption that all warnings explicitly covered by CHECK-MESSAGE lines will only contain fixes for relevant code (in most cases in the same line), and there are CHECK-FIXES for all of them. And if there is no CHECK-MESSAGES line then `FileCheck -implicit-check-not='warning|error' -check-prefix=CHECK-MESSAGES` (used in the check_clang_tidy.py) will fail if there is an unaccounted (i.e. not covered by a CHECK-MESSAGES: pattern) warning. The assumption won't work if the check makes distant fixes (for example, if it inserts #include directives, there should be a CHECK-FIXES line for that). Thus there's not much value in checking these lines are left unmodified. But in case there is a warning which should contain no fix, checking that the code was not modified by clang-tidy is quite useful. We could make this more explicit in different ways: 1. Run FileCheck on the result of `diff original-file file-processed-by-clang-tidy`. That may make tests much more verbose. 2. Run FileCheck on the result of `clang-tidy -export-fixes`. That would make the tests both more verbose and harder to verify for sanity (applying textual replacements with given byte offsets is not what a human brain does well). Better ideas are welcome. https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added a comment. https://reviews.llvm.org/rCTE331297 https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis closed this revision. zinovy.nis added a comment. Fixed in git-svn-id: https://llvm.org/svn/llvm-project/clang-tools-extra/trunk@331297 91177308-0d34-0410-b5e6-96231b3b80d8 https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis marked an inline comment as done. zinovy.nis added inline comments. Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44 +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} LegalizeAdulthood wrote: > Consider adding CHECK-FIXES lines for the other test cases that should remain > unmolested by the check. I'm going to make a separate commit for that, OK? https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
LegalizeAdulthood added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:111 + // Upper ASCII are disallowed too. + for (unsigned char C = 0xFFu; C >= 0x80u; --C) +DisallowedChars.set(C); zinovy.nis wrote: > LegalizeAdulthood wrote: > > Why does this loop go down instead of up? I would have expected a more > > conventional > > > > for (unsigned char C = 0x80u; C <= 0xFFu; ++C) > The only reason is that `++C` for `C==0xFFu` is `0` so `C <= 0xFF` is always > satisfied leading to inifinite loop. OK, makes sense. I rarely write loops that iterate over character types. https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:111 + // Upper ASCII are disallowed too. + for (unsigned char C = 0xFFu; C >= 0x80u; --C) +DisallowedChars.set(C); LegalizeAdulthood wrote: > Why does this loop go down instead of up? I would have expected a more > conventional > > for (unsigned char C = 0x80u; C <= 0xFFu; ++C) The only reason is that `++C` for `C==0xFFu` is `0` so `C <= 0xFF` is always satisfied leading to inifinite loop. https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
LegalizeAdulthood accepted this revision. LegalizeAdulthood added a comment. This revision is now accepted and ready to land. Other than a few minor nits, ship it. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:110 + + // Upper ASCII are disallowed too. + for (unsigned char C = 0xFFu; C >= 0x80u; --C) "Upper ASCII" is a misnomer. ASCII only defines character codes 0-127. Non-ASCII codes are what is being described here. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:111 + // Upper ASCII are disallowed too. + for (unsigned char C = 0xFFu; C >= 0x80u; --C) +DisallowedChars.set(C); Why does this loop go down instead of up? I would have expected a more conventional for (unsigned char C = 0x80u; C <= 0xFFu; ++C) Comment at: clang-tidy/modernize/RawStringLiteralCheck.h:39 std::string DelimiterStem; + std::bitset<1 << CHAR_BIT> DisallowedChars; const bool ReplaceShorterLiterals; Consider introducing a typedef for this ugly type name. Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44 +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} Consider adding CHECK-FIXES lines for the other test cases that should remain unmolested by the check. https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added a comment. Gentle ping. https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added a comment. Aaron, any comments for the new revision? https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis updated this revision to Diff 144066. zinovy.nis added a comment. - Optimized `containsEscapedCharacters` not to re-create `bitset` (implicitly in `StringRef::find_first_of`) for each literal. - Merged 2 passes for testing for allowed chars into a single one. https://reviews.llvm.org/D45932 Files: clang-tidy/modernize/RawStringLiteralCheck.cpp clang-tidy/modernize/RawStringLiteralCheck.h test/clang-tidy/modernize-raw-string-literal.cpp Index: test/clang-tidy/modernize-raw-string-literal.cpp === --- test/clang-tidy/modernize-raw-string-literal.cpp +++ test/clang-tidy/modernize-raw-string-literal.cpp @@ -40,6 +40,8 @@ char const *const Us("goink\\\037"); char const *const HexNonPrintable("\\\x03"); char const *const Delete("\\\177"); +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} char const *const TrailingSpace("A line \\with space. \n"); char const *const TrailingNewLine("A single \\line.\n"); Index: clang-tidy/modernize/RawStringLiteralCheck.h === --- clang-tidy/modernize/RawStringLiteralCheck.h +++ clang-tidy/modernize/RawStringLiteralCheck.h @@ -11,6 +11,7 @@ #define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MODERNIZE_RAW_STRING_LITERAL_H #include "../ClangTidy.h" +#include namespace clang { namespace tidy { @@ -35,6 +36,7 @@ const StringLiteral *Literal, StringRef Replacement); std::string DelimiterStem; + std::bitset<1 << CHAR_BIT> DisallowedChars; const bool ReplaceShorterLiterals; }; Index: clang-tidy/modernize/RawStringLiteralCheck.cpp === --- clang-tidy/modernize/RawStringLiteralCheck.cpp +++ clang-tidy/modernize/RawStringLiteralCheck.cpp @@ -41,29 +41,16 @@ return (QuotePos > 0) && (Text[QuotePos - 1] == 'R'); } -bool containsEscapedCharacters(const MatchFinder::MatchResult , - const StringLiteral *Literal) { +bool containsEscapedCharacters( +const MatchFinder::MatchResult , const StringLiteral *Literal, +const std::bitset<1 << CHAR_BIT> ) { // FIXME: Handle L"", u8"", u"" and U"" literals. if (!Literal->isAscii()) return false; - StringRef Bytes = Literal->getBytes(); - // Non-printing characters disqualify this literal: - // \007 = \a bell - // \010 = \b backspace - // \011 = \t horizontal tab - // \012 = \n new line - // \013 = \v vertical tab - // \014 = \f form feed - // \015 = \r carriage return - // \177 = delete - if (Bytes.find_first_of(StringRef("\000\001\002\003\004\005\006\a" -"\b\t\n\v\f\r\016\017" -"\020\021\022\023\024\025\026\027" -"\030\031\032\033\034\035\036\037" -"\177", -33)) != StringRef::npos) -return false; + for (const unsigned char C : Literal->getBytes()) +if (DisallowedChars.test(C)) + return false; CharSourceRange CharRange = Lexer::makeFileCharRange( CharSourceRange::getTokenRange(Literal->getSourceRange()), @@ -102,7 +89,28 @@ ClangTidyContext *Context) : ClangTidyCheck(Name, Context), DelimiterStem(Options.get("DelimiterStem", "lit")), - ReplaceShorterLiterals(Options.get("ReplaceShorterLiterals", false)) {} + ReplaceShorterLiterals(Options.get("ReplaceShorterLiterals", false)) { + // Non-printing characters are disallowed: + // \007 = \a bell + // \010 = \b backspace + // \011 = \t horizontal tab + // \012 = \n new line + // \013 = \v vertical tab + // \014 = \f form feed + // \015 = \r carriage return + // \177 = delete + for (const unsigned char C : StringRef("\000\001\002\003\004\005\006\a" + "\b\t\n\v\f\r\016\017" + "\020\021\022\023\024\025\026\027" + "\030\031\032\033\034\035\036\037" + "\177", + 33)) +DisallowedChars.set(C); + + // Upper ASCII are disallowed too. + for (unsigned char C = 0xFFu; C >= 0x80u; --C) +DisallowedChars.set(C); +} void RawStringLiteralCheck::storeOptions(ClangTidyOptions::OptionMap ) { ClangTidyCheck::storeOptions(Options); @@ -124,7 +132,7 @@ if (Literal->getLocStart().isMacroID()) return; - if (containsEscapedCharacters(Result, Literal)) { + if (containsEscapedCharacters(Result, Literal, DisallowedChars)) { std::string Replacement = asRawStringLiteral(Literal, DelimiterStem); if (ReplaceShorterLiterals || Replacement.length() <= ___ cfe-commits mailing list
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71 + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; aaron.ballman wrote: > zinovy.nis wrote: > > aaron.ballman wrote: > > > zinovy.nis wrote: > > > > aaron.ballman wrote: > > > > > I am starting to think that this functionality should be refactored > > > > > because the check is now O(N^2) in the worst case because all of the > > > > > bytes of the string need to be touched twice. It would be nice for > > > > > performance reasons to combine this so that there's only a single > > > > > pass over all of the characters. > > > > > > > > > > What do you think? > > > > Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. > > > > `find_if_not` is O(N). > > > The `find_if_not()` call you add touches every character in the string to > > > see if it's ASCII and it all characters are ASCII, then > > > `containsEscape()` calls `find()` which touches every character again > > > (assuming the searched character is never encountered). > > > > > > Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it > > > asserts that the character is found, so not every character is touched in > > > the string and it should find a the searched character quite quickly. > > OK, I'll see how to combine theses checks into a single one. > > > > But anyway I see only 2*O(N), not O(N^2) here. > Oh, derp, that's my thinko -- sorry! You are correct, that's 2 * O(N) and not > O(N^2). May be we can use here [[ http://en.cppreference.com/w/cpp/string/byte/isprint | std::isprint ]] https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
aaron.ballman added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71 + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; zinovy.nis wrote: > aaron.ballman wrote: > > zinovy.nis wrote: > > > aaron.ballman wrote: > > > > I am starting to think that this functionality should be refactored > > > > because the check is now O(N^2) in the worst case because all of the > > > > bytes of the string need to be touched twice. It would be nice for > > > > performance reasons to combine this so that there's only a single pass > > > > over all of the characters. > > > > > > > > What do you think? > > > Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. > > > `find_if_not` is O(N). > > The `find_if_not()` call you add touches every character in the string to > > see if it's ASCII and it all characters are ASCII, then `containsEscape()` > > calls `find()` which touches every character again (assuming the searched > > character is never encountered). > > > > Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it > > asserts that the character is found, so not every character is touched in > > the string and it should find a the searched character quite quickly. > OK, I'll see how to combine theses checks into a single one. > > But anyway I see only 2*O(N), not O(N^2) here. Oh, derp, that's my thinko -- sorry! You are correct, that's 2 * O(N) and not O(N^2). https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71 + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; aaron.ballman wrote: > zinovy.nis wrote: > > aaron.ballman wrote: > > > I am starting to think that this functionality should be refactored > > > because the check is now O(N^2) in the worst case because all of the > > > bytes of the string need to be touched twice. It would be nice for > > > performance reasons to combine this so that there's only a single pass > > > over all of the characters. > > > > > > What do you think? > > Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. > > `find_if_not` is O(N). > The `find_if_not()` call you add touches every character in the string to see > if it's ASCII and it all characters are ASCII, then `containsEscape()` calls > `find()` which touches every character again (assuming the searched character > is never encountered). > > Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it > asserts that the character is found, so not every character is touched in the > string and it should find a the searched character quite quickly. OK, I'll see how to combine theses checks into a single one. But anyway I see only 2*O(N), not O(N^2) here. https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
aaron.ballman added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71 + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; zinovy.nis wrote: > aaron.ballman wrote: > > I am starting to think that this functionality should be refactored because > > the check is now O(N^2) in the worst case because all of the bytes of the > > string need to be touched twice. It would be nice for performance reasons > > to combine this so that there's only a single pass over all of the > > characters. > > > > What do you think? > Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. > `find_if_not` is O(N). The `find_if_not()` call you add touches every character in the string to see if it's ASCII and it all characters are ASCII, then `containsEscape()` calls `find()` which touches every character again (assuming the searched character is never encountered). Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it asserts that the character is found, so not every character is touched in the string and it should find a the searched character quite quickly. https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71 + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; aaron.ballman wrote: > I am starting to think that this functionality should be refactored because > the check is now O(N^2) in the worst case because all of the bytes of the > string need to be touched twice. It would be nice for performance reasons to > combine this so that there's only a single pass over all of the characters. > > What do you think? Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. `find_if_not` is O(N). https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
aaron.ballman added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71 + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; I am starting to think that this functionality should be refactored because the check is now O(N^2) in the worst case because all of the bytes of the string need to be touched twice. It would be nice for performance reasons to combine this so that there's only a single pass over all of the characters. What do you think? https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis updated this revision to Diff 143679. zinovy.nis marked an inline comment as done. zinovy.nis added a comment. - Use clang::isASCII instead of home-brewed code. https://reviews.llvm.org/D45932 Files: clang-tidy/modernize/RawStringLiteralCheck.cpp test/clang-tidy/modernize-raw-string-literal.cpp Index: test/clang-tidy/modernize-raw-string-literal.cpp === --- test/clang-tidy/modernize-raw-string-literal.cpp +++ test/clang-tidy/modernize-raw-string-literal.cpp @@ -40,6 +40,8 @@ char const *const Us("goink\\\037"); char const *const HexNonPrintable("\\\x03"); char const *const Delete("\\\177"); +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} char const *const TrailingSpace("A line \\with space. \n"); char const *const TrailingNewLine("A single \\line.\n"); Index: clang-tidy/modernize/RawStringLiteralCheck.cpp === --- clang-tidy/modernize/RawStringLiteralCheck.cpp +++ clang-tidy/modernize/RawStringLiteralCheck.cpp @@ -10,6 +10,7 @@ #include "RawStringLiteralCheck.h" #include "clang/AST/ASTContext.h" #include "clang/ASTMatchers/ASTMatchFinder.h" +#include "clang/Basic/CharInfo.h" #include "clang/Lex/Lexer.h" using namespace clang::ast_matchers; @@ -65,6 +66,11 @@ 33)) != StringRef::npos) return false; + // Don't replace chars from upper ASCII as they can represent UTF-8 or be + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; + CharSourceRange CharRange = Lexer::makeFileCharRange( CharSourceRange::getTokenRange(Literal->getSourceRange()), *Result.SourceManager, Result.Context->getLangOpts()); Index: test/clang-tidy/modernize-raw-string-literal.cpp === --- test/clang-tidy/modernize-raw-string-literal.cpp +++ test/clang-tidy/modernize-raw-string-literal.cpp @@ -40,6 +40,8 @@ char const *const Us("goink\\\037"); char const *const HexNonPrintable("\\\x03"); char const *const Delete("\\\177"); +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} char const *const TrailingSpace("A line \\with space. \n"); char const *const TrailingNewLine("A single \\line.\n"); Index: clang-tidy/modernize/RawStringLiteralCheck.cpp === --- clang-tidy/modernize/RawStringLiteralCheck.cpp +++ clang-tidy/modernize/RawStringLiteralCheck.cpp @@ -10,6 +10,7 @@ #include "RawStringLiteralCheck.h" #include "clang/AST/ASTContext.h" #include "clang/ASTMatchers/ASTMatchFinder.h" +#include "clang/Basic/CharInfo.h" #include "clang/Lex/Lexer.h" using namespace clang::ast_matchers; @@ -65,6 +66,11 @@ 33)) != StringRef::npos) return false; + // Don't replace chars from upper ASCII as they can represent UTF-8 or be + // supported by specific code pages only. + if (Bytes.find_if_not(isASCII) != StringRef::npos) +return false; + CharSourceRange CharRange = Lexer::makeFileCharRange( CharSourceRange::getTokenRange(Literal->getSourceRange()), *Result.SourceManager, Result.Context->getLangOpts()); ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis marked an inline comment as done. zinovy.nis added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:70-72 + if (Bytes.find_if([](char C) { +return static_cast(C) > 0x7Fu; + }) != StringRef::npos) aaron.ballman wrote: > I think you can use `isASCII()` from CharInfo.h rather than reimplement it. Nice finding! Thanks! Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44 +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} LegalizeAdulthood wrote: > IIRC, the default behavior is that if no matching CHECK-FIXES line is found, > then it is considered a failure. Have you tried your test code without your > change to verify that this is the case? 1. Without my fix my test fails with a Python decoder error as it cannot print Unicode symbols in Windows console. But no FileCheck errors occur: ``` UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 974: ordinal not in range(128) ``` 2. Regardless of my changes when I replace `"char const *const Delete("\\\177");"` with `"char const *const Delete("\000\\\177");"` (leading `\0`) test still passes! Looks like CHECK-FIXes must be explicit. Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
LegalizeAdulthood added inline comments. Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44 +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} IIRC, the default behavior is that if no matching CHECK-FIXES line is found, then it is considered a failure. Have you tried your test code without your change to verify that this is the case? Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
aaron.ballman added inline comments. Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:70-72 + if (Bytes.find_if([](char C) { +return static_cast(C) > 0x7Fu; + }) != StringRef::npos) I think you can use `isASCII()` from CharInfo.h rather than reimplement it. Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis added inline comments. Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:42 char const *const HexNonPrintable("\\\x03"); char const *const Delete("\\\177"); +char const *const MultibyteSnowman("\xE2\x98\x83"); By the way, AFAIK the lines above are not checked for fixes (and fixes absence) at all! Is it correct? Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D45932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals
zinovy.nis created this revision. zinovy.nis added reviewers: xazax.hun, LegalizeAdulthood. zinovy.nis added a project: clang-tools-extra. Herald added subscribers: cfe-commits, rnkovacs. It's useless and not safe to replace `"\xE2\x98\x83"` with `"☃"` (snowman) Especially there's an explicit test for ASCII in this check. Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D45932 Files: clang-tidy/modernize/RawStringLiteralCheck.cpp test/clang-tidy/modernize-raw-string-literal.cpp Index: test/clang-tidy/modernize-raw-string-literal.cpp === --- test/clang-tidy/modernize-raw-string-literal.cpp +++ test/clang-tidy/modernize-raw-string-literal.cpp @@ -40,6 +40,8 @@ char const *const Us("goink\\\037"); char const *const HexNonPrintable("\\\x03"); char const *const Delete("\\\177"); +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} char const *const TrailingSpace("A line \\with space. \n"); char const *const TrailingNewLine("A single \\line.\n"); Index: clang-tidy/modernize/RawStringLiteralCheck.cpp === --- clang-tidy/modernize/RawStringLiteralCheck.cpp +++ clang-tidy/modernize/RawStringLiteralCheck.cpp @@ -65,6 +65,13 @@ 33)) != StringRef::npos) return false; + // Don't replace chars from upper ASCII (>0x7F) as they can represent UTF-8 + // or be supported by specific code pages only. + if (Bytes.find_if([](char C) { +return static_cast(C) > 0x7Fu; + }) != StringRef::npos) +return false; + CharSourceRange CharRange = Lexer::makeFileCharRange( CharSourceRange::getTokenRange(Literal->getSourceRange()), *Result.SourceManager, Result.Context->getLangOpts()); Index: test/clang-tidy/modernize-raw-string-literal.cpp === --- test/clang-tidy/modernize-raw-string-literal.cpp +++ test/clang-tidy/modernize-raw-string-literal.cpp @@ -40,6 +40,8 @@ char const *const Us("goink\\\037"); char const *const HexNonPrintable("\\\x03"); char const *const Delete("\\\177"); +char const *const MultibyteSnowman("\xE2\x98\x83"); +// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}} char const *const TrailingSpace("A line \\with space. \n"); char const *const TrailingNewLine("A single \\line.\n"); Index: clang-tidy/modernize/RawStringLiteralCheck.cpp === --- clang-tidy/modernize/RawStringLiteralCheck.cpp +++ clang-tidy/modernize/RawStringLiteralCheck.cpp @@ -65,6 +65,13 @@ 33)) != StringRef::npos) return false; + // Don't replace chars from upper ASCII (>0x7F) as they can represent UTF-8 + // or be supported by specific code pages only. + if (Bytes.find_if([](char C) { +return static_cast(C) > 0x7Fu; + }) != StringRef::npos) +return false; + CharSourceRange CharRange = Lexer::makeFileCharRange( CharSourceRange::getTokenRange(Literal->getSourceRange()), *Result.SourceManager, Result.Context->getLangOpts()); ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits