[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-02 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added inline comments.



Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:42
 char const *const HexNonPrintable("\\\x03");
 char const *const Delete("\\\177");
+char const *const MultibyteSnowman("\xE2\x98\x83");

zinovy.nis wrote:
> By the way, AFAIK the lines above are not checked for fixes (and fixes 
> absence) at all! Is it correct?
These test cases without corresponding CHECK-FIXES/CHECK-MESSAGES lines will 
work (that is ensure no fixes are applied to the code) under the assumption 
that all warnings explicitly covered by CHECK-MESSAGE lines will only contain 
fixes for relevant code (in most cases in the same line), and there are 
CHECK-FIXES for all of them.

And if there is no CHECK-MESSAGES line then `FileCheck 
-implicit-check-not='warning|error' -check-prefix=CHECK-MESSAGES` (used in the 
check_clang_tidy.py) will fail if there is an unaccounted (i.e. not covered by 
a CHECK-MESSAGES: pattern) warning.

The assumption won't work if the check makes distant fixes (for example, if it 
inserts #include directives, there should be a CHECK-FIXES line for that).

Thus there's not much value in checking these lines are left unmodified. But in 
case there is a warning which should contain no fix, checking that the code was 
not modified by clang-tidy is quite useful.

We could make this more explicit in different ways:
1. Run FileCheck on the result of `diff original-file 
file-processed-by-clang-tidy`. That may make tests much more verbose.
2. Run FileCheck on the result of `clang-tidy -export-fixes`. That would make 
the tests both more verbose and harder to verify for sanity (applying textual 
replacements with given byte offsets is not what a human brain does well).

Better ideas are welcome.


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-01 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added a comment.

https://reviews.llvm.org/rCTE331297


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-01 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis closed this revision.
zinovy.nis added a comment.

Fixed in  git-svn-id: 
https://llvm.org/svn/llvm-project/clang-tools-extra/trunk@331297 
91177308-0d34-0410-b5e6-96231b3b80d8


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-01 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis marked an inline comment as done.
zinovy.nis added inline comments.



Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 

LegalizeAdulthood wrote:
> Consider adding CHECK-FIXES lines for the other test cases that should remain 
> unmolested by the check.
I'm going to make a separate commit for that, OK?


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-01 Thread Richard via Phabricator via cfe-commits
LegalizeAdulthood added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:111
+  // Upper ASCII are disallowed too.
+  for (unsigned char C = 0xFFu; C >= 0x80u; --C)
+DisallowedChars.set(C);

zinovy.nis wrote:
> LegalizeAdulthood wrote:
> > Why does this loop go down instead of up?  I would have expected a more 
> > conventional
> > 
> > for (unsigned char C = 0x80u; C <= 0xFFu; ++C)
> The only reason is that `++C` for `C==0xFFu` is `0` so `C <= 0xFF` is always 
> satisfied leading to inifinite loop.
OK, makes sense.  I rarely write loops that iterate over character types.


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-01 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:111
+  // Upper ASCII are disallowed too.
+  for (unsigned char C = 0xFFu; C >= 0x80u; --C)
+DisallowedChars.set(C);

LegalizeAdulthood wrote:
> Why does this loop go down instead of up?  I would have expected a more 
> conventional
> 
> for (unsigned char C = 0x80u; C <= 0xFFu; ++C)
The only reason is that `++C` for `C==0xFFu` is `0` so `C <= 0xFF` is always 
satisfied leading to inifinite loop.


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-01 Thread Richard via Phabricator via cfe-commits
LegalizeAdulthood accepted this revision.
LegalizeAdulthood added a comment.
This revision is now accepted and ready to land.

Other than a few minor nits, ship it.




Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:110
+
+  // Upper ASCII are disallowed too.
+  for (unsigned char C = 0xFFu; C >= 0x80u; --C)

"Upper ASCII" is a misnomer.  ASCII only defines character codes 0-127.  
Non-ASCII codes are what is being described here.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:111
+  // Upper ASCII are disallowed too.
+  for (unsigned char C = 0xFFu; C >= 0x80u; --C)
+DisallowedChars.set(C);

Why does this loop go down instead of up?  I would have expected a more 
conventional

for (unsigned char C = 0x80u; C <= 0xFFu; ++C)



Comment at: clang-tidy/modernize/RawStringLiteralCheck.h:39
   std::string DelimiterStem;
+  std::bitset<1 << CHAR_BIT> DisallowedChars;
   const bool ReplaceShorterLiterals;

Consider introducing a typedef for this ugly type name.



Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 

Consider adding CHECK-FIXES lines for the other test cases that should remain 
unmolested by the check.


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-05-01 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added a comment.

Gentle ping.


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-27 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added a comment.

Aaron, any comments for the new revision?


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-26 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis updated this revision to Diff 144066.
zinovy.nis added a comment.

- Optimized `containsEscapedCharacters` not to re-create `bitset` 
(implicitly in `StringRef::find_first_of`) for each literal.
- Merged 2 passes for testing for allowed chars into a single one.


https://reviews.llvm.org/D45932

Files:
  clang-tidy/modernize/RawStringLiteralCheck.cpp
  clang-tidy/modernize/RawStringLiteralCheck.h
  test/clang-tidy/modernize-raw-string-literal.cpp

Index: test/clang-tidy/modernize-raw-string-literal.cpp
===
--- test/clang-tidy/modernize-raw-string-literal.cpp
+++ test/clang-tidy/modernize-raw-string-literal.cpp
@@ -40,6 +40,8 @@
 char const *const Us("goink\\\037");
 char const *const HexNonPrintable("\\\x03");
 char const *const Delete("\\\177");
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 
 char const *const TrailingSpace("A line \\with space. \n");
 char const *const TrailingNewLine("A single \\line.\n");
Index: clang-tidy/modernize/RawStringLiteralCheck.h
===
--- clang-tidy/modernize/RawStringLiteralCheck.h
+++ clang-tidy/modernize/RawStringLiteralCheck.h
@@ -11,6 +11,7 @@
 #define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MODERNIZE_RAW_STRING_LITERAL_H
 
 #include "../ClangTidy.h"
+#include 
 
 namespace clang {
 namespace tidy {
@@ -35,6 +36,7 @@
   const StringLiteral *Literal, StringRef Replacement);
 
   std::string DelimiterStem;
+  std::bitset<1 << CHAR_BIT> DisallowedChars;
   const bool ReplaceShorterLiterals;
 };
 
Index: clang-tidy/modernize/RawStringLiteralCheck.cpp
===
--- clang-tidy/modernize/RawStringLiteralCheck.cpp
+++ clang-tidy/modernize/RawStringLiteralCheck.cpp
@@ -41,29 +41,16 @@
   return (QuotePos > 0) && (Text[QuotePos - 1] == 'R');
 }
 
-bool containsEscapedCharacters(const MatchFinder::MatchResult ,
-   const StringLiteral *Literal) {
+bool containsEscapedCharacters(
+const MatchFinder::MatchResult , const StringLiteral *Literal,
+const std::bitset<1 << CHAR_BIT> ) {
   // FIXME: Handle L"", u8"", u"" and U"" literals.
   if (!Literal->isAscii())
 return false;
 
-  StringRef Bytes = Literal->getBytes();
-  // Non-printing characters disqualify this literal:
-  // \007 = \a bell
-  // \010 = \b backspace
-  // \011 = \t horizontal tab
-  // \012 = \n new line
-  // \013 = \v vertical tab
-  // \014 = \f form feed
-  // \015 = \r carriage return
-  // \177 = delete
-  if (Bytes.find_first_of(StringRef("\000\001\002\003\004\005\006\a"
-"\b\t\n\v\f\r\016\017"
-"\020\021\022\023\024\025\026\027"
-"\030\031\032\033\034\035\036\037"
-"\177",
-33)) != StringRef::npos)
-return false;
+  for (const unsigned char C : Literal->getBytes())
+if (DisallowedChars.test(C))
+  return false;
 
   CharSourceRange CharRange = Lexer::makeFileCharRange(
   CharSourceRange::getTokenRange(Literal->getSourceRange()),
@@ -102,7 +89,28 @@
  ClangTidyContext *Context)
 : ClangTidyCheck(Name, Context),
   DelimiterStem(Options.get("DelimiterStem", "lit")),
-  ReplaceShorterLiterals(Options.get("ReplaceShorterLiterals", false)) {}
+  ReplaceShorterLiterals(Options.get("ReplaceShorterLiterals", false)) {
+  // Non-printing characters are disallowed:
+  // \007 = \a bell
+  // \010 = \b backspace
+  // \011 = \t horizontal tab
+  // \012 = \n new line
+  // \013 = \v vertical tab
+  // \014 = \f form feed
+  // \015 = \r carriage return
+  // \177 = delete
+  for (const unsigned char C : StringRef("\000\001\002\003\004\005\006\a"
+ "\b\t\n\v\f\r\016\017"
+ "\020\021\022\023\024\025\026\027"
+ "\030\031\032\033\034\035\036\037"
+ "\177",
+ 33))
+DisallowedChars.set(C);
+
+  // Upper ASCII are disallowed too.
+  for (unsigned char C = 0xFFu; C >= 0x80u; --C)
+DisallowedChars.set(C);
+}
 
 void RawStringLiteralCheck::storeOptions(ClangTidyOptions::OptionMap ) {
   ClangTidyCheck::storeOptions(Options);
@@ -124,7 +132,7 @@
   if (Literal->getLocStart().isMacroID())
 return;
 
-  if (containsEscapedCharacters(Result, Literal)) {
+  if (containsEscapedCharacters(Result, Literal, DisallowedChars)) {
 std::string Replacement = asRawStringLiteral(Literal, DelimiterStem);
 if (ReplaceShorterLiterals ||
 Replacement.length() <=
___
cfe-commits mailing list

[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;

aaron.ballman wrote:
> zinovy.nis wrote:
> > aaron.ballman wrote:
> > > zinovy.nis wrote:
> > > > aaron.ballman wrote:
> > > > > I am starting to think that this functionality should be refactored 
> > > > > because the check is now O(N^2) in the worst case because all of the 
> > > > > bytes of the string need to be touched twice. It would be nice for 
> > > > > performance reasons to combine this so that there's only a single 
> > > > > pass over all of the characters.
> > > > > 
> > > > > What do you think?
> > > > Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. 
> > > > `find_if_not` is O(N).
> > > The `find_if_not()` call you add touches every character in the string to 
> > > see if it's ASCII and it all characters are ASCII, then 
> > > `containsEscape()` calls `find()` which touches every character again 
> > > (assuming the searched character is never encountered).
> > > 
> > > Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it 
> > > asserts that the character is found, so not every character is touched in 
> > > the string and it should find a the searched character quite quickly.
> > OK, I'll see how to combine theses checks into a single one.
> > 
> > But anyway I see only 2*O(N), not O(N^2) here.
> Oh, derp, that's my thinko -- sorry! You are correct, that's 2 * O(N) and not 
> O(N^2).
May be we can use here [[ http://en.cppreference.com/w/cpp/string/byte/isprint 
| std::isprint ]]


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;

zinovy.nis wrote:
> aaron.ballman wrote:
> > zinovy.nis wrote:
> > > aaron.ballman wrote:
> > > > I am starting to think that this functionality should be refactored 
> > > > because the check is now O(N^2) in the worst case because all of the 
> > > > bytes of the string need to be touched twice. It would be nice for 
> > > > performance reasons to combine this so that there's only a single pass 
> > > > over all of the characters.
> > > > 
> > > > What do you think?
> > > Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. 
> > > `find_if_not` is O(N).
> > The `find_if_not()` call you add touches every character in the string to 
> > see if it's ASCII and it all characters are ASCII, then `containsEscape()` 
> > calls `find()` which touches every character again (assuming the searched 
> > character is never encountered).
> > 
> > Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it 
> > asserts that the character is found, so not every character is touched in 
> > the string and it should find a the searched character quite quickly.
> OK, I'll see how to combine theses checks into a single one.
> 
> But anyway I see only 2*O(N), not O(N^2) here.
Oh, derp, that's my thinko -- sorry! You are correct, that's 2 * O(N) and not 
O(N^2).


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;

aaron.ballman wrote:
> zinovy.nis wrote:
> > aaron.ballman wrote:
> > > I am starting to think that this functionality should be refactored 
> > > because the check is now O(N^2) in the worst case because all of the 
> > > bytes of the string need to be touched twice. It would be nice for 
> > > performance reasons to combine this so that there's only a single pass 
> > > over all of the characters.
> > > 
> > > What do you think?
> > Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. 
> > `find_if_not` is O(N).
> The `find_if_not()` call you add touches every character in the string to see 
> if it's ASCII and it all characters are ASCII, then `containsEscape()` calls 
> `find()` which touches every character again (assuming the searched character 
> is never encountered).
> 
> Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it 
> asserts that the character is found, so not every character is touched in the 
> string and it should find a the searched character quite quickly.
OK, I'll see how to combine theses checks into a single one.

But anyway I see only 2*O(N), not O(N^2) here.


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;

zinovy.nis wrote:
> aaron.ballman wrote:
> > I am starting to think that this functionality should be refactored because 
> > the check is now O(N^2) in the worst case because all of the bytes of the 
> > string need to be touched twice. It would be nice for performance reasons 
> > to combine this so that there's only a single pass over all of the 
> > characters.
> > 
> > What do you think?
> Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. 
> `find_if_not` is O(N).
The `find_if_not()` call you add touches every character in the string to see 
if it's ASCII and it all characters are ASCII, then `containsEscape()` calls 
`find()` which touches every character again (assuming the searched character 
is never encountered).

Looking a bit deeper, `isRawStringLiteral()` also calls `find()`, but it 
asserts that the character is found, so not every character is touched in the 
string and it should find a the searched character quite quickly.


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;

aaron.ballman wrote:
> I am starting to think that this functionality should be refactored because 
> the check is now O(N^2) in the worst case because all of the bytes of the 
> string need to be touched twice. It would be nice for performance reasons to 
> combine this so that there's only a single pass over all of the characters.
> 
> What do you think?
Sorry, but why O(N^2)? `isASCII` is O(1), it's just `return C<=127`. 
`find_if_not` is O(N).


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:71
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;

I am starting to think that this functionality should be refactored because the 
check is now O(N^2) in the worst case because all of the bytes of the string 
need to be touched twice. It would be nice for performance reasons to combine 
this so that there's only a single pass over all of the characters.

What do you think?


https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis updated this revision to Diff 143679.
zinovy.nis marked an inline comment as done.
zinovy.nis added a comment.

- Use clang::isASCII instead of home-brewed code.


https://reviews.llvm.org/D45932

Files:
  clang-tidy/modernize/RawStringLiteralCheck.cpp
  test/clang-tidy/modernize-raw-string-literal.cpp


Index: test/clang-tidy/modernize-raw-string-literal.cpp
===
--- test/clang-tidy/modernize-raw-string-literal.cpp
+++ test/clang-tidy/modernize-raw-string-literal.cpp
@@ -40,6 +40,8 @@
 char const *const Us("goink\\\037");
 char const *const HexNonPrintable("\\\x03");
 char const *const Delete("\\\177");
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 
 char const *const TrailingSpace("A line \\with space. \n");
 char const *const TrailingNewLine("A single \\line.\n");
Index: clang-tidy/modernize/RawStringLiteralCheck.cpp
===
--- clang-tidy/modernize/RawStringLiteralCheck.cpp
+++ clang-tidy/modernize/RawStringLiteralCheck.cpp
@@ -10,6 +10,7 @@
 #include "RawStringLiteralCheck.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/Basic/CharInfo.h"
 #include "clang/Lex/Lexer.h"
 
 using namespace clang::ast_matchers;
@@ -65,6 +66,11 @@
 33)) != StringRef::npos)
 return false;
 
+  // Don't replace chars from upper ASCII as they can represent UTF-8 or be
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;
+
   CharSourceRange CharRange = Lexer::makeFileCharRange(
   CharSourceRange::getTokenRange(Literal->getSourceRange()),
   *Result.SourceManager, Result.Context->getLangOpts());


Index: test/clang-tidy/modernize-raw-string-literal.cpp
===
--- test/clang-tidy/modernize-raw-string-literal.cpp
+++ test/clang-tidy/modernize-raw-string-literal.cpp
@@ -40,6 +40,8 @@
 char const *const Us("goink\\\037");
 char const *const HexNonPrintable("\\\x03");
 char const *const Delete("\\\177");
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 
 char const *const TrailingSpace("A line \\with space. \n");
 char const *const TrailingNewLine("A single \\line.\n");
Index: clang-tidy/modernize/RawStringLiteralCheck.cpp
===
--- clang-tidy/modernize/RawStringLiteralCheck.cpp
+++ clang-tidy/modernize/RawStringLiteralCheck.cpp
@@ -10,6 +10,7 @@
 #include "RawStringLiteralCheck.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/Basic/CharInfo.h"
 #include "clang/Lex/Lexer.h"
 
 using namespace clang::ast_matchers;
@@ -65,6 +66,11 @@
 33)) != StringRef::npos)
 return false;
 
+  // Don't replace chars from upper ASCII as they can represent UTF-8 or be
+  // supported by specific code pages only.
+  if (Bytes.find_if_not(isASCII) != StringRef::npos)
+return false;
+
   CharSourceRange CharRange = Lexer::makeFileCharRange(
   CharSourceRange::getTokenRange(Literal->getSourceRange()),
   *Result.SourceManager, Result.Context->getLangOpts());
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-24 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis marked an inline comment as done.
zinovy.nis added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:70-72
+  if (Bytes.find_if([](char C) {
+return static_cast(C) > 0x7Fu;
+  }) != StringRef::npos)

aaron.ballman wrote:
> I think you can use `isASCII()` from CharInfo.h rather than reimplement it.
Nice finding! Thanks!



Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 

LegalizeAdulthood wrote:
> IIRC, the default behavior is that if no matching CHECK-FIXES line is found, 
> then it is considered a failure.  Have you tried your test code without your 
> change to verify that this is the case?
1. Without my fix my test fails with a Python decoder error as it cannot print 
Unicode symbols in Windows console. But no FileCheck errors occur:


```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 974: 
ordinal not in range(128)
```


2. Regardless of my changes when I replace `"char const *const 
Delete("\\\177");"` with `"char const *const Delete("\000\\\177");"` (leading 
`\0`) test still passes! Looks like CHECK-FIXes must be explicit.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-23 Thread Richard via Phabricator via cfe-commits
LegalizeAdulthood added inline comments.



Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:44
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 

IIRC, the default behavior is that if no matching CHECK-FIXES line is found, 
then it is considered a failure.  Have you tried your test code without your 
change to verify that this is the case?


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-23 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:70-72
+  if (Bytes.find_if([](char C) {
+return static_cast(C) > 0x7Fu;
+  }) != StringRef::npos)

I think you can use `isASCII()` from CharInfo.h rather than reimplement it.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-22 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis added inline comments.



Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:42
 char const *const HexNonPrintable("\\\x03");
 char const *const Delete("\\\177");
+char const *const MultibyteSnowman("\xE2\x98\x83");

By the way, AFAIK the lines above are not checked for fixes (and fixes absence) 
at all! Is it correct?


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D45932: [clang-tidy][modernize-raw-string-literal] Don't replace upper ASCII with raw literals

2018-04-22 Thread Zinovy Nis via Phabricator via cfe-commits
zinovy.nis created this revision.
zinovy.nis added reviewers: xazax.hun, LegalizeAdulthood.
zinovy.nis added a project: clang-tools-extra.
Herald added subscribers: cfe-commits, rnkovacs.

It's useless and not safe to replace

`"\xE2\x98\x83"` with `"☃"` (snowman)

Especially there's an explicit test for ASCII in this check.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45932

Files:
  clang-tidy/modernize/RawStringLiteralCheck.cpp
  test/clang-tidy/modernize-raw-string-literal.cpp


Index: test/clang-tidy/modernize-raw-string-literal.cpp
===
--- test/clang-tidy/modernize-raw-string-literal.cpp
+++ test/clang-tidy/modernize-raw-string-literal.cpp
@@ -40,6 +40,8 @@
 char const *const Us("goink\\\037");
 char const *const HexNonPrintable("\\\x03");
 char const *const Delete("\\\177");
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 
 char const *const TrailingSpace("A line \\with space. \n");
 char const *const TrailingNewLine("A single \\line.\n");
Index: clang-tidy/modernize/RawStringLiteralCheck.cpp
===
--- clang-tidy/modernize/RawStringLiteralCheck.cpp
+++ clang-tidy/modernize/RawStringLiteralCheck.cpp
@@ -65,6 +65,13 @@
 33)) != StringRef::npos)
 return false;
 
+  // Don't replace chars from upper ASCII (>0x7F) as they can represent UTF-8
+  // or be supported by specific code pages only.
+  if (Bytes.find_if([](char C) {
+return static_cast(C) > 0x7Fu;
+  }) != StringRef::npos)
+return false;
+
   CharSourceRange CharRange = Lexer::makeFileCharRange(
   CharSourceRange::getTokenRange(Literal->getSourceRange()),
   *Result.SourceManager, Result.Context->getLangOpts());


Index: test/clang-tidy/modernize-raw-string-literal.cpp
===
--- test/clang-tidy/modernize-raw-string-literal.cpp
+++ test/clang-tidy/modernize-raw-string-literal.cpp
@@ -40,6 +40,8 @@
 char const *const Us("goink\\\037");
 char const *const HexNonPrintable("\\\x03");
 char const *const Delete("\\\177");
+char const *const MultibyteSnowman("\xE2\x98\x83");
+// CHECK-FIXES: {{^}}char const *const MultibyteSnowman("\xE2\x98\x83");{{$}}
 
 char const *const TrailingSpace("A line \\with space. \n");
 char const *const TrailingNewLine("A single \\line.\n");
Index: clang-tidy/modernize/RawStringLiteralCheck.cpp
===
--- clang-tidy/modernize/RawStringLiteralCheck.cpp
+++ clang-tidy/modernize/RawStringLiteralCheck.cpp
@@ -65,6 +65,13 @@
 33)) != StringRef::npos)
 return false;
 
+  // Don't replace chars from upper ASCII (>0x7F) as they can represent UTF-8
+  // or be supported by specific code pages only.
+  if (Bytes.find_if([](char C) {
+return static_cast(C) > 0x7Fu;
+  }) != StringRef::npos)
+return false;
+
   CharSourceRange CharRange = Lexer::makeFileCharRange(
   CharSourceRange::getTokenRange(Literal->getSourceRange()),
   *Result.SourceManager, Result.Context->getLangOpts());
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits