[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-04-10 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added inline comments.



Comment at: lib/Format/FormatTokenLexer.cpp:371
   FormatToken *Macro = Tokens[Tokens.size() - 4];
-  if (Macro->TokenText != "_T")
+  if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) 
==
+  Style.TMacros.end())

obfuscated wrote:
> alexfh wrote:
> > Consider using llvm::find, which is a range adaptor for std::find.
> No idea what is range adaptor, but I can probably switch to it... :)
It works with a range (something that has .begin() and .end()) instead of two 
iterators.



Comment at: lib/Format/FormatTokenLexer.cpp:386
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 

obfuscated wrote:
> alexfh wrote:
> > obfuscated wrote:
> > > krasimir wrote:
> > > > In the original code, TMacro detection was done as:
> > > > ```
> > > > (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))
> > > > ```
> > > > In the new code the left part is saved in TMacroStringLiteral, and the 
> > > > right part is checked in ContinuationIndenter. Could you keep them 
> > > > together in `FormatTokenLexer`.
> > > > @alexfh, why are we checking for the right check at all? What would be 
> > > > an example where this is needed to disambiguate?
> > > > 
> > > Are you talking about the code in 
> > > ContinuationIndenter::createBreakableToken?
> > > 
> > > I don't think I understand how the hole thing works.
> > > Using the debugger I can see that this function is executed first and 
> > > then createBreakableToken.
> > > So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and 
> > > then use this information in the createBreakableToken to do something 
> > > with it.
> > > 
> > > So when I get a TMacroStringLiteral==true in createBreakableToken I know 
> > > that the token starts with a TMacro and there is no need to use some king 
> > > of startswith calls. Probably the endswith call is redundant and I can do 
> > > just a string search backwards...
> > > In the new code the left part is saved in TMacroStringLiteral, and the 
> > > right part is checked in ContinuationIndenter. Could you keep them 
> > > together in FormatTokenLexer.
> > 
> > +1 to keeping these conditions together.
> > 
> > > @alexfh, why are we checking for the right check at all? What would be an 
> > > example where this is needed to disambiguate?
> > 
> > I don't know whether there's reasonable code that would be handled 
> > incorrectly without checking for the `")`, but I'm sure some test case can 
> > be generated, that would break clang-format badly without this condition. 
> > It also serves as a documentation (similar to asserts).
> >> In the new code the left part is saved in TMacroStringLiteral, and the 
> >> right part is checked in ContinuationIndenter. Could you keep them 
> >> together in FormatTokenLexer.
> >
> > +1 to keeping these conditions together.
> 
> Can you please explain what you mean here?
> And what am I supposed to do, because I don't know how to put these 
> conditions together.
> Do you want me to search the tmacro vector in 
> ContinuationIndenter::createBreakableToken instead of checking the bool flag?
> 
> 
> 
Looking at this code again, I see what the checks for `_T("` and `")` were 
needed for. If there was a space between `_T(` and `"` or between `"` and `)`, 
the code in `createBreakableToken()` would probably not work correctly (see the 
FIXME). `tryMerge_TMacro()` looks at tokens and doesn't care about the 
whitespace, so checking for `TMacroStringLiteral` is not equivalent to checking 
for `("` and `")`. In particular, if you only remove just the first 
check, a test case like `_T  (   "...")` may start being formatted in some way. 
We could remove the second check as well, if we manage to make formatting 
decent. Otherwise, performing these checks (that there are no spaces between 
`_T`, `(`, `"..."` and `)`) in `tryMerge_TMacro()` may be a better option than 
doing this in `createBreakableToken()`.


https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-04-10 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated marked an inline comment as done.
obfuscated added a comment.

Ping?


https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-04-04 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated marked 3 inline comments as done.
obfuscated added inline comments.



Comment at: lib/Format/FormatTokenLexer.cpp:371
   FormatToken *Macro = Tokens[Tokens.size() - 4];
-  if (Macro->TokenText != "_T")
+  if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) 
==
+  Style.TMacros.end())

alexfh wrote:
> Consider using llvm::find, which is a range adaptor for std::find.
No idea what is range adaptor, but I can probably switch to it... :)



Comment at: lib/Format/FormatTokenLexer.cpp:386
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 

alexfh wrote:
> obfuscated wrote:
> > krasimir wrote:
> > > In the original code, TMacro detection was done as:
> > > ```
> > > (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))
> > > ```
> > > In the new code the left part is saved in TMacroStringLiteral, and the 
> > > right part is checked in ContinuationIndenter. Could you keep them 
> > > together in `FormatTokenLexer`.
> > > @alexfh, why are we checking for the right check at all? What would be an 
> > > example where this is needed to disambiguate?
> > > 
> > Are you talking about the code in 
> > ContinuationIndenter::createBreakableToken?
> > 
> > I don't think I understand how the hole thing works.
> > Using the debugger I can see that this function is executed first and then 
> > createBreakableToken.
> > So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and then 
> > use this information in the createBreakableToken to do something with it.
> > 
> > So when I get a TMacroStringLiteral==true in createBreakableToken I know 
> > that the token starts with a TMacro and there is no need to use some king 
> > of startswith calls. Probably the endswith call is redundant and I can do 
> > just a string search backwards...
> > In the new code the left part is saved in TMacroStringLiteral, and the 
> > right part is checked in ContinuationIndenter. Could you keep them together 
> > in FormatTokenLexer.
> 
> +1 to keeping these conditions together.
> 
> > @alexfh, why are we checking for the right check at all? What would be an 
> > example where this is needed to disambiguate?
> 
> I don't know whether there's reasonable code that would be handled 
> incorrectly without checking for the `")`, but I'm sure some test case can be 
> generated, that would break clang-format badly without this condition. It 
> also serves as a documentation (similar to asserts).
>> In the new code the left part is saved in TMacroStringLiteral, and the 
>> right part is checked in ContinuationIndenter. Could you keep them together 
>> in FormatTokenLexer.
>
> +1 to keeping these conditions together.

Can you please explain what you mean here?
And what am I supposed to do, because I don't know how to put these conditions 
together.
Do you want me to search the tmacro vector in 
ContinuationIndenter::createBreakableToken instead of checking the bool flag?





https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-04-04 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added inline comments.



Comment at: include/clang/Format/Format.h:1676
 
+  /// \brief A vector of macros that should be interpreted as macros expanding
+  /// to a string literal encoding prefix instead of as function calls.

"A list of macro names"?



Comment at: include/clang/Format/Format.h:1686-1689
+  /// These are expected to be macros of the form:
+  /// \code
+  ///   _T("...some string...")
+  /// \endcode

According to the description above, the option contains "a vector of macros", 
and here it says "these are expected to be macros of the form ...". This all is 
rather confusing. The above description should refer to names of macros, and 
this example should be made clearer in what it demonstrates.



Comment at: lib/Format/FormatTokenLexer.cpp:371
   FormatToken *Macro = Tokens[Tokens.size() - 4];
-  if (Macro->TokenText != "_T")
+  if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) 
==
+  Style.TMacros.end())

Consider using llvm::find, which is a range adaptor for std::find.



Comment at: lib/Format/FormatTokenLexer.cpp:386
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 

obfuscated wrote:
> krasimir wrote:
> > In the original code, TMacro detection was done as:
> > ```
> > (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))
> > ```
> > In the new code the left part is saved in TMacroStringLiteral, and the 
> > right part is checked in ContinuationIndenter. Could you keep them together 
> > in `FormatTokenLexer`.
> > @alexfh, why are we checking for the right check at all? What would be an 
> > example where this is needed to disambiguate?
> > 
> Are you talking about the code in ContinuationIndenter::createBreakableToken?
> 
> I don't think I understand how the hole thing works.
> Using the debugger I can see that this function is executed first and then 
> createBreakableToken.
> So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and then 
> use this information in the createBreakableToken to do something with it.
> 
> So when I get a TMacroStringLiteral==true in createBreakableToken I know that 
> the token starts with a TMacro and there is no need to use some king of 
> startswith calls. Probably the endswith call is redundant and I can do just a 
> string search backwards...
> In the new code the left part is saved in TMacroStringLiteral, and the right 
> part is checked in ContinuationIndenter. Could you keep them together in 
> FormatTokenLexer.

+1 to keeping these conditions together.

> @alexfh, why are we checking for the right check at all? What would be an 
> example where this is needed to disambiguate?

I don't know whether there's reasonable code that would be handled incorrectly 
without checking for the `")`, but I'm sure some test case can be generated, 
that would break clang-format badly without this condition. It also serves as a 
documentation (similar to asserts).


https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-29 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated added inline comments.



Comment at: lib/Format/FormatTokenLexer.cpp:386
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 

krasimir wrote:
> In the original code, TMacro detection was done as:
> ```
> (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))
> ```
> In the new code the left part is saved in TMacroStringLiteral, and the right 
> part is checked in ContinuationIndenter. Could you keep them together in 
> `FormatTokenLexer`.
> @alexfh, why are we checking for the right check at all? What would be an 
> example where this is needed to disambiguate?
> 
Are you talking about the code in ContinuationIndenter::createBreakableToken?

I don't think I understand how the hole thing works.
Using the debugger I can see that this function is executed first and then 
createBreakableToken.
So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and then use 
this information in the createBreakableToken to do something with it.

So when I get a TMacroStringLiteral==true in createBreakableToken I know that 
the token starts with a TMacro and there is no need to use some king of 
startswith calls. Probably the endswith call is redundant and I can do just a 
string search backwards...


https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-29 Thread Krasimir Georgiev via Phabricator via cfe-commits
krasimir added inline comments.



Comment at: lib/Format/FormatTokenLexer.cpp:386
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 

In the original code, TMacro detection was done as:
```
(Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))
```
In the new code the left part is saved in TMacroStringLiteral, and the right 
part is checked in ContinuationIndenter. Could you keep them together in 
`FormatTokenLexer`.
@alexfh, why are we checking for the right check at all? What would be an 
example where this is needed to disambiguate?



https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-29 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated updated this revision to Diff 140191.
obfuscated marked an inline comment as done.
obfuscated added a comment.

Fixed tests. Fixed description.


https://reviews.llvm.org/D44765

Files:
  include/clang/Format/Format.h
  lib/Format/ContinuationIndenter.cpp
  lib/Format/Format.cpp
  lib/Format/FormatToken.h
  lib/Format/FormatTokenLexer.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -7935,6 +7935,48 @@
"_T(\"Xn\"));"));
 }
 
+TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) {
+  FormatStyle Style = getLLVMStyleWithColumns(25);
+  Style.TMacros.push_back("blablaT");
+  EXPECT_EQ(
+  "blablaT(\"aa\")\n"
+  "blablaT(\"aa\")\n"
+  "blablaT(\"\")",
+  format("  blablaT(\"\")", Style));
+  EXPECT_EQ("f(x,\n"
+"  blablaT(\"\")\n"
+"  blablaT(\"aaa\"),\n"
+"  z);",
+format("f(x, blablaT(\"aaa\"), z);", Style));
+
+  // FIXME: Handle embedded spaces in one iteration.
+  //  EXPECT_EQ("blablaT(\"a\")\n"
+  //"blablaT(\"a\")\n"
+  //"blablaT(\"a\")\n"
+  //"blablaT(\"a\")",
+  //format("  blablaT ( \"\" )",
+  //   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ(
+  "blablaT ( \"\" )",
+  format("  blablaT ( \"\" )", Style));
+  EXPECT_EQ("f(\n"
+"#if !TEST\n"
+"blablaT(\"Xn\")\n"
+"#endif\n"
+");",
+format("f(\n"
+   "#if !TEST\n"
+   "blablaT(\"Xn\")\n"
+   "#endif\n"
+   ");"));
+  EXPECT_EQ("f(\n"
+"\n"
+"blablaT(\"Xn\"));",
+format("f(\n"
+   "\n"
+   "blablaT(\"Xn\"));"));
+}
+
 TEST_F(FormatTest, BreaksStringLiteralOperands) {
   // In a function call with two operands, the second can be broken with no line
   // break before it.
@@ -10647,6 +10689,10 @@
   "  - 'CPPEVAL'\n"
   "CanonicalDelimiter: 'cc'",
   RawStringFormats, ExpectedRawStringFormats);
+
+  Style.TMacros.clear();
+  std::vector ExpectedTMacros = {"_T", "myT"};
+  CHECK_PARSE("TMacros: [_T, myT]", TMacros, ExpectedTMacros);
 }
 
 TEST_F(FormatTest, ParsesConfigurationWithLanguages) {
Index: lib/Format/FormatTokenLexer.cpp
===
--- lib/Format/FormatTokenLexer.cpp
+++ lib/Format/FormatTokenLexer.cpp
@@ -368,7 +368,8 @@
 return false;
 
   FormatToken *Macro = Tokens[Tokens.size() - 4];
-  if (Macro->TokenText != "_T")
+  if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) ==
+  Style.TMacros.end())
 return false;
 
   const char *Start = Macro->TokenText.data();
@@ -382,6 +383,7 @@
   String->TokenText, String->OriginalColumn, Style.TabWidth, Encoding);
   String->NewlinesBefore = Macro->NewlinesBefore;
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 
   Tokens.pop_back();
   Tokens.pop_back();
Index: lib/Format/FormatToken.h
===
--- lib/Format/FormatToken.h
+++ lib/Format/FormatToken.h
@@ -134,6 +134,9 @@
   /// Token.
   bool HasUnescapedNewline = false;
 
+  /// \brief Whether this is a string literal similar to _T("sdfsdf").
+  bool TMacroStringLiteral = false;
+
   /// \brief The range of the whitespace immediately preceding the \c Token.
   SourceRange WhitespaceRange;
 
Index: lib/Format/Format.cpp
===
--- lib/Format/Format.cpp
+++ lib/Format/Format.cpp
@@ -430,6 +430,7 @@
 IO.mapOptional("SpacesInSquareBrackets", Style.SpacesInSquareBrackets);
 IO.mapOptional("Standard", Style.Standard);
 IO.mapOptional("TabWidth", Style.TabWidth);
+IO.mapOptional("TMacros", Style.TMacros);
 IO.mapOptional("UseTab", Style.UseTab);
   }
 };
@@ -686,6 +687,7 @@
   LLVMStyle.DisableFormat = false;
   LLVMStyle.SortIncludes = true;
   LLVMStyle.SortUsingDeclarations = true;
+  LLVMStyle.TMacros.push_back("_T");
 
   return LLVMStyle;
 }
Index: lib/Format/ContinuationIndenter.cpp
===
--- lib/Format/ContinuationIndenter.cpp
+++ lib/Format/ContinuationIndenter.cpp
@@ -1599,12 +1599,10 @@
 // FIXME: Store Prefix and Suffix (or 

[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-27 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated marked an inline comment as done.
obfuscated added inline comments.



Comment at: lib/Format/FormatToken.h:138
+  /// \brief Whether this is a string literal similar to _T("sdfsdf").
+  bool TMacroStringLiteral = false;
+

krasimir wrote:
> We don't strictly need this new field. We could do as in the old 
> implementation and check if the string prefix is from a T macro in 
> ContinuationIndenter.
Using a flag is more reliable and faster - the checks are done once, so they 
don't have to be repeated.

From the field layout in the struct and from the usage of bools I could only 
conclude that this is not a memory nor performance critical part of the code.




Comment at: unittests/Format/FormatTest.cpp:10693
+
+  // FIXME: This is required because parsing a configuration simply overwrites
+  // the first N elements of the list instead of resetting it.

krasimir wrote:
> Why is the `FIXME` here? I suggest just use the pattern similar to the other 
> cases here and just keep the test with 2 elements:
> ```
> Style.TMacros.clear();
> std::vector ExpectedTMacros = {"_T", "myT"};
> CHECK_PARSE("TMacros: [_T, myT]", TMacros, ExpectedTMacros);
> ```
I've copy pasted this from the foreach macro option.
I've not investigate why this fixme is there.


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-27 Thread Krasimir Georgiev via Phabricator via cfe-commits
krasimir added a comment.

I don't have preferences over names, but I agree with Alex that the option 
should have more detailed description.




Comment at: lib/Format/FormatToken.h:138
+  /// \brief Whether this is a string literal similar to _T("sdfsdf").
+  bool TMacroStringLiteral = false;
+

We don't strictly need this new field. We could do as in the old implementation 
and check if the string prefix is from a T macro in ContinuationIndenter.



Comment at: unittests/Format/FormatTest.cpp:10693
+
+  // FIXME: This is required because parsing a configuration simply overwrites
+  // the first N elements of the list instead of resetting it.

Why is the `FIXME` here? I suggest just use the pattern similar to the other 
cases here and just keep the test with 2 elements:
```
Style.TMacros.clear();
std::vector ExpectedTMacros = {"_T", "myT"};
CHECK_PARSE("TMacros: [_T, myT]", TMacros, ExpectedTMacros);
```


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-26 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh requested changes to this revision.
alexfh added a comment.
This revision now requires changes to proceed.

Thank you! One more comment from me and I'll leave the rest of the review to 
Krasimir, who has a better idea of how the configuration options should be 
named etc.




Comment at: include/clang/Format/Format.h:1676-1677
 
+  /// \brief A vector of macros that should be interpreted as string wrapping
+  /// macros instead of as function calls.
+  ///

I'm not sure "string wrapping macros" is the right term or can be easily 
understood by the reader of the code. The relevant MSDN page is talking about 
"generic-text mapping". I'm not sure this is a commonly used term either, but 
at least there's a precedent. Another relevant term is "encoding prefix". With 
these in mind, I could suggest following descriptions: "generic-text mapping 
macros", "string literal encoding prefix macros", "macros expanding to a string 
literal encoding prefix".

I'd also expand this a bit explaining where these macros are coming from and 
how they are special. Something along the lines of:

"Some libraries provide macro(s) to change the encoding prefix of string 
literals depending on configuration, for example, `_T()` macro on Microsoft 
platforms. When splitting string literals, the macro should be applied to each 
fragment of the literal to apply the same encoding prefix to all of them, which 
requires special treatment from clang-format. This option lists the names of 
these special macros."


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-23 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated updated this revision to Diff 139687.
obfuscated added a comment.

Attached both commits in a single diff...


Repository:
  rC Clang

https://reviews.llvm.org/D44765

Files:
  include/clang/Format/Format.h
  lib/Format/ContinuationIndenter.cpp
  lib/Format/Format.cpp
  lib/Format/FormatToken.h
  lib/Format/FormatTokenLexer.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -7935,6 +7935,48 @@
"_T(\"Xn\"));"));
 }
 
+TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) {
+  FormatStyle Style = getLLVMStyleWithColumns(25);
+  Style.TMacros.push_back("blablaT");
+  EXPECT_EQ(
+  "blablaT(\"aa\")\n"
+  "blablaT(\"aa\")\n"
+  "blablaT(\"\")",
+  format("  blablaT(\"\")", Style));
+  EXPECT_EQ("f(x,\n"
+"  blablaT(\"\")\n"
+"  blablaT(\"aaa\"),\n"
+"  z);",
+format("f(x, blablaT(\"aaa\"), z);", Style));
+
+  // FIXME: Handle embedded spaces in one iteration.
+  //  EXPECT_EQ("blablaT(\"a\")\n"
+  //"blablaT(\"a\")\n"
+  //"blablaT(\"a\")\n"
+  //"blablaT(\"a\")",
+  //format("  blablaT ( \"\" )",
+  //   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ(
+  "blablaT ( \"\" )",
+  format("  blablaT ( \"\" )", Style));
+  EXPECT_EQ("f(\n"
+"#if !TEST\n"
+"blablaT(\"Xn\")\n"
+"#endif\n"
+");",
+format("f(\n"
+   "#if !TEST\n"
+   "blablaT(\"Xn\")\n"
+   "#endif\n"
+   ");"));
+  EXPECT_EQ("f(\n"
+"\n"
+"blablaT(\"Xn\"));",
+format("f(\n"
+   "\n"
+   "blablaT(\"Xn\"));"));
+}
+
 TEST_F(FormatTest, BreaksStringLiteralOperands) {
   // In a function call with two operands, the second can be broken with no line
   // break before it.
@@ -10647,6 +10689,17 @@
   "  - 'CPPEVAL'\n"
   "CanonicalDelimiter: 'cc'",
   RawStringFormats, ExpectedRawStringFormats);
+
+  // FIXME: This is required because parsing a configuration simply overwrites
+  // the first N elements of the list instead of resetting it.
+  Style.TMacros.clear();
+  std::vector TestTMarco;
+  TestTMarco.push_back("_T");
+  CHECK_PARSE("TMacros: [_T]", TMacros, TestTMarco);
+  std::vector TestTMarcoAndMyT;
+  TestTMarcoAndMyT.push_back("_T");
+  TestTMarcoAndMyT.push_back("myT");
+  CHECK_PARSE("TMacros: [_T, myT]", TMacros, TestTMarcoAndMyT);
 }
 
 TEST_F(FormatTest, ParsesConfigurationWithLanguages) {
Index: lib/Format/FormatTokenLexer.cpp
===
--- lib/Format/FormatTokenLexer.cpp
+++ lib/Format/FormatTokenLexer.cpp
@@ -368,7 +368,8 @@
 return false;
 
   FormatToken *Macro = Tokens[Tokens.size() - 4];
-  if (Macro->TokenText != "_T")
+  if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) ==
+  Style.TMacros.end())
 return false;
 
   const char *Start = Macro->TokenText.data();
@@ -382,6 +383,7 @@
   String->TokenText, String->OriginalColumn, Style.TabWidth, Encoding);
   String->NewlinesBefore = Macro->NewlinesBefore;
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 
   Tokens.pop_back();
   Tokens.pop_back();
Index: lib/Format/FormatToken.h
===
--- lib/Format/FormatToken.h
+++ lib/Format/FormatToken.h
@@ -134,6 +134,9 @@
   /// Token.
   bool HasUnescapedNewline = false;
 
+  /// \brief Whether this is a string literal similar to _T("sdfsdf").
+  bool TMacroStringLiteral = false;
+
   /// \brief The range of the whitespace immediately preceding the \c Token.
   SourceRange WhitespaceRange;
 
Index: lib/Format/Format.cpp
===
--- lib/Format/Format.cpp
+++ lib/Format/Format.cpp
@@ -430,6 +430,7 @@
 IO.mapOptional("SpacesInSquareBrackets", Style.SpacesInSquareBrackets);
 IO.mapOptional("Standard", Style.Standard);
 IO.mapOptional("TabWidth", Style.TabWidth);
+IO.mapOptional("TMacros", Style.TMacros);
 IO.mapOptional("UseTab", Style.UseTab);
   }
 };
@@ -686,6 +687,7 @@
   LLVMStyle.DisableFormat = false;
   LLVMStyle.SortIncludes = true;
   LLVMStyle.SortUsingDeclarations = true;
+  

[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-23 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated updated this revision to Diff 139686.
obfuscated added a comment.

Added an option


Repository:
  rC Clang

https://reviews.llvm.org/D44765

Files:
  include/clang/Format/Format.h
  lib/Format/Format.cpp
  lib/Format/FormatTokenLexer.cpp
  unittests/Format/FormatTest.cpp


Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -7937,6 +7937,7 @@
 
 TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) {
   FormatStyle Style = getLLVMStyleWithColumns(25);
+  Style.TMacros.push_back("blablaT");
   EXPECT_EQ(
   "blablaT(\"aa\")\n"
   "blablaT(\"aa\")\n"
@@ -10688,6 +10689,17 @@
   "  - 'CPPEVAL'\n"
   "CanonicalDelimiter: 'cc'",
   RawStringFormats, ExpectedRawStringFormats);
+
+  // FIXME: This is required because parsing a configuration simply overwrites
+  // the first N elements of the list instead of resetting it.
+  Style.TMacros.clear();
+  std::vector TestTMarco;
+  TestTMarco.push_back("_T");
+  CHECK_PARSE("TMacros: [_T]", TMacros, TestTMarco);
+  std::vector TestTMarcoAndMyT;
+  TestTMarcoAndMyT.push_back("_T");
+  TestTMarcoAndMyT.push_back("myT");
+  CHECK_PARSE("TMacros: [_T, myT]", TMacros, TestTMarcoAndMyT);
 }
 
 TEST_F(FormatTest, ParsesConfigurationWithLanguages) {
Index: lib/Format/FormatTokenLexer.cpp
===
--- lib/Format/FormatTokenLexer.cpp
+++ lib/Format/FormatTokenLexer.cpp
@@ -368,7 +368,8 @@
 return false;
 
   FormatToken *Macro = Tokens[Tokens.size() - 4];
-  if (Macro->TokenText.empty() || !String->TokenText.startswith("\"") || 
!String->TokenText.endswith("\""))
+  if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) 
==
+  Style.TMacros.end())
 return false;
 
   const char *Start = Macro->TokenText.data();
Index: lib/Format/Format.cpp
===
--- lib/Format/Format.cpp
+++ lib/Format/Format.cpp
@@ -430,6 +430,7 @@
 IO.mapOptional("SpacesInSquareBrackets", Style.SpacesInSquareBrackets);
 IO.mapOptional("Standard", Style.Standard);
 IO.mapOptional("TabWidth", Style.TabWidth);
+IO.mapOptional("TMacros", Style.TMacros);
 IO.mapOptional("UseTab", Style.UseTab);
   }
 };
@@ -686,6 +687,7 @@
   LLVMStyle.DisableFormat = false;
   LLVMStyle.SortIncludes = true;
   LLVMStyle.SortUsingDeclarations = true;
+  LLVMStyle.TMacros.push_back("_T");
 
   return LLVMStyle;
 }
Index: include/clang/Format/Format.h
===
--- include/clang/Format/Format.h
+++ include/clang/Format/Format.h
@@ -1673,6 +1673,22 @@
   /// \brief The number of columns used for tab stops.
   unsigned TabWidth;
 
+  /// \brief A vector of macros that should be interpreted as string wrapping
+  /// macros instead of as function calls.
+  ///
+  /// These are expected to be macros of the form:
+  /// \code
+  ///   _T("...some string...")
+  /// \endcode
+  ///
+  /// In the .clang-format configuration file, this can be configured like:
+  /// \code{.yaml}
+  ///   TMarcos: ['_T', 'myT']
+  /// \endcode
+  ///
+  /// For example: _T.
+  std::vector TMacros;
+
   /// \brief Different ways to use tab in formatting.
   enum UseTabStyle {
 /// Never use tab.
@@ -1781,7 +1797,7 @@
SpacesInParentheses == R.SpacesInParentheses &&
SpacesInSquareBrackets == R.SpacesInSquareBrackets &&
Standard == R.Standard && TabWidth == R.TabWidth &&
-   UseTab == R.UseTab;
+   TMacros == R.TMacros && UseTab == R.UseTab;
   }
 
   llvm::Optional GetLanguageStyle(LanguageKind Language) const;


Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -7937,6 +7937,7 @@
 
 TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) {
   FormatStyle Style = getLLVMStyleWithColumns(25);
+  Style.TMacros.push_back("blablaT");
   EXPECT_EQ(
   "blablaT(\"aa\")\n"
   "blablaT(\"aa\")\n"
@@ -10688,6 +10689,17 @@
   "  - 'CPPEVAL'\n"
   "CanonicalDelimiter: 'cc'",
   RawStringFormats, ExpectedRawStringFormats);
+
+  // FIXME: This is required because parsing a configuration simply overwrites
+  // the first N elements of the list instead of resetting it.
+  Style.TMacros.clear();
+  std::vector TestTMarco;
+  TestTMarco.push_back("_T");
+  CHECK_PARSE("TMacros: [_T]", TMacros, TestTMarco);
+  std::vector TestTMarcoAndMyT;
+  TestTMarcoAndMyT.push_back("_T");
+  TestTMarcoAndMyT.push_back("myT");
+  CHECK_PARSE("TMacros: [_T, myT]", TMacros, TestTMarcoAndMyT);
 }
 
 TEST_F(FormatTest, ParsesConfigurationWithLanguages) {
Index: 

[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-23 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added a comment.

In https://reviews.llvm.org/D44765#1045789, @obfuscated wrote:

> In https://reviews.llvm.org/D44765#1045373, @alexfh wrote:
>
> > We can't just treat `anything("")` like the _T macro. There should be a 
> > whitelist configurable with an option. By default only _T should be handled.
>
>
> What cases could break with this version of the patch?


It is fine to split `_T("veeery 
lng 
string")` as

  _T("veeery ")
  _T("lng ")
  _T("string")

and it's incorrect to split it like this:

  _T("veeery "
 "lng "
 "string")

since only the first string literal will get the L prefix in case when `_T(x)` 
is defined as `L ## x`. The same is valid for `wxT()` as I understand.

However, it's wrong to do so for any function or any macro which uses its 
argument in any way different from that of _T. E.g. `strlen("aaa bbb")` is not 
the same as

  strlen("aaa ")
  strlen("bbb")



> I'm fine adding an option if this is deemed acceptable.

I'd say this should be an option.


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-22 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated added a comment.

In https://reviews.llvm.org/D44765#1045373, @alexfh wrote:

> We can't just treat `anything("")` like the _T macro. There should be a 
> whitelist configurable with an option. By default only _T should be handled.


What cases could break with this version of the patch?
Or you think it is just too dangerous to have this in such generic form?
I'm fine adding an option if this is deemed acceptable.


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-22 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

You are right, I did not mean it would help with the handling of these macros; 
but it may be related on the configuration-side of things : adding an option 
for listing these macros may hit the same limitation, and the same mean of 
storing (in the code) the list of these macros may be used.


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-22 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added a comment.

In https://reviews.llvm.org/D44765#1045394, @Typz wrote:

> A generic (or at least extandable) approach to specifying macro behaviors was 
> introduced here: https://reviews.llvm.org/D33440


I believe, that patch solves a significantly different problem and it won't 
make it much easier to implement correct handling of _T(x)-like macros (which 
expand to either x or L ## x depending on some other macro and thus have to be 
repeated for each fragment of the string literal after splitting).

The most extensible solution I see is to make the list of _T-like macro 
spellings configurable via an option.


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-22 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

A generic (or at least extandable) approach to specifying macro behaviors was 
introduced here: https://reviews.llvm.org/D33440

But this change seem to be on hold, as a configuration based on providing 
"macro definitions" seem to be preferable and kind-of in the pipe, though with 
no defined timeline...


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-22 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh requested changes to this revision.
alexfh added a comment.
This revision now requires changes to proceed.

We can't just treat `anything("")` like the _T macro. There should be a 
whitelist configurable with an option. By default only _T should be handled.


Repository:
  rC Clang

https://reviews.llvm.org/D44765



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos

2018-03-21 Thread Teodor Petrov via Phabricator via cfe-commits
obfuscated created this revision.
obfuscated added a reviewer: krasimir.
Herald added a subscriber: klimek.

This patch changes the check for _T to detect TMarcos with a more generic check.
This makes it possible to format wxWidgets projects (where wxT is used a lot) 
correctly.

Patch by Teodor Petrov


Repository:
  rC Clang

https://reviews.llvm.org/D44765

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/FormatToken.h
  lib/Format/FormatTokenLexer.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -7935,6 +7935,47 @@
"_T(\"Xn\"));"));
 }
 
+TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) {
+  FormatStyle Style = getLLVMStyleWithColumns(25);
+  EXPECT_EQ(
+  "blablaT(\"aa\")\n"
+  "blablaT(\"aa\")\n"
+  "blablaT(\"\")",
+  format("  blablaT(\"\")", Style));
+  EXPECT_EQ("f(x,\n"
+"  blablaT(\"\")\n"
+"  blablaT(\"aaa\"),\n"
+"  z);",
+format("f(x, blablaT(\"aaa\"), z);", Style));
+
+  // FIXME: Handle embedded spaces in one iteration.
+  //  EXPECT_EQ("blablaT(\"a\")\n"
+  //"blablaT(\"a\")\n"
+  //"blablaT(\"a\")\n"
+  //"blablaT(\"a\")",
+  //format("  blablaT ( \"\" )",
+  //   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ(
+  "blablaT ( \"\" )",
+  format("  blablaT ( \"\" )", Style));
+  EXPECT_EQ("f(\n"
+"#if !TEST\n"
+"blablaT(\"Xn\")\n"
+"#endif\n"
+");",
+format("f(\n"
+   "#if !TEST\n"
+   "blablaT(\"Xn\")\n"
+   "#endif\n"
+   ");"));
+  EXPECT_EQ("f(\n"
+"\n"
+"blablaT(\"Xn\"));",
+format("f(\n"
+   "\n"
+   "blablaT(\"Xn\"));"));
+}
+
 TEST_F(FormatTest, BreaksStringLiteralOperands) {
   // In a function call with two operands, the second can be broken with no line
   // break before it.
Index: lib/Format/FormatTokenLexer.cpp
===
--- lib/Format/FormatTokenLexer.cpp
+++ lib/Format/FormatTokenLexer.cpp
@@ -368,7 +368,7 @@
 return false;
 
   FormatToken *Macro = Tokens[Tokens.size() - 4];
-  if (Macro->TokenText != "_T")
+  if (Macro->TokenText.empty() || !String->TokenText.startswith("\"") || !String->TokenText.endswith("\""))
 return false;
 
   const char *Start = Macro->TokenText.data();
@@ -382,6 +382,7 @@
   String->TokenText, String->OriginalColumn, Style.TabWidth, Encoding);
   String->NewlinesBefore = Macro->NewlinesBefore;
   String->HasUnescapedNewline = Macro->HasUnescapedNewline;
+  String->TMacroStringLiteral = true;
 
   Tokens.pop_back();
   Tokens.pop_back();
Index: lib/Format/FormatToken.h
===
--- lib/Format/FormatToken.h
+++ lib/Format/FormatToken.h
@@ -134,6 +134,9 @@
   /// Token.
   bool HasUnescapedNewline = false;
 
+  /// \brief Whether this is a string literal similar to _T("sdfsdf").
+  bool TMacroStringLiteral = false;
+
   /// \brief The range of the whitespace immediately preceding the \c Token.
   SourceRange WhitespaceRange;
 
Index: lib/Format/ContinuationIndenter.cpp
===
--- lib/Format/ContinuationIndenter.cpp
+++ lib/Format/ContinuationIndenter.cpp
@@ -1599,12 +1599,10 @@
 // FIXME: Store Prefix and Suffix (or PrefixLength and SuffixLength to
 // reduce the overhead) for each FormatToken, which is a string, so that we
 // don't run multiple checks here on the hot path.
-if ((Text.endswith(Postfix = "\"") &&
- (Text.startswith(Prefix = "@\"") || Text.startswith(Prefix = "\"") ||
-  Text.startswith(Prefix = "u\"") || Text.startswith(Prefix = "U\"") ||
-  Text.startswith(Prefix = "u8\"") ||
-  Text.startswith(Prefix = "L\""))) ||
-(Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))) {
+if (Text.endswith(Postfix = "\"") &&
+(Text.startswith(Prefix = "@\"") || Text.startswith(Prefix = "\"") ||
+ Text.startswith(Prefix = "u\"") || Text.startswith(Prefix = "U\"") ||
+ Text.startswith(Prefix = "u8\"") || Text.startswith(Prefix = "L\""))) {
   // We need this to address the case where there is an unbreakable tail
   // only if certain