[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
alexfh added inline comments. Comment at: lib/Format/FormatTokenLexer.cpp:371 FormatToken *Macro = Tokens[Tokens.size() - 4]; - if (Macro->TokenText != "_T") + if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) == + Style.TMacros.end()) obfuscated wrote: > alexfh wrote: > > Consider using llvm::find, which is a range adaptor for std::find. > No idea what is range adaptor, but I can probably switch to it... :) It works with a range (something that has .begin() and .end()) instead of two iterators. Comment at: lib/Format/FormatTokenLexer.cpp:386 String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; obfuscated wrote: > alexfh wrote: > > obfuscated wrote: > > > krasimir wrote: > > > > In the original code, TMacro detection was done as: > > > > ``` > > > > (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")")) > > > > ``` > > > > In the new code the left part is saved in TMacroStringLiteral, and the > > > > right part is checked in ContinuationIndenter. Could you keep them > > > > together in `FormatTokenLexer`. > > > > @alexfh, why are we checking for the right check at all? What would be > > > > an example where this is needed to disambiguate? > > > > > > > Are you talking about the code in > > > ContinuationIndenter::createBreakableToken? > > > > > > I don't think I understand how the hole thing works. > > > Using the debugger I can see that this function is executed first and > > > then createBreakableToken. > > > So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and > > > then use this information in the createBreakableToken to do something > > > with it. > > > > > > So when I get a TMacroStringLiteral==true in createBreakableToken I know > > > that the token starts with a TMacro and there is no need to use some king > > > of startswith calls. Probably the endswith call is redundant and I can do > > > just a string search backwards... > > > In the new code the left part is saved in TMacroStringLiteral, and the > > > right part is checked in ContinuationIndenter. Could you keep them > > > together in FormatTokenLexer. > > > > +1 to keeping these conditions together. > > > > > @alexfh, why are we checking for the right check at all? What would be an > > > example where this is needed to disambiguate? > > > > I don't know whether there's reasonable code that would be handled > > incorrectly without checking for the `")`, but I'm sure some test case can > > be generated, that would break clang-format badly without this condition. > > It also serves as a documentation (similar to asserts). > >> In the new code the left part is saved in TMacroStringLiteral, and the > >> right part is checked in ContinuationIndenter. Could you keep them > >> together in FormatTokenLexer. > > > > +1 to keeping these conditions together. > > Can you please explain what you mean here? > And what am I supposed to do, because I don't know how to put these > conditions together. > Do you want me to search the tmacro vector in > ContinuationIndenter::createBreakableToken instead of checking the bool flag? > > > Looking at this code again, I see what the checks for `_T("` and `")` were needed for. If there was a space between `_T(` and `"` or between `"` and `)`, the code in `createBreakableToken()` would probably not work correctly (see the FIXME). `tryMerge_TMacro()` looks at tokens and doesn't care about the whitespace, so checking for `TMacroStringLiteral` is not equivalent to checking for `("` and `")`. In particular, if you only remove just the first check, a test case like `_T ( "...")` may start being formatted in some way. We could remove the second check as well, if we manage to make formatting decent. Otherwise, performing these checks (that there are no spaces between `_T`, `(`, `"..."` and `)`) in `tryMerge_TMacro()` may be a better option than doing this in `createBreakableToken()`. https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated marked an inline comment as done. obfuscated added a comment. Ping? https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated marked 3 inline comments as done. obfuscated added inline comments. Comment at: lib/Format/FormatTokenLexer.cpp:371 FormatToken *Macro = Tokens[Tokens.size() - 4]; - if (Macro->TokenText != "_T") + if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) == + Style.TMacros.end()) alexfh wrote: > Consider using llvm::find, which is a range adaptor for std::find. No idea what is range adaptor, but I can probably switch to it... :) Comment at: lib/Format/FormatTokenLexer.cpp:386 String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; alexfh wrote: > obfuscated wrote: > > krasimir wrote: > > > In the original code, TMacro detection was done as: > > > ``` > > > (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")")) > > > ``` > > > In the new code the left part is saved in TMacroStringLiteral, and the > > > right part is checked in ContinuationIndenter. Could you keep them > > > together in `FormatTokenLexer`. > > > @alexfh, why are we checking for the right check at all? What would be an > > > example where this is needed to disambiguate? > > > > > Are you talking about the code in > > ContinuationIndenter::createBreakableToken? > > > > I don't think I understand how the hole thing works. > > Using the debugger I can see that this function is executed first and then > > createBreakableToken. > > So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and then > > use this information in the createBreakableToken to do something with it. > > > > So when I get a TMacroStringLiteral==true in createBreakableToken I know > > that the token starts with a TMacro and there is no need to use some king > > of startswith calls. Probably the endswith call is redundant and I can do > > just a string search backwards... > > In the new code the left part is saved in TMacroStringLiteral, and the > > right part is checked in ContinuationIndenter. Could you keep them together > > in FormatTokenLexer. > > +1 to keeping these conditions together. > > > @alexfh, why are we checking for the right check at all? What would be an > > example where this is needed to disambiguate? > > I don't know whether there's reasonable code that would be handled > incorrectly without checking for the `")`, but I'm sure some test case can be > generated, that would break clang-format badly without this condition. It > also serves as a documentation (similar to asserts). >> In the new code the left part is saved in TMacroStringLiteral, and the >> right part is checked in ContinuationIndenter. Could you keep them together >> in FormatTokenLexer. > > +1 to keeping these conditions together. Can you please explain what you mean here? And what am I supposed to do, because I don't know how to put these conditions together. Do you want me to search the tmacro vector in ContinuationIndenter::createBreakableToken instead of checking the bool flag? https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
alexfh added inline comments. Comment at: include/clang/Format/Format.h:1676 + /// \brief A vector of macros that should be interpreted as macros expanding + /// to a string literal encoding prefix instead of as function calls. "A list of macro names"? Comment at: include/clang/Format/Format.h:1686-1689 + /// These are expected to be macros of the form: + /// \code + /// _T("...some string...") + /// \endcode According to the description above, the option contains "a vector of macros", and here it says "these are expected to be macros of the form ...". This all is rather confusing. The above description should refer to names of macros, and this example should be made clearer in what it demonstrates. Comment at: lib/Format/FormatTokenLexer.cpp:371 FormatToken *Macro = Tokens[Tokens.size() - 4]; - if (Macro->TokenText != "_T") + if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) == + Style.TMacros.end()) Consider using llvm::find, which is a range adaptor for std::find. Comment at: lib/Format/FormatTokenLexer.cpp:386 String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; obfuscated wrote: > krasimir wrote: > > In the original code, TMacro detection was done as: > > ``` > > (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")")) > > ``` > > In the new code the left part is saved in TMacroStringLiteral, and the > > right part is checked in ContinuationIndenter. Could you keep them together > > in `FormatTokenLexer`. > > @alexfh, why are we checking for the right check at all? What would be an > > example where this is needed to disambiguate? > > > Are you talking about the code in ContinuationIndenter::createBreakableToken? > > I don't think I understand how the hole thing works. > Using the debugger I can see that this function is executed first and then > createBreakableToken. > So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and then > use this information in the createBreakableToken to do something with it. > > So when I get a TMacroStringLiteral==true in createBreakableToken I know that > the token starts with a TMacro and there is no need to use some king of > startswith calls. Probably the endswith call is redundant and I can do just a > string search backwards... > In the new code the left part is saved in TMacroStringLiteral, and the right > part is checked in ContinuationIndenter. Could you keep them together in > FormatTokenLexer. +1 to keeping these conditions together. > @alexfh, why are we checking for the right check at all? What would be an > example where this is needed to disambiguate? I don't know whether there's reasonable code that would be handled incorrectly without checking for the `")`, but I'm sure some test case can be generated, that would break clang-format badly without this condition. It also serves as a documentation (similar to asserts). https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated added inline comments. Comment at: lib/Format/FormatTokenLexer.cpp:386 String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; krasimir wrote: > In the original code, TMacro detection was done as: > ``` > (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")")) > ``` > In the new code the left part is saved in TMacroStringLiteral, and the right > part is checked in ContinuationIndenter. Could you keep them together in > `FormatTokenLexer`. > @alexfh, why are we checking for the right check at all? What would be an > example where this is needed to disambiguate? > Are you talking about the code in ContinuationIndenter::createBreakableToken? I don't think I understand how the hole thing works. Using the debugger I can see that this function is executed first and then createBreakableToken. So we first detect the tmacro in FormatTokenLexer::tryMerge_TMacro and then use this information in the createBreakableToken to do something with it. So when I get a TMacroStringLiteral==true in createBreakableToken I know that the token starts with a TMacro and there is no need to use some king of startswith calls. Probably the endswith call is redundant and I can do just a string search backwards... https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
krasimir added inline comments. Comment at: lib/Format/FormatTokenLexer.cpp:386 String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; In the original code, TMacro detection was done as: ``` (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")")) ``` In the new code the left part is saved in TMacroStringLiteral, and the right part is checked in ContinuationIndenter. Could you keep them together in `FormatTokenLexer`. @alexfh, why are we checking for the right check at all? What would be an example where this is needed to disambiguate? https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated updated this revision to Diff 140191. obfuscated marked an inline comment as done. obfuscated added a comment. Fixed tests. Fixed description. https://reviews.llvm.org/D44765 Files: include/clang/Format/Format.h lib/Format/ContinuationIndenter.cpp lib/Format/Format.cpp lib/Format/FormatToken.h lib/Format/FormatTokenLexer.cpp unittests/Format/FormatTest.cpp Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -7935,6 +7935,48 @@ "_T(\"Xn\"));")); } +TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) { + FormatStyle Style = getLLVMStyleWithColumns(25); + Style.TMacros.push_back("blablaT"); + EXPECT_EQ( + "blablaT(\"aa\")\n" + "blablaT(\"aa\")\n" + "blablaT(\"\")", + format(" blablaT(\"\")", Style)); + EXPECT_EQ("f(x,\n" +" blablaT(\"\")\n" +" blablaT(\"aaa\"),\n" +" z);", +format("f(x, blablaT(\"aaa\"), z);", Style)); + + // FIXME: Handle embedded spaces in one iteration. + // EXPECT_EQ("blablaT(\"a\")\n" + //"blablaT(\"a\")\n" + //"blablaT(\"a\")\n" + //"blablaT(\"a\")", + //format(" blablaT ( \"\" )", + // getLLVMStyleWithColumns(20))); + EXPECT_EQ( + "blablaT ( \"\" )", + format(" blablaT ( \"\" )", Style)); + EXPECT_EQ("f(\n" +"#if !TEST\n" +"blablaT(\"Xn\")\n" +"#endif\n" +");", +format("f(\n" + "#if !TEST\n" + "blablaT(\"Xn\")\n" + "#endif\n" + ");")); + EXPECT_EQ("f(\n" +"\n" +"blablaT(\"Xn\"));", +format("f(\n" + "\n" + "blablaT(\"Xn\"));")); +} + TEST_F(FormatTest, BreaksStringLiteralOperands) { // In a function call with two operands, the second can be broken with no line // break before it. @@ -10647,6 +10689,10 @@ " - 'CPPEVAL'\n" "CanonicalDelimiter: 'cc'", RawStringFormats, ExpectedRawStringFormats); + + Style.TMacros.clear(); + std::vector ExpectedTMacros = {"_T", "myT"}; + CHECK_PARSE("TMacros: [_T, myT]", TMacros, ExpectedTMacros); } TEST_F(FormatTest, ParsesConfigurationWithLanguages) { Index: lib/Format/FormatTokenLexer.cpp === --- lib/Format/FormatTokenLexer.cpp +++ lib/Format/FormatTokenLexer.cpp @@ -368,7 +368,8 @@ return false; FormatToken *Macro = Tokens[Tokens.size() - 4]; - if (Macro->TokenText != "_T") + if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) == + Style.TMacros.end()) return false; const char *Start = Macro->TokenText.data(); @@ -382,6 +383,7 @@ String->TokenText, String->OriginalColumn, Style.TabWidth, Encoding); String->NewlinesBefore = Macro->NewlinesBefore; String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; Tokens.pop_back(); Tokens.pop_back(); Index: lib/Format/FormatToken.h === --- lib/Format/FormatToken.h +++ lib/Format/FormatToken.h @@ -134,6 +134,9 @@ /// Token. bool HasUnescapedNewline = false; + /// \brief Whether this is a string literal similar to _T("sdfsdf"). + bool TMacroStringLiteral = false; + /// \brief The range of the whitespace immediately preceding the \c Token. SourceRange WhitespaceRange; Index: lib/Format/Format.cpp === --- lib/Format/Format.cpp +++ lib/Format/Format.cpp @@ -430,6 +430,7 @@ IO.mapOptional("SpacesInSquareBrackets", Style.SpacesInSquareBrackets); IO.mapOptional("Standard", Style.Standard); IO.mapOptional("TabWidth", Style.TabWidth); +IO.mapOptional("TMacros", Style.TMacros); IO.mapOptional("UseTab", Style.UseTab); } }; @@ -686,6 +687,7 @@ LLVMStyle.DisableFormat = false; LLVMStyle.SortIncludes = true; LLVMStyle.SortUsingDeclarations = true; + LLVMStyle.TMacros.push_back("_T"); return LLVMStyle; } Index: lib/Format/ContinuationIndenter.cpp === --- lib/Format/ContinuationIndenter.cpp +++ lib/Format/ContinuationIndenter.cpp @@ -1599,12 +1599,10 @@ // FIXME: Store Prefix and Suffix (or
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated marked an inline comment as done. obfuscated added inline comments. Comment at: lib/Format/FormatToken.h:138 + /// \brief Whether this is a string literal similar to _T("sdfsdf"). + bool TMacroStringLiteral = false; + krasimir wrote: > We don't strictly need this new field. We could do as in the old > implementation and check if the string prefix is from a T macro in > ContinuationIndenter. Using a flag is more reliable and faster - the checks are done once, so they don't have to be repeated. From the field layout in the struct and from the usage of bools I could only conclude that this is not a memory nor performance critical part of the code. Comment at: unittests/Format/FormatTest.cpp:10693 + + // FIXME: This is required because parsing a configuration simply overwrites + // the first N elements of the list instead of resetting it. krasimir wrote: > Why is the `FIXME` here? I suggest just use the pattern similar to the other > cases here and just keep the test with 2 elements: > ``` > Style.TMacros.clear(); > std::vector ExpectedTMacros = {"_T", "myT"}; > CHECK_PARSE("TMacros: [_T, myT]", TMacros, ExpectedTMacros); > ``` I've copy pasted this from the foreach macro option. I've not investigate why this fixme is there. Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
krasimir added a comment. I don't have preferences over names, but I agree with Alex that the option should have more detailed description. Comment at: lib/Format/FormatToken.h:138 + /// \brief Whether this is a string literal similar to _T("sdfsdf"). + bool TMacroStringLiteral = false; + We don't strictly need this new field. We could do as in the old implementation and check if the string prefix is from a T macro in ContinuationIndenter. Comment at: unittests/Format/FormatTest.cpp:10693 + + // FIXME: This is required because parsing a configuration simply overwrites + // the first N elements of the list instead of resetting it. Why is the `FIXME` here? I suggest just use the pattern similar to the other cases here and just keep the test with 2 elements: ``` Style.TMacros.clear(); std::vector ExpectedTMacros = {"_T", "myT"}; CHECK_PARSE("TMacros: [_T, myT]", TMacros, ExpectedTMacros); ``` Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
alexfh requested changes to this revision. alexfh added a comment. This revision now requires changes to proceed. Thank you! One more comment from me and I'll leave the rest of the review to Krasimir, who has a better idea of how the configuration options should be named etc. Comment at: include/clang/Format/Format.h:1676-1677 + /// \brief A vector of macros that should be interpreted as string wrapping + /// macros instead of as function calls. + /// I'm not sure "string wrapping macros" is the right term or can be easily understood by the reader of the code. The relevant MSDN page is talking about "generic-text mapping". I'm not sure this is a commonly used term either, but at least there's a precedent. Another relevant term is "encoding prefix". With these in mind, I could suggest following descriptions: "generic-text mapping macros", "string literal encoding prefix macros", "macros expanding to a string literal encoding prefix". I'd also expand this a bit explaining where these macros are coming from and how they are special. Something along the lines of: "Some libraries provide macro(s) to change the encoding prefix of string literals depending on configuration, for example, `_T()` macro on Microsoft platforms. When splitting string literals, the macro should be applied to each fragment of the literal to apply the same encoding prefix to all of them, which requires special treatment from clang-format. This option lists the names of these special macros." Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated updated this revision to Diff 139687. obfuscated added a comment. Attached both commits in a single diff... Repository: rC Clang https://reviews.llvm.org/D44765 Files: include/clang/Format/Format.h lib/Format/ContinuationIndenter.cpp lib/Format/Format.cpp lib/Format/FormatToken.h lib/Format/FormatTokenLexer.cpp unittests/Format/FormatTest.cpp Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -7935,6 +7935,48 @@ "_T(\"Xn\"));")); } +TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) { + FormatStyle Style = getLLVMStyleWithColumns(25); + Style.TMacros.push_back("blablaT"); + EXPECT_EQ( + "blablaT(\"aa\")\n" + "blablaT(\"aa\")\n" + "blablaT(\"\")", + format(" blablaT(\"\")", Style)); + EXPECT_EQ("f(x,\n" +" blablaT(\"\")\n" +" blablaT(\"aaa\"),\n" +" z);", +format("f(x, blablaT(\"aaa\"), z);", Style)); + + // FIXME: Handle embedded spaces in one iteration. + // EXPECT_EQ("blablaT(\"a\")\n" + //"blablaT(\"a\")\n" + //"blablaT(\"a\")\n" + //"blablaT(\"a\")", + //format(" blablaT ( \"\" )", + // getLLVMStyleWithColumns(20))); + EXPECT_EQ( + "blablaT ( \"\" )", + format(" blablaT ( \"\" )", Style)); + EXPECT_EQ("f(\n" +"#if !TEST\n" +"blablaT(\"Xn\")\n" +"#endif\n" +");", +format("f(\n" + "#if !TEST\n" + "blablaT(\"Xn\")\n" + "#endif\n" + ");")); + EXPECT_EQ("f(\n" +"\n" +"blablaT(\"Xn\"));", +format("f(\n" + "\n" + "blablaT(\"Xn\"));")); +} + TEST_F(FormatTest, BreaksStringLiteralOperands) { // In a function call with two operands, the second can be broken with no line // break before it. @@ -10647,6 +10689,17 @@ " - 'CPPEVAL'\n" "CanonicalDelimiter: 'cc'", RawStringFormats, ExpectedRawStringFormats); + + // FIXME: This is required because parsing a configuration simply overwrites + // the first N elements of the list instead of resetting it. + Style.TMacros.clear(); + std::vector TestTMarco; + TestTMarco.push_back("_T"); + CHECK_PARSE("TMacros: [_T]", TMacros, TestTMarco); + std::vector TestTMarcoAndMyT; + TestTMarcoAndMyT.push_back("_T"); + TestTMarcoAndMyT.push_back("myT"); + CHECK_PARSE("TMacros: [_T, myT]", TMacros, TestTMarcoAndMyT); } TEST_F(FormatTest, ParsesConfigurationWithLanguages) { Index: lib/Format/FormatTokenLexer.cpp === --- lib/Format/FormatTokenLexer.cpp +++ lib/Format/FormatTokenLexer.cpp @@ -368,7 +368,8 @@ return false; FormatToken *Macro = Tokens[Tokens.size() - 4]; - if (Macro->TokenText != "_T") + if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) == + Style.TMacros.end()) return false; const char *Start = Macro->TokenText.data(); @@ -382,6 +383,7 @@ String->TokenText, String->OriginalColumn, Style.TabWidth, Encoding); String->NewlinesBefore = Macro->NewlinesBefore; String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; Tokens.pop_back(); Tokens.pop_back(); Index: lib/Format/FormatToken.h === --- lib/Format/FormatToken.h +++ lib/Format/FormatToken.h @@ -134,6 +134,9 @@ /// Token. bool HasUnescapedNewline = false; + /// \brief Whether this is a string literal similar to _T("sdfsdf"). + bool TMacroStringLiteral = false; + /// \brief The range of the whitespace immediately preceding the \c Token. SourceRange WhitespaceRange; Index: lib/Format/Format.cpp === --- lib/Format/Format.cpp +++ lib/Format/Format.cpp @@ -430,6 +430,7 @@ IO.mapOptional("SpacesInSquareBrackets", Style.SpacesInSquareBrackets); IO.mapOptional("Standard", Style.Standard); IO.mapOptional("TabWidth", Style.TabWidth); +IO.mapOptional("TMacros", Style.TMacros); IO.mapOptional("UseTab", Style.UseTab); } }; @@ -686,6 +687,7 @@ LLVMStyle.DisableFormat = false; LLVMStyle.SortIncludes = true; LLVMStyle.SortUsingDeclarations = true; +
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated updated this revision to Diff 139686. obfuscated added a comment. Added an option Repository: rC Clang https://reviews.llvm.org/D44765 Files: include/clang/Format/Format.h lib/Format/Format.cpp lib/Format/FormatTokenLexer.cpp unittests/Format/FormatTest.cpp Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -7937,6 +7937,7 @@ TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) { FormatStyle Style = getLLVMStyleWithColumns(25); + Style.TMacros.push_back("blablaT"); EXPECT_EQ( "blablaT(\"aa\")\n" "blablaT(\"aa\")\n" @@ -10688,6 +10689,17 @@ " - 'CPPEVAL'\n" "CanonicalDelimiter: 'cc'", RawStringFormats, ExpectedRawStringFormats); + + // FIXME: This is required because parsing a configuration simply overwrites + // the first N elements of the list instead of resetting it. + Style.TMacros.clear(); + std::vector TestTMarco; + TestTMarco.push_back("_T"); + CHECK_PARSE("TMacros: [_T]", TMacros, TestTMarco); + std::vector TestTMarcoAndMyT; + TestTMarcoAndMyT.push_back("_T"); + TestTMarcoAndMyT.push_back("myT"); + CHECK_PARSE("TMacros: [_T, myT]", TMacros, TestTMarcoAndMyT); } TEST_F(FormatTest, ParsesConfigurationWithLanguages) { Index: lib/Format/FormatTokenLexer.cpp === --- lib/Format/FormatTokenLexer.cpp +++ lib/Format/FormatTokenLexer.cpp @@ -368,7 +368,8 @@ return false; FormatToken *Macro = Tokens[Tokens.size() - 4]; - if (Macro->TokenText.empty() || !String->TokenText.startswith("\"") || !String->TokenText.endswith("\"")) + if (std::find(Style.TMacros.begin(), Style.TMacros.end(), Macro->TokenText) == + Style.TMacros.end()) return false; const char *Start = Macro->TokenText.data(); Index: lib/Format/Format.cpp === --- lib/Format/Format.cpp +++ lib/Format/Format.cpp @@ -430,6 +430,7 @@ IO.mapOptional("SpacesInSquareBrackets", Style.SpacesInSquareBrackets); IO.mapOptional("Standard", Style.Standard); IO.mapOptional("TabWidth", Style.TabWidth); +IO.mapOptional("TMacros", Style.TMacros); IO.mapOptional("UseTab", Style.UseTab); } }; @@ -686,6 +687,7 @@ LLVMStyle.DisableFormat = false; LLVMStyle.SortIncludes = true; LLVMStyle.SortUsingDeclarations = true; + LLVMStyle.TMacros.push_back("_T"); return LLVMStyle; } Index: include/clang/Format/Format.h === --- include/clang/Format/Format.h +++ include/clang/Format/Format.h @@ -1673,6 +1673,22 @@ /// \brief The number of columns used for tab stops. unsigned TabWidth; + /// \brief A vector of macros that should be interpreted as string wrapping + /// macros instead of as function calls. + /// + /// These are expected to be macros of the form: + /// \code + /// _T("...some string...") + /// \endcode + /// + /// In the .clang-format configuration file, this can be configured like: + /// \code{.yaml} + /// TMarcos: ['_T', 'myT'] + /// \endcode + /// + /// For example: _T. + std::vector TMacros; + /// \brief Different ways to use tab in formatting. enum UseTabStyle { /// Never use tab. @@ -1781,7 +1797,7 @@ SpacesInParentheses == R.SpacesInParentheses && SpacesInSquareBrackets == R.SpacesInSquareBrackets && Standard == R.Standard && TabWidth == R.TabWidth && - UseTab == R.UseTab; + TMacros == R.TMacros && UseTab == R.UseTab; } llvm::Optional GetLanguageStyle(LanguageKind Language) const; Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -7937,6 +7937,7 @@ TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) { FormatStyle Style = getLLVMStyleWithColumns(25); + Style.TMacros.push_back("blablaT"); EXPECT_EQ( "blablaT(\"aa\")\n" "blablaT(\"aa\")\n" @@ -10688,6 +10689,17 @@ " - 'CPPEVAL'\n" "CanonicalDelimiter: 'cc'", RawStringFormats, ExpectedRawStringFormats); + + // FIXME: This is required because parsing a configuration simply overwrites + // the first N elements of the list instead of resetting it. + Style.TMacros.clear(); + std::vector TestTMarco; + TestTMarco.push_back("_T"); + CHECK_PARSE("TMacros: [_T]", TMacros, TestTMarco); + std::vector TestTMarcoAndMyT; + TestTMarcoAndMyT.push_back("_T"); + TestTMarcoAndMyT.push_back("myT"); + CHECK_PARSE("TMacros: [_T, myT]", TMacros, TestTMarcoAndMyT); } TEST_F(FormatTest, ParsesConfigurationWithLanguages) { Index:
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
alexfh added a comment. In https://reviews.llvm.org/D44765#1045789, @obfuscated wrote: > In https://reviews.llvm.org/D44765#1045373, @alexfh wrote: > > > We can't just treat `anything("")` like the _T macro. There should be a > > whitelist configurable with an option. By default only _T should be handled. > > > What cases could break with this version of the patch? It is fine to split `_T("veeery lng string")` as _T("veeery ") _T("lng ") _T("string") and it's incorrect to split it like this: _T("veeery " "lng " "string") since only the first string literal will get the L prefix in case when `_T(x)` is defined as `L ## x`. The same is valid for `wxT()` as I understand. However, it's wrong to do so for any function or any macro which uses its argument in any way different from that of _T. E.g. `strlen("aaa bbb")` is not the same as strlen("aaa ") strlen("bbb") > I'm fine adding an option if this is deemed acceptable. I'd say this should be an option. Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated added a comment. In https://reviews.llvm.org/D44765#1045373, @alexfh wrote: > We can't just treat `anything("")` like the _T macro. There should be a > whitelist configurable with an option. By default only _T should be handled. What cases could break with this version of the patch? Or you think it is just too dangerous to have this in such generic form? I'm fine adding an option if this is deemed acceptable. Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
Typz added a comment. You are right, I did not mean it would help with the handling of these macros; but it may be related on the configuration-side of things : adding an option for listing these macros may hit the same limitation, and the same mean of storing (in the code) the list of these macros may be used. Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
alexfh added a comment. In https://reviews.llvm.org/D44765#1045394, @Typz wrote: > A generic (or at least extandable) approach to specifying macro behaviors was > introduced here: https://reviews.llvm.org/D33440 I believe, that patch solves a significantly different problem and it won't make it much easier to implement correct handling of _T(x)-like macros (which expand to either x or L ## x depending on some other macro and thus have to be repeated for each fragment of the string literal after splitting). The most extensible solution I see is to make the list of _T-like macro spellings configurable via an option. Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
Typz added a comment. A generic (or at least extandable) approach to specifying macro behaviors was introduced here: https://reviews.llvm.org/D33440 But this change seem to be on hold, as a configuration based on providing "macro definitions" seem to be preferable and kind-of in the pipe, though with no defined timeline... Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
alexfh requested changes to this revision. alexfh added a comment. This revision now requires changes to proceed. We can't just treat `anything("")` like the _T macro. There should be a whitelist configurable with an option. By default only _T should be handled. Repository: rC Clang https://reviews.llvm.org/D44765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D44765: PR36643 Make clang-format support more generic TMarcos
obfuscated created this revision. obfuscated added a reviewer: krasimir. Herald added a subscriber: klimek. This patch changes the check for _T to detect TMarcos with a more generic check. This makes it possible to format wxWidgets projects (where wxT is used a lot) correctly. Patch by Teodor Petrov Repository: rC Clang https://reviews.llvm.org/D44765 Files: lib/Format/ContinuationIndenter.cpp lib/Format/FormatToken.h lib/Format/FormatTokenLexer.cpp unittests/Format/FormatTest.cpp Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -7935,6 +7935,47 @@ "_T(\"Xn\"));")); } +TEST_F(FormatTest, BreaksStringLiteralsWithin_GenericTMacro) { + FormatStyle Style = getLLVMStyleWithColumns(25); + EXPECT_EQ( + "blablaT(\"aa\")\n" + "blablaT(\"aa\")\n" + "blablaT(\"\")", + format(" blablaT(\"\")", Style)); + EXPECT_EQ("f(x,\n" +" blablaT(\"\")\n" +" blablaT(\"aaa\"),\n" +" z);", +format("f(x, blablaT(\"aaa\"), z);", Style)); + + // FIXME: Handle embedded spaces in one iteration. + // EXPECT_EQ("blablaT(\"a\")\n" + //"blablaT(\"a\")\n" + //"blablaT(\"a\")\n" + //"blablaT(\"a\")", + //format(" blablaT ( \"\" )", + // getLLVMStyleWithColumns(20))); + EXPECT_EQ( + "blablaT ( \"\" )", + format(" blablaT ( \"\" )", Style)); + EXPECT_EQ("f(\n" +"#if !TEST\n" +"blablaT(\"Xn\")\n" +"#endif\n" +");", +format("f(\n" + "#if !TEST\n" + "blablaT(\"Xn\")\n" + "#endif\n" + ");")); + EXPECT_EQ("f(\n" +"\n" +"blablaT(\"Xn\"));", +format("f(\n" + "\n" + "blablaT(\"Xn\"));")); +} + TEST_F(FormatTest, BreaksStringLiteralOperands) { // In a function call with two operands, the second can be broken with no line // break before it. Index: lib/Format/FormatTokenLexer.cpp === --- lib/Format/FormatTokenLexer.cpp +++ lib/Format/FormatTokenLexer.cpp @@ -368,7 +368,7 @@ return false; FormatToken *Macro = Tokens[Tokens.size() - 4]; - if (Macro->TokenText != "_T") + if (Macro->TokenText.empty() || !String->TokenText.startswith("\"") || !String->TokenText.endswith("\"")) return false; const char *Start = Macro->TokenText.data(); @@ -382,6 +382,7 @@ String->TokenText, String->OriginalColumn, Style.TabWidth, Encoding); String->NewlinesBefore = Macro->NewlinesBefore; String->HasUnescapedNewline = Macro->HasUnescapedNewline; + String->TMacroStringLiteral = true; Tokens.pop_back(); Tokens.pop_back(); Index: lib/Format/FormatToken.h === --- lib/Format/FormatToken.h +++ lib/Format/FormatToken.h @@ -134,6 +134,9 @@ /// Token. bool HasUnescapedNewline = false; + /// \brief Whether this is a string literal similar to _T("sdfsdf"). + bool TMacroStringLiteral = false; + /// \brief The range of the whitespace immediately preceding the \c Token. SourceRange WhitespaceRange; Index: lib/Format/ContinuationIndenter.cpp === --- lib/Format/ContinuationIndenter.cpp +++ lib/Format/ContinuationIndenter.cpp @@ -1599,12 +1599,10 @@ // FIXME: Store Prefix and Suffix (or PrefixLength and SuffixLength to // reduce the overhead) for each FormatToken, which is a string, so that we // don't run multiple checks here on the hot path. -if ((Text.endswith(Postfix = "\"") && - (Text.startswith(Prefix = "@\"") || Text.startswith(Prefix = "\"") || - Text.startswith(Prefix = "u\"") || Text.startswith(Prefix = "U\"") || - Text.startswith(Prefix = "u8\"") || - Text.startswith(Prefix = "L\""))) || -(Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))) { +if (Text.endswith(Postfix = "\"") && +(Text.startswith(Prefix = "@\"") || Text.startswith(Prefix = "\"") || + Text.startswith(Prefix = "u\"") || Text.startswith(Prefix = "U\"") || + Text.startswith(Prefix = "u8\"") || Text.startswith(Prefix = "L\""))) { // We need this to address the case where there is an unbreakable tail // only if certain