[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-04-19 Thread Fabian Keßler via Phabricator via cfe-commits
Febbe added a comment.

Actually, I already wanted to add an enum, controlling this behavior. 
But even with an enum, this check should only remove includes in consecutive 
blocks, which could be relocated (are not split by `#defines` or a comment)
Or I should add the options to `DeduplicateIncludeDirectives`: 
`{DID_InSameBlock (default), DID_Never, DID_Allways, DID_OnlyConsecutiveBlocks}`

Note, that I use DID_InSameBlock as default, because that's the current 
behavior, even many people would like to have it off by default.

Regarding the breaking of current features: What would you do, if a break is 
necessary, to implement a way more useful feature or to fix a huge issue/bug? 
Would you just infer a new version number? (I personally don't like the idea of 
having version numbers for specific tests, but that would be the easiest way to 
keep compatibility with old configurations and codebases).
Of course, the assumptions must be proven, as you said.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143870/new/

https://reviews.llvm.org/D143870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-04-17 Thread MyDeveloperDay via Phabricator via cfe-commits
MyDeveloperDay added a comment.

> Regarding the comment that I must not change existing tests: I think this 
> rule is too strict, because those tests are mostly regression tests. 
> But a regression tests does not test for correctness. So if a test had 
> already a wrong assumption, it must be changeable.

I don't agree, you are making the assumption that the default behaviour should 
now be different from what is was before, we have 100,000+ of users did you 
canvas them to ask if we wanted your suggested correct behaviour? (especially 
when someone has split their includes into a separate include group? I assume 
for a reason)

What would be their recourse if they didn't want the duplicate removed across 
groups, I don't see one? are you expecting them to // clang-format on/off

Ultimately we try not to change the default behaviour, if we think the 
behaviour is useful (and this could be), we ask that its put behind a new 
"Option" and that by default its off.

`SortIncludes` was latterly considered a contentious addition and has been 
proved to alter code in a well publicized example, in hindsight most people 
think it should have been off by default. I don't like making assumptions that 
we should turn something on, that others might not want/need or desire. This 
feels like one of those and something that could break code.

I personally follow a Beyoncé rule when it comes to unit tests... "If I like it 
I should have put a test on it", if the test is there, I need to be persuaded 
the test is very wrong before I'm happy to let it be changed to a new behaviour.

FWIW, these are not just regression tests, they assert that the behaviour is as 
the author desired and that is considered correctness until proven otherwise, 
any new author needs to respect that, or prove its a genuine bug and not just a 
matter of style/opinion.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143870/new/

https://reviews.llvm.org/D143870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-04-17 Thread Fabian Keßler via Phabricator via cfe-commits
Febbe added a comment.

In D143870#4273075 , @MyDeveloperDay 
wrote:

> I know it might not seem an obvious use case but there really isn't a 
> requirement to not include header files more than once.. imagine if I have
>
>   #define ARCH "win32"
>   #include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"
>   #define ARCH "win64"
>   #include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"
>
> Its not nice, but just because I include it twice doesn't mean its wrong? 
> This change would break code written that way, No?

Mhh, I need to test that. But you are right, if it removes the header in one of 
the conditional preprocessor blocks, it would be very bad.
Regarding normal blocks, the behavior was already inconsistent, as soon to 
reorder blocks was activated:
Blocks are merged if possible, then de-duplicated and split again. So I just 
made the behavior more consistent here.

Regarding the comment that I must not change existing tests: I think this rule 
is too strict, because those tests are mostly regression tests. 
But a regression tests does not test for correctness. So if a test had already 
a wrong assumption, it must be changeable. 
Also, when a behavior is now undesirable, it should be possible to adapt them.
I would say in this case I should replace the test with your case, to reflect 
that includes must not be removed, when they are in different conditional 
preprocessor blocks.
That the different "modi" should be more consistent is in my opinion desirable.

Last but not least, I currently have nearly zero free time, so I would like to 
pause this and my other phab until my thesis is done.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143870/new/

https://reviews.llvm.org/D143870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-04-17 Thread MyDeveloperDay via Phabricator via cfe-commits
MyDeveloperDay requested changes to this revision.
MyDeveloperDay added a comment.

I know it might not seem an obvious use case but there really isn't a 
requirement to not include header files more than once.. imaging if I have

  #define ARCH "win32"
  #include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"
  #define ARCH "win64"
  #include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"

Its not nice, but just because I include it twice doesn't mean its wrong? This 
change would break code written that way.




Comment at: clang/unittests/Format/SortIncludesTest.cpp:927
 
-TEST_F(SortIncludesTest, DeduplicateLocallyInEachBlock) {
   EXPECT_EQ("#include \n"

Please don't change existing tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143870/new/

https://reviews.llvm.org/D143870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-02-14 Thread Björn Schäpers via Phabricator via cfe-commits
HazardyKnusperkeks added a comment.

In D143870#4124057 , @Febbe wrote:

> @HazardyKnusperkeks thank you for the review, I would add another option, but 
> I don't know a good name. I would propose a
>
> `boolean` `IncludeDeduplicateInAllBlocks` which defaults to zero.
>
> First an `Include`, to keep include-sorting related options together in the 
> documentation (sorted by name not by category)
> `DeduplicateInAllBlocks` self explaining name.
>
> But `IncludeDeduplication` with the enum states `IDD_SameBlock`(default) and 
> IDD_ALL would also an option.
> The latter has the advantage, that an expansion of this option would not 
> require the adjustment of tests.
>
> What's your preference?

The `enum` is the better variant, I'd add the `Off`/`None` variant in the same 
breath. But the name has to be better, you don't need to start it with 
`Include`, since D138446  we can link to 
other options and that should happen. So a more natural name would be 
`DeduplicateIncludeDirectives`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143870/new/

https://reviews.llvm.org/D143870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-02-13 Thread Fabian Keßler via Phabricator via cfe-commits
Febbe added a comment.

@HazardyKnusperkeks thank you for the review, I would add another option, but I 
don't know a good name. I would propose a

`boolean` `IncludeDeduplicateInAllBlocks` which defaults to zero.

First an `Include`, to keep include-sorting related options together in the 
documentation (sorted by name not by category)
`DeduplicateInAllBlocks` self explaining name.

But `IncludeDeduplication` with the enum states `IDD_SameBlock`(default) and 
IDD_ALL would also an option.
The latter has the advantage, that an expansion of this option would not 
require the adjustment of tests.

What's your preference?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143870/new/

https://reviews.llvm.org/D143870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-02-13 Thread Björn Schäpers via Phabricator via cfe-commits
HazardyKnusperkeks requested changes to this revision.
HazardyKnusperkeks added a comment.
This revision now requires changes to proceed.

I can see that this is maybe useful, but that have to be behind a new option, 
which is turned off by default. And a big no to changing the existing tests, 
you may add new stuff.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143870/new/

https://reviews.llvm.org/D143870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143870: [clang-format] Remove all include duplicates not only those in the same block

2023-02-12 Thread Fabian Keßler via Phabricator via cfe-commits
Febbe created this revision.
Herald added a project: All.
Febbe requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This change aims to remove all duplicate include directives, 
instead of only those, which are in the same block.

This change might be a bit controversial, since it will also remove includes, 
which are in custom splitted blocks like:

  #include "a.h"
  /* some code */
  
  // following include must stay!:
  #include "a.h"

This is a follow up patch for D143691 , but 
can be freestanding.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D143870

Files:
  clang/lib/Format/Format.cpp
  clang/unittests/Format/SortIncludesTest.cpp

Index: clang/unittests/Format/SortIncludesTest.cpp
===
--- clang/unittests/Format/SortIncludesTest.cpp
+++ clang/unittests/Format/SortIncludesTest.cpp
@@ -268,9 +268,7 @@
 "#include \n"
 "#include \n"
 "/* clang-format offically */\n"
-"#include \n"
-"#include \n"
-"#include \n"
+"\n"
 "/* clang-format onwards */\n",
 sort("#include \n"
  "#include \n"
@@ -899,6 +897,19 @@
  "\n"
  "#include \n"
  "#include \n"));
+
+  Style.IncludeBlocks = tooling::IncludeStyle::IBS_Regroup;
+  EXPECT_EQ("#include \"MyProject/MyLibrary/c_main.h\"\n"
+"\n"
+"#include \n"
+"//\n",
+sort("#include \"MyProject/MyLibrary/c_main.h\"\n"
+ "#include \"MyProject/MyLibrary/c_main.h\"\n"
+ "#include \n"
+ "#include \n"
+ "//\n"
+ "#include \n",
+ "c_main.cpp", 2U));
 }
 
 TEST_F(SortIncludesTest, CalculatesCorrectCursorPositionAfterDeduplicate) {
@@ -924,11 +935,10 @@
   EXPECT_EQ(26u, newCursor(Code, 52));
 }
 
-TEST_F(SortIncludesTest, DeduplicateLocallyInEachBlock) {
+TEST_F(SortIncludesTest, DeduplicateInAllBlocks) {
   EXPECT_EQ("#include \n"
 "#include \n"
 "\n"
-"#include \n"
 "#include \n",
 sort("#include \n"
  "#include \n"
Index: clang/lib/Format/Format.cpp
===
--- clang/lib/Format/Format.cpp
+++ clang/lib/Format/Format.cpp
@@ -38,6 +38,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/Sequence.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSet.h"
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/Path.h"
@@ -2858,6 +2859,7 @@
 // #include in the duplicate #includes.
 static void sortCppIncludes(const FormatStyle ,
 const SmallVectorImpl ,
+llvm::StringSet<> ,
 ArrayRef Ranges, StringRef FileName,
 StringRef Code, tooling::Replacements ,
 unsigned *Cursor) {
@@ -2900,12 +2902,14 @@
   }
 
   // Deduplicate #includes.
-  Indices.erase(std::unique(Indices.begin(), Indices.end(),
-[&](unsigned LHSI, unsigned RHSI) {
-  return Includes[LHSI].Text.trim() ==
- Includes[RHSI].Text.trim();
-}),
-Indices.end());
+
+  Indices.erase(
+  std::remove_if(
+  Indices.begin(), Indices.end(),
+  [&](unsigned val) {
+return !AllIncludes.try_emplace(Includes[val].Text.trim()).second;
+  }),
+  Indices.end());
 
   int CurrentCategory = Includes.front().Category;
 
@@ -2966,6 +2970,7 @@
   .Default(0);
   unsigned SearchFrom = 0;
   SmallVector Matches;
+  llvm::StringSet<> AllIncludes{};
   SmallVector IncludesInBlock;
 
   // FIXME: Do some validation, e.g. edit distance of the base name, to fix
@@ -3032,16 +3037,20 @@
 Categories.getSortIncludePriority(IncludeName, FirstIncludeBlock);
 // Todo(remove before merge): reason for removal: every header can have
 // 0,0 priority
+
 IncludesInBlock.push_back(
 {IncludeName, Line, Prev, Category, Priority});
+
   } else if (!IncludesInBlock.empty() && !EmptyLineSkipped) {
-sortCppIncludes(Style, IncludesInBlock, Ranges, FileName, Code,
-Replaces, Cursor);
+sortCppIncludes(Style, IncludesInBlock, AllIncludes, Ranges, FileName,
+Code, Replaces, Cursor);
 IncludesInBlock.clear();
-if (Trimmed.startswith("#pragma hdrstop")) // Precompiled headers.
+if (Trimmed.startswith("#pragma hdrstop")) { // Precompiled headers.
   FirstIncludeBlock = true;
-else
+  AllIncludes.clear();
+}