[PATCH] D134685: Fix SourceManager::isBeforeInTranslationUnit bug with token-pasting

2022-10-05 Thread Sam McCall via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.
Closed by commit rG41b51007e637: Fix SourceManager::isBeforeInTranslationUnit 
bug with token-pasting (authored by sammccall).

Changed prior to commit:
  https://reviews.llvm.org/D134685?vs=463053=465437#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134685/new/

https://reviews.llvm.org/D134685

Files:
  clang-tools-extra/clangd/unittests/SelectionTests.cpp
  clang/include/clang/Basic/SourceManager.h
  clang/lib/Basic/SourceManager.cpp
  clang/unittests/Basic/SourceManagerTest.cpp

Index: clang/unittests/Basic/SourceManagerTest.cpp
===
--- clang/unittests/Basic/SourceManagerTest.cpp
+++ clang/unittests/Basic/SourceManagerTest.cpp
@@ -171,6 +171,83 @@
   EXPECT_TRUE(SourceMgr.isBeforeInTranslationUnit(idLoc, macroExpEndLoc));
 }
 
+TEST_F(SourceManagerTest, isBeforeInTranslationUnitWithTokenSplit) {
+  const char *main = R"cpp(
+#define ID(X) X
+ID(
+  ID(a >> b)
+  c
+)
+  )cpp";
+
+  SourceMgr.setMainFileID(
+  SourceMgr.createFileID(llvm::MemoryBuffer::getMemBuffer(main)));
+
+  TrivialModuleLoader ModLoader;
+  HeaderSearch HeaderInfo(std::make_shared(), SourceMgr,
+  Diags, LangOpts, &*Target);
+  Preprocessor PP(std::make_shared(), Diags, LangOpts,
+  SourceMgr, HeaderInfo, ModLoader,
+  /*IILookup =*/nullptr,
+  /*OwnsHeaderSearch =*/false);
+  PP.Initialize(*Target);
+  PP.EnterMainSourceFile();
+  llvm::SmallString<8> Scratch;
+
+  std::vector toks;
+  while (1) {
+Token tok;
+PP.Lex(tok);
+if (tok.is(tok::eof))
+  break;
+toks.push_back(tok);
+  }
+
+  // Make sure we got the tokens that we expected.
+  ASSERT_EQ(4U, toks.size()) << "a >> b c";
+  // Sanity check their order.
+  for (unsigned I = 0; I < toks.size() - 1; ++I) {
+EXPECT_TRUE(SourceMgr.isBeforeInTranslationUnit(toks[I].getLocation(),
+toks[I + 1].getLocation()));
+EXPECT_FALSE(SourceMgr.isBeforeInTranslationUnit(toks[I + 1].getLocation(),
+ toks[I].getLocation()));
+  }
+
+  // Split the >> into two > tokens, as happens when parsing nested templates.
+  unsigned RightShiftIndex = 1;
+  SourceLocation RightShift = toks[RightShiftIndex].getLocation();
+  EXPECT_EQ(">>", Lexer::getSpelling(SourceMgr.getSpellingLoc(RightShift),
+ Scratch, SourceMgr, LangOpts));
+  SourceLocation Greater1 = PP.SplitToken(RightShift, /*Length=*/1);
+  SourceLocation Greater2 = RightShift.getLocWithOffset(1);
+  EXPECT_TRUE(Greater1.isMacroID());
+  EXPECT_EQ(">", Lexer::getSpelling(SourceMgr.getSpellingLoc(Greater1), Scratch,
+SourceMgr, LangOpts));
+  EXPECT_EQ(">", Lexer::getSpelling(SourceMgr.getSpellingLoc(Greater2), Scratch,
+SourceMgr, LangOpts));
+  EXPECT_EQ(SourceMgr.getImmediateExpansionRange(Greater1).getBegin(),
+RightShift);
+
+  for (unsigned I = 0; I < toks.size(); ++I) {
+SCOPED_TRACE("Token " + std::to_string(I));
+// Right-shift is the parent of Greater1, so it compares less.
+EXPECT_EQ(
+SourceMgr.isBeforeInTranslationUnit(toks[I].getLocation(), Greater1),
+I <= RightShiftIndex);
+EXPECT_EQ(
+SourceMgr.isBeforeInTranslationUnit(toks[I].getLocation(), Greater2),
+I <= RightShiftIndex);
+EXPECT_EQ(
+SourceMgr.isBeforeInTranslationUnit(Greater1, toks[I].getLocation()),
+RightShiftIndex < I);
+EXPECT_EQ(
+SourceMgr.isBeforeInTranslationUnit(Greater2, toks[I].getLocation()),
+RightShiftIndex < I);
+  }
+  EXPECT_TRUE(SourceMgr.isBeforeInTranslationUnit(Greater1, Greater2));
+  EXPECT_FALSE(SourceMgr.isBeforeInTranslationUnit(Greater2, Greater1));
+}
+
 TEST_F(SourceManagerTest, getColumnNumber) {
   const char *Source =
 "int x;\n"
Index: clang/lib/Basic/SourceManager.cpp
===
--- clang/lib/Basic/SourceManager.cpp
+++ clang/lib/Basic/SourceManager.cpp
@@ -1992,6 +1992,7 @@
   // This is a magic number for limiting the cache size.  It was experimentally
   // derived from a small Objective-C project (where the cache filled
   // out to ~250 items).  We can make it larger if necessary.
+  // FIXME: this is almost certainly full these days. Use an LRU cache?
   enum { MagicCacheSize = 300 };
   IsBeforeInTUCacheKey Key(LFID, RFID);
 
@@ -2000,7 +2001,7 @@
   // use.  When they update the value, the cache will get automatically
   // updated as well.
   if (IBTUCache.size() < MagicCacheSize)
-return IBTUCache[Key];
+return IBTUCache.try_emplace(Key, LFID, 

[PATCH] D134685: Fix SourceManager::isBeforeInTranslationUnit bug with token-pasting

2022-09-29 Thread Haojian Wu via Phabricator via cfe-commits
hokein accepted this revision.
hokein added a comment.
This revision is now accepted and ready to land.

Thanks for digging into the rabbit role and the great analysis. It looks good 
from my side.

Re the test case `ID(I TWO 3)` we discussed yesterday, I verified that there is 
no issue. (we don't merge the `1 2 3` into a single SLOCEntry, each one have a 
dedicated FileID, and FileID(2) < FileID(3)).

>> However I tried to avoid mixing these with subtle behavior changes, and will 
>> send a followup instead.
>
> D134694  if you're interested. I guess I 
> should try to get performance measures though...

While doing the review, I agree that this part of code is unnecessary 
complicated, I think doing cleanup is probably a good idea.




Comment at: clang/lib/Basic/SourceManager.cpp:2108
 // quit early. The other way round unfortunately remains suboptimal.
-  } while (LOffs.first != ROffs.first && !MoveUpIncludeHierarchy(LOffs, 
*this));
-  LocSet::iterator I;
-  while((I = LChain.find(ROffs.first)) == LChain.end()) {
-if (MoveUpIncludeHierarchy(ROffs, *this))
-  break; // Met at topmost file.
-  }
-  if (I != LChain.end())
-LOffs = *I;
+if (LOffs.first == ROffs.first)
+  break;

nit:  I found the first/second usage really hurts the code readability here 
(using the struct `Entry` int all places is probably better, but this requires 
some changes to the existing interface, no required action here).



Comment at: clang/lib/Basic/SourceManager.cpp:2126
+  // This changes the relative order of local vs loaded FileIDs, but it
+  // doesn't matter as these are never mixed in macro expansion.
+  unsigned LParent = I->second.ParentFID.ID;

maybe add an assertion for "local and load FileIDs  are never mixed".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134685/new/

https://reviews.llvm.org/D134685

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134685: Fix SourceManager::isBeforeInTranslationUnit bug with token-pasting

2022-09-26 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

> However I tried to avoid mixing these with subtle behavior changes, and will 
> send a followup instead.

D134694  if you're interested. I guess I 
should try to get performance measures though...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134685/new/

https://reviews.llvm.org/D134685

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134685: Fix SourceManager::isBeforeInTranslationUnit bug with token-pasting

2022-09-26 Thread Sam McCall via Phabricator via cfe-commits
sammccall created this revision.
sammccall added reviewers: ilya-biryukov, hokein.
Herald added subscribers: kadircet, arphaman.
Herald added a project: All.
sammccall requested review of this revision.
Herald added projects: clang, clang-tools-extra.
Herald added a subscriber: cfe-commits.

isBeforeInTranslationUnit compares SourceLocations across FileIDs by
mapping them onto a common ancestor file, following include/expansion edges.

It is possible to get a tie in the common ancestor, because multiple
"chunks" of a macro arg will expand to the same macro param token in the body:

  #define ID(X) X
  #define TWO 2
  ID(1 TWO)

Here two FileIDs both expand into `X` in ID's expansion:

- one containing `1` and spelled on line 3
- one containing `2` and spelled by the macro expansion of TWO

isBeforeInTranslationUnit breaks this tie by comparing the two FileIDs:
the one "on the left" is always created first and is numerically smaller.
This seems correct so far.

Prior to this patch it also takes a shortcut (unclear if intentionally).
Instead of comparing the two FileIDs that directly expand to the same location,
it compares the original FileIDs being compared. These may not be the
same if there are multiple macro expansions in between.
This *almost* always yields the right answer, because macro expansion
yields "trees" of FileIDs allocated in a contiguous range: when comparing tree A
to tree B, it doesn't matter what representative you pick.

However, the splitting of >> tokens is modeled as macro expansion (as if
the first '>' was a macro that expands to a '>' spelled a scratch buffer).
This splitting occurs retroactively when parsing, so the FileID allocated is
larger than expected if it were a real macro expansion performed during lexing.
As a result, macro tree A can be on the left of tree B, and yet contain
a token-split FileID whose numeric value is *greator* than those in B.
In this case the tiebreak gives the wrong answer.

Concretely:

  #define ID(X) X
  template  class S{};
  ID(
ID(S> x);
int y;
  )
  
  Given Greater = (typeloc of S).getEndLoc();
Y   = (decl of y).getLocation();
  isBeforeInTranslationUnit(Greater, Y) should return true, but returns false.

Here the common FileID of (Greater, Y) is the body of the outer ID
expansion, and they both expand to X within it.
With the current tiebreak rules, we compare the FileID of Greater (a split)
to the FileID of Y (a macro arg expansion into X of the outer ID).
The former is larger because the token split occurred relatively late.

This patch fixes the issue by removing the shortcut. It tracks the immediate
FileIDs used to reach the common file, and uses these IDs to break ties.
In the example, we now compare the macro arg expansion of the inner ID()
to the macro arg expansion of Y, and find that it is smaller.

This requires some changes to the InBeforeInTUCacheEntry (sic).
We store a little more data so it's probably slightly slower.
It was difficult to resist more invasive changes:

- performance: the sizing is very suspicious, and once the cache "fills up" 
we're thrashing a single entry
- API: the class seems to be needlessly complicated

However I tried to avoid mixing these with subtle behavior changes, and
will send a followup instead.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D134685

Files:
  clang-tools-extra/clangd/unittests/SelectionTests.cpp
  clang/include/clang/Basic/SourceManager.h
  clang/lib/Basic/SourceManager.cpp
  clang/unittests/Basic/SourceManagerTest.cpp

Index: clang/unittests/Basic/SourceManagerTest.cpp
===
--- clang/unittests/Basic/SourceManagerTest.cpp
+++ clang/unittests/Basic/SourceManagerTest.cpp
@@ -171,6 +171,83 @@
   EXPECT_TRUE(SourceMgr.isBeforeInTranslationUnit(idLoc, macroExpEndLoc));
 }
 
+TEST_F(SourceManagerTest, isBeforeInTranslationUnitWithTokenSplit) {
+  const char *main = R"cpp(
+#define ID(X) X
+ID(
+  ID(a >> b)
+  c
+)
+  )cpp";
+
+  SourceMgr.setMainFileID(
+  SourceMgr.createFileID(llvm::MemoryBuffer::getMemBuffer(main)));
+
+  TrivialModuleLoader ModLoader;
+  HeaderSearch HeaderInfo(std::make_shared(), SourceMgr,
+  Diags, LangOpts, &*Target);
+  Preprocessor PP(std::make_shared(), Diags, LangOpts,
+  SourceMgr, HeaderInfo, ModLoader,
+  /*IILookup =*/nullptr,
+  /*OwnsHeaderSearch =*/false);
+  PP.Initialize(*Target);
+  PP.EnterMainSourceFile();
+  llvm::SmallString<8> Scratch;
+
+  std::vector toks;
+  while (1) {
+Token tok;
+PP.Lex(tok);
+if (tok.is(tok::eof))
+  break;
+toks.push_back(tok);
+  }
+
+  // Make sure we got the tokens that we expected.
+  ASSERT_EQ(4U, toks.size()) << "a >> b c";
+  // Sanity check their order.
+  for (unsigned I = 0; I < toks.size() - 1; ++I) {
+EXPECT_TRUE(SourceMgr.isBeforeInTranslationUnit(toks[I].getLocation(),
+