[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2020-01-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added inline comments.



Comment at: include/clang/Tooling/Syntax/Tokens.h:206
+  /// DECL(a);
+  /// spelledTokens() returns {"#", "define", "DECL", "(", "name", ")", "eof"}.
+  /// FIXME: we do not yet store tokens of directives, like #include, #define,

ilya-biryukov wrote:
> nridge wrote:
> > Is this comment correct?
> > 
> > If so:
> >   * Why are the tokens "int", "name", "=", "10" not included?
> >   * Why are the tokens `"DECL", "(", "a", ")" not included?
> > 
> It isn't. Thanks for catching that.
> This patch went through so many revisions, it was close to impossible to 
> track this down.
Also:
- we do not store 'eof' in the spelled tokens anymore
- the FIXME is stale, we do store tokens of macro directives now

759c90456d418ffe69e1a2b4bcea2792491a6b5a updates the comment.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2020-01-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added inline comments.



Comment at: include/clang/Tooling/Syntax/Tokens.h:206
+  /// DECL(a);
+  /// spelledTokens() returns {"#", "define", "DECL", "(", "name", ")", "eof"}.
+  /// FIXME: we do not yet store tokens of directives, like #include, #define,

nridge wrote:
> Is this comment correct?
> 
> If so:
>   * Why are the tokens "int", "name", "=", "10" not included?
>   * Why are the tokens `"DECL", "(", "a", ")" not included?
> 
It isn't. Thanks for catching that.
This patch went through so many revisions, it was close to impossible to track 
this down.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2020-01-09 Thread Nathan Ridge via Phabricator via cfe-commits
nridge added inline comments.



Comment at: include/clang/Tooling/Syntax/Tokens.h:206
+  /// DECL(a);
+  /// spelledTokens() returns {"#", "define", "DECL", "(", "name", ")", "eof"}.
+  /// FIXME: we do not yet store tokens of directives, like #include, #define,

Is this comment correct?

If so:
  * Why are the tokens "int", "name", "=", "10" not included?
  * Why are the tokens `"DECL", "(", "a", ")" not included?



Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-21 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added a comment.

In D59887#1508991 , @thakis wrote:

> Out of interest: The RecursiveASTVisitorTests are part of the ToolingTests 
> binary while this adds a new binary TokensTest. Can you say why?
>
> (No change needed, just curious.)


The syntax library is mostly independent from the rest of tooling, so I'd 
rather keep everything related to it separate including the tests.
I don't think we'll get anything in terms of code reuse from merging them into 
the same test binary either.

In D59887#1508993 , @thakis wrote:

> Another comment: The new binary is called TokensTest but is in a directory 
> "Syntax". For consistency with all other unit test binaries, please either 
> rename the binary to SyntaxTests, or rename the directory to "Tokens". (From 
> the patch description, the former seems more appropriate.) Note the missing 
> trailing "s" in the binary name too.


Agree with renaming the binary. In fact, the not-yet-landed revisions also use 
`SyntaxTests`, and I should've named it this way from the start. I'll land a 
patch.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-20 Thread Nico Weber via Phabricator via cfe-commits
thakis added a comment.

Another comment: The new binary is called TokensTest but is in a directory 
"Syntax". For consistency with all other unit test binaries, please either 
rename the binary to SyntaxTests, or rename the directory to "Tokens". (From 
the patch description, the former seems more appropriate.) Note the missing 
trailing "s" in the binary name too.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-20 Thread Nico Weber via Phabricator via cfe-commits
thakis added a comment.

Out of interest: The RecursiveASTVisitorTests are part of the ToolingTests 
binary while this adds a new binary TokensTest. Can you say why?

(No change needed, just curious.)


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-20 Thread Phabricator via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC361148: [Syntax] Introduce TokenBuffer, start 
clangToolingSyntax library (authored by ibiryukov, committed by ).

Changed prior to commit:
  https://reviews.llvm.org/D59887?vs=200213&id=200260#toc

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  include/clang/Tooling/Syntax/Tokens.h
  lib/Tooling/CMakeLists.txt
  lib/Tooling/Syntax/CMakeLists.txt
  lib/Tooling/Syntax/Tokens.cpp
  unittests/Tooling/CMakeLists.txt
  unittests/Tooling/Syntax/CMakeLists.txt
  unittests/Tooling/Syntax/TokensTest.cpp

Index: lib/Tooling/Syntax/CMakeLists.txt
===
--- lib/Tooling/Syntax/CMakeLists.txt
+++ lib/Tooling/Syntax/CMakeLists.txt
@@ -0,0 +1,10 @@
+set(LLVM_LINK_COMPONENTS Support)
+
+add_clang_library(clangToolingSyntax
+  Tokens.cpp
+
+  LINK_LIBS
+  clangBasic
+  clangFrontend
+  clangLex
+  )
Index: lib/Tooling/Syntax/Tokens.cpp
===
--- lib/Tooling/Syntax/Tokens.cpp
+++ lib/Tooling/Syntax/Tokens.cpp
@@ -0,0 +1,509 @@
+//===- Tokens.cpp - collect tokens from preprocessing -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+#include "clang/Tooling/Syntax/Tokens.h"
+
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/Lex/Token.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/raw_ostream.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+syntax::Token::Token(const clang::Token &T)
+: Token(T.getLocation(), T.getLength(), T.getKind()) {
+  assert(!T.isAnnotation());
+}
+
+llvm::StringRef syntax::Token::text(const SourceManager &SM) const {
+  bool Invalid = false;
+  const char *Start = SM.getCharacterData(location(), &Invalid);
+  assert(!Invalid);
+  return llvm::StringRef(Start, length());
+}
+
+FileRange syntax::Token::range(const SourceManager &SM) const {
+  assert(location().isFileID() && "must be a spelled token");
+  FileID File;
+  unsigned StartOffset;
+  std::tie(File, StartOffset) = SM.getDecomposedLoc(location());
+  return FileRange(File, StartOffset, StartOffset + length());
+}
+
+FileRange syntax::Token::range(const SourceManager &SM,
+   const syntax::Token &First,
+   const syntax::Token &Last) {
+  auto F = First.range(SM);
+  auto L = Last.range(SM);
+  assert(F.file() == L.file() && "tokens from different files");
+  assert(F.endOffset() <= L.beginOffset() && "wrong order of tokens");
+  return FileRange(F.file(), F.beginOffset(), L.endOffset());
+}
+
+llvm::raw_ostream &syntax::operator<<(llvm::raw_ostream &OS, const Token &T) {
+  return OS << T.str();
+}
+
+FileRange::FileRange(FileID File, unsigned BeginOffset, unsigned EndOffset)
+: File(File), Begin(BeginOffset), End(EndOffset) {
+  assert(File.isValid());
+  assert(BeginOffset <= EndOffset);
+}
+
+FileRange::FileRange(const SourceManager &SM, SourceLocation BeginLoc,
+ unsigned Length) {
+  assert(BeginLoc.isValid());
+  assert(BeginLoc.isFileID());
+
+  std::tie(File, Begin) = SM.getDecomposedLoc(BeginLoc);
+  End = Begin + Length;
+}
+FileRange::FileRange(const SourceManager &SM, SourceLocation BeginLoc,
+ SourceLocation EndLoc) {
+  assert(BeginLoc.isValid());
+  assert(BeginLoc.isFileID());
+  assert(EndLoc.isValid());
+  assert(EndLoc.isFileID());
+  assert(SM.getFileID(BeginLoc) == SM.getFileID(EndLoc));
+  assert(SM.getFileOffset(BeginLoc) <= SM.getFileOffset(EndLoc));
+
+  std::tie(File, Begin) = SM.getDecomposedLoc(BeginLoc);
+  End = SM.getFileOffset(EndLoc);
+}
+
+llvm::raw_ostream &syntax::operator<<(llvm::raw_ostream &OS,
+  const FileRange &R) {
+  return OS << llvm::formatv("FileRange(file = {0}, offsets = {1}-{2})",
+ R.file().getHashValue(), R.beginOffset(),
+ R.endOffset());
+}
+
+llvm::StringRef FileRange::text(const SourceManager &SM) const {
+  bool Invalid = false;
+  StringRef Text = SM.getBufferData(F

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-20 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:115
+  /// expansion.
+  llvm::Optional range(const SourceManager &SM) const;
+

sammccall wrote:
> ilya-biryukov wrote:
> > sammccall wrote:
> > > I think this might need a more explicit name. It's reasonably obvious 
> > > that this needs to be optional for some cases (token pasting), but it's 
> > > not obvious at callsites that (or why) we don't use the spelling location 
> > > for macro-expanded tokens.
> > > 
> > > One option would be just do that and add an expandedFromMacro() helper so 
> > > the no-macros version is easy to express too.
> > > If we can't do that, maybe `directlySpelledRange` or something?
> > As mentioned in the offline conversation, the idea is that mapping from a 
> > token inside a macro expansion to a spelled token should be handled by 
> > `TokenBuffer` and these two functions is really just a convenience helper 
> > to get to a range *after* the mapping.
> > 
> > This change has been boiling for some time and I think the other bits of it 
> > seem to be non-controversial and agreed upon.
> > Would it be ok if we land it with this function using a more concrete name 
> > (`directlySpelledRange` or something else) or move it into a separate 
> > change?
> > 
> > There's a follow-up change that adds an 'expand macro' action to clangd, 
> > which both has a use-site for these method and provides a functional 
> > feature based on `TokenBuffer`. Iterating on the design with (even a 
> > single) use-case should be simpler.
> If we do want to reflect expanded/spelled as types, this will rapidly get 
> harder to change. But we do need to make progress here.
> 
> If passing spelled tokens is the intended/well-understood use, let's just 
> assert on that and not return Optional. Then I'm less worried about the name: 
> misuse will be reliably punished.
Added an assertion. Not sure about the name, kept `range` for now for the lack 
of a better alternative.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-20 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 200213.
ilya-biryukov marked 3 inline comments as done.
ilya-biryukov added a comment.

- Fix a comment, reformat
- Use assertions instead of returning optionals


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,654 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().DisableFree = false;
+CI->getPreprocessorOpts().addRemappedFile(
+

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-17 Thread Sam McCall via Phabricator via cfe-commits
sammccall added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:115
+  /// expansion.
+  llvm::Optional range(const SourceManager &SM) const;
+

ilya-biryukov wrote:
> sammccall wrote:
> > I think this might need a more explicit name. It's reasonably obvious that 
> > this needs to be optional for some cases (token pasting), but it's not 
> > obvious at callsites that (or why) we don't use the spelling location for 
> > macro-expanded tokens.
> > 
> > One option would be just do that and add an expandedFromMacro() helper so 
> > the no-macros version is easy to express too.
> > If we can't do that, maybe `directlySpelledRange` or something?
> As mentioned in the offline conversation, the idea is that mapping from a 
> token inside a macro expansion to a spelled token should be handled by 
> `TokenBuffer` and these two functions is really just a convenience helper to 
> get to a range *after* the mapping.
> 
> This change has been boiling for some time and I think the other bits of it 
> seem to be non-controversial and agreed upon.
> Would it be ok if we land it with this function using a more concrete name 
> (`directlySpelledRange` or something else) or move it into a separate change?
> 
> There's a follow-up change that adds an 'expand macro' action to clangd, 
> which both has a use-site for these method and provides a functional feature 
> based on `TokenBuffer`. Iterating on the design with (even a single) use-case 
> should be simpler.
If we do want to reflect expanded/spelled as types, this will rapidly get 
harder to change. But we do need to make progress here.

If passing spelled tokens is the intended/well-understood use, let's just 
assert on that and not return Optional. Then I'm less worried about the name: 
misuse will be reliably punished.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-17 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:115
+  /// expansion.
+  llvm::Optional range(const SourceManager &SM) const;
+

sammccall wrote:
> I think this might need a more explicit name. It's reasonably obvious that 
> this needs to be optional for some cases (token pasting), but it's not 
> obvious at callsites that (or why) we don't use the spelling location for 
> macro-expanded tokens.
> 
> One option would be just do that and add an expandedFromMacro() helper so the 
> no-macros version is easy to express too.
> If we can't do that, maybe `directlySpelledRange` or something?
As mentioned in the offline conversation, the idea is that mapping from a token 
inside a macro expansion to a spelled token should be handled by `TokenBuffer` 
and these two functions is really just a convenience helper to get to a range 
*after* the mapping.

This change has been boiling for some time and I think the other bits of it 
seem to be non-controversial and agreed upon.
Would it be ok if we land it with this function using a more concrete name 
(`directlySpelledRange` or something else) or move it into a separate change?

There's a follow-up change that adds an 'expand macro' action to clangd, which 
both has a use-site for these method and provides a functional feature based on 
`TokenBuffer`. Iterating on the design with (even a single) use-case should be 
simpler.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-17 Thread Sam McCall via Phabricator via cfe-commits
sammccall accepted this revision.
sammccall added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:49
+
+/// A half-open range inside a particular file, the start offset is included 
and
+/// the end offset is excluded from the range.

nit: character range (just to be totally explicit)?



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:115
+  /// expansion.
+  llvm::Optional range(const SourceManager &SM) const;
+

I think this might need a more explicit name. It's reasonably obvious that this 
needs to be optional for some cases (token pasting), but it's not obvious at 
callsites that (or why) we don't use the spelling location for macro-expanded 
tokens.

One option would be just do that and add an expandedFromMacro() helper so the 
no-macros version is easy to express too.
If we can't do that, maybe `directlySpelledRange` or something?



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:121
+  /// are from different files or \p Last is located before \p First.
+  static llvm::Optional range(const SourceManager &SM,
+ const syntax::Token &First,

similar to above, I'd naively expect this to return a valid range, given the 
tokens expanded from `assert(X && [[Y.z()]] )`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-17 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 200030.
ilya-biryukov added a comment.

- Skip annotation tokens, some of them are reported by the preprocessor but we 
don't want them in the final expanded stream.
- Add functions to compute FileRange of tokens, add tests for it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,672 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCom

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-17 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 200019.
ilya-biryukov added a comment.

- Remove the function that maps tokens to offsets


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,620 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().DisableFree = false;
+CI->getPreprocessorOpts().addRemappedFile(
+ 

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 198868.
ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added a comment.

- Use bsearch instead of upper_bound


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,622 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().DisableFree = false;
+CI->getPre

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov marked 3 inline comments as done.
ilya-biryukov added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:78
+  /// For debugging purposes.
+  std::string str() const;
+

sammccall wrote:
> ilya-biryukov wrote:
> > sammccall wrote:
> > > (do we need to have two names for this version?)
> > You mean to have distinct names for two different overloads?
> > I wouldn't do it, they both have fairly similar outputs, could add a small 
> > comment that the one with SourceManager should probably be preferred if you 
> > have one.
> > 
> No sorry, I meant do we need both str() and operator<<
one can type `str()` in debugger, `operator <<` is for convenience when one is 
already using streams.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:62
+  SourceLocation location() const { return Location; }
+  SourceLocation endLocation() const {
+return Location.getLocWithOffset(Length);

sammccall wrote:
> ilya-biryukov wrote:
> > sammccall wrote:
> > > sadly, this would need documentation - it's the one-past-the-end 
> > > location, not the last character in the token, not the location of the 
> > > "next" token, or the location of the "next" character...
> > > 
> > > I do wonder whether this is actually the right function to expose...
> > >  Do you ever care about end but not start? (The reverse seems likelier). 
> > > Having two independent source location accessors obscures the invariant 
> > > that they have the same file ID.
> > > 
> > > I think exposing a `FileRange` accessor instead might be better, but for 
> > > now you could also make callers use 
> > > `Tok.location().getLocWithOffset(Tok.length())` until we know it's the 
> > > right pattern to encourage
> > Added a comment. With the comment and an implementation being inline and 
> > trivial, I don't think anyone would have trouble understanding what this 
> > method does.
> (this is ok. I do think a FileRange accessor would make client code more 
> readable/less error-prone. Let's discuss offline)
I've added a corresponding accessor (and a version of it that accepts a range 
of tokens) to D61681.
I'd keep it off this patch for now, will refactor in a follow-up.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 198867.
ilya-biryukov marked 5 inline comments as done.
ilya-biryukov added a comment.

- Check invariants on FileRange construction, unify access to all fields


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,622 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 198862.
ilya-biryukov marked 6 inline comments as done.
ilya-biryukov added a comment.

- Move filling the gaps at the end of files to a separate function


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,622 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().Disabl

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added inline comments.



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:130
+  OS << llvm::formatv(
+  "['{0}'_{1}, '{2}'_{3}) => ['{4}'_{5}, '{6}'_{7})\n",
+  PrintToken(File.SpelledTokens[M.BeginSpelled]), M.BeginSpelled,

ilya-biryukov wrote:
> sammccall wrote:
> > sammccall wrote:
> > > As predicted :-) I think these `_` suffixes are a maintenance 
> > > hazard.
> > > 
> > > In practice, the assertions are likely to add them by copy-pasting the 
> > > test output.
> > > 
> > > They guard against a class of error that doesn't seem very likely, and in 
> > > practice they don't even really prevent it (because of the copy/paste 
> > > issue).
> > > 
> > > I'd suggest just dropping them, and living with the test assertions being 
> > > slightly ambiguous.
> > > 
> > > Failing that, some slightly trickier formatting could give more context:
> > > 
> > > `A B C D E F` --> `A B  ... E F`
> > > 
> > > With special cases for smaller numbers of tokens. I don't like the 
> > > irregularity of that approach though.
> > (this one is still open)
> Will make sure to do something about it before submitting
As mentioned in the offline conversations, we both agree that we don't want to 
have ambiguous test-cases.
The proposed alternative was putting the counters of tokens **of the same 
kind** in the same token stream, with a goal of making updating test-cases 
simpler (now one would need to update only indicies of the tokens they 
changed). 

After playing around with the idea, I think this complicates the dumping 
function too much. The easiest way to update the test is to run it and 
copy-paste the expected output.
So I'd keep as is and avoiding the extra complexity if that's ok



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:226
+  TokenBuffer B(SourceMgr);
+  for (unsigned I = 0; I < Expanded.size(); ++I) {
+auto FID =

sammccall wrote:
> Is there a reason we do this at the end instead of as tokens are collected?
No strong reason, just wanted to keep the preprocessor callback minimal


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 198857.
ilya-biryukov added a comment.

- Move the constuction logic to a separate class, split into multiple methods.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,622 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().DisableFree = false;
+CI->getPreproces

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-07 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added inline comments.



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:130
+  OS << llvm::formatv(
+  "['{0}'_{1}, '{2}'_{3}) => ['{4}'_{5}, '{6}'_{7})\n",
+  PrintToken(File.SpelledTokens[M.BeginSpelled]), M.BeginSpelled,

sammccall wrote:
> sammccall wrote:
> > As predicted :-) I think these `_` suffixes are a maintenance hazard.
> > 
> > In practice, the assertions are likely to add them by copy-pasting the test 
> > output.
> > 
> > They guard against a class of error that doesn't seem very likely, and in 
> > practice they don't even really prevent it (because of the copy/paste 
> > issue).
> > 
> > I'd suggest just dropping them, and living with the test assertions being 
> > slightly ambiguous.
> > 
> > Failing that, some slightly trickier formatting could give more context:
> > 
> > `A B C D E F` --> `A B  ... E F`
> > 
> > With special cases for smaller numbers of tokens. I don't like the 
> > irregularity of that approach though.
> (this one is still open)
Will make sure to do something about it before submitting



Comment at: clang/unittests/Tooling/Syntax/TokensTest.cpp:622
+
+// FIXME: add tests for mapping spelled tokens into offsets.
+

sammccall wrote:
> sammccall wrote:
> > please fix :-)
> (still missing?)
Will make sure to land this before submitting.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-07 Thread Sam McCall via Phabricator via cfe-commits
sammccall added inline comments.



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:216
+llvm::dbgs()
+<< "$[collect-tokens]: "
+<< syntax::Token(T).dumpForTests(SourceMgr) << "\n"

what's "$[collect-tokens]"?
Maybe just "Token: "?



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:224
+
+TokenBuffer TokenCollector::consume() && {
+  TokenBuffer B(SourceMgr);

this function is now >100 lines long. Can we split it up?




Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:225
+TokenBuffer TokenCollector::consume() && {
+  TokenBuffer B(SourceMgr);
+  for (unsigned I = 0; I < Expanded.size(); ++I) {

this could be a member, which would help splitting up



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:226
+  TokenBuffer B(SourceMgr);
+  for (unsigned I = 0; I < Expanded.size(); ++I) {
+auto FID =

Is there a reason we do this at the end instead of as tokens are collected?



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:243
+  llvm::DenseMap NextSpelled;
+  auto ConsumeSpelledUntil = [&](TokenBuffer::MarkedFile &File,
+ SourceLocation L) {

consumespelleduntil and fillgapuntil could be methods, I think



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:277
+  for (unsigned I = 0; I < Expanded.size() - 1; ++I) {
+auto L = Expanded[I].location();
+if (L.isFileID()) {

the body here could be a method too, I think
i.e. for each (expanded token), process it?



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:322
+
+  // Some files might have unaccounted spelled tokens at the end.
+  for (auto &F : B.Files) {

and similarly this section



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:130
+  OS << llvm::formatv(
+  "['{0}'_{1}, '{2}'_{3}) => ['{4}'_{5}, '{6}'_{7})\n",
+  PrintToken(File.SpelledTokens[M.BeginSpelled]), M.BeginSpelled,

sammccall wrote:
> As predicted :-) I think these `_` suffixes are a maintenance hazard.
> 
> In practice, the assertions are likely to add them by copy-pasting the test 
> output.
> 
> They guard against a class of error that doesn't seem very likely, and in 
> practice they don't even really prevent it (because of the copy/paste issue).
> 
> I'd suggest just dropping them, and living with the test assertions being 
> slightly ambiguous.
> 
> Failing that, some slightly trickier formatting could give more context:
> 
> `A B C D E F` --> `A B  ... E F`
> 
> With special cases for smaller numbers of tokens. I don't like the 
> irregularity of that approach though.
(this one is still open)



Comment at: clang/unittests/Tooling/Syntax/TokensTest.cpp:622
+
+// FIXME: add tests for mapping spelled tokens into offsets.
+

sammccall wrote:
> please fix :-)
(still missing?)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-05-02 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 197803.
ilya-biryukov added a comment.

- Get only expanded stream from the preprocessor, recompute mappings and 
spelled stream separately


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,622 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().DisableFree = false;

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-30 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 197279.
ilya-biryukov added a comment.

- Simplify collection of tokens
- Move dumping code to the bottom of the file


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,624 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().DisableFree = false;
+CI->getPreprocess

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-29 Thread Sam McCall via Phabricator via cfe-commits
sammccall accepted this revision.
sammccall marked an inline comment as done.
sammccall added a comment.
This revision is now accepted and ready to land.

Rest is details only.




Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:59
+  explicit Token(const clang::Token &T);
+
+  tok::TokenKind kind() const { return Kind; }

ilya-biryukov wrote:
> sammccall wrote:
> > Token should be copyable I think?
> Sure, they are copyable. Am I missing something?
No. I think I got confused by the explicit clang::Token constructor.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:62
+  SourceLocation location() const { return Location; }
+  SourceLocation endLocation() const {
+return Location.getLocWithOffset(Length);

ilya-biryukov wrote:
> sammccall wrote:
> > sadly, this would need documentation - it's the one-past-the-end location, 
> > not the last character in the token, not the location of the "next" token, 
> > or the location of the "next" character...
> > 
> > I do wonder whether this is actually the right function to expose...
> >  Do you ever care about end but not start? (The reverse seems likelier). 
> > Having two independent source location accessors obscures the invariant 
> > that they have the same file ID.
> > 
> > I think exposing a `FileRange` accessor instead might be better, but for 
> > now you could also make callers use 
> > `Tok.location().getLocWithOffset(Tok.length())` until we know it's the 
> > right pattern to encourage
> Added a comment. With the comment and an implementation being inline and 
> trivial, I don't think anyone would have trouble understanding what this 
> method does.
(this is ok. I do think a FileRange accessor would make client code more 
readable/less error-prone. Let's discuss offline)



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:78
+  /// For debugging purposes.
+  std::string str() const;
+

ilya-biryukov wrote:
> sammccall wrote:
> > (do we need to have two names for this version?)
> You mean to have distinct names for two different overloads?
> I wouldn't do it, they both have fairly similar outputs, could add a small 
> comment that the one with SourceManager should probably be preferred if you 
> have one.
> 
No sorry, I meant do we need both str() and operator<<



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:95
+  /// End offset (exclusive) in a corresponding file.
+  unsigned End = 0;
+};

ilya-biryukov wrote:
> sammccall wrote:
> > may want to offer `length()` and `StringRef text(const SourceManager&)`
> SG.
> WDYT of exposing the struct members via getters too?
> 
> This would mean uniform access for all things (beginOffset, endOffset, 
> length) and adding a ctor that checks invariants (Begin <= End, 
> File.isValid()) would also fit naturally.
> 
that sounds fine to me, though I don't feel strongly



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:75
+
+  /// For debugging purposes. More verbose than the other overload, but 
requries
+  /// a source manager.

there is no "other overload"



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:77
+
+std::string TokenBuffer::dumpForTests() const {
+  auto PrintToken = [this](const syntax::Token &T) -> std::string {

Nit: I'd suggest moving this all the way to the bottom or something? It's 
pretty big and seems a bit "in the way" here.



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:130
+  OS << llvm::formatv(
+  "['{0}'_{1}, '{2}'_{3}) => ['{4}'_{5}, '{6}'_{7})\n",
+  PrintToken(File.SpelledTokens[M.BeginSpelled]), M.BeginSpelled,

As predicted :-) I think these `_` suffixes are a maintenance hazard.

In practice, the assertions are likely to add them by copy-pasting the test 
output.

They guard against a class of error that doesn't seem very likely, and in 
practice they don't even really prevent it (because of the copy/paste issue).

I'd suggest just dropping them, and living with the test assertions being 
slightly ambiguous.

Failing that, some slightly trickier formatting could give more context:

`A B C D E F` --> `A B  ... E F`

With special cases for smaller numbers of tokens. I don't like the irregularity 
of that approach though.



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:156
+  // Find the first mapping that produced tokens after \p Expanded.
+  auto It = llvm::upper_bound(
+  File.Mappings, ExpandedIndex,

if you prefer, this is

```
auto It = llvm::bsearch(File.Mappings,
[&](const Mapping &M) { return M.BeginExpanded >= 
Expanded; });
```
I *think* this is clear enough that you can drop the comment, though I may be 
biased.



Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:94
+TokenBuffer::expandedToRaw(const synt

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-26 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 196849.
ilya-biryukov added a comment.

- Update a comment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,624 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);
+CI->getFrontendOpts().DisableFree = false;
+CI->getPreprocessorOpts().addRemappedFile(
+FileName, llvm::MemoryBu

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-26 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added a comment.

@sammccall, could we do another round of review? I think it's perfect now...
(Just kidding :-) )


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-26 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 196847.
ilya-biryukov added a comment.

- s/llvm::unittest::ValueIs/llvm::ValueIs.
- Add tests for empty macro expansions and directives around macro expansions.
- Record all gaps between spelled and expanded tokens.
- Tweak test dump to distinguish different tokens with the same spelling.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,624 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+std::vector Args = {"tok-test", "-std=c++03", "-fsyntax-only",
+

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-24 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:59
+  explicit Token(const clang::Token &T);
+
+  tok::TokenKind kind() const { return Kind; }

sammccall wrote:
> Token should be copyable I think?
Sure, they are copyable. Am I missing something?



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:62
+  SourceLocation location() const { return Location; }
+  SourceLocation endLocation() const {
+return Location.getLocWithOffset(Length);

sammccall wrote:
> sadly, this would need documentation - it's the one-past-the-end location, 
> not the last character in the token, not the location of the "next" token, or 
> the location of the "next" character...
> 
> I do wonder whether this is actually the right function to expose...
>  Do you ever care about end but not start? (The reverse seems likelier). 
> Having two independent source location accessors obscures the invariant that 
> they have the same file ID.
> 
> I think exposing a `FileRange` accessor instead might be better, but for now 
> you could also make callers use 
> `Tok.location().getLocWithOffset(Tok.length())` until we know it's the right 
> pattern to encourage
Added a comment. With the comment and an implementation being inline and 
trivial, I don't think anyone would have trouble understanding what this method 
does.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:78
+  /// For debugging purposes.
+  std::string str() const;
+

sammccall wrote:
> (do we need to have two names for this version?)
You mean to have distinct names for two different overloads?
I wouldn't do it, they both have fairly similar outputs, could add a small 
comment that the one with SourceManager should probably be preferred if you 
have one.




Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:95
+  /// End offset (exclusive) in a corresponding file.
+  unsigned End = 0;
+};

sammccall wrote:
> may want to offer `length()` and `StringRef text(const SourceManager&)`
SG.
WDYT of exposing the struct members via getters too?

This would mean uniform access for all things (beginOffset, endOffset, length) 
and adding a ctor that checks invariants (Begin <= End, File.isValid()) would 
also fit naturally.




Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:111
+///   replacements,
+///2. Raw tokens: corresponding directly to the source code of a file 
before
+///   any macro replacements occurred.

sammccall wrote:
> I think "Spelled" would be a better name than "Raw" here. (Despite being 
> longer and grammatically awkward)
> 
> - The spelled/expanded dichotomy is well-established in clang-land, and this 
> will help understand how they relate to each other and why the difference is 
> important. It's true we'd be extending the meaning a bit (all PP activity, 
> not just macros), but that applies to Expanded too. I think consistency is 
> valuable here.
> - "Raw" regarding tokens has an established meaning (raw lexer mode, 
> raw_identifier etc) that AIUI you're not using here, but plausibly could be. 
> So I think this may cause some confusion.
> 
> There's like 100 occurrences of "raw" in this patch, happy to discuss first 
> to avoid churn.
Spelled looks fine. The only downside is that it intersects with 
SourceManager's definition of spelled (which is very similar, but not the same).
Still like it more than raw.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:160
+  llvm::Optional>
+  findRawByExpanded(llvm::ArrayRef Expanded) const;
+

sammccall wrote:
> sammccall wrote:
> > Not totally clear what the legal arguments are here:
> >  - any sequence of tokens?
> >  - a subrange of expandedTokens() (must point into that array) - most 
> > obvious, but in this case why does mapping an empty range always fail?
> >  - any subrange of expandedTokens(), compared by value? - I think this is 
> > what the implementation assumes
> > 
> I think `by` is a little weak semantically, I think it just means "the 
> parameter is expanded tokens".
> 
> `findRawForExpanded()` seems clearer.
> `rawTokensForExpanded()` would be clearer still I think.
Yes, `Expanded` should be a subrange of `expandedTokens()`. Added a comment.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:165
+  llvm::Optional
+  findOffsetsByExpanded(llvm::ArrayRef Expanded) const;
+

sammccall wrote:
> this still seems a bit non-orthogonal:
>  - it fails if and only if findRawByExpanded fails, but that isn't visible to 
> callers
>  - there's no way to avoid the redundant work of doing the mapping twice
>  - the name doesn't indicate that it maps to raw locations as well as 
> translating tokens to locations
>  - seems likely to lead to combinatorial explosion, e.g. if we want a pair of 
> 

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-24 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 196462.
ilya-biryukov marked 53 inline comments as done.
ilya-biryukov added a comment.

- Simplify rawByExpanded by using a helper function.
- Add a FIXME to add spelled-to-expanded mapping
- s/raw*/spelled*
- Split token collector and token buffer tests
- Rewrite dump function for use in tests
- Simplify tests by using a dumpForTests function
- Address other comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,590 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "llvm/Testing/Support/SupportHelpers.h"
+#include "gmock/gmock-matchers.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using llvm::unittest::ValueIs;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Not;
+using ::testing::StartsWith;
+
+namespace {
+// Checks the passed ArrayRef has the same begin() and end() iterators as the
+// argument.
+MATCHER_P(SameRange, A, "") {
+  return A.begin() == arg.begin() && A.end() == arg.end();
+}
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+
+class TokenCollectorTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuffer &Result;
+  llvm::Optional Collector;
+};
+
+constexpr const char *FileName = "./input.cpp";
+FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));
+// Prepare to run a compiler.
+s

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-16 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 195425.
ilya-biryukov added a comment.

- Simplify rawByExpanded by using a helper function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,629 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+namespace {
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+/// Ignores differences in kind between the raw and non-raw mode.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto ToEquivalenceClass = [](tok::TokenKind Kind) {
+if (Kind == tok::identifier || Kind == tok::raw_identifier ||
+tok::getKeywordSpelling(Kind) != nullptr)
+  return tok::identifier;
+if (Kind == tok::string_literal || Kind == tok::header_name)
+  return tok::string_literal;
+return Kind;
+  };
+
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (ToEquivalenceClass(L.kind()) != ToEquivalenceClass(R.kind()))
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+} // namespace
+
+// Actual test fixture lives in the syntax namespace as it's a friend of
+// TokenBuffer.
+class syntax::TokensTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  Cr

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-16 Thread Sam McCall via Phabricator via cfe-commits
sammccall added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:146
+  llvm::Optional>
+  toOffsetRange(const Token *Begin, const Token *End,
+const SourceManager &SM) const;

ilya-biryukov wrote:
> ilya-biryukov wrote:
> > sammccall wrote:
> > > I'd think this would map a character range onto a character range, or a 
> > > token range onto a token range - is there some reason otherwise?
> > No particular reason, this was driven by its usage in function computing 
> > replacements (not in this patch, see [[ 
> > https://github.com/ilya-biryukov/llvm-project/blob/syntax/clang/lib/Tooling/Syntax/ComputeReplacements.cpp
> >  | github prototype ]], if interested, but keep in mind the prototype is 
> > not ready for review yet).
> > 
> > Mapping to a range of "raw" tokens seems more consistent with our model, 
> > will update accordingly, obtaining a range is easy after one has tokens 
> > from the file.
> Added a method to map from an expanded token range to a raw token range.
> Kept the method that maps an expanded token range to a character range too.
> 
> Note that we don't currently have a way to map from raw token range to 
> expanded, that could be added later, we don't need it for the initial 
> use-case (map the ranges coming from AST nodes into raw source ranges), so I 
> left it out for now.
> Note that we don't currently have a way to map from raw token range to 
> expanded, that could be added later, we don't need it for the initial 
> use-case (map the ranges coming from AST nodes into raw source ranges), so I 
> left it out for now.

That seems fine, can we add a comment somewhere (class header)?
Not exactly a FIXME, but it would clarify that this is in principle a 
bidirectional mapping, just missing some implementation.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:48
+namespace syntax {
+class TokenBuffer;
+

not needed



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:59
+  explicit Token(const clang::Token &T);
+
+  tok::TokenKind kind() const { return Kind; }

Token should be copyable I think?



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:62
+  SourceLocation location() const { return Location; }
+  SourceLocation endLocation() const {
+return Location.getLocWithOffset(Length);

sadly, this would need documentation - it's the one-past-the-end location, not 
the last character in the token, not the location of the "next" token, or the 
location of the "next" character...

I do wonder whether this is actually the right function to expose...
 Do you ever care about end but not start? (The reverse seems likelier). Having 
two independent source location accessors obscures the invariant that they have 
the same file ID.

I think exposing a `FileRange` accessor instead might be better, but for now 
you could also make callers use `Tok.location().getLocWithOffset(Tok.length())` 
until we know it's the right pattern to encourage



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:78
+  /// For debugging purposes.
+  std::string str() const;
+

(do we need to have two names for this version?)



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:95
+  /// End offset (exclusive) in a corresponding file.
+  unsigned End = 0;
+};

may want to offer `length()` and `StringRef text(const SourceManager&)`



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:111
+///   replacements,
+///2. Raw tokens: corresponding directly to the source code of a file 
before
+///   any macro replacements occurred.

I think "Spelled" would be a better name than "Raw" here. (Despite being longer 
and grammatically awkward)

- The spelled/expanded dichotomy is well-established in clang-land, and this 
will help understand how they relate to each other and why the difference is 
important. It's true we'd be extending the meaning a bit (all PP activity, not 
just macros), but that applies to Expanded too. I think consistency is valuable 
here.
- "Raw" regarding tokens has an established meaning (raw lexer mode, 
raw_identifier etc) that AIUI you're not using here, but plausibly could be. So 
I think this may cause some confusion.

There's like 100 occurrences of "raw" in this patch, happy to discuss first to 
avoid churn.



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:133
+  /// point to one of these tokens.
+  /// FIXME: the notable exception is '>>' being split into two '>'. figure out
+  ///how to deal with it.

It's not clear from this comment what the current behavior *is*, and how it's 
exceptional.
(It's a FIXME, but these sometimes last a while...)



Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:133
+  ///

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-11 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 194683.
ilya-biryukov added a comment.

- Store a source manager in a token buffer


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,629 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+namespace {
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+/// Ignores differences in kind between the raw and non-raw mode.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto ToEquivalenceClass = [](tok::TokenKind Kind) {
+if (Kind == tok::identifier || Kind == tok::raw_identifier ||
+tok::getKeywordSpelling(Kind) != nullptr)
+  return tok::identifier;
+if (Kind == tok::string_literal || Kind == tok::header_name)
+  return tok::string_literal;
+return Kind;
+  };
+
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (ToEquivalenceClass(L.kind()) != ToEquivalenceClass(R.kind()))
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+} // namespace
+
+// Actual test fixture lives in the syntax namespace as it's a friend of
+// TokenBuffer.
+class syntax::TokensTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTCon

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-11 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added a comment.

The new version address most of the comments, there are just a few left in the 
code doing the mapping from Dmitri, I'll look into simplifying and removing the 
possibly redundant checks.
Please take a look at this version, this is very close to the final state.




Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:72
+  /// macro or a name, arguments and parentheses of a function-like macro.
+  llvm::ArrayRef macroTokens(const TokenBuffer &B) const;
+  /// The range covering macroTokens().

ilya-biryukov wrote:
> sammccall wrote:
> > ilya-biryukov wrote:
> > > gribozavr wrote:
> > > > `invocationTokens` or `macroInvocationTokens`
> > > The plan is to eventually include the macro directives tokens, hence the 
> > > name.
> > > `invocationTokens` are somewhat inaccurate in that case, can't come up 
> > > with a better name.
> > I think your reply applies to TokenBuffer::macroTokens(), not 
> > MacroInvocation::macroTokens().
> > 
> > +1 to invocationTokens here.
> Yeah, sorry, I have mixed up these two functions with the same name. Will 
> change to `invocationTokens`.
The Mapping is now internal and only stores the indicies.
The names for two kinds of those are "raw" and "expanded", happy to consider 
alternatives for both.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:8
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLING_SYNTAX_TOKEN_BUFFER_H

sammccall wrote:
> file comment?
> sometimes they're covered by the class comments, but I think there's a bit to 
> say here.
> in particular the logical model (how we model the preprocessor, the two token 
> streams and types of mappings between them) might go here.
Added a file comment explaining the basic concepts inside the file.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:32
+public:
+  Token() = default;
+  Token(SourceLocation Location, unsigned Length, tok::TokenKind Kind)

sammccall wrote:
> what's the default state (and why have one?) do you need a way to query 
> whether a token is "valid"?
> (I'd avoid just relying on length() == 0 or location().isInvalid() because 
> it's not obvious to callers this can happen)
Got rid of default state altogether. I haven' seen the use-cases where it's 
important so far.
Using `Optional` for invalid tokens seems like a cleaner design anyway 
(we did not need it so far).



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:60
+
+/// A top-level macro invocation inside a file, e.g.
+///   #define FOO 1+2

sammccall wrote:
> If you're going to say "invocation" in the name, say "use" in the comment (or 
> vice versa!)
The class is now internal to `TokenBuffer` and is called `Mapping`.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:66
+///macroTokens = {'BAR', '(', '1', ')'}.
+class MacroInvocation {
+public:

sammccall wrote:
> I'd suggest a more natural order is token -> tokenbuffer -> macroinvocation. 
> Forward declare?
> 
> This is because the token buffer exposes token streams, which are probably 
> the second-most important concept in the file (after tokens)
Done. I forgot that you could have members of type `vector`  where `T` is 
incomplete.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:74
+  /// The range covering macroTokens().
+  std::pair macroRange(const TokenBuffer &B,
+   const SourceManager &SM) const;

sammccall wrote:
> It seems weirdly non-orthogonal to be able to get the range but not the file 
> ID.
> 
> I'd suggest either adding another accessor for that, or returning a `struct 
> BufferRange { FileID File; unsigned Begin, End; }` (with a better name.)
> 
> I favor the latter, because `.first` and `.second` don't offer the reader 
> semantic hints at the callsite, and because that gives a nice place to 
> document the half-open nature of the interval.
> 
> Actually, I'd suggest adding such a struct accessor to Token too.
Done. I've used the name `FileRange` instead (the idea is that it's pointing to 
a substring in a file).
Let me know what you think about the name.




Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:87
+  /// to the identifier token corresponding to a name of the expanded macro.
+  unsigned BeginMacroToken = 0;
+  /// End index in TokenBuffer::macroTokens().

sammccall wrote:
> please rename along with macroTokens() function
This is now `BeginRawToken` and `EndRawToken`.
As usually, open to suggestions wrt to naming.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:92
+
+/// A list of tokens obtained by lexing and preprocessing a text buffer and a
+/// set of helpers to a

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-11 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 194682.
ilya-biryukov added a comment.

- Fix header comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,631 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+namespace {
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+/// Ignores differences in kind between the raw and non-raw mode.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto ToEquivalenceClass = [](tok::TokenKind Kind) {
+if (Kind == tok::identifier || Kind == tok::raw_identifier ||
+tok::getKeywordSpelling(Kind) != nullptr)
+  return tok::identifier;
+if (Kind == tok::string_literal || Kind == tok::header_name)
+  return tok::string_literal;
+return Kind;
+  };
+
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (ToEquivalenceClass(L.kind()) != ToEquivalenceClass(R.kind()))
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+} // namespace
+
+// Actual test fixture lives in the syntax namespace as it's a friend of
+// TokenBuffer.
+class syntax::TokensTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstan

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-11 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 194680.
ilya-biryukov marked 19 inline comments as done.
ilya-biryukov added a comment.

- Remove a spamy debug tracing output.
- Less debug output, move stuff around, more comments.
- Add methods that map expanded tokens to raw tokens.
- Rename toOffsetsRange.
- Remove default ctor of Token.
- Introduce a struct for storing FileID and a pair of offsets.
- Update comments.
- Add a test for macro expansion at the end of the file.
- Misc fixes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,631 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+namespace {
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+/// Ignores differences in kind between the raw and non-raw mode.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto ToEquivalenceClass = [](tok::TokenKind Kind) {
+if (Kind == tok::identifier || Kind == tok::raw_identifier ||
+tok::getKeywordSpelling(Kind) != nullptr)
+  return tok::identifier;
+if (Kind == tok::string_literal || Kind == tok::header_name)
+  return tok::string_literal;
+return Kind;
+  };
+
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (ToEquivalenceClass(L.kind()) != ToEquivalenceClass(R.kind()))
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+} // namespace
+
+// Actual test fixture lives in the syntax namespace as it's a friend of
+// TokenBuffer.
+class syntax::TokensTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Co

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-05 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 193903.
ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added a comment.
Herald added a subscriber: mgrang.

Changes:

- Add multi-file support, record a single expanded stream and per-file-id raw 
token streams and mappings.
- Rename MacroInvocation to TokenBuffer::Mapping, make it private.
- Simplify TokenCollector, let preprocessor handle some more stuff.

TODO:

- update the docs
- go through other comments again
- write more tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/Tokens.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/Tokens.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokensTest.cpp

Index: clang/unittests/Tooling/Syntax/TokensTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokensTest.cpp
@@ -0,0 +1,602 @@
+//===- TokensTest.cpp -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/Tokens.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "gmock/gmock-more-matchers.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+namespace {
+// Matchers for syntax::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+/// Ignores differences in kind between the raw and non-raw mode.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto ToEquivalenceClass = [](tok::TokenKind Kind) {
+if (Kind == tok::identifier || Kind == tok::raw_identifier ||
+tok::getKeywordSpelling(Kind) != nullptr)
+  return tok::identifier;
+if (Kind == tok::string_literal || Kind == tok::header_name)
+  return tok::string_literal;
+return Kind;
+  };
+
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (ToEquivalenceClass(L.kind()) != ToEquivalenceClass(R.kind()))
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+} // namespace
+
+// Actual test fixture lives in the syntax namespace as it's a friend of
+// TokenBuffer.
+class syntax::TokensTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+ass

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-02 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

Not sure what the implications of design changes are, so will defer reviewing 
details of tokencollector (which generally looks good, but is of course highly 
coupled to lexer/pp) and range mapping (which I suspect could be simplified, 
but depends heavily on the model).




Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:61
+
+class TokenCollector::Callbacks : public PPCallbacks {
+public:

I'm afraid this really needs a class comment describing what this is supposed 
to do (easy!) and the implementation strategy (hard!)

In particular, it looks like macros like this:
 - we expect to see the tokens making up the macro invocation first (... FOO ( 
42 ))
 - then we see MacroExpands which allows us to recognize the last N tokens are 
a macro invocation. We create a MacroInvocation object, and remember where the 
invocation ends
 - then we see the tokens making up the macro expansion
 - finally, once we see the next token after the invocation, we finalize the 
MacroInvocation.



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:285
+std::tie(File, Offset) = SM.getDecomposedLoc(L);
+if (File != MacroInvocationFile || Offset <= ExpansionEndOffset)
+  return;

is there a problem if we never see another token in that file after the 
expansion?
(or do we see the eof token in this case?)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-02 Thread Sam McCall via Phabricator via cfe-commits
sammccall added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:1
+//===- TokenBuffer.h - store tokens of preprocessed files -*- C++ 
-*-=//
+//

ilya-biryukov wrote:
> sammccall wrote:
> > are you sure TokenBuffer is the central concept in this file, rather than 
> > just the thing with the most code? Token.h might end up being a better name 
> > for users.
> I don't mind changing this to `Token.h`, although I'd personally expect that 
> a file with this name only contains a definition for the token class.
> `Tokens.h` would be a better fit from my POV. WDYT?
Sure, Tokens.h SGTM.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:66
+///macroTokens = {'BAR', '(', '1', ')'}.
+class MacroInvocation {
+public:

I'd suggest a more natural order is token -> tokenbuffer -> macroinvocation. 
Forward declare?

This is because the token buffer exposes token streams, which are probably the 
second-most important concept in the file (after tokens)



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:92
+
+/// A list of tokens obtained by lexing and preprocessing a text buffer and a
+/// set of helpers to allow mapping the tokens after preprocessing to the

ilya-biryukov wrote:
> sammccall wrote:
> > Warning, braindump follows - let's discuss further.
> > We talked a bunch offline about the logical and concrete data model here.
> > 
> > As things stand:
> >  - #includes are not expanded, but will refer to a file ID with its own 
> > buffer: `map` is the whole structure
> >  - no information is captured about other PP interactions (PP directives 
> > that generate no tokens, skipped sections
> >  - the spelled sequence of tokens is not directly available (but expanded + 
> > macro invocations may be enough to reconstruct it)
> > 
> > If we keep this model, I think we should spell both of these out in the 
> > comment.
> > But there's another fairly natural model we could consider:
> > 
> >  - we have a single TokenBuffer with a flat list of all expanded tokens in 
> > the TU, rather than for the file 
> >  - one FileID corresponds to a contiguous range of *transitive* symbols, 
> > these ranges nest
> >  - the TokenBuffer also stores the original tokens as `map > vector>`
> >  - spelled -> expanded token mapping: spelled tokens for a file are 
> > partitioned into ranges (types: literal, include, macro invocation, other 
> > pp directive, pp skipped region). each range maps onto a range of expanded 
> > tokens (empty for e.g. pp directive or skipped region)
> >  - expanded -> spelled token is similar. (It's almost the inverse of the of 
> > the other mapping, we drop the "include" mapping ranges)
> >  - This can naturally handle comments, preprocessor directives, pp skipped 
> > sections, etc - these are in the spelled stream but not the expanded stream.
> > 
> > e.g. for this code
> > ```
> > // foo.cpp
> > int a;
> > #include "foo.h"
> > int b =  FOO(42);
> > // foo.h
> > #define FOO(X) X*X
> > int c;
> > ```
> > 
> > we'd have this buffer:
> > ```
> > expandedTokens = [int a ; int c ; int b = 42 * 42 ;]
> > spelledTokens = {
> >   "foo.cpp" => [int a; # include "foo.h" int b = FOO ( 42 ) ;],
> >   "foo.h" => [# define FOO ( X ) X * X int c ;]
> > }
> > 
> > expandedToSpelling = {
> >   int => int (foo.cpp), type = literal
> >   a => a
> >   ; => ;
> > 
> >   [] => [# define FOO ( X ) X * X](foo.h), type=preprocessor directive
> >   int => int (foo.h)
> >   c => c
> >   ; => ;
> > 
> >   int => int (foo.cpp)
> >   b => b
> >   = => =
> >   [42 * 42] => [FOO ( 42 ) ](foo.cpp), type=macro invocation
> >   ; => ; (foo.cpp)
> > }
> > 
> > spellingToExpanded = {
> >   // foo.cpp
> >   int => int, type=literal
> >   a => a
> >   ; => ;
> >   [# include "foo.h"] => [int c ;], type=include
> >   int => int
> >   b => b
> >   = => =
> >   [FOO ( X )] => [42 * 42], type=macro invocation
> >   ; => ;
> > 
> >   // foo.h
> >   [# define FOO ( X ) X] => [], type=preprocessor directive
> >   int => int
> >   c => c
> >   ; => ;
> > }
> > ```
> > 
> > Various implementation strategies possible here, one obvious one is to use 
> > a flat sorted list, and include a sequence of literal tokens as a single 
> > entry. 
> > 
> > ```
> > struct ExpandedSpellingMapping {
> >   unsigned ExpandedStart, ExpandedEnd;
> >   FileID SpellingFile; // redundant for illustration: can actually derive 
> > from SpellingTokens[SpellingStart].location()
> >   unsigned SpellingStart, SpellingEnd;
> >   enum { LiteralSequence, MacroInvocation, Preprocessor, PPSkipped, 
> > Inclusion } Kind;
> > }
> > vector ExpandedToSpelling; // bsearchable
> > vector> Inclusions; // for spelling 
> > -> expanded mapping. redundant: much can be taken from SourceManager.
> > ```
> > 
> > A delta-encoded representation with chunks for bsearch will likely be much 
> > more compact, if th

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-02 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added a comment.

Will address the rest of the comments later, answering a few questions that 
popped up first.




Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:72
+  /// macro or a name, arguments and parentheses of a function-like macro.
+  llvm::ArrayRef macroTokens(const TokenBuffer &B) const;
+  /// The range covering macroTokens().

sammccall wrote:
> ilya-biryukov wrote:
> > gribozavr wrote:
> > > `invocationTokens` or `macroInvocationTokens`
> > The plan is to eventually include the macro directives tokens, hence the 
> > name.
> > `invocationTokens` are somewhat inaccurate in that case, can't come up with 
> > a better name.
> I think your reply applies to TokenBuffer::macroTokens(), not 
> MacroInvocation::macroTokens().
> 
> +1 to invocationTokens here.
Yeah, sorry, I have mixed up these two functions with the same name. Will 
change to `invocationTokens`.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:1
+//===- TokenBuffer.h - store tokens of preprocessed files -*- C++ 
-*-=//
+//

sammccall wrote:
> are you sure TokenBuffer is the central concept in this file, rather than 
> just the thing with the most code? Token.h might end up being a better name 
> for users.
I don't mind changing this to `Token.h`, although I'd personally expect that a 
file with this name only contains a definition for the token class.
`Tokens.h` would be a better fit from my POV. WDYT?



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:92
+
+/// A list of tokens obtained by lexing and preprocessing a text buffer and a
+/// set of helpers to allow mapping the tokens after preprocessing to the

sammccall wrote:
> Warning, braindump follows - let's discuss further.
> We talked a bunch offline about the logical and concrete data model here.
> 
> As things stand:
>  - #includes are not expanded, but will refer to a file ID with its own 
> buffer: `map` is the whole structure
>  - no information is captured about other PP interactions (PP directives that 
> generate no tokens, skipped sections
>  - the spelled sequence of tokens is not directly available (but expanded + 
> macro invocations may be enough to reconstruct it)
> 
> If we keep this model, I think we should spell both of these out in the 
> comment.
> But there's another fairly natural model we could consider:
> 
>  - we have a single TokenBuffer with a flat list of all expanded tokens in 
> the TU, rather than for the file 
>  - one FileID corresponds to a contiguous range of *transitive* symbols, 
> these ranges nest
>  - the TokenBuffer also stores the original tokens as `map vector>`
>  - spelled -> expanded token mapping: spelled tokens for a file are 
> partitioned into ranges (types: literal, include, macro invocation, other pp 
> directive, pp skipped region). each range maps onto a range of expanded 
> tokens (empty for e.g. pp directive or skipped region)
>  - expanded -> spelled token is similar. (It's almost the inverse of the of 
> the other mapping, we drop the "include" mapping ranges)
>  - This can naturally handle comments, preprocessor directives, pp skipped 
> sections, etc - these are in the spelled stream but not the expanded stream.
> 
> e.g. for this code
> ```
> // foo.cpp
> int a;
> #include "foo.h"
> int b =  FOO(42);
> // foo.h
> #define FOO(X) X*X
> int c;
> ```
> 
> we'd have this buffer:
> ```
> expandedTokens = [int a ; int c ; int b = 42 * 42 ;]
> spelledTokens = {
>   "foo.cpp" => [int a; # include "foo.h" int b = FOO ( 42 ) ;],
>   "foo.h" => [# define FOO ( X ) X * X int c ;]
> }
> 
> expandedToSpelling = {
>   int => int (foo.cpp), type = literal
>   a => a
>   ; => ;
> 
>   [] => [# define FOO ( X ) X * X](foo.h), type=preprocessor directive
>   int => int (foo.h)
>   c => c
>   ; => ;
> 
>   int => int (foo.cpp)
>   b => b
>   = => =
>   [42 * 42] => [FOO ( 42 ) ](foo.cpp), type=macro invocation
>   ; => ; (foo.cpp)
> }
> 
> spellingToExpanded = {
>   // foo.cpp
>   int => int, type=literal
>   a => a
>   ; => ;
>   [# include "foo.h"] => [int c ;], type=include
>   int => int
>   b => b
>   = => =
>   [FOO ( X )] => [42 * 42], type=macro invocation
>   ; => ;
> 
>   // foo.h
>   [# define FOO ( X ) X] => [], type=preprocessor directive
>   int => int
>   c => c
>   ; => ;
> }
> ```
> 
> Various implementation strategies possible here, one obvious one is to use a 
> flat sorted list, and include a sequence of literal tokens as a single entry. 
> 
> ```
> struct ExpandedSpellingMapping {
>   unsigned ExpandedStart, ExpandedEnd;
>   FileID SpellingFile; // redundant for illustration: can actually derive 
> from SpellingTokens[SpellingStart].location()
>   unsigned SpellingStart, SpellingEnd;
>   enum { LiteralSequence, MacroInvocation, Preprocessor, PPSkipped, Inclusion 
> } Kind;
> }
> vector ExpandedToSpelling; // bsearchable
> vector> Inclusions; // for spelling -> 

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-04-01 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

incomplete, haven't reviewed token collector




Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:72
+  /// macro or a name, arguments and parentheses of a function-like macro.
+  llvm::ArrayRef macroTokens(const TokenBuffer &B) const;
+  /// The range covering macroTokens().

ilya-biryukov wrote:
> gribozavr wrote:
> > `invocationTokens` or `macroInvocationTokens`
> The plan is to eventually include the macro directives tokens, hence the name.
> `invocationTokens` are somewhat inaccurate in that case, can't come up with a 
> better name.
I think your reply applies to TokenBuffer::macroTokens(), not 
MacroInvocation::macroTokens().

+1 to invocationTokens here.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:74
+  /// The range covering macroTokens().
+  std::pair macroRange(const TokenBuffer &B,
+   const SourceManager &SM) const;

ilya-biryukov wrote:
> gribozavr wrote:
> > `invocationSourceRange` or `macroInvocationSourceRange` depending on what 
> > you choose for the function above.
> > 
> WDYT about `range`?
> Given the name of the parent class, this should be unambiguous.
I'd personally prefer invocationRange for symmetry with invocationTokens (which 
probably can't just be called tokens). But range is OK.
Please document half-openness (shouldn't be necessary, but this is clang after 
all).





Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:1
+//===- TokenBuffer.h - store tokens of preprocessed files -*- C++ 
-*-=//
+//

are you sure TokenBuffer is the central concept in this file, rather than just 
the thing with the most code? Token.h might end up being a better name for 
users.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:8
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLING_SYNTAX_TOKEN_BUFFER_H

file comment?
sometimes they're covered by the class comments, but I think there's a bit to 
say here.
in particular the logical model (how we model the preprocessor, the two token 
streams and types of mappings between them) might go here.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:32
+public:
+  Token() = default;
+  Token(SourceLocation Location, unsigned Length, tok::TokenKind Kind)

what's the default state (and why have one?) do you need a way to query whether 
a token is "valid"?
(I'd avoid just relying on length() == 0 or location().isInvalid() because it's 
not obvious to callers this can happen)



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:58
+
+static_assert(sizeof(Token) <= 16, "Token is unresonably large");
+

unresonably -> unreasonably



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:60
+
+/// A top-level macro invocation inside a file, e.g.
+///   #define FOO 1+2

If you're going to say "invocation" in the name, say "use" in the comment (or 
vice versa!)



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:74
+  /// The range covering macroTokens().
+  std::pair macroRange(const TokenBuffer &B,
+   const SourceManager &SM) const;

It seems weirdly non-orthogonal to be able to get the range but not the file ID.

I'd suggest either adding another accessor for that, or returning a `struct 
BufferRange { FileID File; unsigned Begin, End; }` (with a better name.)

I favor the latter, because `.first` and `.second` don't offer the reader 
semantic hints at the callsite, and because that gives a nice place to document 
the half-open nature of the interval.

Actually, I'd suggest adding such a struct accessor to Token too.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:87
+  /// to the identifier token corresponding to a name of the expanded macro.
+  unsigned BeginMacroToken = 0;
+  /// End index in TokenBuffer::macroTokens().

please rename along with macroTokens() function



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:92
+
+/// A list of tokens obtained by lexing and preprocessing a text buffer and a
+/// set of helpers to allow mapping the tokens after preprocessing to the

Warning, braindump follows - let's discuss further.
We talked a bunch offline about the logical and concrete data model here.

As things stand:
 - #includes are not expanded, but will refer to a file ID with its own buffer: 
`map` is the whole structure
 - no information is captured about other PP interactions (PP directives that 
generate no tokens, skipped sections
 - the spelled sequence of tokens is not directly available (but expanded + 
macro invocations may be enough t

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-29 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 192839.
ilya-biryukov marked 19 inline comments as done and 2 inline comments as done.
ilya-biryukov added a comment.

- Rename various things.
- Update doc comments.
- Search tokens in the tests by spelling, not by kind.
- Add more tests.
- Fix typos.
- Address other comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/TokenBuffer.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/TokenBuffer.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokenBufferTest.cpp

Index: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
@@ -0,0 +1,507 @@
+//===- TokenBufferTest.cpp ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/TokenBuffer.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::IsEmpty;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+// Debug printers.
+// FIXME: This should live somewhere else or be implemented as 'operator
+// <<(raw_ostream&, T)'.
+namespace clang {
+namespace tok {
+inline void PrintTo(TokenKind K, std::ostream *OS) {
+  *OS << tok::getTokenName(K);
+}
+} // namespace tok
+namespace syntax {
+inline void PrintTo(const syntax::Token &T, std::ostream *OS) {
+  PrintTo(T.kind(), OS);
+  OS->flush();
+}
+} // namespace syntax
+} // namespace clang
+
+namespace {
+// Matchers for clang::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (L.kind() != R.kind())
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+
+class TokenBufferTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Buffer.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Co

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-29 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added a comment.

Still have a few comments to address in `TokenCollector` and wrt to naming. But 
apart from this revision is ready for another round.




Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:120
+  /// the original source file. The tranformation may not be possible if the
+  /// tokens cross macro invocations in the middle, e.g.
+  ///#define FOO 1*2

gribozavr wrote:
> "The mapping fails if cross boundaries of macro expansions, that is, don't 
> correspond to a complete top-level macro invocation"
> 
> Consider adding examples.
> 
> ```
> Given this source file:
> 
> #define FIRST f1 f2 f3
> #define SECOND s1 s2 3
> 
> a FIRST b SECOND c  // expansion: a f1 f2 f3 b s1 s2 s3 c
> 
> toOffsetRange will map tokens like this:
> 
> input range => output range
> a => a
> s1 s2 s3 => SECOND
> a f1 f2 f3 => a FIRST
> a f1 => can't map
> s1 s2 => s1 s2 within the macro definition
> ```
> 
> Could you add these to tests as well?
Done. Note that we never map to tokens inside macro definition (the `s1 s2` 
example)



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:191
+  /// i.e. after running Execute().
+  TokenBuffer consume() &&;
+

gribozavr wrote:
> gribozavr wrote:
> > "Finalizes token collection"
> > 
> > Of course a function called "consume" consumes the result :)
> LLVM_NODISCARD?
> Of course a function called "consume" consumes the result :)

Agreed :-)



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:295
+MacroExpansion::tokens(const TokenBuffer &B) const {
+  return B.tokens().slice(BeginExpansionToken,
+  EndExpansionToken - BeginExpansionToken);

gribozavr wrote:
> "ExpansionTokenBegin"?  "ExpansionTokenStartIndex"?
Went with `BeginExpandedToken`, `EndExpandedToken`.



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:190
+for (unsigned I = 0; I < Actual.size(); ++I) {
+  auto &A = Actual[I];
+  auto &E = Expected[I];

Eugene.Zelenko wrote:
> Return type is not obvious, so auto should not be used.
The declarations of `Actual` and `Expected` are **really** close, both types 
are easy to infer



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:429
+  Code = llvm::Annotations(R"cpp(
+$all[[int $a[[a]] = $numbers[[100 + 200]];]]
+  )cpp");

gribozavr wrote:
> Could we instead use some nonsensical code like "a1 a2 a3 FIRST a3 a4 a5 
> SECOND a6 a7 a8", and instead of `OfKind` we can make a helper that finds the 
> identifier with the given name?
Thanks. Much nicer with a function that finds by text.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-29 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added a comment.

A few naming alternatives, will update the review after addressing other 
comments.




Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:72
+  /// macro or a name, arguments and parentheses of a function-like macro.
+  llvm::ArrayRef macroTokens(const TokenBuffer &B) const;
+  /// The range covering macroTokens().

gribozavr wrote:
> `invocationTokens` or `macroInvocationTokens`
The plan is to eventually include the macro directives tokens, hence the name.
`invocationTokens` are somewhat inaccurate in that case, can't come up with a 
better name.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:74
+  /// The range covering macroTokens().
+  std::pair macroRange(const TokenBuffer &B,
+   const SourceManager &SM) const;

gribozavr wrote:
> `invocationSourceRange` or `macroInvocationSourceRange` depending on what you 
> choose for the function above.
> 
WDYT about `range`?
Given the name of the parent class, this should be unambiguous.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-28 Thread Dmitri Gribenko via Phabricator via cfe-commits
gribozavr added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:69
+  /// The tokens after preprocessor replacements.
+  llvm::ArrayRef tokens(const TokenBuffer &B) const;
+  /// Tokens that appear in the text of the file, i.e. a name of an object-like

`expandedTokens`?



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:72
+  /// macro or a name, arguments and parentheses of a function-like macro.
+  llvm::ArrayRef macroTokens(const TokenBuffer &B) const;
+  /// The range covering macroTokens().

`invocationTokens` or `macroInvocationTokens`



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:74
+  /// The range covering macroTokens().
+  std::pair macroRange(const TokenBuffer &B,
+   const SourceManager &SM) const;

`invocationSourceRange` or `macroInvocationSourceRange` depending on what you 
choose for the function above.




Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:83
+  unsigned BeginMacroToken = 0;
+  unsigned EndMacroToken = 0;
+};

Please add brief doc comments to these variables. Having read the public API of 
this class, I still don't have an idea what these variables are.





Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:120
+  /// the original source file. The tranformation may not be possible if the
+  /// tokens cross macro invocations in the middle, e.g.
+  ///#define FOO 1*2

"The mapping fails if cross boundaries of macro expansions, that is, don't 
correspond to a complete top-level macro invocation"

Consider adding examples.

```
Given this source file:

#define FIRST f1 f2 f3
#define SECOND s1 s2 3

a FIRST b SECOND c  // expansion: a f1 f2 f3 b s1 s2 s3 c

toOffsetRange will map tokens like this:

input range => output range
a => a
s1 s2 s3 => SECOND
a f1 f2 f3 => a FIRST
a f1 => can't map
s1 s2 => s1 s2 within the macro definition
```

Could you add these to tests as well?



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:153
+  /// FIXME: we do not yet store tokens of directives, like #include, #define,
+  ///#pragma, etc.
+  llvm::ArrayRef macroTokens() const { return MacroTokens; }

... For the input above, macroTokens() should return {desired output}



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:191
+  /// i.e. after running Execute().
+  TokenBuffer consume() &&;
+

"Finalizes token collection"

Of course a function called "consume" consumes the result :)



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:191
+  /// i.e. after running Execute().
+  TokenBuffer consume() &&;
+

gribozavr wrote:
> "Finalizes token collection"
> 
> Of course a function called "consume" consumes the result :)
LLVM_NODISCARD?



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:295
+MacroExpansion::tokens(const TokenBuffer &B) const {
+  return B.tokens().slice(BeginExpansionToken,
+  EndExpansionToken - BeginExpansionToken);

"ExpansionTokenBegin"?  "ExpansionTokenStartIndex"?



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:367
+  if (FirstCall != Expansions.end() &&
+  (FirstCall->BeginExpansionToken < BeginIndex ||
+   EndIndex < FirstCall->EndExpansionToken)) {

`FirstCall->BeginExpansionToken != BeginIndex` ?



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:368
+  (FirstCall->BeginExpansionToken < BeginIndex ||
+   EndIndex < FirstCall->EndExpansionToken)) {
+return llvm::None;

I don't understand why `EndIndex < FirstCall->EndExpansionToken` is needed -- 
isn't it redundant with the `LastCall` checks below?



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:373
+  if (LastCall != Expansions.end() &&
+  (LastCall->BeginExpansionToken < BeginIndex ||
+   EndIndex < LastCall->EndExpansionToken)) {

`LastCall->BeginExpansionToken != BeginIndex`?



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:374
+  (LastCall->BeginExpansionToken < BeginIndex ||
+   EndIndex < LastCall->EndExpansionToken)) {
+return llvm::None;

Also redundant with `FirstCall` checks?



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:355
+  checkTokens("");
+  checkMacroInvocations({{"EMPTY", "", Code.range("m")},
+   {"EMPTY_FUNC(1+2+3)", "", Code.range("f")}});

`expectTokens`, `expectMacroInvocations`?



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:376
+  // The parser eventually breaks the first '>>' into two tokens ('>' and '>'),
+  // but 

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-28 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 192661.
ilya-biryukov added a comment.

- s/macroMacroInvocation/something else...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/TokenBuffer.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/TokenBuffer.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokenBufferTest.cpp

Index: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
@@ -0,0 +1,470 @@
+//===- TokenBufferTest.cpp ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/TokenBuffer.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::IsEmpty;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+// Debug printers.
+// FIXME: This should live somewhere else or be implemented as 'operator
+// <<(raw_ostream&, T)'.
+namespace clang {
+namespace tok {
+inline void PrintTo(TokenKind K, std::ostream *OS) {
+  *OS << tok::getTokenName(K);
+}
+} // namespace tok
+namespace syntax {
+inline void PrintTo(const syntax::Token &T, std::ostream *OS) {
+  PrintTo(T.kind(), OS);
+  OS->flush();
+}
+} // namespace syntax
+} // namespace clang
+
+namespace {
+// Matchers for clang::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (L.kind() != R.kind())
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+
+class TokenBufferTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Tokens.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
+return llvm::make_unique();
+  }
+
+private:
+  TokenBuf

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-28 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov updated this revision to Diff 192660.
ilya-biryukov marked an inline comment as done.
ilya-biryukov added a comment.

- Rename macro expansion to macro invocation everywhere
- Tweak comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/TokenBuffer.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/TokenBuffer.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokenBufferTest.cpp

Index: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
@@ -0,0 +1,470 @@
+//===- TokenBufferTest.cpp ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/TokenBuffer.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::IsEmpty;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+// Debug printers.
+// FIXME: This should live somewhere else or be implemented as 'operator
+// <<(raw_ostream&, T)'.
+namespace clang {
+namespace tok {
+inline void PrintTo(TokenKind K, std::ostream *OS) {
+  *OS << tok::getTokenName(K);
+}
+} // namespace tok
+namespace syntax {
+inline void PrintTo(const syntax::Token &T, std::ostream *OS) {
+  PrintTo(T.kind(), OS);
+  OS->flush();
+}
+} // namespace syntax
+} // namespace clang
+
+namespace {
+// Matchers for clang::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (L.kind() != R.kind())
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+
+class TokenBufferTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Tokens.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+  }
+  void EndSourceFileAction() override {
+assert(Collector && "BeginSourceFileAction was never called");
+Result = std::move(*Collector).consume();
+  }
+
+  std::unique_ptr
+  CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {

[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-27 Thread Eugene Zelenko via Phabricator via cfe-commits
Eugene.Zelenko added inline comments.



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:323
+  PP.getLangOpts(), Tokens);
+  auto *CB = CBOwner.get();
+

ilya-biryukov wrote:
> Eugene.Zelenko wrote:
> > Return type is not obvious. Actually variable is used only one, so it's not 
> > necessary.
> - The type is quite obvious from the previous line.
> - We need the variable as we `std::move` the `unique_ptr` in the next line 
> and sill want to access the pointer later.
Lines may be separated in future.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-27 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added inline comments.



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:323
+  PP.getLangOpts(), Tokens);
+  auto *CB = CBOwner.get();
+

Eugene.Zelenko wrote:
> Return type is not obvious. Actually variable is used only one, so it's not 
> necessary.
- The type is quite obvious from the previous line.
- We need the variable as we `std::move` the `unique_ptr` in the next line and 
sill want to access the pointer later.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-27 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov marked an inline comment as done.
ilya-biryukov added inline comments.



Comment at: clang/include/clang/Tooling/Syntax/TokenBuffer.h:130
+  /// result.
+  llvm::ArrayRef expansions() const { return Expansions; }
+  /// Tokens of macro directives and top-level macro expansions. These are not

@gribozavr, note that I left the name expansions.
The reasoning is that it seems to describe both the functional and 
non-functional macros, while "invocation" is only applicable to functional 
macros.
```
#define FOO int
#define ID(a)

FOO // a macro expansion
ID(a = 10;); // a macro invocation *and* a macro expansion.
```



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:157
+std::string NullTerminated = Text.str();
+auto FID = SourceMgr->createFileID(llvm::MemoryBuffer::getMemBufferCopy(
+StringRef(NullTerminated.data(), NullTerminated.size() + 1)));

Eugene.Zelenko wrote:
> Return type is not obvious, so auto should not be used.
It is `createFileID` definitely returns `FileID`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-27 Thread Eugene Zelenko via Phabricator via cfe-commits
Eugene.Zelenko added inline comments.



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:8
+//===--===//
+#include "clang/Tooling/Syntax/TokenBuffer.h"
+#include "clang/Basic/Diagnostic.h"

Please separate with empty line.



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:28
+}
+llvm::StringRef syntax::Token::text(const SourceManager &SM) const {
+  bool Invalid = false;

Please separate with empty line.



Comment at: clang/lib/Tooling/Syntax/TokenBuffer.cpp:323
+  PP.getLangOpts(), Tokens);
+  auto *CB = CBOwner.get();
+

Return type is not obvious. Actually variable is used only one, so it's not 
necessary.



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:133
+  FileName};
+auto CI = createInvocationFromCommandLine(Args, Diags, FS);
+assert(CI);

Return type is not obvious, so auto should not be used.



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:157
+std::string NullTerminated = Text.str();
+auto FID = SourceMgr->createFileID(llvm::MemoryBuffer::getMemBufferCopy(
+StringRef(NullTerminated.data(), NullTerminated.size() + 1)));

Return type is not obvious, so auto should not be used.



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:165
+  void checkTokens(llvm::StringRef ExpectedText) {
+auto TokenizedCode = tokenize(ExpectedText);
+std::vector ExpectedTokens = TokenizedCode.tokens();

Return type is not obvious, so auto should not be used.



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:186
+  void checkExpansions(llvm::ArrayRef Expected) {
+auto Actual = Buffer.expansions();
+ASSERT_EQ(Actual.size(), Expected.size());

Return type is not obvious, so auto should not be used.



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:190
+for (unsigned I = 0; I < Actual.size(); ++I) {
+  auto &A = Actual[I];
+  auto &E = Expected[I];

Return type is not obvious, so auto should not be used.



Comment at: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp:191
+  auto &A = Actual[I];
+  auto &E = Expected[I];
+

Return type is not obvious, so auto should not be used.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

2019-03-27 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov created this revision.
ilya-biryukov added a reviewer: gribozavr.
Herald added subscribers: jdoerfert, mgorny.
Herald added a project: clang.

TokenBuffer stores the list of tokens for a file obtained after
preprocessing. This is a base building block for syntax trees,
see [1] for the full proposal on syntax trees.

This commits also starts a new sub-library of ClangTooling, which
would be the home for the syntax trees and syntax-tree-based refactoring
utilities.

[1]: https://lists.llvm.org/pipermail/cfe-dev/2019-February/061414.html


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D59887

Files:
  clang/include/clang/Tooling/Syntax/TokenBuffer.h
  clang/lib/Tooling/CMakeLists.txt
  clang/lib/Tooling/Syntax/CMakeLists.txt
  clang/lib/Tooling/Syntax/TokenBuffer.cpp
  clang/unittests/Tooling/CMakeLists.txt
  clang/unittests/Tooling/Syntax/CMakeLists.txt
  clang/unittests/Tooling/Syntax/TokenBufferTest.cpp

Index: clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
===
--- /dev/null
+++ clang/unittests/Tooling/Syntax/TokenBufferTest.cpp
@@ -0,0 +1,471 @@
+//===- TokenBufferTest.cpp ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "clang/Tooling/Syntax/TokenBuffer.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/Expr.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticIDs.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/FileSystemOptions.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TokenKinds.def"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/Utils.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Lex/Token.h"
+#include "clang/Tooling/Tooling.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/raw_os_ostream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using namespace clang;
+using namespace clang::syntax;
+
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::IsEmpty;
+using ::testing::Matcher;
+using ::testing::Pointwise;
+
+// Debug printers.
+// FIXME: This should live somewhere else or be implemented as 'operator
+// <<(raw_ostream&, T)'.
+namespace clang {
+namespace tok {
+inline void PrintTo(TokenKind K, std::ostream *OS) {
+  *OS << tok::getTokenName(K);
+}
+} // namespace tok
+namespace syntax {
+inline void PrintTo(const syntax::Token &T, std::ostream *OS) {
+  PrintTo(T.kind(), OS);
+  OS->flush();
+}
+} // namespace syntax
+} // namespace clang
+
+namespace {
+// Matchers for clang::Token.
+MATCHER_P(Kind, K, "") { return arg.kind() == K; }
+MATCHER_P2(HasText, Text, SourceMgr, "") {
+  return arg.text(*SourceMgr) == Text;
+}
+MATCHER_P2(IsIdent, Text, SourceMgr, "") {
+  return arg.kind() == tok::identifier && arg.text(*SourceMgr) == Text;
+}
+/// Checks the start and end location of a token are equal to SourceRng.
+MATCHER_P(RangeIs, SourceRng, "") {
+  return arg.location() == SourceRng.first &&
+ arg.endLocation() == SourceRng.second;
+}
+/// Checks the passed tuple has two similar tokens, i.e. both are of the same
+/// kind and have the same text if they are identifiers.
+MATCHER_P(IsSameToken, SourceMgr, "") {
+  auto &L = std::get<0>(arg);
+  auto &R = std::get<1>(arg);
+  if (L.kind() != R.kind())
+return false;
+  return L.text(*SourceMgr) == L.text(*SourceMgr);
+}
+
+class TokenBufferTest : public ::testing::Test {
+public:
+  /// Run the clang frontend, collect the preprocessed tokens from the frontend
+  /// invocation and store them in this->Tokens.
+  /// This also clears SourceManager before running the compiler.
+  void recordTokens(llvm::StringRef Code) {
+class RecordTokens : public ASTFrontendAction {
+public:
+  explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}
+
+  bool BeginSourceFileAction(CompilerInstance &CI) override {
+assert(!Collector && "expected only a single call to BeginSourceFile");
+Collector.emplace(CI.getPreprocessor());
+return true;
+