Re: [PATCH] D20814: [include-fixer] Rank symbols based on the number of occurrences we found while merging.

2016-05-31 Thread Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL271283: [include-fixer] Rank symbols based on the number of 
occurrences we found… (authored by d0k).

Changed prior to commit:
  http://reviews.llvm.org/D20814?vs=59062=59065#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D20814

Files:
  clang-tools-extra/trunk/include-fixer/SymbolIndexManager.cpp
  clang-tools-extra/trunk/include-fixer/tool/ClangIncludeFixer.cpp
  clang-tools-extra/trunk/test/include-fixer/Inputs/fake_yaml_db.yaml
  clang-tools-extra/trunk/test/include-fixer/ranking.cpp

Index: clang-tools-extra/trunk/test/include-fixer/Inputs/fake_yaml_db.yaml
===
--- clang-tools-extra/trunk/test/include-fixer/Inputs/fake_yaml_db.yaml
+++ clang-tools-extra/trunk/test/include-fixer/Inputs/fake_yaml_db.yaml
@@ -22,3 +22,24 @@
 Type:Class
 NumOccurrences:  1
 ...
+Name:   bar
+Contexts:
+  - ContextType: Namespace
+ContextName: a
+  - ContextType: Namespace
+ContextName: b
+FilePath:../include/bar.h
+LineNumber:  2
+Type:Class
+NumOccurrences:  3
+...
+Name:   bar
+Contexts:
+  - ContextType: Namespace
+ContextName: a
+  - ContextType: Namespace
+ContextName: b
+FilePath:../include/zbar.h
+LineNumber:  1
+Type:Class
+NumOccurrences:  3
Index: clang-tools-extra/trunk/test/include-fixer/ranking.cpp
===
--- clang-tools-extra/trunk/test/include-fixer/ranking.cpp
+++ clang-tools-extra/trunk/test/include-fixer/ranking.cpp
@@ -0,0 +1,6 @@
+// RUN: clang-include-fixer -db=yaml -input=%S/Inputs/fake_yaml_db.yaml -output-headers %s -- | FileCheck %s -implicit-check-not=.h
+
+// CHECK: "../include/bar.h"
+// CHECK-NEXT: "../include/zbar.h"
+
+bar b;
Index: clang-tools-extra/trunk/include-fixer/tool/ClangIncludeFixer.cpp
===
--- clang-tools-extra/trunk/include-fixer/tool/ClangIncludeFixer.cpp
+++ clang-tools-extra/trunk/include-fixer/tool/ClangIncludeFixer.cpp
@@ -98,11 +98,11 @@
   std::vector Headers;
   SmallVector CommaSplits;
   Split.second.split(CommaSplits, ",");
-  for (StringRef Header : CommaSplits)
+  for (size_t I = 0, E = CommaSplits.size(); I != E; ++I)
 Symbols.push_back(find_all_symbols::SymbolInfo(
 Split.first.trim(),
-find_all_symbols::SymbolInfo::SymbolKind::Unknown, Header.trim(), 1,
-{}));
+find_all_symbols::SymbolInfo::SymbolKind::Unknown,
+CommaSplits[I].trim(), 1, {}, /*NumOccurrences=*/E - I));
 }
 SymbolIndexMgr->addSymbolIndex(
 llvm::make_unique(Symbols));
Index: clang-tools-extra/trunk/include-fixer/SymbolIndexManager.cpp
===
--- clang-tools-extra/trunk/include-fixer/SymbolIndexManager.cpp
+++ clang-tools-extra/trunk/include-fixer/SymbolIndexManager.cpp
@@ -17,6 +17,37 @@
 namespace clang {
 namespace include_fixer {
 
+using clang::find_all_symbols::SymbolInfo;
+
+/// Sorts and uniques SymbolInfos based on the popularity info in SymbolInfo.
+static void rankByPopularity(std::vector ) {
+  // First collect occurrences per header file.
+  std::map HeaderPopularity;
+  for (const SymbolInfo  : Symbols) {
+unsigned  = HeaderPopularity[Symbol.getFilePath()];
+Popularity = std::max(Popularity, Symbol.getNumOccurrences());
+  }
+
+  // Sort by the gathered popularities. Use file name as a tie breaker so we can
+  // deduplicate.
+  std::sort(Symbols.begin(), Symbols.end(),
+[&](const SymbolInfo , const SymbolInfo ) {
+  auto APop = HeaderPopularity[A.getFilePath()];
+  auto BPop = HeaderPopularity[B.getFilePath()];
+  if (APop != BPop)
+return APop > BPop;
+  return A.getFilePath() < B.getFilePath();
+});
+
+  // Deduplicate based on the file name. They will have the same popularity and
+  // we don't want to suggest the same header twice.
+  Symbols.erase(std::unique(Symbols.begin(), Symbols.end(),
+[](const SymbolInfo , const SymbolInfo ) {
+  return A.getFilePath() == B.getFilePath();
+}),
+Symbols.end());
+}
+
 std::vector
 SymbolIndexManager::search(llvm::StringRef Identifier) const {
   // The identifier may be fully qualified, so split it and get all the context
@@ -45,6 +76,8 @@
 DEBUG(llvm::dbgs() << "Searching " << Names.back() << "... got "
<< Symbols.size() << " results...\n");
 
+rankByPopularity(Symbols);
+
 for (const auto  : Symbols) {
   // Match the identifier name without qualifier.
   if (Symbol.getName() == Names.back()) {

Re: [PATCH] D20814: [include-fixer] Rank symbols based on the number of occurrences we found while merging.

2016-05-31 Thread Daniel Jasper via cfe-commits
djasper accepted this revision.
This revision is now accepted and ready to land.


Comment at: include-fixer/SymbolIndexManager.cpp:25
@@ +24,3 @@
+  // First collect occurrences per header file.
+  std::map HeaderPopularity;
+  for (const SymbolInfo  : Symbols) {

Maybe use a DenseHashMap?


Comment at: include-fixer/tool/ClangIncludeFixer.cpp:101
@@ -100,3 +100,3 @@
   Split.second.split(CommaSplits, ",");
-  for (StringRef Header : CommaSplits)
+  for (const StringRef  : CommaSplits)
 Symbols.push_back(find_all_symbols::SymbolInfo(

Can you please write a regular for loop with index instead?


Comment at: include-fixer/tool/ClangIncludeFixer.cpp:105
@@ -104,3 +104,3 @@
 find_all_symbols::SymbolInfo::SymbolKind::Unknown, Header.trim(), 
1,
-{}));
+{}, /*NumOccurrences=*/CommaSplits.end() - ));
 }

Add a comment that you are assigning fake occurrences to keep the existing test 
logic (with the first include being the most preferable).


http://reviews.llvm.org/D20814



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D20814: [include-fixer] Rank symbols based on the number of occurrences we found while merging.

2016-05-31 Thread Benjamin Kramer via cfe-commits
bkramer created this revision.
bkramer added reviewers: ioeric, djasper.
bkramer added a subscriber: cfe-commits.

This sorts based on the popularity of the header, not the symbol. If
there are mutliple matching symbols in one header we take the maximum
popularity for that header and deduplicate. If we know nothing we sort
lexicographically based on the header path.

http://reviews.llvm.org/D20814

Files:
  include-fixer/SymbolIndexManager.cpp
  include-fixer/tool/ClangIncludeFixer.cpp
  test/include-fixer/Inputs/fake_yaml_db.yaml
  test/include-fixer/ranking.cpp

Index: test/include-fixer/ranking.cpp
===
--- /dev/null
+++ test/include-fixer/ranking.cpp
@@ -0,0 +1,6 @@
+// RUN: clang-include-fixer -db=yaml -input=%S/Inputs/fake_yaml_db.yaml -output-headers %s -- | FileCheck %s -implicit-check-not=.h
+
+// CHECK: "../include/bar.h"
+// CHECK-NEXT: "../include/zbar.h"
+
+bar b;
Index: test/include-fixer/Inputs/fake_yaml_db.yaml
===
--- test/include-fixer/Inputs/fake_yaml_db.yaml
+++ test/include-fixer/Inputs/fake_yaml_db.yaml
@@ -22,3 +22,24 @@
 Type:Class
 NumOccurrences:  1
 ...
+Name:   bar
+Contexts:
+  - ContextType: Namespace
+ContextName: a
+  - ContextType: Namespace
+ContextName: b
+FilePath:../include/bar.h
+LineNumber:  2
+Type:Class
+NumOccurrences:  3
+...
+Name:   bar
+Contexts:
+  - ContextType: Namespace
+ContextName: a
+  - ContextType: Namespace
+ContextName: b
+FilePath:../include/zbar.h
+LineNumber:  1
+Type:Class
+NumOccurrences:  3
Index: include-fixer/tool/ClangIncludeFixer.cpp
===
--- include-fixer/tool/ClangIncludeFixer.cpp
+++ include-fixer/tool/ClangIncludeFixer.cpp
@@ -98,11 +98,11 @@
   std::vector Headers;
   SmallVector CommaSplits;
   Split.second.split(CommaSplits, ",");
-  for (StringRef Header : CommaSplits)
+  for (const StringRef  : CommaSplits)
 Symbols.push_back(find_all_symbols::SymbolInfo(
 Split.first.trim(),
 find_all_symbols::SymbolInfo::SymbolKind::Unknown, Header.trim(), 1,
-{}));
+{}, /*NumOccurrences=*/CommaSplits.end() - ));
 }
 SymbolIndexMgr->addSymbolIndex(
 llvm::make_unique(Symbols));
Index: include-fixer/SymbolIndexManager.cpp
===
--- include-fixer/SymbolIndexManager.cpp
+++ include-fixer/SymbolIndexManager.cpp
@@ -17,6 +17,37 @@
 namespace clang {
 namespace include_fixer {
 
+using clang::find_all_symbols::SymbolInfo;
+
+/// Sorts and uniques SymbolInfos based on the popularity info in SymbolInfo.
+static void rankByPopularity(std::vector ) {
+  // First collect occurrences per header file.
+  std::map HeaderPopularity;
+  for (const SymbolInfo  : Symbols) {
+unsigned  = HeaderPopularity[Symbol.getFilePath()];
+Popularity = std::max(Popularity, Symbol.getNumOccurrences());
+  }
+
+  // Sort by the gathered popularities. Use file name as a tie breaker so we can
+  // deduplicate.
+  std::sort(Symbols.begin(), Symbols.end(),
+[&](const SymbolInfo , const SymbolInfo ) {
+  auto APop = HeaderPopularity[A.getFilePath()];
+  auto BPop = HeaderPopularity[B.getFilePath()];
+  if (APop != BPop)
+return APop > BPop;
+  return A.getFilePath() < B.getFilePath();
+});
+
+  // Deduplicate based on the file name. They will have the same popularity and
+  // we don't want to suggest the same header twice.
+  Symbols.erase(std::unique(Symbols.begin(), Symbols.end(),
+[](const SymbolInfo , const SymbolInfo ) {
+  return A.getFilePath() == B.getFilePath();
+}),
+Symbols.end());
+}
+
 std::vector
 SymbolIndexManager::search(llvm::StringRef Identifier) const {
   // The identifier may be fully qualified, so split it and get all the context
@@ -45,6 +76,8 @@
 DEBUG(llvm::dbgs() << "Searching " << Names.back() << "... got "
<< Symbols.size() << " results...\n");
 
+rankByPopularity(Symbols);
+
 for (const auto  : Symbols) {
   // Match the identifier name without qualifier.
   if (Symbol.getName() == Names.back()) {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits