[PATCH] D35923: Fix PR32332 - PCM nondeterminism with builtins caused by global module index

2017-07-28 Thread Adrian Prantl via Phabricator via cfe-commits
aprantl added a comment.

After thinking about this some more, I think it should be safe to use the 
global module index as a negative cache like in this example. The conditions 
under which a module is rebuilt should not affect the contents, and if the 
module were out-of-date it would have been rebuilt anyway.
I will see if I can track down why the builtin macros make it into the module's 
identifier table. We already know that they are being marked incorrectly as 
FromAST, but I do not yet understand why.


https://reviews.llvm.org/D35923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D35923: Fix PR32332 - PCM nondeterminism with builtins caused by global module index

2017-07-26 Thread Doug Gregor via Phabricator via cfe-commits
doug.gregor added a comment.

I am *very* concerned about this. The optimization that avoids doing lookups 
within module files where we "know" we won't find anything was very important 
at the time it was introduced, so I'd like to understand better why it didn't 
make a significant difference in your benchmarks before we disable it. 
(Especially because of built-ins; those are Very Weird and probably modeled 
wrong).


https://reviews.llvm.org/D35923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D35923: Fix PR32332 - PCM nondeterminism with builtins caused by global module index

2017-07-26 Thread Adrian Prantl via Phabricator via cfe-commits
aprantl created this revision.

Long story short: I think that it is too dangerous to rely on the global module 
index for anything but diagnostics.

The global module index is only rebuilt in two cases:

  a. When triggered by Sema for diagnosing errors
  b. At the very end of FrontendAction::Execute() 

This patch is deleting a shortcut in ModuleManager::visit() that uses the 
global module index as a negative cache to determine that it does not make 
sense to open a module when looking for a specific identifier.

In the test case (see also the PR https://bugs.llvm.org/show_bug.cgi?id=32332)  
we get different PCM output for B.pcm when

1. building B.pcm (which also builds the dependency A.pcm in one go)
2. building A.pcm, then building B.pcm in a separate clang invocation

Because of when the global module index is built, (1) does not use the above 
shortcut, but (2) does. The symbol were this makes a difference is a built-in 
macro __FLT_EPSILON__ which does not belong to any module but still makes it 
into the global module index as not being defined in any module. This causes 
the symbol to be serialized in (1) A.pcm and (2) A.pcm and B.pcm.

I have considered the alternative of hunting down why builtins are serialized 
into the PCM in the first place, but frankly I feel like this is just the tip 
of the iceberg, and that relying on the global module index this way is just 
asking for trouble in concurrent or incremental builds. I have looked at the 
performance impact of the change, and in the project I built (a large project, 
~30min build time) the wall-clock numbers were in the noise.

rdar://problem/31122421


https://reviews.llvm.org/D35923

Files:
  lib/Serialization/ModuleManager.cpp
  test/Modules/Inputs/builtins/a.h
  test/Modules/Inputs/builtins/b.h
  test/Modules/Inputs/builtins/module.modulemap
  test/Modules/builtins-incremental.m


Index: test/Modules/builtins-incremental.m
===
--- /dev/null
+++ test/Modules/builtins-incremental.m
@@ -0,0 +1,28 @@
+// RUN: rm -rf %T/cache %T/cache1 %T/cache2 
+//
+// Build B which imports A.
+// RUN: %clang_cc1 -cc1 -fsyntax-only -fmodules -fimplicit-module-maps \
+// RUN:   -fmodules-cache-path=%T/cache -fdisable-module-hash \
+// RUN:   -I %S/Inputs/builtins %s
+// RUN: mv %T/cache %T/cache1
+//
+// Build only A.
+// RUN: echo '@import A;' >%t.m
+// RUN: %clang_cc1 -cc1 -fsyntax-only -fmodules -fimplicit-module-maps \
+// RUN:   -fmodules-cache-path=%T/cache -fdisable-module-hash \
+// RUN:   -I %S/Inputs/builtins %t.m
+//
+// Incrementally build B which imports A.
+// RUN: %clang_cc1 -cc1 -fsyntax-only -fmodules -fimplicit-module-maps \
+// RUN:   -fmodules-cache-path=%T/cache -fdisable-module-hash \
+// RUN:   -I %S/Inputs/builtins %s
+// RUN: mv %T/cache %T/cache2
+//
+// RUN: llvm-bcanalyzer %T/cache1/B.pcm >%t.dump
+// RUN: llvm-bcanalyzer %T/cache2/B.pcm >>%t.dump
+// RUN: cat %t.dump | FileCheck %s
+
+// Verify the identifier table has the same size in both versions of B.pcm:
+// CHECK:  1 [[INDENT_SIZE:[0-9]+]] 100.00 IDENTIFIER_TABLE
+// CHECK:  1 [[INDENT_SIZE]] 100.00 IDENTIFIER_TABLE
+@import B;
Index: test/Modules/Inputs/builtins/module.modulemap
===
--- /dev/null
+++ test/Modules/Inputs/builtins/module.modulemap
@@ -0,0 +1,7 @@
+module A {
+  header "a.h"
+}
+
+module B {
+  header "b.h"
+}
Index: test/Modules/Inputs/builtins/b.h
===
--- /dev/null
+++ test/Modules/Inputs/builtins/b.h
@@ -0,0 +1,3 @@
+#include "a.h"
+@interface B : A
+@end
Index: test/Modules/Inputs/builtins/a.h
===
--- /dev/null
+++ test/Modules/Inputs/builtins/a.h
@@ -0,0 +1,3 @@
+#include 
+@interface A
+@end
Index: lib/Serialization/ModuleManager.cpp
===
--- lib/Serialization/ModuleManager.cpp
+++ lib/Serialization/ModuleManager.cpp
@@ -363,19 +363,6 @@
 
   VisitState *State = allocateVisitState();
   unsigned VisitNumber = State->NextVisitNumber++;
-
-  // If the caller has provided us with a hit-set that came from the global
-  // module index, mark every module file in common with the global module
-  // index that is *not* in that set as 'visited'.
-  if (ModuleFilesHit && !ModulesInCommonWithGlobalIndex.empty()) {
-for (unsigned I = 0, N = ModulesInCommonWithGlobalIndex.size(); I != N; 
++I)
-{
-  ModuleFile *M = ModulesInCommonWithGlobalIndex[I];
-  if (!ModuleFilesHit->count(M))
-State->VisitNumber[M->Index] = VisitNumber;
-}
-  }
-
   for (unsigned I = 0, N = VisitOrder.size(); I != N; ++I) {
 ModuleFile *CurrentModule = VisitOrder[I];
 // Should we skip this module file?


Index: test/Modules/builtins-incremental.m
===
--- /dev/null
+++