[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-10-15 Thread Sylvestre Ledru via Phabricator via cfe-commits
sylvestre.ledru added a comment.

@ostannard  Maybe you could add that to the release notes? Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-10-11 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9f6a873268e1: Dead Virtual Function Elimination (authored by 
ostannard).

Changed prior to commit:
  https://reviews.llvm.org/D63932?vs=218363=224568#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGClass.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCXX/vcall-visibility-metadata.cpp
  clang/test/CodeGenCXX/virtual-function-elimination.cpp
  clang/test/Driver/virtual-function-elimination.cpp
  llvm/docs/LangRef.rst
  llvm/docs/TypeMetadata.rst
  llvm/include/llvm/Analysis/TypeMetadataUtils.h
  llvm/include/llvm/IR/FixedMetadataKinds.def
  llvm/include/llvm/IR/GlobalObject.h
  llvm/include/llvm/Transforms/IPO/GlobalDCE.h
  llvm/lib/Analysis/TypeMetadataUtils.cpp
  llvm/lib/IR/Metadata.cpp
  llvm/lib/LTO/LTO.cpp
  llvm/lib/LTO/LTOCodeGenerator.cpp
  llvm/lib/Transforms/IPO/GlobalDCE.cpp
  llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
  llvm/test/LTO/ARM/lto-linking-metadata.ll
  llvm/test/ThinLTO/X86/lazyload_metadata.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-visibility-post-lto.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-visibility-pre-lto.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions.ll
  llvm/test/Transforms/GlobalDCE/vtable-rtti.ll
  llvm/test/Transforms/Internalize/vcall-visibility.ll

Index: llvm/test/Transforms/Internalize/vcall-visibility.ll
===
--- /dev/null
+++ llvm/test/Transforms/Internalize/vcall-visibility.ll
@@ -0,0 +1,64 @@
+; RUN: opt < %s -internalize -S | FileCheck %s
+
+%struct.A = type { i32 (...)** }
+%struct.B = type { i32 (...)** }
+%struct.C = type { i32 (...)** }
+
+; Class A has default visibility, so has no !vcall_visibility metadata before
+; or after LTO.
+; CHECK-NOT: @_ZTV1A = {{.*}}!vcall_visibility
+@_ZTV1A = dso_local unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.A*)* @_ZN1A3fooEv to i8*)] }, align 8, !type !0, !type !1
+
+; Class B has hidden visibility but public LTO visibility, so has no
+; !vcall_visibility metadata before or after LTO.
+; CHECK-NOT: @_ZTV1B = {{.*}}!vcall_visibility
+@_ZTV1B = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.B*)* @_ZN1B3fooEv to i8*)] }, align 8, !type !2, !type !3
+
+; Class C has hidden visibility, so the !vcall_visibility metadata is set to 1
+; (linkage unit) before LTO, and 2 (translation unit) after LTO.
+; CHECK: @_ZTV1C ={{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTV1C = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.C*)* @_ZN1C3fooEv to i8*)] }, align 8, !type !4, !type !5, !vcall_visibility !6
+
+; Class D has translation unit visibility before LTO, and this is not changed
+; by LTO.
+; CHECK: @_ZTVN12_GLOBAL__N_11DE = {{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTVN12_GLOBAL__N_11DE = internal unnamed_addr constant { [3 x i8*] } zeroinitializer, align 8, !type !7, !type !9, !vcall_visibility !11
+
+define dso_local void @_ZN1A3fooEv(%struct.A* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1B3fooEv(%struct.B* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1C3fooEv(%struct.C* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden noalias nonnull i8* @_Z6make_dv() {
+entry:
+  %call = tail call i8* @_Znwm(i64 8) #3
+  %0 = bitcast i8* %call to i32 (...)***
+  store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTVN12_GLOBAL__N_11DE, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
+  ret i8* %call
+}
+
+declare dso_local noalias nonnull i8* @_Znwm(i64)
+
+; CHECK: [[MD_TU_VIS]] = !{i64 2}
+!0 = !{i64 16, !"_ZTS1A"}
+!1 = !{i64 16, !"_ZTSM1AFvvE.virtual"}
+!2 = !{i64 16, !"_ZTS1B"}
+!3 = !{i64 16, !"_ZTSM1BFvvE.virtual"}
+!4 = !{i64 16, !"_ZTS1C"}
+!5 = !{i64 16, !"_ZTSM1CFvvE.virtual"}
+!6 = !{i64 1}
+!7 = !{i64 16, !8}
+!8 = distinct !{}
+!9 = !{i64 16, !10}
+!10 = distinct !{}
+!11 = !{i64 2}
Index: llvm/test/Transforms/GlobalDCE/vtable-rtti.ll
===
--- /dev/null
+++ llvm/test/Transforms/GlobalDCE/vtable-rtti.ll
@@ -0,0 +1,47 @@
+; RUN: opt < %s -globaldce -S | 

[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-10-10 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc accepted this revision.
pcc added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-10-10 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

Ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-10-03 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

Ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-09-24 Thread Mehdi AMINI via Phabricator via cfe-commits
mehdi_amini added a comment.

In D63932#1679910 , @ostannard wrote:

> > This makes the IR self-contained, which is good, but it also make the 
> > interpretation of the IR modal, which isn't great. It means that suddenly 
> > the rules of interpretation of what is valid to do or not changes according 
> > to this module flag.
>
> I think the original version was better from that perspective - most compiler 
> passes only need to check for one value of the attribute, and only the linker 
> needed to care about the difference between link-unit and public visibility.


The *linker* needed to care about the difference is fine, but that's different 
from a *compiler pass* during LTO. In general it is an important design point 
in LLVM so far to encode these information in the IR (for instance the linkage 
type of globals would be changed and not the interpretation of these with a 
flag).

> Do you think we should go back to that design, or do you have a different 
> idea?

I didn't review the original version, but pcc@ did apparently, so I'm willing 
to trust them on what's the best way forward here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-09-23 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

> This makes the IR self-contained, which is good, but it also make the 
> interpretation of the IR modal, which isn't great. It means that suddenly the 
> rules of interpretation of what is valid to do or not changes according to 
> this module flag.

I think the original version was better from that perspective - most compiler 
passes only need to check for one value of the attribute, and only the linker 
needed to care about the difference between link-unit and public visibility. Do 
you think we should go back to that design, or do you have a different idea?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-09-21 Thread Mehdi AMINI via Phabricator via cfe-commits
mehdi_amini added a comment.

> we represent the "post-link" flag in the IR (e.g. as a module flag) in order 
> to keep the IR self-contained.

This makes the IR self-contained, which is good, but it also make the 
interpretation of the IR modal, which isn't great. It means that suddenly the 
rules of interpretation of what is valid to do or not changes according to this 
module flag.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-09-19 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

Ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-09-12 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

Ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-09-02 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard marked 4 inline comments as done.
ostannard added a comment.

> It isn't that common, but it seems worth doing if it can be done easily.



> That said, I note that it does appear that your implementation will end up 
> preserving the pointer in the vtable in this case because you're relying on 
> the use list to make decisions about what to GC. So it doesn't seem easy to 
> do at this point, but if for example we made this compatible with ThinLTO at 
> some point we would probably not be able to rely on the use list, and the 
> resulting changes to this feature might make this easier to do.

Ok, I think that it makes sense to leave this for a separate patch, as long as 
we currently generate correct code. I've added partial linking of the LTO unit 
to my fuzzer, and haven't found any problems.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-09-02 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard updated this revision to Diff 218363.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGClass.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCXX/vcall-visibility-metadata.cpp
  clang/test/CodeGenCXX/virtual-function-elimination.cpp
  clang/test/Driver/virtual-function-elimination.cpp
  llvm/docs/LangRef.rst
  llvm/docs/TypeMetadata.rst
  llvm/include/llvm/Analysis/TypeMetadataUtils.h
  llvm/include/llvm/IR/FixedMetadataKinds.def
  llvm/include/llvm/IR/GlobalObject.h
  llvm/include/llvm/Transforms/IPO/GlobalDCE.h
  llvm/lib/Analysis/TypeMetadataUtils.cpp
  llvm/lib/IR/Metadata.cpp
  llvm/lib/LTO/LTO.cpp
  llvm/lib/LTO/LTOCodeGenerator.cpp
  llvm/lib/Transforms/IPO/GlobalDCE.cpp
  llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
  llvm/test/LTO/ARM/lto-linking-metadata.ll
  llvm/test/ThinLTO/X86/lazyload_metadata.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-visibility-post-lto.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-visibility-pre-lto.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions.ll
  llvm/test/Transforms/GlobalDCE/vtable-rtti.ll
  llvm/test/Transforms/Internalize/vcall-visibility.ll

Index: llvm/test/Transforms/Internalize/vcall-visibility.ll
===
--- /dev/null
+++ llvm/test/Transforms/Internalize/vcall-visibility.ll
@@ -0,0 +1,64 @@
+; RUN: opt < %s -internalize -S | FileCheck %s
+
+%struct.A = type { i32 (...)** }
+%struct.B = type { i32 (...)** }
+%struct.C = type { i32 (...)** }
+
+; Class A has default visibility, so has no !vcall_visibility metadata before
+; or after LTO.
+; CHECK-NOT: @_ZTV1A = {{.*}}!vcall_visibility
+@_ZTV1A = dso_local unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.A*)* @_ZN1A3fooEv to i8*)] }, align 8, !type !0, !type !1
+
+; Class B has hidden visibility but public LTO visibility, so has no
+; !vcall_visibility metadata before or after LTO.
+; CHECK-NOT: @_ZTV1B = {{.*}}!vcall_visibility
+@_ZTV1B = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.B*)* @_ZN1B3fooEv to i8*)] }, align 8, !type !2, !type !3
+
+; Class C has hidden visibility, so the !vcall_visibility metadata is set to 1
+; (linkage unit) before LTO, and 2 (translation unit) after LTO.
+; CHECK: @_ZTV1C ={{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTV1C = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.C*)* @_ZN1C3fooEv to i8*)] }, align 8, !type !4, !type !5, !vcall_visibility !6
+
+; Class D has translation unit visibility before LTO, and this is not changed
+; by LTO.
+; CHECK: @_ZTVN12_GLOBAL__N_11DE = {{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTVN12_GLOBAL__N_11DE = internal unnamed_addr constant { [3 x i8*] } zeroinitializer, align 8, !type !7, !type !9, !vcall_visibility !11
+
+define dso_local void @_ZN1A3fooEv(%struct.A* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1B3fooEv(%struct.B* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1C3fooEv(%struct.C* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden noalias nonnull i8* @_Z6make_dv() {
+entry:
+  %call = tail call i8* @_Znwm(i64 8) #3
+  %0 = bitcast i8* %call to i32 (...)***
+  store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTVN12_GLOBAL__N_11DE, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
+  ret i8* %call
+}
+
+declare dso_local noalias nonnull i8* @_Znwm(i64)
+
+; CHECK: [[MD_TU_VIS]] = !{i64 2}
+!0 = !{i64 16, !"_ZTS1A"}
+!1 = !{i64 16, !"_ZTSM1AFvvE.virtual"}
+!2 = !{i64 16, !"_ZTS1B"}
+!3 = !{i64 16, !"_ZTSM1BFvvE.virtual"}
+!4 = !{i64 16, !"_ZTS1C"}
+!5 = !{i64 16, !"_ZTSM1CFvvE.virtual"}
+!6 = !{i64 1}
+!7 = !{i64 16, !8}
+!8 = distinct !{}
+!9 = !{i64 16, !10}
+!10 = distinct !{}
+!11 = !{i64 2}
Index: llvm/test/Transforms/GlobalDCE/vtable-rtti.ll
===
--- /dev/null
+++ llvm/test/Transforms/GlobalDCE/vtable-rtti.ll
@@ -0,0 +1,47 @@
+; RUN: opt < %s -globaldce -S | FileCheck %s
+
+; We currently only use llvm.type.checked.load for virtual function pointers,
+; not any other part of the vtable, so we can't remove the RTTI pointer even if
+; it's never going to be 

[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-08-19 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added a comment.

In D63932#1610182 , @ostannard wrote:

> > Partial linking will indeed prevent dropping the virtual functions, but it 
> > should not prevent clearing the pointer to the virtual function in the 
> > vtable. The linker should then be able to drop the virtual function body as 
> > part of `--gc-sections` during the final link.
>
> If partial linking isn't doing internalisation, I'd expect that to prevent a 
> lot of other LTO optimisations, not just VFE. Is this a common use-case which 
> we need to care about?


It isn't that common, but it seems worth doing if it can be done easily.

That said, I note that it does appear that your implementation will end up 
preserving the pointer in the vtable in this case because you're relying on the 
use list to make decisions about what to GC. So it doesn't seem easy to do at 
this point, but if for example we made this compatible with ThinLTO at some 
point we would probably not be able to rely on the use list, and the resulting 
changes to this feature might make this easier to do.

>> I think I would favour something closer to your first suggestion, but 
>> instead of telling GlobalDCE specifically about this, we represent the 
>> "post-link" flag in the IR (e.g. as a module flag) in order to keep the IR 
>> self-contained. LTO would then be taught to add this flag at the right time, 
>> and the logic inside GlobalDCE would be:
>> 
>> - If post-link flag not set, allow VFE if linkage <= TranslationUnit.
>> - If post-link flag set, allow VFE if linkage <= LinkageUnit.
>> 
>>   This would also help address a correctness issue with the CFI and WPD 
>> passes, which is that it is currently unsound to run them at compile time. 
>> If we let them run at compile time, we would in principle be able to do CFI 
>> and WPD on internal linkage types without LTO.
> 
> Ok, sounds reasonable, though I suspect WPD and CFI will need a slightly 
> different definition of type visibility - they care about the possibility of 
> unseen code adding new derived classes, so the visibility of base classes 
> isn't important, and they might be able to make use of the final specifier.

Internal linkage types use distinct metadata (i.e. metadata that can't be 
merged with other TUs), so there's no possibility of a new derived class being 
added. I suspect that we could make these passes check the distinctness of 
metadata if the post-link flag is not set, but that probably deserves more 
thought.




Comment at: llvm/lib/Transforms/IPO/GlobalDCE.cpp:409
+  F->replaceAllUsesWith(
+  
ConstantPointerNull::get(PointerType::getUnqual(F->getFunctionType(;
+}

Could be just `ConstantPointerNull::get(F->getType())`.



Comment at: llvm/test/LTO/ARM/lto-linking-metadata.ll:2
+; RUN: opt %s -o %t1.bc
+; RUN: llvm-lto %t1.bc -o %t1.save.opt -save-merged-module -O1 
--exported-symbol=foo
+; RUN: llvm-dis < %t1.save.opt.merged.bc | FileCheck %s

Could you add a test for the new LTO API as well as the legacy one?



Comment at: 
llvm/test/Transforms/GlobalDCE/virtual-functions-derived-pointer-call.ll:30
+; pointer of type "int (B::*q)(int)", so the call could only be dispatched to
+; or B::foo. It can't be dispatched to A::bar or B::bar as the function pointer
+; does not match, and it can't be dispatched to A::foo as the object type

Remove word "or".



Comment at: 
llvm/test/Transforms/GlobalDCE/virtual-functions-visibility-post-lto.ll:13
+@_ZTS1A = hidden constant [3 x i8] c"1A\00", align 1
+@_ZTI1A = hidden constant { i8*, i8* } { i8* bitcast (i8** getelementptr 
inbounds (i8*, i8** @_ZTVN10__cxxabiv117__class_type_infoE, i64 2) to i8*), i8* 
getelementptr inbounds ([3 x i8], [3 x i8]* @_ZTS1A, i32 0, i32 0) }, align 8
+

Could you simplify this test (and others) by removing the RTTI, please? We 
should also have a test showing that RTTI is preserved.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-08-01 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard updated this revision to Diff 212833.
ostannard added a comment.

- Add LTOPostLink metadata, instead of internalising vcall visibility at LTO 
time


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGClass.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCXX/vcall-visibility-metadata.cpp
  clang/test/CodeGenCXX/virtual-function-elimination.cpp
  clang/test/Driver/virtual-function-elimination.cpp
  llvm/docs/LangRef.rst
  llvm/docs/TypeMetadata.rst
  llvm/include/llvm/Analysis/TypeMetadataUtils.h
  llvm/include/llvm/IR/FixedMetadataKinds.def
  llvm/include/llvm/IR/GlobalObject.h
  llvm/include/llvm/Transforms/IPO/GlobalDCE.h
  llvm/lib/Analysis/TypeMetadataUtils.cpp
  llvm/lib/IR/Metadata.cpp
  llvm/lib/LTO/LTO.cpp
  llvm/lib/LTO/LTOCodeGenerator.cpp
  llvm/lib/Transforms/IPO/GlobalDCE.cpp
  llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
  llvm/test/LTO/ARM/lto-linking-metadata.ll
  llvm/test/ThinLTO/X86/lazyload_metadata.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-visibility-post-lto.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-visibility-pre-lto.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions.ll
  llvm/test/Transforms/Internalize/vcall-visibility.ll

Index: llvm/test/Transforms/Internalize/vcall-visibility.ll
===
--- /dev/null
+++ llvm/test/Transforms/Internalize/vcall-visibility.ll
@@ -0,0 +1,64 @@
+; RUN: opt < %s -internalize -S | FileCheck %s
+
+%struct.A = type { i32 (...)** }
+%struct.B = type { i32 (...)** }
+%struct.C = type { i32 (...)** }
+
+; Class A has default visibility, so has no !vcall_visibility metadata before
+; or after LTO.
+; CHECK-NOT: @_ZTV1A = {{.*}}!vcall_visibility
+@_ZTV1A = dso_local unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.A*)* @_ZN1A3fooEv to i8*)] }, align 8, !type !0, !type !1
+
+; Class B has hidden visibility but public LTO visibility, so has no
+; !vcall_visibility metadata before or after LTO.
+; CHECK-NOT: @_ZTV1B = {{.*}}!vcall_visibility
+@_ZTV1B = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.B*)* @_ZN1B3fooEv to i8*)] }, align 8, !type !2, !type !3
+
+; Class C has hidden visibility, so the !vcall_visibility metadata is set to 1
+; (linkage unit) before LTO, and 2 (translation unit) after LTO.
+; CHECK: @_ZTV1C ={{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTV1C = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.C*)* @_ZN1C3fooEv to i8*)] }, align 8, !type !4, !type !5, !vcall_visibility !6
+
+; Class D has translation unit visibility before LTO, and this is not changed
+; by LTO.
+; CHECK: @_ZTVN12_GLOBAL__N_11DE = {{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTVN12_GLOBAL__N_11DE = internal unnamed_addr constant { [3 x i8*] } zeroinitializer, align 8, !type !7, !type !9, !vcall_visibility !11
+
+define dso_local void @_ZN1A3fooEv(%struct.A* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1B3fooEv(%struct.B* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1C3fooEv(%struct.C* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden noalias nonnull i8* @_Z6make_dv() {
+entry:
+  %call = tail call i8* @_Znwm(i64 8) #3
+  %0 = bitcast i8* %call to i32 (...)***
+  store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTVN12_GLOBAL__N_11DE, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
+  ret i8* %call
+}
+
+declare dso_local noalias nonnull i8* @_Znwm(i64)
+
+; CHECK: [[MD_TU_VIS]] = !{i64 2}
+!0 = !{i64 16, !"_ZTS1A"}
+!1 = !{i64 16, !"_ZTSM1AFvvE.virtual"}
+!2 = !{i64 16, !"_ZTS1B"}
+!3 = !{i64 16, !"_ZTSM1BFvvE.virtual"}
+!4 = !{i64 16, !"_ZTS1C"}
+!5 = !{i64 16, !"_ZTSM1CFvvE.virtual"}
+!6 = !{i64 1}
+!7 = !{i64 16, !8}
+!8 = distinct !{}
+!9 = !{i64 16, !10}
+!10 = distinct !{}
+!11 = !{i64 2}
Index: llvm/test/Transforms/GlobalDCE/virtual-functions.ll
===
--- /dev/null
+++ llvm/test/Transforms/GlobalDCE/virtual-functions.ll
@@ -0,0 +1,55 @@
+; RUN: opt < %s -globaldce -S | FileCheck %s
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+declare dso_local noalias nonnull i8* 

[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-08-01 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

> Partial linking will indeed prevent dropping the virtual functions, but it 
> should not prevent clearing the pointer to the virtual function in the 
> vtable. The linker should then be able to drop the virtual function body as 
> part of `--gc-sections` during the final link.

If partial linking isn't doing internalisation, I'd expect that to prevent a 
lot of other LTO optimisations, not just VFE. Is this a common use-case which 
we need to care about?

> I think I would favour something closer to your first suggestion, but instead 
> of telling GlobalDCE specifically about this, we represent the "post-link" 
> flag in the IR (e.g. as a module flag) in order to keep the IR 
> self-contained. LTO would then be taught to add this flag at the right time, 
> and the logic inside GlobalDCE would be:
> 
> - If post-link flag not set, allow VFE if linkage <= TranslationUnit.
> - If post-link flag set, allow VFE if linkage <= LinkageUnit.
> 
>   This would also help address a correctness issue with the CFI and WPD 
> passes, which is that it is currently unsound to run them at compile time. If 
> we let them run at compile time, we would in principle be able to do CFI and 
> WPD on internal linkage types without LTO.

Ok, sounds reasonable, though I suspect WPD and CFI will need a slightly 
different definition of type visibility - they care about the possibility of 
unseen code adding new derived classes, so the visibility of base classes isn't 
important, and they might be able to make use of the final specifier.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-31 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added a comment.

In D63932#1608235 , @ostannard wrote:

> > - Take the example from my earlier message, give the "main executable" and 
> > "DSO" hidden visibility, build the "main executable" with LTO and the "DSO" 
> > without LTO, and statically link them both into the same executable. We run 
> > into the same problem where the Plugin1 vtable is potentially not 
> > referenced and thus misoptimised. Yes, this is a violation of the LTO 
> > visibility rules, but the example shows that we only detect it sometimes. I 
> > think that if we did want to detect cases where the LTO visibility rules 
> > are clearly being violated, the outcome should be to issue a diagnostic and 
> > not to silently proceed with optimizations disabled, since the violation 
> > might be masking other undetected issues. That really seems orthogonal to 
> > this patch, though.
>
> As you say, this is a violation of the LTO visibility rules, so 
> `[[clang::lto_visibility_public]]` must be used. I've gated this optimisation 
> behind `-fwhole-program-vtables`, which if I've understood 
> https://clang.llvm.org/docs/LTOVisibility.html correctly, is effectively the 
> user asserting to the compiler that they've followed the LTO visility rules 
> correctly.


Yes, that's correct.

>> - Imagine linking part of a program with ld -r with LTO -- all symbols 
>> including vtable symbols will appear to be externally visible, which is not 
>> necessarily a violation of the LTO visibility rules because they may not be 
>> used externally in the final link. So we end up pessimising unnecessarily.
> 
> I'm not really familiar with partial linking, is there any good documentation 
> on how it is supposed to interact with LTO?

I'm not aware of any. The way that I think about it is that a partial link 
operation with LTO inputs can still produce an LTO unit, but that LTO unit may 
not be combined with other LTO units, either by passing bitcode files directly 
to the final link or by adding another `ld -r` output with an LTO unit to the 
final link.

> I've tried to make the changed to vcall_visibility as similar to the normal 
> LTO internalisation process as possible, it happens at the same time and 
> under the same conditions. If partial linking prevents normal 
> internalisation, which will in turn prevent most of the existing 
> optimisations done by GlobalDCE, then surely it should also prevent 
> internalisation of vcall_visibility?

Partial linking will indeed prevent dropping the virtual functions, but it 
should not prevent clearing the pointer to the virtual function in the vtable. 
The linker should then be able to drop the virtual function body as part of 
`--gc-sections` during the final link.

>> I gave this some more thought and it seems that the frontend has all of the 
>> information required to make a determination about whether to allow VFE, 
>> without needing to teach LTO to relax visibility. We can make the frontend 
>> do this:
>> 
>> - If -flto was passed (see CodeGenOptions::PrepareForLTO), attach "allow 
>> VFE" metadata if class has hidden LTO visibility.
>> - Otherwise, attach "allow VFE" metadata if class is not isExternallyVisible.
>> 
>>   Now the behaviour of GlobalDCE can be made conditional on the "allow VFE" 
>> metadata alone. This also has the advantage of simplifying the IR by making 
>> "allow VFE" a boolean rather than a tri-state as it is now.
> 
> The problem with that is that GlobalDCE is still run in the pre-linking 
> pipeline when emiting an LTO object, because it can remove code/data which 
> already has internal linkage.

Right, of course. Sorry, I forgot about that. It seems that if we want to allow 
VFE on internal linkage types without LTO, we would still need the tri-state.

> To make this work, we'd have to do one of:
> 
> - Tell GlobalDCE whether it is running in LTO post-linking or not, so it can 
> avoid doing VFE before internalisation.   This breaks the idea that the IR 
> contains all the information needed to determine if an optimisation is legal 
> or not.
> - Remove the pre-linking runs of GlobalDCE when doing LTO. This will result 
> in more known-dead code reaching the LTO link stage, and results in a pass 
> which is only valid at certain points in the pipeline.
> - Split VFE out of GlobalDCE. This would result in worse optimisation, 
> because chains of dependencies involving both virtual calls and regular 
> references won't be followed fully.

I think I would favour something closer to your first suggestion, but instead 
of telling GlobalDCE specifically about this, we represent the "post-link" flag 
in the IR (e.g. as a module flag) in order to keep the IR self-contained. LTO 
would then be taught to add this flag at the right time, and the logic inside 
GlobalDCE would be:

- If post-link flag not set, allow VFE if linkage <= TranslationUnit.
- If post-link flag set, allow VFE if linkage <= LinkageUnit.

This would also help 

[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-31 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

> - Take the example from my earlier message, give the "main executable" and 
> "DSO" hidden visibility, build the "main executable" with LTO and the "DSO" 
> without LTO, and statically link them both into the same executable. We run 
> into the same problem where the Plugin1 vtable is potentially not referenced 
> and thus misoptimised. Yes, this is a violation of the LTO visibility rules, 
> but the example shows that we only detect it sometimes. I think that if we 
> did want to detect cases where the LTO visibility rules are clearly being 
> violated, the outcome should be to issue a diagnostic and not to silently 
> proceed with optimizations disabled, since the violation might be masking 
> other undetected issues. That really seems orthogonal to this patch, though.

As you say, this is a violation of the LTO visibility rules, so 
`[[clang::lto_visibility_public]]` must be used. I've gated this optimisation 
behind `-fwhole-program-vtables`, which if I've understood 
https://clang.llvm.org/docs/LTOVisibility.html correctly, is effectively the 
user asserting to the compiler that they've followed the LTO visility rules 
correctly.

> - Imagine linking part of a program with ld -r with LTO -- all symbols 
> including vtable symbols will appear to be externally visible, which is not 
> necessarily a violation of the LTO visibility rules because they may not be 
> used externally in the final link. So we end up pessimising unnecessarily.

I'm not really familiar with partial linking, is there any good documentation 
on how it is supposed to interact with LTO?

I've tried to make the changed to vcall_visibility as similar to the normal LTO 
internalisation process as possible, it happens at the same time and under the 
same conditions. If partial linking prevents normal internalisation, which will 
in turn prevent most of the existing optimisations done by GlobalDCE, then 
surely it should also prevent internalisation of vcall_visibility?

> I gave this some more thought and it seems that the frontend has all of the 
> information required to make a determination about whether to allow VFE, 
> without needing to teach LTO to relax visibility. We can make the frontend do 
> this:
> 
> - If -flto was passed (see CodeGenOptions::PrepareForLTO), attach "allow VFE" 
> metadata if class has hidden LTO visibility.
> - Otherwise, attach "allow VFE" metadata if class is not isExternallyVisible.
> 
>   Now the behaviour of GlobalDCE can be made conditional on the "allow VFE" 
> metadata alone. This also has the advantage of simplifying the IR by making 
> "allow VFE" a boolean rather than a tri-state as it is now.

The problem with that is that GlobalDCE is still run in the pre-linking 
pipeline when emiting an LTO object, because it can remove code/data which 
already has internal linkage. To make this work, we'd have to do one of:

- Tell GlobalDCE whether it is running in LTO post-linking or not, so it can 
avoid doing VFE before internalisation.   This breaks the idea that the IR 
contains all the information needed to determine if an optimisation is legal or 
not.
- Remove the pre-linking runs of GlobalDCE when doing LTO. This will result in 
more known-dead code reaching the LTO link stage, and results in a pass which 
is only valid at certain points in the pipeline.
- Split VFE out of GlobalDCE. This would result in worse optimisation, because 
chains of dependencies involving both virtual calls and regular references 
won't be followed fully.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-30 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added a comment.

In D63932#1606248 , @ostannard wrote:

> In that example, with everything having default ELF visibility, all of the 
> vtables will get vcall_visibility public, which can't be optimised by VFE, 
> and won't ever be relaxed to one of the tighter visibilities.
>
> The cases which can be optimised are (assuming "visibility" means visibility 
> of the most-visible dynamic base class):
>
> - Classes in an anonymous namespace, which aren't visible outside their 
> translation unit. Probably doesn't happen very often, but can be optimised 
> without LTO.
> - Classes visible outside the translation unit, but not outside the LTO unit. 
> This generally means hidden ELF visibility, unless the lto_visibility_public 
> attribute is used. These can't be optimised without LTO, but can be with it.
>
>   I implemented the second case by adding the LinkageUnit visibility, which 
> can't be optimised by VFE, but can be reduced to TranslationUnit when LTO 
> internalisation happens. This could also have been done by not changing the 
> visibility at LTO time, and instead leting GlobalDCE know if it is running 
> post-LTO. Both of these should give the same results, but the way I used 
> represents the visibility fully in the IR, without having the extra state of 
> "are we doing LTO?".


Okay, I somehow missed that the relaxation to TranslationUnit was guarded on it 
being LinkageUnit to start with. So at least from that perspective I don't see 
any soundness issues. It still doesn't seem like a good idea to do the 
relaxation in LTO though, since the visibility-outside-of-LTO of the vtable 
symbols is not a reliable indicator whether the type is used externally (I 
think that's what confused me to begin with). Here are a couple of cases where 
things might go wrong:

- Take the example from my earlier message, give the "main executable" and 
"DSO" hidden visibility, build the "main executable" with LTO and the "DSO" 
without LTO, and statically link them both into the same executable. We run 
into the same problem where the `Plugin1` vtable is potentially not referenced 
and thus misoptimised. Yes, this is a violation of the LTO visibility rules, 
but the example shows that we only detect it sometimes. I think that if we did 
want to detect cases where the LTO visibility rules are clearly being violated, 
the outcome should be to issue a diagnostic and not to silently proceed with 
optimizations disabled, since the violation might be masking other undetected 
issues. That really seems orthogonal to this patch, though.
- Imagine linking part of a program with `ld -r` with LTO -- all symbols 
including vtable symbols will appear to be externally visible, which is not 
necessarily a violation of the LTO visibility rules because they may not be 
used externally in the final link. So we end up pessimising unnecessarily.

I gave this some more thought and it seems that the frontend has all of the 
information required to make a determination about whether to allow VFE, 
without needing to teach LTO to relax visibility. We can make the frontend do 
this:

- If `-flto` was passed (see `CodeGenOptions::PrepareForLTO`), attach "allow 
VFE" metadata if class has hidden LTO visibility.
- Otherwise, attach "allow VFE" metadata if class is not `isExternallyVisible`.

Now the behaviour of GlobalDCE can be made conditional on the "allow VFE" 
metadata alone. This also has the advantage of simplifying the IR by making 
"allow VFE" a boolean rather than a tri-state as it is now.




Comment at: llvm/lib/Transforms/IPO/GlobalDCE.cpp:183
+// unit, we know that we can see all virtual functions which might use it,
+// so VFE is safe.
+if (auto GO = dyn_cast()) {

ostannard wrote:
> pcc wrote:
> > Not necessarily, at least in this implementation, because "vtable symbol 
> > can be internalized" doesn't imply "all vcalls are visible". See main 
> > response.
> This is checking the vcall_visibility, which will only be 
> VCallVisibilityTranslationUnit if all base classes which contain virtual 
> functions are private to this translation unit, which does imply "all vcalls 
> are visible".
Ack.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-30 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard updated this revision to Diff 212370.
ostannard marked 2 inline comments as done.
ostannard added a comment.

- Rebase
- Don't emit llvm.assume when not necessary (we already weren't checking for 
it's presence in GlobalDCE)
- s/"public"/"default"/ in IR docs


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/Targets/ARM.cpp
  clang/lib/CodeGen/CGClass.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCXX/vcall-visibility-metadata.cpp
  clang/test/CodeGenCXX/virtual-function-elimination.cpp
  clang/test/Driver/virtual-function-elimination.cpp
  llvm/docs/LangRef.rst
  llvm/docs/TypeMetadata.rst
  llvm/include/llvm/Analysis/TypeMetadataUtils.h
  llvm/include/llvm/IR/FixedMetadataKinds.def
  llvm/include/llvm/IR/GlobalObject.h
  llvm/include/llvm/Transforms/IPO/GlobalDCE.h
  llvm/lib/Analysis/TypeMetadataUtils.cpp
  llvm/lib/IR/Metadata.cpp
  llvm/lib/LTO/LTO.cpp
  llvm/lib/Transforms/IPO/GlobalDCE.cpp
  llvm/lib/Transforms/IPO/Internalize.cpp
  llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
  llvm/test/LTO/ARM/vcall-visibility.ll
  llvm/test/ThinLTO/X86/lazyload_metadata.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-base-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions-derived-pointer-call.ll
  llvm/test/Transforms/GlobalDCE/virtual-functions.ll
  llvm/test/Transforms/Internalize/vcall-visibility.ll

Index: llvm/test/Transforms/Internalize/vcall-visibility.ll
===
--- /dev/null
+++ llvm/test/Transforms/Internalize/vcall-visibility.ll
@@ -0,0 +1,64 @@
+; RUN: opt < %s -internalize -S | FileCheck %s
+
+%struct.A = type { i32 (...)** }
+%struct.B = type { i32 (...)** }
+%struct.C = type { i32 (...)** }
+
+; Class A has default visibility, so has no !vcall_visibility metadata before
+; or after LTO.
+; CHECK-NOT: @_ZTV1A = {{.*}}!vcall_visibility
+@_ZTV1A = dso_local unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.A*)* @_ZN1A3fooEv to i8*)] }, align 8, !type !0, !type !1
+
+; Class B has hidden visibility but public LTO visibility, so has no
+; !vcall_visibility metadata before or after LTO.
+; CHECK-NOT: @_ZTV1B = {{.*}}!vcall_visibility
+@_ZTV1B = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.B*)* @_ZN1B3fooEv to i8*)] }, align 8, !type !2, !type !3
+
+; Class C has hidden visibility, so the !vcall_visibility metadata is set to 1
+; (linkage unit) before LTO, and 2 (translation unit) after LTO.
+; CHECK: @_ZTV1C ={{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTV1C = hidden unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.C*)* @_ZN1C3fooEv to i8*)] }, align 8, !type !4, !type !5, !vcall_visibility !6
+
+; Class D has translation unit visibility before LTO, and this is not changed
+; by LTO.
+; CHECK: @_ZTVN12_GLOBAL__N_11DE = {{.*}}!vcall_visibility [[MD_TU_VIS:![0-9]+]]
+@_ZTVN12_GLOBAL__N_11DE = internal unnamed_addr constant { [3 x i8*] } zeroinitializer, align 8, !type !7, !type !9, !vcall_visibility !11
+
+define dso_local void @_ZN1A3fooEv(%struct.A* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1B3fooEv(%struct.B* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden void @_ZN1C3fooEv(%struct.C* nocapture %this) {
+entry:
+  ret void
+}
+
+define hidden noalias nonnull i8* @_Z6make_dv() {
+entry:
+  %call = tail call i8* @_Znwm(i64 8) #3
+  %0 = bitcast i8* %call to i32 (...)***
+  store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTVN12_GLOBAL__N_11DE, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
+  ret i8* %call
+}
+
+declare dso_local noalias nonnull i8* @_Znwm(i64)
+
+; CHECK: [[MD_TU_VIS]] = !{i64 2}
+!0 = !{i64 16, !"_ZTS1A"}
+!1 = !{i64 16, !"_ZTSM1AFvvE.virtual"}
+!2 = !{i64 16, !"_ZTS1B"}
+!3 = !{i64 16, !"_ZTSM1BFvvE.virtual"}
+!4 = !{i64 16, !"_ZTS1C"}
+!5 = !{i64 16, !"_ZTSM1CFvvE.virtual"}
+!6 = !{i64 1}
+!7 = !{i64 16, !8}
+!8 = distinct !{}
+!9 = !{i64 16, !10}
+!10 = distinct !{}
+!11 = !{i64 2}
Index: llvm/test/Transforms/GlobalDCE/virtual-functions.ll
===
--- /dev/null
+++ llvm/test/Transforms/GlobalDCE/virtual-functions.ll
@@ -0,0 +1,55 @@
+; RUN: opt < %s -globaldce -S | FileCheck %s
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+declare dso_local noalias nonnull i8* 

[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-30 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

In that example, with everything having default ELF visibility, all of the 
vtables will get vcall_visibility public, which can't be optimised by VFE, and 
won't ever be relaxed to one of the tighter visibilities.

The cases which can be optimised are (assuming "visibility" means visibility of 
the most-visible dynamic base class):

- Classes in an anonymous namespace, which aren't visible outside their 
translation unit. Probably doesn't happen very often, but can be optimised 
without LTO.
- Classes visible outside the translation unit, but not outside the LTO unit. 
This generally means hidden ELF visibility, unless the lto_visibility_public 
attribute is used. These can't be optimised without LTO, but can be with it.

I implemented the second case by adding the LinkageUnit visibility, which can't 
be optimised by VFE, but can be reduced to TranslationUnit when LTO 
internalisation happens. This could also have been done by not changing the 
visibility at LTO time, and instead leting GlobalDCE know if it is running 
post-LTO. Both of these should give the same results, but the way I used 
represents the visibility fully in the IR, without having the extra state of 
"are we doing LTO?".

I also don't think it matters here whether the DSO is loaded by dlopen or by 
being linked against it. Is there something I've missed here, or was dlopen not 
relevant to your example.

> Have you checked the assembly to see whether an unused CFI check is being 
> emitted?

Not yet, I wanted to make sure that this optimisation was valid and correctly 
implemented before tracking down the performance regressions.




Comment at: llvm/lib/Transforms/IPO/GlobalDCE.cpp:183
+// unit, we know that we can see all virtual functions which might use it,
+// so VFE is safe.
+if (auto GO = dyn_cast()) {

pcc wrote:
> Not necessarily, at least in this implementation, because "vtable symbol can 
> be internalized" doesn't imply "all vcalls are visible". See main response.
This is checking the vcall_visibility, which will only be 
VCallVisibilityTranslationUnit if all base classes which contain virtual 
functions are private to this translation unit, which does imply "all vcalls 
are visible".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-29 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added a comment.

It doesn't seem sound to create the distinction between translation unit and 
linkage unit visibility and then allow passes to relax visibility. For example, 
imagine that you have an executable with (compiled with `-fvisibility=default`):

  struct PluginInterface {
virtual void foo() = 0;
  }
  
  struct Plugin1 : PluginInterface {
virtual void foo();
  };
  
  void Plugin1::foo() {
   // ...
  }

And a DSO loaded with dlopen:

  void call_foo(PluginInterface *plugin) {
plugin->foo();
  }

Note that the lack of attribute means that the vtables have public LTO 
visibility. Depending on the contents of the DSO, the `PluginInterface` and 
`Plugin1` vtable symbols may not be referenced at all by the DSO. If the DSO is 
loaded via `dlopen` it will definitely not be referenced at link time. If we 
allow LTO to relax `PluginInterface` and `Plugin1` vtables to what you're 
calling `VCallVisibilityTranslationUnit` in the main executable and there are 
no calls to `foo()` in the main executable, we will incorrectly drop the 
definition of `Plugin1::foo()`.

The approach that I can see working involves basically the same structure as 
CFI and WPD: the LTO visibility is decided by the frontend and not relaxed by 
passes or LTO. There is only one "flavour" of `!vcall_visibility` metadata: the 
one that means the entire hierarchy below the vtable has hidden LTO visibility.

> However, the
>  changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
>  performance regression in the C++ parts of SPEC2006.

Have you checked the assembly to see whether an unused CFI check is being 
emitted?




Comment at: clang/lib/CodeGen/CGClass.cpp:2816
+  } else {
+Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::assume), CheckResult);
+  }

There should be no need to emit this `llvm.assume` intrinsic. We can rely on 
the unspecified behaviour in the case where the second element returned by 
`llvm.type.checked.load` is false.



Comment at: llvm/docs/TypeMetadata.rst:240
+
+  __attribute__((visibility("public")))
+  struct A {

`s/"public"/"default"/`



Comment at: llvm/lib/Transforms/IPO/GlobalDCE.cpp:183
+// unit, we know that we can see all virtual functions which might use it,
+// so VFE is safe.
+if (auto GO = dyn_cast()) {

Not necessarily, at least in this implementation, because "vtable symbol can be 
internalized" doesn't imply "all vcalls are visible". See main response.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-22 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-10 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added a comment.

Hi Oliver, thanks for working on this. This is something that I've always 
wanted to do with the type metadata but never had time to work on. I'll try to 
take a more in depth look later this week.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-07-10 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-06-28 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

Without switching to `llvm.type.checked.load`, the optimisation would trigger 
in some cases where it shouldn't trigger, thus miscompiling the code. This is 
because it needs to know about _every_ virtual call site, if we lose the 
information about some of them then it will incorrectly think a virtual 
function is unused.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-06-28 Thread Roman Lebedev via Phabricator via cfe-commits
lebedev.ri added a comment.

  To make this transformation safe, I have changed clang's code-generation to
  always load virtual function pointers using the llvm.type.checked.load
  intrinsic, instead of regular load instructions. I originally tried writing
  this using clang's existing code-generation, which uses the llvm.type.test
  and llvm.assume intrinsics after doing a normal load. However, it is possible
  for optimisations to obscure the relationship between the GEP, load and
  llvm.type.test, causing GlobalDCE to fail to find virtual function call
  sites.

Can you please clarify, that is saying that without switching to
`llvm.type.checked.load` the optimization would be less efficient
since it would trigger less, not that the optimization would trigger
in some cases where it shouldn't trigger, thus miscompiling the code,
correct ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

2019-06-28 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard created this revision.
ostannard added reviewers: mehdi_amini, pcc, majnemer, tejohnson.
Herald added subscribers: dang, dexonsmith, steven_wu, hiraditya, 
kristof.beyls, Prazek, javed.absar.
Herald added projects: clang, LLVM.

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D63932

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGClass.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCXX/vcall-visibility-metadata.cpp
  clang/test/CodeGenCXX/virtual-function-elimination.cpp
  clang/test/Driver/virtual-function-elimination.cpp
  llvm/docs/LangRef.rst
  llvm/docs/TypeMetadata.rst
  llvm/include/llvm/Analysis/TypeMetadataUtils.h
  llvm/include/llvm/IR/GlobalObject.h
  llvm/include/llvm/IR/LLVMContext.h
  llvm/include/llvm/Transforms/IPO/GlobalDCE.h
  llvm/lib/Analysis/TypeMetadataUtils.cpp
  llvm/lib/IR/LLVMContext.cpp
  llvm/lib/IR/Metadata.cpp
  llvm/lib/LTO/LTO.cpp
  llvm/lib/Transforms/IPO/GlobalDCE.cpp