[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-02 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Sorry, let me be more precise.  The semantics of LLVM IR are that the element 
type of a pointer matters when doing specific operations on pointers: loads, 
stores, GEPs, calls, and so on.  Many of these operations have been at least 
partially updated — in some combination of the APIs for constructing them and 
the IR assembly listings — to require the "element" type to be given 
separately, but that's not really instrumental here.  What's instrumental is 
that, for almost all of these operations, the only thing that matters about the 
element type is its basic layout.  For example, it is perfectly legal in IR to 
load a `float` by loading any type that happens to be the same size as a 
`float` and then bitcasting that value to `float` — well, maybe it's not okay 
to load it as a `struct` with internal padding, because I think the padding 
bits might take on unspecified values, but anything else.  It is also legal in 
IR to do pointer arithmetic by using any combination of GEPs that happens to 
yield the right offset.  The only pointer operations where the specific details 
of the element type actually have semantic weight beyond their layout are calls.

This general lack of meaning is part of the reason we'd like to remove element 
types entirely.

I'm sorry, I'm not interested in taking the patch.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-02 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

Fair enough.

FYI, I've spent most of the day poking at the IR linker and I've found all 
sorts of ways that I can get it to make a complete mess of structure types and 
pointers to them, including some simple cases where it will mess it up in 
different ways depending on the order in which you link files, but I think I've 
convinced myself that there is no way to get the linker to cause incorrect code 
to be generated -- just lots of extra types, pointer casts, and function casts. 
This seems to be entirely consistent with what you are saying and sounds like a 
pretty solid argument for getting rid of typed pointers.

I guess maybe I'll take a step back and think about whether or not I could 
solve my current problems by pretending that all pointers are i8* and reasoning 
from there based on uses.

Thanks for taking the time to share your thoughts on this with me.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-02 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

Hi - sorry about the stalled opaque pointer types effort.

For my money - ideally - if someone comes across a bug caused by incorrect 
pointee types, ideally that would be fixed by adjusting whatever piece of code 
was depending on that pointee type being correct to not depend on that 
information. Though, yes, there are likely cases where a small bug in the type 
information exposes vast swathes of LLVM code - where the argument might 
reasonably made that the cleanup would take too long to leave things broken. 
Worth seeing what that looks like, though, before making the call - hopefully.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-02 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

I hope I'm not coming across as too argumentative here. I don't really have 
strong feelings about this function pointer type patch and ultimately I see 
that you are right, but the objections you are raising here would equally apply 
to what I'm working on with the IR linker and if I find a way to fix that, I'll 
care a bit more about that case. (If you'd like a preview, here's the bug I'm 
trying to fix: https://bugs.llvm.org/show_bug.cgi?id=38408)

You say "there is no semantic meaning for pointer types in LLVM IR" and that's 
fine, but there is a function PointerType::getElementType() and if I modify 
that function to always return the i8 type it will break a lot of things. So 
while there may be some sense in which the the pointer type cannot be the 
"correct" type, there is most definitely a sense in which it can be "incorrect" 
even if that sense isn't the strict semantic sense. I haven't looked at all the 
uses of  PointerType::getElementType() [or the related 
Type::getPointerElementType()]. I know a lot of them are just tests and 
assertions. Others are trivially walking through something that they know to be 
true by other means. A few seem (at least on first glance) to actually be doing 
something with it. For instance, llvm::getPointerStride() contains this line of 
code: "int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType());"

I'm not arguing against opaque pointer types. I just feel like we're probably 
at least a couple of years away from having that. I'm also not arguing for 
robust and exhaustive correction of all cases where pointer types are not 
currently "working" in the sense that I am describing in my prior comments. I'm 
just saying that if someone runs into a specific case that is causing them 
problems and they submit a fix for that case, maybe we should let it through 
unless we have more of a reason than strict semantics rendering it unimportant.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-01 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

My point is that IR pointer type information can't be "incorrect", because 
there is no semantic meaning for pointer types in LLVM IR.  That has never been 
a part of the semantics of LLVM IR.  Even if it were possible to change that — 
and I don't think it is, not without major changes, although I can very much 
understand why someone looking at compiler output might think "but we're so 
close right now" — we're not going to make that effort just for the benefit of 
one specific analysis, especially given that the consensus position is still 
that we should go further in the opposite direction and make pointer types stop 
carrying this meaningless information.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-01 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

The LLVM linker also seems to have a bunch of problems with resolving pointer 
types in structures.  I'm looking into a couple of those now.

I am aware of the opaque pointer effort, though as you say it seems to be 
stalled. I agree that there aren't a lot of things you can do based on pointer 
type information, but there are some things that you might like to do which can 
be proven to be unsafe based on pointer type information that might be 
incorrectly flagged as unsafe if the pointer type information is incorrect. An 
example would be the sorts of data layout optimizations described here: 
https://llvm.org/devmtg/2014-10/Slides/Prashanth-DLO.pdf  Note, in particular, 
on slide 11 ("Legality") the reference to "cast to/from a given struct type". I 
would imagine that includes pointer casts. It should be possible to do the same 
sort of analysis with opaque pointers by following their uses very carefully, 
and maybe not having these infernal pointer type casts all over the place would 
make that less prone to false negatives. In the meantime, the pointer casts are 
there and have to be dealt with.

Getting back to the current review, Erich explained to me that this patch 
involves a certain amount of risk in that it changes the way clang processes 
things, but it has the merit of getting the correct answer.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-01 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

LLVM is sometimes confused by bitcasts of function pointers, but that's just a 
weird exception to the basic rule that pointer element types aren't supposed to 
actually matter in LLVM IR.  There's a (stalled, unfortunately) effort to 
remove pointer element types from IR entirely for exactly that reason.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-01 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

I've talked to Olga about this, and I think we have a way to handle the problem 
she was working on without this change.

However,  I think this change is worth considering anyway. The test case shows 
an example where clang is clearly not producing the output it intends to 
produce. In most cases that probably doesn't matter, and I can't come up with 
any case where it will result in incorrect code generation. One place that I 
think it has the potential to introduce trouble is with LTO.

Modifying the example from the test case slightly, suppose you have that code 
broken up into two source files like this.

f1.c:

  struct has_fp;
  typedef void (*fp)(const struct has_fp *f);
  struct has_fp {
fp i;
  };
  void func(const struct has_fp *f) {}

f2.c:

  struct has_fp;
  typedef void (*fp)(const struct has_fp *f);
  struct has_fp {
fp i;
  };
  void func(const struct has_fp *f);
  void func2(const struct has_fp *f, int n) {
if (n == 0)
  func(f);
  }

Now if I compile both of these files with "-c -flto -O2" and then use 
"llvm-link -S -o - f1.o f2.o" I'll see the following:

  %struct.has_fp = type { {}* }
  %struct.has_fp.0 = type { void (%struct.has_fp.0*)* }
  
  define void @func(%struct.has_fp* %f) {
  entry:
ret void
  }
  
  define void @func2(%struct.has_fp.0* %f, i32 %i) {
  entry:
%cmp = icmp eq i32 %i, 0
br i1 %cmp, label %if.end, label %if.then
  
  if.then:
tail call void bitcast (void (%struct.has_fp*)* @func to void 
(%struct.has_fp.0*)*)(%struct.has_fp.0* %f)
br label %if.end
  
  if.end:
ret void
  }

Granted, this will ultimately produce correct code, and I don't have an example 
handy that shows how the extra type and the function bitcast might inhibit 
optimizations, but I think we're at least a step closer to being able to 
imagine it.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-25 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In https://reviews.llvm.org/D49403#1175350, @olga.chupina wrote:

> I should probably add one more example to explain my point of view.
>  Suppose we have an indirect call in the program and we need to know all 
> possible goals for this indirect call. Then we would like to know that one of 
> the structure fields is a function pointer and it can be a candidate for 
> indirect call resolution.


I'm not making the connection you're trying to suggest, sorry.  So you have 
whole-program information, and you're looking at a particular indirect call, 
and you want to know where it can go more precisely than just "any function 
whose address has escaped".  What is the struct in this situation?

LLVM IR types are not accurate to the C type system, and the C type system does 
not allow you to know conclusively whether a function pointer is stored 
somewhere.  Even ignoring common extensions like allowing a function pointer to 
be represented in a `void*` (which is absolutely pervasive and relied upon by 
POSIX APIs), you can just store a function pointer value into a union member or 
some other kind of untyped/loosely-typed memory.  So any analysis that depends 
for correctness on identifying all IR types that could store a function pointer 
is just not going to work, at least for IR coming from C.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-25 Thread Olga Chupina via Phabricator via cfe-commits
olga.chupina added a comment.

I should probably add one more example to explain my point of view.
Suppose we have an indirect call in the program and we need to know all 
possible goals for this indirect call. Then we would like to know that one of 
the structure fields is a function pointer and it can be a candidate for 
indirect call resolution.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-23 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Are you sure you shouldn't do it based on some sort of actual annotation based 
on the frontend's knowledge of types instead of adding semantics for LLVM IR 
types that were never intended?


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-23 Thread Olga Chupina via Phabricator via cfe-commits
olga.chupina added a comment.

Yes, correct.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-23 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Sorry, I'm not following you.  Are you doing some sort of type-based security 
analysis based on LLVM IR types?


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-23 Thread Olga Chupina via Phabricator via cfe-commits
olga.chupina added a comment.

Hello,

Since the structure field which is a function pointer appears in the assignment 
statement, it messes up the type analysis.
If it would really be a pointer to a structure then the corresponding 
assignment could be treated as bad type casting and potentially lead to 
unauthorized structure field access.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-18 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added a subscriber: olga.chupina.
erichkeane added a comment.

In https://reviews.llvm.org/D49403#1165845, @rjmccall wrote:

> Can you explain why this is important for the optimizer?


@olga.a.chupina  (or is it @olga.chupina ) is the one who reported this to me, 
so hopefully she can explain it in a disclosable fashion.


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-17 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Can you explain why this is important for the optimizer?


https://reviews.llvm.org/D49403



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-07-16 Thread Erich Keane via Phabricator via cfe-commits
erichkeane created this revision.
erichkeane added reviewers: majnemer, rjmccall.

I discovered that function pointers inside a RecordType that had
its type-determination done in a function whose signature matched
said function pointer resulted in the function pointer type being
emitted empty.  This resulted in information being lost that is
interesting to certain optimizations.

See: https://godbolt.org/g/EegViY

This patch accomplishes this in 2 different situations:
1- When the function itself is being emitted, first convert all

  the return/parameter types to ensure that they are available when
  completing the function type.  This should not result in any additional
  work besides completing parameter types that previously were not completed.

2- When the function type is being evaluated, defer conversion of the record

  type until it is no longer dependent. 


https://reviews.llvm.org/D49403

Files:
  lib/CodeGen/CGCall.cpp
  lib/CodeGen/CodeGenTypes.cpp
  test/CodeGen/func_ptr_recursive_layout.c
  test/CodeGenCXX/pr18962.cpp


Index: lib/CodeGen/CGCall.cpp
===
--- lib/CodeGen/CGCall.cpp
+++ lib/CodeGen/CGCall.cpp
@@ -756,6 +756,13 @@
 
   unsigned CC = ClangCallConvToLLVMCallConv(info.getCC());
 
+  // Prime the type-cache with the types for this function early.  This enables
+  // parameter types that are dependent on this function's type to be properly
+  // emitted.
+  ConvertType(resultType);
+  for (const CanQualType &QT : argTypes)
+ConvertType(QT);
+
   // Construct the function info.  We co-allocate the ArgInfos.
   FI = CGFunctionInfo::create(CC, instanceMethod, chainCall, info,
   paramInfos, resultType, argTypes, required);
Index: lib/CodeGen/CodeGenTypes.cpp
===
--- lib/CodeGen/CodeGenTypes.cpp
+++ lib/CodeGen/CodeGenTypes.cpp
@@ -169,6 +169,14 @@
   if (const auto *AT = CGT.getContext().getAsArrayType(T))
 return isSafeToConvert(AT->getElementType(), CGT, AlreadyChecked);
 
+  // If this type has a function pointer that is in the process of being laid
+  // out, delay emitting this RecordDecl until that function type is finished.
+  if (T->isFunctionPointerType())
+return isSafeToConvert(T->getPointeeType(), CGT, AlreadyChecked);
+
+  if (T->isFunctionType())
+return !CGT.isRecordBeingLaidOut(T.getCanonicalType().getTypePtr());
+
   // Otherwise, there is no concern about transforming this.  We only care 
about
   // things that are contained by-value in a structure that can have another 
   // structure as a member.
Index: test/CodeGen/func_ptr_recursive_layout.c
===
--- test/CodeGen/func_ptr_recursive_layout.c
+++ test/CodeGen/func_ptr_recursive_layout.c
@@ -0,0 +1,23 @@
+// RUN: %clang_cc1 -triple x86_64-linux-pc -emit-llvm -o - %s | FileCheck %s
+
+struct has_fp;
+typedef void (*fp)(const struct has_fp *f);
+struct has_fp {
+  fp i;
+};
+void func(const struct has_fp *f);
+
+void usage() {
+  (void)func;
+}
+
+struct has_fp2;
+typedef void (*fp2)(const struct has_fp2 *f);
+struct has_fp2 {
+  fp2 i;
+};
+void func2(const struct has_fp2 *f) {}
+
+// CHECK-DAG: %struct.has_fp = type { void (%struct.has_fp*)* }
+// CHECK-DAG: %struct.has_fp2 = type { void (%struct.has_fp2*)* }
+
Index: test/CodeGenCXX/pr18962.cpp
===
--- test/CodeGenCXX/pr18962.cpp
+++ test/CodeGenCXX/pr18962.cpp
@@ -25,7 +25,7 @@
 }
 
 // We end up using an opaque type for 'append' to avoid circular references.
-// CHECK: %class.A = type { {}* }
+// CHECK: %class.A = type { void (%class.A*)* }
 // CHECK: %class.C = type <{ %class.D*, %class.B, [3 x i8] }>
 // CHECK: %class.D = type { %class.C.base, [3 x i8] }
 // CHECK: %class.C.base = type <{ %class.D*, %class.B }>


Index: lib/CodeGen/CGCall.cpp
===
--- lib/CodeGen/CGCall.cpp
+++ lib/CodeGen/CGCall.cpp
@@ -756,6 +756,13 @@
 
   unsigned CC = ClangCallConvToLLVMCallConv(info.getCC());
 
+  // Prime the type-cache with the types for this function early.  This enables
+  // parameter types that are dependent on this function's type to be properly
+  // emitted.
+  ConvertType(resultType);
+  for (const CanQualType &QT : argTypes)
+ConvertType(QT);
+
   // Construct the function info.  We co-allocate the ArgInfos.
   FI = CGFunctionInfo::create(CC, instanceMethod, chainCall, info,
   paramInfos, resultType, argTypes, required);
Index: lib/CodeGen/CodeGenTypes.cpp
===
--- lib/CodeGen/CodeGenTypes.cpp
+++ lib/CodeGen/CodeGenTypes.cpp
@@ -169,6 +169,14 @@
   if (const auto *AT = CGT.getContext().getAsArrayType(T))
 return isSafeToConvert(AT->getElementType(), CGT, AlreadyChecked);
 
+  // If this type has a function