[PATCH] D60719: Demonstrate how to fix freestanding for memcpy

2019-04-18 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D60719#1470632 , @t.p.northover 
wrote:

> > IIUC freestanding environment should not rely on memcpy being present so my 
> > take on it was that by "fixing" freestanding I could have my cake and eat 
> > it too.
>
> The formal situation is that freestanding implementations are only required 
> to provide language support stuff like `va_list`. They're allowed to give 
> more of the standard library if they want though, as implementation defined 
> behaviour.
>
> In practice, everyone I know provides the basic `string.h` functions and the 
> compiler is pretty explicit about relying on them being present for 
> correctness. They're part of the surrounding environment a user is expected 
> to supply (much like the various exception handling callbacks if you want C++ 
> exceptions, but always required).
>
> For the people actually using freestanding I think they're mostly considered 
> important enough that they're implemented in assembly anyway so this isn't 
> really a burden, but...


Ack. "Fixing" freestanding is not the way to go then : )

>> Ultimately I'm interested in implementing libc's mem function in C++. Let's 
>> take memcpy for instance
> 
> Ah, that is an entirely different problem from what I thought you were trying 
> to do. In that case I'm all in favour of some solution, but would start 
> thinking along the lines of `-fno-builtin-memcpy` options (it's possible that 
> already does what you want, but can't get LLVM to form a `memcpy` from quick 
> tests to check).

Thx for taking the time to suggest possible solution, I really appreciate it.

I already tried `-fno-builtin-memcpy`. The semantic of this flag is pretty 
counter intuitive and I'll explain why it doesn't fit my needs.
LLVM is allowed to replace a piece of code that looks like a `memcpy` with an 
IR intrinsic that implements the same semantic. You can see this by inspecting 
the IR here: https://godbolt.org/z/0y1Yqh.
Now if you use `-fno-builtin-memcpy` you're preventing the compiler from 
understanding that this is a memory copy. Look at the IR here: 
https://godbolt.org/z/lnCIIh.
The vectorizer kicks in in this case and the generated code is quite good but 
sometimes the generated code is pretty bad: https://godbolt.org/z/mHpAYe.

Last but not least `-fno-builtin-memcpy`prevents the compiler from 
understanding code constructs as memcpy but does not prevent the compiler from 
generating calls to `memcpy`:

- Using `__builtin_memcpy` still lowers to `@llvm.memcpy` IR intrinsics: 
https://godbolt.org/z/O0sjIl
- Passing big structs by value will lowers to `@llvm.memcpy` IR intrinsics: 
https://godbolt.org/z/4BUDc0

Another solution is needed for my use case, either a new C++ intrinsics 
`__builtin_inline_memcpy` or an attribute that disables generated libc calls, 
compiler could either inline or fail with an error.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60719/new/

https://reviews.llvm.org/D60719



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60719: Demonstrate how to fix freestanding for memcpy

2019-04-17 Thread Tim Northover via Phabricator via cfe-commits
t.p.northover added a comment.

> IIUC freestanding environment should not rely on memcpy being present so my 
> take on it was that by "fixing" freestanding I could have my cake and eat it 
> too.

The formal situation is that freestanding implementations are only required to 
provide language support stuff like `va_list`. They're allowed to give more of 
the standard library if they want though, as implementation defined behaviour.

In practice, everyone I know provides the basic `string.h` functions and the 
compiler is pretty explicit about relying on them being present for 
correctness. They're part of the surrounding environment a user is expected to 
supply (much like the various exception handling callbacks if you want C++ 
exceptions, but always required).

For the people actually using freestanding I think they're mostly considered 
important enough that they're implemented in assembly anyway so this isn't 
really a burden, but...

> Ultimately I'm interested in implementing libc's mem function in C++. Let's 
> take memcpy for instance

Ah, that is an entirely different problem from what I thought you were trying 
to do. In that case I'm all in favour of some solution, but would start 
thinking along the lines of `-fno-builtin-memcpy` options (it's possible that 
already does what you want, but can't get LLVM to form a `memcpy` from quick 
tests to check).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60719/new/

https://reviews.llvm.org/D60719



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60719: Demonstrate how to fix freestanding for memcpy

2019-04-17 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D60719#1469958 , @t.p.northover 
wrote:

> I think it'd be pretty unpopular with the people I know who use freestanding. 
> They're mostly working on microcontrollers and compiling -Oz so the extra 
> code size would be untenable; they also have memcpy implementations anyway 
> because they use it in their own code.
>
> If I was trying to solve this (and I'm also not 100% sure it needs solving), 
> I think I'd instead generate a `linkonce` definition of `memcpy` whenever 
> it's needed and leave CodeGen unmodified.


Thx for chiming in @t.p.northover . Since the patch was mostly a proof of 
concept I haven't explained well what the intent was.
Ultimately I'm interested in implementing libc's mem function in C++. Let's 
take `memcpy` for instance, it would be composed out of fixed sized copies 
created from `__builtin_memcpy`. Unfortunately, the compiler is allowed to 
replace the intrinsic with a call to `memcpy` - which I'm currently defining - 
the generated code would recurse indefinitely.

IIUC //freestanding// environment should not rely on `memcpy` being present so 
my take on it was that by "fixing" //freestanding// I could have my cake and 
eat it too.

There are other options to explore but your comment shows that it's better to 
start a discussion around an RFC.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60719/new/

https://reviews.llvm.org/D60719



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60719: Demonstrate how to fix freestanding for memcpy

2019-04-17 Thread Tim Northover via Phabricator via cfe-commits
t.p.northover added a comment.

I think it'd be pretty unpopular with the people I know who use freestanding. 
They're mostly working on microcontrollers and compiling -Oz so the extra code 
size would be untenable; they also have memcpy implementations anyway because 
they use it in their own code.

If I was trying to solve this (and I'm also not 100% sure it needs solving), I 
think I'd instead generate a `linkonce` definition of `memcpy` whenever it's 
needed and leave CodeGen unmodified.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60719/new/

https://reviews.llvm.org/D60719



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60719: Demonstrate how to fix freestanding for memcpy

2019-04-17 Thread Clement Courbet via Phabricator via cfe-commits
courbet added a comment.

As discussed offline, I think this should go through an RFC process.

I guess the main reservation that people will have is that this might generate 
a very large number of load stores (because we don't have a good way to 
generate loops here). This is not an issue for X86 because of REPMOVS, but it 
would be cool to have the opinion of people familiar with other architectures. 
One way to tackle this issue could be to expand existing `mem*` functions 
before the DAG. This is already happening for `memcmp` in `ExpandMemcmp`, which 
could be made aware of this flag. Similar approaches could be used for other 
`mem*` functions. But eventually this has to be handled here as the dag itself 
can call `getMemcpy()`.

Also, the current approach does not protect against current users and future 
unsavvy users calling `SelectionDAG::getMemcpy` with `AlwaysAlign == false`, so 
what about actually retrieving the flag //within// the function instead of 
outside ? This happens e.g. in `SelectionDAG::visitMemPCpyCall()`m 
`AArch64TargetLowering::LowerCall` and others.




Comment at: clang/test/CodeGen/freestanding-disables-libc.c:6
+
+// NOTE: Test that assembly doesn't call memcpy function in freestanding mode.
+// RUN: %clang_cc1 -triple i386-unknown-unknown -O2 -S %s -o - | grep 'memcpy'

That's the responsibility of LLVM, so this test should be in 
`llvm/test/Codegen`.



Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:6093
 
+  assert(MF->getFunction().getParent()->getModuleFlag("force-inline-libc") ==
+ nullptr);

Please add an error message (maybe "modules with 'force-inline-libc' should 
never emit library calls")



Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:5483
 
+static bool IsForceInlineLibc(const SelectionDAG ) {
+  const Module *M = DAG.getMachineFunction().getFunction().getParent();

isForceInlineLibc


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60719/new/

https://reviews.llvm.org/D60719



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60719: Demonstrate how to fix freestanding for memcpy

2019-04-16 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet updated this revision to Diff 195369.
gchatelet added a comment.

Add test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60719/new/

https://reviews.llvm.org/D60719

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/freestanding-disables-libc.c
  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5480,6 +5480,14 @@
   }
 }
 
+static bool IsForceInlineLibc(const SelectionDAG ) {
+  const Module *M = DAG.getMachineFunction().getFunction().getParent();
+  if (auto *MD = mdconst::extract_or_null(
+  M->getModuleFlag("force-inline-libc")))
+return MD->getZExtValue();
+  return false;
+}
+
 /// Lower the call to the specified intrinsic function. If we want to emit this
 /// as a call to a named external function, return the name. Otherwise, lower 
it
 /// and return null.
@@ -5553,10 +5561,11 @@
 unsigned Align = MinAlign(DstAlign, SrcAlign);
 bool isVol = MCI.isVolatile();
 bool isTC = I.isTailCall() && isInTailCallPosition(, DAG.getTarget());
+bool IsAlwaysInline = IsForceInlineLibc(DAG);
 // FIXME: Support passing different dest/src alignments to the memcpy DAG
 // node.
 SDValue MC = DAG.getMemcpy(getRoot(), sdl, Op1, Op2, Op3, Align, isVol,
-   false, isTC,
+   IsAlwaysInline, isTC,
MachinePointerInfo(I.getArgOperand(0)),
MachinePointerInfo(I.getArgOperand(1)));
 updateDAGForMaybeTailCall(MC);
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -6090,6 +6090,9 @@
   // beyond the given memory regions. But fixing this isn't easy, and most
   // people don't care.
 
+  assert(MF->getFunction().getParent()->getModuleFlag("force-inline-libc") ==
+ nullptr);
+
   // Emit a library call.
   TargetLowering::ArgListTy Args;
   TargetLowering::ArgListEntry Entry;
Index: clang/test/CodeGen/freestanding-disables-libc.c
===
--- /dev/null
+++ clang/test/CodeGen/freestanding-disables-libc.c
@@ -0,0 +1,11 @@
+// NOTE: Test that IR module specifies a "force-inline-libc" attribute in 
freestanding mode.
+// RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm %s -o - | not grep 
'force-inline-libc'
+// RUN: %clang_cc1 -triple i386-unknown-unknown -O2 -emit-llvm %s -o - | not 
grep 'force-inline-libc'
+// RUN: %clang_cc1 -triple i386-unknown-unknown -ffreestanding -O2 -emit-llvm 
%s -o - | grep 'force-inline-libc'
+
+// NOTE: Test that assembly doesn't call memcpy function in freestanding mode.
+// RUN: %clang_cc1 -triple i386-unknown-unknown -O2 -S %s -o - | grep 'memcpy'
+// RUN: %clang_cc1 -triple i386-unknown-unknown -ffreestanding -O2 -S %s -o - 
| not grep 'memcpy'
+
+struct BigStruct { char payload[4096]; };
+struct BigStruct PassByValue(struct BigStruct value) { return value; }
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -596,6 +596,10 @@
   if (!getCodeGenOpts().RecordCommandLine.empty())
 EmitCommandLineMetadata();
 
+  if (LangOpts.Freestanding) {
+getModule().addModuleFlag(llvm::Module::Override, "force-inline-libc", 1);
+  }
+
   EmitTargetMetadata();
 }
 


Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5480,6 +5480,14 @@
   }
 }
 
+static bool IsForceInlineLibc(const SelectionDAG ) {
+  const Module *M = DAG.getMachineFunction().getFunction().getParent();
+  if (auto *MD = mdconst::extract_or_null(
+  M->getModuleFlag("force-inline-libc")))
+return MD->getZExtValue();
+  return false;
+}
+
 /// Lower the call to the specified intrinsic function. If we want to emit this
 /// as a call to a named external function, return the name. Otherwise, lower it
 /// and return null.
@@ -5553,10 +5561,11 @@
 unsigned Align = MinAlign(DstAlign, SrcAlign);
 bool isVol = MCI.isVolatile();
 bool isTC = I.isTailCall() && isInTailCallPosition(, DAG.getTarget());
+bool IsAlwaysInline = IsForceInlineLibc(DAG);
 // FIXME: Support passing different dest/src alignments to the memcpy DAG
 // node.
 SDValue MC = DAG.getMemcpy(getRoot(), sdl, Op1, Op2, Op3, Align,