[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
lxfind abandoned this revision.
lxfind added a comment.

Abandoning in favor of D99227 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-21 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

@bruno  Thanks for the review!




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

bruno wrote:
> ChuanqiXu wrote:
> > lxfind wrote:
> > > ChuanqiXu wrote:
> > > > ChuanqiXu wrote:
> > > > > can we rewrite it into:
> > > > > ```
> > > > > else if (SuspendRet != nullptr && 
> > > > > SuspendRet->getType()->isClassType()) {
> > > > >  // generate:
> > > > >  // llvm.coro.forcestack(SuspendRet)
> > > > > }
> > > > > ```
> > > > Sorry I find we can't did it directly. As you said, we need to traverse 
> > > > down SuspendRet. And I still think we should did it only at CodeGen 
> > > > part since it looks not so hard. I guess we could make it in above 
> > > > 10~15 lines of codes.
> > > Traversing down AST isn't the hard part. The hard part is to search the 
> > > emitted IR, and look for the temporary alloca used to store the returned 
> > > handle.
> > Yes, I get your point. If we want to traverse the emitted IR, we could only 
> > search for the use-chain backward, which is also very odd. Let's see if 
> > there is other ways to modify the ASTNodes to make it more naturally.
> I'm curious whether did you consider annotating instructions with some new 
> custom metadata instead of using intrinsics? If so, what would be the 
> tradeoff? For example, if you could conditionally attach metadata some 
> "begin" metadata here:
> 
> `auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});`
> 
> and "end" metadata here:
> 
> `auto *SuspendResult = Builder.CreateCall(CoroSuspend, {SaveCall, 
> Builder.getInt1(IsFinalSuspend)});`
The "end" part could probably be done through metadata. But I'm not sure how to 
do it for the "begin" part. The "begin" part needs to happen after the emission 
of S.getAwaitSuspendCallExpr().



Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2085
+if (auto *II = dyn_cast())
+  if (II->getIntrinsicID() == llvm::Intrinsic::coro_forcestack_begin) {
+assert(II->getNumUses() == 1 &&

bruno wrote:
> Do such intrinsics never get removed? What happens when this hits a backend?
They are added to the list of DeadInstructions after collected. So they will 
all be removed at the end of the pass.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-18 Thread JunMa via Phabricator via cfe-commits
junparser added a comment.

In D98638#2630786 , @lxfind wrote:

> Well, I guess another potential solution is to force emitting lifetime 
> intrinsics for this part of coroutine in the front-end.
> Like this:
>
>   diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp
>   index 243d93a8c165..ef76e8dcb7c9 100644
>   --- a/clang/lib/CodeGen/CGDecl.cpp
>   +++ b/clang/lib/CodeGen/CGDecl.cpp
>   @@ -1317,7 +1317,7 @@ void CodeGenFunction::EmitAutoVarDecl(const VarDecl 
> ) {
>/// otherwise
>llvm::Value *CodeGenFunction::EmitLifetimeStart(uint64_t Size,
>llvm::Value *Addr) {
>   -  if (!ShouldEmitLifetimeMarkers)
>   +  if (!ShouldEmitLifetimeMarkers && !isCoroutine())
>return nullptr;
>
>  assert(Addr->getType()->getPointerAddressSpace() ==
>   diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
>   index 18f1468dcb86..2e6e6808db7f 100644
>   --- a/clang/lib/CodeGen/CGExpr.cpp
>   +++ b/clang/lib/CodeGen/CGExpr.cpp
>   @@ -535,7 +535,7 @@ EmitMaterializeTemporaryExpr(const 
> MaterializeTemporaryExpr *M) {
>  break;
>
>case SD_FullExpression: {
>   -  if (!ShouldEmitLifetimeMarkers)
>   +  if (!ShouldEmitLifetimeMarkers && !isCoroutine())
>break;
>
>  // Avoid creating a conditional cleanup just to hold an 
> llvm.lifetime.end

We have already allowed to emit lifetime intrinsics for always inlined function 
under O2 , so IMOO emitting 
lifetime intrinsics for coroutine function is OK since stack coloring has less 
effect on coroutine function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-17 Thread Bruno Cardoso Lopes via Phabricator via cfe-commits
bruno added a comment.

Hi Xun, great to see more improvements in this area.




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

ChuanqiXu wrote:
> lxfind wrote:
> > ChuanqiXu wrote:
> > > ChuanqiXu wrote:
> > > > can we rewrite it into:
> > > > ```
> > > > else if (SuspendRet != nullptr && SuspendRet->getType()->isClassType()) 
> > > > {
> > > >  // generate:
> > > >  // llvm.coro.forcestack(SuspendRet)
> > > > }
> > > > ```
> > > Sorry I find we can't did it directly. As you said, we need to traverse 
> > > down SuspendRet. And I still think we should did it only at CodeGen part 
> > > since it looks not so hard. I guess we could make it in above 10~15 lines 
> > > of codes.
> > Traversing down AST isn't the hard part. The hard part is to search the 
> > emitted IR, and look for the temporary alloca used to store the returned 
> > handle.
> Yes, I get your point. If we want to traverse the emitted IR, we could only 
> search for the use-chain backward, which is also very odd. Let's see if there 
> is other ways to modify the ASTNodes to make it more naturally.
I'm curious whether did you consider annotating instructions with some new 
custom metadata instead of using intrinsics? If so, what would be the tradeoff? 
For example, if you could conditionally attach metadata some "begin" metadata 
here:

`auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});`

and "end" metadata here:

`auto *SuspendResult = Builder.CreateCall(CoroSuspend, {SaveCall, 
Builder.getInt1(IsFinalSuspend)});`



Comment at: clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp:53
 
-// check that the lifetime of the coroutine handle used to obtain the address 
is contained within single basic block, and hence does not live across 
suspension points.
-// CHECK-LABEL: final.suspend:
-// CHECK: %[[PTR1:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* 
%[[ADDR_TMP:.+]] to i8*
-// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
-// CHECK: call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 {{[^,]*}} %[[ADDR_TMP]])
-// CHECK-NEXT:%[[PTR2:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] 
to i8*
-// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])
+// CHECK-LABEL: |-FunctionDecl {{.*}} foo 'detached_task ()'
+// first ExprWithCleanups is the initial await

Nice tests. The codegen should live in a different file from the AST dump one, 
you can put the later in `test/clang/SemaCXX` or `tes/clang/AST`.



Comment at: llvm/include/llvm/IR/Intrinsics.td:1308
+// Like the sideeffect intrinsic defined above, this intrinsic is treated by 
the
+// optimizer as having opaque side effects so that it won't be get rid of or 
moved
 // out of the block it probes.

This change seems unrelated to this patch.



Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2083
+  ForceStackList ForceStacks;
+  for (auto  : instructions(F))
+if (auto *II = dyn_cast())

`collectForceStacks` is only called once from a function that already traverses 
all instructions, can you take advantage of that to collect 
`llvm::Intrinsic::coro_forcestack_begin/end`?



Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2085
+if (auto *II = dyn_cast())
+  if (II->getIntrinsicID() == llvm::Intrinsic::coro_forcestack_begin) {
+assert(II->getNumUses() == 1 &&

Do such intrinsics never get removed? What happens when this hits a backend?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-17 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

lxfind wrote:
> ChuanqiXu wrote:
> > ChuanqiXu wrote:
> > > can we rewrite it into:
> > > ```
> > > else if (SuspendRet != nullptr && SuspendRet->getType()->isClassType()) {
> > >  // generate:
> > >  // llvm.coro.forcestack(SuspendRet)
> > > }
> > > ```
> > Sorry I find we can't did it directly. As you said, we need to traverse 
> > down SuspendRet. And I still think we should did it only at CodeGen part 
> > since it looks not so hard. I guess we could make it in above 10~15 lines 
> > of codes.
> Traversing down AST isn't the hard part. The hard part is to search the 
> emitted IR, and look for the temporary alloca used to store the returned 
> handle.
Yes, I get your point. If we want to traverse the emitted IR, we could only 
search for the use-chain backward, which is also very odd. Let's see if there 
is other ways to modify the ASTNodes to make it more naturally.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-17 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

ChuanqiXu wrote:
> ChuanqiXu wrote:
> > can we rewrite it into:
> > ```
> > else if (SuspendRet != nullptr && SuspendRet->getType()->isClassType()) {
> >  // generate:
> >  // llvm.coro.forcestack(SuspendRet)
> > }
> > ```
> Sorry I find we can't did it directly. As you said, we need to traverse down 
> SuspendRet. And I still think we should did it only at CodeGen part since it 
> looks not so hard. I guess we could make it in above 10~15 lines of codes.
Traversing down AST isn't the hard part. The hard part is to search the emitted 
IR, and look for the temporary alloca used to store the returned handle.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-17 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Can't we did as inline comments?

No, because it would have already been too late. SuspendExpr returns the result 
of __builtin_coro_resume(awaiter.await_suspend().address()), which is different 
from the result of awaiter.await_suspend().
We need to be able to control the placement of awaiter.await_suspend(), which 
is why I had to break up the AST at that boundary.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-17 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

ChuanqiXu wrote:
> can we rewrite it into:
> ```
> else if (SuspendRet != nullptr && SuspendRet->getType()->isClassType()) {
>  // generate:
>  // llvm.coro.forcestack(SuspendRet)
> }
> ```
Sorry I find we can't did it directly. As you said, we need to traverse down 
SuspendRet. And I still think we should did it only at CodeGen part since it 
looks not so hard. I guess we could make it in above 10~15 lines of codes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D98638#2630864 , @lxfind wrote:

> That's a fair point. I agree that we have no guarantee these are the only two 
> cases.
> It does seem to me that coroutine implementation somewhat relies on proper 
> lifetime markers so that data are being put correctly, which may be the 
> fundamental problem we are trying to solve.

It is hard to prove it. This topic need more discuss and more folks get 
involved. But it is really valuable. I can't remember how many patches we had 
to judge whether values should be put on the coroutine frame. I am OK to emit 
lifetime markers even at O0. But I think you need to ask for other's opinion.

In D98638#2630864 , @lxfind wrote:

> We probably could, but it would be very very tedious. 
> During CodeGen, we only have the AST that's calling __builtin_coro_resume, 
> which we will call Emit as a whole.
> So we need to manually match the AST 2 levels down to find the await_suspend 
> call, get its name, and then walk through the emitted IR to find a call with 
> the same name, and then find the tmp that's used to store the return value of 
> the call, and then emit llvm.coro.forcestack.

Can't we did as inline comments?




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

can we rewrite it into:
```
else if (SuspendRet != nullptr && SuspendRet->getType()->isClassType()) {
 // generate:
 // llvm.coro.forcestack(SuspendRet)
}
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Then if we want to put the result of the await_suspend in the stack, I think 
> we can do it under CodeGen part only. It should be easy to judge the return 
> type of await_suspend and create a call to llvm.coro.forcestack to the return 
> value of await_suspend.

We probably could, but it would be very very tedious. 
During CodeGen, we only have the AST that's calling __builtin_coro_resume, 
which we will call Emit as a whole.
So we need to manually match the AST 2 levels down to find the await_suspend 
call, get its name, and then walk through the emitted IR to find a call with 
the same name, and then find the tmp that's used to store the return value of 
the call, and then emit llvm.coro.forcestack.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Then we need to answer the question: how can we **prove** that the result of 
> symmetric transfer and %gro are the **only** exceptions from the above rules. 
> Or how can we know the list of exceptions wouldn't get longer and longer in 
> the future?
>
> Then go back to the example in the summary. From my point of view, the key 
> problem is that our escape analysis isn't powerful enough. I don't ask us to 
> do excellent escape analysis. It may beyond our abilities. I just want to say 
> how can we know the result of symmetric transfer and %gro are the only 
> exceptions.

That's a fair point. I agree that we have no guarantee these are the only two 
cases.
It does seem to me that coroutine implementation somewhat relies on proper 
lifetime markers so that data are being put correctly, which may be the 
fundamental problem we are trying to solve.

> In D98638#2630778 , @lxfind wrote:
>
>> Whether or not the current coroutine frame would be destroyed completely 
>> depend on the implementation of await_suspend. So we cannot predict or know 
>> in advance. Therefore, the temporary handle returned by await_suspend must 
>> be put on the stack. I don't really see any other solutions other than this.
>
> OK. Although the main stream implementation of await_suspend only destroy the 
> coroutine handle in the final awaiter, the compiler can't assume the normal 
> await_suspend won't destroy it. So I agree to guard the result of the 
> await_suspend to make it put on the stack. At least, it would reduce the size 
> of the coroutine frame.
>
> Then if we want to put the result of the await_suspend in the stack, I think 
> we can do it under CodeGen part only. It should be easy to judge the return 
> type of await_suspend and create a call to llvm.coro.forcestack to the return 
> value of await_suspend.
>
> In D98638#2630778 , @lxfind wrote:
>
>> Well, I guess another potential solution is to force emitting lifetime 
>> intrinsics for this part of coroutine in the front-end.
>
> I am not sure if this is a good idea. May it break the guide principle in 
> LLVM? This need to be reviewed by others.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D98638#2630778 , @lxfind wrote:

> What do you think is the fundamental problem, though?

It is hard to give a formal description for the problem. Let me try to explain 
it.
What I want to say here is about rules that decide whether a value should be 
put on the coroutine frame.
Initially, we put values on the frame for whose uses are crossing suspend 
points with their definition.
Then, we put values on the frame for whose uses are crossing suspend points 
with their definition and uses are not escaped.
In this patch, we want to put values on the frame for whose uses are crossing 
suspend points with their definition and uses are not escaped but except the 
result of symmetric transfer and %gro.
Then we need to answer the question: how can we **prove** that the result of 
symmetric transfer and %gro are the **only** exceptions from the above rules. 
Or how can we know the list of exceptions wouldn't get longer and longer in the 
future?

Then go back to the example in the summary. From my point of view, the key 
problem is that our escape analysis isn't powerful enough. I don't ask us to do 
excellent escape analysis. It may beyond our abilities. I just want to say how 
can we know the result of symmetric transfer and %gro are the only exceptions.

In D98638#2630778 , @lxfind wrote:

> Whether or not the current coroutine frame would be destroyed completely 
> depend on the implementation of await_suspend. So we cannot predict or know 
> in advance. Therefore, the temporary handle returned by await_suspend must be 
> put on the stack. I don't really see any other solutions other than this.

OK. Although the main stream implementation of await_suspend only destroy the 
coroutine handle in the final awaiter, the compiler can't assume the normal 
await_suspend won't destroy it. So I agree to guard the result of the 
await_suspend to make it put on the stack. At least, it would reduce the size 
of the coroutine frame.

Then if we want to put the result of the await_suspend in the stack, I think we 
can do it under CodeGen part only. It should be easy to judge the return type 
of await_suspend and create a call to llvm.coro.forcestack to the return value 
of await_suspend.

In D98638#2630778 , @lxfind wrote:

> Well, I guess another potential solution is to force emitting lifetime 
> intrinsics for this part of coroutine in the front-end.

I am not sure if this is a good idea. May it break the guide principle in LLVM? 
This need to be reviewed by others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

Well, I guess another potential solution is to force emitting lifetime 
intrinsics for this part of coroutine in the front-end.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Here what I want to say is we **shouldn't**  handle all the symmetric 
> transfer from the above analysis. And we shouldn't change the ASTNodes and 
> Sema part. We need to solve about the above pattern. It is not easy to give a 
> solution since user could implement symmetric transfer in final awaiter 
> without destroying the handle, which is more common.

Just to clarify, in case there are any confusions around this. This patch would 
work no matter whether the coroutine frame is destroyed or not during 
await_suspend(). It simply makes sure that the temporary handle returned by 
await_suspend will be put in the stack instead of heap, and it will always be 
safe to do so, no matter what happens.
Whether or not the current coroutine frame would be destroyed completely depend 
on the implementation of await_suspend. So we cannot predict or know in 
advance. Therefore, the temporary handle returned by await_suspend must be put 
on the stack. I don't really see any other solutions other than this.

> It seems to be a workaround to use 
> @llvm.coro.forcestack(%result_of_final_await_suspend) . Since I wondering if 
> there are other corner cases as the %gro. My opinion about 
> '@llvm.coro.forcestack' is that we could use it as a patch if we find any 
> holes that is hard to handle immediately. But we also need to find a solution 
> to solve problems more fundamentally.

Yes as I mentioned in the description, there are really only two cases, one is 
after await_suspend call, and one is gro. gro is easy to handle and I will 
likely send a separate patch latter. But this problem with await_suspend is 
particularly challenging to solve.

What do you think is the fundamental problem, though?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D98638#2630607 , @lxfind wrote:

> In D98638#2628082 , @ChuanqiXu wrote:
>
>> I am a little confused about the first problem. Would it cause the program 
>> to crash? (e.g., we access the fields of coroutine frame after the frame 
>> gets destroyed). Or it just wastes some storage?
>
> This is a repro of the crash (in TSAN mode): https://godbolt.org/z/KvPY66

Oh I got it. The program would crash since handle is destroyed in 
final_awaiter::await_suspend explicitly:

  std::coroutine_handle<> await_suspend(std::coroutine_handle h) 
noexcept {
 h.destroy();
 return std::noop_coroutine();
  }

And the normal symmetric transfer wouldn't destroy the handle (although it 
depends on the implementation of await_suspend).
So the problem met in this patch is program maybe crash with symmetric transfer 
with destroy coroutine handle explicitly (in the final awaiter normally) 
instead of normal symmetric transfer.

The explicitly destruction for the coroutine handle in the await_suspend of 
final awaiter is a normal pattern to enable the Coro-elide optimization. There 
is a discuss before in cafe-dev: 
http://clang-developers.42468.n3.nabble.com/Miscompilation-heap-use-after-free-in-C-coroutines-td4070320.html.
 It looks like it is a subsequent problem.

Here what I want to say is we **shouldn't**  handle all the symmetric transfer 
from the above analysis. And we shouldn't change the ASTNodes and Sema part. We 
need to solve about the above pattern. It is not easy to give a solution since 
user could implement symmetric transfer in final awaiter without destroying the 
handle, which is more common.

My unfinished idea is to emit an intrinsic called @llvm.coro.finalize before we 
emit the promise_type::final_suspend. Then the @llvm.coro.finalize marks the 
end of the lifetime for current coroutine frame. And all the analysis in 
CoroFrame should consider use after @llvm.coro.finalize (We could emit warning 
for some cases). But this idea is also problematic, it makes the semantics of 
coroutine intrinsic more chaos. Just image that how a newbie feels when he see 
@llvm.coro.end, @llvm.coro.destroy and @llvm.coro.finalize. And we can't use 
@llvm.coro.end @llvm.coro.destroy  since they have other semantics 
(llvm.coro.destroy means deletion and llvm.coro.end would be used to split the 
coroutine). Also, the idea of @llvm.coro.finalize seems available to solve the 
problem about%gro mentioned above.

It seems to be a workaround to use 
@llvm.coro.forcestack(%result_of_final_await_suspend) . Since I wondering if 
there are other corner cases as the %gro. My opinion about 
'@llvm.coro.forcestack' is that we could use it as a patch if we find any holes 
that is hard to handle immediately. But we also need to find a solution to 
solve problems more fundamentally.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/include/clang/AST/ExprCXX.h:4695
+/// afterwards on the stack.
 class CoroutineSuspendExpr : public Expr {
   friend class ASTStmtReader;

lxfind wrote:
> ChuanqiXu wrote:
> > It looks strange for the change of `CoroutineSuspendExpr` at the first 
> > glance. It is easy to understand the coroutine suspend expression is 
> > consists of three parts: Ready, Suspend and resume. It is written in the 
> > language documentation. And the new added AwaitSuspendCall is confusing.
> I agree. But this seems to be the only way to break up Suspend at the point 
> of await_suspend call so that we can insert instructions during CodeGen. Open 
> to ideas though.
One potential way to make this more clear is to rename these two nodes as: 
Suspend and Transfer.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D98638#2628082 , @ChuanqiXu wrote:

> I am a little confused about the first problem. Would it cause the program to 
> crash? (e.g., we access the fields of coroutine frame after the frame gets 
> destroyed). Or it just wastes some storage?

This is a repro of the crash (in TSAN mode): https://godbolt.org/z/KvPY66


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/include/clang/AST/ExprCXX.h:4695
+/// afterwards on the stack.
 class CoroutineSuspendExpr : public Expr {
   friend class ASTStmtReader;

ChuanqiXu wrote:
> It looks strange for the change of `CoroutineSuspendExpr` at the first 
> glance. It is easy to understand the coroutine suspend expression is consists 
> of three parts: Ready, Suspend and resume. It is written in the language 
> documentation. And the new added AwaitSuspendCall is confusing.
I agree. But this seems to be the only way to break up Suspend at the point of 
await_suspend call so that we can insert instructions during CodeGen. Open to 
ideas though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D98638#2628082 , @ChuanqiXu wrote:

> It looks like there are two things this patch wants to do:
>
> 1. Don't put the temporary generated by symmetric-transfer on the coroutine 
> frame.
> 2. Offer a mechanism to force some values (it is easy to extend Alloca to 
> Value) to put in the stack instead of the coroutine frame.
>
> I am a little confused about the first problem. Would it cause the program to 
> crash? (e.g., we access the fields of coroutine frame after the frame gets 
> destroyed). Or it just wastes some storage?
> And I want to ask about the change of the AST nodes and SemaCoroutine. Can we 
> know if a CoroutineSuspendExpr stands for a symmetric-transfer? If yes, it 
> seems we can only do changes in CodeGen part.

It will result in a crash, because we will be accessing memory that's already 
freed.  If you run:

  bin/clang -fcoroutines-ts -std=c++14 -stdlib=libc++ 
../clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp -o - -emit-llvm 
-S -Xclang -disable-llvm-passes

You can see that in the `final.suspend` basic block, there are IRs like this:

%call19 = call i8* 
@_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(%"struct.detached_task::promise_type::final_awaiter"*
 nonnull dereferenceable(1) %ref.tm
  p10, i8* %22) #2
%coerce.dive20 = getelementptr inbounds 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0", 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %coerce, i32 0, 
i32 0
store i8* %call19, i8** %coerce.dive20, align 8
%call21 = call i8* 
@_ZNKSt12experimental13coroutines_v116coroutine_handleIvE7addressEv(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 nonnull dereferenceable(8) %coerce) #2
call void @llvm.coro.resume(i8* %call21)

The temporary variable %coerce will be put on the frame because it's used by 
the call to `address` function and LLVM thinks it may escape. But the call to 
await_suspend() (the first line) in reality could destroy the current coroutine 
frame. Hence after the call to await_suspend, it will be accessing the frame, 
leading to memory corruption.

> Then I agree to introduce new intrinsic to hint the middle end to put some 
> values on the stack. And the design of `@llvm.coro.forcestack.begin()` and 
> `@llvm.coro.forcestack.end()` is a little strange to me. It says they mark a 
> region where only data from the local stack can be accessed. But it looks 
> error-prone since it is hard for the front-end to decide whether all the 
> access of the region should be put on the stack. I think we could introduce 
> only one intrinisic `@llvm.coro.forcestack(Value* v)`, we can use the 
> argument to mark the value need to be put on the stack.

This is a good idea. Let me play with it. Thanks!

> And about the problem you mentioned in D96922 
> : "The lifetime of  %coro.gro" starts early 
> and %coro.gro" would be used after `coro.end` (Possibly the destructor?) 
> which would cause the program to access destroyed coroutine frame". It looks 
> like the mechanism could solve this problem by a call to 
> `@llvm.coro.forcestack(%coro.gro)`.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-15 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

It looks like there are two things this patch wants to do:

1. Don't put the temporary generated by symmetric-transfer on the coroutine 
frame.
2. Offer a mechanism to force some values (it is easy to extend Alloca to 
Value) to put in the stack instead of the coroutine frame.

I am a little confused about the first problem. Would it cause the program to 
crash? (e.g., we access the fields of coroutine frame after the frame gets 
destroyed). Or it just wastes some storage?
And I want to ask about the change of the AST nodes and SemaCoroutine. Can we 
know if a CoroutineSuspendExpr stands for a symmetric-transfer? If yes, it 
seems we can only do changes in CodeGen part.

Then I agree to introduce new intrinsic to hint the middle end to put some 
values on the stack. And the design of `@llvm.coro.forcestack.begin()` and 
`@llvm.coro.forcestack.end()` is a little strange to me. It says they mark a 
region where only data from the local stack can be accessed. But it looks 
error-prone since it is hard for the front-end to decide whether all the access 
of the region should be put on the stack. I think we could introduce only one 
intrinisic `@llvm.coro.forcestack(Value* v)`, we can use the argument to mark 
the value need to be put on the stack.

And about the problem you mentioned in D96922 
: "The lifetime of  %coro.gro" starts early 
and %coro.gro" would be used after `coro.end` (Possibly the destructor?) which 
would cause the program to access destroyed coroutine frame". It looks like the 
mechanism could solve this problem by a call to 
`@llvm.coro.forcestack(%coro.gro)`.




Comment at: clang/include/clang/AST/ExprCXX.h:4695
+/// afterwards on the stack.
 class CoroutineSuspendExpr : public Expr {
   friend class ASTStmtReader;

It looks strange for the change of `CoroutineSuspendExpr` at the first glance. 
It is easy to understand the coroutine suspend expression is consists of three 
parts: Ready, Suspend and resume. It is written in the language documentation. 
And the new added AwaitSuspendCall is confusing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-15 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: ChuanqiXu, hoy, modimo, wenlei, hiraditya.
lxfind requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added projects: clang, LLVM.

One of the challenges with the alloca analysis in CoroSplit is that in a few 
cases we need to make sure the allocas must be put on the stack, not on the 
frame.
One of the cases is symmetric transfer. Symmetric transfer is a newly 
introduced feature in C++ coroutines that allows for immediate transfer to a 
different coroutine when the current coroutine is suspended.
The await_suspend() call will return a coroutine handle type, and when that 
happens, the compiler should generate code to resume the returned handle. Like 
this:

  coroutine_handle tmp = awaiter.await_suspend();
  __builtin_coro_resume(tmp.address());

It's very common that after the call to await_suspend(), the current coroutine 
frame is already destroyed, which means we should not be accessing the 
coroutine frame from there.
And we shouldn't because we we use here is a temporary variable which will be 
short-lived. However in a debug build when we don't have lifetime intrinsics, 
it's very hard for 
the compiler to determine that tmp doesn't escape. There are two specific 
challenges here:

1. If the address() function call is not inlined (this should be the default 
case with -O0), we will have a function call that takes tmp as a pointer. The 
compiler does not know that the address call will not capture. This will lead 
to tmp being put on the frame. We could potentially special handle the address 
function in either front-end or CoroSplit, but both are fragile (we will need 
to do some name pattern matching).
2. If the address() function call is inlined (in some versions of libc++, 
address seems to have "always_inline" attribute), we will end up with a series 
of store/load instructions. For a naive analysis, a store of the pointer will 
also be treated as escape. To solve that problem, I introduced D91305 
, which tries to match this specific 
store/load pattern and be able to deal with it. It looks very hacky.

To solve this problem once for all, and provide a framework for solving similar 
problems in the future, this patch introduces 2 new intrinsics to mark a region 
where all data accessed must be put on the stack.
In the case of symmetric transfer, in order to be able to insert code during 
front-end codegen right after the await_suspend call, we need to split the 
Suspend subnode CoroutineSuspendExpr at await_suspend call, as the new 
AwaitSuspendCall subnode.
Then we create a OpaqueValueExpr to wrap around AwaitSuspendCall, and use it to 
continue build the rest of the Suspend subnode. OpaqueValueExpr is necessary 
because we don't want to emit the await_suspend call twice. OpaqueValueExpr 
serves as a stopper in codegen.
If there is no symmetric transfer, the new nodes will be nullptr.
After this patch, now right after the await_suspend() call, we will see a 
llvm.coro.forcestack.begin() intrinsic, and then right before coro.suspend(), 
we will see a llvm.coro.forcestack.end() intrinsic.
CoroSplit will then be able to use this information to decide whether some data 
must be put on the stack.
We are also able to remove the code that tries to match the special store/load 
instruction sequence.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D98638

Files:
  clang/include/clang/AST/ExprCXX.h
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/Sema/SemaCoroutine.cpp
  clang/lib/Serialization/ASTReaderStmt.cpp
  clang/lib/Serialization/ASTWriterStmt.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/test/Transforms/Coroutines/coro-alloca-06.ll

Index: llvm/test/Transforms/Coroutines/coro-alloca-06.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloca-06.ll
+++ llvm/test/Transforms/Coroutines/coro-alloca-06.ll
@@ -1,5 +1,5 @@
-; Test that in some simple cases allocas will not live on the frame even
-; though their pointers are stored.
+; Test that even though some stores may seem to escape pointers,
+; they can be put on the stack as long as they are within forcestack range.
 ; RUN: opt < %s -coro-split -S | FileCheck %s
 ; RUN: opt < %s -passes=coro-split -S | FileCheck %s
 
@@ -19,14 +19,12 @@
   %2 = call i8* @await_suspend()
   %3 = getelementptr inbounds %"handle", %"handle"* %0, i32 0, i32 0
   store i8* %2, i8** %3, align 8
-  %4 = bitcast %"handle"** %1 to i8*
-  call void @llvm.lifetime.start.p0i8(i64 8, i8* %4)
+  %4 = call i8* @llvm.coro.forcestack.begin()
   store %"handle"* %0, %"handle"** %1, align 8
   %5 = load %"handle"*, %"handle"** %1, align 8
   %6 = getelementptr