subject:"\[PATCH\] D97915\: \[Coroutines\] Handle overaligned frame allocation"

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2022-09-23 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen abandoned this revision.
ychen added a comment.
Herald added a project: All.

327141f  
is landed as an alternative.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-08 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2863615 , @ychen wrote:

> That's just my conclusion based on @rjmccall's suggestion 
> (https://reviews.llvm.org/D100739#2717582) and my following responses.

I guess you got the conclusion from this:

> 2d. Use the correct allocator for the frame alignment; both allocators are 
> (allowed to be) ODR-used, but only one would be dynamically used. This is 
> what would be necessary for the implementation I suggested above. In reality 
> there won't be any dynamic overhead because we should always be able to fold 
> the branch after allocation.

It looks like necessary to emit two BBs in the frontend which use the different 
allocators. Then we prune the branch in the middle end.

But I still feel like that there are redundancies in current implementation. 
Let me think more about it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-07 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2863581 , @ChuanqiXu wrote:

> In D97915#2862419 , @ychen wrote:
>
>>> It looks not hard to implement. And we don't need to refactor the CodeGen 
>>> part a lot. In this way, I think the main effort to support `::operator 
>>> new(size_t, align_t)` would be in the Sema part and the works remained in 
>>> CodeGen part would be little. It wouldn't touch the middle end part neither.
>>
>> I agree that something like this is simpler implementation-wise. Although in 
>> spirit, it is similar to D100739  which we 
>> decided not to pursue. What is the middle-end? is it LLVM?
>>
 It shifts the semantic lowering from Clang to LLVM but does not perform 
 less work.
>>>
>>> I think it would be simpler.
>>
>> I agree it would be simpler.
>>
>>> At least, we don't need to emit `getReturnStmtOnAllocFailure` twice and
>>
>> `getReturnStmtOnAllocFailure` is called twice but the code is emitted once.
>>
>>> we don't need to touch `CallCoroDelete`  neither.
>>
>> If we don't touch `CallCoroDelete`, then we cannot do the equivalent work in 
>> LLVM since there is no concept of deallocation function concept there.
>>
>>> And we don't organize the basic blocks in the CodeGenCoroutineBody.
>>
>> That's true. It is delayed until CoroSplit.
>>
>>> And we could emit simpler AlignupTo (Although it could be simplified 
>>> further, I believe).
>>
>> D102147  has a version of it.
>>
>>> And the extra work we need to do is to compare the alignment requirement 
>>> for the coroutine frame with the second argument of 
>>> `llvm.coro.create.frame` to see if we need to over align coroutine frame.
>>> If yes, we need to lower the `llvm.coro.create.frame` to compute the true 
>>> address for the coroutine frame and store the raw frame address.
>>> If no, we could return `%allocated` simply.
>>
>> Yes, it is similar to CoroFrame.cpp changes in D102147 
>> .
>
>
>
>> I agree that something like this is simpler implementation-wise. Although in 
>> spirit, it is similar to D100739  which we 
>> decided not to pursue. What is the middle-end? is it LLVM?
>
> It is frustrating to break the decision made before. But I don't find the 
> conclusion in D100739  that we shouldn't 
> complete this in middle end.

Yeah, it is not explicitly stated there. That's just my conclusion based on 
@rjmccall's suggestion (https://reviews.llvm.org/D100739#2717582) and my 
following responses. I do think your proposal works for the dynamic allocation 
cases, however, it needs major change when aligned allocator/deallocator comes 
into play in the future when the issue is fixed at the language level. That is 
the primary reason that most of the implementations are in front-end and LLVM 
coroutine intrinsics are mostly agnostic of the alignment issue (newly added 
llvm.coro.raw.frame.ptr.* are two exceptions and all existing intrinsics are 
not touched in both in API or semantics).

>> `getReturnStmtOnAllocFailure` is called twice but the code is emitted once.
>
> I don't think so. `getReturnStmtOnAllocFailure` is called twice and emitted 
> twice. The code emitted by the frontend should be:
>
>   AllocBB:
> ; ...
> getReturnStmtOnAllocFailure()
>   
>   AlignedAllocBB:
> ; ...
> getReturnStmtOnAllocFailure()
>
> I understand that one of the branch would be pruned in the middle end. But in 
> the frontend, they are emitted twice. And it's redundant in both compiler 
> codes and generated codes.

Got you. I meant alloc failure block is emitted once and agree that both 
aligned and normal alloc block are emitted.

>> If we don't touch CallCoroDelete, then we cannot do the equivalent work in 
>> LLVM since there is no concept of deallocation function concept there.
>
> Sorry for confusing. I means that we could change semantics for 
> `llvm.coro.free` to return the address of raw frame  and touch the Sema part 
> to pass `llvm.coro.raw.frame.size` as the second argument to deallocator.
> It should be easy and clean to implement and we don't need to touch 
> `CallCoroDelete ` either.
>
>>> And we don't organize the basic blocks in the CodeGenCoroutineBody.
>>
>> That's true. It is delayed until CoroSplit.
>
> I don't think it is delayed. I think the task to organize the BBs would be 
> eliminated if we move the main work in the middle end. 
> I think if we move the work to select to over align or not, we could use the 
> same IR structure. I think it is simpler, cleaner and more natural.

Agree. But like I said above, IMHO, that's a secondary goal here. LLVM better 
only deal with optimizations, not semantics. (but, yeah, coroutine is special 
but the principle still applies).

>>> And we could emit simpler AlignupTo (Although it could be simplified 
>>> further, I

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-07 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2862419 , @ychen wrote:

>> It looks not hard to implement. And we don't need to refactor the CodeGen 
>> part a lot. In this way, I think the main effort to support `::operator 
>> new(size_t, align_t)` would be in the Sema part and the works remained in 
>> CodeGen part would be little. It wouldn't touch the middle end part neither.
>
> I agree that something like this is simpler implementation-wise. Although in 
> spirit, it is similar to D100739  which we 
> decided not to pursue. What is the middle-end? is it LLVM?
>
>>> It shifts the semantic lowering from Clang to LLVM but does not perform 
>>> less work.
>>
>> I think it would be simpler.
>
> I agree it would be simpler.
>
>> At least, we don't need to emit `getReturnStmtOnAllocFailure` twice and
>
> `getReturnStmtOnAllocFailure` is called twice but the code is emitted once.
>
>> we don't need to touch `CallCoroDelete`  neither.
>
> If we don't touch `CallCoroDelete`, then we cannot do the equivalent work in 
> LLVM since there is no concept of deallocation function concept there.
>
>> And we don't organize the basic blocks in the CodeGenCoroutineBody.
>
> That's true. It is delayed until CoroSplit.
>
>> And we could emit simpler AlignupTo (Although it could be simplified 
>> further, I believe).
>
> D102147  has a version of it.
>
>> And the extra work we need to do is to compare the alignment requirement for 
>> the coroutine frame with the second argument of `llvm.coro.create.frame` to 
>> see if we need to over align coroutine frame.
>> If yes, we need to lower the `llvm.coro.create.frame` to compute the true 
>> address for the coroutine frame and store the raw frame address.
>> If no, we could return `%allocated` simply.
>
> Yes, it is similar to CoroFrame.cpp changes in D102147 
> .



> I agree that something like this is simpler implementation-wise. Although in 
> spirit, it is similar to D100739  which we 
> decided not to pursue. What is the middle-end? is it LLVM?

It is frustrating to break the decision made before. But I don't find the 
conclusion in D100739  that we shouldn't 
complete this in middle end.

> `getReturnStmtOnAllocFailure` is called twice but the code is emitted once.

I don't think so. `getReturnStmtOnAllocFailure` is called twice and emitted 
twice. The code emitted by the frontend should be:

  AllocBB:
; ...
getReturnStmtOnAllocFailure()
  
  AlignedAllocBB:
; ...
getReturnStmtOnAllocFailure()

I understand that one of the branch would be pruned in the middle end. But in 
the frontend, they are emitted twice. And it's redundant in both compiler codes 
and generated codes.

> If we don't touch CallCoroDelete, then we cannot do the equivalent work in 
> LLVM since there is no concept of deallocation function concept there.

Sorry for confusing. I means that we could change semantics for 
`llvm.coro.free` to return the address of raw frame  and touch the Sema part to 
pass `llvm.coro.raw.frame.size` as the second argument to deallocator.
It should be easy and clean to implement and we don't need to touch 
`CallCoroDelete ` either.

>> And we don't organize the basic blocks in the CodeGenCoroutineBody.
>
> That's true. It is delayed until CoroSplit.

I don't think it is delayed. I think the task to organize the BBs would be 
eliminated if we move the main work in the middle end. 
I think if we move the work to select to over align or not, we could use the 
same IR structure. I think it is simpler, cleaner and more natural.

>> And we could emit simpler AlignupTo (Although it could be simplified 
>> further, I believe).
>
> D102147  has a version of it.
>
>> And the extra work we need to do is to compare the alignment requirement for 
>> the coroutine frame with the second argument of `llvm.coro.create.frame` to 
>> see if we need to over align coroutine frame.
>> If yes, we need to lower the `llvm.coro.create.frame` to compute the true 
>> address for the coroutine frame and store the raw frame address.
>> If no, we could return `%allocated` simply.
>
> Yes, it is similar to CoroFrame.cpp changes in D102147 
> .

I guess you refer to the wrong revision. Since I don't find related things in 
D102147 .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-07 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2861178 , @ChuanqiXu wrote:

> In D97915#2861036 , @ychen wrote:
>
>> In D97915#2860984 , @ChuanqiXu 
>> wrote:
>>
>>> In D97915#2860916 , @ychen wrote:
>>>
 In D97915#2859237 , @ChuanqiXu 
 wrote:

> In D97915#2848816 , @ychen wrote:
>
>>> Thanks for clarifying. Let's solve the semantics problem first.
>>> With the introduction about 'raw frame', I think it's necessary to 
>>> introduce this concept in the section 'Switched-Resume Lowering' or 
>>> even the section 'Introduction' in the document. Add a section to tell 
>>> the terminology is satisfied too.
>>
>> Done.
>>
>>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
>>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the 
>>> same value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are 
>>> trying to solve the problem about memory leak. But I think we could use 
>>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame 
>>> (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can 
>>> omit a field in the frame to save space.
>>
>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame 
>> address instead of raw frame pointer)
>>
>> Apologies for the confusion. I've briefly explained it here 
>> https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
>> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a 
>> coroutine frame field storing the `raw frame pointer`" only after 
>> `insertSpills` in CoroFrame.cpp. Before that, 
>> "llvm.coro.raw.frame.ptr.addr" is actually an alloca storing the `raw 
>> frame pointer` (try grepping "alloc.frame.ptr" in this review page). 
>> Using  "llvm.coro.raw.frame.ptr.offset" instead of  
>> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please 
>> check line 31. The downside is that the write to coroutine frame is not 
>> through an alloca but a direct write. It is unusual because all fields 
>> in the frame are stored as 1. special/header fields 2. alloca 3. 
>> splills. Doing the write indirectly as Alloca makes me comfortable. The 
>> tradeoff is one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do 
>> you think?
>>
>>   19 coro.alloc.align: ; preds = 
>> %coro.alloc.check.align
>>   20   %3 = sub nsw i64 64, 16
>>   21   %4 = add i64 128, %3
>>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>>   23   %mask = sub i64 64, 1
>>   24   %intptr = ptrtoint i8* %call1 to i64
>>   25   %over_boundary = add i64 %intptr, %mask
>>   26   %inverted_mask = xor i64 %mask, -1
>>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>>   28   %diff = sub i64 %aligned_intptr, %intptr
>>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, 
>> i64 64) ]
>>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8  
>>
>>   
>>; Replace line 31 with below, and must makes sure line 46~line 48 
>> is skipped.
>>; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>>; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 
>> %poff
>>; %addr1 = bitcast i8* %addr to i8**
>>; store i8* %call1, i8** %addr1, align 8
>>   
>>   
>>   32   br label %coro.init.from.coro.alloc.align
>>   33
>>   34 coro.init.from.coro.alloc.align:  ; preds = 
>> %coro.alloc.align
>>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
>> %coro.alloc.align ]
>>   36   br label %coro.init
>>   37
>>   38 coro.init:; preds = 
>> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>>  o.init.from.coro.alloc
>>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
>> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>>  .coro.init, %coro.init.from.coro.alloc.align ]
>>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>> %FramePtr, i32 0, i32 0
>>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** 
>> %resume.addr, align 8
>>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
>> (%f0.Frame*)* @f0.cleanup
>>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>> %FramePtr, i32 0,

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-07 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2861036 , @ychen wrote:

> In D97915#2860984 , @ChuanqiXu wrote:
>
>> In D97915#2860916 , @ychen wrote:
>>
>>> In D97915#2859237 , @ChuanqiXu 
>>> wrote:
>>>
 In D97915#2848816 , @ychen wrote:

>> Thanks for clarifying. Let's solve the semantics problem first.
>> With the introduction about 'raw frame', I think it's necessary to 
>> introduce this concept in the section 'Switched-Resume Lowering' or even 
>> the section 'Introduction' in the document. Add a section to tell the 
>> terminology is satisfied too.
>
> Done.
>
>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same 
>> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying 
>> to solve the problem about memory leak. But I think we could use 
>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame 
>> (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can 
>> omit a field in the frame to save space.
>
> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame 
> address instead of raw frame pointer)
>
> Apologies for the confusion. I've briefly explained it here 
> https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a 
> coroutine frame field storing the `raw frame pointer`" only after 
> `insertSpills` in CoroFrame.cpp. Before that, 
> "llvm.coro.raw.frame.ptr.addr" is actually an alloca storing the `raw 
> frame pointer` (try grepping "alloc.frame.ptr" in this review page). 
> Using  "llvm.coro.raw.frame.ptr.offset" instead of  
> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please 
> check line 31. The downside is that the write to coroutine frame is not 
> through an alloca but a direct write. It is unusual because all fields in 
> the frame are stored as 1. special/header fields 2. alloca 3. splills. 
> Doing the write indirectly as Alloca makes me comfortable. The tradeoff 
> is one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?
>
>   19 coro.alloc.align: ; preds = 
> %coro.alloc.check.align
>   20   %3 = sub nsw i64 64, 16
>   21   %4 = add i64 128, %3
>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>   23   %mask = sub i64 64, 1
>   24   %intptr = ptrtoint i8* %call1 to i64
>   25   %over_boundary = add i64 %intptr, %mask
>   26   %inverted_mask = xor i64 %mask, -1
>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>   28   %diff = sub i64 %aligned_intptr, %intptr
>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 
> 64) ]
>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8   
>   
>   
>; Replace line 31 with below, and must makes sure line 46~line 48 
> is skipped.
>; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
>; %addr1 = bitcast i8* %addr to i8**
>; store i8* %call1, i8** %addr1, align 8
>   
>   
>   32   br label %coro.init.from.coro.alloc.align
>   33
>   34 coro.init.from.coro.alloc.align:  ; preds = 
> %coro.alloc.align
>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
> %coro.alloc.align ]
>   36   br label %coro.init
>   37
>   38 coro.init:; preds = 
> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>  o.init.from.coro.alloc
>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>  .coro.init, %coro.init.from.coro.alloc.align ]
>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
> %FramePtr, i32 0, i32 0
>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** 
> %resume.addr, align 8
>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
> (%f0.Frame*)* @f0.cleanup
>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
> %FramePtr, i32 0, i32 1
>   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, 
> align 8
>   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 
> 0, i32

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-06 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2860984 , @ChuanqiXu wrote:

> In D97915#2860916 , @ychen wrote:
>
>> In D97915#2859237 , @ChuanqiXu 
>> wrote:
>>
>>> In D97915#2848816 , @ychen wrote:
>>>
> Thanks for clarifying. Let's solve the semantics problem first.
> With the introduction about 'raw frame', I think it's necessary to 
> introduce this concept in the section 'Switched-Resume Lowering' or even 
> the section 'Introduction' in the document. Add a section to tell the 
> terminology is satisfied too.

 Done.

> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same 
> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying 
> to solve the problem about memory leak. But I think we could use 
> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame 
> (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can 
> omit a field in the frame to save space.

 ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame 
 address instead of raw frame pointer)

 Apologies for the confusion. I've briefly explained it here 
 https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
 "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine 
 frame field storing the `raw frame pointer`" only after `insertSpills` in 
 CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an 
 alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in 
 this review page). Using  "llvm.coro.raw.frame.ptr.offset" instead of  
 "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please 
 check line 31. The downside is that the write to coroutine frame is not 
 through an alloca but a direct write. It is unusual because all fields in 
 the frame are stored as 1. special/header fields 2. alloca 3. splills. 
 Doing the write indirectly as Alloca makes me comfortable. The tradeoff is 
 one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?

   19 coro.alloc.align: ; preds = 
 %coro.alloc.check.align
   20   %3 = sub nsw i64 64, 16
   21   %4 = add i64 128, %3
   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
   23   %mask = sub i64 64, 1
   24   %intptr = ptrtoint i8* %call1 to i64
   25   %over_boundary = add i64 %intptr, %mask
   26   %inverted_mask = xor i64 %mask, -1
   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
   28   %diff = sub i64 %aligned_intptr, %intptr
   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 
 64) ]
   31   store i8* %call1, i8** %alloc.frame.ptr, align 8
  
   
; Replace line 31 with below, and must makes sure line 46~line 48 
 is skipped.
; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
; %addr1 = bitcast i8* %addr to i8**
; store i8* %call1, i8** %addr1, align 8
   
   
   32   br label %coro.init.from.coro.alloc.align
   33
   34 coro.init.from.coro.alloc.align:  ; preds = 
 %coro.alloc.align
   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
 %coro.alloc.align ]
   36   br label %coro.init
   37
   38 coro.init:; preds = 
 %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
  o.init.from.coro.alloc
   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
 %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
  .coro.init, %coro.init.from.coro.alloc.align ]
   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
 %FramePtr, i32 0, i32 0
   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** 
 %resume.addr, align 8
   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
 (%f0.Frame*)* @f0.cleanup
   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
 %FramePtr, i32 0, i32 1
   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, 
 align 8
   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, 
 i32 2
   47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
   48   store i8* %8, i8** %7, align 8
   49   br label %AllocaSpillBB
   50
   51 AllocaSpillBB:

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-06 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2860916 , @ychen wrote:

> In D97915#2859237 , @ChuanqiXu wrote:
>
>> In D97915#2848816 , @ychen wrote:
>>
 Thanks for clarifying. Let's solve the semantics problem first.
 With the introduction about 'raw frame', I think it's necessary to 
 introduce this concept in the section 'Switched-Resume Lowering' or even 
 the section 'Introduction' in the document. Add a section to tell the 
 terminology is satisfied too.
>>>
>>> Done.
>>>
 Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same 
 value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying 
 to solve the problem about memory leak. But I think we could use 
 llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame 
 (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can 
 omit a field in the frame to save space.
>>>
>>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address 
>>> instead of raw frame pointer)
>>>
>>> Apologies for the confusion. I've briefly explained it here 
>>> https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
>>> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine 
>>> frame field storing the `raw frame pointer`" only after `insertSpills` in 
>>> CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an 
>>> alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in 
>>> this review page). Using  "llvm.coro.raw.frame.ptr.offset" instead of  
>>> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please 
>>> check line 31. The downside is that the write to coroutine frame is not 
>>> through an alloca but a direct write. It is unusual because all fields in 
>>> the frame are stored as 1. special/header fields 2. alloca 3. splills. 
>>> Doing the write indirectly as Alloca makes me comfortable. The tradeoff is 
>>> one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?
>>>
>>>   19 coro.alloc.align: ; preds = 
>>> %coro.alloc.check.align
>>>   20   %3 = sub nsw i64 64, 16
>>>   21   %4 = add i64 128, %3
>>>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>>>   23   %mask = sub i64 64, 1
>>>   24   %intptr = ptrtoint i8* %call1 to i64
>>>   25   %over_boundary = add i64 %intptr, %mask
>>>   26   %inverted_mask = xor i64 %mask, -1
>>>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>>>   28   %diff = sub i64 %aligned_intptr, %intptr
>>>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>>>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 
>>> 64) ]
>>>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8 
>>>   
>>>; Replace line 31 with below, and must makes sure line 46~line 48 is 
>>> skipped.
>>>; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>>>; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
>>>; %addr1 = bitcast i8* %addr to i8**
>>>; store i8* %call1, i8** %addr1, align 8
>>>   
>>>   
>>>   32   br label %coro.init.from.coro.alloc.align
>>>   33
>>>   34 coro.init.from.coro.alloc.align:  ; preds = 
>>> %coro.alloc.align
>>>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
>>> %coro.alloc.align ]
>>>   36   br label %coro.init
>>>   37
>>>   38 coro.init:; preds = 
>>> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>>>  o.init.from.coro.alloc
>>>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
>>> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>>>  .coro.init, %coro.init.from.coro.alloc.align ]
>>>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>>>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 0
>>>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** 
>>> %resume.addr, align 8
>>>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
>>> (%f0.Frame*)* @f0.cleanup
>>>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 1
>>>   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, 
>>> align 8
>>>   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, 
>>> i32 2
>>>   47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
>>>   48   store i8* %8, i8** %7, align 8
>>>   49   br label %AllocaSpillBB
>>>   50
>>>   51 AllocaSpillBB:; preds = %coro.init
>>>   52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 4
>>>   53   %ref.tmp.reload.addr

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-06 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2859237 , @ChuanqiXu wrote:

> In D97915#2848816 , @ychen wrote:
>
>>> Thanks for clarifying. Let's solve the semantics problem first.
>>> With the introduction about 'raw frame', I think it's necessary to 
>>> introduce this concept in the section 'Switched-Resume Lowering' or even 
>>> the section 'Introduction' in the document. Add a section to tell the 
>>> terminology is satisfied too.
>>
>> Done.
>>
>>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
>>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same 
>>> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying to 
>>> solve the problem about memory leak. But I think we could use 
>>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame 
>>> (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can omit 
>>> a field in the frame to save space.
>>
>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address 
>> instead of raw frame pointer)
>>
>> Apologies for the confusion. I've briefly explained it here 
>> https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
>> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine 
>> frame field storing the `raw frame pointer`" only after `insertSpills` in 
>> CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an 
>> alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in 
>> this review page). Using  "llvm.coro.raw.frame.ptr.offset" instead of  
>> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please 
>> check line 31. The downside is that the write to coroutine frame is not 
>> through an alloca but a direct write. It is unusual because all fields in 
>> the frame are stored as 1. special/header fields 2. alloca 3. splills. Doing 
>> the write indirectly as Alloca makes me comfortable. The tradeoff is one 
>> extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?
>>
>>   19 coro.alloc.align: ; preds = 
>> %coro.alloc.check.align
>>   20   %3 = sub nsw i64 64, 16
>>   21   %4 = add i64 128, %3
>>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>>   23   %mask = sub i64 64, 1
>>   24   %intptr = ptrtoint i8* %call1 to i64
>>   25   %over_boundary = add i64 %intptr, %mask
>>   26   %inverted_mask = xor i64 %mask, -1
>>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>>   28   %diff = sub i64 %aligned_intptr, %intptr
>>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 
>> 64) ]
>>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8 
>>   
>>; Replace line 31 with below, and must makes sure line 46~line 48 is 
>> skipped.
>>; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>>; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
>>; %addr1 = bitcast i8* %addr to i8**
>>; store i8* %call1, i8** %addr1, align 8
>>   
>>   
>>   32   br label %coro.init.from.coro.alloc.align
>>   33
>>   34 coro.init.from.coro.alloc.align:  ; preds = 
>> %coro.alloc.align
>>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
>> %coro.alloc.align ]
>>   36   br label %coro.init
>>   37
>>   38 coro.init:; preds = 
>> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>>  o.init.from.coro.alloc
>>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
>> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>>  .coro.init, %coro.init.from.coro.alloc.align ]
>>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>> %FramePtr, i32 0, i32 0
>>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** 
>> %resume.addr, align 8
>>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
>> (%f0.Frame*)* @f0.cleanup
>>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>> %FramePtr, i32 0, i32 1
>>   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, align 
>> 8
>>   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, 
>> i32 2
>>   47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
>>   48   store i8* %8, i8** %7, align 8
>>   49   br label %AllocaSpillBB
>>   50
>>   51 AllocaSpillBB:; preds = %coro.init
>>   52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>> %FramePtr, i32 0, i32 4
>>   53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>> %FramePtr, i32 0, i32 5
>>   54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>> %FramePtr, i32 0,

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-06 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 356838.
ychen marked 2 inline comments as done.
ychen added a comment.

- Move `raw frame` description to `Switched-Resume` section.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
@@ -0,0 +1,78 @@
+; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
+; `@llvm.coro.raw.frame.ptr.alloca` are lowered correctly.
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*, i32, i32, i8**)
+declare void @consume2(i32, i32)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  %align = call i32 @llvm.coro.align.i32()
+  %offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  %addr = call i8** @llvm.coro.raw.frame.ptr.addr()
+  call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  br label %cleanup
+
+cleanup:
+  %align2 = call i32 @llvm.coro.align.i32()
+  %offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  call void @consume2(i32 %align2, i32 %offset2)
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the raw frame pointer was inserted into the frame.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we used correct index to access frame addr field (field 2).
+; CHECK-LABEL: @f(
+; CHECK: %alloc.frame.ptr = alloca i8*, align 8
+; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
+; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
+; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
+; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
+; CHECK: ret i8*
+
+; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
+; correctly during deallocation.
+; CHECK-LABEL: @f.destroy(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8* %{{.*}})
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8*
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.align.i32()
+declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
+declare i8** @llvm.coro.raw.frame.ptr.addr()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare double @print(double)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,9 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
+  Shape.CoroRawFramePtrAddrs.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +271,15 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_addr:
+

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-06 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2848816 , @ychen wrote:

>> Thanks for clarifying. Let's solve the semantics problem first.
>> With the introduction about 'raw frame', I think it's necessary to introduce 
>> this concept in the section 'Switched-Resume Lowering' or even the section 
>> 'Introduction' in the document. Add a section to tell the terminology is 
>> satisfied too.
>
> Done.
>
>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same 
>> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying to 
>> solve the problem about memory leak. But I think we could use 
>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (Maybe 
>> we need to add an intrinsic `llvm.coro.raw.size`).  Then we can omit a field 
>> in the frame to save space.
>
> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address 
> instead of raw frame pointer)
>
> Apologies for the confusion. I've briefly explained it here 
> https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine 
> frame field storing the `raw frame pointer`" only after `insertSpills` in 
> CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an 
> alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in 
> this review page). Using  "llvm.coro.raw.frame.ptr.offset" instead of  
> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please check 
> line 31. The downside is that the write to coroutine frame is not through an 
> alloca but a direct write. It is unusual because all fields in the frame are 
> stored as 1. special/header fields 2. alloca 3. splills. Doing the write 
> indirectly as Alloca makes me comfortable. The tradeoff is one extra 
> intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?
>
>   19 coro.alloc.align: ; preds = 
> %coro.alloc.check.align
>   20   %3 = sub nsw i64 64, 16
>   21   %4 = add i64 128, %3
>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>   23   %mask = sub i64 64, 1
>   24   %intptr = ptrtoint i8* %call1 to i64
>   25   %over_boundary = add i64 %intptr, %mask
>   26   %inverted_mask = xor i64 %mask, -1
>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>   28   %diff = sub i64 %aligned_intptr, %intptr
>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 64) 
> ]
>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8 
>   
>; Replace line 31 with below, and must makes sure line 46~line 48 is 
> skipped.
>; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
>; %addr1 = bitcast i8* %addr to i8**
>; store i8* %call1, i8** %addr1, align 8
>   
>   
>   32   br label %coro.init.from.coro.alloc.align
>   33
>   34 coro.init.from.coro.alloc.align:  ; preds = 
> %coro.alloc.align
>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
> %coro.alloc.align ]
>   36   br label %coro.init
>   37
>   38 coro.init:; preds = 
> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>  o.init.from.coro.alloc
>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>  .coro.init, %coro.init.from.coro.alloc.align ]
>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, 
> i32 0, i32 0
>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** %resume.addr, 
> align 8
>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
> (%f0.Frame*)* @f0.cleanup
>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
> %FramePtr, i32 0, i32 1
>   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, align 8
>   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, 
> i32 2
>   47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
>   48   store i8* %8, i8** %7, align 8
>   49   br label %AllocaSpillBB
>   50
>   51 AllocaSpillBB:; preds = %coro.init
>   52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
> %FramePtr, i32 0, i32 4
>   53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
> %FramePtr, i32 0, i32 5
>   54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
> %FramePtr, i32 0, i32 6
>   55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
> %FramePtr, i32 0, i32 7
>   56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame,

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-07-01 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

I am a little busy this week. I would try to look into this next week if 
possible.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 355401.
ychen added a comment.

- Add description of `raw frame` in `Coroutines.rst`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
@@ -0,0 +1,78 @@
+; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
+; `@llvm.coro.raw.frame.ptr.alloca` are lowered correctly.
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*, i32, i32, i8**)
+declare void @consume2(i32, i32)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  %align = call i32 @llvm.coro.align.i32()
+  %offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  %addr = call i8** @llvm.coro.raw.frame.ptr.addr()
+  call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  br label %cleanup
+
+cleanup:
+  %align2 = call i32 @llvm.coro.align.i32()
+  %offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  call void @consume2(i32 %align2, i32 %offset2)
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the raw frame pointer was inserted into the frame.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we used correct index to access frame addr field (field 2).
+; CHECK-LABEL: @f(
+; CHECK: %alloc.frame.ptr = alloca i8*, align 8
+; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
+; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
+; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
+; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
+; CHECK: ret i8*
+
+; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
+; correctly during deallocation.
+; CHECK-LABEL: @f.destroy(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8* %{{.*}})
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8*
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.align.i32()
+declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
+declare i8** @llvm.coro.raw.frame.ptr.addr()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare double @print(double)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,9 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
+  Shape.CoroRawFramePtrAddrs.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +271,15 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_addr:
+CoroRawFramePtrAddrs.push_back(cast(II));
+break;

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

> Thanks for clarifying. Let's solve the semantics problem first.
> With the introduction about 'raw frame', I think it's necessary to introduce 
> this concept in the section 'Switched-Resume Lowering' or even the section 
> 'Introduction' in the document. Add a section to tell the terminology is 
> satisfied too.

Done.

> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same 
> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying to 
> solve the problem about memory leak. But I think we could use 
> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (Maybe 
> we need to add an intrinsic `llvm.coro.raw.size`).  Then we can omit a field 
> in the frame to save space.

("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address 
instead of raw frame pointer)

Apologies for the confusion. I've briefly explained it here 
https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
"llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine 
frame field storing the `raw frame pointer`" only after `insertSpills` in 
CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an 
alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in this 
review page). Using  "llvm.coro.raw.frame.ptr.offset" instead of  
"llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please check 
line 31. The downside is that the write to coroutine frame is not through an 
alloca but a direct write. It is unusual because all fields in the frame are 
stored as 1. special/header fields 2. alloca 3. splills. Doing the write 
indirectly as Alloca makes me comfortable. The tradeoff is one extra intrinsic 
"llvm.coro.raw.frame.ptr.addr". What do you think?

  19 coro.alloc.align: ; preds = 
%coro.alloc.check.align
  20   %3 = sub nsw i64 64, 16
  21   %4 = add i64 128, %3
  22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
  23   %mask = sub i64 64, 1
  24   %intptr = ptrtoint i8* %call1 to i64
  25   %over_boundary = add i64 %intptr, %mask
  26   %inverted_mask = xor i64 %mask, -1
  27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
  28   %diff = sub i64 %aligned_intptr, %intptr
  29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
  30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 64) ]
  31   store i8* %call1, i8** %alloc.frame.ptr, align 8 
  
   ; Replace line 31 with below, and must makes sure line 46~line 48 is 
skipped.
   ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
   ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
   ; %addr1 = bitcast i8* %addr to i8**
   ; store i8* %call1, i8** %addr1, align 8
  
  
  32   br label %coro.init.from.coro.alloc.align
  33
  34 coro.init.from.coro.alloc.align:  ; preds = 
%coro.alloc.align
  35   %aligned_result.coro.init = phi i8* [ %aligned_result, %coro.alloc.align 
]
  36   br label %coro.init
  37
  38 coro.init:; preds = 
%coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
 o.init.from.coro.alloc
  39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ %call.coro.init, 
%coro.init.from.coro.alloc ], [ %aligned_result
 .coro.init, %coro.init.from.coro.alloc.align ]
  40   %FramePtr = bitcast i8* %5 to %f0.Frame*
  41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, 
i32 0, i32 0
  42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** %resume.addr, 
align 8
  43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void (%f0.Frame*)* 
@f0.cleanup
  44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, 
i32 0, i32 1
  45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, align 8
  46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 2
  47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
  48   store i8* %8, i8** %7, align 8
  49   br label %AllocaSpillBB
  50
  51 AllocaSpillBB:; preds = %coro.init
  52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, 
i32 0, i32 4
  53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
%FramePtr, i32 0, i32 5
  54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
%FramePtr, i32 0, i32 6
  55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
%FramePtr, i32 0, i32 7
  56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
%FramePtr, i32 0, i32 8
  57   %__promise.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
%FramePtr, i32 0, i32 10
  58   br label %PostSpill




> Then I am a little confused for the design again, since we would treat the 
> value for

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-24 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2835147 , @ychen wrote:

> In D97915#2832667 , @ChuanqiXu wrote:
>
>> In D97915#2832446 , @ychen wrote:
>>
>>> In D97915#2829581 , @ChuanqiXu 
>>> wrote:
>>>
 A remained question.

 - what's the semantics if user specified their allocation/deallocation 
 functions?

 Previously, we discussed for the ::operator new and ::operator delete. But 
 what would we do for user specified allocation/deallocation functions?
 It looks like we would treat them just like `::operator new`. And it makes 
 sense at the first glance. But the problem comes from that we judge whether
 or not to over allocate a frame by this condition:

   coro.align > align of new

 But if the user uses their new, it makes no sense to use the `align of 
 new` as the condition. On the other hand, if user specified their new 
 function and the 
 alignment requirement for their promise type, would it be better really 
 that the compiler do the extra transformation?

 May be we need to discuss with other guys who are more familiar with the 
 C++ standard to answer this.
>>>
>>> I think @rjmccall could answer these. My understanding is that 
>>> user-specified allocation/deallocation has the same semantics as their 
>>> standard library's counterparts. `align of new` (aka 
>>> __STDCPP_DEFAULT_NEW_ALIGNMENT__) should apply to both.
>>
>> Yeah, I just curious about the question and not sure about the answer yet. I 
>> agree with that it should be safe if we treat user-specified 
>> allocation/deallocation as ::operator new. Maybe I am a little bit of 
>> pedantry. I just not sure if the developer would be satisfied when they find 
>> we allocate padding space they didn't want when they offered a new/delete 
>> method. (Maybe I am too anxious).
>>
>> ---
>>
>> Another problem I find in this patch is that the introduction for `raw 
>> frame` makes the frame confusing. For example, the following codes is aim to 
>> calculate the size for the over allocated  frame:
>>
>>   %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
>>   %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
>>   %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
>>
>> It makes sense only if `llvm.coro.size` stands for the size of 'true frame' 
>> (I am not sure what's the meaning for raw frame now. Let me use 'true frame' 
>> temporarily). But the document now says that '@llvm.coro.size' returns the 
>> size of the frame. It's confusing
>> I am not require to fix it by rename 'llvm.coro.size' or rephrase the 
>> document simply. I am thinking about the the semantics of coroutine 
>> intrinsics after we introduce the concept of 'raw frame'.
>
> These are great points. The semantics of some coroutine intrinsics needs a 
> little bit of tweaking otherwise they are confusing with the introduction of 
> `raw frame` (suggestions for alternative names are much appreciated) which I 
> defined in the updated patch (Coroutines.rst). Docs for `llvm.coro.size` 
> mentions `coroutine frame` which I've used verbatim in the docs for 
> `llvm.coro.raw.frame.ptr.*`.
> Using your diagram below: `raw frame` is  `| --- for align --- | --- true 
> frame --- |`. `Coroutine frame` is `| --- true frame --- |`. (BTW, 
> `llvm.coro.begin` docs are stale which I updated in the patch, please take a 
> look @ChuanqiXu @rjmccall @lxfind).

Thanks for clarifying. Let's solve the semantics problem first.
With the introduction about 'raw frame', I think it's necessary to introduce 
this concept in the section 'Switched-Resume Lowering' or even the section 
'Introduction' in the document. Add a section to tell the terminology is 
satisfied too.

Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same value 
finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying to solve the 
problem about memory leak. But I think we could use 
llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (Maybe we 
need to add an intrinsic `llvm.coro.raw.size`).  Then we can omit a field in 
the frame to save space.

Then I am a little confused for the design again, since we would treat the 
value for CoroBegin as the address of coroutine frame in the past and it looks 
like to be the raw frame now. Let me reconsider if it is OK.

Comment at: llvm/docs/Coroutines.rst:963
+
+The '``llvm.coro.align``' intrinsic returns the alignment of the coroutine 
frame
+in bytes.

> '``llvm.coro.align``'
`llvm.coro.align`

Comment at: llvm/docs/Coroutines.rst:1054

 The second argument is a pointer to a block of memory where coroutine frame
 will be stored if it is allocated dynamically.  This pointer is

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-23 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 353855.
ychen added a comment.

- Update intrinsics documentation.
- Inline `EmitBuiltinAlignTo` as `emitAlignUpTo`.
- Address other comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
@@ -0,0 +1,78 @@
+; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
+; `@llvm.coro.raw.frame.ptr.alloca` are lowered correctly.
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*, i32, i32, i8**)
+declare void @consume2(i32, i32)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  %align = call i32 @llvm.coro.align.i32()
+  %offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  %addr = call i8** @llvm.coro.raw.frame.ptr.addr()
+  call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  br label %cleanup
+
+cleanup:
+  %align2 = call i32 @llvm.coro.align.i32()
+  %offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  call void @consume2(i32 %align2, i32 %offset2)
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the raw frame address was inserted into the frame.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we used correct index to access frame addr field (field 2).
+; CHECK-LABEL: @f(
+; CHECK: %alloc.frame.ptr = alloca i8*, align 8
+; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
+; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
+; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
+; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
+; CHECK: ret i8*
+
+; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
+; correctly during deallocation.
+; CHECK-LABEL: @f.destroy(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8* %{{.*}})
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8*
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.align.i32()
+declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
+declare i8** @llvm.coro.raw.frame.ptr.addr()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare double @print(double)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,9 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
+  Shape.CoroRawFramePtrAddrs.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +271,15 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_addr:
+

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-23 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2832667 , @ChuanqiXu wrote:

> In D97915#2832446 , @ychen wrote:
>
>> In D97915#2829581 , @ChuanqiXu 
>> wrote:
>>
>>> A remained question.
>>>
>>> - what's the semantics if user specified their allocation/deallocation 
>>> functions?
>>>
>>> Previously, we discussed for the ::operator new and ::operator delete. But 
>>> what would we do for user specified allocation/deallocation functions?
>>> It looks like we would treat them just like `::operator new`. And it makes 
>>> sense at the first glance. But the problem comes from that we judge whether
>>> or not to over allocate a frame by this condition:
>>>
>>>   coro.align > align of new
>>>
>>> But if the user uses their new, it makes no sense to use the `align of new` 
>>> as the condition. On the other hand, if user specified their new function 
>>> and the 
>>> alignment requirement for their promise type, would it be better really 
>>> that the compiler do the extra transformation?
>>>
>>> May be we need to discuss with other guys who are more familiar with the 
>>> C++ standard to answer this.
>>
>> I think @rjmccall could answer these. My understanding is that 
>> user-specified allocation/deallocation has the same semantics as their 
>> standard library's counterparts. `align of new` (aka 
>> __STDCPP_DEFAULT_NEW_ALIGNMENT__) should apply to both.
>
> Yeah, I just curious about the question and not sure about the answer yet. I 
> agree with that it should be safe if we treat user-specified 
> allocation/deallocation as ::operator new. Maybe I am a little bit of 
> pedantry. I just not sure if the developer would be satisfied when they find 
> we allocate padding space they didn't want when they offered a new/delete 
> method. (Maybe I am too anxious).
>
> ---
>
> Another problem I find in this patch is that the introduction for `raw frame` 
> makes the frame confusing. For example, the following codes is aim to 
> calculate the size for the over allocated  frame:
>
>   %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
>   %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
>   %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
>
> It makes sense only if `llvm.coro.size` stands for the size of 'true frame' 
> (I am not sure what's the meaning for raw frame now. Let me use 'true frame' 
> temporarily). But the document now says that '@llvm.coro.size' returns the 
> size of the frame. It's confusing
> I am not require to fix it by rename 'llvm.coro.size' or rephrase the 
> document simply. I am thinking about the the semantics of coroutine 
> intrinsics after we introduce the concept of 'raw frame'.

These are great points. The semantics of some coroutine intrinsics needs a 
little bit of tweaking otherwise they are confusing with the introduction of 
`raw frame` (suggestions for alternative names are much appreciated) which I 
defined in the updated patch (Coroutines.rst). Docs for `llvm.coro.size` 
mentions `coroutine frame` which I've used verbatim in the docs for 
`llvm.coro.raw.frame.ptr.*`.
Using your diagram below: `raw frame` is  `| --- for align --- | --- true frame 
--- |`. `Coroutine frame` is `| --- true frame --- |`. (BTW, `llvm.coro.begin` 
docs are stale which I updated in the patch, please take a look @ChuanqiXu 
@rjmccall @lxfind).




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:452-461
+  llvm::Function *RawFramePtrOffsetIntrin = CGF.CGM.getIntrinsic(
+  llvm::Intrinsic::coro_raw_frame_ptr_offset, CGF.Int32Ty);
+  auto *RawFramePtrOffset = CGF.Builder.CreateCall(RawFramePtrOffsetIntrin);
+  auto *FramePtrAddrStart =
+  CGF.Builder.CreateInBoundsGEP(CoroFree, {RawFramePtrOffset});
+  auto *FramePtrAddr = CGF.Builder.CreatePointerCast(
+  FramePtrAddrStart, CGF.Int8PtrTy->getPointerTo());

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > We allocate overaligned-frame like:
> > > ```
> > > | --- for align --- | --- true frame --- |
> > > ```
> > > And here we set the argument to the address of true frame. Then I wonder 
> > > how about the memory for the `for align` part. Would we still free them 
> > > correctly? Maybe I missed something.
> > > Would we still free them correctly? 
> > Yes, that's the tricky part. Using `f0` of `coro-alloc.cpp` as an example, 
> > `llvm.coro.raw.frame.ptr.addr` is called at alloc time to save the raw 
> > memory pointer to the coroutine frame.  Later at dealloc time, 
> > `llvm.coro.raw.frame.ptr.addr` is called again to load the raw memory 
> > pointer back and free it.
> To make it clear, what's the definition for 'raw ptr'? From the context, I 
> think it means the true frame in above diagram from the context.
> 
> So this confuses me:
> > llvm.coro.raw.frame.ptr.addr is called again to load the raw memory pointer 
> > back and free it.
> 
> If the raw memory means the true frame, it may

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-22 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2832446 , @ychen wrote:

> In D97915#2829581 , @ChuanqiXu wrote:
>
>> A remained question.
>>
>> - what's the semantics if user specified their allocation/deallocation 
>> functions?
>>
>> Previously, we discussed for the ::operator new and ::operator delete. But 
>> what would we do for user specified allocation/deallocation functions?
>> It looks like we would treat them just like `::operator new`. And it makes 
>> sense at the first glance. But the problem comes from that we judge whether
>> or not to over allocate a frame by this condition:
>>
>>   coro.align > align of new
>>
>> But if the user uses their new, it makes no sense to use the `align of new` 
>> as the condition. On the other hand, if user specified their new function 
>> and the 
>> alignment requirement for their promise type, would it be better really that 
>> the compiler do the extra transformation?
>>
>> May be we need to discuss with other guys who are more familiar with the C++ 
>> standard to answer this.
>
> I think @rjmccall could answer these. My understanding is that user-specified 
> allocation/deallocation has the same semantics as their standard library's 
> counterparts. `align of new` (aka __STDCPP_DEFAULT_NEW_ALIGNMENT__) should 
> apply to both.

Yeah, I just curious about the question and not sure about the answer yet. I 
agree with that it should be safe if we treat user-specified 
allocation/deallocation as ::operator new. Maybe I am a little bit of pedantry. 
I just not sure if the developer would be satisfied when they find we allocate 
padding space they didn't want when they offered a new/delete method. (Maybe I 
am too anxious).

---

Another problem I find in this patch is that the introduction for `raw frame` 
makes the frame confusing. For example, the following codes is aim to calculate 
the size for the over allocated  frame:

  %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
  %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
  %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])

It makes sense only if `llvm.coro.size` stands for the size of 'true frame' (I 
am not sure what's the meaning for raw frame now. Let me use 'true frame' 
temporarily). But the document now says that '@llvm.coro.size' returns the size 
of the frame. It's confusing
I am not require to fix it by rename 'llvm.coro.size' or rephrase the document 
simply. I am thinking about the the semantics of coroutine intrinsics after we 
introduce the concept of 'raw frame'.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:424
+// dynamically adjust the frame start address at runtime.
+void OverAllocateFrame(CodeGenFunction , llvm::CallInst *CI, bool IsAlloc) 
{
+  unsigned CoroSizeIdx = IsAlloc ? 0 : 1;

So if this would be called for deallocate. the function name is confusing. I 
think it may be better to rename it as something like 
'GeSizeOFtOverAlignedFrame' (The name suggested looks not good too).
By the way, now I am wondering why wouldn't we use llvm.coro.size directly? And 
make the middle end to handle it. How do you think about it?

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:719
+  auto *AlignedUpAddr =
+  EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign, S.getAllocate(), 
true);
+  // rawFrame = frame

How do your think about to replace EmitBuiltinAlignTo inplace?

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:452-461
+  llvm::Function *RawFramePtrOffsetIntrin = CGF.CGM.getIntrinsic(
+  llvm::Intrinsic::coro_raw_frame_ptr_offset, CGF.Int32Ty);
+  auto *RawFramePtrOffset = CGF.Builder.CreateCall(RawFramePtrOffsetIntrin);
+  auto *FramePtrAddrStart =
+  CGF.Builder.CreateInBoundsGEP(CoroFree, {RawFramePtrOffset});
+  auto *FramePtrAddr = CGF.Builder.CreatePointerCast(
+  FramePtrAddrStart, CGF.Int8PtrTy->getPointerTo());

ychen wrote:
> ChuanqiXu wrote:
> > We allocate overaligned-frame like:
> > ```
> > | --- for align --- | --- true frame --- |
> > ```
> > And here we set the argument to the address of true frame. Then I wonder 
> > how about the memory for the `for align` part. Would we still free them 
> > correctly? Maybe I missed something.
> > Would we still free them correctly? 
> Yes, that's the tricky part. Using `f0` of `coro-alloc.cpp` as an example, 
> `llvm.coro.raw.frame.ptr.addr` is called at alloc time to save the raw memory 
> pointer to the coroutine frame.  Later at dealloc time, 
> `llvm.coro.raw.frame.ptr.addr` is called again to load the raw memory pointer 
> back and free it.
To make it clear, what's the definition for 'raw ptr'? From the context, I 
think it means the true frame in above diagram from the context.

So this confuses me:
> llvm.coro.raw.frame.ptr.addr is called again to load the raw memory pointer 
> back and free it.

If the raw memory means the true

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-22 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 353567.
ychen marked an inline comment as done.
ychen added a comment.

- Address comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
@@ -0,0 +1,78 @@
+; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
+; `@llvm.coro.raw.frame.ptr.alloca` are lowered correctly.
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*, i32, i32, i8**)
+declare void @consume2(i32, i32)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  %align = call i32 @llvm.coro.align.i32()
+  %offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  %addr = call i8** @llvm.coro.raw.frame.ptr.addr()
+  call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  br label %cleanup
+
+cleanup:
+  %align2 = call i32 @llvm.coro.align.i32()
+  %offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  call void @consume2(i32 %align2, i32 %offset2)
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the raw frame address was inserted into the frame.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we used correct index to access frame addr field (field 2).
+; CHECK-LABEL: @f(
+; CHECK: %alloc.frame.ptr = alloca i8*, align 8
+; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
+; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
+; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
+; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
+; CHECK: ret i8*
+
+; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
+; correctly during deallocation.
+; CHECK-LABEL: @f.destroy(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8* %{{.*}})
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8*
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.align.i32()
+declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
+declare i8** @llvm.coro.raw.frame.ptr.addr()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare double @print(double)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,9 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
+  Shape.CoroRawFramePtrAddrs.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +271,15 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
+  case

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-22 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen marked 5 inline comments as done.
ychen added a comment.

In D97915#2829581 , @ChuanqiXu wrote:

> A remained question.
>
> - what's the semantics if user specified their allocation/deallocation 
> functions?
>
> Previously, we discussed for the ::operator new and ::operator delete. But 
> what would we do for user specified allocation/deallocation functions?
> It looks like we would treat them just like `::operator new`. And it makes 
> sense at the first glance. But the problem comes from that we judge whether
> or not to over allocate a frame by this condition:
>
>   coro.align > align of new
>
> But if the user uses their new, it makes no sense to use the `align of new` 
> as the condition. On the other hand, if user specified their new function and 
> the 
> alignment requirement for their promise type, would it be better really that 
> the compiler do the extra transformation?
>
> May be we need to discuss with other guys who are more familiar with the C++ 
> standard to answer this.

I think @rjmccall could answer these. My understanding is that user-specified 
allocation/deallocation has the same semantics as their standard library's 
counterparts. `align of new` (aka __STDCPP_DEFAULT_NEW_ALIGNMENT__) should 
apply to both.




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:743-744
+CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
+auto *AlignedUpAddr = EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign,
+ S.getAlignedAllocate(), true);
+// rawFrame = frame;

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > ychen wrote:
> > > > ChuanqiXu wrote:
> > > > > Maybe we could calculate it in place instead of trying to call a 
> > > > > function which is not designed for llvm::value*. It looks like the 
> > > > > calculation isn't too complex.
> > > > I'm open to not calling `EmitBuiltinAlignTo`, which basically inline 
> > > > the useful parts of `EmitBuiltinAlignTo`.  The initial intention is 
> > > > code sharing and easy readability. What's the benefit of not calling it?
> > > Reusing code is good. But my main concern is that the design for the 
> > > interfaces. The current design smells bad to me. Another reason for 
> > > implement it in place is I think it is not very complex and easy to 
> > > understand.
> > > 
> > > Another option I got is to implement `EmitBuitinAlign` in LLVM (someplace 
> > > like `Local`), then the CodeGenFunction:: EmitBuitinAlign and this 
> > > function could use it.
> > > Reusing code is good. But my main concern is that the design for the 
> > > interfaces. The current design smells bad to me. Another reason for 
> > > implement it in place is I think it is not very complex and easy to 
> > > understand.
> > > 
> > > Another option I got is to implement `EmitBuitinAlign` in LLVM (someplace 
> > > like `Local`), then the CodeGenFunction:: EmitBuitinAlign and this 
> > > function could use it.
> > 
> > 
> I guess you forgot to reply what you want to say.
Yep, I meant to say the use of "void *" is removed.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:431-433
+  auto *Diff = Builder.CreateNSWSub(AlignCall, AlignOfNewInt);
+  auto *NewCoroSize = Builder.CreateAdd(CI->getArgOperand(CoroSizeIdx), Diff);
+  CI->setArgOperand(CoroSizeIdx, NewCoroSize);

ChuanqiXu wrote:
> In other comments, I find 'size += align - NEW_ALIGN + sizeof(void*);'. But I 
> don't find sizeof(void*) in this function.
Sorry, that's a stale comment. It should be `size += align - NEW_ALIGN`. The 
`sizeof(void*)` was supposed for the newly added raw memory pointer stored in 
the frame. In the current implementation, `sizeof(void*)` is factored into the 
`llvm.coro.size()` calculation because CoroFrame is responsible for allocating 
the extra raw memory pointer if it is needed at all.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:452-461
+  llvm::Function *RawFramePtrOffsetIntrin = CGF.CGM.getIntrinsic(
+  llvm::Intrinsic::coro_raw_frame_ptr_offset, CGF.Int32Ty);
+  auto *RawFramePtrOffset = CGF.Builder.CreateCall(RawFramePtrOffsetIntrin);
+  auto *FramePtrAddrStart =
+  CGF.Builder.CreateInBoundsGEP(CoroFree, {RawFramePtrOffset});
+  auto *FramePtrAddr = CGF.Builder.CreatePointerCast(
+  FramePtrAddrStart, CGF.Int8PtrTy->getPointerTo());

ChuanqiXu wrote:
> We allocate overaligned-frame like:
> ```
> | --- for align --- | --- true frame --- |
> ```
> And here we set the argument to the address of true frame. Then I wonder how 
> about the memory for the `for align` part. Would we still free them 
> correctly? Maybe I missed something.
> Would we still free them correctly? 
Yes, that's the tricky part. Using `f0` of `coro-alloc.cpp` as an example, 
`llvm.coro.raw.frame.ptr.addr` is called at alloc time to save the raw memory 
pointer to the coroutine frame.  Later at

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-20 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

A remained question.

- what's the semantics if user specified their allocation/deallocation 
functions?

Previously, we discussed for the ::operator new and ::operator delete. But what 
would we do for user specified allocation/deallocation functions?
It looks like we would treat them just like `::operator new`. And it makes 
sense at the first glance. But the problem comes from that we judge whether
or not to over allocate a frame by this condition:

  coro.align > align of new

But if the user uses their new, it makes no sense to use the `align of new` as 
the condition. On the other hand, if user specified their new function and the 
alignment requirement for their promise type, would it be better really that 
the compiler do the extra transformation?

May be we need to discuss with other guys who are more familiar with the C++ 
standard to answer this.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:437-475
+void emitDynamicAlignedDealloc(CodeGenFunction ,
+   llvm::BasicBlock *AlignedFreeBB,
+   llvm::CallInst *CoroFree) {
+  llvm::CallInst *Dealloc = nullptr;
+  for (llvm::User *U : CoroFree->users()) {
+if (auto *CI = dyn_cast(U))
+  if (CI->getParent() == CGF.Builder.GetInsertBlock())

ychen wrote:
> ChuanqiXu wrote:
> > We don't need this in this patch.
> Do you mean `// Match size_t argument with the one used during allocation.` 
> or the function `emitDynamicAlignedDealloc`? I think either is needed here. 
> Could you please elaborate? 
Sorry for that I misunderstand this function earlier.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:420
+
+void OverAllocateFrame(CodeGenFunction , llvm::CallInst *CI, bool IsAlloc) 
{
+  unsigned CoroSizeIdx = IsAlloc ? 0 : 1;

It looks like we'd better to add comment for this function.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:421
+void OverAllocateFrame(CodeGenFunction , llvm::CallInst *CI, bool IsAlloc) 
{
+  unsigned CoroSizeIdx = IsAlloc ? 0 : 1;
+  CGBuilderTy  = CGF.Builder;

CoroSizeIdx should be zero all the time in this patch.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:431-433
+  auto *Diff = Builder.CreateNSWSub(AlignCall, AlignOfNewInt);
+  auto *NewCoroSize = Builder.CreateAdd(CI->getArgOperand(CoroSizeIdx), Diff);
+  CI->setArgOperand(CoroSizeIdx, NewCoroSize);

In other comments, I find 'size += align - NEW_ALIGN + sizeof(void*);'. But I 
don't find sizeof(void*) in this function.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:452-461
+  llvm::Function *RawFramePtrOffsetIntrin = CGF.CGM.getIntrinsic(
+  llvm::Intrinsic::coro_raw_frame_ptr_offset, CGF.Int32Ty);
+  auto *RawFramePtrOffset = CGF.Builder.CreateCall(RawFramePtrOffsetIntrin);
+  auto *FramePtrAddrStart =
+  CGF.Builder.CreateInBoundsGEP(CoroFree, {RawFramePtrOffset});
+  auto *FramePtrAddr = CGF.Builder.CreatePointerCast(
+  FramePtrAddrStart, CGF.Int8PtrTy->getPointerTo());

We allocate overaligned-frame like:
```
| --- for align --- | --- true frame --- |
```
And here we set the argument to the address of true frame. Then I wonder how 
about the memory for the `for align` part. Would we still free them correctly? 
Maybe I missed something.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:466-472
+  if (Dealloc->getNumArgOperands() > 1) {
+// Size may only be the second argument of allocator call.
+if (auto *CoroSize =
+dyn_cast(Dealloc->getArgOperand(1)))
+  if (CoroSize->getIntrinsicID() == llvm::Intrinsic::coro_size)
+OverAllocateFrame(CGF, Dealloc, /*IsAlloc*/ false);
+  }

We don't need to handle this condition in this patch.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:644
+  llvm::BasicBlock *RetOnFailureBB = nullptr;
+  llvm::BasicBlock *AlignAllocBB2 = nullptr;

It may be better to rename AlignAllocBB2 as AlignAllocBBCont or something 
similar.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:717
+AlignAllocBB2 = createBasicBlock("coro.alloc.align2");
+Builder.CreateCondBr(Cond, AlignAllocBB2, RetOnFailureBB);
+EmitBlock(AlignAllocBB2);

It looks better to add an assert for RetOnFailureBB. I think it may be nullptr 
at the first glance.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:731-732
+  {RawFramePtrAddr, getPointerAlign()});
+  AlignedAllocateCall = AlignedUpAddr;
+
   EmitBlock(InitBB);

We remove this assignment and use AlignedUpAddr directly in the following.

Comment at: clang/lib/CodeGen/CGCoroutine.cpp:743-744
+CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
+auto *AlignedUpAddr = EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign,
+

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-17 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:437-475
+void emitDynamicAlignedDealloc(CodeGenFunction ,
+   llvm::BasicBlock *AlignedFreeBB,
+   llvm::CallInst *CoroFree) {
+  llvm::CallInst *Dealloc = nullptr;
+  for (llvm::User *U : CoroFree->users()) {
+if (auto *CI = dyn_cast(U))
+  if (CI->getParent() == CGF.Builder.GetInsertBlock())

ChuanqiXu wrote:
> We don't need this in this patch.
Do you mean `// Match size_t argument with the one used during allocation.` or 
the function `emitDynamicAlignedDealloc`? I think either is needed here. Could 
you please elaborate? 



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:555-558
+  explicit CallCoroDelete(Stmt *DeallocStmt, Stmt *AlignedDeallocStmt,
+  bool DynamicAlignedDealloc)
+  : Deallocate(DeallocStmt), AlignedDeallocate(AlignedDeallocStmt),
+DynamicAlignedDealloc(DynamicAlignedDealloc) {}

ChuanqiXu wrote:
> Do we still need this change?
Nope



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:743-744
+CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
+auto *AlignedUpAddr = EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign,
+ S.getAlignedAllocate(), true);
+// rawFrame = frame;

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > Maybe we could calculate it in place instead of trying to call a function 
> > > which is not designed for llvm::value*. It looks like the calculation 
> > > isn't too complex.
> > I'm open to not calling `EmitBuiltinAlignTo`, which basically inline the 
> > useful parts of `EmitBuiltinAlignTo`.  The initial intention is code 
> > sharing and easy readability. What's the benefit of not calling it?
> Reusing code is good. But my main concern is that the design for the 
> interfaces. The current design smells bad to me. Another reason for implement 
> it in place is I think it is not very complex and easy to understand.
> 
> Another option I got is to implement `EmitBuitinAlign` in LLVM (someplace 
> like `Local`), then the CodeGenFunction:: EmitBuitinAlign and this function 
> could use it.
> Reusing code is good. But my main concern is that the design for the 
> interfaces. The current design smells bad to me. Another reason for implement 
> it in place is I think it is not very complex and easy to understand.
> 
> Another option I got is to implement `EmitBuitinAlign` in LLVM (someplace 
> like `Local`), then the CodeGenFunction:: EmitBuitinAlign and this function 
> could use it.





Comment at: clang/lib/CodeGen/CodeGenFunction.h:1920-1921
 
+  llvm::Value *EmitBuiltinAlignTo(void *Args, const Expr *E, bool AlignUp);
+
 public:

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > We shouldn't add this interface. The actual type for the first argument 
> > > is BuiltinAlignArgs*, which defined in .cpp files. The signature is 
> > > confusing.
> > This is a private function, supposedly only meaningful for the 
> > implementation.  In that situation do you think it's bad? 
> It makes no sense to me that we can add interfaces casually if it is private. 
> For the users of Clang/LLVM, it may be OK since they wouldn't look into the 
> details. But for the developers, it matters. For example, I must be very 
> confused when I see this signature. Why is the type of `Args` is void*? 
> What's the type should I passed in? The smell is really bad.
> It makes no sense to me that we can add interfaces casually if it is private. 
> For the users of Clang/LLVM, it may be OK since they wouldn't look into the 
> details. But for the developers, it matters. For example, I must be very 
> confused when I see this signature. Why is the type of `Args` is void*? 
> What's the type should I passed in? The smell is really bad.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-17 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 352908.
ychen marked 2 inline comments as done.
ychen added a comment.

- Not use `void *` in EmitBuiltinAlignTo signature.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
@@ -0,0 +1,78 @@
+; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
+; `@llvm.coro.raw.frame.ptr.alloca` are lowered correctly.
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*, i32, i32, i8**)
+declare void @consume2(i32, i32)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  %align = call i32 @llvm.coro.align.i32()
+  %offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  %addr = call i8** @llvm.coro.raw.frame.ptr.addr()
+  call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  br label %cleanup
+
+cleanup:
+  %align2 = call i32 @llvm.coro.align.i32()
+  %offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  call void @consume2(i32 %align2, i32 %offset2)
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the raw frame address was inserted into the frame.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we used correct index to access frame addr field (field 2).
+; CHECK-LABEL: @f(
+; CHECK: %alloc.frame.ptr = alloca i8*, align 8
+; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
+; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
+; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
+; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
+; CHECK: ret i8*
+
+; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
+; correctly during deallocation.
+; CHECK-LABEL: @f.destroy(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8* %{{.*}})
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8*
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.align.i32()
+declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
+declare i8** @llvm.coro.raw.frame.ptr.addr()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare double @print(double)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,9 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
+  Shape.CoroRawFramePtrAddrs.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +271,15 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
+

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-17 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 352810.
ychen added a comment.

- Rebase correctly


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
@@ -0,0 +1,78 @@
+; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
+; `@llvm.coro.raw.frame.ptr.alloca` are lowered correctly.
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*, i32, i32, i8**)
+declare void @consume2(i32, i32)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  %align = call i32 @llvm.coro.align.i32()
+  %offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  %addr = call i8** @llvm.coro.raw.frame.ptr.addr()
+  call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  br label %cleanup
+
+cleanup:
+  %align2 = call i32 @llvm.coro.align.i32()
+  %offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  call void @consume2(i32 %align2, i32 %offset2)
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the raw frame address was inserted into the frame.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we used correct index to access frame addr field (field 2).
+; CHECK-LABEL: @f(
+; CHECK: %alloc.frame.ptr = alloca i8*, align 8
+; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
+; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
+; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
+; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
+; CHECK: ret i8*
+
+; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
+; correctly during deallocation.
+; CHECK-LABEL: @f.destroy(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8* %{{.*}})
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8*
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.align.i32()
+declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
+declare i8** @llvm.coro.raw.frame.ptr.addr()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare double @print(double)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,9 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
+  Shape.CoroRawFramePtrAddrs.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +271,15 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_addr:
+

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-16 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

It misses the part in llvm now.




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:437-475
+void emitDynamicAlignedDealloc(CodeGenFunction ,
+   llvm::BasicBlock *AlignedFreeBB,
+   llvm::CallInst *CoroFree) {
+  llvm::CallInst *Dealloc = nullptr;
+  for (llvm::User *U : CoroFree->users()) {
+if (auto *CI = dyn_cast(U))
+  if (CI->getParent() == CGF.Builder.GetInsertBlock())

We don't need this in this patch.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:477
+
+void emitCheckAlignBasicBlock(CodeGenFunction ,
+  llvm::BasicBlock *CheckAlignBB,

Capitalize `EmitCheckAlignBasicBlock`



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:555-558
+  explicit CallCoroDelete(Stmt *DeallocStmt, Stmt *AlignedDeallocStmt,
+  bool DynamicAlignedDealloc)
+  : Deallocate(DeallocStmt), AlignedDeallocate(AlignedDeallocStmt),
+DynamicAlignedDealloc(DynamicAlignedDealloc) {}

Do we still need this change?



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:743-744
+CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
+auto *AlignedUpAddr = EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign,
+ S.getAlignedAllocate(), true);
+// rawFrame = frame;

ychen wrote:
> ChuanqiXu wrote:
> > Maybe we could calculate it in place instead of trying to call a function 
> > which is not designed for llvm::value*. It looks like the calculation isn't 
> > too complex.
> I'm open to not calling `EmitBuiltinAlignTo`, which basically inline the 
> useful parts of `EmitBuiltinAlignTo`.  The initial intention is code sharing 
> and easy readability. What's the benefit of not calling it?
Reusing code is good. But my main concern is that the design for the 
interfaces. The current design smells bad to me. Another reason for implement 
it in place is I think it is not very complex and easy to understand.

Another option I got is to implement `EmitBuitinAlign` in LLVM (someplace like 
`Local`), then the CodeGenFunction:: EmitBuitinAlign and this function could 
use it.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:1920-1921
 
+  llvm::Value *EmitBuiltinAlignTo(void *Args, const Expr *E, bool AlignUp);
+
 public:

ychen wrote:
> ChuanqiXu wrote:
> > We shouldn't add this interface. The actual type for the first argument is 
> > BuiltinAlignArgs*, which defined in .cpp files. The signature is confusing.
> This is a private function, supposedly only meaningful for the 
> implementation.  In that situation do you think it's bad? 
It makes no sense to me that we can add interfaces casually if it is private. 
For the users of Clang/LLVM, it may be OK since they wouldn't look into the 
details. But for the developers, it matters. For example, I must be very 
confused when I see this signature. Why is the type of `Args` is void*? What's 
the type should I passed in? The smell is really bad.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-16 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:743-744
+CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
+auto *AlignedUpAddr = EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign,
+ S.getAlignedAllocate(), true);
+// rawFrame = frame;

ChuanqiXu wrote:
> Maybe we could calculate it in place instead of trying to call a function 
> which is not designed for llvm::value*. It looks like the calculation isn't 
> too complex.
I'm open to not calling `EmitBuiltinAlignTo`, which basically inline the useful 
parts of `EmitBuiltinAlignTo`.  The initial intention is code sharing and easy 
readability. What's the benefit of not calling it?



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:753
+  }
+
   EmitBlock(InitBB);

ChuanqiXu wrote:
> Does here miss a branch to InitBB?
`EmitBlock` would handle the case.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:1920-1921
 
+  llvm::Value *EmitBuiltinAlignTo(void *Args, const Expr *E, bool AlignUp);
+
 public:

ChuanqiXu wrote:
> We shouldn't add this interface. The actual type for the first argument is 
> BuiltinAlignArgs*, which defined in .cpp files. The signature is confusing.
This is a private function, supposedly only meaningful for the implementation.  
In that situation do you think it's bad? 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-16 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 352617.
ychen marked 3 inline comments as done.
ychen added a comment.

- Remove AlignedAllocator & AlignedDeallocator.
- Use 'OverAllocateFrame'
- Get rid of `if (HasAlignArg)`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp

Index: clang/test/CodeGenCoroutines/coro-gro.cpp
===
--- clang/test/CodeGenCoroutines/coro-gro.cpp
+++ clang/test/CodeGenCoroutines/coro-gro.cpp
@@ -68,6 +68,7 @@
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeD1Ev(
   // CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(
   // CHECK: call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: call void @_ZdlPv(i8* %{{.*}})
 
   // Initialize retval from Gro and destroy Gro
 
Index: clang/test/CodeGenCoroutines/coro-cleanup.cpp
===
--- clang/test/CodeGenCoroutines/coro-cleanup.cpp
+++ clang/test/CodeGenCoroutines/coro-cleanup.cpp
@@ -78,12 +78,46 @@
 
   // CHECK: [[Cleanup]]:
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_typeD1Ev(
-  // CHECK: %[[Mem0:.+]] = call i8* @llvm.coro.free(
-  // CHECK: call void @_ZdlPv(i8* %[[Mem0]]
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   // CHECK: [[Dealloc]]:
-  // CHECK:   %[[Mem:.+]] = call i8* @llvm.coro.free(
-  // CHECK:   call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   co_return;
 }
Index: clang/test/CodeGenCoroutines/coro-alloc.cpp
===
--- clang/test/CodeGenCoroutines/coro-alloc.cpp
+++ clang/test/CodeGenCoroutines/coro-alloc.cpp
@@ -57,24 +57,55 @@
 extern "C" void f0(global_new_delete_tag) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
   // CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])
-  // CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]
+  // CHECK: br i1 %[[NeedAlloc]], label %[[CheckAlignBB:.+]], label %[[InitBB:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
+  // CHECK: br i1 %[[CMP]], label %[[AlignAllocBB:.+]], label %[[AllocBB:.+]]
 
   // CHECK: [[AllocBB]]:
+  // CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK-NEXT: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK-NEXT: br label %[[InitBB:.+]]
+
+  // CHECK: [[AlignAllocBB]]:
   // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[PAD:.+]] = sub nsw i64 %[[ALIGN]], 16
+  // CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
+  // CHECK: %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
+  // CHECK: %[[ALIGN2:.+]] = call i64

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-16 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

It looks not easy to review completely. After a rough review, the first suggest 
is to remove the contents that should be in 'D102147 
', it makes the complex problem more complex 
I think.
I would try to do more detailed review and test it if possible.




Comment at: clang/include/clang/AST/StmtCXX.h:331-335
+ReturnValue, ///< Return value for thunk function: p.get_return_object().
+ResultDecl,  ///< Declaration holding the result of get_return_object.
+ReturnStmt,  ///< Return statement for the thunk function.
 ReturnStmtOnAllocFailure, ///< Return statement if allocation failed.
 FirstParamMove ///< First offset for move construction of parameter copies.

Minor issue: the intention of the comments should be the same.



Comment at: clang/include/clang/AST/StmtCXX.h:356-359
 Expr *Allocate = nullptr;
 Expr *Deallocate = nullptr;
+Expr *AlignedAllocate = nullptr;
+Expr *AlignedDeallocate = nullptr;

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > ychen wrote:
> > > > ChuanqiXu wrote:
> > > > > Can't we merge these?
> > > > I'm not sure about the "merge" here. Could you be more explicit?
> > > Sorry. I mean if we can merge `Allocate` with `AlignedAllocate` and merge 
> > > `Deallocate` with `AlignedDeallocate`. Since from the implementation, it 
> > > looks like the value of `Allocate` and `AlignedAllocate ` (so as 
> > > `Deallocate` and `AlignedDeallocate`) are the same.
> > Oh, this is to set the path for D102147 where `Allocate` and 
> > `AlignedAllocate` could be different. If I do this in D102147, it will also 
> > touch the `CGCoroutine.cpp` which I'm trying to avoid` since it is intended 
> > to be a Sema only patch.
> Yeah, this is the key different point between us. I think that `D102147` 
> could and should to touch the CodeGen part.
As we discussed before, I prefer to merge `Allocate` and `AlignedAllocate` 
(also Deallocate and AlignedDeallocate ) in this patch. It looks weird the are 
the same in one commit.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:420
+
+void overAllocateFrame(CodeGenFunction , llvm::CallInst *CI, bool IsAlloc) 
{
+  unsigned CoroSizeIdx = IsAlloc ? 0 : 1;

We should capitalize it as 'OverAllocateFrame'



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:701-706
+  if (HasAlignArg) {
+if (S.getReturnStmtOnAllocFailure()) {
+  auto *Cond = Builder.CreateICmpNE(AlignedAllocateCall, NullPtr);
+  Builder.CreateCondBr(Cond, InitBB, RetOnFailureBB);
+}
+  } else {

`if (HasAlignArg)` should be the content of the next patch 'D102147', right? I 
don't think they should come here.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:743-744
+CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
+auto *AlignedUpAddr = EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign,
+ S.getAlignedAllocate(), true);
+// rawFrame = frame;

Maybe we could calculate it in place instead of trying to call a function which 
is not designed for llvm::value*. It looks like the calculation isn't too 
complex.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:753
+  }
+
   EmitBlock(InitBB);

Does here miss a branch to InitBB?



Comment at: clang/lib/CodeGen/CodeGenFunction.h:1920-1921
 
+  llvm::Value *EmitBuiltinAlignTo(void *Args, const Expr *E, bool AlignUp);
+
 public:

We shouldn't add this interface. The actual type for the first argument is 
BuiltinAlignArgs*, which defined in .cpp files. The signature is confusing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-15 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2819871 , @ychen wrote:

> - Merge D102145  by @ChuanqiXu's request.

Thanks a lot. Could you add me as a reviewer? So that I can see this patch in 
my main page.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-06-15 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 352189.
ychen added a comment.

- Merge D102145  by @ChuanqiXu's request.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/include/clang/AST/StmtCXX.h
  clang/lib/AST/StmtCXX.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/lib/Sema/SemaCoroutine.cpp
  clang/lib/Sema/TreeTransform.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-frame-overalign.ll
@@ -0,0 +1,78 @@
+; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
+; `@llvm.coro.raw.frame.ptr.alloca` are lowered correctly.
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*, i32, i32, i8**)
+declare void @consume2(i32, i32)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  %align = call i32 @llvm.coro.align.i32()
+  %offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  %addr = call i8** @llvm.coro.raw.frame.ptr.addr()
+  call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  br label %cleanup
+
+cleanup:
+  %align2 = call i32 @llvm.coro.align.i32()
+  %offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  call void @consume2(i32 %align2, i32 %offset2)
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the raw frame address was inserted into the frame.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we used correct index to access frame addr field (field 2).
+; CHECK-LABEL: @f(
+; CHECK: %alloc.frame.ptr = alloca i8*, align 8
+; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
+; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
+; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
+; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
+; CHECK: ret i8*
+
+; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
+; correctly during deallocation.
+; CHECK-LABEL: @f.destroy(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8* %{{.*}})
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK: call void @consume2(i32 32, i32 16)
+; CHECK: call void @free(i8*
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.align.i32()
+declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
+declare i8** @llvm.coro.raw.frame.ptr.addr()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare double @print(double)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,9 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
+  Shape.CoroRawFramePtrAddrs.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +271,15 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-13 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added inline comments.



Comment at: clang/include/clang/AST/StmtCXX.h:356-359
 Expr *Allocate = nullptr;
 Expr *Deallocate = nullptr;
+Expr *AlignedAllocate = nullptr;
+Expr *AlignedDeallocate = nullptr;

ychen wrote:
> ChuanqiXu wrote:
> > ychen wrote:
> > > ChuanqiXu wrote:
> > > > Can't we merge these?
> > > I'm not sure about the "merge" here. Could you be more explicit?
> > Sorry. I mean if we can merge `Allocate` with `AlignedAllocate` and merge 
> > `Deallocate` with `AlignedDeallocate`. Since from the implementation, it 
> > looks like the value of `Allocate` and `AlignedAllocate ` (so as 
> > `Deallocate` and `AlignedDeallocate`) are the same.
> Oh, this is to set the path for D102147 where `Allocate` and 
> `AlignedAllocate` could be different. If I do this in D102147, it will also 
> touch the `CGCoroutine.cpp` which I'm trying to avoid` since it is intended 
> to be a Sema only patch.
Yeah, this is the key different point between us. I think that `D102147` could 
and should to touch the CodeGen part.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:528-547
 CGF.EmitBlock(FreeBB);
 CGF.EmitStmt(Deallocate);
-
-auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");
-CGF.EmitBlock(AfterFreeBB);
+CGF.Builder.CreateBr(AfterFreeBB);
 
 // We should have captured coro.free from the emission of deallocate.
 auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;
+CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = true;

ychen wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > ychen wrote:
> > > > ChuanqiXu wrote:
> > > > > It looks like it would emit a `deallocate` first, and emit an 
> > > > > `alignedDeallocate`, which is very odd. Although I can find that the 
> > > > > second `deallocate` wouldn't be emitted due to the check 
> > > > > `LastCoroFreeUsedForDealloc`, it is still very odd to me. If the 
> > > > > second `deallocate` wouldn't come up all the way, what's the reason 
> > > > > we need to write `emit(deallocate)` twice?
> > > > Agree that `LastCoroFreeUsedForDealloc` is a bit confusing. It makes 
> > > > sure deallocation and aligned deallocation share one `coro.free`. 
> > > > Otherwise, AFAIK, there would be two `coro.free` get codegen'd.
> > > > ```
> > > > %mem = llvm.coro.free()
> > > > br i1  , label , label 
> > > > 
> > > > aligend-dealloc:
> > > > use %mem
> > > > 
> > > > dealloc:
> > > > use %mem
> > > > ```
> > > > 
> > > > 
> > > > > what's the reason we need to write emit(deallocate) twice?
> > > > John wrote a code snippet here: 
> > > > https://reviews.llvm.org/D100739#2717582. I think it would be helpful 
> > > > to look at the changed tests below to see the patterns.
> > > > 
> > > > Basically, for allocation, it looks like below; for deallocation, it 
> > > > would be similar.
> > > > ```
> > > > void *rawFrame =nullptr;
> > > > ...
> > > > if (llvm.coro.alloc()) {
> > > >   size_t size = llvm.coro.size(), align = llvm.coro.align();
> > > >   if (align > NEW_ALIGN) {
> > > > #if  > > > selected by Sema>
> > > > size += align - NEW_ALIGN + sizeof(void*);
> > > > frame = operator new(size);
> > > > rawFrame = frame;
> > > > frame = (frame + align - 1) & ~(align - 1);
> > > > #else
> > > > // If an aligned allocation function is selected.
> > > > frame = operator new(size, align);
> > > > #endif
> > > >   } else {
> > > > frame = operator new(size);
> > > >   }
> > > > }
> > > > ```
> > > > The true branch of the #if directive is equivalent to 
> > > > "coro.alloc.align" block (and "coro.alloc.align2" if 
> > > > `get_return_object_on_allocation_failure` is defined), the false branch 
> > > > is equivalent to "coro.alloc" block.
> > > > The above pattern handles both aligned/normal allocation/deallocation 
> > > > so it is independent of D102147.
> > > Thanks. I get the reason why I am thinking the code isn't natural. Since 
> > > I think `::operator new(size_t, align_val_t)` shouldn't come up in this 
> > > patch which should be available after D102147 applies. Here you said this 
> > > patch is independent with D102147, I believe this patch could work 
> > > without D102147. But it contains the codes which would work only if we 
> > > applies the successor patch, so I think it is dependent on D102147.
> > > 
> > > The ideally relationship for me is to merge `D102145 ` into this one 
> > > (Otherwise it is weird for me that `D102145` only introduces some 
> > > intrinsics which wouldn't be used actually). Then this patch should 
> > > handle the alignment for variables in coroutine frame without introducing 
> > > `::new(size_t, align_val_t)`. Then the final patch could do the job that 
> > > searching and generating code for `::new(size_t, align_val_t)`.
> > > 
> > > Maybe it is a little bit hard to rebase again and again. But I think it 
> > > is better.
> > I think I know where the confusion comes from.

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-13 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:528-547
 CGF.EmitBlock(FreeBB);
 CGF.EmitStmt(Deallocate);
-
-auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");
-CGF.EmitBlock(AfterFreeBB);
+CGF.Builder.CreateBr(AfterFreeBB);
 
 // We should have captured coro.free from the emission of deallocate.
 auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;
+CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = true;

ychen wrote:
> ChuanqiXu wrote:
> > ychen wrote:
> > > ChuanqiXu wrote:
> > > > It looks like it would emit a `deallocate` first, and emit an 
> > > > `alignedDeallocate`, which is very odd. Although I can find that the 
> > > > second `deallocate` wouldn't be emitted due to the check 
> > > > `LastCoroFreeUsedForDealloc`, it is still very odd to me. If the second 
> > > > `deallocate` wouldn't come up all the way, what's the reason we need to 
> > > > write `emit(deallocate)` twice?
> > > Agree that `LastCoroFreeUsedForDealloc` is a bit confusing. It makes sure 
> > > deallocation and aligned deallocation share one `coro.free`. Otherwise, 
> > > AFAIK, there would be two `coro.free` get codegen'd.
> > > ```
> > > %mem = llvm.coro.free()
> > > br i1  , label , label 
> > > 
> > > aligend-dealloc:
> > > use %mem
> > > 
> > > dealloc:
> > > use %mem
> > > ```
> > > 
> > > 
> > > > what's the reason we need to write emit(deallocate) twice?
> > > John wrote a code snippet here: https://reviews.llvm.org/D100739#2717582. 
> > > I think it would be helpful to look at the changed tests below to see the 
> > > patterns.
> > > 
> > > Basically, for allocation, it looks like below; for deallocation, it 
> > > would be similar.
> > > ```
> > > void *rawFrame =nullptr;
> > > ...
> > > if (llvm.coro.alloc()) {
> > >   size_t size = llvm.coro.size(), align = llvm.coro.align();
> > >   if (align > NEW_ALIGN) {
> > > #if  > > by Sema>
> > > size += align - NEW_ALIGN + sizeof(void*);
> > > frame = operator new(size);
> > > rawFrame = frame;
> > > frame = (frame + align - 1) & ~(align - 1);
> > > #else
> > > // If an aligned allocation function is selected.
> > > frame = operator new(size, align);
> > > #endif
> > >   } else {
> > > frame = operator new(size);
> > >   }
> > > }
> > > ```
> > > The true branch of the #if directive is equivalent to "coro.alloc.align" 
> > > block (and "coro.alloc.align2" if 
> > > `get_return_object_on_allocation_failure` is defined), the false branch 
> > > is equivalent to "coro.alloc" block.
> > > The above pattern handles both aligned/normal allocation/deallocation so 
> > > it is independent of D102147.
> > Thanks. I get the reason why I am thinking the code isn't natural. Since I 
> > think `::operator new(size_t, align_val_t)` shouldn't come up in this patch 
> > which should be available after D102147 applies. Here you said this patch 
> > is independent with D102147, I believe this patch could work without 
> > D102147. But it contains the codes which would work only if we applies the 
> > successor patch, so I think it is dependent on D102147.
> > 
> > The ideally relationship for me is to merge `D102145 ` into this one 
> > (Otherwise it is weird for me that `D102145` only introduces some 
> > intrinsics which wouldn't be used actually). Then this patch should handle 
> > the alignment for variables in coroutine frame without introducing 
> > `::new(size_t, align_val_t)`. Then the final patch could do the job that 
> > searching and generating code for `::new(size_t, align_val_t)`.
> > 
> > Maybe it is a little bit hard to rebase again and again. But I think it is 
> > better.
> I think I know where the confusion comes from. `AlignedDeallocate` is not 
> guaranteed to be an aligned allocator. In this patch in `SemaCoroutine.cpp`, 
> it is set to `Deallocate` in which case we always dynamically adjust frame 
> alignment. Once D102147 is landed. `AlignedDeallocate` may or may not be an 
> aligned allocator.
> 
> > The ideally relationship for me is to merge D102145  into this one 
> > (Otherwise it is weird for me that D102145 only introduces some intrinsics 
> > which wouldn't be used actually). Then this patch should handle the 
> > alignment for variables in coroutine frame without introducing 
> > ::new(size_t, align_val_t). Then the final patch could do the job that 
> > searching and generating code for ::new(size_t, align_val_t).
> 
> I was worried about the size of the patch if this is merged with D102145 but 
> if that is preferred by more than one reviewer, I'll go ahead and do that. 
> D102145 is pretty self-contained in that it does not contain clients of the 
> added intrinsics but the introduced test should cover the expected intrinsic 
> lowering.
> 
Naming is hard. I had a hard time figuring out a better name. 
`AlignedDeallocate`/`AlignedAllocate` is intended to refer to 
allocator/deallocator used for handling overaligned frame. Not

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-13 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/include/clang/AST/StmtCXX.h:356-359
 Expr *Allocate = nullptr;
 Expr *Deallocate = nullptr;
+Expr *AlignedAllocate = nullptr;
+Expr *AlignedDeallocate = nullptr;

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > Can't we merge these?
> > I'm not sure about the "merge" here. Could you be more explicit?
> Sorry. I mean if we can merge `Allocate` with `AlignedAllocate` and merge 
> `Deallocate` with `AlignedDeallocate`. Since from the implementation, it 
> looks like the value of `Allocate` and `AlignedAllocate ` (so as `Deallocate` 
> and `AlignedDeallocate`) are the same.
Oh, this is to set the path for D102147 where `Allocate` and `AlignedAllocate` 
could be different. If I do this in D102147, it will also touch the 
`CGCoroutine.cpp` which I'm trying to avoid` since it is intended to be a Sema 
only patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-13 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:528-547
 CGF.EmitBlock(FreeBB);
 CGF.EmitStmt(Deallocate);
-
-auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");
-CGF.EmitBlock(AfterFreeBB);
+CGF.Builder.CreateBr(AfterFreeBB);
 
 // We should have captured coro.free from the emission of deallocate.
 auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;
+CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = true;

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > It looks like it would emit a `deallocate` first, and emit an 
> > > `alignedDeallocate`, which is very odd. Although I can find that the 
> > > second `deallocate` wouldn't be emitted due to the check 
> > > `LastCoroFreeUsedForDealloc`, it is still very odd to me. If the second 
> > > `deallocate` wouldn't come up all the way, what's the reason we need to 
> > > write `emit(deallocate)` twice?
> > Agree that `LastCoroFreeUsedForDealloc` is a bit confusing. It makes sure 
> > deallocation and aligned deallocation share one `coro.free`. Otherwise, 
> > AFAIK, there would be two `coro.free` get codegen'd.
> > ```
> > %mem = llvm.coro.free()
> > br i1  , label , label 
> > 
> > aligend-dealloc:
> > use %mem
> > 
> > dealloc:
> > use %mem
> > ```
> > 
> > 
> > > what's the reason we need to write emit(deallocate) twice?
> > John wrote a code snippet here: https://reviews.llvm.org/D100739#2717582. I 
> > think it would be helpful to look at the changed tests below to see the 
> > patterns.
> > 
> > Basically, for allocation, it looks like below; for deallocation, it would 
> > be similar.
> > ```
> > void *rawFrame =nullptr;
> > ...
> > if (llvm.coro.alloc()) {
> >   size_t size = llvm.coro.size(), align = llvm.coro.align();
> >   if (align > NEW_ALIGN) {
> > #if  > by Sema>
> > size += align - NEW_ALIGN + sizeof(void*);
> > frame = operator new(size);
> > rawFrame = frame;
> > frame = (frame + align - 1) & ~(align - 1);
> > #else
> > // If an aligned allocation function is selected.
> > frame = operator new(size, align);
> > #endif
> >   } else {
> > frame = operator new(size);
> >   }
> > }
> > ```
> > The true branch of the #if directive is equivalent to "coro.alloc.align" 
> > block (and "coro.alloc.align2" if `get_return_object_on_allocation_failure` 
> > is defined), the false branch is equivalent to "coro.alloc" block.
> > The above pattern handles both aligned/normal allocation/deallocation so it 
> > is independent of D102147.
> Thanks. I get the reason why I am thinking the code isn't natural. Since I 
> think `::operator new(size_t, align_val_t)` shouldn't come up in this patch 
> which should be available after D102147 applies. Here you said this patch is 
> independent with D102147, I believe this patch could work without D102147. 
> But it contains the codes which would work only if we applies the successor 
> patch, so I think it is dependent on D102147.
> 
> The ideally relationship for me is to merge `D102145 ` into this one 
> (Otherwise it is weird for me that `D102145` only introduces some intrinsics 
> which wouldn't be used actually). Then this patch should handle the alignment 
> for variables in coroutine frame without introducing `::new(size_t, 
> align_val_t)`. Then the final patch could do the job that searching and 
> generating code for `::new(size_t, align_val_t)`.
> 
> Maybe it is a little bit hard to rebase again and again. But I think it is 
> better.
I think I know where the confusion comes from. `AlignedDeallocate` is not 
guaranteed to be an aligned allocator. In this patch in `SemaCoroutine.cpp`, it 
is set to `Deallocate` in which case we always dynamically adjust frame 
alignment. Once D102147 is landed. `AlignedDeallocate` may or may not be an 
aligned allocator.

> The ideally relationship for me is to merge D102145  into this one (Otherwise 
> it is weird for me that D102145 only introduces some intrinsics which 
> wouldn't be used actually). Then this patch should handle the alignment for 
> variables in coroutine frame without introducing ::new(size_t, align_val_t). 
> Then the final patch could do the job that searching and generating code for 
> ::new(size_t, align_val_t).

I was worried about the size of the patch if this is merged with D102145 but if 
that is preferred by more than one reviewer, I'll go ahead and do that. D102145 
is pretty self-contained in that it does not contain clients of the added 
intrinsics but the introduced test should cover the expected intrinsic lowering.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-13 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added inline comments.



Comment at: clang/include/clang/AST/StmtCXX.h:356-359
 Expr *Allocate = nullptr;
 Expr *Deallocate = nullptr;
+Expr *AlignedAllocate = nullptr;
+Expr *AlignedDeallocate = nullptr;

ychen wrote:
> ChuanqiXu wrote:
> > Can't we merge these?
> I'm not sure about the "merge" here. Could you be more explicit?
Sorry. I mean if we can merge `Allocate` with `AlignedAllocate` and merge 
`Deallocate` with `AlignedDeallocate`. Since from the implementation, it looks 
like the value of `Allocate` and `AlignedAllocate ` (so as `Deallocate` and 
`AlignedDeallocate`) are the same.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:528-547
 CGF.EmitBlock(FreeBB);
 CGF.EmitStmt(Deallocate);
-
-auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");
-CGF.EmitBlock(AfterFreeBB);
+CGF.Builder.CreateBr(AfterFreeBB);
 
 // We should have captured coro.free from the emission of deallocate.
 auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;
+CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = true;

ychen wrote:
> ChuanqiXu wrote:
> > It looks like it would emit a `deallocate` first, and emit an 
> > `alignedDeallocate`, which is very odd. Although I can find that the second 
> > `deallocate` wouldn't be emitted due to the check 
> > `LastCoroFreeUsedForDealloc`, it is still very odd to me. If the second 
> > `deallocate` wouldn't come up all the way, what's the reason we need to 
> > write `emit(deallocate)` twice?
> Agree that `LastCoroFreeUsedForDealloc` is a bit confusing. It makes sure 
> deallocation and aligned deallocation share one `coro.free`. Otherwise, 
> AFAIK, there would be two `coro.free` get codegen'd.
> ```
> %mem = llvm.coro.free()
> br i1  , label , label 
> 
> aligend-dealloc:
> use %mem
> 
> dealloc:
> use %mem
> ```
> 
> 
> > what's the reason we need to write emit(deallocate) twice?
> John wrote a code snippet here: https://reviews.llvm.org/D100739#2717582. I 
> think it would be helpful to look at the changed tests below to see the 
> patterns.
> 
> Basically, for allocation, it looks like below; for deallocation, it would be 
> similar.
> ```
> void *rawFrame =nullptr;
> ...
> if (llvm.coro.alloc()) {
>   size_t size = llvm.coro.size(), align = llvm.coro.align();
>   if (align > NEW_ALIGN) {
> #if  Sema>
> size += align - NEW_ALIGN + sizeof(void*);
> frame = operator new(size);
> rawFrame = frame;
> frame = (frame + align - 1) & ~(align - 1);
> #else
> // If an aligned allocation function is selected.
> frame = operator new(size, align);
> #endif
>   } else {
> frame = operator new(size);
>   }
> }
> ```
> The true branch of the #if directive is equivalent to "coro.alloc.align" 
> block (and "coro.alloc.align2" if `get_return_object_on_allocation_failure` 
> is defined), the false branch is equivalent to "coro.alloc" block.
> The above pattern handles both aligned/normal allocation/deallocation so it 
> is independent of D102147.
Thanks. I get the reason why I am thinking the code isn't natural. Since I 
think `::operator new(size_t, align_val_t)` shouldn't come up in this patch 
which should be available after D102147 applies. Here you said this patch is 
independent with D102147, I believe this patch could work without D102147. But 
it contains the codes which would work only if we applies the successor patch, 
so I think it is dependent on D102147.

The ideally relationship for me is to merge `D102145 ` into this one (Otherwise 
it is weird for me that `D102145` only introduces some intrinsics which 
wouldn't be used actually). Then this patch should handle the alignment for 
variables in coroutine frame without introducing `::new(size_t, align_val_t)`. 
Then the final patch could do the job that searching and generating code for 
`::new(size_t, align_val_t)`.

Maybe it is a little bit hard to rebase again and again. But I think it is 
better.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:878
+if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc)
+  return RValue::get(CurCoro.Data->LastCoroFree);
+

ychen wrote:
> ChuanqiXu wrote:
> > Is it possible that it would return a nullptr value?
> Not that I know of. Because there is an early return 
> ```
> if (!CoroFree) {
>   CGF.CGM.Error(Deallocate->getBeginLoc(),
> "Deallocation expressoin does not refer to coro.free");
>   return;
> }
> ```
Do you think it is better to merge this check here?

```
 if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc) {
 if (!CoroFree) {
  CGF.CGM.Error(Deallocate->getBeginLoc(),
"Deallocation expressoin does not refer to coro.free");
  return something;
 }
return RValue::get(CurCoro.Data->LastCoroFree);
}
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-12 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 345053.
ychen marked an inline comment as done.
ychen added a comment.

- Address feedbacks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/include/clang/AST/StmtCXX.h
  clang/lib/AST/StmtCXX.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/lib/Sema/SemaCoroutine.cpp
  clang/lib/Sema/TreeTransform.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp

Index: clang/test/CodeGenCoroutines/coro-gro.cpp
===
--- clang/test/CodeGenCoroutines/coro-gro.cpp
+++ clang/test/CodeGenCoroutines/coro-gro.cpp
@@ -68,6 +68,7 @@
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeD1Ev(
   // CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(
   // CHECK: call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: call void @_ZdlPv(i8* %{{.*}})
 
   // Initialize retval from Gro and destroy Gro
 
Index: clang/test/CodeGenCoroutines/coro-cleanup.cpp
===
--- clang/test/CodeGenCoroutines/coro-cleanup.cpp
+++ clang/test/CodeGenCoroutines/coro-cleanup.cpp
@@ -78,12 +78,46 @@
 
   // CHECK: [[Cleanup]]:
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_typeD1Ev(
-  // CHECK: %[[Mem0:.+]] = call i8* @llvm.coro.free(
-  // CHECK: call void @_ZdlPv(i8* %[[Mem0]]
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   // CHECK: [[Dealloc]]:
-  // CHECK:   %[[Mem:.+]] = call i8* @llvm.coro.free(
-  // CHECK:   call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   co_return;
 }
Index: clang/test/CodeGenCoroutines/coro-alloc.cpp
===
--- clang/test/CodeGenCoroutines/coro-alloc.cpp
+++ clang/test/CodeGenCoroutines/coro-alloc.cpp
@@ -57,24 +57,55 @@
 extern "C" void f0(global_new_delete_tag) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
   // CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])
-  // CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]
+  // CHECK: br i1 %[[NeedAlloc]], label %[[CheckAlignBB:.+]], label %[[InitBB:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
+  // CHECK: br i1 %[[CMP]], label %[[AlignAllocBB:.+]], label %[[AllocBB:.+]]
 
   // CHECK: [[AllocBB]]:
+  // CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK-NEXT: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK-NEXT: br label %[[InitBB:.+]]
+
+  // CHECK: [[AlignAllocBB]]:
   // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[PAD:.+]] = sub nsw i64 %[[ALIGN]], 16
+  // CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
+  // CHECK: %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-12 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen marked an inline comment as done.
ychen added inline comments.



Comment at: clang/include/clang/AST/StmtCXX.h:356-359
 Expr *Allocate = nullptr;
 Expr *Deallocate = nullptr;
+Expr *AlignedAllocate = nullptr;
+Expr *AlignedDeallocate = nullptr;

ChuanqiXu wrote:
> Can't we merge these?
I'm not sure about the "merge" here. Could you be more explicit?



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:445-483
+void emitDynamicAlignedDealloc(CodeGenFunction ,
+   llvm::BasicBlock *AlignedFreeBB,
+   llvm::CallInst *CoroFree) {
+  llvm::CallInst *Dealloc = nullptr;
+  for (llvm::User *U : CoroFree->users()) {
+if (auto *CI = dyn_cast(U))
+  if (CI->getParent() == CGF.Builder.GetInsertBlock())

ChuanqiXu wrote:
> This code would only work if we use `::operator new(size_t, align_val_t)`, 
> which is implemented in another patch. I would suggest to move this into that 
> one.
It handles both aligned and normal new/delete.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:528-547
 CGF.EmitBlock(FreeBB);
 CGF.EmitStmt(Deallocate);
-
-auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");
-CGF.EmitBlock(AfterFreeBB);
+CGF.Builder.CreateBr(AfterFreeBB);
 
 // We should have captured coro.free from the emission of deallocate.
 auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;
+CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = true;

ChuanqiXu wrote:
> It looks like it would emit a `deallocate` first, and emit an 
> `alignedDeallocate`, which is very odd. Although I can find that the second 
> `deallocate` wouldn't be emitted due to the check 
> `LastCoroFreeUsedForDealloc`, it is still very odd to me. If the second 
> `deallocate` wouldn't come up all the way, what's the reason we need to write 
> `emit(deallocate)` twice?
Agree that `LastCoroFreeUsedForDealloc` is a bit confusing. It makes sure 
deallocation and aligned deallocation share one `coro.free`. Otherwise, AFAIK, 
there would be two `coro.free` get codegen'd.
```
%mem = llvm.coro.free()
br i1  , label , label 

aligend-dealloc:
use %mem

dealloc:
use %mem
```


> what's the reason we need to write emit(deallocate) twice?
John wrote a code snippet here: https://reviews.llvm.org/D100739#2717582. I 
think it would be helpful to look at the changed tests below to see the 
patterns.

Basically, for allocation, it looks like below; for deallocation, it would be 
similar.
```
void *rawFrame =nullptr;
...
if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN) {
#if 
size += align - NEW_ALIGN + sizeof(void*);
frame = operator new(size);
rawFrame = frame;
frame = (frame + align - 1) & ~(align - 1);
#else
// If an aligned allocation function is selected.
frame = operator new(size, align);
#endif
  } else {
frame = operator new(size);
  }
}
```
The true branch of the #if directive is equivalent to "coro.alloc.align" block 
(and "coro.alloc.align2" if `get_return_object_on_allocation_failure` is 
defined), the false branch is equivalent to "coro.alloc" block.
The above pattern handles both aligned/normal allocation/deallocation so it is 
independent of D102147.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:700
+  auto *AlignedAllocateCall = EmitScalarExpr(S.getAlignedAllocate());
+  bool HasAlignArg = hasAlignArg(cast(AlignedAllocateCall));
+

ChuanqiXu wrote:
> Since `hasAlignArg` is called only once, I suggested to make it a lambda here 
> which would make the code more easy to read.
will do



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:702-704
+  if (!HasAlignArg)
+overAllocateFrame(*this, cast(AlignedAllocateCall),
+  /*IsAlloc*/ true);

ChuanqiXu wrote:
> I recommend to add a detailed comment here to tell the story why we need to 
> over allocate the frame. It is really hard to understand for people who are 
> new to this code. Otherwise, I think they need to use `git blame` to find the 
> commit id and this review page to figure the reasons out.
will do.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:878
+if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc)
+  return RValue::get(CurCoro.Data->LastCoroFree);
+

ChuanqiXu wrote:
> Is it possible that it would return a nullptr value?
Not that I know of. Because there is an early return 
```
if (!CoroFree) {
  CGF.CGM.Error(Deallocate->getBeginLoc(),
"Deallocation expressoin does not refer to coro.free");
  return;
}
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-12 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added inline comments.



Comment at: clang/include/clang/AST/StmtCXX.h:356-359
 Expr *Allocate = nullptr;
 Expr *Deallocate = nullptr;
+Expr *AlignedAllocate = nullptr;
+Expr *AlignedDeallocate = nullptr;

Can't we merge these?



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:445-483
+void emitDynamicAlignedDealloc(CodeGenFunction ,
+   llvm::BasicBlock *AlignedFreeBB,
+   llvm::CallInst *CoroFree) {
+  llvm::CallInst *Dealloc = nullptr;
+  for (llvm::User *U : CoroFree->users()) {
+if (auto *CI = dyn_cast(U))
+  if (CI->getParent() == CGF.Builder.GetInsertBlock())

This code would only work if we use `::operator new(size_t, align_val_t)`, 
which is implemented in another patch. I would suggest to move this into that 
one.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:528-547
 CGF.EmitBlock(FreeBB);
 CGF.EmitStmt(Deallocate);
-
-auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");
-CGF.EmitBlock(AfterFreeBB);
+CGF.Builder.CreateBr(AfterFreeBB);
 
 // We should have captured coro.free from the emission of deallocate.
 auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;
+CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = true;

It looks like it would emit a `deallocate` first, and emit an 
`alignedDeallocate`, which is very odd. Although I can find that the second 
`deallocate` wouldn't be emitted due to the check `LastCoroFreeUsedForDealloc`, 
it is still very odd to me. If the second `deallocate` wouldn't come up all the 
way, what's the reason we need to write `emit(deallocate)` twice?



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:700
+  auto *AlignedAllocateCall = EmitScalarExpr(S.getAlignedAllocate());
+  bool HasAlignArg = hasAlignArg(cast(AlignedAllocateCall));
+

Since `hasAlignArg` is called only once, I suggested to make it a lambda here 
which would make the code more easy to read.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:702-704
+  if (!HasAlignArg)
+overAllocateFrame(*this, cast(AlignedAllocateCall),
+  /*IsAlloc*/ true);

I recommend to add a detailed comment here to tell the story why we need to 
over allocate the frame. It is really hard to understand for people who are new 
to this code. Otherwise, I think they need to use `git blame` to find the 
commit id and this review page to figure the reasons out.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:706-728
+  if (auto *RetOnAllocFailure = S.getReturnStmtOnAllocFailure()) {
+auto *Cond = Builder.CreateICmpNE(AlignedAllocateCall, NullPtr);
+if (HasAlignArg) {
+  Builder.CreateCondBr(Cond, InitBB, RetOnFailureBB);
+} else {
+  AlignAllocBB2 = createBasicBlock("coro.alloc.align2");
+  Builder.CreateCondBr(Cond, AlignAllocBB2, RetOnFailureBB);

It may be better to organize it as:
```
if (!HasAlignArg) {
   if (auto *RetOnAllocFailure = S.getReturnStmtOnAllocFailure()) {
   auto *Cond = Builder.CreateICmpNE(AlignedAllocateCall, NullPtr);
   AlignAllocBB2 = createBasicBlock("coro.alloc.align2");
   Builder.CreateCondBr(Cond, AlignAllocBB2, RetOnFailureBB);
   EmitBlock(AlignAllocBB2);
   }
   auto *CoroAlign = Builder.CreateCall(
CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
   ...
}
```



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:878
+if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc)
+  return RValue::get(CurCoro.Data->LastCoroFree);
+

Is it possible that it would return a nullptr value?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-11 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2747142 , @ChuanqiXu wrote:

>> which should be correct. It is implemented by 
>> CodeGenFunction::EmitBuiltinAlignTo.
>
> I agree it is correct. I just want to say we should comment it to avoid 
> confusing.

Happy to do it in a separate patch since this patch does not change the 
implementation of `CodeGenFunction::EmitBuiltinAlignTo`.

> Since the patch could handle the case if the frontend tries to search 
> `::operator new(size_t, align_val_t)`, this patch should be based on D102147 
> .

This patch *could* handle both aligned and normal new/delete, so it doesn't 
need D102147  to work correctly?
D102147  depends on this patch since it may 
find a non-aligned new/delete for overaligned frame. In such a case, this patch 
is required.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-11 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 344602.
ychen added a comment.

- Rebase on updated D102145  (use 
`llvm.coro.raw.frame.ptr.addr` during allocation)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/include/clang/AST/StmtCXX.h
  clang/lib/AST/StmtCXX.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/lib/Sema/SemaCoroutine.cpp
  clang/lib/Sema/TreeTransform.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp

Index: clang/test/CodeGenCoroutines/coro-gro.cpp
===
--- clang/test/CodeGenCoroutines/coro-gro.cpp
+++ clang/test/CodeGenCoroutines/coro-gro.cpp
@@ -68,6 +68,7 @@
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeD1Ev(
   // CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(
   // CHECK: call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: call void @_ZdlPv(i8* %{{.*}})
 
   // Initialize retval from Gro and destroy Gro
 
Index: clang/test/CodeGenCoroutines/coro-cleanup.cpp
===
--- clang/test/CodeGenCoroutines/coro-cleanup.cpp
+++ clang/test/CodeGenCoroutines/coro-cleanup.cpp
@@ -78,12 +78,46 @@
 
   // CHECK: [[Cleanup]]:
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_typeD1Ev(
-  // CHECK: %[[Mem0:.+]] = call i8* @llvm.coro.free(
-  // CHECK: call void @_ZdlPv(i8* %[[Mem0]]
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   // CHECK: [[Dealloc]]:
-  // CHECK:   %[[Mem:.+]] = call i8* @llvm.coro.free(
-  // CHECK:   call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   co_return;
 }
Index: clang/test/CodeGenCoroutines/coro-alloc.cpp
===
--- clang/test/CodeGenCoroutines/coro-alloc.cpp
+++ clang/test/CodeGenCoroutines/coro-alloc.cpp
@@ -57,24 +57,55 @@
 extern "C" void f0(global_new_delete_tag) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
   // CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])
-  // CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]
+  // CHECK: br i1 %[[NeedAlloc]], label %[[CheckAlignBB:.+]], label %[[InitBB:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
+  // CHECK: br i1 %[[CMP]], label %[[AlignAllocBB:.+]], label %[[AllocBB:.+]]
 
   // CHECK: [[AllocBB]]:
+  // CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK-NEXT: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK-NEXT: br label %[[InitBB:.+]]
+
+  // CHECK: [[AlignAllocBB]]:
   // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[PAD:.+]] = sub nsw i64 %[[ALIGN]], 16
+  // CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
+  // CHECK:

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-10 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

> which should be correct. It is implemented by 
> CodeGenFunction::EmitBuiltinAlignTo.

I agree it is correct. I just want to say we should comment it to avoid 
confusing.

Since the patch could handle the case if the frontend tries to search 
`::operator new(size_t, align_val_t)`, this patch should be based on D102147 
.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-09 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 343959.
ychen added a comment.

- Rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/include/clang/AST/StmtCXX.h
  clang/lib/AST/StmtCXX.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/lib/Sema/SemaCoroutine.cpp
  clang/lib/Sema/TreeTransform.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp

Index: clang/test/CodeGenCoroutines/coro-gro.cpp
===
--- clang/test/CodeGenCoroutines/coro-gro.cpp
+++ clang/test/CodeGenCoroutines/coro-gro.cpp
@@ -68,6 +68,7 @@
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeD1Ev(
   // CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(
   // CHECK: call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: call void @_ZdlPv(i8* %{{.*}})
 
   // Initialize retval from Gro and destroy Gro
 
Index: clang/test/CodeGenCoroutines/coro-cleanup.cpp
===
--- clang/test/CodeGenCoroutines/coro-cleanup.cpp
+++ clang/test/CodeGenCoroutines/coro-cleanup.cpp
@@ -78,12 +78,46 @@
 
   // CHECK: [[Cleanup]]:
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_typeD1Ev(
-  // CHECK: %[[Mem0:.+]] = call i8* @llvm.coro.free(
-  // CHECK: call void @_ZdlPv(i8* %[[Mem0]]
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   // CHECK: [[Dealloc]]:
-  // CHECK:   %[[Mem:.+]] = call i8* @llvm.coro.free(
-  // CHECK:   call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   co_return;
 }
Index: clang/test/CodeGenCoroutines/coro-alloc.cpp
===
--- clang/test/CodeGenCoroutines/coro-alloc.cpp
+++ clang/test/CodeGenCoroutines/coro-alloc.cpp
@@ -57,24 +57,57 @@
 extern "C" void f0(global_new_delete_tag) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
   // CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])
-  // CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]
+  // CHECK: br i1 %[[NeedAlloc]], label %[[CheckAlignBB:.+]], label %[[InitBB:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
+  // CHECK: br i1 %[[CMP]], label %[[AlignAllocBB:.+]], label %[[AllocBB:.+]]
 
   // CHECK: [[AllocBB]]:
+  // CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK-NEXT: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK-NEXT: br label %[[InitBB:.+]]
+
+  // CHECK: [[AlignAllocBB]]:
   // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[PAD:.+]] = sub nsw i64 %[[ALIGN]], 16
+  // CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
+  // CHECK: %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
+  // CHECK: %[[ALIGN2:.+]] = call i64

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-09 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 343956.
ychen added a comment.

- rebase on D102145 .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp

Index: clang/test/CodeGenCoroutines/coro-gro.cpp
===
--- clang/test/CodeGenCoroutines/coro-gro.cpp
+++ clang/test/CodeGenCoroutines/coro-gro.cpp
@@ -68,6 +68,7 @@
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeD1Ev(
   // CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(
   // CHECK: call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: call void @_ZdlPv(i8* %{{.*}})
 
   // Initialize retval from Gro and destroy Gro
 
Index: clang/test/CodeGenCoroutines/coro-cleanup.cpp
===
--- clang/test/CodeGenCoroutines/coro-cleanup.cpp
+++ clang/test/CodeGenCoroutines/coro-cleanup.cpp
@@ -78,12 +78,46 @@
 
   // CHECK: [[Cleanup]]:
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_typeD1Ev(
-  // CHECK: %[[Mem0:.+]] = call i8* @llvm.coro.free(
-  // CHECK: call void @_ZdlPv(i8* %[[Mem0]]
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   // CHECK: [[Dealloc]]:
-  // CHECK:   %[[Mem:.+]] = call i8* @llvm.coro.free(
-  // CHECK:   call void @_ZdlPv(i8* %[[Mem]])
+  // CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
+  // CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
+  // CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]
+
+  // CHECK: [[FreeBB]]:
+  // CHECK: call void @_ZdlPv(i8* %[[MEM0]]
+  // CHECK: br label %[[Afterwards]]
+
+  // CHECK: [[AlignedFreeBB]]:
+  // CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
+  // CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
+  // CHECK-NEXT: br label %[[Afterwards]]
 
   co_return;
 }
Index: clang/test/CodeGenCoroutines/coro-alloc.cpp
===
--- clang/test/CodeGenCoroutines/coro-alloc.cpp
+++ clang/test/CodeGenCoroutines/coro-alloc.cpp
@@ -57,24 +57,57 @@
 extern "C" void f0(global_new_delete_tag) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
   // CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])
-  // CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]
+  // CHECK: br i1 %[[NeedAlloc]], label %[[CheckAlignBB:.+]], label %[[InitBB:.+]]
+
+  // CHECK: [[CheckAlignBB]]:
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
+  // CHECK: br i1 %[[CMP]], label %[[AlignAllocBB:.+]], label %[[AllocBB:.+]]
 
   // CHECK: [[AllocBB]]:
+  // CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK-NEXT: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK-NEXT: br label %[[InitBB:.+]]
+
+  // CHECK: [[AlignAllocBB]]:
   // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
+  // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[PAD:.+]] = sub nsw i64 %[[ALIGN]], 16
+  // CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
+  // CHECK: %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
+  // CHECK: %[[ALIGN2:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK: %[[ALIGNED:.+]] = getelementptr inbounds i8, i8* %[[MEM2]],

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-05 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen planned changes to this revision.
ychen added a comment.

Plan to rebase this together with the following patch for two lookups (aligned 
and non-aligned new/delete, and generate code accordingly)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-05-03 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2728377 , @ChuanqiXu wrote:

> This code snippets confused me before:
>
>   coro.alloc.align: ; preds = 
> %coro.check.align
> %mask = sub i64 %11, 1
> %intptr = ptrtoint i8* %call to i64
> %over_boundary = add i64 %intptr, %mask
> %inverted_mask = xor i64 %mask, -1
> %aligned_intptr = and i64 %over_boundary, %inverted_mask
> %diff = sub i64 %aligned_intptr, %intptr
> %aligned_result = getelementptr inbounds i8, i8* %call, i64 %diff
>
> This code implies that `%diff > 0`. Formally, given `Align = 2^m, m > 4` and 
> `Address=16n`, we need to prove that:
>
>   (Address + Align -16)&(~(Align-1)) >= Address
>
> `&(~Align-1)` would make the lowest `m` bit to 0. And `Align-16` equals to 
> `2^m - 16`, which is `16*(2^(m-4)-1)`. Then `Address + Align -16` could be 
> `16*(n+2^(m-4)-1)`.
> Then we call `X` for the value of the lowest `m` bit of  `Address + Align 
> -16`. 
> Because `X` has `m` bit, so `X <= 2^m - 1`. Noticed that `X` should be 16 
> aligned, so the lowest 4 bit should be zero.
> Now,
>
>   X <= 2^m - 1 -1 - 2 - 4 - 8 = 2^m - 16
>
> So the inequality we need prove now should be:
>
>   16*(n+2^(m-4)-1) - X >= 16n
>
> Given X has the largest value wouldn't affect the inequality, so:
>
>   16*(n+2^(m-4)-1) - 2^m + 16 >= 16n
>
> which is very easy now.
>
> The overall prove looks non-travel to me. I spent some time to figure it out. 
> I guess there must be some other people who can't get it immediately. I 
> strongly recommend to add comment and corresponding prove for this code.

The code is equivalent to

  (Address + Align -1)&(~(Align-1)) >= Address

which should be correct. It is implemented by 
`CodeGenFunction::EmitBuiltinAlignTo`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-30 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

This code snippets confused me before:

  coro.alloc.align: ; preds = %coro.check.align
%mask = sub i64 %11, 1
%intptr = ptrtoint i8* %call to i64
%over_boundary = add i64 %intptr, %mask
%inverted_mask = xor i64 %mask, -1
%aligned_intptr = and i64 %over_boundary, %inverted_mask
%diff = sub i64 %aligned_intptr, %intptr
%aligned_result = getelementptr inbounds i8, i8* %call, i64 %diff

This code implies that `%diff > 0`. Formally, given `Align = 2^m, m > 4` and 
`Address=16n`, we need to prove that:

  (Address + Align -16)&(~(Align-1)) >= Address

`&(~Align-1)` would make the lowest `m` bit to 0. And `Align-16` equals to `2^m 
- 16`, which is `16*(2^(m-4)-1)`. Then `Address + Align -16` could be 
`16*(n+2^(m-4)-1)`.
Then we call `X` for the value of the lowest `m` bit of  `Address + Align -16`. 
Because `X` has `m` bit, so `X <= 2^m - 1`. Noticed that `X` should be 16 
aligned, so the lowest 4 bit should be zero.
Now,

  X <= 2^m - 1 -1 - 2 - 4 - 8 = 2^m - 16

So the inequality we need prove now should be:

  16*(n+2^(m-4)-1) - X >= 16n

Given X has the largest value wouldn't affect the inequality, so:

  16*(n+2^(m-4)-1) - 2^m + 16 >= 16n

which is very easy now.

The overall prove looks non-travel to me. I spent some time to figure it out. I 
guess there must be some other people who can't get it immediately. I strongly 
recommend to add comment and corresponding prove for this code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-30 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

In D97915#2727787 , @ychen wrote:

> In D97915#2727759 , @ChuanqiXu wrote:
>
>> May I ask a question may be too simple? What if the user specify the 
>> alignment for promise (or any other local variables) to 128 or even 256? 
>> Since it looks like all the discuss before assumes that the largest 
>> alignment requirement is 64.
>
> 64 is one example. Bitwise operations (`coro.alloc.align` block in the 
> attached example) should handle all valid alignment numbers.

Thanks for the example. And I recommended to add comment for the corresponding 
code. The code for bit-operation and the example confused me. I would look into 
this and the other part later.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 341756.
ychen added a comment.

- Add missed `Shape.CoroRawFramePtrOffsets.clear();`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
@@ -62,7 +62,7 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, [14 x i8], %struct.big_structure }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, i8*, %struct.big_structure }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 5
Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
@@ -62,10 +62,10 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, [26 x i8], %struct.big_structure.2 }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, i8*, [16 x i8], %struct.big_structure.2 }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
-; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 6
+; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 7
 
 declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*)
 declare i1 @llvm.coro.alloc(token) #3
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,8 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
+  Shape.CoroRawFramePtrOffsets.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +270,12 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
 break;
@@ -375,6 +383,7 @@
 this->SwitchLowering.ResumeSwitch = nullptr;
 this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();
 this->SwitchLowering.ResumeEntryBlock = nullptr;
+this->SwitchLowering.FramePtrOffset = 0;
 
 for (auto AnySuspend : CoroSuspends) {
   auto Suspend = dyn_cast(AnySuspend);
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -997,23 +997,44 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape ) {
+static void replaceFrameSizeAndAlign(coro::Shape ) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.CoroSizes.empty())
-return;
+  if (!Shape.CoroSizes.empty()) {
+// In the same function all coro.sizes should have the same result type.
+auto *SizeIntrin = Shape.CoroSizes.back();
+Module *M = SizeIntrin->getModule();
+const DataLayout  = M->getDataLayout();
+auto Size = DL.getTypeAllocSize(Shape.FrameTy);
+auto *SizeConstant =

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2727759 , @ChuanqiXu wrote:

> May I ask a question may be too simple? What if the user specify the 
> alignment for promise (or any other local variables) to 128 or even 256? 
> Since it looks like all the discuss before assumes that the largest alignment 
> requirement is 64.

64 is one example. Bitwise operations (`coro.alloc.align` block in the attached 
example) should handle all valid alignment numbers.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

May I ask a question may be too simple? What if the user specify the alignment 
for promise (or any other local variables) to 128 or even 256? Since it looks 
like all the discuss before assumes that the largest alignment requirement is 
64.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

For coroutine `f0` in `test/CodeGenCoroutines/coro-alloc.cpp`

The allocation looks like this:

  ; Function Attrs: noinline nounwind optnone mustprogress
  define dso_local void @f0() #0 {
  entry:
%0 = alloca %struct.global_new_delete_tag, align 1
%1 = alloca %struct.global_new_delete_tag, align 1
%__promise = alloca %"struct.std::experimental::coroutine_traits::promise_type", align 1
%ref.tmp = alloca %struct.suspend_always, align 1
%undef.agg.tmp = alloca %struct.suspend_always, align 1
%agg.tmp = alloca %"struct.std::experimental::coroutine_handle", align 1
%agg.tmp2 = alloca %"struct.std::experimental::coroutine_handle.0", align 1
%undef.agg.tmp3 = alloca %"struct.std::experimental::coroutine_handle.0", 
align 1
%ref.tmp4 = alloca %struct.suspend_always, align 1
%undef.agg.tmp5 = alloca %struct.suspend_always, align 1
%agg.tmp7 = alloca %"struct.std::experimental::coroutine_handle", align 1
%agg.tmp8 = alloca %"struct.std::experimental::coroutine_handle.0", align 1
%undef.agg.tmp9 = alloca %"struct.std::experimental::coroutine_handle.0", 
align 1
%2 = bitcast %"struct.std::experimental::coroutine_traits::promise_type"* %__promise to i8*
%3 = call token @llvm.coro.id(i32 16, i8* %2, i8* null, i8* null)
%4 = call i1 @llvm.coro.alloc(token %3)
br i1 %4, label %coro.alloc, label %coro.init
  
  coro.alloc:   ; preds = %entry
%5 = call i64 @llvm.coro.size.i64()
%6 = call i64 @llvm.coro.align.i64()
%7 = sub nsw i64 %6, 16
%8 = icmp sgt i64 %7, 0
%9 = select i1 %8, i64 %7, i64 0
%10 = add i64 %5, %9
%call = call noalias nonnull i8* @_Znwm(i64 %10) #11
br label %coro.check.align
  
  coro.check.align: ; preds = %coro.alloc
%11 = call i64 @llvm.coro.align.i64()
%12 = icmp ugt i64 %11, 16
br i1 %12, label %coro.alloc.align, label %coro.init
  
  coro.alloc.align: ; preds = %coro.check.align
%mask = sub i64 %11, 1
%intptr = ptrtoint i8* %call to i64
%over_boundary = add i64 %intptr, %mask
%inverted_mask = xor i64 %mask, -1
%aligned_intptr = and i64 %over_boundary, %inverted_mask
%diff = sub i64 %aligned_intptr, %intptr
%aligned_result = getelementptr inbounds i8, i8* %call, i64 %diff
call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 %11) ]
%13 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
%14 = getelementptr inbounds i8, i8* %aligned_result, i32 %13
%15 = bitcast i8* %14 to i8**
store i8* %call, i8** %15, align 8
br label %coro.init
  
  coro.init:; preds = 
%coro.alloc.align, %coro.check.align, %entry
%16 = phi i8* [ null, %entry ], [ %call, %coro.check.align ], [ 
%aligned_result, %coro.alloc.align ]
%17 = call i8* @llvm.coro.begin(token %3, i8* %16)
call void 
@_ZNSt12experimental16coroutine_traitsIJv21global_new_delete_tagEE12promise_type17get_return_objectEv(%"struct.std::experimental::coroutine_traits::promise_type"* nonnull dereferenceable(1) %__promise)
call void 
@_ZNSt12experimental16coroutine_traitsIJv21global_new_delete_tagEE12promise_type15initial_suspendEv(%"struct.std::experimental::coroutine_traits::promise_type"* nonnull dereferenceable(1) %__promise)
%call1 = call zeroext i1 
@_ZN14suspend_always11await_readyEv(%struct.suspend_always* nonnull 
dereferenceable(1) %ref.tmp) #2
br i1 %call1, label %init.ready, label %init.suspend

The deallocation looks like this:

  cleanup:  ; preds = %final.ready, 
%final.cleanup, %init.cleanup
%cleanup.dest.slot.0 = phi i32 [ 0, %final.ready ], [ 2, %final.cleanup ], 
[ 2, %init.cleanup ]
%22 = call i8* @llvm.coro.free(token %3, i8* %17)
%23 = icmp ne i8* %22, null
br i1 %23, label %coro.free, label %after.coro.free
  
  coro.free:; preds = %cleanup
%24 = call i64 @llvm.coro.align.i64()
%25 = icmp ugt i64 %24, 16
%26 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
%27 = getelementptr inbounds i8, i8* %22, i32 %26
%28 = bitcast i8* %27 to i8**
%29 = load i8*, i8** %28, align 8
%30 = select i1 %25, i8* %29, i8* %22
call void @_ZdlPv(i8* %30) #2
br label %after.coro.free
  
  after.coro.free:  ; preds = %cleanup, 
%coro.free
switch i32 %cleanup.dest.slot.0, label %unreachable [
  i32 0, label %cleanup.cont
  i32 2, label %coro.ret
]
  
  cleanup.cont: ; preds = %after.coro.free
br label %coro.ret
  
  coro.ret: ; preds = %cleanup.cont, 
%after.coro.free, %final.suspend, %init.suspend
%31 = call i1 @llvm.coro.end(i8* null, i1 false)
ret void
  
  unreachable:  ; preds = %after.coro.free

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 341592.
ychen added a comment.

- fix typo


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
@@ -62,7 +62,7 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, [14 x i8], %struct.big_structure }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, i8*, %struct.big_structure }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 5
Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
@@ -62,10 +62,10 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, [26 x i8], %struct.big_structure.2 }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, i8*, [16 x i8], %struct.big_structure.2 }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
-; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 6
+; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 7
 
 declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*)
 declare i1 @llvm.coro.alloc(token) #3
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,7 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +269,12 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
 break;
@@ -375,6 +382,7 @@
 this->SwitchLowering.ResumeSwitch = nullptr;
 this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();
 this->SwitchLowering.ResumeEntryBlock = nullptr;
+this->SwitchLowering.FramePtrOffset = 0;
 
 for (auto AnySuspend : CoroSuspends) {
   auto Suspend = dyn_cast(AnySuspend);
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -997,23 +997,44 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape ) {
+static void replaceFrameSizeAndAlign(coro::Shape ) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.CoroSizes.empty())
-return;
+  if (!Shape.CoroSizes.empty()) {
+// In the same function all coro.sizes should have the same result type.
+auto *SizeIntrin = Shape.CoroSizes.back();
+Module *M = SizeIntrin->getModule();
+const DataLayout  = M->getDataLayout();
+auto Size = DL.getTypeAllocSize(Shape.FrameTy);
+auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+
+for (CoroSizeInst *CS : Shape.CoroSizes)

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 341591.
ychen added a comment.

- fix a bug.

  ready for review.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
@@ -62,7 +62,7 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, [14 x i8], %struct.big_structure }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, i8*, %struct.big_structure }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 5
Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
@@ -62,10 +62,10 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, [26 x i8], %struct.big_structure.2 }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, i8*, [16 x i8], %struct.big_structure.2 }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
-; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 6
+; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 7
 
 declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*)
 declare i1 @llvm.coro.alloc(token) #3
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,7 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +269,12 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
 break;
@@ -375,6 +382,7 @@
 this->SwitchLowering.ResumeSwitch = nullptr;
 this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();
 this->SwitchLowering.ResumeEntryBlock = nullptr;
+this->SwitchLowering.FramePtrOffset = 0;
 
 for (auto AnySuspend : CoroSuspends) {
   auto Suspend = dyn_cast(AnySuspend);
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -997,23 +997,44 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape ) {
+static void replaceFrameSizeAndAlign(coro::Shape ) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.CoroSizes.empty())
-return;
+  if (!Shape.CoroSizes.empty()) {
+// In the same function all coro.sizes should have the same result type.
+auto *SizeIntrin = Shape.CoroSizes.back();
+Module *M = SizeIntrin->getModule();
+const DataLayout  = M->getDataLayout();
+auto Size = DL.getTypeAllocSize(Shape.FrameTy);
+auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+
+for (CoroSizeInst

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen planned changes to this revision.
ychen added a comment.

Found a bug. Will fix.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

@rjmccall  the patch is on the large side. I'll submit a separate patch for the 
Sema part about searching for two allocators.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 341418.
ychen added a comment.

- Handle deallocation.
- Fix tests.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll

Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
@@ -62,7 +62,7 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, [14 x i8], %struct.big_structure }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", i1, i8*, %struct.big_structure }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 5
Index: llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
===
--- llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
+++ llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
@@ -62,10 +62,10 @@
   call i1 @llvm.coro.end(i8* null, i1 false)
   ret void
 }
-; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, [26 x i8], %struct.big_structure.2 }
+; CHECK:   %a.Frame = type { void (%a.Frame*)*, void (%a.Frame*)*, %"struct.task::promise_type", %struct.big_structure, i1, i8*, [16 x i8], %struct.big_structure.2 }
 ; CHECK-LABEL: @a.resume(
 ; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 3
-; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 6
+; CHECK: %[[A:.*]] = getelementptr inbounds %a.Frame, %a.Frame* %FramePtr, i32 0, i32 7
 
 declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*)
 declare i1 @llvm.coro.alloc(token) #3
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,7 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +269,12 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
+  case Intrinsic::coro_raw_frame_ptr_offset:
+CoroRawFramePtrOffsets.push_back(cast(II));
+break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
 break;
@@ -375,6 +382,7 @@
 this->SwitchLowering.ResumeSwitch = nullptr;
 this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();
 this->SwitchLowering.ResumeEntryBlock = nullptr;
+this->SwitchLowering.FramePtrOffset = 0;
 
 for (auto AnySuspend : CoroSuspends) {
   auto Suspend = dyn_cast(AnySuspend);
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -997,23 +997,44 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape ) {
+static void replaceFrameSizeAndAlign(coro::Shape ) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.CoroSizes.empty())
-return;
+  if (!Shape.CoroSizes.empty()) {
+// In the same function all coro.sizes should have the same result type.
+auto *SizeIntrin = Shape.CoroSizes.back();
+Module *M = SizeIntrin->getModule();
+const DataLayout  = M->getDataLayout();
+auto Size = DL.getTypeAllocSize(Shape.FrameTy);
+auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+
+for

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-04-23 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen abandoned this revision.
ychen added a comment.

Pursue  D100739  instead.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-17 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2632493 , @lxfind wrote:

> I am not sure how this would work, maybe I am missing something.
> But this patch tries to round up the frame pointer by looking at the 
> difference between the alignment of new and the alignment of the frame.
> The alignment of new only gives you the guaranteed alignment for new, but not 
> necessarily the maximum alignment, e.g. if the alignment of new is 16, the 
> returned pointer can still be a multiple 32. And that difference matters.
>
> Let's consider a frame that only has the two pointers and a promise with 
> alignment requirement of 64. The alignment of new is 16.
> Now you will calculate the difference to be 48, and create a padding of 48 
> before the frame:
> But if the returned pointer from new is actually a multiple of 32 (but not 
> 64), the frame will no longer be aligned to 64 (but (32 + 48) % 64 = 16).

48 is the maximal possible adjustment needed. For this particular case, 
`EmitBuiltinAlignTo` would make the real adjustment 32 since (32 + 32) % 64 == 
0.

> So from what I can tell, if we cannot pass alignment to new, we need to look 
> at the address returned by new dynamically to decide the padding.

Indeed, that's what `EmitBuiltinAlignTo` is for.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-17 Thread Xun Li via Phabricator via cfe-commits

lxfind added a comment.

I am not sure how this would work, maybe I am missing something.
But this patch tries to round up the frame pointer by looking at the difference 
between the alignment of new and the alignment of the frame.
The alignment of new only gives you the guaranteed alignment for new, but not 
necessarily the maximum alignment, e.g. if the alignment of new is 16, the 
returned pointer can still be a multiple 32. And that difference matters.

Let's consider a frame that only has the two pointers and a promise with 
alignment requirement of 64. The alignment of new is 16.
Now you will calculate the difference to be 48, and create a padding of 48 
before the frame:
But if the returned pointer from new is actually a multiple of 32 (but not 64), 
the frame will no longer be aligned to 64 (but (32 + 48) % 64 = 16).
So from what I can tell, if we cannot pass alignment to new, we need to look at 
the address returned by new dynamically to decide the padding.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-07 Thread Lewis Baker via Phabricator via cfe-commits

lewissbaker added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

rjmccall wrote:
> lewissbaker wrote:
> > ychen wrote:
> > > rjmccall wrote:
> > > > ychen wrote:
> > > > > ychen wrote:
> > > > > > rjmccall wrote:
> > > > > > > Okay, so you're implicitly increasing the coroutine size to allow 
> > > > > > > you to round up to get an aligned frame.  But how do you round 
> > > > > > > back down to get the actual pointer that you need to delete?  
> > > > > > > This just doesn't work.
> > > > > > > 
> > > > > > > You really ought to just be using the aligned `operator new` 
> > > > > > > instead when the required alignment is too high.  If that means 
> > > > > > > checking the alignment "dynamically" before calling `operator 
> > > > > > > new` / `operator delete`, so be it.  In practice, it will not be 
> > > > > > > dynamic because lowering will replace the `coro.align` call with 
> > > > > > > a constant, at which point the branch will be foldable.
> > > > > > > 
> > > > > > > I don't know what to suggest if the aligned `operator new` isn't 
> > > > > > > reliably available on the target OS.  You could outline a 
> > > > > > > function to pick the best allocator/deallocator, I suppose.
> > > > > > Thanks for the review!
> > > > > > 
> > > > > > > Okay, so you're implicitly increasing the coroutine size to allow 
> > > > > > > you to round up to get an aligned frame. But how do you round 
> > > > > > > back down to get the actual pointer that you need to delete? This 
> > > > > > > just doesn't work.
> > > > > > 
> > > > > > Hmm, you're right that I missed the `delete` part, thanks. The 
> > > > > > adjusted amount is constant, I could just dial it down in 
> > > > > > `coro.free`, right?
> > > > > > 
> > > > > > >  You really ought to just be using the aligned operator new 
> > > > > > > instead when the required alignment is too high. If that means 
> > > > > > > checking the alignment "dynamically" before calling operator new 
> > > > > > > / operator delete, so be it. In practice, it will not be dynamic 
> > > > > > > because lowering will replace the coro.align call with a 
> > > > > > > constant, at which point the branch will be foldable.
> > > > > >  
> > > > > > That's my intuition at first. But spec is written in a way 
> > > > > > suggesting (IMHO) that the aligned version should not be used? What 
> > > > > > if the user specify their own allocator, then which one they should 
> > > > > > use?
> > > > > Sorry, I meant the adjusted amount is coro.align - std::max_align_t,  
> > > > > I could subtract it in `coro.free` . I think it should work?
> > > > No, because the adjustment you have to do in `coro.alloc` isn't just an 
> > > > addition, it's an addition plus a mask, which isn't reversible.  
> > > > Suppose the frame needs to be 32-byte-aligned, but the allocator only 
> > > > promises 8-byte alignment.  The problem is that when you go to free a 
> > > > frame pointer, and you see that it's 32-byte-aligned (which, again, it 
> > > > always will be), the pointer you got from the allocator might be the 
> > > > frame pointer minus any of 8, 16, or 24 — or it might be exactly the 
> > > > same.  The only way to reverse that is to store some sort of cookie, 
> > > > either the amount to subtract or even just the original pointer.
> > > > 
> > > > Now, if you could change the entire coroutine ABI, you could make the 
> > > > frame handle that you pass around be the unadjusted pointer and then 
> > > > just repeat the adjustment every time you enter the coroutine.  But 
> > > > that doesn't work because the ABI relies on things like the promise 
> > > > being at a reliable offset from the frame handle.
> > > > 
> > > > I think the best solution would be to figure out a way to use an 
> > > > aligned allocator, which at worst does this in a more systematic way 
> > > > and at best can actually just satisfy your requirement directly without 
> > > > any overhead.  If you can't do that, adding an offset to the frame 
> > > > would be best; if you can't do that, doing it as a cookie is okay.
> > > > 
> > > > > That's my intuition at first. But spec is written in a way suggesting 
> > > > > (IMHO) that the aligned version should not be used? What if the user 
> > > > > specify their own allocator, then which one they should use?
> > > > 
> > > > It seems like a spec bug that this doesn't use aligned allocators even 
> > > > when they're available.  If there's an aligned allocator available, I 
> > > > think this should essentially do dynamically what it would normally do 
> > > > statically, i.e.:
> > > > 
> > > > ```
> > > > void *allocation = alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__ ? 
> > > > operator new(size, align_val_t(alignment)) : operator new(size);
> > > > ```
> > > > 
> > > > This would ODR-use both

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-05 Thread John McCall via Phabricator via cfe-commits

rjmccall added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

lewissbaker wrote:
> ychen wrote:
> > rjmccall wrote:
> > > ychen wrote:
> > > > ychen wrote:
> > > > > rjmccall wrote:
> > > > > > Okay, so you're implicitly increasing the coroutine size to allow 
> > > > > > you to round up to get an aligned frame.  But how do you round back 
> > > > > > down to get the actual pointer that you need to delete?  This just 
> > > > > > doesn't work.
> > > > > > 
> > > > > > You really ought to just be using the aligned `operator new` 
> > > > > > instead when the required alignment is too high.  If that means 
> > > > > > checking the alignment "dynamically" before calling `operator new` 
> > > > > > / `operator delete`, so be it.  In practice, it will not be dynamic 
> > > > > > because lowering will replace the `coro.align` call with a 
> > > > > > constant, at which point the branch will be foldable.
> > > > > > 
> > > > > > I don't know what to suggest if the aligned `operator new` isn't 
> > > > > > reliably available on the target OS.  You could outline a function 
> > > > > > to pick the best allocator/deallocator, I suppose.
> > > > > Thanks for the review!
> > > > > 
> > > > > > Okay, so you're implicitly increasing the coroutine size to allow 
> > > > > > you to round up to get an aligned frame. But how do you round back 
> > > > > > down to get the actual pointer that you need to delete? This just 
> > > > > > doesn't work.
> > > > > 
> > > > > Hmm, you're right that I missed the `delete` part, thanks. The 
> > > > > adjusted amount is constant, I could just dial it down in 
> > > > > `coro.free`, right?
> > > > > 
> > > > > >  You really ought to just be using the aligned operator new instead 
> > > > > > when the required alignment is too high. If that means checking the 
> > > > > > alignment "dynamically" before calling operator new / operator 
> > > > > > delete, so be it. In practice, it will not be dynamic because 
> > > > > > lowering will replace the coro.align call with a constant, at which 
> > > > > > point the branch will be foldable.
> > > > >  
> > > > > That's my intuition at first. But spec is written in a way suggesting 
> > > > > (IMHO) that the aligned version should not be used? What if the user 
> > > > > specify their own allocator, then which one they should use?
> > > > Sorry, I meant the adjusted amount is coro.align - std::max_align_t,  I 
> > > > could subtract it in `coro.free` . I think it should work?
> > > No, because the adjustment you have to do in `coro.alloc` isn't just an 
> > > addition, it's an addition plus a mask, which isn't reversible.  Suppose 
> > > the frame needs to be 32-byte-aligned, but the allocator only promises 
> > > 8-byte alignment.  The problem is that when you go to free a frame 
> > > pointer, and you see that it's 32-byte-aligned (which, again, it always 
> > > will be), the pointer you got from the allocator might be the frame 
> > > pointer minus any of 8, 16, or 24 — or it might be exactly the same.  The 
> > > only way to reverse that is to store some sort of cookie, either the 
> > > amount to subtract or even just the original pointer.
> > > 
> > > Now, if you could change the entire coroutine ABI, you could make the 
> > > frame handle that you pass around be the unadjusted pointer and then just 
> > > repeat the adjustment every time you enter the coroutine.  But that 
> > > doesn't work because the ABI relies on things like the promise being at a 
> > > reliable offset from the frame handle.
> > > 
> > > I think the best solution would be to figure out a way to use an aligned 
> > > allocator, which at worst does this in a more systematic way and at best 
> > > can actually just satisfy your requirement directly without any overhead. 
> > >  If you can't do that, adding an offset to the frame would be best; if 
> > > you can't do that, doing it as a cookie is okay.
> > > 
> > > > That's my intuition at first. But spec is written in a way suggesting 
> > > > (IMHO) that the aligned version should not be used? What if the user 
> > > > specify their own allocator, then which one they should use?
> > > 
> > > It seems like a spec bug that this doesn't use aligned allocators even 
> > > when they're available.  If there's an aligned allocator available, I 
> > > think this should essentially do dynamically what it would normally do 
> > > statically, i.e.:
> > > 
> > > ```
> > > void *allocation = alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__ ? 
> > > operator new(size, align_val_t(alignment)) : operator new(size);
> > > ```
> > > 
> > > This would ODR-use both allocation functions, of course.
> > > 
> > > Maybe it's right to do this cookie thing if we can't rely on an aligned 
> > > allocator, like if the promise class provides only an `operator 
> > >

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-05 Thread Lewis Baker via Phabricator via cfe-commits

lewissbaker added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

ychen wrote:
> rjmccall wrote:
> > ychen wrote:
> > > ychen wrote:
> > > > rjmccall wrote:
> > > > > Okay, so you're implicitly increasing the coroutine size to allow you 
> > > > > to round up to get an aligned frame.  But how do you round back down 
> > > > > to get the actual pointer that you need to delete?  This just doesn't 
> > > > > work.
> > > > > 
> > > > > You really ought to just be using the aligned `operator new` instead 
> > > > > when the required alignment is too high.  If that means checking the 
> > > > > alignment "dynamically" before calling `operator new` / `operator 
> > > > > delete`, so be it.  In practice, it will not be dynamic because 
> > > > > lowering will replace the `coro.align` call with a constant, at which 
> > > > > point the branch will be foldable.
> > > > > 
> > > > > I don't know what to suggest if the aligned `operator new` isn't 
> > > > > reliably available on the target OS.  You could outline a function to 
> > > > > pick the best allocator/deallocator, I suppose.
> > > > Thanks for the review!
> > > > 
> > > > > Okay, so you're implicitly increasing the coroutine size to allow you 
> > > > > to round up to get an aligned frame. But how do you round back down 
> > > > > to get the actual pointer that you need to delete? This just doesn't 
> > > > > work.
> > > > 
> > > > Hmm, you're right that I missed the `delete` part, thanks. The adjusted 
> > > > amount is constant, I could just dial it down in `coro.free`, right?
> > > > 
> > > > >  You really ought to just be using the aligned operator new instead 
> > > > > when the required alignment is too high. If that means checking the 
> > > > > alignment "dynamically" before calling operator new / operator 
> > > > > delete, so be it. In practice, it will not be dynamic because 
> > > > > lowering will replace the coro.align call with a constant, at which 
> > > > > point the branch will be foldable.
> > > >  
> > > > That's my intuition at first. But spec is written in a way suggesting 
> > > > (IMHO) that the aligned version should not be used? What if the user 
> > > > specify their own allocator, then which one they should use?
> > > Sorry, I meant the adjusted amount is coro.align - std::max_align_t,  I 
> > > could subtract it in `coro.free` . I think it should work?
> > No, because the adjustment you have to do in `coro.alloc` isn't just an 
> > addition, it's an addition plus a mask, which isn't reversible.  Suppose 
> > the frame needs to be 32-byte-aligned, but the allocator only promises 
> > 8-byte alignment.  The problem is that when you go to free a frame pointer, 
> > and you see that it's 32-byte-aligned (which, again, it always will be), 
> > the pointer you got from the allocator might be the frame pointer minus any 
> > of 8, 16, or 24 — or it might be exactly the same.  The only way to reverse 
> > that is to store some sort of cookie, either the amount to subtract or even 
> > just the original pointer.
> > 
> > Now, if you could change the entire coroutine ABI, you could make the frame 
> > handle that you pass around be the unadjusted pointer and then just repeat 
> > the adjustment every time you enter the coroutine.  But that doesn't work 
> > because the ABI relies on things like the promise being at a reliable 
> > offset from the frame handle.
> > 
> > I think the best solution would be to figure out a way to use an aligned 
> > allocator, which at worst does this in a more systematic way and at best 
> > can actually just satisfy your requirement directly without any overhead.  
> > If you can't do that, adding an offset to the frame would be best; if you 
> > can't do that, doing it as a cookie is okay.
> > 
> > > That's my intuition at first. But spec is written in a way suggesting 
> > > (IMHO) that the aligned version should not be used? What if the user 
> > > specify their own allocator, then which one they should use?
> > 
> > It seems like a spec bug that this doesn't use aligned allocators even when 
> > they're available.  If there's an aligned allocator available, I think this 
> > should essentially do dynamically what it would normally do statically, 
> > i.e.:
> > 
> > ```
> > void *allocation = alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__ ? operator 
> > new(size, align_val_t(alignment)) : operator new(size);
> > ```
> > 
> > This would ODR-use both allocation functions, of course.
> > 
> > Maybe it's right to do this cookie thing if we can't rely on an aligned 
> > allocator, like if the promise class provides only an `operator 
> > new(size_t)`.
> > No, because the adjustment you have to do in `coro.alloc` isn't just an 
> > addition, it's an addition plus a mask, which isn't reversible.  Suppose 
> > the frame needs to be

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-05 Thread John McCall via Phabricator via cfe-commits

rjmccall added a comment.

In D97915#2607567 , @ychen wrote:

> In D97915#2607338 , @rjmccall wrote:
>
>> Let's try to avoid adding a new builtin for what we acknowledge is a 
>> workaround.  Builtins become part of the language supported by the compiler, 
>> so we shouldn't add them casually.
>
> If we're going to use the aligned new in the future, do we still need this 
> builtin, or something else is preferred?

Oh, sorry, for some reason I got the impression from the patch that we were 
adding a new Clang-level builtin.  Adding a new LLVM intrinsic seems reasonable 
to me.

In any case, I don't think we should expose BuiltinAlignArgs outside of 
CGBuiltin.cpp.  Seems like at most we need to add a convenience function on 
CGBuilderTy to do a pointer round-up-to-alignment operation.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-05 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2607338 , @rjmccall wrote:

> Let's try to avoid adding a new builtin for what we acknowledge is a 
> workaround.  Builtins become part of the language supported by the compiler, 
> so we shouldn't add them casually.

If we're going to use the aligned new in the future, do we still need this 
builtin, or something else is preferred?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-05 Thread John McCall via Phabricator via cfe-commits

rjmccall added a comment.

Let's try to avoid adding a new builtin for what we acknowledge is a 
workaround.  Builtins become part of the language supported by the compiler, so 
we shouldn't add them casually.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-05 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2605398 , @ChuanqiXu wrote:

> I am a little confusing about the problem. The example in the link tells the 
> align of the `promise` instead of the `frame`. The address of `promise` and 
> `frame` is not same. It looks like you're trying to do:
>
>   +   +---+
>   |   |   |
>   +---+  frame|
>   | pedding   |   |
>   +   +---+
>   ^
>   |
>   |
>   |
>   |
>   |
>   +
>   
> The address of frame matches the offset of promise.
>
> However, what we should do is:
>
>   +   +---+
>   |   |   +--+|
>   +---+frame  | promise  ||
>   | pedding   |   <--+|
>   +   +---+
>   ^   |
>   |   |
>   |   |
>   |   |
>   |   +
>   |   This is what we really want
>   +
>   
> The address of frame matches the offset of promise.
>
> If I get the problem problems, I think we can handle this problem in the 
> middle end if the information for the promise remains.

Not sure I follow. Inside the frame, the promise is in its desired position. It 
is not properly aligned because the frame start address is underaligned - 
`malloc` usually only returns 16 bytes aligned memory whereas `alignas` could 
make the preferred alignment larger than that.




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16756
 };
-} // namespace
 

ChuanqiXu wrote:
> Why we remove the anonymous namespace here?
I added a common/helper function that takes `BuiltinAlignArgs` as an argument. 
Need to move it out of the anonymous namespace to forward declare it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Chuanqi Xu via Phabricator via cfe-commits

ChuanqiXu added a comment.

I am a little confusing about the problem. The example in the link tells the 
align of the `promise` instead of the `frame`. The address of `promise` and 
`frame` is not same. It looks like you're trying to do:

  +   +---+
  |   |   |
  +---+  frame|
  | pedding   |   |
  +   +---+
  ^
  |
  |
  |
  |
  |
  +
  
The address of frame matches the offset of promise.

However, what we should do is:

  +   +---+
  |   |   +--+|
  +---+frame  | promise  ||
  | pedding   |   <--+|
  +   +---+
  ^   |
  |   |
  |   |
  |   |
  |   +
  |   This is what we really want
  +
  
The address of frame matches the offset of promise.

If I get the problem problems, I think we can handle this problem in the middle 
end if the information for the promise remains.




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16756
 };
-} // namespace
 

Why we remove the anonymous namespace here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

rjmccall wrote:
> ychen wrote:
> > ychen wrote:
> > > rjmccall wrote:
> > > > Okay, so you're implicitly increasing the coroutine size to allow you 
> > > > to round up to get an aligned frame.  But how do you round back down to 
> > > > get the actual pointer that you need to delete?  This just doesn't work.
> > > > 
> > > > You really ought to just be using the aligned `operator new` instead 
> > > > when the required alignment is too high.  If that means checking the 
> > > > alignment "dynamically" before calling `operator new` / `operator 
> > > > delete`, so be it.  In practice, it will not be dynamic because 
> > > > lowering will replace the `coro.align` call with a constant, at which 
> > > > point the branch will be foldable.
> > > > 
> > > > I don't know what to suggest if the aligned `operator new` isn't 
> > > > reliably available on the target OS.  You could outline a function to 
> > > > pick the best allocator/deallocator, I suppose.
> > > Thanks for the review!
> > > 
> > > > Okay, so you're implicitly increasing the coroutine size to allow you 
> > > > to round up to get an aligned frame. But how do you round back down to 
> > > > get the actual pointer that you need to delete? This just doesn't work.
> > > 
> > > Hmm, you're right that I missed the `delete` part, thanks. The adjusted 
> > > amount is constant, I could just dial it down in `coro.free`, right?
> > > 
> > > >  You really ought to just be using the aligned operator new instead 
> > > > when the required alignment is too high. If that means checking the 
> > > > alignment "dynamically" before calling operator new / operator delete, 
> > > > so be it. In practice, it will not be dynamic because lowering will 
> > > > replace the coro.align call with a constant, at which point the branch 
> > > > will be foldable.
> > >  
> > > That's my intuition at first. But spec is written in a way suggesting 
> > > (IMHO) that the aligned version should not be used? What if the user 
> > > specify their own allocator, then which one they should use?
> > Sorry, I meant the adjusted amount is coro.align - std::max_align_t,  I 
> > could subtract it in `coro.free` . I think it should work?
> No, because the adjustment you have to do in `coro.alloc` isn't just an 
> addition, it's an addition plus a mask, which isn't reversible.  Suppose the 
> frame needs to be 32-byte-aligned, but the allocator only promises 8-byte 
> alignment.  The problem is that when you go to free a frame pointer, and you 
> see that it's 32-byte-aligned (which, again, it always will be), the pointer 
> you got from the allocator might be the frame pointer minus any of 8, 16, or 
> 24 — or it might be exactly the same.  The only way to reverse that is to 
> store some sort of cookie, either the amount to subtract or even just the 
> original pointer.
> 
> Now, if you could change the entire coroutine ABI, you could make the frame 
> handle that you pass around be the unadjusted pointer and then just repeat 
> the adjustment every time you enter the coroutine.  But that doesn't work 
> because the ABI relies on things like the promise being at a reliable offset 
> from the frame handle.
> 
> I think the best solution would be to figure out a way to use an aligned 
> allocator, which at worst does this in a more systematic way and at best can 
> actually just satisfy your requirement directly without any overhead.  If you 
> can't do that, adding an offset to the frame would be best; if you can't do 
> that, doing it as a cookie is okay.
> 
> > That's my intuition at first. But spec is written in a way suggesting 
> > (IMHO) that the aligned version should not be used? What if the user 
> > specify their own allocator, then which one they should use?
> 
> It seems like a spec bug that this doesn't use aligned allocators even when 
> they're available.  If there's an aligned allocator available, I think this 
> should essentially do dynamically what it would normally do statically, i.e.:
> 
> ```
> void *allocation = alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__ ? operator 
> new(size, align_val_t(alignment)) : operator new(size);
> ```
> 
> This would ODR-use both allocation functions, of course.
> 
> Maybe it's right to do this cookie thing if we can't rely on an aligned 
> allocator, like if the promise class provides only an `operator new(size_t)`.
> No, because the adjustment you have to do in `coro.alloc` isn't just an 
> addition, it's an addition plus a mask, which isn't reversible.  Suppose the 
> frame needs to be 32-byte-aligned, but the allocator only promises 8-byte 
> alignment.  The problem is that when you go to free a frame pointer, and you 
> see that it's 32-byte-aligned (which, again, it always will be), the pointer 
>

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread John McCall via Phabricator via cfe-commits

rjmccall added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

ychen wrote:
> ychen wrote:
> > rjmccall wrote:
> > > Okay, so you're implicitly increasing the coroutine size to allow you to 
> > > round up to get an aligned frame.  But how do you round back down to get 
> > > the actual pointer that you need to delete?  This just doesn't work.
> > > 
> > > You really ought to just be using the aligned `operator new` instead when 
> > > the required alignment is too high.  If that means checking the alignment 
> > > "dynamically" before calling `operator new` / `operator delete`, so be 
> > > it.  In practice, it will not be dynamic because lowering will replace 
> > > the `coro.align` call with a constant, at which point the branch will be 
> > > foldable.
> > > 
> > > I don't know what to suggest if the aligned `operator new` isn't reliably 
> > > available on the target OS.  You could outline a function to pick the 
> > > best allocator/deallocator, I suppose.
> > Thanks for the review!
> > 
> > > Okay, so you're implicitly increasing the coroutine size to allow you to 
> > > round up to get an aligned frame. But how do you round back down to get 
> > > the actual pointer that you need to delete? This just doesn't work.
> > 
> > Hmm, you're right that I missed the `delete` part, thanks. The adjusted 
> > amount is constant, I could just dial it down in `coro.free`, right?
> > 
> > >  You really ought to just be using the aligned operator new instead when 
> > > the required alignment is too high. If that means checking the alignment 
> > > "dynamically" before calling operator new / operator delete, so be it. In 
> > > practice, it will not be dynamic because lowering will replace the 
> > > coro.align call with a constant, at which point the branch will be 
> > > foldable.
> >  
> > That's my intuition at first. But spec is written in a way suggesting 
> > (IMHO) that the aligned version should not be used? What if the user 
> > specify their own allocator, then which one they should use?
> Sorry, I meant the adjusted amount is coro.align - std::max_align_t,  I could 
> subtract it in `coro.free` . I think it should work?
No, because the adjustment you have to do in `coro.alloc` isn't just an 
addition, it's an addition plus a mask, which isn't reversible.  Suppose the 
frame needs to be 32-byte-aligned, but the allocator only promises 8-byte 
alignment.  The problem is that when you go to free a frame pointer, and you 
see that it's 32-byte-aligned (which, again, it always will be), the pointer 
you got from the allocator might be the frame pointer minus any of 8, 16, or 24 
— or it might be exactly the same.  The only way to reverse that is to store 
some sort of cookie, either the amount to subtract or even just the original 
pointer.

Now, if you could change the entire coroutine ABI, you could make the frame 
handle that you pass around be the unadjusted pointer and then just repeat the 
adjustment every time you enter the coroutine.  But that doesn't work because 
the ABI relies on things like the promise being at a reliable offset from the 
frame handle.

I think the best solution would be to figure out a way to use an aligned 
allocator, which at worst does this in a more systematic way and at best can 
actually just satisfy your requirement directly without any overhead.  If you 
can't do that, adding an offset to the frame would be best; if you can't do 
that, doing it as a cookie is okay.

> That's my intuition at first. But spec is written in a way suggesting (IMHO) 
> that the aligned version should not be used? What if the user specify their 
> own allocator, then which one they should use?

It seems like a spec bug that this doesn't use aligned allocators even when 
they're available.  If there's an aligned allocator available, I think this 
should essentially do dynamically what it would normally do statically, i.e.:

```
void *allocation = alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__ ? operator 
new(size, align_val_t(alignment)) : operator new(size);
```

This would ODR-use both allocation functions, of course.

Maybe it's right to do this cookie thing if we can't rely on an aligned 
allocator, like if the promise class provides only an `operator new(size_t)`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

ychen wrote:
> rjmccall wrote:
> > Okay, so you're implicitly increasing the coroutine size to allow you to 
> > round up to get an aligned frame.  But how do you round back down to get 
> > the actual pointer that you need to delete?  This just doesn't work.
> > 
> > You really ought to just be using the aligned `operator new` instead when 
> > the required alignment is too high.  If that means checking the alignment 
> > "dynamically" before calling `operator new` / `operator delete`, so be it.  
> > In practice, it will not be dynamic because lowering will replace the 
> > `coro.align` call with a constant, at which point the branch will be 
> > foldable.
> > 
> > I don't know what to suggest if the aligned `operator new` isn't reliably 
> > available on the target OS.  You could outline a function to pick the best 
> > allocator/deallocator, I suppose.
> Thanks for the review!
> 
> > Okay, so you're implicitly increasing the coroutine size to allow you to 
> > round up to get an aligned frame. But how do you round back down to get the 
> > actual pointer that you need to delete? This just doesn't work.
> 
> Hmm, you're right that I missed the `delete` part, thanks. The adjusted 
> amount is constant, I could just dial it down in `coro.free`, right?
> 
> >  You really ought to just be using the aligned operator new instead when 
> > the required alignment is too high. If that means checking the alignment 
> > "dynamically" before calling operator new / operator delete, so be it. In 
> > practice, it will not be dynamic because lowering will replace the 
> > coro.align call with a constant, at which point the branch will be foldable.
>  
> That's my intuition at first. But spec is written in a way suggesting (IMHO) 
> that the aligned version should not be used? What if the user specify their 
> own allocator, then which one they should use?
Sorry, I meant the adjusted amount is coro.align - std::max_align_t,  I could 
subtract it in `coro.free` . I think it should work?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

rjmccall wrote:
> Okay, so you're implicitly increasing the coroutine size to allow you to 
> round up to get an aligned frame.  But how do you round back down to get the 
> actual pointer that you need to delete?  This just doesn't work.
> 
> You really ought to just be using the aligned `operator new` instead when the 
> required alignment is too high.  If that means checking the alignment 
> "dynamically" before calling `operator new` / `operator delete`, so be it.  
> In practice, it will not be dynamic because lowering will replace the 
> `coro.align` call with a constant, at which point the branch will be foldable.
> 
> I don't know what to suggest if the aligned `operator new` isn't reliably 
> available on the target OS.  You could outline a function to pick the best 
> allocator/deallocator, I suppose.
Thanks for the review!

> Okay, so you're implicitly increasing the coroutine size to allow you to 
> round up to get an aligned frame. But how do you round back down to get the 
> actual pointer that you need to delete? This just doesn't work.

Hmm, you're right that I missed the `delete` part, thanks. The adjusted amount 
is constant, I could just dial it down in `coro.free`, right?

>  You really ought to just be using the aligned operator new instead when the 
> required alignment is too high. If that means checking the alignment 
> "dynamically" before calling operator new / operator delete, so be it. In 
> practice, it will not be dynamic because lowering will replace the coro.align 
> call with a constant, at which point the branch will be foldable.
 
That's my intuition at first. But spec is written in a way suggesting (IMHO) 
that the aligned version should not be used? What if the user specify their own 
allocator, then which one they should use?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread John McCall via Phabricator via cfe-commits

rjmccall added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }

Okay, so you're implicitly increasing the coroutine size to allow you to round 
up to get an aligned frame.  But how do you round back down to get the actual 
pointer that you need to delete?  This just doesn't work.

You really ought to just be using the aligned `operator new` instead when the 
required alignment is too high.  If that means checking the alignment 
"dynamically" before calling `operator new` / `operator delete`, so be it.  In 
practice, it will not be dynamic because lowering will replace the `coro.align` 
call with a constant, at which point the branch will be foldable.

I don't know what to suggest if the aligned `operator new` isn't reliably 
available on the target OS.  You could outline a function to pick the best 
allocator/deallocator, I suppose.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

In D97915#2603912 , @lxfind wrote:

> Could you describe in more detail what problem this patch solves?

Yes, updated description.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 328232.
ychen edited the summary of this revision.
ychen added a comment.

- Add docs for `coro.align`
- clang-format


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-builtins.c
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp

Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,7 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +269,9 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
 break;
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -997,23 +997,33 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape ) {
+static void replaceFrameSizeAndAlign(coro::Shape ) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.CoroSizes.empty())
-return;
+  if (!Shape.CoroSizes.empty()) {
+// In the same function all coro.sizes should have the same result type.
+auto *SizeIntrin = Shape.CoroSizes.back();
+Module *M = SizeIntrin->getModule();
+const DataLayout  = M->getDataLayout();
+auto Size = DL.getTypeAllocSize(Shape.FrameTy);
+auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+
+for (CoroSizeInst *CS : Shape.CoroSizes) {
+  CS->replaceAllUsesWith(SizeConstant);
+  CS->eraseFromParent();
+}
+  }
 
-  // In the same function all coro.sizes should have the same result type.
-  auto *SizeIntrin = Shape.CoroSizes.back();
-  Module *M = SizeIntrin->getModule();
-  const DataLayout  = M->getDataLayout();
-  auto Size = DL.getTypeAllocSize(Shape.FrameTy);
-  auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+  if (!Shape.CoroAligns.empty()) {
+auto *Intrin = Shape.CoroAligns.back();
+auto *AlignConstant =
+ConstantInt::get(Intrin->getType(), Shape.FrameAlign.value());
 
-  for (CoroSizeInst *CS : Shape.CoroSizes) {
-CS->replaceAllUsesWith(SizeConstant);
-CS->eraseFromParent();
+for (CoroAlignInst *CS : Shape.CoroAligns) {
+  CS->replaceAllUsesWith(AlignConstant);
+  CS->eraseFromParent();
+}
   }
 }
 
@@ -1748,7 +1758,7 @@
 
   simplifySuspendPoints(Shape);
   buildCoroutineFrame(F, Shape);
-  replaceFrameSize(Shape);
+  replaceFrameSizeAndAlign(Shape);
 
   // If there are no suspend points, no split required, just remove
   // the allocation and deallocation blocks, they are not needed.
Index: llvm/lib/Transforms/Coroutines/CoroInternal.h
===
--- llvm/lib/Transforms/Coroutines/CoroInternal.h
+++ llvm/lib/Transforms/Coroutines/CoroInternal.h
@@ -99,6 +99,7 @@
   CoroBeginInst *CoroBegin;
   SmallVector CoroEnds;
   SmallVector CoroSizes;
+  SmallVector CoroAligns;
   SmallVector CoroSuspends;
   SmallVector SwiftErrorOps;
 
Index: llvm/lib/Transforms/Coroutines/CoroInstr.h
===
--- llvm/lib/Transforms/Coroutines/CoroInstr.h
+++ llvm/lib/Transforms/Coroutines/CoroInstr.h
@@ -599,6 +599,18 @@
   }
 };
 
+/// This represents the llvm.coro.align instruction.
+class LLVM_LIBRARY_VISIBILITY CoroAlignInst : public IntrinsicInst {
+public:
+  // Methods to support type inquiry through isa, cast, and dyn_cast:
+  static bool classof(const IntrinsicInst *I) {
+return I->getIntrinsicID() == Intrinsic::coro_align;
+  }
+  static bool classof(const Value *V) {
+return isa(V) && classof(cast(V));
+  }
+};
+
 class LLVM_LIBRARY_VISIBILITY AnyCoroEndInst : public IntrinsicInst {
   enum { FrameArg, UnwindArg };
 
Index: llvm/include/llvm/IR/Intrinsics.td
===
--- llvm/include/llvm/IR/Intrinsics.td
+++ llvm/include/llvm/IR/Intrinsics.td
@@

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Xun Li via Phabricator via cfe-commits

lxfind added a comment.

Could you describe in more detail what problem this patch solves?
Also, since you are adding a new intrinsics, please also update the coroutine 
documentation regarding this new intrinsics.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-03 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen added a comment.

I didn't add any new tests. Instead, added checks for the new code sequence: 1. 
check that `llvm.assume` is there to make sure the adjusted address is 
correctly aligned. 2. check that `llvm.coro.align` and the associated 
instructions are there to over-allocate heap memory.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-03 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen updated this revision to Diff 328036.
ychen added a comment.
Herald added a subscriber: jdoerfert.

- rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-builtins.c
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp

Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,7 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -268,6 +269,9 @@
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
 break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
+break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
 break;
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -997,23 +997,33 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape ) {
+static void replaceFrameSizeAndAlign(coro::Shape ) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.CoroSizes.empty())
-return;
+  if (!Shape.CoroSizes.empty()) {
+// In the same function all coro.sizes should have the same result type.
+auto *SizeIntrin = Shape.CoroSizes.back();
+Module *M = SizeIntrin->getModule();
+const DataLayout  = M->getDataLayout();
+auto Size = DL.getTypeAllocSize(Shape.FrameTy);
+auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+
+for (CoroSizeInst *CS : Shape.CoroSizes) {
+  CS->replaceAllUsesWith(SizeConstant);
+  CS->eraseFromParent();
+}
+  }
 
-  // In the same function all coro.sizes should have the same result type.
-  auto *SizeIntrin = Shape.CoroSizes.back();
-  Module *M = SizeIntrin->getModule();
-  const DataLayout  = M->getDataLayout();
-  auto Size = DL.getTypeAllocSize(Shape.FrameTy);
-  auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+  if (!Shape.CoroAligns.empty()) {
+auto *Intrin = Shape.CoroAligns.back();
+auto *AlignConstant =
+ConstantInt::get(Intrin->getType(), Shape.FrameAlign.value());
 
-  for (CoroSizeInst *CS : Shape.CoroSizes) {
-CS->replaceAllUsesWith(SizeConstant);
-CS->eraseFromParent();
+for (CoroAlignInst *CS : Shape.CoroAligns) {
+  CS->replaceAllUsesWith(AlignConstant);
+  CS->eraseFromParent();
+}
   }
 }
 
@@ -1748,7 +1758,7 @@
 
   simplifySuspendPoints(Shape);
   buildCoroutineFrame(F, Shape);
-  replaceFrameSize(Shape);
+  replaceFrameSizeAndAlign(Shape);
 
   // If there are no suspend points, no split required, just remove
   // the allocation and deallocation blocks, they are not needed.
Index: llvm/lib/Transforms/Coroutines/CoroInternal.h
===
--- llvm/lib/Transforms/Coroutines/CoroInternal.h
+++ llvm/lib/Transforms/Coroutines/CoroInternal.h
@@ -99,6 +99,7 @@
   CoroBeginInst *CoroBegin;
   SmallVector CoroEnds;
   SmallVector CoroSizes;
+  SmallVector CoroAligns;
   SmallVector CoroSuspends;
   SmallVector SwiftErrorOps;
 
Index: llvm/lib/Transforms/Coroutines/CoroInstr.h
===
--- llvm/lib/Transforms/Coroutines/CoroInstr.h
+++ llvm/lib/Transforms/Coroutines/CoroInstr.h
@@ -599,6 +599,18 @@
   }
 };
 
+/// This represents the llvm.coro.align instruction.
+class LLVM_LIBRARY_VISIBILITY CoroAlignInst : public IntrinsicInst {
+public:
+  // Methods to support type inquiry through isa, cast, and dyn_cast:
+  static bool classof(const IntrinsicInst *I) {
+return I->getIntrinsicID() == Intrinsic::coro_align;
+  }
+  static bool classof(const Value *V) {
+return isa(V) && classof(cast(V));
+  }
+};
+
 class LLVM_LIBRARY_VISIBILITY AnyCoroEndInst : public IntrinsicInst {
   enum { FrameArg, UnwindArg };
 
Index: llvm/include/llvm/IR/Intrinsics.td
===
--- llvm/include/llvm/IR/Intrinsics.td
+++ llvm/include/llvm/IR/Intrinsics.td
@@ -1237,6 +1237,7 @@
 def int_coro_frame : Intrinsic<[llvm_ptr_ty], [],

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-03 Thread Yuanfang Chen via Phabricator via cfe-commits

ychen created this revision.
ychen added reviewers: rjmccall, lxfind, GorNishanov.
Herald added a subscriber: hiraditya.
ychen requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

by over-allocating and emitting `alignTo` code to adjust the frame start 
address.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D97915

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-builtins.c
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp

Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -1004,14 +1004,10 @@
   if (!Shape.CoroSizes.empty()) {
 // In the same function all coro.sizes should have the same result type.
 auto *SizeIntrin = Shape.CoroSizes.back();
-auto *SizeConstant =
-ConstantInt::get(SizeIntrin->getType(), Shape.FrameSize);
-assert(
-Shape.FrameSize ==
-alignTo(SizeIntrin->getModule()->getDataLayout().getTypeStoreSize(
-Shape.FrameTy),
-Align(Shape.FrameAlign)) &&
-"FrameSize should already be aligned up to FrameAlign");
+Module *M = SizeIntrin->getModule();
+const DataLayout  = M->getDataLayout();
+auto Size = DL.getTypeAllocSize(Shape.FrameTy);
+auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
 
 for (CoroSizeInst *CS : Shape.CoroSizes) {
   CS->replaceAllUsesWith(SizeConstant);
Index: clang/test/CodeGenCoroutines/coro-gro.cpp
===
--- clang/test/CodeGenCoroutines/coro-gro.cpp
+++ clang/test/CodeGenCoroutines/coro-gro.cpp
@@ -49,7 +49,8 @@
   // CHECK: %[[GroActive:.+]] = alloca i1
 
   // CHECK: %[[Size:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: call noalias nonnull i8* @_Znwm(i64 %[[Size]])
+  // CHECK: %[[NewSize:.+]] = add i64 %[[Size]], %{{.+}}
+  // CHECK: call noalias nonnull i8* @_Znwm(i64 %[[NewSize]])
   // CHECK: store i1 false, i1* %[[GroActive]]
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeC1Ev(
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_type17get_return_objectEv(
Index: clang/test/CodeGenCoroutines/coro-builtins.c
===
--- clang/test/CodeGenCoroutines/coro-builtins.c
+++ clang/test/CodeGenCoroutines/coro-builtins.c
@@ -21,7 +21,12 @@
   __builtin_coro_noop();
 
   // CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK-NEXT: %[[MEM:.+]] = call i8* @myAlloc(i64 %[[SIZE]])
+  // CHECK-NEXT: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK-NEXT: %[[DIFF:.+]] = sub nsw i64 %[[ALIGN]], 16
+  // CHECK-NEXT: %[[CMP:.+]] = icmp sgt i64 %[[DIFF]], 0
+  // CHECK-NEXT: %[[SEL:.+]] = select i1 %[[CMP]], i64 %[[DIFF]], i64 0
+  // CHECK-NEXT: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[SEL]]
+  // CHECK-NEXT: %[[MEM:.+]] = call i8* @myAlloc(i64 %[[NEWSIZE]])
   // CHECK-NEXT: %[[FRAME:.+]] = call i8* @llvm.coro.begin(token %[[COROID]], i8* %[[MEM]])
   __builtin_coro_begin(myAlloc(__builtin_coro_size()));
 
Index: clang/test/CodeGenCoroutines/coro-alloc.cpp
===
--- clang/test/CodeGenCoroutines/coro-alloc.cpp
+++ clang/test/CodeGenCoroutines/coro-alloc.cpp
@@ -60,9 +60,14 @@
   // CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]
 
   // CHECK: [[AllocBB]]:
-  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
-  // CHECK: br label %[[AlignAllocBB:.+]]
+  // CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK-NEXT: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
+  // CHECK-NEXT: %[[DIFF:.+]] = sub nsw i64 %[[ALIGN]], 16
+  // CHECK-NEXT: %[[CMP:.+]] = icmp sgt i64 %[[DIFF]], 0
+  // CHECK-NEXT: %[[SEL:.+]] = select i1 %[[CMP]], i64 %[[DIFF]], i64 0
+  // CHECK-NEXT: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[SEL]]
+  // CHECK-NEXT: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
+  // CHECK-NEXT: br label %[[AlignAllocBB:.+]]
 
   // CHECK: [[AlignAllocBB]]:
   // CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
@@ -104,7 +109,8 @@
 extern "C" void f1(promise_new_tag ) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
   // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
-  // CHECK: call i8* @_ZNSt12experimental16coroutine_traitsIJv15promise_new_tagEE12promise_typenwEm(i64 %[[SIZE]])
+  // CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %{{.+}}
+  // CHECK: call i8*

78 matches

Mail list logo