[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2727974 , @ChuanqiXu wrote:

> In D100739#2727808 , @ychen wrote:
>
>> In D100739#2718528 , @ChuanqiXu 
>> wrote:
>>
>>> In D100739#2718514 , @ychen wrote:
>>>
 Oh, right now C++ coroutine standard is written in the way that the 
 aligned new is not searched by the frontend. The limitation will be lifted 
 in C++23 (hopefully).
>>>
>>> I see. I am curious about the relationship of compiler implementation and 
>>> language standard now. For example, Clang/LLVM implement coroutine before 
>>> it becomes standard. The point I curious about is that should Clang/LLVM 
>>> implement based on proposals accepted only?
>>
>> Not a C++ language expert myself. But I think a proposal does not have to be 
>> formally accepted to kick-start the implementation (as long as the overall 
>> design is decided and the proposal is very likely to be accepted).
>
> I see. I am still prefer to use the aligned allocator to solve the problems, 
> although you and other reviewers prefer to use the over aligned frame.

It will be a hybrid of both. Since when a non-aligned version is picked by the 
frontend due to the aligned version not available, we still have to use 
overaligend frame. I'll send a separate patch for the aligned allocator 
searching.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-29 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D100739#2727808 , @ychen wrote:

> In D100739#2718528 , @ChuanqiXu 
> wrote:
>
>> In D100739#2718514 , @ychen wrote:
>>
>>> Oh, right now C++ coroutine standard is written in the way that the aligned 
>>> new is not searched by the frontend. The limitation will be lifted in C++23 
>>> (hopefully).
>>
>> I see. I am curious about the relationship of compiler implementation and 
>> language standard now. For example, Clang/LLVM implement coroutine before it 
>> becomes standard. The point I curious about is that should Clang/LLVM 
>> implement based on proposals accepted only?
>
> Not a C++ language expert myself. But I think a proposal does not have to be 
> formally accepted to kick-start the implementation (as long as the overall 
> design is decided and the proposal is very likely to be accepted).

I see. I am still prefer to use the aligned allocator to solve the problems, 
although you and other reviewers prefer to use the over aligned frame.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2718528 , @ChuanqiXu wrote:

> In D100739#2718514 , @ychen wrote:
>
>> Oh, right now C++ coroutine standard is written in the way that the aligned 
>> new is not searched by the frontend. The limitation will be lifted in C++23 
>> (hopefully).
>
> I see. I am curious about the relationship of compiler implementation and 
> language standard now. For example, Clang/LLVM implement coroutine before it 
> becomes standard. The point I curious about is that should Clang/LLVM 
> implement based on proposals accepted only?

Not a C++ language expert myself. But I think a proposal does not have to be 
formally accepted to kick-start the implementation (as long as the overall 
design is decided and the proposal is very likely to be accepted).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-29 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen abandoned this revision.
ychen added a comment.

Reopened D97915  to address the feedbacks. 
Close this one.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2718681 , @rjmccall wrote:

> Here are the options I think the committee might take:
>
> 1. Always select an unaligned allocator and force implementors to dynamically 
> align.  This seems unlikely to me.
> 2. Allow an aligned allocator to be selected.  The issue here is that we 
> cannot know until coroutine splitting whether the frame has a new-extended 
> alignment.  So there are some sub-options:
>
> 2a. Always use an aligned allocator if available, even if the frame ends up 
> not being aligned.  I think this is unlikely.
> 2b. Use the correct allocator for the frame alignment, using the alignment 
> inferred from the immediate coroutine body.  This would force implementations 
> to avoid doing anything prior to coroutine splitting that would increase the 
> frame's alignment beyond the derived limit.  This would be quite annoying for 
> implementors, and it would have strange performance cliffs, but it's 
> theoretically simple.
> 2c. Use the correct allocator for the frame alignment; only that allocator is 
> formally ODR-used.  This would force implementations to not ODR-use the right 
> allocator until coroutine lowering, which is not really reasonable; we should 
> be able to dissuade the committee from picking this option.
> 2d. Use the correct allocator for the frame alignment; both allocators are 
> (allowed to be) ODR-used, but only one would be dynamically used.  This is 
> what would be necessary for the implementation I suggested above.  In reality 
> there won't be any dynamic overhead because we should always be able to fold 
> the branch after allocation.
>
> I think 2d is obvious enough that we could go ahead and implement it pending 
> standardization.  There's still a question of what to do if the promise class 
> only provides an unaligned `operator new`, but the only reasonable behavior 
> is to dynamically align: unlike with `new` expressions, there's no way the 
> promise class could be expected to know/satisfy the appropriate alignment 
> requirement in general, so assuming alignment is not acceptable.  (Neither is 
> rejecting the program if it doesn't provide both — this wouldn't be 
> compatible with existing behavior.)
>
>> *(void**) (frame + size) = rawFrame; this means we always need the extra 
>> space to store the raw frame ptr. If either doing what the patch is 
>> currently doing or add another intrinsic say "llvm.coro.raw.frame.ptr.index" 
>> to do *(void**) (frame + llvm.coro.raw.frame.ptr.index()) = rawFrame;, it is 
>> likely that the extra pointer could reuse some existing paddings in the 
>> frame. There is an example of this in https://reviews.llvm.org/P8260. What 
>> do you think?
>
> You're right that there might be space in the frame we could re-use.  I was 
> thinking that it would be a shame to always add storage to the frame for the 
> raw frame pointer, but maybe the contract of `llvm.coro.raw.frame.ptr.offset` 
> could be that it's only meaningful if the frame has extended alignment.  
> Coroutine splitting would determine if any of the frame members was 
> over-aligned and add a raw-pointer field if so. We'd be stuck allocating 
> space in the frame even when allocation was elided, but stack space is cheap.
>
> It does need to be an offset instead of a type index, though; the frontend 
> will be emitting a GEP and will not know the frame type.

Sounds good to me. Thanks. I'll go ahead with `llvm.coro.raw.frame.ptr.offset`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Here are the options I think the committee might take:

1. Always select an unaligned allocator and force implementors to dynamically 
align.  This seems unlikely to me.
2. Allow an aligned allocator to be selected.  The issue here is that we cannot 
know until coroutine splitting whether the frame has a new-extended alignment.  
So there are some sub-options:

2a. Always use an aligned allocator if available, even if the frame ends up not 
being aligned.  I think this is unlikely.
2b. Use the correct allocator for the frame alignment, using the alignment 
inferred from the immediate coroutine body.  This would force implementations 
to avoid doing anything prior to coroutine splitting that would increase the 
frame's alignment beyond the derived limit.  This would be quite annoying for 
implementors, and it would have strange performance cliffs, but it's 
theoretically simple.
2c. Use the correct allocator for the frame alignment; only that allocator is 
formally ODR-used.  This would force implementations to not ODR-use the right 
allocator until coroutine lowering, which is not really reasonable; we should 
be able to dissuade the committee from picking this option.
2d. Use the correct allocator for the frame alignment; both allocators are 
(allowed to be) ODR-used, but only one would be dynamically used.  This is what 
would be necessary for the implementation I suggested above.  In reality there 
won't be any dynamic overhead because we should always be able to fold the 
branch after allocation.

I think 2d is obvious enough that we could go ahead and implement it pending 
standardization.  There's still a question of what to do if the promise class 
only provides an unaligned `operator new`, but the only reasonable behavior is 
to dynamically align: unlike with `new` expressions, there's no way the promise 
class could be expected to know/satisfy the appropriate alignment requirement 
in general, so assuming alignment is not acceptable.  (Neither is rejecting the 
program if it doesn't provide both — this wouldn't be compatible with existing 
behavior.)

> *(void**) (frame + size) = rawFrame; this means we always need the extra 
> space to store the raw frame ptr. If either doing what the patch is currently 
> doing or add another intrinsic say "llvm.coro.raw.frame.ptr.index" to do 
> *(void**) (frame + llvm.coro.raw.frame.ptr.index()) = rawFrame;, it is likely 
> that the extra pointer could reuse some existing paddings in the frame. There 
> is an example of this in https://reviews.llvm.org/P8260. What do you think?

You're right that there might be space in the frame we could re-use.  I was 
thinking that it would be a shame to always add storage to the frame for the 
raw frame pointer, but maybe the contract of `llvm.coro.raw.frame.ptr.offset` 
could be that it's only meaningful if the frame has extended alignment.  
Coroutine splitting would determine if any of the frame members was 
over-aligned and add a raw-pointer field if so. We'd be stuck allocating space 
in the frame even when allocation was elided, but stack space is cheap.

It does need to be an offset instead of a type index, though; the frontend will 
be emitting a GEP and will not know the frame type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D100739#2718514 , @ychen wrote:

> Oh, right now C++ coroutine standard is written in the way that the aligned 
> new is not searched by the frontend. The limitation will be lifted in C++23 
> (hopefully).

I see. I am curious about the relationship of compiler implementation and 
language standard now. For example, Clang/LLVM implement coroutine before it 
becomes standard. The point I curious about is that should Clang/LLVM implement 
based on proposals accepted only?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D100739#2717582 , @rjmccall wrote:

> That's not really what I meant, no. I meant it would be better to find a way 
> to use an allocator that promises to return a well-aligned value when 
> possible. We've talked about this before; that will require the C++ committee 
> to update the design.

I had a question. Does this mean we can't use aligned allocator until the C++ 
committee update the wording? For example, I know Clang/LLVM implement 
coroutine before it becomes the standard. I mean is it possible to use 
aligned-allocator to solve the problem here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2715964 , @ChuanqiXu wrote:

> In D100739#2713579 , @ychen wrote:
>
>> In D100739#2711698 , @ChuanqiXu 
>> wrote:
>>
 This is an alternative to D97915  which 
 missed proper deallocation of the over-allocated frame. This patch handles 
 both allocations and deallocations.
>>>
>>> If D97915  is not needed, we should 
>>> abandon it.
>>>
>>> For the example shows in D97915 , it says:
>>>
>>>   #include 
>>>   #include 
>>>   #include 
>>>   #include 
>>>   #include 
>>>   
>>>   struct task{
>>> struct alignas(64) promise_type {
>>>   task get_return_object() { return {}; }
>>>   std::experimental::suspend_never initial_suspend() { return {}; }
>>>   std::experimental::suspend_never final_suspend() noexcept { return 
>>> {}; }
>>>   void return_void() {}
>>>   void unhandled_exception() {}
>>> };
>>> using handle = std::experimental::coroutine_handle;
>>>   };
>>>   
>>>   auto switch_to_new_thread() {
>>> struct awaitable {
>>>   bool await_ready() { return false; }
>>>   void await_suspend(task::handle h) {
>>> auto i = reinterpret_cast(&h.promise());
>>> std::cout << i << std::endl;
>>> assert(i % 64 == 0);
>>>   }
>>>   void await_resume() {}
>>> };
>>> return awaitable{};
>>>   }
>>>   
>>>   task resuming_on_new_thread() {
>>> co_await switch_to_new_thread();
>>>   }
>>>   
>>>   int main() {
>>> resuming_on_new_thread();
>>>   }
>>>
>>> The assertion would fail. If this is the root problem, I think we could 
>>> adjust the align for the promise alloca like:
>>
>> The problem is that any member of the coroutine frame could be overaligned 
>> (thus make the frame overaligned) including promise, local variables, 
>> spills. The problem is *not* specific to promise.
>>
>>>   %promise = alloca %promise_type, align 8
>>>
>>> into
>>>
>>>   %promise = alloca %promise_type, align 128
>>>
>>> In other words, if this the problem we need to solve, I think we could make 
>>> it in a simpler way.
>>
>> This may not fix the problem.
>>
>>> Then I looked into the document you give in the summary. The issue#3 says 
>>> the frontend can't do some work in the process of template instantiation 
>>> due to the frontend doesn't know about the align and size of the coroutine. 
>>> But from the implementation, it looks like not the problem this patch wants 
>>> to solve.
>>
>> I meant to use that as a reference to help describe the problem (but not the 
>> solution). The document itself includes both problem statements (issue#3) 
>> and solutions (frontend-based) which are totally unrelated to this patch. It 
>> looks like it is not that useful in this case so please disregard that.
>>
>>> I am really confused about the problem. Could you please restate your 
>>> problem more in more detail? For example, would it make the alignment 
>>> incorrect like the example above? Or does we want the frontend to get 
>>> alignment information? Then what would be affected? From the title, I can 
>>> guess the size of frame would get bigger. But how big would it be? Who 
>>> would control and determine the final size?
>>
>> understood.
>>
>> There are two kinds of alignments: the alignment of a type/object at 
>> compile-time (ABI specified or user-specified), and the alignment the object 
>> of that type actually gets during runtime. The compiler assumes that the 
>> alignment of a struct is the maximal alignment of all its members. However, 
>> that assumption may not be true at runtime where the memory allocator may 
>> return a memory block that has insufficient alignment which causes some 
>> members aligned incorrectly.
>>
>> For C++ coroutine, right now the default memory allocator could only return 
>> 16 bytes aligned memory block. When any member of the coroutine frame 
>> (promise, local variables, spills etc.) has alignment > 16, the frame 
>> becomes overaligned. This could only be fixed dynamically at runtime: by 
>> over-allocating memory and then adjust the frame start address so that it 
>> aligns correctly.
>>
>> For example, suppose malloc returns 16 bytes aligned address 16, how do we 
>> make it 64 bytes aligned? align 16 up to an address that is 64 bytes aligned 
>> which is 64, so the adjustment amount is 64-16=48
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 32, 
>> how do we make it 64 bytes aligned? align 32 up to an address that is 64 
>> bytes aligned which is 64, so the adjustment amount is 64-32=32
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 48, 
>> how do we make it 64 bytes aligned? align 48 up to an address that is 64 
>> bytes aligned which is 64, so the adjustment amount is 64-4

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2717582 , @rjmccall wrote:

> In D100739#2717259 , @ychen wrote:
>
>> In D100739#2717227 , @rjmccall 
>> wrote:
>>
>>> Yes, if you can dynamically choose to use an aligned allocator, that's 
>>> clearly just much better.
>>
>> Right now:
>>
>> Intrinsic::coro_size_aligned :   overaligned frame: over-allocate, adjust 
>> start address;non-overaligned frame: no-op
>> Intrinsic::coro_size :overaligned frame: no-op;  
>>non-overaligned frame: no-op
>>
>> Do you mean to remove `Intrinsic::coro_size_aligned` and make 
>> Intrinsic::coro_size :overaligned frame: over-allocate, 
>> adjust start address;non-overaligned frame: no-op
>>
>> that makes sense to me. Just want to confirm first.
>
> That's not really what I meant, no.  I meant it would be better to find a way 
> to use an allocator that promises to return a well-aligned value when 
> possible.  We've talked about this before; that will require the C++ 
> committee to update the design.
>
> I think the cleanest design for coro_size/align would be that they reflect 
> the unadjusted requirements, and the frontend is expected to emit code which 
> satisfies those requirements.  In the absence of an aligned allocator, that 
> means generating code like:
>
>   if (llvm.coro.alloc()) {
> size_t size = llvm.coro.size(), align = llvm.coro.align();
> if (align > NEW_ALIGN)
>   size += align - NEW_ALIGN + sizeof(void*);
> frame = operator new(size);
> if (align > NEW_ALIGN) {
>   auto rawFrame = frame;
>   frame = (frame + align - 1) & ~(align - 1);
>   *(void**) (frame + size) = rawFrame;
> }
>   }
>
> and then on the deallocation side:
>
>   if (llvm.coro.alloc()) {
> size_t size = llvm.coro.size(), align = llvm.coro.align();
> if (align > NEW_ALIGN)
>   frame = *(void**) (frame + size);
> operator delete(frame);
>   }
>
> That's all quite annoying, but it does extend quite nicely to cover the 
> presence of an aligned allocator when the committee gets around to ratifying 
> that this is what should happen:
>
>   if (llvm.coro.alloc()) {
> size_t size = llvm.coro.size(), align = llvm.coro.align();
> if (align > NEW_ALIGN)
>   frame = operator new(std::align_val_t(align), size);
> else
>   frame = operator new(size);
>   }
>
> and then on the deallocation side:
>
>   if (llvm.coro.alloc()) {
> size_t size = llvm.coro.size(), align = llvm.coro.align();
> if (align > NEW_ALIGN)
>   operator delete(frame, std::align_val_t(align));
> else
>   operator delete(frame);
>   }

`*(void**) (frame + size) = rawFrame;` this means we always need the extra 
space to store the raw frame ptr. If either doing what the patch is currently 
doing or add another intrinsic say "llvm.coro.raw.frame.ptr.index" to do 
`*(void**) (frame + llvm.coro.raw.frame.ptr.index()) = rawFrame;`, it is likely 
that the extra pointer could reuse some existing paddings in the frame.  There 
is an example of this in https://reviews.llvm.org/P8260. What do you think?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In D100739#2717259 , @ychen wrote:

> In D100739#2717227 , @rjmccall 
> wrote:
>
>> Yes, if you can dynamically choose to use an aligned allocator, that's 
>> clearly just much better.
>
> Right now:
>
> Intrinsic::coro_size_aligned :   overaligned frame: over-allocate, adjust 
> start address;non-overaligned frame: no-op
> Intrinsic::coro_size :overaligned frame: no-op;   
>   non-overaligned frame: no-op
>
> Do you mean to remove `Intrinsic::coro_size_aligned` and make 
> Intrinsic::coro_size :overaligned frame: over-allocate, 
> adjust start address;non-overaligned frame: no-op
>
> that makes sense to me. Just want to confirm first.

That's not really what I meant, no.  I meant it would be better to find a way 
to use an allocator that promises to return a well-aligned value when possible. 
 We've talked about this before; that will require the C++ committee to update 
the design.

I think the cleanest design for coro_size/align would be that they reflect the 
unadjusted requirements, and the frontend is expected to emit code which 
satisfies those requirements.  In the absence of an aligned allocator, that 
means generating code like:

  if (llvm.coro.alloc()) {
size_t size = llvm.coro.size(), align = llvm.coro.align();
if (align > NEW_ALIGN)
  size += align - NEW_ALIGN + sizeof(void*);
frame = operator new(size);
if (align > NEW_ALIGN) {
  auto rawFrame = frame;
  frame = (frame + align - 1) & ~(align - 1);
  *(void**) (frame + size) = rawFrame;
}
  }

and then on the deallocation side:

  if (llvm.coro.alloc()) {
size_t size = llvm.coro.size(), align = llvm.coro.align();
if (align > NEW_ALIGN)
  frame = *(void**) (frame + size);
operator delete(frame);
  }

That's all quite annoying, but it does extend quite nicely to cover the 
presence of an aligned allocator when the committee gets around to ratifying 
that this is what should happen:

  if (llvm.coro.alloc()) {
size_t size = llvm.coro.size(), align = llvm.coro.align();
if (align > NEW_ALIGN)
  frame = operator new(std::align_val_t(align), size);
else
  frame = operator new(size);
  }

and then on the deallocation side:

  if (llvm.coro.alloc()) {
size_t size = llvm.coro.size(), align = llvm.coro.align();
if (align > NEW_ALIGN)
  operator delete(frame, std::align_val_t(align));
else
  operator delete(frame);
  }


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2717227 , @rjmccall wrote:

> Yes, if you can dynamically choose to use an aligned allocator, that's 
> clearly just much better.

Right now:

Intrinsic::coro_size_aligned :   overaligned frame: over-allocate, adjust start 
address;non-overaligned frame: no-op
Intrinsic::coro_size :overaligned frame: no-op; 
non-overaligned frame: no-op

Do you mean to remove `Intrinsic::coro_size_aligned` and make 
Intrinsic::coro_size :overaligned frame: over-allocate, adjust 
start address;non-overaligned frame: no-op

that makes sense to me. Just want to confirm first.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-26 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Yes, if you can dynamically choose to use an aligned allocator, that's clearly 
just much better.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-25 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D100739#2713579 , @ychen wrote:

> In D100739#2711698 , @ChuanqiXu 
> wrote:
>
>>> This is an alternative to D97915  which 
>>> missed proper deallocation of the over-allocated frame. This patch handles 
>>> both allocations and deallocations.
>>
>> If D97915  is not needed, we should abandon 
>> it.
>>
>> For the example shows in D97915 , it says:
>>
>>   #include 
>>   #include 
>>   #include 
>>   #include 
>>   #include 
>>   
>>   struct task{
>> struct alignas(64) promise_type {
>>   task get_return_object() { return {}; }
>>   std::experimental::suspend_never initial_suspend() { return {}; }
>>   std::experimental::suspend_never final_suspend() noexcept { return {}; 
>> }
>>   void return_void() {}
>>   void unhandled_exception() {}
>> };
>> using handle = std::experimental::coroutine_handle;
>>   };
>>   
>>   auto switch_to_new_thread() {
>> struct awaitable {
>>   bool await_ready() { return false; }
>>   void await_suspend(task::handle h) {
>> auto i = reinterpret_cast(&h.promise());
>> std::cout << i << std::endl;
>> assert(i % 64 == 0);
>>   }
>>   void await_resume() {}
>> };
>> return awaitable{};
>>   }
>>   
>>   task resuming_on_new_thread() {
>> co_await switch_to_new_thread();
>>   }
>>   
>>   int main() {
>> resuming_on_new_thread();
>>   }
>>
>> The assertion would fail. If this is the root problem, I think we could 
>> adjust the align for the promise alloca like:
>
> The problem is that any member of the coroutine frame could be overaligned 
> (thus make the frame overaligned) including promise, local variables, spills. 
> The problem is *not* specific to promise.
>
>>   %promise = alloca %promise_type, align 8
>>
>> into
>>
>>   %promise = alloca %promise_type, align 128
>>
>> In other words, if this the problem we need to solve, I think we could make 
>> it in a simpler way.
>
> This may not fix the problem.
>
>> Then I looked into the document you give in the summary. The issue#3 says 
>> the frontend can't do some work in the process of template instantiation due 
>> to the frontend doesn't know about the align and size of the coroutine. But 
>> from the implementation, it looks like not the problem this patch wants to 
>> solve.
>
> I meant to use that as a reference to help describe the problem (but not the 
> solution). The document itself includes both problem statements (issue#3) and 
> solutions (frontend-based) which are totally unrelated to this patch. It 
> looks like it is not that useful in this case so please disregard that.
>
>> I am really confused about the problem. Could you please restate your 
>> problem more in more detail? For example, would it make the alignment 
>> incorrect like the example above? Or does we want the frontend to get 
>> alignment information? Then what would be affected? From the title, I can 
>> guess the size of frame would get bigger. But how big would it be? Who would 
>> control and determine the final size?
>
> understood.
>
> There are two kinds of alignments: the alignment of a type/object at 
> compile-time (ABI specified or user-specified), and the alignment the object 
> of that type actually gets during runtime. The compiler assumes that the 
> alignment of a struct is the maximal alignment of all its members. However, 
> that assumption may not be true at runtime where the memory allocator may 
> return a memory block that has insufficient alignment which causes some 
> members aligned incorrectly.
>
> For C++ coroutine, right now the default memory allocator could only return 
> 16 bytes aligned memory block. When any member of the coroutine frame 
> (promise, local variables, spills etc.) has alignment > 16, the frame becomes 
> overaligned. This could only be fixed dynamically at runtime: by 
> over-allocating memory and then adjust the frame start address so that it 
> aligns correctly.
>
> For example, suppose malloc returns 16 bytes aligned address 16, how do we 
> make it 64 bytes aligned? align 16 up to an address that is 64 bytes aligned 
> which is 64, so the adjustment amount is 64-16=48
>
> Another similar example, suppose malloc returns 16 bytes aligned address 32, 
> how do we make it 64 bytes aligned? align 32 up to an address that is 64 
> bytes aligned which is 64, so the adjustment amount is 64-32=32
>
> Another similar example, suppose malloc returns 16 bytes aligned address 48, 
> how do we make it 64 bytes aligned? align 48 up to an address that is 64 
> bytes aligned which is 64, so the adjustment amount is 64-48=16
>
> Another similar example, suppose malloc returns 16 bytes aligned address 64, 
> how do we make it 64 bytes aligned? align 64 up to an address that is 64 
> bytes aligned which is 64, so t

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-25 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added inline comments.



Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:854
+  // Save raw frame pointer to alloca
+  Value *Mem = Shape.CoroBegin->getMem();
+  AllocaInst *FramePtrAddr =

This seems to align with what this mem argument is designed for: 
https://reviews.llvm.org/D66230#1631860


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-23 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2712268 , @lxfind wrote:

>> Sorry for the confusion. I think either overaligned or under-aligned could 
>> be used here to describe the problem: either "Handle overaligned frame" or 
>> "Fix under-aligned frame". Since c++ spec defines the former but not the 
>> later (https://en.cppreference.com/w/cpp/language/object#Alignment), my 
>> first intuition was to use the term "overalign". Under-aligned is the 
>> undesired outcome that should be fixed (probably too late to handle I 
>> assume). Also the overaligned is a static property whereas 'under-aligned" 
>> is a runtime property. From the compiler's perspective, I think overaligned 
>> should be preferred. With that said, I don't feel strongly about this. I 
>> could switch to use "overaligned" if that feels more intuitive.
>
> Here "overaligned frame" doesn't already occur.

It does occur. `FrameAlign > Shape.getSwitchCoroId()->getAlignment())` this 
check reflects the definition of C++ spec's definition of overalign.

> From what I understand, you really just want to support promise object 
> alignment. So why not just say that directly?

This patch does not deal with promise in any specific way. It treats promise 
just like any other frame members.

> To add on that, I do think you need to describe the problem in more detail in 
> the description. It's indeed still confusing.

Yep, will do that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-23 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2711698 , @ChuanqiXu wrote:

>> This is an alternative to D97915  which 
>> missed proper deallocation of the over-allocated frame. This patch handles 
>> both allocations and deallocations.
>
> If D97915  is not needed, we should abandon 
> it.
>
> For the example shows in D97915 , it says:
>
>   #include 
>   #include 
>   #include 
>   #include 
>   #include 
>   
>   struct task{
> struct alignas(64) promise_type {
>   task get_return_object() { return {}; }
>   std::experimental::suspend_never initial_suspend() { return {}; }
>   std::experimental::suspend_never final_suspend() noexcept { return {}; }
>   void return_void() {}
>   void unhandled_exception() {}
> };
> using handle = std::experimental::coroutine_handle;
>   };
>   
>   auto switch_to_new_thread() {
> struct awaitable {
>   bool await_ready() { return false; }
>   void await_suspend(task::handle h) {
> auto i = reinterpret_cast(&h.promise());
> std::cout << i << std::endl;
> assert(i % 64 == 0);
>   }
>   void await_resume() {}
> };
> return awaitable{};
>   }
>   
>   task resuming_on_new_thread() {
> co_await switch_to_new_thread();
>   }
>   
>   int main() {
> resuming_on_new_thread();
>   }
>
> The assertion would fail. If this is the root problem, I think we could 
> adjust the align for the promise alloca like:

The problem is that any member of the coroutine frame could be overaligned 
including promise, local variables, spills. The problem is *not* specific to 
promise.

>   %promise = alloca %promise_type, align 8
>
> into
>
>   %promise = alloca %promise_type, align 128
>
> In other words, if this the problem we need to solve, I think we could make 
> it in a simpler way.

This may not fix the problem.

> Then I looked into the document you give in the summary. The issue#3 says the 
> frontend can't do some work in the process of template instantiation due to 
> the frontend doesn't know about the align and size of the coroutine. But from 
> the implementation, it looks like not the problem this patch wants to solve.

I meant to use that as a reference to help describe the problem (but not the 
solution). The document itself includes both problem statements (issue#3) and 
solutions (frontend-based) which are totally unrelated to this patch. It looks 
like it is not that useful in this case so please disregard that.

> I am really confused about the problem. Could you please restate your problem 
> more in more detail? For example, would it make the alignment incorrect like 
> the example above? Or does we want the frontend to get alignment information? 
> Then what would be affected? From the title, I can guess the size of frame 
> would get bigger. But how big would it be? Who would control and determine 
> the final size?

understood.

There are two kinds of alignments: the alignment of a type/object at 
compile-time (ABI specified or user-specified), and the alignment the object of 
that type actually gets during runtime. The compiler assumes that the alignment 
of a struct is the maximal alignment of all its members. However, that 
assumption may not be true at runtime where the memory allocator may return a 
memory block that has insufficient alignment which causes some members aligned 
incorrectly.

For C++ coroutine, right now the default memory allocator could only return 16 
bytes aligned memory block. When any member of the coroutine frame (promise, 
local variables, spills etc.) has alignment > 16, the frame becomes 
overaligned. This could only be fixed dynamically at runtime: by 
over-allocating memory and then adjust the frame start address so that it 
aligns correctly.

For example, suppose malloc returns 16 bytes aligned address 16, how do we make 
it 64 bytes aligned? align 16 up to an address that is 64 bytes aligned which 
is 64, so the adjustment amount is 64-16=48

Another similar example, suppose malloc returns 16 bytes aligned address 32, 
how do we make it 64 bytes aligned? align 32 up to an address that is 64 bytes 
aligned which is 64, so the adjustment amount is 64-32=32

Another similar example, suppose malloc returns 16 bytes aligned address 48, 
how do we make it 64 bytes aligned? align 48 up to an address that is 64 bytes 
aligned which is 64, so the adjustment amount is 64-48=16

Another similar example, suppose malloc returns 16 bytes aligned address 64, 
how do we make it 64 bytes aligned? align 64 up to an address that is 64 bytes 
aligned which is 64, so the adjustment amount is 64-64=0

So the mamximal adjustment amount is 64-16=48. We don't know until runtime if 
the malloc returned address X is (X % 64 == 0) or (X % 64 == 16) or (X % 64 == 
32) or (X % 64 == 48), so we must emit extra code to deal with all cases (by 
bitwise 

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-23 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Sorry for the confusion. I think either overaligned or under-aligned could be 
> used here to describe the problem: either "Handle overaligned frame" or "Fix 
> under-aligned frame". Since c++ spec defines the former but not the later 
> (https://en.cppreference.com/w/cpp/language/object#Alignment), my first 
> intuition was to use the term "overalign". Under-aligned is the undesired 
> outcome that should be fixed (probably too late to handle I assume). Also the 
> overaligned is a static property whereas 'under-aligned" is a runtime 
> property. From the compiler's perspective, I think overaligned should be 
> preferred. With that said, I don't feel strongly about this. I could switch 
> to use "overaligned" if that feels more intuitive.

"Handle" is probably not the right word to be used here. What follows "handle" 
is typically a legit situation that already occurred but not current handled 
properly. Here "overaligned frame" doesn't already occur. From what I 
understand, you really just want to support promise object alignment. So why 
not just say that directly?
To add on that, I do think you need to describe the problem in more detail in 
the description. It's indeed still confusing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-23 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

> This is an alternative to D97915  which 
> missed proper deallocation of the over-allocated frame. This patch handles 
> both allocations and deallocations.

If D97915  is not needed, we should abandon it.

For the example shows in D97915 , it says:

  #include 
  #include 
  #include 
  #include 
  #include 
  
  struct task{
struct alignas(64) promise_type {
  task get_return_object() { return {}; }
  std::experimental::suspend_never initial_suspend() { return {}; }
  std::experimental::suspend_never final_suspend() noexcept { return {}; }
  void return_void() {}
  void unhandled_exception() {}
};
using handle = std::experimental::coroutine_handle;
  };
  
  auto switch_to_new_thread() {
struct awaitable {
  bool await_ready() { return false; }
  void await_suspend(task::handle h) {
auto i = reinterpret_cast(&h.promise());
std::cout << i << std::endl;
assert(i % 64 == 0);
  }
  void await_resume() {}
};
return awaitable{};
  }
  
  task resuming_on_new_thread() {
co_await switch_to_new_thread();
  }
  
  int main() {
resuming_on_new_thread();
  }

The assertion would fail. If this is the root problem, I think we could adjust 
the align for the promise alloca like:

  %promise = alloca %promise_type, align 8

into

  %promise = alloca %promise_type, align 128

In other words, if this the problem we need to solve, I think we could make it 
in a simpler way.

The I looked into the document you give in the summary. The issue#3 says the 
frontend can't do some work in the process of template instantiation due to the 
frontend doesn't know about the align and size of the coroutine. But from the 
implementation, it looks like not the problem this patch wants to solve.

I am really confused about the problem. Could you please restate your problem 
more in more detail? For example, would it make the alignment incorrect like 
the example above? Or does we want the frontend to get alignment information? 
Then what would be affected? From the title, I can guess the size of frame 
would get bigger. But how big would it be? Who would control and determine the 
final size?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-23 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen updated this revision to Diff 339906.
ychen added a comment.

fix typo.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-builtins.c
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-overalign.ll
@@ -0,0 +1,81 @@
+; Check that we will emit extra code to handle overaligned frame.
+; RUN: opt < %s -coro-split -S | FileCheck %s
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.aligned.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  call void @consume(%PackedStruct* %data)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  call void @consume(%PackedStruct* %data)
+  br label %cleanup
+
+cleanup:
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the frame pointer was inserted.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we over-allocate, adjust frame ptr start address and use a alloca to
+; save the raw frame pointer.
+; CHECK-LABEL: @f(
+;CHECK:  %alloc.frame.ptr = alloca i8*, align 8
+;CHECK:  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* bitcast ([3 x void (%f.Frame*)*]* @f.resumers to i8*))
+;CHECK:  %alloc = call i8* @malloc(i32 56)
+;CHECK:  store i8* %alloc, i8** %alloc.frame.ptr, align 8
+;CHECK:  %intptr = ptrtoint i8* %alloc to i64
+;CHECK:  %over_boundary = add i64 %intptr, 31
+;CHECK:  %aligned_intptr = and i64 %over_boundary, -32
+;CHECK:  %diff = sub i64 %aligned_intptr, %intptr
+;CHECK:  %aligned_result = getelementptr i8, i8* %alloc, i64 %diff
+;CHECK:  call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 32) ]
+;CHECK:  %hdl = call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %aligned_result)
+
+; See if we emit correct deallocation code.
+
+; CHECK-LABEL: @f.resume(
+; CHECK:  %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
+; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
+; CHECK-NEXT: ret void
+
+; CHECK-LABEL: @f.destroy(
+; CHECK:  %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
+; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
+; CHECK-NEXT: ret void
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK:  call void @free(i8* null)
+; CHECK-NEXT: ret void
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.aligned.i32()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -22,6 +22,7 @@
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
@@ -171,7 +172,7 @@
 
 // Replace all coro.frees associated with the provided CoroId either with 'null'
 // if Elide is true and with its frame parameter otherwise.
-void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide) {
+void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide, Shape *Shape) {
   SmallVector CoroFrees;
   for (User *U : CoroId->users())
 if (auto CF = dyn_cast(U))
@@ -180,9 +181,25 @@
   if (CoroFrees.empty())
 return;
 
-  Value *Rep

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-23 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen updated this revision to Diff 339902.
ychen added a comment.

Fix typo.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-builtins.c
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-overalign.ll
@@ -0,0 +1,81 @@
+; Check that we will emit extra code to handle overaligned frame.
+; RUN: opt < %s -coro-split -S | FileCheck %s
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.alloc.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  call void @consume(%PackedStruct* %data)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  call void @consume(%PackedStruct* %data)
+  br label %cleanup
+
+cleanup:
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the frame pointer was inserted.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we over-allocate, adjust frame ptr start address and use a alloca to
+; save the raw frame pointer.
+; CHECK-LABEL: @f(
+;CHECK:  %alloc.frame.ptr = alloca i8*, align 8
+;CHECK:  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* bitcast ([3 x void (%f.Frame*)*]* @f.resumers to i8*))
+;CHECK:  %alloc = call i8* @malloc(i32 56)
+;CHECK:  store i8* %alloc, i8** %alloc.frame.ptr, align 8
+;CHECK:  %intptr = ptrtoint i8* %alloc to i64
+;CHECK:  %over_boundary = add i64 %intptr, 31
+;CHECK:  %aligned_intptr = and i64 %over_boundary, -32
+;CHECK:  %diff = sub i64 %aligned_intptr, %intptr
+;CHECK:  %aligned_result = getelementptr i8, i8* %alloc, i64 %diff
+;CHECK:  call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 32) ]
+;CHECK:  %hdl = call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %aligned_result)
+
+; See if we emit correct deallocation code.
+
+; CHECK-LABEL: @f.resume(
+; CHECK:  %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
+; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
+; CHECK-NEXT: ret void
+
+; CHECK-LABEL: @f.destroy(
+; CHECK:  %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
+; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
+; CHECK-NEXT: ret void
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK:  call void @free(i8* null)
+; CHECK-NEXT: ret void
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.alloc.i32()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -22,6 +22,7 @@
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
@@ -171,7 +172,7 @@
 
 // Replace all coro.frees associated with the provided CoroId either with 'null'
 // if Elide is true and with its frame parameter otherwise.
-void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide) {
+void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide, Shape *Shape) {
   SmallVector CoroFrees;
   for (User *U : CoroId->users())
 if (auto CF = dyn_cast(U))
@@ -180,9 +181,25 @@
   if (CoroFrees.empty())
 return;
 
-  Value *Replace

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-22 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2711181 , @rjmccall wrote:

> What is the purpose of the builtin?  Where is it being used?  Typically you 
> *can't* change the signature of a builtin because the builtin is itself a 
> language feature that's documented to have a particular signature.  If you've 
> made a builtin purely for use in generated AST, that's pretty unfortunate, 
> and you should consider whether you actually have to do that instead of e.g. 
> synthesizing a call to an allocation function the same way that we do in 
> `new` expressions.

Well, the intention was not *only* use it in AST, it could be used by clients 
to ask LLVM to handle overaligned frame. I'm not sure how many use cases that 
could have, so in the updated patch, I've removed it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-22 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2706973 , @lxfind wrote:

> Thanks for working on this.
> I am still having a bit hard time understanding the solution.
> A few questions:
>
> 1. I assume this patch is to solve the problem where the promise object is 
> not aligned according to its alignof annotation, right? The title/wording is 
> a bit misleading. Usually "handling XXX" means XXX is a situation/problem 
> that wasn't handle properly before, and it's being handled here. I don't 
> really understand what "handle overaligned frame allocation" means. Isn't 
> frame allocation under-aligned being the problem?

Sorry for the confusion. I think either overaligned or under-aligned could be 
used here to describe the problem: either "Handle overaligned frame" or "Fix 
under-aligned frame". Since c++ spec defines the former but not the later 
(https://en.cppreference.com/w/cpp/language/object#Alignment), my first 
intuition was to use the term "overalign". Under-aligned is the undesired 
outcome that should be fixed (probably too late to handle I assume). Also the 
overaligned is a static property whereas 'under-aligned" is a runtime property. 
From the compiler's perspective, I think overaligned should be preferred. With 
that said, I don't feel strongly about this. I could switch to use 
"overaligned" if that feels more intuitive.

> 2. What is the purpose of coro.align intrinsic?

To communicate frame alignment to the frontend. It shouldn't be needed for this 
patch, so I've removed it.

> 3. Could you provide some examples of what the IR might look like after this 
> patch? Either that or a more detailed explanation of how this works in the 
> summary.

Yep, please see the updated description. And a new test is added.

> 4. Do you think it might be cleaner to introduce a new variant of coro.size 
> instead of adding arguments to it? For example, coro.size.aligned(). This 
> way, you can avoid changing any test file for non-switch-lowering test files, 
> but focus on all switch-lowering tests.

Agree, I've thought about variadic intrinsic and this new intrinsic, I think 
using new intrinsic is more flexible and avoids test fixup.

> 5. Typically, coro.free is used by a comparison with nullptr. This is to 
> enable CoroElide. See: 
> https://llvm.org/docs/Coroutines.html#llvm-coro-free-intrinsic. So I don't 
> think you can load from it directly.

Agree, I've changed to do it in `coro::replaceCoroFree`.

@ChuanqiXu @lxfind  Thanks a lot for the feedback. I've updated the description 
and addressed the existing comments. Please take a look.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-22 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen updated this revision to Diff 339900.
ychen edited the summary of this revision.
ychen added a comment.

- Address feebacks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-builtins.c
  clang/test/CodeGenCoroutines/coro-gro.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/coro-overalign.ll

Index: llvm/test/Transforms/Coroutines/coro-overalign.ll
===
--- /dev/null
+++ llvm/test/Transforms/Coroutines/coro-overalign.ll
@@ -0,0 +1,81 @@
+; Check that we will emit extra code to handle overaligned frame.
+; RUN: opt < %s -coro-split -S | FileCheck %s
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+
+%PackedStruct = type <{ i64 }>
+
+declare void @consume(%PackedStruct*)
+
+define i8* @f() "coroutine.presplit"="1" {
+entry:
+  %data = alloca %PackedStruct, align 32
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.alloc.i32()
+  %alloc = call i8* @malloc(i32 %size)
+  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
+  call void @consume(%PackedStruct* %data)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+i8 1, label %cleanup]
+resume:
+  call void @consume(%PackedStruct* %data)
+  br label %cleanup
+
+cleanup:
+  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
+  call void @free(i8* %mem)
+  br label %suspend
+suspend:
+  call i1 @llvm.coro.end(i8* %hdl, i1 0)
+  ret i8* %hdl
+}
+
+; See if the frame pointer was inserted.
+; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }
+
+; See if we over-allocate, adjust frame ptr start address and use a alloca to
+; save the raw frame pointer.
+; CHECK-LABEL: @f(
+;CHECK:  %alloc.frame.ptr = alloca i8*, align 8
+;CHECK:  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* bitcast ([3 x void (%f.Frame*)*]* @f.resumers to i8*))
+;CHECK:  %alloc = call i8* @malloc(i32 56)
+;CHECK:  store i8* %alloc, i8** %alloc.frame.ptr, align 8
+;CHECK:  %intptr = ptrtoint i8* %alloc to i64
+;CHECK:  %over_boundary = add i64 %intptr, 31
+;CHECK:  %aligned_intptr = and i64 %over_boundary, -32
+;CHECK:  %diff = sub i64 %aligned_intptr, %intptr
+;CHECK:  %aligned_result = getelementptr i8, i8* %alloc, i64 %diff
+;CHECK:  call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 32) ]
+;CHECK:  %hdl = call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %aligned_result)
+
+; See if we emit correct deallocation code.
+
+; CHECK-LABEL: @f.resume(
+; CHECK:  %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
+; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
+; CHECK-NEXT: ret void
+
+; CHECK-LABEL: @f.destroy(
+; CHECK:  %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
+; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
+; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
+; CHECK-NEXT: ret void
+
+; CHECK-LABEL: @f.cleanup(
+; CHECK:  call void @free(i8* null)
+; CHECK-NEXT: ret void
+
+declare i8* @llvm.coro.free(token, i8*)
+declare i32 @llvm.coro.size.alloc.i32()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(i8*)
+declare void @llvm.coro.destroy(i8*)
+
+declare token @llvm.coro.id(i32, i8*, i8*, i8*)
+declare i1 @llvm.coro.alloc(token)
+declare i8* @llvm.coro.begin(token, i8*)
+declare i1 @llvm.coro.end(i8*, i1)
+
+declare noalias i8* @malloc(i32)
+declare void @free(i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -22,6 +22,7 @@
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
@@ -171,7 +172,7 @@
 
 // Replace all coro.frees associated with the provided CoroId either with 'null'
 // if Elide is true and with its frame parameter otherwise.
-void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide) {
+void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide, Shape *Shape) {
   SmallVector CoroFrees;
   for (User *U : CoroId->users())
 if (auto CF = dyn_cast(U))
@@ -180,9 +181,25 @@
   if

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-22 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

What is the purpose of the builtin?  Where is it being used?  Typically you 
*can't* change the signature of a builtin because the builtin is itself a 
language feature that's documented to have a particular signature.  If you've 
made a builtin purely for use in generated AST, that's pretty unfortunate, and 
you should consider whether you actually have to do that instead of e.g. 
synthesizing a call to an allocation function the same way that we do in `new` 
expressions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-21 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

Thanks for working on this.
I am still having a bit hard time understanding the solution.
A few questions:

1. I assume this patch is to solve the problem where the promise object is not 
aligned according to its alignof annotation, right? The title/wording is a bit 
misleading. Usually "handling XXX" means XXX is a situation/problem that wasn't 
handle properly before, and it's being handled here. I don't really understand 
what "handle overaligned frame allocation" means. Isn't frame allocation 
under-aligned being the problem?
2. What is the purpose of coro.align intrinsic?
3. Could you provide some examples of what the IR might look like after this 
patch? Either that or a more detailed explanation of how this works in the 
summary.
4. Do you think it might be cleaner to introduce a new variant of coro.size 
instead of adding arguments to it? For example, coro.size.aligned(). This way, 
you can avoid changing any test file for non-switch-lowering test files, but 
focus on all switch-lowering tests.
5. Typically, coro.free is used by a comparison with nullptr. This is to enable 
CoroElide. See: https://llvm.org/docs/Coroutines.html#llvm-coro-free-intrinsic. 
So I don't think you can load from it directly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-21 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > ychen wrote:
> > > > lxfind wrote:
> > > > > Do we need to change __builtin_coro_size? The argument will always be 
> > > > > 1, right?
> > > > > It only starts to change in LLVM intrinsics, if I read the impl 
> > > > > correctly.
> > > > Yeah, It is always 1 for Clang until the spec is fixed (then we could 
> > > > revert it back to 0).  Other clients using `__builtin_coro_size` may 
> > > > use 0 if the client doesn't care about overaligned frame or it could 
> > > > handle overaligned frame by itself. 
> > > BTW, is it OK to edit the `builtin`s directly? Since builtin is different 
> > > with intrinsic which is only visible in the internal of compiler, builtin 
> > > could be used by any end users. Although I know there should be  little 
> > > users who would use `__builtin_coro` APIs, I worry if there is any guide 
> > > principle for editing the `builtin`s.
> > > BTW, is it OK to edit the builtins directly? Since builtin is different 
> > > with intrinsic which is only visible in the internal of compiler, builtin 
> > > could be used by any end users. Although I know there should be little 
> > > users who would use __builtin_coro APIs, I worry if there is any guide 
> > > principle for editing the builtins.
> > 
> > I think it is ok to change these if it is justified like anything else.
> > 
> > builtins/intrinsics are interfaces on different levels. I'm trying to make 
> > __builtin_coro_size consistent with llvm.coro.size because I don't have a 
> > good reason for not doing that. (assume that we keep this opt-in 
> > overaligned frame handling in LLVM even after the spec is fixed since it 
> > helps solve a practical problem and the maintenance cost is low)
> > 
> > 
> It doesn't make sense to me that we need to change the signature for 
> `__builtin_coro_size` in this patch. In other words, why do we need to change 
> `__builtin_coro_size `? What are problems that can't be solved if we don't 
> change `__builtin_coro_size`? At least, if it is necessary to change 
> `__builtin_coro_size`, we could make it in successive patches.
Yeah I agree with ChuanqiXu, there is no need to make the builtin to be exactly 
the same as the llvm intrinsics just because they have the same name. Many of 
them are different even though they have the same name.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-20 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

ychen wrote:
> ChuanqiXu wrote:
> > ychen wrote:
> > > lxfind wrote:
> > > > Do we need to change __builtin_coro_size? The argument will always be 
> > > > 1, right?
> > > > It only starts to change in LLVM intrinsics, if I read the impl 
> > > > correctly.
> > > Yeah, It is always 1 for Clang until the spec is fixed (then we could 
> > > revert it back to 0).  Other clients using `__builtin_coro_size` may use 
> > > 0 if the client doesn't care about overaligned frame or it could handle 
> > > overaligned frame by itself. 
> > BTW, is it OK to edit the `builtin`s directly? Since builtin is different 
> > with intrinsic which is only visible in the internal of compiler, builtin 
> > could be used by any end users. Although I know there should be  little 
> > users who would use `__builtin_coro` APIs, I worry if there is any guide 
> > principle for editing the `builtin`s.
> > BTW, is it OK to edit the builtins directly? Since builtin is different 
> > with intrinsic which is only visible in the internal of compiler, builtin 
> > could be used by any end users. Although I know there should be little 
> > users who would use __builtin_coro APIs, I worry if there is any guide 
> > principle for editing the builtins.
> 
> I think it is ok to change these if it is justified like anything else.
> 
> builtins/intrinsics are interfaces on different levels. I'm trying to make 
> __builtin_coro_size consistent with llvm.coro.size because I don't have a 
> good reason for not doing that. (assume that we keep this opt-in overaligned 
> frame handling in LLVM even after the spec is fixed since it helps solve a 
> practical problem and the maintenance cost is low)
> 
> 
It doesn't make sense to me that we need to change the signature for 
`__builtin_coro_size` in this patch. In other words, why do we need to change 
`__builtin_coro_size `? What are problems that can't be solved if we don't 
change `__builtin_coro_size`? At least, if it is necessary to change 
`__builtin_coro_size`, we could make it in successive patches.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-20 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

ChuanqiXu wrote:
> ychen wrote:
> > lxfind wrote:
> > > Do we need to change __builtin_coro_size? The argument will always be 1, 
> > > right?
> > > It only starts to change in LLVM intrinsics, if I read the impl correctly.
> > Yeah, It is always 1 for Clang until the spec is fixed (then we could 
> > revert it back to 0).  Other clients using `__builtin_coro_size` may use 0 
> > if the client doesn't care about overaligned frame or it could handle 
> > overaligned frame by itself. 
> BTW, is it OK to edit the `builtin`s directly? Since builtin is different 
> with intrinsic which is only visible in the internal of compiler, builtin 
> could be used by any end users. Although I know there should be  little users 
> who would use `__builtin_coro` APIs, I worry if there is any guide principle 
> for editing the `builtin`s.
> BTW, is it OK to edit the builtins directly? Since builtin is different with 
> intrinsic which is only visible in the internal of compiler, builtin could be 
> used by any end users. Although I know there should be little users who would 
> use __builtin_coro APIs, I worry if there is any guide principle for editing 
> the builtins.

I think it is ok to change these if it is justified like anything else.

builtins/intrinsics are interfaces on different levels. I'm trying to make 
__builtin_coro_size consistent with llvm.coro.size because I don't have a good 
reason for not doing that. (assume that we keep this opt-in overaligned frame 
handling in LLVM even after the spec is fixed since it helps solve a practical 
problem and the maintenance cost is low)




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-20 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

There is something I am still confused about these two patch. Maybe I don't get 
the problem right.
The example problem shows if user uses `alignas` to specify the alignment for 
promise_type, the actual alignment for the promise isn't correctly due to 
compiler didn't call `new` method with alignment which isn't specified in the 
spec.
Then these two patches are trying to add paddings to the frame type to make the 
alignment of promise_type right. Here is a gap, in fact the problem comes from 
the alignment promise. But we want to solve it by over align the frame. It is 
odd for me.
I wonder if it is possible to simply adjust the alignment for the alloca of 
promise, like:

  %promise = alloca %promise_type, align 64




Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

ychen wrote:
> lxfind wrote:
> > Do we need to change __builtin_coro_size? The argument will always be 1, 
> > right?
> > It only starts to change in LLVM intrinsics, if I read the impl correctly.
> Yeah, It is always 1 for Clang until the spec is fixed (then we could revert 
> it back to 0).  Other clients using `__builtin_coro_size` may use 0 if the 
> client doesn't care about overaligned frame or it could handle overaligned 
> frame by itself. 
BTW, is it OK to edit the `builtin`s directly? Since builtin is different with 
intrinsic which is only visible in the internal of compiler, builtin could be 
used by any end users. Although I know there should be  little users who would 
use `__builtin_coro` APIs, I worry if there is any guide principle for editing 
the `builtin`s.



Comment at: llvm/docs/Coroutines.rst:961
+declare i32 @llvm.coro.align.i32()
+declare i64 @llvm.coro.align.i64()
+

Maybe I was missing something. I think these two intrinsic should take one i8* 
argument to specify the coroutine handle. Otherwise, it may be confusing if 
there are some coroutines get inlined into other coroutine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-20 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

lxfind wrote:
> Do we need to change __builtin_coro_size? The argument will always be 1, 
> right?
> It only starts to change in LLVM intrinsics, if I read the impl correctly.
Yeah, It is always 1 for Clang until the spec is fixed (then we could revert it 
back to 0).  Other clients using `__builtin_coro_size` may use 0 if the client 
doesn't care about overaligned frame or it could handle overaligned frame by 
itself. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-20 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

Do we need to change __builtin_coro_size? The argument will always be 1, right?
It only starts to change in LLVM intrinsics, if I read the impl correctly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-19 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D100739#2700324 , @ychen wrote:

> In D100739#2700273 , @ChuanqiXu 
> wrote:
>
>> I hadn't looked into the details. I would try to make it.
>> But from my understanding to this problem, the correct solution should 
>> remain the previous behavior if the end user doesn't specify `alignas` for 
>> `promise_type`. I mean, it shouldn't make so many test cases fail.
>
> The failures mostly are due to the `llvm.coro.size()` -> `llvm.coro.size(i1 
> alloc)` intrinsic signature change. The rest are due to the `align` arg of 
> `llvm.coro.id` is illegal 0 in most of the tests. This patch would read that 
> value causing asssertions.
>
>>> The alloca of the raw frame pointer (suppose we insert it in the frontend) 
>>> would be included in the non-overaligned frame if we don't teach CoroFrame 
>>> how to elide it.
>>
>> It is confusing to me, maybe I need to look into the codes. First, the raw 
>> frame pointer isn't inserted in the frontend. Do you mean coro.begin? Then, 
>> what's the relationship between eliding and over-aligned? It also makes me 
>> confused.
>
> Since we don't know if the frame is overaligned or not until after the frame 
> type is decided in CoroFrame, *suppose* we do this (adding raw frame ptr 
> alloca in frontend regardless of the presence of overalignment or not) in the 
> frontend, the code generated from CGCoroutine would be like this:
>
>   %1 = alloc i8*
>   %2 = malloc(x)
>   store %2, %1
>   ..
>   ..
>   %11 = load %1
>   free(%11)
>
> The alloca `%1` must be there even if the frame is not aligned. Then in the 
> CoroFrame, we have to check if the frame is really overaligned and if not, 
> reverse the above patterns. Otherwise, the alloca `%1` would be in the frame 
> needlessly and the following optimizations could not reliably remove it.
>
> This patch defers adding the alloca until CoroFrame where we know for sure 
> that the frame is aligned.

The semantics of alloca and free in the example look similar with coro.begin 
and coro.end.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-19 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen added a comment.

In D100739#2700273 , @ChuanqiXu wrote:

> I hadn't looked into the details. I would try to make it.
> But from my understanding to this problem, the correct solution should remain 
> the previous behavior if the end user doesn't specify `alignas` for 
> `promise_type`. I mean, it shouldn't make so many test cases fail.

The failures mostly are due to the `llvm.coro.size()` -> `llvm.coro.size(i1 
alloc)` intrinsic signature change. The rest are due to the `align` arg of 
`llvm.coro.id` is illegal 0 in most of the tests. This patch would read that 
value causing asssertions.

>> The alloca of the raw frame pointer (suppose we insert it in the frontend) 
>> would be included in the non-overaligned frame if we don't teach CoroFrame 
>> how to elide it.
>
> It is confusing to me, maybe I need to look into the codes. First, the raw 
> frame pointer isn't inserted in the frontend. Do you mean coro.begin? Then, 
> what's the relationship between eliding and over-aligned? It also makes me 
> confused.

Since we don't know if the frame is overaligned or not until after the frame 
type is decided in CoroFrame, *suppose* we do this (adding raw frame ptr alloca 
in frontend regardless of the presence of overalignment or not) in the 
frontend, the code generated from CGCoroutine would be like this:

  %1 = alloc i8*
  %2 = malloc(x)
  store %2, %1
  ..
  ..
  %11 = load %1
  free(%11)

The alloca `%1` must be there even if the frame is not aligned. Then in the 
CoroFrame, we have to check if the frame is really overaligned and if not, 
reverse the above patterns. Otherwise, the alloca `%1` would be in the frame 
needlessly and the following optimizations could not reliably remove it.

This patch defers adding the alloca until CoroFrame where we know for sure that 
the frame is aligned.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-19 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

I hadn't looked into the details. I would try to make it.
But from my understanding to this problem, the correct solution should remain 
the previous behavior if the end user doesn't specify `alignas` for 
`promise_type`. I mean, it shouldn't make so many test cases fail.

> The alloca of the raw frame pointer (suppose we insert it in the frontend) 
> would be included in the non-overaligned frame if we don't teach CoroFrame 
> how to elide it.

It is confusing to me, maybe I need to look into the codes. First, the raw 
frame pointer isn't inserted in the frontend. Do you mean coro.begin? Then, 
what's the relationship between eliding and over-aligned? It also makes me 
confused.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-18 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen updated this revision to Diff 338431.
ychen added a comment.

Fix comment. Ready for review.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaCoroutine.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/ArgAddr.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
  llvm/test/Transforms/Coroutines/coro-alloca-01.ll

Index: llvm/test/Transforms/Coroutines/coro-alloca-01.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloca-01.ll
+++ llvm/test/Transforms/Coroutines/coro-alloca-01.ll
@@ -8,7 +8,7 @@
   %x = alloca i64
   %y = alloca i64
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
-  %size = call i32 @llvm.coro.size.i32()
+  %size = call i32 @llvm.coro.size.i32(i1 false)
   %alloc = call i8* @malloc(i32 %size)
   %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
   br i1 %n, label %flag_true, label %flag_false
Index: llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
+++ llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
@@ -9,8 +9,8 @@
   %this.addr = alloca i64
   store i64 %this_arg, i64* %this.addr
   %this = load i64, i64* %this.addr
-  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
-  %size = call i32 @llvm.coro.size.i32()
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32(i1 true)
   %alloc = call i8* @myAlloc(i64 %this, i32 %size)
   %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
   %0 = call i8 @llvm.coro.suspend(token none, i1 false)
@@ -45,7 +45,7 @@
 ; CHECK: ret void
 
 declare i8* @llvm.coro.free(token, i8*)
-declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.size.i32(i1)
 declare i8  @llvm.coro.suspend(token, i1)
 declare void @llvm.coro.resume(i8*)
 declare void @llvm.coro.destroy(i8*)
Index: llvm/test/Transforms/Coroutines/ArgAddr.ll
===
--- llvm/test/Transforms/Coroutines/ArgAddr.ll
+++ llvm/test/Transforms/Coroutines/ArgAddr.ll
@@ -21,10 +21,10 @@
 ; CHECK-NEXT:store i32 [[TMP2]], i32* [[TMP1]], align 4
 ;
 entry:
-  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null);
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null);
   %n.addr = alloca i32
   store i32 %n, i32* %n.addr ; this needs to go after coro.begin
-  %0 = tail call i32 @llvm.coro.size.i32()
+  %0 = tail call i32 @llvm.coro.size.i32(i1 true)
   %call = tail call i8* @malloc(i32 %0)
   %1 = tail call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %call)
   %2 = bitcast i32* %n.addr to i8*
@@ -69,7 +69,7 @@
 declare void @ctor(i8* nocapture readonly)
 
 declare token @llvm.coro.id(i32, i8*, i8*, i8*)
-declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.size.i32(i1)
 declare i8* @llvm.coro.begin(token, i8*)
 declare i8 @llvm.coro.suspend(token, i1)
 declare i8* @llvm.coro.free(token, i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,7 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -267,6 +268,10 @@
 continue;
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
+HandleOverAlign = HandleOverAlign || CoroSizes.back()->isAlloc();
+break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
 break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -997,23 +997,39 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape &Shape) {
+static void replaceFrameSizeAndAlign(coro::Shape &Shape) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.CoroSizes.empty())
-return;
+  if (!Shape.CoroSizes.empty()) {
+//

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-18 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen updated this revision to Diff 338430.
ychen added a comment.

fix comment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaCoroutine.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/ArgAddr.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
  llvm/test/Transforms/Coroutines/coro-alloca-01.ll

Index: llvm/test/Transforms/Coroutines/coro-alloca-01.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloca-01.ll
+++ llvm/test/Transforms/Coroutines/coro-alloca-01.ll
@@ -8,7 +8,7 @@
   %x = alloca i64
   %y = alloca i64
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
-  %size = call i32 @llvm.coro.size.i32()
+  %size = call i32 @llvm.coro.size.i32(i1 false)
   %alloc = call i8* @malloc(i32 %size)
   %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
   br i1 %n, label %flag_true, label %flag_false
Index: llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
+++ llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
@@ -9,8 +9,8 @@
   %this.addr = alloca i64
   store i64 %this_arg, i64* %this.addr
   %this = load i64, i64* %this.addr
-  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
-  %size = call i32 @llvm.coro.size.i32()
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32(i1 true)
   %alloc = call i8* @myAlloc(i64 %this, i32 %size)
   %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
   %0 = call i8 @llvm.coro.suspend(token none, i1 false)
@@ -45,7 +45,7 @@
 ; CHECK: ret void
 
 declare i8* @llvm.coro.free(token, i8*)
-declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.size.i32(i1)
 declare i8  @llvm.coro.suspend(token, i1)
 declare void @llvm.coro.resume(i8*)
 declare void @llvm.coro.destroy(i8*)
Index: llvm/test/Transforms/Coroutines/ArgAddr.ll
===
--- llvm/test/Transforms/Coroutines/ArgAddr.ll
+++ llvm/test/Transforms/Coroutines/ArgAddr.ll
@@ -21,10 +21,10 @@
 ; CHECK-NEXT:store i32 [[TMP2]], i32* [[TMP1]], align 4
 ;
 entry:
-  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null);
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null);
   %n.addr = alloca i32
   store i32 %n, i32* %n.addr ; this needs to go after coro.begin
-  %0 = tail call i32 @llvm.coro.size.i32()
+  %0 = tail call i32 @llvm.coro.size.i32(i1 true)
   %call = tail call i8* @malloc(i32 %0)
   %1 = tail call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %call)
   %2 = bitcast i32* %n.addr to i8*
@@ -69,7 +69,7 @@
 declare void @ctor(i8* nocapture readonly)
 
 declare token @llvm.coro.id(i32, i8*, i8*, i8*)
-declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.size.i32(i1)
 declare i8* @llvm.coro.begin(token, i8*)
 declare i8 @llvm.coro.suspend(token, i1)
 declare i8* @llvm.coro.free(token, i8*)
Index: llvm/lib/Transforms/Coroutines/Coroutines.cpp
===
--- llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -234,6 +234,7 @@
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
   Shape.CoroSizes.clear();
+  Shape.CoroAligns.clear();
   Shape.CoroSuspends.clear();
 
   Shape.FrameTy = nullptr;
@@ -267,6 +268,10 @@
 continue;
   case Intrinsic::coro_size:
 CoroSizes.push_back(cast(II));
+HandleOverAlign = HandleOverAlign || CoroSizes.back()->isAlloc();
+break;
+  case Intrinsic::coro_align:
+CoroAligns.push_back(cast(II));
 break;
   case Intrinsic::coro_frame:
 CoroFrames.push_back(cast(II));
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 
 using namespace llvm;
 
@@ -997,23 +998,39 @@
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
-static void replaceFrameSize(coro::Shape &Shape) {
+static void replaceFrameSizeAndAlign(coro::Shape &Shape) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
 
-  if (Shape.Coro

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-18 Thread Yuanfang Chen via Phabricator via cfe-commits
ychen created this revision.
ychen added reviewers: rjmccall, lxfind, ChuanqiXu.
Herald added a subscriber: hiraditya.
ychen requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added projects: clang, LLVM.

This is an alternative to D97915  which missed 
proper deallocation of the
over-allocated frame. This patch handles both allocations and deallocations.

Contrary to D97915 , this patch implements the 
over-allocation in the backend instead of the frontend since

- The alloca of the raw frame pointer (suppose we insert it in the frontend) 
would be included in the non-overaligned frame if we don't teach CoroFrame how 
to elide it.
- Only insert extra code when it is known that the frame is overaligned.
- Simpler implementation.
- Clients could turn it on/off by setting the newly introduced argument of 
`llvm.coro.size(i1 alloc)`.

`llvm.coro.size` gains it first argument `i1 alloc` to indicate users of the 
intrinsic are alloc/dealloc functions. It also indicates to LLVM that it should 
handle overaligned frames. One con of doing this is that many tests need fixup 
(if the general direction is agreed upon, I'll do that later).

It gets a little tricky finding the deallocation function. It is assumed that 
all CallInst users of `llvm.coro.free` are potentially deallocation functions. 
I think this should be the implicit assumption in practice although the 
documentation `Coroutines.rst` does not mention that. Happy to better idea in 
this regard.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D100739

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaCoroutine.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/lib/Transforms/Coroutines/CoroInstr.h
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/lib/Transforms/Coroutines/Coroutines.cpp
  llvm/test/Transforms/Coroutines/ArgAddr.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
  llvm/test/Transforms/Coroutines/coro-alloca-01.ll

Index: llvm/test/Transforms/Coroutines/coro-alloca-01.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloca-01.ll
+++ llvm/test/Transforms/Coroutines/coro-alloca-01.ll
@@ -8,7 +8,7 @@
   %x = alloca i64
   %y = alloca i64
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
-  %size = call i32 @llvm.coro.size.i32()
+  %size = call i32 @llvm.coro.size.i32(i1 false)
   %alloc = call i8* @malloc(i32 %size)
   %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
   br i1 %n, label %flag_true, label %flag_false
Index: llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
+++ llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
@@ -9,8 +9,8 @@
   %this.addr = alloca i64
   store i64 %this_arg, i64* %this.addr
   %this = load i64, i64* %this.addr
-  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
-  %size = call i32 @llvm.coro.size.i32()
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
+  %size = call i32 @llvm.coro.size.i32(i1 true)
   %alloc = call i8* @myAlloc(i64 %this, i32 %size)
   %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
   %0 = call i8 @llvm.coro.suspend(token none, i1 false)
@@ -45,7 +45,7 @@
 ; CHECK: ret void
 
 declare i8* @llvm.coro.free(token, i8*)
-declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.size.i32(i1)
 declare i8  @llvm.coro.suspend(token, i1)
 declare void @llvm.coro.resume(i8*)
 declare void @llvm.coro.destroy(i8*)
Index: llvm/test/Transforms/Coroutines/ArgAddr.ll
===
--- llvm/test/Transforms/Coroutines/ArgAddr.ll
+++ llvm/test/Transforms/Coroutines/ArgAddr.ll
@@ -21,10 +21,10 @@
 ; CHECK-NEXT:store i32 [[TMP2]], i32* [[TMP1]], align 4
 ;
 entry:
-  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null);
+  %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null);
   %n.addr = alloca i32
   store i32 %n, i32* %n.addr ; this needs to go after coro.begin
-  %0 = tail call i32 @llvm.coro.size.i32()
+  %0 = tail call i32 @llvm.coro.size.i32(i1 true)
   %call = tail call i8* @malloc(i32 %0)
   %1 = tail call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %call)
   %2 = bitcast i32* %n.addr to i8*
@@ -69,7 +69,7 @@
 declare void @ctor(i8* nocapture readonly)
 
 declare token @llvm.coro.id(i32, i8*, i8*, i8*)
-declare i32 @llvm.coro.size.i32()
+declare i32 @llvm.coro.size.i32(i1)
 declare i8* @llvm.coro.begin(token, i8*)
 declare i8 @llvm.coro.suspend(token, i1)
 declare i