I ran into an issue in crypto/chacha/chacha-x86_64.S. Could you kindly have a look at it? Thanks very much.

Currently it gets stuck in the function *do_sse3_after_all*, and a #GP occurs at the following instruction:

“movdqa %xmm0,0(%rsp)” needs 16-byte alignment. However, after going through the code in detail, I found that it already adjusts rsp with “subq $64+8,%rsp”, and when I simply changed it to “subq $64,%rsp”, it worked correctly.

I don’t know whether there’s really an issue here; if I’ve made a mistake, please correct me. :)

I suppose that “subq $64+8,%rsp” is meant to align the stack to 16 bytes, but in my case, if the incoming RSP is already 16-byte aligned, then after executing it the stack is only 8-byte aligned, so the #GP happens. :( Could you please help check it?

All known x86_64 ABIs specify that the top of the stack is to be aligned at 16 bytes. Obviously it can't be aligned at every given moment, not on x86_64, so the question is *when* it has to be aligned. It has to be aligned at least at the moment of a call to another subroutine. Since the x86_64 call instruction pushes the 8-byte return address onto the stack, this means that upon entry to a function the stack is actually misaligned by 8. Hence a compliant function has to allocate a 16*n+8-byte frame, and that's what we see in the code: 64+8 in the case you refer to.

Now, if you experience a crash at the point in question, it can only mean one thing: the caller is not compliant with the ABI. There is some ambiguity, though, and it might be wrong to blame the direct caller, for the following reason. Customarily, compilers don't explicitly align the stack in each subroutine, but instead assume that the caller aligned it. In other words, stack alignment is a kind of collective effort, with each subroutine relying on its caller. So all subroutines can be compliant and there would still be a problem. This would be the case when the stack was *initially* misaligned [upon its creation].

To summarize, either one of the subroutines in the chain of calls leading to ChaCha20_ctr32 is not compliant with the ABI, or the stack was initially seeded misaligned.
