Dear OpenSSL maintainer,

I ran into an issue in crypto/chacha/chacha-x86_64.S. Could you please have a look at it? Thanks very much.
At the moment the code gets stuck at the label .Ldo_sse3_after_all, where a #GP is raised because the instruction "movdqa %xmm0,0(%rsp)" requires its memory operand to be 16-byte aligned. Going through the code in detail, I see that RSP has already been adjusted by "subq $64+8,%rsp". As an experiment I simply changed this to "subq $64,%rsp", and then it works correctly. I don't know whether this is really a bug; if I have made a mistake, please correct me. :)

I suppose "subq $64+8,%rsp" is meant to align the stack to 16 bytes. In my case, however, RSP is already 16-byte aligned when this code is reached, so after the subtraction the stack is only 8-byte aligned and the #GP happens. :( Could you please help to check it?

The relevant excerpts from the generated file:

    ChaCha20_4x:
    .LChaCha20_4x:
            movq    %rsp,%r9
            movq    %r10,%r11
            shrq    $32,%r10
            testq   $32,%r10
            jnz     .LChaCha20_8x
            cmpq    $192,%rdx
            ja      .Lproceed4x

            andq    $71303168,%r11
            cmpq    $4194304,%r11
            je      .Ldo_sse3_after_all

    .LChaCha20_8x:
            movq    %rsp,%r9
            subq    $0x280+8,%rsp
            andq    $-32,%rsp
            vzeroupper

    .Lproceed4x:
            subq    $0x140+8,%rsp
            movdqa  .Lsigma(%rip),%xmm11
            movdqu  (%rcx),%xmm15
            movdqu  16(%rcx),%xmm7
            movdqu  (%r8),%xmm3
            leaq    256(%rsp),%rcx
            leaq    .Lrot16(%rip),%r10
            leaq    .Lrot24(%rip),%r11

    .Ldo_sse3_after_all:
            subq    $64+8,%rsp
            movdqa  .Lsigma(%rip),%xmm0
            movdqu  (%rcx),%xmm1
            movdqu  16(%rcx),%xmm2
            movdqu  (%r8),%xmm3
            movdqa  .Lrot16(%rip),%xmm6
            movdqa  .Lrot24(%rip),%xmm7

            movdqa  %xmm0,0(%rsp)
            movdqa  %xmm1,16(%rsp)
            movdqa  %xmm2,32(%rsp)
            movdqa  %xmm3,48(%rsp)
            movq    $10,%r8
            jmp     .Loop_ssse3

Best Regards!
--Shaopu
--
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev