Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On Mon, Aug 3, 2015 at 8:12 PM, Hendrik Leppkes <h.lepp...@gmail.com> wrote:
> On Mon, Aug 3, 2015 at 5:26 PM, James Almer <jamr...@gmail.com> wrote:
> > On 03/08/15 9:50 AM, Hendrik Leppkes wrote:
> > > Otherwise, I tested the patch with msvc 2013 32-bit, and fate passed
> > > fine. If there is something else I should specifically test which may
> > > not be covered by fate, let me know.
> >
> > Just to be sure try to convert an 8ch audio stream from float/s24 to
> > floatp/s24p. See if it crashes or gives wrong output.
>
> Since I wasn't sure in which direction of packed/planar the problem
> usually happened, I tried 8ch s32 -> 8ch s32p, 8ch s32 -> 2ch s32p,
> 8ch s32p -> 8ch s32 and 8ch s32p -> 2ch s32, and all work fine.
>
> - Hendrik

Thanks for testing. Applied.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
Hi,

On Mon, Aug 3, 2015 at 8:50 AM, Hendrik Leppkes <h.lepp...@gmail.com> wrote:
> On Mon, Aug 3, 2015 at 10:31 AM, Henrik Gramner <hen...@gramner.com> wrote:
> > On Mon, Aug 3, 2015 at 2:18 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> > > So, I think the code changes themselves look mostly healthy. Is there
> > > a behavioural difference before/after this patch? (Like: were there
> > > bugs in the original code, or does this change behaviour of previous
> > > code in a significant way?)
> >
> > Should only be what's in the commit message;
> >
> > "Previously alignment would occur either before or after allocating
> > stack space depending on whether manual alignment was required or not."
> >
> > which I guess you could classify as a bug (it certainly wasn't a
> > sensible behavior). It's the reason for why the weird deblock stack
> > allocation for example existed in the first place.
> >
> > So anything relying on the previous alignment behavior of automatic
> > stack allocation using cglobal would be affected, other than that it
> > shouldn't make any difference since ffmpeg doesn't use 16-byte stack
> > alignment.
> >
> > I can only compile ffmpeg with --disable-programs when using
> > msys2/msvc2015 (ffmpeg.c(437): error C2039: '_cnt': is not a member of
> > '_iobuf'). Not sure if I'm doing something wrong, but if someone is
> > able to test that better that would be nice.
>
> msvc2015 support is unfortunately still broken. I'll send a patch later
> for this.
>
> Otherwise, I tested the patch with msvc 2013 32-bit, and fate passed
> fine. If there is something else I should specifically test which may
> not be covered by fate, let me know.

I think it tests things sufficiently well, so go ahead and apply. Fate will
pick up some remaining items (like testing with restricted cpuflags).

Tnx,
Ronald
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
Hi all,

Can you please remove me from this group.

Thanks,
Hitesh

On Aug 3, 2015 2:14 AM, Henrik Gramner <hen...@gramner.com> wrote:
> Change ALLOC_STACK to always align the stack before allocating stack
> space for consistency.
>
> Previously alignment would occur either before or after allocating stack
> space depending on whether manual alignment was required or not.
> [...]
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On Mon, Aug 3, 2015 at 2:18 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> So, I think the code changes themselves look mostly healthy. Is there a
> behavioural difference before/after this patch? (Like: were there bugs in
> the original code, or does this change behaviour of previous code in a
> significant way?)

Should only be what's in the commit message;

"Previously alignment would occur either before or after allocating stack
space depending on whether manual alignment was required or not."

which I guess you could classify as a bug (it certainly wasn't a sensible
behavior). It's the reason for why the weird deblock stack allocation for
example existed in the first place.

So anything relying on the previous alignment behavior of automatic stack
allocation using cglobal would be affected, other than that it shouldn't
make any difference since ffmpeg doesn't use 16-byte stack alignment.

I can only compile ffmpeg with --disable-programs when using msys2/msvc2015
(ffmpeg.c(437): error C2039: '_cnt': is not a member of '_iobuf'). Not sure
if I'm doing something wrong, but if someone is able to test that better
that would be nice.
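The ordering issue described in this message can be checked with a little modular arithmetic. The Python sketch below is only a simplified model, not the x86inc code: it tracks esp modulo 16 on a 32-bit target, ignores pushed registers, and the function names are invented for illustration. Which of the two old paths aligned first is inferred from the `HAVE_ALIGNED_STACK*12` terms removed from h264_deblock.asm, so treat that mapping as an assumption.

```python
ALIGN = 16  # pix_tmp is assumed to need 16-byte alignment


def align_after_alloc(esp, stack_size):
    """Allocate first, then round esp down to a multiple of ALIGN."""
    return (esp - stack_size) & ~(ALIGN - 1)


def align_before_alloc(esp, stack_size):
    """Round esp down to a multiple of ALIGN first, then allocate."""
    return (esp & ~(ALIGN - 1)) - stack_size


def pix_tmp_misalignment(allocate, stack_size, offset, esp_at_entry):
    """Return (esp + offset) % ALIGN after the prologue; 0 means aligned."""
    return (allocate(esp_at_entry, stack_size) + offset) % ALIGN


if __name__ == "__main__":
    for esp in range(0x1000, 0x1010, 4):  # every 4-byte-aligned entry value
        # Old behaviour without an aligned stack: 0x60 bytes, pix_tmp = esp.
        old_manual = pix_tmp_misalignment(align_after_alloc, 0x60, 0, esp)
        # Align-first behaviour: 0x60+12 bytes, pix_tmp = esp+12.
        unified = pix_tmp_misalignment(align_before_alloc, 0x60 + 12, 12, esp)
        print(hex(esp), old_manual, unified)
```

In this model both layouts leave pix_tmp aligned, but each constant pair only works with its own ordering, which is why the deblock code had to switch on HAVE_ALIGNED_STACK while the two orderings coexisted.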
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On 03/08/15 3:25 PM, Hendrik Leppkes wrote:
> On Mon, Aug 3, 2015 at 8:21 PM, James Almer <jamr...@gmail.com> wrote:
> > On 03/08/15 3:12 PM, Hendrik Leppkes wrote:
> > > Since I wasn't sure in which direction of packed/planar the problem
> > > usually happened, I tried 8ch s32 -> 8ch s32p, 8ch s32 -> 2ch s32p,
> > > 8ch s32p -> 8ch s32 and 8ch s32p -> 2ch s32, and all work fine.
> > >
> > > - Hendrik
> >
> > If those tests were done with libswr and not libavr then it should be
> > ok.
>
> Whatever ffmpeg.c uses when requesting those conversion, I assume
> libswr. ;)

I don't know how you compiled ffmpeg or how you did the test (ffmpeg.c?
library API?). Forgive me for being precautious :P
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On Mon, Aug 3, 2015 at 8:21 PM, James Almer <jamr...@gmail.com> wrote:
> On 03/08/15 3:12 PM, Hendrik Leppkes wrote:
> > Since I wasn't sure in which direction of packed/planar the problem
> > usually happened, I tried 8ch s32 -> 8ch s32p, 8ch s32 -> 2ch s32p,
> > 8ch s32p -> 8ch s32 and 8ch s32p -> 2ch s32, and all work fine.
> >
> > - Hendrik
>
> If those tests were done with libswr and not libavr then it should be ok.

Whatever ffmpeg.c uses when requesting those conversion, I assume libswr. ;)

- Hendrik
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On 03/08/15 3:12 PM, Hendrik Leppkes wrote:
> Since I wasn't sure in which direction of packed/planar the problem
> usually happened, I tried 8ch s32 -> 8ch s32p, 8ch s32 -> 2ch s32p,
> 8ch s32p -> 8ch s32 and 8ch s32p -> 2ch s32, and all work fine.
>
> - Hendrik

If those tests were done with libswr and not libavr then it should be ok.

Thanks.
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On Mon, Aug 3, 2015 at 5:26 PM, James Almer <jamr...@gmail.com> wrote:
> On 03/08/15 9:50 AM, Hendrik Leppkes wrote:
> > On Mon, Aug 3, 2015 at 10:31 AM, Henrik Gramner <hen...@gramner.com> wrote:
> > > On Mon, Aug 3, 2015 at 2:18 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> > > > So, I think the code changes themselves look mostly healthy. Is
> > > > there a behavioural difference before/after this patch? (Like: were
> > > > there bugs in the original code, or does this change behaviour of
> > > > previous code in a significant way?)
> > >
> > > Should only be what's in the commit message;
> > >
> > > "Previously alignment would occur either before or after allocating
> > > stack space depending on whether manual alignment was required or
> > > not."
> > >
> > > which I guess you could classify as a bug (it certainly wasn't a
> > > sensible behavior). It's the reason for why the weird deblock stack
> > > allocation for example existed in the first place.
> > >
> > > So anything relying on the previous alignment behavior of automatic
> > > stack allocation using cglobal would be affected, other than that it
> > > shouldn't make any difference since ffmpeg doesn't use 16-byte stack
> > > alignment.
> > >
> > > I can only compile ffmpeg with --disable-programs when using
> > > msys2/msvc2015 (ffmpeg.c(437): error C2039: '_cnt': is not a member
> > > of '_iobuf'). Not sure if I'm doing something wrong, but if someone
> > > is able to test that better that would be nice.
> >
> > msvc2015 support is unfortunately still broken. I'll send a patch later
> > for this.
> >
> > Otherwise, I tested the patch with msvc 2013 32-bit, and fate passed
> > fine. If there is something else I should specifically test which may
> > not be covered by fate, let me know.
>
> Just to be sure try to convert an 8ch audio stream from float/s24 to
> floatp/s24p. See if it crashes or gives wrong output.

Since I wasn't sure in which direction of packed/planar the problem usually
happened, I tried 8ch s32 -> 8ch s32p, 8ch s32 -> 2ch s32p, 8ch s32p -> 8ch
s32 and 8ch s32p -> 2ch s32, and all work fine.

- Hendrik
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On Mon, Aug 3, 2015 at 10:31 AM, Henrik Gramner <hen...@gramner.com> wrote:
> On Mon, Aug 3, 2015 at 2:18 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> > So, I think the code changes themselves look mostly healthy. Is there a
> > behavioural difference before/after this patch? (Like: were there bugs
> > in the original code, or does this change behaviour of previous code in
> > a significant way?)
>
> Should only be what's in the commit message;
>
> "Previously alignment would occur either before or after allocating stack
> space depending on whether manual alignment was required or not."
>
> which I guess you could classify as a bug (it certainly wasn't a sensible
> behavior). It's the reason for why the weird deblock stack allocation for
> example existed in the first place.
>
> So anything relying on the previous alignment behavior of automatic stack
> allocation using cglobal would be affected, other than that it shouldn't
> make any difference since ffmpeg doesn't use 16-byte stack alignment.
>
> I can only compile ffmpeg with --disable-programs when using
> msys2/msvc2015 (ffmpeg.c(437): error C2039: '_cnt': is not a member of
> '_iobuf'). Not sure if I'm doing something wrong, but if someone is able
> to test that better that would be nice.

msvc2015 support is unfortunately still broken. I'll send a patch later for
this.

Otherwise, I tested the patch with msvc 2013 32-bit, and fate passed fine.
If there is something else I should specifically test which may not be
covered by fate, let me know.

- Hendrik
[FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency.

Previously alignment would occur either before or after allocating stack space
depending on whether manual alignment was required or not.
---
 libavcodec/x86/h264_deblock.asm |  4 +--
 libavutil/x86/x86inc.asm        | 62 ++---
 2 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index 14c8205..5151f3c 100644
--- a/libavcodec/x86/h264_deblock.asm
+++ b/libavcodec/x86/h264_deblock.asm
@@ -446,13 +446,13 @@ cglobal deblock_%1_luma_8, 5,5,8,2*%2
 ;                          int8_t *tc0)
 ;-
 INIT_MMX cpuname
-cglobal deblock_h_luma_8, 0,5,8,0x60+HAVE_ALIGNED_STACK*12
+cglobal deblock_h_luma_8, 0,5,8,0x60+12
     mov    r0, r0mp
     mov    r3, r1m
     lea    r4, [r3*3]
     sub    r0, 4
     lea    r1, [r0+r4]
-%define pix_tmp esp+12*HAVE_ALIGNED_STACK
+%define pix_tmp esp+12
 
     ; transpose 6x16 -> tmp space
     TRANSPOSE6x8_MEM  PASS8ROWS(r0, r1, r3, r4), pix_tmp
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 12779f5..e176715 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -42,6 +42,17 @@
     %define public_prefix private_prefix
 %endif
 
+%if HAVE_ALIGNED_STACK
+    %define STACK_ALIGNMENT 16
+%endif
+%ifndef STACK_ALIGNMENT
+    %if ARCH_X86_64
+        %define STACK_ALIGNMENT 16
+    %else
+        %define STACK_ALIGNMENT 4
+    %endif
+%endif
+
 %define WIN64  0
 %define UNIX64 0
 %if ARCH_X86_64
@@ -108,8 +119,9 @@
 ; %1 = number of arguments. loads them from stack if needed.
 ; %2 = number of registers used. pushes callee-saved regs if needed.
 ; %3 = number of xmm registers used. pushes callee-saved xmm regs if needed.
-; %4 = (optional) stack size to be allocated. If not aligned (x86-32 ICC 10.x,
-;      MSVC or YMM), the stack will be manually aligned (to 16 or 32 bytes),
+; %4 = (optional) stack size to be allocated. The stack will be aligned before
+;      allocating the specified stack size. If the required stack alignment is
+;      larger than the known stack alignment the stack will be manually aligned
 ;      and an extra register will be allocated to hold the original stack
 ;      pointer (to not invalidate r0m etc.). To prevent the use of an extra
 ;      register as stack pointer, request a negative stack size.
@@ -117,8 +129,10 @@
 ; PROLOGUE can also be invoked by adding the same options to cglobal
 
 ; e.g.
-; cglobal foo, 2,3,0, dst, src, tmp
-; declares a function (foo), taking two args (dst and src) and one local variable (tmp)
+; cglobal foo, 2,3,7,0x40, dst, src, tmp
+; declares a function (foo) that automatically loads two arguments (dst and
+; src) into registers, uses one additional register (tmp) plus 7 vector
+; registers (m0-m6) and allocates 0x40 bytes of stack space.
 
 ; TODO Some functions can use some args directly from the stack. If they're the
 ; last args then you can just not declare them, but if they're in the middle
@@ -319,26 +333,28 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
     %assign n_arg_names %0
 %endmacro
 
+%define required_stack_alignment ((mmsize + 15) & ~15)
+
 %macro ALLOC_STACK 1-2 0 ; stack_size, n_xmm_regs (for win64 only)
     %ifnum %1
         %if %1 != 0
-            %assign %%stack_alignment ((mmsize + 15) & ~15)
+            %assign %%pad 0
             %assign stack_size %1
             %if stack_size < 0
                 %assign stack_size -stack_size
             %endif
-            %assign stack_size_padded stack_size
             %if WIN64
-                %assign stack_size_padded stack_size_padded + 32 ; reserve 32 bytes for shadow space
+                %assign %%pad %%pad + 32 ; shadow space
                 %if mmsize != 8
                     %assign xmm_regs_used %2
                     %if xmm_regs_used > 8
-                        %assign stack_size_padded stack_size_padded + (xmm_regs_used-8)*16
+                        %assign %%pad %%pad + (xmm_regs_used-8)*16 ; callee-saved xmm registers
                     %endif
                 %endif
             %endif
-            %if mmsize <= 16 && HAVE_ALIGNED_STACK
-                %assign stack_size_padded stack_size_padded + %%stack_alignment - gprsize - (stack_offset & (%%stack_alignment - 1))
+            %if required_stack_alignment <= STACK_ALIGNMENT
+                ; maintain the current stack alignment
+                %assign stack_size_padded stack_size + %%pad + ((-%%pad-stack_offset-gprsize) & (STACK_ALIGNMENT-1))
                 SUB rsp, stack_size_padded
             %else
                 %assign %%reg_num (regs_used - 1)
@@ -347,17 +363,17 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
                 ; it, i.e. in [rsp+stack_size_padded], so we can restore the
                 ; stack in a single
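The padding expression in the `required_stack_alignment <= STACK_ALIGNMENT` branch is easier to follow with numbers plugged in. The following Python sketch mirrors only that one `%assign` line; the function names are made up for illustration, and it assumes that `stack_offset` counts the bytes pushed since function entry and that rsp was aligned just before the call pushed the return address.

```python
def padded_stack_size(stack_size, stack_offset, gprsize, stack_alignment, pad=0):
    """Model of: stack_size + pad + ((-pad-stack_offset-gprsize) & (STACK_ALIGNMENT-1))."""
    filler = (-pad - stack_offset - gprsize) & (stack_alignment - 1)
    return stack_size + pad + filler


def rsp_after_alloc(rsp_at_entry, stack_offset, padded):
    """rsp once the pushes and the single SUB rsp, stack_size_padded are done."""
    return rsp_at_entry - stack_offset - padded


if __name__ == "__main__":
    # x86-64 case: gprsize 8, one register pushed, 0x40 bytes requested.
    padded = padded_stack_size(0x40, stack_offset=8, gprsize=8, stack_alignment=16)
    rsp = rsp_after_alloc(0x7ff8, 8, padded)
    print(hex(padded), hex(rsp), (rsp + 0x40) % 16)

    # WIN64-style case: 32 bytes of shadow space plus two callee-saved xmm slots.
    pad = 32 + (10 - 8) * 16
    padded = padded_stack_size(0x40, stack_offset=0, gprsize=8,
                               stack_alignment=16, pad=pad)
    rsp = rsp_after_alloc(0x7ff8, 0, padded)
    print(hex(padded), hex(rsp), (rsp + 0x40) % 16)
```

Under these assumptions `rsp + stack_size` always ends up on a STACK_ALIGNMENT boundary, so rsp itself is aligned only when the requested stack size is a multiple of the alignment.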
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
On 02/08/15 5:40 PM, Henrik Gramner wrote:
> Change ALLOC_STACK to always align the stack before allocating stack
> space for consistency.
>
> Previously alignment would occur either before or after allocating stack
> space depending on whether manual alignment was required or not.
> ---
>  libavcodec/x86/h264_deblock.asm |  4 +--
>  libavutil/x86/x86inc.asm        | 62 ++---

Someone with msvc should test if this doesn't break 8ch audio conversion
when using libswresample (libswresample/x86/audio_convert.asm), where
HAVE_ALIGNED_STACK is also (ab)used.
FATE will not because afaik there are no tests that deal with 8ch audio
files at all.
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
Hi,

On Sun, Aug 2, 2015 at 4:40 PM, Henrik Gramner <hen...@gramner.com> wrote:
> Change ALLOC_STACK to always align the stack before allocating stack
> space for consistency.
>
> Previously alignment would occur either before or after allocating stack
> space depending on whether manual alignment was required or not.
> ---
>  libavcodec/x86/h264_deblock.asm |  4 +--
>  libavutil/x86/x86inc.asm        | 62 ++---
>  2 files changed, 42 insertions(+), 24 deletions(-)

So, I think the code changes themselves look mostly healthy. Is there a
behavioural difference before/after this patch? (Like: were there bugs in
the original code, or does this change behaviour of previous code in a
significant way?)

Thanks,
Ronald
Re: [FFmpeg-devel] [PATCH 1/4] x86inc: Support arbitrary stack alignments
Hi,

On Sun, Aug 2, 2015 at 7:56 PM, James Almer <jamr...@gmail.com> wrote:
> On 02/08/15 5:40 PM, Henrik Gramner wrote:
> > Change ALLOC_STACK to always align the stack before allocating stack
> > space for consistency.
> >
> > Previously alignment would occur either before or after allocating
> > stack space depending on whether manual alignment was required or not.
> > ---
> >  libavcodec/x86/h264_deblock.asm |  4 +--
> >  libavutil/x86/x86inc.asm        | 62 ++---
>
> Someone with msvc should test if this doesn't break 8ch audio conversion
> when using libswresample (libswresample/x86/audio_convert.asm), where
> HAVE_ALIGNED_STACK is also (ab)used.
> FATE will not because afaik there are no tests that deal with 8ch audio
> files at all.

Importantly, on 32bit only. Fortunately, I believe swscale also uses this
construct in a few places, as does vp8/9, so a default fate run definitely
is useful.

Ronald
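Why this mostly concerns 32-bit builds follows from the STACK_ALIGNMENT and required_stack_alignment definitions in the patch. The Python sketch below restates that selection logic as a rough model; the function names are hypothetical, and it ignores the case where STACK_ALIGNMENT is predefined by the build.

```python
def required_stack_alignment(mmsize):
    """Model of ((mmsize + 15) & ~15): round the vector size up to 16 bytes."""
    return (mmsize + 15) & ~15


def known_stack_alignment(arch_x86_64, have_aligned_stack):
    """Model of the STACK_ALIGNMENT defaults added by the patch."""
    if have_aligned_stack or arch_x86_64:
        return 16
    return 4


def needs_manual_alignment(mmsize, arch_x86_64, have_aligned_stack):
    """True when ALLOC_STACK would have to align the stack pointer itself."""
    return required_stack_alignment(mmsize) > known_stack_alignment(
        arch_x86_64, have_aligned_stack)


if __name__ == "__main__":
    for mmsize in (8, 16, 32):  # MMX, XMM and YMM register sizes in bytes
        for x86_64, aligned in ((False, False), (False, True), (True, False)):
            print(mmsize, x86_64, aligned,
                  needs_manual_alignment(mmsize, x86_64, aligned))
```

In this model a 64-bit build only takes the manual-alignment path for 32-byte vectors, whereas a 32-bit build without HAVE_ALIGNED_STACK takes it for every vector size, which matches the concern about MSVC on 32-bit.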