Re: [FFmpeg-devel] [PATCH] avcodec/ppc/hevcdsp: Fix build failures with powerpc-linux-gnu-gcc-4.8 with --disable-optimizations
On Wed, Dec 05, 2018 at 12:15:14AM +0100, Carl Eugen Hoyos wrote: > 2018-12-04 18:02 GMT+01:00, Michael Niedermayer : > > On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote: > >> 2018-12-04 16:29 GMT+01:00, Michael Niedermayer : > >> > The affected functions could also be changed into macros, this is the > >> > smaller change to fix it though. And avoids (probably) less readable > >> > macros > >> > >> > The extra code should be optimized out when optimizations are done as > >> > all > >> > values are known at build after inlining. > >> > >> Shouldn't this be verified? > > > > ive verified it with the patch below > > Thank you! will apply thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Rewriting code that is poorly written but fully understood is good. Rewriting code that one doesnt understand is a sign that one is less smart then the original author, trying to rewrite it will not make it better. signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avcodec/ppc/hevcdsp: Fix build failures with powerpc-linux-gnu-gcc-4.8 with --disable-optimizations
2018-12-04 18:02 GMT+01:00, Michael Niedermayer : > On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote: >> 2018-12-04 16:29 GMT+01:00, Michael Niedermayer : >> > The affected functions could also be changed into macros, this is the >> > smaller change to fix it though. And avoids (probably) less readable >> > macros >> >> > The extra code should be optimized out when optimizations are done as >> > all >> > values are known at build after inlining. >> >> Shouldn't this be verified? > > ive verified it with the patch below Thank you! Carl Eugen ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avcodec/ppc/hevcdsp: Fix build failures with powerpc-linux-gnu-gcc-4.8 with --disable-optimizations
On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote: > 2018-12-04 16:29 GMT+01:00, Michael Niedermayer : > > The affected functions could also be changed into macros, this is the > > smaller change to fix it though. And avoids (probably) less readable macros > > > The extra code should be optimized out when optimizations are done as all > > values are known at build after inlining. > > Shouldn't this be verified? ive verified it with the patch below only SIMD instructions are between the markers, so powerpc-linux-gnu-gcc-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4 seems to optimize all conditional code out. All bets are off with a different compiler though. That could be worse or even better after the patch But i can do more tests if you want me to test something specific ? > This is speed-critical code, no? Yes, and the way this code is written depends on the compiler, if the compiler makes a mistake the function can be alot slower. Thats one of the reasons we use nasm/yasm on x86, that always produces the same result. thx [...] diff --git a/libavcodec/ppc/hevcdsp.c b/libavcodec/ppc/hevcdsp.c index c1d562a409..47246ed42d 100644 --- a/libavcodec/ppc/hevcdsp.c +++ b/libavcodec/ppc/hevcdsp.c @@ -57,13 +57,14 @@ static av_always_inline void transform4x4(vec_s16 src_01, vec_s16 src_23, o0 = vec_msums(src_13, trans4[1], zero); e1 = vec_msums(src_02, trans4[2], zero); o1 = vec_msums(src_13, trans4[3], zero); - +__asm volatile ("MARK\n\t"); switch(shift) { case 7: add = vec_sl(vec_splat_s32(1), vec_splat_u32( 7 - 1)); break; case 10: add = vec_sl(vec_splat_s32(1), vec_splat_u32(10 - 1)); break; case 12: add = vec_sl(vec_splat_s32(1), vec_splat_u32(12 - 1)); break; default: abort(); } +__asm volatile ("MARK-E\n\t"); e0 = vec_add(e0, add); e1 = vec_add(e1, add); @@ -79,13 +80,14 @@ static av_always_inline void scale(vec_s32 res[4], vec_s16 res_packed[2], { int i; vec_u32 v_shift; - +__asm volatile ("MARK\n\t"); switch(shift) { case 7: v_shift = vec_splat_u32(7) ; break; case 10: v_shift = vec_splat_u32(10); break; case 12: v_shift = vec_splat_u32(12); break; default: abort(); } +__asm volatile ("MARK-E2\n\t"); for (i = 0; i < 4; i++) res[i] = vec_sra(res[i], v_shift); -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB If you think the mosad wants you dead since a long time then you are either wrong or dead since a long time. signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avcodec/ppc/hevcdsp: Fix build failures with powerpc-linux-gnu-gcc-4.8 with --disable-optimizations
2018-12-04 16:29 GMT+01:00, Michael Niedermayer : > The affected functions could also be changed into macros, this is the > smaller change to fix it though. And avoids (probably) less readable macros > The extra code should be optimized out when optimizations are done as all > values are known at build after inlining. Shouldn't this be verified? This is speed-critical code, no? Carl Eugen ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH] avcodec/ppc/hevcdsp: Fix build failures with powerpc-linux-gnu-gcc-4.8 with --disable-optimizations
The affected functions could also be changed into macros, this is the smaller change to fix it though. And avoids (probably) less readable macros The extra code should be optimized out when optimizations are done as all values are known at build after inlining. Signed-off-by: Michael Niedermayer --- libavcodec/ppc/hevcdsp.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/libavcodec/ppc/hevcdsp.c b/libavcodec/ppc/hevcdsp.c index dcae43305a..c1d562a409 100644 --- a/libavcodec/ppc/hevcdsp.c +++ b/libavcodec/ppc/hevcdsp.c @@ -58,7 +58,13 @@ static av_always_inline void transform4x4(vec_s16 src_01, vec_s16 src_23, e1 = vec_msums(src_02, trans4[2], zero); o1 = vec_msums(src_13, trans4[3], zero); -add = vec_sl(vec_splat_s32(1), vec_splat_u32(shift - 1)); +switch(shift) { +case 7: add = vec_sl(vec_splat_s32(1), vec_splat_u32( 7 - 1)); break; +case 10: add = vec_sl(vec_splat_s32(1), vec_splat_u32(10 - 1)); break; +case 12: add = vec_sl(vec_splat_s32(1), vec_splat_u32(12 - 1)); break; +default: abort(); +} + e0 = vec_add(e0, add); e1 = vec_add(e1, add); @@ -72,7 +78,14 @@ static av_always_inline void scale(vec_s32 res[4], vec_s16 res_packed[2], const int shift) { int i; -vec_u32 v_shift = vec_splat_u32(shift); +vec_u32 v_shift; + +switch(shift) { +case 7: v_shift = vec_splat_u32(7) ; break; +case 10: v_shift = vec_splat_u32(10); break; +case 12: v_shift = vec_splat_u32(12); break; +default: abort(); +} for (i = 0; i < 4; i++) res[i] = vec_sra(res[i], v_shift); -- 2.19.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel