Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit
2019-01-20 23:37 GMT+01:00, Michael Niedermayer:
> On Sun, Jan 20, 2019 at 10:33:26PM +0100, Carl Eugen Hoyos wrote:
>> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer:
>> > ffmpeg | branch: master | Michael Niedermayer | Thu Jan 17 22:35:10 2019 +0100|
>> > [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] | committer: Michael Niedermayer
>> >
>> > avutil/mem: Optimize fill32() by unrolling and using 64bit
>> >
>> > Reviewed-by: Marton Balint
>> > Signed-off-by: Michael Niedermayer
>> >
>> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
>> > ---
>> >
>> > libavutil/mem.c | 12
>> > 1 file changed, 12 insertions(+)
>> >
>> > diff --git a/libavutil/mem.c b/libavutil/mem.c
>> > index 6149755a6b..88fe09b179 100644
>> > --- a/libavutil/mem.c
>> > +++ b/libavutil/mem.c
>> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
>> >  {
>> >      uint32_t v = AV_RN32(dst - 4);
>> >
>> > +#if HAVE_FAST_64BIT
>>
>> I suspect this should be !X86_32
>>
>> > +    uint64_t v2= v + ((uint64_t)v<<32);
>> > +    while (len >= 32) {
>> > +        AV_WN64(dst   , v2);
>> > +        AV_WN64(dst+ 8, v2);
>> > +        AV_WN64(dst+16, v2);
>> > +        AV_WN64(dst+24, v2);
>> > +        dst += 32;
>> > +        len -= 32;
>> > +    }
>>
>> How can I test the performance of this function?
>
> with the testcase from the fuzzer (it should be substantially
> faster in this case with the next commit)
> it should also be possible to test it with some fate tests
> as this is used by some.

I cannot measure any speed difference for the (lengthened)
nuv and cscd fate samples with your patch, so I don't think
this question warrants further investigation.

Thank you for the abort() suggestion, Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit
2019-01-20 23:04 GMT+01:00, Hendrik Leppkes:
> On Sun, Jan 20, 2019 at 10:38 PM Carl Eugen Hoyos wrote:
>>
>> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer:
>> > ffmpeg | branch: master | Michael Niedermayer | Thu Jan 17 22:35:10 2019 +0100|
>> > [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] | committer: Michael Niedermayer
>> >
>> > avutil/mem: Optimize fill32() by unrolling and using 64bit
>> >
>> > Reviewed-by: Marton Balint
>> > Signed-off-by: Michael Niedermayer
>> >
>> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
>> > ---
>> >
>> > libavutil/mem.c | 12
>> > 1 file changed, 12 insertions(+)
>> >
>> > diff --git a/libavutil/mem.c b/libavutil/mem.c
>> > index 6149755a6b..88fe09b179 100644
>> > --- a/libavutil/mem.c
>> > +++ b/libavutil/mem.c
>> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
>> >  {
>> >      uint32_t v = AV_RN32(dst - 4);
>> >
>> > +#if HAVE_FAST_64BIT
>>
>> I suspect this should be !X86_32
>
> fast_64bit is set on any native 64-bit platform.

I know, this is the reason for my question.

> If you can prove that it's faster on some 32-bit
> platforms as well, numbers shall be required.

Really?
Well, this was the reason for my question above.

Note that last time it was claimed that "all 32-bit
platforms are slower", it turned out to be wrong (or
at least unreproducible).

Carl Eugen
Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit
On Sun, Jan 20, 2019 at 10:33:26PM +0100, Carl Eugen Hoyos wrote:
> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer:
> > ffmpeg | branch: master | Michael Niedermayer | Thu Jan 17 22:35:10 2019 +0100|
> > [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] | committer: Michael Niedermayer
> >
> > avutil/mem: Optimize fill32() by unrolling and using 64bit
> >
> > Reviewed-by: Marton Balint
> > Signed-off-by: Michael Niedermayer
> >
> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
> > ---
> >
> > libavutil/mem.c | 12
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/libavutil/mem.c b/libavutil/mem.c
> > index 6149755a6b..88fe09b179 100644
> > --- a/libavutil/mem.c
> > +++ b/libavutil/mem.c
> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
> >  {
> >      uint32_t v = AV_RN32(dst - 4);
> >
> > +#if HAVE_FAST_64BIT
>
> I suspect this should be !X86_32
>
> > +    uint64_t v2= v + ((uint64_t)v<<32);
> > +    while (len >= 32) {
> > +        AV_WN64(dst   , v2);
> > +        AV_WN64(dst+ 8, v2);
> > +        AV_WN64(dst+16, v2);
> > +        AV_WN64(dst+24, v2);
> > +        dst += 32;
> > +        len -= 32;
> > +    }
>
> How can I test the performance of this function?

with the testcase from the fuzzer (it should be substantially
faster in this case with the next commit)
it should also be possible to test it with some fate tests
as this is used by some.
You can add an abort() and run fate to see which tests use it

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato
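Alongside the fuzzer testcase and the fate tests mentioned above, a rough standalone micro-benchmark can give a first impression on a particular machine. The sketch below is not FFmpeg code and is a different technique from the one suggested in this message. The names rd32, wr32, wr64, fill_plain, fill_unrolled and time_fill are hypothetical, the memcpy-based helpers merely stand in for AV_RN32, AV_WN32 and AV_WN64, and the 32-bit tail loop is an assumption because the quoted hunk does not show the rest of fill32().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins for AV_RN32 / AV_WN32 / AV_WN64 (native-endian,
 * unaligned-safe accesses expressed through memcpy). */
static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static void wr32(uint8_t *p, uint32_t v) { memcpy(p, &v, sizeof v); }
static void wr64(uint8_t *p, uint64_t v) { memcpy(p, &v, sizeof v); }

/* Assumed baseline: repeat the 4 bytes before dst using 32-bit stores,
 * then finish the last 0..3 bytes one at a time. */
static void fill_plain(uint8_t *dst, int len)
{
    uint32_t v = rd32(dst - 4);
    while (len >= 4) { wr32(dst, v); dst += 4; len -= 4; }
    while (len--)    { *dst = dst[-4]; dst++; }
}

/* Variant with the unrolled 64-bit loop from the quoted patch in front. */
static void fill_unrolled(uint8_t *dst, int len)
{
    uint32_t v  = rd32(dst - 4);
    uint64_t v2 = v + ((uint64_t)v << 32);
    while (len >= 32) {
        wr64(dst,      v2);
        wr64(dst +  8, v2);
        wr64(dst + 16, v2);
        wr64(dst + 24, v2);
        dst += 32;
        len -= 32;
    }
    fill_plain(dst, len); /* remaining 0..31 bytes */
}

typedef void (*fill_fn)(uint8_t *dst, int len);

/* buf must hold 4 seed bytes followed by len bytes to fill.
 * Returns the CPU time in seconds spent in `iterations` calls. */
static double time_fill(fill_fn fn, uint8_t *buf, int len, int iterations)
{
    assert(iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        fn(buf + 4, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill() once with fill_plain and once with fill_unrolled on the same buffer gives two numbers to compare. The result depends heavily on compiler, optimization flags and buffer size, so it is only a hint and does not replace measuring real samples.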
Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit
On Sun, Jan 20, 2019 at 10:38 PM Carl Eugen Hoyos wrote:
>
> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer:
> > ffmpeg | branch: master | Michael Niedermayer | Thu Jan 17 22:35:10 2019 +0100|
> > [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] | committer: Michael Niedermayer
> >
> > avutil/mem: Optimize fill32() by unrolling and using 64bit
> >
> > Reviewed-by: Marton Balint
> > Signed-off-by: Michael Niedermayer
> >
> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
> > ---
> >
> > libavutil/mem.c | 12
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/libavutil/mem.c b/libavutil/mem.c
> > index 6149755a6b..88fe09b179 100644
> > --- a/libavutil/mem.c
> > +++ b/libavutil/mem.c
> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
> >  {
> >      uint32_t v = AV_RN32(dst - 4);
> >
> > +#if HAVE_FAST_64BIT
>
> I suspect this should be !X86_32
>

fast_64bit is set on any native 64-bit platform. If you can prove that
it's faster on some 32-bit platforms as well, numbers shall be required.

- Hendrik
Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit
2019-01-20 22:22 GMT+01:00, Michael Niedermayer:
> ffmpeg | branch: master | Michael Niedermayer | Thu Jan 17 22:35:10 2019 +0100|
> [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] | committer: Michael Niedermayer
>
> avutil/mem: Optimize fill32() by unrolling and using 64bit
>
> Reviewed-by: Marton Balint
> Signed-off-by: Michael Niedermayer
>
>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
> ---
>
> libavutil/mem.c | 12
> 1 file changed, 12 insertions(+)
>
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index 6149755a6b..88fe09b179 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
>  {
>      uint32_t v = AV_RN32(dst - 4);
>
> +#if HAVE_FAST_64BIT

I suspect this should be !X86_32

> +    uint64_t v2= v + ((uint64_t)v<<32);
> +    while (len >= 32) {
> +        AV_WN64(dst   , v2);
> +        AV_WN64(dst+ 8, v2);
> +        AV_WN64(dst+16, v2);
> +        AV_WN64(dst+24, v2);
> +        dst += 32;
> +        len -= 32;
> +    }

How can I test the performance of this function?

Carl Eugen
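The patch relies on v2 = v + ((uint64_t)v<<32) producing, when stored natively, the same eight bytes as two back-to-back 32-bit stores of v. A small standalone check of that property might look like the sketch below. It is only an illustration: dup32 and same_bytes_as_two_stores are hypothetical names that do not exist in libavutil, and memcpy is used here in place of the AV_WN* macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 64-bit value used by the unrolled loop,
 * i.e. the 32-bit pattern v in both the low and the high half. */
static uint64_t dup32(uint32_t v)
{
    return v + ((uint64_t)v << 32);
}

/* Returns 1 if one native 64-bit store of dup32(v) yields the same bytes
 * as two consecutive native 32-bit stores of v, 0 otherwise. */
static int same_bytes_as_two_stores(uint32_t v)
{
    uint8_t wide[8], narrow[8];
    uint64_t v2 = dup32(v);

    memcpy(wide, &v2, sizeof v2);      /* one 64-bit store  */
    memcpy(narrow,     &v, sizeof v);  /* two 32-bit stores */
    memcpy(narrow + 4, &v, sizeof v);

    return memcmp(wide, narrow, sizeof wide) == 0;
}

/* Example checks; both halves of v2 equal v, so the byte pattern matches
 * on little-endian and big-endian machines alike. */
static void self_check(void)
{
    assert(dup32(0x11223344u) == 0x1122334411223344ULL);
    assert(same_bytes_as_two_stores(0xDEADBEEFu));
}
```

Because the two halves are identical, the equivalence does not depend on byte order, which is why the 64-bit path can reuse the value read with AV_RN32 without any swapping.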