Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit

2019-01-20 Thread Carl Eugen Hoyos
2019-01-20 23:37 GMT+01:00, Michael Niedermayer :
> On Sun, Jan 20, 2019 at 10:33:26PM +0100, Carl Eugen Hoyos wrote:
>> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer :
>> > ffmpeg | branch: master | Michael Niedermayer  | Thu
>> > Jan 17 22:35:10 2019 +0100| [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] |
>> > committer: Michael Niedermayer
>> >
>> > avutil/mem: Optimize fill32() by unrolling and using 64bit
>> >
>> > Reviewed-by: Marton Balint 
>> > Signed-off-by: Michael Niedermayer 
>> >
>> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
>> > ---
>> >
>> >  libavutil/mem.c | 12 
>> >  1 file changed, 12 insertions(+)
>> >
>> > diff --git a/libavutil/mem.c b/libavutil/mem.c
>> > index 6149755a6b..88fe09b179 100644
>> > --- a/libavutil/mem.c
>> > +++ b/libavutil/mem.c
>> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
>> >  {
>> >  uint32_t v = AV_RN32(dst - 4);
>> >
>> > +#if HAVE_FAST_64BIT
>>
>> I suspect this should be !X86_32
>
>>
>> > +    uint64_t v2= v + ((uint64_t)v<<32);
>> > +    while (len >= 32) {
>> > +        AV_WN64(dst   , v2);
>> > +        AV_WN64(dst+ 8, v2);
>> > +        AV_WN64(dst+16, v2);
>> > +        AV_WN64(dst+24, v2);
>> > +        dst += 32;
>> > +        len -= 32;
>> > +    }
>>
>> How can I test the performance of this function?
>
> with the testcase from the fuzzer (it should be substantially
> faster in this case with the next commit)

> it should also be possible to test it with some fate tests
> as this is used by some.

I cannot measure any speed difference for the (lengthened)
nuv and cscd fate samples with your patch, so I don't think
this question warrants further investigation.
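
Separately from the fate samples, the two store widths can be compared
in isolation with a small timing harness. The following is only a
sketch under stated assumptions: fill_bytewise(), fill_wide8() and
time_fill() are hypothetical names, not functions from libavutil, and
memcpy() stands in for the AV_RN32/AV_WN64 macros. clock() is coarse,
so only large buffers and many rounds give usable numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Baseline: repeat the 4 bytes preceding dst, one byte at a time. */
static void fill_bytewise(uint8_t *dst, int len)
{
    assert(len >= 0);
    while (len-- > 0) {
        *dst = dst[-4];
        dst++;
    }
}

/* Candidate: same pattern fill, but with 8-byte stores where possible.
 * memcpy() is an assumed stand-in for unaligned native-endian access. */
static void fill_wide8(uint8_t *dst, int len)
{
    uint32_t v;
    uint64_t v2;

    assert(len >= 0);
    memcpy(&v, dst - 4, 4);
    v2 = v + ((uint64_t)v << 32);   /* the 32-bit pattern, twice */
    while (len >= 8) {
        memcpy(dst, &v2, 8);
        dst += 8;
        len -= 8;
    }
    while (len-- > 0) {             /* tail of fewer than 8 bytes */
        *dst = dst[-4];
        dst++;
    }
}

typedef void (*fill_fn)(uint8_t *dst, int len);

/* Run fn over buf[4 .. 4+len) `rounds` times and return CPU seconds.
 * buf[0..3] must already hold the 4-byte pattern. */
static double time_fill(fill_fn fn, uint8_t *buf, int len, int rounds)
{
    clock_t start = clock();

    for (int i = 0; i < rounds; i++)
        fn(buf + 4, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill() once with each function on the same buffer size
gives two numbers to compare; whether any difference carries over to
real decoding is a separate question.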

Thank you for the abort() suggestion, Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit

2019-01-20 Thread Carl Eugen Hoyos
2019-01-20 23:04 GMT+01:00, Hendrik Leppkes :
> On Sun, Jan 20, 2019 at 10:38 PM Carl Eugen Hoyos 
> wrote:
>>
>> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer :
>> > ffmpeg | branch: master | Michael Niedermayer  | Thu
>> > Jan 17 22:35:10 2019 +0100| [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] |
>> > committer: Michael Niedermayer
>> >
>> > avutil/mem: Optimize fill32() by unrolling and using 64bit
>> >
>> > Reviewed-by: Marton Balint 
>> > Signed-off-by: Michael Niedermayer 
>> >
>> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
>> > ---
>> >
>> >  libavutil/mem.c | 12 
>> >  1 file changed, 12 insertions(+)
>> >
>> > diff --git a/libavutil/mem.c b/libavutil/mem.c
>> > index 6149755a6b..88fe09b179 100644
>> > --- a/libavutil/mem.c
>> > +++ b/libavutil/mem.c
>> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
>> >  {
>> >  uint32_t v = AV_RN32(dst - 4);
>> >
>> > +#if HAVE_FAST_64BIT
>>
>> I suspect this should be !X86_32
>
> fast_64bit is set on any native 64-bit platform.

I know, this is the reason for my question.

> If you can prove that it's faster on some 32-bit
> platforms as well, numbers shall be required.

Really? Well, this was the reason for my question
above.
Note that last time it was claimed that "all 32-bit
platforms are slower", it turned out to be wrong
(or at least unreproducible).

Carl Eugen


Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit

2019-01-20 Thread Michael Niedermayer
On Sun, Jan 20, 2019 at 10:33:26PM +0100, Carl Eugen Hoyos wrote:
> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer :
> > ffmpeg | branch: master | Michael Niedermayer  | Thu
> > Jan 17 22:35:10 2019 +0100| [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] |
> > committer: Michael Niedermayer
> >
> > avutil/mem: Optimize fill32() by unrolling and using 64bit
> >
> > Reviewed-by: Marton Balint 
> > Signed-off-by: Michael Niedermayer 
> >
> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
> > ---
> >
> >  libavutil/mem.c | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/libavutil/mem.c b/libavutil/mem.c
> > index 6149755a6b..88fe09b179 100644
> > --- a/libavutil/mem.c
> > +++ b/libavutil/mem.c
> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
> >  {
> >  uint32_t v = AV_RN32(dst - 4);
> >
> > +#if HAVE_FAST_64BIT
> 
> I suspect this should be !X86_32

> 
> > +    uint64_t v2= v + ((uint64_t)v<<32);
> > +    while (len >= 32) {
> > +        AV_WN64(dst   , v2);
> > +        AV_WN64(dst+ 8, v2);
> > +        AV_WN64(dst+16, v2);
> > +        AV_WN64(dst+24, v2);
> > +        dst += 32;
> > +        len -= 32;
> > +    }
> 
> How can I test the performance of this function?

with the testcase from the fuzzer (it should be substantially
faster in this case with the next commit)

it should also be possible to test it with some fate tests
as this is used by some.
You can add an abort() and run fate to see which tests use it.
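
A minimal sketch of that abort() idea, as standalone C rather than an
actual patch to libavutil/mem.c: TRACE_FILL32_USERS and fill32_traced()
are hypothetical names made up for illustration, and the loop body is a
plain byte-wise pattern fill.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* When built with -DTRACE_FILL32_USERS (a hypothetical macro), any test
 * that reaches this function crashes and thereby identifies itself. */
static void fill32_traced(uint8_t *dst, int len)
{
#ifdef TRACE_FILL32_USERS
    abort();
#endif
    assert(len >= 0);               /* precondition of this sketch */
    while (len-- > 0) {             /* repeat the 4 bytes before dst */
        *dst = dst[-4];
        dst++;
    }
}
```

Without the macro the function behaves as an ordinary fill, so the
instrumentation is kept out of normal builds.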




[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato




Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit

2019-01-20 Thread Hendrik Leppkes
On Sun, Jan 20, 2019 at 10:38 PM Carl Eugen Hoyos  wrote:
>
> 2019-01-20 22:22 GMT+01:00, Michael Niedermayer :
> > ffmpeg | branch: master | Michael Niedermayer  | Thu
> > Jan 17 22:35:10 2019 +0100| [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] |
> > committer: Michael Niedermayer
> >
> > avutil/mem: Optimize fill32() by unrolling and using 64bit
> >
> > Reviewed-by: Marton Balint 
> > Signed-off-by: Michael Niedermayer 
> >
> >> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
> > ---
> >
> >  libavutil/mem.c | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/libavutil/mem.c b/libavutil/mem.c
> > index 6149755a6b..88fe09b179 100644
> > --- a/libavutil/mem.c
> > +++ b/libavutil/mem.c
> > @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
> >  {
> >  uint32_t v = AV_RN32(dst - 4);
> >
> > +#if HAVE_FAST_64BIT
>
> I suspect this should be !X86_32
>

fast_64bit is set on any native 64-bit platform. If you can prove that
it's faster on some 32-bit platforms as well, numbers shall be
required.

- Hendrik


Re: [FFmpeg-devel] [FFmpeg-cvslog] avutil/mem: Optimize fill32() by unrolling and using 64bit

2019-01-20 Thread Carl Eugen Hoyos
2019-01-20 22:22 GMT+01:00, Michael Niedermayer :
> ffmpeg | branch: master | Michael Niedermayer  | Thu
> Jan 17 22:35:10 2019 +0100| [12b1338be376a3e5fb606d9fe41b58dc4a9e62c7] |
> committer: Michael Niedermayer
>
> avutil/mem: Optimize fill32() by unrolling and using 64bit
>
> Reviewed-by: Marton Balint 
> Signed-off-by: Michael Niedermayer 
>
>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=12b1338be376a3e5fb606d9fe41b58dc4a9e62c7
> ---
>
>  libavutil/mem.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index 6149755a6b..88fe09b179 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
>  {
>  uint32_t v = AV_RN32(dst - 4);
>
> +#if HAVE_FAST_64BIT

I suspect this should be !X86_32

> +    uint64_t v2= v + ((uint64_t)v<<32);
> +    while (len >= 32) {
> +        AV_WN64(dst   , v2);
> +        AV_WN64(dst+ 8, v2);
> +        AV_WN64(dst+16, v2);
> +        AV_WN64(dst+24, v2);
> +        dst += 32;
> +        len -= 32;
> +    }
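
For experimenting outside the tree, the hunk above can be approximated
in standalone C. This is a sketch under assumptions, not the code from
libavutil/mem.c: rn32(), wn32() and wn64() are hypothetical memcpy-based
stand-ins for AV_RN32, AV_WN32 and AV_WN64, the HAVE_FAST_64BIT guard is
left out so the 64-bit path always runs, and the 4-byte and byte-wise
tail loops are my assumption about what follows the unrolled loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for the AV_RN32 / AV_WN32 / AV_WN64 macros:
 * unaligned, native-endian loads and stores done through memcpy(). */
static uint32_t rn32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static void wn32(uint8_t *p, uint32_t v) { memcpy(p, &v, 4); }
static void wn64(uint8_t *p, uint64_t v) { memcpy(p, &v, 8); }

/* Repeat the 4 bytes preceding dst over dst[0 .. len). */
static void fill32_sketch(uint8_t *dst, int len)
{
    uint32_t v  = rn32(dst - 4);
    uint64_t v2 = v + ((uint64_t)v << 32);  /* pattern in both halves */

    assert(len >= 0);
    while (len >= 32) {                     /* unrolled: 4 x 8 bytes */
        wn64(dst,      v2);
        wn64(dst +  8, v2);
        wn64(dst + 16, v2);
        wn64(dst + 24, v2);
        dst += 32;
        len -= 32;
    }
    while (len >= 4) {                      /* remaining whole words */
        wn32(dst, v);
        dst += 4;
        len -= 4;
    }
    while (len--) {                         /* last 0..3 bytes */
        *dst = dst[-4];
        dst++;
    }
}

/* Byte-wise reference used only to check the sketch. */
static void fill32_ref(uint8_t *dst, int len)
{
    while (len-- > 0) {
        *dst = dst[-4];
        dst++;
    }
}

/* Returns 1 if both variants produce identical buffers for this len. */
static int fill32_sketch_matches_ref(int len)
{
    uint8_t a[4 + 256] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t b[4 + 256] = { 0x11, 0x22, 0x33, 0x44 };

    if (len < 0 || len > 256)
        return 0;
    fill32_sketch(a + 4, len);
    fill32_ref(b + 4, len);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Because both halves of v2 hold the same 32-bit value, the bytes written
are the same on little- and big-endian hosts.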

How can I test the performance of this function?

Carl Eugen