Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Sat, Jan 19, 2019 at 12:28:25AM +0100, Marton Balint wrote: > > > On Thu, 17 Jan 2019, Michael Niedermayer wrote: > > >On Wed, Jan 16, 2019 at 08:00:22PM +0100, Marton Balint wrote: > >> > >> > >>On Tue, 15 Jan 2019, Michael Niedermayer wrote: > >> > >>>On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote: > > > On Fri, 28 Dec 2018, Michael Niedermayer wrote: > > >On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: > >> > >> > >>On Wed, 26 Dec 2018, Paul B Mahol wrote: > >> > >>>On 12/26/18, Michael Niedermayer wrote: > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: > >On 12/25/18, Michael Niedermayer wrote: > >>Fixes: Timeout > >>Fixes: > >>11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>Before: Executed > >>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>in 11294 ms > >>After : Executed > >>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>in 4249 ms > >> > >>Found-by: continuous fuzzing process > >>https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > >>Signed-off-by: Michael Niedermayer > >>--- > >>libavutil/imgutils.c | 6 ++ > >>1 file changed, 6 insertions(+) > >> > >>diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > >>index 4938a7ef67..cc38f1e878 100644 > >>--- a/libavutil/imgutils.c > >>+++ b/libavutil/imgutils.c > >>@@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > >>dst_size, > >>uint8_t *clear, > >> } > >> } else if (clear_size == 4) { > >> uint32_t val = AV_RN32(clear); > >>+uint64_t val8 = val * 0x10001ULL; > >>+for (; dst_size >= 32; dst_size -= 32) { > >>+AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > >>+AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > >>+dst += 32; > >>+} > >> for (; dst_size >= 4; dst_size -= 4) { > >> AV_WN32(dst, val); > >> dst += 4; > >>-- > >>2.20.1 > >> > > > >NAK, implement special memset function instead. > > I can move the added loop into a seperate function, if thats what you > suggest ? > >>> > >>>No, don't do that. > >>> > All the code is already in a "special" memset though, this is > memset_bytes() > > >>> > >>>I guess function is less useful if its static. So any duplicate should > >>>be avoided in codebase. > >> > >>Isn't av_memcpy_backptr does almost exactly what is needed here? That > >>can > >>also be optimized further if needed. > > > >av_memcpy_backptr() copies data with overlap, its more like a recursive > >memmove(). > > So? As far as I see the memset_bytes function in imgutils.c can be > replaced > with this: > > if (clear_size > dst_size) > clear_size = dst_size; > memcpy(dst, clear, clear_size); > av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size); > > I am not against an av_memset_bytes API addition, but I believe it should > share code with av_memcpy_backptr to avoid duplication. > >>> > >>>ive implemented this, it does not seem to be really faster in the testcase > >> > >>I guess it is not faster because you have not applied your original > >>optimalization to fill32 in libavutil/mem.c. Could you compare speed after > >>optimizing that the same way your original patch did it with imgutils > >>memset_bytes? > > > >sure, that makes it faster: > > Thanks, both patches LGTM. will apply thanks [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Observe your enemies, for they first find out your faults. -- Antisthenes signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Thu, 17 Jan 2019, Michael Niedermayer wrote: On Wed, Jan 16, 2019 at 08:00:22PM +0100, Marton Balint wrote: On Tue, 15 Jan 2019, Michael Niedermayer wrote: On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote: On Fri, 28 Dec 2018, Michael Niedermayer wrote: On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: On Wed, 26 Dec 2018, Paul B Mahol wrote: On 12/26/18, Michael Niedermayer wrote: On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: On 12/25/18, Michael Niedermayer wrote: Fixes: Timeout Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer --- libavutil/imgutils.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c index 4938a7ef67..cc38f1e878 100644 --- a/libavutil/imgutils.c +++ b/libavutil/imgutils.c @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear, } } else if (clear_size == 4) { uint32_t val = AV_RN32(clear); +uint64_t val8 = val * 0x10001ULL; +for (; dst_size >= 32; dst_size -= 32) { +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); +dst += 32; +} for (; dst_size >= 4; dst_size -= 4) { AV_WN32(dst, val); dst += 4; -- 2.20.1 NAK, implement special memset function instead. I can move the added loop into a seperate function, if thats what you suggest ? No, don't do that. All the code is already in a "special" memset though, this is memset_bytes() I guess function is less useful if its static. So any duplicate should be avoided in codebase. Isn't av_memcpy_backptr does almost exactly what is needed here? That can also be optimized further if needed. av_memcpy_backptr() copies data with overlap, its more like a recursive memmove(). So? As far as I see the memset_bytes function in imgutils.c can be replaced with this: if (clear_size > dst_size) clear_size = dst_size; memcpy(dst, clear, clear_size); av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size); I am not against an av_memset_bytes API addition, but I believe it should share code with av_memcpy_backptr to avoid duplication. ive implemented this, it does not seem to be really faster in the testcase I guess it is not faster because you have not applied your original optimalization to fill32 in libavutil/mem.c. Could you compare speed after optimizing that the same way your original patch did it with imgutils memset_bytes? sure, that makes it faster: Thanks, both patches LGTM. Regards, Marton ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Wed, Jan 16, 2019 at 08:00:22PM +0100, Marton Balint wrote: > > > On Tue, 15 Jan 2019, Michael Niedermayer wrote: > > >On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote: > >> > >> > >>On Fri, 28 Dec 2018, Michael Niedermayer wrote: > >> > >>>On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: > > > On Wed, 26 Dec 2018, Paul B Mahol wrote: > > >On 12/26/18, Michael Niedermayer wrote: > >>On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: > >>>On 12/25/18, Michael Niedermayer wrote: > Fixes: Timeout > Fixes: > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > Before: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 11294 ms > After : Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 4249 ms > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavutil/imgutils.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > index 4938a7ef67..cc38f1e878 100644 > --- a/libavutil/imgutils.c > +++ b/libavutil/imgutils.c > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > dst_size, > uint8_t *clear, > } > } else if (clear_size == 4) { > uint32_t val = AV_RN32(clear); > +uint64_t val8 = val * 0x10001ULL; > +for (; dst_size >= 32; dst_size -= 32) { > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > +dst += 32; > +} > for (; dst_size >= 4; dst_size -= 4) { > AV_WN32(dst, val); > dst += 4; > -- > 2.20.1 > > >>> > >>>NAK, implement special memset function instead. > >> > >>I can move the added loop into a seperate function, if thats what you > >>suggest ? > > > >No, don't do that. > > > >>All the code is already in a "special" memset though, this is > >>memset_bytes() > >> > > > >I guess function is less useful if its static. So any duplicate should > >be avoided in codebase. > > Isn't av_memcpy_backptr does almost exactly what is needed here? That can > also be optimized further if needed. > >>> > >>>av_memcpy_backptr() copies data with overlap, its more like a recursive > >>>memmove(). > >> > >>So? As far as I see the memset_bytes function in imgutils.c can be replaced > >>with this: > >> > >>if (clear_size > dst_size) > >>clear_size = dst_size; > >>memcpy(dst, clear, clear_size); > >>av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size); > >> > >>I am not against an av_memset_bytes API addition, but I believe it should > >>share code with av_memcpy_backptr to avoid duplication. > > > >ive implemented this, it does not seem to be really faster in the testcase > > I guess it is not faster because you have not applied your original > optimalization to fill32 in libavutil/mem.c. Could you compare speed after > optimizing that the same way your original patch did it with imgutils > memset_bytes? sure, that makes it faster: From f5660e4025bb8161ebdb55cda03b656cbf685b1a Mon Sep 17 00:00:00 2001 From: Michael Niedermayer Date: Thu, 17 Jan 2019 22:35:10 +0100 Subject: [PATCH 1/2] avutil/mem: Optimize fill32() by unrolling and using 64bit Signed-off-by: Michael Niedermayer --- libavutil/mem.c | 12 1 file changed, 12 insertions(+) diff --git a/libavutil/mem.c b/libavutil/mem.c index 6149755a6b..88fe09b179 100644 --- a/libavutil/mem.c +++ b/libavutil/mem.c @@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len) { uint32_t v = AV_RN32(dst - 4); +#if HAVE_FAST_64BIT +uint64_t v2= v + ((uint64_t)v<<32); +while (len >= 32) { +AV_WN64(dst , v2); +AV_WN64(dst+ 8, v2); +AV_WN64(dst+16, v2); +AV_WN64(dst+24, v2); +dst += 32; +len -= 32; +} +#endif + while (len >= 4) { AV_WN32(dst, v); dst += 4; -- 2.20.1 From 9b5573f91a043a818fe1fd6b93d0d36c4830cd9c Mon Sep 17 00:00:00 2001 From: Michael Niedermayer Date: Tue, 25 Dec 2018 23:15:20 +0100 Subject: [PATCH 2/2] avutil/imgutils: Optimize memset_bytes() by using av_memcpy_backptr() This is strongly based on code by Marton Balint Fixes: Timeout Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Tue, 15 Jan 2019, Michael Niedermayer wrote: On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote: On Fri, 28 Dec 2018, Michael Niedermayer wrote: On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: On Wed, 26 Dec 2018, Paul B Mahol wrote: On 12/26/18, Michael Niedermayer wrote: On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: On 12/25/18, Michael Niedermayer wrote: Fixes: Timeout Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer --- libavutil/imgutils.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c index 4938a7ef67..cc38f1e878 100644 --- a/libavutil/imgutils.c +++ b/libavutil/imgutils.c @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear, } } else if (clear_size == 4) { uint32_t val = AV_RN32(clear); +uint64_t val8 = val * 0x10001ULL; +for (; dst_size >= 32; dst_size -= 32) { +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); +dst += 32; +} for (; dst_size >= 4; dst_size -= 4) { AV_WN32(dst, val); dst += 4; -- 2.20.1 NAK, implement special memset function instead. I can move the added loop into a seperate function, if thats what you suggest ? No, don't do that. All the code is already in a "special" memset though, this is memset_bytes() I guess function is less useful if its static. So any duplicate should be avoided in codebase. Isn't av_memcpy_backptr does almost exactly what is needed here? That can also be optimized further if needed. av_memcpy_backptr() copies data with overlap, its more like a recursive memmove(). So? As far as I see the memset_bytes function in imgutils.c can be replaced with this: if (clear_size > dst_size) clear_size = dst_size; memcpy(dst, clear, clear_size); av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size); I am not against an av_memset_bytes API addition, but I believe it should share code with av_memcpy_backptr to avoid duplication. ive implemented this, it does not seem to be really faster in the testcase I guess it is not faster because you have not applied your original optimalization to fill32 in libavutil/mem.c. Could you compare speed after optimizing that the same way your original patch did it with imgutils memset_bytes? Thanks, Marton ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote: > > > On Fri, 28 Dec 2018, Michael Niedermayer wrote: > > >On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: > >> > >> > >>On Wed, 26 Dec 2018, Paul B Mahol wrote: > >> > >>>On 12/26/18, Michael Niedermayer wrote: > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: > >On 12/25/18, Michael Niedermayer wrote: > >>Fixes: Timeout > >>Fixes: > >>11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>Before: Executed > >>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>in 11294 ms > >>After : Executed > >>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>in 4249 ms > >> > >>Found-by: continuous fuzzing process > >>https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > >>Signed-off-by: Michael Niedermayer > >>--- > >> libavutil/imgutils.c | 6 ++ > >> 1 file changed, 6 insertions(+) > >> > >>diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > >>index 4938a7ef67..cc38f1e878 100644 > >>--- a/libavutil/imgutils.c > >>+++ b/libavutil/imgutils.c > >>@@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > >>dst_size, > >>uint8_t *clear, > >> } > >> } else if (clear_size == 4) { > >> uint32_t val = AV_RN32(clear); > >>+uint64_t val8 = val * 0x10001ULL; > >>+for (; dst_size >= 32; dst_size -= 32) { > >>+AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > >>+AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > >>+dst += 32; > >>+} > >> for (; dst_size >= 4; dst_size -= 4) { > >> AV_WN32(dst, val); > >> dst += 4; > >>-- > >>2.20.1 > >> > > > >NAK, implement special memset function instead. > > I can move the added loop into a seperate function, if thats what you > suggest ? > >>> > >>>No, don't do that. > >>> > All the code is already in a "special" memset though, this is > memset_bytes() > > >>> > >>>I guess function is less useful if its static. So any duplicate should > >>>be avoided in codebase. > >> > >>Isn't av_memcpy_backptr does almost exactly what is needed here? That can > >>also be optimized further if needed. > > > >av_memcpy_backptr() copies data with overlap, its more like a recursive > >memmove(). > > So? As far as I see the memset_bytes function in imgutils.c can be replaced > with this: > > if (clear_size > dst_size) > clear_size = dst_size; > memcpy(dst, clear, clear_size); > av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size); > > I am not against an av_memset_bytes API addition, but I believe it should > share code with av_memcpy_backptr to avoid duplication. ive implemented this, it does not seem to be really faster in the testcase patches below for reference: From 30549c4e674abce48608d99ed3e7f8ccbd557ada Mon Sep 17 00:00:00 2001 From: Michael Niedermayer Date: Tue, 25 Dec 2018 23:15:20 +0100 Subject: [PATCH] avutil/imgutils: Optimize writing 4 bytes in memset_bytes() Fixes: Timeout Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer --- libavutil/imgutils.c | 8 1 file changed, 8 insertions(+) diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c index 4938a7ef67..6c0d3950de 100644 --- a/libavutil/imgutils.c +++ b/libavutil/imgutils.c @@ -529,6 +529,14 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear, } } else if (clear_size == 4) { uint32_t val = AV_RN32(clear); +#if HAVE_FAST_64BIT +uint64_t val8 = val * 0x10001ULL; +for (; dst_size >= 32; dst_size -= 32) { +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); +dst += 32; +} +#endif for (; dst_size >= 4; dst_size -= 4) { AV_WN32(dst, val); dst += 4; -- 2.20.1 From 8e5140bf92d7e41090bfca1c6163f9c428402904 Mon Sep 17 00:00:00 2001 From: Michael Niedermayer Date: Tue, 25 Dec 2018 23:15:20 +0100 Subject: [PATCH] avutil/imgutils: Optimize memset_bytes() by using av_memcpy_backptr() This is strongly based on code by Marton Balint Fixes: Timeout Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 Before: Executed clusterfuzz-testcase-minimized-ffmpeg_A
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote: > > > On Fri, 28 Dec 2018, Michael Niedermayer wrote: > > >On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: > >> > >> > >>On Wed, 26 Dec 2018, Paul B Mahol wrote: > >> > >>>On 12/26/18, Michael Niedermayer wrote: > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: > >On 12/25/18, Michael Niedermayer wrote: > >>Fixes: Timeout > >>Fixes: > >>11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>Before: Executed > >>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>in 11294 ms > >>After : Executed > >>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >>in 4249 ms > >> > >>Found-by: continuous fuzzing process > >>https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > >>Signed-off-by: Michael Niedermayer > >>--- > >> libavutil/imgutils.c | 6 ++ > >> 1 file changed, 6 insertions(+) > >> > >>diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > >>index 4938a7ef67..cc38f1e878 100644 > >>--- a/libavutil/imgutils.c > >>+++ b/libavutil/imgutils.c > >>@@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > >>dst_size, > >>uint8_t *clear, > >> } > >> } else if (clear_size == 4) { > >> uint32_t val = AV_RN32(clear); > >>+uint64_t val8 = val * 0x10001ULL; > >>+for (; dst_size >= 32; dst_size -= 32) { > >>+AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > >>+AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > >>+dst += 32; > >>+} > >> for (; dst_size >= 4; dst_size -= 4) { > >> AV_WN32(dst, val); > >> dst += 4; > >>-- > >>2.20.1 > >> > > > >NAK, implement special memset function instead. > > I can move the added loop into a seperate function, if thats what you > suggest ? > >>> > >>>No, don't do that. > >>> > All the code is already in a "special" memset though, this is > memset_bytes() > > >>> > >>>I guess function is less useful if its static. So any duplicate should > >>>be avoided in codebase. > >> > >>Isn't av_memcpy_backptr does almost exactly what is needed here? That can > >>also be optimized further if needed. > > > >av_memcpy_backptr() copies data with overlap, its more like a recursive > >memmove(). > > So? As far as I see the memset_bytes function in imgutils.c can be replaced > with this: > > if (clear_size > dst_size) > clear_size = dst_size; > memcpy(dst, clear, clear_size); > av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size); > > I am not against an av_memset_bytes API addition, but I believe it should > share code with av_memcpy_backptr to avoid duplication. Iam a bit concerned that combining them could result in lower speed but i can surely combine them if people prefer? thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB If you think the mosad wants you dead since a long time then you are either wrong or dead since a long time. signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Fri, 28 Dec 2018, Michael Niedermayer wrote: On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: On Wed, 26 Dec 2018, Paul B Mahol wrote: On 12/26/18, Michael Niedermayer wrote: On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: On 12/25/18, Michael Niedermayer wrote: Fixes: Timeout Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer --- libavutil/imgutils.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c index 4938a7ef67..cc38f1e878 100644 --- a/libavutil/imgutils.c +++ b/libavutil/imgutils.c @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear, } } else if (clear_size == 4) { uint32_t val = AV_RN32(clear); +uint64_t val8 = val * 0x10001ULL; +for (; dst_size >= 32; dst_size -= 32) { +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); +dst += 32; +} for (; dst_size >= 4; dst_size -= 4) { AV_WN32(dst, val); dst += 4; -- 2.20.1 NAK, implement special memset function instead. I can move the added loop into a seperate function, if thats what you suggest ? No, don't do that. All the code is already in a "special" memset though, this is memset_bytes() I guess function is less useful if its static. So any duplicate should be avoided in codebase. Isn't av_memcpy_backptr does almost exactly what is needed here? That can also be optimized further if needed. av_memcpy_backptr() copies data with overlap, its more like a recursive memmove(). So? As far as I see the memset_bytes function in imgutils.c can be replaced with this: if (clear_size > dst_size) clear_size = dst_size; memcpy(dst, clear, clear_size); av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size); I am not against an av_memset_bytes API addition, but I believe it should share code with av_memcpy_backptr to avoid duplication. Regards, Marton ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote: > > > On Wed, 26 Dec 2018, Paul B Mahol wrote: > > >On 12/26/18, Michael Niedermayer wrote: > >>On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: > >>>On 12/25/18, Michael Niedermayer wrote: > Fixes: Timeout > Fixes: > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > Before: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 11294 ms > After : Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 4249 ms > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavutil/imgutils.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > index 4938a7ef67..cc38f1e878 100644 > --- a/libavutil/imgutils.c > +++ b/libavutil/imgutils.c > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > dst_size, > uint8_t *clear, > } > } else if (clear_size == 4) { > uint32_t val = AV_RN32(clear); > +uint64_t val8 = val * 0x10001ULL; > +for (; dst_size >= 32; dst_size -= 32) { > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > +dst += 32; > +} > for (; dst_size >= 4; dst_size -= 4) { > AV_WN32(dst, val); > dst += 4; > -- > 2.20.1 > > >>> > >>>NAK, implement special memset function instead. > >> > >>I can move the added loop into a seperate function, if thats what you > >>suggest ? > > > >No, don't do that. > > > >>All the code is already in a "special" memset though, this is > >>memset_bytes() > >> > > > >I guess function is less useful if its static. So any duplicate should > >be avoided in codebase. > > Isn't av_memcpy_backptr does almost exactly what is needed here? That can > also be optimized further if needed. av_memcpy_backptr() copies data with overlap, its more like a recursive memmove(). thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB No snowflake in an avalanche ever feels responsible. -- Voltaire signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Wed, Dec 26, 2018 at 12:30:05AM +, Kieran Kunhya wrote: > On Tue, 25 Dec 2018 at 22:17 Michael Niedermayer > wrote: > > > Fixes: Timeout > > Fixes: > > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > Before: Executed > > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > in 11294 ms > > After : Executed > > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > in 4249 ms > > > > So basically you go from one arbitrary timeout to another. I assume your > fuzzer timeout is 5 or 10 seconds and so you manage to make it pass? I think the timeout used is 25sec. I assume the server that detected it was maybe slower than the machine i used. Its even quite possible that variation between runs, due to hardware, load or other issues to cause timeouts to disappear. But none of this really should matter in the context of this patchset. Which simply makes the code faster. Thanks [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Many things microsoft did are stupid, but not doing something just because microsoft did it is even more stupid. If everything ms did were stupid they would be bankrupt already. signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On 12/26/18, Michael Niedermayer wrote: > On Wed, Dec 26, 2018 at 10:02:15PM +0100, Paul B Mahol wrote: >> On 12/26/18, Michael Niedermayer wrote: >> > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: >> >> On 12/25/18, Michael Niedermayer wrote: >> >> > Fixes: Timeout >> >> > Fixes: >> >> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 >> >> > Before: Executed >> >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 >> >> > in 11294 ms >> >> > After : Executed >> >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 >> >> > in 4249 ms >> >> > >> >> > Found-by: continuous fuzzing process >> >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg >> >> > Signed-off-by: Michael Niedermayer >> >> > --- >> >> > libavutil/imgutils.c | 6 ++ >> >> > 1 file changed, 6 insertions(+) >> >> > >> >> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c >> >> > index 4938a7ef67..cc38f1e878 100644 >> >> > --- a/libavutil/imgutils.c >> >> > +++ b/libavutil/imgutils.c >> >> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t >> >> > dst_size, >> >> > uint8_t *clear, >> >> > } >> >> > } else if (clear_size == 4) { >> >> > uint32_t val = AV_RN32(clear); >> >> > +uint64_t val8 = val * 0x10001ULL; >> >> > +for (; dst_size >= 32; dst_size -= 32) { >> >> > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); >> >> > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); >> >> > +dst += 32; >> >> > +} >> >> > for (; dst_size >= 4; dst_size -= 4) { >> >> > AV_WN32(dst, val); >> >> > dst += 4; >> >> > -- >> >> > 2.20.1 >> >> > >> >> >> >> NAK, implement special memset function instead. >> > >> > I can move the added loop into a seperate function, if thats what you >> > suggest ? >> >> No, don't do that. >> >> > All the code is already in a "special" memset though, this is >> > memset_bytes() >> > >> >> I guess function is less useful if its static. So any duplicate should >> be avoided in codebase. > > i can make it non static and export it if you want ? > Yes. Also make it used where there is already similar code doing same. > [...] > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > If you fake or manipulate statistics in a paper in physics you will never > get a job again. > If you fake or manipulate statistics in a paper in medicin you will get > a job for life at the pharma industry. > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Wed, 26 Dec 2018, Paul B Mahol wrote: On 12/26/18, Michael Niedermayer wrote: On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: On 12/25/18, Michael Niedermayer wrote: > Fixes: Timeout > Fixes: > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > Before: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 11294 ms > After : Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 4249 ms > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavutil/imgutils.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > index 4938a7ef67..cc38f1e878 100644 > --- a/libavutil/imgutils.c > +++ b/libavutil/imgutils.c > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > dst_size, > uint8_t *clear, > } > } else if (clear_size == 4) { > uint32_t val = AV_RN32(clear); > +uint64_t val8 = val * 0x10001ULL; > +for (; dst_size >= 32; dst_size -= 32) { > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > +dst += 32; > +} > for (; dst_size >= 4; dst_size -= 4) { > AV_WN32(dst, val); > dst += 4; > -- > 2.20.1 > NAK, implement special memset function instead. I can move the added loop into a seperate function, if thats what you suggest ? No, don't do that. All the code is already in a "special" memset though, this is memset_bytes() I guess function is less useful if its static. So any duplicate should be avoided in codebase. Isn't av_memcpy_backptr does almost exactly what is needed here? That can also be optimized further if needed. Thanks, Marton ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Wed, Dec 26, 2018 at 10:02:15PM +0100, Paul B Mahol wrote: > On 12/26/18, Michael Niedermayer wrote: > > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: > >> On 12/25/18, Michael Niedermayer wrote: > >> > Fixes: Timeout > >> > Fixes: > >> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >> > Before: Executed > >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >> > in 11294 ms > >> > After : Executed > >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > >> > in 4249 ms > >> > > >> > Found-by: continuous fuzzing process > >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > >> > Signed-off-by: Michael Niedermayer > >> > --- > >> > libavutil/imgutils.c | 6 ++ > >> > 1 file changed, 6 insertions(+) > >> > > >> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > >> > index 4938a7ef67..cc38f1e878 100644 > >> > --- a/libavutil/imgutils.c > >> > +++ b/libavutil/imgutils.c > >> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > >> > dst_size, > >> > uint8_t *clear, > >> > } > >> > } else if (clear_size == 4) { > >> > uint32_t val = AV_RN32(clear); > >> > +uint64_t val8 = val * 0x10001ULL; > >> > +for (; dst_size >= 32; dst_size -= 32) { > >> > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > >> > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > >> > +dst += 32; > >> > +} > >> > for (; dst_size >= 4; dst_size -= 4) { > >> > AV_WN32(dst, val); > >> > dst += 4; > >> > -- > >> > 2.20.1 > >> > > >> > >> NAK, implement special memset function instead. > > > > I can move the added loop into a seperate function, if thats what you > > suggest ? > > No, don't do that. > > > All the code is already in a "special" memset though, this is > > memset_bytes() > > > > I guess function is less useful if its static. So any duplicate should > be avoided in codebase. i can make it non static and export it if you want ? [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB If you fake or manipulate statistics in a paper in physics you will never get a job again. If you fake or manipulate statistics in a paper in medicin you will get a job for life at the pharma industry. signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On 12/26/18, Michael Niedermayer wrote: > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: >> On 12/25/18, Michael Niedermayer wrote: >> > Fixes: Timeout >> > Fixes: >> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 >> > Before: Executed >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 >> > in 11294 ms >> > After : Executed >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 >> > in 4249 ms >> > >> > Found-by: continuous fuzzing process >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg >> > Signed-off-by: Michael Niedermayer >> > --- >> > libavutil/imgutils.c | 6 ++ >> > 1 file changed, 6 insertions(+) >> > >> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c >> > index 4938a7ef67..cc38f1e878 100644 >> > --- a/libavutil/imgutils.c >> > +++ b/libavutil/imgutils.c >> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t >> > dst_size, >> > uint8_t *clear, >> > } >> > } else if (clear_size == 4) { >> > uint32_t val = AV_RN32(clear); >> > +uint64_t val8 = val * 0x10001ULL; >> > +for (; dst_size >= 32; dst_size -= 32) { >> > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); >> > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); >> > +dst += 32; >> > +} >> > for (; dst_size >= 4; dst_size -= 4) { >> > AV_WN32(dst, val); >> > dst += 4; >> > -- >> > 2.20.1 >> > >> >> NAK, implement special memset function instead. > > I can move the added loop into a seperate function, if thats what you > suggest ? No, don't do that. > All the code is already in a "special" memset though, this is > memset_bytes() > I guess function is less useful if its static. So any duplicate should be avoided in codebase. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Tue, Dec 25, 2018 at 10:12:13PM -0300, James Almer wrote: > On 12/25/2018 7:15 PM, Michael Niedermayer wrote: > > Fixes: Timeout > > Fixes: > > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > Before: Executed > > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > in 11294 ms > > After : Executed > > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > in 4249 ms > > > > Found-by: continuous fuzzing process > > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > > Signed-off-by: Michael Niedermayer > > --- > > libavutil/imgutils.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > > index 4938a7ef67..cc38f1e878 100644 > > --- a/libavutil/imgutils.c > > +++ b/libavutil/imgutils.c > > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t > > dst_size, uint8_t *clear, > > } > > } else if (clear_size == 4) { > > uint32_t val = AV_RN32(clear); > > +uint64_t val8 = val * 0x10001ULL; > > +for (; dst_size >= 32; dst_size -= 32) { > > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > > +dst += 32; > > +} > > This should be wrapped with a HAVE_FAST_64BIT preprocessor check. will do so > > Also, is it much slower if you also write one per loop like everywhere > else in the function? I'd prefer if things are consistent. as in the patch: 3955 ms 3954 ms 3954 ms with one write per iteration: 5705 ms 5635 ms 5629 ms > Similarly, you could add four and eight bytes loops to the clear_size == > 2 case above. yes i can if you want me to?, but i have no testcase for that so it would be untested thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Dictatorship naturally arises out of democracy, and the most aggravated form of tyranny and slavery out of the most extreme liberty. -- Plato signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote: > On 12/25/18, Michael Niedermayer wrote: > > Fixes: Timeout > > Fixes: > > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > Before: Executed > > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > in 11294 ms > > After : Executed > > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > > in 4249 ms > > > > Found-by: continuous fuzzing process > > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > > Signed-off-by: Michael Niedermayer > > --- > > libavutil/imgutils.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > > index 4938a7ef67..cc38f1e878 100644 > > --- a/libavutil/imgutils.c > > +++ b/libavutil/imgutils.c > > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, > > uint8_t *clear, > > } > > } else if (clear_size == 4) { > > uint32_t val = AV_RN32(clear); > > +uint64_t val8 = val * 0x10001ULL; > > +for (; dst_size >= 32; dst_size -= 32) { > > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > > +dst += 32; > > +} > > for (; dst_size >= 4; dst_size -= 4) { > > AV_WN32(dst, val); > > dst += 4; > > -- > > 2.20.1 > > > > NAK, implement special memset function instead. I can move the added loop into a seperate function, if thats what you suggest ? All the code is already in a "special" memset though, this is memset_bytes() thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB "I am not trying to be anyone's saviour, I'm trying to think about the future and not be sad" - Elon Musk signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On 12/25/18, Michael Niedermayer wrote: > Fixes: Timeout > Fixes: > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > Before: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 11294 ms > After : Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 4249 ms > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavutil/imgutils.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > index 4938a7ef67..cc38f1e878 100644 > --- a/libavutil/imgutils.c > +++ b/libavutil/imgutils.c > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, > uint8_t *clear, > } > } else if (clear_size == 4) { > uint32_t val = AV_RN32(clear); > +uint64_t val8 = val * 0x10001ULL; > +for (; dst_size >= 32; dst_size -= 32) { > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > +dst += 32; > +} > for (; dst_size >= 4; dst_size -= 4) { > AV_WN32(dst, val); > dst += 4; > -- > 2.20.1 > NAK, implement special memset function instead. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On 12/25/2018 7:15 PM, Michael Niedermayer wrote: > Fixes: Timeout > Fixes: > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > Before: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 11294 ms > After : Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 4249 ms > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavutil/imgutils.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c > index 4938a7ef67..cc38f1e878 100644 > --- a/libavutil/imgutils.c > +++ b/libavutil/imgutils.c > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, > uint8_t *clear, > } > } else if (clear_size == 4) { > uint32_t val = AV_RN32(clear); > +uint64_t val8 = val * 0x10001ULL; > +for (; dst_size >= 32; dst_size -= 32) { > +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); > +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); > +dst += 32; > +} This should be wrapped with a HAVE_FAST_64BIT preprocessor check. Also, is it much slower if you also write one per loop like everywhere else in the function? I'd prefer if things are consistent. Similarly, you could add four and eight bytes loops to the clear_size == 2 case above. > for (; dst_size >= 4; dst_size -= 4) { > AV_WN32(dst, val); > dst += 4; > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
On Tue, 25 Dec 2018 at 22:17 Michael Niedermayer wrote: > Fixes: Timeout > Fixes: > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > Before: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 11294 ms > After : Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 > in 4249 ms > So basically you go from one arbitrary timeout to another. I assume your fuzzer timeout is 5 or 10 seconds and so you manage to make it pass? Kieran ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
Fixes: Timeout Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer --- libavutil/imgutils.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c index 4938a7ef67..cc38f1e878 100644 --- a/libavutil/imgutils.c +++ b/libavutil/imgutils.c @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear, } } else if (clear_size == 4) { uint32_t val = AV_RN32(clear); +uint64_t val8 = val * 0x10001ULL; +for (; dst_size >= 32; dst_size -= 32) { +AV_WN64(dst , val8); AV_WN64(dst+ 8, val8); +AV_WN64(dst+16, val8); AV_WN64(dst+24, val8); +dst += 32; +} for (; dst_size >= 4; dst_size -= 4) { AV_WN32(dst, val); dst += 4; -- 2.20.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel