Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-02 Thread chen
This is only a toy; it depends on the compiler. On my PC, it helps my old compiler version generate movaps rather than movups. At 2019-12-02 17:21:58, "Carl Eugen Hoyos" wrote: >On Mon, Dec 2, 2019 at 08:33, chen wrote: > >> +#define __assume(cond) do { if (!(cond))

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-02 Thread Carl Eugen Hoyos
On Mon, Dec 2, 2019 at 08:33, chen wrote: > +#define __assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0) We currently don't do that. If you have a test case where it makes a big difference, adding it could be discussed, but it has to be checked in configure and added
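For readers who have not seen this kind of hint before, here is a minimal sketch of how the macro quoted above might be used to promise the compiler that a pointer is 16-byte aligned. The macro is renamed to ASSUME here to avoid clashing with MSVC's built-in __assume, and the helper add16_aligned() is hypothetical and not part of the patch. Whether a given compiler then emits aligned loads (movaps) instead of unaligned ones (movups) depends on the compiler and its options, as noted in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hint macro as quoted in the thread, renamed for this sketch.
 * It is only meaningful on GCC/Clang; the no-op fallback is an
 * assumption of this example, not FFmpeg policy. */
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) do { } while (0)
#endif

/* Hypothetical helper: adds 16 floats from src into dst.
 * The caller must guarantee that both pointers are 16-byte aligned;
 * breaking that promise is undefined behaviour. */
static void add16_aligned(float *dst, const float *src)
{
    assert(dst && src);                  /* cheap debug-build check */
    ASSUME(((uintptr_t)dst & 15) == 0);  /* promise: dst is 16-byte aligned */
    ASSUME(((uintptr_t)src & 15) == 0);  /* promise: src is 16-byte aligned */
    for (int i = 0; i < 16; i++)
        dst[i] += src[i];
}
```

Comparing the generated assembly with and without the two ASSUME lines (for example with -O2 -S) is one way to see whether the hint changes anything on a particular compiler.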

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-02 Thread Carl Eugen Hoyos
On Mon, Dec 2, 2019 at 03:42, 徐鋆 wrote: > I'm sorry not to reply in time. Definitely in time! > The performance of this C code is about 10% better than the existing C code. Please add this to the commit message. Carl Eugen

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-01 Thread chen
I have a small suggestion on filter_column16(..) [the function]. Firstly, the function name is easily confused with filter16_column(..). Secondly, the function's algorithm is based on the row direction, which means fewer address calculation operations but worse cache performance; the cost of them may be more than calculate

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-01 Thread Song, Ruiling
> -Original Message- > From: ffmpeg-devel On Behalf Of > xuju...@sjtu.edu.cn > Sent: Wednesday, November 27, 2019 10:56 PM > To: ffmpeg-devel@ffmpeg.org > Cc: xuju...@sjtu.edu.cn > Subject: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column > operation for filter_column() to

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-01 Thread 徐鋆
Hi, Steven - Original Message - From: "Steven Liu" To: "FFmpeg development discussions and patches" Cc: "Steven Liu" Sent: Monday, 2019-12-02 10:44:48 AM Subject: Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD. > On

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-01 Thread Steven Liu
> On Dec 2, 2019, at 10:42, 徐鋆 wrote: > > I'm sorry not to reply in time. > > The performance of this C code is about 10% better than the existing C code. > > It will have a bigger improvement after X86 SIMD optimizations. 1. How did you test? 2. Don't top-post:

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-12-01 Thread 徐鋆
I'm sorry not to reply in time. The performance of this C code is about 10% better than the existing C code. It will have a bigger improvement after X86 SIMD optimizations. Xu Jun - Original Message - From: "Carl Eugen Hoyos" To: "FFmpeg development discussions and patches" Sent: Thursday, 2019-11

Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-11-27 Thread Carl Eugen Hoyos
On Wed, Nov 27, 2019 at 15:56, wrote: > From: Xu Jun > > In order to add x86 SIMD for filter_column(), I wrote a C function which > processes 16 columns at a time. How does this perform compared to the existing C code? Carl Eugen

[FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-11-27 Thread xujunzz
From: Xu Jun In order to add x86 SIMD for filter_column(), I wrote a C function which processes 16 columns at a time. Signed-off-by: Xu Jun --- libavfilter/vf_convolution.c | 56 +++ libavfilter/x86/vf_convolution_init.c | 23 +++ 2 files changed, 79