Re: [x265] [PATCH 0 of 6 ] SAO SSE4 asm code for HIGH_BIT_DEPTH

2015-06-22 Thread Dnyaneshwar Gorade
Okay. I will check the IACA report and try pxor for m0 and buffering 1023.

On Mon, Jun 22, 2015 at 8:24 PM, chen chenm...@163.com wrote:
> Right. Some comments:
> 'psignb X, [pb_128]' is equivalent to 'psubb X, 0, X'. In AVX2 the second
> form is faster; in SSE4 the choice depends on the IACA report.
> In PMINSW, you buffer ZERO

Re: [x265] [PATCH 0 of 6 ] SAO SSE4 asm code for HIGH_BIT_DEPTH

2015-06-22 Thread chen
Right. Some comments:

'psignb X, [pb_128]' is equivalent to 'psubb X, 0, X'. In AVX2 the second form is faster; in SSE4 the choice depends on the IACA report.

In PMINSW, you buffer ZERO into M0 and use pw_1023 directly. Could you try buffering pw_1023 and using PXOR to get ZERO?

At 2015-06-22
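The two points in this review can be checked with a small scalar model in C. This is only a sketch of the per-lane arithmetic, not the x265 assembly: the helper names are hypothetical, pb_128 is assumed to hold 0x80 in every byte, and pairing PMINSW with a PMAXSW against zero for the 10-bit clamp is an assumption about how the patch uses the buffered ZERO.

```c
#include <stdint.h>

/* One byte lane of PSIGNB dst, src: a negative src negates dst,
   a zero src clears it, a positive src leaves it unchanged. */
static int8_t psignb_lane(int8_t dst, int8_t src)
{
    if (src < 0)
        return (int8_t)(uint8_t)(0u - (uint8_t)dst); /* wraps: -(-128) stays -128 */
    if (src == 0)
        return 0;
    return dst;
}

/* One byte lane of PSUBB: a - b with 8-bit wraparound. */
static int8_t psubb_lane(int8_t a, int8_t b)
{
    return (int8_t)(uint8_t)((uint8_t)a - (uint8_t)b);
}

/* Returns 1 if 'psignb X, [pb_128]' and '0 - X' agree for all 256 byte values.
   Assumption: each byte of pb_128 is 0x80, which is -128 as a signed byte. */
static int negate_forms_agree(void)
{
    const int8_t pb_128_lane = (int8_t)-128;
    for (int v = -128; v <= 127; v++) {
        int8_t x = (int8_t)v;
        if (psignb_lane(x, pb_128_lane) != psubb_lane(0, x))
            return 0;
    }
    return 1;
}

/* One word lane of PXOR reg, reg: XOR of a value with itself is always zero,
   so ZERO can be produced without loading or keeping a constant. */
static int16_t pxor_self_lane(int16_t x)
{
    return (int16_t)(x ^ x);
}

/* 10-bit clamp modelled as PMAXSW with zero, then PMINSW with 1023.
   Assumption: this is the role of ZERO and pw_1023 in the patch. */
static int16_t clamp_10bit(int16_t v)
{
    int16_t zero = pxor_self_lane(v);   /* ZERO obtained via PXOR */
    int16_t lo = (v > zero) ? v : zero; /* PMAXSW lane */
    return (lo < 1023) ? lo : 1023;     /* PMINSW lane */
}
```

Calling negate_forms_agree() from a test harness confirms the equivalence over every byte value; which form is faster on a given CPU is a separate question that the model cannot answer, which is why the thread defers to IACA.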

[x265] [PATCH 0 of 6 ] SAO SSE4 asm code for HIGH_BIT_DEPTH

2015-06-22 Thread dnyaneshwar
SAO_EO_0         8.97x    974.03    8740.81
SAO_EO_1        10.18x    492.67    5017.42
SAO_EO_1_2Rows  11.21x    900.82   10095.86
SAO_EO_2[0]      6.27x    207.22    1298.92
SAO_EO_2[1]      8.92x    555.20    4949.69
SAO_EO_3[0]      4.97x    236.72    1177.29