Re: [x265] [PATCH 3 of 3] asm:Fix sse_ss [32x32] & [64x64] for main12 SSE2

2015-10-15 Thread Ramya Sriraman
# HG changeset patch # User Ramya Sriraman # Date 1444216029 -19800 # Wed Oct 07 16:37:09 2015 +0530 # Node ID 6b2c146d0bcf28a19e7defe977a8f063240a3905 # Parent 0ea631d6f87d4fc056da26ff94c6ffa1120e69bd asm:Fix sse_ss [32x32] & [64x64] main12 SSE2 diff -r

Re: [x265] [PATCH] analysis: avoid redundant rect/amp mode analysis based on split block rdCost and mvCost for rd-0/4

2015-10-15 Thread Ashok Kumar Mishra
Below are the performance test results on Haswell with and without limiting rect/amp analysis mode in the slow preset. *Before* D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset slow --hash=1 --no-info --psnr --ssim -o test_b.hevc --bitrate 6000 encoded 500 frames

Re: [x265] [PATCH] analysis: avoid redundant rect/amp mode analysis based on split block rdCost and mvCost for rd-5/6

2015-10-15 Thread Ashok Kumar Mishra
Below are the performance test results on Haswell with and without limiting rect/amp analysis mode in the veryslow preset. *Before* D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test_b.hevc encoded 504 frames in 223.08s

[x265] [PATCH] analysis: avoid redundant rect/amp mode analysis based on split block rdCost and mvCost for rd-0/4

2015-10-15 Thread ashok
# HG changeset patch # User Ashok Kumar Mishra # Date 1444897694 -19800 # Thu Oct 15 13:58:14 2015 +0530 # Node ID 65d7c1f5baf5fa619d773fcc2e1361d46f6df7f1 # Parent f3963e7e75b8dcb599250c082357e08fd32191a5 analysis: avoid redundant rect/amp mode analysis based on

[x265] [PATCH 1 of 2] asm: Replace MMX version of pixel_avg_w8 by SSE2, the MMX is slower on Skylake platform

2015-10-15 Thread Min Chen
# HG changeset patch # User Min Chen # Date 1444947352 18000 # Node ID 25d14acf30a0a9d17daea890e0096170ae876f1a # Parent fe65544b6c40d7cd62c2b86275bf98b264b6edb0 asm: Replace MMX version of pixel_avg_w8 by SSE2, the MMX is slower on Skylake platform ---

[x265] [PATCH 2 of 2] asm: fix illegal AVX instruction in mbtree_propagate_cost

2015-10-15 Thread Min Chen
# HG changeset patch # User Min Chen # Date 1444947354 18000 # Node ID 086f2ed5ffe81db91804aa2b5a4a3b83b2bcb060 # Parent 25d14acf30a0a9d17daea890e0096170ae876f1a asm: fix illegal AVX instruction in mbtree_propagate_cost --- source/common/x86/mc-a2.asm |2 +- 1 files

[x265] [PATCH] multilib: fix multiple definition of pelFilterLumaStrong_c

2015-10-15 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G # Date 1444972708 -19800 # Fri Oct 16 10:48:28 2015 +0530 # Node ID 76a36eabd4be405fc4880d882499a754c3f190fa # Parent fe65544b6c40d7cd62c2b86275bf98b264b6edb0 multilib: fix multiple definition of

Re: [x265] [PATCH 3 of 3] asm:Fix sse_ss [32x32] & [64x64] for main12 SSE2

2015-10-15 Thread chen
In your code, you use m4 as the temporary/intermediate sum, but consider its dynamic range: every element is 12 + 12 = 24 bits; up to 64 iterations: +5 bits; 64 elements in a 4-dword register: +4 bits; total = 24 + 5 + 4 = 33 bits. The above means that when you use two intermediate sum registers, you just need qword sum in
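
The bit budget quoted in the message above can be tallied in a short Python sketch. This is only an illustration: the three terms are copied as stated in the message rather than re-derived, and the names `required_bits`, `fits`, and `wrap_u32` are hypothetical helpers that do not appear in x265.

```python
# Bit-growth terms exactly as quoted in the review message (assumed, not re-derived).
PRODUCT_BITS = 12 + 12   # one squared 12-bit difference
ITERATION_BITS = 5       # growth the message attributes to the loop iterations
LANE_BITS = 4            # growth the message attributes to 64 elements in 4 dword lanes


def required_bits(*terms: int) -> int:
    """Total accumulator width needed: the sum of the bit-growth terms."""
    return sum(terms)


def fits(bits: int, width: int) -> bool:
    """True if a value needing `bits` bits fits in an unsigned `width`-bit lane."""
    return bits <= width


def wrap_u32(value: int) -> int:
    """Model an unsigned dword lane that silently wraps on overflow."""
    return value & 0xFFFFFFFF


total = required_bits(PRODUCT_BITS, ITERATION_BITS, LANE_BITS)
print("bits needed:", total)
print("fits in dword:", fits(total, 32))
print("fits in qword:", fits(total, 64))

# The smallest value that needs 33 bits is lost entirely in a dword lane.
print("2**32 wrapped to dword:", wrap_u32(1 << 32))
```

Under the quoted figures the total exceeds a 32-bit lane by one bit, which is consistent with the reviewer's concern about accumulating in dword registers.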