# HG changeset patch
# User Ramya Sriraman
# Date 1444216029 -19800
# Wed Oct 07 16:37:09 2015 +0530
# Node ID 6b2c146d0bcf28a19e7defe977a8f063240a3905
# Parent 0ea631d6f87d4fc056da26ff94c6ffa1120e69bd
asm:Fix sse_ss [32x32] & [64x64] main12 SSE2
diff -r
Below are the performance test results on Haswell, with and without limiting
rect/amp analysis mode, in the slow preset.
*Before*
D:\ashok>x265_b.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset slow
--hash=1 --no-info --psnr --ssim -o test_b.hevc --bitrate 6000
encoded 500 frames
Below are the performance test results on Haswell, with and without limiting
rect/amp analysis mode, in the veryslow preset.
*Before*
D:\ashok>x265_b.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test_b.hevc
encoded 504 frames in 223.08s
# HG changeset patch
# User Ashok Kumar Mishra
# Date 1444897694 -19800
# Thu Oct 15 13:58:14 2015 +0530
# Node ID 65d7c1f5baf5fa619d773fcc2e1361d46f6df7f1
# Parent f3963e7e75b8dcb599250c082357e08fd32191a5
analysis: avoid redundant rect/amp mode analysis based on
# HG changeset patch
# User Min Chen
# Date 1444947352 18000
# Node ID 25d14acf30a0a9d17daea890e0096170ae876f1a
# Parent fe65544b6c40d7cd62c2b86275bf98b264b6edb0
asm: Replace MMX version of pixel_avg_w8 by SSE2, the MMX is slower on Skylake
platform
---
# HG changeset patch
# User Min Chen
# Date 1444947354 18000
# Node ID 086f2ed5ffe81db91804aa2b5a4a3b83b2bcb060
# Parent 25d14acf30a0a9d17daea890e0096170ae876f1a
asm: fix illegal AVX instruction in mbtree_propagate_cost
---
source/common/x86/mc-a2.asm |2 +-
1 files
# HG changeset patch
# User Dnyaneshwar G
# Date 1444972708 -19800
# Fri Oct 16 10:48:28 2015 +0530
# Node ID 76a36eabd4be405fc4880d882499a754c3f190fa
# Parent fe65544b6c40d7cd62c2b86275bf98b264b6edb0
multilib: fix multiple definition of
In your code, you use m4 as the temporary/intermediate sum, but consider its
dynamic range:
every element: 12 + 12 = 24 bits
up to 64 iterations: +5 bits
64 elements in a 4-dword register: +4 bits
total = 24 + 5 + 4 = 33 bits
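
As a rough check of this bit budget, here is a minimal sketch in Python. It only
adds up the figures exactly as quoted in this message and compares the result
with the width of a dword and a qword lane. The names (PRODUCT_BITS, fits, and
so on) are hypothetical and exist only for this illustration; nothing here is
taken from the x265 sources.

```python
# Bit budget for the intermediate sum, using the figures quoted in this message.
# All names are hypothetical and chosen only for illustration.
PRODUCT_BITS = 12 + 12   # "every element: 12 + 12 = 24 bits"
ITERATION_BITS = 5       # growth from accumulating across iterations, as quoted
LANE_BITS = 4            # growth from "64 elements in a 4-dword register", as quoted

DWORD_BITS = 32
QWORD_BITS = 64


def total_bits(*parts):
    """Sum the individual contributions to the accumulator's dynamic range."""
    return sum(parts)


def fits(bits, width):
    """Return True if a value needing `bits` bits fits in a `width`-bit lane."""
    return bits <= width


required = total_bits(PRODUCT_BITS, ITERATION_BITS, LANE_BITS)

print("required bits:", required)
print("fits in dword:", fits(required, DWORD_BITS))
print("fits in qword:", fits(required, QWORD_BITS))
```

With the quoted figures the required width is one bit more than a dword lane
can hold, so it does not fit in a dword but does fit in a qword.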
The 33-bit total means that when you use two intermediate sum registers, you
only need a qword sum in