[x265] [PATCH] asm: intrapred_angX_4x4 sse2 performance tweaks

2015-06-21 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1434936838 25200 # Node ID b870e819ade1c9f197766318ffa7d96814dbb3cb # Parent 44b6b2df7016f0129e66d91e9aab03261d02758a asm: intrapred_angX_4x4 sse2 performance tweaks Created individual primitives for angles 19-25 and 27-33 to

[x265] [PATCH] asm: intrapred_angX_4x4 sse2 performance tweaks 10-bit

2015-06-21 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1434945136 25200 # Node ID 99018e7df815e0c85f8477938fd7cf59d9610317 # Parent b870e819ade1c9f197766318ffa7d96814dbb3cb asm: intrapred_angX_4x4 sse2 performance tweaks 10-bit Created individual primitives for angles 19-25 and 27-33

[x265] [PATCH] asm: dst4 sse2 8bpp and 10bpp

2015-06-10 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1433948100 25200 # Node ID c9debeec039e01c501884ab10dc9e32f55092b73 # Parent 6245476add8f0562e3ccb657f572ff94fe96adf0 asm: dst4 sse2 8bpp and 10bpp This replaces c code. 64-bit dst4x4 1.43x 1575.01 2249.96

[x265] [PATCH] asm: count_nonzero ssse3 to sse2

2015-06-10 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1433952103 25200 # Node ID c3da462abd1ff1341e43081fd651591e43fc79f2 # Parent 6245476add8f0562e3ccb657f572ff94fe96adf0 asm: count_nonzero ssse3 to sse2 The ssse3 count_nonzero primitives only use up to sse2 instructions. This patch

[x265] [PATCH] testbench: add missing index for chromaPartStr

2015-06-05 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1433518611 25200 # Node ID 2c5d6a1825389e052badbb46e3b4fdfe3b65aa48 # Parent 43afbde189f390c74f580b0d377731b498c7f7ce testbench: add missing index for chromaPartStr diff -r 43afbde189f3 -r 2c5d6a182538 source/test/pixelharness.cpp

[x265] [PATCH] asm: filterPixelToShort 8-bit and 10-bit sse2

2015-06-04 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1433466246 25200 # Node ID a828fd4b5f48f7c493fe48e5e12a1d34498c1433 # Parent 0f0d88319f7cc96661eef3c3dcc1befcf60354f3 asm: filterPixelToShort 8-bit and 10-bit sse2 This replaces c code for all of filterPixelToShort for 8 and 10

[x265] [PATCH] asm: interp_4tap_horiz_pX sse3 10-bit

2015-06-02 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1433260747 25200 # Node ID fcfba27ecf0b9dac8da123da8cdcac75763496f3 # Parent 0f0d88319f7cc96661eef3c3dcc1befcf60354f3 asm: interp_4tap_horiz_pX sse3 10-bit This replaces c code for all of 4tap_horiz pp and ps. 64-bit

[x265] [PATCH] asm: interp_8tap_horiz_pX sse2 10-bit

2015-05-27 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1432758084 25200 # Node ID 5e81b9f2acf59e970adccf2c0c2e23bc76406ea1 # Parent 18939c0e321f08207fa0a383939bc44485773013 asm: interp_8tap_horiz_pX sse2 10-bit This replaces c code for all of interp_8tap_horiz pp and ps for 10-bit.

[x265] [PATCH] asm: interp_8tap_vert_pX sse2

2015-05-26 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1432691029 25200 # Node ID 20df9b085d013253edf8d57e10d8eb1630d9927a # Parent 8ddc790790a46de9ceadea388f6271acdb3012ed asm: interp_8tap_vert_pX sse2 This replaces c code for all of interp_8tap_vert pp and ps. 64-bit

[x265] [PATCH] asm: interp_4tap_horiz_ps sse3

2015-05-21 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1432261447 25200 # Node ID 4330ef5ddfcb64b1a621149fca0a4550c2a2f36f # Parent 234bc93bd51698801fad77cc861177ed019f5113 asm: interp_4tap_horiz_ps sse3 This replaces c code for all of interp_4tap_horiz_ps for sse3 64-bit

[x265] [PATCH 2 of 2] asm: interp_4tap_vert_pX_4xN sse2

2015-05-19 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1432078001 25200 # Node ID 509f7cbf8e09d6ddec4aa58040cfd206879d59e7 # Parent 3e07cba4b2034db2b819b2e11e98ee4b851d52b5 asm: interp_4tap_vert_pX_4xN sse2 Improved register usage for addressing of output. This improvement helps

[x265] [PATCH 1 of 2] asm: interp_4tap_vert_ps_4x2 sse2

2015-05-19 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1432070824 25200 # Node ID 3e07cba4b2034db2b819b2e11e98ee4b851d52b5 # Parent d7b100e51e828833eee006f1da93e499ac161d28 asm: interp_4tap_vert_ps_4x2 sse2 Removed unneeded add instruction. In theory this should provide a small

[x265] [PATCH 0 of 2 ] asm: interp_4tap_vert_pX_4xN sse2

2015-05-19 Thread dtyx265
Small performance improvement in register addressing to reduce the number of lea instructions. I tried these types of tweaks on the other interp_4tap_vert_pX primitives, only to find mixed results, and might submit more tweaks after more investigation.

[x265] [PATCH] asm: interp_4tap_vert_pX_4xN sse2

2015-05-19 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1432085346 25200 # Node ID e096c40ce8ff9c170bdb8caa094f53b30ebd7db7 # Parent 3e07cba4b2034db2b819b2e11e98ee4b851d52b5 asm: interp_4tap_vert_pX_4xN sse2 Improved register usage for addressing of output. This improvement helps

[x265] [PATCH 04 of 12] asm: interp_4tap_vert_ps_6xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431991674 25200 # Node ID 2f44ae6d6250677b87c31ca21a0098e44b19ee98 # Parent 660aa8e3a00e3c22543f2fcbc61ff0d81287f9cd asm: interp_4tap_vert_ps_6xN sse2 Converted vert_pp_6xN macro to also create ps primitives. This replaces c

[x265] [PATCH 09 of 12] asm: interp_4tap_vert_ps_24xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431993738 25200 # Node ID a40fcb9a2fab1a4bd76e995b77a82779450e3082 # Parent b518e087f849d391dafab2e3f3aa50c8b211fa3b asm: interp_4tap_vert_ps_24xN sse2 Converted vert_pp_24xN macro to also create ps primitives. This replaces c

[x265] [PATCH 07 of 12] asm: interp_4tap_vert_ps_12xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431993275 25200 # Node ID 1041ae2d7f4d9f005a36fbcbf6cf752d480c # Parent 6a2fe809ba56b8f54de9d09603b57e6887735eab asm: interp_4tap_vert_ps_12xN sse2 Converted vert_pp_12xN macro to also create ps primitives. This replaces c

[x265] [PATCH 10 of 12] asm: interp_4tap_vert_ps_32xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431994043 25200 # Node ID 91010ea886c50f9802c1ab872bd2648041137e19 # Parent a40fcb9a2fab1a4bd76e995b77a82779450e3082 asm: interp_4tap_vert_ps_32xN sse2 Converted vert_pp_32xN macro to also create ps primitives. This replaces c

[x265] [PATCH 08 of 12] asm: interp_4tap_vert_ps_16xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431993530 25200 # Node ID b518e087f849d391dafab2e3f3aa50c8b211fa3b # Parent 1041ae2d7f4d9f005a36fbcbf6cf752d480c asm: interp_4tap_vert_ps_16xN sse2 Converted vert_pp_16xN macro to also create ps primitives. This replaces c

[x265] [PATCH 12 of 12] Call macros to reduce code size of primitive setup

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431994972 25200 # Node ID b7da9b5e994f52e5beace23d99bb35e0c62e7973 # Parent a1ae3a91f5e011753017db8579296b5702439579 Call macros to reduce code size of primitive setup diff -r a1ae3a91f5e0 -r b7da9b5e994f

[x265] [PATCH 06 of 12] asm: interp_4tap_vert_ps_8xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431992999 25200 # Node ID 6a2fe809ba56b8f54de9d09603b57e6887735eab # Parent e1d6fc3777bca3f72bf04367a345ad91124d0bc5 asm: interp_4tap_vert_ps_8xN sse2 Converted vert_pp_8xN macro to also create ps primitives. This replaces c

[x265] [PATCH 03 of 12] asm: interp_4tap_vert_ps_4xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431990978 25200 # Node ID 660aa8e3a00e3c22543f2fcbc61ff0d81287f9cd # Parent 54423715e7a28e0ca5874649ebe1a999e4d93463 asm: interp_4tap_vert_ps_4xN sse2 Converted vert_pp_4xN macro to also create ps primitives. This replaces c

[x265] [PATCH 05 of 12] asm: interp_4tap_vert_ps_8xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431992044 25200 # Node ID e1d6fc3777bca3f72bf04367a345ad91124d0bc5 # Parent 2f44ae6d6250677b87c31ca21a0098e44b19ee98 asm: interp_4tap_vert_ps_8xN sse2 Converted vert_pp_8xN macro to also create ps primitives. This replaces c

[x265] [PATCH 00 of 12 ] asm: interp_4tap_vert_ps sse2

2015-05-18 Thread dtyx265
Modified interp_4tap_vert_pp macros to also generate interp_4tap_vert_ps primitives. Invoked macros to setup primitives to reduce code size. ___ x265-devel mailing list x265-devel@videolan.org https://mailman.videolan.org/listinfo/x265-devel
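To illustrate the idea of one macro emitting both the pp (pixel to pixel) and ps (pixel to short) variants, here is a minimal C sketch. It is not the assembly from the patch series. The names DEFINE_VERT, STORE_PP, STORE_PS, kTaps, vert_pp_sketch and vert_ps_sketch are hypothetical, and the rounding (add 32, shift by 6) and the ps offset (-8192) are assumptions that correspond to 8-bit pixels with a 14-bit intermediate precision.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of the HEVC chroma 4-tap coefficients:
   fractional position 0 (copy) and position 4 (half sample). */
static const int16_t kTaps[2][4] = { { 0, 64, 0, 0 }, { -4, 36, 36, -4 } };

/* Clamp to the 8-bit pixel range. */
static inline uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* 4-tap vertical filter at one position: rows -1, 0, +1, +2. */
static inline int tap4_vert(const uint8_t *src, ptrdiff_t stride, const int16_t *c)
{
    return c[0] * src[-stride] + c[1] * src[0] + c[2] * src[stride] + c[3] * src[2 * stride];
}

/* pp: round and shift back to pixel range (assumed 8-bit depth). */
#define STORE_PP(s) clip_u8(((s) + 32) >> 6)
/* ps: keep full precision and subtract the internal offset (assumed 8192). */
#define STORE_PS(s) ((s) - 8192)

/* One macro body, two primitives: only the output type and the final
   store step differ between pp and ps. */
#define DEFINE_VERT(NAME, OUT_T, STORE)                                   \
static void NAME(const uint8_t *src, ptrdiff_t srcStride,                 \
                 OUT_T *dst, ptrdiff_t dstStride,                         \
                 int width, int height, const int16_t *c)                 \
{                                                                         \
    for (int y = 0; y < height; y++)                                      \
        for (int x = 0; x < width; x++) {                                 \
            int sum = tap4_vert(src + y * srcStride + x, srcStride, c);   \
            dst[y * dstStride + x] = (OUT_T)STORE(sum);                   \
        }                                                                 \
}

DEFINE_VERT(vert_pp_sketch, uint8_t, STORE_PP)
DEFINE_VERT(vert_ps_sketch, int16_t, STORE_PS)
```

With a flat input of 100 and the half-sample taps, the filter sum is 6400, so the pp variant stores 100 and the ps variant stores -1792 under the assumptions above.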

[x265] [PATCH 02 of 12] asm: interp_4tap_vert_ps_4x2 sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431990450 25200 # Node ID 54423715e7a28e0ca5874649ebe1a999e4d93463 # Parent 16ec2193116749c053ebf4fd08a15aa403305a0b asm: interp_4tap_vert_ps_4x2 sse2 Converted vert_pp_4x2 primitive to macro that also creates ps. This replaces

[x265] [PATCH 01 of 12] asm: interp_4tap_vert_ps_2xN sse2

2015-05-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431990059 25200 # Node ID 16ec2193116749c053ebf4fd08a15aa403305a0b # Parent 8592bf81d0848279fa79cd1487406cb516dffe99 asm: interp_4tap_vert_ps_2xN sse2 Updated vert_pp_2xN macro to also create ps. This replaces c code for ps with

[x265] [PATCH 07 of 12] asm: interp_4tap_vert_ps_12xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431914441 25200 # Node ID b6a91319ebe4a777f20a52a9d0ef801c087a19e2 # Parent 2a91f18790caee8c3b77838f04ae131acc2544b2 asm: interp_4tap_vert_ps_12xN sse2 Converted vert_pp_12xN macro to also create ps primitives. This replaces c

[x265] [PATCH 11 of 12] asm: interp_4tap_vert_ps_64xN and interp_4tap_vert_ps_48x64 sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431916172 25200 # Node ID 4e4a2fcde1c526f85365d852c30b663a4a7fdad4 # Parent 7b6544657015334e0fa571fc369ac08c401e9619 asm: interp_4tap_vert_ps_64xN and interp_4tap_vert_ps_48x64 sse2 Converted vert_pp_64xN macro to also create ps

[x265] [PATCH 01 of 12] asm: interp_4tap_vert_ps_2xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431911009 25200 # Node ID 465fb4340a241e501b53a6241f5ae81c29ba073a # Parent 8592bf81d0848279fa79cd1487406cb516dffe99 asm: interp_4tap_vert_ps_2xN sse2 Updated vert_pp_2xN macro to also create ps. This replaces c code for ps with

[x265] [PATCH 00 of 12 ] asm: interp_4tap_vert_ps

2015-05-17 Thread dtyx265
Modified interp_4tap_vert_pp macros to also generate interp_4tap_vert_ps primitives. Invoked macros to setup primitives to reduce code size.

[x265] [PATCH 06 of 12] asm: interp_4tap_vert_ps_8xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431914024 25200 # Node ID 2a91f18790caee8c3b77838f04ae131acc2544b2 # Parent 4018cf6354c4524ec0d0409ade3de01e19f92364 asm: interp_4tap_vert_ps_8xN sse2 Converted vert_pp_8xN macro to also create ps primitives. This replaces c

[x265] [PATCH 04 of 12] asm: interp_4tap_vert_ps_6xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431912814 25200 # Node ID 5e00387a0130682e7467649bb6b25c0d11a88343 # Parent c2624b61f4c7d894616a7dc1e8a6cc1c0a506028 asm: interp_4tap_vert_ps_6xN sse2 Converted vert_pp_6xN macro to also create ps primitives. This replaces c

[x265] [PATCH 03 of 12] asm: interp_4tap_vert_ps_4xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431912252 25200 # Node ID c2624b61f4c7d894616a7dc1e8a6cc1c0a506028 # Parent 72bba6b9e99739599d04be62c7e02a3c8faa asm: interp_4tap_vert_ps_4xN sse2 Converted vert_pp_4xN macro to also create ps primitives. This replaces c

[x265] [PATCH 12 of 12] Call macros to reduce code size of primitive setup

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431916884 25200 # Node ID cc8bb6db5d323c77803ecceee271d80b93508db0 # Parent 4e4a2fcde1c526f85365d852c30b663a4a7fdad4 Call macros to reduce code size of primitive setup diff -r 4e4a2fcde1c5 -r cc8bb6db5d32

[x265] [PATCH 09 of 12] asm: interp_4tap_vert_ps_24xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431915216 25200 # Node ID 6b04fb941142b809bbf7326a115a859ce3a08d84 # Parent f8333270d592d9ada2318fb3286ca884b13d3249 asm: interp_4tap_vert_ps_24xN sse2 Converted vert_pp_24xN macro to also create ps primitives. This replaces c

[x265] [PATCH 08 of 12] asm: interp_4tap_vert_ps_16xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431914811 25200 # Node ID f8333270d592d9ada2318fb3286ca884b13d3249 # Parent b6a91319ebe4a777f20a52a9d0ef801c087a19e2 asm: interp_4tap_vert_ps_16xN sse2 Converted vert_pp_16xN macro to also create ps primitives. This replaces c

[x265] [PATCH 05 of 12] asm: interp_4tap_vert_ps_8xN sse2

2015-05-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431913291 25200 # Node ID 4018cf6354c4524ec0d0409ade3de01e19f92364 # Parent 5e00387a0130682e7467649bb6b25c0d11a88343 asm: interp_4tap_vert_ps_8xN sse2 Converted vert_pp_8xN macro to also create ps primitives. This replaces c

[x265] [PATCH 0 of 2 ] asm: interp_4tap_vert_pp sse2

2015-05-12 Thread dtyx265
This replaces c code for all of 32xN, 48xN and 64xN

[x265] [PATCH 2 of 2] asm: interp_4tap_vert_pp sse2

2015-05-12 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431471728 25200 # Node ID 7f01a6dd81d9ec93b858e8b9328b1ee4e3b19c81 # Parent 53edc0b1c6b0c91c114f50164b68273a3b720e78 asm: interp_4tap_vert_pp sse2 This replaces c code for 48x64, 64x16, 64x32, 64x48 and 64x64 64-bit

[x265] [PATCH] asm: interp_4tap_horiz_pp sse3

2015-05-11 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431368383 25200 # Node ID f43b8e01ab507ac36825128322e02a1e06b7cd01 # Parent 3700169eb622204e7476d8b56772771b4f4e52c1 asm: interp_4tap_horiz_pp sse3 Reduce code size with macros move sse4 macro closer to sse4 code There are no

[x265] [PATCH 3 of 3] asm: interp_4tap_vert_pp sse2

2015-05-11 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431394769 25200 # Node ID 7c523bee141534fa2e896aeb7b7e1727ee8f6480 # Parent 7a81847ae6a2b1a00c2f3f2eb00ba8ed0a475958 asm: interp_4tap_vert_pp sse2 This replaces c code for 24x32 and 24x64 64-bit ./test/TestBench --testbench

[x265] [PATCH 2 of 3] asm: interp_4tap_vert_pp sse2

2015-05-11 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431394073 25200 # Node ID 7a81847ae6a2b1a00c2f3f2eb00ba8ed0a475958 # Parent 4f8da861a78953a4f16828f14e16c445d3b22cef asm: interp_4tap_vert_pp sse2 This replaces c code for 16x4, 16x8, 16x12, 16x16, 16x24, 16x32 and 16x64 64-bit

[x265] [PATCH 1 of 3] asm: interp_4tap_vert_pp sse2

2015-05-11 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431393193 25200 # Node ID 4f8da861a78953a4f16828f14e16c445d3b22cef # Parent f43b8e01ab507ac36825128322e02a1e06b7cd01 asm: interp_4tap_vert_pp sse2 This replaces c code for 12x16 and 12x32 64-bit ./test/TestBench --testbench

[x265] [PATCH 0 of 3 ] asm: interp_4tap_vert_pp sse2

2015-05-11 Thread dtyx265
This code replaces c code for all of 12xN, 16xN and 24xN

[x265] [PATCH 3 of 5] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431117274 25200 # Node ID 9ca14d0867a3d9d10e8d74d9a4ec1431f937a1fd # Parent 80223399435bd842f2cb66cd4178c863397c4beb asm: interp_4tap_vert_pp sse2 This replaces c code for 4x4, 4x8, 4x16 and 4x32 64-bit ./test/TestBench

[x265] [PATCH 0 of 5 ] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dtyx265
The following patches replace c code for interp_4tap_vert_pp sse2

[x265] [PATCH 5 of 5] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431118502 25200 # Node ID 81a0388ef700c1e773c1b9af3a500c515e27bcc3 # Parent 6dc172cd4d9d6625d0b2d6d114616c1ee664c83b asm: interp_4tap_vert_pp sse2 This replaces c code for 8x2, 8x4 and 8x6 for 64-bit only 64-bit

[x265] [PATCH 2 of 5] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431115701 25200 # Node ID 80223399435bd842f2cb66cd4178c863397c4beb # Parent 59e2e1b446e14adc8a7dae184bae9b241cc7a2f1 asm: interp_4tap_vert_pp sse2 This replaces c code for 4x2 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 1 of 5] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431115282 25200 # Node ID 59e2e1b446e14adc8a7dae184bae9b241cc7a2f1 # Parent 57fce553135247caaafd415458d457d7f8696cd0 asm: interp_4tap_vert_pp sse2 This replaces c code for 2x4, 2x8 and 2x16 64-bit ./test/TestBench --testbench

[x265] [PATCH 4 of 5] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431117805 25200 # Node ID 6dc172cd4d9d6625d0b2d6d114616c1ee664c83b # Parent 9ca14d0867a3d9d10e8d74d9a4ec1431f937a1fd asm: interp_4tap_vert_pp sse2 This replaces c code for 6x8 and 6x16 for 64-bit only 64-bit ./test/TestBench

[x265] [PATCH] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431137669 25200 # Node ID 1a6ce5886cdbf56024464532f48045a31f097d83 # Parent 81a0388ef700c1e773c1b9af3a500c515e27bcc3 asm: interp_4tap_vert_pp sse2 This code replaces c code for 8x8, 8x12, 8x16, 8x32 and 8x64 64-bit

[x265] [PATCH 0 of 3 ] asm: interp_4tap_vert_pp sse2

2015-05-06 Thread dtyx265
These patches replace c code and cover all of 2xN and 4xN

[x265] [PATCH 1 of 3] asm: interp_4tap_vert_pp sse2

2015-05-06 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430940440 25200 # Node ID 4690c9aa24caa1adb665355803d4c308a124ec96 # Parent 87d6724649df0157786c4210f0caebf961b31341 asm: interp_4tap_vert_pp sse2 This replaces c code for 2x4, 2x8 and 2x16 64-bit ./test/TestBench --testbench

[x265] [PATCH 2 of 3] asm: interp_4tap_vert_pp sse2

2015-05-06 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430940898 25200 # Node ID ae9d689acc97b9165acfee5c32b257cc3823492f # Parent 4690c9aa24caa1adb665355803d4c308a124ec96 asm: interp_4tap_vert_pp sse2 This replaces c code for 4x2 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 3 of 3] asm: interp_4tap_vert_pp sse2

2015-05-06 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430941437 25200 # Node ID e08bcc7339b0c611728dd0245028f1b0c72c8bee # Parent ae9d689acc97b9165acfee5c32b257cc3823492f asm: interp_4tap_vert_pp sse2 This replaces c code for 4x4, 4x8, 4x16 and 4x32 64-bit ./test/TestBench

[x265] [PATCH] asm: interp_4tap_vert_pp sse2

2015-05-05 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430875749 25200 # Node ID 9452c826eb205682647ee0db4d8d445785bb7a1a # Parent f32e6464225afa02983af1b1905f50cdccae5244 asm: interp_4tap_vert_pp sse2 This replaces c code for 2x4, 2x8 and 2x16 64-bit ./test/TestBench --testbench

[x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-05-01 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430505170 25200 # Node ID 705b796531bb7c83c908df396ecac44ed007f642 # Parent bca33880585aec616107a8232204dbcb148f6678 asm: interp_8tap_horiz pp and ps sse2 This replaces c code and covers 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 8x32,

[x265] [PATCH] asm: interp_8tap_hv_pp_8x8 sse3

2015-04-29 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430361608 25200 # Node ID f95cc094467c844c6607c67d330748d171d26483 # Parent 9a1b8b71bc997547044f42992e1eb7f3572f03f1 asm: interp_8tap_hv_pp_8x8 sse3 This replaces c code 64-bit ./test/TestBench --testbench interp | grep hv

[x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-29 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430321025 25200 # Node ID 9a1b8b71bc997547044f42992e1eb7f3572f03f1 # Parent e9df93f380664932e7d6c7e85b2cae16cd5e1dcd asm: interp_8tap_horiz pp and ps sse2 This replaces c code and covers 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 8x32,

[x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-28 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430273620 25200 # Node ID 9b0181193b6a2c64dad26d1749fb6a0e6cf87240 # Parent e9df93f380664932e7d6c7e85b2cae16cd5e1dcd asm: interp_8tap_horiz pp and ps sse2 This replaces c code and covers 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 8x32,

[x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-27 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430182995 25200 # Node ID 31b76bd430a47411f7b2ebaa7cfbb44e25c5ff60 # Parent 68a13226d586b335c02cade9311e093f0149c42a asm: interp_8tap_horiz pp and ps sse2 This replaces c code and covers 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 8x32,

[x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-27 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430194906 25200 # Node ID 027aaded20e76e719bb6b143de41b8739fb68b9e # Parent 68a13226d586b335c02cade9311e093f0149c42a asm: interp_8tap_horiz pp and ps sse2 This replaces c code and covers 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 8x32,

[x265] [PATCH] Invoke macro to setup sse2 intrapred dc and planar primitives

2015-04-22 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429736780 25200 # Node ID 9564719ccfde7e7f2c4ab8c942604cb8865ce64d # Parent 859daedfbb29703ef67b3b65b069a0ff683c1828 Invoke macro to setup sse2 intrapred dc and planar primitives No functionality is altered, only the code size is

[x265] [PATCH] asm: interp_4tap_horiz_pp sse3

2015-04-21 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429665160 25200 # Node ID defd1cf26749f3395750ef9128c9a90bfa2caf78 # Parent c135c117ffb083a00d4353279ea669e8f3f7a8ee asm: interp_4tap_horiz_pp sse3 This replaces c code for 6x8, 6x16, 8x2, 8x4, 8x6, 8x8, 8x12, 8x16, 8x32, 8x64,

[x265] [PATCH] asm: leading space nit

2015-04-18 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429376539 25200 # Node ID 14b0bed44a7bc2f36b357a198104dd1cfaa4214c # Parent 3ec6052eaf9c1c1e3a280fa6d3fb392902b2a849 asm: leading space nit Added leading 4 spaces to asm instructions diff -r 3ec6052eaf9c -r 14b0bed44a7b

[x265] [PATCH 7 of 8] asm: interp_4tap_horiz_pp_4x16_sse3

2015-04-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429288614 25200 # Node ID ca6ecc17db77dd60043de682345687f09aed16f3 # Parent 30ecb4d9ee23cd17a0c315c0854a80ed62f59d68 asm: interp_4tap_horiz_pp_4x16_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 6 of 8] asm: interp_4tap_horiz_pp_4x8_sse3

2015-04-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429288383 25200 # Node ID 30ecb4d9ee23cd17a0c315c0854a80ed62f59d68 # Parent afc42e4ffe02b0dde666bc859e907296cc0b95a7 asm: interp_4tap_horiz_pp_4x8_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 4 of 8] asm: interp_4tap_horiz_pp_4x2_sse3

2015-04-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429287675 25200 # Node ID 722f24e9ddcb900ea48f658fee33bc05b67b6385 # Parent df614d8d2b7bdfd45748722c91d7d7eaecf5cb57 asm: interp_4tap_horiz_pp_4x2_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 1 of 8] asm: interp_4tap_horiz_pp_2x4_sse3

2015-04-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429286699 25200 # Node ID 34b0ec5fbc8695afd640821f8a9aba0e30a4253a # Parent 7be1172ec816298c32f588908e1b6f0fa214d349 asm: interp_4tap_horiz_pp_2x4_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 3 of 8] asm: interp_4tap_horiz_pp_2x16_sse3

2015-04-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429287304 25200 # Node ID df614d8d2b7bdfd45748722c91d7d7eaecf5cb57 # Parent 3d43e6ad9a958794a2809fe46e949539c28523ce asm: interp_4tap_horiz_pp_2x16_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 8 of 8] asm: interp_4tap_horiz_pp_4x32_sse3

2015-04-17 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429288806 25200 # Node ID 25164af79514fdda4bb248b6c3319271696bba96 # Parent ca6ecc17db77dd60043de682345687f09aed16f3 asm: interp_4tap_horiz_pp_4x32_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 0 of 8 ] asm: interp_4tap_horiz_pp

2015-04-17 Thread dtyx265
This code was backported from sse4 code. 2x4, 2x8, 2x16, 4x2, 4x4, 4x8, 4x16 and 4x32 are covered. The macros use only sse2 instructions; the primitives use movddup (sse3), which could easily be replaced if sse2 primitives are needed.
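As a rough illustration of why movddup is easy to replace, the scalar C model below compares it with one possible SSE2 sequence, movq followed by punpcklqdq. This is a sketch of the instruction semantics only, not code from the patches; the xmm_model type and all function names are hypothetical, and the choice of movq plus punpcklqdq as the replacement is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* A 128-bit register modeled as two 64-bit lanes (lane[0] is the low half). */
typedef struct { uint64_t lane[2]; } xmm_model;

/* SSE3 movddup xmm, m64: load 64 bits and copy them into both lanes. */
static xmm_model movddup_model(const void *mem)
{
    xmm_model r;
    memcpy(&r.lane[0], mem, sizeof r.lane[0]);
    r.lane[1] = r.lane[0];
    return r;
}

/* SSE2 movq xmm, m64: load 64 bits into the low lane, zero the high lane. */
static xmm_model movq_model(const void *mem)
{
    xmm_model r;
    memcpy(&r.lane[0], mem, sizeof r.lane[0]);
    r.lane[1] = 0;
    return r;
}

/* SSE2 punpcklqdq dst, src: keep dst's low lane, put src's low lane on top. */
static xmm_model punpcklqdq_model(xmm_model dst, xmm_model src)
{
    dst.lane[1] = src.lane[0];
    return dst;
}

/* The assumed SSE2-only replacement: movq, then punpcklqdq with itself. */
static xmm_model movddup_sse2_model(const void *mem)
{
    xmm_model r = movq_model(mem);
    return punpcklqdq_model(r, r);
}
```

Both models leave the same 64-bit value in each lane; the SSE2 form costs one extra instruction.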

[x265] [PATCH 4 of 8] asm: interp_4tap_horiz_pp_4x2_sse3

2015-04-16 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429214817 25200 # Node ID b1f1c0780d756f1680dbedde1e1e3a27e3c067c5 # Parent d53d53e9e19ebe1c5945909be1985fdad65369e6 asm: interp_4tap_horiz_pp_4x2_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 1 of 8] asm: interp_4tap_horiz_pp_2x4_sse3

2015-04-16 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429212383 25200 # Node ID 080522d961cc55becc26dfd192e028a8782087eb # Parent 7be1172ec816298c32f588908e1b6f0fa214d349 asm: interp_4tap_horiz_pp_2x4_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 6 of 8] asm: interp_4tap_horiz_pp_4x8_sse3

2015-04-16 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429215736 25200 # Node ID c6cdd6f4f47b7fa125d34abad2f93af1bcc97cce # Parent f476d7556132462c92264e98254e59143a6ae85e asm: interp_4tap_horiz_pp_4x8_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 8 of 8] asm: interp_4tap_horiz_pp_4x32_sse3

2015-04-16 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429216277 25200 # Node ID 66ebb65fbcbf24f56934ae69471e84fdc131b7c0 # Parent 707a65570cb93e06bb743bb11a9178a92ed7bfb3 asm: interp_4tap_horiz_pp_4x32_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 7 of 8] asm: interp_4tap_horiz_pp_4x16_sse3

2015-04-16 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1429216044 25200 # Node ID 707a65570cb93e06bb743bb11a9178a92ed7bfb3 # Parent c6cdd6f4f47b7fa125d34abad2f93af1bcc97cce asm: interp_4tap_horiz_pp_4x16_sse3 This replaces c code. 64-bit ./test/TestBench --testbench interp | grep

[x265] [PATCH 0 of 8 ] asm: interp_4tap_horiz_pp

2015-04-16 Thread dtyx265
This code was backported from sse4 code. 2x4, 2x8, 2x16, 4x2, 4x4, 4x8, 4x16 and 4x32 are covered. The macros use only sse2 instructions; the primitives use movddup (sse3), which could easily be replaced if sse2 primitives are needed.

[x265] [PATCH] asm: intra pred all_angs_pred_4x4 sse2

2015-04-13 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428958891 25200 # Node ID 9a581851fd66679eca3175921b6eef428cdec1ce # Parent 4cccf22b00ee188a72c8dc3896d7dc1613d855ad asm: intra pred all_angs_pred_4x4 sse2 This replaces c code and is backported from sse4 The processing of modes

[x265] [PATCH] asm: intra pred all_angs_pred_4x4 sse2

2015-04-13 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428959599 25200 # Node ID f241399b3494455e4a40b8fcf693e4029b68c347 # Parent 4cccf22b00ee188a72c8dc3896d7dc1613d855ad asm: intra pred all_angs_pred_4x4 sse2 This replaces c code and is backported from sse4 The processing of modes

[x265] [PATCH] asm: intra pred all_angs_pred_4x4 sse2

2015-04-12 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428887468 25200 # Node ID 364b13ff264fc26358879d872817c962303e2150 # Parent 4cccf22b00ee188a72c8dc3896d7dc1613d855ad asm: intra pred all_angs_pred_4x4 sse2 This replaces c code and is backported from sse4 The processing of modes

[x265] [PATCH] asm: intra pred all_angs_pred_4x4 sse2

2015-04-10 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428717487 25200 # Node ID c40653978caea4a4bf8940ae3b0e8db74bbe07d7 # Parent ee76a15fa312ac59549965821d9cbff03237226f asm: intra pred all_angs_pred_4x4 sse2 This replaces c code and is backported from sse4 The processing of modes

[x265] [PATCH] asm: intra_pred_ang4_26_sse2

2015-04-05 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428235938 25200 # Node ID ecf0b18b4de346d766cd9593e0afeb077671728a # Parent 79611624ed8fcaa312473687567ffde76543c417 asm: intra_pred_ang4_26_sse2 changed r1 to r1d to reduce code size diff -r 79611624ed8f -r ecf0b18b4de3

[x265] [PATCH] asm: intra_pred_ang4_26_sse2

2015-04-04 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428173713 25200 # Node ID 35aec3dbe0525cad654f1a9b777a7466cce944ee # Parent 79611624ed8fcaa312473687567ffde76543c417 asm: intra_pred_ang4_26_sse2 changed r1 to r1d to reduce code size diff -r 79611624ed8f -r 35aec3dbe052

[x265] [PATCH] asm: intra_pred_ang4_18

2015-04-04 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428174059 25200 # Node ID 78707e94519607d5c31eacd03bc72e215407b5e2 # Parent 35aec3dbe0525cad654f1a9b777a7466cce944ee asm: intra_pred_ang4_18 Changed third pshuflw parameter from hexadecimal to quaternary The value is the
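The pshuflw immediate packs four 2-bit source indices, so writing it in base 4 makes each digit name one source word directly. The scalar C model below is a sketch of that encoding, not the code from the patch; pshuflw_model and the QUAT helper are hypothetical names, and the example value is illustrative rather than the one the patch changed.

```c
#include <stdint.h>

/* Scalar model of SSE2 pshuflw: permute the four low 16-bit words of an
   8-word register; the four high words are copied unchanged. Bits
   [2*i+1 : 2*i] of imm8 select the source word for destination word i. */
static void pshuflw_model(uint16_t dst[8], const uint16_t src[8], unsigned imm8)
{
    uint16_t tmp[8];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 4; i < 8; i++)
        tmp[i] = src[i];
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];
}

/* Build the immediate from four base-4 digits, most significant first.
   The rightmost digit picks the source for destination word 0. */
#define QUAT(d3, d2, d1, d0) (((d3) << 6) | ((d2) << 4) | ((d1) << 2) | (d0))
```

For example, the quaternary digits 0, 1, 2, 3 give the same immediate as hexadecimal 0x1B and reverse the four low words, which is easier to read off from the base-4 form than from the hex value.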

[x265] [PATCH] asm:intra_pred4_x filtering

2015-04-04 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428172954 25200 # Node ID 79611624ed8fcaa312473687567ffde76543c417 # Parent bb771744a75d4493a35ec5e9d76aaee1fa039f28 asm:intra_pred4_x filtering Use r4 to hold address of constant to reduce code size diff -r bb771744a75d -r

[x265] [PATCH 12 of 18] asm: intra_pred_ang4_12_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428076587 25200 # Node ID b0e5af8de87a8b1db1c7e2f71abb5246124df8e0 # Parent dfcd48ac5d09a2a0615d5657328457b08ce12e0e asm: intra_pred_ang4_12_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 07 of 18] asm: intra_pred_ang4_8_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428075423 25200 # Node ID 1873e34b04005449ecf3ec45d1c0de3a7c54ff86 # Parent 1304bb9e3956f88bf89211db8846693faeb1a666 asm: intra_pred_ang4_8_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 15 of 18] asm: intra_pred_ang4_15_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428077130 25200 # Node ID 9ec19f6d4e3aeb7ae686626438158aa6d67c2cea # Parent 8dbdcc01c0afd466b7c7ef1a5b9b67e246c0c77a asm: intra_pred_ang4_15_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 10 of 18] asm: intra_pred_ang4_26_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428076210 25200 # Node ID 89cf1dc260ad91eba0f4edcacc784be0959636b5 # Parent 95c50ede466fa24a1ff13e5203305b35392a5f64 asm: intra_pred_ang4_26_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 11 of 18] asm: intra_pred_ang4_11_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428076422 25200 # Node ID dfcd48ac5d09a2a0615d5657328457b08ce12e0e # Parent 89cf1dc260ad91eba0f4edcacc784be0959636b5 asm: intra_pred_ang4_11_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 18 of 18] asm: intra_pred_ang4_18_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428078167 25200 # Node ID bb771744a75d4493a35ec5e9d76aaee1fa039f28 # Parent e4b343abfa73af9b68f26283a9e75e161860d7b2 asm: intra_pred_ang4_18_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 17 of 18] asm: intra_pred_ang4_17_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428078042 25200 # Node ID e4b343abfa73af9b68f26283a9e75e161860d7b2 # Parent 97e3406e46539aefb8e1568fb49c452a63554887 asm: intra_pred_ang4_17_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 16 of 18] asm: intra_pred_ang4_16_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428077344 25200 # Node ID 97e3406e46539aefb8e1568fb49c452a63554887 # Parent 9ec19f6d4e3aeb7ae686626438158aa6d67c2cea asm: intra_pred_ang4_16_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 14 of 18] asm: intra_pred_ang4_14_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428076958 25200 # Node ID 8dbdcc01c0afd466b7c7ef1a5b9b67e246c0c77a # Parent 57f635c8bd62590c95d6d0ad1a20ce5270639038 asm: intra_pred_ang4_14_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 05 of 18] asm: intra_pred_ang4_6_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428075078 25200 # Node ID d7fce90aa27134c90120e1210371b56f7d0f9cae # Parent abf5013139d97d61474411ec81738b536051f185 asm: intra_pred_ang4_6_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 01 of 18] asm: intra_pred_ang4_2_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428072972 25200 # Node ID 77edd96a4c1bc61d0bff30c4b2efef5bb8fbe2a1 # Parent 9a5fa67583feb6ffb7668f82632f7e93e5ec9415 asm: intra_pred_ang4_2_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 09 of 18] asm: intra_pred_ang4_10_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428075968 25200 # Node ID 95c50ede466fa24a1ff13e5203305b35392a5f64 # Parent 47dcaffb0a2cbf71efa8ca6eabe45c610901513b asm: intra_pred_ang4_10_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 08 of 18] asm: intra_pred_ang4_9_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428075626 25200 # Node ID 47dcaffb0a2cbf71efa8ca6eabe45c610901513b # Parent 1873e34b04005449ecf3ec45d1c0de3a7c54ff86 asm: intra_pred_ang4_9_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench

[x265] [PATCH 06 of 18] asm: intra_pred_ang4_7_sse2 16-bit

2015-04-03 Thread dtyx265
# HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1428075249 25200 # Node ID 1304bb9e3956f88bf89211db8846693faeb1a666 # Parent d7fce90aa27134c90120e1210371b56f7d0f9cae asm: intra_pred_ang4_7_sse2 16-bit This is backported from sse4 code and replaces c code. ./test/TestBench
