[x265] question about interp

2015-06-18 Thread dave
decision making? thanks, Dave ___ x265-devel mailing list x265-devel@videolan.org https://mailman.videolan.org/listinfo/x265-devel

Re: [x265] question about interp

2015-06-18 Thread dave
-06-19 06:21:02,dave dtyx...@gmail.com wrote: For the interpolation primitives there are multiple versions, pp, ps, sp, and ss. Each one adding an offset before a shift. Yet as I review the interpolation section 8.5.3.3.3 of the h265 spec, I don't see where an offset is added. Am I missing

Re: [x265] [PATCH] asm: interp_4tap_horiz_pX sse3 10-bit

2015-06-10 Thread dave
string index bug, you have sent a patch to fix it. At 2015-06-10 00:59:03,dave dtyx...@gmail.com wrote: Can you post the errors? I am guessing I need to rebase this on the latest code and resubmit. On 06/08/2015 11:08 PM, Deepthi Nandakumar wrote: Hi, This throws build

Re: [x265] [PATCH] asm: interp_4tap_horiz_pX sse3 10-bit

2015-06-09 Thread dave
: other is fine, we can push it, thanks At 2015-06-05 21:45:01,dave dtyx...@gmail.com wrote: I prefer to keep this patch as sse3/movdqu, is there anything holding this back? On 06/02/2015 09:44 AM, chen wrote: the root cause

Re: [x265] profiling x265

2015-06-08 Thread dave
:51,dave dtyx...@gmail.com wrote: I would like to profile x265 with lttng by using -finstrument-functions but when I add it to CXX flags I get the following when linking, Linking CXX executable x265 /usr/bin/cmake -E cmake_link_script CMakeFiles/cli.dir/link.txt --verbose=1

[x265] profiling x265

2015-06-08 Thread dave
appreciated. thanks, Dave

Re: [x265] profiling x265

2015-06-08 Thread dave
Wow, thanks Min. I did a decent amount of searching on the internet but didn't find this. On 06/08/2015 12:52 PM, chen wrote: no answer in gcc website. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52544 At 2015-06-09 03:26:10,dave dtyx...@gmail.com wrote: cmake/make already has

Re: [x265] [PATCH] asm: interp_4tap_horiz_pX sse3 10-bit

2015-06-05 Thread dave
choice to keep movdqu in here At 2015-06-03 00:28:32,dave dtyx...@gmail.com wrote: On 06/02/2015 09:16 AM, chen wrote: movdqu -> movu, others fine. I am not sure why but movu here fails to build. source/common/x86/ipfilter16.asm:954: error: invalid combination of opcode and operands make[2

[x265] Size vs. speed

2015-06-03 Thread dave
unrolled with %rep, as is the sse4 version, though in a different way. This probably generates considerably larger executables, especially for the larger sizes. Is there any preference on this? Are x265's goals purely performance related over memory usage? thanks, Dave

Re: [x265] [PATCH] asm: interp_4tap_horiz_pX sse3 10-bit

2015-06-02 Thread dave
On 06/02/2015 09:16 AM, chen wrote: movdqu -> movu, others fine. I am not sure why but movu here fails to build. source/common/x86/ipfilter16.asm:954: error: invalid combination of opcode and operands make[2]: *** [common/CMakeFiles/common.dir/x86/ipfilter16.asm.o] Error 1 make[1]: ***

Re: [x265] [PATCH] asm: interp_4tap_horiz_pX sse3 10-bit

2015-06-02 Thread dave
a bug of movu. lddqu can only move up to 64-bits but movu moves 128-bits in 64-bit mode. If fixed, then changing movdqu to movu should cause no problems. At 2015-06-03 00:28:32,dave dtyx...@gmail.com wrote: On 06/02/2015 09:16 AM, chen wrote: movdqu -> movu, others fine. I am not sure why

Re: [x265] [PATCH] asm: interp_8tap_vert_pX sse2

2015-05-29 Thread dave
FYI, If what I submitted performs better than the sse4 code then I suggest either improving the sse4 code with ssse3 and sse4 instructions or removing it. On 05/29/2015 10:12 AM, chen wrote: right,thanks . At 2015-05-30 01:01:15,dtyx...@gmail.com wrote: # HG changeset patch # User David T

Re: [x265] [PATCH] asm: interp_4tap_horiz_ps sse3

2015-05-21 Thread dave
On 05/21/2015 07:07 PM, chen wrote: lea srcq, [srcq + srcstrideq] why not ADD? That's carried over from the sse4 code. I can change it to add if that is what you want. At 2015-05-22 09:27:57,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date

Re: [x265] [PATCH 2 of 2] asm: interp_4tap_vert_pX_4xN sse2

2015-05-19 Thread dave
On 05/19/2015 04:40 PM, chen wrote: Why x64 only? r4 always free in both x86 and x64 It hurts performance in the benchtest for x32 but I can make it cover both if that is what you want. At 2015-05-20 07:33:14,dtyx...@gmail.com wrote: # HG changeset patch # User David

Re: [x265] [PATCH 03 of 12] asm: interp_4tap_vert_ps_4xN sse2

2015-05-18 Thread dave
On 05/18/2015 07:55 AM, chen wrote: At 2015-05-18 10:48:54,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431912252 25200 # Node ID c2624b61f4c7d894616a7dc1e8a6cc1c0a506028 # Parent 72bba6b9e99739599d04be62c7e02a3c8faa asm:

Re: [x265] [PATCH 02 of 12] asm: interp_4tap_vert_ps_4x2 sse2

2015-05-18 Thread dave
On 05/18/2015 09:42 AM, chen wrote: At 2015-05-19 00:36:01,dave dtyx...@gmail.com wrote: On 05/18/2015 07:50 AM, chen wrote: At 2015-05-18 10:48:53,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuendtyx...@gmail.com # Date 1431911615 25200 # Node ID

Re: [x265] [PATCH 02 of 12] asm: interp_4tap_vert_ps_4x2 sse2

2015-05-18 Thread dave
On 05/18/2015 07:50 AM, chen wrote: At 2015-05-18 10:48:53,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1431911615 25200 # Node ID 72bba6b9e99739599d04be62c7e02a3c8faa # Parent 465fb4340a241e501b53a6241f5ae81c29ba073a asm:

Re: [x265] [PATCH 02 of 12] asm: interp_4tap_vert_ps_4x2 sse2

2015-05-18 Thread dave
On 05/18/2015 05:27 PM, Steve Borho wrote: On 05/18, dave wrote: On 05/18/2015 09:42 AM, chen wrote: [MC] yes, it is faster on AMD CPU, on Intel, these instructions choke Port5, the PADD execute on Port1. I often choose the faster instruction for Intel because my PC uses an Intel CPU

Re: [x265] [PATCH 0 of 3 ] asm: interp_4tap_vert_pp sse2

2015-05-12 Thread dave
This follows 5f9e5e9. I am not sure if that is helpful, if not then I can resubmit from a clean tree. On 05/12/2015 04:50 AM, Deepthi Nandakumar wrote: Thanks, but I cannot apply this patch. The parent-ID does not exist on the public tree. On Tue, May 12, 2015 at 7:21 AM, chen

Re: [x265] [PATCH 1 of 3] asm: interp_4tap_vert_pp sse2

2015-05-08 Thread dave
On 05/06/2015 01:29 PM, chen wrote: At 2015-05-07 03:45:35,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430940440 25200 # Node ID 4690c9aa24caa1adb665355803d4c308a124ec96 # Parent 87d6724649df0157786c4210f0caebf961b31341 asm: interp_4tap_vert_pp

Re: [x265] [PATCH 1 of 3] asm: interp_4tap_vert_pp sse2

2015-05-06 Thread dave
On 05/06/2015 01:29 PM, chen wrote: At 2015-05-07 03:45:35,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430940440 25200 # Node ID 4690c9aa24caa1adb665355803d4c308a124ec96 # Parent 87d6724649df0157786c4210f0caebf961b31341 asm: interp_4tap_vert_pp

Re: [x265] [PATCH] asm: interp_4tap_vert_pp sse2

2015-05-05 Thread dave
Ignore this, I will be sending a better version. On 05/05/2015 06:32 PM, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1430875749 25200 # Node ID 9452c826eb205682647ee0db4d8d445785bb7a1a # Parent f32e6464225afa02983af1b1905f50cdccae5244 asm:

Re: [x265] [PATCH] asm: interp_8tap_hv_pp_8x8 sse3

2015-05-01 Thread dave
On 05/01/2015 10:03 AM, Steve Borho wrote: On 04/29,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuendtyx...@gmail.com # Date 1430361608 25200 # Node ID f95cc094467c844c6607c67d330748d171d26483 # Parent 9a1b8b71bc997547044f42992e1eb7f3572f03f1 asm: interp_8tap_hv_pp_8x8 sse3

Re: [x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-30 Thread dave
I submitted a patch fixing this but I was only able to test it for 64-bit. When trying to build for 32-bit I ran into the following build error. Linking CXX executable x265 libx265.so.56: undefined reference to `dlsym' libx265.so.56: undefined reference to `dlopen' collect2: error: ld

Re: [x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-30 Thread dave
On 04/30/2015 09:28 AM, Steve Borho wrote: On 04/30, dave wrote: I submitted a patch fixing this but I was only able to test it for 64-bit. When trying to build for 32-bit I ran into the following build error. Linking CXX executable x265 libx265.so.56: undefined reference to `dlsym' libx265.so

Re: [x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-30 Thread dave
On 04/30/2015 12:43 PM, Steve Borho wrote: On 04/30, dave wrote: On 04/30/2015 09:28 AM, Steve Borho wrote: On 04/30, dave wrote: I submitted a patch fixing this but I was only able to test it for 64-bit. When trying to build for 32-bit I ran into the following build error. Linking CXX

Re: [x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-28 Thread dave
On 04/28/2015 06:38 PM, chen wrote: At 2015-04-29 09:30:36,dave dtyx...@gmail.com wrote: On 04/28/2015 06:13 PM, chen wrote: At 2015-04-29 07:49:46,dave dtyx...@gmail.com wrote: On 04/28/2015 03:32 PM, chen wrote: Most part are fine now, just modify about r5, see below

Re: [x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-28 Thread dave
On 04/28/2015 03:32 PM, chen wrote: Most part are fine now, just modify about r5, see below comment At 2015-04-29 06:27:27,dtyx...@gmail.com wrote: # HG changeset patch # User David T yuendtyx...@gmail.com # Date 1430259967 25200 # Node ID 6108fbda1be654a481a78f7ef593518033919674 # Parent

Re: [x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-28 Thread dave
On 04/28/2015 06:13 PM, chen wrote: At 2015-04-29 07:49:46,dave dtyx...@gmail.com wrote: On 04/28/2015 03:32 PM, chen wrote: Most part are fine now, just modify about r5, see below comment At 2015-04-29 06:27:27,dtyx...@gmail.com wrote: # HG changeset patch # User David T

Re: [x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

2015-04-27 Thread dave
On 04/27/2015 08:42 PM, chen wrote: At 2015-04-28 09:05:06,dtyx...@gmail.com wrote: # HG changeset patch # User David T yuendtyx...@gmail.com # Date 1430182995 25200 # Node ID 31b76bd430a47411f7b2ebaa7cfbb44e25c5ff60 # Parent 68a13226d586b335c02cade9311e093f0149c42a asm: interp_8tap_horiz pp

Re: [x265] [PATCH] asm: interp_4tap_horiz_pp sse3

2015-04-21 Thread dave
On 04/21/2015 07:48 PM, chen wrote: At 2015-04-22 09:13:47,dtyx...@gmail.com wrote: # HG changeset patch # User David T yuendtyx...@gmail.com # Date 1429665160 25200 # Node ID defd1cf26749f3395750ef9128c9a90bfa2caf78 # Parent c135c117ffb083a00d4353279ea669e8f3f7a8ee asm: interp_4tap_horiz_pp

Re: [x265] [PATCH 0 of 8 ] asm: interp_4tap_horiz_pp

2015-04-18 Thread dave
On 04/17/2015 10:55 PM, chen wrote: patches are right the code style need modify, the first column reserved for label or preprocess (%) just (4 of 8) macro's style right I sent a patch adding the first column. I also know the RET macro has some line spacing issues. There are a few spots

Re: [x265] [PATCH 00 of 18 ] asm:intra_pred_ang4 16 bit all modes

2015-04-03 Thread dave
On 04/03/2015 12:21 PM, Steve Borho wrote: On Fri, Apr 3, 2015 at 11:24 AM, dtyx...@gmail.com wrote: All modes are backported from sse4 code btw: I believe you would get more 'bang for your buck' if you back-ported the subpel interpolation filter primitives before the rest of the intra

Re: [x265] [PATCH 00 of 11 ] asm:intra pred 4x4 modes 10-18, 26 and transposed modes

2015-04-02 Thread dave
Sorry, I am not sure how this happened. It is supposed to be applied after the mode 10 patch. I think it's some quirk in my TortoiseHG client. On 04/01/2015 09:47 PM, chen wrote: right the mode 26 order fault, apply patches failed. At 2015-04-02 05:14:21,dtyx...@gmail.com wrote: This patch

Re: [x265] [PATCH 1 of 7] asm:intra_pred_ang4_3_sse2 improved ~4.5% 684.95 - 654.99 with nits and tweaks

2015-04-02 Thread dave
On 04/01/2015 09:24 PM, chen wrote: At 2015-04-02 02:52:16,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1427891624 25200 # Node ID 529c6056ccfbce57cd845abec59a2af02812cd57 # Parent 89bc6238d4a2e3f117f0127e406c6dfbf093868b asm:intra_pred_ang4_3_sse2

Re: [x265] [PATCH 3 of 7] asm: intra_pred_ang4_5_sse2 improved ~2.5% 642.50 - 627.50 with nits and tweaks

2015-04-01 Thread dave
please disregard this one. The correct version has been sent On 04/01/2015 11:52 AM, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1427912500 25200 # Node ID fc6b5f8bbcc8283e5b4fd88d41b8c313b002a198 # Parent fc902b84fc7f8dadf56431766adab8eda3520596

Re: [x265] [PATCH] asm:intra_pred_ang4_2 improved by ~4% 134.99 - 129.95 with nits and tweaks

2015-03-30 Thread dave
On 03/30/2015 11:49 AM, Steve Borho wrote: On 03/28, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1427590314 25200 # Node ID ea65ec5a6c969e4cee612faa4d948e3337ed72d1 # Parent 36d70728acc2d9d6103af7530493176c08298ded asm:intra_pred_ang4_2 improved

Re: [x265] [PATCH 3 of 8] asm:intra_pred_ang4_4_sse2 improved ~2% 647.49 - 634.98 with nits and tweaks

2015-03-28 Thread dave
On 03/28/2015 03:20 PM, chen wrote: At 2015-03-29 05:35:21,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1427576216 25200 # Node ID 0a75e3d50518e73f5a199d7519f800a9ff1c2e2c # Parent 6595ba5f989fdd521e268911ddf027665a610e25 asm:intra_pred_ang4_4_sse2

Re: [x265] [PATCH 10 of 11] asm: intra_pred_ang4_18_sse2

2015-03-27 Thread dave
On 03/27/2015 08:08 PM, chen wrote: At 2015-03-28 10:42:50,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1427509819 25200 # Node ID 1ef1d75462375830f47e45d53dcf1c33d458d15e # Parent 89f337fa1d295b161d4800cc28674c9d02d4c6a9 asm:

Re: [x265] [PATCH 1 of 2] asm:intra pred planar32 sse2

2015-03-12 Thread dave
On 03/12/2015 03:16 PM, chen wrote: I use 'pxor m7,m7' to replace your [pb_0], but it is same cycles in IACA, the bottleneck on Port0 Not sure how about performance on old CPU I would have used something like that but there are no available registers by that point. They are used up on holding

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-10 Thread dave
This produces some interesting numbers. sorry, I mixed these two up. incorrect:Without using registers for constants with using registers x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%) encoded 2000 frames in 95.98s (20.84 fps), 1020.04 kb/s incorrect:With using registers for constants

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-10 Thread dave
On 03/09/2015 11:40 PM, Steve Borho wrote: On 03/09, dave wrote: On 03/09/2015 08:25 PM, Steve Borho wrote: On 03/09, dave wrote: Interesting. Performance is almost identical original code /x265 -I 1 --input ~/Videos/bridge-close-cif/bridge-close.y4m -o bridge-close.y4m y4m [info

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-10 Thread dave
On 03/10/2015 08:56 AM, Steve Borho wrote: On 03/10, dave wrote: On 03/09/2015 11:40 PM, Steve Borho wrote: snip No, but the command line option --cu-stats does show how much it is called (but not how long it took) This produces some interesting numbers. Without using registers

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-10 Thread dave
On 03/10/2015 12:12 PM, Steve Borho wrote: On 03/10, dave wrote: This produces some interesting numbers. sorry, I mixed these two up. incorrect:Without using registers for constants with using registers x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%) encoded 2000 frames in 95.98s (20.84

Re: [x265] [PATCH 1 of 2] asm:intra pred planar32 sse2

2015-03-09 Thread dave
On 03/09/2015 04:01 PM, chen wrote: if no performance drop, we may write as new format OK. Actually, since the sse4 version works in double words, this version is probably faster on your Haswell. At 2015-03-10 07:00:08,dave dtyx...@gmail.com wrote: On 03/09/2015 03:20 PM, chen wrote

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-09 Thread dave
On 03/09/2015 03:23 PM, chen wrote: At 2015-03-10 05:24:09,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1425935926 25200 # Node ID 4d1d54d28cb1635448ceec764c793d1b37cef7a4 # Parent ef383507c21ca704b58e759e440f4ae2e177c499 asm:intra pred planar32

Re: [x265] [PATCH 1 of 2] asm:intra pred planar32 sse2

2015-03-09 Thread dave
On 03/09/2015 03:20 PM, chen wrote: a little improve need integrate lea r0, [r0 + r1] when I replace it by add r0, r1, I got ~3% improve on my Haswell PC This makes no difference on my old machine. At 2015-03-10 05:24:08,dtyx...@gmail.com wrote: # HG changeset patch # User David T

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-09 Thread dave
On 03/09/2015 04:09 PM, chen wrote: At 2015-03-10 07:06:26,dave dtyx...@gmail.com wrote: On 03/09/2015 03:23 PM, chen wrote: At 2015-03-10 05:24:09,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuendtyx...@gmail.com # Date 1425935926 25200 # Node ID

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-09 Thread dave
On 03/09/2015 08:25 PM, Steve Borho wrote: On 03/09, dave wrote: Interesting. Performance is almost identical original code /x265 -I 1 --input ~/Videos/bridge-close-cif/bridge-close.y4m -o bridge-close.y4m y4m [info]: 352x288 fps 30/1 i420p8 frames 0 - 1999 of 2000 x265 [info]: HEVC encoder

Re: [x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

2015-03-09 Thread dave
On 03/09/2015 04:34 PM, chen wrote: At 2015-03-10 07:31:13,dave dtyx...@gmail.com wrote: On 03/09/2015 04:09 PM, chen wrote: At 2015-03-10 07:06:26,dave dtyx...@gmail.com wrote: On 03/09/2015 03:23 PM, chen wrote: At 2015-03-10 05:24:09,dtyx...@gmail.com wrote

Re: [x265] [PATCH] asm: improve on intra_dc32

2015-03-06 Thread dave
On 03/06/2015 05:07 PM, Min Chen wrote: # HG changeset patch # User Min Chen chenm...@163.com # Date 1425690429 28800 # Node ID 63d132c844b9d299081b40e7589275b78fe71093 # Parent 043c2418864b0a3ada6f597e6def6ead73d90b5f asm: improve on intra_dc32 --- source/common/x86/intrapred8.asm | 71

Re: [x265] [PATCH] asm: improve on intra_dc32

2015-03-06 Thread dave
On 03/06/2015 04:06 PM, chen wrote: At 2015-03-07 07:58:13,dave dtyx...@gmail.com wrote: On 03/06/2015 05:07 PM, Min Chen wrote: # HG changeset patch # User Min Chen chenm...@163.com # Date 1425690429 28800 # Node ID 63d132c844b9d299081b40e7589275b78fe71093 # Parent

[x265] intrapred sse2

2015-03-05 Thread dave
I can resubmit patches for intra pred sse2 not yet accepted based on the new tip if that is helpful. Either all in one patch or individually if that is preferred. So far, dc4, 8, 16, 32 and planar8 for intrapred16.asm and dc32 and planar8 for intrapred8.asm. I also have planar16 ready.

Re: [x265] [PATCH 4 of 9] asm:intra pred dc32 sse2

2015-03-05 Thread dave
On 03/05/2015 06:02 PM, chen wrote: At 2015-03-06 08:19:57,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1425594719 28800 # Node ID 912c42dcb4d9b399515e6c1ed6be70db3bf5f675 # Parent c5fa433ffda0a95889e99f4df787f3edc5880d0f asm:intra pred dc32 sse2

Re: [x265] [PATCH] asm:intra pred planar8 sse2

2015-03-04 Thread dave
On 03/04/2015 06:44 PM, chen wrote: At 2015-03-05 10:03:59,dave dtyx...@gmail.com wrote: On 03/04/2015 04:39 PM, chen wrote: At 2015-03-05 07:54:02,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuendtyx...@gmail.com # Date 1425512599 28800 # Node ID

Re: [x265] [PATCH] asm:intra pred planar8 sse2

2015-03-04 Thread dave
On 03/04/2015 04:39 PM, chen wrote: At 2015-03-05 07:54:02,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1425512599 28800 # Node ID 16880e791046ef8470f8307b76aae57c3be573c1 # Parent c53b456ad909eeab8d83f8e0817e641d174cc706 asm:intra pred planar8 sse2

Re: [x265] [PATCH] asm:intra pred planar8 sse2 high bit

2015-03-04 Thread dave
On 03/04/2015 07:12 PM, Steve Borho wrote: On 03/04, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1425513096 28800 # Node ID 243cb00021a3bbc3184cbc2b27a5dbe56d745c51 # Parent bf095f7deed80f741663edae2b3a90bb4f63a980 asm:intra pred planar8 sse2 high

Re: [x265] [PATCH] asm: intrapred dc32 sse2 high bit

2015-03-03 Thread dave
On 03/03/2015 07:03 AM, chen wrote: right, with some comment below Thanks, I will submit a new patch. At 2015-03-03 12:03:23,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1425355260 28800 # Node ID 48ca9a0c131c99d54778515dc5a6a5a7a9197153 # Parent

Re: [x265] [PATCH] asm: intrapred dc32 sse2 high bit

2015-03-03 Thread dave
oops, forgot to remove old comment, will resubmit... On 03/03/2015 09:40 AM, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1425404383 28800 # Node ID 1699595e179d7c131ab24a9af48683a8da2a4d79 # Parent 4641827f98c935603f608425de8c76785aef1114 asm:

Re: [x265] [PATCH] asm:intra pred planar4 sse2 high bit

2015-03-03 Thread dave
The sse4 intra pred planar4 is actually sse2 so this is the same thing with a few minor tweaks. There's probably more room for improvement of the sse4 version with ssse3+ instructions. On 03/03/2015 06:47 PM, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com

Re: [x265] [PATCH] asm: intrapred dc8 sse2 high bit

2015-02-26 Thread dave
On 02/26/2015 02:43 PM, Steve Borho wrote: On 02/26, dave wrote: On 02/25/2015 05:59 AM, chen wrote: At 2015-02-25 08:16:44,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424823199 28800 # Node ID bd1d713da87bb1e3022c80f462398bd78a95ce48 # Parent

Re: [x265] [PATCH] Fixed 32 bit bug in intrapred dc4 sse2

2015-02-26 Thread dave
FYI, I forgot to comment in the commit message: I kept the original code for 64 bit because while using r2b works in 64 bits, using it severely hurt performance to the point that it was well below c code. This is probably due to how things like instruction order, length and layout in

Re: [x265] [PATCH] asm: intrapred dc8 sse2 high bit

2015-02-26 Thread dave
On 02/25/2015 05:59 AM, chen wrote: At 2015-02-25 08:16:44,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424823199 28800 # Node ID bd1d713da87bb1e3022c80f462398bd78a95ce48 # Parent 644d27ca0b197455393171ba705b1190f3d9b420 asm: intrapred dc8 sse2

Re: [x265] [PATCH] asm: intrapred dc8 sse2

2015-02-25 Thread dave
On 02/25/2015 05:55 AM, chen wrote: At 2015-02-25 08:15:59,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424818021 28800 # Node ID 644d27ca0b197455393171ba705b1190f3d9b420 # Parent 1a703601f7c8b85f1e6e680caa281b9edada89ab asm: intrapred dc8 sse2

Re: [x265] [PATCH] asm: intrapred dc8 sse2 high bit

2015-02-25 Thread dave
On 02/25/2015 05:59 AM, chen wrote: At 2015-02-25 08:16:44,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424823199 28800 # Node ID bd1d713da87bb1e3022c80f462398bd78a95ce48 # Parent 644d27ca0b197455393171ba705b1190f3d9b420 asm: intrapred dc8 sse2

Re: [x265] [PATCH] asm: intrapred dc4 sse2

2015-02-25 Thread dave
On 02/25/2015 02:30 PM, Steve Borho wrote: On Wed, Feb 25, 2015 at 7:34 AM, chen chenm...@163.com wrote: it's right, just move pw_257 into const.asm in future I pushed this one, but then realized that the -m32 build is broken on Linux. It looks to be caused by the usage of r4b I will look into

Re: [x265] [PATCH] asm: intrapred dc4 sse2 high bit

2015-02-24 Thread dave
Testing fails when movzx is removed so I left it in. On 02/23/2015 08:00 PM, chen wrote: At 2015-02-24 00:24:27,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424708564 28800 # Node ID 60db6870a07261ee2acf5556ddf34dae051fa5c9 # Parent

Re: [x265] [PATCH] asm: intrapred dc4 sse2

2015-02-23 Thread dave
Thanks, will resubmit. On 02/23/2015 07:47 PM, chen wrote: At 2015-02-24 00:23:58,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424706375 28800 # Node ID c2eb94770f9b98d5bc5cf0e96d635e26c01ca5c6 # Parent d179686d7b8d79a125b51fc3f8799152add0fd9f

Re: [x265] [PATCH] asm: intrapred dc4 sse2 high bit

2015-02-23 Thread dave
Thanks, will resubmit. On 02/23/2015 08:00 PM, chen wrote: At 2015-02-24 00:24:27,dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424708564 28800 # Node ID 60db6870a07261ee2acf5556ddf34dae051fa5c9 # Parent c2eb94770f9b98d5bc5cf0e96d635e26c01ca5c6

Re: [x265] [PATCH] asm: dct8 sse2 1.88x improvement over c code

2015-02-20 Thread dave
On 02/19/2015 04:58 PM, Steve Borho wrote: On 02/19, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1424385856 28800 # Node ID 28287b57013e9c43488bfba1570ded5cfb4af16d # Parent 039ea966d5ebccab1de2c3766fb7b4f125d2020a asm: dct8 sse2 1.88x improvement

Re: [x265] [PATCH] intrinsic: Added dct16 sse3 intrinsic, 55288.92 - 45139.28

2015-02-11 Thread dave
This didn't provide the improvement that I was hoping for as it's only about 1.2x faster than the c code and all the other transforms are well over 2x faster. I will be looking into further improving dct16 and dct8 for sse3 unless there is absolutely no need for them. By the way, sse3

Re: [x265] [PATCH] Added high bit support to sse3 intrinsics

2015-01-21 Thread dave
On 01/21/2015 06:27 AM, Steve Borho wrote: On 01/20, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1421787956 28800 # Node ID 3c7ef32c8e5ac800430ca1a76ba92a856c4fe598 # Parent 8d470bbcfc9f62fb27cb12f1a9721b3ae40dfcfa Added high bit support to sse3

Re: [x265] [PATCH 1 of 2] asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode

2015-01-19 Thread dave
Is there a plan to drop 8 bit support? On 01/19/2015 09:34 AM, chen wrote: At 2015-01-20 01:19:27,dave dtyx...@gmail.com wrote: On 01/19/2015 02:22 AM, Min Chen wrote: # HG changeset patch # User Min Chen chenm...@163.com # Date 1421662905 -28800 # Node ID

Re: [x265] [PATCH 2 of 2] asm: rewrite and fix bug in weight_sp_sse4 on HIGH_BIT_DEPTH mode

2015-01-19 Thread dave
On 01/19/2015 02:22 AM, Min Chen wrote: # HG changeset patch # User Min Chen chenm...@163.com # Date 1421662910 -28800 # Node ID b2f64dbe26392dd6bea2badaccf2869bec883392 # Parent a0bb3bb1b076d2ef559ab94bfe81052142d302c3 asm: rewrite and fix bug in weight_sp_sse4 on HIGH_BIT_DEPTH mode ---

Re: [x265] [PATCH 1 of 2] asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode

2015-01-19 Thread dave
On 01/19/2015 02:22 AM, Min Chen wrote: # HG changeset patch # User Min Chen chenm...@163.com # Date 1421662905 -28800 # Node ID a0bb3bb1b076d2ef559ab94bfe81052142d302c3 # Parent bbc333bd4a6207c72c682b3ea88794c67996aa83 asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode ---

Re: [x265] [PATCH] asm: idct16 sse2 28900-25000 improvement over intrinsic

2015-01-17 Thread dave
Yeah, I thought it might be a bit too much. I will submit a patch for the tweaked intrinsic. Most of the performance gain is from there anyway. On 01/17/2015 03:55 AM, chen wrote: This code without good maintainability. The developer difficult to understand what's means on mova [rsp + 15 *

Re: [x265] [PATCH] asm: idct[8x8] sse2 12232 - 3500 over c code 3550 - 3500 over intrinsic

2014-12-18 Thread dave
On 12/18/2014 03:24 PM, chen wrote: This code is right, thanks There just a little mistake at below. I will send a patch without it. of course, this code difficult to maintenance, Yeah, that's the trade the more maintainable version that I submitted some time ago is basically the ssse3

Re: [x265] [PATCH] asm: idct[8x8] sse2 12232 - 3500 over c code 3550 - 3500 over intrinsic

2014-12-16 Thread dave
Sorry for the slow reply, I ran into some unrelated technical difficulties... On 12/09/2014 09:56 AM, chen wrote: At 2014-12-09 12:21:22,dtyx...@gmail.com wrote: # HG changeset patch # User David T yuendtyx...@gmail.com # Date 1418098810 28800 # Node ID 39dfcbf07ae468ca9090e2dabb350cc193060229

Re: [x265] [PATCH] asm: idct[8x8] sse2 12232 - 3500 over c code 3550 - 3500 over intrinsic

2014-12-09 Thread dave
On 12/08/2014 10:50 PM, Steve Borho wrote: On 12/08, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1418098810 28800 # Node ID 39dfcbf07ae468ca9090e2dabb350cc193060229 # Parent 53f7efef5ebda6d5ff03e868f2b702c385d72ddd asm: idct[8x8] sse2 12232 - 3500

Re: [x265] [PATCH] asm: idct[8x8] sse2 12232.53 - 3480 over c code, 3555 - over intrinsic

2014-12-08 Thread dave
On 12/08/2014 03:35 PM, chen wrote: This is x64 only version, but you didn't check environment. I will add a check. And you use rsp without alloc space. Sorry. For amd there are 128 bytes of space beyond the end of the stack that are free to use if there are no calls, at least in linux, of

Re: [x265] [PATCH] idct8 sse2

2014-11-19 Thread dave
How does it fail? Are you getting a segmentation fault? It works fine on debian/gcc but it is dependent on the stack being aligned at 8. I don't have any windows environment. I also replaced all the registers and mov instructions with the defines of x86inc.asm for x86_64. Perhaps I missed

Re: [x265] [PATCH] idct8 sse2

2014-11-19 Thread dave
I made no changes to how the stack is used so that's all from gcc. I'll work on it. Perhaps something like what idct8_ssse3 uses? On 11/19/2014 08:46 PM, chen wrote: you never alloc stack but use it, eg: [rsp - 72] At 2014-11-20 11:48:01,dave dtyx...@gmail.com wrote: How does it fail

Re: [x265] [PATCH] refactorizaton of the transform/quant path

2014-11-18 Thread dave
I have been working on an sse2 idct8 assembler primitive. Currently it only performs a little better than the intrinsic. It is based on the gcc assembler output of the intrinsic. FYI, at first I simply converted the ssse3 idct8 assembler primitive to sse2 since it only uses 3 ssse3

Re: [x265] [PATCH] A few small performance improvements for intrapred c code

2014-10-01 Thread dave
I know the C model is not a priority, I just felt like doing it. Also, I don't think I really changed the core algorithms. As my comment states, A few small performance improvements. On 10/01/2014 08:32 AM, chen wrote: We don't worry about C model performance, it is for reference only. I

Re: [x265] [PATCH] A few small performance improvements for intrapred c code

2014-10-01 Thread dave
match to HEVC specification, it means more jobs during port or understand. At 2014-10-02 01:54:15,dave dtyx...@gmail.com wrote: I know the C model is not a priority, I just felt like doing it. Also, I don't think I really changed the core algorithms. As my comment states, A few

Re: [x265] [PATCH] Changed FrameEncoder::m_tld to a pointer and set it to one of Encoder's ThreadLocalData instances

2014-09-25 Thread dave
On 09/24/2014 06:10 PM, Steve Borho wrote: On 09/24, dave wrote: On 09/24/2014 02:07 PM, Steve Borho wrote: On 09/24, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1411589843 25200 # Node ID fc82666c3f8fe258e99ffbb8398ae04fd90a4bee # Parent

Re: [x265] [PATCH] Changed FrameEncoder::m_tld to a pointer and set it to one of Encoder's ThreadLocalData instances

2014-09-24 Thread dave
On 09/24/2014 02:07 PM, Steve Borho wrote: On 09/24, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1411589843 25200 # Node ID fc82666c3f8fe258e99ffbb8398ae04fd90a4bee # Parent b2b7072ddbf73085d457bd6a71bca946e505dea8 Changed FrameEncoder::m_tld to a

Re: [x265] [PATCH] Removed redundant code

2014-08-25 Thread dave
On 08/25/2014 01:36 PM, Steve Borho wrote: On 08/25, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1408983545 25200 # Node ID fa3c389b255b8299bf75b7dfdab145dfbdc3de40 # Parent 6e6756f94b27c3ef30f6159f1880112a7ff978e3 Removed redundant code if I do

Re: [x265] [PATCH RFC] analysis: use macro and for-loop to simplify fast-intra

2014-08-17 Thread dave
On 08/14/2014 09:10 PM, Steve Borho wrote: On 08/14, dave wrote: On 08/14/2014 05:02 PM, Steve Borho wrote: On 08/14, dave wrote: On 08/14/2014 01:42 PM, Steve Borho wrote: # HG changeset patch # User Steve Borho st...@borho.org # Date 1408048681 18000 # Thu Aug 14 15:38:01 2014 -0500

Re: [x265] [PATCH] Added fast intra search option

2014-08-13 Thread dave
In building with gcc debian 4.7.2-5 I get no warnings. On 08/13/2014 05:46 AM, Deepthi Nandakumar wrote: There are a couple of warnings our regression tests caught with this. Can you take a look? source\encoder\predict.cpp(78): warning C4800: 'const unsigned char' : forcing value to bool

Re: [x265] [PATCH] Added fast intra search option

2014-08-13 Thread dave
On 08/12/2014 10:22 PM, Steve Borho wrote: On 08/12, dtyx...@gmail.com wrote: # HG changeset patch # User David T Yuen dtyx...@gmail.com # Date 1407882999 25200 # Node ID 75e4ad481b3668b1e420ede300287aa3ea3fb8d5 # Parent 8a7f4bb1d1be32fe668d410450c2e320ccae6098 Added fast intra search option

[x265] patches for fast-intra

2014-08-12 Thread dave
not as fast as the other most of the time but not always. I submitted both hoping that someone else has better testing methods that might better distinguish between them. Dave

Re: [x265] patch for faster intra

2014-08-04 Thread dave
On 08/03/2014 07:52 PM, Steve Borho wrote: On 08/02, dave wrote: A few other things... In my testing of encoding a single frame where I only examined processing of the first few CUs this search method always found the lowest cost but cost values never followed a consistent curve with a single

Re: [x265] patch for faster intra

2014-08-04 Thread dave
I think I see my mistake now. The table requires a first index of [log2size - 2], which I was also passing to Predict::filteringIntraReferenceSamples, which only wants log2size. Sorry for the confusion. On 08/04/2014 11:41 AM, chen wrote: Hi Dave, I haven't found any mistake between table
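The index mix-up described in this message can be sketched in a few lines of C++. This is only an illustration, not x265 code: the table `kWidthBySize` and both helper functions are hypothetical stand-ins, assuming a table with one row per block size from 4x4 to 32x32 (so its first index is log2size - 2) and a second routine that expects the raw log2size.

```cpp
#include <cassert>

// Hypothetical table: one entry per block size 4x4 .. 32x32,
// so it is indexed by (log2Size - 2), with log2Size in 2..5.
static const int kWidthBySize[4] = {4, 8, 16, 32};

// Hypothetical helper: converts log2Size to the table index internally.
int widthFromTable(int log2Size)
{
    assert(log2Size >= 2 && log2Size <= 5);
    return kWidthBySize[log2Size - 2];
}

// Hypothetical stand-in for a routine that wants the raw log2Size,
// not the table index.
int widthFromLog2(int log2Size)
{
    return 1 << log2Size;
}
```

Passing the same log2Size to both helpers gives matching widths, while passing the table index (log2Size - 2) to `widthFromLog2` silently yields a block four times too narrow, which is the kind of mistake described above.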

Re: [x265] patch for faster intra

2014-08-02 Thread dave
a short video. On 08/01/2014 09:50 PM, Steve Borho wrote: On 08/01, dave wrote: I am submitting a patch to implement the faster intra search suggested by Steve Borho here: https://mailman.videolan.org/pipermail/x265-devel/2014-July/004873.html nice! The patch implements the faster search

[x265] patch for faster intra

2014-08-01 Thread dave
of TEncSearch.cpp or it might not be applicable to what TEncSearch.cpp is doing. I will be looking into it. Dave

[x265] linking error

2014-07-30 Thread dave
I am getting the following from ld Linking CXX executable x265 libx265.so.29: undefined reference to `x265::ScalingList::MAX_MATRIX_COEF_NUM' collect2: error: ld returned 1 exit status make[2]: *** [x265] Error 1 make[1]: *** [CMakeFiles/cli.dir/all] Error 2 make: *** [all] Error 2 Dave
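The snippet does not say what caused this error. One common cause of an undefined reference to a class constant in C++, offered here only as an assumption, is a `static const` integral member that is initialized inside the class but never defined at namespace scope, and is then odr-used (for example, bound to a `const int&`). The sketch below uses a hypothetical `demo::ScalingListLike` type, not the real `x265::ScalingList`.

```cpp
#include <algorithm>
#include <cassert>

namespace demo {

// Hypothetical class, standing in for a type with an in-class constant.
struct ScalingListLike
{
    static const int MAX_COEF_NUM = 64; // declaration with an initializer
};

// Out-of-class definition. Without this line, odr-uses of the member
// (such as the std::min call below) may fail at link time with
// "undefined reference", depending on the compiler and optimization level.
const int ScalingListLike::MAX_COEF_NUM;

// std::min takes its arguments by const reference, which odr-uses the member.
int clampCoefCount(int n)
{
    assert(n >= 0);
    return std::min(n, ScalingListLike::MAX_COEF_NUM);
}

} // namespace demo
```

If this is the cause, the usual remedies are to add the out-of-class definition in exactly one source file, or to avoid the odr-use by copying the constant into a local value first.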

Re: [x265] linking error

2014-07-30 Thread dave
On 07/30/2014 12:00 PM, Steve Borho wrote: On 07/30, dave wrote: I am getting the following from ld Linking CXX executable x265 libx265.so.29: undefined reference to `x265::ScalingList::MAX_MATRIX_COEF_NUM' collect2: error: ld returned 1 exit status make[2]: *** [x265] Error 1 make[1

[x265] using and encoding of h265 parameter set fields

2014-05-05 Thread dave
Many parameter set fields are encoded in a modified form, I am guessing to maximize the range of possible encoded values, but they are probably more useful for the encoding of frames in their unmodified forms. For example, both the VPS and SPS have a field called
