decision making?
thanks,
Dave
___
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel
-06-19 06:21:02,dave dtyx...@gmail.com wrote:
For the interpolation primitives there are multiple versions, pp, ps,
sp, and ss. Each one adds an offset before a shift. Yet as I review
the interpolation section 8.5.3.3.3 of the h265 spec, I don't see where
an offset is added. Am I missing
string index bug, you have sent a patch to fix it.
At 2015-06-10 00:59:03,dave dtyx...@gmail.com wrote:
Can you post the errors? I am guessing I need to rebase this on
the latest code and resubmit.
On 06/08/2015 11:08 PM, Deepthi Nandakumar wrote:
Hi,
This throws build
:
other is fine, we can push it, thanks
At 2015-06-05 21:45:01,dave dtyx...@gmail.com wrote:
I prefer to keep this patch as sse3/movdqu, is there anything
holding this back?
On 06/02/2015 09:44 AM, chen wrote:
the root cause
:51,dave dtyx...@gmail.com wrote:
I would like to profile x265 with lttng by using
-finstrument-functions but when I add it to CXX flags I get the
following when linking,
Linking CXX executable x265
/usr/bin/cmake -E cmake_link_script CMakeFiles/cli.dir/link.txt
--verbose=1
appreciated.
thanks,
Dave
Wow, thanks Min. I did a decent amount of searching on the internet but
didn't find this.
On 06/08/2015 12:52 PM, chen wrote:
no answer in gcc website.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52544
At 2015-06-09 03:26:10,dave dtyx...@gmail.com wrote:
cmake/make already has
choice to keep movdqu in here
At 2015-06-03 00:28:32,dave dtyx...@gmail.com wrote:
On 06/02/2015 09:16 AM, chen wrote:
movdqu - movu
others fine
I am not sure why but movu here fails to build.
source/common/x86/ipfilter16.asm:954: error: invalid combination of
opcode and operands
make[2
unrolled with %rep, as is the
sse4 version, though in a different way. This probably generates
considerably larger executables, especially for the larger sizes. Is
there any preference on this? Are x265's goals purely performance
related over memory usage?
thanks,
Dave
On 06/02/2015 09:16 AM, chen wrote:
movdqu - movu
others fine
I am not sure why but movu here fails to build.
source/common/x86/ipfilter16.asm:954: error: invalid combination of
opcode and operands
make[2]: *** [common/CMakeFiles/common.dir/x86/ipfilter16.asm.o] Error 1
make[1]: ***
a bug of movu. lddqu can only move up to 64-bits but
movu moves 128-bits in 64-bit mode. If fixed, then changing movdqu to
movu should cause no problems.
At 2015-06-03 00:28:32,dave dtyx...@gmail.com wrote:
On 06/02/2015 09:16 AM, chen wrote:
movdqu - movu
others fine
I am not sure why
FYI, If what I submitted performs better than the sse4 code then I
suggest either improving the sse4 code with ssse3 and sse4 instructions
or removing it.
On 05/29/2015 10:12 AM, chen wrote:
right,thanks
.
At 2015-05-30 01:01:15,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T
On 05/21/2015 07:07 PM, chen wrote:
lea srcq, [srcq + srcstrideq]
why not ADD?
That's carried over from the sse4 code. I can change it to add if that
is what you want.
At 2015-05-22 09:27:57,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date
On 05/19/2015 04:40 PM, chen wrote:
Why x64 only? r4 always free in both x86 and x64
It hurts performance in the benchtest for x32 but I can make it cover
both if that is what you want.
At 2015-05-20 07:33:14,dtyx...@gmail.com wrote:
# HG changeset patch
# User David
On 05/18/2015 07:55 AM, chen wrote:
At 2015-05-18 10:48:54,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1431912252 25200
# Node ID c2624b61f4c7d894616a7dc1e8a6cc1c0a506028
# Parent 72bba6b9e99739599d04be62c7e02a3c8faa
asm:
On 05/18/2015 09:42 AM, chen wrote:
At 2015-05-19 00:36:01,dave dtyx...@gmail.com wrote:
On 05/18/2015 07:50 AM, chen wrote:
At 2015-05-18 10:48:53,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1431911615 25200
# Node ID
On 05/18/2015 07:50 AM, chen wrote:
At 2015-05-18 10:48:53,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1431911615 25200
# Node ID 72bba6b9e99739599d04be62c7e02a3c8faa
# Parent 465fb4340a241e501b53a6241f5ae81c29ba073a
asm:
On 05/18/2015 05:27 PM, Steve Borho wrote:
On 05/18, dave wrote:
On 05/18/2015 09:42 AM, chen wrote:
[MC] yes, it is faster on AMD CPUs; on Intel, these instructions
choke Port5, while PADD executes on Port1. I often choose the faster
instruction for Intel because my PC uses an Intel CPU
This follows 5f9e5e9. I am not sure if that is helpful, if not then I
can resubmit from a clean tree.
On 05/12/2015 04:50 AM, Deepthi Nandakumar wrote:
Thanks, but I cannot apply this patch. The parent-ID does not exist on
the public tree.
On Tue, May 12, 2015 at 7:21 AM, chen
On 05/06/2015 01:29 PM, chen wrote:
At 2015-05-07 03:45:35,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1430940440 25200
# Node ID 4690c9aa24caa1adb665355803d4c308a124ec96
# Parent 87d6724649df0157786c4210f0caebf961b31341
asm: interp_4tap_vert_pp
Ignore this, I will be sending a better version.
On 05/05/2015 06:32 PM, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1430875749 25200
# Node ID 9452c826eb205682647ee0db4d8d445785bb7a1a
# Parent f32e6464225afa02983af1b1905f50cdccae5244
asm:
On 05/01/2015 10:03 AM, Steve Borho wrote:
On 04/29,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1430361608 25200
# Node ID f95cc094467c844c6607c67d330748d171d26483
# Parent 9a1b8b71bc997547044f42992e1eb7f3572f03f1
asm: interp_8tap_hv_pp_8x8 sse3
I submitted a patch fixing this but I was only able to test it for
64-bit. When trying to build for 32-bit I ran into the following build
error.
Linking CXX executable x265
libx265.so.56: undefined reference to `dlsym'
libx265.so.56: undefined reference to `dlopen'
collect2: error: ld
On 04/30/2015 09:28 AM, Steve Borho wrote:
On 04/30, dave wrote:
I submitted a patch fixing this but I was only able to test it for 64-bit.
When trying to build for 32-bit I ran into the following build error.
Linking CXX executable x265
libx265.so.56: undefined reference to `dlsym'
libx265.so
On 04/30/2015 12:43 PM, Steve Borho wrote:
On 04/30, dave wrote:
On 04/30/2015 09:28 AM, Steve Borho wrote:
On 04/30, dave wrote:
I submitted a patch fixing this but I was only able to test it for 64-bit.
When trying to build for 32-bit I ran into the following build error.
Linking CXX
On 04/28/2015 06:38 PM, chen wrote:
At 2015-04-29 09:30:36,dave dtyx...@gmail.com wrote:
On 04/28/2015 06:13 PM, chen wrote:
At 2015-04-29 07:49:46,dave dtyx...@gmail.com wrote:
On 04/28/2015 03:32 PM, chen wrote:
Most parts are fine now; just a change regarding r5, see below
On 04/28/2015 03:32 PM, chen wrote:
Most parts are fine now; just a change regarding r5, see the comment below
At 2015-04-29 06:27:27,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T yuen dtyx...@gmail.com
# Date 1430259967 25200
# Node ID 6108fbda1be654a481a78f7ef593518033919674
# Parent
On 04/28/2015 06:13 PM, chen wrote:
At 2015-04-29 07:49:46,dave dtyx...@gmail.com wrote:
On 04/28/2015 03:32 PM, chen wrote:
Most parts are fine now; just a change regarding r5, see the comment below
At 2015-04-29 06:27:27,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T
On 04/27/2015 08:42 PM, chen wrote:
At 2015-04-28 09:05:06,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T yuen dtyx...@gmail.com
# Date 1430182995 25200
# Node ID 31b76bd430a47411f7b2ebaa7cfbb44e25c5ff60
# Parent 68a13226d586b335c02cade9311e093f0149c42a
asm: interp_8tap_horiz pp
On 04/21/2015 07:48 PM, chen wrote:
At 2015-04-22 09:13:47,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T yuen dtyx...@gmail.com
# Date 1429665160 25200
# Node ID defd1cf26749f3395750ef9128c9a90bfa2caf78
# Parent c135c117ffb083a00d4353279ea669e8f3f7a8ee
asm: interp_4tap_horiz_pp
On 04/17/2015 10:55 PM, chen wrote:
patches are right
the code style needs modifying: the first column is reserved for labels or
preprocessor directives (%)
only the (4 of 8) macro's style is right
I sent a patch adding the first column.
I also know the RET macro has some line spacing issues. There are a few
spots
On 04/03/2015 12:21 PM, Steve Borho wrote:
On Fri, Apr 3, 2015 at 11:24 AM, dtyx...@gmail.com wrote:
All modes are backported from sse4 code
btw: I believe you would get more 'bang for your buck' if you
back-ported the subpel interpolation filter primitives before the rest
of the intra
Sorry, I am not sure how this happened. It is supposed to be applied
after the mode 10 patch. I think it's some quirk in my TortoiseHG client.
On 04/01/2015 09:47 PM, chen wrote:
right
the mode 26 patch is out of order; applying the patches failed.
At 2015-04-02 05:14:21,dtyx...@gmail.com wrote:
This patch
On 04/01/2015 09:24 PM, chen wrote:
At 2015-04-02 02:52:16,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1427891624 25200
# Node ID 529c6056ccfbce57cd845abec59a2af02812cd57
# Parent 89bc6238d4a2e3f117f0127e406c6dfbf093868b
asm:intra_pred_ang4_3_sse2
please disregard this one. The correct version has been sent
On 04/01/2015 11:52 AM, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1427912500 25200
# Node ID fc6b5f8bbcc8283e5b4fd88d41b8c313b002a198
# Parent fc902b84fc7f8dadf56431766adab8eda3520596
On 03/30/2015 11:49 AM, Steve Borho wrote:
On 03/28, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1427590314 25200
# Node ID ea65ec5a6c969e4cee612faa4d948e3337ed72d1
# Parent 36d70728acc2d9d6103af7530493176c08298ded
asm:intra_pred_ang4_2 improved
On 03/28/2015 03:20 PM, chen wrote:
At 2015-03-29 05:35:21,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1427576216 25200
# Node ID 0a75e3d50518e73f5a199d7519f800a9ff1c2e2c
# Parent 6595ba5f989fdd521e268911ddf027665a610e25
asm:intra_pred_ang4_4_sse2
On 03/27/2015 08:08 PM, chen wrote:
At 2015-03-28 10:42:50,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1427509819 25200
# Node ID 1ef1d75462375830f47e45d53dcf1c33d458d15e
# Parent 89f337fa1d295b161d4800cc28674c9d02d4c6a9
asm:
On 03/12/2015 03:16 PM, chen wrote:
I used 'pxor m7,m7' to replace your [pb_0], but it takes the same cycles in
IACA; the bottleneck is on Port0
Not sure about the performance on old CPUs
I would have used something like that but there are no available
registers by that point. They are used up on holding
This produces some interesting numbers.
sorry, I mixed these two up.
incorrect:Without using registers for constants
with using registers
x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%)
encoded 2000 frames in 95.98s (20.84 fps), 1020.04 kb/s
incorrect:With using registers for constants
On 03/09/2015 11:40 PM, Steve Borho wrote:
On 03/09, dave wrote:
On 03/09/2015 08:25 PM, Steve Borho wrote:
On 03/09, dave wrote:
Interesting. Performance is almost identical
original code
/x265 -I 1 --input ~/Videos/bridge-close-cif/bridge-close.y4m -o
bridge-close.y4m
y4m [info
On 03/10/2015 08:56 AM, Steve Borho wrote:
On 03/10, dave wrote:
On 03/09/2015 11:40 PM, Steve Borho wrote:
snip
No, but the command line option --cu-stats does show how much it is
called (but not how long it took)
This produces some interesting numbers.
Without using registers
On 03/10/2015 12:12 PM, Steve Borho wrote:
On 03/10, dave wrote:
This produces some interesting numbers.
sorry, I mixed these two up.
incorrect:Without using registers for constants
with using registers
x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%)
encoded 2000 frames in 95.98s (20.84
On 03/09/2015 04:01 PM, chen wrote:
if there is no performance drop, we may write it in the new format
OK. Actually, since the sse4 version works in double words, this
version is probably faster on your Haswell.
At 2015-03-10 07:00:08,dave dtyx...@gmail.com wrote:
On 03/09/2015 03:20 PM, chen wrote
On 03/09/2015 03:23 PM, chen wrote:
At 2015-03-10 05:24:09,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425935926 25200
# Node ID 4d1d54d28cb1635448ceec764c793d1b37cef7a4
# Parent ef383507c21ca704b58e759e440f4ae2e177c499
asm:intra pred planar32
On 03/09/2015 03:20 PM, chen wrote:
a small improvement needs integrating
lea r0, [r0 + r1]
when I replace it with add r0, r1, I get a ~3% improvement on my Haswell PC
This makes no difference on my old machine.
At 2015-03-10 05:24:08,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T
On 03/09/2015 04:09 PM, chen wrote:
At 2015-03-10 07:06:26,dave dtyx...@gmail.com wrote:
On 03/09/2015 03:23 PM, chen wrote:
At 2015-03-10 05:24:09,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425935926 25200
# Node ID
On 03/09/2015 08:25 PM, Steve Borho wrote:
On 03/09, dave wrote:
Interesting. Performance is almost identical
original code
/x265 -I 1 --input ~/Videos/bridge-close-cif/bridge-close.y4m -o
bridge-close.y4m
y4m [info]: 352x288 fps 30/1 i420p8 frames 0 - 1999 of 2000
x265 [info]: HEVC encoder
On 03/09/2015 04:34 PM, chen wrote:
At 2015-03-10 07:31:13,dave dtyx...@gmail.com wrote:
On 03/09/2015 04:09 PM, chen wrote:
At 2015-03-10 07:06:26,dave dtyx...@gmail.com wrote:
On 03/09/2015 03:23 PM, chen wrote:
At 2015-03-10 05:24:09,dtyx...@gmail.com wrote
On 03/06/2015 05:07 PM, Min Chen wrote:
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1425690429 28800
# Node ID 63d132c844b9d299081b40e7589275b78fe71093
# Parent 043c2418864b0a3ada6f597e6def6ead73d90b5f
asm: improve on intra_dc32
---
source/common/x86/intrapred8.asm | 71
On 03/06/2015 04:06 PM, chen wrote:
At 2015-03-07 07:58:13,dave dtyx...@gmail.com wrote:
On 03/06/2015 05:07 PM, Min Chen wrote:
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1425690429 28800
# Node ID 63d132c844b9d299081b40e7589275b78fe71093
# Parent
I can resubmit patches for intra pred sse2 not yet accepted based on the
new tip if that is helpful. Either all in one patch or individually if
that is preferred. So far, dc4, 8, 16, 32 and planar8 for
intrapred16.asm and dc32 and planar8 for intrapred8.asm. I also have
planar16 ready.
On 03/05/2015 06:02 PM, chen wrote:
At 2015-03-06 08:19:57,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425594719 28800
# Node ID 912c42dcb4d9b399515e6c1ed6be70db3bf5f675
# Parent c5fa433ffda0a95889e99f4df787f3edc5880d0f
asm:intra pred dc32 sse2
On 03/04/2015 06:44 PM, chen wrote:
At 2015-03-05 10:03:59,dave dtyx...@gmail.com wrote:
On 03/04/2015 04:39 PM, chen wrote:
At 2015-03-05 07:54:02,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425512599 28800
# Node ID
On 03/04/2015 04:39 PM, chen wrote:
At 2015-03-05 07:54:02,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425512599 28800
# Node ID 16880e791046ef8470f8307b76aae57c3be573c1
# Parent c53b456ad909eeab8d83f8e0817e641d174cc706
asm:intra pred planar8 sse2
On 03/04/2015 07:12 PM, Steve Borho wrote:
On 03/04, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425513096 28800
# Node ID 243cb00021a3bbc3184cbc2b27a5dbe56d745c51
# Parent bf095f7deed80f741663edae2b3a90bb4f63a980
asm:intra pred planar8 sse2 high
On 03/03/2015 07:03 AM, chen wrote:
right, with some comment below
Thanks, I will submit a new patch.
At 2015-03-03 12:03:23,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425355260 28800
# Node ID 48ca9a0c131c99d54778515dc5a6a5a7a9197153
# Parent
oops, forgot to remove old comment, will resubmit...
On 03/03/2015 09:40 AM, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1425404383 28800
# Node ID 1699595e179d7c131ab24a9af48683a8da2a4d79
# Parent 4641827f98c935603f608425de8c76785aef1114
asm:
The sse4 intra pred planar4 is actually sse2 so this is the same thing
with a few minor tweaks. There's probably more room for improvement of
the sse4 version with ssse3+ instructions.
On 03/03/2015 06:47 PM, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
On 02/26/2015 02:43 PM, Steve Borho wrote:
On 02/26, dave wrote:
On 02/25/2015 05:59 AM, chen wrote:
At 2015-02-25 08:16:44,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1424823199 28800
# Node ID bd1d713da87bb1e3022c80f462398bd78a95ce48
# Parent
FYI,
I forgot to comment in the commit message:
I kept the original code for 64 bit because while using r2b works in 64
bits, using it severely hurt performance to the point that it was well
below c code.
This is probably due to how things like instruction order, length and
layout in
On 02/25/2015 05:59 AM, chen wrote:
At 2015-02-25 08:16:44,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1424823199 28800
# Node ID bd1d713da87bb1e3022c80f462398bd78a95ce48
# Parent 644d27ca0b197455393171ba705b1190f3d9b420
asm: intrapred dc8 sse2
On 02/25/2015 05:55 AM, chen wrote:
At 2015-02-25 08:15:59,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1424818021 28800
# Node ID 644d27ca0b197455393171ba705b1190f3d9b420
# Parent 1a703601f7c8b85f1e6e680caa281b9edada89ab
asm: intrapred dc8 sse2
On 02/25/2015 02:30 PM, Steve Borho wrote:
On Wed, Feb 25, 2015 at 7:34 AM, chen chenm...@163.com wrote:
it's right, just move pw_257 into const.asm in future
I pushed this one, but then realized that the -m32 build is broken on
Linux. It looks to be caused by the usage of r4b
I will look into
Testing fails when movzx is removed so I left it in.
On 02/23/2015 08:00 PM, chen wrote:
At 2015-02-24 00:24:27,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1424708564 28800
# Node ID 60db6870a07261ee2acf5556ddf34dae051fa5c9
# Parent
Thanks, will resubmit.
On 02/23/2015 07:47 PM, chen wrote:
At 2015-02-24 00:23:58,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1424706375 28800
# Node ID c2eb94770f9b98d5bc5cf0e96d635e26c01ca5c6
# Parent d179686d7b8d79a125b51fc3f8799152add0fd9f
Thanks, will resubmit.
On 02/23/2015 08:00 PM, chen wrote:
At 2015-02-24 00:24:27,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1424708564 28800
# Node ID 60db6870a07261ee2acf5556ddf34dae051fa5c9
# Parent c2eb94770f9b98d5bc5cf0e96d635e26c01ca5c6
On 02/19/2015 04:58 PM, Steve Borho wrote:
On 02/19, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1424385856 28800
# Node ID 28287b57013e9c43488bfba1570ded5cfb4af16d
# Parent 039ea966d5ebccab1de2c3766fb7b4f125d2020a
asm: dct8 sse2 1.88x improvement
This didn't provide the improvement that I was hoping for as it's only
about 1.2x faster than the c code and all the other transforms are well
over 2x faster. I will be looking into further improving dct16 and dct8
for sse3 unless there is absolutely no need for them.
By the way, sse3
On 01/21/2015 06:27 AM, Steve Borho wrote:
On 01/20, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1421787956 28800
# Node ID 3c7ef32c8e5ac800430ca1a76ba92a856c4fe598
# Parent 8d470bbcfc9f62fb27cb12f1a9721b3ae40dfcfa
Added high bit support to sse3
Is there a plan to drop 8 bit support?
On 01/19/2015 09:34 AM, chen wrote:
At 2015-01-20 01:19:27,dave dtyx...@gmail.com wrote:
On 01/19/2015 02:22 AM, Min Chen wrote:
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1421662905 -28800
# Node ID
On 01/19/2015 02:22 AM, Min Chen wrote:
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1421662910 -28800
# Node ID b2f64dbe26392dd6bea2badaccf2869bec883392
# Parent a0bb3bb1b076d2ef559ab94bfe81052142d302c3
asm: rewrite and fix bug in weight_sp_sse4 on HIGH_BIT_DEPTH mode
---
On 01/19/2015 02:22 AM, Min Chen wrote:
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1421662905 -28800
# Node ID a0bb3bb1b076d2ef559ab94bfe81052142d302c3
# Parent bbc333bd4a6207c72c682b3ea88794c67996aa83
asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode
---
Yeah, I thought it might be a bit too much. I will submit a patch for
the tweaked intrinsic. Most of the performance gain is from there anyway.
On 01/17/2015 03:55 AM, chen wrote:
This code is not very maintainable.
It is difficult for a developer to understand the meaning of
mova [rsp + 15 *
On 12/18/2014 03:24 PM, chen wrote:
This code is right, thanks
There is just a small mistake below.
I will send a patch without it.
of course, this code is difficult to maintain,
Yeah, that's the trade-off. The more maintainable version that I submitted
some time ago is basically the ssse3
Sorry for the slow reply, I ran into some unrelated technical
difficulties...
On 12/09/2014 09:56 AM, chen wrote:
At 2014-12-09 12:21:22,dtyx...@gmail.com wrote:
# HG changeset patch
# User David T yuen dtyx...@gmail.com
# Date 1418098810 28800
# Node ID 39dfcbf07ae468ca9090e2dabb350cc193060229
On 12/08/2014 10:50 PM, Steve Borho wrote:
On 12/08, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1418098810 28800
# Node ID 39dfcbf07ae468ca9090e2dabb350cc193060229
# Parent 53f7efef5ebda6d5ff03e868f2b702c385d72ddd
asm: idct[8x8] sse2 12232 - 3500
On 12/08/2014 03:35 PM, chen wrote:
This is an x64-only version, but you didn't check the environment.
I will add a check.
And you use rsp without allocating space.
Sorry. For amd there is 128 bytes of space beyond the end of the stack
that is free to use if there are no calls, at least in linux, of
How does it fail? Are you getting a segmentation fault? It works fine
on debian/gcc but it is dependent on the stack being aligned at 8. I
don't have any windows environment.
I also replaced all the registers and mov instructions with the defines
of x86inc.asm for x86_64. Perhaps I missed
I made no changes to how the stack is used so that's all from gcc. I'll
work on it. Perhaps something like what idct8_ssse3 uses?
On 11/19/2014 08:46 PM, chen wrote:
you never allocate stack space but use it, e.g.: [rsp - 72]
At 2014-11-20 11:48:01,dave dtyx...@gmail.com wrote:
How does it fail
I have been working on an sse2 idct8 assembler primitive. Currently
it only performs a little better than the intrinsic. It is based on
the gcc assembler output of the intrinsic.
FYI, at first I simply converted the ssse3 idct8 assembler primitive to
sse2 since it only uses 3 ssse3
I know the C model is not a priority, I just felt like doing it. Also,
I don't think I really changed the core algorithms. As my comment
states, A few small performance improvements.
On 10/01/2014 08:32 AM, chen wrote:
We don't worry about C model performance, it is for reference only.
I
match to HEVC specification, it means more jobs
during port or understand.
At 2014-10-02 01:54:15,dave dtyx...@gmail.com wrote:
I know the C model is not a priority, I just felt like doing it.
Also, I don't think I really changed the core algorithms. As my
comment states, A few
On 09/24/2014 06:10 PM, Steve Borho wrote:
On 09/24, dave wrote:
On 09/24/2014 02:07 PM, Steve Borho wrote:
On 09/24, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1411589843 25200
# Node ID fc82666c3f8fe258e99ffbb8398ae04fd90a4bee
# Parent
On 09/24/2014 02:07 PM, Steve Borho wrote:
On 09/24, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1411589843 25200
# Node ID fc82666c3f8fe258e99ffbb8398ae04fd90a4bee
# Parent b2b7072ddbf73085d457bd6a71bca946e505dea8
Changed FrameEncoder::m_tld to a
On 08/25/2014 01:36 PM, Steve Borho wrote:
On 08/25, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1408983545 25200
# Node ID fa3c389b255b8299bf75b7dfdab145dfbdc3de40
# Parent 6e6756f94b27c3ef30f6159f1880112a7ff978e3
Removed redundant code
if I do
On 08/14/2014 09:10 PM, Steve Borho wrote:
On 08/14, dave wrote:
On 08/14/2014 05:02 PM, Steve Borho wrote:
On 08/14, dave wrote:
On 08/14/2014 01:42 PM, Steve Borho wrote:
# HG changeset patch
# User Steve Borho st...@borho.org
# Date 1408048681 18000
# Thu Aug 14 15:38:01 2014 -0500
In building with gcc debian 4.7.2-5 I get no warnings.
On 08/13/2014 05:46 AM, Deepthi Nandakumar wrote:
There are a couple of warnings our regression tests caught with this.
Can you take a look?
source\encoder\predict.cpp(78): warning C4800: 'const unsigned char' :
forcing value to bool
On 08/12/2014 10:22 PM, Steve Borho wrote:
On 08/12, dtyx...@gmail.com wrote:
# HG changeset patch
# User David T Yuen dtyx...@gmail.com
# Date 1407882999 25200
# Node ID 75e4ad481b3668b1e420ede300287aa3ea3fb8d5
# Parent 8a7f4bb1d1be32fe668d410450c2e320ccae6098
Added fast intra search option
not as fast as the other most of the time but not always. I
submitted both hoping that someone else has better testing methods that
might better distinguish between them.
Dave
On 08/03/2014 07:52 PM, Steve Borho wrote:
On 08/02, dave wrote:
A few other things...
In my testing of encoding a single frame where I only examined processing of
the first few CUs this search method always found the lowest cost but cost
values never followed a consistent curve with a single
I think I see my mistake now. The table requires a first index of
[log2size - 2] , which I was also passing to
Predict::filteringIntraReferenceSamples which only wants log2size.
sorry for the confusion.
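A minimal sketch of the mix-up described above, assuming a per-size table with four rows for 4x4 through 32x32 blocks. The names `kPerSizeTable`, `tableRow`, `blockWidth` and `lookupForBlock` and the table values are hypothetical, for illustration only; they are not x265 identifiers or data.

```cpp
#include <cassert>

// Hypothetical per-size table: one entry per block size 4x4, 8x8, 16x16, 32x32.
// The values are placeholders, not data from x265.
static const int kPerSizeTable[4] = {40, 80, 160, 320};

// The table is indexed by log2Size - 2, so log2Size 2..5 maps to rows 0..3.
int tableRow(int log2Size)
{
    assert(log2Size >= 2 && log2Size <= 5);
    return log2Size - 2;
}

// Stand-in for a callee that expects the raw log2Size, not the table row.
int blockWidth(int log2Size)
{
    return 1 << log2Size;
}

// Correct usage: convert to a row only at the point of the table lookup.
int lookupForBlock(int log2Size)
{
    return kPerSizeTable[tableRow(log2Size)];
}
```

In this sketch, handing the already-adjusted row to the callee (`blockWidth(tableRow(3))`) yields 2 instead of 8, which is the kind of mismatch the message describes.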
On 08/04/2014 11:41 AM, chen wrote:
Hi Dave,
I haven't found any mistake between table
a short video.
On 08/01/2014 09:50 PM, Steve Borho wrote:
On 08/01, dave wrote:
I am submitting a patch to implement the faster intra search suggested by
Steve Borho here:
https://mailman.videolan.org/pipermail/x265-devel/2014-July/004873.html
nice!
The patch implements the faster search
of TEncSearch.cpp or it might not be applicable to what
TEncSearch.cpp is doing. I will be looking into it.
Dave
I am getting the following from ld
Linking CXX executable x265
libx265.so.29: undefined reference to
`x265::ScalingList::MAX_MATRIX_COEF_NUM'
collect2: error: ld returned 1 exit status
make[2]: *** [x265] Error 1
make[1]: *** [CMakeFiles/cli.dir/all] Error 2
make: *** [all] Error 2
Dave
On 07/30/2014 12:00 PM, Steve Borho wrote:
On 07/30, dave wrote:
I am getting the following from ld
Linking CXX executable x265
libx265.so.29: undefined reference to
`x265::ScalingList::MAX_MATRIX_COEF_NUM'
collect2: error: ld returned 1 exit status
make[2]: *** [x265] Error 1
make[1
Many parameter set fields are encoded in a modified form, I am
guessing for the purpose of maximizing the range of possible encoded
values but are probably more useful for the encoding of frames in their
unmodified forms. For example, both the VPS and SPS have a field called