On 2017-02-27 12:13, Paul B Mahol wrote:
> On 2/27/17, James Darnley <jdarn...@obe.tv> wrote:
>>
>> Does anyone have any comments on the patch set? For example: should I
>> merge this sse2 patch into the others?
>
> probably not, just commit.
Will do. I have
On 2017-02-22 01:27, James Darnley wrote:
> ---
> libavcodec/x86/h264_deblock.asm | 1 +
> libavcodec/x86/h264dsp_init.c | 10 ++
> 2 files changed, 11 insertions(+)
>
> diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
> index 32
---
libavcodec/x86/h264_deblock.asm | 1 +
libavcodec/x86/h264dsp_init.c | 10 ++
2 files changed, 11 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index 32aa3d3..6702ae9 100644
--- a/libavcodec/x86/h264_deblock.asm
+++
libavcodec/x86/h264_deblock.asm | 1 +
libavcodec/x86/h264dsp_init.c | 10 ++
2 files changed, 11 insertions(+)
Okay, enabling sse2 gets me the results below. It turns out I should allow sse2
despite some previous testing. Should I leave avx? Sometimes it is a few
percentage
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
---
libavcodec/x86/h264_deblock.asm | 27 +++
libavcodec/x86/h264dsp_init.c | 2 ++
2 files changed, 29 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
---
libavcodec/x86/h264_deblock.asm | 38 ++
libavcodec/x86/h264dsp_init.c | 2 ++
2 files changed, 40 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
---
libavcodec/x86/h264_deblock.asm | 9 +
libavcodec/x86/h264dsp_init.c | 1 +
2 files changed, 10 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index 1e6d822..2197608 100644
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
---
libavcodec/x86/h264_deblock.asm | 70 +
libavcodec/x86/h264dsp_init.c | 3 ++
2 files changed, 73 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
---
libavcodec/x86/h264_deblock.asm | 18 ++
libavcodec/x86/h264dsp_init.c | 1 +
2 files changed, 19 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index
~1.14x faster (90 vs 78 cycles) compared with mmxext
---
libavcodec/x86/h264_deblock.asm | 33 +
libavcodec/x86/h264dsp_init.c | 1 +
2 files changed, 34 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index
6 more functions which eke out a little more speed.
James Darnley (6):
avcodec/h264: add avx 8-bit chroma v deblock/loop filter
avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
avcodec/h264: add avx 8-bit chroma
On 2017-02-16 14:11, James Darnley wrote:
> Four patches
Does anyone else have any more comments about this patch series?
Yea or nay from anyone?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
---
libavcodec/x86/h264_deblock.asm | 5 -
libavcodec/x86/h264_deblock_10bit.asm | 5 -
libavcodec/x86/hevc_deblock.asm | 5 -
libavutil/x86/x86util.asm | 5 +
4 files changed, 5 insertions(+), 15 deletions(-)
diff --git a/libavcodec/x86/h264_deblock.asm
---
libavcodec/x86/h264_deblock.asm | 32
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index 435c8be..509a0db 100644
--- a/libavcodec/x86/h264_deblock.asm
+++
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
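
For reference, each factor in a list like this appears to be the first cycle count divided by the second. A minimal sketch of that arithmetic, assuming the first number is the function being replaced and the second is the new one; the helper name speedup is made up for illustration and is not part of the patch:

```c
/* Speedup factor: cycles taken by the reference function divided by
 * cycles taken by the new function. A result above 1.0 means the new
 * function is faster. The name "speedup" is hypothetical. */
static double speedup(unsigned reference_cycles, unsigned new_cycles)
{
    return (double)reference_cycles / (double)new_cycles;
}
```

Calling speedup(434, 200) corresponds to the Yorkfield sse2 line above.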
---
libavcodec/x86/h264_deblock.asm | 89 +
On 2017-02-14 22:25, Mark Thompson wrote:
> On 14/02/17 19:44, Daniel Oberhoff wrote:
>> filter strictly “halves” the image efficiently, which is often exactly what
>> is needed
>> likely much faster than using scale
>
> Did you benchmark this? How?
>
> $ time ./ffmpeg -f lavfi -i allyuv -vf
On 2017-02-14 17:21, Henrik Gramner wrote:
> On Mon, Feb 13, 2017 at 1:44 PM, James Darnley <jdarn...@obe.tv> wrote:
>> Originally committed to x264 in 1637239a by Henrik Gramner who has
>> agreed to re-license it as LGPL. Original commit message follows.
>>
>>
Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL. Original commit message follows.
x86: Avoid some bypass delays and false dependencies
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and
---
libavcodec/x86/h264_deblock.asm | 32
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index 435c8be56f..509a0dbe0c 100644
--- a/libavcodec/x86/h264_deblock.asm
+++
---
libavcodec/x86/h264_deblock.asm | 5 -
libavcodec/x86/h264_deblock_10bit.asm | 5 -
libavcodec/x86/hevc_deblock.asm | 5 -
libavutil/x86/x86util.asm | 5 +
4 files changed, 5 insertions(+), 15 deletions(-)
diff --git a/libavcodec/x86/h264_deblock.asm
x86-64 only
Yorkfield:
- sse2: 2.16x (434 vs. 201 cycles)
Skylake:
- sse2: 3.04x (378 vs. 124 cycles)
- avx: 3.29x (378 vs. 115 cycles)
---
libavcodec/x86/h264_deblock.asm | 119
libavcodec/x86/h264dsp_init.c | 10
2 files changed, 129
On 2017-01-04 13:17, Rostislav Pehlivanov wrote:
> Forgot to check the return value here, changed locally to:
>
> if (ff_fft_init(&s->ptwo_fft, N - 1, 1) < 0);
> goto fail;
I hope you have not changed it to that, with that semicolon at the end
of the line.
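
To spell out why that semicolon matters: it terminates the if statement with an empty body, so the goto on the next line runs unconditionally. A small sketch of the effect, where fake_init is a hypothetical stand-in for the real init call and not an FFmpeg function:

```c
/* Hypothetical stand-in for an init call that reports failure with a
 * negative return value. It is not the real ff_fft_init(). */
static int fake_init(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Buggy form: the ';' ends the if statement, so the goto below is not
 * guarded by the condition and always runs. */
static int setup_buggy(int should_fail)
{
    if (fake_init(should_fail) < 0);
        goto fail;
    return 0; /* never reached */
fail:
    return -1;
}

/* Fixed form: the goto is the body of the if and only runs on failure. */
static int setup_fixed(int should_fail)
{
    if (fake_init(should_fail) < 0)
        goto fail;
    return 0;
fail:
    return -1;
}
```

With the stray semicolon, setup_buggy reports failure even when the init call succeeds. Compilers can usually warn about this (for example an empty-body warning), which is worth enabling.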
signature.asc
Description:
32-bit msvc.
---
libavcodec/x86/h264_deblock_10bit.asm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavcodec/x86/h264_deblock_10bit.asm
b/libavcodec/x86/h264_deblock_10bit.asm
index 56cf4d6..c295364 100644
--- a/libavcodec/x86/h264_deblock_10bit.asm
+++
On 2016-12-05 19:32, James Darnley wrote:
> Fixed the problem Michael highlighted. Dropped the intra functions until it
> becomes clear why their performance is unexpected. Updated the benchmarks with
> results from a Nehalem and used (slightly) more accurate data.
>
> Regarding
Yorkfield:
- mmx2: 2.45x (279 vs. 114 cycles)
- sse2: 3.36x (279 vs. 83 cycles)
Nehalem:
- mmx2: 2.10x (192 vs. 92 cycles)
- sse2: 2.84x (192 vs. 68 cycles)
Skylake:
- mmx2: 1.75x (170 vs. 97 cycles)
- sse2: 2.47x (170 vs. 69 cycles)
- avx: 2.47x (170 vs. 69 cycles)
---
Yorkfield:
- mmx2: 2.53x (504 vs. 199 cycles)
- sse2: 3.83x (504 vs. 131 cycles)
Nehalem:
- mmx2: 2.42x (365 vs. 151 cycles)
- sse2: 3.56x (365 vs. 103 cycles)
Skylake:
- mmx2: 1.81x (308 vs. 170 cycles)
- sse2: 2.84x (308 vs. 108 cycles)
- avx: 2.93x (308 vs. 105 cycles)
---
---
libavcodec/x86/h264dsp_init.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index c6c643a..7cc0655 100644
--- a/libavcodec/x86/h264dsp_init.c
+++ b/libavcodec/x86/h264dsp_init.c
@@ -110,6 +110,8 @@
---
libavcodec/x86/h264dsp_init.c | 44 +--
1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index 7cc0655..7e16dca 100644
--- a/libavcodec/x86/h264dsp_init.c
+++
to
remove it I will keep the code. However, I will probably not write any more
going forward.
James Darnley (4):
avcodec/h264: clean up and expand x86 function definitions
whitespace changes after last commit
avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
avcodec/h264: mmx2
On 2016-12-02 00:31, Carl Eugen Hoyos wrote:
> 2016-12-01 17:57 GMT+01:00 James Darnley <jdarn...@obe.tv>:
>> Yorkfield:
>> - mmx2: 2.44x faster (278 vs. 114 cycles)
>> - sse2: 3.35x faster (278 vs. 83 cycles)
>>
>> Skylake:
>> - mmx2: 1.69x faster
On 2016-12-01 23:16, Michael Niedermayer wrote:
> On Thu, Dec 01, 2016 at 05:57:44PM +0100, James Darnley wrote:
>> Yorkfield:
>> - mmx2: 2.44x faster (278 vs. 114 cycles)
>> - sse2: 3.35x faster (278 vs. 83 cycles)
>>
>> Skylake:
>> - mmx2: 1.69x faster
---
libavcodec/x86/h264dsp_init.c | 44 +--
1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index 3d35f59..ab270da 100644
--- a/libavcodec/x86/h264dsp_init.c
+++
Yorkfield:
- mmx2: 0.99x faster (180 vs. 181 cycles)
- sse2: 1.05x faster (180 vs. 170 cycles)
Skylake:
- mmx2: 1.21x faster (125 vs. 103 cycles)
- sse2: 1.54x faster (125 vs. 81 cycles)
- avx: 1.29x faster (125 vs. 97 cycles)
---
libavcodec/x86/h264_deblock_10bit.asm | 29
Yorkfield:
- mmx2: 2.54x faster (500 vs. 197 cycles)
- sse2: 3.82x faster (500 vs. 131 cycles)
Skylake:
- mmx2: 1.80x faster (317 vs. 176 cycles)
- sse2: 2.81x faster (317 vs. 113 cycles)
- avx: 2.85x faster (317 vs. 111 cycles)
---
libavcodec/x86/h264_deblock_10bit.asm | 39
---
libavcodec/x86/h264dsp_init.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index c568762..3d35f59 100644
--- a/libavcodec/x86/h264dsp_init.c
+++ b/libavcodec/x86/h264dsp_init.c
@@ -110,6 +110,8 @@
will definitely
try benchmarking it on my Nehalem after sending these emails.
Suggestions greatly appreciated.
James Darnley (6):
avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
avcodec/h264: clean up and expand x86 function definitions
whitespace changes after last commit
Yorkfield:
- mmx2: 2.44x faster (278 vs. 114 cycles)
- sse2: 3.35x faster (278 vs. 83 cycles)
Skylake:
- mmx2: 1.69x faster (169 vs. 100 cycles)
- sse2: 2.34x faster (169 vs. 72 cycles)
- avx: 2.32x faster (169 vs. 73 cycles)
---
libavcodec/x86/h264_deblock_10bit.asm | 118
On 2016-11-30 13:57, Ronald S. Bultje wrote:
> On Wed, Nov 30, 2016 at 7:10 AM, James Darnley <jdarn...@obe.tv> wrote:
>>> Nehalem:
>>> - sse2:
>>>- complex: 4.13x faster (1514 vs. 367 cycles)
>>>- simple: 4.38x fas
On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> 2016-11-29 17:14 GMT+01:00 James Darnley <jdarn...@obe.tv>:
>> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
>>> 2016-11-29 12:52 GMT+01:00 James Darnley <jdarn...@obe.tv>:
>>>> sse2:
>>>> complex:
On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> 2016-11-29 12:52 GMT+01:00 James Darnley <jdarn...@obe.tv>:
>> sse2:
>> complex: 4.13x faster (1514 vs. 367 cycles)
>> simple: 4.38x faster (1836 vs. 419 cycles)
>>
>> avx:
>> complex: 1.07x faster (260
2.1 times faster (401 vs. 194 cycles)
---
libavcodec/x86/h264_deblock.asm | 14 ++
libavcodec/x86/h264dsp_init.c | 2 ++
2 files changed, 16 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index 4aabbc0..fe0ab20 100644
---
sse2:
complex: 4.13x faster (1514 vs. 367 cycles)
simple: 4.38x faster (1836 vs. 419 cycles)
avx:
complex: 1.07x faster (260 vs. 244 cycles)
simple: 1.03x faster (284 vs. 274 cycles)
---
libavcodec/x86/h264_idct_10bit.asm | 53 ++
2.87 times faster (1830 vs. 638 cycles)
---
libavcodec/x86/h264_idct.asm | 32
libavcodec/x86/h264dsp_init.c | 7 ++-
2 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index
As the title says: new assembly for the H.264 decoder. Many thanks to the
authors of the 4:2:0 functions. They were fairly easy to adapt once I
saw the pattern in the C; I just had to find it in the asm.
James Darnley (3):
avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter
avcodec
I want to add a decoder for a game's music, specifically Falcom's Xanadu
Next. I think the audio could be decompressed by adpcm_ms but the
problem comes from the rest of the format.
The file starts with a riff wave header that lies about being pcm and
other values, but I can force the decoder
On 2016-05-18 20:40, Michael Niedermayer wrote:
> This is the version I had in my pending branch and should be the last
> version of the Code of Conduct from March. IIRC there were no further
> comments on the last version, so I am calling everyone to vote on this.
> Everyone because it should
Hello
I've been working on assembly for the vc2 encoder and have reached an
impasse. My code results in very visible errors, very obvious vertical
streaks in the bottom-right half of the image and some low-frequency
effect (I think).
I cannot see the problem in my code so I need some fresh eyes
On 2016-02-27 04:09, Ryan Schott wrote:
> Hello,
>
> I am not sure if this is the right page to post this, but your consulting
> page recommended I use this list. I recently built an audio visualization app
> using html5. I'm currently using that app with xsplit to stream music to an
> RTMP
On 2016-02-11 23:19, Γιώργος Μεταξάκης wrote:
> Subject: [PATCH] mouse dpi awareness
>
> ---
> libavdevice/gdigrab.c | 28 +++-
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/libavdevice/gdigrab.c b/libavdevice/gdigrab.c
> index 4428a34..60f184e
On 2016-02-05 21:20, Henrik Gramner wrote:
> Using rNm and x86inc's stack allocation with a negative value at the same
> time isn't supported, and caused the original stack pointer to be clobbered
> when using a compiler that doesn't support stack alignment.
> ---
>
On 2016-02-05 21:20, Paul B Mahol wrote:
> diff --git a/cmdutils.c b/cmdutils.c
> index e0d2807..03a4836 100644
> --- a/cmdutils.c
> +++ b/cmdutils.c
> @@ -1625,7 +1625,7 @@ int show_filters(void *optctx, const char *opt, const
> char *arg)
>( i &&
On 2016-02-04 19:40, Paul B Mahol wrote:
> +#define FN_ENTRY(name) {#name, script_ ## name}
> +struct fn_entry {
> +const char *name;
> +int (*fn)(lua_State *L);
> +};
> +
> +static const struct fn_entry main_fns[] = {
> +FN_ENTRY(log),
> +FN_ENTRY(frame_count),
> +
On 2016-02-02 23:25, Paul B Mahol wrote:
> Hi,
>
> patch attached.
Nice. I look forward to reading it.
Firstly: why limit it to Lua 5.1? I think it should also support
LuaJIT. While it is ABI compatible with 5.1 this patch would require
its headers to be in "lua-5.1". My suggestion would be
2.6 times faster (366 vs. 142 cycles)
---
Changes since last patch:
- name changed to follow 420 version.
- use one less reg by using r4 more (James Almer's suggestion)
- don't require aligned space in the stack, use a negative value as the cglobal
argument. (perhaps unnecessary now that r6
On 2016-01-31 16:58, Umair Khan wrote:
> Hi,
> Thanks for reply. I did a lot of searching but couldn't get what is
> the proper way to resend the patch with amended commit.
> Should I just do git send-email again ?
> And should I send it to this thread only ? If yes, how ?
Yes. Run send-email
On 2016-01-27 19:27, Stefano Sabatini wrote:
> On date Wednesday 2016-01-27 13:56:38 +0100, James Darnley encoded:
>> On 2016-01-11 18:21, Stefano Sabatini wrote:
>>> +This option shows the following information for each processing step,
>>> +in this order: the user pr
On 2016-01-27 13:09, Stefano Sabatini wrote:
> Simplify parsing and consistency.
Fine.
(Ha. It looks like I forgot to press send on this before going out.)
On 2016-01-11 18:21, Stefano Sabatini wrote:
> +This option shows the following information for each processing step,
> +in this order: the user process time (in microseconds), the elapsed
> +relative time (in microseconds), the processing step type, and the
> +relative stream.
What is "relative
On 2016-01-22 14:44, Michael Niedermayer wrote:
> On Fri, Jan 22, 2016 at 03:53:10AM +0100, James Darnley wrote:
>> Someone on IRC asked for a scale that would fit in a given box. This is the
>> answer. I couldn't see it in the existing examples so I thought I would add
>>
On 2016-01-24 00:24, James Darnley wrote:
> I will try and find out how old the option is.
--mixed was added in 2002:
> https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=commit;h=1050e57c9afee171480510d3277877aca29c0f96
On 2016-01-23 22:11, charlie.arn...@gmail.com wrote:
> +if enabled msvc; then
> +dst_path=$(pwd -W)
> +else
> +dst_path=$(pwd)
> +fi
> +
If using MSVC through Cygwin is supported this would fail. Its pwd
command does not have the -W option.
Most people probably don't use both.
Someone on IRC asked for a scale that would fit in a given box. This is the
answer. I couldn't see it in the existing examples so I thought I would add it.
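
As an illustration of the arithmetic behind "fit in a given box", here is a minimal sketch in C. It is not the filter expression the patch adds to the documentation; the type Size and the function fit_in_box are hypothetical names, and the sketch assumes positive dimensions and that the aspect ratio should be preserved.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch only. */
typedef struct {
    int w;
    int h;
} Size;

/* Scale src_w x src_h so that it fits inside box_w x box_h while keeping
 * the aspect ratio. All arguments are assumed to be positive. The two
 * scale factors box_w/src_w and box_h/src_h are compared by
 * cross-multiplication so that the code stays in integer arithmetic. */
static Size fit_in_box(int src_w, int src_h, int box_w, int box_h)
{
    Size out;

    if ((int64_t)box_w * src_h <= (int64_t)box_h * src_w) {
        /* Width is the limiting dimension. */
        out.w = box_w;
        out.h = (int)((int64_t)src_h * box_w / src_w);
    } else {
        /* Height is the limiting dimension. */
        out.h = box_h;
        out.w = (int)((int64_t)src_w * box_h / src_h);
    }
    return out;
}
```

Note that this sketch also enlarges inputs smaller than the box, and integer division rounds the derived dimension down; a real filter chain may need even dimensions for subsampled pixel formats.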
---
doc/filters.texi | 6 ++
1 file changed, 6 insertions(+)
diff --git a/doc/filters.texi b/doc/filters.texi
index dd1f203..56236c6
On 2016-01-17 23:59, Henrik Gramner wrote:
> The following patches were recently pushed to x264.
>
> Geza Lore (1):
> x86inc: Add debug symbols indicating sizes of compiled functions
>
> Henrik Gramner (6):
> x86inc: Be more verbose in assertion failures
> x86inc: Improve FMA instruction
On 2016-01-17 03:11, James Darnley wrote:
> On 2016-01-15 20:07, James Darnley wrote:
>> ...
>
> If nobody has further comments about the patches I will probably push
> these after I wake up.
>
A little later than planned but now pushed.
On 2016-01-15 20:07, James Darnley wrote:
> ...
If nobody has further comments about the patches I will probably push
these after I wake up.
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner
---
The only changes here are the ones suggested by Henrik and a whitespace change
for alignment at the function definition in v210enc_init.c
---
libavcodec/v210enc.c | 5 +++--
The sample factor must be the same for both 8- and 10-bit functions chosen
otherwise the output will be incorrect.
---
Should I squash this one too?
---
libavcodec/v210enc.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/libavcodec/v210enc.h b/libavcodec/v210enc.h
index
Around 25% faster than the ssse3 version.
---
New patch. Should I squash this into the previous patch before committing?
---
libavcodec/v210enc.c | 11 +--
libavcodec/x86/constants.c| 3 ++-
libavcodec/x86/constants.h| 2 +-
libavcodec/x86/v210enc.asm| 20
---
Is the name I chose for the 10-bit tests (v210-10) okay?
---
tests/fate/vcodec.mak| 3 ++-
tests/ref/vsynth/vsynth1-v210-10 | 4
tests/ref/vsynth/vsynth2-v210-10 | 4
tests/ref/vsynth/vsynth3-v210-10 | 4
tests/ref/vsynth/vsynth_lena-v210-10 | 4
On 2016-01-15 04:21, Ronald S. Bultje wrote:
> If you don't need r%dm (looks like you don't, but didn't check
> exhaustively), you can also use a negative stack size (0 - mmsize -
> ARCH_X86_64 * 2 * mmsize), then it will not create a stack pointer.
I am already using r[0-3]m for storage. (A
On 2016-01-15 03:55, James Almer wrote:
> On 1/14/2016 11:05 PM, James Darnley wrote:
>> diff --git a/libavcodec/x86/h264_deblock.asm
>> b/libavcodec/x86/h264_deblock.asm
>> index 5151f3c..20f0814 100644
>> --- a/libavcodec/x86/h264_deblock.asm
>> +++
On 2016-01-15 21:55, James Almer wrote:
> On 1/15/2016 5:00 PM, James Darnley wrote:
>> On 2016-01-15 03:55, James Almer wrote:
>>> On 1/14/2016 11:05 PM, James Darnley wrote:
>>>> diff --git a/libavcodec/x86/h264_deblock.asm
>>>> b/libavcodec/x86/h2
2.6 times faster
---
I have one question now. Should I make the function name match the existing
assembly deblock/loop filter functions? I took the current name from the C (as
I was originally trying to use a gather instruction but that didn't offer any
benefit).
---
On 2016-01-14 21:42, Henrik Gramner wrote:
> On Thu, Jan 14, 2016 at 9:27 PM, James Darnley <james.darn...@gmail.com>
> wrote:
>> On 2016-01-14 20:21, Henrik Gramner wrote:
>>> xmN can be used unconditionally which gets rid of the %else. E.g.
>>>
On 2016-01-14 20:21, Henrik Gramner wrote:
> On Wed, Jan 13, 2016 at 4:55 PM, James Darnley <james.darn...@gmail.com>
> wrote:
>> diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
>> index 859e2d9..a8f3d3c 100644
>> --- a/libavcodec/x86/v210e
Around 35% faster than the avx version.
---
libavcodec/v210enc.c | 5 ++--
libavcodec/v210enc.h | 1 +
libavcodec/x86/v210enc.asm| 53 +++
libavcodec/x86/v210enc_init.c | 7 ++
4 files changed, 49 insertions(+), 17 deletions(-)
On 2015-12-31 07:02, Kieran Kunhya wrote:
>> Apart from that, again from a quick glance, there are a ton of
>> mallocs/frees. Can these somehow get consolidated?
>
> Yes, that's what I don't know how to solve easily. They should of
> course be a single allocated buffer that's reused.
Forgive me
On 2015-12-04 15:33, Nicolas George wrote:
> Why do we need a new options system?
>
> Most importantly: escaping hell
OMG yes! I have seen several times the amount of backslashes Windows
users are forced to use to provide a path to some of the filters.
You raise a lot of good points that
On 2015-12-04 06:29, Ryan Williams wrote:
> EDIT: Fixed errors in syntax.
>
> TLDR, Would you consider an 'underlay' filter or perhaps an option on the
> 'overlay' filter that reverses the order of the input labels?
>
> Consider the following shorthand syntax "[input][a] overlay, [b] overlay,
On 2015-11-19 13:52, Ganesh Ajjanagadde wrote:
> diff --git a/libavfilter/af_dynaudnorm.c b/libavfilter/af_dynaudnorm.c
>> index 8f0c2d0..62a2653 100644
>> --- a/libavfilter/af_dynaudnorm.c
>> +++ b/libavfilter/af_dynaudnorm.c
>> @@ -227,8 +227,6 @@ static int cqueue_pop(cqueue *q)
>> return
On 2015-11-13 15:23, Ganesh Ajjanagadde wrote:
> diff --git a/libavutil/version.h b/libavutil/version.h
> index 909f9a6..ea10ff0 100644
> --- a/libavutil/version.h
> +++ b/libavutil/version.h
> @@ -56,8 +56,8 @@
> */
>
> #define LIBAVUTIL_VERSION_MAJOR 55
> -#define LIBAVUTIL_VERSION_MINOR
On 2015-10-23 13:54, Hendrik Leppkes wrote:
> The only reason the combination of frame threads and HWAccel was
> considered useful is to allow a seamless fallback to multi-threaded
> software decoding if the HWAccel is not available, however the issues
> outlined above far outweigh this.
On 2015-10-21 12:18, wm4 wrote:
> with size_t/ptrdiff_t
> being 128 bit, and a new "long long long int" type (I swear, they will
> do it, even if that type name looks horrible).
Please no! Just require a C99 style uint128_t/int128_t type.
On 2015-10-21 14:44, Clément Bœsch wrote:
> On Wed, Oct 21, 2015 at 06:00:21AM -0400, Ganesh Ajjanagadde wrote:
> [...]
>> why don't you spend 5 minutes trying to outline to beginners like me
>> what is "actually important" in your view?
>>
>
> According to the first 100 answers of the survey,
On 2015-10-10 23:06, Ganesh Ajjanagadde wrote:
> ...
Is the greatest common divisor (yes, I had to look that up) actually
used anywhere that is slow and needs to be fast?
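
For context, a typical use of a GCD in timing code is reducing a ratio such as a timebase or frame rate to lowest terms. The sketch below uses plain Euclid's algorithm; it is not FFmpeg's av_gcd implementation, and gcd_i64 and reduce_ratio are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. Returns a non-negative
 * result; gcd_i64(0, 0) is 0. Hypothetical helper, not av_gcd(). */
static int64_t gcd_i64(int64_t a, int64_t b)
{
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a < 0 ? -a : a;
}

/* Reduce num/den to lowest terms in place, as one might do for a
 * timebase or frame rate. Leaves 0/0 untouched. */
static void reduce_ratio(int64_t *num, int64_t *den)
{
    int64_t g = gcd_i64(*num, *den);

    if (g != 0) {
        *num /= g;
        *den /= g;
    }
}
```

Each reduction costs only a handful of modulo operations for values of this size, which is consistent with the question of whether such call sites are performance-critical at all.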
All the uses of 'av_gcd' found by grep appear to be dealing with timing. I
see framerate, timebase, scale. I do see uses
On 2015-10-10 00:43, Ganesh Ajjanagadde wrote:
> During a build, a lot of *.o-hash files are created - had not noticed
> this as they are usually dumped in tmpfs on Linux. However, they
> sometimes are present during a long build in the project directory, making it
> annoying to commit while the
On 2015-10-09 14:46, Nicolas George wrote:
> Le quartidi 4 vendémiaire, an CCXXIV, James Darnley a écrit :
>> I can. You should find it attached to this email. I cleaned it up and
>> put two test cases of data into the file. You will need Lua and the
>> Lua-iconv mod
On 2015-10-03 04:08, Ronald S. Bultje wrote:
> Hi,
>
> On Fri, Oct 2, 2015 at 4:58 PM, Hendrik Leppkes <h.lepp...@gmail.com> wrote:
>
>> On Fri, Oct 2, 2015 at 7:16 PM, Timothy Gu <timothyg...@gmail.com> wrote:
>>> On Fri, Oct 2, 2015 at 10:08 AM James Darn
---
libavcodec/x86/ac3dsp.asm | 2 +-
libavcodec/x86/bswapdsp.asm | 3 +--
libavcodec/x86/diracdsp_yasm.asm| 6 +-
libavcodec/x86/dwt_yasm.asm | 6 +-
libavcodec/x86/h263_loopfilter.asm | 2 +-
libavcodec/x86/h264_chromamc.asm
---
libswscale/x86/Makefile | 1 +
libswscale/x86/constants.asm | 1 +
libswscale/x86/output.asm| 5 +
tests/ref/fate/source| 1 +
4 files changed, 4 insertions(+), 4 deletions(-)
create mode 100644 libswscale/x86/constants.asm
diff --git a/libswscale/x86/Makefile
---
tests/ref/fate/source | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/ref/fate/source b/tests/ref/fate/source
index 781f4cd..c1383dd 100644
--- a/tests/ref/fate/source
+++ b/tests/ref/fate/source
@@ -9,6 +9,7 @@ libavcodec/reverse.c
libavcodec/x86/constants.asm
On 2015-10-02 19:16, Timothy Gu wrote:
> On Fri, Oct 2, 2015 at 10:08 AM James Darnley <james.darn...@gmail.com>
> wrote:
>
>> The third patch uses them in the remaining inline assembly.
>>
>
> That's the crux of the problem: inline asm uses these constants, but
---
So here is the test file I was working on with the thoughts I had.
---
; This section is intended to possibly be included in x86inc.asm
; Align all constant to 32 bytes whether they are used in AVX code or not.
%assign constant_align 32
; Value to be used as padding to achieve alignment.
---
libavcodec/x86/cavsdsp.c| 2 +-
libavcodec/x86/constants.h | 66 -
libavcodec/x86/inline_asm.h | 2 +-
libavcodec/x86/vc1dsp_mmx.c | 2 +-
4 files changed, 3 insertions(+), 69 deletions(-)
delete mode 100644 libavcodec/x86/constants.h
diff
---
libavfilter/x86/Makefile | 2 ++
libavfilter/x86/af_volume.asm | 3 +--
libavfilter/x86/constants.asm | 1 +
libavfilter/x86/vf_fspp.asm| 3 +--
libavfilter/x86/vf_removegrain.asm | 3 +--
libavfilter/x86/vf_ssim.asm| 2 +-
libavfilter/x86/vf_yadif.asm
---
libavcodec/x86/Makefile | 2 +-
libavcodec/x86/constants.asm | 1 +
libavcodec/x86/constants.c | 81
tests/ref/fate/source| 1 +
4 files changed, 3 insertions(+), 82 deletions(-)
create mode 100644 libavcodec/x86/constants.asm
that it would eliminate those almost pointless
files.
--
James Darnley (7):
avutil: add shared assembly constants
avcodec: replace old C file with new assembly constants
avcodec: use new constants in C inline assembly
avcodec: use new constants in assembly
avfilter: use new constants in assembly
On 2015-10-01 19:25, Paul B Mahol wrote:
> +cglobal maskedmerge8, 10, 11, 3, 0, bsrc, blinesize, osrc, olinesize, msrc,
> mlinesize, dst, dlinesize, w, h
You need a guard to prevent this being compiled on 32-bit x86.
> +lea bsrcq, [bsrcq+blinesizeq]
> +lea osrcq, [osrcq+olinesizeq]
> +lea
On 2015-09-29 21:56, Clément Bœsch wrote:
> On Tue, Sep 29, 2015 at 09:21:53PM +0200, Hendrik Leppkes wrote:
>> I agree, we have patchcheck for typo checking.
>
> A lot of people do not run patchcheck (I personally never do, and given
> that we fix typo on a regular basis I'm probably not the