On Fri, 7 Jun 2024 at 15:05, Anton Khirnov wrote:
> They are useless duplicates of corresponding AVCodecContext fields.
>
FYI, the intent of one field was to be a bit field to simultaneously
indicate/use frame threading and one of tile, WPP or slice threading, as
that was also supported in
On Fri, 7 Jun 2024 at 15:07, Anton Khirnov wrote:
> if (pps->tiles_enabled_flag &&
> pps->tile_id[ctb_addr_ts] != pps->tile_id[ctb_addr_ts - 1]) {
> int ret;
> -if (s->threads_number == 1)
> +if (!is_wpp)
> ret =
On Thu, 6 Jun 2024 at 08:11, Rémi Denis-Courmont wrote:
> >James Almer:
> >> uyvytoyuv422_c: 23991.8
> >> uyvytoyuv422_sse2: 2817.8
> >> uyvytoyuv422_avx: 2819.3
> >
> >Why don't you nuke the avx version in a follow-up patch?
>
> Same problem with the RGBA stuff as well. Are the AVX functions
all coefficient banks are aligned. Another use for it
is you can directly use the address in some instruction instead of
using/wasting a reg for holding the data.
--
Christophe Gisquet
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg
On Fri, 8 Sep 2023 at 10:20, Christophe Gisquet wrote:
> This patchset requires my previous one improving the cached bitstream
> reader, and serves as its justification. It, basically, moves to using
> VLC wherever possible, and in particular when codewords are
> sufficiently s
Hello,
On Fri, 8 Sep 2023 at 00:39, Andreas Rheinhardt wrote:
> This is problematic, because you seem to think that bits_peek(bc, bits)
> ensures that there are at least `bits` available in the cache;
read_vlc* also makes that assumption? Anyway, I'd put that behaviour
(of checking) under
Hello
On Sun, 10 Sep 2023 at 17:40, Andreas Rheinhardt wrote:
> Another solution would be to use void* instead of GetBitContext* in the
> header and in the implementation and then convert this void* to
> GetBitContext* in the function.
The forward declaration will be enough.
> I do not
Hello,
On Fri, 8 Sep 2023 at 11:57, Andreas Rheinhardt wrote:
> >> +#define CACHED_BITSTREAM_READER 1
> >
> > This should be in the commit switching to the cached bitstream reader.
>
> Correction: This header is included in videotoolbox.c and there is other
> stuff that also includes
On Fri, 8 Sep 2023 at 11:19, Andreas Rheinhardt wrote:
> > -return 0;
> > +return 0;
>
> You are adding trailing whitespace.
Sorry, will fix. I had to do some of this work on a misconfigured machine.
> > +#include "libavutil/timer.h"
>
> You really need to look over your patches
On Fri, 8 Sep 2023 at 10:15, Christophe Gisquet wrote:
>
> Summary of changes
git send-email --cover-letter apparently didn't let me edit one, so here goes.
This patchset requires my previous one improving the cached bitstream
reader, and serves as its justification. It, basically,
One indirection less, around 1% speedup.
---
libavcodec/proresdec2.c | 16 +---
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index b20021c622..85f81d92d3 100644
--- a/libavcodec/proresdec2.c
+++
---
libavcodec/proresdec2.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 02e1d82d00..b20021c622 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -534,9 +534,9 @@ static int
Basically, the catch-all codebook is for codewords that are long on average,
and with a distribution such that the 3-step VLC reading is not
efficient. Furthermore, the complete unrolling makes the actual code
smaller than the macro, and as the maximum codelength is smaller,
smaller amounts of bits,
Pretty harmless, but not much gained either.
---
libavcodec/proresdec2.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 91c689d9ef..e3cef402d7 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
x86/x64: 61/52 -> 55/46
Around 7-10% speedup.
Run and DC do not lend themselves to such changes, likely because
their distribution is less skewed, and they need more VLC read
iterations on average.
---
libavcodec/proresdec.h | 1 +
libavcodec/proresdec2.c | 77
Having the various orders and offsets stored in a codebook is compact
but causes additional computations. Using a table of the precomputed
results instead achieves some speedup at the cost of ~132 bytes.
Around 5% speedup.
---
libavcodec/proresdec2.c | 54
Summary of changes
- move back to regular, non-macro, get_bits API
- reduce the lookup to switch the coding method
- shorter reads wherever possible, in particular for the end of bitstream
(16 bits instead of 32, as per the above)
There are cases that really need longer lengths (larger EG
This would have also helped a bitstream reader with a cache
of 32 bits.
---
libavcodec/bitstream_template.h | 14 --
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/libavcodec/bitstream_template.h b/libavcodec/bitstream_template.h
index 3f90fc6a07..c27e8108b2 100644
---
Bitstream readers sometimes have already checked there are enough
bits, and the check is redundant.
---
libavcodec/bitstream.h | 8 +---
libavcodec/bitstream_template.h | 22 +++---
libavcodec/get_bits.h | 1 +
3 files changed, 17 insertions(+), 14
Preparatory patch independently beneficial. Note: all of these
are for the sake of simplicity, from 2020, but needed cleaner
rebasing.
Christophe Gisquet (2):
Expose and start using skip_remaining
read_xbits: request fewer bits
libavcodec/bitstream.h | 8 +---
libavcodec
Hello,
On Fri, 28 Oct 2022 at 20:57, James Darnley wrote:
> +%else
> +pand m1, m6, m1
> +pandn m0, m6, m0
> por m0, m0, m1
> +%endif
Isn't that pattern a vpblendb or some such ?
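
For reference, the pand/pandn/por sequence quoted above is a bitwise select. Below is a minimal C sketch of that idea; the function names are made up for illustration and nothing here is taken from the patch. Note that a byte blend in the style of pblendvb only looks at the top bit of each mask byte, so the two forms agree only when every mask byte is 0x00 or 0xFF.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select, mirroring pand/pandn/por:
 * bits of b where mask is 1, bits of a where mask is 0. */
static inline uint8_t select_bits(uint8_t mask, uint8_t a, uint8_t b)
{
    return (uint8_t)((mask & b) | (~mask & a));
}

/* Byte blend in the style of pblendvb: only the top bit of the
 * mask byte decides which whole byte is taken. */
static inline uint8_t blend_byte(uint8_t mask, uint8_t a, uint8_t b)
{
    return (mask & 0x80) ? b : a;
}
```

With a mask of 0x0F the two differ: select_bits() mixes nibbles from both inputs, while blend_byte() returns a unchanged.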
--
Christophe
Hi,
On Sun, 31 Jan 2021 at 14:11, Michael Niedermayer wrote:
> This transmutes the following dog into a hyperspace neon dog
> ./ffplay DNxHDtest2.mov
I'm not sure I prefer the correct version, but here goes. This sample
is YUV444 basically, the reverse of what I've seen in another sample.
On Sat, 30 Jan 2021 at 10:54, Paul B Mahol wrote:
> Are you telling us that you do not have specification for this?
Yes, cf. cover letter. In fact, this patch could be dropped (not sure).
> Last time I checked AVID files had uncompressed alpha that did not match
> with specification at
From: Christophe Gisquet
This consists of just ignoring the alpha at the end of the bitstream
---
libavcodec/dnxhddec.c | 24 ++--
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c
index 11da1c286c..1de95996cf 100644
From: Christophe Gisquet
This multiplies the framesize by 1.5 when there is alpha, for the CIDs
allowing alpha. In addition, a new header is checked, because the alpha
marking seems to be different.
---
libavcodec/dnxhd_parser.c | 7 ---
libavcodec/dnxhddata.c| 17
From: Christophe Gisquet
Fix the logic around checking the ACT flag per MB and row.
This also requires adding a 444 path to swap channels into
the ffmpeg formats, as they are GBR, and not RGB.
---
libavcodec/dnxhddec.c | 64 +++
1 file changed, 47
-length.
Christophe Gisquet (4):
lav/dnxhd: better support 4:2:0 in DNXHR profiles
lav/dnxhd: CID 1256 is RGB, not BGR or YUV444
dnxhd: add partial alpha support for parsing
dnxhddec: partial alpha support
libavcodec/dnxhd_parser.c | 7 +-
libavcodec/dnxhddata.c| 17 +++--
libavcodec
From: Christophe Gisquet
Where they are allowed. No validation of profile + colorformat is performed,
however.
---
libavcodec/dnxhddec.c | 55 +++
1 file changed, 40 insertions(+), 15 deletions(-)
diff --git a/libavcodec/dnxhddec.c b/libavcodec
Hi,
On Sat, 5 Dec 2020 at 15:59, Jean-Baptiste Kempf wrote:
> +After all the emails are in, the TC has 96 hours to give its final decision.
> +
> +### Within TC
> +
> +In the internal case, the TC has 96 hours to give its final decision.
How is the unavailability of any TC member handled?
Hi,
On Thu, 29 Oct 2020 at 14:57, Christophe Gisquet wrote:
> Hi, as you are the only one active on this decoder, this shouldn't matter,
> but:
> down the line, the ffmpeg project has no way of testing if someone
> breaks even the basic parsing of these extensions in the futur
Forgot to add this:
On Thu, 29 Oct 2020 at 14:51, Christophe Gisquet wrote:
> > [1] https://github.com/oddstone/FFmpeg/commits/rext1
>
> This has additional fixes (which look good, haven't really delved
> into them) that unfortunately don't fix:
And I suspect you need thes
Hi,
On Tue, 29 Sep 2020 at 17:55, Linjie Fu wrote:
> I didn’t see such plans for now, hence adding sufficient error message
> seems to be a proper way.
Hi, as you are the only one active on this decoder, this shouldn't matter, but:
down the line, the ffmpeg project has no way of testing if
Hi,
On Fri, 2 Oct 2020 at 18:12, Guangxin Xu wrote:
> Most scc conformance clips have tiles.
> But currently, the hevc software decoder has many issues with tile cabac
> saving and loading.
> We'd better fix them before starting to implement the scc tool.
>
> I have queued up some patches to address
On Sat, 29 Aug 2020 at 07:52, Xu Guangxin wrote:
> you can download it from:
> https://www.itu.int/wftp3/av-arch/jctvc-site/bitstream_exchange/draft_conformance/RExt/WPP_HIGH_TP_444_8BIT_RExt_Apple_2.bit
Just for the record, this is now
On Wed, 9 Sep 2020 at 07:51, Guangxin Xu wrote:
> Hi Mickaël & all,
> any suggestions?
The patch is almost good, though I would have hoped for a link to a
relevant part of the spec and TableStatCoeff* beyond just "9.3".
Though as I suspected, there is probably something missing. Maybe
around
Hi,
On Wed, 15 Apr 2020 at 00:41, Carl Eugen Hoyos wrote:
> Will test on ppc32 over the weekend.
Please do. Testing on different endianness and different arch is
probably what this patchset lacks the most.
If you can, on this arch, please test just before and just after
Hi,
On Tue, 14 Apr 2020 at 12:25, Christophe Gisquet wrote:
> if (is_le)
> -s->cache |= (cache_type)AV_RL_HALF(s->ptr) << s->bits_left;
> +s->cache |= (cache_type)AV_RL_ALL(s->ptr) << s->bits_left;
> else
> -
Described as variant 4 in the linked article.
Results in faster and smaller code. Also, the "refill_all" cases
(usually when we want to empty/fill it) have been inlined.
---
libavcodec/get_bits.h | 103 +-
1 file changed, 41 insertions(+), 62
The main effect is actually code size reduction, due to the smaller
refill code (or difference in inlining decision), e.g. on Win32 of
{magicyuv,huffyuvdec,utvideodec}.o as follows:
19068/41460/16512 -> 18892/40760/16448
It should also be a small speedup (because it simplifies the address
When the entry says to continue reading, this means the current read
will be entirely skipped. Small object size reduction, depending on
inlining.
---
libavcodec/get_bits.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
The new code is guaranteed to read at least 32bits, which is likely ok with
the usual case that get_bits without cache can read up to 25.
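
As a reminder of where a figure like 25 can come from: a reader that loads 32 bits at a byte-aligned address and then shifts by the bit offset (0 to 7) has only 32 - 7 = 25 bits guaranteed in the worst case. The sketch below illustrates that arithmetic only; the names are hypothetical and this is not the get_bits.h code.

```c
#include <assert.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian 32-bit word. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Peek n bits (1 <= n <= 25) starting at absolute bit position bit_pos.
 * The caller must ensure 4 bytes are readable from the containing byte. */
static uint32_t show_bits_be(const uint8_t *buf, unsigned bit_pos, unsigned n)
{
    /* The left shift discards up to 7 already-consumed bits. */
    uint32_t w = load_be32(buf + (bit_pos >> 3)) << (bit_pos & 7);
    return w >> (32 - n);
}
```

At bit offset 7, the 25 remaining bits of the loaded word are exactly what show_bits_be(buf, 7, 25) returns; asking for a 26th bit would need another byte.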
---
libavcodec/get_bits.h | 29 ++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/libavcodec/get_bits.h
that
can not guarantee the usual number of bits, are needed.
Note: the MVHA sample was generated using the pattern generation from
VirtualDub2 (Tools->Create test video->zone plates) and the MVHA codec,
and is 235186 bytes.
Christophe Gisquet (7):
fate: add a MVHA test
get_bits: s
---
tests/fate/video.mak | 3 +++
tests/ref/fate/mvha | 6 ++
2 files changed, 9 insertions(+)
create mode 100644 tests/ref/fate/mvha
diff --git a/tests/fate/video.mak b/tests/fate/video.mak
index d2d43e518d..8e54718c16 100644
--- a/tests/fate/video.mak
+++ b/tests/fate/video.mak
@@ -364,6
Therefore, also activate it under ARCH_X86 (testing for more archs welcome)
for the only codecs supporting said cache reader.
For UTVideo, on 8-bit samples and ARCH_X86_32 (X86_64 being unaffected),
timings for one line go from ~19.4k to 15.1k and 16.5k (roughly 17% speedup).
---
Also allows it to not break 32bits readers.
---
libavcodec/get_bits.h | 20 ++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index cb4df98e54..59bfbdd88b 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
Hi,
sorry if I'm or was confusing, I'm best-effort here.
2016-06-03 21:13 GMT+02:00 Michael Niedermayer :
> > FOUR.TWO)
[...]
> i want some assistant to help with daily server admin duties
> most root admins we have help and contribute but are often busy
> raz recently
Hi,
here is, I think, a list of things left to do. I remember saste doing it
on some occasions.
Please comment on whether you think I have pointed out an actual action to
perform. Don't mind the details for now, it's just to get the train
going.
2016-05-30 10:49 GMT+02:00 Michael Niedermayer
2016-05-30 17:50 GMT+02:00 Paul B Mahol :
> On 5/30/16, Piotr Bandurski wrote:
>> Hi,
>>
>>> patch attached.
>>
>> Is decoding of interlaced video supported? Because I get here invalid
>> output.
>>
>> Also crash happens with this fuzzed file:
>>
>>
Hi,
2016-05-30 15:09 GMT+02:00 Paul B Mahol :
>> ffmpeg seems to have libavutil/qsort.h, but I don't even know how much
>> effort is needed to use it here.
>
> Changed, doesn't help but maybe will for other archs.
Hi,
2016-05-29 21:51 GMT+02:00 Paul B Mahol :
> +typedef struct Slice {
> +uint32_t start;
> +uint32_t size;
> +} Slice;
I'm not a security expert, but is there a reason for not using plain int there ?
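
To illustrate why the type matters for validation (a sketch only, not what the patch does): with two uint32_t fields, the sum start + size can wrap around in 32-bit arithmetic, so a bounds check is safer in 64 bits. The helper name and the buf_size parameter are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Slice {
    uint32_t start;
    uint32_t size;
} Slice;

/* Returns 1 if [start, start + size) fits in a buffer of buf_size bytes.
 * The sum is computed in 64 bits, so it cannot wrap around. */
static int slice_is_valid(const Slice *s, uint32_t buf_size)
{
    return (uint64_t)s->start + s->size <= buf_size;
}
```

A slice such as { 0xFFFFFFF0, 0x20 } would pass a naive 32-bit check against a 48-byte buffer, since the sum wraps to 0x10, but is rejected here.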
> +typedef struct MagicYUVContext {
> +AVFrame*p;
>
Hi,
2016-05-20 1:55 GMT+02:00 Lukasz Marek :
> Is Derek revoked to commit or what? Couldn't he just commit this patch and
> leave? :P I was a problem for some people, but I see they still have
> problems. Let people with problems go away with their problems.
Sorry if
Hi,
2016-05-18 20:40 GMT+02:00 Michael Niedermayer :
> Please state clearly if you agree to the text or if not.
> we can extend and tune it later and do another vote if there are more
> suggestions
I agree to having a CoC.
This text is a first step, so I'm ok with it,
Hi,
2016-05-20 2:38 GMT+02:00 Timothy Gu :
>> > Note how it has a list of specific violations, instead of vague things like
>> > "Be excellent" that the FFmpeg one has.
>> > Note how it has a huge section on disciplinary procedures.
[...]
> I have to agree with Kieran here.
2016-05-13 11:48 GMT+02:00 foo86 :
> -unsigned int v = get_unary(gb, 1, 128);
> +unsigned int v = get_unary(gb, 1, get_bits_left(gb));
Not that the patch is not ok, but I have a few uneducated questions:
1) Given the get_bits_long(gb, k) afterwards, won't that code
2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> The loop lengths are guaranteed to be multiples of 8, so this
> unrolling is safe and allows exploiting execution ports.
>
> For int32 version: 68 -> 58c.
Ping?
This was ok'ed by James irr
2016-05-07 21:48 GMT+02:00 Rostislav Pehlivanov :
> The costliest part of the encoder right now is encoding the coefficients
> (~36%). Slightly less-costly is rate control (~31%), and after that is the
> transform (~12%). There really isn't anything else, other than 3 copies
2016-05-07 19:12 GMT+02:00 Rostislav Pehlivanov :
> The problem is that with particularly complex images and especially at
> high bit depths and 5-level transforms the coefficients would overflow
I guess it also depends on the transform type, so that counts also for
the last
Hi,
2016-05-06 2:19 GMT+02:00 Rostislav Pehlivanov :
> I plan to merge the fate tests as well tomorrow or on Saturday when I'll
> have time to quickly fix bugs which appear on platforms I haven't tested
> the encoder on. Hopefully none, but you never know.
Sure, makes sense.
---
tests/fate/vcodec.mak | 17 -
tests/ref/vsynth/vsynth1-vc2-420p | 4
tests/ref/vsynth/vsynth1-vc2-420p10 | 4
tests/ref/vsynth/vsynth1-vc2-420p12 | 4
tests/ref/vsynth/vsynth1-vc2-422p | 4
The slice prefix is 0 in the reference encoder and the decoder ignores it.
Writing 0 there seems like the best temporary solution.
The padding could have contained uninitialized data, but reference VC2
encoders put 0xFF there, hence the memset value.
Overall this allows producing bitstreams with
Hi,
2016-05-04 3:06 GMT+02:00 Rostislav Pehlivanov :
> vc2hqencode is not the reference encoder, vc2-reference is. It's even worse
> though.
Sorry, I thought authoritative could mean "from the authors", so I
didn't mean it as "the" reference/"the authority". Just a good
On 3 May 2016 at 22:15, "Rostislav Pehlivanov" <atomnu...@gmail.com> wrote:
>
> On 3 May 2016 at 19:16, Christophe Gisquet <christophe.gisq...@gmail.com>
> wrote:
> >
> >
> > Btw, afaik, the padding is 0xFF, so expecting 0 in the buffer there
>
Hi,
2016-05-03 19:24 GMT+02:00 Hendrik Leppkes :
>> +// The reference decoder ignores it, and its typical length is 0
>> +memset(put_bits_ptr(pb), 0, s->prefix_bytes);
>> skip_put_bytes(pb, s->prefix_bytes);
>> +
>
> I don't suppose we have a function to just
are added. Suggestions for being even
more concise in the target/rules are welcome.
Christophe Gisquet (2):
vc2enc: prevent random data
vc2: fate tests
libavcodec/vc2enc.c | 4
tests/fate/vcodec.mak | 17 -
tests/ref/vsynth/vsynth1-vc2
The slice prefix is 0 in the reference encoder and the decoder ignores it.
Writing 0 there seems like the best temporary solution.
The padding could have contained uninitialized data, but its standardized value
is 0xFF, hence the memset value.
Overall this allows producing bitstreams with no
2016-05-03 19:06 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
[SNIP]
Incorrect padding used (0 instead of 0xFF), fixed in that patch series.
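
For clarity, the intended layout is: prefix bytes set to 0, then the coded data, then padding set to 0xFF. The sketch below shows that layout on a plain byte buffer; write_slice() and its parameters are hypothetical and do not reflect the actual vc2enc.c code, which goes through the put_bits API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill dst[0..total) with a zeroed prefix, the payload, then 0xFF padding.
 * Returns 0 on success, -1 if prefix + payload does not fit in total. */
static int write_slice(uint8_t *dst, size_t total, size_t prefix_bytes,
                       const uint8_t *payload, size_t payload_bytes)
{
    if (prefix_bytes > total || payload_bytes > total - prefix_bytes)
        return -1;
    memset(dst, 0x00, prefix_bytes);                    /* slice prefix */
    memcpy(dst + prefix_bytes, payload, payload_bytes); /* coded data   */
    memset(dst + prefix_bytes + payload_bytes, 0xFF,    /* padding      */
           total - prefix_bytes - payload_bytes);
    return 0;
}
```

Setting every byte explicitly is what keeps uninitialized data out of the output, whatever the buffer held before.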
--
Christophe
From 22ff25711062fb1ca30da1674fd622fd6f81c8e3 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christ
2016-05-03 19:06 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +memset(pb->buf_ptr, 0, pad_c);
Commit squashing fail, attached patch should fix that. This
unfortunately requires updating the fate tests as I generated them
from this squashing.
--
Chr
---
tests/fate/vcodec.mak | 17 -
tests/ref/vsynth/vsynth1-vc2-420p | 4
tests/ref/vsynth/vsynth1-vc2-420p10 | 4
tests/ref/vsynth/vsynth1-vc2-420p12 | 4
tests/ref/vsynth/vsynth1-vc2-422p | 4
Hi,
2016-05-02 16:02 GMT+02:00 Michael Niedermayer :
>> +fate-lossless-wma24-rawtile: CMD = md5 -i
>> $(TARGET_SAMPLES)/lossless-audio/g2_24bit.wma -f s24le
>
> where can i find that file ?
> i assume i should upload it ?
Sorry, I thought we had discussed it in this
2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> This is done by actually handling the "prev_values" in the cascaded LMS data
> as if it were int16_t, thus requiring the computations to be switched at
> various locations.
Patch update s
h raw pcm tiles then.
--
Christophe
From 584999fcce24585f989d2dc770e8c7c85aa19db7 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Mon, 18 Apr 2016 12:53:21 +0200
Subject: [PATCH 1/4] fate: wma: add lossless 24bits tests
Should evaluate coefficients and raw pcm t
Hi,
2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +fate-lossless-wma24-2: CMD = md5 -i
> $(TARGET_SAMPLES)/lossless-audio/Mega_Weird_Audio_Test_24bit.wma -f s24le
The recent fixes actually changed the crc for that file.
Is https://trac.ffmpeg.or
This is done by actually handling the "prev_values" in the cascaded LMS data
as if it were int16_t, thus requiring the computations to be switched at
various locations.
---
libavcodec/wmalosslessdec.c | 109 +++-
1 file changed, 58 insertions(+), 51 deletions(-)
The only user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version does not make sense.
Timings: 68 -> 49 cycles
---
libavcodec/x86/lossless_audiodsp.asm| 33 +
libavcodec/x86/lossless_audiodsp_init.c |
The loop lengths are guaranteed to be multiples of 8, so this
unrolling is safe and allows exploiting execution ports.
For int32 version: 68 -> 58c.
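
To show the kind of unrolling meant here, below is a rough C sketch of a combined dot product and multiply-add, with a straightforward version and a version unrolled by 4. The function names and the all-int32_t signature are assumptions for illustration, not the lossless_audiodsp.c code. The unrolled loop only terminates correctly when order is a positive multiple of 4, which a multiple of 8 satisfies.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: returns sum(v1[i] * v2[i]) and updates v1[i] += mul * v3[i]. */
static int32_t dot_and_madd_ref(int32_t *v1, const int32_t *v2,
                                const int32_t *v3, int order, int mul)
{
    int32_t res = 0;
    for (int i = 0; i < order; i++) {
        res   += v1[i] * v2[i];
        v1[i] += mul * v3[i];
    }
    return res;
}

/* Unrolled by 4; order must be a positive multiple of 4. */
static int32_t dot_and_madd_unrolled(int32_t *v1, const int32_t *v2,
                                     const int32_t *v3, int order, int mul)
{
    int32_t res = 0;
    do {
        res   += v1[0] * v2[0];
        v1[0] += mul * v3[0];
        res   += v1[1] * v2[1];
        v1[1] += mul * v3[1];
        res   += v1[2] * v2[2];
        v1[2] += mul * v3[2];
        res   += v1[3] * v2[3];
        v1[3] += mul * v3[3];
        v1 += 4; v2 += 4; v3 += 4;
    } while (order -= 4);
    return res;
}
```

The four element updates in each iteration are independent of each other, which is what gives the CPU more work to schedule per loop test.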
---
libavcodec/lossless_audiodsp.c | 12
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git
Due to the changes to the cascaded LMS coefficients, most of the code
needed a rewrite.
In particular, the SSE4 madd32 code is no longer similar enough to be
shared inside a macro.
Christophe Gisquet (4):
fate: wma: add lossless 24bits test
wmalossless: allow calling madd_int16
x86: lossless
---
tests/fate/lossless-audio.mak | 5 -
tests/ref/fate/lossless-wma24-1 | 1 +
tests/ref/fate/lossless-wma24-2 | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
create mode 100644 tests/ref/fate/lossless-wma24-1
create mode 100644 tests/ref/fate/lossless-wma24-2
diff --git
This is done by actually handling the cascaded LMS data as if it
were int16_t, thus requiring the computations to be switched at
various locations.
---
libavcodec/wmalosslessdec.c | 146 +---
1 file changed, 84 insertions(+), 62 deletions(-)
diff --git
16bits samples with CDLMS orders of 8 are currently unsupported, but have never
been encountered before.
However, 8 seems to be the most frequent, if not the only order used for 24bits.
In that case, the dsp functions are fine with handling orders that are multiples
of 8, so silence the warning.
The only user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version does not make sense.
Timings: 72 -> 49 cycles
---
libavcodec/x86/lossless_audiodsp.asm| 31 +--
libavcodec/x86/lossless_audiodsp_init.c | 7
The loop lengths are guaranteed to be multiples of 8, so this
unrolling is safe and allows exploiting execution ports.
For int32 version: 72 -> 57c.
---
libavcodec/lossless_audiodsp.c | 12
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git
Patch 2 is the squashing of several previous commits, as there was
no opinion on their contents nor on the way to go.
The SSE4 one is the final version from its last thread.
The last patch in this set is new, and silences a warning that's only
meaningful for 16bits content.
Christophe Gisquet (5
2016-04-29 10:50 GMT+02:00 Paul B Mahol :
> Should be OK if it doesn't break anything.
I'll resend the current state of this patchset for easier testing &
applying. Michael ran this under valgrind with nothing popping up, and
fate passes.
I think the remaining thing is: is the
Hi,
2016-04-20 2:01 GMT+02:00 Ronald S. Bultje :
> This is typically only an issue if the data came from stack. On win64 as
> well as unix64, the 4th argument never comes from stack but is a direct
> register argument instead.
So no benefit except consistency. I don't mind
istophe
From a0d4a96c032d73bc0e34fec320497aefafba3c28 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Mon, 18 Apr 2016 13:20:07 +0200
Subject: [PATCH 5/7] x86: lossless audio: SSE4 madd 32bits
The unique user so far is wmalossless 24bits. The few samples tested show an
o
2016-04-18 22:22 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> 2016-04-18 19:11 GMT+02:00 James Almer <jamr...@gmail.com>:
>> No way to create one using existing 24bit audio currently available in fate
>> or any redistributable 24 audio out ther
Hi,
2016-04-18 19:11 GMT+02:00 James Almer :
> No way to create one using existing 24bit audio currently available in fate
> or any redistributable 24 audio out there?
> There are some dts-ma and truehd multichannel samples that are not sine waves.
You're right. Just did that,
Hi,
2016-04-18 20:09 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>:
> On Mon, Apr 18, 2016 at 03:07:27PM +0200, Christophe Gisquet wrote:
>> This is done by actually handling the cascaded LMS data as if it
>> were int16_t, thus requiring switching at various location
2016-04-18 19:15 GMT+02:00 James Almer <jamr...@gmail.com>:
> On 4/18/2016 10:07 AM, Christophe Gisquet wrote:
>> The loops are guaranteed to be at least multiples of 8, so this
>> unrolling is safe but allows exploiting execution ports.
>>
>> For int32 vers
2016-04-18 18:39 GMT+02:00 Paul B Mahol :
> Better to have real 24bit content.
Yeah, my point, but I'm not sure we'll get a redistributable one in fate,
e.g. by pinging people from the various tickets.
And when would we decide this is better than nothing?
--
Christophe
2016-04-18 15:07 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +fate-lossless-wma24: CMD = md5 -i
> $(TARGET_SAMPLES)/lossless-audio/luckynight-partial-24.wma -f s24le -frames
> 209
Btw, this is the regular luckynight whose samples have been shifted
into 24 bit
The loop lengths are guaranteed to be multiples of 8, so this
unrolling is safe and allows exploiting execution ports.
For int32 version: 72 -> 57c.
---
libavcodec/lossless_audiodsp.c | 12
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git
Code size increase is minimal.
---
libavcodec/wmalosslessdec.c | 140 ++--
1 file changed, 57 insertions(+), 83 deletions(-)
diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index 77017ff..27510d4 100644
---
---
tests/fate/lossless-audio.mak | 4 +++-
tests/ref/fate/lossless-wma24 | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)
create mode 100644 tests/ref/fate/lossless-wma24
diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak
index 58641ab..ccc4d00 100644
---
This is done by actually handling the cascaded LMS data as if it
were int16_t, thus requiring the computations to be switched at
various locations.
---
libavcodec/wmalosslessdec.c | 61 +
1 file changed, 61 insertions(+)
diff --git
The only user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version does not make sense.
Timings: 72 -> 49 cycles
---
libavcodec/x86/lossless_audiodsp.asm| 38 +
libavcodec/x86/lossless_audiodsp_init.c |
Cosmetics before macroing it and another function.
---
libavcodec/wmalosslessdec.c | 94 ++---
1 file changed, 47 insertions(+), 47 deletions(-)
diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index 3885dc1..77017ff 100644
---
I think only the 2 first patches are needed, but I prefer the code
from the 3rd+4th patches. Overall, it's still not the nicest code, and
valgrind-proofing the patchset is needed (not possible atm for me).
The SSE4 implementation is not worthwhile in my opinion.
Christophe Gisquet (6):
fate
Hi,
2016-04-12 22:53 GMT+02:00 Paul B Mahol :
> -LLAudDSPContext dsp; ///< accelerated
And later:
> +static int scalarproduct_and_madd_int(int *v1, const int *v2,
> + const int *v3,
> +
Hi,
2016-03-19 19:08 GMT+01:00 Ismail Donmez <ism...@i10z.com>:
>> 2016-03-11 8:57 GMT+01:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
>>>> It should either be reverted or made dependent on
>>>> --enable/disable-debug (I would favor the first, h