Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
On Sun, 11 Feb 2024 at 12:37, Nuo Mi wrote:
> > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = {
> > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = {
>
> Do you know why this is [4][16]? [4][8] should suffice.

Probably so that all coefficient banks are aligned. Another use for it is that you can use the address directly in some instructions instead of using/wasting a register to hold the data.

--
Christophe Gisquet
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
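To illustrate the point about aligned banks and direct addressing, here is a minimal sketch. The names (toy_qpel_filters, toy_filter_row, ...) are hypothetical, the zero dummy row at index 0 and the duplication of the 8 taps to fill 16 bytes are assumptions about the layout, and only the tap values are the standard HEVC luma ones. With one 16-byte row per fractional position, the row for mx sits at base + 16 * mx, so mx indexes the table directly and every row starts on a 16-byte boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: row 0 is an unused dummy so that the fractional
 * position mx (1..3) indexes the table directly, without "mx - 1".
 * Duplicating the 8 taps to fill 16 bytes is an assumption. */
static alignas(16) const int8_t toy_qpel_filters[4][16] = {
    { 0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0,  -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1,  -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1,   0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Row lookup without any "- 1" adjustment. */
static const int8_t *toy_filter_row(int mx)
{
    assert(mx >= 1 && mx <= 3);
    return toy_qpel_filters[mx];
}

/* Byte offset of a row from the table base: always 16 * mx. */
static ptrdiff_t toy_row_offset(int mx)
{
    return toy_filter_row(mx) - &toy_qpel_filters[0][0];
}

/* Sum of the 8 taps of a row; each HEVC luma filter sums to 64. */
static int toy_tap_sum(int mx)
{
    const int8_t *f = toy_filter_row(mx);
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += f[i];
    return sum;
}
```

With [4][8] rows, every other row would start on an 8-byte boundary only, which is presumably what the alignment remark is about.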
Re: [FFmpeg-devel] [PATCH 1/7] proresdec2: port and fix for cached reader
On Fri, 8 Sep 2023 at 10:20, Christophe Gisquet wrote:
> This patchset requires my previous one improving the cached bitstream
> reader, and serves as its justification. It, basically, moves to using
> VLC wherever possible, and in particular when codewords are
> sufficiently short/there's some kind of well-behaved laplacian
> distribution for codewords that make VLCs efficient.
>
> Total speedup is around 40% here.

It's unfortunate that I cannot devote as much time and effort as it would take to fix some fundamental problems. But as I don't want Andreas to have wasted his time reviewing, and me answering as best as possible, the last state (maybe addressing 90% of the review?) can be obtained from the repo at https://github.com/cgisquet/ffmpeg.git, branch prores.

Best regards,
--
Christophe
Re: [FFmpeg-devel] [PATCH 1/2] Expose and start using skip_remaining
Hello,

On Fri, 8 Sep 2023 at 00:39, Andreas Rheinhardt wrote:
> This is problematic, because you seem to think that bits_peek(bc, bits)
> ensures that there are at least `bits` available in the cache;

read_vlc* also makes that assumption? Anyway, I'd put that behaviour (of checking) under (!)UNCHECKED_BITSTREAM_READER, and effectively this is about corrupt/unsupported bitstreams. Maybe some parts of ffmpeg have been wrong for 15 years, and the check should be done there, instead of expecting the reader to desync and/or relying on the upper level of the loop to check for exhaustion of the bitstream.

> https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa57b10d54729fd92a
> for a way that fixes this.

I can only notice now that I neither have the time nor am interested enough to embark on that scrutiny of the current code. I'm OK to wait for the ffmpeg project to have decided on a solution for the specifics you are discussing.

> the assumption that the
> combined amount of bits consumed in any get_vlc2/GET_RL_VLC/BITS_RL_VLC
> call can't exceed 32. Is this assumption actually still true now that we
> have multi-vlc stuff?

It doesn't change anything there: it operates as the first stage/level of any get_vlc, only it can output more than 1 symbol.

> https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc
> and
> https://github.com/mkver/FFmpeg/commits/aligned32_le_bitstream_reader
> are probably also of interest to you.

They probably would be, had I the time. My goal was really to prevent the prores and multi-symbol code from bitrotting too much, but I wasn't expecting these roadblocks.

I'm sorry to say I'm dropping the patch series.

--
Christophe
Re: [FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch
Hello,

On Sun, 10 Sep 2023 at 17:40, Andreas Rheinhardt wrote:
> Another solution would be to use void* instead of GetBitContext* in the
> header and in the implementation and then convert this void* to
> GetBitContext* in the function.

The forward declaration will be enough.

> I do not know what you mean by "the encoder instead". What problem
> happens with the encoder? Why would the encoder include proresdec.h at
> all and why would it be affected by changes to the decoder?

The keyword was "faint", and it's a non-issue now. I was just explaining why I would have made that code change, which now appears as an attempted fix for videotoolbox but really was never meant that way: pure luck, the actual reason being lost to time.

--
Christophe
Re: [FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch
Hello,

On Fri, 8 Sep 2023 at 11:57, Andreas Rheinhardt wrote:
> >> +#define CACHED_BITSTREAM_READER 1
> >
> > This should be in the commit switching to the cached bitstream reader.
>
> Correction: This header is included in videotoolbox.c and there is other
> stuff that also includes get_bits.h included in said file (and currently
> gets included before proresdec.h). This means that proresdec2.c and
> videotoolbox.c will have different opinions on what a GetBitContext is:
> It will be the non-cached one in videotoolbox.c and the cached one in
> proresdec2.c. This will work in practice, because ProresContext does not
> need the complete GetBitContext type at all (it does not contain a
> GetBitContext at all), so offsets are not affected. But it is
> nevertheless undefined behaviour and could become dangerous when using LTO.
>
> So you should switch the type of the pointer to BitstreamContextBE* in
> proresdec2.h. Furthermore, you can either include bitstream.h in
> proresdec.h or (IMO better) use a forward declaration and struct
> BitstreamContextBE* in the function pointer without including get_bits.h
> in the header at all.

On that point only: I don't recall (yes, that's 3+ years old) the issue being videotoolbox; it didn't have that include back when I wrote this code. It's a very faint recollection, and I can't find proof in the ffmpeg code of today or of 3+ years ago, but the problem you mention was happening with the encoder instead. So maybe the fix is now needed only by videotoolbox.

--
Christophe
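The forward-declaration suggestion quoted above can be sketched as follows. This is only an illustration with hypothetical names (ToyBitReader, ToyDecoder, toy_read_flag), not the proresdec.h code: the "header" part mentions the reader type only through a pointer, so it never needs the full definition, and no file including it has to agree on which reader variant is compiled in.

```c
#include <assert.h>
#include <stdint.h>

/* --- what the header would contain (hypothetical names) --- */
struct ToyBitReader;                 /* forward declaration only */

typedef struct ToyDecoder {
    /* Only a pointer to the reader appears here, so its layout is not needed. */
    int (*read_flag)(struct ToyBitReader *br);
} ToyDecoder;

/* --- what the single .c file owning the reader would contain --- */
struct ToyBitReader {
    const uint8_t *buf;
    unsigned pos;                    /* position in bits */
};

/* Read one bit, MSB first. */
static int toy_read_flag(struct ToyBitReader *br)
{
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Go through the function pointer, as a decoder context would. */
static int toy_first_flag(const uint8_t *data)
{
    struct ToyBitReader br = { data, 0 };
    ToyDecoder dec = { toy_read_flag };
    return dec.read_flag(&br);
}
```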
Re: [FFmpeg-devel] [PATCH 7/7] prores: use VLC LUTs
On Fri, 8 Sep 2023 at 11:19, Andreas Rheinhardt wrote:
> > -return 0;
> > +return 0;
>
> You are adding trailing whitespace.

Sorry, will fix. I had to do some of this work on a misconfigured machine.

> > +#include "libavutil/timer.h"
>
> You really need to look over your patches once more before you send
> them. Both of these changes are obviously not ok to commit.

I know the drill. Again, I'm trying my best to help move along a situation that had been rotting for 6 years.

> This still incurs an unnecessary indirection. The LUT should not point
> to the VLC's, but rather to the VLC tables (as this is the only thing
> needed from them lateron given that the number of bits is a compile-time
> constant. The LUT should be initialized when the VLCs are initialized.

You're right, and by the same logic as in my comment, that should yield further savings.

> Seems like these VLCs should be offset by 1 to avoid the "1+".

That's what I did in a previous commit, but that was before I could share the tables. I didn't consider creating 5 more tables for this to be beneficial.

--
Christophe
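The indirection discussed above can be sketched like this, with hypothetical types and values (ToyVlc, toy_run_lut, the table contents) standing in for the real VLC structures. The lookup table is filled once, when the VLCs are initialized, and stores the table pointers themselves, so the hot path does lut[ctx][index] instead of lut[ctx]->table[index].

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyVlc {
    int bits;
    const int16_t *table;            /* the only field the hot loop needs */
} ToyVlc;

static const int16_t toy_table_a[4] = { 10, 11, 12, 13 };
static const int16_t toy_table_b[4] = { 20, 21, 22, 23 };

static ToyVlc toy_vlcs[2];
static const int16_t *toy_run_lut[4]; /* context -> table, no ToyVlc hop */

/* Initialize the VLCs and, at the same time, the context LUT. */
static void toy_init(void)
{
    static const uint8_t ctx_to_tbl[4] = { 1, 1, 0, 0 };

    toy_vlcs[0] = (ToyVlc){ 2, toy_table_a };
    toy_vlcs[1] = (ToyVlc){ 2, toy_table_b };
    for (int i = 0; i < 4; i++)
        toy_run_lut[i] = toy_vlcs[ctx_to_tbl[i]].table;
}

/* Hot path: one load to get the table, one load to get the entry. */
static int toy_lookup(int ctx, int index)
{
    assert(ctx >= 0 && ctx < 4 && index >= 0 && index < 4);
    return toy_run_lut[ctx][index];
}
```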
Re: [FFmpeg-devel] [PATCH 1/7] proresdec2: port and fix for cached reader
On Fri, 8 Sep 2023 at 10:15, Christophe Gisquet wrote:
>
> Summary of changes

git send-email --cover-letter apparently didn't let me edit one, so here goes.

This patchset requires my previous one improving the cached bitstream reader, and serves as its justification. It basically moves to using VLC wherever possible, in particular when codewords are sufficiently short or follow some kind of well-behaved Laplacian distribution that makes VLCs efficient.

Total speedup is around 40% here.

--
Christophe
[FFmpeg-devel] [PATCH 7/7] prores: use VLC LUTs
One indirection less, around 1% speedup.
---
 libavcodec/proresdec2.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index b20021c622..85f81d92d3 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -561,12 +561,18 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
         prev_dc += (((code + 1) >> 1) ^ sign) - sign;
         out[0] = prev_dc;
     }
-    return 0;
+    return 0;
 }

+#include "libavutil/timer.h"
+
+
 static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb,
                                              int16_t *out, int blocks_per_slice)
 {
+    static VLC* lvl_vlc[9] = { &ac_vlc[0], &ac_vlc[1], &ac_vlc[2], &ac_vlc[3], &ac_vlc[0], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], };
+    static VLC* run_vlc[15] = { &ac_vlc[3], &ac_vlc[3], &ac_vlc[2], &ac_vlc[2], &ac_vlc[0], &ac_vlc[5], &ac_vlc[5], &ac_vlc[5], &ac_vlc[5],
+                                &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], };
     const ProresContext *ctx = avctx->priv_data;
     int block_mask, sign;
     unsigned pos, run, level;
@@ -585,9 +591,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
             break;

         if (run < 15) {
-            static const uint8_t ctx_to_tbl[] = { 3, 3, 2, 2, 0, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4 };
-            const VLC* tbl = ac_vlc + ctx_to_tbl[run];
-            run = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
+            run = get_vlc2(gb, run_vlc[run]->table, PRORES_LEV_BITS, 3);
         } else {
             unsigned int bits = 21 - 2*av_log2(show_bits(gb, 10));
             run = READ_BITS(gb, bits) - 4; // up to 17 bits
@@ -599,9 +603,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
         }

         if (level < 9) {
-            static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4 };
-            const VLC* tbl = ac_vlc + ctx_to_tbl[level];
-            level = 1+get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
+            level = 1+get_vlc2(gb, lvl_vlc[level]->table, PRORES_LEV_BITS, 3);
         } else {
             unsigned int bits = 25 - 2*av_log2(show_bits(gb, 12));
             level = READ_BITS(gb, bits) - 4 + 1; // up to 21 bits
--
2.42.0
[FFmpeg-devel] [PATCH 6/7] proresdec2: remove a useless DC codebook entry
---
 libavcodec/proresdec2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 02e1d82d00..b20021c622 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -534,9 +534,9 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons

 #define FIRST_DC_CB 0xB8

-static const char dc_codebook[7][4] = {
+static const char dc_codebook[6][4] = {
     { 0, 0, 1, -1 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
-    { 1, 2, 2, 0 }, { 1, 2, 2, 0 }, { 0, 3, 4, -8 }, { 0, 3, 4, -8 }
+    { 1, 2, 2, 0 }, { 1, 2, 2, 0 }, { 0, 3, 4, -8 }
 };

 static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
@@ -553,7 +553,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
     code = 5;
     sign = 0;
     for (i = 1; i < blocks_per_slice; i++, out += 64) {
-        unsigned int dccb = FFMIN(code, 6U);
+        unsigned int dccb = FFMIN(code, 5U);
         DECODE_CODEWORD2(code, dc_codebook[dccb][0], dc_codebook[dccb][1],
                          dc_codebook[dccb][2], dc_codebook[dccb][3]);
         if(code) sign ^= -(code & 1);
--
2.42.0
[FFmpeg-devel] [PATCH 5/7] proresdec2: use VLC for small runs and levels
Basically, the catch-all codebook is for on average long codewords, and with a distribution such that the 3-step VLC reading is not efficient. Furthermore, the complete unrolling make the actual code smaller than the macro, and as the maximum codelength is smaller, smaller amounts of bits, optimized for run and for level, can be read. --- libavcodec/proresdec2.c | 53 +++-- 1 file changed, 24 insertions(+), 29 deletions(-) diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c index e3cef402d7..02e1d82d00 100644 --- a/libavcodec/proresdec2.c +++ b/libavcodec/proresdec2.c @@ -132,7 +132,7 @@ static void unpack_alpha_12(GetBitContext *gb, uint16_t *dst, int num_coeffs, #define AC_BITS 12 #define PRORES_LEV_BITS 9 -static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x4C }; +static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x29 }; static VLC ac_vlc[6]; static av_cold void init_vlcs(void) @@ -152,9 +152,7 @@ static av_cold void init_vlcs(void) switch_val = (switch_bits+1) << rice_order; // Values are actually transformed, but this is more a wrapping -ac_codes[0] = 0; -ac_bits[0] = 0; -for (ac = 0; ac < (1< max_bits) max_bits = bits; -ac_bits [ac+1] = bits; -ac_codes[ac+1] = code; +ac_bits [ac] = bits; +ac_codes[ac] = code; } ff_free_vlc(ac_vlc+i); @@ -507,12 +505,9 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons bits = exp_order - switch_bits + (q<<1);\ val = READ_BITS(gb, bits) - (1 << exp_order) + \ ((switch_bits + 1) << rice_order); \ -} else if (rice_order) {\ -skip_remaining(gb, q+1);\ -val = (q << rice_order) + get_bits(gb, rice_order); \ } else {\ -val = q;\ skip_remaining(gb, q+1);\ +val = rice_order ? 
(q << rice_order) + get_bits(gb, rice_order) : q;\ } \ } while (0) @@ -527,12 +522,10 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons if (q > switch_bits) { /* exp golomb */ \ bits = (q<<1) + (int)diff; \ val = READ_BITS(gb, bits) + (int)offset;\ -} else if (rice_order) {\ -skip_remaining(gb, q+1);\ -val = (q << rice_order) + get_bits(gb, rice_order); \ } else {\ -val = q;\ skip_remaining(gb, q+1);\ +val = rice_order ? (q << rice_order) + show_bits(gb, rice_order) : q; \ +skip_remaining(gb, rice_order); \ } \ } while (0) @@ -571,14 +564,6 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out, return 0; } -// adaptive codebook switching lut according to previous run values -static const char run_to_cb[16][4] = { -{ 2, 0, -1, 1 }, { 2, 0, -1, 1 }, { 1, 0, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 1, -1 }, -{ 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, -{ 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, -{ 0, 2, 3, -4 } -}; - static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb, int16_t *out, int blocks_per_slice) { @@ -595,22 +580,32 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex block_mask = blocks_per_slice - 1; for (pos = block_mask;;) { -static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4, 5 }; -const VLC* tbl = ac_vlc + ctx_to_tbl[FFMIN(level, 9)]; -unsigned int runcb = FFMIN(run, 15); bits_rem = get_bits_left(gb); -if (!bits_rem || (bits_rem < 16 && !show_bits(gb, bits_rem))) +if (!bits_rem || (bits_rem < 14 && !show_bits(gb, bits_rem))) break; -DECODE_CODEWORD2(run, run_to_cb[runcb][0], run_to_cb[runcb][1], - run_to_cb[runcb][2], run_to_cb[runcb][3]); +if (run < 15) { +static const uint8_t ctx_to_tbl[] = { 3, 3, 2, 2, 0, 5, 5, 5, 5, 4, 4, 4,
[FFmpeg-devel] [PATCH 4/7] proresdec2: offset VLCs by 1 to avoid 1 add
Pretty harmless, but not much gained either. --- libavcodec/proresdec2.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c index 91c689d9ef..e3cef402d7 100644 --- a/libavcodec/proresdec2.c +++ b/libavcodec/proresdec2.c @@ -152,7 +152,9 @@ static av_cold void init_vlcs(void) switch_val = (switch_bits+1) << rice_order; // Values are actually transformed, but this is more a wrapping -for (ac = 0; ac <1< max_bits) max_bits = bits; -ac_bits [ac] = bits; -ac_codes[ac] = code; +ac_bits [ac+1] = bits; +ac_codes[ac+1] = code; } ff_free_vlc(ac_vlc+i); @@ -609,7 +611,6 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex } level = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3); -level += 1; i = pos >> log2_block_count; -- 2.42.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch
x86/x64: 61/52 -> 55/46 Around 7-10% speedup. Run and DC do not lend themselves to such changes, likely because their distribution is less skewed, and need larger average vlc read iterations. --- libavcodec/proresdec.h | 1 + libavcodec/proresdec2.c | 77 ++--- 2 files changed, 66 insertions(+), 12 deletions(-) diff --git a/libavcodec/proresdec.h b/libavcodec/proresdec.h index 1e48752e6f..7ebacaeb21 100644 --- a/libavcodec/proresdec.h +++ b/libavcodec/proresdec.h @@ -22,6 +22,7 @@ #ifndef AVCODEC_PRORESDEC_H #define AVCODEC_PRORESDEC_H +#define CACHED_BITSTREAM_READER 1 #include "get_bits.h" #include "blockdsp.h" #include "proresdsp.h" diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c index 65e8b01755..91c689d9ef 100644 --- a/libavcodec/proresdec2.c +++ b/libavcodec/proresdec2.c @@ -24,17 +24,17 @@ * Known FOURCCs: 'apch' (HQ), 'apcn' (SD), 'apcs' (LT), 'apco' (Proxy), 'ap4h' (), 'ap4x' ( XQ) */ -#define CACHED_BITSTREAM_READER 1 +//#define DEBUG #include "config_components.h" #include "libavutil/internal.h" #include "libavutil/mem_internal.h" +#include "libavutil/thread.h" #include "avcodec.h" #include "codec_internal.h" #include "decode.h" -#include "get_bits.h" #include "hwaccel_internal.h" #include "hwconfig.h" #include "idctdsp.h" @@ -129,8 +129,64 @@ static void unpack_alpha_12(GetBitContext *gb, uint16_t *dst, int num_coeffs, } } +#define AC_BITS 12 +#define PRORES_LEV_BITS 9 + +static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x4C }; +static VLC ac_vlc[6]; + +static av_cold void init_vlcs(void) +{ +int i; +for (i = 0; i < sizeof(ac_info); i++) { +uint32_t ac_codes[1<> 5; /* rice code order */ +exp_order = (codebook >> 2) & 7; /* exp golomb code order */ + +switch_val = (switch_bits+1) << rice_order; + +// Values are actually transformed, but this is more a wrapping +for (ac = 0; ac <1<= switch_val) { +val += (1 << exp_order) - switch_val; +exponent = av_log2(val); +bits = exponent+1+switch_bits-exp_order/*0*/ + exponent+1/*val*/; 
+code = val; +} else if (rice_order) { +bits = (val >> rice_order)/*0*/ + 1/*1*/ + rice_order/*val*/; +code = (1 << rice_order) | val; +} else { +bits = val/*0*/ + 1/*1*/; +code = 1; +} +if (bits > max_bits) max_bits = bits; +ac_bits [ac] = bits; +ac_codes[ac] = code; +} + +ff_free_vlc(ac_vlc+i); + +if (init_vlc(ac_vlc+i, PRORES_LEV_BITS, 1pix_fmt = AV_PIX_FMT_NONE; +// init dc_tables +ff_thread_once(_static_once, init_vlcs); + if (avctx->bits_per_raw_sample == 10){ ctx->unpack_alpha = unpack_alpha_10; } else if (avctx->bits_per_raw_sample == 12){ @@ -510,7 +569,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out, return 0; } -// adaptive codebook switching lut according to previous run/level values +// adaptive codebook switching lut according to previous run values static const char run_to_cb[16][4] = { { 2, 0, -1, 1 }, { 2, 0, -1, 1 }, { 1, 0, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 1, -1 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, @@ -518,12 +577,6 @@ static const char run_to_cb[16][4] = { { 0, 2, 3, -4 } }; -static const char lev_to_cb[10][4] = { -{ 0, 0, 1, -1 }, { 2, 0, 0, -1 }, { 1, 0, 0, 0 }, { 2, 0, -1, 1 }, { 0, 0, 1, -1 }, -{ 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, -{ 0, 2, 3, -4 } -}; - static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb, int16_t *out, int blocks_per_slice) { @@ -540,8 +593,9 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex block_mask = blocks_per_slice - 1; for (pos = block_mask;;) { +static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4, 5 }; +const VLC* tbl = ac_vlc + ctx_to_tbl[FFMIN(level, 9)]; unsigned int runcb = FFMIN(run, 15); -unsigned int levcb = FFMIN(level, 9); bits_rem = get_bits_left(gb); if (!bits_rem || (bits_rem < 16 && !show_bits(gb, bits_rem))) break; @@ -554,8 +608,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex return 
AVERROR_INVALIDDATA; } -DECODE_CODEWORD2(level, lev_to_cb[levcb][0], lev_to_cb[levcb][1], -lev_to_cb[levcb][2], lev_to_cb[levcb][3]); +level = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3); level += 1;
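As a reading aid for the init_vlcs() hunk above, whose text is partly lost in this archive, here is a minimal sketch of the codeword-length computation it performs for the hybrid Rice/exp-Golomb code. The function names are hypothetical and toy_log2 stands in for av_log2; the field extraction and the three branches follow the lines that survive in the patch.

```c
#include <assert.h>

/* Floor of log2(v); returns 0 for v == 0 (stand-in for av_log2). */
static int toy_log2(unsigned v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Length in bits of the codeword for `val` under a packed codebook byte:
 * bits 0-1 = switch_bits, bits 2-4 = exp_order, bits 5-7 = rice_order. */
static int toy_codeword_bits(unsigned codebook, unsigned val)
{
    unsigned switch_bits = codebook & 3;
    unsigned rice_order  = codebook >> 5;
    unsigned exp_order   = (codebook >> 2) & 7;
    unsigned switch_val  = (switch_bits + 1) << rice_order;

    if (val >= switch_val) {                 /* exp-Golomb part */
        int exponent;
        val += (1u << exp_order) - switch_val;
        exponent = toy_log2(val);
        /* leading zeros, then the value itself */
        return (exponent + 1 + (int)switch_bits - (int)exp_order)
             + (exponent + 1);
    } else if (rice_order) {                 /* Rice part */
        return (int)(val >> rice_order) + 1 + (int)rice_order;
    }
    return (int)val + 1;                     /* unary part */
}
```

For instance, with codebook 0x04 (switch_bits 0, rice_order 0, exp_order 1), value 0 takes 1 bit and value 1 takes 3 bits, which is the kind of short, skewed code that a 9-bit first-level VLC table handles in one lookup.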
[FFmpeg-devel] [PATCH 2/7] proresdec2: store precomputed EC parameters
Having the various orders and offsets stored in a codebook is compact but causes additional computations. Using instead a table for the precomputed results achieve some speedups at the cost of ~132 bytes. Around 5% speedup. --- libavcodec/proresdec2.c | 54 +++-- 1 file changed, 47 insertions(+), 7 deletions(-) diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c index 6e243cfc17..65e8b01755 100644 --- a/libavcodec/proresdec2.c +++ b/libavcodec/proresdec2.c @@ -427,6 +427,7 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons # define READ_BITS get_bits #endif +/* Kept for reference and because clearer for first DC */ #define DECODE_CODEWORD(val, codebook) \ do {\ unsigned int rice_order, exp_order, switch_bits;\ @@ -454,18 +455,41 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons } \ } while (0) +/* number of bits to switch between rice and exp golomb */ +#define DECODE_CODEWORD2(val, switch_bits, rice_order, diff, offset)\ +do {\ +unsigned int q, buf, bits; \ +\ +buf = show_bits(gb, 14);\ +q = 13 - av_log2(buf); \ +\ +if (q > switch_bits) { /* exp golomb */ \ +bits = (q<<1) + (int)diff; \ +val = READ_BITS(gb, bits) + (int)offset;\ +} else if (rice_order) {\ +skip_remaining(gb, q+1);\ +val = (q << rice_order) + get_bits(gb, rice_order); \ +} else {\ +val = q;\ +skip_remaining(gb, q+1);\ +} \ +} while (0) + + #define TOSIGNED(x) (((x) >> 1) ^ (-((x) & 1))) #define FIRST_DC_CB 0xB8 -static const uint8_t dc_codebook[7] = { 0x04, 0x28, 0x28, 0x4D, 0x4D, 0x70, 0x70}; +static const char dc_codebook[7][4] = { +{ 0, 0, 1, -1 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, +{ 1, 2, 2, 0 }, { 1, 2, 2, 0 }, { 0, 3, 4, -8 }, { 0, 3, 4, -8 } +}; static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out, int blocks_per_slice) { int16_t prev_dc; int code, i, sign; - DECODE_CODEWORD(code, FIRST_DC_CB); prev_dc = TOSIGNED(code); out[0] = prev_dc; @@ -475,7 +499,9 @@ static av_always_inline int 
decode_dc_coeffs(GetBitContext *gb, int16_t *out, code = 5; sign = 0; for (i = 1; i < blocks_per_slice; i++, out += 64) { -DECODE_CODEWORD(code, dc_codebook[FFMIN(code, 6U)]); +unsigned int dccb = FFMIN(code, 6U); +DECODE_CODEWORD2(code, dc_codebook[dccb][0], dc_codebook[dccb][1], + dc_codebook[dccb][2], dc_codebook[dccb][3]); if(code) sign ^= -(code & 1); else sign = 0; prev_dc += (((code + 1) >> 1) ^ sign) - sign; @@ -485,8 +511,18 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out, } // adaptive codebook switching lut according to previous run/level values -static const uint8_t run_to_cb[16] = { 0x06, 0x06, 0x05, 0x05, 0x04, 0x29, 0x29, 0x29, 0x29, 0x28, 0x28, 0x28, 0x28, 0x28, 0x28, 0x4C }; -static const uint8_t lev_to_cb[10] = { 0x04, 0x0A, 0x05, 0x06, 0x04, 0x28, 0x28, 0x28, 0x28, 0x4C }; +static const char run_to_cb[16][4] = { +{ 2, 0, -1, 1 }, { 2, 0, -1, 1 }, { 1, 0, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 1, -1 }, +{ 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, +{ 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, +{ 0, 2, 3, -4 } +}; + +static const char lev_to_cb[10][4] = { +{ 0, 0, 1, -1 }, { 2, 0, 0, -1 }, { 1, 0, 0, 0 }, { 2, 0, -1, 1 }, { 0, 0, 1, -1 }, +{ 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, +{ 0, 2, 3, -4 } +}; static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb, int16_t *out, int blocks_per_slice) @@ -504,18 +540,22 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex block_mask =
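The four-entry rows in the tables above can be derived from the old packed codebook bytes. The sketch below uses hypothetical names (ToyEcParams, toy_precompute) and assumes the row layout { switch_bits, rice_order, diff, offset } given by the DECODE_CODEWORD2 parameter list. In the exp-Golomb branch, `exp_order - switch_bits + (q<<1)` becomes `(q<<1) + diff`, and `- (1 << exp_order) + ((switch_bits + 1) << rice_order)` becomes `+ offset`.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyEcParams {
    int8_t switch_bits, rice_order, diff, offset;
} ToyEcParams;

/* Unpack a codebook byte and precompute the two derived terms. */
static ToyEcParams toy_precompute(unsigned codebook)
{
    int switch_bits = codebook & 3;
    int rice_order  = codebook >> 5;
    int exp_order   = (codebook >> 2) & 7;
    ToyEcParams p;

    p.switch_bits = (int8_t)switch_bits;
    p.rice_order  = (int8_t)rice_order;
    p.diff        = (int8_t)(exp_order - switch_bits);
    p.offset      = (int8_t)(((switch_bits + 1) << rice_order) - (1 << exp_order));
    return p;
}

/* Compare against one row of a precomputed table. */
static int toy_params_equal(ToyEcParams p, int sb, int ro, int diff, int offset)
{
    return p.switch_bits == sb && p.rice_order == ro &&
           p.diff == diff && p.offset == offset;
}
```

Applied to the old dc_codebook bytes 0x04, 0x28, 0x4D and 0x70, this reproduces the rows { 0, 0, 1, -1 }, { 0, 1, 2, -2 }, { 1, 2, 2, 0 } and { 0, 3, 4, -8 } of the new table.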
[FFmpeg-devel] [PATCH 1/7] proresdec2: port and fix for cached reader
Summary of changes - move back to regular, non-macro, get_bits API - reduce the lookup to switch the coding method - shorter reads wherever possible, in particular for the end of bitstream (16 bits instead of 32, as per the above) There are cases that really need longer lengths (larger EG codes) of up to 27 bits. Win64: 6.10 -> 4.87 (~20% speedup) Reference for an hypothetical 32bits version of the cached reader: Win32: 11.4 -> 9.8 (14%, because iDCT is not SIMDed) --- libavcodec/proresdec2.c | 53 ++--- 1 file changed, 23 insertions(+), 30 deletions(-) diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c index 9297860946..6e243cfc17 100644 --- a/libavcodec/proresdec2.c +++ b/libavcodec/proresdec2.c @@ -24,9 +24,7 @@ * Known FOURCCs: 'apch' (HQ), 'apcn' (SD), 'apcs' (LT), 'apco' (Proxy), 'ap4h' (), 'ap4x' ( XQ) */ -//#define DEBUG - -#define LONG_BITSTREAM_READER +#define CACHED_BITSTREAM_READER 1 #include "config_components.h" @@ -422,35 +420,37 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons return pic_data_size; } -#define DECODE_CODEWORD(val, codebook, SKIP)\ +/* bitstream_read may fail on 32bits ARCHS for >24 bits, so use long version there */ +#if 0 //BITSTREAM_BITS == 32 +# define READ_BITS get_bits_long +#else +# define READ_BITS get_bits +#endif + +#define DECODE_CODEWORD(val, codebook) \ do {\ unsigned int rice_order, exp_order, switch_bits;\ unsigned int q, buf, bits; \ \ -UPDATE_CACHE(re, gb); \ -buf = GET_CACHE(re, gb);\ +buf = show_bits(gb, 14);\ \ /* number of bits to switch between rice and exp golomb */ \ switch_bits = codebook & 3;\ rice_order = codebook >> 5; \ exp_order = (codebook >> 2) & 7; \ \ -q = 31 - av_log2(buf); \ +q = 13 - av_log2(buf); \ \ if (q > switch_bits) { /* exp golomb */ \ bits = exp_order - switch_bits + (q<<1);\ -if (bits > FFMIN(MIN_CACHE_BITS, 31)) \ -return AVERROR_INVALIDDATA; \ -val = SHOW_UBITS(re, gb, bits) - (1 << exp_order) + \ +val = READ_BITS(gb, bits) - (1 << exp_order) 
+ \ ((switch_bits + 1) << rice_order); \ -SKIP(re, gb, bits); \ } else if (rice_order) {\ -SKIP_BITS(re, gb, q+1); \ -val = (q << rice_order) + SHOW_UBITS(re, gb, rice_order); \ -SKIP(re, gb, rice_order); \ +skip_remaining(gb, q+1);\ +val = (q << rice_order) + get_bits(gb, rice_order); \ } else {\ val = q;\ -SKIP(re, gb, q+1); \ +skip_remaining(gb, q+1);\ } \ } while (0) @@ -466,9 +466,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out, int16_t prev_dc; int code, i, sign; -OPEN_READER(re, gb); - -DECODE_CODEWORD(code, FIRST_DC_CB, LAST_SKIP_BITS); +DECODE_CODEWORD(code, FIRST_DC_CB); prev_dc = TOSIGNED(code); out[0] = prev_dc; @@ -477,13 +475,12 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out, code = 5; sign = 0; for (i = 1; i < blocks_per_slice; i++, out += 64) { -DECODE_CODEWORD(code, dc_codebook[FFMIN(code, 6U)], LAST_SKIP_BITS); +DECODE_CODEWORD(code, dc_codebook[FFMIN(code, 6U)]); if(code) sign ^= -(code & 1); else sign = 0; prev_dc += (((code + 1) >> 1) ^ sign) - sign; out[0] = prev_dc; } -CLOSE_READER(re, gb); return
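A self-contained sketch of what the rewritten DECODE_CODEWORD does, on a deliberately naive bit reader. ToyReader and the toy_* helpers are hypothetical stand-ins for GetBitContext, show_bits, get_bits, skip_remaining and av_log2; reads past the end return zeros, which is a simplification. The decoding steps follow the macro: peek 14 bits, count the leading zeros q, then pick the exp-Golomb, Rice or unary branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyReader {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;                       /* position in bits */
} ToyReader;

/* Peek n bits, MSB first; bits past the end read as 0. */
static unsigned toy_show_bits(const ToyReader *r, unsigned n)
{
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t p = r->pos + i;
        unsigned bit = p < r->size_bits
                     ? (r->buf[p >> 3] >> (7 - (p & 7))) & 1u : 0u;
        v = (v << 1) | bit;
    }
    return v;
}

static unsigned toy_get_bits(ToyReader *r, unsigned n)
{
    unsigned v = toy_show_bits(r, n);
    r->pos += n;
    return v;
}

/* Floor of log2(v); returns 0 for v == 0 (stand-in for av_log2). */
static int toy_log2(unsigned v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

static unsigned toy_decode_codeword(ToyReader *r, unsigned codebook)
{
    unsigned switch_bits = codebook & 3;
    unsigned rice_order  = codebook >> 5;
    unsigned exp_order   = (codebook >> 2) & 7;
    unsigned buf = toy_show_bits(r, 14);
    unsigned q   = 13 - toy_log2(buf);   /* leading zeros in the window */

    if (q > switch_bits) {               /* exp-Golomb */
        unsigned bits = exp_order - switch_bits + (q << 1);
        return toy_get_bits(r, bits) - (1u << exp_order)
             + ((switch_bits + 1) << rice_order);
    } else if (rice_order) {             /* Rice */
        r->pos += q + 1;
        return (q << rice_order) + toy_get_bits(r, rice_order);
    }
    r->pos += q + 1;                     /* unary */
    return q;
}

/* Convenience wrapper: decode one codeword from a single byte. */
static unsigned toy_decode_byte(uint8_t byte, unsigned codebook)
{
    ToyReader r = { &byte, 8, 0 };
    return toy_decode_codeword(&r, codebook);
}
```

The 14-bit peek is what allows the shorter reads mentioned in the summary: the prefix is classified from a small window, and only the exp-Golomb branch issues a longer read.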
[FFmpeg-devel] [PATCH 2/2] read_xbits: request fewer bits
This would have also helped a bitstream reader with a cache of 32 bits. --- libavcodec/bitstream_template.h | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/libavcodec/bitstream_template.h b/libavcodec/bitstream_template.h index 3f90fc6a07..c27e8108b2 100644 --- a/libavcodec/bitstream_template.h +++ b/libavcodec/bitstream_template.h @@ -423,8 +423,18 @@ static inline const uint8_t *BS_FUNC(align)(BSCTX *bc) */ static inline int BS_FUNC(read_xbits)(BSCTX *bc, unsigned int n) { -int32_t cache = BS_FUNC(peek)(bc, 32); -int sign = ~cache >> 31; +int32_t cache; +int sign; + +if (n > bc->bits_valid) +BS_FUNC(priv_refill_32)(bc); + +#if defined(BITSTREAM_READER_LE) +cache = bc->bits & 0x; +#else +cache = bc->bits >> 32; +#endif +sign = ~cache >> 31; BS_FUNC(skip_remaining)(bc, n); return uint32_t)(sign ^ cache)) >> (32 - n)) ^ sign) - sign; -- 2.42.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
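The arithmetic of read_xbits can be sketched on a plain 64-bit value standing for an MSB-aligned (big-endian) cache. The function name and the restriction to 1..31 bits are assumptions of this sketch, and it spells out the sign handling with a branch instead of the XOR/subtract trick of the original. Only the upper 32 bits of the cache matter, which is why peeking a full 32 bits is not required when n bits are already valid.

```c
#include <assert.h>
#include <stdint.h>

/* Decode an n-bit "xbits" value from the top of a 64-bit MSB-aligned cache:
 * a leading 1 bit means the n bits are a positive value as-is, a leading 0
 * bit means the value is minus the bitwise complement of the n bits. */
static int toy_xbits(uint64_t cache64, unsigned n)
{
    uint32_t top, folded, mag;
    int negative;

    assert(n >= 1 && n <= 31);
    top      = (uint32_t)(cache64 >> 32);   /* only the upper half is used */
    negative = !(top >> 31);
    folded   = negative ? ~top : top;       /* same as sign ^ cache */
    mag      = folded >> (32 - n);
    return negative ? -(int)mag : (int)mag; /* same as (x ^ sign) - sign */
}
```

For example, the three bits 101 decode to 5, while 010 decodes to -5.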
[FFmpeg-devel] [PATCH 1/2] Expose and start using skip_remaining
Bitstream readers sometimes have already checked there are enough bits, and the check is redundant. --- libavcodec/bitstream.h | 8 +--- libavcodec/bitstream_template.h | 22 +++--- libavcodec/get_bits.h | 1 + 3 files changed, 17 insertions(+), 14 deletions(-) diff --git a/libavcodec/bitstream.h b/libavcodec/bitstream.h index 35b7873b9c..dd043fb349 100644 --- a/libavcodec/bitstream.h +++ b/libavcodec/bitstream.h @@ -95,6 +95,7 @@ # define bits_peek_signed bits_peek_signed_le # define bits_peek_signed_nz bits_peek_signed_nz_le # define bits_skip bits_skip_le +# define bits_skip_remaining bits_skip_remaining_le # define bits_seek bits_seek_le # define bits_align bits_align_le # define bits_read_xbitsbits_read_xbits_le @@ -124,6 +125,7 @@ # define bits_peek_signed bits_peek_signed_be # define bits_peek_signed_nz bits_peek_signed_nz_be # define bits_skip bits_skip_be +# define bits_skip_remaining bits_skip_remaining_be # define bits_seek bits_seek_be # define bits_align bits_align_be # define bits_read_xbitsbits_read_xbits_be @@ -146,7 +148,7 @@ n = table[index].len; \ \ if (max_depth > 1 && n < 0) { \ -bits_skip(bc, bits);\ +skip_remaining(bc, bits); \ \ nb_bits = -n; \ \ @@ -154,7 +156,7 @@ level = table[index].level; \ n = table[index].len; \ if (max_depth > 2 && n < 0) { \ -bits_skip(bc, nb_bits); \ +skip_remaining(bc, nb_bits);\ nb_bits = -n; \ \ index = bits_peek(bc, nb_bits) + level; \ @@ -163,7 +165,7 @@ } \ } \ run = table[index].run; \ -bits_skip(bc, n); \ +skip_remaining(bc, n); \ } while (0) #endif /* AVCODEC_BITSTREAM_H */ diff --git a/libavcodec/bitstream_template.h b/libavcodec/bitstream_template.h index 0308e3a924..3f90fc6a07 100644 --- a/libavcodec/bitstream_template.h +++ b/libavcodec/bitstream_template.h @@ -175,7 +175,7 @@ static inline uint64_t BS_FUNC(priv_val_show)(BSCTX *bc, unsigned int n) #endif } -static inline void BS_FUNC(priv_skip_remaining)(BSCTX *bc, unsigned int n) +static inline void BS_FUNC(skip_remaining)(BSCTX *bc, unsigned int n) { 
#ifdef BITSTREAM_TEMPLATE_LE bc->bits >>= n; @@ -192,7 +192,7 @@ static inline uint64_t BS_FUNC(priv_val_get)(BSCTX *bc, unsigned int n) av_assert2(n > 0 && n < 64); ret = BS_FUNC(priv_val_show)(bc, n); -BS_FUNC(priv_skip_remaining)(bc, n); +BS_FUNC(skip_remaining)(bc, n); return ret; } @@ -375,7 +375,7 @@ static inline int BS_FUNC(peek_signed)(BSCTX *bc, unsigned int n) static inline void BS_FUNC(skip)(BSCTX *bc, unsigned int n) { if (n < bc->bits_valid) -BS_FUNC(priv_skip_remaining)(bc, n); +BS_FUNC(skip_remaining)(bc, n); else { n -= bc->bits_valid; bc->bits = 0; @@ -389,7 +389,7 @@ static inline void BS_FUNC(skip)(BSCTX *bc, unsigned int n) } BS_FUNC(priv_refill_64)(bc); if (n) -BS_FUNC(priv_skip_remaining)(bc, n); +BS_FUNC(skip_remaining)(bc, n); } } @@ -425,7 +425,7 @@ static inline int BS_FUNC(read_xbits)(BSCTX *bc, unsigned int n) { int32_t cache = BS_FUNC(peek)(bc, 32); int sign = ~cache >> 31; -BS_FUNC(priv_skip_remaining)(bc, n); +BS_FUNC(skip_remaining)(bc, n); return uint32_t)(sign ^ cache)) >> (32 - n)) ^ sign) - sign; } @@ -508,14 +508,14 @@ static inline int BS_FUNC(read_vlc)(BSCTX *bc, const VLCElem *table, int n= table[idx].len; if (max_depth > 1 && n < 0) { -BS_FUNC(priv_skip_remaining)(bc, bits); +BS_FUNC(skip_remaining)(bc, bits); code = BS_FUNC(priv_set_idx)(bc, code, , _bits, table); if (max_depth > 2 && n < 0) { -BS_FUNC(priv_skip_remaining)(bc, nb_bits); +BS_FUNC(skip_remaining)(bc, nb_bits); code = BS_FUNC(priv_set_idx)(bc, code, , _bits, table); } } -BS_FUNC(priv_skip_remaining)(bc, n); +BS_FUNC(skip_remaining)(bc, n); return code; } @@ -534,17 +534,17 @@ static inline int
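The difference between the checked skip and the exposed skip_remaining can be illustrated with a toy big-endian cached reader. Everything here (ToyCache, toy_refill, the byte-wise refill strategy) is a hypothetical simplification, not the bitstream_template.h code; the point is only that once a peek has brought n bits into the cache, the following skip does not need to test bits_valid again.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyCache {
    uint64_t bits;                 /* MSB-aligned cache */
    unsigned bits_valid;           /* number of valid bits in the cache */
    const uint8_t *ptr, *end;
} ToyCache;

/* Byte-wise refill while there is room for 8 more bits. */
static void toy_refill(ToyCache *bc)
{
    while (bc->bits_valid <= 56 && bc->ptr < bc->end) {
        bc->bits |= (uint64_t)*bc->ptr++ << (56 - bc->bits_valid);
        bc->bits_valid += 8;
    }
}

/* Peek n bits (1..32), refilling first if the cache is too short. */
static unsigned toy_peek(ToyCache *bc, unsigned n)
{
    assert(n >= 1 && n <= 32);
    if (n > bc->bits_valid)
        toy_refill(bc);
    return (unsigned)(bc->bits >> (64 - n));
}

/* Unchecked: the caller guarantees that n bits are in the cache. */
static void toy_skip_remaining(ToyCache *bc, unsigned n)
{
    assert(n <= bc->bits_valid && n < 64);
    bc->bits     <<= n;
    bc->bits_valid -= n;
}

/* Checked: drains the cache and refills as needed (n <= 32 here). */
static void toy_skip(ToyCache *bc, unsigned n)
{
    assert(n <= 32);
    while (n > bc->bits_valid) {
        n -= bc->bits_valid;
        bc->bits = 0;
        bc->bits_valid = 0;
        toy_refill(bc);
        if (!bc->bits_valid)
            return;                /* ran out of data */
    }
    toy_skip_remaining(bc, n);
}

/* Reads 4 + 8 + 4 bits from AB CD EF 01 and packs them for checking. */
static unsigned toy_demo(void)
{
    static const uint8_t data[] = { 0xAB, 0xCD, 0xEF, 0x01 };
    ToyCache bc = { 0, 0, data, data + sizeof(data) };
    unsigned a, b, c;

    a = toy_peek(&bc, 4);
    toy_skip_remaining(&bc, 4);    /* safe: the peek just ensured 4 bits */
    b = toy_peek(&bc, 8);
    toy_skip(&bc, 8);              /* checked variant, same effect here */
    c = toy_peek(&bc, 4);
    return (a << 12) | (b << 4) | c;
}
```

Note that in this sketch a peek near the end of the data may leave fewer than n valid bits in the cache, in which case the assertion in toy_skip_remaining fires; that corner case is the one raised in the review of this patch.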
[FFmpeg-devel] [PATCH 0/2] cached bitstream: small improvements
Preparatory patches, each independently beneficial.

Note: for the sake of simplicity, all of these are from 2020, but they needed a cleaner rebase.

Christophe Gisquet (2):
  Expose and start using skip_remaining
  read_xbits: request fewer bits

 libavcodec/bitstream.h          |  8 +---
 libavcodec/bitstream_template.h | 36 +
 libavcodec/get_bits.h           |  1 +
 3 files changed, 29 insertions(+), 16 deletions(-)

--
2.42.0
Re: [FFmpeg-devel] [PATCH] avcodec/v210enc: add new function for avx2 avx512 avx512icl
Hello,

On Fri, 28 Oct 2022 at 20:57, James Darnley wrote:
> +%else
> +pand  m1, m6, m1
> +pandn m0, m6, m0
> +por   m0, m0, m1
> +%endif

Isn't that pattern a vpblendb or some such?

--
Christophe
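For readers less used to the idiom, the quoted pand/pandn/por sequence is a bitwise select. The scalar sketch below shows it next to a byte blend keyed on the top bit of each mask byte, which is how a byte-blend instruction of the PBLENDVB family picks its source. The function names are hypothetical, and the two only agree when every mask byte is all ones or all zeros, which is assumed (not verified) for the mask in m6.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar equivalent of: pand m1, m6, m1 / pandn m0, m6, m0 / por m0, m0, m1 */
static uint64_t select_bits(uint64_t mask, uint64_t if_set, uint64_t if_clear)
{
    return (mask & if_set) | (~mask & if_clear);
}

/* Byte blend keyed on the most significant bit of each mask byte. */
static uint64_t blend_bytes_by_msb(uint64_t mask, uint64_t if_set, uint64_t if_clear)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t byte_mask = UINT64_C(0xFF) << (8 * i);
        uint64_t src = ((mask >> (8 * i + 7)) & 1) ? if_set : if_clear;
        out |= src & byte_mask;
    }
    return out;
}
```

With a mask whose bytes are either 0x00 or 0xFF both functions return the same value; with a partial byte such as 0x0F they differ, so the replacement is only valid under that constraint.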
Re: [FFmpeg-devel] [PATCH 2/4] lav/dnxhd: CID 1256 is RGB, not BGR or YUV444
Hi,

On Sun, 31 Jan 2021 at 14:11, Michael Niedermayer wrote:
> This transmutes the following dog into a hyperspace neon dog
> ./ffplay DNxHDtest2.mov

I'm not sure I prefer the correct version, but here goes. This sample is basically YUV444, the reverse of what I've seen in another sample.

--
Christophe

0002-lav-dnxhd-CID-1256-is-RGB-not-BGR-or-YUV444.patch
Description: Binary data
Re: [FFmpeg-devel] [PATCH 3/4] dnxhd: add partial alpha support for parsing
On Sat, 30 Jan 2021 at 10:54, Paul B Mahol wrote:
> Are you telling us that you do not have specification for this?

Yes, cf. cover letter. In fact, this patch could be dropped (not sure).

> Last time I checked AVID files had uncompressed alpha that did not matched
> with specification at all.

I don't know, though I wouldn't mind taking a peek. This precise patch does no parsing.

--
Christophe
[FFmpeg-devel] [PATCH 4/4] dnxhddec: partial alpha support
From: Christophe Gisquet

This consists in just ignoring the alpha at the end of the bitstream
---
 libavcodec/dnxhddec.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c
index 11da1c286c..1de95996cf 100644
--- a/libavcodec/dnxhddec.c
+++ b/libavcodec/dnxhddec.c
@@ -202,7 +202,7 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
         ctx->cur_field = 0;
     }
     ctx->mbaff = (buf[0x6] >> 5) & 1;
-    ctx->alpha = buf[0x7] & 1;
+    ctx->alpha = buf[0x7] & 5;
     ctx->lla   = (buf[0x7] >> 1) & 1;
     if (ctx->alpha)
         avpriv_request_sample(ctx->avctx, "alpha");
@@ -249,10 +249,14 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
             return AVERROR_INVALIDDATA;
         } else if (bitdepth == 10) {
             ctx->decode_dct_block = dnxhd_decode_dct_block_10_444;
-            ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP10 : AV_PIX_FMT_YUV444P10;
+            ctx->pix_fmt = ctx->act
+                         ? (/*ctx->alpha ? AV_PIX_FMT_GBRAP10LE :*/ AV_PIX_FMT_GBRP10)
+                         : (/*ctx->alpha ? AV_PIX_FMT_YUVA444P10LE :*/ AV_PIX_FMT_YUV444P10);
         } else {
             ctx->decode_dct_block = dnxhd_decode_dct_block_12_444;
-            ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP12 : AV_PIX_FMT_YUV444P12;
+            ctx->pix_fmt = ctx->act
+                         ? (/*ctx->alpha ? AV_PIX_FMT_GBRAP12LE :*/ AV_PIX_FMT_GBRP12)
+                         : (/*ctx->alpha ? AV_PIX_FMT_YUVA444P12LE :*/ AV_PIX_FMT_YUV444P12);
         }
     } else if (bitdepth == 12) {
         ctx->decode_dct_block = dnxhd_decode_dct_block_12;
@@ -337,7 +341,7 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
                 i, 0x170 + (i << 2), ctx->mb_scan_index[i]);
         if (buf_size - ctx->data_offset < ctx->mb_scan_index[i]) {
             av_log(ctx->avctx, AV_LOG_ERROR,
-                   "invalid mb scan index (%"PRIu32" vs %u).\n",
+                   "invalid mb %i scan index (%"PRIu32" vs %u).\n", i,
                    ctx->mb_scan_index[i], buf_size - ctx->data_offset);
             return AVERROR_INVALIDDATA;
         }
@@ -642,6 +646,12 @@ static int dnxhd_decode_row(AVCodecContext *avctx, void *data,
         }
     }
 
+    /* alpha decoding goes there */
+    if (ctx->alpha) {
+        ff_dlog(ctx->avctx, "Row %d: %d left\n", rownb,
+                ((rownb < ctx->mb_height-1 ? ctx->mb_scan_index[rownb+1] : ctx->buf_size) - offset) * 8 - get_bits_count(&row->gb));
+    }
+
     return 0;
 }
 
@@ -735,11 +745,13 @@ decode_coding_unit:
         case -1:
         case 0:
             ctx->pix_fmt = ctx->bit_depth==10
-                         ? AV_PIX_FMT_GBRP10 : AV_PIX_FMT_GBRP12;
+                         ? (/*ctx->alpha ? AV_PIX_FMT_GBRAP10 :*/ AV_PIX_FMT_GBRP10)
+                         : (/*ctx->alpha ? AV_PIX_FMT_GBRAP12 :*/ AV_PIX_FMT_GBRP12);
             break;
         case 1:
             ctx->pix_fmt = ctx->bit_depth==10
-                         ? AV_PIX_FMT_YUV444P10 : AV_PIX_FMT_YUV444P12;
+                         ? (/*ctx->alpha ? AV_PIX_FMT_YUVA444P10 :*/ AV_PIX_FMT_YUV444P10)
+                         : (/*ctx->alpha ? AV_PIX_FMT_YUVA444P12 :*/ AV_PIX_FMT_YUV444P12);
             break;
         }
     }
--
2.29.2
[FFmpeg-devel] [PATCH 3/4] dnxhd: add partial alpha support for parsing
From: Christophe Gisquet This multiplies the framesize by 1.5 when there is alpha, for the CIDs allowing alpha. In addition, a new header is checked, because the alpha marking seems to be different. --- libavcodec/dnxhd_parser.c | 7 --- libavcodec/dnxhddata.c| 17 - libavcodec/dnxhddata.h| 6 -- libavcodec/dnxhdenc.c | 2 +- libavformat/mxfenc.c | 7 --- 5 files changed, 25 insertions(+), 14 deletions(-) diff --git a/libavcodec/dnxhd_parser.c b/libavcodec/dnxhd_parser.c index 63b4ff89e1..726fb2f5de 100644 --- a/libavcodec/dnxhd_parser.c +++ b/libavcodec/dnxhd_parser.c @@ -31,7 +31,7 @@ typedef struct { ParseContext pc; int cur_byte; int remaining; -int w, h; +int w, h, alpha; } DNXHDParserContext; static int dnxhd_find_frame_end(DNXHDParserContext *dctx, @@ -58,6 +58,7 @@ static int dnxhd_find_frame_end(DNXHDParserContext *dctx, if (pic_found && !dctx->remaining) { if (!buf_size) /* EOF considered as end of frame */ return 0; +dctx->alpha = (state >> 8) & 5; for (; i < buf_size; i++) { dctx->cur_byte++; state = (state << 8) | buf[i]; @@ -73,9 +74,9 @@ static int dnxhd_find_frame_end(DNXHDParserContext *dctx, if (cid <= 0) continue; -remaining = avpriv_dnxhd_get_frame_size(cid); +remaining = avpriv_dnxhd_get_frame_size(cid, dctx->alpha); if (remaining <= 0) { -remaining = avpriv_dnxhd_get_hr_frame_size(cid, dctx->w, dctx->h); +remaining = avpriv_dnxhd_get_hr_frame_size(cid, dctx->w, dctx->h, dctx->alpha); if (remaining <= 0) continue; } diff --git a/libavcodec/dnxhddata.c b/libavcodec/dnxhddata.c index 3a69a0f501..54663aa432 100644 --- a/libavcodec/dnxhddata.c +++ b/libavcodec/dnxhddata.c @@ -1083,15 +1083,19 @@ const CIDEntry *ff_dnxhd_get_cid_table(int cid) return NULL; } -int avpriv_dnxhd_get_frame_size(int cid) +int avpriv_dnxhd_get_frame_size(int cid, int alpha) { const CIDEntry *entry = ff_dnxhd_get_cid_table(cid); +int result; if (!entry) return -1; -return entry->frame_size; +result = entry->frame_size; +if (alpha && (entry->flags & DNXHD_444)) +result = (result 
* 3) >> 1; +return result; } -int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h) +int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h, int alpha) { const CIDEntry *entry = ff_dnxhd_get_cid_table(cid); int result; @@ -1099,8 +1103,11 @@ int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h) if (!entry) return -1; -result = ((h + 15) / 16) * ((w + 15) / 16) * (int64_t)entry->packet_scale.num / entry->packet_scale.den; -result = (result + 2048) / 4096 * 4096; +result = AV_CEIL_RSHIFT(h, 4) * AV_CEIL_RSHIFT(w, 4) + * (int64_t)entry->packet_scale.num / entry->packet_scale.den; +if (alpha && (entry->flags & DNXHD_444)) +result = (result * 3) >> 1; +result = (result + 2048) & -4096; return FFMAX(result, 8192); } diff --git a/libavcodec/dnxhddata.h b/libavcodec/dnxhddata.h index 898079cffc..21738af453 100644 --- a/libavcodec/dnxhddata.h +++ b/libavcodec/dnxhddata.h @@ -35,6 +35,7 @@ /** Frame headers, extra 0x00 added to end for parser */ #define DNXHD_HEADER_INITIAL 0x02800100 #define DNXHD_HEADER_444 0x02800200 +#define DNXHD_HEADER_RGBA0x02800400 /** Indicate that a CIDEntry value must be read in the bitstream */ #define DNXHD_VARIABLE 0 @@ -76,6 +77,7 @@ static av_always_inline uint64_t ff_dnxhd_check_header_prefix(uint64_t prefix) { if (prefix == DNXHD_HEADER_INITIAL || prefix == DNXHD_HEADER_444 || +prefix == DNXHD_HEADER_RGBA|| ff_dnxhd_check_header_prefix_hr(prefix)) return prefix; return 0; @@ -88,8 +90,8 @@ static av_always_inline uint64_t ff_dnxhd_parse_header_prefix(const uint8_t *buf return ff_dnxhd_check_header_prefix(prefix); } -int avpriv_dnxhd_get_frame_size(int cid); -int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h); +int avpriv_dnxhd_get_frame_size(int cid, int alpha); +int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h, int alpha); int avpriv_dnxhd_get_interlaced(int cid); #endif /* AVCODEC_DNXHDDATA_H */ diff --git a/libavcodec/dnxhdenc.c b/libavcodec/dnxhdenc.c index 2461c51727..fb059060aa 100644 --- 
a/libavcodec/dnxhdenc.c +++ b/libavcodec/dnxhdenc.c @@ -467,7 +467,7 @@ static av_cold int dnxhd_encode_init(AVCodecContext *avctx) if (ctx->cid_table->frame_size == DNXHD_VARIABLE) { ctx->frame_size = avpriv_dnxhd_get_hr_frame_size(ctx->cid, -
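The frame-size arithmetic in this patch can be sketched in isolation as follows. This is an illustration under assumptions, not the libavcodec code: `Scale`, `hr_frame_size` and the rounding helpers are hypothetical names, the scale used in any call is made up, and `alpha_444` stands for the combined condition "alpha present and the CID is flagged 4:4:4".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet_scale rational of a CID entry. */
typedef struct Scale { int num, den; } Scale;

/* Number of 16-pixel macroblocks needed to cover v pixels. */
static int ceil_rshift4(int v) { return (v + 15) >> 4; }

/* The two rounding forms seen in the diff; equal for non-negative input. */
static int64_t round_div_form(int64_t v)  { return (v + 2048) / 4096 * 4096; }
static int64_t round_mask_form(int64_t v) { return (v + 2048) & -4096; }

/* Frame size: macroblock count times scale, x1.5 with alpha, rounded to 4096. */
static int hr_frame_size(int w, int h, Scale s, int alpha_444)
{
    int64_t result = (int64_t)ceil_rshift4(h) * ceil_rshift4(w) * s.num / s.den;

    if (alpha_444)
        result = (result * 3) >> 1;   /* grow by half for the alpha plane */
    result = round_mask_form(result);
    return result < 8192 ? 8192 : (int)result;
}
```

For a 1920x1080 frame there are 120 x 68 = 8160 macroblocks; with the made-up scale 100/1 the size is 815104 bytes without alpha and 1224704 with it, and both rounding forms give the same result.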
[FFmpeg-devel] [PATCH 2/4] lav/dnxhd: CID 1256 is RGB, not BGR or YUV444
From: Christophe Gisquet Fix the logic around checking the ACT flag per MB and row. This also requires adding a 444 path to swap channels into the ffmpeg formats, as they are GBR, and not RGB. --- libavcodec/dnxhddec.c | 64 +++ 1 file changed, 47 insertions(+), 17 deletions(-) diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c index 359588f963..11da1c286c 100644 --- a/libavcodec/dnxhddec.c +++ b/libavcodec/dnxhddec.c @@ -249,12 +249,10 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame, return AVERROR_INVALIDDATA; } else if (bitdepth == 10) { ctx->decode_dct_block = dnxhd_decode_dct_block_10_444; -ctx->pix_fmt = ctx->act ? AV_PIX_FMT_YUV444P10 -: AV_PIX_FMT_GBRP10; +ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP10 : AV_PIX_FMT_YUV444P10; } else { ctx->decode_dct_block = dnxhd_decode_dct_block_12_444; -ctx->pix_fmt = ctx->act ? AV_PIX_FMT_YUV444P12 -: AV_PIX_FMT_GBRP12; +ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP12 : AV_PIX_FMT_YUV444P12; } } else if (bitdepth == 12) { ctx->decode_dct_block = dnxhd_decode_dct_block_12; @@ -504,19 +502,19 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row, qscale = get_bits(>gb, 11); } act = get_bits1(>gb); -if (act) { -if (!ctx->act) { -static int act_warned; -if (!act_warned) { -act_warned = 1; -av_log(ctx->avctx, AV_LOG_ERROR, - "ACT flag set, in violation of frame header.\n"); -} -} else if (row->format == -1) { +if (ctx->act) { +if (row->format == -1) { row->format = act; } else if (row->format != act) { row->format = 2; // Variable } +} else if (act) { +static int act_warned; +if (!act_warned) { +act_warned = 1; +av_log(ctx->avctx, AV_LOG_ERROR, + "ACT flag set, in violation of frame header.\n"); +} } if (qscale != row->last_qscale) { @@ -569,6 +567,21 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row, } break; case DNX_CHROMAFORMAT_444: +if (ctx->avctx->profile == FF_PROFILE_DNXHD) { +ctx->idsp.idct_put(dest_y, dct_linesize_luma, row->blocks[2]); 
+ctx->idsp.idct_put(dest_y + dct_x_offset, dct_linesize_luma, row->blocks[3]); +ctx->idsp.idct_put(dest_y + dct_y_offset, dct_linesize_luma, row->blocks[8]); +ctx->idsp.idct_put(dest_y + dct_y_offset + dct_x_offset, dct_linesize_luma, row->blocks[9]); + +ctx->idsp.idct_put(dest_u, dct_linesize_luma, row->blocks[4]); +ctx->idsp.idct_put(dest_u + dct_x_offset, dct_linesize_luma, row->blocks[5]); +ctx->idsp.idct_put(dest_u + dct_y_offset, dct_linesize_luma, row->blocks[10]); +ctx->idsp.idct_put(dest_u + dct_y_offset + dct_x_offset, dct_linesize_luma, row->blocks[11]); +ctx->idsp.idct_put(dest_v, dct_linesize_luma, row->blocks[0]); +ctx->idsp.idct_put(dest_v + dct_x_offset, dct_linesize_luma, row->blocks[1]); +ctx->idsp.idct_put(dest_v + dct_y_offset, dct_linesize_luma, row->blocks[6]); +ctx->idsp.idct_put(dest_v + dct_y_offset + dct_x_offset, dct_linesize_luma, row->blocks[7]); +} else { ctx->idsp.idct_put(dest_y, dct_linesize_luma, row->blocks[0]); ctx->idsp.idct_put(dest_y + dct_x_offset, dct_linesize_luma, row->blocks[1]); ctx->idsp.idct_put(dest_y + dct_y_offset, dct_linesize_luma, row->blocks[6]); @@ -585,6 +598,7 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row, ctx->idsp.idct_put(dest_v + dct_y_offset, dct_linesize_chroma, row->blocks[10]); ctx->idsp.idct_put(dest_v + dct_y_offset + dct_x_offset, dct_linesize_chroma, row->blocks[11]); } +} break; case DNX_CHROMAFORMAT_420: ctx->idsp.idct_put(dest_y, dct_linesize_luma, row->blocks[0]); @@ -610,6 +624,8 @@ static int dnxhd_decode_row(AVCodecContext *avctx, void *data, RowContext *row = ctx->rows + threadnb; int x, ret; +row->format = -1; + row->last_dc[0] = r
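The reordered ACT check amounts to a small per-row state machine, sketched below. The function and enum names are hypothetical; only the values -1 (unset), 0/1 (uniform) and 2 (variable) come from the diff, and the warning the patch logs when the frame header disallows ACT is left out.

```c
#include <assert.h>

enum { ROW_FORMAT_UNSET = -1, ROW_FORMAT_VARIABLE = 2 };

/* Returns the new row format after seeing one macroblock's ACT bit. */
static int track_row_format(int row_format, int frame_act, int mb_act)
{
    if (!frame_act)
        return row_format;            /* per-MB flag ignored (patch only warns) */
    if (row_format == ROW_FORMAT_UNSET)
        return mb_act;                /* first macroblock sets the format */
    if (row_format != mb_act)
        return ROW_FORMAT_VARIABLE;   /* mixed flags within the row */
    return row_format;
}
```

This also shows why the patch resets `row->format` to -1 at the start of each row: without the reset, the first macroblock of a row could never set the format.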
[FFmpeg-devel] [PATCH 0/4] Better colorspace support in dnxhddec
Nobody complained, so these CIDs are likely little used. This was developed without reference to the ST2019-1:2016 specs (some fields are therefore guessed) but with reference to (unredistributable) samples likely generated by the Avid SDK. I have no idea how the alpha is coded, but it is variable-length.

Christophe Gisquet (4):
  lav/dnxhd: better support 4:2:0 in DNXHR profiles
  lav/dnxhd: CID 1256 is RGB, not BGR or YUV444
  dnxhd: add partial alpha support for parsing
  dnxhddec: partial alpha support

 libavcodec/dnxhd_parser.c |   7 +-
 libavcodec/dnxhddata.c    |  17 +++--
 libavcodec/dnxhddata.h    |   6 +-
 libavcodec/dnxhddec.c     | 139 --
 libavcodec/dnxhdenc.c     |   2 +-
 libavformat/mxfenc.c      |   7 +-
 6 files changed, 128 insertions(+), 50 deletions(-)

--
2.29.2
[FFmpeg-devel] [PATCH 1/4] lav/dnxhd: better support 4:2:0 in DNXHR profiles
From: Christophe Gisquet Where they are allowed. No validation of profile + colorformat is performed, however. --- libavcodec/dnxhddec.c | 55 +++ 1 file changed, 40 insertions(+), 15 deletions(-) diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c index c78d55aee5..359588f963 100644 --- a/libavcodec/dnxhddec.c +++ b/libavcodec/dnxhddec.c @@ -49,6 +49,13 @@ typedef struct RowContext { int format; } RowContext; +typedef enum { +DNX_CHROMAFORMAT_422 = 0, +DNX_CHROMAFORMAT_420 = 1, +DNX_CHROMAFORMAT_444 = 2, +DNX_CHROMAFORMAT_UNKNOWN = 3, +} DNXChromaFormat; + typedef struct DNXHDContext { AVCodecContext *avctx; RowContext *rows; @@ -67,7 +74,7 @@ typedef struct DNXHDContext { ScanTable scantable; const CIDEntry *cid_table; int bit_depth; // 8, 10, 12 or 0 if not initialized at all. -int is_444; +int chromafmt; int alpha; int lla; int mbaff; @@ -168,6 +175,7 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame, const uint8_t *buf, int buf_size, int first_field) { +static const char* cfname[4] = { "4:2:2", "4:2:0", "4:4:4", "Unknown" }; int i, cid, ret; int old_bit_depth = ctx->bit_depth, bitdepth; uint64_t header_prefix; @@ -234,8 +242,8 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame, av_log(ctx->avctx, AV_LOG_WARNING, "Adaptive color transform in an unsupported profile.\n"); -ctx->is_444 = (buf[0x2C] >> 6) & 1; -if (ctx->is_444) { +ctx->chromafmt = (buf[0x2C] >> 5) & 3; +if (ctx->chromafmt == DNX_CHROMAFORMAT_444) { if (bitdepth == 8) { avpriv_request_sample(ctx->avctx, "4:4:4 8 bits"); return AVERROR_INVALIDDATA; @@ -250,16 +258,16 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame, } } else if (bitdepth == 12) { ctx->decode_dct_block = dnxhd_decode_dct_block_12; -ctx->pix_fmt = AV_PIX_FMT_YUV422P12; +ctx->pix_fmt = ctx->chromafmt == DNX_CHROMAFORMAT_420 ? 
AV_PIX_FMT_YUV420P12 : AV_PIX_FMT_YUV422P12; } else if (bitdepth == 10) { if (ctx->avctx->profile == FF_PROFILE_DNXHR_HQX) ctx->decode_dct_block = dnxhd_decode_dct_block_10_444; else ctx->decode_dct_block = dnxhd_decode_dct_block_10; -ctx->pix_fmt = AV_PIX_FMT_YUV422P10; +ctx->pix_fmt = ctx->chromafmt == DNX_CHROMAFORMAT_420 ? AV_PIX_FMT_YUV420P10 : AV_PIX_FMT_YUV422P10; } else { ctx->decode_dct_block = dnxhd_decode_dct_block_8; -ctx->pix_fmt = AV_PIX_FMT_YUV422P; +ctx->pix_fmt = ctx->chromafmt == DNX_CHROMAFORMAT_420 ? AV_PIX_FMT_YUV420P : AV_PIX_FMT_YUV422P; } ctx->avctx->bits_per_raw_sample = ctx->bit_depth = bitdepth; @@ -292,8 +300,8 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame, if ((ctx->height + 15) >> 4 == ctx->mb_height && frame->interlaced_frame) ctx->height <<= 1; -av_log(ctx->avctx, AV_LOG_VERBOSE, "%dx%d, 4:%s %d bits, MBAFF=%d ACT=%d\n", - ctx->width, ctx->height, ctx->is_444 ? "4:4" : "2:2", +av_log(ctx->avctx, AV_LOG_VERBOSE, "%dx%d, %s %d bits, MBAFF=%d ACT=%d\n", + ctx->width, ctx->height, cfname[ctx->chromafmt], ctx->bit_depth, ctx->mbaff, ctx->act); // Newer format supports variable mb_scan_index sizes @@ -360,7 +368,7 @@ static av_always_inline int dnxhd_decode_dct_block(const DNXHDContext *ctx, ctx->bdsp.clear_block(block); -if (!ctx->is_444) { +if (ctx->chromafmt != DNX_CHROMAFORMAT_444) { if (n & 2) { component = 1 + (n & 1); scale = row->chroma_scale; @@ -478,6 +486,9 @@ static int dnxhd_decode_dct_block_12_444(const DNXHDContext *ctx, static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row, AVFrame *frame, int x, int y) { +static const char yoff[4] = { 1, 0, 1, 0 }; +static const char xoff[4] = { 0, 0, 1, 0 }; +static const uint8_t num_blocks[4] = { 8, 6, 12, 0 }; int shift1 = ctx->bit_depth >= 10; int dct_linesize_luma = frame->linesize[0]; int dct_linesize_chroma = frame->linesize[1]; @@ -516,7 +527,7 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row, 
row->last_qscale = qscale; } -for (i = 0; i < 8 + 4 * ctx->is_444; i++) { +for (i = 0; i < num_blocks[ctx->chromafmt]; i++) { if (ctx->decode_dct_block(ctx, row, i) < 0) return AVERROR_
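As a compact view of what the patch changes, the sketch below extracts the two-bit chroma format field and looks up the per-macroblock block count. The enum values and both tables are taken from the diff; the helper function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DNX_CHROMAFORMAT_422     = 0,
    DNX_CHROMAFORMAT_420     = 1,
    DNX_CHROMAFORMAT_444     = 2,
    DNX_CHROMAFORMAT_UNKNOWN = 3,
} DNXChromaFormat;

/* Tables as they appear in the patch. */
static const char *const cfname[4]  = { "4:2:2", "4:2:0", "4:4:4", "Unknown" };
static const uint8_t num_blocks[4]  = { 8, 6, 12, 0 };

/* Hypothetical helper: bits 6..5 of header byte 0x2C hold the chroma format. */
static DNXChromaFormat chroma_format_from_byte(uint8_t b)
{
    return (DNXChromaFormat)((b >> 5) & 3);
}
```

The previous one-bit test, `(buf[0x2C] >> 6) & 1`, matches the new field only for values 0 and 2, which is why 4:2:0 streams could not be told apart from 4:2:2 before.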
Re: [FFmpeg-devel] [PATCH] [RFC] Tech Resolution Process
Hi,

On Sat, 5 Dec 2020 at 15:59, Jean-Baptiste Kempf wrote:
> +After all the emails are in, the TC has 96 hours to give its final decision.
> +
> +### Within TC
> +
> +In the internal case, the TC has 96 hours to give its final decision.

How is the unavailability of any TC member handled? What about a quorum? Would you have deputy ("fallback") TC members then?

Unavailability may simply be due to a weekend falling within this 96-hour period, or to holidays (bank holidays or otherwise).

--
Christophe
Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding
Hi,

On Thu, 29 Oct 2020 at 14:57, Christophe Gisquet wrote:
> Hi, as you are the only one active on this decoder, this shouldn't matter,
> but:
> down the line, the ffmpeg project has no way of testing if someone
> breaks even the basic parsing of these extensions in the future.
> To test, the hardware you mention is needed, as well as maybe specific tests.
>
> At some point, fate lacks some support for verifying h/w decoding. It
> would be really nice if some of these companies with all this new
> awesome hardware would consider this, and for instance contribute fate
> instances to perform such test for the ffmpeg project.

Ping?

This is irrespective of the patchset being accepted or not. I just wish people are aware of the stakes in the long term.

Thanks anyway for investing the time for what you've already done.

--
Christophe
Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding
Forgot to add this:

On Thu, 29 Oct 2020 at 14:51, Christophe Gisquet wrote:
> > [1] https://github.com/oddstone/FFmpeg/commits/rext1
>
> This has additional fixes (which looks good, haven't really delved
> into it) that unfortunately doesn't fix:

And I suspect you need these entropy state fixes (and other ones) to be in before adding any such tools, because these tools will not pass some fate testing without them.

--
Christophe
Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding
Hi,

On Tue, 29 Sep 2020 at 17:55, Linjie Fu wrote:
> I didn’t see such plans for now, hence adding sufficient error message
> seems to be a proper way.

As you are the only one active on this decoder, this shouldn't matter, but: down the line, the ffmpeg project has no way of testing whether someone breaks even the basic parsing of these extensions in the future. To test, the hardware you mention is needed, as well as maybe specific tests.

At this point, fate lacks support for verifying h/w decoding. It would be really nice if some of these companies with all this new awesome hardware would consider this, and for instance contribute fate instances to perform such tests for the ffmpeg project.

--
Christophe
Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding
Hi,

On Fri, 2 Oct 2020 at 18:12, Guangxin Xu wrote:
> Most of scc conformance clip has tiles.
> But currently, the hevc software decoder has many issues for tile cabac
> saving and loading.
> We'd better fix them before starting implement scc tool.
>
> I have queue up some patches to address the cabac issue at [1] and send the
> first one to review at [2]
> but, no one responded to me yet. Do you know who can help review the patch?
> thanks
>
> [1] https://github.com/oddstone/FFmpeg/commits/rext1

This has additional fixes (which look good, though I haven't really delved into them) that unfortunately don't fix:

> [2]
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200829055218.32261-1-oddst...@gmail.com/

this patch being run under fate with THREADS=12 THREAD_TYPE=slice

--
Christophe
Re: [FFmpeg-devel] [PATCH 2/2] fate/hevc-conformance: add clip for persistent_rice_adaptation_enabled_flag
On Sat, 29 Aug 2020 at 07:52, Xu Guangxin wrote:
> you can download it from:
> https://www.itu.int/wftp3/av-arch/jctvc-site/bitstream_exchange/draft_conformance/RExt/WPP_HIGH_TP_444_8BIT_RExt_Apple_2.bit

Just for the record, this is now
https://www.itu.int/wftp3/av-arch/jctvc-site/bitstream_exchange/draft_conformance/RExt/WPP_HIGH_TP_444_8BIT_RExt_Apple_2.zip

--
Christophe
Re: [FFmpeg-devel] [PATCH 1/2] avcodec/hevcdec: fix stat_coeff save/load for persistent_rice_adaptation_enabled_flag
On Wed, 9 Sep 2020 at 07:51, Guangxin Xu wrote:
> Hi Mickaël & all,
> any suggestions?

The patch is almost good, though I would have hoped for a link to the relevant part of the specs and TableStatCoeff* beyond just "9.3".

Though as I suspected, there is probably something missing. Maybe around sync across (tile) threads? You can check for yourself by running it like:

make fate-hevc-conformance-WPP_HIGH_TP_444_8BIT_RExt_Apple_2 THREADS=12 THREAD_TYPE=slice

Sure enough, the MD5 becomes random:

-0, 0, 0, 1, 1179648, 0x78e55a69
-0, 1, 1, 1, 1179648, 0x5babb3cb
-0, 2, 2, 1, 1179648, 0x65935648
+0, 0, 0, 1, 1179648, 0xa9a4a727
+0, 1, 1, 1, 1179648, 0xf4bfd32d
+0, 2, 2, 1, 1179648, 0x4f28807a
Re: [FFmpeg-devel] [PATCH 2/7] get_bits: support 32bits cache
Hi,

On Wed, 15 Apr 2020 at 00:41, Carl Eugen Hoyos wrote:
> Will test on ppc32 over the weekend.

Please do. Testing on different endianness and different architectures is probably what this patchset lacks the most.

If you can, on this arch, please test just before and just after "0006-get_bits-change-refill-to-RAD-pattern.patch". The impact on x86 was non-trivial, but it may behave quite differently (better?) on PPC.

Also, regarding the benchmarking, provided there is an encoder (either in ffmpeg or VfW), I use a soft video, and another which is the same soft video with grain/noise added on top of it (see e.g. the ffmpeg filters). As the bitstream reader gets faster, the difference in decoding speed between the two may increase.

Thanks,
--
Christophe
Re: [FFmpeg-devel] [PATCH 6/7] get_bits: change refill to RAD pattern
Hi,

On Tue, 14 Apr 2020 at 12:25, Christophe Gisquet wrote:
>      if (is_le)
> -        s->cache |= (cache_type)AV_RL_HALF(s->ptr) << s->bits_left;
> +        s->cache |= (cache_type)AV_RL_ALL(s->ptr) << s->bits_left;
>      else
> -        s->cache |= (cache_type)AV_RB_HALF(s->ptr) << (BITSTREAM_HBITS - s->bits_left);

After this, AV_R*_HALF becomes unused, so I'll update the patch to remove them, in addition to any change asked/suggested during review.

--
Christophe
[FFmpeg-devel] [PATCH 6/7] get_bits: change refill to RAD pattern
Described as variant 4 in the linked article. Results in faster and smaller code. Also, cases for the "refill_all" cases (usually when we want to empty/fill it) have been inlined. --- libavcodec/get_bits.h | 103 +- 1 file changed, 41 insertions(+), 62 deletions(-) diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h index da054ebfcb..baff86ecf6 100644 --- a/libavcodec/get_bits.h +++ b/libavcodec/get_bits.h @@ -153,11 +153,7 @@ static inline unsigned int show_bits(GetBitContext *s, int n); */ #if CACHED_BITSTREAM_READER -# if BITSTREAM_BITS == 32 -# define MIN_CACHE_BITS (32-7) -# else -# define MIN_CACHE_BITS 32 -# endif +# define MIN_CACHE_BITS (BITSTREAM_BITS-7) #elif defined LONG_BITSTREAM_READER # define MIN_CACHE_BITS 32 #else @@ -262,46 +258,21 @@ static inline int get_bits_count(const GetBitContext *s) } #if CACHED_BITSTREAM_READER -static inline void refill_half(GetBitContext *s, int is_le) +// See variant 4 in the following article: +// https://fgiesen.wordpress.com/2018/02/20/reading-bits-in-far-too-many-ways-part-2/ +static inline void refill_gb(GetBitContext *s, int is_le) { #if !UNCHECKED_BITSTREAM_READER if (s->ptr >= s->buffer_end) return; #endif -#if BITSTREAM_BITS == 32 -if (s->bits_left > 16) { -if (is_le) -s->cache |= (uint32_t)s->ptr[0] << s->bits_left; -else -s->cache |= (uint32_t)s->ptr[0] << (32 - s->bits_left); -s->ptr++; -s->bits_left += 8; -return; -} -#endif - if (is_le) -s->cache |= (cache_type)AV_RL_HALF(s->ptr) << s->bits_left; +s->cache |= (cache_type)AV_RL_ALL(s->ptr) << s->bits_left; else -s->cache |= (cache_type)AV_RB_HALF(s->ptr) << (BITSTREAM_HBITS - s->bits_left); -s->ptr += sizeof(s->cache)/2; -s->bits_left += BITSTREAM_HBITS; -} - -static inline void refill_all(GetBitContext *s, int is_le) -{ -#if !UNCHECKED_BITSTREAM_READER -if (s->ptr >= s->buffer_end) -return; -#endif - -if (is_le) -s->cache = AV_RL_ALL(s->ptr); -else -s->cache = AV_RB_ALL(s->ptr); -s->ptr += sizeof(s->cache); -s->bits_left = BITSTREAM_BITS; +s->cache 
|= (cache_type)AV_RB_ALL(s->ptr) >> s->bits_left; +s->ptr += (BITSTREAM_BITS-1 - s->bits_left) >> 3; +s->bits_left |= BITSTREAM_BITS-8; } static inline cache_type get_val(GetBitContext *s, unsigned n, int is_le) @@ -374,9 +345,9 @@ static inline int get_xbits(GetBitContext *s, int n) if (n > s->bits_left) #ifdef BITSTREAM_READER_LE -refill_half(s, 1); +refill_gb(s, 1); #else -refill_half(s, 0); +refill_gb(s, 0); #endif #if BITSTREAM_BITS == 32 @@ -448,9 +419,9 @@ static inline unsigned int get_bits(GetBitContext *s, int n) av_assert2(n>0 && n<=32); if (n > s->bits_left) { #ifdef BITSTREAM_READER_LE -refill_half(s, 1); +refill_gb(s, 1); #else -refill_half(s, 0); +refill_gb(s, 0); #endif if (s->bits_left < BITSTREAM_HBITS) s->bits_left = n; @@ -486,7 +457,7 @@ static inline unsigned int get_bits_le(GetBitContext *s, int n) #if CACHED_BITSTREAM_READER av_assert2(n>0 && n<=32); if (n > s->bits_left) { -refill_half(s, 1); +refill_gb(s, 1); if (s->bits_left < BITSTREAM_HBITS) s->bits_left = n; } @@ -513,9 +484,9 @@ static inline unsigned int show_bits(GetBitContext *s, int n) #if CACHED_BITSTREAM_READER if (n > s->bits_left) #ifdef BITSTREAM_READER_LE -refill_half(s, 1); +refill_gb(s, 1); #else -refill_half(s, 0); +refill_gb(s, 0); #endif tmp = show_val(s, n); @@ -535,7 +506,6 @@ static inline void skip_bits(GetBitContext *s, int n) skip_remaining(s, n); else { n -= s->bits_left; -s->cache = 0; if (n >= BITSTREAM_BITS) { unsigned skip = n / 8; @@ -543,11 +513,14 @@ static inline void skip_bits(GetBitContext *s, int n) n -= 8*skip; s->ptr += skip; } + #ifdef BITSTREAM_READER_LE -refill_all(s, 1); +s->cache = AV_RL_ALL(s->ptr); #else -refill_all(s, 0); +s->cache = AV_RB_ALL(s->ptr); #endif +s->ptr += sizeof(cache_type); +s->bits_left = BITSTREAM_BITS; if (n) skip_remaining(s, n); } @@ -561,12 +534,15 @@ static inline void skip_bits(GetBitContext *s, int n) static inline unsigned int get_bits1(GetBitContext *s) { #if CACHED_BITSTREAM_READER -if (!s->bits_left) +if 
(!s->bits_left) { #ifdef BITSTREAM_READER_LE -refill_all(s, 1); +s->cache = AV_RL_ALL(s->ptr); #else -refill_all(s, 0); +s->cache = AV_RB_ALL(s->ptr); #endif +s->ptr += sizeof(cache_type); +s->bits_left = BITSTREAM_BITS; +} #ifdef BITSTREAM_READER_LE
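For reference, the refill pattern described as variant 4 in the linked article can be sketched as below for a 64-bit little-endian cache. This is a simplified illustration, not the get_bits.h code: `RadBits`, `load_le64`, `rad_refill` and `rad_get` are hypothetical names, there is no end-of-buffer check, and the caller is assumed to provide at least 8 readable bytes past the current pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reader state for the sketch. */
typedef struct RadBits {
    const uint8_t *ptr;   /* next byte not yet counted in bits_left */
    uint64_t cache;       /* next bit to read is bit 0 */
    unsigned bits_left;   /* counted bits in the cache, at most 63 */
} RadBits;

/* Portable unaligned little-endian 64-bit load. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Read a full word, advance by whole bytes only, then fix up the count. */
static void rad_refill(RadBits *s)
{
    s->cache     |= load_le64(s->ptr) << s->bits_left;
    s->ptr       += (63 - s->bits_left) >> 3;
    s->bits_left |= 56;   /* now somewhere in 56..63 */
}

/* Read n bits, 1..32. */
static uint32_t rad_get(RadBits *s, unsigned n)
{
    uint32_t v;

    if (n > s->bits_left)
        rad_refill(s);
    v = (uint32_t)(s->cache & ((UINT64_C(1) << n) - 1));
    s->cache    >>= n;
    s->bits_left -= n;
    return v;
}
```

The cache may hold a few stream bits beyond `bits_left`; the next refill ORs the same bytes over them, so they are harmless. That is what lets the refill run without any branch on the fill level.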
[FFmpeg-devel] [PATCH 4/7] get_bits: replace index by an incremented pointer
The main effect is actually code size reduction, due to the smaller refill code (or difference in inlining decision), e.g. on Win32 of {magicyuv,huffyuvdec,utvideodec}.o as follows: 19068/41460/16512 -> 18892/40760/16448 It should also be a small speedup (because it simplifies the address computation), but no change was measured. --- libavcodec/get_bits.h | 43 +-- 1 file changed, 25 insertions(+), 18 deletions(-) diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h index 59bfbdd88b..4f75f9dd84 100644 --- a/libavcodec/get_bits.h +++ b/libavcodec/get_bits.h @@ -91,10 +91,12 @@ typedef uint32_t cache_type; typedef struct GetBitContext { const uint8_t *buffer, *buffer_end; #if CACHED_BITSTREAM_READER +const uint8_t *ptr; cache_type cache; unsigned bits_left; +#else +int index; // Cached version advances ptr instead #endif -int index; int size_in_bits; int size_in_bits_plus8; } GetBitContext; @@ -253,7 +255,7 @@ static inline unsigned int show_bits(GetBitContext *s, int n); static inline int get_bits_count(const GetBitContext *s) { #if CACHED_BITSTREAM_READER -return s->index - s->bits_left; +return 8*(s->ptr - s->buffer) - s->bits_left; #else return s->index; #endif @@ -263,42 +265,42 @@ static inline int get_bits_count(const GetBitContext *s) static inline void refill_half(GetBitContext *s, int is_le) { #if !UNCHECKED_BITSTREAM_READER -if (s->index >> 3 >= s->buffer_end - s->buffer) +if (s->ptr >= s->buffer_end) return; #endif #if BITSTREAM_BITS == 32 if (s->bits_left > 16) { if (is_le) -s->cache |= (uint32_t)s->buffer[s->index >> 3] << s->bits_left; +s->cache |= (uint32_t)s->ptr[0] << s->bits_left; else -s->cache |= (uint32_t)s->buffer[s->index >> 3] << (32 - s->bits_left); -s->index += 8; +s->cache |= (uint32_t)s->ptr[0] << (32 - s->bits_left); +s->ptr++; s->bits_left += 8; return; } #endif if (is_le) -s->cache |= (cache_type)AV_RL_HALF(s->buffer + (s->index >> 3)) << s->bits_left; +s->cache |= (cache_type)AV_RL_HALF(s->ptr) << s->bits_left; else -s->cache |= 
(cache_type)AV_RB_HALF(s->buffer + (s->index >> 3)) << (BITSTREAM_HBITS - s->bits_left); -s->index += BITSTREAM_HBITS; +s->cache |= (cache_type)AV_RB_HALF(s->ptr) << (BITSTREAM_HBITS - s->bits_left); +s->ptr += sizeof(s->cache)/2; s->bits_left += BITSTREAM_HBITS; } static inline void refill_all(GetBitContext *s, int is_le) { #if !UNCHECKED_BITSTREAM_READER -if (s->index >> 3 >= s->buffer_end - s->buffer) +if (s->ptr >= s->buffer_end) return; #endif if (is_le) -s->cache = AV_RL_ALL(s->buffer + (s->index >> 3)); +s->cache = AV_RL_ALL(s->ptr); else -s->cache = AV_RB_ALL(s->buffer + (s->index >> 3)); -s->index += BITSTREAM_BITS; +s->cache = AV_RB_ALL(s->ptr); +s->ptr += sizeof(s->cache); s->bits_left = BITSTREAM_BITS; } @@ -534,13 +536,12 @@ static inline void skip_bits(GetBitContext *s, int n) else { n -= s->bits_left; s->cache = 0; -s->bits_left = 0; if (n >= BITSTREAM_BITS) { -unsigned skip = (n / 8) * 8; +unsigned skip = n / 8; -n -= skip; -s->index += skip; +n -= 8*skip; +s->ptr += skip; } #ifdef BITSTREAM_READER_LE refill_all(s, 1); @@ -699,12 +700,14 @@ static inline int init_get_bits_xe(GetBitContext *s, const uint8_t *buffer, s->size_in_bits = bit_size; s->size_in_bits_plus8 = bit_size + 8; s->buffer_end = buffer + buffer_size; -s->index = 0; #if CACHED_BITSTREAM_READER +s->ptr= buffer; s->cache = 0; s->bits_left = 0; refill_all(s, is_le); +#else +s->index = 0; #endif return ret; @@ -757,7 +760,11 @@ static inline const uint8_t *align_get_bits(GetBitContext *s) int n = -get_bits_count(s) & 7; if (n) skip_bits(s, n); +#if CACHED_BITSTREAM_READER +return s->ptr; +#else return s->buffer + (s->index >> 3); +#endif } /** -- 2.26.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
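The bit accounting that this patch switches to can be sketched in isolation. This is a minimal sketch rather than the FFmpeg code: MiniReader and the mini_* helpers are hypothetical names, it covers only the big-endian 32-bit case, and its end-of-buffer check is stricter than the patch's. It shows that once refills advance a byte pointer, the consumed bit count is 8*(ptr - buffer) - bits_left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the cached GetBitContext. */
typedef struct MiniReader {
    const uint8_t *buffer;     /* start of the bitstream */
    const uint8_t *buffer_end; /* one past the last byte */
    const uint8_t *ptr;        /* next byte to load into the cache */
    uint32_t       cache;      /* big-endian: next bit to read is the MSB */
    unsigned       bits_left;  /* number of valid bits in the cache */
} MiniReader;

/* Load 4 bytes as a big-endian word; the caller guarantees they exist. */
static uint32_t mini_load_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Replace the whole cache and advance the pointer by the cache size.
 * Assumption: this sketch refuses to refill with fewer than 4 bytes left,
 * whereas the patch only tests ptr >= buffer_end and relies on padding. */
static void mini_refill_all(MiniReader *r)
{
    if (r->buffer_end - r->ptr < 4)
        return;
    r->cache     = mini_load_be32(r->ptr);
    r->ptr      += sizeof(r->cache);
    r->bits_left = 32;
}

static void mini_init(MiniReader *r, const uint8_t *buf, size_t size)
{
    r->buffer     = buf;
    r->ptr        = buf;
    r->buffer_end = buf + size;
    r->cache      = 0;
    r->bits_left  = 0;
    mini_refill_all(r);
}

/* Consume n bits (1..31) that are already in the cache. */
static unsigned mini_get_val(MiniReader *r, unsigned n)
{
    unsigned ret;
    assert(n > 0 && n < 32 && n <= r->bits_left);
    ret = r->cache >> (32 - n);
    r->cache <<= n;
    r->bits_left -= n;
    return ret;
}

/* Bytes already loaded, minus the bits still waiting in the cache. */
static int mini_bits_count(const MiniReader *r)
{
    return 8 * (int)(r->ptr - r->buffer) - (int)r->bits_left;
}
```

Right after mini_init() the pointer has already moved 4 bytes ahead, yet mini_bits_count() reports 0 because all 32 loaded bits are still in the cache.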
[FFmpeg-devel] [PATCH 7/7] get_bits: use immediate in skip_remaining
When a table entry says to continue reading, the bits of the current read are skipped in full. Small object size reduction, depending on inlining.
---
 libavcodec/get_bits.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index baff86ecf6..d1e29b9917 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -793,7 +793,7 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
         code = table[index][0];                                 \
         n    = table[index][1];                                 \
         if (max_depth > 2 && n < 0) {                           \
-            LAST_SKIP_BITS(name, gb, nb_bits);                  \
+            LAST_SKIP_BITS(name, gb, bits);                     \
             UPDATE_CACHE(name, gb);                             \
                                                                 \
             nb_bits = -n;                                       \
@@ -878,7 +878,7 @@ static av_always_inline int get_vlc2(GetBitContext *s, VLC_TYPE (*table)[2],
             skip_remaining(s, bits);
             code = set_idx(s, code, &n, &nb_bits, table);
             if (max_depth > 2 && n < 0) {
-                skip_remaining(s, nb_bits);
+                skip_remaining(s, bits);
                 code = set_idx(s, code, &n, &nb_bits, table);
             }
         }
-- 
2.26.0
[FFmpeg-devel] [PATCH 5/7] get_bits: improve and fix get_bits_long for 32b
The new code is guaranteed to read at least 32bits, which is likely ok with the usual case that get_bits without cache can read up to 25. --- libavcodec/get_bits.h | 29 ++--- 1 file changed, 26 insertions(+), 3 deletions(-) diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h index 4f75f9dd84..da054ebfcb 100644 --- a/libavcodec/get_bits.h +++ b/libavcodec/get_bits.h @@ -608,21 +608,44 @@ static inline void skip_bits1(GetBitContext *s) */ static inline unsigned int get_bits_long(GetBitContext *s, int n) { +unsigned ret = 0; av_assert2(n>=0 && n<=32); if (!n) { return 0; #if CACHED_BITSTREAM_READER } -return get_bits(s, n); + +# ifdef BITSTREAM_READER_LE +unsigned left = 0; +# endif +if (n > s->bits_left) { +n -= s->bits_left; +# ifdef BITSTREAM_READER_LE +left = s->bits_left; +ret = get_val(s, s->bits_left, 1); +refill_all(s, 1); +# else +ret = get_val(s, s->bits_left, 0); +refill_all(s, 0); +# endif +} + +#ifdef BITSTREAM_READER_LE +ret = get_val(s, n, 1) << left | ret; +#else +ret = get_val(s, n, 0) | ret << n; +#endif + +return ret; #else } else if (n <= MIN_CACHE_BITS) { return get_bits(s, n); } else { #ifdef BITSTREAM_READER_LE -unsigned ret = get_bits(s, 16); +ret = get_bits(s, 16); return ret | (get_bits(s, n - 16) << 16); #else -unsigned ret = get_bits(s, 16) << (n - 16); +ret = get_bits(s, 16) << (n - 16); return ret | get_bits(s, n - 16); #endif } -- 2.26.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
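The split read in this patch (drain what is left in the cache, refill, then read the remainder) can be sketched on its own for the big-endian case. This is a sketch under assumptions, not the patched get_bits_long: Reader32 and its helpers are hypothetical, and the refill zero-pads past the end of the buffer instead of relying on FFmpeg's input padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit cached reader, big-endian bit order. */
typedef struct Reader32 {
    const uint8_t *ptr;       /* next byte to load */
    const uint8_t *end;       /* one past the last byte */
    uint32_t       cache;     /* next bit to read is the MSB */
    unsigned       bits_left; /* valid bits in the cache */
} Reader32;

/* Reload the full cache; bytes past the end are read as zero (assumption). */
static void r32_refill_all(Reader32 *r)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v <<= 8;
        if (r->ptr < r->end)
            v |= *r->ptr++;
    }
    r->cache     = v;
    r->bits_left = 32;
}

static void r32_init(Reader32 *r, const uint8_t *buf, size_t size)
{
    r->ptr = buf;
    r->end = buf + size;
    r32_refill_all(r);
}

/* Take n bits (0..bits_left) from the cache, avoiding shifts by 32. */
static uint32_t r32_take(Reader32 *r, unsigned n)
{
    uint32_t ret;
    assert(n <= r->bits_left);
    if (n == 0)
        return 0;
    ret = r->cache >> (32 - n);
    r->cache = n < 32 ? r->cache << n : 0;
    r->bits_left -= n;
    return ret;
}

/* Read 0..32 bits even when they straddle a cache refill. */
static uint32_t r32_get_bits_long(Reader32 *r, unsigned n)
{
    uint32_t ret = 0;
    assert(n <= 32);
    if (n > r->bits_left) {
        n  -= r->bits_left;               /* bits still needed after the drain */
        ret = r32_take(r, r->bits_left);  /* high part: whatever the cache holds */
        r32_refill_all(r);
    }
    if (n == 32)                          /* only reachable with an empty cache */
        return r32_take(r, 32);
    return (ret << n) | r32_take(r, n);   /* low part goes below the high part */
}
```

With 24 bits left in the cache, a 32-bit read takes those 24 bits as the high part, refills, and takes 8 more as the low part.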
[FFmpeg-devel] [PATCH 0/7] Port cache bitstream reader to 32bits, and improve
This patch series gathers all changes affecting the cached reader and the file get_bits.h. The largest one modifies the cached reader so that the cache can be selected to be (native) 32 bits wide. Then, because of corner cases in various codecs, some reads have to be reduced, and some functions that cannot guarantee the usual number of bits have to be fixed.

Note: the MVHA sample was generated using the pattern generation from VirtualDub2 (Tools->Create test video->zone plates) and the MVHA codec, and is 235186 bytes.

Christophe Gisquet (7):
  fate: add a MVHA test
  get_bits: support 32bits cache
  get_xbits: request fewer bits
  get_bits: replace index by an incremented pointer
  get_bits: improve and fix get_bits_long for 32b
  get_bits: change refill to RAD pattern
  get_bits: use immediate in skip_remaining

 libavcodec/get_bits.h   | 193 +++-
 libavcodec/mvha.c       | 2 +-
 libavcodec/utvideodec.c | 2 +-
 tests/fate/video.mak    | 3 +
 tests/ref/fate/mvha     | 6 ++
 5 files changed, 143 insertions(+), 63 deletions(-)
 create mode 100644 tests/ref/fate/mvha

-- 
2.26.0
[FFmpeg-devel] [PATCH 1/7] fate: add a MVHA test
---
 tests/fate/video.mak | 3 +++
 tests/ref/fate/mvha  | 6 ++
 2 files changed, 9 insertions(+)
 create mode 100644 tests/ref/fate/mvha

diff --git a/tests/fate/video.mak b/tests/fate/video.mak
index d2d43e518d..8e54718c16 100644
--- a/tests/fate/video.mak
+++ b/tests/fate/video.mak
@@ -364,6 +364,9 @@ fate-xxan-wc4: CMD = framecrc -i $(TARGET_SAMPLES)/wc4-xan/wc4trailer-partial.av
 FATE_VIDEO-$(call DEMDEC, WAV, SMVJPEG) += fate-smvjpeg
 fate-smvjpeg: CMD = framecrc -idct simple -flags +bitexact -i $(TARGET_SAMPLES)/smv/clock.smv -an
 
+FATE_VIDEO-$(call DEMDEC, AVI, MVHA) += fate-mvha
+fate-mvha: CMD = framecrc -i $(TARGET_SAMPLES)/mvha/mvha.avi -an
+
 FATE_VIDEO += $(FATE_VIDEO-yes)
 
 FATE_SAMPLES_FFMPEG += $(FATE_VIDEO)
diff --git a/tests/ref/fate/mvha b/tests/ref/fate/mvha
new file mode 100644
index 00..3635f97730
--- /dev/null
+++ b/tests/ref/fate/mvha
@@ -0,0 +1,6 @@
+#tb 0: 1001/3
+#media_type 0: video
+#codec_id 0: rawvideo
+#dimensions 0: 640x480
+#sar 0: 0/1
+0, 0, 0,1, 614400, 0xff8fb84b
-- 
2.26.0
[FFmpeg-devel] [PATCH 2/7] get_bits: support 32bits cache
Therefore, also activate it under ARCH_X86 (testing for more archs welcome) for the only codecs supporting said cache reader. For UTVideo, on 8 bits samples and ARCH_X86_32 (X86_64 being unaffected), timings for one line do ~19.4k -> 15.1k and 16.5k (roughly 17% speedup). --- libavcodec/get_bits.h | 110 libavcodec/mvha.c | 2 +- libavcodec/utvideodec.c | 2 +- 3 files changed, 80 insertions(+), 34 deletions(-) diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h index 66fb877599..cb4df98e54 100644 --- a/libavcodec/get_bits.h +++ b/libavcodec/get_bits.h @@ -58,10 +58,40 @@ #define CACHED_BITSTREAM_READER 0 #endif +#if CACHED_BITSTREAM_READER + +# ifndef BITSTREAM_BITS +# if HAVE_FAST_64BIT || defined(LONG_BITSTREAM_READER) +# define BITSTREAM_BITS 64 +# else +# define BITSTREAM_BITS 32 +# endif +# endif + +# if BITSTREAM_BITS == 64 +# define BITSTREAM_HBITS 32 +typedef uint64_t cache_type; +# define AV_RB_ALL AV_RB64 +# define AV_RL_ALL AV_RL64 +# define AV_RB_HALF AV_RB32 +# define AV_RL_HALF AV_RL32 +# define CACHE_TYPE(a) UINT64_C(a) +# else +# define BITSTREAM_HBITS 16 +typedef uint32_t cache_type; +# define AV_RB_ALL AV_RB32 +# define AV_RL_ALL AV_RL32 +# define AV_RB_HALF AV_RB16 +# define AV_RL_HALF AV_RL16 +# define CACHE_TYPE(a) UINT32_C(a) +#endif + +#endif + typedef struct GetBitContext { const uint8_t *buffer, *buffer_end; #if CACHED_BITSTREAM_READER -uint64_t cache; +cache_type cache; unsigned bits_left; #endif int index; @@ -121,7 +151,11 @@ static inline unsigned int show_bits(GetBitContext *s, int n); */ #if CACHED_BITSTREAM_READER -# define MIN_CACHE_BITS 64 +# if BITSTREAM_BITS == 32 +# define MIN_CACHE_BITS (32-7) +# else +# define MIN_CACHE_BITS 32 +# endif #elif defined LONG_BITSTREAM_READER # define MIN_CACHE_BITS 32 #else @@ -226,22 +260,34 @@ static inline int get_bits_count(const GetBitContext *s) } #if CACHED_BITSTREAM_READER -static inline void refill_32(GetBitContext *s, int is_le) +static inline void refill_half(GetBitContext *s, int 
is_le) { #if !UNCHECKED_BITSTREAM_READER if (s->index >> 3 >= s->buffer_end - s->buffer) return; #endif +#if BITSTREAM_BITS == 32 +if (s->bits_left > 16) { +if (is_le) +s->cache |= (uint32_t)s->buffer[s->index >> 3] << s->bits_left; +else +s->cache |= (uint32_t)s->buffer[s->index >> 3] << (32 - s->bits_left); +s->index += 8; +s->bits_left += 8; +return; +} +#endif + if (is_le) -s->cache = (uint64_t)AV_RL32(s->buffer + (s->index >> 3)) << s->bits_left | s->cache; +s->cache |= (cache_type)AV_RL_HALF(s->buffer + (s->index >> 3)) << s->bits_left; else -s->cache = s->cache | (uint64_t)AV_RB32(s->buffer + (s->index >> 3)) << (32 - s->bits_left); -s->index += 32; -s->bits_left += 32; +s->cache |= (cache_type)AV_RB_HALF(s->buffer + (s->index >> 3)) << (BITSTREAM_HBITS - s->bits_left); +s->index += BITSTREAM_HBITS; +s->bits_left += BITSTREAM_HBITS; } -static inline void refill_64(GetBitContext *s, int is_le) +static inline void refill_all(GetBitContext *s, int is_le) { #if !UNCHECKED_BITSTREAM_READER if (s->index >> 3 >= s->buffer_end - s->buffer) @@ -249,22 +295,22 @@ static inline void refill_64(GetBitContext *s, int is_le) #endif if (is_le) -s->cache = AV_RL64(s->buffer + (s->index >> 3)); +s->cache = AV_RL_ALL(s->buffer + (s->index >> 3)); else -s->cache = AV_RB64(s->buffer + (s->index >> 3)); -s->index += 64; -s->bits_left = 64; +s->cache = AV_RB_ALL(s->buffer + (s->index >> 3)); +s->index += BITSTREAM_BITS; +s->bits_left = BITSTREAM_BITS; } -static inline uint64_t get_val(GetBitContext *s, unsigned n, int is_le) +static inline cache_type get_val(GetBitContext *s, unsigned n, int is_le) { -uint64_t ret; +cache_type ret; av_assert2(n>0 && n<=63); if (is_le) { -ret = s->cache & ((UINT64_C(1) << n) - 1); +ret = s->cache & ((CACHE_TYPE(1) << n) - 1); s->cache >>= n; } else { -ret = s->cache >> (64 - n); +ret = s->cache >> (BITSTREAM_BITS - n); s->cache <<= n; } s->bits_left -= n; @@ -274,12 +320,12 @@ static inline uint64_t get_val(GetBitContext *s, unsigned n, int is_le) 
static inline unsigned show_val(const GetBitContext *s, unsigned n) { #ifdef BITSTREAM_READER_LE -return s->cache & ((UINT64_C(1) << n) - 1); +return s->cache & ((CACHE_TYPE(1) << n) - 1); #else -return s->cache >> (64 - n); +return s->cache >> (BITSTREAM_BITS - n); #endif } -#endif +#endif // ~CACHED_BITSTREAM_READER /** * Skips the specified number of bits. @@ -384,11 +430,11 @@ static inline unsigned int get_bits(GetBitContext *s, int n) av_assert2(n>0 && n<=32); if (n >
[FFmpeg-devel] [PATCH 3/7] get_xbits: request fewer bits
This also keeps 32-bit readers from breaking.
---
 libavcodec/get_bits.h | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index cb4df98e54..59bfbdd88b 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -367,8 +367,24 @@ static inline void skip_remaining(GetBitContext *s, unsigned n)
 static inline int get_xbits(GetBitContext *s, int n)
 {
 #if CACHED_BITSTREAM_READER
-    int32_t cache = show_bits(s, 32);
-    int sign = ~cache >> 31;
+    int32_t cache;
+    int sign;
+
+    if (n > s->bits_left)
+#ifdef BITSTREAM_READER_LE
+        refill_half(s, 1);
+#else
+        refill_half(s, 0);
+#endif
+
+#if BITSTREAM_BITS == 32
+    cache = s->cache;
+#elif defined(BITSTREAM_READER_LE)
+    cache = s->cache & 0x;
+#else
+    cache = s->cache >> 32;
+#endif
+    sign = ~cache >> 31;
     skip_remaining(s, n);
 
     return ((((uint32_t)(sign ^ cache)) >> (32 - n)) ^ sign) - sign;
-- 
2.26.0
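For readers unfamiliar with the sign trick in the return expression of get_xbits, here is a standalone sketch. The function names are hypothetical and the input is assumed to be a 32-bit window whose top n bits hold the code, MSB first. The branchless form follows the same xor/shift/subtract pattern as the patch, except that the sign mask is derived from the top bit directly rather than with ~cache >> 31. The plain form spells out the mapping it produces: a leading 1 bit yields the value itself, a leading 0 bit yields the value minus (2^n - 1).

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form. window holds the next bits of the stream, MSB first.
 * The final conversion to int32_t assumes the usual two's complement
 * behaviour for out-of-range values. */
static int xbits_branchless(uint32_t window, int n)
{
    uint32_t sign, v;
    assert(n >= 1 && n <= 31);
    /* All ones when the leading bit is 0 (negative value), else zero. */
    sign = (window >> 31) ? 0u : 0xFFFFFFFFu;
    v = (((window ^ sign) >> (32 - n)) ^ sign) - sign;
    return (int32_t)v;
}

/* Plain reference: same mapping, written with a branch. */
static int xbits_reference(uint32_t window, int n)
{
    int v;
    assert(n >= 1 && n <= 31);
    v = (int)(window >> (32 - n));       /* the n-bit code as unsigned */
    if (window >> 31)
        return v;                        /* leading 1: positive, as is */
    return v - (int)((1u << n) - 1);     /* leading 0: v - (2^n - 1) */
}
```

For n = 3 the codes 100..111 map to 4..7 and the codes 000..011 map to -7..-4.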
Re: [FFmpeg-devel] IRC meeting
Hi, sorry if I'm or was confusing, I'm best-effort here. 2016-06-03 21:13 GMT+02:00 Michael Niedermayer: > > FOUR.TWO) [...] > i want some assistent to help with dayly server admin duties > most root admins we have help and contribute but are often busy > raz recently set up a full backup system for us, someone seems > helping with security updates as iam not always the first doing them > (i think its lou but didnt check) and all kinds of other things ... > > what would be really nice would be someone who has some time and for > whom server admining is a fun thing to do, > someone who would do it "because it needs to be done" would be 2nd > choice IMHO You are actually replying to FOUR) This seems to be "ask someone to join in the rotation of tasks", rather than a full-blown delegation of work. That's an option. I don't pretend to have correctly worded the action of FOUR) > iam not suggesting anything specific but there is one thing that i > think i have not seen talked about and that is moderation. Mailman > supports moderating individual subscribers. > > It might be along the lines of > If one repeatly and conciously violates the CoC and no real solution > can be found, he can be given the choice by the mailman admins to > either promise to attempt not to repeat the violation > or to be moderated until an event occurs that changes the situation > or some timeout. > The insulted person should have the option to veto this at any time so > if one feels that it wasnt enough to justify the inconvenience the > "hurt" party should be able to stop this. > This would have to be combined with something effective for IRC and > possibly git, in case issues shift there too You are actually replying to point ONE, more precisely, "how to modify it" (the CoC). There are many opinions. I think an amount of time best described by some aleph can be spent discussing the details, but I bet people are ok with "use VLC's or whichever, then vote for improvement". 
You are proposing an improvement. The point here is, several people seem to want things to move here irrespective of ONE, so a vote seems (to me) the natural step forward. So, either someone acts towards the action, and it may happen, or clearly, nothing will happen. I maintain my opinion that we are putting the cart before the horse, but I recognize the need from people for action, so I'm ready to vote on that point, irrespective of ONE. If it happens. Best regards, -- Christophe
Re: [FFmpeg-devel] IRC meeting
Hi, here's I think a list of things left to do. I remember saste doing it on some occasions. Please comment on whether you think I have pointed an actual action to perform. Don't mind the details for now, it's just to get the train going. 2016-05-30 10:49 GMT+02:00 Michael Niedermayer: ONE) > May 28 19:07:54https://wiki.videolan.org/Code_of_Conduct/ [...] > May 28 19:29:06 so lets add that to current CoC and put it > for vote on ML? Action: put to vote the addition of the "repercussions" from the linked page. There was some discussion as how to modify it. Whether to accept as is, or delay for additions can be included in the vote, IMO: > May 28 19:29:40durandal_1707: send a draft first and get > comments to improve it TWO) > May 28 19:38:00 AVClass & AVOption should be added to all > public "Context" structs for API consistency and to make it easier for apps > to support multiple ffmpeg versios and distros Action: put to vote (vote ongoing). THREE) > May 28 20:21:19liabvutil is currently the only non modular > library. literally everything is compiled and installed no matter your > configure options [...] > May 28 20:31:19Err what's the result on the previous topic? > Send patches and it'll get reviewed but the end goal is ok? (I don't mind) > May 28 20:31:47kurosu_: sounds like it yes Action: whoever is interested will submit patches, with the idea that the end goal is worth reaching. FOUR) > May 28 20:32:54 that's the other issue; we don't really have > someone to handle the sysadmin stuff > May 28 20:35:58 i feel like we really need someone to > officially handle the sysadmin stuff > May 28 20:33:14 does anyone have a sysadmin in his > relationships that would be interested in that? 
> May 28 20:34:09 we have a virtual box in bulgaria that ffbox0 > could be moved to if teres a volunteer > May 28 20:52:40VLC have offered to sysadmin for years > May 28 20:55:01 i put it on the table, my offer to admin again > May 28 21:22:30 iive: makes a good suggestion, GitHub would release > at least two services (git and trac). For trac to GitHub you could look at > something like: https://github.com/trustmaster/trac2github It also might make > the project more accessible to new contributors Action: decide how and who performs admin tasks. I think the above lists the possible options in a vote: - Compn as admin - Delegate admin tasks to VLC (seemed related to hosting too) - Draft a request for admins to be circulated - Move stuff to bulgaria - Move stuff to github FOUR.TWO) > May 28 20:58:44 we probably should config postfix or > spamassasin to check DMARK/DKIM/SPF or part of that on incoming mai (not > really important but i thn it doest curretly) Details on what needs to be done, I think this is not high-level enough for a vote. Action: michaelni to list some wishes? FIVE) > May 28 21:28:47 i want possibility to fund devs to work on > specific part of FFmpeg > May 28 21:28:59 what happened with FFmtech? > May 28 21:29:38 at the moment we have a total of ~15K USD in the SPI > and ffis.de funds > May 28 21:30:58 saste, can we fund someone maybe to make > kierans fuzzing GSoC project a reality ? i mean if people agree to that > May 28 21:31:59 so just need to pick some part of codebase > that need refactoring/cleaning up/improving? > May 28 21:59:08Kierank mentioned btw he was willing to fund > some work on libavfilter API that would suit his needs, the details of which > he'll give to whomever is integrated Action: list the sponsoring opportunities for work on ffmpeg: tasks and origin of funds? Subpoints: - better lavfi API for some usecases - find whether SPI/... 
can be used for that SIX) > May 28 19:32:58since cehoyos is here, we could maybe talk > about his behavior and why the CoC and repercussions for violating it was > introduced to begin with > May 28 21:51:31is there anything concrete we’re going to do w.r.t. > derek and carl? > May 28 21:59:52So Derek and carl? > May 28 22:26:54 its too late now, and we need to handle the > situation at hand > May 28 22:28:24nevcairiel, do you want a vote here and now, > to what effect? > May 28 22:31:09I agree that a vote on the ML would be better to give > people that fell asleep here the chance to participate also > May 28 22:34:59It's late here. I'm ok for a vote also, just > not sure what kind of offense it would be And the big, flashy pink, elephant in the room: Action: draft a vote on the repercussions to Carl Eugen Hoyos behaviours (patch submission, general interaction with others) Note, I don't have a strong idea on what it may contain (option of temp/week/perma ban, warning, removal of some rights, etc). If someone is interested in this, maybe listing all potential options and use Condorcet etc. I don't really care, the above just says:
Re: [FFmpeg-devel] [PATCH] avcodec: add MagicYUV decoder
2016-05-30 17:50 GMT+02:00 Paul B Mahol: > On 5/30/16, Piotr Bandurski wrote: >> Hi, >> >>> patch attached. >> >> Is decoding of interlaced video supported? Because I get here invalid >> output. >> >> Also crash happens with this fuzzed file: >> >> https://www.datafilehost.com/d/c64eb5b1 >> >> Regards > > Can you create YUVA video somehow? I can't with virtualdub. So a final iteration was pushed. Can any of you create fate tests, ideally for 3 cases, "normal"/"interlaced"/"YUVA" (or whatever required introducing quirks in the code)? Thanks, -- Christophe
Re: [FFmpeg-devel] [PATCH] avcodec: add MagicYUV decoder
Hi, 2016-05-30 15:09 GMT+02:00 Paul B Mahol: >> ffmpeg seems to have libavutil/qsort.h, but I don't even know how much >> effort is needed to use it here. > > Changed, doesn't help but maybe will for other archs. I have no idea why it is present, but it made sense to me to reuse whatever ffmpeg already has. >> That's somewhat similar to png paeth, except not actually reusable. I >> wonder if there's something in libavcodec that could be reused, in >> which case moving it to the hdsp context would be nice) > > Our Huffyuv decoder is still missing gradient prediction... I'd say neither does the encoder, so it's not "specified" in any public "version" (whether the 15yo ones or the ffvhuff ones). It's a matter of creating a new version with it for smaller filesizes. But I bet users are all over those newer codecs (utvideo) and haven't looked/won't look back at vhuffyuv if this predictor and slices were implemented there. >>> +} else if (pred == MEDIAN) { >> [...] >>> +} else { >> >> So, that's maybe a detail at this point, and you want to move quickly >> to other stuff, but: >> would you like to look at e.g. huffyuvdec or pngdec for a code that is >> not as nice looking, but more cache-friendly? >> >> Basically, you move the first line out of the loops, and then do >> sequentially, per row in the loop, bitstream reading, reconstruction >> (residual+prediction) and any post-processing... > > Just tried, didn't help much here. Hmm, was that single-threaded decoding? Also, the VLC decoding is slower than needed (again, see huffyuvdec and generate_joint_tables), so that may not show up. Anyway, whatever temporary patch you had probably became invalid after you implemented interlaced encoding, so that's kind of moot. >>> +for (i = 0; i < p->height; i++) { >>> +for (x = 0; x < p->width; x++) { >>> +b[x] += g[x]; >>> +r[x] += g[x]; >>> +} btw, isn't that add_bytes from HuffYUVDSPContext ? 
i.e.: hdsp->add_bytes(b, g, p->width); etc. Except for this (and pending Piotr's fuzzing cases), looks fine. -- Christophe
Re: [FFmpeg-devel] [PATCH] avcodec: add MagicYUV decoder
Hi, 2016-05-29 21:51 GMT+02:00 Paul B Mahol: > +typedef struct Slice { > +uint32_t start; > +uint32_t size; > +} Slice; I'm not a security expert, but is there a reason for not using plain int there ? > +typedef struct MagicYUVContext { > +AVFrame*p; > +int slice_height; > +int nb_slices; > +int planes; > +uint8_t *buf; > +int hshift[4]; > +int vshift[4]; > +Slice *slices[4]; > +int slices_size[4]; > +uint8_t freq[4][256]; > +VLC vlc[4]; > +HuffYUVDSPContext hdsp; > +} MagicYUVContext; I guess someone able to understand the code immediately understand what those are, but that's pretty sparse comment-wise. > +typedef struct HuffEntry { And another Huffman+prediction codec... I don't really see any valuable addition here... :( > +uint8_t sym; > +uint8_t len; > +uint32_t code; > +} HuffEntry; > + > +static int ff_magy_huff_cmp_len(const void *a, const void *b) > +{ > +const HuffEntry *aa = a, *bb = b; > +return (aa->len - bb->len) * 256 + aa->sym - bb->sym; > +} > + > +static int build_huff(VLC *vlc, uint8_t *freq) > +{ > +HuffEntry he[256]; > +uint32_t codes[256]; > +uint8_t bits[256]; > +uint8_t syms[256]; > +uint32_t code; > +int i, last; > + > +for (i = 0; i < 256; i++) { > +he[i].sym = 255 - i; > +he[i].len = freq[i]; > +} > +qsort(he, 256, sizeof(*he), ff_magy_huff_cmp_len); ffmpeg seems to have libavutil/qsort.h, but I don't even know how much effort is needed to use it here. 
> +pred = get_bits(, 8); > +dst = p->data[i] + j * sheight * stride; > +for (k = 0; k < height; k++) { > +for (x = 0; x < width; x++) { > +int pix; > +if (get_bits_left() <= 0) { > +return AVERROR_INVALIDDATA; > +} > +pix = get_vlc2(, s->vlc[i].table, s->vlc[i].bits, 3); > +if (pix < 0) { > +return AVERROR_INVALIDDATA; > +} > +dst[x] = 255 - pix; > +} > +dst += stride; > +} > + > +if (pred == LEFT) { > +dst = p->data[i] + j * sheight * stride; > +s->hdsp.add_hfyu_left_pred(dst, dst, width, 0); > +dst += stride; > +for (k = 1; k < height; k++) { > +s->hdsp.add_hfyu_left_pred(dst, dst, width, dst[-stride]); > +dst += stride; > +} > +} else if (pred == GRADIENT) { [...] That's somewhat similar to png paeth, except not actually reusable. I wonder if there's something in libavcodec that could be reused, in which case moving it to the hdsp context would be nice) > +} else if (pred == MEDIAN) { [...] > +} else { So, that's maybe a detail at this point, and you want to move quickly to other stuff, but: would you like to look at e.g. huffyuvdec or pngdec for a code that is not as nice looking, but more cache-friendly? Basically, you move the first line out of the loops, and then do sequentially, per row in the loop, bitstream reading, reconstruction (residual+prediction) and any post-processing... > +if (decorrelate) { > +uint8_t *b = p->data[0]; > +uint8_t *g = p->data[1]; > +uint8_t *r = p->data[2]; > + > +for (i = 0; i < p->height; i++) { > +for (x = 0; x < p->width; x++) { > +b[x] += g[x]; > +r[x] += g[x]; > +} > +b += p->linesize[0]; > +g += p->linesize[1]; > +r += p->linesize[2]; > +} > +} ... in particular, this step, that could be done line-wise, inside the threaded decoding, if I'm not mistaken. (cf. also png's deloco) Otherwise, I don't see much of anything that would require another reviewing round. -- Christophe ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
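The line-wise variant of the decorrelation step suggested in the review above could look roughly like this. It is only a sketch: add_bytes_ref is a plain C stand-in written for this example, not the HuffYUVDSPContext function, and decorrelate_row is a hypothetical helper that a slice decoder would call once per reconstructed row instead of running a second pass over the whole frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain C stand-in for a DSP "add bytes" routine: dst[i] += src[i] mod 256. */
static void add_bytes_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t w)
{
    for (ptrdiff_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Undo the G-based decorrelation for one row: B += G and R += G.
 * Meant to run right after the row has been reconstructed, while it is
 * still hot in the cache. */
static void decorrelate_row(uint8_t *b, const uint8_t *g, uint8_t *r,
                            ptrdiff_t width)
{
    assert(width >= 0);
    add_bytes_ref(b, g, width);
    add_bytes_ref(r, g, width);
}
```

Because each row only depends on its own G samples, the call can sit inside the per-slice loop and therefore inside the threaded decoding.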
Re: [FFmpeg-devel] Remove Derek Buitenhuis from MAINTAINERS
Hi, 2016-05-20 1:55 GMT+02:00 Lukasz Marek: > Is Derek revoked to commit or what? Couldn't he just commit this patch and > leave? :P I was a problem for some people, but I see they still have > problems. Let people with problems go away with they problems. Sorry if you felt ganged up on previously. Hopefully the new Code of Conduct will keep such situations from rising to an insufferable level. But whatever bad technical blood there has been between the two of you, and whoever I may agree technically with, this is uncalled for. You're just adding fuel to a tense situation, and causing distress to someone. This type of comment is exactly what should not be allowed by the Code of Conduct. Sorry if I have added more fuel, -- Christophe
Re: [FFmpeg-devel] [Vote] Code of Conduct
Hi, 2016-05-18 20:40 GMT+02:00 Michael Niedermayer: > Please state clearly if you agree to the text or if not. > we can extend and tune it later and do another vote if there are more > suggestions I agree to having a CoC. This text is a first step, so I'm ok with it, but hoping it will be improved. -- Christophe
Re: [FFmpeg-devel] [PATCH] doc/developer.texi: Add a code of conduct
Hi, 2016-05-20 2:38 GMT+02:00 Timothy Gu: >> > Note how it has a list of specific violations, instead of vague things like >> > "Be excellent" that the FFmpeg one has. >> > Note how it has a huge section on disciplinary procedures. [...] > I have to agree with Kieran here. I believe that as a community, we definitely > _want_ to assume good faith, etc. But conflict resolution requires strict, > codified consequences for violations of the CoC, to ensure fairness. The > language needs to be made more serious so that people actually take it in the > way it is intended. Completely agree. Without sanctions, I fear it won't help mitigate problems. Also I'd like this to extend to any medium: mailing list, irc, private communication between an offender and an "offended"... Secondly, I'd like this to apply to any comment, whether on a person or his work. Being frustrated is no reason for foul language, even if that's not the most hindering stuff that can happen. Also, because a text is subject to interpretation, the CoC could maybe state how this is arbitrated. Best regards, -- Christophe
Re: [FFmpeg-devel] [PATCH 01/10] avcodec/dca: remove Rice code length limit
2016-05-13 11:48 GMT+02:00 foo86: > -unsigned int v = get_unary(gb, 1, 128); > +unsigned int v = get_unary(gb, 1, get_bits_left(gb)); Not that the patch is not ok, but I have a few uneducated questions: 1) Given the get_bits_long(gb, k) afterwards, won't that code cause overreads for corrupted bitstreams? 2) I haven't checked the calling code, but consequently, wouldn't it be better to first check that at least k+1 bits are available? 3) 128 is already fairly large; is the new code for valid bitstreams (in the sense of specs and actually generated) or for corrupted bitstreams? I don't know where the parsing is validated afterwards (e.g. if there have been overreads or invalid values parsed) Thanks, -- Christophe
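The check asked about in question 2 can be illustrated with a small standalone reader. This is a sketch under assumptions, not the dca code: BitReader, read_bit and read_rice_checked are hypothetical names, the unary prefix is taken to be a run of zeros ended by a 1 bit, and the result is left as the unsigned value (q << k) | remainder. The point is only that both the prefix and the k-bit suffix are bounded by the bits that remain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits; /* total number of readable bits */
    size_t pos;       /* bits consumed so far, never above size_bits */
} BitReader;

static size_t bits_remaining(const BitReader *br)
{
    return br->size_bits - br->pos;
}

/* Caller must have checked that at least one bit remains. */
static unsigned read_bit(BitReader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Read one Rice code with parameter k.
 * Returns 0 and stores the value on success, -1 if the code would run
 * past the end of the buffer (truncated or corrupted input). */
static int read_rice_checked(BitReader *br, unsigned k, uint32_t *out)
{
    uint32_t q = 0, rem = 0;
    assert(k < 32);

    /* Unary prefix: count zeros up to the terminating 1 bit. */
    for (;;) {
        if (!bits_remaining(br))
            return -1;
        if (read_bit(br))
            break;
        q++;
    }

    /* The k-bit remainder must be fully available before it is read. */
    if (bits_remaining(br) < k)
        return -1;
    for (unsigned i = 0; i < k; i++)
        rem = rem << 1 | read_bit(br);

    *out = (q << k) | rem;
    return 0;
}
```

A caller would treat -1 as invalid data and stop parsing, rather than detecting the overread afterwards.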
Re: [FFmpeg-devel] [PATCH 4/4] lossless audio dsp: unroll
2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>: > The loops are guaranteed to be at least multiples of 8, so this > unrolling is safe but allows exploiting execution ports. > > For int32 version: 68 -> 58c. Ping? This was ok'ed by James irrespective of the auto-vectorization discussion, but I don't mind it being dropped anyway. Mostly cleaning my local branches. Best regards, -- Christophe
Re: [FFmpeg-devel] [PATCH] vc2enc_dwt: use 32 bit coefficients by default
2016-05-07 21:48 GMT+02:00 Rostislav Pehlivanov: > The costliest part of the encoder right now is encoding the coefficients > (~36%). Slightly less-costly is rate control (~31%), and after that is the > transform (~12%). There really isn't anything else, other than 3 copies > (input image converted to signed and copied to a buffer, then because the > transform is out of place there's a copy to another buffer and then back), > but they don't take that much time. Thanks for the detailed reply. Anyway, patch ok with me, and I agree with Michael. I was just curious about what an "improved fix" would do. -- Christophe
Re: [FFmpeg-devel] [PATCH] vc2enc_dwt: use 32 bit coefficients by default
2016-05-07 19:12 GMT+02:00 Rostislav Pehlivanov: > The problem is that with particularly complex images and especially at > high bit depths and 5-level transforms the coefficients would overflow I guess it also depends on the transform type, so that counts also for the last comment. > causing huge artifacts to appear. This was discovered thanks to the fate > tests, which will have to be redone as this fixes a multitude of > problems and increases PSNR. I admit I saw strange numbers, but as they sometimes include color transforms back and forth, I didn't really pay attention. Was there a risk it produced incorrect output in "valid" decoders? > There is a slight performance drop associated with this change, making > the encoder slower by 1.15 times, however this is necessary in order to > avoid undefined behavior and overflows. This means no asm has been written yet. Is the performance drop mostly in transforms, or rather any coefficient manipulation (like rate evaluation etc), or memory bandwidth? In the former case, it might be less critical in the future. > It would be worth to template the transforms to keep the performance for > 8 bit images as 32 bit coefficients are unnecessary for that case, but > the primary use of the encoder is to encode video at 10 bits. I don't know what that entails, but indeed, there are several parameters affecting what's possible, and the current change is the simplest/fastest/safest at the moment. -- Christophe
Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data
Hi, 2016-05-06 2:19 GMT+02:00 Rostislav Pehlivanov: > I plan to merge the fate tests as well tomorrow or on Saturday when I'll > have time to quickly fix bugs which appear on platforms I haven't tested > the encoder on. Hopefully none, but you never know. Sure, makes sense. If you find neither the time nor the devices for those tests, Michael seems to have tested on a fair number of archs: "tested on mips/arm/x86 linux and mingw32/64" -- Christophe
[FFmpeg-devel] [PATCH 2/2] vc2: fate tests
--- tests/fate/vcodec.mak | 17 - tests/ref/vsynth/vsynth1-vc2-420p | 4 tests/ref/vsynth/vsynth1-vc2-420p10 | 4 tests/ref/vsynth/vsynth1-vc2-420p12 | 4 tests/ref/vsynth/vsynth1-vc2-422p | 4 tests/ref/vsynth/vsynth1-vc2-422p10 | 4 tests/ref/vsynth/vsynth1-vc2-422p12 | 4 tests/ref/vsynth/vsynth1-vc2-444p | 4 tests/ref/vsynth/vsynth1-vc2-444p10 | 4 tests/ref/vsynth/vsynth1-vc2-444p12 | 4 tests/ref/vsynth/vsynth2-vc2-420p | 4 tests/ref/vsynth/vsynth2-vc2-420p10 | 4 tests/ref/vsynth/vsynth2-vc2-420p12 | 4 tests/ref/vsynth/vsynth2-vc2-422p | 4 tests/ref/vsynth/vsynth2-vc2-422p10 | 4 tests/ref/vsynth/vsynth2-vc2-422p12 | 4 tests/ref/vsynth/vsynth2-vc2-444p | 4 tests/ref/vsynth/vsynth2-vc2-444p10 | 4 tests/ref/vsynth/vsynth2-vc2-444p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-420p | 4 tests/ref/vsynth/vsynth_lena-vc2-420p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-420p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-422p | 4 tests/ref/vsynth/vsynth_lena-vc2-422p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-422p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-444p | 4 tests/ref/vsynth/vsynth_lena-vc2-444p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-444p12 | 4 28 files changed, 124 insertions(+), 1 deletion(-) create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p12 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p12 
create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p12 diff --git a/tests/fate/vcodec.mak b/tests/fate/vcodec.mak index ccf88ce..0e08894 100644 --- a/tests/fate/vcodec.mak +++ b/tests/fate/vcodec.mak @@ -29,6 +29,19 @@ FATE_VCODEC-$(call ENCDEC, DNXHD, DNXHD) += dnxhd-720p \ dnxhd-720p-rd \ dnxhd-720p-10bit +FATE_VCODEC-$(call ENCDEC, VC2 DIRAC, MOV) += vc2-420p vc2-420p10 vc2-420p12 \ + vc2-422p vc2-422p10 vc2-422p12 \ + vc2-444p vc2-444p10 vc2-444p12 +fate-vsynth1-vc2-%: FMT = mov +fate-vsynth1-vc2-%: ENCOPTS = -pix_fmt yuv$(@:fate-vsynth1-vc2-%=%) \ + -vcodec vc2 -frames 5 -strict -1 +fate-vsynth2-vc2-%: FMT = mov +fate-vsynth2-vc2-%: ENCOPTS = -pix_fmt yuv$(@:fate-vsynth2-vc2-%=%) \ + -vcodec vc2 -frames 5 -strict -1 +fate-vsynth_lena-vc2-%: FMT = mov +fate-vsynth_lena-vc2-%: ENCOPTS = -pix_fmt yuv$(@:fate-vsynth_lena-vc2-%=%) \ + -vcodec vc2 -frames 5 -strict -1 + fate-vsynth%-dnxhd-720p: ENCOPTS = -s hd720 -b 90M \ -pix_fmt yuv422p -frames 5 -qmax 8 fate-vsynth%-dnxhd-720p: FMT = dnxhd @@ -356,7 +369,9 @@ FATE_VSYNTH2 = $(FATE_VCODEC:%=fate-vsynth2-%) FATE_VSYNTH_LENA = $(FATE_VCODEC:%=fate-vsynth_lena-%) # Redundant tests because they just resize the input RESIZE_OFF = dnxhd-720p dnxhd-720p-rd dnxhd-720p-10bit dnxhd-1080i \ - dv dv-411 dv-50 avui snow snow-hpel snow-ll + dv dv-411 dv-50 avui snow snow-hpel snow-ll vc2-420p \ + vc2-420p10 
vc2-420p12 vc2-422p vc2-422p10 vc2-422p12 \ + vc2-444p vc2-444p10 vc2-444p12 # Incorrect
[FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data
The slice prefix is 0 in the reference encoder and the decoder ignores it. Writing 0 there seems like the best temporary solution. The padding could have contained uninitialized data, but reference VC2 encoders put 0xFF there, hence the memset value. Overall this allows producing bitstreams with no random data for use by fate. --- libavcodec/vc2enc.c | 5 + 1 file changed, 5 insertions(+) diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c index 6d24552..bbbeaa0 100644 --- a/libavcodec/vc2enc.c +++ b/libavcodec/vc2enc.c @@ -777,7 +777,10 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg) uint8_t quants[MAX_DWT_LEVELS][4]; int p, level, orientation; +/* The reference decoder ignores it, and its typical length is 0 */ +memset(put_bits_ptr(pb), 0, s->prefix_bytes); skip_put_bytes(pb, s->prefix_bytes); + put_bits(pb, 8, quant_idx); /* Slice quantization (slice_quantizers() in the specs) */ @@ -809,6 +812,8 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg) } pb->buf[bytes_start] = pad_s; flush_put_bits(pb); +/* vc2-reference uses that padding that decodes to '0' coeffs */ +memset(put_bits_ptr(pb), 0xFF, pad_c); skip_put_bytes(pb, pad_c); } -- 2.8.1
Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data
Hi, 2016-05-04 3:06 GMT+02:00 Rostislav Pehlivanov: > vc2hqencode is not the reference encoder, vc2-reference is. It's even worse > though. Sorry, I thought authoritative could mean "from the authors", so I didn't mean it as "the" reference/"the authority". Just a good reference in case the specs are not clear or don't mention it. If vc2-reference or "the" reference dirac codec do it differently, then those should be followed. > Also, the commit message still says 0 instead of 0xff. I'm getting confused: neither version of the patch does. On a side note, maybe I should retract the "standardized value" for the padding. The commit message does mention 0 for the prefix, because I didn't see any reference for what should be there. Everything I've seen uses 0 prefix bytes. > btw the value 0xff > makes sense since that's the golomb code for 0. Would make reading broken > files a little more robust (instead of reading a ton of zeroes, losing > bitstream sync and causing trouble elsewhere). Yeah, that's what I meant with the rationale. It's the classical reason for choosing a padding value. Not a big deal, but isn't the specific codeword for 0 '1'? > Could you make the comment a C89 style like the rest of the encoder? That will come later today, and I may wait for your other patch to vc2enc. And sorry that I haven't followed the style of the file. > Other than that the patch is okay. Slice padding is usually very small, so > no real performance degradation. I haven't tested, but I'll take your word for it. Best regards, -- Christophe
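To make the codeword question above concrete, here is a minimal sketch of an interleaved exp-Golomb reader of the kind Dirac/VC-2 is understood to use for unsigned values. It assumes the usual rule that a 0 bit announces one more data bit and a 1 bit terminates the code; the BitReader type and all function names are hypothetical and are not FFmpeg's API. Under that assumption the value 0 is the single bit '1', so each 0xFF padding byte decodes to eight zero values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer (not FFmpeg's GetBitContext). */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;
} BitReader;

static int read_bit(BitReader *br)
{
    int bit;

    /* Reading past the end yields 1, so a truncated code still terminates. */
    if (br->pos >= br->size_bits)
        return 1;
    bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Assumed rule: a 0 bit announces one more data bit, a 1 bit ends the code. */
static uint32_t read_interleaved_golomb(BitReader *br)
{
    uint32_t value = 1;

    while (!read_bit(br))
        value = (value << 1) | (uint32_t)read_bit(br);
    return value - 1;
}

/* Counts how many consecutive zero-valued codes start the buffer. */
static int count_zero_codes(const uint8_t *buf, size_t len)
{
    BitReader br = { buf, len * 8, 0 };
    int n = 0;

    while (br.pos < br.size_bits && read_interleaved_golomb(&br) == 0)
        n++;
    return n;
}
```

Calling count_zero_codes() on a buffer of 0xFF bytes returns eight per byte, whereas a buffer of 0x00 bytes is consumed as one long unterminated code, which matches the robustness argument quoted above.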
Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data
On 3 May 2016 22:15, "Rostislav Pehlivanov" <atomnu...@gmail.com> wrote: > > On 3 May 2016 at 19:16, Christophe Gisquet <christophe.gisq...@gmail.com> > wrote: > > > > > > Btw, afaik, the padding is 0xFF, so expecting 0 in the buffer there > > can't do the job. > > > > > I don't get it, you keep saying that the padding must be 0xff yet the patch > you posted puts 0x00. Didn't a second mail and patch reach the mailing list? > Where did you even read that the padding must be > 0xff, I don't remember the specs saying anything about what the padding > should contain. I didn't say 'must' but 'AFAIK', because I could be wrong, but until proven otherwise... Anyway, see: https://github.com/bbc/vc2hqencode/blob/master/vc2hqencode/serialise.hpp#L103 and I'd say it's authoritative? It's weird that a spec doesn't mandate a way to fill it, but maybe that's SMPTE. As for the rationale, no idea, maybe it's the coding of the zero coeff.
Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data
Hi, 2016-05-03 19:24 GMT+02:00 Hendrik Leppkes: >> +// The reference decoder ignores it, and its typical length is 0 >> +memset(put_bits_ptr(pb), 0, s->prefix_bytes); >> skip_put_bytes(pb, s->prefix_bytes); >> + > > I don't suppose we have a function to just write zero bytes instead of > these shenanigans of writing to the buffer and skipping? I don't think so, but I may be wrong. The AV_ZERO macros are of course not suited here. >> +memset(pb->buf_ptr, 0, pad_c); >> skip_put_bytes(pb, pad_c); > > Both occurrences use different ways to access the buffer, once > put_bits_ptr(pb) and one pb->buf_ptr, if this is the only way to do > this, maybe stick to one? Yeah, squashing issue. My next patch must have crossed your mail. I thought of having another put_bits function like put_byte_something(PutBitContext, uint8_t byte, unsigned int len). But that's probably overkill. Btw, afaik, the padding is 0xFF, so expecting 0 in the buffer there can't do the job. -- Christophe
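The helper floated above could look roughly like the following. This is only a sketch over a hypothetical, byte-aligned ByteWriter type, not FFmpeg's PutBitContext, and put_fill_bytes() is an illustrative name rather than an existing function. It folds the memset-then-skip pair from the patch into one bounds-checked call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a byte-aligned writer; not FFmpeg's PutBitContext. */
typedef struct ByteWriter {
    uint8_t *buf;
    size_t   size;
    size_t   pos;
} ByteWriter;

/* Writes `len` copies of `byte` at the current position and advances past them.
 * Returns 0 on success, -1 if the fill would run past the end of the buffer. */
static int put_fill_bytes(ByteWriter *w, uint8_t byte, size_t len)
{
    if (len > w->size - w->pos)
        return -1;
    memset(w->buf + w->pos, byte, len);
    w->pos += len;
    return 0;
}
```

With such a helper, the two call sites in the patch would reduce to one call with 0 for the slice prefix and one with 0xFF for the slice padding, and both would access the buffer the same way.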
[FFmpeg-devel] [PATCH 0/2] Fix VC-2 encoder
The encoder was leaving uninitialized data in the padding of slices, while the specs seem to mandate the use of 0xFF. This is also the case for the slice prefix, but it seems completely unused. To validate this, classical vsynth encoding/decoding fate tests for all supported chroma formats are added. Suggestions for being even more concise in the target/rules are welcome. Christophe Gisquet (2): vc2enc: prevent random data vc2: fate tests libavcodec/vc2enc.c | 4 tests/fate/vcodec.mak | 17 - tests/ref/vsynth/vsynth1-vc2-420p | 4 tests/ref/vsynth/vsynth1-vc2-420p10 | 4 tests/ref/vsynth/vsynth1-vc2-420p12 | 4 tests/ref/vsynth/vsynth1-vc2-422p | 4 tests/ref/vsynth/vsynth1-vc2-422p10 | 4 tests/ref/vsynth/vsynth1-vc2-422p12 | 4 tests/ref/vsynth/vsynth1-vc2-444p | 4 tests/ref/vsynth/vsynth1-vc2-444p10 | 4 tests/ref/vsynth/vsynth1-vc2-444p12 | 4 tests/ref/vsynth/vsynth2-vc2-420p | 4 tests/ref/vsynth/vsynth2-vc2-420p10 | 4 tests/ref/vsynth/vsynth2-vc2-420p12 | 4 tests/ref/vsynth/vsynth2-vc2-422p | 4 tests/ref/vsynth/vsynth2-vc2-422p10 | 4 tests/ref/vsynth/vsynth2-vc2-422p12 | 4 tests/ref/vsynth/vsynth2-vc2-444p | 4 tests/ref/vsynth/vsynth2-vc2-444p10 | 4 tests/ref/vsynth/vsynth2-vc2-444p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-420p | 4 tests/ref/vsynth/vsynth_lena-vc2-420p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-420p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-422p | 4 tests/ref/vsynth/vsynth_lena-vc2-422p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-422p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-444p | 4 tests/ref/vsynth/vsynth_lena-vc2-444p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-444p12 | 4 29 files changed, 128 insertions(+), 1 deletion(-) create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p12 create mode 100644 
tests/ref/vsynth/vsynth1-vc2-444p create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p12 -- 2.8.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data
The slice prefix is 0 in the reference encoder and the decoder ignores it. Writing 0 there seems like the best temporary solution. The padding could have contained uninitialized data, but its standardized value is 0xFF, hence the memset value. Overall this allows producing bitstreams with no random data for use by fate. --- libavcodec/vc2enc.c | 4 1 file changed, 4 insertions(+) diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c index 943198b..bec513c 100644 --- a/libavcodec/vc2enc.c +++ b/libavcodec/vc2enc.c @@ -777,7 +777,10 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg) uint8_t quants[MAX_DWT_LEVELS][4]; int p, level, orientation; +// The reference decoder ignores it, and its typical length is 0 +memset(put_bits_ptr(pb), 0, s->prefix_bytes); skip_put_bytes(pb, s->prefix_bytes); + put_bits(pb, 8, quant_idx); /* Slice quantization (slice_quantizers() in the specs) */ @@ -809,6 +812,7 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg) } pb->buf[bytes_start] = pad_s; flush_put_bits(pb); +memset(pb->buf_ptr, 0, pad_c); skip_put_bytes(pb, pad_c); } -- 2.8.1
Re: [FFmpeg-devel] [PATCH 2/2] vc2: fate tests
2016-05-03 19:06 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>: [SNIP] Incorrect padding used (0 instead of 0xFF), fixed in that patch series. -- Christophe From 22ff25711062fb1ca30da1674fd622fd6f81c8e3 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet <christophe.gisq...@gmail.com> Date: Mon, 2 May 2016 21:57:29 +0200 Subject: [PATCH 2/2] vc2: fate tests --- tests/fate/vcodec.mak | 17 - tests/ref/vsynth/vsynth1-vc2-420p | 4 tests/ref/vsynth/vsynth1-vc2-420p10 | 4 tests/ref/vsynth/vsynth1-vc2-420p12 | 4 tests/ref/vsynth/vsynth1-vc2-422p | 4 tests/ref/vsynth/vsynth1-vc2-422p10 | 4 tests/ref/vsynth/vsynth1-vc2-422p12 | 4 tests/ref/vsynth/vsynth1-vc2-444p | 4 tests/ref/vsynth/vsynth1-vc2-444p10 | 4 tests/ref/vsynth/vsynth1-vc2-444p12 | 4 tests/ref/vsynth/vsynth2-vc2-420p | 4 tests/ref/vsynth/vsynth2-vc2-420p10 | 4 tests/ref/vsynth/vsynth2-vc2-420p12 | 4 tests/ref/vsynth/vsynth2-vc2-422p | 4 tests/ref/vsynth/vsynth2-vc2-422p10 | 4 tests/ref/vsynth/vsynth2-vc2-422p12 | 4 tests/ref/vsynth/vsynth2-vc2-444p | 4 tests/ref/vsynth/vsynth2-vc2-444p10 | 4 tests/ref/vsynth/vsynth2-vc2-444p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-420p | 4 tests/ref/vsynth/vsynth_lena-vc2-420p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-420p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-422p | 4 tests/ref/vsynth/vsynth_lena-vc2-422p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-422p12 | 4 tests/ref/vsynth/vsynth_lena-vc2-444p | 4 tests/ref/vsynth/vsynth_lena-vc2-444p10 | 4 tests/ref/vsynth/vsynth_lena-vc2-444p12 | 4 28 files changed, 124 insertions(+), 1 deletion(-) create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p12 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p10 create mode 
100644 tests/ref/vsynth/vsynth1-vc2-444p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p12 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p12 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p10 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p12 diff --git a/tests/fate/vcodec.mak b/tests/fate/vcodec.mak index ccf88ce..0e08894 100644 --- a/tests/fate/vcodec.mak +++ b/tests/fate/vcodec.mak @@ -29,6 +29,19 @@ FATE_VCODEC-$(call ENCDEC, DNXHD, DNXHD) += dnxhd-720p \ dnxhd-720p-rd \ dnxhd-720p-10bit +FATE_VCODEC-$(call ENCDEC, VC2 DIRAC, MOV) += vc2-420p vc2-420p10 vc2-420p12 \ + vc2-422p vc2-422p10 vc2-422p12 \ + vc2-444p vc2-444p10 vc2-444p12 +fate-vsynth1-vc2-%: FMT = mov +fate-vsynth1-vc2-%: ENCOPTS = -pix_fmt yuv$(@:fate-vsynth1-vc2-%=%) \ + -vcodec vc2 -frames 5 -strict -1 +fate-vsynth2-vc2-%: FMT = mov +fate-vsynth2-vc2-%: ENCOPTS = -pix_fmt yuv$(@:fate-vsynth2-vc2-%=%) \ + -vcodec vc2 -frames 5 -strict -1 +fate-vsynth_lena-vc2-%: FMT = mov +fate-vsynth_lena-vc2-%: ENCOPTS = -pix_fmt yuv$(@:fate-vsynth_lena-vc2-%=%) \ + -vcodec vc2 -frames 5 -strict -1 + fate-vsynth%-dnxhd-720p: ENCOPTS = -s hd720 -b 90M \ -pix_fmt yuv422p -frames 5 -qmax 8 fate-vsynth%-dnxhd-720p: 
FMT = dnxhd @@ -356,7 +369,9 @@ FATE_VSYNTH2 = $(FATE_VCODEC:%=fate-vsynth2-%) FATE_VSYNTH_LENA = $(FATE_VCODEC:%=fate-vsynth_lena-%)
Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data
2016-05-03 19:06 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>: > +memset(pb->buf_ptr, 0, pad_c); Commit squashing failure; the attached patch should fix that. This unfortunately requires updating the fate tests, as I generated them from the badly squashed commit. -- Christophe From 3008fd916cca5b9ab22e96536e778d63ba25ed20 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet <christophe.gisq...@gmail.com> Date: Tue, 3 May 2016 11:47:25 +0200 Subject: [PATCH 1/2] vc2enc: prevent random data The slice prefix is 0 in the reference encoder and the decoder ignores it. Writing 0 there seems like the best temporary solution. The padding could have contained uninitialized data, but its standardized value is 0xFF, hence the memset value. Overall this allows producing bitstreams with no random data for use by fate. --- libavcodec/vc2enc.c | 4 1 file changed, 4 insertions(+) diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c index 943198b..6fbdaa5 100644 --- a/libavcodec/vc2enc.c +++ b/libavcodec/vc2enc.c @@ -777,7 +777,10 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg) uint8_t quants[MAX_DWT_LEVELS][4]; int p, level, orientation; +// The reference decoder ignores it, and its typical length is 0 +memset(put_bits_ptr(pb), 0, s->prefix_bytes); skip_put_bytes(pb, s->prefix_bytes); + put_bits(pb, 8, quant_idx); /* Slice quantization (slice_quantizers() in the specs) */ @@ -809,6 +812,7 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg) } pb->buf[bytes_start] = pad_s; flush_put_bits(pb); +memset(put_bits_ptr(pb), 0xFF, pad_c); skip_put_bytes(pb, pad_c); } -- 2.8.1
Re: [FFmpeg-devel] [PATCH 1/4] fate: wma: add lossless 24bits test
Hi, 2016-05-02 16:02 GMT+02:00 Michael Niedermayer: >> +fate-lossless-wma24-rawtile: CMD = md5 -i >> $(TARGET_SAMPLES)/lossless-audio/g2_24bit.wma -f s24le > > where can i find that file ? > i assume i should upload it ? Sorry, I thought we had discussed it in this thread, but even with this, it was not obvious. Yes, please upload this file: https://trac.ffmpeg.org/raw-attachment/ticket/4134/g2_24bit.wma -- Christophe ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/4] wmalossless: allow calling madd_int16
2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>: > This is done by actually handling the "prev_values" in the cascaded LMS data > as if it were int16_t, thus requiring switching at various locations the > computations. Patch update since Michael's fix, which was incidentally included in the previous patch. -- Christophe From 7ef234b0e16b364b8efcd2f3b6b2e6f34f707ee8 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet <christophe.gisq...@gmail.com> Date: Sun, 1 May 2016 12:34:29 +0200 Subject: [PATCH 2/4] wmalossless: allow calling madd_int16 This is done by actually handling the "prev_values" in the cascaded LMS data as if it were int16_t, thus requiring switching at various locations the computations. --- libavcodec/wmalosslessdec.c | 110 1 file changed, 59 insertions(+), 51 deletions(-) diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c index 3e80c47..1ea5918 100644 --- a/libavcodec/wmalosslessdec.c +++ b/libavcodec/wmalosslessdec.c @@ -694,32 +694,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size) } } -static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) -{ -int recent = s->cdlms[ich][ilms].recent; -int range = 1 << s->bits_per_sample - 1; -int order = s->cdlms[ich][ilms].order; - -if (recent) -recent--; -else { -memcpy(s->cdlms[ich][ilms].lms_prevvalues + order, - s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order); -memcpy(s->cdlms[ich][ilms].lms_updates + order, - s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order); -recent = order - 1; -} - -s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1); -s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; - -s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; -s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; -s->cdlms[ich][ilms].recent = recent; -memset(s->cdlms[ich][ilms].lms_updates + recent + 
order, 0, - sizeof(s->cdlms[ich][ilms].lms_updates) - sizeof(int16_t)*(recent+order)); -} - static void use_high_update_speed(WmallDecodeCtx *s, int ich) { int ilms, recent, icoef; @@ -755,32 +729,63 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich) s->update_speed[ich] = 8; } -static void revert_cdlms(WmallDecodeCtx *s, int ch, - int coef_begin, int coef_end) -{ -int icoef, pred, ilms, num_lms, residue, input; - -num_lms = s->cdlms_ttl[ch]; -for (ilms = num_lms - 1; ilms >= 0; ilms--) { -for (icoef = coef_begin; icoef < coef_end; icoef++) { -pred = 1 << (s->cdlms[ch][ilms].scaling - 1); -residue = s->channel_residues[ch][icoef]; -pred += s->dsp.scalarproduct_and_madd_int32(s->cdlms[ch][ilms].coefs, -s->cdlms[ch][ilms].lms_prevvalues -+ s->cdlms[ch][ilms].recent, -s->cdlms[ch][ilms].lms_updates -+ s->cdlms[ch][ilms].recent, -FFALIGN(s->cdlms[ch][ilms].order, -WMALL_COEFF_PAD_SIZE), -WMASIGN(residue)); -input = residue + (pred >> s->cdlms[ch][ilms].scaling); -lms_update(s, ch, ilms, input); -s->channel_residues[ch][icoef] = input; -} -} -emms_c(); +#define CD_LMS(bits, ROUND) \ +static void lms_update ## bits (WmallDecodeCtx *s, int ich, int ilms, int input) \ +{ \ +int recent = s->cdlms[ich][ilms].recent; \ +int range = 1 << s->bits_per_sample - 1; \ +int order = s->cdlms[ich][ilms].order; \ +int ##bits##_t *prev = (int##bits##_t *)s->cdlms[ich][ilms].lms_prevvalues; \ + \ +if (recent) \ +recent--; \ +else { \ +memcpy(prev + order, prev, (bits/8) * order); \ +memcpy(s->cdlms[ich][ilms].lms_updates + order, \ + s->cdlms[ich][ilms].lms_updates, \ + sizeof(*s->cdlms[ich][ilms].lms_updates) * order); \ +recent = order - 1; \ +} \ + \ +prev[recent] = av_clip(input, -range, range - 1); \ +s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; \ + \ +s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; \ +s->cdlms[ich][ilms].lms_updates[recent
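The CD_LMS(bits, ROUND) macro in the patch above generates one copy of the LMS update code per storage width. As a reduced illustration of that templating pattern only, the sketch below generates two functions that share their logic but store the clipped history value as int16_t or int32_t. DEFINE_PUSH_HISTORY, push_history16/32 and clip_sample are hypothetical names, and the clipping is a plain reimplementation of the clamp, not the decoder's actual code.

```c
#include <stdint.h>

/* Clamp v to the signed range of a sample with `bits_per_sample` bits. */
static int clip_sample(int v, int bits_per_sample)
{
    int range = 1 << (bits_per_sample - 1);

    if (v < -range)
        return -range;
    if (v > range - 1)
        return range - 1;
    return v;
}

/* Generates push_history16() and push_history32(): same logic, different
 * element width for the history buffer (names are illustrative only). */
#define DEFINE_PUSH_HISTORY(bits)                                        \
static void push_history##bits(void *history, int index, int input,     \
                               int bits_per_sample)                      \
{                                                                        \
    int##bits##_t *prev = (int##bits##_t *)history;                      \
    prev[index] = (int##bits##_t)clip_sample(input, bits_per_sample);    \
}

DEFINE_PUSH_HISTORY(16)
DEFINE_PUSH_HISTORY(32)
```

The caller then picks the 16-bit variant when samples fit in 16 bits, which is what would allow a 16-bit multiply-add routine to run on the history buffer, and the 32-bit variant otherwise.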
Re: [FFmpeg-devel] [PATCH 1/4] fate: wma: add lossless 24bits test
2016-05-01 15:54 GMT+02:00 Paul B Mahol <one...@gmail.com>: > There were 2 distinct issues: 32bit instead of 16bit integers and > wrong handling of raw pcm. > The 96k is about the first one, last decoded frame md5 differs for example. Added a test for the file with raw pcm tiles then. -- Christophe From 584999fcce24585f989d2dc770e8c7c85aa19db7 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet <christophe.gisq...@gmail.com> Date: Mon, 18 Apr 2016 12:53:21 +0200 Subject: [PATCH 1/4] fate: wma: add lossless 24bits tests Should evaluate coefficients and raw pcm tiles. --- tests/fate/lossless-audio.mak | 6 +- tests/ref/fate/lossless-wma24-1 | 1 + tests/ref/fate/lossless-wma24-2 | 1 + tests/ref/fate/lossless-wma24-rawtile | 1 + 4 files changed, 8 insertions(+), 1 deletion(-) create mode 100644 tests/ref/fate/lossless-wma24-1 create mode 100644 tests/ref/fate/lossless-wma24-2 create mode 100644 tests/ref/fate/lossless-wma24-rawtile diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak index 58641ab..d292853 100644 --- a/tests/fate/lossless-audio.mak +++ b/tests/fate/lossless-audio.mak @@ -25,8 +25,12 @@ fate-lossless-tta: CMD = crc -i $(TARGET_SAMPLES)/lossless-audio/inside.tta FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, TTA, TTA) += fate-lossless-tta-encrypted fate-lossless-tta-encrypted: CMD = crc -password ffmpeg -i $(TARGET_SAMPLES)/lossless-audio/encrypted.tta -FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma +FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2 fate-lossless-wma24-rawtile fate-lossless-wma: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial.wma -f s16le -frames 209 +fate-lossless-wma24-1: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/master_audio_2.0_24bit.wma -f s24le +fate-lossless-wma24-2: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/Mega_Weird_Audio_Test_24bit.wma -f s24le +fate-lossless-wma24-rawtile: CMD 
= md5 -i $(TARGET_SAMPLES)/lossless-audio/g2_24bit.wma -f s24le +fate-lossless-wmall: fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2 fate-lossless-wma24-rawtile FATE_SAMPLES_LOSSLESS_AUDIO += $(FATE_SAMPLES_LOSSLESS_AUDIO-yes) diff --git a/tests/ref/fate/lossless-wma24-1 b/tests/ref/fate/lossless-wma24-1 new file mode 100644 index 000..ddee31c --- /dev/null +++ b/tests/ref/fate/lossless-wma24-1 @@ -0,0 +1 @@ +9ade91f506bc025854f6ffea0d635bc6 diff --git a/tests/ref/fate/lossless-wma24-2 b/tests/ref/fate/lossless-wma24-2 new file mode 100644 index 000..5ebdfd1 --- /dev/null +++ b/tests/ref/fate/lossless-wma24-2 @@ -0,0 +1 @@ +908ec5c16f497bf7d5658d2689d125c8 diff --git a/tests/ref/fate/lossless-wma24-rawtile b/tests/ref/fate/lossless-wma24-rawtile new file mode 100644 index 000..96e5e21 --- /dev/null +++ b/tests/ref/fate/lossless-wma24-rawtile @@ -0,0 +1 @@ +337592f38a2218a5bc95ceb9b5e72c8b -- 2.8.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/4] fate: wma: add lossless 24bits test
Hi,

2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +fate-lossless-wma24-2: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/Mega_Weird_Audio_Test_24bit.wma -f s24le

The recent fixes actually changed the reference checksum for that file.

Is https://trac.ffmpeg.org/attachment/ticket/4134/96k.wma another file showing the issue? If so, it could be added as a fate test.

I'll send an updated patch once it has been decided whether to add the above test.

-- 
Christophe
[FFmpeg-devel] [PATCH 2/4] wmalossless: allow calling madd_int16
This is done by actually handling the "prev_values" in the cascaded LMS data as if it were int16_t, thus requiring switching at various locations the computations. --- libavcodec/wmalosslessdec.c | 109 +++- 1 file changed, 58 insertions(+), 51 deletions(-) diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c index f14e8a6..687bbe3 100644 --- a/libavcodec/wmalosslessdec.c +++ b/libavcodec/wmalosslessdec.c @@ -694,32 +694,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size) } } -static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) -{ -int recent = s->cdlms[ich][ilms].recent; -int range = 1 << s->bits_per_sample - 1; -int order = s->cdlms[ich][ilms].order; - -if (recent) -recent--; -else { -memcpy(s->cdlms[ich][ilms].lms_prevvalues + order, - s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order); -memcpy(s->cdlms[ich][ilms].lms_updates + order, - s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order); -recent = order - 1; -} - -s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1); -s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; - -s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; -s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; -s->cdlms[ich][ilms].recent = recent; -memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, - sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order)); -} - static void use_high_update_speed(WmallDecodeCtx *s, int ich) { int ilms, recent, icoef; @@ -755,32 +729,62 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich) s->update_speed[ich] = 8; } -static void revert_cdlms(WmallDecodeCtx *s, int ch, - int coef_begin, int coef_end) -{ -int icoef, pred, ilms, num_lms, residue, input; - -num_lms = s->cdlms_ttl[ch]; -for (ilms = num_lms - 1; ilms >= 0; ilms--) { -for (icoef = coef_begin; icoef < coef_end; icoef++) { -pred = 1 << 
(s->cdlms[ch][ilms].scaling - 1); -residue = s->channel_residues[ch][icoef]; -pred += s->dsp.scalarproduct_and_madd_int32(s->cdlms[ch][ilms].coefs, - s->cdlms[ch][ilms].lms_prevvalues -+ s->cdlms[ch][ilms].recent, - s->cdlms[ch][ilms].lms_updates -+ s->cdlms[ch][ilms].recent, - FFALIGN(s->cdlms[ch][ilms].order, - WMALL_COEFF_PAD_SIZE), -WMASIGN(residue)); -input = residue + (pred >> s->cdlms[ch][ilms].scaling); -lms_update(s, ch, ilms, input); -s->channel_residues[ch][icoef] = input; -} -} -emms_c(); +#define CD_LMS(bits, ROUND) \ +static void lms_update ## bits (WmallDecodeCtx *s, int ich, int ilms, int input) \ +{ \ +int recent = s->cdlms[ich][ilms].recent; \ +int range = 1 << s->bits_per_sample - 1; \ +int order = s->cdlms[ich][ilms].order; \ +int ##bits##_t *prev = (int##bits##_t *)s->cdlms[ich][ilms].lms_prevvalues; \ + \ +if (recent) \ +recent--; \ +else { \ +memcpy(prev + order, prev, (bits/8) * order); \ +memcpy(s->cdlms[ich][ilms].lms_updates + order, \ + s->cdlms[ich][ilms].lms_updates, \ + sizeof(*s->cdlms[ich][ilms].lms_updates) * order); \ +recent = order - 1; \ +} \ + \ +prev[recent] = av_clip(input, -range, range - 1); \ +s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; \ + \ +s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; \ +s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; \ +s->cdlms[ich][ilms].recent = recent; \ +memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, \ + sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order)); \ +} \ + \ +static void revert_cdlms ## bits (WmallDecodeCtx *s, int ch, \ + int coef_begin, int coef_end) \ +{ \ +int icoef, pred, ilms, num_lms, residue, input; \ + \ +num_lms = s->cdlms_ttl[ch]; \ +for (ilms = num_lms - 1; ilms >= 0; ilms--) { \ +for (icoef = coef_begin; icoef < coef_end; icoef++) { \ +int##bits##_t *prevvalues = (int##bits##_t *)s->cdlms[ch][ilms].lms_prevvalues; \ +pred = 1 << (s->cdlms[ch][ilms].scaling - 1); \ +residue = 
s->channel_residues[ch][icoef]; \ +pred +=
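For readers following the lms_update macro in the patch above: the history is kept in a buffer of twice the filter order, "recent" walks downwards, and the first half is copied to the second half when it reaches zero. Below is a minimal sketch of that idea, assuming a fixed ORDER of 4 and hypothetical names (History, history_push, history_window) that do not exist in wmalosslessdec.c. It only models the prev-values part, not the updates array or the clipping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ORDER 4 /* illustrative only; the decoder uses multiples of 8 */

/* Hypothetical stand-in for one cdlms entry: a double-length history. */
typedef struct History {
    int16_t buf[2 * ORDER];
    int     recent;          /* index of the newest sample */
} History;

/* Push one sample, newest first, as the lms_update logic does. */
static void history_push(History *h, int16_t value)
{
    assert(h->recent >= 0 && h->recent < ORDER);
    if (h->recent) {
        h->recent--;
    } else {
        /* Window reached the start: move it to the second half. */
        memcpy(h->buf + ORDER, h->buf, sizeof(h->buf[0]) * ORDER);
        h->recent = ORDER - 1;
    }
    h->buf[h->recent] = value;
}

/* The ORDER samples a dot product would read, newest first. */
static const int16_t *history_window(const History *h)
{
    return h->buf + h->recent;
}
```

The point of the layout is that the window is always contiguous, so the dsp function can read it with plain pointer arithmetic, at the cost of one memcpy every ORDER samples.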
[FFmpeg-devel] [PATCH 3/4] x86: lossless audio: SSE4 madd 32bits
The only user so far is wmalossless 24bits. The few samples tested show
an order of 8, so more unrolling or an AVX2 version does not make sense.

Timings: 68 -> 49 cycles
---
 libavcodec/x86/lossless_audiodsp.asm    | 33 +
 libavcodec/x86/lossless_audiodsp_init.c |  7 +++
 2 files changed, 40 insertions(+)

diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm
index 5597dad..063d7b4 100644
--- a/libavcodec/x86/lossless_audiodsp.asm
+++ b/libavcodec/x86/lossless_audiodsp.asm
@@ -68,6 +68,39 @@ SCALARPRODUCT
 INIT_XMM sse2
 SCALARPRODUCT
 
+INIT_XMM sse4
+; int ff_scalarproduct_and_madd_int32(int16_t *v1, int32_t *v2, int16_t *v3,
+;                                     int order, int mul)
+cglobal scalarproduct_and_madd_int32, 4,4,8, v1, v2, v3, order, mul
+    shl       orderq, 1
+    movd      m7, mulm
+    SPLATW    m7, m7
+    pxor      m6, m6
+    add       v1q, orderq
+    lea       v2q, [v2q + 2*orderq]
+    add       v3q, orderq
+    neg       orderq
+.loop:
+    mova      m3, [v1q + orderq]
+    movu      m0, [v2q + 2*orderq]
+    pmovsxwd  m4, m3
+    movu      m1, [v2q + 2*orderq + mmsize]
+    movhlps   m5, m3
+    movu      m2, [v3q + orderq]
+    pmovsxwd  m5, m5
+    pmullw    m2, m7
+    pmulld    m0, m4
+    pmulld    m1, m5
+    paddw     m2, m3
+    paddd     m6, m0
+    paddd     m6, m1
+    mova      [v1q + orderq], m2
+    add       orderq, 16
+    jl .loop
+    HADDD     m6, m0
+    movd      eax, m6
+    RET
+
 %macro SCALARPRODUCT_LOOP 1
 align 16
 .loop%1:
diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c
index 197173c..10b6a65 100644
--- a/libavcodec/x86/lossless_audiodsp_init.c
+++ b/libavcodec/x86/lossless_audiodsp_init.c
@@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, const int16_t *v2,
                                               const int16_t *v3,
                                               int order, int mul);
 
+int32_t ff_scalarproduct_and_madd_int32_sse4(int16_t *v1, const int32_t *v2,
+                                             const int16_t *v3,
+                                             int order, int mul);
+
 av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 {
 #if HAVE_YASM
@@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
     if (EXTERNAL_SSSE3(cpu_flags) &&
         !(cpu_flags & (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit
         c->scalarproduct_and_madd_int16 = ff_scalarproduct_and_madd_int16_ssse3;
+
+    if (EXTERNAL_SSE4(cpu_flags))
+        c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4;
 #endif
 }
-- 
2.8.1
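As a reading aid for the SSE4 loop in this patch, here is a scalar C model of what one call computes, lane by lane. It is only a sketch of my reading of the assembly: the function name madd_int32_lane_model is hypothetical, and the instruction names in the comments indicate which step each line is meant to mirror, not a verified one-to-one translation. Unsigned arithmetic is used so the wrap-around of the SIMD instructions is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: v1 is int16 history, v2 int32 coefficients,
 * v3 int16 updates; 8 elements are handled per loop iteration. */
static int32_t madd_int32_lane_model(int16_t *v1, const int32_t *v2,
                                     const int16_t *v3, int order, int mul)
{
    uint32_t acc[4] = { 0, 0, 0, 0 };     /* four 32-bit lanes, as in m6 */

    assert(order > 0 && order % 8 == 0);  /* assumed precondition */
    for (int i = 0; i < order; i += 8) {
        for (int k = 0; k < 8; k++) {
            /* pmovsxwd: sign-extend the old 16-bit value to 32 bits */
            int32_t x = v1[i + k];
            /* pmulld + paddd: low 32 bits of the product, wrapping add */
            acc[k & 3] += (uint32_t)x * (uint32_t)v2[i + k];
            /* pmullw: low 16 bits of v3 * mul */
            uint16_t upd = (uint16_t)((uint32_t)v3[i + k] * (uint32_t)mul);
            /* paddw: 16-bit wrapping add, stored back into v1 */
            v1[i + k] = (int16_t)(uint16_t)((uint16_t)v1[i + k] + upd);
        }
    }
    /* HADDD: horizontal sum of the four lanes */
    return (int32_t)(acc[0] + acc[1] + acc[2] + acc[3]);
}
```

Note that the dot product uses the value of v1 before it is updated, which matches the order of operations in the C reference loop.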
[FFmpeg-devel] [PATCH 4/4] lossless audio dsp: unroll
The loop counts are guaranteed to be multiples of 8, so this unrolling
is safe, and it allows better use of execution ports.

For int32 version: 68 -> 58c.
---
 libavcodec/lossless_audiodsp.c | 12
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/lossless_audiodsp.c b/libavcodec/lossless_audiodsp.c
index ea0568e..e3ea8e1 100644
--- a/libavcodec/lossless_audiodsp.c
+++ b/libavcodec/lossless_audiodsp.c
@@ -29,10 +29,12 @@ static int32_t scalarproduct_and_madd_int16_c(int16_t *v1, const int16_t *v2,
 {
     int res = 0;
 
-    while (order--) {
+    do {
         res += *v1 * *v2++;
         *v1++ += mul * *v3++;
-    }
+        res += *v1 * *v2++;
+        *v1++ += mul * *v3++;
+    } while (order-=2);
     return res;
 }
 
@@ -42,10 +44,12 @@ static int32_t scalarproduct_and_madd_int32_c(int16_t *v1, const int32_t *v2,
 {
     int res = 0;
 
-    while (order--) {
+    do {
+        res += *v1 * *v2++;
+        *v1++ += mul * *v3++;
         res += *v1 * *v2++;
         *v1++ += mul * *v3++;
-    }
+    } while (order-=2);
     return res;
 }
 
-- 
2.8.1
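To make the safety argument of this patch concrete, the following sketch puts the original loop and the unrolled loop side by side. The function names madd_ref and madd_unrolled are hypothetical; the bodies follow the removed and added lines of the diff. The assert states the assumption the unrolled form relies on: the do/while only terminates correctly when order is positive and even, which holds if order is always a multiple of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Loop shape before the patch. */
static int32_t madd_ref(int16_t *v1, const int32_t *v2,
                        const int16_t *v3, int order, int mul)
{
    int res = 0;

    while (order--) {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    }
    return res;
}

/* Loop shape after the patch: two elements per iteration. */
static int32_t madd_unrolled(int16_t *v1, const int32_t *v2,
                             const int16_t *v3, int order, int mul)
{
    int res = 0;

    /* Assumed precondition; with order == 0 or odd, order -= 2 never hits 0
     * at the right time. */
    assert(order > 0 && (order & 1) == 0);
    do {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    } while (order -= 2);
    return res;
}
```

For any positive even order both functions should return the same sum and leave v1 in the same state; whether the unrolled form is faster depends on the compiler, as discussed elsewhere in the thread.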
[FFmpeg-devel] [PATCH 0/4] wmalossless: fix 16bits speed regression v3
Due to the changes to the cascaded LMS coefficients, most of the code
needed a rewrite. In particular, the SSE4 madd32 code is no longer
similar enough to be shared inside a macro.

Christophe Gisquet (4):
  fate: wma: add lossless 24bits test
  wmalossless: allow calling madd_int16
  x86: lossless audio: SSE4 madd 32bits
  lossless audio dsp: unroll

 libavcodec/lossless_audiodsp.c          |  12 ++--
 libavcodec/wmalosslessdec.c             | 109 +---
 libavcodec/x86/lossless_audiodsp.asm    |  33 ++
 libavcodec/x86/lossless_audiodsp_init.c |   7 ++
 tests/fate/lossless-audio.mak           |   5 +-
 tests/ref/fate/lossless-wma24-1         |   1 +
 tests/ref/fate/lossless-wma24-2         |   1 +
 7 files changed, 112 insertions(+), 56 deletions(-)
 create mode 100644 tests/ref/fate/lossless-wma24-1
 create mode 100644 tests/ref/fate/lossless-wma24-2

-- 
2.8.1
[FFmpeg-devel] [PATCH 1/5] fate: wma: add lossless 24bits test
--- tests/fate/lossless-audio.mak | 5 - tests/ref/fate/lossless-wma24-1 | 1 + tests/ref/fate/lossless-wma24-2 | 1 + 3 files changed, 6 insertions(+), 1 deletion(-) create mode 100644 tests/ref/fate/lossless-wma24-1 create mode 100644 tests/ref/fate/lossless-wma24-2 diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak index 58641ab..dbd0c0e 100644 --- a/tests/fate/lossless-audio.mak +++ b/tests/fate/lossless-audio.mak @@ -25,8 +25,11 @@ fate-lossless-tta: CMD = crc -i $(TARGET_SAMPLES)/lossless-audio/inside.tta FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, TTA, TTA) += fate-lossless-tta-encrypted fate-lossless-tta-encrypted: CMD = crc -password ffmpeg -i $(TARGET_SAMPLES)/lossless-audio/encrypted.tta -FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma +FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2 fate-lossless-wma: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial.wma -f s16le -frames 209 +fate-lossless-wma24-1: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/master_audio_2.0_24bit.wma -f s24le +fate-lossless-wma24-2: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/Mega_Weird_Audio_Test_24bit.wma -f s24le +fate-lossless-wmall: fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2 FATE_SAMPLES_LOSSLESS_AUDIO += $(FATE_SAMPLES_LOSSLESS_AUDIO-yes) diff --git a/tests/ref/fate/lossless-wma24-1 b/tests/ref/fate/lossless-wma24-1 new file mode 100644 index 000..ddee31c --- /dev/null +++ b/tests/ref/fate/lossless-wma24-1 @@ -0,0 +1 @@ +9ade91f506bc025854f6ffea0d635bc6 diff --git a/tests/ref/fate/lossless-wma24-2 b/tests/ref/fate/lossless-wma24-2 new file mode 100644 index 000..5ebdfd1 --- /dev/null +++ b/tests/ref/fate/lossless-wma24-2 @@ -0,0 +1 @@ +908ec5c16f497bf7d5658d2689d125c8 -- 2.8.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH 2/5] wmalossless: allow calling madd_int16
This is done by actually handling the cascaded LMS data as if it were int16_t, thus requiring switching at various locations the computations. --- libavcodec/wmalosslessdec.c | 146 +--- 1 file changed, 84 insertions(+), 62 deletions(-) diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c index 9d56d97..f3a2217 100644 --- a/libavcodec/wmalosslessdec.c +++ b/libavcodec/wmalosslessdec.c @@ -147,9 +147,9 @@ typedef struct WmallDecodeCtx { int scaling; int coefsend; int bitsend; -DECLARE_ALIGNED(16, int32_t, coefs)[MAX_ORDER + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)]; -DECLARE_ALIGNED(16, int32_t, lms_prevvalues)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)]; -DECLARE_ALIGNED(16, int32_t, lms_updates)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)]; +DECLARE_ALIGNED(16, int32_t, coefs)[MAX_ORDER + WMALL_COEFF_PAD_SIZE/sizeof(int32_t)]; +DECLARE_ALIGNED(16, int32_t, lms_prevvalues)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int32_t)]; +DECLARE_ALIGNED(16, int32_t, lms_updates)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int32_t)]; int recent; } cdlms[WMALL_MAX_CHANNELS][9]; @@ -458,6 +458,7 @@ static int decode_cdlms(WmallDecodeCtx *s) int cdlms_send_coef = get_bits1(>gb); for (c = 0; c < s->num_channels; c++) { +int shift = s->bits_per_sample > 16 ? 
0 : 1; s->cdlms_ttl[c] = get_bits(>gb, 3) + 1; for (i = 0; i < s->cdlms_ttl[c]; i++) { s->cdlms[c][i].order = (get_bits(>gb, 7) + 1) * 8; @@ -495,14 +496,20 @@ static int decode_cdlms(WmallDecodeCtx *s) s->cdlms[c][i].bitsend = get_bitsz(>gb, cbits) + 2; shift_l = 32 - s->cdlms[c][i].bitsend; shift_r = 32 - s->cdlms[c][i].scaling - 2; +if (s->bits_per_sample > 16) { for (j = 0; j < s->cdlms[c][i].coefsend; j++) s->cdlms[c][i].coefs[j] = (get_bits(>gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r; +} else { +int16_t *ptr = (int16_t*)s->cdlms[c][i].coefs; +for (j = 0; j < s->cdlms[c][i].coefsend; j++) +ptr[j] = (get_bits(>gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r; +} } } for (i = 0; i < s->cdlms_ttl[c]; i++) -memset(s->cdlms[c][i].coefs + s->cdlms[c][i].order, +memset(s->cdlms[c][i].coefs + (s->cdlms[c][i].order >> shift), 0, WMALL_COEFF_PAD_SIZE); } @@ -694,32 +701,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size) } } -static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) -{ -int recent = s->cdlms[ich][ilms].recent; -int range = 1 << s->bits_per_sample - 1; -int order = s->cdlms[ich][ilms].order; - -if (recent) -recent--; -else { -memcpy(s->cdlms[ich][ilms].lms_prevvalues + order, - s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order); -memcpy(s->cdlms[ich][ilms].lms_updates + order, - s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order); -recent = order - 1; -} - -s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1); -s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; - -s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; -s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; -s->cdlms[ich][ilms].recent = recent; -memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, - sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order)); -} - static void 
use_high_update_speed(WmallDecodeCtx *s, int ich) { int ilms, recent, icoef; @@ -727,12 +708,16 @@ static void use_high_update_speed(WmallDecodeCtx *s, int ich) recent = s->cdlms[ich][ilms].recent; if (s->update_speed[ich] == 16) continue; -if (s->bV3RTM) { +if (s->bits_per_sample > 16) { +int32_t *updates = s->cdlms[ich][ilms].lms_updates; +if (s->bV3RTM) updates += recent; for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) -s->cdlms[ich][ilms].lms_updates[icoef + recent] *= 2; +updates[icoef] *= 2; } else { +int16_t *updates = (int16_t *)s->cdlms[ich][ilms].lms_updates; +if (s->bV3RTM) updates += recent; for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) -s->cdlms[ich][ilms].lms_updates[icoef] *= 2; +updates[icoef] *= 2; } } s->update_speed[ich] = 16; @@ -745,42 +730,76 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich)
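One detail of this patch that is easy to misread is the memset on coefs: the array is declared as int32_t, so the pointer offset counts 32-bit elements, and the patch halves it with "order >> shift" when the data is actually stored as int16_t. The sketch below only illustrates that arithmetic; pad_offset_bytes is a hypothetical helper, not a function from wmalosslessdec.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset at which the zero padding starts, given that the pointer
 * arithmetic is done on an int32_t array as in the patch. */
static size_t pad_offset_bytes(int order, int bits_per_sample)
{
    /* Same selection as the patch: 16-bit content packs two values per int32_t. */
    int shift = bits_per_sample > 16 ? 0 : 1;

    assert(order > 0 && order % 8 == 0);  /* order is decoded as (n + 1) * 8 */
    return (size_t)(order >> shift) * sizeof(int32_t);
}
```

With shift set to 1 the offset equals order 16-bit elements, so the padding lands right after the last int16_t coefficient; with shift set to 0 it lands after the last int32_t coefficient.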
[FFmpeg-devel] [PATCH 5/5] wmalossless: silence a sample request
16bits samples with CDLMS orders of 8 are currently unsupported, but
have never been encountered before. However, 8 seems to be the most
frequent, if not the only, order used for 24bits. In that case, the dsp
functions are fine with handling orders that are multiples of 8, so
silence the warning.
---
 libavcodec/wmalosslessdec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index f3a2217..83b3174 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -469,7 +469,7 @@ static int decode_cdlms(WmallDecodeCtx *s)
                 s->cdlms[0][0].order = 0;
                 return AVERROR_INVALIDDATA;
             }
-            if(s->cdlms[c][i].order & 8) {
+            if(s->cdlms[c][i].order & 8 && s->bits_per_sample == 16) {
                 static int warned;
                 if(!warned)
                     avpriv_request_sample(s->avctx, "CDLMS of order %d",
-- 
2.8.1
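The condition changed by this patch can be checked in isolation. Since the order is decoded elsewhere in the decoder as (get_bits(7) + 1) * 8, it is always a multiple of 8, and "order & 8" is non-zero exactly for the odd multiples (8, 24, 40, ...). The helpers below are hypothetical and only mirror the expressions; note that in C the new condition parses as (order & 8) && (bits_per_sample == 16).

```c
#include <assert.h>

/* Hypothetical helper: the order as derived from the 7-bit field. */
static int cdlms_order_from_field(unsigned field7)
{
    assert(field7 < 128);
    return (int)(field7 + 1) * 8;
}

/* Hypothetical helper mirroring the patched condition: request a sample
 * only for 16-bit content whose order is an odd multiple of 8. */
static int needs_sample_request(int order, int bits_per_sample)
{
    return (order & 8) && bits_per_sample == 16;
}
```

With these definitions, an order of 8 triggers the request for 16-bit content but not for 24-bit content, which is the behaviour the commit message describes.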
[FFmpeg-devel] [PATCH 3/5] x86: lossless audio: SSE4 madd 32bits
The unique user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version do not make sense. Timings: 72 -> 49 cycles --- libavcodec/x86/lossless_audiodsp.asm| 31 +-- libavcodec/x86/lossless_audiodsp_init.c | 7 +++ 2 files changed, 32 insertions(+), 6 deletions(-) diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm index 5597dad..d00869b 100644 --- a/libavcodec/x86/lossless_audiodsp.asm +++ b/libavcodec/x86/lossless_audiodsp.asm @@ -22,13 +22,17 @@ SECTION .text -%macro SCALARPRODUCT 0 +%macro SCALARPRODUCT 1 ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3, ; int order, int mul) -cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul -shl orderq, 1 +; int ff_scalarproduct_and_madd_int32(int32_t *v1, int32_t *v2, int32_t *v3, +; int order, int mul) +cglobal scalarproduct_and_madd_int %+ %1, 4,4,8, v1, v2, v3, order, mul +shl orderq, (%1/16) movdm7, mulm -%if mmsize == 16 +%if %1 == 32 +SPLATD m7 +%elif mmsize == 16 pshuflw m7, m7, 0 punpcklqdq m7, m7 %else @@ -46,14 +50,26 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul movam5, [v1q + orderq + mmsize] movum2, [v3q + orderq] movum3, [v3q + orderq + mmsize] +%if %1 == 32 +pmulld m0, m4 +pmulld m1, m5 +pmulld m2, m7 +pmulld m3, m7 +%else pmaddwd m0, m4 pmaddwd m1, m5 pmullw m2, m7 pmullw m3, m7 +%endif paddd m6, m0 paddd m6, m1 +%if %1 == 32 +paddd m2, m4 +paddd m3, m5 +%else paddw m2, m4 paddw m3, m5 +%endif mova[v1q + orderq], m2 mova[v1q + orderq + mmsize], m3 add orderq, mmsize*2 @@ -64,9 +80,12 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul %endmacro INIT_MMX mmxext -SCALARPRODUCT +SCALARPRODUCT 16 INIT_XMM sse2 -SCALARPRODUCT +SCALARPRODUCT 16 + +INIT_XMM sse4 +SCALARPRODUCT 32 %macro SCALARPRODUCT_LOOP 1 align 16 diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c index 197173c..85306cb 100644 
--- a/libavcodec/x86/lossless_audiodsp_init.c +++ b/libavcodec/x86/lossless_audiodsp_init.c @@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, const int16_t *v2, const int16_t *v3, int order, int mul); +int32_t ff_scalarproduct_and_madd_int32_sse4(int32_t *v1, const int32_t *v2, + const int32_t *v3, + int order, int mul); + av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c) { #if HAVE_YASM @@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c) if (EXTERNAL_SSSE3(cpu_flags) && !(cpu_flags & (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit c->scalarproduct_and_madd_int16 = ff_scalarproduct_and_madd_int16_ssse3; + +if (EXTERNAL_SSE4(cpu_flags)) +c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4; #endif } -- 2.8.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH 4/5] lossless audio dsp: unroll
The loop counts are guaranteed to be multiples of 8, so this unrolling
is safe, and it allows better use of execution ports.

For int32 version: 72 -> 57c.
---
 libavcodec/lossless_audiodsp.c | 12
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/lossless_audiodsp.c b/libavcodec/lossless_audiodsp.c
index 55495d0..17a61cd 100644
--- a/libavcodec/lossless_audiodsp.c
+++ b/libavcodec/lossless_audiodsp.c
@@ -29,10 +29,12 @@ static int32_t scalarproduct_and_madd_int16_c(int16_t *v1, const int16_t *v2,
 {
     int res = 0;
 
-    while (order--) {
+    do {
         res += *v1 * *v2++;
         *v1++ += mul * *v3++;
-    }
+        res += *v1 * *v2++;
+        *v1++ += mul * *v3++;
+    } while (order-=2);
     return res;
 }
 
@@ -42,10 +44,12 @@ static int32_t scalarproduct_and_madd_int32_c(int32_t *v1, const int32_t *v2,
 {
     int res = 0;
 
-    while (order--) {
+    do {
+        res += *v1 * *v2++;
+        *v1++ += mul * *v3++;
         res += *v1 * *v2++;
         *v1++ += mul * *v3++;
-    }
+    } while (order-=2);
     return res;
 }
 
-- 
2.8.1
[FFmpeg-devel] [PATCH 0/5] wmalossless: fix 16bits speed regression v2
Patch 2 is the squashing of several previous commits, as there was no
opinion on their contents or on the way to go. The SSE4 one is the final
version from its last thread. The last patch in this set is new, and
silences a warning that's only meaningful for 16bits content.

Christophe Gisquet (5):
  fate: wma: add lossless 24bits test
  wmalossless: allow calling madd_int16
  x86: lossless audio: SSE4 madd 32bits
  lossless audio dsp: unroll
  wmalossless: silence a sample request

 libavcodec/lossless_audiodsp.c          |  12 ++-
 libavcodec/wmalosslessdec.c             | 148 ++--
 libavcodec/x86/lossless_audiodsp.asm    |  31 +--
 libavcodec/x86/lossless_audiodsp_init.c |   7 ++
 tests/fate/lossless-audio.mak           |   5 +-
 tests/ref/fate/lossless-wma24-1         |   1 +
 tests/ref/fate/lossless-wma24-2         |   1 +
 7 files changed, 131 insertions(+), 74 deletions(-)
 create mode 100644 tests/ref/fate/lossless-wma24-1
 create mode 100644 tests/ref/fate/lossless-wma24-2

-- 
2.8.1
Re: [FFmpeg-devel] [PATCH 0/6] wmalossless: fix 16bits speed regression
2016-04-29 10:50 GMT+02:00 Paul B Mahol:
> Should be OK if it doesn't break anything.

I'll resend the current state of this patchset for easier testing and applying. Michael ran this under valgrind with nothing popping up, and fate passes.

I think the remaining question is whether the first version (with ifs inside the loops) is preferred over the second version (using macros to reduce such ifs).

-- 
Christophe
Re: [FFmpeg-devel] [PATCH 5/6] x86: lossless audio: SSE4 madd 32bits
Hi,

2016-04-20 2:01 GMT+02:00 Ronald S. Bultje:
> This is typically only an issue if the data came from stack. On win64 as
> well as unix64, the 4th argument never comes from stack but is a direct
> register argument instead.

So no benefit except consistency. I don't mind either way, though.

On the other hand, this hand-coded function is only a slight improvement over gcc's vectorized code, and only because gcc does a poor job of it, probably because the order is small (8) and gcc does not have enough information about the data. So, it's written, but it's not very beneficial.

-- 
Christophe
Re: [FFmpeg-devel] [PATCH 5/6] x86: lossless audio: SSE4 madd 32bits
2016-04-18 21:18 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>: > this breaks (only noise) > \[CCCP\]_Mega_Weird_Audio_Test.mkv track 23 Worthwhile sample. I rewrote the patch to reduce code duplication, and I fixed the issue (misread a shift). -- Christophe From a0d4a96c032d73bc0e34fec320497aefafba3c28 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet <christophe.gisq...@gmail.com> Date: Mon, 18 Apr 2016 13:20:07 +0200 Subject: [PATCH 5/7] x86: lossless audio: SSE4 madd 32bits The unique user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version do not make sense. Timings: 72 -> 49 cycles --- libavcodec/x86/lossless_audiodsp.asm| 31 +-- libavcodec/x86/lossless_audiodsp_init.c | 7 +++ 2 files changed, 32 insertions(+), 6 deletions(-) diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm index 5597dad..d00869b 100644 --- a/libavcodec/x86/lossless_audiodsp.asm +++ b/libavcodec/x86/lossless_audiodsp.asm @@ -22,13 +22,17 @@ SECTION .text -%macro SCALARPRODUCT 0 +%macro SCALARPRODUCT 1 ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3, ; int order, int mul) -cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul -shl orderq, 1 +; int ff_scalarproduct_and_madd_int32(int32_t *v1, int32_t *v2, int32_t *v3, +; int order, int mul) +cglobal scalarproduct_and_madd_int %+ %1, 4,4,8, v1, v2, v3, order, mul +shl orderq, (%1/16) movdm7, mulm -%if mmsize == 16 +%if %1 == 32 +SPLATD m7 +%elif mmsize == 16 pshuflw m7, m7, 0 punpcklqdq m7, m7 %else @@ -46,14 +50,26 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul movam5, [v1q + orderq + mmsize] movum2, [v3q + orderq] movum3, [v3q + orderq + mmsize] +%if %1 == 32 +pmulld m0, m4 +pmulld m1, m5 +pmulld m2, m7 +pmulld m3, m7 +%else pmaddwd m0, m4 pmaddwd m1, m5 pmullw m2, m7 pmullw m3, m7 +%endif paddd m6, m0 paddd m6, m1 +%if %1 == 32 +paddd m2, m4 +paddd m3, m5 +%else paddw m2, 
m4 paddw m3, m5 +%endif mova[v1q + orderq], m2 mova[v1q + orderq + mmsize], m3 add orderq, mmsize*2 @@ -64,9 +80,12 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul %endmacro INIT_MMX mmxext -SCALARPRODUCT +SCALARPRODUCT 16 INIT_XMM sse2 -SCALARPRODUCT +SCALARPRODUCT 16 + +INIT_XMM sse4 +SCALARPRODUCT 32 %macro SCALARPRODUCT_LOOP 1 align 16 diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c index 197173c..85306cb 100644 --- a/libavcodec/x86/lossless_audiodsp_init.c +++ b/libavcodec/x86/lossless_audiodsp_init.c @@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, const int16_t *v2, const int16_t *v3, int order, int mul); +int32_t ff_scalarproduct_and_madd_int32_sse4(int32_t *v1, const int32_t *v2, + const int32_t *v3, + int order, int mul); + av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c) { #if HAVE_YASM @@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c) if (EXTERNAL_SSSE3(cpu_flags) && !(cpu_flags & (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit c->scalarproduct_and_madd_int16 = ff_scalarproduct_and_madd_int16_ssse3; + +if (EXTERNAL_SSE4(cpu_flags)) +c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4; #endif } -- 2.8.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test
2016-04-18 22:22 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> 2016-04-18 19:11 GMT+02:00 James Almer <jamr...@gmail.com>:
>> No way to create one using existing 24bit audio currently available in fate
>> or any redistributable 24 audio out there?
>> There are some dts-ma and truehd multichannel samples that are not sine
>> waves.
>
> You're right. Just did that, except the encoder doesn't like the 7.1
> configuration.

However, this nice true 24bits sample didn't exhibit the issue found by Michael (where the audio is coded as 24bits, but probably isn't). Extracting a few wma packets from the file is sufficient to show the issue, generating a file that's 30KB. Because of this, I would favour this sample over a true 24 bits sample. Another option would be to add both tests. Opinions?

-- 
Christophe
Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test
Hi,

2016-04-18 19:11 GMT+02:00 James Almer:
> No way to create one using existing 24bit audio currently available in fate
> or any redistributable 24 audio out there?
> There are some dts-ma and truehd multichannel samples that are not sine waves.

You're right. Just did that, except the encoder doesn't like the 7.1 configuration. Because we're not testing the wma rematrixing, I changed that to stereo:

-i dts/master_audio_7.1_24bit.dts -af 'pan=stereo:c0=FL:c1=FR' -acodec pcm_s24le

The test file comes at 1.6MB, but fewer samples could be encoded.

-- 
Christophe
Re: [FFmpeg-devel] [PATCH 2/6] wmalossless: allow calling madd_int16
Hi,

2016-04-18 20:09 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>:
> On Mon, Apr 18, 2016 at 03:07:27PM +0200, Christophe Gisquet wrote:
>> This is done by actually handling the cascaded LMS data as if it
>> were int16_t, thus requiring switching at various locations the
>> computations.
>> ---
>> libavcodec/wmalosslessdec.c | 61 +
>> 1 file changed, 61 insertions(+)
>
> this causes a few new warnings

Yeah, I focused on the macro'ed version. If this one is favoured instead, then I'll come back to it. Otherwise, I'll squash the 3 patches, probably.

-- 
Christophe
Re: [FFmpeg-devel] [PATCH 6/6] lossless audio dsp: unroll
2016-04-18 19:15 GMT+02:00 James Almer <jamr...@gmail.com>:
> On 4/18/2016 10:07 AM, Christophe Gisquet wrote:
>> The loops are guaranteed to be at least multiples of 8, so this
>> unrolling is safe but allows exploiting execution ports.
>>
>> For int32 version: 72 -> 57c.
>
> What compiler are you using, and what cpu at configure time?

gcc 5.1, Win64, haswell. I don't use the mingw64 compiler.

> We're currently enabling tree vectorization for gcc 4.9 or newer on x86,
> and at least with gcc 5.3.0 on mingw-w64 the resulting code now seems worse.
> I didn't bench it, but after this patch it's not being vectorized anymore.

The code I benchmarked as being 72c is vectorized and keeps being vectorized here. It actually looks better than the previously vectorized one. The 16_c version is no longer vectorized, but is really a mess here when vectorized.

-- 
Christophe
Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test
2016-04-18 18:39 GMT+02:00 Paul B Mahol:
> Better to have real 24bit content.

Yeah, that's my point, but I'm not sure we'll get a redistributable one for fate, e.g. by pinging people from the various tickets. And when would we decide this is better than nothing?

-- 
Christophe
Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test
2016-04-18 15:07 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +fate-lossless-wma24: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial-24.wma -f s24le -frames 209

Btw, this is the regular luckynight whose samples have been shifted into 24 bits. Another type of bitdepth increase would be nice, but I haven't looked for it.

For some reason, the default GUI doesn't let me select the bitdepth and downshifts to 16, so I had to resort to some command-line encoder. The sample is 1MB, and hasn't been uploaded yet.

-- 
Christophe
[FFmpeg-devel] [PATCH 6/6] lossless audio dsp: unroll
The loop counts are guaranteed to be multiples of 8, so this unrolling
is safe, and it allows better use of execution ports.

For int32 version: 72 -> 57c.
---
 libavcodec/lossless_audiodsp.c | 12
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/lossless_audiodsp.c b/libavcodec/lossless_audiodsp.c
index 55495d0..17a61cd 100644
--- a/libavcodec/lossless_audiodsp.c
+++ b/libavcodec/lossless_audiodsp.c
@@ -29,10 +29,12 @@ static int32_t scalarproduct_and_madd_int16_c(int16_t *v1, const int16_t *v2,
 {
     int res = 0;
 
-    while (order--) {
+    do {
         res += *v1 * *v2++;
         *v1++ += mul * *v3++;
-    }
+        res += *v1 * *v2++;
+        *v1++ += mul * *v3++;
+    } while (order-=2);
     return res;
 }
 
@@ -42,10 +44,12 @@ static int32_t scalarproduct_and_madd_int32_c(int32_t *v1, const int32_t *v2,
 {
     int res = 0;
 
-    while (order--) {
+    do {
+        res += *v1 * *v2++;
+        *v1++ += mul * *v3++;
         res += *v1 * *v2++;
         *v1++ += mul * *v3++;
-    }
+    } while (order-=2);
     return res;
 }
 
-- 
2.8.1
[FFmpeg-devel] [PATCH 4/6] wmalossless: template code to remove inloop if
Code size increase is minimal. --- libavcodec/wmalosslessdec.c | 140 ++-- 1 file changed, 57 insertions(+), 83 deletions(-) diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c index 77017ff..27510d4 100644 --- a/libavcodec/wmalosslessdec.c +++ b/libavcodec/wmalosslessdec.c @@ -759,90 +759,61 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich) s->update_speed[ich] = 8; } -static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) -{ -int recent = s->cdlms[ich][ilms].recent; -int range = 1 << s->bits_per_sample - 1; -int order = s->cdlms[ich][ilms].order; - -if (s->bits_per_sample > 16) { -if (recent) -recent--; -else { -memcpy(s->cdlms[ich][ilms].lms_prevvalues + order, - s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order); -memcpy(s->cdlms[ich][ilms].lms_updates + order, - s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order); -recent = order - 1; -} - -s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1); -s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; - -s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; -s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; -s->cdlms[ich][ilms].recent = recent; -memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, - sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order)); -} else { -int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues; -int16_t *updates= s->cdlms[ich][ilms].lms_updates; -if (recent) -recent--; -else { -memcpy(prevvalues + order, prevvalues, 2 * order); -memcpy(updates + order, updates, 2 * order); -recent = order - 1; -} - -prevvalues[recent] = av_clip(input, -range, range - 1); -updates[recent] = WMASIGN(input) * s->update_speed[ich]; - -updates[recent + (order >> 4)] >>= 2; -updates[recent + (order >> 3)] >>= 1; -s->cdlms[ich][ilms].recent = recent; -memset(updates + recent + order, 0, - 
sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order)); -} +#define CD_LMS(bits, ROUND) \ +static void lms_update ## bits (WmallDecodeCtx *s, int ich, int ilms, int input) \ +{ \ +int recent = s->cdlms[ich][ilms].recent; \ +int range = 1 << s->bits_per_sample - 1; \ +int order = s->cdlms[ich][ilms].order; \ +int ##bits##_t *prev = (int##bits##_t *)s->cdlms[ich][ilms].lms_prevvalues; \ +int ##bits##_t *upd = (int##bits##_t *)s->cdlms[ich][ilms].lms_updates; \ + \ +if (recent) \ +recent--; \ +else { \ +memcpy(prev + order, prev, (bits/8) * order); \ +memcpy(upd + order, upd, (bits/8) * order); \ +recent = order - 1; \ +} \ + \ +prev[recent] = av_clip(input, -range, range - 1); \ +upd[recent] = WMASIGN(input) * s->update_speed[ich]; \ + \ +upd[recent + (order >> 4)] >>= 2; \ +upd[recent + (order >> 3)] >>= 1; \ +s->cdlms[ich][ilms].recent = recent; \ +memset(upd + recent + order, 0, \ + sizeof(s->cdlms[ich][ilms].lms_updates) - (bits/8)*(recent+order)); \ +} \ + \ +static void revert_cdlms ## bits (WmallDecodeCtx *s, int ch, \ + int coef_begin, int coef_end) \ +{ \ +int icoef, pred, ilms, num_lms, residue, input; \ + \ +num_lms = s->cdlms_ttl[ch]; \ +for (ilms = num_lms - 1; ilms >= 0; ilms--) { \ +for (icoef = coef_begin; icoef < coef_end; icoef++) { \ +int##bits##_t *coeffs = (int##bits##_t *)s->cdlms[ch][ilms].coefs; \ +int##bits##_t *prevvalues = (int##bits##_t *)s->cdlms[ch][ilms].lms_prevvalues; \ +int##bits##_t *updates = (int##bits##_t *)s->cdlms[ch][ilms].lms_updates; \ +pred = 1 << (s->cdlms[ch][ilms].scaling - 1); \ +residue = s->channel_residues[ch][icoef]; \ +pred += s->dsp.scalarproduct_and_madd_int## bits (coeffs, \ +prevvalues + s->cdlms[ch][ilms].recent, \ +updates + s->cdlms[ch][ilms].recent, \ + FFALIGN(s->cdlms[ch][ilms].order, ROUND), \ +WMASIGN(residue)); \ +input = residue + (pred >> s->cdlms[ch][ilms].scaling); \ +lms_update ## bits(s, ch, ilms, input); \ +s->channel_residues[ch][icoef] = input; \ +} \ +} \ +emms_c(); \ } -static void 
revert_cdlms(WmallDecodeCtx *s, int ch, - int coef_begin, int coef_end) -{ -int icoef, pred, ilms, num_lms, residue, input; - -
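The templating idea of this patch can be shown on a much smaller scale. The sketch below is a toy under stated assumptions: DEFINE_SUM, sum_int16, sum_int32 and select_sum are hypothetical names that do not exist in wmalosslessdec.c. It only demonstrates the mechanism, namely one macro body instantiated per sample width, so that the bit-depth test is done once when picking the function instead of inside the loop.

```c
#include <assert.h>
#include <stdint.h>

/* One function body, instantiated once per element width.
 * "int##bits##_t" pastes into int16_t or int32_t. */
#define DEFINE_SUM(bits)                                          \
static int64_t sum_int##bits(const void *buf, int n)              \
{                                                                 \
    const int##bits##_t *p = (const int##bits##_t *)buf;          \
    int64_t acc = 0;                                              \
    int i;                                                        \
                                                                  \
    assert(n >= 0);                                               \
    for (i = 0; i < n; i++)                                       \
        acc += p[i];                                              \
    return acc;                                                   \
}

DEFINE_SUM(16)
DEFINE_SUM(32)

typedef int64_t (*sum_fn)(const void *buf, int n);

/* The bit depth is tested once here, not once per sample. */
static sum_fn select_sum(int bits_per_sample)
{
    return bits_per_sample > 16 ? sum_int32 : sum_int16;
}
```

The cost is one extra copy of the function body per width, which matches the remark in the commit message that the code size increase is minimal.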
[FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test
---
 tests/fate/lossless-audio.mak | 4 +++-
 tests/ref/fate/lossless-wma24 | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/fate/lossless-wma24

diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak
index 58641ab..ccc4d00 100644
--- a/tests/fate/lossless-audio.mak
+++ b/tests/fate/lossless-audio.mak
@@ -25,8 +25,10 @@ fate-lossless-tta: CMD = crc -i $(TARGET_SAMPLES)/lossless-audio/inside.tta
 FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, TTA, TTA) += fate-lossless-tta-encrypted
 fate-lossless-tta-encrypted: CMD = crc -password ffmpeg -i $(TARGET_SAMPLES)/lossless-audio/encrypted.tta
 
-FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma
+FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma fate-lossless-wma24
 fate-lossless-wma: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial.wma -f s16le -frames 209
+fate-lossless-wma24: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial-24.wma -f s24le -frames 209
+fate-lossless-wmal: fate-lossless-wma fate-lossless-wma24
 
 FATE_SAMPLES_LOSSLESS_AUDIO += $(FATE_SAMPLES_LOSSLESS_AUDIO-yes)
 
diff --git a/tests/ref/fate/lossless-wma24 b/tests/ref/fate/lossless-wma24
new file mode 100644
index 000..43862af
--- /dev/null
+++ b/tests/ref/fate/lossless-wma24
@@ -0,0 +1 @@
+e5aea78d60c407a88c4ff25994052b83
--
2.8.1
[FFmpeg-devel] [PATCH 2/6] wmalossless: allow calling madd_int16
This is done by actually handling the cascaded LMS data as if it were int16_t, thus requiring switching at various locations the computations. --- libavcodec/wmalosslessdec.c | 61 + 1 file changed, 61 insertions(+) diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c index f7f249b..3885dc1 100644 --- a/libavcodec/wmalosslessdec.c +++ b/libavcodec/wmalosslessdec.c @@ -497,15 +497,29 @@ static int decode_cdlms(WmallDecodeCtx *s) s->cdlms[c][i].bitsend = get_bitsz(>gb, cbits) + 2; shift_l = 32 - s->cdlms[c][i].bitsend; shift_r = 32 - s->cdlms[c][i].scaling - 2; +if (s->bits_per_sample > 16) { for (j = 0; j < s->cdlms[c][i].coefsend; j++) s->cdlms[c][i].coefs[j] = (get_bits(>gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r; +} else { +for (j = 0; j < s->cdlms[c][i].coefsend; j++) { +int16_t *ptr = (int16_t*)s->cdlms[c][i].coefs; +ptr[j] = (get_bits(>gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r; +} +} } } +if (s->bits_per_sample > 16) { for (i = 0; i < s->cdlms_ttl[c]; i++) memset(s->cdlms[c][i].coefs + s->cdlms[c][i].order, 0, WMALL_COEFF_PAD_SIZE); +} else { +for (i = 0; i < s->cdlms_ttl[c]; i++) { +int16_t *ptr = (int16_t*)s->cdlms[c][i].coefs; +memset(ptr + s->cdlms[c][i].order, 0, 2*WMALL_COEFF_PAD_SIZE); +} +} } return 0; @@ -702,6 +716,7 @@ static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) int range = 1 << s->bits_per_sample - 1; int order = s->cdlms[ich][ilms].order; +if (s->bits_per_sample > 16) { if (recent) recent--; else { @@ -720,6 +735,26 @@ static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) s->cdlms[ich][ilms].recent = recent; memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order)); +} else { +int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues; +int16_t *updates= s->cdlms[ich][ilms].lms_updates; +if (recent) +recent--; +else { +memcpy(prevvalues + order, prevvalues, 2 * order); +memcpy(updates + order, updates, 2 * 
order); +recent = order - 1; +} + +prevvalues[recent] = av_clip(input, -range, range - 1); +updates[recent] = WMASIGN(input) * s->update_speed[ich]; + +updates[recent + (order >> 4)] >>= 2; +updates[recent + (order >> 3)] >>= 1; +s->cdlms[ich][ilms].recent = recent; +memset(updates + recent + order, 0, + sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order)); +} } static void use_high_update_speed(WmallDecodeCtx *s, int ich) @@ -729,6 +764,7 @@ static void use_high_update_speed(WmallDecodeCtx *s, int ich) recent = s->cdlms[ich][ilms].recent; if (s->update_speed[ich] == 16) continue; +if (s->bits_per_sample > 16) { if (s->bV3RTM) { for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) s->cdlms[ich][ilms].lms_updates[icoef + recent] *= 2; @@ -736,6 +772,12 @@ static void use_high_update_speed(WmallDecodeCtx *s, int ich) for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) s->cdlms[ich][ilms].lms_updates[icoef] *= 2; } +} else { +int16_t *updates = (int16_t *)s->cdlms[ich][ilms].lms_updates; +if (s->bV3RTM) updates += recent; +for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) +updates[icoef] *= 2; +} } s->update_speed[ich] = 16; } @@ -747,12 +789,19 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich) recent = s->cdlms[ich][ilms].recent; if (s->update_speed[ich] == 8) continue; +if (s->bits_per_sample > 16) { if (s->bV3RTM) for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) s->cdlms[ich][ilms].lms_updates[icoef + recent] /= 2; else for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) s->cdlms[ich][ilms].lms_updates[icoef] /= 2; +} else { +int16_t *updates = (int16_t *)s->cdlms[ich][ilms].lms_updates; +if (s->bV3RTM) updates += recent; +for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++) +updates[icoef] /= 2; +} } s->update_speed[ich] = 8; } @@ -767,6 +816,7 @@ static void revert_cdlms(WmallDecodeCtx *s, int ch,
[FFmpeg-devel] [PATCH 5/6] x86: lossless audio: SSE4 madd 32bits
The unique user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version do not make sense. Timings: 72 -> 49 cycles --- libavcodec/x86/lossless_audiodsp.asm| 38 + libavcodec/x86/lossless_audiodsp_init.c | 7 ++ 2 files changed, 45 insertions(+) diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm index 5597dad..1e295de 100644 --- a/libavcodec/x86/lossless_audiodsp.asm +++ b/libavcodec/x86/lossless_audiodsp.asm @@ -155,3 +155,41 @@ SCALARPRODUCT_LOOP 0 HADDD m6, m0 movd eax, m6 RET + +%macro SCALARPRODUCT32 0 +; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3, +; int order, int mul) +cglobal scalarproduct_and_madd_int32, 4,4,8, v1, v2, v3, order, mul +movdm7, mulm +SPLATD m7 +pxorm6, m6 +add v1q, orderq +add v2q, orderq +add v3q, orderq +neg orderq +.loop: +movum0, [v2q + orderq] +movum1, [v2q + orderq + mmsize] +movam4, [v1q + orderq] +movam5, [v1q + orderq + mmsize] +movum2, [v3q + orderq] +movum3, [v3q + orderq + mmsize] +pmulld m0, m4 +pmulld m1, m5 +pmulld m2, m7 +pmulld m3, m7 +paddd m6, m0 +paddd m6, m1 +paddd m2, m4 +paddd m3, m5 +mova[v1q + orderq], m2 +mova[v1q + orderq + mmsize], m3 +add orderq, mmsize*2 +jl .loop +HADDD m6, m0 +movd eax, m6 +RET +%endmacro + +INIT_XMM sse4 +SCALARPRODUCT32 diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c index 197173c..85306cb 100644 --- a/libavcodec/x86/lossless_audiodsp_init.c +++ b/libavcodec/x86/lossless_audiodsp_init.c @@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, const int16_t *v2, const int16_t *v3, int order, int mul); +int32_t ff_scalarproduct_and_madd_int32_sse4(int32_t *v1, const int32_t *v2, + const int32_t *v3, + int order, int mul); + av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c) { #if HAVE_YASM @@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c) if (EXTERNAL_SSSE3(cpu_flags) && !(cpu_flags 
& (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit c->scalarproduct_and_madd_int16 = ff_scalarproduct_and_madd_int16_ssse3; + +if (EXTERNAL_SSE4(cpu_flags)) +c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4; #endif } -- 2.8.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
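For readers who do not read x86 assembly, the loop structure of the macro above can be approximated in plain C. This is only a sketch: madd_int32_blocks is a made-up name, the code uses element indices where the assembly uses offsets into the arrays, and it makes no claim about matching the assembly's behaviour on overflow. What it does show is the addressing trick: the pointers are moved to the end of the arrays and a negative index counts up towards zero, so one add and one conditional jump both advance and terminate the loop. Each iteration handles eight 32-bit values, which corresponds to two XMM registers.

```c
#include <assert.h>
#include <stdint.h>

static int32_t madd_int32_blocks(int32_t *v1, const int32_t *v2,
                                 const int32_t *v3, int order, int mul)
{
    int32_t res = 0;
    int i, k;

    /* Assumption: order is a non-zero multiple of the block size (8). */
    assert(order > 0 && order % 8 == 0);

    /* Point one past the end, then index backwards from there. */
    v1 += order;
    v2 += order;
    v3 += order;

    for (i = -order; i < 0; i += 8) {       /* "add orderq, mmsize*2; jl .loop" */
        for (k = 0; k < 8; k++) {           /* one block of eight elements */
            res       += v1[i + k] * v2[i + k];
            v1[i + k] += mul * v3[i + k];
        }
    }
    return res;
}
```

With an order of 8, as reported for the tested samples, the outer loop runs exactly once, which is consistent with the commit message's point that further unrolling would not pay off.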
[FFmpeg-devel] [PATCH 3/6] wmalossless pro: move lms_update
Cosmetics before macroing it and another function. --- libavcodec/wmalosslessdec.c | 94 ++--- 1 file changed, 47 insertions(+), 47 deletions(-) diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c index 3885dc1..77017ff 100644 --- a/libavcodec/wmalosslessdec.c +++ b/libavcodec/wmalosslessdec.c @@ -710,53 +710,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size) } } -static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) -{ -int recent = s->cdlms[ich][ilms].recent; -int range = 1 << s->bits_per_sample - 1; -int order = s->cdlms[ich][ilms].order; - -if (s->bits_per_sample > 16) { -if (recent) -recent--; -else { -memcpy(s->cdlms[ich][ilms].lms_prevvalues + order, - s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order); -memcpy(s->cdlms[ich][ilms].lms_updates + order, - s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order); -recent = order - 1; -} - -s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1); -s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; - -s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; -s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; -s->cdlms[ich][ilms].recent = recent; -memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, - sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order)); -} else { -int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues; -int16_t *updates= s->cdlms[ich][ilms].lms_updates; -if (recent) -recent--; -else { -memcpy(prevvalues + order, prevvalues, 2 * order); -memcpy(updates + order, updates, 2 * order); -recent = order - 1; -} - -prevvalues[recent] = av_clip(input, -range, range - 1); -updates[recent] = WMASIGN(input) * s->update_speed[ich]; - -updates[recent + (order >> 4)] >>= 2; -updates[recent + (order >> 3)] >>= 1; -s->cdlms[ich][ilms].recent = recent; -memset(updates + recent + order, 0, - 
sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order)); -} -} - static void use_high_update_speed(WmallDecodeCtx *s, int ich) { int ilms, recent, icoef; @@ -806,6 +759,53 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich) s->update_speed[ich] = 8; } +static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input) +{ +int recent = s->cdlms[ich][ilms].recent; +int range = 1 << s->bits_per_sample - 1; +int order = s->cdlms[ich][ilms].order; + +if (s->bits_per_sample > 16) { +if (recent) +recent--; +else { +memcpy(s->cdlms[ich][ilms].lms_prevvalues + order, + s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order); +memcpy(s->cdlms[ich][ilms].lms_updates + order, + s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order); +recent = order - 1; +} + +s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1); +s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; + +s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; +s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; +s->cdlms[ich][ilms].recent = recent; +memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, + sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order)); +} else { +int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues; +int16_t *updates= s->cdlms[ich][ilms].lms_updates; +if (recent) +recent--; +else { +memcpy(prevvalues + order, prevvalues, 2 * order); +memcpy(updates + order, updates, 2 * order); +recent = order - 1; +} + +prevvalues[recent] = av_clip(input, -range, range - 1); +updates[recent] = WMASIGN(input) * s->update_speed[ich]; + +updates[recent + (order >> 4)] >>= 2; +updates[recent + (order >> 3)] >>= 1; +s->cdlms[ich][ilms].recent = recent; +memset(updates + recent + order, 0, + sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order)); +} +} + static void revert_cdlms(WmallDecodeCtx *s, int ch, int coef_begin, int coef_end) { -- 
2.8.1
[FFmpeg-devel] [PATCH 0/6] wmalossless: fix 16bits speed regression
I think only the first two patches are needed, but I prefer the code resulting from the 3rd and 4th patches. Overall, it's still not the nicest code, and the patchset still needs to be checked under valgrind (which I can't do at the moment). The SSE4 implementation is not worthwhile in my opinion.

Christophe Gisquet (6):
  fate: wma: add lossless 24bits test
  wmalossless: allow calling madd_int16
  wmalossless pro: move lms_update
  wmalossless: template code to remove inloop if
  x86: lossless audio: SSE4 madd 32bits
  lossless audio dsp: unroll

 libavcodec/lossless_audiodsp.c          |  12 ++-
 libavcodec/wmalosslessdec.c             | 137
 libavcodec/x86/lossless_audiodsp.asm    |  38 +
 libavcodec/x86/lossless_audiodsp_init.c |   7 ++
 tests/fate/lossless-audio.mak           |   4 +-
 tests/ref/fate/lossless-wma24           |   1 +
 6 files changed, 143 insertions(+), 56 deletions(-)
 create mode 100644 tests/ref/fate/lossless-wma24

--
2.8.1
Re: [FFmpeg-devel] [PATCH] avcodec/wmalosslessdec: real 24bit support
Hi,

2016-04-12 22:53 GMT+02:00 Paul B Mahol:
> -    LLAudDSPContext dsp; ///< accelerated

And later:
> +static int scalarproduct_and_madd_int(int *v1, const int *v2,
> +                                      const int *v3,
> +                                      int order, int mul)
> +{
> +    int res = 0;
> +
> +    av_assert0(order >= 0);
> +    while (order--) {
> +        res += *v1 * *v2++;
> +        *v1++ += mul * *v3++;
> +    }
> +    return res;
> +}

As Hendrik said, please move it to LLAudDSPContext.

On a side note, does this come from reverse engineering or is it a guess (I'm asking because it was not a difficult guess)? A 32-bit-only accumulator may overflow.

> +    int mclms_prevvalues[WMALL_MAX_CHANNELS * 2 * 32];
> +    int mclms_updates[WMALL_MAX_CHANNELS * 2 * 32];
>      int mclms_recent;
>
>      int movave_scaling;
> @@ -146,9 +144,9 @@ typedef struct WmallDecodeCtx {
>          int scaling;
>          int coefsend;
>          int bitsend;
> -        DECLARE_ALIGNED(16, int16_t, coefs)[MAX_ORDER + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
> -        DECLARE_ALIGNED(16, int16_t, lms_prevvalues)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
> -        DECLARE_ALIGNED(16, int16_t, lms_updates)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
> +        int coefs[MAX_ORDER + WMALL_COEFF_PAD_SIZE];
> +        int lms_prevvalues[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE];
> +        int lms_updates[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE];

I prefer int32_t, simply because this is something that will get DSP-ized. Plus, at some point someone would have to redo the alignment.

> -           sizeof(int16_t) * order * num_channels);
> +           sizeof(int) * order * num_channels);

The format stores the bit depth, which ends up in s->bits_per_sample. The decoder already uses it later on to select how to store the samples. In any such case, the element size should be dynamic. Either use int size = s->bits_per_sample > 16 ? 4 : 2 (because sizeof(int16_t) isn't going to change much...), or FFALIGN(s->bits_per_sample >> 3, 2). Whatever floats your boat.

> -            pred += s->dsp.scalarproduct_and_madd_int16(s->cdlms[ch][ilms].coefs,
> -                        s->cdlms[ch][ilms].lms_prevvalues
> -                        + s->cdlms[ch][ilms].recent,
> -                        s->cdlms[ch][ilms].lms_updates
> -                        + s->cdlms[ch][ilms].recent,
> -                        FFALIGN(s->cdlms[ch][ilms].order,
> -                                WMALL_COEFF_PAD_SIZE),
> -                        WMASIGN(residue));
> +            pred += scalarproduct_and_madd_int(s->cdlms[ch][ilms].coefs,
> +                        s->cdlms[ch][ilms].lms_prevvalues
> +                        + s->cdlms[ch][ilms].recent,
> +                        s->cdlms[ch][ilms].lms_updates
> +                        + s->cdlms[ch][ilms].recent,
> +                        s->cdlms[ch][ilms].order,
> +                        WMASIGN(residue));

And then here, either:
- switch based on the bit depth (the 'if' this requires wouldn't be the end of the world, but it can be avoided);
- or use a function pointer in the context.

For the latter, unless going through a proxy, it could obviously look like:
int (*scalarproduct_and_madd_int)(void *v1, const void *v2, const void *v3, int order, int mul)
but there might be compilation warnings when calling it or when setting the pointer.

--
Christophe
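The two suggestions in this review, a dynamic element size and a function pointer selected by bit depth, can be sketched as follows. Everything here is illustrative: ALIGN_UP reproduces the usual round-up-to-power-of-two arithmetic of FFALIGN, while size_by_threshold, size_by_align, madd16, madd32 and select_madd are names invented for the example and are not decoder or LLAudDSPContext API. The accumulator is kept as a plain int, as in the quoted code, so the overflow concern raised above applies to this sketch too.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* First suggestion: a plain threshold on the bit depth. */
static int size_by_threshold(int bits_per_sample)
{
    return bits_per_sample > 16 ? 4 : 2;
}

/* Second suggestion: bytes per sample, rounded up to an even count. */
static int size_by_align(int bits_per_sample)
{
    return ALIGN_UP(bits_per_sample >> 3, 2);
}

/* Width-agnostic signature, as proposed with void pointers. */
typedef int (*madd_fn)(void *v1, const void *v2, const void *v3,
                       int order, int mul);

static int madd16(void *v1, const void *v2, const void *v3, int order, int mul)
{
    int16_t *a = v1;
    const int16_t *b = v2, *c = v3;
    int res = 0;

    assert(order >= 0);
    while (order--) {
        res  += *a * *b++;
        *a++ += mul * *c++;
    }
    return res;
}

static int madd32(void *v1, const void *v2, const void *v3, int order, int mul)
{
    int32_t *a = v1;
    const int32_t *b = v2, *c = v3;
    int res = 0;

    assert(order >= 0);
    while (order--) {
        res  += *a * *b++;
        *a++ += mul * *c++;
    }
    return res;
}

/* Chosen once per stream, e.g. at init time. */
static madd_fn select_madd(int bits_per_sample)
{
    return bits_per_sample > 16 ? madd32 : madd16;
}
```

Both size helpers agree for 16-bit and 24-bit input (2 and 4 bytes). They differ for depths that are not a multiple of 8: for 20 bits the threshold gives 4 while the aligned shift gives 2, so the threshold form is the safer reading if such depths can occur.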
Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section
Hi,

2016-03-19 19:08 GMT+01:00 Ismail Donmez <ism...@i10z.com>:
>> 2016-03-11 8:57 GMT+01:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
>>>> It should either be reverted or made dependent on
>>>> --enable/disable-debug (I would favor the first, honestly, since its a
>>>> rather ugly hack in itself).
[...]
>> I don't have a strong opinion on actually reverting it, but would lean for
>> it.
>
> Please don't disable it for release builds, it improves the security
> of the resulting executable.

I understand the sentiment, and there's probably little lost in keeping it, but... is it not a hack? That is:
- How would you notice that the added security is no longer there, or that it breaks in even worse ways?
- Who is, and would remain, available and able to prevent it from breaking?

Because it has already broken, and almost nobody dealt with it. The original author did well to report the issue to binutils, so I'm certainly not complaining about his efforts.

--
Christophe
Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section
2016-03-19 18:15 GMT+01:00 Hendrik Leppkes <h.lepp...@gmail.com>:
> The same would need to be applied for the 32-bit case as well, fwiw.

I didn't have a build environment to verify that, but now I do, and I confirm it is needed. Patch updated.

--
Christophe

From 87e4f2a42bdb5f733d104ffba7cf70f786b72a03 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Sat, 19 Mar 2016 14:45:23 +0100
Subject: [PATCH] mingw64: configure: disable pie with debug enabled

This breaks use of gdb.
---
 configure | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index e5de306..9ccd727 100755
--- a/configure
+++ b/configure
@@ -4604,10 +4604,14 @@ case $target_os in
         # for dynamicbase (ASLR). Using -pie does retain the reloc section
         # however ld then forgets what the entry point should be (oops) so we
         # have to manually (re)set it.
-        if enabled x86_32; then
+        # However, adding -pie breaks debugging through gdb at least under
+        # mingw, so let's not do this when debugging has been enabled.
+        if enabled x86_32 && disabled debug; then
             add_ldexeflags -Wl,--pic-executable,-e,_mainCRTStartup
         elif enabled x86_64; then
-            add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+            if disabled debug; then
+                add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+            fi
             check_ldflags -Wl,--high-entropy-va # binutils 2.25
             # Set image base >4GB for extra entropy with HEASLR
             add_ldexeflags -Wl,--image-base,0x14000
--
2.7.2
Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section
Hi,

2016-03-11 8:57 GMT+01:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
>> It should either be reverted or made dependent on
>> --enable/disable-debug (I would favor the first, honestly, since its a
>> rather ugly hack in itself).
>
> At the very least, that dependence is needed, yes.

And here is the smallest reasonable change. I don't have a strong opinion on actually reverting the original patch, but I would lean towards it.

--
Christophe

From 429a47f83d2d262a3392af34b13dbf14c735c8b9 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Sat, 19 Mar 2016 14:45:23 +0100
Subject: [PATCH] mingw64: configure: disable pie with debug enabled

This breaks use of gdb.
---
 configure | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index e5de306..e5befa0 100755
--- a/configure
+++ b/configure
@@ -4604,10 +4604,14 @@ case $target_os in
         # for dynamicbase (ASLR). Using -pie does retain the reloc section
         # however ld then forgets what the entry point should be (oops) so we
         # have to manually (re)set it.
+        # However, adding -pie breaks debugging through gdb at least under
+        # mingw, so let's not do this when debugging has been enabled.
         if enabled x86_32; then
             add_ldexeflags -Wl,--pic-executable,-e,_mainCRTStartup
         elif enabled x86_64; then
-            add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+            if disabled debug; then
+                add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+            fi
             check_ldflags -Wl,--high-entropy-va # binutils 2.25
             # Set image base >4GB for extra entropy with HEASLR
             add_ldexeflags -Wl,--image-base,0x14000
--
2.7.2
Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section
Hi,

2016-03-10 19:57 GMT+01:00 Hendrik Leppkes:
> This patch (the relocations part) broke debugging mingw-w64 ffmpeg
> builds with gdb, you can't set breakpoints anymore when its applied.

That issue has prevented me from doing anything interesting for ffmpeg since then, as I thought it was a toolchain issue. I've lost considerable ffmpeg time (and actual code) over it.

> It should either be reverted or made dependent on
> --enable/disable-debug (I would favor the first, honestly, since its a
> rather ugly hack in itself).

At the very least, that dependence is needed, yes.

> Did the binutils/mingw guys ever comment anything useful on this issue?

And does it still exist? And for which toolchain/binutils/mingw runtime/gdb versions, actually? There are at least three toolchains one can use (yours, msys2, tdm, msys1, ...).

Thanks for looking into it,

--
Christophe