Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-11 Thread Christophe Gisquet
On Sun, 11 Feb 2024 at 12:37, Nuo Mi wrote:
> > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = {
> > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = {
> >
> Do you know why this is [4][16]? [4][8] should suffice.

Probably so that every coefficient bank is 16-byte aligned. It also lets
you use the address directly as a memory operand in some instructions,
instead of spending a register to hold the data.
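
To make the alignment point concrete, here is a minimal C11 sketch. The
tables demo_filters16 and demo_filters8 and their zero contents are
placeholders of mine, not the real ff_hevc_qpel_filters data; only the
shapes [4][16] and [4][8] come from the discussion. With a 16-byte-aligned
base, 16-byte rows all start on a 16-byte boundary, while 8-byte rows leave
every other row misaligned for a 16-byte load.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical tables: same shapes as discussed, contents are dummies. */
static alignas(16) const int8_t demo_filters16[4][16] = {{0}};
static alignas(16) const int8_t demo_filters8[4][8]   = {{0}};

/* Each bank of the wide table fills exactly one 16-byte line. */
static_assert(sizeof(demo_filters16[0]) == 16, "one bank per 16 bytes");

/* Returns 1 if the row could be used with an aligned 16-byte load. */
static int row_is_aligned16(const void *row)
{
    return ((uintptr_t)row & 15) == 0;
}
```

Every row of demo_filters16 passes row_is_aligned16(), whereas
demo_filters8[1] sits at base + 8 and does not.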

-- 
Christophe Gisquet
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 1/7] proresdec2: port and fix for cached reader

2023-09-11 Thread Christophe Gisquet
On Fri, 8 Sep 2023 at 10:20, Christophe Gisquet wrote:
> This patchset requires my previous one improving the cached bitstream
> reader, and serves as its justification. It, basically, moves to using
> VLC wherever possible, and in particular when codewords are
> sufficiently short/there's some kind of well-behaved laplacian
> distribution for codewords that make VLCs efficient.
>
> Total speedup is around 40% here.

Unfortunately, I cannot devote the time and effort needed to fix some
of the fundamental problems. But since I don't want Andreas's review
time, or my best-effort answers, to go to waste, the last state
(addressing maybe 90% of the review?) can be obtained from the repo at
https://github.com/cgisquet/ffmpeg.git, branch prores.

Best regards,
-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/2] Expose and start using skip_remaining

2023-09-11 Thread Christophe Gisquet
Hello,

On Fri, 8 Sep 2023 at 00:39, Andreas Rheinhardt wrote:
> This is problematic, because you seem to think that bits_peek(bc, bits)
> ensures that there are at least `bits` available in the cache;

read_vlc* also makes that assumption, doesn't it? Anyway, I'd put that
checking behaviour under !UNCHECKED_BITSTREAM_READER; effectively, this
is about corrupt/unsupported bitstreams. Maybe some parts of ffmpeg have
been wrong for 15 years, and the check should be done there, instead of
relying on the reader desyncing and/or on the upper level of the loop
checking for exhaustion of the bitstream.
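
To illustrate what is at stake, here is a toy big-endian cached reader. It
is a sketch of mine, not the code in bitstream_template.h; all demo_* names
are hypothetical. The peek refills the cache but cannot conjure bits past
the end of the buffer, so near the end it may return fewer real bits than
requested, and an unchecked skip after such a peek would break the
bits_valid accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoReader {
    uint64_t bits;            /* cache; the next bit to read is the MSB */
    unsigned bits_valid;      /* number of valid bits in the cache */
    const uint8_t *ptr, *end; /* unread input */
} DemoReader;

static void demo_init(DemoReader *r, const uint8_t *buf, size_t size)
{
    r->bits = 0;
    r->bits_valid = 0;
    r->ptr = buf;
    r->end = buf + size;
}

/* Append whole bytes below the valid bits while room and input remain. */
static void demo_refill(DemoReader *r)
{
    while (r->bits_valid <= 56 && r->ptr < r->end) {
        r->bits |= (uint64_t)*r->ptr++ << (56 - r->bits_valid);
        r->bits_valid += 8;
    }
}

/* Peek n bits; *avail tells how many of them actually exist. */
static uint32_t demo_peek(DemoReader *r, unsigned n, unsigned *avail)
{
    assert(n > 0 && n <= 32);
    if (n > r->bits_valid)
        demo_refill(r);
    *avail = r->bits_valid < n ? r->bits_valid : n;
    return (uint32_t)(r->bits >> (64 - n));
}

/* Unchecked skip: only valid if the caller knows n <= bits_valid. */
static void demo_skip_remaining(DemoReader *r, unsigned n)
{
    assert(n < 64 && n <= r->bits_valid); /* the assumption in question */
    r->bits <<= n;
    r->bits_valid -= n;
}
```

In this sketch, the caller has to look at avail (or an equivalent check)
before calling demo_skip_remaining(); whether that check belongs in the
reader or in the decoder loop is exactly the open question.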

> https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa57b10d54729fd92a
> for a way that fixes this.

I only realize now that I have neither the time nor enough interest
to embark on that kind of scrutiny of the current code. I'm OK with
waiting until the ffmpeg project has decided on a solution for the
specifics you are discussing.

> the assumption that the
> combined amount of bits consumed in any get_vlc2/GET_RL_VLC/BITS_RL_VLC
> call can't exceed 32. Is this assumption actually still true now that we
> have multi-vlc stuff?

It doesn't change anything there: it operates as the first stage/level
of any get_vlc, only it can output more than 1 symbol.

> https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc
> and
> https://github.com/mkver/FFmpeg/commits/aligned32_le_bitstream_reader
> are probably also of interest to you.

They probably would be, had I the time. My goal was really to keep the
prores and multi-symbol work from bitrotting too much, but I wasn't
expecting these roadblocks.

I'm sorry to say I'm dropping the patch series.
-- 
Christophe


Re: [FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch

2023-09-10 Thread Christophe Gisquet
Hello

On Sun, 10 Sep 2023 at 17:40, Andreas Rheinhardt wrote:
> Another solution would be to use void* instead of GetBitContext* in the
> header and in the implementation and then convert this void* to
> GetBitContext* in the function.

The forward declaration will be enough.
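
For the record, a minimal sketch of what I mean, with hypothetical names
(DemoBitContext, DemoDecoder) standing in for the real types: the header
only declares the struct tag, so it never has to include the reader header
or commit to one reader flavour, and only the file that dereferences the
pointer needs the full definition.

```c
#include <assert.h>
#include <stddef.h>

/* --- what the header would contain --- */
struct DemoBitContext;                 /* incomplete type, no include needed */

typedef struct DemoDecoder {
    /* a pointer to an incomplete type is fine in a declaration */
    int (*unpack)(struct DemoBitContext *bc, int n);
} DemoDecoder;

/* --- what the one .c file owning the reader would contain --- */
struct DemoBitContext {
    unsigned pos;                      /* stand-in for the reader state */
};

static int demo_unpack(struct DemoBitContext *bc, int n)
{
    assert(bc != NULL && n >= 0);
    bc->pos += (unsigned)n;            /* pretend to consume n bits */
    return (int)bc->pos;
}

static void demo_decoder_init(DemoDecoder *d)
{
    d->unpack = demo_unpack;
}
```

Other files including the header can store and pass the pointer around,
but cannot depend on the struct layout.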

> I do not know what you mean by "the encoder instead". What problem
> happens with the encoder? Why would the encoder include proresdec.h at
> all and why would it be affected by changes to the decoder?

The keyword was "faint", and it's a non-issue now. I was just explaining
why I had made that code change, which now looks like an attempted fix
for videotoolbox but was never meant that way: that is pure luck, the
actual reason being lost to time.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch

2023-09-10 Thread Christophe Gisquet
Hello,

On Fri, 8 Sep 2023 at 11:57, Andreas Rheinhardt wrote:
> >> +#define CACHED_BITSTREAM_READER 1
> >
> > This should be in the commit switching to the cached bitstream reader.
>
> Correction: This header is included in videotoolbox.c and there is other
> stuff that also includes get_bits.h included in said file (and currently
> gets included before proresdec.h). This means that proresdec2.c and
> videotoolbox.c will have different opinions on what a GetBitContext is:
> It will be the non-cached one in videotoolbox.c and the cached one in
> proresdec2.c. This will work in practice, because ProresContext does not
> need the complete GetBitContext type at all (it does not contain a
> GetBitContext at all), so offsets are not affected. But it is
> nevertheless undefined behaviour and could become dangerous when using LTO.
>
> So you should switch the type of the pointer to BitstreamContextBE* in
> proresdec2.h. Furthermore, you can either include bitstream.h in
> proresdec.h or (IMO better) use a forward declaration and struct
> BitstreamContextBE* in the function pointer without including get_bits.h
> in the header at all.

On that point only: I don't recall (yes, this is 3+ years old) the issue
being videotoolbox; it didn't have that include back when I wrote this
code.

It's a very faint recollection, and I can't find proof in the ffmpeg
code of today or of 3+ years ago, but the problem you mention was
happening with the encoder instead. So maybe the fix is now needed
only by videotoolbox.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 7/7] prores: use VLC LUTs

2023-09-08 Thread Christophe Gisquet
On Fri, 8 Sep 2023 at 11:19, Andreas Rheinhardt wrote:
> > -return 0;
> > +return 0;
>
> You are adding trailing whitespace.

Sorry, will fix. I had to do some of this work on a misconfigured machine.

> > +#include "libavutil/timer.h"
>
> You really need to look over your patches once more before you send
> them. Both of these changes are obviously not ok to commit.

I know the drill. Again, I'm trying my best to help move along a
situation that had been rotting for 6 years.

> This still incurs an unnecessary indirection. The LUT should not point
> to the VLCs, but rather to the VLC tables (as this is the only thing
> needed from them later on, given that the number of bits is a
> compile-time constant). The LUT should be initialized when the VLCs are
> initialized.

You're right, and by the same logic as in my comment, that should save
a bit more.
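
A rough sketch of the two lookup shapes, to make the suggestion concrete.
DemoVLC and DemoElem are simplified stand-ins of mine, not the real VLC and
VLCElem types, and the table contents are dummies. The first lookup goes
context -> VLC struct -> table; the second uses a LUT of table pointers
filled once at init time, which removes one dependent load.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoElem { short sym; short len; } DemoElem;
typedef struct DemoVLC  { int bits; DemoElem *table; } DemoVLC;

static DemoElem demo_storage[2][4];
static DemoVLC  demo_vlc[2];

/* context -> VLC index, playing the role of the ctx_to_tbl arrays */
static const unsigned char demo_ctx_to_vlc[4] = { 0, 1, 1, 0 };

/* LUT of table pointers, filled when the VLCs are initialized */
static const DemoElem *demo_ctx_table[4];

static void demo_lut_init(void)
{
    for (int i = 0; i < 2; i++) {
        demo_vlc[i].bits  = 2;
        demo_vlc[i].table = demo_storage[i];
        for (int j = 0; j < 4; j++) {
            demo_storage[i][j].sym = (short)(10 * i + j); /* dummy symbols */
            demo_storage[i][j].len = 2;
        }
    }
    for (int c = 0; c < 4; c++)
        demo_ctx_table[c] = demo_vlc[demo_ctx_to_vlc[c]].table;
}

/* Index LUT, then VLC struct, then table. */
static int lookup_via_vlc(unsigned ctx, unsigned idx)
{
    assert(ctx < 4 && idx < 4);
    return demo_vlc[demo_ctx_to_vlc[ctx]].table[idx].sym;
}

/* Pointer LUT, then table: one dependent load less. */
static int lookup_via_table(unsigned ctx, unsigned idx)
{
    assert(ctx < 4 && idx < 4);
    return demo_ctx_table[ctx][idx].sym;
}
```

Both functions return the same symbols; only the number of loads on the
hot path differs.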

> Seems like these VLCs should be offset by 1 to avoid the "1+".

That's what I did in a previous commit, but that was before I could
share the tables. I didn't consider creating 5 more tables for this
beneficial.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/7] proresdec2: port and fix for cached reader

2023-09-08 Thread Christophe Gisquet
On Fri, 8 Sep 2023 at 10:15, Christophe Gisquet wrote:
>
> Summary of changes

git send-email --cover-letter apparently didn't let me edit one, so here goes.

This patchset requires my previous one improving the cached bitstream
reader, and serves as its justification. It, basically, moves to using
VLC wherever possible, and in particular when codewords are
sufficiently short/there's some kind of well-behaved laplacian
distribution for codewords that make VLCs efficient.

Total speedup is around 40% here.

-- 
Christophe


[FFmpeg-devel] [PATCH 7/7] prores: use VLC LUTs

2023-09-08 Thread Christophe Gisquet
One indirection less, around 1% speedup.
---
 libavcodec/proresdec2.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index b20021c622..85f81d92d3 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -561,12 +561,18 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
 prev_dc += (((code + 1) >> 1) ^ sign) - sign;
 out[0] = prev_dc;
 }
-return 0;
+return 0;  
 }
 
+#include "libavutil/timer.h"
+
+
 static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb,
  int16_t *out, int blocks_per_slice)
 {
+   static VLC* lvl_vlc[9] = { &ac_vlc[0], &ac_vlc[1], &ac_vlc[2], &ac_vlc[3], &ac_vlc[0], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], };
+   static VLC* run_vlc[15] = { &ac_vlc[3], &ac_vlc[3], &ac_vlc[2], &ac_vlc[2], &ac_vlc[0], &ac_vlc[5], &ac_vlc[5], &ac_vlc[5], &ac_vlc[5],
+   &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], &ac_vlc[4], };
 const ProresContext *ctx = avctx->priv_data;
 int block_mask, sign;
 unsigned pos, run, level;
@@ -585,9 +591,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
 break;
 
 if (run < 15) {
-static const uint8_t ctx_to_tbl[] = { 3, 3, 2, 2, 0, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4 };
-const VLC* tbl = ac_vlc + ctx_to_tbl[run];
-run = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
+run = get_vlc2(gb, run_vlc[run]->table, PRORES_LEV_BITS, 3);
 } else {
 unsigned int bits = 21 - 2*av_log2(show_bits(gb, 10));
 run = READ_BITS(gb, bits) - 4; // up to 17 bits
@@ -599,9 +603,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
 }
 
 if (level < 9) {
-static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4 };
-const VLC* tbl = ac_vlc + ctx_to_tbl[level];
-level = 1+get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
+level = 1+get_vlc2(gb, lvl_vlc[level]->table, PRORES_LEV_BITS, 3);
 } else {
 unsigned int bits = 25 - 2*av_log2(show_bits(gb, 12));
 level = READ_BITS(gb, bits) - 4 + 1; // up to 21 bits
-- 
2.42.0



[FFmpeg-devel] [PATCH 6/7] proresdec2: remove a useless DC codebook entry

2023-09-08 Thread Christophe Gisquet
---
 libavcodec/proresdec2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 02e1d82d00..b20021c622 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -534,9 +534,9 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons
 
 #define FIRST_DC_CB 0xB8
 
-static const char dc_codebook[7][4] = {
+static const char dc_codebook[6][4] = {
 { 0, 0, 1, -1 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
-{ 1, 2, 2,  0 }, { 1, 2, 2,  0 }, { 0, 3, 4, -8 }, { 0, 3, 4, -8 }
+{ 1, 2, 2,  0 }, { 1, 2, 2,  0 }, { 0, 3, 4, -8 }
 };
 
 static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
@@ -553,7 +553,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
 code = 5;
 sign = 0;
 for (i = 1; i < blocks_per_slice; i++, out += 64) {
-unsigned int dccb = FFMIN(code, 6U);
+unsigned int dccb = FFMIN(code, 5U);
 DECODE_CODEWORD2(code, dc_codebook[dccb][0], dc_codebook[dccb][1],
dc_codebook[dccb][2], dc_codebook[dccb][3]);
 if(code) sign ^= -(code & 1);
-- 
2.42.0



[FFmpeg-devel] [PATCH 5/7] proresdec2: use VLC for small runs and levels

2023-09-08 Thread Christophe Gisquet
Basically, the catch-all codebook is for codewords that are long on
average, with a distribution such that the 3-step VLC reading is not
efficient. Furthermore, the complete unrolling makes the actual code
smaller than the macro, and as the maximum code length is smaller,
smaller amounts of bits, optimized for run and for level, can be read.
---
 libavcodec/proresdec2.c | 53 +++--
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index e3cef402d7..02e1d82d00 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -132,7 +132,7 @@ static void unpack_alpha_12(GetBitContext *gb, uint16_t *dst, int num_coeffs,
 #define AC_BITS 12
 #define PRORES_LEV_BITS 9
 
-static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x4C };
+static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x29 };
 static VLC ac_vlc[6];
 
 static av_cold void init_vlcs(void)
@@ -152,9 +152,7 @@ static av_cold void init_vlcs(void)
 switch_val  = (switch_bits+1) << rice_order;
 
 // Values are actually transformed, but this is more a wrapping
-ac_codes[0] = 0;
-ac_bits[0] = 0;
-for (ac = 0; ac < (1< max_bits) max_bits = bits;
-ac_bits [ac+1] = bits;
-ac_codes[ac+1] = code;
+ac_bits [ac] = bits;
+ac_codes[ac] = code;
 }
 
 ff_free_vlc(ac_vlc+i);
@@ -507,12 +505,9 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons
 bits = exp_order - switch_bits + (q<<1);\
 val = READ_BITS(gb, bits) - (1 << exp_order) +  \
 ((switch_bits + 1) << rice_order);  \
-} else if (rice_order) {\
-skip_remaining(gb, q+1);\
-val = (q << rice_order) + get_bits(gb, rice_order); \
 } else {\
-val = q;\
 skip_remaining(gb, q+1);\
+val = rice_order ? (q << rice_order) + get_bits(gb, rice_order) : q;\
 }   \
 } while (0)
 
@@ -527,12 +522,10 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons
 if (q > switch_bits) { /* exp golomb */ \
 bits = (q<<1) + (int)diff;  \
 val = READ_BITS(gb, bits) + (int)offset;\
-} else if (rice_order) {\
-skip_remaining(gb, q+1);\
-val = (q << rice_order) + get_bits(gb, rice_order); \
 } else {\
-val = q;\
 skip_remaining(gb, q+1);\
+val = rice_order ? (q << rice_order) + show_bits(gb, rice_order) : q;   \
+skip_remaining(gb, rice_order); \
 }   \
 } while (0)
 
@@ -571,14 +564,6 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
 return 0;
 }
 
-// adaptive codebook switching lut according to previous run values
-static const char run_to_cb[16][4] = {
-{ 2, 0, -1,  1 }, { 2, 0, -1,  1 }, { 1, 0, 0,  0 }, { 1, 0,  0,  0 }, { 
0, 0, 1, -1 },
-{ 1, 1,  1,  0 }, { 1, 1,  1,  0 }, { 1, 1, 1,  0 }, { 1, 1,  1,  0 },
-{ 0, 1,  2, -2 }, { 0, 1,  2, -2 }, { 0, 1, 2, -2 }, { 0, 1,  2, -2 }, { 
0, 1, 2, -2 }, { 0, 1, 2, -2 },
-{ 0, 2,  3, -4 }
-};
-
 static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, 
GetBitContext *gb,
  int16_t *out, int 
blocks_per_slice)
 {
@@ -595,22 +580,32 @@ static av_always_inline int 
decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
 block_mask = blocks_per_slice - 1;
 
 for (pos = block_mask;;) {
-static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4, 5 };
-const VLC* tbl = ac_vlc + ctx_to_tbl[FFMIN(level, 9)];
-unsigned int runcb = FFMIN(run,  15);
 bits_rem = get_bits_left(gb);
-if (!bits_rem || (bits_rem < 16 && !show_bits(gb, bits_rem)))
+if (!bits_rem || (bits_rem < 14 && !show_bits(gb, bits_rem)))
 break;
 
-DECODE_CODEWORD2(run, run_to_cb[runcb][0], run_to_cb[runcb][1],
-  run_to_cb[runcb][2], run_to_cb[runcb][3]);
+if (run < 15) {
+static const uint8_t ctx_to_tbl[] = { 3, 3, 2, 2, 0, 5, 5, 5, 5, 
4, 4, 4, 

[FFmpeg-devel] [PATCH 4/7] proresdec2: offset VLCs by 1 to avoid 1 add

2023-09-08 Thread Christophe Gisquet
Pretty harmless, but not much gained either.
---
 libavcodec/proresdec2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 91c689d9ef..e3cef402d7 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -152,7 +152,9 @@ static av_cold void init_vlcs(void)
 switch_val  = (switch_bits+1) << rice_order;
 
 // Values are actually transformed, but this is more a wrapping
-for (ac = 0; ac <1< max_bits) max_bits = bits;
-ac_bits [ac] = bits;
-ac_codes[ac] = code;
+ac_bits [ac+1] = bits;
+ac_codes[ac+1] = code;
 }
 
 ff_free_vlc(ac_vlc+i);
@@ -609,7 +611,6 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext 
*avctx, GetBitContex
 }
 
 level = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
-level += 1;
 
 i = pos >> log2_block_count;
 
-- 
2.42.0



[FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch

2023-09-08 Thread Christophe Gisquet
x86/x64: 61/52 -> 55/46
Around 7-10% speedup.

Run and DC do not lend themselves to such changes, likely because
their distributions are less skewed and they need more VLC read
iterations on average.
---
 libavcodec/proresdec.h  |  1 +
 libavcodec/proresdec2.c | 77 ++---
 2 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/libavcodec/proresdec.h b/libavcodec/proresdec.h
index 1e48752e6f..7ebacaeb21 100644
--- a/libavcodec/proresdec.h
+++ b/libavcodec/proresdec.h
@@ -22,6 +22,7 @@
 #ifndef AVCODEC_PRORESDEC_H
 #define AVCODEC_PRORESDEC_H
 
+#define CACHED_BITSTREAM_READER 1
 #include "get_bits.h"
 #include "blockdsp.h"
 #include "proresdsp.h"
diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 65e8b01755..91c689d9ef 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -24,17 +24,17 @@
  * Known FOURCCs: 'apch' (HQ), 'apcn' (SD), 'apcs' (LT), 'apco' (Proxy), 'ap4h' (4444), 'ap4x' (4444 XQ)
  */
 
-#define CACHED_BITSTREAM_READER 1
+//#define DEBUG
 
 #include "config_components.h"
 
 #include "libavutil/internal.h"
 #include "libavutil/mem_internal.h"
+#include "libavutil/thread.h"
 
 #include "avcodec.h"
 #include "codec_internal.h"
 #include "decode.h"
-#include "get_bits.h"
 #include "hwaccel_internal.h"
 #include "hwconfig.h"
 #include "idctdsp.h"
@@ -129,8 +129,64 @@ static void unpack_alpha_12(GetBitContext *gb, uint16_t 
*dst, int num_coeffs,
 }
 }
 
+#define AC_BITS 12
+#define PRORES_LEV_BITS 9
+
+static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x4C };
+static VLC ac_vlc[6];
+
+static av_cold void init_vlcs(void)
+{
+int i;
+for (i = 0; i < sizeof(ac_info); i++) {
+uint32_t ac_codes[1<> 5;   /* rice code order */
+exp_order   = (codebook >> 2) & 7;  /* exp golomb code order */
+
+switch_val  = (switch_bits+1) << rice_order;
+
+// Values are actually transformed, but this is more a wrapping
+for (ac = 0; ac <1<= switch_val) {
+val += (1 << exp_order) - switch_val;
+exponent = av_log2(val);
+bits = exponent+1+switch_bits-exp_order/*0*/ + 
exponent+1/*val*/;
+code = val;
+} else if (rice_order) {
+bits = (val >> rice_order)/*0*/ + 1/*1*/ + rice_order/*val*/;
+code = (1 << rice_order) | val;
+} else {
+bits = val/*0*/ + 1/*1*/;
+code = 1;
+}
+if (bits > max_bits) max_bits = bits;
+ac_bits [ac] = bits;
+ac_codes[ac] = code;
+}
+
+ff_free_vlc(ac_vlc+i);
+
+if (init_vlc(ac_vlc+i, PRORES_LEV_BITS, 1pix_fmt = AV_PIX_FMT_NONE;
 
+// init dc_tables
+ff_thread_once(_static_once, init_vlcs);
+
 if (avctx->bits_per_raw_sample == 10){
 ctx->unpack_alpha = unpack_alpha_10;
 } else if (avctx->bits_per_raw_sample == 12){
@@ -510,7 +569,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext 
*gb, int16_t *out,
 return 0;
 }
 
-// adaptive codebook switching lut according to previous run/level values
+// adaptive codebook switching lut according to previous run values
 static const char run_to_cb[16][4] = {
 { 2, 0, -1,  1 }, { 2, 0, -1,  1 }, { 1, 0, 0,  0 }, { 1, 0,  0,  0 }, { 
0, 0, 1, -1 },
 { 1, 1,  1,  0 }, { 1, 1,  1,  0 }, { 1, 1, 1,  0 }, { 1, 1,  1,  0 },
@@ -518,12 +577,6 @@ static const char run_to_cb[16][4] = {
 { 0, 2,  3, -4 }
 };
 
-static const char lev_to_cb[10][4] = {
-{ 0, 0,  1, -1 }, { 2, 0,  0, -1 }, { 1, 0, 0,  0 }, { 2, 0, -1,  1 }, { 
0, 0, 1, -1 },
-{ 0, 1,  2, -2 }, { 0, 1,  2, -2 }, { 0, 1, 2, -2 }, { 0, 1,  2, -2 },
-{ 0, 2,  3, -4 }
-};
-
 static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, 
GetBitContext *gb,
  int16_t *out, int 
blocks_per_slice)
 {
@@ -540,8 +593,9 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext 
*avctx, GetBitContex
 block_mask = blocks_per_slice - 1;
 
 for (pos = block_mask;;) {
+static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4, 5 };
+const VLC* tbl = ac_vlc + ctx_to_tbl[FFMIN(level, 9)];
 unsigned int runcb = FFMIN(run,  15);
-unsigned int levcb = FFMIN(level, 9);
 bits_rem = get_bits_left(gb);
 if (!bits_rem || (bits_rem < 16 && !show_bits(gb, bits_rem)))
 break;
@@ -554,8 +608,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext 
*avctx, GetBitContex
 return AVERROR_INVALIDDATA;
 }
 
-DECODE_CODEWORD2(level, lev_to_cb[levcb][0], lev_to_cb[levcb][1],
-lev_to_cb[levcb][2], lev_to_cb[levcb][3]);
+level = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
 level += 1;
 
  

[FFmpeg-devel] [PATCH 2/7] proresdec2: store precomputed EC parameters

2023-09-08 Thread Christophe Gisquet
Having the various orders and offsets stored in a codebook is compact
but causes additional computations. Using a table of the precomputed
results instead achieves some speedup at the cost of ~132 bytes.

Around 5% speedup.
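
For readers of the diff: each row of the new tables reads as
{ switch_bits, rice_order, diff, offset }, where diff and offset fold the
constant terms of the exp-Golomb branch of DECODE_CODEWORD. The sketch
below derives a row from an old codebook byte; the struct and function
names are mine and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct DemoECParams {
    int switch_bits; /* prefix length at which coding switches to exp-Golomb */
    int rice_order;
    int diff;        /* exp_order - switch_bits */
    int offset;      /* ((switch_bits + 1) << rice_order) - (1 << exp_order) */
} DemoECParams;

/* Unpack a codebook byte the way DECODE_CODEWORD does, then precompute. */
static DemoECParams demo_precompute(uint8_t codebook)
{
    int switch_bits =  codebook & 3;
    int rice_order  =  codebook >> 5;
    int exp_order   = (codebook >> 2) & 7;
    DemoECParams p;

    p.switch_bits = switch_bits;
    p.rice_order  = rice_order;
    p.diff        = exp_order - switch_bits;
    p.offset      = ((switch_bits + 1) << rice_order) - (1 << exp_order);
    return p;
}

/* Compare against two rows copied from the patch's dc_codebook table. */
static void demo_selfcheck(void)
{
    DemoECParams p = demo_precompute(0x04);   /* expect { 0, 0, 1, -1 } */
    assert(p.switch_bits == 0 && p.rice_order == 0);
    assert(p.diff == 1 && p.offset == -1);

    p = demo_precompute(0x70);                /* expect { 0, 3, 4, -8 } */
    assert(p.switch_bits == 0 && p.rice_order == 3);
    assert(p.diff == 4 && p.offset == -8);
}
```

With these two values, the exp-Golomb branch reduces to
bits = (q << 1) + diff and val = READ_BITS(gb, bits) + offset, which is
what DECODE_CODEWORD2 does.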
---
 libavcodec/proresdec2.c | 54 +++--
 1 file changed, 47 insertions(+), 7 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 6e243cfc17..65e8b01755 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -427,6 +427,7 @@ static int decode_picture_header(AVCodecContext *avctx, 
const uint8_t *buf, cons
 # define READ_BITS get_bits
 #endif
 
+/* Kept for reference and because clearer for first DC */
 #define DECODE_CODEWORD(val, codebook)  \
 do {\
 unsigned int rice_order, exp_order, switch_bits;\
@@ -454,18 +455,41 @@ static int decode_picture_header(AVCodecContext *avctx, 
const uint8_t *buf, cons
 }   \
 } while (0)
 
+/* number of bits to switch between rice and exp golomb */
+#define DECODE_CODEWORD2(val, switch_bits, rice_order, diff, offset)\
+do {\
+unsigned int q, buf, bits;  \
+\
+buf = show_bits(gb, 14);\
+q = 13 - av_log2(buf);  \
+\
+if (q > switch_bits) { /* exp golomb */ \
+bits = (q<<1) + (int)diff;  \
+val = READ_BITS(gb, bits) + (int)offset;\
+} else if (rice_order) {\
+skip_remaining(gb, q+1);\
+val = (q << rice_order) + get_bits(gb, rice_order); \
+} else {\
+val = q;\
+skip_remaining(gb, q+1);\
+}   \
+} while (0)
+
+
 #define TOSIGNED(x) (((x) >> 1) ^ (-((x) & 1)))
 
 #define FIRST_DC_CB 0xB8
 
-static const uint8_t dc_codebook[7] = { 0x04, 0x28, 0x28, 0x4D, 0x4D, 0x70, 
0x70};
+static const char dc_codebook[7][4] = {
+{ 0, 0, 1, -1 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
+{ 1, 2, 2,  0 }, { 1, 2, 2,  0 }, { 0, 3, 4, -8 }, { 0, 3, 4, -8 }
+};
 
 static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
   int blocks_per_slice)
 {
 int16_t prev_dc;
 int code, i, sign;
-
 DECODE_CODEWORD(code, FIRST_DC_CB);
 prev_dc = TOSIGNED(code);
 out[0] = prev_dc;
@@ -475,7 +499,9 @@ static av_always_inline int decode_dc_coeffs(GetBitContext 
*gb, int16_t *out,
 code = 5;
 sign = 0;
 for (i = 1; i < blocks_per_slice; i++, out += 64) {
-DECODE_CODEWORD(code, dc_codebook[FFMIN(code, 6U)]);
+unsigned int dccb = FFMIN(code, 6U);
+DECODE_CODEWORD2(code, dc_codebook[dccb][0], dc_codebook[dccb][1],
+   dc_codebook[dccb][2], dc_codebook[dccb][3]);
 if(code) sign ^= -(code & 1);
 else sign  = 0;
 prev_dc += (((code + 1) >> 1) ^ sign) - sign;
@@ -485,8 +511,18 @@ static av_always_inline int decode_dc_coeffs(GetBitContext 
*gb, int16_t *out,
 }
 
 // adaptive codebook switching lut according to previous run/level values
-static const uint8_t run_to_cb[16] = { 0x06, 0x06, 0x05, 0x05, 0x04, 0x29, 
0x29, 0x29, 0x29, 0x28, 0x28, 0x28, 0x28, 0x28, 0x28, 0x4C };
-static const uint8_t lev_to_cb[10] = { 0x04, 0x0A, 0x05, 0x06, 0x04, 0x28, 
0x28, 0x28, 0x28, 0x4C };
+static const char run_to_cb[16][4] = {
+{ 2, 0, -1,  1 }, { 2, 0, -1,  1 }, { 1, 0, 0,  0 }, { 1, 0,  0,  0 }, { 
0, 0, 1, -1 },
+{ 1, 1,  1,  0 }, { 1, 1,  1,  0 }, { 1, 1, 1,  0 }, { 1, 1,  1,  0 },
+{ 0, 1,  2, -2 }, { 0, 1,  2, -2 }, { 0, 1, 2, -2 }, { 0, 1,  2, -2 }, { 
0, 1, 2, -2 }, { 0, 1, 2, -2 },
+{ 0, 2,  3, -4 }
+};
+
+static const char lev_to_cb[10][4] = {
+{ 0, 0,  1, -1 }, { 2, 0,  0, -1 }, { 1, 0, 0,  0 }, { 2, 0, -1,  1 }, { 
0, 0, 1, -1 },
+{ 0, 1,  2, -2 }, { 0, 1,  2, -2 }, { 0, 1, 2, -2 }, { 0, 1,  2, -2 },
+{ 0, 2,  3, -4 }
+};
 
 static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, 
GetBitContext *gb,
  int16_t *out, int 
blocks_per_slice)
@@ -504,18 +540,22 @@ static av_always_inline int 
decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
 block_mask = 

[FFmpeg-devel] [PATCH 1/7] proresdec2: port and fix for cached reader

2023-09-08 Thread Christophe Gisquet
Summary of changes
- move back to regular, non-macro, get_bits API
- reduce the lookup to switch the coding method
- shorter reads wherever possible, in particular for the end of bitstream
  (16 bits instead of 32, as per the above)

There are cases that really need longer lengths (larger EG codes) of up
to 27 bits.
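
As an illustration of the shorter reads: the diff replaces
31 - av_log2(buf) on the 32-bit cache with 13 - av_log2(buf) on a 14-bit
peek. Both expressions count the zeros before the first set bit, i.e. the
length of the unary prefix. A standalone sketch follows; demo_log2 is my
own stand-in, not av_log2 itself.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0; returns 0 for v == 0 */
static int demo_log2(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Number of zero bits before the first 1 in a 14-bit window. */
static unsigned demo_prefix_len(uint32_t window14)
{
    assert(window14 < (1u << 14));
    return (unsigned)(13 - demo_log2(window14));
}
```

For instance, a window of 0x0400 (bit 10 set) gives a prefix length of 3.
Note that in this sketch an all-zero window yields 13, the same as a
window of 1, so the caller cannot tell the two apart from q alone.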

Win64: 6.10 -> 4.87 (~20% speedup)

Reference for a hypothetical 32-bit version of the cached reader:
Win32: 11.4 -> 9.8  (14%, because iDCT is not SIMDed)
---
 libavcodec/proresdec2.c | 53 ++---
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 9297860946..6e243cfc17 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -24,9 +24,7 @@
  * Known FOURCCs: 'apch' (HQ), 'apcn' (SD), 'apcs' (LT), 'apco' (Proxy), 'ap4h' (4444), 'ap4x' (4444 XQ)
  */
 
-//#define DEBUG
-
-#define LONG_BITSTREAM_READER
+#define CACHED_BITSTREAM_READER 1
 
 #include "config_components.h"
 
@@ -422,35 +420,37 @@ static int decode_picture_header(AVCodecContext *avctx, 
const uint8_t *buf, cons
 return pic_data_size;
 }
 
-#define DECODE_CODEWORD(val, codebook, SKIP)\
+/* bitstream_read may fail on 32bits ARCHS for >24 bits, so use long version 
there */
+#if 0 //BITSTREAM_BITS == 32
+# define READ_BITS get_bits_long
+#else
+# define READ_BITS get_bits
+#endif
+
+#define DECODE_CODEWORD(val, codebook)  \
 do {\
 unsigned int rice_order, exp_order, switch_bits;\
 unsigned int q, buf, bits;  \
 \
-UPDATE_CACHE(re, gb);   \
-buf = GET_CACHE(re, gb);\
+buf = show_bits(gb, 14);\
 \
 /* number of bits to switch between rice and exp golomb */  \
 switch_bits =  codebook & 3;\
 rice_order  =  codebook >> 5;   \
 exp_order   = (codebook >> 2) & 7;  \
 \
-q = 31 - av_log2(buf);  \
+q = 13 - av_log2(buf);  \
 \
 if (q > switch_bits) { /* exp golomb */ \
 bits = exp_order - switch_bits + (q<<1);\
-if (bits > FFMIN(MIN_CACHE_BITS, 31))   \
-return AVERROR_INVALIDDATA; \
-val = SHOW_UBITS(re, gb, bits) - (1 << exp_order) + \
+val = READ_BITS(gb, bits) - (1 << exp_order) +  \
 ((switch_bits + 1) << rice_order);  \
-SKIP(re, gb, bits); \
 } else if (rice_order) {\
-SKIP_BITS(re, gb, q+1); \
-val = (q << rice_order) + SHOW_UBITS(re, gb, rice_order);   \
-SKIP(re, gb, rice_order);   \
+skip_remaining(gb, q+1);\
+val = (q << rice_order) + get_bits(gb, rice_order); \
 } else {\
 val = q;\
-SKIP(re, gb, q+1);  \
+skip_remaining(gb, q+1);\
 }   \
 } while (0)
 
@@ -466,9 +466,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext 
*gb, int16_t *out,
 int16_t prev_dc;
 int code, i, sign;
 
-OPEN_READER(re, gb);
-
-DECODE_CODEWORD(code, FIRST_DC_CB, LAST_SKIP_BITS);
+DECODE_CODEWORD(code, FIRST_DC_CB);
 prev_dc = TOSIGNED(code);
 out[0] = prev_dc;
 
@@ -477,13 +475,12 @@ static av_always_inline int 
decode_dc_coeffs(GetBitContext *gb, int16_t *out,
 code = 5;
 sign = 0;
 for (i = 1; i < blocks_per_slice; i++, out += 64) {
-DECODE_CODEWORD(code, dc_codebook[FFMIN(code, 6U)], LAST_SKIP_BITS);
+DECODE_CODEWORD(code, dc_codebook[FFMIN(code, 6U)]);
 if(code) sign ^= -(code & 1);
 else sign  = 0;
 prev_dc += (((code + 1) >> 1) ^ sign) - sign;
 out[0] = prev_dc;
 }
-CLOSE_READER(re, gb);
 return 

[FFmpeg-devel] [PATCH 2/2] read_xbits: request fewer bits

2023-09-07 Thread Christophe Gisquet
This would have also helped a bitstream reader with a cache
of 32 bits.
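
For context, the sign handling itself is untouched by this patch; only the
way the 32 cached bits are fetched changes. Below is a standalone sketch of
that sign trick, assuming the 32 bits are already top-aligned in a
uint32_t. The name demo_xbits is mine, and the code relies on >> of a
negative int32_t being an arithmetic shift, which is implementation-defined
in C but holds on the usual compilers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * top_aligned: next 32 bits of the stream, first bit in the MSB.
 * n: width of the field, 1..31 in this sketch.
 * A leading 1 yields the field as a positive value; a leading 0 yields
 * minus the bitwise complement of the field.
 */
static int demo_xbits(uint32_t top_aligned, unsigned n)
{
    int32_t cache = (int32_t)top_aligned;
    int32_t sign;

    assert(n >= 1 && n <= 31);
    sign = ~cache >> 31;            /* -1 if the first bit is 0, else 0 */
    return (int)((((uint32_t)(sign ^ cache)) >> (32 - n)) ^ (uint32_t)sign) - sign;
}
```

With n = 3, the bits 101 decode to 5 and the bits 010 decode to -5.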
---
 libavcodec/bitstream_template.h | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/libavcodec/bitstream_template.h b/libavcodec/bitstream_template.h
index 3f90fc6a07..c27e8108b2 100644
--- a/libavcodec/bitstream_template.h
+++ b/libavcodec/bitstream_template.h
@@ -423,8 +423,18 @@ static inline const uint8_t *BS_FUNC(align)(BSCTX *bc)
  */
 static inline int BS_FUNC(read_xbits)(BSCTX *bc, unsigned int n)
 {
-int32_t cache = BS_FUNC(peek)(bc, 32);
-int sign = ~cache >> 31;
+int32_t cache;
+int sign;
+
+if (n > bc->bits_valid)
+BS_FUNC(priv_refill_32)(bc);
+
+#if defined(BITSTREAM_READER_LE)
+cache = bc->bits & 0xFFFFFFFF;
+#else
+cache = bc->bits >> 32;
+#endif
+sign = ~cache >> 31;
 BS_FUNC(skip_remaining)(bc, n);
 
 return ((((uint32_t)(sign ^ cache)) >> (32 - n)) ^ sign) - sign;
-- 
2.42.0



[FFmpeg-devel] [PATCH 1/2] Expose and start using skip_remaining

2023-09-07 Thread Christophe Gisquet
Callers of the bitstream reader have sometimes already checked that
enough bits are available, in which case the check done when skipping
is redundant.
---
 libavcodec/bitstream.h  |  8 +---
 libavcodec/bitstream_template.h | 22 +++---
 libavcodec/get_bits.h   |  1 +
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/libavcodec/bitstream.h b/libavcodec/bitstream.h
index 35b7873b9c..dd043fb349 100644
--- a/libavcodec/bitstream.h
+++ b/libavcodec/bitstream.h
@@ -95,6 +95,7 @@
 # define bits_peek_signed   bits_peek_signed_le
 # define bits_peek_signed_nz bits_peek_signed_nz_le
 # define bits_skip  bits_skip_le
+# define bits_skip_remaining bits_skip_remaining_le
 # define bits_seek  bits_seek_le
 # define bits_align bits_align_le
 # define bits_read_xbitsbits_read_xbits_le
@@ -124,6 +125,7 @@
 # define bits_peek_signed   bits_peek_signed_be
 # define bits_peek_signed_nz bits_peek_signed_nz_be
 # define bits_skip  bits_skip_be
+# define bits_skip_remaining bits_skip_remaining_be
 # define bits_seek  bits_seek_be
 # define bits_align bits_align_be
 # define bits_read_xbitsbits_read_xbits_be
@@ -146,7 +148,7 @@
 n = table[index].len;   \
 \
 if (max_depth > 1 && n < 0) {   \
-bits_skip(bc, bits);\
+skip_remaining(bc, bits);   \
 \
 nb_bits = -n;   \
 \
@@ -154,7 +156,7 @@
 level = table[index].level; \
 n = table[index].len;   \
 if (max_depth > 2 && n < 0) {   \
-bits_skip(bc, nb_bits); \
+skip_remaining(bc, nb_bits);\
 nb_bits = -n;   \
 \
 index = bits_peek(bc, nb_bits) + level; \
@@ -163,7 +165,7 @@
 }   \
 }   \
 run = table[index].run; \
-bits_skip(bc, n);   \
+skip_remaining(bc, n);  \
 } while (0)
 
 #endif /* AVCODEC_BITSTREAM_H */
diff --git a/libavcodec/bitstream_template.h b/libavcodec/bitstream_template.h
index 0308e3a924..3f90fc6a07 100644
--- a/libavcodec/bitstream_template.h
+++ b/libavcodec/bitstream_template.h
@@ -175,7 +175,7 @@ static inline uint64_t BS_FUNC(priv_val_show)(BSCTX *bc, unsigned int n)
 #endif
 }
 
-static inline void BS_FUNC(priv_skip_remaining)(BSCTX *bc, unsigned int n)
+static inline void BS_FUNC(skip_remaining)(BSCTX *bc, unsigned int n)
 {
 #ifdef BITSTREAM_TEMPLATE_LE
 bc->bits >>= n;
@@ -192,7 +192,7 @@ static inline uint64_t BS_FUNC(priv_val_get)(BSCTX *bc, unsigned int n)
 av_assert2(n > 0 && n < 64);
 
 ret = BS_FUNC(priv_val_show)(bc, n);
-BS_FUNC(priv_skip_remaining)(bc, n);
+BS_FUNC(skip_remaining)(bc, n);
 
 return ret;
 }
@@ -375,7 +375,7 @@ static inline int BS_FUNC(peek_signed)(BSCTX *bc, unsigned int n)
 static inline void BS_FUNC(skip)(BSCTX *bc, unsigned int n)
 {
 if (n < bc->bits_valid)
-BS_FUNC(priv_skip_remaining)(bc, n);
+BS_FUNC(skip_remaining)(bc, n);
 else {
 n -= bc->bits_valid;
 bc->bits   = 0;
@@ -389,7 +389,7 @@ static inline void BS_FUNC(skip)(BSCTX *bc, unsigned int n)
 }
 BS_FUNC(priv_refill_64)(bc);
 if (n)
-BS_FUNC(priv_skip_remaining)(bc, n);
+BS_FUNC(skip_remaining)(bc, n);
 }
 }
 
@@ -425,7 +425,7 @@ static inline int BS_FUNC(read_xbits)(BSCTX *bc, unsigned int n)
 {
 int32_t cache = BS_FUNC(peek)(bc, 32);
 int sign = ~cache >> 31;
-BS_FUNC(priv_skip_remaining)(bc, n);
+BS_FUNC(skip_remaining)(bc, n);
 
 return ((((uint32_t)(sign ^ cache)) >> (32 - n)) ^ sign) - sign;
 }
@@ -508,14 +508,14 @@ static inline int BS_FUNC(read_vlc)(BSCTX *bc, const VLCElem *table,
 int n= table[idx].len;
 
 if (max_depth > 1 && n < 0) {
-BS_FUNC(priv_skip_remaining)(bc, bits);
+BS_FUNC(skip_remaining)(bc, bits);
 code = BS_FUNC(priv_set_idx)(bc, code, &n, &nb_bits, table);
 if (max_depth > 2 && n < 0) {
-BS_FUNC(priv_skip_remaining)(bc, nb_bits);
+BS_FUNC(skip_remaining)(bc, nb_bits);
 code = BS_FUNC(priv_set_idx)(bc, code, &n, &nb_bits, table);
 }
 }
-BS_FUNC(priv_skip_remaining)(bc, n);
+BS_FUNC(skip_remaining)(bc, n);
 
 return code;
 }
@@ -534,17 +534,17 @@ static inline int 

[FFmpeg-devel] [PATCH 0/2] cached bitstream: small improvements

2023-09-07 Thread Christophe Gisquet
Preparatory patches, each beneficial on its own. Note: all of these
date from 2020 and aim at simplicity, but they needed a cleaner
rebase.

Christophe Gisquet (2):
  Expose and start using skip_remaining
  read_xbits: request fewer bits

 libavcodec/bitstream.h  |  8 +---
 libavcodec/bitstream_template.h | 36 +
 libavcodec/get_bits.h   |  1 +
 3 files changed, 29 insertions(+), 16 deletions(-)

-- 
2.42.0



Re: [FFmpeg-devel] [PATCH] avcodec/v210enc: add new function for avx2 avx512 avx512icl

2022-10-29 Thread Christophe Gisquet
Hello,

On Fri, 28 Oct 2022 at 20:57, James Darnley wrote:
> +%else
> +pand   m1, m6, m1
> +pandn  m0, m6, m0
> +por    m0, m0, m1
> +%endif

Isn't that pattern a vpblendvb or some such?
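
To make the question concrete, here is a scalar, per-byte sketch in C. The function names are made up, and the blend model (select on the top bit of each mask byte) is my understanding of the byte-blend instructions rather than something taken from the patch. The and/andn/or sequence is a bitwise select, so the two only agree when every mask byte is all zeros or all ones.

```c
#include <stdint.h>

/* What pand/pandn/por compute, per byte: bits of b where mask is 1,
 * bits of a where mask is 0. */
static uint8_t bit_select(uint8_t a, uint8_t b, uint8_t mask)
{
    return (uint8_t)((b & mask) | (a & ~mask));
}

/* Assumed model of a variable byte blend: the whole byte is chosen
 * from b when the top bit of the mask byte is set, else from a. */
static uint8_t blendv_byte(uint8_t a, uint8_t b, uint8_t mask)
{
    return (mask & 0x80) ? b : a;
}

/* Returns 1 if both forms agree for every a and b with this mask byte. */
static int masks_agree(uint8_t mask)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (bit_select((uint8_t)a, (uint8_t)b, mask) !=
                blendv_byte((uint8_t)a, (uint8_t)b, mask))
                return 0;
    return 1;
}
```

So the replacement is safe as long as m6 only ever holds 0x00 or 0xFF in each byte, which would need to be checked against the actual constant.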

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 2/4] lav/dnxhd: CID 1256 is RGB, not BGR or YUV444

2021-02-01 Thread Christophe Gisquet
Hi,

On Sun, 31 Jan 2021 at 14:11, Michael Niedermayer wrote:
> This transmutes the following dog into a hyperspace neon dog
> ./ffplay DNxHDtest2.mov

I'm not sure I prefer the correct version, but here goes. This sample
is YUV444 basically, the reverse of what I've seen in another sample.

-- 
Christophe


0002-lav-dnxhd-CID-1256-is-RGB-not-BGR-or-YUV444.patch
Description: Binary data

Re: [FFmpeg-devel] [PATCH 3/4] dnxhd: add partial alpha support for parsing

2021-01-31 Thread Christophe Gisquet
On Sat, 30 Jan 2021 at 10:54, Paul B Mahol wrote:
> Are you telling us that you do not have the specification for this?

Yes, cf. cover letter. In fact, this patch could be dropped (not sure).

> Last time I checked AVID files had uncompressed alpha that did not match
> the specification at all.

I don't know, though I wouldn't mind taking a peek. This precise
patch does no parsing.

-- 
Christophe

[FFmpeg-devel] [PATCH 4/4] dnxhddec: partial alpha support

2021-01-30 Thread Christophe Gisquet
From: Christophe Gisquet 

This just ignores the alpha data at the end of the bitstream.
---
 libavcodec/dnxhddec.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c
index 11da1c286c..1de95996cf 100644
--- a/libavcodec/dnxhddec.c
+++ b/libavcodec/dnxhddec.c
@@ -202,7 +202,7 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 ctx->cur_field = 0;
 }
 ctx->mbaff = (buf[0x6] >> 5) & 1;
-ctx->alpha = buf[0x7] & 1;
+ctx->alpha = buf[0x7] & 5;
 ctx->lla   = (buf[0x7] >> 1) & 1;
 if (ctx->alpha)
 avpriv_request_sample(ctx->avctx, "alpha");
@@ -249,10 +249,14 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 return AVERROR_INVALIDDATA;
 } else if (bitdepth == 10) {
 ctx->decode_dct_block = dnxhd_decode_dct_block_10_444;
-ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP10 : AV_PIX_FMT_YUV444P10;
+ctx->pix_fmt = ctx->act
+ ? (/*ctx->alpha ? AV_PIX_FMT_GBRAP10LE :*/ AV_PIX_FMT_GBRP10)
+ : (/*ctx->alpha ? AV_PIX_FMT_YUVA444P10LE :*/ AV_PIX_FMT_YUV444P10);
 } else {
 ctx->decode_dct_block = dnxhd_decode_dct_block_12_444;
-ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP12 : AV_PIX_FMT_YUV444P12;
+ctx->pix_fmt = ctx->act
+ ? (/*ctx->alpha ? AV_PIX_FMT_GBRAP12LE :*/ AV_PIX_FMT_GBRP12)
+ : (/*ctx->alpha ? AV_PIX_FMT_YUVA444P12LE :*/ AV_PIX_FMT_YUV444P12);
 }
 } else if (bitdepth == 12) {
 ctx->decode_dct_block = dnxhd_decode_dct_block_12;
@@ -337,7 +341,7 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 i, 0x170 + (i << 2), ctx->mb_scan_index[i]);
 if (buf_size - ctx->data_offset < ctx->mb_scan_index[i]) {
 av_log(ctx->avctx, AV_LOG_ERROR,
-   "invalid mb scan index (%"PRIu32" vs %u).\n",
+   "invalid mb %i scan index (%"PRIu32" vs %u).\n", i,
ctx->mb_scan_index[i], buf_size - ctx->data_offset);
 return AVERROR_INVALIDDATA;
 }
@@ -642,6 +646,12 @@ static int dnxhd_decode_row(AVCodecContext *avctx, void *data,
 }
 }
 
+/* alpha decoding goes here */
+if (ctx->alpha) {
+   ff_dlog(ctx->avctx, "Row %d: %d left\n", rownb,
+   ((rownb < ctx->mb_height-1 ? ctx->mb_scan_index[rownb+1] : ctx->buf_size) - offset) * 8 - get_bits_count(&row->gb));
+}
+
 return 0;
 }
 
@@ -735,11 +745,13 @@ decode_coding_unit:
 case -1:
 case 0:
 ctx->pix_fmt = ctx->bit_depth==10
- ? AV_PIX_FMT_GBRP10 : AV_PIX_FMT_GBRP12;
+ ? (/*ctx->alpha ? AV_PIX_FMT_GBRAP10 :*/ AV_PIX_FMT_GBRP10)
+ : (/*ctx->alpha ? AV_PIX_FMT_GBRAP12 :*/ AV_PIX_FMT_GBRP12);
 break;
 case 1:
 ctx->pix_fmt = ctx->bit_depth==10
- ? AV_PIX_FMT_YUV444P10 : AV_PIX_FMT_YUV444P12;
+ ? (/*ctx->alpha ? AV_PIX_FMT_YUVA444P10 :*/ AV_PIX_FMT_YUV444P10)
+ : (/*ctx->alpha ? AV_PIX_FMT_YUVA444P12 :*/ AV_PIX_FMT_YUV444P12);
 break;
 }
 }
-- 
2.29.2


[FFmpeg-devel] [PATCH 3/4] dnxhd: add partial alpha support for parsing

2021-01-30 Thread Christophe Gisquet
From: Christophe Gisquet 

This multiplies the framesize by 1.5 when there is alpha, for the CIDs
allowing alpha. In addition, a new header is checked, because the alpha
marking seems to be different.
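
As a rough illustration of the size computation described above, here is a standalone sketch. `hr_frame_size` and `ceil_rshift` are hypothetical names; `scale_num`/`scale_den` stand in for the packet_scale entry of the CID table, and the values used when trying it out are made up. The order of operations (scale, then 1.5x for alpha, then round to a multiple of 4096, then clamp to 8192) follows the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling of v / 2^shift for non-negative v. */
static int ceil_rshift(int v, int shift)
{
    return (v + (1 << shift) - 1) >> shift;
}

/* Hypothetical stand-in for the HR frame size computation. */
static int hr_frame_size(int w, int h, int scale_num, int scale_den,
                         int alpha_444)
{
    assert(w > 0 && h > 0 && scale_num > 0 && scale_den > 0);

    /* number of 16x16 macroblocks times the per-MB packet scale */
    int64_t size = (int64_t)ceil_rshift(h, 4) * ceil_rshift(w, 4)
                   * scale_num / scale_den;

    if (alpha_444)
        size = (size * 3) >> 1;          /* 1.5x when alpha is present */

    /* round to the nearest multiple of 4096; for non-negative values
     * this matches (size + 2048) / 4096 * 4096 */
    size = (size + 2048) & -4096;

    return size < 8192 ? 8192 : (int)size;
}
```

For a 1920x1080 frame with a made-up scale of 10/1 this gives 81920 bytes without alpha and 122880 with it.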
---
 libavcodec/dnxhd_parser.c |  7 ---
 libavcodec/dnxhddata.c| 17 -
 libavcodec/dnxhddata.h|  6 --
 libavcodec/dnxhdenc.c |  2 +-
 libavformat/mxfenc.c  |  7 ---
 5 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/libavcodec/dnxhd_parser.c b/libavcodec/dnxhd_parser.c
index 63b4ff89e1..726fb2f5de 100644
--- a/libavcodec/dnxhd_parser.c
+++ b/libavcodec/dnxhd_parser.c
@@ -31,7 +31,7 @@ typedef struct {
 ParseContext pc;
 int cur_byte;
 int remaining;
-int w, h;
+int w, h, alpha;
 } DNXHDParserContext;
 
 static int dnxhd_find_frame_end(DNXHDParserContext *dctx,
@@ -58,6 +58,7 @@ static int dnxhd_find_frame_end(DNXHDParserContext *dctx,
 if (pic_found && !dctx->remaining) {
 if (!buf_size) /* EOF considered as end of frame */
 return 0;
+dctx->alpha = (state >> 8) & 5;
 for (; i < buf_size; i++) {
 dctx->cur_byte++;
 state = (state << 8) | buf[i];
@@ -73,9 +74,9 @@ static int dnxhd_find_frame_end(DNXHDParserContext *dctx,
 if (cid <= 0)
 continue;
 
-remaining = avpriv_dnxhd_get_frame_size(cid);
+remaining = avpriv_dnxhd_get_frame_size(cid, dctx->alpha);
 if (remaining <= 0) {
-remaining = avpriv_dnxhd_get_hr_frame_size(cid, dctx->w, dctx->h);
+remaining = avpriv_dnxhd_get_hr_frame_size(cid, dctx->w, dctx->h, dctx->alpha);
 if (remaining <= 0)
 continue;
 }
diff --git a/libavcodec/dnxhddata.c b/libavcodec/dnxhddata.c
index 3a69a0f501..54663aa432 100644
--- a/libavcodec/dnxhddata.c
+++ b/libavcodec/dnxhddata.c
@@ -1083,15 +1083,19 @@ const CIDEntry *ff_dnxhd_get_cid_table(int cid)
 return NULL;
 }
 
-int avpriv_dnxhd_get_frame_size(int cid)
+int avpriv_dnxhd_get_frame_size(int cid, int alpha)
 {
 const CIDEntry *entry = ff_dnxhd_get_cid_table(cid);
+int result;
 if (!entry)
 return -1;
-return entry->frame_size;
+result = entry->frame_size;
+if (alpha && (entry->flags & DNXHD_444))
+result = (result * 3) >> 1;
+return result;
 }
 
-int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h)
+int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h, int alpha)
 {
 const CIDEntry *entry = ff_dnxhd_get_cid_table(cid);
 int result;
@@ -1099,8 +1103,11 @@ int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h)
 if (!entry)
 return -1;
 
-result = ((h + 15) / 16) * ((w + 15) / 16) * (int64_t)entry->packet_scale.num / entry->packet_scale.den;
-result = (result + 2048) / 4096 * 4096;
+result = AV_CEIL_RSHIFT(h, 4) * AV_CEIL_RSHIFT(w, 4)
+   * (int64_t)entry->packet_scale.num / entry->packet_scale.den;
+if (alpha && (entry->flags & DNXHD_444))
+result = (result * 3) >> 1;
+result = (result + 2048) & -4096;
 
 return FFMAX(result, 8192);
 }
diff --git a/libavcodec/dnxhddata.h b/libavcodec/dnxhddata.h
index 898079cffc..21738af453 100644
--- a/libavcodec/dnxhddata.h
+++ b/libavcodec/dnxhddata.h
@@ -35,6 +35,7 @@
 /** Frame headers, extra 0x00 added to end for parser */
 #define DNXHD_HEADER_INITIAL 0x02800100
 #define DNXHD_HEADER_444 0x02800200
+#define DNXHD_HEADER_RGBA    0x02800400
 
 /** Indicate that a CIDEntry value must be read in the bitstream */
 #define DNXHD_VARIABLE 0
@@ -76,6 +77,7 @@ static av_always_inline uint64_t ff_dnxhd_check_header_prefix(uint64_t prefix)
 {
 if (prefix == DNXHD_HEADER_INITIAL ||
 prefix == DNXHD_HEADER_444 ||
+prefix == DNXHD_HEADER_RGBA||
 ff_dnxhd_check_header_prefix_hr(prefix))
 return prefix;
 return 0;
@@ -88,8 +90,8 @@ static av_always_inline uint64_t ff_dnxhd_parse_header_prefix(const uint8_t *buf
 return ff_dnxhd_check_header_prefix(prefix);
 }
 
-int avpriv_dnxhd_get_frame_size(int cid);
-int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h);
+int avpriv_dnxhd_get_frame_size(int cid, int alpha);
+int avpriv_dnxhd_get_hr_frame_size(int cid, int w, int h, int alpha);
 int avpriv_dnxhd_get_interlaced(int cid);
 
 #endif /* AVCODEC_DNXHDDATA_H */
diff --git a/libavcodec/dnxhdenc.c b/libavcodec/dnxhdenc.c
index 2461c51727..fb059060aa 100644
--- a/libavcodec/dnxhdenc.c
+++ b/libavcodec/dnxhdenc.c
@@ -467,7 +467,7 @@ static av_cold int dnxhd_encode_init(AVCodecContext *avctx)
 
 if (ctx->cid_table->frame_size == DNXHD_VARIABLE) {
 ctx->frame_size = avpriv_dnxhd_get_hr_frame_size(ctx->cid,
-  

[FFmpeg-devel] [PATCH 2/4] lav/dnxhd: CID 1256 is RGB, not BGR or YUV444

2021-01-30 Thread Christophe Gisquet
From: Christophe Gisquet 

Fix the logic around checking the ACT flag per MB and row.
This also requires adding a 444 path to swap channels into
the ffmpeg formats, as they are GBR, and not RGB.
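
The per-row tracking this patch reorders can be summarised with a small sketch. It is a simplified model: `update_row_format` and `row_format_for` are made-up helpers, and the values -1 (unset), 0/1 (all MBs agree) and 2 (variable) are taken from the diff below. The MB flag is only tracked when the frame header enables ACT; otherwise a set flag merely triggers the warning.

```c
#include <assert.h>

enum { ROW_FORMAT_UNSET = -1, ROW_FORMAT_VARIABLE = 2 };

/* One step of the tracking: fold the ACT flag of one MB into the row state. */
static int update_row_format(int row_format, int frame_act, int mb_act)
{
    assert(mb_act == 0 || mb_act == 1);

    if (!frame_act)
        return row_format;           /* flag ignored (the patch only warns) */
    if (row_format == ROW_FORMAT_UNSET)
        return mb_act;               /* first MB of the row decides */
    if (row_format != mb_act)
        return ROW_FORMAT_VARIABLE;  /* MBs of this row disagree */
    return row_format;
}

/* Fold a whole row of MB flags, starting from the unset state. */
static int row_format_for(const int *mb_act, int nb_mb, int frame_act)
{
    int fmt = ROW_FORMAT_UNSET;
    for (int i = 0; i < nb_mb; i++)
        fmt = update_row_format(fmt, frame_act, mb_act[i]);
    return fmt;
}
```

Resetting the state to -1 at the start of each row, as the last hunk does, is what makes the per-row result meaningful.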
---
 libavcodec/dnxhddec.c | 64 +++
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c
index 359588f963..11da1c286c 100644
--- a/libavcodec/dnxhddec.c
+++ b/libavcodec/dnxhddec.c
@@ -249,12 +249,10 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 return AVERROR_INVALIDDATA;
 } else if (bitdepth == 10) {
 ctx->decode_dct_block = dnxhd_decode_dct_block_10_444;
-ctx->pix_fmt = ctx->act ? AV_PIX_FMT_YUV444P10
-: AV_PIX_FMT_GBRP10;
+ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP10 : AV_PIX_FMT_YUV444P10;
 } else {
 ctx->decode_dct_block = dnxhd_decode_dct_block_12_444;
-ctx->pix_fmt = ctx->act ? AV_PIX_FMT_YUV444P12
-: AV_PIX_FMT_GBRP12;
+ctx->pix_fmt = ctx->act ? AV_PIX_FMT_GBRP12 : AV_PIX_FMT_YUV444P12;
 }
 } else if (bitdepth == 12) {
 ctx->decode_dct_block = dnxhd_decode_dct_block_12;
@@ -504,19 +502,19 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row,
 qscale = get_bits(&row->gb, 11);
 }
 act = get_bits1(&row->gb);
-if (act) {
-if (!ctx->act) {
-static int act_warned;
-if (!act_warned) {
-act_warned = 1;
-av_log(ctx->avctx, AV_LOG_ERROR,
-   "ACT flag set, in violation of frame header.\n");
-}
-} else if (row->format == -1) {
+if (ctx->act) {
+if (row->format == -1) {
 row->format = act;
 } else if (row->format != act) {
 row->format = 2; // Variable
 }
+} else if (act) {
+static int act_warned;
+if (!act_warned) {
+act_warned = 1;
+av_log(ctx->avctx, AV_LOG_ERROR,
+   "ACT flag set, in violation of frame header.\n");
+}
 }
 
 if (qscale != row->last_qscale) {
@@ -569,6 +567,21 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row,
 }
 break;
 case DNX_CHROMAFORMAT_444:
+if (ctx->avctx->profile == FF_PROFILE_DNXHD) {
+ctx->idsp.idct_put(dest_y, dct_linesize_luma, row->blocks[2]);
+ctx->idsp.idct_put(dest_y + dct_x_offset, dct_linesize_luma, row->blocks[3]);
+ctx->idsp.idct_put(dest_y + dct_y_offset, dct_linesize_luma, row->blocks[8]);
+ctx->idsp.idct_put(dest_y + dct_y_offset + dct_x_offset, dct_linesize_luma, row->blocks[9]);
+
+ctx->idsp.idct_put(dest_u, dct_linesize_luma, row->blocks[4]);
+ctx->idsp.idct_put(dest_u + dct_x_offset, dct_linesize_luma, row->blocks[5]);
+ctx->idsp.idct_put(dest_u + dct_y_offset, dct_linesize_luma, row->blocks[10]);
+ctx->idsp.idct_put(dest_u + dct_y_offset + dct_x_offset, dct_linesize_luma, row->blocks[11]);
+ctx->idsp.idct_put(dest_v, dct_linesize_luma, row->blocks[0]);
+ctx->idsp.idct_put(dest_v + dct_x_offset, dct_linesize_luma, row->blocks[1]);
+ctx->idsp.idct_put(dest_v + dct_y_offset, dct_linesize_luma, row->blocks[6]);
+ctx->idsp.idct_put(dest_v + dct_y_offset + dct_x_offset, dct_linesize_luma, row->blocks[7]);
+} else {
 ctx->idsp.idct_put(dest_y, dct_linesize_luma, row->blocks[0]);
 ctx->idsp.idct_put(dest_y + dct_x_offset, dct_linesize_luma, row->blocks[1]);
 ctx->idsp.idct_put(dest_y + dct_y_offset, dct_linesize_luma, row->blocks[6]);
@@ -585,6 +598,7 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row,
 ctx->idsp.idct_put(dest_v + dct_y_offset, dct_linesize_chroma, row->blocks[10]);
 ctx->idsp.idct_put(dest_v + dct_y_offset + dct_x_offset, dct_linesize_chroma, row->blocks[11]);
 }
+}
 break;
 case DNX_CHROMAFORMAT_420:
 ctx->idsp.idct_put(dest_y, dct_linesize_luma, row->blocks[0]);
@@ -610,6 +624,8 @@ static int dnxhd_decode_row(AVCodecContext *avctx, void *data,
 RowContext *row = ctx->rows + threadnb;
 int x, ret;
 
+row->format = -1;
+
 row->last_dc[0] =
 r

[FFmpeg-devel] [PATCH 0/4] Better colorspace support in dnxhddec

2021-01-30 Thread Christophe Gisquet
Nobody complained, so the CIDs are likely little used.
This was developed without reference to the ST2019-1:2016 specs (some
fields are therefore guessed) but with reference to (unredistributable)
samples likely generated by the Avid SDK.

I have no idea how the alpha is coded, but it is variable-length.

Christophe Gisquet (4):
  lav/dnxhd: better support 4:2:0 in DNXHR profiles
  lav/dnxhd: CID 1256 is RGB, not BGR or YUV444
  dnxhd: add partial alpha support for parsing
  dnxhddec: partial alpha support

 libavcodec/dnxhd_parser.c |   7 +-
 libavcodec/dnxhddata.c|  17 +++--
 libavcodec/dnxhddata.h|   6 +-
 libavcodec/dnxhddec.c | 139 --
 libavcodec/dnxhdenc.c |   2 +-
 libavformat/mxfenc.c  |   7 +-
 6 files changed, 128 insertions(+), 50 deletions(-)

-- 
2.29.2


[FFmpeg-devel] [PATCH 1/4] lav/dnxhd: better support 4:2:0 in DNXHR profiles

2021-01-30 Thread Christophe Gisquet
From: Christophe Gisquet 

Where they are allowed. No validation of profile + colorformat is performed,
however.
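
For illustration, the new 2-bit field and the per-format block count can be modelled as below. This is a sketch only: the enum values, the `(byte >> 5) & 3` extraction and the block counts mirror the diff that follows, while `parse_chroma_format` and `blocks_per_mb` are made-up helper names that do not exist in dnxhddec.c.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DNX_CHROMAFORMAT_422     = 0,
    DNX_CHROMAFORMAT_420     = 1,
    DNX_CHROMAFORMAT_444     = 2,
    DNX_CHROMAFORMAT_UNKNOWN = 3,
} DNXChromaFormat;

/* Extract the chroma format from the header byte (buf[0x2C] in the patch). */
static DNXChromaFormat parse_chroma_format(uint8_t header_byte)
{
    return (DNXChromaFormat)((header_byte >> 5) & 3);
}

/* Number of DCT blocks per macroblock for each chroma format. */
static unsigned blocks_per_mb(DNXChromaFormat cf)
{
    static const uint8_t num_blocks[4] = { 8, 6, 12, 0 };
    assert(cf >= DNX_CHROMAFORMAT_422 && cf <= DNX_CHROMAFORMAT_UNKNOWN);
    return num_blocks[cf];
}
```

Note that bit 6 alone, which the old `is_444` test looked at, is still set for 4:4:4, so existing 4:2:2 and 4:4:4 streams map to the same formats as before.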
---
 libavcodec/dnxhddec.c | 55 +++
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c
index c78d55aee5..359588f963 100644
--- a/libavcodec/dnxhddec.c
+++ b/libavcodec/dnxhddec.c
@@ -49,6 +49,13 @@ typedef struct RowContext {
 int format;
 } RowContext;
 
+typedef enum {
+DNX_CHROMAFORMAT_422 = 0,
+DNX_CHROMAFORMAT_420 = 1,
+DNX_CHROMAFORMAT_444 = 2,
+DNX_CHROMAFORMAT_UNKNOWN = 3,
+} DNXChromaFormat;
+
 typedef struct DNXHDContext {
 AVCodecContext *avctx;
 RowContext *rows;
@@ -67,7 +74,7 @@ typedef struct DNXHDContext {
 ScanTable scantable;
 const CIDEntry *cid_table;
 int bit_depth; // 8, 10, 12 or 0 if not initialized at all.
-int is_444;
+int chromafmt;
 int alpha;
 int lla;
 int mbaff;
@@ -168,6 +175,7 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
const uint8_t *buf, int buf_size,
int first_field)
 {
+static const char* cfname[4] = { "4:2:2", "4:2:0", "4:4:4", "Unknown" };
 int i, cid, ret;
 int old_bit_depth = ctx->bit_depth, bitdepth;
 uint64_t header_prefix;
@@ -234,8 +242,8 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 av_log(ctx->avctx, AV_LOG_WARNING,
"Adaptive color transform in an unsupported profile.\n");
 
-ctx->is_444 = (buf[0x2C] >> 6) & 1;
-if (ctx->is_444) {
+ctx->chromafmt = (buf[0x2C] >> 5) & 3;
+if (ctx->chromafmt == DNX_CHROMAFORMAT_444) {
 if (bitdepth == 8) {
 avpriv_request_sample(ctx->avctx, "4:4:4 8 bits");
 return AVERROR_INVALIDDATA;
@@ -250,16 +258,16 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 }
 } else if (bitdepth == 12) {
 ctx->decode_dct_block = dnxhd_decode_dct_block_12;
-ctx->pix_fmt = AV_PIX_FMT_YUV422P12;
+ctx->pix_fmt = ctx->chromafmt == DNX_CHROMAFORMAT_420 ? AV_PIX_FMT_YUV420P12 : AV_PIX_FMT_YUV422P12;
 } else if (bitdepth == 10) {
 if (ctx->avctx->profile == FF_PROFILE_DNXHR_HQX)
 ctx->decode_dct_block = dnxhd_decode_dct_block_10_444;
 else
 ctx->decode_dct_block = dnxhd_decode_dct_block_10;
-ctx->pix_fmt = AV_PIX_FMT_YUV422P10;
+ctx->pix_fmt = ctx->chromafmt == DNX_CHROMAFORMAT_420 ? AV_PIX_FMT_YUV420P10 : AV_PIX_FMT_YUV422P10;
 } else {
 ctx->decode_dct_block = dnxhd_decode_dct_block_8;
-ctx->pix_fmt = AV_PIX_FMT_YUV422P;
+ctx->pix_fmt = ctx->chromafmt == DNX_CHROMAFORMAT_420 ? AV_PIX_FMT_YUV420P : AV_PIX_FMT_YUV422P;
 }
 
 ctx->avctx->bits_per_raw_sample = ctx->bit_depth = bitdepth;
@@ -292,8 +300,8 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 if ((ctx->height + 15) >> 4 == ctx->mb_height && frame->interlaced_frame)
 ctx->height <<= 1;
 
-av_log(ctx->avctx, AV_LOG_VERBOSE, "%dx%d, 4:%s %d bits, MBAFF=%d ACT=%d\n",
-   ctx->width, ctx->height, ctx->is_444 ? "4:4" : "2:2",
+av_log(ctx->avctx, AV_LOG_VERBOSE, "%dx%d, %s %d bits, MBAFF=%d ACT=%d\n",
+   ctx->width, ctx->height, cfname[ctx->chromafmt],
ctx->bit_depth, ctx->mbaff, ctx->act);
 
 // Newer format supports variable mb_scan_index sizes
@@ -360,7 +368,7 @@ static av_always_inline int dnxhd_decode_dct_block(const DNXHDContext *ctx,
 
 ctx->bdsp.clear_block(block);
 
-if (!ctx->is_444) {
+if (ctx->chromafmt != DNX_CHROMAFORMAT_444) {
 if (n & 2) {
 component = 1 + (n & 1);
 scale = row->chroma_scale;
@@ -478,6 +486,9 @@ static int dnxhd_decode_dct_block_12_444(const DNXHDContext *ctx,
 static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row,
AVFrame *frame, int x, int y)
 {
+static const char yoff[4] = { 1, 0, 1, 0 };
+static const char xoff[4] = { 0, 0, 1, 0 };
+static const uint8_t num_blocks[4] = { 8, 6, 12, 0 };
 int shift1 = ctx->bit_depth >= 10;
 int dct_linesize_luma   = frame->linesize[0];
 int dct_linesize_chroma = frame->linesize[1];
@@ -516,7 +527,7 @@ static int dnxhd_decode_macroblock(const DNXHDContext *ctx, RowContext *row,
 row->last_qscale = qscale;
 }
 
-for (i = 0; i < 8 + 4 * ctx->is_444; i++) {
+for (i = 0; i < num_blocks[ctx->chromafmt]; i++) {
 if (ctx->decode_dct_block(ctx, row, i) < 0)
 return AVERROR_

Re: [FFmpeg-devel] [PATCH] [RFC] Tech Resolution Process

2020-12-06 Thread Christophe Gisquet
Hi,

On Sat, 5 Dec 2020 at 15:59, Jean-Baptiste Kempf wrote:
> +After all the emails are in, the TC has 96 hours to give its final decision.
> +
> +### Within TC
> +
> +In the internal case, the TC has 96 hours to give its final decision.

How is the unavailability of any TC member handled? What about a
quorum? Would you have deputy ("fallback") TC members then?
The unavailability may simply be due to a weekend falling within this
96-hour period, or to holidays (bank holidays or otherwise).

-- 
Christophe

Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding

2020-11-18 Thread Christophe Gisquet
Hi,

On Thu, 29 Oct 2020 at 14:57, Christophe Gisquet wrote:
> Hi, as you are the only one active on this decoder, this shouldn't matter, 
> but:
> down the line, the ffmpeg project has no way of testing if someone
> breaks even the basic parsing of these extensions in the future.
> To test, the hardware you mention is needed, as well as maybe specific tests.
>
> At some point, fate lacks some support for verifying h/w decoding. It
> would be really nice if some of these companies with all this new
> awesome hardware would consider this, and for instance contribute fate
> instances to perform such test for the ffmpeg project.

Ping? This is irrespective of the patchset being accepted or not. I
just wish people were aware of the long-term stakes.

Thanks anyway for investing the time for what you've already done.
-- 
Christophe

Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding

2020-10-29 Thread Christophe Gisquet
Forgot to add this:

On Thu, 29 Oct 2020 at 14:51, Christophe Gisquet wrote:
> > [1] https://github.com/oddstone/FFmpeg/commits/rext1
>
> This has additional fixes (which look good, though I haven't really delved
> into them) that unfortunately don't fix:

And I suspect you need these entropy state fixes (and other ones) to
be before adding any such tools, because these tools will not pass
some fate testing without them.

-- 
Christophe

Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding

2020-10-29 Thread Christophe Gisquet
Hi,

On Tue, 29 Sep 2020 at 17:55, Linjie Fu wrote:
> I didn’t see such plans for now, hence adding sufficient error message
> seems to be a proper way.

Hi, as you are the only one active on this decoder, this shouldn't matter, but:
down the line, the ffmpeg project has no way of testing if someone
breaks even the basic parsing of these extensions in the future.
To test, the hardware you mention is needed, as well as maybe specific tests.

At some point, fate lacks some support for verifying h/w decoding. It
would be really nice if some of these companies with all this new
awesome hardware would consider this, and for instance contribute fate
instances to perform such tests for the ffmpeg project.

-- 
Christophe

Re: [FFmpeg-devel] [PATCH v2 0/7] HEVC native support for Screen content coding

2020-10-29 Thread Christophe Gisquet
Hi,

On Fri, 2 Oct 2020 at 18:12, Guangxin Xu wrote:
> Most of scc conformance clip has tiles.
> But currently, the hevc software decoder has many issues for tile cabac
> saving and loading.
> We'd better fix them before starting implement scc tool.
>
> I have queue up some patches to address the cabac issue at [1] and send the
> first one to review at [2]
> but, no one responded to me yet. Do you know who can help review the patch?
> thanks
>
> [1] https://github.com/oddstone/FFmpeg/commits/rext1

This has additional fixes (which look good, though I haven't really delved
into them) that unfortunately don't fix:

> [2]
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200829055218.32261-1-oddst...@gmail.com/

this patch being run under fate with THREADS=12 THREAD_TYPE=slice

--
Christophe

Re: [FFmpeg-devel] [PATCH 2/2] fate/hevc-conformance: add clip for persistent_rice_adaptation_enabled_flag

2020-10-29 Thread Christophe Gisquet
On Sat, 29 Aug 2020 at 07:52, Xu Guangxin wrote:
> you can download it from:
> https://www.itu.int/wftp3/av-arch/jctvc-site/bitstream_exchange/draft_conformance/RExt/WPP_HIGH_TP_444_8BIT_RExt_Apple_2.bit

Just for the record, this is now
https://www.itu.int/wftp3/av-arch/jctvc-site/bitstream_exchange/draft_conformance/RExt/WPP_HIGH_TP_444_8BIT_RExt_Apple_2.zip

-- 
Christophe

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/hevcdec: fix stat_coeff save/load for persistent_rice_adaptation_enabled_flag

2020-10-29 Thread Christophe Gisquet
On Wed, 9 Sep 2020 at 07:51, Guangxin Xu wrote:
> Hi Mickaël & all,
> any suggestions?

The patch is almost good, though I would have hoped for a link to the
relevant part of the specs and to TableStatCoeff*, beyond just "9.3".

Though as I suspected, there is probably something missing. Maybe
around sync across (tile) threads?

You can check for yourself by running it like:
make fate-hevc-conformance-WPP_HIGH_TP_444_8BIT_RExt_Apple_2 THREADS=12 THREAD_TYPE=slice

Sure enough, the MD5 becomes random:
-0,  0,  0,1,  1179648, 0x78e55a69
-0,  1,  1,1,  1179648, 0x5babb3cb
-0,  2,  2,1,  1179648, 0x65935648
+0,  0,  0,1,  1179648, 0xa9a4a727
+0,  1,  1,1,  1179648, 0xf4bfd32d
+0,  2,  2,1,  1179648, 0x4f28807a

Re: [FFmpeg-devel] [PATCH 2/7] get_bits: support 32bits cache

2020-04-15 Thread Christophe Gisquet
Hi,

On Wed, 15 Apr 2020 at 00:41, Carl Eugen Hoyos wrote:
> Will test on ppc32 over the weekend.

Please do. Testing on different endianness and different arch is
probably what this patchset lacks the most.

If you can, on this arch, please test just before and just after
"0006-get_bits-change-refill-to-RAD-pattern.patch"
The impact on x86 was non-trivial, but it may behave quite differently
(better?) on PPC.

Also, regarding the benchmarking, provided there is an encoder (either
in ffmpeg or VfW), I use two videos: a soft one, and the same soft
video with grain/noise added on top of it (see e.g. an ffmpeg filter).
As the bitstream reader gets faster, the difference in decoding speed
between the two may increase.

Thanks,
-- 
Christophe

Re: [FFmpeg-devel] [PATCH 6/7] get_bits: change refill to RAD pattern

2020-04-15 Thread Christophe Gisquet
Hi,

On Tue, 14 Apr 2020 at 12:25, Christophe Gisquet wrote:
>  if (is_le)
> -s->cache |= (cache_type)AV_RL_HALF(s->ptr) << s->bits_left;
> +s->cache |= (cache_type)AV_RL_ALL(s->ptr) << s->bits_left;
>  else
> -s->cache |= (cache_type)AV_RB_HALF(s->ptr) << (BITSTREAM_HBITS - s->bits_left);

After this, AV_R*_HALF becomes unused, so I'll update the patch to
remove them, in addition to any change asked/suggested during review.

-- 
Christophe

[FFmpeg-devel] [PATCH 6/7] get_bits: change refill to RAD pattern

2020-04-14 Thread Christophe Gisquet
Described as variant 4 in the linked article.
This results in faster and smaller code. Also, the "refill_all" cases
(usually when we want to empty or fill the cache) have been inlined.
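
For readers who do not have the article at hand, the refill pattern looks roughly like this. It is a standalone sketch of the idea, not the code from this patch: it uses a 64-bit LSB-first cache, `BitReader`, `refill` and `read_bits` are made-up names, and it assumes at least 8 readable bytes after `ptr` at all times (for example a padded input buffer), so the end-of-buffer check is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const uint8_t *ptr;   /* next byte to load; needs 8 readable bytes */
    uint64_t cache;       /* LSB-first bit reservoir */
    unsigned bits_left;   /* number of valid bits in cache, 0..63 */
} BitReader;

/* Portable little-endian 64-bit load. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Read, Advance, reDuce: always load a full word, then advance only by
 * the whole bytes that fit, leaving 56..63 valid bits. Bits loaded beyond
 * bits_left are reloaded identically next time, so the OR is harmless. */
static void refill(BitReader *br)
{
    br->cache     |= load_le64(br->ptr) << br->bits_left;
    br->ptr       += (63 - br->bits_left) >> 3;
    br->bits_left |= 56;
}

static uint32_t read_bits(BitReader *br, unsigned n)
{
    assert(n >= 1 && n <= 32);
    if (n > br->bits_left)
        refill(br);
    uint32_t v = (uint32_t)(br->cache & ((UINT64_C(1) << n) - 1));
    br->cache    >>= n;
    br->bits_left -= n;
    return v;
}
```

The appeal is that the refill has no data-dependent branch and no separate "half" and "all" paths, which fits the code size reduction mentioned above.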
---
 libavcodec/get_bits.h | 103 +-
 1 file changed, 41 insertions(+), 62 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index da054ebfcb..baff86ecf6 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -153,11 +153,7 @@ static inline unsigned int show_bits(GetBitContext *s, int n);
  */
 
 #if CACHED_BITSTREAM_READER
-# if BITSTREAM_BITS == 32
-#   define MIN_CACHE_BITS (32-7)
-# else
-#   define MIN_CACHE_BITS  32
-# endif
+#   define MIN_CACHE_BITS (BITSTREAM_BITS-7)
 #elif defined LONG_BITSTREAM_READER
 #   define MIN_CACHE_BITS 32
 #else
@@ -262,46 +258,21 @@ static inline int get_bits_count(const GetBitContext *s)
 }
 
 #if CACHED_BITSTREAM_READER
-static inline void refill_half(GetBitContext *s, int is_le)
+// See variant 4 in the following article:
+// https://fgiesen.wordpress.com/2018/02/20/reading-bits-in-far-too-many-ways-part-2/
+static inline void refill_gb(GetBitContext *s, int is_le)
 {
 #if !UNCHECKED_BITSTREAM_READER
 if (s->ptr >= s->buffer_end)
 return;
 #endif
 
-#if BITSTREAM_BITS == 32
-if (s->bits_left > 16) {
-if (is_le)
-s->cache |= (uint32_t)s->ptr[0] << s->bits_left;
-else
-s->cache |= (uint32_t)s->ptr[0] << (32 - s->bits_left);
-s->ptr++;
-s->bits_left += 8;
-return;
-}
-#endif
-
 if (is_le)
-s->cache |= (cache_type)AV_RL_HALF(s->ptr) << s->bits_left;
+s->cache |= (cache_type)AV_RL_ALL(s->ptr) << s->bits_left;
 else
-s->cache |= (cache_type)AV_RB_HALF(s->ptr) << (BITSTREAM_HBITS - s->bits_left);
-s->ptr   += sizeof(s->cache)/2;
-s->bits_left += BITSTREAM_HBITS;
-}
-
-static inline void refill_all(GetBitContext *s, int is_le)
-{
-#if !UNCHECKED_BITSTREAM_READER
-if (s->ptr >= s->buffer_end)
-return;
-#endif
-
-if (is_le)
-s->cache = AV_RL_ALL(s->ptr);
-else
-s->cache = AV_RB_ALL(s->ptr);
-s->ptr += sizeof(s->cache);
-s->bits_left = BITSTREAM_BITS;
+s->cache |= (cache_type)AV_RB_ALL(s->ptr) >> s->bits_left;
+s->ptr   += (BITSTREAM_BITS-1 - s->bits_left) >> 3;
+s->bits_left |= BITSTREAM_BITS-8;
 }
 
 static inline cache_type get_val(GetBitContext *s, unsigned n, int is_le)
@@ -374,9 +345,9 @@ static inline int get_xbits(GetBitContext *s, int n)
 
 if (n > s->bits_left)
 #ifdef BITSTREAM_READER_LE
-refill_half(s, 1);
+refill_gb(s, 1);
 #else
-refill_half(s, 0);
+refill_gb(s, 0);
 #endif
 
 #if BITSTREAM_BITS == 32
@@ -448,9 +419,9 @@ static inline unsigned int get_bits(GetBitContext *s, int n)
 av_assert2(n>0 && n<=32);
 if (n > s->bits_left) {
 #ifdef BITSTREAM_READER_LE
-refill_half(s, 1);
+refill_gb(s, 1);
 #else
-refill_half(s, 0);
+refill_gb(s, 0);
 #endif
 if (s->bits_left < BITSTREAM_HBITS)
 s->bits_left = n;
@@ -486,7 +457,7 @@ static inline unsigned int get_bits_le(GetBitContext *s, int n)
 #if CACHED_BITSTREAM_READER
 av_assert2(n>0 && n<=32);
 if (n > s->bits_left) {
-refill_half(s, 1);
+refill_gb(s, 1);
 if (s->bits_left < BITSTREAM_HBITS)
 s->bits_left = n;
 }
@@ -513,9 +484,9 @@ static inline unsigned int show_bits(GetBitContext *s, int n)
 #if CACHED_BITSTREAM_READER
 if (n > s->bits_left)
 #ifdef BITSTREAM_READER_LE
-refill_half(s, 1);
+refill_gb(s, 1);
 #else
-refill_half(s, 0);
+refill_gb(s, 0);
 #endif
 
 tmp = show_val(s, n);
@@ -535,7 +506,6 @@ static inline void skip_bits(GetBitContext *s, int n)
 skip_remaining(s, n);
 else {
 n -= s->bits_left;
-s->cache = 0;
 
 if (n >= BITSTREAM_BITS) {
 unsigned skip = n / 8;
@@ -543,11 +513,14 @@ static inline void skip_bits(GetBitContext *s, int n)
 n -= 8*skip;
 s->ptr += skip;
 }
+
 #ifdef BITSTREAM_READER_LE
-refill_all(s, 1);
+s->cache = AV_RL_ALL(s->ptr);
 #else
-refill_all(s, 0);
+s->cache = AV_RB_ALL(s->ptr);
 #endif
+s->ptr  += sizeof(cache_type);
+s->bits_left = BITSTREAM_BITS;
 if (n)
 skip_remaining(s, n);
 }
@@ -561,12 +534,15 @@ static inline void skip_bits(GetBitContext *s, int n)
 static inline unsigned int get_bits1(GetBitContext *s)
 {
 #if CACHED_BITSTREAM_READER
-if (!s->bits_left)
+if (!s->bits_left) {
 #ifdef BITSTREAM_READER_LE
-refill_all(s, 1);
+s->cache = AV_RL_ALL(s->ptr);
 #else
-refill_all(s, 0);
+s->cache = AV_RB_ALL(s->ptr);
 #endif
+s->ptr  += sizeof(cache_type);
+s->bits_left = BITSTREAM_BITS;
+}
 
 #ifdef BITSTREAM_READER_LE
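
For readers without the article at hand, here is a minimal sketch of the
refill idea (variant 4), not the patch itself. It assumes a 64-bit cache, an
LSB-first (little-endian) stream, and at least 8 readable bytes of padding
after the last byte; all toy_* names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const uint8_t *ptr;    /* next byte to load into the cache */
    uint64_t cache;        /* LSB-first: the next bit is bit 0 */
    unsigned bits_left;    /* number of valid bits in the cache */
} ToyReader;

/* Portable little-endian 64-bit load (hypothetical helper). */
static uint64_t toy_load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

static void toy_init(ToyReader *r, const uint8_t *buf)
{
    r->ptr       = buf;
    r->cache     = 0;
    r->bits_left = 0;
}

/* Branch-free refill: always load a full word, advance by whole bytes
 * only, and round the bit count up to 56..63. */
static void toy_refill(ToyReader *r)
{
    r->cache     |= toy_load_le64(r->ptr) << r->bits_left;
    r->ptr       += (63 - r->bits_left) >> 3;
    r->bits_left |= 56;
}

/* Read 1..32 bits. */
static uint32_t toy_get_bits(ToyReader *r, unsigned n)
{
    uint32_t v;
    assert(n >= 1 && n <= 32);
    if (n > r->bits_left)
        toy_refill(r);
    v = (uint32_t)(r->cache & ((UINT64_C(1) << n) - 1));
    r->cache    >>= n;
    r->bits_left -= n;
    return v;
}
```

The three statements in toy_refill mirror the three lines added to refill_gb
in the diff. Bits loaded beyond bits_left are harmless, since the next refill
ORs the same bytes over them.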
 

[FFmpeg-devel] [PATCH 4/7] get_bits: replace index by an incremented pointer

2020-04-14 Thread Christophe Gisquet
The main effect is actually code size reduction, due to the smaller
refill code (or difference in inlining decision), e.g. on Win32 of
{magicyuv,huffyuvdec,utvideodec}.o as follows:
19068/41460/16512  ->  18892/40760/16448

It should also be a small speedup (because it simplifies the address
computation), but no change was measured.
---
 libavcodec/get_bits.h | 43 +--
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index 59bfbdd88b..4f75f9dd84 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -91,10 +91,12 @@ typedef uint32_t cache_type;
 typedef struct GetBitContext {
 const uint8_t *buffer, *buffer_end;
 #if CACHED_BITSTREAM_READER
+const uint8_t *ptr;
 cache_type cache;
 unsigned bits_left;
+#else
+int index; // Cached version advances ptr instead
 #endif
-int index;
 int size_in_bits;
 int size_in_bits_plus8;
 } GetBitContext;
@@ -253,7 +255,7 @@ static inline unsigned int show_bits(GetBitContext *s, int n);
 static inline int get_bits_count(const GetBitContext *s)
 {
 #if CACHED_BITSTREAM_READER
-return s->index - s->bits_left;
+return 8*(s->ptr - s->buffer) - s->bits_left;
 #else
 return s->index;
 #endif
@@ -263,42 +265,42 @@ static inline int get_bits_count(const GetBitContext *s)
 static inline void refill_half(GetBitContext *s, int is_le)
 {
 #if !UNCHECKED_BITSTREAM_READER
-if (s->index >> 3 >= s->buffer_end - s->buffer)
+if (s->ptr >= s->buffer_end)
 return;
 #endif
 
 #if BITSTREAM_BITS == 32
 if (s->bits_left > 16) {
 if (is_le)
-s->cache |= (uint32_t)s->buffer[s->index >> 3] << s->bits_left;
+s->cache |= (uint32_t)s->ptr[0] << s->bits_left;
 else
-s->cache |= (uint32_t)s->buffer[s->index >> 3] << (32 - s->bits_left);
-s->index += 8;
+s->cache |= (uint32_t)s->ptr[0] << (32 - s->bits_left);
+s->ptr++;
 s->bits_left += 8;
 return;
 }
 #endif
 
 if (is_le)
-s->cache |= (cache_type)AV_RL_HALF(s->buffer + (s->index >> 3)) << s->bits_left;
+s->cache |= (cache_type)AV_RL_HALF(s->ptr) << s->bits_left;
 else
-s->cache |= (cache_type)AV_RB_HALF(s->buffer + (s->index >> 3)) << (BITSTREAM_HBITS - s->bits_left);
-s->index += BITSTREAM_HBITS;
+s->cache |= (cache_type)AV_RB_HALF(s->ptr) << (BITSTREAM_HBITS - s->bits_left);
+s->ptr   += sizeof(s->cache)/2;
 s->bits_left += BITSTREAM_HBITS;
 }
 
 static inline void refill_all(GetBitContext *s, int is_le)
 {
 #if !UNCHECKED_BITSTREAM_READER
-if (s->index >> 3 >= s->buffer_end - s->buffer)
+if (s->ptr >= s->buffer_end)
 return;
 #endif
 
 if (is_le)
-s->cache = AV_RL_ALL(s->buffer + (s->index >> 3));
+s->cache = AV_RL_ALL(s->ptr);
 else
-s->cache = AV_RB_ALL(s->buffer + (s->index >> 3));
-s->index += BITSTREAM_BITS;
+s->cache = AV_RB_ALL(s->ptr);
+s->ptr += sizeof(s->cache);
 s->bits_left = BITSTREAM_BITS;
 }
 
@@ -534,13 +536,12 @@ static inline void skip_bits(GetBitContext *s, int n)
 else {
 n -= s->bits_left;
 s->cache = 0;
-s->bits_left = 0;
 
 if (n >= BITSTREAM_BITS) {
-unsigned skip = (n / 8) * 8;
+unsigned skip = n / 8;
 
-n -= skip;
-s->index += skip;
+n -= 8*skip;
+s->ptr += skip;
 }
 #ifdef BITSTREAM_READER_LE
 refill_all(s, 1);
@@ -699,12 +700,14 @@ static inline int init_get_bits_xe(GetBitContext *s, const uint8_t *buffer,
 s->size_in_bits   = bit_size;
 s->size_in_bits_plus8 = bit_size + 8;
 s->buffer_end = buffer + buffer_size;
-s->index  = 0;
 
 #if CACHED_BITSTREAM_READER
+s->ptr= buffer;
 s->cache  = 0;
 s->bits_left  = 0;
 refill_all(s, is_le);
+#else
+s->index  = 0;
 #endif
 
 return ret;
@@ -757,7 +760,11 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
 int n = -get_bits_count(s) & 7;
 if (n)
 skip_bits(s, n);
+#if CACHED_BITSTREAM_READER
+return s->ptr;
+#else
 return s->buffer + (s->index >> 3);
+#endif
 }
 
 /**
-- 
2.26.0
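
As an illustration of the bookkeeping change in get_bits_count, here is a
small sketch of how the bit position can be derived from a byte pointer
instead of a bit index. ToyPos and toy_bits_count are hypothetical names, and
the struct only keeps the fields needed for the computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buffer;   /* start of the stream */
    const uint8_t *ptr;      /* next byte to load into the cache */
    unsigned bits_left;      /* bits loaded but not yet consumed */
} ToyPos;

/* Bits consumed so far: everything loaded (8 bits per byte passed),
 * minus what is still waiting in the cache. */
static ptrdiff_t toy_bits_count(const ToyPos *p)
{
    assert(p->ptr >= p->buffer);
    return 8 * (p->ptr - p->buffer) - (ptrdiff_t)p->bits_left;
}
```

The cast keeps the subtraction signed even where ptrdiff_t is no wider than
unsigned.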


[FFmpeg-devel] [PATCH 7/7] get_bits: use immediate in skip_remaining

2020-04-14 Thread Christophe Gisquet
When the table entry says to continue reading, all the bits of the
current read are consumed, so the skip can use the immediate 'bits'.
Small object size reduction, depending on inlining.
---
 libavcodec/get_bits.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index baff86ecf6..d1e29b9917 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -793,7 +793,7 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
 code  = table[index][0];\
 n = table[index][1];\
 if (max_depth > 2 && n < 0) {   \
-LAST_SKIP_BITS(name, gb, nb_bits);  \
+LAST_SKIP_BITS(name, gb, bits); \
 UPDATE_CACHE(name, gb); \
 \
 nb_bits = -n;   \
@@ -878,7 +878,7 @@ static av_always_inline int get_vlc2(GetBitContext *s, VLC_TYPE (*table)[2],
 skip_remaining(s, bits);
 code = set_idx(s, code, &n, &nb_bits, table);
 if (max_depth > 2 && n < 0) {
-skip_remaining(s, nb_bits);
+skip_remaining(s, bits);
 code = set_idx(s, code, &n, &nb_bits, table);
 }
 }
-- 
2.26.0


[FFmpeg-devel] [PATCH 5/7] get_bits: improve and fix get_bits_long for 32b

2020-04-14 Thread Christophe Gisquet
The new code is guaranteed to read at least 32 bits, which is likely OK given
the usual case that get_bits without cache can read up to 25.
---
 libavcodec/get_bits.h | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index 4f75f9dd84..da054ebfcb 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -608,21 +608,44 @@ static inline void skip_bits1(GetBitContext *s)
  */
 static inline unsigned int get_bits_long(GetBitContext *s, int n)
 {
+unsigned ret = 0;
 av_assert2(n>=0 && n<=32);
 if (!n) {
 return 0;
 #if CACHED_BITSTREAM_READER
 }
-return get_bits(s, n);
+
+# ifdef BITSTREAM_READER_LE
+unsigned left = 0;
+# endif
+if (n > s->bits_left) {
+n -= s->bits_left;
+# ifdef BITSTREAM_READER_LE
+left = s->bits_left;
+ret = get_val(s, s->bits_left, 1);
+refill_all(s, 1);
+# else
+ret = get_val(s, s->bits_left, 0);
+refill_all(s, 0);
+# endif
+}
+
+#ifdef BITSTREAM_READER_LE
+ret = get_val(s, n, 1) << left | ret;
+#else
+ret = get_val(s, n, 0) | ret << n;
+#endif
+
+return ret;
 #else
 } else if (n <= MIN_CACHE_BITS) {
 return get_bits(s, n);
 } else {
 #ifdef BITSTREAM_READER_LE
-unsigned ret = get_bits(s, 16);
+ret = get_bits(s, 16);
 return ret | (get_bits(s, n - 16) << 16);
 #else
-unsigned ret = get_bits(s, 16) << (n - 16);
+ret = get_bits(s, 16) << (n - 16);
 return ret | get_bits(s, n - 16);
 #endif
 }
-- 
2.26.0
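
To make the split read easier to follow, here is a minimal sketch of the same
idea on a toy MSB-first reader with a 32-bit cache: drain what is left, refill
the whole cache, then read the remainder and merge. The ToyReader32 type and
the toy32_* helpers are hypothetical, and the sketch assumes the buffer has at
least 4 readable bytes at every refill.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const uint8_t *ptr;    /* next byte to load */
    uint32_t cache;        /* MSB-first: the next bit is bit 31 */
    unsigned bits_left;    /* number of valid bits in the cache */
} ToyReader32;

/* Replace the whole cache with the next 4 bytes (big-endian). */
static void toy32_refill_all(ToyReader32 *r)
{
    r->cache = (uint32_t)r->ptr[0] << 24 | (uint32_t)r->ptr[1] << 16 |
               (uint32_t)r->ptr[2] <<  8 | (uint32_t)r->ptr[3];
    r->ptr      += 4;
    r->bits_left = 32;
}

/* Take n bits (1..bits_left) from the top of the cache. */
static uint32_t toy32_take(ToyReader32 *r, unsigned n)
{
    uint32_t v;
    assert(n >= 1 && n <= r->bits_left);
    v = r->cache >> (32 - n);
    r->cache = n < 32 ? r->cache << n : 0;   /* avoid a shift by 32 */
    r->bits_left -= n;
    return v;
}

/* Read 0..32 bits, even when they straddle a refill. */
static uint32_t toy32_get_bits_long(ToyReader32 *r, unsigned n)
{
    uint32_t hi = 0;
    assert(n <= 32);
    if (!n)
        return 0;
    if (n > r->bits_left) {
        unsigned have = r->bits_left;
        if (have) {
            hi = toy32_take(r, have);   /* high part from the old cache */
            n -= have;
        }
        toy32_refill_all(r);
    }
    /* hi is 0 whenever n is still 32, so the guarded shift is safe. */
    return (n < 32 ? hi << n : 0) | toy32_take(r, n);
}
```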


[FFmpeg-devel] [PATCH 0/7] Port cache bitstream reader to 32bits, and improve

2020-04-14 Thread Christophe Gisquet
This patch series gathers all changes affecting the cached reader and
the file get_bits.h.
The largest change modifies the cached reader so that the cache can be
selected to be (native) 32 bits wide. Then, because of corner cases in
various codecs, some reads need to be reduced and some functions that
cannot guarantee the usual number of bits need to be fixed.

Note: the MVHA sample was generated using the pattern generation from
VirtualDub2 (Tools->Create test video->zone plates) and the MVHA codec,
and is 235186 bytes.

Christophe Gisquet (7):
  fate: add a MVHA test
  get_bits: support 32bits cache
  get_xbits: request fewer bits
  get_bits: replace index by an incremented pointer
  get_bits: improve and fix get_bits_long for 32b
  get_bits: change refill to RAD pattern
  get_bits: use immediate in skip_remaining

 libavcodec/get_bits.h   | 193 +++-
 libavcodec/mvha.c   |   2 +-
 libavcodec/utvideodec.c |   2 +-
 tests/fate/video.mak|   3 +
 tests/ref/fate/mvha |   6 ++
 5 files changed, 143 insertions(+), 63 deletions(-)
 create mode 100644 tests/ref/fate/mvha

-- 
2.26.0


[FFmpeg-devel] [PATCH 1/7] fate: add a MVHA test

2020-04-14 Thread Christophe Gisquet
---
 tests/fate/video.mak | 3 +++
 tests/ref/fate/mvha  | 6 ++
 2 files changed, 9 insertions(+)
 create mode 100644 tests/ref/fate/mvha

diff --git a/tests/fate/video.mak b/tests/fate/video.mak
index d2d43e518d..8e54718c16 100644
--- a/tests/fate/video.mak
+++ b/tests/fate/video.mak
@@ -364,6 +364,9 @@ fate-xxan-wc4: CMD = framecrc -i $(TARGET_SAMPLES)/wc4-xan/wc4trailer-partial.av
 FATE_VIDEO-$(call DEMDEC, WAV, SMVJPEG) += fate-smvjpeg
 fate-smvjpeg: CMD = framecrc -idct simple -flags +bitexact -i $(TARGET_SAMPLES)/smv/clock.smv -an
 
+FATE_VIDEO-$(call DEMDEC, AVI, MVHA) += fate-mvha
+fate-mvha: CMD = framecrc -i $(TARGET_SAMPLES)/mvha/mvha.avi -an
+
 FATE_VIDEO += $(FATE_VIDEO-yes)
 
 FATE_SAMPLES_FFMPEG += $(FATE_VIDEO)
diff --git a/tests/ref/fate/mvha b/tests/ref/fate/mvha
new file mode 100644
index 0000000000..3635f97730
--- /dev/null
+++ b/tests/ref/fate/mvha
@@ -0,0 +1,6 @@
+#tb 0: 1001/3
+#media_type 0: video
+#codec_id 0: rawvideo
+#dimensions 0: 640x480
+#sar 0: 0/1
+0,  0,  0,1,   614400, 0xff8fb84b
-- 
2.26.0


[FFmpeg-devel] [PATCH 2/7] get_bits: support 32bits cache

2020-04-14 Thread Christophe Gisquet
Therefore, also activate it under ARCH_X86 (testing on more archs welcome)
for the only codecs supporting said cached reader.
For UTVideo, on 8-bit samples and ARCH_X86_32 (X86_64 being unaffected),
timings for one line go ~19.4k -> 15.1k and 16.5k (roughly 17% speedup).
---
 libavcodec/get_bits.h   | 110 
 libavcodec/mvha.c   |   2 +-
 libavcodec/utvideodec.c |   2 +-
 3 files changed, 80 insertions(+), 34 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index 66fb877599..cb4df98e54 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -58,10 +58,40 @@
 #define CACHED_BITSTREAM_READER 0
 #endif
 
+#if CACHED_BITSTREAM_READER
+
+# ifndef BITSTREAM_BITS
+#   if HAVE_FAST_64BIT || defined(LONG_BITSTREAM_READER)
+# define BITSTREAM_BITS   64
+#   else
+# define BITSTREAM_BITS   32
+#   endif
+# endif
+
+# if BITSTREAM_BITS == 64
+#   define BITSTREAM_HBITS  32
+typedef uint64_t cache_type;
+#   define AV_RB_ALL  AV_RB64
+#   define AV_RL_ALL  AV_RL64
+#   define AV_RB_HALF AV_RB32
+#   define AV_RL_HALF AV_RL32
+#   define CACHE_TYPE(a) UINT64_C(a)
+# else
+#   define BITSTREAM_HBITS  16
+typedef uint32_t cache_type;
+#   define AV_RB_ALL  AV_RB32
+#   define AV_RL_ALL  AV_RL32
+#   define AV_RB_HALF AV_RB16
+#   define AV_RL_HALF AV_RL16
+#   define CACHE_TYPE(a) UINT32_C(a)
+#endif
+
+#endif
+
 typedef struct GetBitContext {
 const uint8_t *buffer, *buffer_end;
 #if CACHED_BITSTREAM_READER
-uint64_t cache;
+cache_type cache;
 unsigned bits_left;
 #endif
 int index;
@@ -121,7 +151,11 @@ static inline unsigned int show_bits(GetBitContext *s, int n);
  */
 
 #if CACHED_BITSTREAM_READER
-#   define MIN_CACHE_BITS 64
+# if BITSTREAM_BITS == 32
+#   define MIN_CACHE_BITS (32-7)
+# else
+#   define MIN_CACHE_BITS  32
+# endif
 #elif defined LONG_BITSTREAM_READER
 #   define MIN_CACHE_BITS 32
 #else
@@ -226,22 +260,34 @@ static inline int get_bits_count(const GetBitContext *s)
 }
 
 #if CACHED_BITSTREAM_READER
-static inline void refill_32(GetBitContext *s, int is_le)
+static inline void refill_half(GetBitContext *s, int is_le)
 {
 #if !UNCHECKED_BITSTREAM_READER
 if (s->index >> 3 >= s->buffer_end - s->buffer)
 return;
 #endif
 
+#if BITSTREAM_BITS == 32
+if (s->bits_left > 16) {
+if (is_le)
+s->cache |= (uint32_t)s->buffer[s->index >> 3] << s->bits_left;
+else
+s->cache |= (uint32_t)s->buffer[s->index >> 3] << (32 - s->bits_left);
+s->index += 8;
+s->bits_left += 8;
+return;
+}
+#endif
+
 if (is_le)
-s->cache = (uint64_t)AV_RL32(s->buffer + (s->index >> 3)) << s->bits_left | s->cache;
+s->cache |= (cache_type)AV_RL_HALF(s->buffer + (s->index >> 3)) << s->bits_left;
 else
-s->cache = s->cache | (uint64_t)AV_RB32(s->buffer + (s->index >> 3)) << (32 - s->bits_left);
-s->index += 32;
-s->bits_left += 32;
+s->cache |= (cache_type)AV_RB_HALF(s->buffer + (s->index >> 3)) << (BITSTREAM_HBITS - s->bits_left);
+s->index += BITSTREAM_HBITS;
+s->bits_left += BITSTREAM_HBITS;
 }
 
-static inline void refill_64(GetBitContext *s, int is_le)
+static inline void refill_all(GetBitContext *s, int is_le)
 {
 #if !UNCHECKED_BITSTREAM_READER
 if (s->index >> 3 >= s->buffer_end - s->buffer)
@@ -249,22 +295,22 @@ static inline void refill_64(GetBitContext *s, int is_le)
 #endif
 
 if (is_le)
-s->cache = AV_RL64(s->buffer + (s->index >> 3));
+s->cache = AV_RL_ALL(s->buffer + (s->index >> 3));
 else
-s->cache = AV_RB64(s->buffer + (s->index >> 3));
-s->index += 64;
-s->bits_left = 64;
+s->cache = AV_RB_ALL(s->buffer + (s->index >> 3));
+s->index += BITSTREAM_BITS;
+s->bits_left = BITSTREAM_BITS;
 }
 
-static inline uint64_t get_val(GetBitContext *s, unsigned n, int is_le)
+static inline cache_type get_val(GetBitContext *s, unsigned n, int is_le)
 {
-uint64_t ret;
+cache_type ret;
 av_assert2(n>0 && n<=63);
 if (is_le) {
-ret = s->cache & ((UINT64_C(1) << n) - 1);
+ret = s->cache & ((CACHE_TYPE(1) << n) - 1);
 s->cache >>= n;
 } else {
-ret = s->cache >> (64 - n);
+ret = s->cache >> (BITSTREAM_BITS - n);
 s->cache <<= n;
 }
 s->bits_left -= n;
@@ -274,12 +320,12 @@ static inline uint64_t get_val(GetBitContext *s, unsigned n, int is_le)
 static inline unsigned show_val(const GetBitContext *s, unsigned n)
 {
 #ifdef BITSTREAM_READER_LE
-return s->cache & ((UINT64_C(1) << n) - 1);
+return s->cache & ((CACHE_TYPE(1) << n) - 1);
 #else
-return s->cache >> (64 - n);
+return s->cache >> (BITSTREAM_BITS - n);
 #endif
 }
-#endif
+#endif // ~CACHED_BITSTREAM_READER
 
 /**
  * Skips the specified number of bits.
@@ -384,11 +430,11 @@ static inline unsigned int get_bits(GetBitContext *s, int n)
 av_assert2(n>0 && n<=32);
 if (n > 

[FFmpeg-devel] [PATCH 3/7] get_xbits: request fewer bits

2020-04-14 Thread Christophe Gisquet
Also allows it to not break 32bits readers.
---
 libavcodec/get_bits.h | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index cb4df98e54..59bfbdd88b 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -367,8 +367,24 @@ static inline void skip_remaining(GetBitContext *s, unsigned n)
 static inline int get_xbits(GetBitContext *s, int n)
 {
 #if CACHED_BITSTREAM_READER
-int32_t cache = show_bits(s, 32);
-int sign = ~cache >> 31;
+int32_t cache;
+int sign;
+
+if (n > s->bits_left)
+#ifdef BITSTREAM_READER_LE
+refill_half(s, 1);
+#else
+refill_half(s, 0);
+#endif
+
+#if BITSTREAM_BITS == 32
+cache = s->cache;
+#elif defined(BITSTREAM_READER_LE)
+cache = s->cache & 0xFFFFFFFF;
+#else
+cache = s->cache >> 32;
+#endif
+sign = ~cache >> 31;
 skip_remaining(s, n);
 
 return ((((uint32_t)(sign ^ cache)) >> (32 - n)) ^ sign) - sign;
-- 
2.26.0
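
For reference, here is a hedged sketch of the value mapping that the return
expression in get_xbits computes from the top n bits of a 32-bit word: a set
leading bit means the bits are the value itself, a clear leading bit means the
value is minus the inverted bits. toy_xbits is a made-up name, and the sketch
deliberately avoids right-shifting a negative integer, which the original
expression relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a signed value from the top n bits (1..31) of cache. */
static int toy_xbits(uint32_t cache, int n)
{
    int negative;
    uint32_t mag;

    assert(n >= 1 && n <= 31);
    negative = !(cache >> 31);                       /* leading 0 => negative */
    mag = (negative ? ~cache : cache) >> (32 - n);   /* magnitude from n bits */
    return negative ? -(int)mag : (int)mag;
}
```

With n = 3, the bit pattern 101 decodes to 5 and 010 decodes to -5, so the
representable values are -7..-4 and 4..7.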


Re: [FFmpeg-devel] IRC meeting

2016-06-03 Thread Christophe Gisquet
Hi,

sorry if I am or was confusing, I'm doing this on a best-effort basis.

2016-06-03 21:13 GMT+02:00 Michael Niedermayer :
> > FOUR.TWO)
[...]
> i want some assistent to help with dayly server admin duties
> most root admins we have help and contribute but are often busy
> raz recently set up a full backup system for us, someone seems
> helping with security updates as iam not always the first doing them
> (i think its lou but didnt check) and all kinds of other things ...
>
> what would be really nice would be someone who has some time and for
> whom server admining is a fun thing to do,
> someone who would do it "because it needs to be done" would be 2nd
> choice IMHO

You are actually replying to FOUR)
This seems to be "ask someone to join in the rotation of tasks",
rather than a full-blown delegation of work. That's an option. I don't
pretend to have correctly worded the action of FOUR)

> iam not suggesting anything specific but there is one thing that i
> think i have not seen talked about and that is moderation. Mailman
> supports moderating individual subscribers.
>
> It might be along the lines of
> If one repeatly and conciously violates the CoC and no real solution
> can be found, he can be given the choice by the mailman admins to
> either promise to attempt not to repeat the violation
> or to be moderated until an event occurs that changes the situation
> or some timeout.
> The insulted person should have the option to veto this at any time so
> if one feels that it wasnt enough to justify the inconvenience the
> "hurt" party should be able to stop this.
> This would have to be combined with something effective for IRC and
> possibly git, in case issues shift there too

You are actually replying to point ONE, more precisely, "how to modify
it" (the CoC).
There are many opinions. I think an amount of time best described by
some aleph can be spent discussing the details, but I bet people are
ok with "use VLC's or whichever, then vote for improvement". You are
proposing an improvement.

The point here is, several people seem to want things to move here
irrespective of ONE, so a vote seems (to me) the natural step forward.
So, either someone acts towards the action, and it may happen, or
clearly, nothing will happen.

I maintain my opinion that we are putting the cart before the horse,
but I recognize the need from people for action, so I'm ready to vote
on that point, irrespective of ONE. If it happens.

Best regards,
-- 
Christophe


Re: [FFmpeg-devel] IRC meeting

2016-06-03 Thread Christophe Gisquet
Hi,

here's, I think, a list of things left to do. I remember saste doing it
on some occasions.

Please comment on whether you think I have identified an actual action to
perform. Don't mind the details for now, it's just to get the train
going.

2016-05-30 10:49 GMT+02:00 Michael Niedermayer :


ONE)
> May 28 19:07:54https://wiki.videolan.org/Code_of_Conduct/
[...]
> May 28 19:29:06   so lets add that to current CoC and put it 
> for vote on ML?

Action: put to vote the addition of the "repercussions" from the linked page.

There was some discussion as to how to modify it. Whether to accept it as
is, or delay for additions, can be included in the vote, IMO:
> May 28 19:29:40durandal_1707: send a draft first and get 
> comments to improve it



TWO)
> May 28 19:38:00  AVClass & AVOption should be added to all 
> public "Context" structs for API consistency and to make it easier for apps 
> to support multiple ffmpeg versios and distros

Action: put to vote (vote ongoing).



THREE)
> May 28 20:21:19liabvutil is currently the only non modular 
> library. literally everything is compiled and installed no matter your 
> configure options
[...]
> May 28 20:31:19Err what's the result on the previous topic? 
> Send patches and it'll get reviewed but the end goal is ok? (I don't mind)
> May 28 20:31:47kurosu_: sounds like it yes

Action: whoever is interested will submit patches, with the idea that
the end goal is worth reaching.


FOUR)
> May 28 20:32:54 that's the other issue; we don't really have 
> someone to handle the sysadmin stuff
> May 28 20:35:58 i feel like we really need someone to 
> officially handle the sysadmin stuff
> May 28 20:33:14 does anyone have a sysadmin in his 
> relationships that would be interested in that?
> May 28 20:34:09  we have a virtual box in bulgaria that ffbox0 
> could be moved to if teres a volunteer
> May 28 20:52:40VLC have offered to sysadmin for years
> May 28 20:55:01  i put it on the table, my offer to admin again
> May 28 21:22:30  iive: makes a good suggestion, GitHub would release 
> at least two services (git and trac). For trac to GitHub you could look at 
> something like: https://github.com/trustmaster/trac2github It also might make 
> the project more accessible to new contributors

Action: decide how and who performs admin tasks.
I think the above lists the possible options in a vote:
- Compn as admin
- Delegate admin tasks to VLC (seemed related to hosting too)
- Draft a request for admins to be circulated
- Move stuff to bulgaria
- Move stuff to github


FOUR.TWO)
> May 28 20:58:44  we probably should config postfix or 
> spamassasin to check DMARK/DKIM/SPF or part of that on incoming mai (not 
> really important but i thn it doest curretly)

Details on what needs to be done, I think this is not high-level
enough for a vote.
Action: michaelni to list some wishes?



FIVE)
> May 28 21:28:47   i want possibility to fund devs to work on 
> specific part of FFmpeg
> May 28 21:28:59   what happened with FFmtech?
> May 28 21:29:38  at the moment we have a total of ~15K USD in the SPI 
> and ffis.de funds
> May 28 21:30:58  saste, can we fund someone maybe to make 
> kierans fuzzing GSoC project a reality ? i mean if people agree to that
> May 28 21:31:59   so just need to pick some part of codebase 
> that need refactoring/cleaning up/improving?
> May 28 21:59:08Kierank mentioned btw he was willing to fund 
> some work on libavfilter API that would suit his needs, the details of which 
> he'll give to whomever is integrated

Action: list the sponsoring opportunities for work on ffmpeg: tasks
and origin of funds?
Subpoints:
- better lavfi API for some usecases
- find whether SPI/... can be used for that



SIX)
> May 28 19:32:58since cehoyos is here, we could maybe talk 
> about his behavior and why the CoC and repercussions for violating it was 
> introduced to begin with
> May 28 21:51:31is there anything concrete we’re going to do w.r.t. 
> derek and carl?
> May 28 21:59:52So Derek and carl?
> May 28 22:26:54 its too late now, and we need to handle the 
> situation at hand
> May 28 22:28:24nevcairiel, do you want a vote here and now, 
> to what effect?
> May 28 22:31:09I agree that a vote on the ML would be better to give 
> people that fell asleep here the chance to participate also
> May 28 22:34:59It's late here. I'm ok for a vote also, just 
> not sure what kind of offense it would be

And the big, flashy pink elephant in the room:
Action: draft a vote on the repercussions for Carl Eugen Hoyos's
behaviour (patch submission, general interaction with others)
Note, I don't have a strong idea of what it may contain (option of
temp/week/perma ban, warning, removal of some rights, etc).

If someone is interested in this, maybe list all potential options
and use Condorcet, etc.

I don't really care, the above just says: 

Re: [FFmpeg-devel] [PATCH] avcodec: add MagicYUV decoder

2016-06-01 Thread Christophe Gisquet
2016-05-30 17:50 GMT+02:00 Paul B Mahol :
> On 5/30/16, Piotr Bandurski  wrote:
>> Hi,
>>
>>> patch attached.
>>
>> Is decoding of interlaced video supported? Because I get here invalid
>> output.
>>
>> Also crash happens with this fuzzed file:
>>
>> https://www.datafilehost.com/d/c64eb5b1
>>
>> Regards
>
> Can you create YUVA video somehow? I can't with virtualdub.

So a final iteration was pushed.

Can any of you create fate tests, ideally for 3 cases,
"normal"/"interlaced"/"YUVA" (or whatever required introducing quirks
in the code)

Thanks,
-- 
Christophe


Re: [FFmpeg-devel] [PATCH] avcodec: add MagicYUV decoder

2016-05-31 Thread Christophe Gisquet
Hi,

2016-05-30 15:09 GMT+02:00 Paul B Mahol :
>> ffmpeg seems to have libavutil/qsort.h, but I don't even know how much
>> effort is needed to use it here.
>
> Changed, doesn't help but maybe will for other archs.

I have no idea why it is present, but it made sense to me to reuse
whatever ffmpeg already has.

>> That's somewhat similar to png paeth, except not actually reusable. I
>> wonder if there's something in libavcodec that could be reused, in
>> which case moving it to the hdsp context would be nice)
>
> Our Huffyuv decoder is still missing gradient prediction...

I'd say the encoder lacks it too, so it's not "specified" in any
public "version" (whether the 15yo ones or the ffvhuff ones).

It's a matter of creating a new version with it for smaller filesizes.
But I bet users are all over those newer codecs (utvideo) and
haven't looked/won't look back at vhuffyuv even if this predictor and
slices were implemented there.

>>> +} else if (pred == MEDIAN) {
>> [...]
>>> +} else {
>>
>> So, that's maybe a detail at this point, and you want to move quickly
>> to other stuff, but:
>> would you like to look at e.g. huffyuvdec or pngdec for a code that is
>> not as nice looking, but more cache-friendly?
>>
>> Basically, you move the first line out of the loops, and then do
>> sequentially, per row in the loop, bitstream reading, reconstruction
>> (residual+prediction) and any post-processing...
>
> Just tried, didn't help much here.

Hmm, was that single-threaded decoding? Also, the VLC decoding is
slower than needed (again, see huffyuvdec and generate_joint_tables),
so that may not show up.

Anyway, whatever temporary patch you had probably became invalid after
you implemented interlaced encoding, so that's kind of moot.

>>> +for (i = 0; i < p->height; i++) {
>>> +for (x = 0; x < p->width; x++) {
>>> +b[x] += g[x];
>>> +r[x] += g[x];
>>> +}

btw, isn't that add_bytes from HuffYUVDSPContext ? ie:
hdsp->add_bytes(b, g, p->width);
etc
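
In plain C, the suggestion amounts to something like the sketch below. It is
only a stand-in for the DSP routine, written under the assumption that
add_bytes adds the source bytes into the destination with 8-bit wraparound;
the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst[i] += src[i] with 8-bit wraparound, for w bytes. */
static void toy_add_bytes(uint8_t *dst, const uint8_t *src, ptrdiff_t w)
{
    assert(w >= 0);
    for (ptrdiff_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Undo the green decorrelation for one row: b += g and r += g. */
static void toy_decorrelate_row(uint8_t *b, uint8_t *r,
                                const uint8_t *g, ptrdiff_t w)
{
    toy_add_bytes(b, g, w);
    toy_add_bytes(r, g, w);
}
```

Calling this once per row keeps the inner loop in a single routine that a DSP
context could later replace with an optimized version.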

Except for this (and pending Piotr's fuzzing cases), looks fine.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH] avcodec: add MagicYUV decoder

2016-05-30 Thread Christophe Gisquet
Hi,

2016-05-29 21:51 GMT+02:00 Paul B Mahol :
> +typedef struct Slice {
> +uint32_t start;
> +uint32_t size;
> +} Slice;

I'm not a security expert, but is there a reason for not using plain int there ?

> +typedef struct MagicYUVContext {
> +AVFrame*p;
> +int slice_height;
> +int nb_slices;
> +int planes;
> +uint8_t *buf;
> +int hshift[4];
> +int vshift[4];
> +Slice   *slices[4];
> +int slices_size[4];
> +uint8_t freq[4][256];
> +VLC vlc[4];
> +HuffYUVDSPContext   hdsp;
> +} MagicYUVContext;

I guess someone able to understand the code immediately understands
what those are, but that's pretty sparse comment-wise.

> +typedef struct HuffEntry {

And another Huffman+prediction codec... I don't really see any
valuable addition here... :(

> +uint8_t  sym;
> +uint8_t  len;
> +uint32_t code;
> +} HuffEntry;
> +
> +static int ff_magy_huff_cmp_len(const void *a, const void *b)
> +{
> +const HuffEntry *aa = a, *bb = b;
> +return (aa->len - bb->len) * 256 + aa->sym - bb->sym;
> +}
> +
> +static int build_huff(VLC *vlc, uint8_t *freq)
> +{
> +HuffEntry he[256];
> +uint32_t codes[256];
> +uint8_t bits[256];
> +uint8_t syms[256];
> +uint32_t code;
> +int i, last;
> +
> +for (i = 0; i < 256; i++) {
> +he[i].sym = 255 - i;
> +he[i].len = freq[i];
> +}
> +qsort(he, 256, sizeof(*he), ff_magy_huff_cmp_len);

ffmpeg seems to have libavutil/qsort.h, but I don't even know how much
effort is needed to use it here.
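
For context, the ordering produced by the comparator in the patch is "shorter
code length first, then smaller symbol". Here is a small standalone sketch of
that ordering using the C library qsort, with explicit comparisons instead of
the arithmetic trick. ToyHuffEntry and the toy_* functions are hypothetical,
and this does not use libavutil/qsort.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t sym;   /* symbol value */
    uint8_t len;   /* code length in bits */
} ToyHuffEntry;

/* Order by code length, then by symbol. */
static int toy_cmp_len_sym(const void *a, const void *b)
{
    const ToyHuffEntry *aa = a, *bb = b;
    if (aa->len != bb->len)
        return aa->len < bb->len ? -1 : 1;
    return (aa->sym > bb->sym) - (aa->sym < bb->sym);
}

static void toy_sort_entries(ToyHuffEntry *he, size_t n)
{
    assert(he || !n);
    if (n)
        qsort(he, n, sizeof(*he), toy_cmp_len_sym);
}
```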

> +pred = get_bits(&gb, 8);
> +dst = p->data[i] + j * sheight * stride;
> +for (k = 0; k < height; k++) {
> +for (x = 0; x < width; x++) {
> +int pix;
> +if (get_bits_left(&gb) <= 0) {
> +return AVERROR_INVALIDDATA;
> +}
> +pix = get_vlc2(&gb, s->vlc[i].table, s->vlc[i].bits, 3);
> +if (pix < 0) {
> +return AVERROR_INVALIDDATA;
> +}
> +dst[x] = 255 - pix;
> +}
> +dst += stride;
> +}
> +
> +if (pred == LEFT) {
> +dst = p->data[i] + j * sheight * stride;
> +s->hdsp.add_hfyu_left_pred(dst, dst, width, 0);
> +dst += stride;
> +for (k = 1; k < height; k++) {
> +s->hdsp.add_hfyu_left_pred(dst, dst, width, dst[-stride]);
> +dst += stride;
> +}
> +} else if (pred == GRADIENT) {
[...]

That's somewhat similar to png paeth, except not actually reusable. I
wonder if there's something in libavcodec that could be reused, in
which case moving it to the hdsp context would be nice)

> +} else if (pred == MEDIAN) {
[...]
> +} else {

So, that's maybe a detail at this point, and you want to move quickly
to other stuff, but:
would you like to look at e.g. huffyuvdec or pngdec for a code that is
not as nice looking, but more cache-friendly?

Basically, you move the first line out of the loops, and then do
sequentially, per row in the loop, bitstream reading, reconstruction
(residual+prediction) and any post-processing...
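
A rough sketch of that row-at-a-time structure, for the LEFT predictor only
and with the bitstream reading left out (the residuals are assumed to already
be in the buffer). The toy_* names are hypothetical; the seed for each row
after the first is the first pixel of the row above, as in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn one row of residuals into pixels: each pixel is the running sum
 * (mod 256) of the residuals, starting from acc. Returns the last pixel. */
static uint8_t toy_left_pred_row(uint8_t *row, int width, uint8_t acc)
{
    for (int x = 0; x < width; x++) {
        acc    = (uint8_t)(acc + row[x]);
        row[x] = acc;
    }
    return acc;
}

/* Reconstruct a plane row by row, so each row is still in cache when the
 * next step touches it. */
static void toy_reconstruct_left(uint8_t *dst, int width, int height,
                                 ptrdiff_t stride)
{
    assert(width > 0 && height > 0 && stride >= width);
    for (int y = 0; y < height; y++) {
        uint8_t *row = dst + y * stride;
        /* 1. residual decoding for this row would happen here */
        /* 2. prediction: first row starts from 0, others from the pixel above */
        toy_left_pred_row(row, width, y ? row[-stride] : 0);
        /* 3. per-row post-processing (e.g. decorrelation) would go here */
    }
}
```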

> +if (decorrelate) {
> +uint8_t *b = p->data[0];
> +uint8_t *g = p->data[1];
> +uint8_t *r = p->data[2];
> +
> +for (i = 0; i < p->height; i++) {
> +for (x = 0; x < p->width; x++) {
> +b[x] += g[x];
> +r[x] += g[x];
> +}
> +b += p->linesize[0];
> +g += p->linesize[1];
> +r += p->linesize[2];
> +}
> +}

... in particular, this step, that could be done line-wise, inside the
threaded decoding, if I'm not mistaken. (cf. also png's deloco)
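
To make the row-wise idea concrete, here is a minimal sketch, not the
decoder's actual code. It assumes the residuals of a G/B/R slice are already
stored in the planes, that all three planes share one stride, and that only
left prediction is used; the names undo_left_pred_row and
reconstruct_gbr_rows are made up for the illustration. Each row gets its
prediction undone and is then decorrelated right away, instead of in a
separate full-frame pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Undo left prediction on one row, in place: every sample becomes the
 * running sum (modulo 256) of the residuals, starting from `acc`. */
static void undo_left_pred_row(uint8_t *row, int width, uint8_t acc)
{
    for (int x = 0; x < width; x++) {
        acc    = (uint8_t)(acc + row[x]);
        row[x] = acc;
    }
}

/* Hypothetical per-row pipeline for one G/B/R slice whose residuals are
 * already stored in the three planes (assumed to share the same stride). */
static void reconstruct_gbr_rows(uint8_t *g, uint8_t *b, uint8_t *r,
                                 ptrdiff_t stride, int width, int height)
{
    uint8_t *plane[3] = { g, b, r };
    uint8_t  seed[3]  = { 0, 0, 0 };   /* the first row starts from 0 */

    for (int y = 0; y < height; y++) {
        /* 1. prediction, one row at a time */
        for (int p = 0; p < 3; p++) {
            uint8_t *row = plane[p] + y * stride;
            undo_left_pred_row(row, width, seed[p]);
            /* The next row is seeded with this row's first sample, saved
             * before decorrelation overwrites it in the B and R planes. */
            seed[p] = row[0];
        }
        /* 2. post-processing on the same row: undo the G decorrelation */
        for (int x = 0; x < width; x++) {
            b[y * stride + x] = (uint8_t)(b[y * stride + x] + g[y * stride + x]);
            r[y * stride + x] = (uint8_t)(r[y * stride + x] + g[y * stride + x]);
        }
    }
}
```

The one thing to watch is the seed: once decorrelation is done in place per
row, dst[-stride] no longer holds the pre-decorrelation value for B and R,
so it has to be kept aside as above.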

Otherwise, I don't see much of anything that would require another
reviewing round.
-- 
Christophe


Re: [FFmpeg-devel] Remove Derek Buitenhuis from MAINTAINERS

2016-05-20 Thread Christophe Gisquet
Hi,

2016-05-20 1:55 GMT+02:00 Lukasz Marek :
> Is Derek revoked to commit or what? Couldn't he just commit this patch and
> leave? :P  I was a problem for some people, but I see they still have
> problems. Let people with problems go away with they problems.

Sorry if you felt ganged up on previously. Hopefully the new Code of
Conduct will keep such situations from rising to an insufferable
level.

But whatever bad technical blood there has been between the two of
you, and whomever I may agree with technically, this is uncalled for.
You're just adding fuel to a tense situation, and causing distress to
someone.

This type of comment is exactly what should not be allowed by the Code
of Conduct.

Sorry if I have added more fuel,
-- 
Christophe


Re: [FFmpeg-devel] [Vote] Code of Conduct

2016-05-20 Thread Christophe Gisquet
Hi,

2016-05-18 20:40 GMT+02:00 Michael Niedermayer :
> Please state clearly if you agree to the text or if not.
> we can extend and tune it later and do another vote if there are more
>  suggestions

I agree to having a CoC.

This text is a first step, so I'm ok with it, but hoping it will be improved.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH] doc/developer.texi: Add a code of conduct

2016-05-20 Thread Christophe Gisquet
Hi,

2016-05-20 2:38 GMT+02:00 Timothy Gu :
>> > Note how it has a list of specific violations, instead of vague things like
>> > "Be excellent" that the FFmpeg one has.
>> > Note how it has a huge section on disciplinary procedures.
[...]
> I have to agree with Kieran here. I believe that as a community, we definitely
> _want_ to assume good faith, etc. But conflict resolution requires strict,
> codified consequences for violations of the CoC, to ensure fairness. The
> language needs to be made more serious so that people actually take it in the
> way it is intended.

Completely agree. Without sanctions, I fear it won't help mitigate problems.

Also I'd like this to extend to any medium: mailing list, IRC, private
communication between an offender and an "offended"...

Secondly, I'd like this to apply to any comment, whether on a person
or their work. Being frustrated is no reason for foul language, even if
that's not the most harmful thing that can happen.

Also, because a text is subject to interpretation, the CoC could maybe
state how this is arbitrated.

Best regards,
-- 
Christophe


Re: [FFmpeg-devel] [PATCH 01/10] avcodec/dca: remove Rice code length limit

2016-05-20 Thread Christophe Gisquet
2016-05-13 11:48 GMT+02:00 foo86 :
> -unsigned int v = get_unary(gb, 1, 128);
> +unsigned int v = get_unary(gb, 1, get_bits_left(gb));

Not that the patch is not ok, but I have a few uneducated questions:
1) Given the get_bits_long(gb, k) afterwards, won't that code cause
overreads for corrupted bitstreams?
2) I haven't checked the calling code, but consequently, wouldn't it
be better to first check that at least k+1 bits are available?
3) 128 is already fairly large; is the new code for valid bitstreams
(in the sense of specs and actually generated) or for corrupted
bitstreams? I don't know where the parsing is validated afterwards
(e.g. if there have been overreads or invalid values parsed)
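
To illustrate questions 1) and 2), here is a minimal sketch of a Rice read
that checks the available bits before touching the data. It uses a toy
MSB-first bit reader, not GetBitContext; BitReader, bits_left, read_bit and
read_rice are names invented for this example, and the code layout (a run of
0 bits closed by a 1, then k remainder bits) is my reading of what
get_unary(gb, 1, ...) followed by get_bits_long(gb, k) parses.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy MSB-first bit reader; a stand-in for illustration only. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits;   /* total number of valid bits in buf */
    size_t pos;         /* index of the next bit to read */
} BitReader;

static size_t bits_left(const BitReader *br)
{
    return br->size_bits - br->pos;
}

static unsigned read_bit(BitReader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Read a Rice code: a run of 0 bits closed by a 1 (the quotient), followed
 * by k remainder bits. Returns 0 on success, -1 if the data runs out or the
 * value does not fit in 32 bits; the reader never goes past size_bits. */
static int read_rice(BitReader *br, unsigned k, uint32_t *out)
{
    uint32_t q = 0, rem = 0;

    /* the shortest possible code is the stop bit plus k remainder bits */
    if (k > 31 || bits_left(br) < (size_t)k + 1)
        return -1;

    for (;;) {
        if (bits_left(br) == 0)
            return -1;              /* ran out before the stop bit */
        if (read_bit(br))
            break;
        q++;
    }

    if (bits_left(br) < k || q > (UINT32_MAX >> k))
        return -1;                  /* remainder missing, or overflow */

    for (unsigned i = 0; i < k; i++)
        rem = (rem << 1) | read_bit(br);

    *out = (q << k) | rem;
    return 0;
}
```

With the unary part bounded by the bits left, the only remaining overread
risk is the k-bit remainder, which is what the second check covers.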

Thanks,
-- 
Christophe


Re: [FFmpeg-devel] [PATCH 4/4] lossless audio dsp: unroll

2016-05-11 Thread Christophe Gisquet
2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> The loops are guaranteed to be at least multiples of 8, so this
> unrolling is safe but allows exploiting execution ports.
>
> For int32 version: 68 -> 58c.

Ping?

This was ok'ed by James irrespective of the auto-vectorization
discussion, but I don't mind it being dropped anyway. Mostly cleaning
my local branches.
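
For readers without the patch at hand, this is the general shape of such an
unrolling, as a sketch only: the function name and exact signature are
assumptions modelled on a scalar-product-and-multiply-add loop, and it
unrolls by two rather than reproducing the patch. It is only valid because
the caller guarantees an even order (a multiple of 8 in the case above), so
no tail loop is needed.

```c
#include <stdint.h>

/* Returns the dot product of v1 and v2 while updating v1 += mul * v3.
 * `order` must be a positive even number (hypothetical helper). */
static int32_t dot_and_madd_unrolled(int16_t *v1, const int16_t *v2,
                                     const int16_t *v3, int order, int mul)
{
    int32_t res0 = 0, res1 = 0;

    for (int i = 0; i < order; i += 2) {
        /* two independent accumulator chains that can issue in parallel */
        res0     += v1[i]     * v2[i];
        res1     += v1[i + 1] * v2[i + 1];
        v1[i]     = (int16_t)(v1[i]     + mul * v3[i]);
        v1[i + 1] = (int16_t)(v1[i + 1] + mul * v3[i + 1]);
    }
    return res0 + res1;
}
```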

Best regards,
-- 
Christophe


Re: [FFmpeg-devel] [PATCH] vc2enc_dwt: use 32 bit coefficients by default

2016-05-08 Thread Christophe Gisquet
2016-05-07 21:48 GMT+02:00 Rostislav Pehlivanov :
>  The costliest part of the encoder right now is encoding the coefficients
> (~36%). Slightly less-costly is rate control (~31%), and after that is the
> transform (~12%). There really isn't anything else, other than 3 copies
> (input image converted to signed and copied to a buffer, then because the
> transform is out of place there's a copy to another buffer and then back),
> but they don't take that much time.

Thanks for the detailed reply.

Anyway, patch ok with me, and I agree with Michael. I was just curious
about what an "improved fix" would do.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH] vc2enc_dwt: use 32 bit coefficients by default

2016-05-07 Thread Christophe Gisquet
2016-05-07 19:12 GMT+02:00 Rostislav Pehlivanov :
> The problem is that with particularly complex images and especially at
> high bit depths and 5-level transforms the coefficients would overflow

I guess it also depends on the transform type, so that counts also for
the last comment.

> causing huge artifacts to appear. This was discovered thanks to the fate
> tests, which will have to be redone as this fixes a multitude of
> problems and increases PSNR.

I admit I saw strange numbers, but as they sometimes include color
transforms back and forth, I didn't really pay attention.
Was there a risk it produced incorrect output in "valid" decoders?

> There is a slight performance drop associated with this change, making
> the encoder slower by 1.15 times, however this is necessary in order to
> avoid undefined behavior and overflows.

This means no asm has been written yet. Is the performance drop mostly
in transforms, or rather any coefficient manipulation (like rate
evaluation etc), or memory bandwidth?
In the former case, it might be less critical in the future.

> It would be worth to template the transforms to keep the performance for
> 8 bit images as 32 bit coefficients are unnecessary for that case, but
> the primary use of the encoder is to encode video at 10 bits.

I don't know what that entails, but indeed, there are several
parameters affecting what's possible, and the current change is the
simplest/fastest/safest at the moment.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data

2016-05-06 Thread Christophe Gisquet
Hi,

2016-05-06 2:19 GMT+02:00 Rostislav Pehlivanov :
> I plan to merge the fate tests as well tomorrow or on Saturday when I'll
> have time to quickly fix bugs which appear on platforms I haven't tested
> the encoder on. Hopefully none, but you never know.

Sure, makes sense.

In case you don't find time or devices for those tests, Michael seems
to have tested on a fair number of archs:
"tested on mips/arm/x86 linux and mingw32/64"

-- 
Christophe


[FFmpeg-devel] [PATCH 2/2] vc2: fate tests

2016-05-05 Thread Christophe Gisquet
---
 tests/fate/vcodec.mak   | 17 -
 tests/ref/vsynth/vsynth1-vc2-420p   |  4 
 tests/ref/vsynth/vsynth1-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p   |  4 
 tests/ref/vsynth/vsynth1-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p   |  4 
 tests/ref/vsynth/vsynth1-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p   |  4 
 tests/ref/vsynth/vsynth2-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p   |  4 
 tests/ref/vsynth/vsynth2-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p   |  4 
 tests/ref/vsynth/vsynth2-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p12 |  4 
 28 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p12

diff --git a/tests/fate/vcodec.mak b/tests/fate/vcodec.mak
index ccf88ce..0e08894 100644
--- a/tests/fate/vcodec.mak
+++ b/tests/fate/vcodec.mak
@@ -29,6 +29,19 @@ FATE_VCODEC-$(call ENCDEC, DNXHD, DNXHD) += dnxhd-720p  \
 dnxhd-720p-rd   \
 dnxhd-720p-10bit
 
+FATE_VCODEC-$(call ENCDEC, VC2 DIRAC, MOV) += vc2-420p vc2-420p10 vc2-420p12 \
+  vc2-422p vc2-422p10 vc2-422p12 \
+  vc2-444p vc2-444p10 vc2-444p12
+fate-vsynth1-vc2-%:  FMT  = mov
+fate-vsynth1-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth1-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+fate-vsynth2-vc2-%:  FMT  = mov
+fate-vsynth2-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth2-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+fate-vsynth_lena-vc2-%:  FMT  = mov
+fate-vsynth_lena-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth_lena-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+
 fate-vsynth%-dnxhd-720p: ENCOPTS = -s hd720 -b 90M  \
-pix_fmt yuv422p -frames 5 -qmax 8
 fate-vsynth%-dnxhd-720p: FMT = dnxhd
@@ -356,7 +369,9 @@ FATE_VSYNTH2 = $(FATE_VCODEC:%=fate-vsynth2-%)
 FATE_VSYNTH_LENA = $(FATE_VCODEC:%=fate-vsynth_lena-%)
 # Redundant tests because they just resize the input
 RESIZE_OFF   = dnxhd-720p dnxhd-720p-rd dnxhd-720p-10bit dnxhd-1080i \
-   dv dv-411 dv-50 avui snow snow-hpel snow-ll
+   dv dv-411 dv-50 avui snow snow-hpel snow-ll vc2-420p \
+   vc2-420p10 vc2-420p12 vc2-422p vc2-422p10 vc2-422p12 \
+   vc2-444p vc2-444p10 vc2-444p12
 # Incorrect 

[FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data

2016-05-05 Thread Christophe Gisquet
The slice prefix is 0 in the reference encoder and the decoder ignores it.
Writing 0 there seems like the best temporary solution.

The padding could have contained uninitialized data, but reference VC2
encoders put 0xFF there, hence the memset value.

Overall this allows producing bitstreams with no random data for use by fate.
---
 libavcodec/vc2enc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
index 6d24552..bbbeaa0 100644
--- a/libavcodec/vc2enc.c
+++ b/libavcodec/vc2enc.c
@@ -777,7 +777,10 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg)
 uint8_t quants[MAX_DWT_LEVELS][4];
 int p, level, orientation;
 
+/* The reference decoder ignores it, and its typical length is 0 */
+memset(put_bits_ptr(pb), 0, s->prefix_bytes);
 skip_put_bytes(pb, s->prefix_bytes);
+
 put_bits(pb, 8, quant_idx);
 
 /* Slice quantization (slice_quantizers() in the specs) */
@@ -809,6 +812,8 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg)
 }
 pb->buf[bytes_start] = pad_s;
 flush_put_bits(pb);
+/* vc2-reference uses that padding that decodes to '0' coeffs */
+memset(put_bits_ptr(pb), 0xFF, pad_c);
 skip_put_bytes(pb, pad_c);
 }
 
-- 
2.8.1



Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data

2016-05-04 Thread Christophe Gisquet
Hi,

2016-05-04 3:06 GMT+02:00 Rostislav Pehlivanov :
> vc2hqencode is not the reference encoder, vc2-reference is. It's even worse
> though.

Sorry, I thought authoritative could mean "from the authors", so I
didn't mean it as "the" reference/"the authority". Just a good
reference in case the specs are not clear or don't mention it. If
vc2-reference or "the" reference dirac codec do it differently, then
those should be followed.

> Also, the commit message still says 0 instead of 0xff.

I'm getting confused: neither version of the patch does. On a side
note, maybe I should retract the "standardized value" for the padding.

The commit message does mention 0 for the prefix, because I didn't see
any reference as to what should be there. Anything I've seen uses 0
prefix bytes.

> btw the value 0xff
> makes sense since that's the golomb code for 0. Would make reading broken
> files a little more robust (instead of reading a ton of zeroes, losing
> bitstream sync and causing trouble elsewhere).

Yeah, that's what I meant with the rationale. Classical reason to
choose a value for a padding.
Not a big deal, but isn't the specific codeword for 0 '1'?
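
To spell out why a 0xFF byte would read back as eight zero coefficients,
here is a minimal sketch of an unsigned interleaved exp-Golomb read as I
understand the Dirac/VC-2 scheme: a 0 "follow" bit announces one more data
bit, a 1 bit ends the code, so the lone bit '1' is the value 0. The toy
reader and the names BitReader, read_bit and read_interleaved_uegolomb are
made up for the example; this is not the decoder's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy MSB-first bit reader; a stand-in for illustration only. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits;   /* total number of valid bits in buf */
    size_t pos;         /* index of the next bit to read */
} BitReader;

static unsigned read_bit(BitReader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Read one unsigned interleaved exp-Golomb value.
 * Returns 0 on success, -1 if the data runs out or the value overflows. */
static int read_interleaved_uegolomb(BitReader *br, uint32_t *out)
{
    uint32_t value = 1;

    for (;;) {
        if (br->pos >= br->size_bits)
            return -1;
        if (read_bit(br))
            break;                      /* stop bit: the code is complete */
        if (br->pos >= br->size_bits || value > UINT32_MAX / 2)
            return -1;
        value = (value << 1) | read_bit(br);
    }
    *out = value - 1;
    return 0;
}
```

A zero coefficient carries no sign bit, so each '1' bit of the padding is a
complete code on its own, whereas a run of zero bytes keeps the loop reading
follow/data pairs until the data runs out.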

> Could you make the comment a C89 style like the rest of the encoder?

Will come later today, and I may wait for your other patch to vc2enc.

And sorry I haven't followed the style of the file.

> Other than that the patch is okay. Slice padding is usually very small, so
> no real performance degradation.

I haven't tested but I take your word on it.

Best regards,
-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data

2016-05-03 Thread Christophe Gisquet
On 3 May 2016 22:15, "Rostislav Pehlivanov" <atomnu...@gmail.com> wrote:
>
> On 3 May 2016 at 19:16, Christophe Gisquet <christophe.gisq...@gmail.com>
> wrote:
> >
> >
> > Btw, afaik, the padding is 0xFF, so expecting 0 in the buffer there
> > can't do the job.
> >
> >
> I don't get it, you keep saying that the padding must be 0xff yet the patch
> you posted puts 0x00.

Didn't a second mail and patch reach the mailing list?

> Where did you even read that the padding must be
> 0xff, I don't remember the specs saying anything about what the padding
> should contain.

I didn't say 'must' but 'AFAIK', because I could be wrong, but until proven
otherwise...

Anyway, there:
https://github.com/bbc/vc2hqencode/blob/master/vc2hqencode/serialise.hpp#L103
and I'd say it's authoritative?

Weird that a spec doesn't mandate a way to fill it, but maybe that's SMPTE.

As for the rationale, no idea; maybe it matches the coding of a zero coeff.


Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data

2016-05-03 Thread Christophe Gisquet
Hi,

2016-05-03 19:24 GMT+02:00 Hendrik Leppkes :
>> +// The reference decoder ignores it, and its typical length is 0
>> +memset(put_bits_ptr(pb), 0, s->prefix_bytes);
>>  skip_put_bytes(pb, s->prefix_bytes);
>> +
>
> I don't suppose we have a function to just write zero bytes instead of
> these shenangans of written to the buffer and skiping?

I don't think so, but I may be wrong. The AV_ZERO macros are of course
not suited here.

>> +memset(pb->buf_ptr, 0, pad_c);
>>  skip_put_bytes(pb, pad_c);
>
> Both occurances use different ways to access the buffer, once
> put_bits_ptr(pb) and one pb->buf_ptr, if this is the only way to do
> this, maybe stick to one?

Yeah, squashing issue. My next patch must have crossed your mail.

I thought of having another put_bits function like
put_byte_something(PutBitContext, uint8_t byte, unsigned int len).
But probably overkill.
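
For the record, such a helper would only wrap the memset-then-skip pair from
the patch. A minimal sketch on a toy byte-aligned writer follows; ByteWriter
and put_fill_bytes are hypothetical names, this is not PutBitContext, and it
assumes the writer is already byte-aligned, as it is after flush_put_bits().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy byte-aligned writer; a stand-in for illustration only. */
typedef struct ByteWriter {
    uint8_t *buf;
    size_t   size;  /* capacity of buf in bytes */
    size_t   pos;   /* next byte to write */
} ByteWriter;

/* Write `len` copies of `byte` and advance the write position.
 * Returns 0 on success, -1 if the buffer is too small. */
static int put_fill_bytes(ByteWriter *bw, uint8_t byte, size_t len)
{
    if (len > bw->size - bw->pos)
        return -1;
    memset(bw->buf + bw->pos, byte, len);
    bw->pos += len;
    return 0;
}
```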

Btw, afaik, the padding is 0xFF, so expecting 0 in the buffer there
can't do the job.

-- 
Christophe


[FFmpeg-devel] [PATCH 0/2] Fix VC-2 encoder

2016-05-03 Thread Christophe Gisquet
The encoder was leaving uninitialized data in the padding of slices,
while the specs seem to mandate the use of 0xFF. This is also the case
for the slice prefix, but it seems completely unused.

To validate this, classical vsynth encoding/decoding fate tests for
all supported chroma formats are added. Suggestions for being even
more concise in the target/rules are welcome.

Christophe Gisquet (2):
  vc2enc: prevent random data
  vc2: fate tests

 libavcodec/vc2enc.c |  4 
 tests/fate/vcodec.mak   | 17 -
 tests/ref/vsynth/vsynth1-vc2-420p   |  4 
 tests/ref/vsynth/vsynth1-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p   |  4 
 tests/ref/vsynth/vsynth1-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p   |  4 
 tests/ref/vsynth/vsynth1-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p   |  4 
 tests/ref/vsynth/vsynth2-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p   |  4 
 tests/ref/vsynth/vsynth2-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p   |  4 
 tests/ref/vsynth/vsynth2-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p12 |  4 
 29 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p12

-- 
2.8.1



[FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data

2016-05-03 Thread Christophe Gisquet
The slice prefix is 0 in the reference encoder and the decoder ignores it.
Writing 0 there seems like the best temporary solution.

The padding could have contained uninitialized data, but its standardized value
is 0xFF, hence the memset value.

Overall this allows producing bitstreams with no random data for use by fate.
---
 libavcodec/vc2enc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
index 943198b..bec513c 100644
--- a/libavcodec/vc2enc.c
+++ b/libavcodec/vc2enc.c
@@ -777,7 +777,10 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg)
 uint8_t quants[MAX_DWT_LEVELS][4];
 int p, level, orientation;
 
+// The reference decoder ignores it, and its typical length is 0
+memset(put_bits_ptr(pb), 0, s->prefix_bytes);
 skip_put_bytes(pb, s->prefix_bytes);
+
 put_bits(pb, 8, quant_idx);
 
 /* Slice quantization (slice_quantizers() in the specs) */
@@ -809,6 +812,7 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg)
 }
 pb->buf[bytes_start] = pad_s;
 flush_put_bits(pb);
+memset(pb->buf_ptr, 0, pad_c);
 skip_put_bytes(pb, pad_c);
 }
 
-- 
2.8.1



Re: [FFmpeg-devel] [PATCH 2/2] vc2: fate tests

2016-05-03 Thread Christophe Gisquet
2016-05-03 19:06 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
[SNIP]

Incorrect padding used (0 instead of 0xFF), fixed in that patch series.

-- 
Christophe
From 22ff25711062fb1ca30da1674fd622fd6f81c8e3 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Mon, 2 May 2016 21:57:29 +0200
Subject: [PATCH 2/2] vc2: fate tests

---
 tests/fate/vcodec.mak   | 17 -
 tests/ref/vsynth/vsynth1-vc2-420p   |  4 
 tests/ref/vsynth/vsynth1-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p   |  4 
 tests/ref/vsynth/vsynth1-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p   |  4 
 tests/ref/vsynth/vsynth1-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p   |  4 
 tests/ref/vsynth/vsynth2-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p   |  4 
 tests/ref/vsynth/vsynth2-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p   |  4 
 tests/ref/vsynth/vsynth2-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p12 |  4 
 28 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p12

diff --git a/tests/fate/vcodec.mak b/tests/fate/vcodec.mak
index ccf88ce..0e08894 100644
--- a/tests/fate/vcodec.mak
+++ b/tests/fate/vcodec.mak
@@ -29,6 +29,19 @@ FATE_VCODEC-$(call ENCDEC, DNXHD, DNXHD) += dnxhd-720p  \
 dnxhd-720p-rd   \
 dnxhd-720p-10bit
 
+FATE_VCODEC-$(call ENCDEC, VC2 DIRAC, MOV) += vc2-420p vc2-420p10 vc2-420p12 \
+  vc2-422p vc2-422p10 vc2-422p12 \
+  vc2-444p vc2-444p10 vc2-444p12
+fate-vsynth1-vc2-%:  FMT  = mov
+fate-vsynth1-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth1-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+fate-vsynth2-vc2-%:  FMT  = mov
+fate-vsynth2-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth2-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+fate-vsynth_lena-vc2-%:  FMT  = mov
+fate-vsynth_lena-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth_lena-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+
 fate-vsynth%-dnxhd-720p: ENCOPTS = -s hd720 -b 90M  \
-pix_fmt yuv422p -frames 5 -qmax 8
 fate-vsynth%-dnxhd-720p: FMT = dnxhd
@@ -356,7 +369,9 @@ FATE_VSYNTH2 = $(FATE_VCODEC:%=fate-vsynth2-%)
 FATE_VSYNTH_LENA = $(FATE_VCODEC:%=fate-vsynth_lena-%)

Re: [FFmpeg-devel] [PATCH 1/2] vc2enc: prevent random data

2016-05-03 Thread Christophe Gisquet
2016-05-03 19:06 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +memset(pb->buf_ptr, 0, pad_c);

Commit squashing fail, attached patch should fix that. This
unfortunately requires updating the fate tests as I generated them
from this squashing.

-- 
Christophe
From 3008fd916cca5b9ab22e96536e778d63ba25ed20 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Tue, 3 May 2016 11:47:25 +0200
Subject: [PATCH 1/2] vc2enc: prevent random data

The slice prefix is 0 in the reference encoder and the decoder ignores it.
Writing 0 there seems like the best temporary solution.

The padding could have contained uninitialized data, but its standardized value
is 0xFF, hence the memset value.

Overall this allows producing bitstreams with no random data for use by fate.
---
 libavcodec/vc2enc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
index 943198b..6fbdaa5 100644
--- a/libavcodec/vc2enc.c
+++ b/libavcodec/vc2enc.c
@@ -777,7 +777,10 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg)
 uint8_t quants[MAX_DWT_LEVELS][4];
 int p, level, orientation;
 
+// The reference decoder ignores it, and its typical length is 0
+memset(put_bits_ptr(pb), 0, s->prefix_bytes);
 skip_put_bytes(pb, s->prefix_bytes);
+
 put_bits(pb, 8, quant_idx);
 
 /* Slice quantization (slice_quantizers() in the specs) */
@@ -809,6 +812,7 @@ static int encode_hq_slice(AVCodecContext *avctx, void *arg)
 }
 pb->buf[bytes_start] = pad_s;
 flush_put_bits(pb);
+memset(put_bits_ptr(pb), 0xFF, pad_c);
 skip_put_bytes(pb, pad_c);
 }
 
-- 
2.8.1



[FFmpeg-devel] [PATCH 2/2] vc2: fate tests

2016-05-03 Thread Christophe Gisquet
---
 tests/fate/vcodec.mak   | 17 -
 tests/ref/vsynth/vsynth1-vc2-420p   |  4 
 tests/ref/vsynth/vsynth1-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p   |  4 
 tests/ref/vsynth/vsynth1-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p   |  4 
 tests/ref/vsynth/vsynth1-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth1-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p   |  4 
 tests/ref/vsynth/vsynth2-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p   |  4 
 tests/ref/vsynth/vsynth2-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p   |  4 
 tests/ref/vsynth/vsynth2-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth2-vc2-444p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-420p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-422p12 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p   |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p10 |  4 
 tests/ref/vsynth/vsynth_lena-vc2-444p12 |  4 
 28 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth1-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth2-vc2-444p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-420p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-422p12
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p10
 create mode 100644 tests/ref/vsynth/vsynth_lena-vc2-444p12

diff --git a/tests/fate/vcodec.mak b/tests/fate/vcodec.mak
index ccf88ce..0e08894 100644
--- a/tests/fate/vcodec.mak
+++ b/tests/fate/vcodec.mak
@@ -29,6 +29,19 @@ FATE_VCODEC-$(call ENCDEC, DNXHD, DNXHD) += dnxhd-720p  \
 dnxhd-720p-rd   \
 dnxhd-720p-10bit
 
+FATE_VCODEC-$(call ENCDEC, VC2 DIRAC, MOV) += vc2-420p vc2-420p10 vc2-420p12 \
+  vc2-422p vc2-422p10 vc2-422p12 \
+  vc2-444p vc2-444p10 vc2-444p12
+fate-vsynth1-vc2-%:  FMT  = mov
+fate-vsynth1-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth1-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+fate-vsynth2-vc2-%:  FMT  = mov
+fate-vsynth2-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth2-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+fate-vsynth_lena-vc2-%:  FMT  = mov
+fate-vsynth_lena-vc2-%:  ENCOPTS = -pix_fmt yuv$(@:fate-vsynth_lena-vc2-%=%) \
+   -vcodec vc2 -frames 5 -strict -1
+
 fate-vsynth%-dnxhd-720p: ENCOPTS = -s hd720 -b 90M  \
-pix_fmt yuv422p -frames 5 -qmax 8
 fate-vsynth%-dnxhd-720p: FMT = dnxhd
@@ -356,7 +369,9 @@ FATE_VSYNTH2 = $(FATE_VCODEC:%=fate-vsynth2-%)
 FATE_VSYNTH_LENA = $(FATE_VCODEC:%=fate-vsynth_lena-%)
 # Redundant tests because they just resize the input
 RESIZE_OFF   = dnxhd-720p dnxhd-720p-rd dnxhd-720p-10bit dnxhd-1080i \
-   dv dv-411 dv-50 avui snow snow-hpel snow-ll
+   dv dv-411 dv-50 avui snow snow-hpel snow-ll vc2-420p \
+   vc2-420p10 vc2-420p12 vc2-422p vc2-422p10 vc2-422p12 \
+   vc2-444p vc2-444p10 vc2-444p12
 # Incorrect 

Re: [FFmpeg-devel] [PATCH 1/4] fate: wma: add lossless 24bits test

2016-05-02 Thread Christophe Gisquet
Hi,

2016-05-02 16:02 GMT+02:00 Michael Niedermayer :
>> +fate-lossless-wma24-rawtile: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/g2_24bit.wma -f s24le
>
> where can i find that file ?
> i assume i should upload it ?

Sorry, I thought we had discussed it in this thread, but even with
this, it was not obvious.

Yes, please upload this file:
https://trac.ffmpeg.org/raw-attachment/ticket/4134/g2_24bit.wma

-- 
Christophe
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/4] wmalossless: allow calling madd_int16

2016-05-01 Thread Christophe Gisquet
2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> This is done by handling the "prev_values" in the cascaded LMS data as if
> they were int16_t, which requires switching the computations in several
> places.

Patch update since Michael's fix, which was incidentally included in
the previous patch.

-- 
Christophe
From 7ef234b0e16b364b8efcd2f3b6b2e6f34f707ee8 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Sun, 1 May 2016 12:34:29 +0200
Subject: [PATCH 2/4] wmalossless: allow calling madd_int16

This is done by handling the "prev_values" in the cascaded LMS data as if
they were int16_t, which requires switching the computations in several
places.
---
 libavcodec/wmalosslessdec.c | 110 
 1 file changed, 59 insertions(+), 51 deletions(-)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index 3e80c47..1ea5918 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -694,32 +694,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size)
 }
 }
 
-static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input)
-{
-int recent = s->cdlms[ich][ilms].recent;
-int range  = 1 << s->bits_per_sample - 1;
-int order  = s->cdlms[ich][ilms].order;
-
-if (recent)
-recent--;
-else {
-memcpy(s->cdlms[ich][ilms].lms_prevvalues + order,
-   s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order);
-memcpy(s->cdlms[ich][ilms].lms_updates + order,
-   s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order);
-recent = order - 1;
-}
-
-s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1);
-s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich];
-
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2;
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1;
-s->cdlms[ich][ilms].recent = recent;
-memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0,
-   sizeof(s->cdlms[ich][ilms].lms_updates) - sizeof(int16_t)*(recent+order));
-}
-
 static void use_high_update_speed(WmallDecodeCtx *s, int ich)
 {
 int ilms, recent, icoef;
@@ -755,32 +729,63 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich)
 s->update_speed[ich] = 8;
 }
 
-static void revert_cdlms(WmallDecodeCtx *s, int ch,
- int coef_begin, int coef_end)
-{
-int icoef, pred, ilms, num_lms, residue, input;
-
-num_lms = s->cdlms_ttl[ch];
-for (ilms = num_lms - 1; ilms >= 0; ilms--) {
-for (icoef = coef_begin; icoef < coef_end; icoef++) {
-pred = 1 << (s->cdlms[ch][ilms].scaling - 1);
-residue = s->channel_residues[ch][icoef];
-pred += s->dsp.scalarproduct_and_madd_int32(s->cdlms[ch][ilms].coefs,
-s->cdlms[ch][ilms].lms_prevvalues
-+ s->cdlms[ch][ilms].recent,
-s->cdlms[ch][ilms].lms_updates
-+ s->cdlms[ch][ilms].recent,
-FFALIGN(s->cdlms[ch][ilms].order,
-WMALL_COEFF_PAD_SIZE),
-WMASIGN(residue));
-input = residue + (pred >> s->cdlms[ch][ilms].scaling);
-lms_update(s, ch, ilms, input);
-s->channel_residues[ch][icoef] = input;
-}
-}
-emms_c();
+#define CD_LMS(bits, ROUND) \
+static void lms_update ## bits (WmallDecodeCtx *s, int ich, int ilms, int input) \
+{ \
+int recent = s->cdlms[ich][ilms].recent; \
+int range  = 1 << s->bits_per_sample - 1; \
+int order  = s->cdlms[ich][ilms].order; \
+int ##bits##_t *prev = (int##bits##_t *)s->cdlms[ich][ilms].lms_prevvalues; \
+ \
+if (recent) \
+recent--; \
+else { \
+memcpy(prev + order, prev, (bits/8) * order); \
+memcpy(s->cdlms[ich][ilms].lms_updates + order, \
+   s->cdlms[ich][ilms].lms_updates, \
+   sizeof(*s->cdlms[ich][ilms].lms_updates) * order); \
+recent = order - 1; \
+} \
+ \
+prev[recent] = av_clip(input, -range, range - 1); \
+s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; \
+ \
+s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; \
+s->cdlms[ich][ilms].lms_updates[recent 

Re: [FFmpeg-devel] [PATCH 1/4] fate: wma: add lossless 24bits test

2016-05-01 Thread Christophe Gisquet
2016-05-01 15:54 GMT+02:00 Paul B Mahol <one...@gmail.com>:
> There were 2 distinct issues: 32bit instead of 16bit integers and
> wrong handling of raw pcm.
> The 96k is about the first one, last decoded frame md5 differs for example.

Added a test for the file with raw pcm tiles then.

-- 
Christophe
From 584999fcce24585f989d2dc770e8c7c85aa19db7 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Mon, 18 Apr 2016 12:53:21 +0200
Subject: [PATCH 1/4] fate: wma: add lossless 24bits tests

Should evaluate coefficients and raw pcm tiles.
---
 tests/fate/lossless-audio.mak | 6 +-
 tests/ref/fate/lossless-wma24-1   | 1 +
 tests/ref/fate/lossless-wma24-2   | 1 +
 tests/ref/fate/lossless-wma24-rawtile | 1 +
 4 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/fate/lossless-wma24-1
 create mode 100644 tests/ref/fate/lossless-wma24-2
 create mode 100644 tests/ref/fate/lossless-wma24-rawtile

diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak
index 58641ab..d292853 100644
--- a/tests/fate/lossless-audio.mak
+++ b/tests/fate/lossless-audio.mak
@@ -25,8 +25,12 @@ fate-lossless-tta: CMD = crc -i $(TARGET_SAMPLES)/lossless-audio/inside.tta
 FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, TTA, TTA) += fate-lossless-tta-encrypted
 fate-lossless-tta-encrypted: CMD = crc -password ffmpeg -i $(TARGET_SAMPLES)/lossless-audio/encrypted.tta
 
-FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma
+FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2 fate-lossless-wma24-rawtile
 fate-lossless-wma: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial.wma -f s16le -frames 209
+fate-lossless-wma24-1: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/master_audio_2.0_24bit.wma -f s24le
+fate-lossless-wma24-2: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/Mega_Weird_Audio_Test_24bit.wma -f s24le
+fate-lossless-wma24-rawtile: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/g2_24bit.wma -f s24le
+fate-lossless-wmall: fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2 fate-lossless-wma24-rawtile
 
 FATE_SAMPLES_LOSSLESS_AUDIO += $(FATE_SAMPLES_LOSSLESS_AUDIO-yes)
 
diff --git a/tests/ref/fate/lossless-wma24-1 b/tests/ref/fate/lossless-wma24-1
new file mode 100644
index 000..ddee31c
--- /dev/null
+++ b/tests/ref/fate/lossless-wma24-1
@@ -0,0 +1 @@
+9ade91f506bc025854f6ffea0d635bc6
diff --git a/tests/ref/fate/lossless-wma24-2 b/tests/ref/fate/lossless-wma24-2
new file mode 100644
index 000..5ebdfd1
--- /dev/null
+++ b/tests/ref/fate/lossless-wma24-2
@@ -0,0 +1 @@
+908ec5c16f497bf7d5658d2689d125c8
diff --git a/tests/ref/fate/lossless-wma24-rawtile b/tests/ref/fate/lossless-wma24-rawtile
new file mode 100644
index 000..96e5e21
--- /dev/null
+++ b/tests/ref/fate/lossless-wma24-rawtile
@@ -0,0 +1 @@
+337592f38a2218a5bc95ceb9b5e72c8b
-- 
2.8.1



Re: [FFmpeg-devel] [PATCH 1/4] fate: wma: add lossless 24bits test

2016-05-01 Thread Christophe Gisquet
Hi,

2016-05-01 15:33 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +fate-lossless-wma24-2: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/Mega_Weird_Audio_Test_24bit.wma -f s24le

The recent fixes actually changed the crc for that file.
Is https://trac.ffmpeg.org/attachment/ticket/4134/96k.wma another file
showing the issue?

Because it could be added as a fate test.

I'll send an updated patch once it has been decided whether to add the
above test.

-- 
Christophe


[FFmpeg-devel] [PATCH 2/4] wmalossless: allow calling madd_int16

2016-05-01 Thread Christophe Gisquet
This is done by handling the "prev_values" in the cascaded LMS data as if
they were int16_t, which requires switching the computations in several
places.
---
 libavcodec/wmalosslessdec.c | 109 +++-
 1 file changed, 58 insertions(+), 51 deletions(-)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index f14e8a6..687bbe3 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -694,32 +694,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size)
 }
 }
 
-static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input)
-{
-int recent = s->cdlms[ich][ilms].recent;
-int range  = 1 << s->bits_per_sample - 1;
-int order  = s->cdlms[ich][ilms].order;
-
-if (recent)
-recent--;
-else {
-memcpy(s->cdlms[ich][ilms].lms_prevvalues + order,
-   s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order);
-memcpy(s->cdlms[ich][ilms].lms_updates + order,
-   s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order);
-recent = order - 1;
-}
-
-s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1);
-s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich];
-
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2;
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1;
-s->cdlms[ich][ilms].recent = recent;
-memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0,
-   sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order));
-}
-
 static void use_high_update_speed(WmallDecodeCtx *s, int ich)
 {
 int ilms, recent, icoef;
@@ -755,32 +729,62 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich)
 s->update_speed[ich] = 8;
 }
 
-static void revert_cdlms(WmallDecodeCtx *s, int ch,
- int coef_begin, int coef_end)
-{
-int icoef, pred, ilms, num_lms, residue, input;
-
-num_lms = s->cdlms_ttl[ch];
-for (ilms = num_lms - 1; ilms >= 0; ilms--) {
-for (icoef = coef_begin; icoef < coef_end; icoef++) {
-pred = 1 << (s->cdlms[ch][ilms].scaling - 1);
-residue = s->channel_residues[ch][icoef];
-pred += s->dsp.scalarproduct_and_madd_int32(s->cdlms[ch][ilms].coefs,
-s->cdlms[ch][ilms].lms_prevvalues
-+ s->cdlms[ch][ilms].recent,
-s->cdlms[ch][ilms].lms_updates
-+ s->cdlms[ch][ilms].recent,
-FFALIGN(s->cdlms[ch][ilms].order,
-WMALL_COEFF_PAD_SIZE),
-WMASIGN(residue));
-input = residue + (pred >> s->cdlms[ch][ilms].scaling);
-lms_update(s, ch, ilms, input);
-s->channel_residues[ch][icoef] = input;
-}
-}
-emms_c();
+#define CD_LMS(bits, ROUND) \
+static void lms_update ## bits (WmallDecodeCtx *s, int ich, int ilms, int input) \
+{ \
+int recent = s->cdlms[ich][ilms].recent; \
+int range  = 1 << s->bits_per_sample - 1; \
+int order  = s->cdlms[ich][ilms].order; \
+int ##bits##_t *prev = (int##bits##_t *)s->cdlms[ich][ilms].lms_prevvalues; \
+ \
+if (recent) \
+recent--; \
+else { \
+memcpy(prev + order, prev, (bits/8) * order); \
+memcpy(s->cdlms[ich][ilms].lms_updates + order, \
+   s->cdlms[ich][ilms].lms_updates, \
+   sizeof(*s->cdlms[ich][ilms].lms_updates) * order); \
+recent = order - 1; \
+} \
+ \
+prev[recent] = av_clip(input, -range, range - 1); \
+s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich]; \
+ \
+s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2; \
+s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1; \
+s->cdlms[ich][ilms].recent = recent; \
+memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0, \
+   sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order)); \
+} \
+ \
+static void revert_cdlms ## bits (WmallDecodeCtx *s, int ch, \
+  int coef_begin, int coef_end) \
+{ \
+int icoef, pred, ilms, num_lms, residue, input; \
+ \
+num_lms = s->cdlms_ttl[ch]; \
+for (ilms = num_lms - 1; ilms >= 0; ilms--) { \
+for (icoef = coef_begin; icoef < coef_end; icoef++) { \
+int##bits##_t *prevvalues = (int##bits##_t *)s->cdlms[ch][ilms].lms_prevvalues; \
+pred = 1 << (s->cdlms[ch][ilms].scaling - 1); \
+residue = s->channel_residues[ch][icoef]; \
+pred += 

[FFmpeg-devel] [PATCH 3/4] x86: lossless audio: SSE4 madd 32bits

2016-05-01 Thread Christophe Gisquet
The only user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version does not make sense.

Timings: 68 -> 49 cycles
---
 libavcodec/x86/lossless_audiodsp.asm| 33 +
 libavcodec/x86/lossless_audiodsp_init.c |  7 +++
 2 files changed, 40 insertions(+)

diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm
index 5597dad..063d7b4 100644
--- a/libavcodec/x86/lossless_audiodsp.asm
+++ b/libavcodec/x86/lossless_audiodsp.asm
@@ -68,6 +68,39 @@ SCALARPRODUCT
 INIT_XMM sse2
 SCALARPRODUCT
 
+INIT_XMM sse4
+; int ff_scalarproduct_and_madd_int32(int16_t *v1, int32_t *v2, int16_t *v3,
+; int order, int mul)
+cglobal scalarproduct_and_madd_int32, 4,4,8, v1, v2, v3, order, mul
+shl orderq, 1
+movd    m7, mulm
+SPLATW  m7, m7
+pxor    m6, m6
+add v1q, orderq
+lea v2q, [v2q + 2*orderq]
+add v3q, orderq
+neg orderq
+.loop:
+mova    m3, [v1q + orderq]
+movu    m0, [v2q + 2*orderq]
+pmovsxwd m4, m3
+movu    m1, [v2q + 2*orderq + mmsize]
+movhlps m5, m3
+movu    m2, [v3q + orderq]
+pmovsxwd m5, m5
+pmullw  m2, m7
+pmulld  m0, m4
+pmulld  m1, m5
+paddw   m2, m3
+paddd   m6, m0
+paddd   m6, m1
+mova    [v1q + orderq], m2
+add orderq, 16
+jl .loop
+HADDD   m6, m0
+movd   eax, m6
+RET
+
 %macro SCALARPRODUCT_LOOP 1
 align 16
 .loop%1:
diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c
index 197173c..10b6a65 100644
--- a/libavcodec/x86/lossless_audiodsp_init.c
+++ b/libavcodec/x86/lossless_audiodsp_init.c
@@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, const int16_t *v2,
   const int16_t *v3,
   int order, int mul);
 
+int32_t ff_scalarproduct_and_madd_int32_sse4(int16_t *v1, const int32_t *v2,
+ const int16_t *v3,
+ int order, int mul);
+
 av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 {
 #if HAVE_YASM
@@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 if (EXTERNAL_SSSE3(cpu_flags) &&
 !(cpu_flags & (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit
 c->scalarproduct_and_madd_int16 = ff_scalarproduct_and_madd_int16_ssse3;
+
+if (EXTERNAL_SSE4(cpu_flags))
+c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4;
 #endif
 }
-- 
2.8.1
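
For readers who do not read x86 assembly, here is a minimal scalar sketch of what the loop above appears to compute, assuming the prototype from the init file (16-bit v1 and v3, 32-bit v2). The function name and the "passes" counter are hypothetical and only serve the illustration; each pass stands for one trip through .loop, which consumes 16 bytes of v1, i.e. 8 int16_t values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the SSE4 loop, not FFmpeg code.
 * One outer pass corresponds to one 16-byte chunk of v1 (8 lanes). */
static int32_t madd_int32_chunked(int16_t *v1, const int32_t *v2,
                                  const int16_t *v3, int order, int mul,
                                  int *passes)
{
    int32_t res = 0;

    assert(order > 0 && order % 8 == 0);
    *passes = 0;
    for (int base = 0; base < order; base += 8) {
        for (int lane = 0; lane < 8; lane++) {
            int i = base + lane;
            res   += v1[i] * v2[i];                   /* uses the old v1[i] */
            v1[i]  = (int16_t)(v1[i] + mul * v3[i]);  /* 16-bit update      */
        }
        (*passes)++;
    }
    return res;
}
```

With an order of 8 the outer loop runs exactly once, which is consistent with the remark that further unrolling would bring nothing for the samples tested.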



[FFmpeg-devel] [PATCH 4/4] lossless audio dsp: unroll

2016-05-01 Thread Christophe Gisquet
The loop counts are guaranteed to be multiples of 8, so unrolling by 2
is safe and allows better use of the execution ports.

For int32 version: 68 -> 58c.
---
 libavcodec/lossless_audiodsp.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/lossless_audiodsp.c b/libavcodec/lossless_audiodsp.c
index ea0568e..e3ea8e1 100644
--- a/libavcodec/lossless_audiodsp.c
+++ b/libavcodec/lossless_audiodsp.c
@@ -29,10 +29,12 @@ static int32_t scalarproduct_and_madd_int16_c(int16_t *v1, const int16_t *v2,
 {
 int res = 0;
 
-while (order--) {
+do {
 res   += *v1 * *v2++;
 *v1++ += mul * *v3++;
-}
+res   += *v1 * *v2++;
+*v1++ += mul * *v3++;
+} while (order-=2);
 return res;
 }
 
@@ -42,10 +44,12 @@ static int32_t scalarproduct_and_madd_int32_c(int16_t *v1, const int32_t *v2,
 {
 int res = 0;
 
-while (order--) {
+do {
+res   += *v1 * *v2++;
+*v1++ += mul * *v3++;
 res   += *v1 * *v2++;
 *v1++ += mul * *v3++;
-}
+} while (order-=2);
 return res;
 }
 
-- 
2.8.1
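
As a rough illustration of why the guarantee on the order matters, the sketch below puts the loop before and after the patch side by side. The function names are hypothetical; the loop bodies follow the diff above. The do/while form only tests the counter after subtracting 2, so an order of 0 or an odd order would make it run past the end of the arrays, hence the assert.

```c
#include <assert.h>
#include <stdint.h>

/* Plain loop, as before the patch. */
static int32_t madd16_plain(int16_t *v1, const int16_t *v2,
                            const int16_t *v3, int order, int mul)
{
    int res = 0;

    while (order--) {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    }
    return res;
}

/* Unrolled by 2, as after the patch. Only valid for even, non-zero order. */
static int32_t madd16_unrolled(int16_t *v1, const int16_t *v2,
                               const int16_t *v3, int order, int mul)
{
    int res = 0;

    assert(order > 0 && order % 2 == 0);
    do {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    } while (order -= 2);
    return res;
}
```

Both variants should return the same sum and leave v1 in the same state for any order that is a multiple of 8.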



[FFmpeg-devel] [PATCH 0/4] wmalossless: fix 16bits speed regression v3

2016-05-01 Thread Christophe Gisquet
Due to the changes to the cascaded LMS coefficients, most of the code
needed a rewrite.

In particular, the SSE4 madd32 code is no longer similar enough to be
shared inside a macro.

Christophe Gisquet (4):
  fate: wma: add lossless 24bits test
  wmalossless: allow calling madd_int16
  x86: lossless audio: SSE4 madd 32bits
  lossless audio dsp: unroll

 libavcodec/lossless_audiodsp.c  |  12 ++--
 libavcodec/wmalosslessdec.c | 109 +---
 libavcodec/x86/lossless_audiodsp.asm|  33 ++
 libavcodec/x86/lossless_audiodsp_init.c |   7 ++
 tests/fate/lossless-audio.mak   |   5 +-
 tests/ref/fate/lossless-wma24-1 |   1 +
 tests/ref/fate/lossless-wma24-2 |   1 +
 7 files changed, 112 insertions(+), 56 deletions(-)
 create mode 100644 tests/ref/fate/lossless-wma24-1
 create mode 100644 tests/ref/fate/lossless-wma24-2

-- 
2.8.1



[FFmpeg-devel] [PATCH 1/5] fate: wma: add lossless 24bits test

2016-04-30 Thread Christophe Gisquet
---
 tests/fate/lossless-audio.mak   | 5 -
 tests/ref/fate/lossless-wma24-1 | 1 +
 tests/ref/fate/lossless-wma24-2 | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/fate/lossless-wma24-1
 create mode 100644 tests/ref/fate/lossless-wma24-2

diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak
index 58641ab..dbd0c0e 100644
--- a/tests/fate/lossless-audio.mak
+++ b/tests/fate/lossless-audio.mak
@@ -25,8 +25,11 @@ fate-lossless-tta: CMD = crc -i $(TARGET_SAMPLES)/lossless-audio/inside.tta
 FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, TTA, TTA) += fate-lossless-tta-encrypted
 fate-lossless-tta-encrypted: CMD = crc -password ffmpeg -i $(TARGET_SAMPLES)/lossless-audio/encrypted.tta
 
-FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma
+FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2
 fate-lossless-wma: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial.wma -f s16le -frames 209
+fate-lossless-wma24-1: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/master_audio_2.0_24bit.wma -f s24le
+fate-lossless-wma24-2: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/Mega_Weird_Audio_Test_24bit.wma -f s24le
+fate-lossless-wmall: fate-lossless-wma fate-lossless-wma24-1 fate-lossless-wma24-2
 
 FATE_SAMPLES_LOSSLESS_AUDIO += $(FATE_SAMPLES_LOSSLESS_AUDIO-yes)
 
diff --git a/tests/ref/fate/lossless-wma24-1 b/tests/ref/fate/lossless-wma24-1
new file mode 100644
index 000..ddee31c
--- /dev/null
+++ b/tests/ref/fate/lossless-wma24-1
@@ -0,0 +1 @@
+9ade91f506bc025854f6ffea0d635bc6
diff --git a/tests/ref/fate/lossless-wma24-2 b/tests/ref/fate/lossless-wma24-2
new file mode 100644
index 000..5ebdfd1
--- /dev/null
+++ b/tests/ref/fate/lossless-wma24-2
@@ -0,0 +1 @@
+908ec5c16f497bf7d5658d2689d125c8
-- 
2.8.1



[FFmpeg-devel] [PATCH 2/5] wmalossless: allow calling madd_int16

2016-04-30 Thread Christophe Gisquet
This is done by handling the cascaded LMS data as if it were int16_t,
which requires switching the computations in several places.
---
 libavcodec/wmalosslessdec.c | 146 +---
 1 file changed, 84 insertions(+), 62 deletions(-)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index 9d56d97..f3a2217 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -147,9 +147,9 @@ typedef struct WmallDecodeCtx {
 int scaling;
 int coefsend;
 int bitsend;
-DECLARE_ALIGNED(16, int32_t, coefs)[MAX_ORDER + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
-DECLARE_ALIGNED(16, int32_t, lms_prevvalues)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
-DECLARE_ALIGNED(16, int32_t, lms_updates)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
+DECLARE_ALIGNED(16, int32_t, coefs)[MAX_ORDER + WMALL_COEFF_PAD_SIZE/sizeof(int32_t)];
+DECLARE_ALIGNED(16, int32_t, lms_prevvalues)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int32_t)];
+DECLARE_ALIGNED(16, int32_t, lms_updates)[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE/sizeof(int32_t)];
 int recent;
 } cdlms[WMALL_MAX_CHANNELS][9];
 
@@ -458,6 +458,7 @@ static int decode_cdlms(WmallDecodeCtx *s)
 int cdlms_send_coef = get_bits1(&s->gb);
 
 for (c = 0; c < s->num_channels; c++) {
+int shift = s->bits_per_sample > 16 ? 0 : 1;
 s->cdlms_ttl[c] = get_bits(&s->gb, 3) + 1;
 for (i = 0; i < s->cdlms_ttl[c]; i++) {
 s->cdlms[c][i].order = (get_bits(&s->gb, 7) + 1) * 8;
@@ -495,14 +496,20 @@ static int decode_cdlms(WmallDecodeCtx *s)
 s->cdlms[c][i].bitsend = get_bitsz(&s->gb, cbits) + 2;
 shift_l = 32 - s->cdlms[c][i].bitsend;
 shift_r = 32 - s->cdlms[c][i].scaling - 2;
+if (s->bits_per_sample > 16) {
 for (j = 0; j < s->cdlms[c][i].coefsend; j++)
 s->cdlms[c][i].coefs[j] =
 (get_bits(&s->gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r;
+} else {
+int16_t *ptr = (int16_t*)s->cdlms[c][i].coefs;
+for (j = 0; j < s->cdlms[c][i].coefsend; j++)
+ptr[j] = (get_bits(&s->gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r;
+}
+}
 }
 }
 
 for (i = 0; i < s->cdlms_ttl[c]; i++)
-memset(s->cdlms[c][i].coefs + s->cdlms[c][i].order,
+memset(s->cdlms[c][i].coefs + (s->cdlms[c][i].order >> shift),
0, WMALL_COEFF_PAD_SIZE);
 }
 
@@ -694,32 +701,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size)
 }
 }
 
-static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input)
-{
-int recent = s->cdlms[ich][ilms].recent;
-int range  = 1 << s->bits_per_sample - 1;
-int order  = s->cdlms[ich][ilms].order;
-
-if (recent)
-recent--;
-else {
-memcpy(s->cdlms[ich][ilms].lms_prevvalues + order,
-   s->cdlms[ich][ilms].lms_prevvalues, sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order);
-memcpy(s->cdlms[ich][ilms].lms_updates + order,
-   s->cdlms[ich][ilms].lms_updates, sizeof(*s->cdlms[ich][ilms].lms_updates) * order);
-recent = order - 1;
-}
-
-s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range - 1);
-s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * s->update_speed[ich];
-
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2;
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1;
-s->cdlms[ich][ilms].recent = recent;
-memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0,
-   sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order));
-}
-
 static void use_high_update_speed(WmallDecodeCtx *s, int ich)
 {
 int ilms, recent, icoef;
@@ -727,12 +708,16 @@ static void use_high_update_speed(WmallDecodeCtx *s, int ich)
 recent = s->cdlms[ich][ilms].recent;
 if (s->update_speed[ich] == 16)
 continue;
-if (s->bV3RTM) {
+if (s->bits_per_sample > 16) {
+int32_t *updates = s->cdlms[ich][ilms].lms_updates;
+if (s->bV3RTM) updates += recent;
 for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
-s->cdlms[ich][ilms].lms_updates[icoef + recent] *= 2;
+updates[icoef] *= 2;
 } else {
+int16_t *updates = (int16_t *)s->cdlms[ich][ilms].lms_updates;
+if (s->bV3RTM) updates += recent;
 for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
-s->cdlms[ich][ilms].lms_updates[icoef] *= 2;
+updates[icoef] *= 2;
 }
 }
 s->update_speed[ich] = 16;
@@ -745,42 +730,76 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int ich)
 

[FFmpeg-devel] [PATCH 5/5] wmalossless: silence a sample request

2016-04-30 Thread Christophe Gisquet
16bits samples with CDLMS orders of 8 are currently unsupported, but have never
been encountered so far.

However, 8 seems to be the most frequent, if not the only, order used for 24bits.
In that case, the dsp functions are fine with handling orders that are multiples
of 8, so silence the warning.
---
 libavcodec/wmalosslessdec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index f3a2217..83b3174 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -469,7 +469,7 @@ static int decode_cdlms(WmallDecodeCtx *s)
 s->cdlms[0][0].order = 0;
 return AVERROR_INVALIDDATA;
 }
-if(s->cdlms[c][i].order & 8) {
+if(s->cdlms[c][i].order & 8 && s->bits_per_sample == 16) {
 static int warned;
 if(!warned)
 avpriv_request_sample(s->avctx, "CDLMS of order %d",
-- 
2.8.1
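
To make the patched condition concrete: decode_cdlms() reads the order as (get_bits(7) + 1) * 8, so it is always a multiple of 8, and "order & 8" is non-zero exactly for the odd multiples (8, 24, 40, ...), i.e. the orders that are not multiples of 16. The helper below is a hypothetical stand-alone sketch of that predicate, not code from the decoder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the patched test:
 * request a sample only for 16-bit streams whose CDLMS order
 * is a multiple of 8 but not of 16. */
static bool cdlms_order_needs_sample(int order, int bits_per_sample)
{
    assert(order > 0 && order % 8 == 0);
    /* '&' binds tighter than '&&', same as in the patch */
    return (order & 8) && bits_per_sample == 16;
}
```

Under this reading, a 24-bit stream with order 8 no longer triggers the request, while a 16-bit stream with order 8 or 24 still does.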



[FFmpeg-devel] [PATCH 3/5] x86: lossless audio: SSE4 madd 32bits

2016-04-30 Thread Christophe Gisquet
The only user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version does not make sense.

Timings: 72 -> 49 cycles
---
 libavcodec/x86/lossless_audiodsp.asm| 31 +--
 libavcodec/x86/lossless_audiodsp_init.c |  7 +++
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm
index 5597dad..d00869b 100644
--- a/libavcodec/x86/lossless_audiodsp.asm
+++ b/libavcodec/x86/lossless_audiodsp.asm
@@ -22,13 +22,17 @@
 
 SECTION .text
 
-%macro SCALARPRODUCT 0
+%macro SCALARPRODUCT 1
 ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3,
 ; int order, int mul)
-cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
-shl orderq, 1
+; int ff_scalarproduct_and_madd_int32(int32_t *v1, int32_t *v2, int32_t *v3,
+; int order, int mul)
+cglobal scalarproduct_and_madd_int %+ %1, 4,4,8, v1, v2, v3, order, mul
+shl orderq, (%1/16)
 movd    m7, mulm
-%if mmsize == 16
+%if %1 == 32
+SPLATD  m7
+%elif mmsize == 16
 pshuflw m7, m7, 0
 punpcklqdq m7, m7
 %else
@@ -46,14 +50,26 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
 mova    m5, [v1q + orderq + mmsize]
 movu    m2, [v3q + orderq]
 movu    m3, [v3q + orderq + mmsize]
+%if %1 == 32
+pmulld  m0, m4
+pmulld  m1, m5
+pmulld  m2, m7
+pmulld  m3, m7
+%else
 pmaddwd m0, m4
 pmaddwd m1, m5
 pmullw  m2, m7
 pmullw  m3, m7
+%endif
 paddd   m6, m0
 paddd   m6, m1
+%if %1 == 32
+paddd   m2, m4
+paddd   m3, m5
+%else
 paddw   m2, m4
 paddw   m3, m5
+%endif
 mova    [v1q + orderq], m2
 mova    [v1q + orderq + mmsize], m3
 add orderq, mmsize*2
@@ -64,9 +80,12 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
 %endmacro
 
 INIT_MMX mmxext
-SCALARPRODUCT
+SCALARPRODUCT 16
 INIT_XMM sse2
-SCALARPRODUCT
+SCALARPRODUCT 16
+
+INIT_XMM sse4
+SCALARPRODUCT 32
 
 %macro SCALARPRODUCT_LOOP 1
 align 16
diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c
index 197173c..85306cb 100644
--- a/libavcodec/x86/lossless_audiodsp_init.c
+++ b/libavcodec/x86/lossless_audiodsp_init.c
@@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, const int16_t *v2,
   const int16_t *v3,
   int order, int mul);
 
+int32_t ff_scalarproduct_and_madd_int32_sse4(int32_t *v1, const int32_t *v2,
+ const int32_t *v3,
+ int order, int mul);
+
 av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 {
 #if HAVE_YASM
@@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 if (EXTERNAL_SSSE3(cpu_flags) &&
 !(cpu_flags & (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit
 c->scalarproduct_and_madd_int16 = ff_scalarproduct_and_madd_int16_ssse3;
+
+if (EXTERNAL_SSE4(cpu_flags))
+c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4;
 #endif
 }
-- 
2.8.1



[FFmpeg-devel] [PATCH 4/5] lossless audio dsp: unroll

2016-04-30 Thread Christophe Gisquet
The loop counts are guaranteed to be multiples of 8, so unrolling by 2
is safe and allows better use of the execution ports.

For int32 version: 72 -> 57c.
---
 libavcodec/lossless_audiodsp.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/lossless_audiodsp.c b/libavcodec/lossless_audiodsp.c
index 55495d0..17a61cd 100644
--- a/libavcodec/lossless_audiodsp.c
+++ b/libavcodec/lossless_audiodsp.c
@@ -29,10 +29,12 @@ static int32_t scalarproduct_and_madd_int16_c(int16_t *v1, const int16_t *v2,
 {
 int res = 0;
 
-while (order--) {
+do {
 res   += *v1 * *v2++;
 *v1++ += mul * *v3++;
-}
+res   += *v1 * *v2++;
+*v1++ += mul * *v3++;
+} while (order-=2);
 return res;
 }
 
@@ -42,10 +44,12 @@ static int32_t scalarproduct_and_madd_int32_c(int32_t *v1, const int32_t *v2,
 {
 int res = 0;
 
-while (order--) {
+do {
+res   += *v1 * *v2++;
+*v1++ += mul * *v3++;
 res   += *v1 * *v2++;
 *v1++ += mul * *v3++;
-}
+} while (order-=2);
 return res;
 }
 
-- 
2.8.1



[FFmpeg-devel] [PATCH 0/5] wmalossless: fix 16bits speed regression v2

2016-04-30 Thread Christophe Gisquet
Patch 2 is the squashing of several previous commits, as there was
no opinion on their contents or on the way to go.

The SSE4 one is the final version from its last thread.

The last patch in this set is new, and silences a warning that's only
meaningful for 16bits content.

Christophe Gisquet (5):
  fate: wma: add lossless 24bits test
  wmalossless: allow calling madd_int16
  x86: lossless audio: SSE4 madd 32bits
  lossless audio dsp: unroll
  wmalossless: silence a sample request

 libavcodec/lossless_audiodsp.c  |  12 ++-
 libavcodec/wmalosslessdec.c | 148 ++--
 libavcodec/x86/lossless_audiodsp.asm|  31 +--
 libavcodec/x86/lossless_audiodsp_init.c |   7 ++
 tests/fate/lossless-audio.mak   |   5 +-
 tests/ref/fate/lossless-wma24-1 |   1 +
 tests/ref/fate/lossless-wma24-2 |   1 +
 7 files changed, 131 insertions(+), 74 deletions(-)
 create mode 100644 tests/ref/fate/lossless-wma24-1
 create mode 100644 tests/ref/fate/lossless-wma24-2

-- 
2.8.1



Re: [FFmpeg-devel] [PATCH 0/6] wmalossless: fix 16bits speed regression

2016-04-30 Thread Christophe Gisquet
2016-04-29 10:50 GMT+02:00 Paul B Mahol :
> Should be OK if it doesn't break anything.

I'll resend the current state of this patchset for easier testing &
applying. Michael ran this under valgrind with nothing popping up, and
fate passes.

I think the remaining thing is: is the first version (with if inside
loops) preferred over the second version (macroing to reduce such
ifs)?
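
To make the two options concrete, here is a minimal sketch, not taken from
the patches: a simplified stand-in for the "double every update" step,
written once with the bit-depth test inside the loop and once with a macro
that generates one branch-free function per sample width. All the
double_updates* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: one function, the depth is tested for every element. */
static void double_updates_branchy(void *buf, int n, int bits)
{
    int i;
    for (i = 0; i < n; i++) {
        if (bits > 16)
            ((int32_t *)buf)[i] *= 2;
        else
            ((int16_t *)buf)[i] *= 2;
    }
}

/* Variant 2: a macro stamps out one branch-free function per width. */
#define DOUBLE_UPDATES(bits)                                   \
static void double_updates##bits(int##bits##_t *buf, int n)    \
{                                                              \
    int i;                                                     \
    for (i = 0; i < n; i++)                                    \
        buf[i] *= 2;                                           \
}

DOUBLE_UPDATES(16)
DOUBLE_UPDATES(32)

/* The depth is tested once, outside the loop. */
static void double_updates_templated(void *buf, int n, int bits)
{
    assert(n >= 0);
    if (bits > 16)
        double_updates32(buf, n);
    else
        double_updates16(buf, n);
}
```

Both variants compute the same thing; the second trades a little code size
for loops without a per-element test.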

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 5/6] x86: lossless audio: SSE4 madd 32bits

2016-04-20 Thread Christophe Gisquet
Hi,

2016-04-20 2:01 GMT+02:00 Ronald S. Bultje :
> This is typically only an issue if the data came from stack. On win64 as
> well as unix64, the 4th argument never comes from stack but is a direct
> register argument instead.

So no benefit except consistency. I don't mind either way, though.

On the other hand, this hand-coded function is only a slight
improvement over gcc's vectorized code, and only because gcc does a
poor job of it. Probably because the order is small (8) and gcc does
not have enough info on the data. So, it's written, but it's not very
beneficial.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 5/6] x86: lossless audio: SSE4 madd 32bits

2016-04-18 Thread Christophe Gisquet
2016-04-18 21:18 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>:
> this breaks (only noise)
> \[CCCP\]_Mega_Weird_Audio_Test.mkv track 23

Worthwhile sample.

I rewrote the patch to reduce code duplication, and I fixed the issue
(misread a shift).

-- 
Christophe
From a0d4a96c032d73bc0e34fec320497aefafba3c28 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Mon, 18 Apr 2016 13:20:07 +0200
Subject: [PATCH 5/7] x86: lossless audio: SSE4 madd 32bits

The only user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version does not make sense.

Timings: 72 -> 49 cycles
---
 libavcodec/x86/lossless_audiodsp.asm| 31 +--
 libavcodec/x86/lossless_audiodsp_init.c |  7 +++
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm
index 5597dad..d00869b 100644
--- a/libavcodec/x86/lossless_audiodsp.asm
+++ b/libavcodec/x86/lossless_audiodsp.asm
@@ -22,13 +22,17 @@
 
 SECTION .text
 
-%macro SCALARPRODUCT 0
+%macro SCALARPRODUCT 1
 ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3,
 ; int order, int mul)
-cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
-shl orderq, 1
+; int ff_scalarproduct_and_madd_int32(int32_t *v1, int32_t *v2, int32_t *v3,
+; int order, int mul)
+cglobal scalarproduct_and_madd_int %+ %1, 4,4,8, v1, v2, v3, order, mul
+shl orderq, (%1/16)
 movdm7, mulm
-%if mmsize == 16
+%if %1 == 32
+SPLATD  m7
+%elif mmsize == 16
 pshuflw m7, m7, 0
 punpcklqdq m7, m7
 %else
@@ -46,14 +50,26 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
 movam5, [v1q + orderq + mmsize]
 movum2, [v3q + orderq]
 movum3, [v3q + orderq + mmsize]
+%if %1 == 32
+pmulld  m0, m4
+pmulld  m1, m5
+pmulld  m2, m7
+pmulld  m3, m7
+%else
 pmaddwd m0, m4
 pmaddwd m1, m5
 pmullw  m2, m7
 pmullw  m3, m7
+%endif
 paddd   m6, m0
 paddd   m6, m1
+%if %1 == 32
+paddd   m2, m4
+paddd   m3, m5
+%else
 paddw   m2, m4
 paddw   m3, m5
+%endif
 mova[v1q + orderq], m2
 mova[v1q + orderq + mmsize], m3
 add orderq, mmsize*2
@@ -64,9 +80,12 @@ cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
 %endmacro
 
 INIT_MMX mmxext
-SCALARPRODUCT
+SCALARPRODUCT 16
 INIT_XMM sse2
-SCALARPRODUCT
+SCALARPRODUCT 16
+
+INIT_XMM sse4
+SCALARPRODUCT 32
 
 %macro SCALARPRODUCT_LOOP 1
 align 16
diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c
index 197173c..85306cb 100644
--- a/libavcodec/x86/lossless_audiodsp_init.c
+++ b/libavcodec/x86/lossless_audiodsp_init.c
@@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, const int16_t *v2,
   const int16_t *v3,
   int order, int mul);
 
+int32_t ff_scalarproduct_and_madd_int32_sse4(int32_t *v1, const int32_t *v2,
+ const int32_t *v3,
+ int order, int mul);
+
 av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 {
 #if HAVE_YASM
@@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 if (EXTERNAL_SSSE3(cpu_flags) &&
 !(cpu_flags & (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit
 c->scalarproduct_and_madd_int16 = ff_scalarproduct_and_madd_int16_ssse3;
+
+if (EXTERNAL_SSE4(cpu_flags))
+c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4;
 #endif
 }
-- 
2.8.1



Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test

2016-04-18 Thread Christophe Gisquet
2016-04-18 22:22 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> 2016-04-18 19:11 GMT+02:00 James Almer <jamr...@gmail.com>:
>> No way to create one using existing 24bit audio currently available in fate
>> or any redistributable 24 audio out there?
>> There are some dts-ma and truehd multichannel samples that are not sine waves.
>
> You're right. Just did that, except the encoder doesn't like the 7.1
> configuration.

Except this nice true 24bits sample didn't exhibit the issue found by
Michael (where the audio is coded as 24bits, but probably isn't).

Extracting a few wma packets from the file is sufficient to show the
issue, generating a file that's 30KB.

Because of this, I would favour this sample over a true 24 bits
sample. Another option would be to add those 2 tests.

Opinions?

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test

2016-04-18 Thread Christophe Gisquet
Hi,

2016-04-18 19:11 GMT+02:00 James Almer :
> No way to create one using existing 24bit audio currently available in fate
> or any redistributable 24 audio out there?
> There are some dts-ma and truehd multichannel samples that are not sine waves.

You're right. Just did that, except the encoder doesn't like the 7.1
configuration.

Because we're not testing the wma rematrixing, I changed that to stereo:
-i dts/master_audio_7.1_24bit.dts -af 'pan=stereo:c0=FL:c1=FR' -acodec pcm_s24le

The test file comes at 1.6MB, but fewer samples could be encoded.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 2/6] wmalossless: allow calling madd_int16

2016-04-18 Thread Christophe Gisquet
Hi,

2016-04-18 20:09 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>:
> On Mon, Apr 18, 2016 at 03:07:27PM +0200, Christophe Gisquet wrote:
>> This is done by actually handling the cascaded LMS data as if it
>> were int16_t, thus requiring switching at various locations the
>> computations.
>> ---
>>  libavcodec/wmalosslessdec.c | 61 
>> +
>>  1 file changed, 61 insertions(+)
>
> this causes a few new warnings

Yeah, I focused on the macro'ed version. If this one is favoured
instead, then I'll come back to it.

Otherwise, I'll squash the 3 patches, probably.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 6/6] lossless audio dsp: unroll

2016-04-18 Thread Christophe Gisquet
2016-04-18 19:15 GMT+02:00 James Almer <jamr...@gmail.com>:
> On 4/18/2016 10:07 AM, Christophe Gisquet wrote:
>> The loops are guaranteed to be at least multiples of 8, so this
>> unrolling is safe but allows exploiting execution ports.
>>
>> For int32 version: 72 -> 57c.
>
> What compiler are you using, and what cpu at configure time?

gcc 5.1, Win64, haswell. I don't use mingw64 compiler.

> We're currently enabling tree vectorization for gcc 4.9 or newer on x86,
> and at least with gcc 5.3.0 on mingw-w64 the resulting code now seems worse.
> I didn't bench it, but after this patch it's not being vectorized anymore.

The code I benchmarked as being 72c is vectorized and keeps being
vectorized here. It actually looks better than the previously
vectorized one.

The 16_c version is no longer vectorized, but is really a mess here
when vectorized.
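
For reference, a self-contained sketch of the transformation under
discussion (the function names are made up for illustration): the original
one-element loop next to the version unrolled by two. The unrolled form is
only valid for a positive, even order, which the sketch asserts; the patch
relies on the order being a multiple of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Reference loop: one element per iteration. */
static int32_t madd_ref(int32_t *v1, const int32_t *v2,
                        const int32_t *v3, int order, int mul)
{
    int32_t res = 0;
    while (order--) {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    }
    return res;
}

/* Unrolled by two: only valid for a positive, even order. */
static int32_t madd_unrolled(int32_t *v1, const int32_t *v2,
                             const int32_t *v3, int order, int mul)
{
    int32_t res = 0;
    assert(order > 0 && order % 2 == 0);
    do {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    } while (order -= 2);
    return res;
}

/* Runs both variants on the same small input; returns 1 if they agree. */
static int unroll_matches_reference(void)
{
    int32_t a[8] = { 1, -2, 3, -4, 5, -6, 7, -8 };
    int32_t b[8] = { 1, -2, 3, -4, 5, -6, 7, -8 };
    const int32_t v2[8] = { 2, 2, 2, 2, 1, 1, 1, 1 };
    const int32_t v3[8] = { 1, 0, -1, 0, 1, 0, -1, 0 };
    int32_t r1 = madd_ref(a, v2, v3, 8, 3);
    int32_t r2 = madd_unrolled(b, v2, v3, 8, 3);
    int i, same = (r1 == r2);
    for (i = 0; i < 8; i++)
        same &= (a[i] == b[i]);
    return same;
}
```

Whether a given compiler vectorizes either form is a separate question; the
sketch only checks that the two loops are equivalent on valid input.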

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test

2016-04-18 Thread Christophe Gisquet
2016-04-18 18:39 GMT+02:00 Paul B Mahol :
> Better to have real 24bit content.

Yeah, my point, but I'm not sure we'll get one redistributable in fate,
e.g. by pinging people from the various tickets.

And when would we decide this is better than nothing?

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test

2016-04-18 Thread Christophe Gisquet
2016-04-18 15:07 GMT+02:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
> +fate-lossless-wma24: CMD = md5 -i $(TARGET_SAMPLES)/lossless-audio/luckynight-partial-24.wma -f s24le -frames 209

Btw, this is the regular luckynight whose samples have been shifted
into 24 bits. Another type of bitdepth increase would be nice, but I
haven't looked for it.
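
As an illustration of what "shifted into 24 bits" means here (assuming a
plain left shift by 8, which is my reading and not checked against the
encoder), a small sketch: widening a 16-bit sample leaves the 8 low bits at
zero, so such content is recognizable. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 16-bit sample to the 24-bit range. Multiplying by 256 is
 * used instead of << 8 to avoid left-shifting a negative value. */
static int32_t s16_to_s24(int16_t s)
{
    return (int32_t)s * 256;
}

/* Returns 1 if every sample has its 8 low bits clear, i.e. the data
 * could be 16-bit audio stored in a 24-bit container. */
static int looks_like_shifted_16bit(const int32_t *samples, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        if ((uint32_t)samples[i] & 0xFF)
            return 0;
    return 1;
}
```

A sample produced this way exercises the 24-bit code path but never carries
information in the low byte, hence the wish for another kind of bitdepth
increase.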

For some reason, the default GUI doesn't let me select the bitdepth
and downshifts to 16, so I had to resort to some command-line encoder.

The sample is 1MB, and hasn't been uploaded yet.

-- 
Christophe


[FFmpeg-devel] [PATCH 6/6] lossless audio dsp: unroll

2016-04-18 Thread Christophe Gisquet
The loops are guaranteed to be at least multiples of 8, so this
unrolling is safe but allows exploiting execution ports.

For int32 version: 72 -> 57c.
---
 libavcodec/lossless_audiodsp.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/lossless_audiodsp.c b/libavcodec/lossless_audiodsp.c
index 55495d0..17a61cd 100644
--- a/libavcodec/lossless_audiodsp.c
+++ b/libavcodec/lossless_audiodsp.c
@@ -29,10 +29,12 @@ static int32_t scalarproduct_and_madd_int16_c(int16_t *v1, 
const int16_t *v2,
 {
 int res = 0;
 
-while (order--) {
+do {
 res   += *v1 * *v2++;
 *v1++ += mul * *v3++;
-}
+res   += *v1 * *v2++;
+*v1++ += mul * *v3++;
+} while (order-=2);
 return res;
 }
 
@@ -42,10 +44,12 @@ static int32_t scalarproduct_and_madd_int32_c(int32_t *v1, 
const int32_t *v2,
 {
 int res = 0;
 
-while (order--) {
+do {
+res   += *v1 * *v2++;
+*v1++ += mul * *v3++;
 res   += *v1 * *v2++;
 *v1++ += mul * *v3++;
-}
+} while (order-=2);
 return res;
 }
 
-- 
2.8.1



[FFmpeg-devel] [PATCH 4/6] wmalossless: template code to remove inloop if

2016-04-18 Thread Christophe Gisquet
Code size increase is minimal.
---
 libavcodec/wmalosslessdec.c | 140 ++--
 1 file changed, 57 insertions(+), 83 deletions(-)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index 77017ff..27510d4 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -759,90 +759,61 @@ static void use_normal_update_speed(WmallDecodeCtx *s, 
int ich)
 s->update_speed[ich] = 8;
 }
 
-static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input)
-{
-int recent = s->cdlms[ich][ilms].recent;
-int range  = 1 << s->bits_per_sample - 1;
-int order  = s->cdlms[ich][ilms].order;
-
-if (s->bits_per_sample > 16) {
-if (recent)
-recent--;
-else {
-memcpy(s->cdlms[ich][ilms].lms_prevvalues + order,
-   s->cdlms[ich][ilms].lms_prevvalues, 
sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order);
-memcpy(s->cdlms[ich][ilms].lms_updates + order,
-   s->cdlms[ich][ilms].lms_updates, 
sizeof(*s->cdlms[ich][ilms].lms_updates) * order);
-recent = order - 1;
-}
-
-s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range 
- 1);
-s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * 
s->update_speed[ich];
-
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2;
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1;
-s->cdlms[ich][ilms].recent = recent;
-memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0,
-   sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order));
-} else {
-int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues;
-int16_t *updates= s->cdlms[ich][ilms].lms_updates;
-if (recent)
-recent--;
-else {
-memcpy(prevvalues + order, prevvalues, 2 * order);
-memcpy(updates + order, updates, 2 * order);
-recent = order - 1;
-}
-
-prevvalues[recent] = av_clip(input, -range, range - 1);
-updates[recent] = WMASIGN(input) * s->update_speed[ich];
-
-updates[recent + (order >> 4)] >>= 2;
-updates[recent + (order >> 3)] >>= 1;
-s->cdlms[ich][ilms].recent = recent;
-memset(updates + recent + order, 0,
-   sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order));
-}
+#define CD_LMS(bits, ROUND) \
+static void lms_update ## bits (WmallDecodeCtx *s, int ich, int ilms, int 
input) \
+{ \
+int recent = s->cdlms[ich][ilms].recent; \
+int range  = 1 << s->bits_per_sample - 1; \
+int order  = s->cdlms[ich][ilms].order; \
+int ##bits##_t *prev = (int##bits##_t 
*)s->cdlms[ich][ilms].lms_prevvalues; \
+int ##bits##_t *upd = (int##bits##_t *)s->cdlms[ich][ilms].lms_updates; \
+ \
+if (recent) \
+recent--; \
+else { \
+memcpy(prev + order, prev, (bits/8) * order); \
+memcpy(upd + order, upd, (bits/8) * order); \
+recent = order - 1; \
+} \
+ \
+prev[recent] = av_clip(input, -range, range - 1); \
+upd[recent] = WMASIGN(input) * s->update_speed[ich]; \
+ \
+upd[recent + (order >> 4)] >>= 2; \
+upd[recent + (order >> 3)] >>= 1; \
+s->cdlms[ich][ilms].recent = recent; \
+memset(upd + recent + order, 0, \
+   sizeof(s->cdlms[ich][ilms].lms_updates) - (bits/8)*(recent+order)); 
\
+} \
+ \
+static void revert_cdlms ## bits (WmallDecodeCtx *s, int ch, \
+  int coef_begin, int coef_end) \
+{ \
+int icoef, pred, ilms, num_lms, residue, input; \
+ \
+num_lms = s->cdlms_ttl[ch]; \
+for (ilms = num_lms - 1; ilms >= 0; ilms--) { \
+for (icoef = coef_begin; icoef < coef_end; icoef++) { \
+int##bits##_t *coeffs = (int##bits##_t *)s->cdlms[ch][ilms].coefs; 
\
+int##bits##_t *prevvalues = (int##bits##_t 
*)s->cdlms[ch][ilms].lms_prevvalues; \
+int##bits##_t *updates = (int##bits##_t 
*)s->cdlms[ch][ilms].lms_updates; \
+pred = 1 << (s->cdlms[ch][ilms].scaling - 1); \
+residue = s->channel_residues[ch][icoef]; \
+pred += s->dsp.scalarproduct_and_madd_int## bits (coeffs, \
+prevvalues + 
s->cdlms[ch][ilms].recent, \
+updates + 
s->cdlms[ch][ilms].recent, \
+
FFALIGN(s->cdlms[ch][ilms].order, ROUND), \
+WMASIGN(residue)); \
+input = residue + (pred >> s->cdlms[ch][ilms].scaling); \
+lms_update ## bits(s, ch, ilms, input); \
+s->channel_residues[ch][icoef] = input; \
+} \
+} \
+emms_c(); \
 }
 
-static void revert_cdlms(WmallDecodeCtx *s, int ch,
- int coef_begin, int coef_end)
-{
-int icoef, pred, ilms, num_lms, residue, input;
-
-

[FFmpeg-devel] [PATCH 1/6] fate: wma: add lossless 24bits test

2016-04-18 Thread Christophe Gisquet
---
 tests/fate/lossless-audio.mak | 4 +++-
 tests/ref/fate/lossless-wma24 | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/fate/lossless-wma24

diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak
index 58641ab..ccc4d00 100644
--- a/tests/fate/lossless-audio.mak
+++ b/tests/fate/lossless-audio.mak
@@ -25,8 +25,10 @@ fate-lossless-tta: CMD = crc -i 
$(TARGET_SAMPLES)/lossless-audio/inside.tta
 FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, TTA, TTA) += 
fate-lossless-tta-encrypted
 fate-lossless-tta-encrypted: CMD = crc -password ffmpeg -i 
$(TARGET_SAMPLES)/lossless-audio/encrypted.tta
 
-FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += 
fate-lossless-wma
+FATE_SAMPLES_LOSSLESS_AUDIO-$(call DEMDEC, ASF, WMALOSSLESS) += 
fate-lossless-wma fate-lossless-wma24
 fate-lossless-wma: CMD = md5 -i 
$(TARGET_SAMPLES)/lossless-audio/luckynight-partial.wma -f s16le -frames 209
+fate-lossless-wma24: CMD = md5 -i 
$(TARGET_SAMPLES)/lossless-audio/luckynight-partial-24.wma -f s24le -frames 209
+fate-lossless-wmal: fate-lossless-wma fate-lossless-wma24
 
 FATE_SAMPLES_LOSSLESS_AUDIO += $(FATE_SAMPLES_LOSSLESS_AUDIO-yes)
 
diff --git a/tests/ref/fate/lossless-wma24 b/tests/ref/fate/lossless-wma24
new file mode 100644
index 000..43862af
--- /dev/null
+++ b/tests/ref/fate/lossless-wma24
@@ -0,0 +1 @@
+e5aea78d60c407a88c4ff25994052b83
-- 
2.8.1



[FFmpeg-devel] [PATCH 2/6] wmalossless: allow calling madd_int16

2016-04-18 Thread Christophe Gisquet
This is done by actually handling the cascaded LMS data as if it
were int16_t, thus requiring switching at various locations the
computations.
---
 libavcodec/wmalosslessdec.c | 61 +
 1 file changed, 61 insertions(+)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index f7f249b..3885dc1 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -497,15 +497,29 @@ static int decode_cdlms(WmallDecodeCtx *s)
 s->cdlms[c][i].bitsend = get_bitsz(&s->gb, cbits) + 2;
 shift_l = 32 - s->cdlms[c][i].bitsend;
 shift_r = 32 - s->cdlms[c][i].scaling - 2;
+if (s->bits_per_sample > 16) {
 for (j = 0; j < s->cdlms[c][i].coefsend; j++)
 s->cdlms[c][i].coefs[j] =
 (get_bits(&s->gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r;
+} else {
+for (j = 0; j < s->cdlms[c][i].coefsend; j++) {
+int16_t *ptr = (int16_t*)s->cdlms[c][i].coefs;
+ptr[j] = (get_bits(&s->gb, s->cdlms[c][i].bitsend) << shift_l) >> shift_r;
+}
+}
 }
 }
 
+if (s->bits_per_sample > 16) {
 for (i = 0; i < s->cdlms_ttl[c]; i++)
 memset(s->cdlms[c][i].coefs + s->cdlms[c][i].order,
0, WMALL_COEFF_PAD_SIZE);
+} else {
+for (i = 0; i < s->cdlms_ttl[c]; i++) {
+int16_t *ptr = (int16_t*)s->cdlms[c][i].coefs;
+memset(ptr + s->cdlms[c][i].order, 0, 2*WMALL_COEFF_PAD_SIZE);
+}
+}
 }
 
 return 0;
@@ -702,6 +716,7 @@ static void lms_update(WmallDecodeCtx *s, int ich, int 
ilms, int input)
 int range  = 1 << s->bits_per_sample - 1;
 int order  = s->cdlms[ich][ilms].order;
 
+if (s->bits_per_sample > 16) {
 if (recent)
 recent--;
 else {
@@ -720,6 +735,26 @@ static void lms_update(WmallDecodeCtx *s, int ich, int 
ilms, int input)
 s->cdlms[ich][ilms].recent = recent;
 memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0,
sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order));
+} else {
+int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues;
+int16_t *updates= s->cdlms[ich][ilms].lms_updates;
+if (recent)
+recent--;
+else {
+memcpy(prevvalues + order, prevvalues, 2 * order);
+memcpy(updates + order, updates, 2 * order);
+recent = order - 1;
+}
+
+prevvalues[recent] = av_clip(input, -range, range - 1);
+updates[recent] = WMASIGN(input) * s->update_speed[ich];
+
+updates[recent + (order >> 4)] >>= 2;
+updates[recent + (order >> 3)] >>= 1;
+s->cdlms[ich][ilms].recent = recent;
+memset(updates + recent + order, 0,
+   sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order));
+}
 }
 
 static void use_high_update_speed(WmallDecodeCtx *s, int ich)
@@ -729,6 +764,7 @@ static void use_high_update_speed(WmallDecodeCtx *s, int 
ich)
 recent = s->cdlms[ich][ilms].recent;
 if (s->update_speed[ich] == 16)
 continue;
+if (s->bits_per_sample > 16) {
 if (s->bV3RTM) {
 for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
 s->cdlms[ich][ilms].lms_updates[icoef + recent] *= 2;
@@ -736,6 +772,12 @@ static void use_high_update_speed(WmallDecodeCtx *s, int 
ich)
 for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
 s->cdlms[ich][ilms].lms_updates[icoef] *= 2;
 }
+} else {
+int16_t *updates = (int16_t *)s->cdlms[ich][ilms].lms_updates;
+if (s->bV3RTM) updates += recent;
+for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
+updates[icoef] *= 2;
+}
 }
 s->update_speed[ich] = 16;
 }
@@ -747,12 +789,19 @@ static void use_normal_update_speed(WmallDecodeCtx *s, 
int ich)
 recent = s->cdlms[ich][ilms].recent;
 if (s->update_speed[ich] == 8)
 continue;
+if (s->bits_per_sample > 16) {
 if (s->bV3RTM)
 for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
 s->cdlms[ich][ilms].lms_updates[icoef + recent] /= 2;
 else
 for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
 s->cdlms[ich][ilms].lms_updates[icoef] /= 2;
+} else {
+int16_t *updates = (int16_t *)s->cdlms[ich][ilms].lms_updates;
+if (s->bV3RTM) updates += recent;
+for (icoef = 0; icoef < s->cdlms[ich][ilms].order; icoef++)
+updates[icoef] /= 2;
+}
 }
 s->update_speed[ich] = 8;
 }
@@ -767,6 +816,7 @@ static void revert_cdlms(WmallDecodeCtx *s, int ch,
  

[FFmpeg-devel] [PATCH 5/6] x86: lossless audio: SSE4 madd 32bits

2016-04-18 Thread Christophe Gisquet
The only user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version does not make sense.

Timings: 72 -> 49 cycles
---
 libavcodec/x86/lossless_audiodsp.asm| 38 +
 libavcodec/x86/lossless_audiodsp_init.c |  7 ++
 2 files changed, 45 insertions(+)

diff --git a/libavcodec/x86/lossless_audiodsp.asm 
b/libavcodec/x86/lossless_audiodsp.asm
index 5597dad..1e295de 100644
--- a/libavcodec/x86/lossless_audiodsp.asm
+++ b/libavcodec/x86/lossless_audiodsp.asm
@@ -155,3 +155,41 @@ SCALARPRODUCT_LOOP 0
 HADDD   m6, m0
 movd   eax, m6
 RET
+
+%macro SCALARPRODUCT32 0
+; int ff_scalarproduct_and_madd_int32(int32_t *v1, int32_t *v2, int32_t *v3,
+; int order, int mul)
+cglobal scalarproduct_and_madd_int32, 4,4,8, v1, v2, v3, order, mul
+movdm7, mulm
+SPLATD  m7
+pxorm6, m6
+add v1q, orderq
+add v2q, orderq
+add v3q, orderq
+neg orderq
+.loop:
+movum0, [v2q + orderq]
+movum1, [v2q + orderq + mmsize]
+movam4, [v1q + orderq]
+movam5, [v1q + orderq + mmsize]
+movum2, [v3q + orderq]
+movum3, [v3q + orderq + mmsize]
+pmulld  m0, m4
+pmulld  m1, m5
+pmulld  m2, m7
+pmulld  m3, m7
+paddd   m6, m0
+paddd   m6, m1
+paddd   m2, m4
+paddd   m3, m5
+mova[v1q + orderq], m2
+mova[v1q + orderq + mmsize], m3
+add orderq, mmsize*2
+jl .loop
+HADDD   m6, m0
+movd   eax, m6
+RET
+%endmacro
+
+INIT_XMM sse4
+SCALARPRODUCT32
diff --git a/libavcodec/x86/lossless_audiodsp_init.c 
b/libavcodec/x86/lossless_audiodsp_init.c
index 197173c..85306cb 100644
--- a/libavcodec/x86/lossless_audiodsp_init.c
+++ b/libavcodec/x86/lossless_audiodsp_init.c
@@ -31,6 +31,10 @@ int32_t ff_scalarproduct_and_madd_int16_ssse3(int16_t *v1, 
const int16_t *v2,
   const int16_t *v3,
   int order, int mul);
 
+int32_t ff_scalarproduct_and_madd_int32_sse4(int32_t *v1, const int32_t *v2,
+ const int32_t *v3,
+ int order, int mul);
+
 av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 {
 #if HAVE_YASM
@@ -45,5 +49,8 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 if (EXTERNAL_SSSE3(cpu_flags) &&
 !(cpu_flags & (AV_CPU_FLAG_SSE42 | AV_CPU_FLAG_3DNOW))) // cachesplit
 c->scalarproduct_and_madd_int16 = 
ff_scalarproduct_and_madd_int16_ssse3;
+
+if (EXTERNAL_SSE4(cpu_flags))
+c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4;
 #endif
 }
-- 
2.8.1



[FFmpeg-devel] [PATCH 3/6] wmalossless pro: move lms_update

2016-04-18 Thread Christophe Gisquet
Cosmetics before macroing it and another function.
---
 libavcodec/wmalosslessdec.c | 94 ++---
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/libavcodec/wmalosslessdec.c b/libavcodec/wmalosslessdec.c
index 3885dc1..77017ff 100644
--- a/libavcodec/wmalosslessdec.c
+++ b/libavcodec/wmalosslessdec.c
@@ -710,53 +710,6 @@ static void revert_mclms(WmallDecodeCtx *s, int tile_size)
 }
 }
 
-static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input)
-{
-int recent = s->cdlms[ich][ilms].recent;
-int range  = 1 << s->bits_per_sample - 1;
-int order  = s->cdlms[ich][ilms].order;
-
-if (s->bits_per_sample > 16) {
-if (recent)
-recent--;
-else {
-memcpy(s->cdlms[ich][ilms].lms_prevvalues + order,
-   s->cdlms[ich][ilms].lms_prevvalues, 
sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order);
-memcpy(s->cdlms[ich][ilms].lms_updates + order,
-   s->cdlms[ich][ilms].lms_updates, 
sizeof(*s->cdlms[ich][ilms].lms_updates) * order);
-recent = order - 1;
-}
-
-s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range 
- 1);
-s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * 
s->update_speed[ich];
-
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2;
-s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1;
-s->cdlms[ich][ilms].recent = recent;
-memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0,
-   sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order));
-} else {
-int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues;
-int16_t *updates= s->cdlms[ich][ilms].lms_updates;
-if (recent)
-recent--;
-else {
-memcpy(prevvalues + order, prevvalues, 2 * order);
-memcpy(updates + order, updates, 2 * order);
-recent = order - 1;
-}
-
-prevvalues[recent] = av_clip(input, -range, range - 1);
-updates[recent] = WMASIGN(input) * s->update_speed[ich];
-
-updates[recent + (order >> 4)] >>= 2;
-updates[recent + (order >> 3)] >>= 1;
-s->cdlms[ich][ilms].recent = recent;
-memset(updates + recent + order, 0,
-   sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order));
-}
-}
-
 static void use_high_update_speed(WmallDecodeCtx *s, int ich)
 {
 int ilms, recent, icoef;
@@ -806,6 +759,53 @@ static void use_normal_update_speed(WmallDecodeCtx *s, int 
ich)
 s->update_speed[ich] = 8;
 }
 
+static void lms_update(WmallDecodeCtx *s, int ich, int ilms, int input)
+{
+int recent = s->cdlms[ich][ilms].recent;
+int range  = 1 << s->bits_per_sample - 1;
+int order  = s->cdlms[ich][ilms].order;
+
+if (s->bits_per_sample > 16) {
+if (recent)
+recent--;
+else {
+memcpy(s->cdlms[ich][ilms].lms_prevvalues + order,
+   s->cdlms[ich][ilms].lms_prevvalues, 
sizeof(*s->cdlms[ich][ilms].lms_prevvalues) * order);
+memcpy(s->cdlms[ich][ilms].lms_updates + order,
+   s->cdlms[ich][ilms].lms_updates, 
sizeof(*s->cdlms[ich][ilms].lms_updates) * order);
+recent = order - 1;
+}
+
+s->cdlms[ich][ilms].lms_prevvalues[recent] = av_clip(input, -range, range 
- 1);
+s->cdlms[ich][ilms].lms_updates[recent] = WMASIGN(input) * 
s->update_speed[ich];
+
+s->cdlms[ich][ilms].lms_updates[recent + (order >> 4)] >>= 2;
+s->cdlms[ich][ilms].lms_updates[recent + (order >> 3)] >>= 1;
+s->cdlms[ich][ilms].recent = recent;
+memset(s->cdlms[ich][ilms].lms_updates + recent + order, 0,
+   sizeof(s->cdlms[ich][ilms].lms_updates) - 4*(recent+order));
+} else {
+int16_t *prevvalues = s->cdlms[ich][ilms].lms_prevvalues;
+int16_t *updates= s->cdlms[ich][ilms].lms_updates;
+if (recent)
+recent--;
+else {
+memcpy(prevvalues + order, prevvalues, 2 * order);
+memcpy(updates + order, updates, 2 * order);
+recent = order - 1;
+}
+
+prevvalues[recent] = av_clip(input, -range, range - 1);
+updates[recent] = WMASIGN(input) * s->update_speed[ich];
+
+updates[recent + (order >> 4)] >>= 2;
+updates[recent + (order >> 3)] >>= 1;
+s->cdlms[ich][ilms].recent = recent;
+memset(updates + recent + order, 0,
+   sizeof(s->cdlms[ich][ilms].lms_updates) - 2*(recent+order));
+}
+}
+
 static void revert_cdlms(WmallDecodeCtx *s, int ch,
  int coef_begin, int coef_end)
 {
-- 
2.8.1



[FFmpeg-devel] [PATCH 0/6] wmalossless: fix 16bits speed regression

2016-04-18 Thread Christophe Gisquet
I think only the 2 first patches are needed, but I prefer the code
from the 3rd+4th patches. Overall, it's still not the nicest code, and
valgrind-proofing the patchset is needed (not possible atm for me).

The SSE4 implementation is not worthwhile in my opinion.

Christophe Gisquet (6):
  fate: wma: add lossless 24bits test
  wmalossless: allow calling madd_int16
  wmalossless pro: move lms_update
  wmalossless: template code to remove inloop if
  x86: lossless audio: SSE4 madd 32bits
  lossless audio dsp: unroll

 libavcodec/lossless_audiodsp.c  |  12 ++-
 libavcodec/wmalosslessdec.c | 137 
 libavcodec/x86/lossless_audiodsp.asm|  38 +
 libavcodec/x86/lossless_audiodsp_init.c |   7 ++
 tests/fate/lossless-audio.mak   |   4 +-
 tests/ref/fate/lossless-wma24   |   1 +
 6 files changed, 143 insertions(+), 56 deletions(-)
 create mode 100644 tests/ref/fate/lossless-wma24

-- 
2.8.1



Re: [FFmpeg-devel] [PATCH] avcodec/wmalosslessdec: real 24bit support

2016-04-13 Thread Christophe Gisquet
Hi,

2016-04-12 22:53 GMT+02:00 Paul B Mahol :

> -LLAudDSPContext dsp;   ///< accelerated

And later:

> +static int scalarproduct_and_madd_int(int *v1, const int *v2,
> +  const int *v3,
> +  int order, int mul)
> +{
> +int res = 0;
> +
> +av_assert0(order >= 0);
> +while (order--) {
> +res   += *v1 * *v2++;
> +*v1++ += mul * *v3++;
> +}
> +return res;
> +}

As Hendrik said, please move it to LLAudDSPContext.

On a side note, is this from reverse engineering or a guess (I'm asking
because the guess was not difficult)?

Because having only a 32-bit accumulator may lead to overflows.
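
To illustrate the concern with made-up numbers (not values observed in a
real stream): eight products of full-range 24-bit values already exceed
what a 32-bit accumulator can hold. The sketch keeps an exact 64-bit sum
next to a sum reduced modulo 2^32; unsigned arithmetic is used for the
latter so that the wrap-around is well defined in C. The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exact dot product with a 64-bit accumulator. */
static int64_t dot_acc64(const int32_t *a, const int32_t *b, int n)
{
    int64_t res = 0;
    int i;
    for (i = 0; i < n; i++)
        res += (int64_t)a[i] * b[i];
    return res;
}

/* Same sum kept modulo 2^32, i.e. what survives in 32 bits. */
static uint32_t dot_acc32_wrapped(const int32_t *a, const int32_t *b, int n)
{
    uint32_t res = 0;
    int i;
    for (i = 0; i < n; i++)
        res += (uint32_t)a[i] * (uint32_t)b[i];
    return res;
}

/* Returns 1 when the exact sum no longer fits in an int32_t. */
static int dot_overflows_int32(const int32_t *a, const int32_t *b, int n)
{
    int64_t exact;
    assert(n >= 0);
    exact = dot_acc64(a, b, n);
    return exact > INT32_MAX || exact < INT32_MIN;
}
```

Whether the real decoder can hit this depends on how the coefficients are
scaled, which the sketch does not model.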

> +int mclms_prevvalues[WMALL_MAX_CHANNELS * 2 * 32];
> +int mclms_updates[WMALL_MAX_CHANNELS * 2 * 32];
>  int mclms_recent;
>
>  int movave_scaling;
> @@ -146,9 +144,9 @@ typedef struct WmallDecodeCtx {
>  int scaling;
>  int coefsend;
>  int bitsend;
> -DECLARE_ALIGNED(16, int16_t, coefs)[MAX_ORDER +
> WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
> -DECLARE_ALIGNED(16, int16_t, lms_prevvalues)[MAX_ORDER * 2 +
> WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
> -DECLARE_ALIGNED(16, int16_t, lms_updates)[MAX_ORDER * 2 +
> WMALL_COEFF_PAD_SIZE/sizeof(int16_t)];
> +int coefs[MAX_ORDER + WMALL_COEFF_PAD_SIZE];
> +int lms_prevvalues[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE];
> +int lms_updates[MAX_ORDER * 2 + WMALL_COEFF_PAD_SIZE];

I prefer int32_t just because it's something to dspize. Plus at some
point someone would have to redo the alignment.


> -   sizeof(int16_t) * order * num_channels);
> +   sizeof(int) * order * num_channels);

The format has the bitdepth stored and put into s->bits_per_sample.
The decoder actually uses it to select how to store the samples later
on.
In any case, this should be dynamic. Either you use int size =
s->bits_per_sample>16 ? 4 : 2 (because sizeof(int16_t) isn't going to
change much...)

Or FFALIGN(s->bits_per_sample>>3, 2)? Whatever floats your boat.
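
The two suggestions side by side, as a sketch. ALIGN_UP is a local stand-in
written for this example in the style of FFALIGN, and the function names
are hypothetical.

```c
#include <assert.h>

/* Local stand-in for an FFALIGN-style round-up; 'a' must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Option 1: explicit two-way choice. */
static int sample_size_ternary(int bits_per_sample)
{
    return bits_per_sample > 16 ? 4 : 2;
}

/* Option 2: whole bytes per sample, rounded up to a multiple of 2. */
static int sample_size_aligned(int bits_per_sample)
{
    assert(bits_per_sample > 0);
    return ALIGN_UP(bits_per_sample >> 3, 2);
}
```

Both give 2 bytes for 16 bits and 4 bytes for 24 or 32 bits. They would
differ for a depth that is not a multiple of 8, e.g. 20 bits (2 bytes with
the aligned form, 4 with the ternary), should such a depth ever occur.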

> -pred +=
> s->dsp.scalarproduct_and_madd_int16(s->cdlms[ch][ilms].coefs,
> -
> s->cdlms[ch][ilms].lms_prevvalues
> -+
> s->cdlms[ch][ilms].recent,
> -
> s->cdlms[ch][ilms].lms_updates
> -+
> s->cdlms[ch][ilms].recent,
> -
> FFALIGN(s->cdlms[ch][ilms].order,
> -
> WMALL_COEFF_PAD_SIZE),
> -WMASIGN(residue));
> +pred += scalarproduct_and_madd_int(s->cdlms[ch][ilms].coefs,
> +
> s->cdlms[ch][ilms].lms_prevvalues
> +   + 
> s->cdlms[ch][ilms].recent,
> +   s->cdlms[ch][ilms].lms_updates
> +   + 
> s->cdlms[ch][ilms].recent,
> +   s->cdlms[ch][ilms].order,
> +   WMASIGN(residue));

And then here:
- switch based on bitdepth (the needed 'if' wouldn't be the end of the
world but it's not actually needed);
- or use a function pointer in the context

For the latter point, unless going through a proxy, it may, obviously, look like:
int (*scalarproduct_and_madd_int)(void *v1, const void *v2,
  const void *v3, int order, int mul)
but there might be a compilation warning on the call or when setting the variable.
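
A rough sketch of the function-pointer variant, with every name invented
for the example (DemoCtx, demo_init, madd16, madd32, madd_fn): the pointer
is chosen once from the bitdepth, and each implementation casts the void
pointers back to its own sample type. Giving both implementations the void
signature is one way to sidestep the warning mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical common signature using void pointers. */
typedef int32_t (*madd_fn)(void *v1, const void *v2, const void *v3,
                           int order, int mul);

/* 16-bit samples. */
static int32_t madd16(void *p1, const void *p2, const void *p3,
                      int order, int mul)
{
    int16_t *v1 = p1;
    const int16_t *v2 = p2, *v3 = p3;
    int32_t res = 0;
    while (order--) {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    }
    return res;
}

/* 32-bit samples. */
static int32_t madd32(void *p1, const void *p2, const void *p3,
                      int order, int mul)
{
    int32_t *v1 = p1;
    const int32_t *v2 = p2, *v3 = p3;
    int32_t res = 0;
    while (order--) {
        res   += *v1 * *v2++;
        *v1++ += mul * *v3++;
    }
    return res;
}

/* Minimal stand-in for a decoder context. */
typedef struct DemoCtx {
    int bits_per_sample;
    madd_fn madd;
} DemoCtx;

/* Pick the implementation once, when the bitdepth is known. */
static void demo_init(DemoCtx *c, int bits_per_sample)
{
    assert(bits_per_sample > 0);
    c->bits_per_sample = bits_per_sample;
    c->madd = bits_per_sample > 16 ? madd32 : madd16;
}
```

The caller then writes c->madd(coefs, prev, updates, order, mul) without
testing the bitdepth at each call.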

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section

2016-03-20 Thread Christophe Gisquet
Hi,

2016-03-19 19:08 GMT+01:00 Ismail Donmez <ism...@i10z.com>:

>> 2016-03-11 8:57 GMT+01:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
>>>> It should either be reverted or made dependent on
>>>> --enable/disable-debug (I would favor the first, honestly, since its a
>>>> rather ugly hack in itself).
[...]
>> I don't have a strong opinion on actually reverting it, but would lean for it.
>
> Please don't disable it for release builds, it improves the security
> of the resulting executable.

I understand the sentiment, and there's probably little lost in
keeping it, but... is it not a hack? I.e.:
- When would you notice that the added security is no longer there, or
that it breaks in even worse ways?
- Who is, and would remain, available and able to prevent it from
breaking? Because it already has broken, and almost nobody dealt with it.

The original author already did well in reporting the issue to
binutils, so I'm certainly not complaining about his efforts.

-- 
Christophe


Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section

2016-03-19 Thread Christophe Gisquet
2016-03-19 18:15 GMT+01:00 Hendrik Leppkes <h.lepp...@gmail.com>:
> The same would need to be applied for the 32-bit case as well, fwiw.

I didn't have a build environment to verify that, but now I do, and I
confirm it is needed. Patch updated.

-- 
Christophe
From 87e4f2a42bdb5f733d104ffba7cf70f786b72a03 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Sat, 19 Mar 2016 14:45:23 +0100
Subject: [PATCH] mingw64: configure: disable pie with debug enabled

This breaks use of gdb.
---
 configure | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index e5de306..9ccd727 100755
--- a/configure
+++ b/configure
@@ -4604,10 +4604,14 @@ case $target_os in
 # for dynamicbase (ASLR).  Using -pie does retain the reloc section
 # however ld then forgets what the entry point should be (oops) so we
 # have to manually (re)set it.
-if enabled x86_32; then
+# However, adding -pie breaks debugging through gdb at least under
+# mingw, so let's not do this when debugging has been enabled.
+if enabled x86_32 && disabled debug; then
 add_ldexeflags -Wl,--pic-executable,-e,_mainCRTStartup
 elif enabled x86_64; then
-add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+if disabled debug; then
+add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+fi
 check_ldflags -Wl,--high-entropy-va # binutils 2.25
 # Set image base >4GB for extra entropy with HEASLR
 add_ldexeflags -Wl,--image-base,0x14000
-- 
2.7.2



Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section

2016-03-19 Thread Christophe Gisquet
Hi,

2016-03-11 8:57 GMT+01:00 Christophe Gisquet <christophe.gisq...@gmail.com>:
>> It should either be reverted or made dependent on
>> --enable/disable-debug (I would favor the first, honestly, since its a
>> rather ugly hack in itself).
>
> At the very least, that dependence is needed, yes.

And here's a patch with the smallest reasonable change.

I don't have a strong opinion on actually reverting it, but would lean
towards it.

-- 
Christophe
From 429a47f83d2d262a3392af34b13dbf14c735c8b9 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Sat, 19 Mar 2016 14:45:23 +0100
Subject: [PATCH] mingw64: configure: disable pie with debug enabled

This breaks use of gdb.
---
 configure | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index e5de306..e5befa0 100755
--- a/configure
+++ b/configure
@@ -4604,10 +4604,14 @@ case $target_os in
 # for dynamicbase (ASLR).  Using -pie does retain the reloc section
 # however ld then forgets what the entry point should be (oops) so we
 # have to manually (re)set it.
+# However, adding -pie breaks debugging through gdb at least under
+# mingw, so let's not do this when debugging has been enabled.
 if enabled x86_32; then
 add_ldexeflags -Wl,--pic-executable,-e,_mainCRTStartup
 elif enabled x86_64; then
-add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+if disabled debug; then
+add_ldexeflags -Wl,--pic-executable,-e,mainCRTStartup
+fi
 check_ldflags -Wl,--high-entropy-va # binutils 2.25
 # Set image base >4GB for extra entropy with HEASLR
 add_ldexeflags -Wl,--image-base,0x14000
-- 
2.7.2



Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section

2016-03-11 Thread Christophe Gisquet
Hi,

2016-03-10 19:57 GMT+01:00 Hendrik Leppkes :
> This patch (the relocations part) broke debugging mingw-w64 ffmpeg
> builds with gdb, you can't set breakpoints anymore when its applied.

That issue has prevented me from doing anything interesting for ffmpeg
since then, as I thought it was a toolchain issue. I've lost considerable
ffmpeg-time (and actual code) over it.

> It should either be reverted or made dependent on
> --enable/disable-debug (I would favor the first, honestly, since its a
> rather ugly hack in itself).

At the very least, that dependence is needed, yes.

> Did the binutils/mingw guys ever comment anything useful on this issue?

And does it still exist? And for which toolchain/binutils/mingw
runtime/gdb versions, actually? There are 3+ variants one can use
(yours, msys2, tdm, msys1, ...)

Thanks for looking into it,
-- 
Christophe

