Re: [FFmpeg-devel] [PATCH 15/21] avformat/matroskadec: Redo level handling

2019-04-07 Thread Andreas Rheinhardt via ffmpeg-devel
Steve Lhomme:
> On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:
>> This commit changes how levels are handled: If the level used for
>> ebml_parse ends directly after an element that has been consumed, then
>> ebml_parse ends the level itself (and any finite-sized levels that end
>> there as well) and informs the caller via the return value; if the
>> current level is unknown-sized, then the level is ended as soon as
>> an element that is not valid on the current level is encountered.
>>
>> This is designed for situations where one wants to parse master
>> elements
>> incrementally, i.e. not in one go via ebml_parse_nest.
>>
>> Signed-off-by: Andreas Rheinhardt 
>> ---
>>   libavformat/matroskadec.c | 104
>> +++---
>>   1 file changed, 85 insertions(+), 19 deletions(-)
>>
>> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
>> index 42f1c21042..32cf57685f 100644
>> --- a/libavformat/matroskadec.c
>> +++ b/libavformat/matroskadec.c
>> @@ -69,6 +69,8 @@
>>   #include "qtpalette.h"
>>     #define EBML_UNKNOWN_LENGTH  UINT64_MAX /* EBML unknown length,
>> in uint64_t */
>> +#define LEVEL_ENDED   2 /* return value of
>> ebml_parse when the
>> + * syntax level used for
>> parsing ended. */
>>     typedef enum {
>>   EBML_NONE,
>> @@ -1041,16 +1043,32 @@ static int
>> ebml_parse_id(MatroskaDemuxContext *matroska, EbmlSyntax *syntax,
>>    uint32_t id, void *data)
>>   {
>>   int i;
>> +
>>   for (i = 0; syntax[i].id; i++)
>>   if (id == syntax[i].id)
>>   break;
>> -    if (!syntax[i].id && id == MATROSKA_ID_CLUSTER &&
>> -    matroska->num_levels > 0   &&
>> -    matroska->levels[matroska->num_levels - 1].length ==
>> EBML_UNKNOWN_LENGTH)
>> -    return 0;  // we reached the end of an unknown size cluster
>> +
>>   if (!syntax[i].id && id != EBML_ID_VOID && id != EBML_ID_CRC32) {
>> -    av_log(matroska->ctx, AV_LOG_DEBUG, "Unknown entry
>> 0x%"PRIX32"\n", id);
>> +    if (!matroska->num_levels ||
>> matroska->levels[matroska->num_levels - 1].length
>> +    !=
>> EBML_UNKNOWN_LENGTH) {
>> +    av_log(matroska->ctx, AV_LOG_DEBUG, "Unknown entry
>> 0x%"PRIX32"\n", id);
> 
> Would this mean a Segment with unknown length would produce this log ?
> Or Segments are not part of the levels at all ? If so, same question
> with level 1 elements which have an unknown length.
> 
The part of the code you are refering to is only executed if the read
id is not part of the syntax list that is currently being used for
parsing (and if it is not one of the global elements (void or crc32)).
After the EBML head has been parsed, the segment is parsed using the
matroska_segments syntax list which of course contains an entry for
the segment. So this segment won't trigger this log no matter whether
it is unknown length or not (the segment's length hasn't been parsed
at this point btw).
But if you have an EBML Stream (a concatenation of multiple EBML
(here: Matroska) documents), then the second EBML header and the
second segment will be treated as unknown elements and it will be
acted accordingly, i.e.:
If the current level is unknown-sized and > 1, then the level will be
ended.
If the level is 1 and unknown-sized, then this will be treated as
error (we'll eventually end up in matroska_resync which will look for
level 1 elements and probably end up finding level 1 elements of the
second EBML document, but they will be treated as if the header of the
first EBML document were active. This is likely not good, but it is a
known restriction: Multiple EBML-documents are simply not supported.)
If the current level is zero or known-sized, then this level log
message will be emitted (depending on the loglevel) and it is
attempted to skip the new element. If the current-level is known-sized
and the new element unknown-sized, then according to the current
patchset, an error will be emitted; according to your proposal, the
new element will be truncated to end at the end of the current level.

The answer to the question regarding clusters is essentially the same.
And this can only happen if some error occured, namely if a cluster is
encountered at a place where no cluster is expected (e.g. if because
of a transmission error a part of a known-sized cluster is missing, so
that the next cluster is (judging by the length field of the preceding
cluster) part of the preceding cluster. This will then be treated as
"unknown entry".
(The non-incremental parsing of clusters has a similar drawback; the
current incremental parsing of clusters (which uses a syntax list
which mixes elements from different levels (which is the reason for
the error message that is the reason why I turned my attention to the
Matroska demuxer in the first place)) doesn't have this error.)

Notice that 

Re: [FFmpeg-devel] [PATCH 15/21] avformat/matroskadec: Redo level handling

2019-04-07 Thread Steve Lhomme

On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:

This commit changes how levels are handled: If the level used for
ebml_parse ends directly after an element that has been consumed, then
ebml_parse ends the level itself (and any finite-sized levels that end
there as well) and informs the caller via the return value; if the
current level is unknown-sized, then the level is ended as soon as
an element that is not valid on the current level is encountered.

This is designed for situations where one wants to parse master elements
incrementally, i.e. not in one go via ebml_parse_nest.

Signed-off-by: Andreas Rheinhardt 
---
  libavformat/matroskadec.c | 104 +++---
  1 file changed, 85 insertions(+), 19 deletions(-)

diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
index 42f1c21042..32cf57685f 100644
--- a/libavformat/matroskadec.c
+++ b/libavformat/matroskadec.c
@@ -69,6 +69,8 @@
  #include "qtpalette.h"
  
  #define EBML_UNKNOWN_LENGTH  UINT64_MAX /* EBML unknown length, in uint64_t */

+#define LEVEL_ENDED   2 /* return value of ebml_parse when the
+ * syntax level used for parsing 
ended. */
  
  typedef enum {

  EBML_NONE,
@@ -1041,16 +1043,32 @@ static int ebml_parse_id(MatroskaDemuxContext 
*matroska, EbmlSyntax *syntax,
   uint32_t id, void *data)
  {
  int i;
+
  for (i = 0; syntax[i].id; i++)
  if (id == syntax[i].id)
  break;
-if (!syntax[i].id && id == MATROSKA_ID_CLUSTER &&
-matroska->num_levels > 0   &&
-matroska->levels[matroska->num_levels - 1].length == 
EBML_UNKNOWN_LENGTH)
-return 0;  // we reached the end of an unknown size cluster
+
  if (!syntax[i].id && id != EBML_ID_VOID && id != EBML_ID_CRC32) {
-av_log(matroska->ctx, AV_LOG_DEBUG, "Unknown entry 0x%"PRIX32"\n", id);
+if (!matroska->num_levels || matroska->levels[matroska->num_levels - 
1].length
+!= 
EBML_UNKNOWN_LENGTH) {
+av_log(matroska->ctx, AV_LOG_DEBUG, "Unknown entry 0x%"PRIX32"\n", 
id);


Would this mean a Segment with unknown length would produce this log ? 
Or Segments are not part of the levels at all ? If so, same question 
with level 1 elements which have an unknown length.



+} else if (matroska->num_levels > 1) {
+// Unknown-sized master elements (except root elements) end
+// when an id not known to be allowed in them is encountered.
+matroska->num_levels--;
+return LEVEL_ENDED;
+} else if (matroska->num_levels == 1) {
+AVIOContext *pb = matroska->ctx->pb;
+int64_t pos = avio_tell(pb);
+// An unkown level 1 element inside an unknown-sized segment
+// is considered an error.
+av_log(matroska->ctx, AV_LOG_ERROR, "Found unknown element id 
0x%"PRIX32
+   " at pos. %"PRIu64" (0x%"PRIx64") in an unknown-sized segment. 
"
+   "Will be treated as error.\n", id, pos, pos);
+return AVERROR_INVALIDDATA;
+}
  }
+
  return ebml_parse_elem(matroska, [i], data);
  }
  
@@ -1061,9 +1079,24 @@ static int ebml_parse(MatroskaDemuxContext *matroska, EbmlSyntax *syntax,

  uint64_t id;
  int res = ebml_read_num(matroska, matroska->ctx->pb, 4, , 0);
  if (res < 0) {
-// in live mode, finish parsing if EOF is reached.
-return (matroska->is_live && matroska->ctx->pb->eof_reached &&
-res == AVERROR_EOF) ? 1 : res;
+if (matroska->ctx->pb->eof_reached && res == AVERROR_EOF) {
+if (matroska->is_live)
+// in live mode, finish parsing if EOF is reached.
+return 1;
+if (matroska->num_levels) {
+MatroskaLevel *level = 
>levels[matroska->num_levels - 1];
+
+if (level->length == EBML_UNKNOWN_LENGTH) {
+// Unknown-sized elements automatically end at EOF.
+matroska->num_levels = 0;
+return LEVEL_ENDED;
+} else if (avio_tell(matroska->ctx->pb) < level->start + 
level->length) {
+av_log(matroska->ctx, AV_LOG_ERROR, "File ended before 
"
+   "its declared end\n");
+}
+}
+}
+return res;
  }
  matroska->current_id = id | 1 << 7 * res;
  }
@@ -1098,10 +1131,15 @@ static int ebml_parse_nest(MatroskaDemuxContext 
*matroska, EbmlSyntax *syntax,
  break;
  }
  
-while (!res && !ebml_level_end(matroska))

+if (!matroska->levels[matroska->num_levels - 1].length) {
+matroska->num_levels--;
+return 0;
+}
+
+

[FFmpeg-devel] [PATCH 15/21] avformat/matroskadec: Redo level handling

2019-03-27 Thread Andreas Rheinhardt via ffmpeg-devel
This commit changes how levels are handled: If the level used for
ebml_parse ends directly after an element that has been consumed, then
ebml_parse ends the level itself (and any finite-sized levels that end
there as well) and informs the caller via the return value; if the
current level is unknown-sized, then the level is ended as soon as
an element that is not valid on the current level is encountered.

This is designed for situations where one wants to parse master elements
incrementally, i.e. not in one go via ebml_parse_nest.

Signed-off-by: Andreas Rheinhardt 
---
 libavformat/matroskadec.c | 104 +++---
 1 file changed, 85 insertions(+), 19 deletions(-)

diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
index 42f1c21042..32cf57685f 100644
--- a/libavformat/matroskadec.c
+++ b/libavformat/matroskadec.c
@@ -69,6 +69,8 @@
 #include "qtpalette.h"
 
 #define EBML_UNKNOWN_LENGTH  UINT64_MAX /* EBML unknown length, in uint64_t */
+#define LEVEL_ENDED   2 /* return value of ebml_parse when the
+ * syntax level used for parsing 
ended. */
 
 typedef enum {
 EBML_NONE,
@@ -1041,16 +1043,32 @@ static int ebml_parse_id(MatroskaDemuxContext 
*matroska, EbmlSyntax *syntax,
  uint32_t id, void *data)
 {
 int i;
+
 for (i = 0; syntax[i].id; i++)
 if (id == syntax[i].id)
 break;
-if (!syntax[i].id && id == MATROSKA_ID_CLUSTER &&
-matroska->num_levels > 0   &&
-matroska->levels[matroska->num_levels - 1].length == 
EBML_UNKNOWN_LENGTH)
-return 0;  // we reached the end of an unknown size cluster
+
 if (!syntax[i].id && id != EBML_ID_VOID && id != EBML_ID_CRC32) {
-av_log(matroska->ctx, AV_LOG_DEBUG, "Unknown entry 0x%"PRIX32"\n", id);
+if (!matroska->num_levels || matroska->levels[matroska->num_levels - 
1].length
+!= 
EBML_UNKNOWN_LENGTH) {
+av_log(matroska->ctx, AV_LOG_DEBUG, "Unknown entry 0x%"PRIX32"\n", 
id);
+} else if (matroska->num_levels > 1) {
+// Unknown-sized master elements (except root elements) end
+// when an id not known to be allowed in them is encountered.
+matroska->num_levels--;
+return LEVEL_ENDED;
+} else if (matroska->num_levels == 1) {
+AVIOContext *pb = matroska->ctx->pb;
+int64_t pos = avio_tell(pb);
+// An unkown level 1 element inside an unknown-sized segment
+// is considered an error.
+av_log(matroska->ctx, AV_LOG_ERROR, "Found unknown element id 
0x%"PRIX32
+   " at pos. %"PRIu64" (0x%"PRIx64") in an unknown-sized 
segment. "
+   "Will be treated as error.\n", id, pos, pos);
+return AVERROR_INVALIDDATA;
+}
 }
+
 return ebml_parse_elem(matroska, [i], data);
 }
 
@@ -1061,9 +1079,24 @@ static int ebml_parse(MatroskaDemuxContext *matroska, 
EbmlSyntax *syntax,
 uint64_t id;
 int res = ebml_read_num(matroska, matroska->ctx->pb, 4, , 0);
 if (res < 0) {
-// in live mode, finish parsing if EOF is reached.
-return (matroska->is_live && matroska->ctx->pb->eof_reached &&
-res == AVERROR_EOF) ? 1 : res;
+if (matroska->ctx->pb->eof_reached && res == AVERROR_EOF) {
+if (matroska->is_live)
+// in live mode, finish parsing if EOF is reached.
+return 1;
+if (matroska->num_levels) {
+MatroskaLevel *level = 
>levels[matroska->num_levels - 1];
+
+if (level->length == EBML_UNKNOWN_LENGTH) {
+// Unknown-sized elements automatically end at EOF.
+matroska->num_levels = 0;
+return LEVEL_ENDED;
+} else if (avio_tell(matroska->ctx->pb) < level->start + 
level->length) {
+av_log(matroska->ctx, AV_LOG_ERROR, "File ended before 
"
+   "its declared end\n");
+}
+}
+}
+return res;
 }
 matroska->current_id = id | 1 << 7 * res;
 }
@@ -1098,10 +1131,15 @@ static int ebml_parse_nest(MatroskaDemuxContext 
*matroska, EbmlSyntax *syntax,
 break;
 }
 
-while (!res && !ebml_level_end(matroska))
+if (!matroska->levels[matroska->num_levels - 1].length) {
+matroska->num_levels--;
+return 0;
+}
+
+while (!res)
 res = ebml_parse(matroska, syntax, data);
 
-return res;
+return res == LEVEL_ENDED ? 0 : res;
 }
 
 static int is_ebml_id_valid(uint32_t id)
@@ -1169,7 +1207,7 @@ static int ebml_parse_elem(MatroskaDemuxContext *matroska,
 AVIOContext *pb = matroska->ctx->pb;