Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
On Thu, Oct 08, 2015 at 10:50:49PM +0100, Ricardo wrote:
> That would probably be considered a broken WebVTT file, since "&" needs to
> be encoded as "&amp;".

What about "Cl&eacute;ment" or any unsupported escape?

[...]

--
Clément B.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
I think those were needed when UTF-8 wasn't expected, but WebVTT makes it
mandatory, according to https://w3c.github.io/webvtt/#file-structure

Are you saying that and should be removed (like it was)?

On 9 October 2015 at 13:50, Clément Bœsch wrote:
> On Thu, Oct 08, 2015 at 10:50:49PM +0100, Ricardo wrote:
> > That would probably be considered a broken WebVTT file, since "&" needs to
> > be encoded as "&amp;".
>
> What about "Cl&eacute;ment" or any unsupported escape?
>
> [...]
>
> --
> Clément B.
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
That would probably be considered a broken WebVTT file, since "&" needs to
be encoded as "&amp;".

On 8 October 2015 at 20:46, Clément Bœsch wrote:
> On Thu, Oct 08, 2015 at 05:20:52PM +0100, Ricardo Constantino wrote:
> > Also fixes adjacent tags not being parsed correctly.
> >
> > Signed-off-by: Ricardo Constantino
> > ---
> >  libavcodec/webvttdec.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
> > index 1284a17..dec4105 100644
> > --- a/libavcodec/webvttdec.c
> > +++ b/libavcodec/webvttdec.c
> > @@ -37,11 +37,14 @@ static const struct {
> >      {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
> >      {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
> >      {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
> > +    {"&gt;", ">"}, {"&lt;", "<"},
> > +    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
> > +    {"&amp;", "&"}, {"&nbsp;", " "},
> >  };
> >
> >  static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
> >  {
> > -    int i, skip = 0;
> > +    int i, skip, again = 0;
> >
> >      while (*p) {
> >
> > @@ -51,13 +54,19 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
> >              if (!strncmp(p, from, len)) {
> >                  av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
> >                  p += len;
> > +                again = 1;
> >                  break;
> >              }
> >          }
> >          if (!*p)
> >              break;
> > +        if (again) {
> > +            again = 0;
> > +            skip = 0;
> > +            continue;
> > +        }
> >
> > -        if (*p == '<')
> > +        if (*p == '<' || *p == '&')
> >              skip = 1;
> >          else if (*p == '>')
>
> I think you need to make the ';' stop skipping. Otherwise my guess is that
> something like "Hello Ben" is going to eat Jerry.
>
> [...]
>
> --
> Clément B.
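To illustrate the point about "&" having to be written as "&amp;", here is a minimal, standalone sketch of table-driven unescaping in the style of the patch. It assumes the six escapes handled are &gt;, &lt;, &lrm;, &rlm;, &amp; and &nbsp;. The names entity_replace and unescape_entities are hypothetical and do not exist in libavcodec, and the sketch writes into a plain char buffer instead of an AVBPrint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escape-to-text table, assumed to mirror the entries added by the patch. */
static const struct {
    const char *from;
    const char *to;
} entity_replace[] = {
    {"&gt;", ">"}, {"&lt;", "<"},
    {"&lrm;", ""}, {"&rlm;", ""}, /* bidi marks are simply dropped here */
    {"&amp;", "&"}, {"&nbsp;", " "},
};

/* Hypothetical helper: copy src into dst, replacing the escapes listed above.
 * Anything that is not in the table, including unknown escapes, is copied
 * unchanged. Output is truncated to fit dst_size and always NUL-terminated. */
static void unescape_entities(char *dst, size_t dst_size, const char *src)
{
    const size_t count = sizeof(entity_replace) / sizeof(entity_replace[0]);
    size_t n = 0;

    assert(dst && src && dst_size > 0);
    while (*src && n + 1 < dst_size) {
        size_t i;

        for (i = 0; i < count; i++) {
            size_t len = strlen(entity_replace[i].from);

            if (!strncmp(src, entity_replace[i].from, len)) {
                const char *to = entity_replace[i].to;

                while (*to && n + 1 < dst_size)
                    dst[n++] = *to++;
                src += len;
                break;
            }
        }
        if (i == count)      /* no table entry matched: copy one character */
            dst[n++] = *src++;
    }
    dst[n] = '\0';
}
```

Because the table is consulted again at every position, adjacent escapes such as "&lt;&gt;" are handled without an extra flag. Note that this sketch keeps unknown escapes verbatim, which is only one possible policy and not necessarily what the decoder should do.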
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
Even if not valid WebVTT, it should now work with something like "Ben".

Sample: http://trac.ffmpeg.org/attachment/ticket/4915/htmlescapes.vtt

The only issue left is CR-only line endings not working, but since Apple
stopped using them, it's probably not that important.
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
On Thu, Oct 08, 2015 at 05:20:52PM +0100, Ricardo Constantino wrote:
> Also fixes adjacent tags not being parsed correctly.
>
> Signed-off-by: Ricardo Constantino
> ---
>  libavcodec/webvttdec.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
> index 1284a17..dec4105 100644
> --- a/libavcodec/webvttdec.c
> +++ b/libavcodec/webvttdec.c
> @@ -37,11 +37,14 @@ static const struct {
>      {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
>      {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
>      {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
> +    {"&gt;", ">"}, {"&lt;", "<"},
> +    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
> +    {"&amp;", "&"}, {"&nbsp;", " "},
>  };
>
>  static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
>  {
> -    int i, skip = 0;
> +    int i, skip, again = 0;
>
>      while (*p) {
>
> @@ -51,13 +54,19 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
>              if (!strncmp(p, from, len)) {
>                  av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
>                  p += len;
> +                again = 1;
>                  break;
>              }
>          }
>          if (!*p)
>              break;
> +        if (again) {
> +            again = 0;
> +            skip = 0;
> +            continue;
> +        }
>
> -        if (*p == '<')
> +        if (*p == '<' || *p == '&')
>              skip = 1;
>          else if (*p == '>')

I think you need to make the ';' stop skipping. Otherwise my guess is that
something like "Hello Ben" is going to eat Jerry.

[...]

--
Clément B.
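The review comment about ';' can be illustrated with a small standalone sketch. It is not the decoder's code: strip_markup and its semicolon_ends_escape parameter are hypothetical names, output goes to a plain char buffer, and the table replacement step is left out so that only the skip logic remains. With semicolon_ends_escape set to 0, a skip started by '&' only ends at '>', as in the hunk above. With it set to 1, the skip ends at ';'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the skip logic: copy src into dst while
 * dropping "<...>" tags and "&..." escapes.
 * semicolon_ends_escape == 0: an escape is only closed by '>' (patch behavior).
 * semicolon_ends_escape != 0: an escape is closed by ';' (reviewer's suggestion).
 * Output is truncated to fit dst_size and always NUL-terminated. */
static void strip_markup(char *dst, size_t dst_size, const char *src,
                         int semicolon_ends_escape)
{
    size_t n = 0;
    char end = 0; /* character that closes the current skip, 0 when not skipping */

    assert(dst && src && dst_size > 0);
    for (; *src && n + 1 < dst_size; src++) {
        if (end) {
            if (*src == end)
                end = 0;            /* closing character found, resume copying */
        } else if (*src == '<') {
            end = '>';
        } else if (*src == '&') {
            end = semicolon_ends_escape ? ';' : '>';
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
}
```

For the input "Hello &foo; Ben", the first mode yields "Hello " because no '>' ever closes the skip, while the second mode yields "Hello  Ben". A bare "&" with no following ';' still swallows the rest of the cue in both modes, which is the broken-file case discussed in the replies.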