Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
On Thu, Oct 08, 2015 at 10:50:49PM +0100, Ricardo wrote:
> That would probably be considered a broken WebVTT file, since "&" needs to
> be encoded as "&amp;".

What about "Clment" or any unsupported escape?

[...]

-- 
Clément B.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
I think those were needed when UTF-8 wasn't expected, but WebVTT makes it
mandatory, according to https://w3c.github.io/webvtt/#file-structure

Are you saying that and should be removed (like it was)?

On 9 October 2015 at 13:50, Clément Bœsch wrote:
> On Thu, Oct 08, 2015 at 10:50:49PM +0100, Ricardo wrote:
> > That would probably be considered a broken WebVTT file, since "&" needs to
> > be encoded as "&amp;".
>
> What about "Clment" or any unsupported escape?
>
> [...]
>
> --
> Clément B.
[FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
Also fixes adjacent tags not being parsed correctly.

Signed-off-by: Ricardo Constantino
---
 libavcodec/webvttdec.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
index 1284a17..dec4105 100644
--- a/libavcodec/webvttdec.c
+++ b/libavcodec/webvttdec.c
@@ -37,11 +37,14 @@ static const struct {
     {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
     {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
     {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
+    {"&gt;", ">"}, {"&lt;", "<"},
+    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
+    {"&amp;", "&"}, {"&nbsp;", " "},
 };
 
 static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
 {
-    int i, skip = 0;
+    int i, skip, again = 0;
 
     while (*p) {
 
@@ -51,13 +54,19 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
             if (!strncmp(p, from, len)) {
                 av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
                 p += len;
+                again = 1;
                 break;
             }
         }
         if (!*p)
             break;
+        if (again) {
+            again = 0;
+            skip = 0;
+            continue;
+        }
 
-        if (*p == '<')
+        if (*p == '<' || *p == '&')
             skip = 1;
         else if (*p == '>')
             skip = 0;
-- 
2.6.0
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
That would probably be considered a broken WebVTT file, since "&" needs to
be encoded as "&amp;".

On 8 October 2015 at 20:46, Clément Bœsch wrote:
> On Thu, Oct 08, 2015 at 05:20:52PM +0100, Ricardo Constantino wrote:
> > Also fixes adjacent tags not being parsed correctly.
> >
> > Signed-off-by: Ricardo Constantino
> > ---
> >  libavcodec/webvttdec.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
> > index 1284a17..dec4105 100644
> > --- a/libavcodec/webvttdec.c
> > +++ b/libavcodec/webvttdec.c
> > @@ -37,11 +37,14 @@ static const struct {
> >      {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
> >      {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
> >      {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
> > +    {"&gt;", ">"}, {"&lt;", "<"},
> > +    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
> > +    {"&amp;", "&"}, {"&nbsp;", " "},
> >  };
> >
> >  static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
> >  {
> > -    int i, skip = 0;
> > +    int i, skip, again = 0;
> >
> >      while (*p) {
> >
> > @@ -51,13 +54,19 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
> >              if (!strncmp(p, from, len)) {
> >                  av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
> >                  p += len;
> > +                again = 1;
> >                  break;
> >              }
> >          }
> >          if (!*p)
> >              break;
> > +        if (again) {
> > +            again = 0;
> > +            skip = 0;
> > +            continue;
> > +        }
> >
> > -        if (*p == '<')
> > +        if (*p == '<' || *p == '&')
> >              skip = 1;
> >          else if (*p == '>')
>
> I think you need to make the ';' stop skipping. Otherwise my guess is that
> something like "Hello Ben & Jerry" is going to eat Jerry.
>
> [...]
>
> --
> Clément B.
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
Even if not valid WebVTT, it should now work with something like "Ben & Jerry".

Sample: http://trac.ffmpeg.org/attachment/ticket/4915/htmlescapes.vtt

The only issue left is that CR-only line endings don't work, but since Apple
stopped using them, it's probably not that important.
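
For readers unfamiliar with the CR-only case mentioned above: WebVTT accepts CR, LF, and CRLF as line terminators. One common way to cope with all three is to normalize them to LF before any further processing. The following is a minimal sketch of that idea in plain C. The helper name `normalize_newlines` is hypothetical, and the thread does not say whether or where FFmpeg would do such a step.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FFmpeg: rewrites CRLF and lone CR
 * line endings to LF, in place. The result is never longer than the
 * input, so no extra buffer is needed. */
static void normalize_newlines(char *s)
{
    char *dst = s;

    for (const char *src = s; *src; src++) {
        if (*src == '\r') {
            *dst++ = '\n';
            if (src[1] == '\n')
                src++;          /* CRLF counts as a single line ending */
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}
```

After such a pass, code that only looks for '\n' would see the same line structure regardless of which terminator the file used.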
[FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
Also fixes adjacent tags not being parsed correctly.

Signed-off-by: Ricardo Constantino
---
 libavcodec/webvttdec.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
index 1284a17..ae16630 100644
--- a/libavcodec/webvttdec.c
+++ b/libavcodec/webvttdec.c
@@ -37,11 +37,14 @@ static const struct {
     {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
     {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
     {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
+    {"&gt;", ">"}, {"&lt;", "<"},
+    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
+    {"&amp;", "&"}, {"&nbsp;", " "},
 };
 
 static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
 {
-    int i, skip = 0;
+    int i, again = 0;
 
     while (*p) {
 
@@ -51,19 +54,20 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
             if (!strncmp(p, from, len)) {
                 av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
                 p += len;
+                again = 1;
                 break;
             }
         }
         if (!*p)
             break;
 
-        if (*p == '<')
-            skip = 1;
-        else if (*p == '>')
-            skip = 0;
+        if (again) {
+            again = 0;
+            continue;
+        }
         else if (p[0] == '\n' && p[1])
             av_bprintf(buf, "\\N");
-        else if (!skip && *p != '\r')
+        else if (*p != '\r')
             av_bprint_chars(buf, *p, 1);
         p++;
     }
-- 
2.6.0
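
To make the control flow of this revision easier to follow outside the FFmpeg tree, here is a minimal standalone sketch of the same loop. It is an approximation under stated assumptions, not the committed code: it writes into a fixed-size caller buffer instead of an AVBPrint, the names `sketch_replace`, `put_str`, and `sketch_event_to_ass` are hypothetical, and the table contains only the entries visible in the diff context. The key point is the `again` flag: after a table match the loop restarts at the new position, so a tag that directly follows another tag is looked up in the table instead of being handled as ordinary characters.

```c
#include <stddef.h>
#include <string.h>

/* Subset of the replacement table, copied from the diff context above. */
static const struct {
    const char *from;
    const char *to;
} sketch_replace[] = {
    {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
    {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
    {"{", "\\{"}, {"}", "\\}"},
    {"&gt;", ">"}, {"&lt;", "<"},
    {"&lrm;", ""}, {"&rlm;", ""},
    {"&amp;", "&"}, {"&nbsp;", " "},
};

#define SKETCH_N (sizeof(sketch_replace) / sizeof(sketch_replace[0]))

/* Appends s to out, truncating silently if the buffer is full. */
static void put_str(char *out, size_t out_size, size_t *n, const char *s)
{
    while (*s && *n + 1 < out_size)
        out[(*n)++] = *s++;
    out[*n] = '\0';
}

/* Hypothetical stand-in for webvtt_event_to_ass() using a plain buffer. */
static void sketch_event_to_ass(char *out, size_t out_size, const char *p)
{
    size_t n = 0;
    int again = 0;

    if (!out_size)
        return;
    out[0] = '\0';

    while (*p) {
        for (size_t i = 0; i < SKETCH_N; i++) {
            const size_t len = strlen(sketch_replace[i].from);
            if (!strncmp(p, sketch_replace[i].from, len)) {
                put_str(out, out_size, &n, sketch_replace[i].to);
                p += len;
                again = 1;
                break;
            }
        }
        if (!*p)
            break;

        if (again) {
            /* Something was replaced: retry the table at the new position. */
            again = 0;
            continue;
        } else if (p[0] == '\n' && p[1]) {
            put_str(out, out_size, &n, "\\N");   /* ASS line break */
        } else if (*p != '\r') {
            const char c[2] = { *p, '\0' };
            put_str(out, out_size, &n, c);
        }
        p++;
    }
}
```

With this sketch, the input `Ben &amp; Jerry` yields `Ben & Jerry`, and a bare `&` that matches no table entry is copied through unchanged. Because the skip logic is gone, markup that is not in the table is also copied through rather than dropped.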
Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities
On Thu, Oct 08, 2015 at 05:20:52PM +0100, Ricardo Constantino wrote:
> Also fixes adjacent tags not being parsed correctly.
>
> Signed-off-by: Ricardo Constantino
> ---
>  libavcodec/webvttdec.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
> index 1284a17..dec4105 100644
> --- a/libavcodec/webvttdec.c
> +++ b/libavcodec/webvttdec.c
> @@ -37,11 +37,14 @@ static const struct {
>      {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
>      {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
>      {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
> +    {"&gt;", ">"}, {"&lt;", "<"},
> +    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
> +    {"&amp;", "&"}, {"&nbsp;", " "},
>  };
>
>  static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
>  {
> -    int i, skip = 0;
> +    int i, skip, again = 0;
>
>      while (*p) {
>
> @@ -51,13 +54,19 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
>              if (!strncmp(p, from, len)) {
>                  av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
>                  p += len;
> +                again = 1;
>                  break;
>              }
>          }
>          if (!*p)
>              break;
> +        if (again) {
> +            again = 0;
> +            skip = 0;
> +            continue;
> +        }
>
> -        if (*p == '<')
> +        if (*p == '<' || *p == '&')
>              skip = 1;
>          else if (*p == '>')

I think you need to make the ';' stop skipping. Otherwise my guess is that
something like "Hello Ben & Jerry" is going to eat Jerry.

[...]

-- 
Clément B.
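
The concern raised in this review can be reproduced with a small standalone sketch of the first revision's skip logic. This is an illustration under assumptions, not FFmpeg code: it uses a fixed-size buffer instead of an AVBPrint, the names `v1_replace`, `v1_put`, and `v1_convert` are hypothetical, and the table is cut down to three entries. It also initializes `skip` explicitly, whereas the declaration `int i, skip, again = 0;` in the patch initializes only `again`. In this logic an `&` that does not begin a known entity turns skipping on, and only a `>` or a later table match turns it off.

```c
#include <stddef.h>
#include <string.h>

/* Reduced table: enough to show the behaviour in question. */
static const struct {
    const char *from;
    const char *to;
} v1_replace[] = {
    {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
    {"&amp;", "&"},
};

#define V1_N (sizeof(v1_replace) / sizeof(v1_replace[0]))

/* Appends s to out, truncating silently if the buffer is full. */
static void v1_put(char *out, size_t out_size, size_t *n, const char *s)
{
    while (*s && *n + 1 < out_size)
        out[(*n)++] = *s++;
    out[*n] = '\0';
}

/* Mimics the loop of the first patch revision. */
static void v1_convert(char *out, size_t out_size, const char *p)
{
    size_t n = 0;
    int skip = 0, again = 0;

    if (!out_size)
        return;
    out[0] = '\0';

    while (*p) {
        for (size_t i = 0; i < V1_N; i++) {
            const size_t len = strlen(v1_replace[i].from);
            if (!strncmp(p, v1_replace[i].from, len)) {
                v1_put(out, out_size, &n, v1_replace[i].to);
                p += len;
                again = 1;
                break;
            }
        }
        if (!*p)
            break;
        if (again) {
            again = 0;
            skip = 0;
            continue;
        }

        if (*p == '<' || *p == '&') {
            skip = 1;               /* nothing but '>' or a match ends this */
        } else if (*p == '>') {
            skip = 0;
        } else if (p[0] == '\n' && p[1]) {
            v1_put(out, out_size, &n, "\\N");
        } else if (!skip && *p != '\r') {
            const char c[2] = { *p, '\0' };
            v1_put(out, out_size, &n, c);
        }
        p++;
    }
}
```

Fed `Hello Ben & Jerry`, this sketch stops emitting text at the bare `&`, so everything after it is lost. A properly escaped `Ben &amp; Jerry` is unaffected, because the table match happens before the skip check. An entity that is missing from the table, for example an assumed `&eacute;`, triggers the same loss of the remaining text.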