Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities

2015-10-09 Thread Clément Bœsch
On Thu, Oct 08, 2015 at 10:50:49PM +0100, Ricardo wrote:
> That would probably be considered a broken WebVTT file, since "&" need to
> be encoded as "&amp;".
> 

What about "Cl&eacute;ment" or any unsupported escape?
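
For illustration only, here is a minimal standalone sketch of one way escapes outside a fixed table could be covered: decoding numeric character references ("&#233;" or "&#xE9;") straight to UTF-8. This is not FFmpeg code and not part of the patch; utf8_encode() and decode_numeric_ref() are hypothetical names, and named entities such as "&eacute;" would still need a lookup table.

```c
#include <stddef.h>

/* Encode code point cp as UTF-8 into dst (room for 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code point. */
static int utf8_encode(unsigned long cp, char *dst)
{
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        dst[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        dst[0] = (char)(0xC0 | (cp >> 6));
        dst[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        dst[0] = (char)(0xE0 | (cp >> 12));
        dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    dst[0] = (char)(0xF0 | (cp >> 18));
    dst[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: if p starts with "&#NNN;" or "&#xHHH;", write the
 * UTF-8 bytes to dst, store their count in *dst_len and return how many
 * input bytes were consumed. Returns 0 if p is not a valid reference. */
static size_t decode_numeric_ref(const char *p, char *dst, int *dst_len)
{
    const char *q;
    unsigned long cp = 0;
    unsigned base = 10;
    int ndigits = 0;

    if (p[0] != '&' || p[1] != '#')
        return 0;
    q = p + 2;
    if (*q == 'x' || *q == 'X') {
        base = 16;
        q++;
    }
    for (;; q++) {
        unsigned d;
        if (*q >= '0' && *q <= '9')
            d = (unsigned)(*q - '0');
        else if (base == 16 && *q >= 'a' && *q <= 'f')
            d = (unsigned)(*q - 'a' + 10);
        else if (base == 16 && *q >= 'A' && *q <= 'F')
            d = (unsigned)(*q - 'A' + 10);
        else
            break;
        cp = cp * base + d;
        if (cp > 0x10FFFF)
            return 0; /* out of Unicode range, also guards overflow */
        ndigits++;
    }
    if (!ndigits || *q != ';')
        return 0;
    *dst_len = utf8_encode(cp, dst);
    if (!*dst_len)
        return 0;
    return (size_t)(q - p) + 1;
}
```

A caller would try this helper whenever it sees '&' and no table entry matches, and fall back to copying the '&' literally when it returns 0.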

[...]

-- 
Clément B.


___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities

2015-10-09 Thread Ricardo
I think those were needed when UTF-8 wasn't expected, but WebVTT makes it
mandatory, according to https://w3c.github.io/webvtt/#file-structure

Are you saying that  and  should be removed (like it
was)?

On 9 October 2015 at 13:50, Clément Bœsch wrote:

> On Thu, Oct 08, 2015 at 10:50:49PM +0100, Ricardo wrote:
> > That would probably be considered a broken WebVTT file, since "&" need to
> > be encoded as "&amp;".
> >
>
> What about "Cl&eacute;ment" or any unsupported escape?
>
> [...]
>
> --
> Clément B.


Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities

2015-10-08 Thread Ricardo
That would probably be considered a broken WebVTT file, since "&" need to
be encoded as "&amp;".

On 8 October 2015 at 20:46, Clément Bœsch wrote:

> On Thu, Oct 08, 2015 at 05:20:52PM +0100, Ricardo Constantino wrote:
> > Also fixes adjacent tags not being parsed correctly.
> >
> > Signed-off-by: Ricardo Constantino 
> > ---
> >  libavcodec/webvttdec.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
> > index 1284a17..dec4105 100644
> > --- a/libavcodec/webvttdec.c
> > +++ b/libavcodec/webvttdec.c
> > @@ -37,11 +37,14 @@ static const struct {
> >      {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
> >      {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
> >      {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
> > +    {"&gt;", ">"}, {"&lt;", "<"},
> > +    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
> > +    {"&amp;", "&"}, {"&nbsp;", " "},
> >  };
> >
> >  static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
> >  {
> > -    int i, skip = 0;
> > +    int i, skip, again = 0;
> >
> >      while (*p) {
> >
> > @@ -51,13 +54,19 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
> >              if (!strncmp(p, from, len)) {
> >                  av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
> >                  p += len;
> > +                again = 1;
> >                  break;
> >              }
> >          }
> >          if (!*p)
> >              break;
> > +        if (again) {
> > +            again = 0;
> > +            skip = 0;
> > +            continue;
> > +        }
> >
> > -        if (*p == '<')
> > +        if (*p == '<' || *p == '&')
> >              skip = 1;
> >          else if (*p == '>')
>
> I think you need to make the ';' stop skipping. Otherwise my guess is that
> something like "Hello Ben & Jerry" is going to eat Jerry.
>
> [...]
>
> --
> Clément B.


Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities

2015-10-08 Thread Ricardo
Even if not valid WebVTT, it should now work with something like
"Ben & Jerry". Sample:
http://trac.ffmpeg.org/attachment/ticket/4915/htmlescapes.vtt
Only issue left is CR-only endings not working, but since Apple stopped
using that, it's probably not that important.
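
As a rough illustration of what handling CR-only endings would involve, the sketch below treats CRLF, LF, and a lone CR as equivalent line terminators, which is how the WebVTT spec defines them. It is a standalone example rather than FFmpeg demuxer code, and eol_len() and count_lines() are hypothetical helper names.

```c
#include <stddef.h>

/* Length of the line terminator at p: 2 for "\r\n", 1 for "\n" or a
 * lone "\r", 0 if p does not point at a terminator. */
static size_t eol_len(const char *p)
{
    if (p[0] == '\r')
        return p[1] == '\n' ? 2 : 1;
    return p[0] == '\n';
}

/* Hypothetical helper: count the lines in s, accepting all three
 * terminator styles. A final line without a terminator still counts. */
static size_t count_lines(const char *s)
{
    size_t lines = 0;

    while (*s) {
        size_t n = 0;
        /* advance to the next terminator or the end of the string */
        while (*s && !(n = eol_len(s)))
            s++;
        lines++;
        s += n;
    }
    return lines;
}
```

A real parser would use the same eol_len() test to split cues instead of only counting lines.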


Re: [FFmpeg-devel] [PATCH] avcodec/webvttdec: Unescape HTML entities

2015-10-08 Thread Clément Bœsch
On Thu, Oct 08, 2015 at 05:20:52PM +0100, Ricardo Constantino wrote:
> Also fixes adjacent tags not being parsed correctly.
> 
> Signed-off-by: Ricardo Constantino 
> ---
>  libavcodec/webvttdec.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
> index 1284a17..dec4105 100644
> --- a/libavcodec/webvttdec.c
> +++ b/libavcodec/webvttdec.c
> @@ -37,11 +37,14 @@ static const struct {
>      {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
>      {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
>      {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
> +    {"&gt;", ">"}, {"&lt;", "<"},
> +    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
> +    {"&amp;", "&"}, {"&nbsp;", " "},
>  };
>  
>  static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
>  {
> -    int i, skip = 0;
> +    int i, skip, again = 0;
>  
>      while (*p) {
>  
> @@ -51,13 +54,19 @@ static int webvtt_event_to_ass(AVBPrint *buf, const char *p)
>              if (!strncmp(p, from, len)) {
>                  av_bprintf(buf, "%s", webvtt_tag_replace[i].to);
>                  p += len;
> +                again = 1;
>                  break;
>              }
>          }
>          if (!*p)
>              break;
> +        if (again) {
> +            again = 0;
> +            skip = 0;
> +            continue;
> +        }
>  
> -        if (*p == '<')
> +        if (*p == '<' || *p == '&')
>              skip = 1;
>          else if (*p == '>')

I think you need to make the ';' stop skipping. Otherwise my guess is that
something like "Hello Ben & Jerry" is going to eat Jerry.
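
To make the concern concrete, here is a minimal standalone sketch of the skipping logic. It is not the FFmpeg function: convert() is a hypothetical stand-in for webvtt_event_to_ass(), it writes to a plain char buffer instead of an AVBPrint, and the table is reduced to three entries. The semicolon_ends_skip flag is an assumption about how "make the ';' stop skipping" might be wired in.

```c
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical replacement table (not the full FFmpeg one). */
static const struct { const char *from, *to; } repl[] = {
    {"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"},
};

/* Copy p to out, replacing known entities and dropping tags and
 * unknown entities. If semicolon_ends_skip is nonzero, a ';' ends a
 * skip that was started by '&'. */
static void convert(const char *p, char *out, size_t outsz,
                    int semicolon_ends_skip)
{
    size_t n = 0;
    char skip = 0; /* 0: copying, '<': inside a tag, '&': inside an entity */

    if (!outsz)
        return;
    while (*p) {
        size_t i;
        int again = 0;

        for (i = 0; i < sizeof(repl) / sizeof(repl[0]); i++) {
            size_t len = strlen(repl[i].from);
            if (!strncmp(p, repl[i].from, len)) {
                size_t tolen = strlen(repl[i].to);
                if (n + tolen < outsz) {
                    memcpy(out + n, repl[i].to, tolen);
                    n += tolen;
                }
                p += len;
                again = 1;
                break;
            }
        }
        if (again) {
            skip = 0;
            continue;
        }

        if (*p == '<' || *p == '&')
            skip = *p;
        else if (*p == '>')
            skip = 0;
        else if (*p == ';' && skip == '&' && semicolon_ends_skip)
            skip = 0;
        else if (!skip && n + 1 < outsz)
            out[n++] = *p;
        p++;
    }
    out[n] = '\0';
}
```

With the flag off, an unknown entity such as "&foo;" swallows everything after it, since only '>' ends the skip. With the flag on, the text after the ';' survives. A bare '&' with no ';' after it still eats the rest of the cue in both modes, so this alone does not rescue files that leave '&' unescaped.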

[...]

-- 
Clément B.

