[FFmpeg-devel] [PATCH] lavc/movtextdec: fix incorrect offset calculation for UTF-8 characters

2017-03-07 Thread Erik Bråthen Solem
The 3GPP Timed Text (TTXT / tx3g / mov_text) specification counts each multibyte UTF-8 character as a single character, whereas ffmpeg currently counts bytes. This patch inserts an if test such that: 1. continuation bytes are not counted during decoding; 2. style boxes will not split these characters.
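The idea of skipping continuation bytes can be sketched as follows. This is only an illustration, not the code from the patch: the function name `count_utf8_chars` is hypothetical, and it assumes the input is valid UTF-8, where every continuation byte has the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from movtextdec.c.
 * Counts characters in a UTF-8 buffer by counting every byte that is
 * NOT a continuation byte (continuation bytes look like 10xxxxxx). */
size_t count_utf8_chars(const char *text, size_t len)
{
    size_t chars = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        if ((c & 0xC0) != 0x80)   /* lead byte or plain ASCII */
            chars++;
    }
    return chars;
}
```

For example, "Bråthen" occupies 8 bytes in UTF-8 but is counted as 7 characters, because the second byte of "å" is a continuation byte.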

[FFmpeg-devel] [PATCH] lavc/movtextenc: fix incorrect offset calculation for UTF-8 characters

2017-03-07 Thread Erik Bråthen Solem
The 3GPP Timed Text (TTXT / tx3g / mov_text) specification counts each multibyte UTF-8 character as a single character, whereas ffmpeg currently counts bytes. This produces files where style boxes have incorrect offsets. This patch introduces: 1. a separate variable that keeps track of the byte count 2. a

Re: [FFmpeg-devel] [PATCH 1/1] Fixing 3GPP Timed Text (TTXT / tx3g / mov_text) encoding for UTF-8 (ticket 6021)

2016-12-18 Thread Erik Bråthen Solem
Accidental duplicate of patch 1818. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [FFmpeg-devel, 1/1] libavcodec/movtextdec.c: fixing decoding for UTF-8 (ticket 6021)

2016-12-18 Thread Erik Bråthen Solem
Done. It was assigned its own patch number (1860), so I am changing the state of this one to "Superseded".

[FFmpeg-devel] [PATCH 1/1] Updated version of patch 1840 (ticket 6021)

2016-12-18 Thread Erik Bråthen Solem
Between testing and patch generation a character was deleted by mistake, which broke the patch. This updated version fixes this. Original patch description: Character offsets were interpreted as byte offsets, resulting in misplaced styling tags where multibyte characters were involved. The entire

Re: [FFmpeg-devel] [PATCH 1/1] Fixing 3GPP Timed Text (TTXT / tx3g / mov_text) encoding for UTF-8 (ticket 6021)

2016-12-18 Thread Erik Bråthen Solem
Good question. Since text_pos_chars never exceeds the existing variable text_pos, I did not think about this. No, there are no checks. The spec says that "Authors should limit the string in each text sample to not more than 2048 bytes, for maximum terminal interoperability", but the code does

Re: [FFmpeg-devel] [PATCH 1/1] libavcodec/movtextdec.c: fixing decoding for UTF-8 (ticket 6021)

2016-12-18 Thread Erik Bråthen Solem
Yes, it was supposed to be box_types, not ox_types. I must have removed the b by mistake after I tested the code. Should I resubmit the patch?

[FFmpeg-devel] [PATCH 1/1] Fixing 3GPP Timed Text (TTXT / tx3g / mov_text) encoding for UTF-8 (ticket 6021)

2016-12-18 Thread Erik Bråthen Solem
According to the format specification (3GPP TS 26.245, section 5.2) "storage lengths are specified as byte-counts, whereas highlighting is specified using character offsets." This patch replaces byte counting with character counting for highlighting. See the following page for a link to the

[FFmpeg-devel] [PATCH 1/1] libavcodec/movtextdec.c: fixing decoding for UTF-8 (ticket 6021)

2016-12-16 Thread Erik Bråthen Solem
Character offsets were interpreted as byte offsets, resulting in misplaced styling tags where multibyte characters were involved. The entire subtitle stream would even be rendered invalid if such a misplaced tag happened to split a multibyte character. This patch fixes this for UTF-8; UTF-16 was
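To illustrate why treating a character offset as a byte offset misplaces tags, here is a minimal sketch of the conversion a decoder needs. It is not the patch itself: the function name `char_to_byte_offset` is hypothetical, and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from movtextdec.c.
 * Maps a character offset to the byte offset where that character starts.
 * The result always lands on a lead byte (or at len), so a styling tag
 * inserted there cannot split a multibyte character. */
size_t char_to_byte_offset(const char *text, size_t len, size_t char_off)
{
    size_t chars = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        if ((c & 0xC0) != 0x80) {      /* start of a new character */
            if (chars == char_off)
                return i;
            chars++;
        }
    }
    return len;                         /* offset at or past the end */
}
```

With the 4-byte text "aåb", character offset 2 (the "b") maps to byte offset 3. Using 2 directly as a byte offset would point into the middle of "å".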

[FFmpeg-devel] [PATCH 1/1] Fixing 3GPP Timed Text (TTXT / tx3g / mov_text) encoding for UTF-8 (ticket 6021)

2016-12-15 Thread Erik Bråthen Solem
According to the format specification (3GPP TS 26.245, section 5.2) "storage lengths are specified as byte-counts, whereas highlighting is specified using character offsets." This patch replaces byte counting with character counting for highlighting. See the following page for a link to the