Re: Bengali syllables <... 09BF 09BE> and <... 09BF 09C0>

Richard Wordingham Tue, 07 Feb 2017 13:51:19 -0800

On Tue, 7 Feb 2017 12:22:44 -0800
Manish Goregaokar <[email protected]> wrote:


> I found things like this[1] on wikisource which seems like an OCR of
> some really garbled text. The text does indeed seem like it has
> additional vowel diacritics, but that could also be a scanning glitch.
> The same word appears twice in the document, but once in the text.

In particular, the two sequences look like misinterpreted U+09CB
BENGALI VOWEL SIGN O and U+09CC BENGALI VOWEL SIGN AU, which would
account for their high frequency.  The OCRed texts cited by
Manish seem to be in acute need of manual correction.
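
To gauge how widespread the problem is, one could scan a text for the two
sequences and list a candidate correction for each hit. The following is only
a minimal sketch: the function name find_suspects and the table SUSPECT are
hypothetical, and the suggested replacements simply encode the guess above
that U+09CB or U+09CC was intended, so every hit still needs checking by a
human reader.

```python
import re

# Suspect sequences from the subject line, mapped to the vowel sign each
# may have been intended as. The mapping is an assumption, not a rule.
SUSPECT = {
    "\u09BF\u09BE": "\u09CB",  # VOWEL SIGN I + AA -> possibly VOWEL SIGN O
    "\u09BF\u09C0": "\u09CC",  # VOWEL SIGN I + II -> possibly VOWEL SIGN AU
}
PATTERN = re.compile("|".join(map(re.escape, SUSPECT)))


def find_suspects(text):
    """Return (offset, found, suggestion) for each suspect sequence in text."""
    return [
        (m.start(), m.group(), SUSPECT[m.group()])
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    # KA + I + AA, then KA + I + II: both are flagged for review.
    sample = "\u0995\u09BF\u09BE\u0995\u09BF\u09C0"
    for offset, found, suggestion in find_suspects(sample):
        codes = " ".join(f"{ord(c):04X}" for c in found)
        print(f"offset {offset}: <{codes}> maybe U+{ord(suggestion):04X}")
```

The sketch only reports candidates and deliberately does not rewrite the
text, since some occurrences might be something other than a misread O or AU.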

Richard.
