Re: Current support for N'Ko

Andrew Cunningham Mon, 29 Sep 2014 13:18:13 -0700

On 30/09/2014 4:11 AM, "David Starner" <[email protected]> wrote:
>
> On Fri, Sep 26, 2014 at 4:10 PM, Andrew Cunningham
> <[email protected]> wrote:
> > * NEVER try to copy and paste text from PDF. It is a preprint format and
> > should be treated as such.
>
>
> I'd try and cut and paste from print if I could. People are going to
> cut and paste from anything if it saves them a little time. If you
> disable cut and pasting from PDF, those who have easy access to OCR
> may just print to image and OCR it to cut and paste. To say don't do
> this is unproductive.
>


OK, what I should say is that in the best-case scenario for complex script
text you can copy and paste, and then post-process the extracted text to get
the actual text. Post-processing may involve reordering characters or
systematic conversions of glyph sequences.
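
To make the two post-processing steps concrete, here is a minimal Python
sketch. Everything in it is an assumption chosen for illustration: the glyph
table is hypothetical (a real one has to be built per font and PDF
generator), and Myanmar is used as the example script because its vowel
sign E (U+1031) is displayed before the consonant but stored after it.

```python
import re

# Hypothetical table: glyph codes as they come out of the PDF (here a
# Private Use Area code point) mapped to the characters they stand for.
# A real table would have to be built for each font.
GLYPH_MAP = {
    "\uE000": "\u1000\u103B",  # assumed ligature glyph -> KA + MEDIAL YA
}

PREBASE_VOWEL = "\u1031"              # MYANMAR VOWEL SIGN E
CONSONANT = "[\u1000-\u1021]"         # Myanmar consonants KA..A
MEDIALS = "[\u103B-\u103E]*"          # optional medial YA, RA, WA, HA


def convert_glyphs(text):
    """Systematic conversion: replace each known glyph code with its characters."""
    for glyph, chars in GLYPH_MAP.items():
        text = text.replace(glyph, chars)
    return text


def reorder_prebase(text):
    """Reordering: move a vowel extracted in visual order (before the
    consonant) to its logical position after the consonant and medials."""
    pattern = f"{PREBASE_VOWEL}({CONSONANT}{MEDIALS})"
    return re.sub(pattern, r"\g<1>" + PREBASE_VOWEL, text)


def postprocess(text):
    """Convert glyph codes first, then fix the character order."""
    return reorder_prebase(convert_glyphs(text))
```

Glyph conversion runs first so that the reordering step sees real
consonants and medials rather than font-specific codes.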

In the worst-case scenario you get utter garbage from which the text in the
PDF file cannot be reconstructed.

Searching and indexing are even more problematic.

Honestly, for the languages I work with it would be quicker and more accurate
in many cases to use OCR (even at 80% accuracy) than to cut and paste from PDF.

As I said in my previous email, results and effectiveness will differ
depending on the fonts and the PDF generator used.

PDF was designed for preprint, not archival purposes.

> --
> Kie ekzistas vivo, ekzistas espero.
> _______________________________________________
> Unicode mailing list
> [email protected]
> http://unicode.org/mailman/listinfo/unicode
