On 30/09/2014 4:11 AM, "David Starner" <[email protected]> wrote:
>
> On Fri, Sep 26, 2014 at 4:10 PM, Andrew Cunningham
> <[email protected]> wrote:
> > * NEVER try to copy and paste text from PDF. It is a preprint format and
> > should be treated as such.
> >
> I'd try and cut and paste from print if I could. People are going to
> cut and paste from anything if it saves them a little time. If you
> disable cut and pasting from PDF, those who have easy access to OCR
> may just print to image and OCR it to cut and paste. To say don't do
> this is unproductive.
>
OK, what I should have said is that in the best-case scenario for complex-script text, you can copy and paste and then post-process the extracted text to recover the actual text. Post-processing may involve reordering characters or systematically converting glyph sequences. In the worst-case scenario you get utter garbage from which the original text cannot be reconstructed. Searching and indexing are even more problematic.

Honestly, for the languages I work with, it would in many cases be quicker and more accurate to use OCR (even at 80% accuracy) than to cut and paste from PDF.

As I said in my previous email, results and effectiveness will differ depending on the fonts and the PDF generator used. PDF was designed for preprint, not archival purposes.

> --
> Kie ekzistas vivo, ekzistas espero.
> _______________________________________________
> Unicode mailing list
> [email protected]
> http://unicode.org/mailman/listinfo/unicode

