On Friday 23 September 2011 15:12:28 Leonard Rosenthol wrote:
> On 9/23/11 6:38 AM, "Jonathan Kew" <[email protected]> wrote:
> >Once you start dealing with whole paragraphs, multiple columns, table
> >cells, etc, etc, things only get worse.... you may get good results for a
> >limited class of documents (e.g. unidirectional LTR text, fairly simple
> >block layouts), but the general problem for arbitrary PDF documents is
> >MUCH harder.
>
> Agreed 100%!
>
> Which is why I WISH I convince more PDF production tools to generated
> tagged/structured PDF!
That is very nice to hear from you =)

Actually, a consistent ToUnicode mapping should be a good compromise: higher-level software can segment the text into regions of different languages based solely on their alphabets, and then detect and correct the text flow for each region separately.

This way the example "english WERBEH" should generally work: it would be decomposed into 2 regions, with the latter reversed.

--
Пётр Керзум
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
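The segment-then-reverse idea from the message above can be illustrated with a minimal Python sketch. It makes several assumptions. The extracted line is in visual (left-to-right drawing) order, and the base direction is left-to-right. The "alphabet" of each character is taken from its Unicode bidirectional class via `unicodedata.bidirectional`. The function name `visual_to_logical` is hypothetical, and the neutral handling is a heavily simplified version of the Unicode Bidirectional Algorithm. Digits, mirrored brackets and nested embeddings are not handled.

```python
import unicodedata
from itertools import groupby


def strong_class(ch: str) -> str:
    """Classify a character as 'R' (right-to-left letter), 'L' (left-to-right letter) or 'N' (neutral)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):  # Hebrew, Arabic and similar scripts
        return "R"
    if bidi == "L":  # Latin, Cyrillic, etc.
        return "L"
    return "N"  # spaces, punctuation, digits (simplification)


def visual_to_logical(line: str) -> str:
    """Hypothetical helper: split a visually ordered line into script runs and reverse the RTL ones.

    Assumes a left-to-right base direction, so the order of the runs is kept
    and only the characters inside each right-to-left run are reversed.
    """
    classes = [strong_class(ch) for ch in line]

    # Resolve neutrals: a neutral sitting between two RTL letters joins the RTL
    # run (so multi-word RTL phrases stay together); otherwise it takes the
    # assumed LTR base direction.
    resolved = []
    for i, cls in enumerate(classes):
        if cls != "N":
            resolved.append(cls)
            continue
        prev = next((c for c in reversed(classes[:i]) if c != "N"), "L")
        nxt = next((c for c in classes[i + 1:] if c != "N"), "L")
        resolved.append("R" if prev == nxt == "R" else "L")

    # Group consecutive characters with the same resolved direction into runs.
    out = []
    for cls, group in groupby(zip(resolved, line), key=lambda pair: pair[0]):
        run = "".join(ch for _, ch in group)
        out.append(run[::-1] if cls == "R" else run)
    return "".join(out)
```

For instance, `visual_to_logical("english " + "עברית"[::-1])` returns `"english עברית"`. The Latin region is kept as-is and the Hebrew region is reversed, which matches the "english WERBEH" case. Spaces between two Hebrew words stay inside the Hebrew run, so reversing the whole run also restores the word order of a multi-word phrase.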
