Re: [poppler] poppler util pdftohtml

Peter A. Kerzum Fri, 23 Sep 2011 04:54:12 -0700

On Friday 23 September 2011 15:12:28 Leonard Rosenthol wrote:
> On 9/23/11 6:38 AM, "Jonathan Kew" <[email protected]> wrote:
> >Once you start dealing with whole paragraphs, multiple columns, table
> >cells, etc, etc, things only get worse.... you may get good results for a
> >limited class of documents (e.g. unidirectional LTR text, fairly simple
> >block layouts), but the general problem for arbitrary PDF documents is
> >MUCH harder.
> 
> Agreed 100%!
> 
> Which is why I WISH I could convince more PDF production tools to generate
> tagged/structured PDF!


That is very nice to hear from you =)
Actually, a consistent ToUnicode mapping would be a good compromise: higher-level
software could then segment the text into regions of different languages based
solely on their alphabets, and then detect and correct the text flow for each
region separately.

This way the example

   english WERBEH

should generally work: it decomposes into two regions, with the latter one reversed.
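A minimal sketch of that idea in Python, under loudly stated assumptions: the extracted text keeps right-to-left glyphs in visual (drawing) order, the paragraph's base direction is left-to-right, and a character's "alphabet" is approximated by its Unicode bidirectional class from the standard unicodedata module. The helper names direction, segment and fix_visual_rtl are hypothetical, and this is not the full Unicode Bidirectional Algorithm (digits inside RTL runs and mirrored brackets, for instance, are not handled).

```python
import unicodedata
from typing import List, Optional, Tuple


def direction(ch: str) -> Optional[str]:
    """Classify a character as 'rtl' or 'ltr' by its Unicode bidi class.

    Returns None for weak/neutral characters (spaces, digits, punctuation).
    """
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):  # strong RTL: Hebrew, Arabic, ...
        return "rtl"
    if bidi == "L":  # strong LTR: Latin, Cyrillic, ...
        return "ltr"
    return None


def segment(text: str) -> List[Tuple[str, str]]:
    """Split text into (direction, run) pairs; neutrals join the current run."""
    runs: List[Tuple[str, List[str]]] = []
    current = "ltr"  # assumption: the paragraph base direction is LTR
    for ch in text:
        d = direction(ch) or current
        if runs and runs[-1][0] == d:
            runs[-1][1].append(ch)
        else:
            runs.append((d, [ch]))
        current = d
    return [(d, "".join(chars)) for d, chars in runs]


def fix_visual_rtl(text: str) -> str:
    """Reverse each RTL run, assuming it was extracted in visual order."""
    out = []
    for d, run in segment(text):
        if d != "rtl":
            out.append(run)
            continue
        # Trailing neutrals (e.g. the space before the next LTR word) stay
        # in place instead of being reversed into the front of the run.
        core, tail = run, ""
        while core and direction(core[-1]) is None:
            core, tail = core[:-1], core[-1] + tail
        out.append(core[::-1] + tail)
    return "".join(out)


if __name__ == "__main__":
    # "עברית" ("Hebrew") as it might come out of a PDF drawn left to right.
    visual = "english " + "עברית"[::-1]
    print(fix_visual_rtl(visual))  # english עברית
```

Here real Hebrew letters stand in for the WERBEH placeholder: segment() yields one LTR region and one RTL region, and only the RTL one is reversed, which is exactly the two-region decomposition described above.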

-- 
Пётр Керзум
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
