Hello,

First of all, many thanks for your excellent work.

I want to extract the text from a document or from a single PDF page. The text
order should be the one a reader would follow. This task becomes difficult for
multi-column documents and for tables. As I want to format the paragraphs, I
cannot use makeWordList. Instead, I would walk through the TextFlow, TextBlock,
lines and words. But I cannot obtain the right order for a complex document
such as:

http://doc.rero.ch/lm.php?url=1000,43,2,20101130144841-EO/mue_dmc.pdf

Do you have any strategies for re-ordering the blocks? Does the file contain
information about the right sequence? As acroread, evince, and Apple Preview
all behave differently, I conclude that it is not trivial. Am I right?
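
To make the question concrete, here is a minimal sketch of the kind of
re-ordering heuristic I have in mind. It is not poppler code: the Block class,
its field names, and the reading_order function are all hypothetical, and I
assume each block has already been reduced to a bounding box with y growing
downwards. Blocks whose horizontal extents overlap are grouped into one column,
columns are read left to right, and blocks inside a column top to bottom.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    # Hypothetical stand-in for a text block: a bounding box plus its text.
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    text: str


def reading_order(blocks: List[Block]) -> List[Block]:
    """Order blocks column by column (left to right, then top to bottom)."""
    columns: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.x_min):
        for column in columns:
            col_min = min(b.x_min for b in column)
            col_max = max(b.x_max for b in column)
            # The block joins a column if their horizontal extents overlap.
            if block.x_min < col_max and block.x_max > col_min:
                column.append(block)
                break
        else:
            # No overlapping column found: this block starts a new one.
            columns.append([block])

    # Read columns from left to right, and each column from top to bottom.
    columns.sort(key=lambda col: min(b.x_min for b in col))
    ordered: List[Block] = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y_min))
    return ordered


if __name__ == "__main__":
    page = [
        Block(50, 100, 280, 300, "left, top"),
        Block(320, 100, 550, 300, "right, top"),
        Block(50, 320, 280, 500, "left, bottom"),
        Block(320, 320, 550, 500, "right, bottom"),
    ]
    for b in reading_order(page):
        print(b.text)
```

This works for a plain two-column page, but a title or table that spans the
full width overlaps every column and merges them all into one, which is exactly
the kind of case where I do not know what the right strategy is.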

Many thanks in advance.

----------------------------------------------------------------------
Johnny Mariéthoz
RERO, Av. de la Gare 45, CH - 1920 MARTIGNY
Téléphone:  +41(0)27 721 8579
Fax              : +41(0)27 721 8586
Web            : http://www.rero.ch
ReroDoc    : http://doc.rero.ch, [email protected]
----------------------------------------------------------------------


