I am trying to get the same (or similar) text output from the C++ interface as from the 'pdftotext' utility without the -layout option. However, page::raw_order_layout gives malformed output (no text at all for most pages):
    ustring str = p->text(p->page_rect(), page::raw_order_layout);

An example:
- source: http://arxiv.org/pdf/1403.2805.pdf
- pdftotext default output: http://pastebin.com/raw/A93xPT4j
- cpp with page::physical_layout: http://pastebin.com/raw/MZFpTRbD
- cpp with page::raw_order_layout: http://pastebin.com/raw/n8dcsqkZ

The last output is obviously malformed: most of the text is missing, there are no spaces between words, etc. Also, I get different results each time I run it, which suggests a memory bug. The source code of my bindings is on GitHub: https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp
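Since the output changes from run to run, one thing that may be worth ruling out on the bindings side is how the ustring is turned into a std::string. In poppler-cpp, ustring::to_utf8() returns a byte_array, which poppler declares as std::vector<char>; nothing guarantees a trailing NUL, so building a std::string from .data() alone could read past the buffer. A minimal sketch of a length-aware conversion follows. The helper name utf8_to_string is hypothetical, and byte_array is redefined locally as std::vector<char> so the snippet compiles without poppler:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local stand-in for poppler::byte_array (std::vector<char> in poppler-cpp),
// so this sketch compiles on its own.
using byte_array = std::vector<char>;

// Hypothetical helper: copy exactly buf.size() bytes. Unlike
// std::string(buf.data()), this does not scan for a terminating '\0',
// so it never reads past the end of the vector and keeps embedded NULs.
std::string utf8_to_string(const byte_array& buf) {
    return std::string(buf.begin(), buf.end());
}

// Assumed usage with poppler-cpp (not compiled here):
//   std::string s = utf8_to_string(
//       p->text(p->page_rect(), page::raw_order_layout).to_utf8());
```

If the bindings already copy by length, this rules out one source of nondeterminism and points back at the text extraction itself.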
_______________________________________________ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler