On Wednesday 02 March 2016, at 11:24:09, Jeroen Ooms wrote:
> I am trying to get the same (or similar) text output from the c++ interface
> as when using the 'pdftotext' utility without the -layout option.
> However raw_order_layout gives malformed output (no text at all for most
> pages):
>
>   ustring str = p->text(p->page_rect(), page::raw_order_layout);
>
> An example:
>
>  - source: http://arxiv.org/pdf/1403.2805.pdf
>  - pdftotext default output: http://pastebin.com/raw/A93xPT4j
>  - cpp with page::physical_layout: http://pastebin.com/raw/MZFpTRbD
>  - cpp with page::raw_order_layout: http://pastebin.com/raw/n8dcsqkZ
>
> The last output is obviously malformed. It misses most text, has no spaces,
> etc. Also each time I run it, I get different results so it looks like
> there is a memory bug.
>
> The source code of my bindings is on github:
> https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp

Maybe you can have a look? The code of pdftotext is pretty small, so looking
at the cpp frontend and working out what is wrong there should not be very
hard.

Cheers,
  Albert

_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler