Hi all,

I'm reading the poppler code and changing a few things here and there,
because I'm going to implement the ATK interface for evince and I need
to know how to get the text of a PDF file from glib.

I want to get the text in the order you would read it. I saw that
pdftotext outputs the text in the right order with the "-raw" option. I
looked at the code and saw that it uses TextOutputDev with
rawOrder = true.

It's easy to dump the text to a file using the first argument of the
TextOutputDev constructor, but I want to get the text as a char *.
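
One possible route, as far as I can tell from TextOutputDev.h, is the
other TextOutputDev constructor, which takes a TextOutputFunc callback
and a void *stream instead of a file name. The sketch below only shows
the callback side: the typedef is redeclared so that it compiles on its
own, and fakeDump and collectText are hypothetical names standing in for
the real device, not poppler API.

```cpp
#include <string>

// Same shape as poppler's TextOutputFunc, as I understand it from
// TextOutputDev.h; redeclared here so the sketch compiles without poppler.
typedef void (*TextOutputFunc)(void *stream, const char *text, int len);

// Callback: append each chunk of text to the std::string passed as `stream`.
static void appendToString(void *stream, const char *text, int len) {
    static_cast<std::string *>(stream)->append(text, len);
}

// Hypothetical stand-in for the output device: it emits two chunks through
// the callback, the way a real device would emit the text of a page.
static void fakeDump(TextOutputFunc func, void *stream) {
    func(stream, "Hello ", 6);
    func(stream, "world\n", 6);
}

// Collect everything the (fake) device emits into a single string.
std::string collectText() {
    std::string out;
    fakeDump(appendToString, &out);
    return out;
}
```

The collected std::string can then be handed out as a char * with
c_str() or copied into a buffer owned by glib.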

I saw that when rawOrder is set in TextOutputDev you can't use the
getText method: it always returns an empty GooString:

...
  s = new GooString();

  if (rawOrder) {
    return s;
  }
...

So here is my question: is that a bug / unimplemented feature, or is it
like that for a reason?

If you think it's a bug, I could file a bug report and upload a patch to
"solve" it using the TextWordList.

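A minimal sketch of what a word-list-based fix might do in raw order,
assuming the words are already stored in reading order: walk the list
and join the words. Word and joinWordsRaw are hypothetical stand-ins so
the example compiles on its own; they are not the real TextWord /
TextWordList classes, and line breaks and selection rectangles are
ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of a word list: the word's text and
// whether a space follows it in the page content.
struct Word {
    std::string text;
    bool spaceAfter;
};

// Join the words in the order they are stored (raw order), inserting a
// space after a word only when it is flagged and it is not the last word.
std::string joinWordsRaw(const std::vector<Word> &words) {
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        out += words[i].text;
        if (words[i].spaceAfter && i + 1 < words.size()) {
            out += ' ';
        }
    }
    return out;
}
```

For example, the words "Hi", "to", "all" with a space flagged after the
first two would be joined as "Hi to all".
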
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
