Re: [Okular-devel] Export from pdf to txt, invoking from the command line

Jiri Baum Thu, 10 Nov 2011 23:18:39 -0800

Hello,

filippo di natale:
> I need to parse "csv" or "fixed length" like documents that are
> unfortunately in pdf format, if anyone has any suggestion on how to parse
> them without translating them to text...


The library that Okular uses for PDF is Poppler: http://poppler.freedesktop.org

For "fixed length" documents in PDF format, the recently implemented
"Table Selection Tool" might be useful; see very recent git master and/or
bugs 279859 and 283440. It lets you select the "fixed length" part of the
PDF document, divide it into rows and columns, then paste the result into a
spreadsheet or other tabular document.

If you need automated processing, there are tools like TableSeer floating
around, but be prepared for only moderate accuracy: sometimes it finds and
extracts the tables, sometimes it misses them or extracts them only
partially. It will probably depend on your documents. http://tableseer.sf.net
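
For the automated case, one possible approach (not something Okular itself
provides) is to call Poppler's pdftotext utility with its -layout option and
slice the resulting lines at fixed column offsets. The sketch below is only an
illustration: the function names and the COLUMNS offsets are hypothetical and
would have to be adapted to the actual documents, and it assumes pdftotext is
installed and on the PATH.

```python
import subprocess


def pdf_to_layout_text(pdf_path):
    """Run Poppler's pdftotext in -layout mode and return the text.

    The trailing "-" asks pdftotext to write to stdout instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def split_fixed_width(line, boundaries):
    """Slice one line at the given (start, end) character offsets."""
    return [line[start:end].strip() for start, end in boundaries]


def parse_fixed_width(text, boundaries):
    """Split every non-blank line of text into fixed-width fields."""
    return [
        split_fixed_width(line, boundaries)
        for line in text.splitlines()
        if line.strip()
    ]


# Hypothetical layout: two 10-character columns, then the rest of the line.
COLUMNS = [(0, 10), (10, 20), (20, None)]
```

Calling parse_fixed_width(pdf_to_layout_text("report.pdf"), COLUMNS), with
"report.pdf" standing in for a real file, would then give one list of fields
per line. How well the columns line up depends on how the PDF was produced,
so the offsets usually need checking against the pdftotext output by hand.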


Jiri
-- 
Jiri Baum <j...@baum.com.au>                   http://www.baum.com.au/sabik
_______________________________________________
Okular-devel mailing list
Okular-devel@kde.org
https://mail.kde.org/mailman/listinfo/okular-devel
