Re: Extracting layout information and text from searchable PDF

viraf.bankwalla Tue, 28 Feb 2017 05:42:06 -0800

Thank you - viraf

From: "Allison, Timothy B." <[email protected]>
To: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
Sent: Monday, February 27, 2017 7:46 PM
Subject: RE: Extracting layout information and text from searchable PDF

Might be relevant:

https://github.com/JonathanLink/PDFLayoutTextStripper

This might be helpful:

https://github.com/apache/tika/pull/152

If you want to extract tables, take a look at Tabula:
http://tabula.technology/
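One common way to keep layout in plain-text output is to snap each positioned run of text onto a fixed-width character grid, so that columns and line breaks survive. Below is a minimal sketch of that general idea. It is not the code of any project linked above. The `Glyph` record, the `CHAR_WIDTH_PT` and `LINE_HEIGHT_PT` values, and the `render`, `rowOf` and `colOf` methods are hypothetical. A real extractor (for example one built on PDFBox text positions) would supply the coordinates. The sketch assumes y grows downward from the top of the page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LayoutGridSketch {

    // Hypothetical stand-in for a positioned text run; a real PDF text
    // extractor would provide these coordinates (in points, y downward).
    record Glyph(double x, double y, String text) {}

    // Assumed grid cell sizes in points; real pages vary with font size.
    static final double CHAR_WIDTH_PT = 4.0;
    static final double LINE_HEIGHT_PT = 12.0;

    // Map a vertical position to a text row on the grid.
    static int rowOf(Glyph g) {
        return (int) Math.round(g.y() / LINE_HEIGHT_PT);
    }

    // Map a horizontal position to a character column on the grid.
    static int colOf(Glyph g) {
        return (int) Math.round(g.x() / CHAR_WIDTH_PT);
    }

    /** Places each run on a fixed-width grid so spacing is roughly kept. */
    static String render(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        // Order by grid row first, then left to right within the row.
        sorted.sort(Comparator.comparingInt(LayoutGridSketch::rowOf)
                              .thenComparingDouble(Glyph::x));

        List<StringBuilder> rows = new ArrayList<>();
        for (Glyph g : sorted) {
            int row = rowOf(g);
            int col = colOf(g);
            while (rows.size() <= row) {
                rows.add(new StringBuilder());   // empty rows become blank lines
            }
            StringBuilder line = rows.get(row);
            if (line.length() < col) {
                // Pad with spaces up to the target column.
                line.append(" ".repeat(col - line.length()));
            } else if (line.length() > 0) {
                // Column already taken: keep one space so words stay apart.
                line.append(' ');
            }
            line.append(g.text());
        }

        StringBuilder out = new StringBuilder();
        for (StringBuilder line : rows) {
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tiny form-like page: labels at x=0, values at x=40.
        List<Glyph> page = List.of(
            new Glyph(0, 0, "Name:"),
            new Glyph(40, 0, "Jane Doe"),
            new Glyph(0, 12, "Date:"),
            new Glyph(40, 12, "2017-02-27"));
        System.out.print(render(page));
    }
}
```

Running `main` prints two "label: value" lines with the values aligned in the same column. The fixed cell sizes are the main simplification; pages that mix font sizes would need them derived from the fonts actually in use.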


-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, February 27, 2017 1:36 PM
To: [email protected]
Subject: Extracting layout information and text from searchable PDF

I have a number of searchable PDF documents from which I want to extract layout
information and text. These documents are mixed, in that some pages may be
structured (e.g. forms) while others may be unstructured free-form text (e.g.
letters, reports, etc.). I was wondering if there are any projects that
provide such capabilities. I am familiar with PdfTextExtractor, and it would
probably be a starting point if I were to build this functionality out.
Thanks
- viraf
