We use PDFBox to scrape data from public records held in PDFs. We had to 
modify PDFTextStripper a bit to make it fit (I've uploaded those two changes), 
and we also created some surrounding classes to sort, organize, and filter 
the data. This is always a challenge with PDFs, as they were not really 
designed for scraping the way HTML sites are.

Our next phase is to look at incorporating the images along with the text to 
create a map of text and images, so that we can pull data from specific 
locations or regions.
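
A minimal sketch of that region idea, assuming every piece of text or image on a page has already been reduced to a bounding box. The `Fragment` class, its fields, and the `inRegion` helper are hypothetical names made up for illustration; they are not part of PDFBox and not the changes mentioned above. Coordinates are assumed to have the origin at the top-left with y growing downward.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionMap {

    // Hypothetical holder for one positioned item on a page (not a PDFBox type).
    static final class Fragment {
        final String kind;              // "text" or "image"
        final String content;           // the text itself, or an image label
        final Rectangle2D.Double box;   // bounding box in page coordinates

        Fragment(String kind, String content,
                 double x, double y, double width, double height) {
            this.kind = kind;
            this.content = content;
            this.box = new Rectangle2D.Double(x, y, width, height);
        }
    }

    // Returns the fragments whose boxes lie entirely inside the region,
    // ordered top-to-bottom and then left-to-right.
    static List<Fragment> inRegion(List<Fragment> all, Rectangle2D region) {
        List<Fragment> hits = new ArrayList<>();
        for (Fragment f : all) {
            if (region.contains(f.box)) {
                hits.add(f);
            }
        }
        hits.sort(Comparator
                .comparingDouble((Fragment f) -> f.box.y)
                .thenComparingDouble(f -> f.box.x));
        return hits;
    }
}
```

Given a list of fragments for a page, a call such as `RegionMap.inRegion(fragments, new Rectangle2D.Double(0, 90, 300, 60))` would return only the text and images that fall inside that rectangle, in reading order.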

Love the product and a big thanks for all the hard work put into it.  

> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] 
> Sent: Tuesday, January 07, 2014 1:11 AM
> To: [email protected]
> Subject: users Digest 7 Jan 2014 07:11:18 -0000 Issue 770
> 
> 
> Dear PDFBox users,
> 
> we'd love to hear from you how you are using PDFBox in your PDF applications.
> Do you use it for rendering, merging, creation ... - what is the main
> application?
> 
> As we are planning for PDFBox 2.0 there are already a lot of ideas what could
> be done in that release. Your input will help us to better understand where we
> could put our focus.
> 
> Please understand that we will take your input seriously but as this is a
> volunteers effort we can not commit to a certain functionality. And if you'd
> like to help you're always welcome to do so.
> 
> Thanks a lot for your feedback!
> 
> Maruan Sahyoun
