I know you can extract text based on a region, and I also remember seeing 
many e-mails about improvements in preserving spacing in text extraction. 
If you haven't already, search the mailing list archives and see if any of 
those e-mails help you.  I haven't done any text extraction myself, but 
hopefully someone else on the list will be able to point you in the right 
direction.
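
For reference, the region-based extraction in PDFBox is the 
PDFTextStripperByArea class. The snippet below is not PDFBox code and has 
not been run against a real PDF; it is only a library-agnostic sketch, in 
Python, of the general idea of pulling text out of a selected zone while 
keeping the column spacing. It assumes you already have words with page 
coordinates (the Word type, extract_zone, char_width and line_tol are all 
made-up names for illustration), and that y grows downward.

```python
from dataclasses import dataclass


@dataclass
class Word:
    x: float   # left edge of the word, in points (hypothetical input)
    y: float   # vertical position, in points, growing downward
    text: str


def extract_zone(words, left, top, right, bottom, char_width=6.0, line_tol=2.0):
    """Return the text inside a rectangular zone, padded so columns line up.

    char_width is an assumed average glyph width used to map x positions
    to character columns; line_tol is how far apart two y values may be
    while still counting as the same text line.
    """
    # Keep only words whose origin falls inside the selected zone.
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    inside.sort(key=lambda w: (w.y, w.x))

    # Group words into lines by comparing y against the line's first word.
    lines = []
    for w in inside:
        if lines and abs(w.y - lines[-1][0]) <= line_tol:
            lines[-1][1].append(w)
        else:
            lines.append((w.y, [w]))

    out = []
    for _, row in lines:
        row.sort(key=lambda w: w.x)
        line = ""
        for w in row:
            # Map the x offset within the zone to a character column.
            col = int(round((w.x - left) / char_width))
            # Never overwrite earlier text; keep at least one space.
            if line and col <= len(line):
                col = len(line) + 1
            line = line.ljust(col) + w.text
        out.append(line)
    return "\n".join(out)


# Made-up sample data: a two-column table plus a footer outside the zone.
sample = [
    Word(72, 100, "Rate"), Word(144, 100, "Term"),
    Word(72, 112, "4.5"), Word(144, 112, "30"),
    Word(72, 300, "Footer"),
]
zone_text = extract_zone(sample, left=72, top=90, right=200, bottom=120)
print(zone_text)
```

With a fixed-width mapping like this, odd tables keep their columns only 
if char_width is a reasonable guess for the font in that zone, which is 
probably why selecting the zone by hand works better than guessing it.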

---- 
Thanks,
Adam





From:
Kevin Brown <[email protected]>
To:
[email protected]
Date:
03/16/2011 08:23
Subject:
OFF TOPIC -- Extracting PDF tables by selecting them?



Sorry, I understand pdfbox probably won't be able to do this.... but
perhaps it can? :)

We use this software from BCL called Jade that allowed you to select a
'zone' on a PDF page and extract it to text in such a way that the spacing
and line breaking was preserved. It did (and does!) a better job of this
than any other tool we have ever tried. But they no longer make or support
it! Just wondering if any of you PDF mavens have found a tool or method
for doing this which works really well? It seems impossible to do
programmatically unless you know the parameters of the text -- one needs
to select it manually.  For example, we use this a lot for odd tables.




