On 29.09.2016 at 21:11, Harrington, Ferdinand B wrote:
I found PDFText2HTML.java. Is there an example of how to call it?
Yes, see TestPDFText2HTML.java. I doubt that it can do indents.

Tilman
Outlook distorted my message. The data is indented like this, as bullets:

Abc Def Xyz Ghi 123 456

Thank you.

-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Thursday, September 29, 2016 2:44 PM
To: [email protected]
Subject: Re: extract bullet points from a PDF

On 29.09.2016 at 15:08, win harrington wrote:
> I would like to extract all the lists of bullet points from a PDF file
> and put them into an XML format. The items are indented. I want
> the text and the indentation level. The input is like this:
>
> - abc - def - xyz - ghi - 123 - 456
>
> Can I convert that to:
>
> abc def xyz ghi 123 456
>
> The last step will be to add tags. I have code to do this:
>
> <abc></abc><def></def> <xyz></xyz> <ghi></ghi> <123></123> <456></456>

This sounds like an ordinary Java question, i.e. parse some text. PDFBox does have some rudimentary paragraph detection; I don't know if it works. Try the PDFText2HTML tool in the source download.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
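The "parse some text" step mentioned in the thread could look roughly like the sketch below. It is only an illustration under several assumptions: the text has already been extracted from the PDF with leading spaces preserved (the thread doubts that the stripper keeps indents), each item starts with "-", and the indentation in the sample is made up, because the original indentation was lost in the mail. The class name BulletsToTags and its methods are hypothetical and not part of PDFBox. Note also that names starting with a digit, such as 123, are not valid XML element names; the sketch emits them unchanged, as in the original message.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of PDFBox: turns indented "- item" lines into nested tags.
class BulletsToTags {

    // Indentation level is assumed to be the number of leading spaces.
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    static String toTags(List<String> lines) {
        StringBuilder out = new StringBuilder();
        Deque<Integer> indents = new ArrayDeque<>(); // indentation of each open tag
        Deque<String> names = new ArrayDeque<>();    // name of each open tag
        for (String line : lines) {
            String name = line.trim();
            if (name.isEmpty()) {
                continue; // skip blank lines
            }
            if (name.startsWith("-")) {
                name = name.substring(1).trim(); // drop the bullet marker
            }
            int indent = indentOf(line);
            // Close every open tag that is at the same depth or deeper than this item.
            while (!indents.isEmpty() && indents.peek() >= indent) {
                indents.pop();
                out.append("</").append(names.pop()).append('>');
            }
            out.append('<').append(name).append('>');
            indents.push(indent);
            names.push(name);
        }
        // Close whatever is still open at the end of the input.
        while (!names.isEmpty()) {
            out.append("</").append(names.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample indentation is an assumption; the thread lost the real one.
        List<String> sample = Arrays.asList(
                "- abc",
                "  - def",
                "    - xyz",
                "  - ghi",
                "- 123",
                "  - 456");
        System.out.println(toTags(sample));
    }
}
```

With the assumed sample above, main prints <abc><def><xyz></xyz></def><ghi></ghi></abc><123><456></456></123>. A stack of open tags is used so that an item at a shallower depth closes all deeper items before it opens.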

