Please don't cross-post to pdfbox-dev. All devs are expected to also be on the user list. Thanks.
If your PDF actually contained PDF forms (which it doesn't), you could use the ExtractFDF or ExtractXFDF tool to extract the form data. But your Tax.pdf mixes the form and the form data together as normal text, and there are no structure tags that identify particular values. The only thing you can do is use the ExtractText tool, as suggested earlier, and try to construct rules that find the values you're looking for in the extracted text. I don't expect that to work reliably, though.

So either get your PDF producer to generate PDF forms, or to add structure tags to the content. The latter is probably more difficult, and I don't know whether PDFBox would help in extracting the values from tagged content. PDF forms are most probably the way to go.

On 13.11.2008 13:28:58 Duseja, Sushil wrote:
> Thank you very much for the response.
>
> I have gone through the links mentioned below; however that didn't help
> me.
>
> The pdf I want to extract the text from, contains multiple forms. I have
> attached a sample pdf for your kind reference.
>
> Please advise as to how I can fetch a particular value (ex. Account
> Number).
>
> Thanks again.
>
>
> -----Original Message-----
> From: Jeremias Maerki [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 13, 2008 5:36 PM
> To: [email protected]
> Subject: Re: Text Extraction
>
> Have you looked at the documentation already?
>
> 0.7.3 release:
> http://pdfbox.org/userguide/text_extraction.html
>
> Development code:
> http://incubator.apache.org/pdfbox/userguide/text_extraction.html
>
> You can also look at the "ExtractText" tool's source code for another
> working example to extract text from a PDF.
>
> On 13.11.2008 11:27:04 Duseja, Sushil wrote:
> > Can anyone kindly respond to my question below?
> >
> > Thanks!
> >
> > -----Original Message-----
> > From: Duseja, Sushil
> > Sent: Monday, November 10, 2008 8:09 PM
> > To: [email protected]
> > Subject: Text Extraction
> >
> > Hello,
> >
> > Can anyone please let me know as to how can I extract text from a pdf
> > file (with multiple forms) using PDFBox? Is creating and accessing
> > bookmarks the way to go? If possible, please point me to some working
> > examples.
> >
> > Thanks.
>
> Jeremias Maerki

Jeremias Maerki
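To illustrate the "use ExtractText and construct rules" suggestion in the reply, here is a minimal sketch of one such rule in plain Java. It assumes the PDF has already been turned into a String of plain text (the extraction step itself is left out so the sketch stays independent of any particular PDFBox version), and that a label such as "Account Number" and its value end up on the same line of that text. The class `LabelRule`, the method `findValue` and the sample text are hypothetical, made up for this example. As said in the reply, a rule like this breaks as soon as the extracted layout changes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a text rule: find the value that follows a label
 * on the same line of already-extracted plain text. Not part of PDFBox.
 */
public class LabelRule {

    /**
     * Returns the rest of the line after the first occurrence of label
     * (an optional colon and spaces/tabs are skipped), or empty if the
     * label is missing or nothing follows it on that line.
     * Note: "Account Numbers" would also match the label "Account Number";
     * rules like this are inherently fragile.
     */
    public static Optional<String> findValue(String extractedText, String label) {
        // Pattern.quote makes characters like '.' or '(' in the label literal.
        // (?m) lets $ match at each line end; '.' never crosses a newline,
        // so the value is confined to the label's line.
        Pattern rule = Pattern.compile(
                "(?m)" + Pattern.quote(label) + "[ \\t]*:?[ \\t]*([^\\s:].*?)[ \\t]*$");
        Matcher m = rule.matcher(extractedText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical extracted text; real ExtractText output may order and
        // space the strings quite differently.
        String text = "Tax Statement\nAccount Number: 1234-5678\nName: J. Doe\n";
        // prints: 1234-5678
        System.out.println(findValue(text, "Account Number").orElse("<not found>"));
    }
}
```

Every value you need would require its own rule of this kind, which is why generating real PDF forms on the producer side is the more robust option.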
