Re: [iText-questions] Reading and Extracting Text from PDF

2006-02-16 Thread bruno
Richard Braman wrote: I am a little confused by this snippet. Where are you getting the stream from? I know it's a PRStream, but which constructor do you use to create the PRStream? There must be some code before this. byte[] streamBytes = reader.getPageContent(pagenumber); br, Bruno --
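
For readers following along: once the page content bytes are available (for example from the getPageContent call quoted above), the text-showing operators can be scanned by hand. What follows is only a rough sketch in plain Java, not iText code and not Bruno's method. The class name TjScanner and the method extractTjStrings are hypothetical, and it assumes the content stream is already decoded and uses only literal strings followed by the Tj operator. It ignores TJ arrays, hex strings, and font encodings, so it will miss or mangle text in many real files.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of iText.
class TjScanner {

    // Returns the literal strings that are immediately followed by a Tj operator.
    static List<String> extractTjStrings(byte[] content) {
        // ISO-8859-1 maps each byte to one char, so no bytes are lost.
        String s = new String(content, StandardCharsets.ISO_8859_1);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) != '(') {
                i++;
                continue;
            }
            StringBuilder sb = new StringBuilder();
            int depth = 1; // PDF literal strings may contain balanced parentheses
            i++;
            while (i < s.length() && depth > 0) {
                char c = s.charAt(i);
                if (c == '\\' && i + 1 < s.length()) {
                    // Simplification: keep the escaped character as-is
                    // (octal and \n style escapes are not interpreted).
                    sb.append(s.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                    if (depth == 0) {
                        i++;
                        break;
                    }
                }
                sb.append(c);
                i++;
            }
            // Skip whitespace, then keep the string only if the next token is Tj.
            int j = i;
            while (j < s.length() && Character.isWhitespace(s.charAt(j))) {
                j++;
            }
            boolean isTj = s.startsWith("Tj", j)
                    && (j + 2 == s.length() || Character.isWhitespace(s.charAt(j + 2)));
            if (isTj) {
                out.add(sb.toString());
                i = j + 2;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] sample = "BT /F1 12 Tf 72 700 Td (Hello \\(world\\))Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(extractTjStrings(sample)); // prints [Hello (world)]
    }
}
```

The scanner deliberately stays small to show the shape of the problem; a string inside a TJ array, for instance, is skipped entirely.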

RE: [iText-questions] Reading and Extracting Text from PDF

2006-02-15 Thread Richard Braman
ruary 15, 2006 2:46 AM To: [EMAIL PROTECTED] Cc: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Reading and Extracting Text from PDF Richard Braman wrote: >My guess is that >there is a way to iterate through the dictionary and get what I want >(like Bruno showe

RE: [iText-questions] Reading and Extracting Text from PDF

2006-02-15 Thread Mark Storer
also used extensively by PdfReader. --Mark Storer Senior Software Engineer Cardiff Software #include typedef std::Disclaimer DisCard; > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf > Of Richard > Braman > Sent: Tuesday, Febru

Re: [iText-questions] Reading and Extracting Text from PDF

2006-02-15 Thread Leonard Rosenthol
At 05:39 PM 2/14/2006, Richard Braman wrote: I have an open source project that is attempting to structure IRS-produced documents such as publications and instructions and parse out data that is critical to building tax software. Use either PDFBox or JPEDAL. Leonard --

Re: [iText-questions] Reading and Extracting Text from PDF

2006-02-15 Thread Bruno Lowagie
Mark Storer wrote: The problems with extracting text from some PDF found in the wild are as follows That was a very good summary of all the problems one encounters when trying to convert a PDF file to plain text! It explains why I don't promote the use of the solution I have sent earlier: br,

RE: [iText-questions] Reading and Extracting Text from PDF

2006-02-14 Thread Mark Storer
> Sent: Tuesday, February 14, 2006 2:39 PM > To: itext-questions@lists.sourceforge.net > Subject: [iText-questions] Reading and Extracting Text from PDF > > > I have an open source project that is attempting to structure IRS- > produced documents such as publications and instr

RE: [iText-questions] Reading and Extracting Text from PDF

2006-02-14 Thread Mark Storer
TECTED] > [mailto:[EMAIL PROTECTED] Behalf > Of Richard > Braman > Sent: Tuesday, February 14, 2006 10:35 AM > To: itext-questions@lists.sourceforge.net > Subject: [iText-questions] Reading and Extracting Text from PDF > > > I have an open source project that is attempting to structure

Re: [iText-questions] Reading and Extracting Text from PDF

2006-02-14 Thread Bruno Lowagie
Richard Braman wrote: My guess is that there is a way to iterate through the dictionary and get what I want (like Bruno showed me how to do with the AcroForm.Fields), any code to do that would be great. You need the class PRTokeniser to do this; I don't recommend it because it won't work for

[iText-questions] Reading and Extracting Text from PDF

2006-02-14 Thread Richard Braman
I have an open source project that is attempting to structure IRS-produced documents such as publications and instructions and parse out data that is critical to building tax software. An example of such a file is http://www.irs.gov/pub/irs-pdf/p1346.pdf. This file contains e-file record layouts, wh

RE: [iText-questions] Reading and Extracting Text from PDF

2006-02-14 Thread Richard Braman
)Tj ET -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Richard Braman Sent: Tuesday, February 14, 2006 1:35 PM To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Reading and Extracting Text from PDF I have an open source project that
