Re: A question about PDFText2HTML

Shen Wang Mon, 26 Oct 2009 19:35:55 -0700

Hi Patric and Chiyean,

Thanks for your replies; they definitely help. I didn't get back to you earlier because I couldn't find an internet connection for the past few days.

Chiyean: I have checked the ExtractText.java example; in fact, I did that before I asked the question. It's just that the PDDocument parameter seems to be used only by the writeText method. The PDFTextStripper object may still have no idea which document it is processing when other methods are called.

Patric: Thanks for reminding me to trace back to the extended classes. But I still have some problems. For example, without the ExtractText.java example, I would never have figured out what the "encoding" parameter is or what its options are; the javadoc only mentions that it is a String. Another example is the processStream method: one of its parameters is a COSStream, but I have no idea what that is. It extends COSDictionary, which is a class that "represents a dictionary where name/value pairs reside". But the javadoc never mentions how a COSStream or a dictionary relates to a PDF file, and among all the methods of COSStream and COSDictionary I don't see any that lets these objects know which PDF file is being processed. My feeling is that I must be missing something, but I don't know what it is, and this leaves me confused about what is going on. How do you figure out how those things (like COSStream, COSDictionary, encoding, PDResources, keys...) correspond to the PDF file? Do I need to go through the PDF file documentation to make this clear to myself? Please help me out. Thanks.


Best,

Felix


Omar Chiyean wrote:

Hi Patric, have you seen the examples
in the distribution?

Check org.apache.pdfbox.ExtractText.java
It shows how to use this class.

What I can say is that you need a PDDocument handle.
Check the example; it should be very helpful.

Cheers...
