ExtractText.java doesn't work and a question about field charactersByArticle in PDFTextStripper

Shen Wang Mon, 02 Nov 2009 14:57:19 -0800

Hi guys,

I am trying to use ExtractText.java to extract the text from a PDF file. However, it only gives me an empty .txt file. I traced through the source code and got confused by PDFTextStripper.

So, ExtractText.java calls stripper.writerPage, which in turn calls the processPage method. In processPage, it works with the charactersByArticle field, and my understanding is that it wants to put the article information into charactersByArticle. However, when it sets charactersByArticle's values, it actually sets each one to an empty ArrayList ("charactersByArticle.set( i, new ArrayList() );"). This line seems to be the only place where the field charactersByArticle is ever modified. As a result, charactersByArticle is nothing but a vector of empty ArrayLists. Then, when the writePage method is called, it iterates through charactersByArticle and finds no text. This is my understanding of why the ExtractText example doesn't work for me. Please let me know if I have gotten something wrong or if you have any suggestions.
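
To make the flow I am describing concrete, here is a small self-contained sketch. It is not the PDFBox source: the class ArticleBufferSketch and the methods resetArticles and onText are hypothetical names I made up, and only the field name and the "set( i, new ArrayList() )" line come from what I saw. The onText method is an assumption about something I may have overlooked, namely that a slot can be filled through get(i).add(...) without the field itself ever being assigned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical stand-in for the flow described above; NOT the PDFBox code.
class ArticleBufferSketch {
    // Same shape as charactersByArticle: one list of text chunks per article.
    private final Vector<List<String>> charactersByArticle = new Vector<>();

    // As I read processPage: every slot is reset to a fresh, empty list.
    public void resetArticles(int articleCount) {
        charactersByArticle.setSize(articleCount);
        for (int i = 0; i < articleCount; i++) {
            charactersByArticle.set(i, new ArrayList<>());
        }
    }

    // Assumed callback (hypothetical): adds text to an existing slot.
    // Note that this never assigns to the field, so a search for
    // assignments to charactersByArticle would not find it.
    public void onText(int articleIndex, String text) {
        charactersByArticle.get(articleIndex).add(text);
    }

    // As I read writePage: walk every slot and output whatever is there.
    public String writePage() {
        StringBuilder out = new StringBuilder();
        for (List<String> article : charactersByArticle) {
            for (String chunk : article) {
                out.append(chunk);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ArticleBufferSketch sketch = new ArticleBufferSketch();
        sketch.resetArticles(2);
        // Nothing has been added yet, so the page text is empty.
        System.out.println("[" + sketch.writePage() + "]");
        sketch.onText(0, "Hello");
        // After a write through get(i).add(...), the text shows up.
        System.out.println("[" + sketch.writePage() + "]");
    }
}
```

If nothing like onText runs between the reset and writePage, the output stays empty, which matches what I am seeing. If the real class does have such a call, then the lists are being filled somewhere I did not look and the cause of my empty file is elsewhere.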


Thanks!

Felix
