Hi guys,

I am trying to use ExtractText.java to extract the text from a PDF file, but it only gives me an empty .txt file. I traced through the source code and got confused by PDFTextStripper.

As far as I can tell, ExtractText.java calls stripper.writerPage, which in turn calls the processPage method. processPage works with the charactersByArticle field, and my understanding is that it is meant to store the article information in that field. However, the only value it ever assigns is an empty ArrayList ("charactersByArticle.set( i, new ArrayList() );"), and that line seems to be the only place where charactersByArticle is modified. As a result, charactersByArticle ends up as nothing but a vector of empty ArrayLists.

Then, when the writePage method is called, it iterates through charactersByArticle and finds no text. That is my understanding of why the ExtractText example doesn't work for me. Please let me know if I got something wrong, or if you have any suggestions.
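To make the question concrete, here is a simplified, hypothetical model of the pattern described above. It is not PDFBox's source; `ArticleBuckets`, `startPage` and `showCharacter` are illustrative names only. It shows one point worth checking: `set( i, new ArrayList() )` only installs an empty list in a slot, and other code can still fill that same list later through `get( i ).add(...)` without ever calling `set` again. So whether charactersByArticle is really empty by the time writePage runs may depend on what happens while the page content is processed, not only on where `set` is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical, simplified model of the flow described above; NOT PDFBox's actual code.
public class ArticleBuckets {
    // One bucket of characters per article section (mirrors the idea of charactersByArticle).
    private final Vector<List<String>> charactersByArticle = new Vector<>();

    // Like the processPage step described above: reset every slot to a fresh, empty list.
    void startPage(int numberOfArticleSections) {
        charactersByArticle.setSize(numberOfArticleSections);
        for (int i = 0; i < numberOfArticleSections; i++) {
            charactersByArticle.set(i, new ArrayList<>());
        }
    }

    // Hypothetical per-character callback used while parsing the page content.
    // It never calls set(); it mutates the list that is already stored in the slot.
    void showCharacter(int article, String ch) {
        charactersByArticle.get(article).add(ch);
    }

    // Like writePage: iterate over the buckets and emit whatever they hold.
    String writePage() {
        StringBuilder out = new StringBuilder();
        for (List<String> article : charactersByArticle) {
            for (String ch : article) {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ArticleBuckets stripper = new ArticleBuckets();
        stripper.startPage(1);
        for (char c : "Hello".toCharArray()) {
            stripper.showCharacter(0, String.valueOf(c));
        }
        System.out.println(stripper.writePage()); // prints "Hello"
    }
}
```

In this model, if showCharacter is never reached (for example, because no characters are found on the page), writePage returns an empty string, which would match the empty .txt output.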

Thanks!

Felix