Re: ExtractText return aXX codes

Andreas Lehmkühler Mon, 21 Sep 2009 01:07:59 -0700

Hi Ernesto,

> Hi
> 
> I'm getting unexpected behavior when parsing a PDF file.
> I'm trying to get the clean body text of a file, and I get a lot of 
> aXX strings, where each X is a digit. It appears to be the char code of the 
> real character, but I don't really know.
> ........
> The PDF file was generated by the pdflatex command on Ubuntu 9.
> 
> The pdf properties are:
> producer: pdfTeX-1.40.3
> format: PDF-1.4
> security: NO
> optimized: NO
> paper: A4, vertical (210 x 297 mm)
> 
> Does anybody know this problem?
> Any tips?
What version of PDFBox are you using? If you are using an older version such as 
0.7.3, try the trunk version, or wait a couple of days for the first Apache 
release of PDFBox (I still have to upload the download files and update the 
webpage first).


> I googled for three hours without luck. :(
> 
> If somebody wants the PDF, I can send it.
If the problem persists with the trunk version, please file an issue on 
JIRA [1] and attach your PDF if possible.
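
One way to check whether the problem persists after switching versions is to 
scan the extracted text for aXX-style tokens. The following is only a minimal 
sketch: it assumes the stray tokens are a literal "a" followed by two or three 
digits and stand alone as words (a guess based on the description above), the 
class and method names are hypothetical, and it uses only the Java standard 
library rather than any PDFBox call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class GlyphCodeCheck {
    // Assumption: a stray token is "a" plus 2-3 digits as a standalone word, e.g. "a12".
    private static final Pattern GLYPH_CODE = Pattern.compile("\\ba\\d{2,3}\\b");

    // Counts how many aXX-style tokens occur in the extracted text.
    static int countGlyphCodes(String text) {
        Matcher m = GLYPH_CODE.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Replace the sample with the text produced by ExtractText.
        String sample = "Intro a12 a34 some real words";
        System.out.println("aXX tokens found: " + countGlyphCodes(sample));
    }
}
```

A count close to zero after the upgrade would suggest the glyph names are now 
being mapped to real characters. A high count is a good reason to file the issue.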

TIA
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX
