Hi Ernesto,

> Hi
>
> I'm getting unexpected behavior when parsing a PDF file.
> I'm trying to get the clean body text of a file, and I get a lot of
> aXX strings, where each X is a number. It appears to be the char code of the
> real character, but I don't really know.
> ........
> The PDF file was generated by the pdflatex command on Ubuntu 9.
>
> The PDF properties are:
> producer: pdfTeX-1.40.3
> format: PDF-1.4
> security: NO
> optimized: NO
> paper: A4, vertical (210 x 297 mm)
>
> Does anybody know this problem?
> Any tips?

What version of PDFBox are you using? If you are using an older version such as 0.7.3, try the trunk version, or just wait a couple of days for the first Apache release of PDFBox (I still have to upload the download files and the web page first).

> I googled for three hours without luck. :(
>
> If somebody wants the PDF, I can send it.

If the problem still remains with the trunk version, please file an issue on JIRA [1] and attach your PDF if possible.

TIA
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX