Hi
I'm getting unexpected behavior when parsing a PDF file.
I'm trying to get the clean body text of a file, and I get a lot of
aXX strings, where XX is a number. It appears to be the char code of the
real character, but I don't really know.
My code is very simple:
String[] args = {"/home/ernesto/tesis/documento/kvfs.pdf"};
ExtractText.main(args);
The output I get is:
a73a109a112a108a101a109a101a110a116a97a110a100a111 a97a99a99a101a115a111
a97 a115a105a115a116a101a109a97a115 a100a101
a97a114a99a104a105a118a111a115 a118a105a114a116a117a97a108a101a115
a112a97a114a97 a108a97 a104a101a114a114a97a109a105a101a110a116a97
a100a101 a98a250a115a113a117a101a100a97 a75a110a101a111a98a97a115a101
and more ......
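
If each number really is the decimal character code of the intended character, the output can be decoded after the fact. Below is a minimal sketch of that check. The names `GlyphCodeDecoder` and `decode` are hypothetical and not part of PDFBox, and this only works around the symptom; it does not fix the extraction itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: it assumes every "aNN" token
// stands for the character whose decimal code is NN.
class GlyphCodeDecoder {
    // One token: the letter 'a' followed by one or more decimal digits.
    private static final Pattern TOKEN = Pattern.compile("a(\\d+)");

    static String decode(String garbled) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(garbled);
        int last = 0;
        while (m.find()) {
            // Copy whatever sits between tokens (spaces, line breaks) unchanged.
            out.append(garbled, last, m.start());
            // Turn the decimal code into the character it names.
            out.appendCodePoint(Integer.parseInt(m.group(1)));
            last = m.end();
        }
        // Copy any trailing text after the last token.
        out.append(garbled.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // First line of the extracted output shown above.
        String line = "a73a109a112a108a101a109a101a110a116a97a110a100a111 "
                + "a97a99a99a101a115a111";
        System.out.println(decode(line));
    }
}
```

On the first line of the output above this prints `Implementando acceso`, which supports the guess that the numbers are character codes.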
The PDF file was generated with the pdflatex command on Ubuntu 9.
The pdf properties are:
producer: pdfTeX-1.40.3
format: PDF-1.4
security: NO
optimized: NO
paper: A4, vertical (210 x 297 mm)
Does anybody know this problem?
Any tips?
I googled for three hours without luck. :(
If somebody wants the PDF, I can send it.
Thanks,
Ernesto.