ExtractText returns aXX codes

Ernesto De Santis Sun, 20 Sep 2009 13:42:34 -0700

Hi

I'm getting unexpected behavior when parsing a PDF file.

I'm trying to get the clean body text of a file, and I get a lot of aXX strings, where XX is a number. They appear to be the character codes of the real characters, but I don't really know.


My code is very simple:
          String[] args = {"/home/ernesto/tesis/documento/kvfs.pdf"};
          ExtractText.main(args);

The output I get is:

a73a109a112a108a101a109a101a110a116a97a110a100a111 a97a99a99a101a115a111a97 a115a105a115a116a101a109a97a115 a100a101a97a114a99a104a105a118a111a115 a118a105a114a116a117a97a108a101a115a112a97a114a97 a108a97 a104a101a114a114a97a109a105a101a110a116a97

a100a101 a98a250a115a113a117a101a100a97 a75a110a101a111a98a97a115a101
and more ......
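
A quick way to test the char-code guess is to decode the output by hand. The sketch below is not part of PDFBox: the class name AxxDecoder and its decode method are hypothetical, and it assumes that every token is the letter "a" followed by one decimal code point. Under that assumption the first token group, a73a109a112..., starts with the letters "Imp".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AxxDecoder {
    // Assumption: a token is 'a' followed by a decimal code point, e.g. a250.
    private static final Pattern TOKEN = Pattern.compile("a(\\d+)");

    // Replaces every aNN token with the character whose code point is NN.
    // Anything between tokens (such as spaces) is copied unchanged.
    static String decode(String line) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(line);
        int last = 0;
        while (m.find()) {
            out.append(line, last, m.start());
            out.appendCodePoint(Integer.parseInt(m.group(1)));
            last = m.end();
        }
        out.append(line.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // Second line of the output shown above.
        String sample = "a100a101 a98a250a115a113a117a101a100a97 "
                + "a75a110a101a111a98a97a115a101";
        System.out.println(decode(sample));
    }
}
```

If the decoded text reads as the words of the document, the guess holds and the real text is in the file, only mapped to these names instead of to characters. This only helps to diagnose the problem; it does not fix the extraction itself.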

The PDF file was generated by the pdflatex command on Ubuntu 9.

The pdf properties are:
producer: pdfTeX-1.40.3
format: PDF-1.4
security: NO
optimized: NO
paper: A4, vertical (210 x 297 mm)

Does anybody know this problem?
Any tips?
I googled for three hours without luck. :(

If somebody wants the PDF, I can send it.

Thanks,
Ernesto.
