Lot of WARNINGS when parsing PDF with Asian text!

Zabrane Mickael Thu, 24 Mar 2011 09:29:55 -0700

Hi guys,

While trying to extract text from this online PDF using Tika CLI 0.9, a lot of 
warnings were reported:


$ java -jar tika-app.jar -v --encoding=UTF8 \
    "http://www.hsbc.com/1/PA_1_1_S5/content/assets/investor_relations/hbap2010arn_hk_cn.pdf"

Could someone please explain what's going on?
Is it related to missing fonts?

N.B.: I was able to reproduce the same result on both OS X and Linux, using
Apache Tika CLI 0.9 in each case.

Thanks in advance!

Regards,
Zabrane
