Hi guys,

While trying to extract text from this online PDF using the Tika CLI 0.9, I got a lot of warnings:
$ java -jar tika-app.jar -v --encoding=UTF8 "http://www.hsbc.com/1/PA_1_1_S5/content/assets/investor_relations/hbap2010arn_hk_cn.pdf"

Could someone please explain to me what's going on? Is it related to missing fonts?

N.B.: I was able to reproduce the same result on both OS X and Linux, using Apache Tika CLI 0.9 in each case.

Thanks in advance!

Regards,
Zabrane
