Hello, I am trying to use PDFBox to extract text. The input PDF contains English and Indic (Kannada) characters. I have two systems running Windows XP, one with MS Office and one with OpenOffice. On the system with OpenOffice, the extracted text is correct (both English and Indic characters show up). But when I run the same program on the system with MS Office, the Indic characters are not extracted. Please suggest what might be wrong. I have attached the input file. Let me know if you need any more information.
Thanks, Vishwa Bhat

