Michael,

Thanks for your time. I was about to open a JIRA issue and then realized that one related to CJK fonts (PDFBOX-5) is already open. In fact, I downloaded the chinese.pdf I am testing with from that same issue: https://issues.apache.org/jira/browse/PDFBOX-5 (please check PDFBOX5-CJK.zip\chinese.pdf at that URL).
Please let me know if there is any workaround for this issue.

Regards
Srinivaas

-----Original Message-----
From: Michael McCandless [mailto:[email protected]]
Sent: Friday, October 14, 2011 5:57 PM
To: [email protected]
Subject: Re: Issue while extracting chinese chars from pdf

Can you open a jira issue and attach your PDF there? Your attachment didn't come through. Thanks.

Unfortunately (from my limited understanding) PDFs can be tricky. EG, if they use an embedded font and that font doesn't include unicode mappings for its glyphs then PDFBox won't be able to extract the character data, I believe.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Oct 14, 2011 at 7:31 AM, Srinivaas_Venkatarayan <[email protected]> wrote:
> Hi,
>
> Can someone pls help me with this issue? From this url
> http://www.pinxue.net/java/PDFBox_String_Charset_analyze_en.html it looks
> like PDFBox can handle CJK fonts but I'm not sure what it is that I have to
> do to extract Chinese chars.
>
> Thanks
> Srinivaas
>
> From: Srinivaas_Venkatarayan
> Sent: Wednesday, October 12, 2011 5:12 PM
> To: '[email protected]'
> Subject: Issue while extracting chinese chars from pdf
>
> Hi,
>
> I'm trying to extract the text contents of a PDF file and store it in a txt
> file using PDFBox (ver 1.6.0). I have issues extracting the content of a PDF
> that has Chinese characters in it. Attached is the PDF and the java code. I'm
> not sure what encoding is being used in this PDF. Can you pls help?
>
> Thanks
> Srini
>
> ________________________________
> DISCLAIMER:
> This email (including any attachments) is intended for the sole use of the
> intended recipient/s and may contain material that is CONFIDENTIAL AND
> PRIVATE COMPANY INFORMATION. Any review or reliance by others or copying or
> distribution or forwarding of any or all of the contents in this message is
> STRICTLY PROHIBITED. If you are not the intended recipient, please contact
> the sender by email and delete all copies; your cooperation in this regard is
> appreciated.
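
For anyone following this thread, here is a minimal sketch (not from the original messages, and not part of PDFBox) of how one might tell the two failure modes apart once a text string has been extracted. It assumes Java 8 or later and uses only the standard library; the class name ExtractedTextCheck and its methods are hypothetical. If the extracted string contains Han characters but the .txt file does not, the problem is probably the output encoding, since writing with a platform default charset that cannot represent Chinese characters replaces them on write. If the string contains no Han characters at all, the cause is more likely the missing Unicode mappings Mike describes, and this sketch will not fix that.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, standard library only (no PDFBox calls).
class ExtractedTextCheck {

    // Writes the text as UTF-8 regardless of the platform default charset.
    static void writeUtf8(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Counts code points that belong to the Han script (Chinese characters).
    static long countHan(String text) {
        return text.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
    }

    // Counts U+FFFD replacement characters, a common sign of a decoding failure.
    static long countReplacement(String text) {
        return text.codePoints().filter(cp -> cp == 0xFFFD).count();
    }
}
```

One would pass the string returned by the extraction step to countHan and countReplacement before writing it out with writeUtf8; a zero Han count on a PDF that visibly contains Chinese text suggests the extraction itself is the problem, not the file output.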

