Re: Issues with extraction content of PDF files

2015-12-31 Thread John Hewson
> On 29 Dec 2015, at 00:34, Zheng Lin Edwin Yeo wrote:
>
> Thanks for your reply, Tilman.
>
> I would like to find out: is this content extraction issue caused by the
> Identity-H encoding?
Most likely. Identity-H is basically just "no encoding", so there needs to
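To make "no encoding" concrete: with the predefined Identity-H CMap, every 2-byte character code in a shown string maps to the CID with the same value, so the string bytes are just big-endian glyph identifiers. Nothing in that step says which Unicode character a glyph represents. A minimal Python sketch, assuming the string bytes have already been pulled out of a page's content stream; `identity_h_cids` is a hypothetical helper, not part of any PDF library:

```python
def identity_h_cids(data: bytes) -> list[int]:
    """Split a string shown with an Identity-H font into CIDs.

    Identity-H maps every 2-byte code 0x0000-0xFFFF to the CID with the
    same value, so decoding is just reading big-endian 16-bit integers.
    """
    if len(data) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    # Hex string as it might appear in a content stream, e.g. <00470048> Tj
    print(identity_h_cids(bytes.fromhex("00470048")))  # prints [71, 72]
```

The resulting numbers only identify glyphs inside the embedded font; turning them into text needs a separate mapping to Unicode (typically a ToUnicode CMap), which is what a text extractor looks for.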

Re: Issues with extraction content of PDF files

2015-12-29 Thread Zheng Lin Edwin Yeo
Thanks for your reply, Tilman.
I would like to find out: is this content extraction issue caused by the Identity-H encoding?
Regards,
Edwin
On 21 December 2015 at 16:12, Tilman Hausherr wrote:
> On 21.12.2015 at 04:08, Zheng Lin Edwin Yeo wrote:
> > Thanks for

Re: Issues with extraction content of PDF files

2015-12-29 Thread Tilman Hausherr
I don't know enough about that part myself; the best would be to read about it here:
https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
Tilman
On 29.12.2015 at 09:34, Zheng Lin Edwin Yeo wrote:
Thanks for your reply, Tilman. I would like to find out: is the content

Re: Issues with extraction content of PDF files

2015-12-21 Thread Tilman Hausherr
On 21.12.2015 at 04:08, Zheng Lin Edwin Yeo wrote:
Thanks for your reply. I tried Adobe Acrobat Pro DC and it is able to open the file, but when the file is opened in Adobe Reader, it is not able to extract all the text properly. Is there any way we can check what type of encoding is used for the

Re: Issues with extraction content of PDF files

2015-12-20 Thread Zheng Lin Edwin Yeo
Thanks for your reply. I tried Adobe Acrobat Pro DC and it is able to open the file, but when the file is opened in Adobe Reader, it is not able to extract all the text properly. Is there any way we can check what type of encoding is used for the PDF files?
Regards,
Edwin
On 19 December 2015 at
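A rough answer to the encoding question is possible without a full PDF library. Below is a best-effort Python sketch, assuming the font dictionaries are stored either uncompressed or inside Flate-compressed streams (it inflates every stream it can and searches the result). `scan_pdf_fonts` is a hypothetical helper; it reports which `/BaseFont` and `/Encoding` names appear and whether any `/ToUnicode` key exists anywhere in the file, not per font. A proper per-font report needs a real parser, for example Poppler's `pdffonts` tool, whose `uni` column shows whether each font has a ToUnicode map.

```python
import re
import sys
import zlib

# "stream" not preceded by "end", so the "endstream" keyword never opens a match.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
ENCODING_RE = re.compile(rb"/Encoding\s*/([A-Za-z0-9_.+-]+)")
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def scan_pdf_fonts(pdf: bytes) -> dict:
    """Best-effort scan of raw PDF bytes for font names, encodings and ToUnicode."""
    chunks = [pdf]
    for body in STREAM_RE.findall(pdf):
        try:
            # decompressobj tolerates trailing bytes (e.g. the EOL before endstream).
            chunks.append(zlib.decompressobj().decompress(body))
        except zlib.error:
            pass  # not Flate-compressed (or damaged): skip this stream
    text = b"\n".join(chunks)
    return {
        "base_fonts": sorted({m.decode("latin-1") for m in BASEFONT_RE.findall(text)}),
        # Indirect encodings such as "/Encoding 12 0 R" are not captured.
        "encodings": sorted({m.decode("latin-1") for m in ENCODING_RE.findall(text)}),
        "has_tounicode": b"/ToUnicode" in text,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(scan_pdf_fonts(fh.read()))
```

Seeing `Identity-H` together with `has_tounicode: False` is a strong hint that text extraction will fail for that file.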

Re: Issues with extraction content of PDF files

2015-12-18 Thread Tilman Hausherr
On 18.12.2015 at 18:57, Zheng Lin Edwin Yeo wrote:
I've shared one of the files with the issue on Dropbox, which you can access via this link: https://www.dropbox.com/s/rufi9esmnsmzhmw/Desmophen%2B670%2BBAe.pdf?dl=0
Adobe Reader is also unable to extract text.

RE: Issues with extraction content of PDF files

2015-12-18 Thread Allison, Timothy B.
Colleagues,
So that you at least don't have to do the initial diagnosis, from [0]:
>> That said, PDFBox 2.0-RC2 extracts no text and warns: WARNING: No Unicode
>> mapping for CID+71 (71) in font 505Eddc6Arial
>> So, if the file has no Unicode mapping for the font, I doubt they'll be able
>> to
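The PDFBox warning in the message above is the symptom of a missing lookup: the extractor has a code (71), but the font's ToUnicode CMap, if there is one, has no entry for it. A minimal Python sketch of that lookup, assuming the CMap has already been decompressed to text and that codes equal CIDs (true for Identity-H); `parse_tounicode` and `extract_text` are hypothetical helpers, not PDFBox API, and only the `bfchar` form and the `<lo> <hi> <dst>` form of `bfrange` are handled:

```python
import re
import warnings

BFCHAR_RE = re.compile(r"beginbfchar(.*?)endbfchar", re.DOTALL)
BFRANGE_RE = re.compile(r"beginbfrange(.*?)endbfrange", re.DOTALL)
HEX_RE = re.compile(r"<([0-9A-Fa-f]+)>")


def parse_tounicode(cmap: str) -> dict[int, str]:
    """Build a code -> Unicode map from bfchar and simple bfrange entries."""
    mapping: dict[int, str] = {}
    for block in BFCHAR_RE.findall(cmap):
        codes = HEX_RE.findall(block)
        # bfchar entries are <src> <dst> pairs; dst is UTF-16BE text.
        for src, dst in zip(codes[0::2], codes[1::2]):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    for block in BFRANGE_RE.findall(cmap):
        for line in block.splitlines():
            if "[" in line:
                continue  # array form <lo> <hi> [<d1> ...] is skipped in this sketch
            codes = HEX_RE.findall(line)
            if len(codes) != 3:
                continue  # assumes one <lo> <hi> <dst> entry per line
            lo, hi, dst = codes
            start = int(dst, 16)  # assumes dst is a single BMP code point
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(start + offset)
    return mapping


def extract_text(cids: list[int], to_unicode: dict[int, str]) -> str:
    """Map codes to text, warning (like PDFBox does) when a code has no entry."""
    out = []
    for cid in cids:
        if cid in to_unicode:
            out.append(to_unicode[cid])
        else:
            warnings.warn(f"No Unicode mapping for code {cid}")
    return "".join(out)


if __name__ == "__main__":
    cmap = """
1 beginbfchar
<0048> <0065>
endbfchar
1 beginbfrange
<0024> <0026> <0041>
endbfrange
"""
    mapping = parse_tounicode(cmap)
    print(extract_text([0x24, 0x25, 0x48], mapping))  # prints ABe
    print(extract_text([71], mapping))  # warns; no text is produced
```

If the font has no ToUnicode map at all, every lookup fails the same way, which matches "extracts no text" in the report; recovering text then requires other means such as OCR.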