> On 29 Dec 2015, at 00:34, Zheng Lin Edwin Yeo <[email protected]> wrote:
>
> Thanks for your reply Tilman.
>
> Would like to find out, is the content extraction issue of this caused by the
> Identity-H encoding?

Most likely. Identity-H is basically just "no encoding", so there needs to be a ToUnicode map in order to extract the text (which there isn't).

-- John

> Regards,
> Edwin
>
>
>> On 21 December 2015 at 16:12, Tilman Hausherr <[email protected]> wrote:
>>> On 21.12.2015 at 04:08, Zheng Lin Edwin Yeo wrote:
>>> Thanks for your reply.
>>>
>>> I tried on Adobe Acrobat Pro DC, it is able to open the file, but if open
>>> on Adobe Reader then it is not able to extract all the text properly.
>>>
>>> Is there any way we can check what type of encoding is used for the
>>> PDF files?
>>
>> Yes, in the font dictionaries, as you can see from this screenshot:
>>
>> However this won't get you the text, obviously.
>>
>> Tilman
>>
>>> Regards,
>>> Edwin
>>>
>>> On 19 December 2015 at 03:07, Tilman Hausherr <[email protected]> wrote:
>>>
>>>>> On 18.12.2015 at 18:57, Zheng Lin Edwin Yeo wrote:
>>>>>
>>>>> I've shared one of the files with the issue on dropbox, which you can
>>>>> access via the link here:
>>>>> https://www.dropbox.com/s/rufi9esmnsmzhmw/Desmophen%2B670%2BBAe.pdf?dl=0
>>>>>
>>>> Adobe Reader is also unable to extract text.
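
For anyone who wants to run the font-dictionary check discussed in this thread without a PDF viewer, here is a rough sketch in plain Python (standard library only, not the PDFBox API). It scans the raw bytes of a file for font objects and reports each one's /Encoding name and whether a /ToUnicode entry is present. The names `font_report` and `SAMPLE` are made up for illustration. The big assumption is that the font dictionaries are stored uncompressed; files that keep them in compressed object streams need a real parser such as PDFBox, and an /Encoding given as an indirect reference or dictionary is simply reported as None.

```python
import re

# One indirect object: "<num> <gen> obj ... endobj" (heuristic, not a real parser).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# "/Type /Font" but not "/Type /FontDescriptor".
FONT_RE = re.compile(rb"/Type\s*/Font\b")
# An encoding given directly as a name, e.g. "/Encoding /Identity-H".
ENCODING_RE = re.compile(rb"/Encoding\s*/([A-Za-z0-9_.+\-]+)")
TOUNICODE_RE = re.compile(rb"/ToUnicode\b")


def font_report(pdf_bytes):
    """Return a list of (object number, encoding name or None, has ToUnicode)."""
    report = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        if not FONT_RE.search(body):
            continue  # not a font dictionary
        enc = ENCODING_RE.search(body)
        name = enc.group(1).decode("ascii", "replace") if enc else None
        has_to_unicode = TOUNICODE_RE.search(body) is not None
        report.append((int(match.group(1)), name, has_to_unicode))
    return report


# Hypothetical, hand-written fragment: two Type0 fonts using Identity-H,
# only the second one carries a ToUnicode map.
SAMPLE = (
    b"1 0 obj\n<< /Type /Font /Subtype /Type0 /BaseFont /ABCDEF+Arial"
    b" /Encoding /Identity-H /DescendantFonts [2 0 R] >>\nendobj\n"
    b"3 0 obj\n<< /Type /Font /Subtype /Type0 /BaseFont /GHIJKL+Times"
    b" /Encoding /Identity-H /DescendantFonts [4 0 R] /ToUnicode 5 0 R >>\nendobj\n"
)

if __name__ == "__main__":
    for number, encoding, has_to_unicode in font_report(SAMPLE):
        note = "" if has_to_unicode else "  <- text extraction likely to fail"
        print(f"object {number}: encoding={encoding} ToUnicode={has_to_unicode}{note}")
```

To try it on a real file, pass `open(path, "rb").read()` to `font_report`. A font that shows Identity-H with no ToUnicode entry is the case described above: the character codes are just glyph IDs, so there is nothing to map them back to text.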

