Hi Andreas,

Isn't there any kind of hack that can be done to get this working?

Regards,
Franklin

On Sat, Jul 23, 2011 at 7:48 PM, Andreas Lehmkuehler <[email protected]> wrote:

> Hi,
>
> I'm sorry for the late answer ...
>
> On 13.07.2011 18:37, Michael Jeier wrote:
>
>> Hi,
>>
>> I looked at the fonts in Adobe Reader:
>>
>> IDRGagrotesc
>> Type: Type 1
>> Encoding: Ansi
>> Actual Font: Adobe Sans MM
>> Actual Font Type: Type 1
>>
>> IDRGagrotesc
>> Type: Type 1
>> Encoding: Roman
>> Actual Font: Adobe Sans MM
>> Actual Font Type: Type 1
>>
>> TimesAcapitals (Embedded Subset)
>> Type: Type 1
>> Encoding: Custom
>>
>> TimesAcursivNormal (Embedded Subset)
>> Type: Type 1
>> Encoding: Custom
>>
>> TimesAfoneticaNormal (Embedded Subset)
>> Type: Type 1
>> Encoding: Custom
>>
>> TimesAgrass (Embedded Subset)
>> Type: Type 1
>> Encoding: Custom
>>
>> TimesAngrec (Embedded Subset)
>> Type: Type 1
>> Encoding: Custom
>>
>> TimesAstabil (Embedded Subset)
>> Type: Type 1
>> Encoding: Custom
>>
>> So, I guess, custom encoding means I am screwed? :(
>>
> I'm sorry, but yes.
>
>> But how can the Adobe Reader display the characters correctly? Shouldn't
>> that be reflected somehow in the PDFBox API??
>>
> The characters are stored as glyphs (small pieces of graphics). In many
> cases readable mappings are used to address those glyphs, so that the
> character code can be used to extract the text. But in some cases a PDF
> uses a custom mapping which isn't readable.
>
>> Where in the code is the encoding handled? If someone could point me in
>> that direction I can maybe just add a workaround there. Feeling a bit
>> lost here... :/
>>
> I guess there is no workaround. Just do the ultimate test: open the PDF in
> question using the Acrobat Reader, select the text, then copy and paste it
> into an editor. If the text is readable, PDFBox should be able to extract
> it too. But if it is unreadable, you won't find any way to extract the
> text directly.
>
>> Thanks for helping!
>>
>> Regards, Robin
>> SNIP
>>
>
> BR
> Andreas Lehmkühler
>
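
The copy-and-paste test described above can be partly automated once some text has been extracted. What follows is only a minimal sketch in plain Java and is not part of PDFBox: the class name `ExtractionSanityCheck`, the method `suspiciousRatio`, and the idea of counting control, private-use, unassigned and U+FFFD characters are all assumptions made for illustration. It cannot detect a custom encoding that maps glyphs to wrong but otherwise ordinary letters, so a look at the output by a human is still needed.

```java
public class ExtractionSanityCheck {

    /**
     * Returns the fraction of code points in the text that are unlikely to
     * appear in readable prose: control characters (other than whitespace),
     * private-use or unassigned code points, lone surrogates, and the
     * Unicode replacement character U+FFFD. Empty or null input yields 0.0.
     */
    static double suspiciousRatio(String text) {
        if (text == null || text.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            total++;
            if (Character.isWhitespace(cp)) {
                continue; // tabs and newlines are control characters, but harmless
            }
            int type = Character.getType(cp);
            if (cp == 0xFFFD
                    || type == Character.CONTROL
                    || type == Character.PRIVATE_USE
                    || type == Character.UNASSIGNED
                    || type == Character.SURROGATE) {
                suspicious++;
            }
        }
        return (double) suspicious / total;
    }

    public static void main(String[] args) {
        // Hypothetical samples standing in for text copied out of a PDF.
        String readable = "The quick brown fox.\n";
        String garbled = "\u0001\u0002\uFFFD\uE000";
        System.out.println("readable sample: " + suspiciousRatio(readable));
        System.out.println("garbled sample:  " + suspiciousRatio(garbled));
    }
}
```

A high ratio suggests the font uses a mapping that cannot be turned back into text. A low ratio only means the output contains ordinary characters, not that they are the right ones.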

