OK, no worries.

Thanks,
Franklin

On Sun, Jul 24, 2011 at 4:52 PM, Andreas Lehmkuehler <[email protected]> wrote:

> Hi,
>
> On 23.07.2011 19:36, Franklin Antony wrote:
>
>> Hi Andreas,
>> Isn't there even any type of hack that can be done to get this working?
>
> If I knew such a hack I would have already shared it with the project.
>
> BR
> Andreas Lehmkühler
>
>> Regards,
>> Franklin
>>
>> On Sat, Jul 23, 2011 at 7:48 PM, Andreas Lehmkuehler <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm sorry for the late answer ...
>>>
>>> On 13.07.2011 18:37, Michael Jeier wrote:
>>>
>>>> Hi,
>>>>
>>>> I looked at the fonts in Adobe Reader:
>>>>
>>>> IDRGagrotesc
>>>>   Type: Type 1
>>>>   Encoding: Ansi
>>>>   Actual Font: Adobe Sans MM
>>>>   Actual Font Type: Type 1
>>>>
>>>> IDRGagrotesc
>>>>   Type: Type 1
>>>>   Encoding: Roman
>>>>   Actual Font: Adobe Sans MM
>>>>   Actual Font Type: Type 1
>>>>
>>>> TimesAcapitals (Embedded Subset)
>>>>   Type: Type 1
>>>>   Encoding: Custom
>>>>
>>>> TimesAcursivNormal (Embedded Subset)
>>>>   Type: Type 1
>>>>   Encoding: Custom
>>>>
>>>> TimesAfoneticaNormal (Embedded Subset)
>>>>   Type: Type 1
>>>>   Encoding: Custom
>>>>
>>>> TimesAgrass (Embedded Subset)
>>>>   Type: Type 1
>>>>   Encoding: Custom
>>>>
>>>> TimesAngrec (Embedded Subset)
>>>>   Type: Type 1
>>>>   Encoding: Custom
>>>>
>>>> TimesAstabil (Embedded Subset)
>>>>   Type: Type 1
>>>>   Encoding: Custom
>>>>
>>>> So, I guess, custom encoding means I am screwed? :(
>>>
>>> I'm sorry but yes.
>>>
>>>> But how can the Adobe Reader display the characters correctly?
>>>> Shouldn't that be reflected somehow in the PDFBox API??
>>>
>>> The characters are stored as glyphs (small pieces of graphics). In many
>>> cases readable mappings are used to address those glyphs, so that the
>>> character code can be used to extract the text. But in some cases the
>>> PDF uses a custom mapping which isn't readable.
>>>
>>>> Where in the code is the encoding handled? If someone could point me
>>>> in that direction I can maybe just add a workaround there. Feeling a
>>>> bit lost here... :/
>>>
>>> I guess there is no workaround. Just do the ultimate test: open the PDF
>>> in question using the Acrobat Reader, select the text, then copy and
>>> paste it into an editor. If the text is readable, PDFBox should be able
>>> to extract it too. But if it is unreadable, you won't find any way to
>>> extract the text directly.
>>>
>>>> Thanks for helping!
>>>>
>>>> Regards, Robin
>>>> SNIP
>>>
>>> BR
>>> Andreas Lehmkühler
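
For anyone who wants to run the copy-and-paste test described above on many files, here is a minimal sketch of a readability heuristic in plain Java. It is not part of PDFBox and nobody in this thread proposed it: the class name ReadabilityCheck, its methods, and the 0.9 threshold are all made up for illustration. It assumes you already have the extracted text as a String (for example from PDFBox's PDFTextStripper) and only counts how many non-whitespace characters are obviously unusable.

```java
// Hypothetical helper, not a PDFBox class: a rough automated stand-in for
// "paste the text into an editor and see whether it is readable".
final class ReadabilityCheck {

    // Fraction of non-whitespace code points that are NOT obviously broken
    // (replacement character, private use, control, unassigned, lone surrogate).
    static double readableRatio(String text) {
        int total = 0;
        int readable = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue; // whitespace says nothing about the encoding
            }
            total++;
            int type = Character.getType(cp);
            boolean suspicious = cp == 0xFFFD
                    || type == Character.PRIVATE_USE
                    || type == Character.CONTROL
                    || type == Character.UNASSIGNED
                    || type == Character.SURROGATE;
            if (!suspicious) {
                readable++;
            }
        }
        // Empty or whitespace-only text counts as not readable.
        return total == 0 ? 0.0 : (double) readable / total;
    }

    // The 0.9 threshold is an arbitrary assumption; tune it for your documents.
    static boolean looksReadable(String text) {
        return readableRatio(text) >= 0.9;
    }

    public static void main(String[] args) {
        System.out.println(looksReadable("Plain extracted text"));
        System.out.println(looksReadable("\uE001\uE002\uE003\uFFFD"));
    }
}
```

Note the limits of this sketch: a font with a custom encoding can just as well map to ordinary but wrong letters, and this check cannot detect that. It only flags output that is clearly garbage, so a manual look at the pasted text remains the real test.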

