Hi Andreas,
  Isn't there any kind of hack that could be used to get this working?

Regards,
Franklin

On Sat, Jul 23, 2011 at 7:48 PM, Andreas Lehmkuehler <[email protected]> wrote:

> Hi,
>
> I'm sorry for the late answer ...
>
> On 13.07.2011 18:37, Michael Jeier wrote:
>
>> Hi,
>>
>> I looked at the fonts in Adobe Reader:
>>
>> IDRGagrotesc
>>     Type: Type 1
>>     Encoding: Ansi
>>     Actual Font: Adobe Sans MM
>>     Actual Font Type: Type 1
>>
>> IDRGagrotesc
>>     Type: Type 1
>>     Encoding: Roman
>>     Actual Font: Adobe Sans MM
>>     Actual Font Type: Type 1
>>
>> TimesAcapitals (Embedded Subset)
>>     Type: Type 1
>>     Encoding: Custom
>>
>> TimesAcursivNormal (Embedded Subset)
>>     Type: Type 1
>>     Encoding: Custom
>>
>> TimesAfoneticaNormal (Embedded Subset)
>>     Type: Type 1
>>     Encoding: Custom
>>
>> TimesAgrass (Embedded Subset)
>>     Type: Type 1
>>     Encoding: Custom
>>
>> TimesAngrec (Embedded Subset)
>>     Type: Type 1
>>     Encoding: Custom
>>
>> TimesAstabil (Embedded Subset)
>>     Type: Type 1
>>     Encoding: Custom
>>
>> So, I guess, custom encoding means I am screwed? :(
>>
> I'm sorry but yes.
>
>> But how can the Adobe Reader display the characters correctly? Shouldn't
>> that be reflected somehow in the PDFBox API??
>>
> The characters are stored as glyphs (small pieces of graphics). In many
> cases readable mappings are used to address those glyphs, so that the
> character code can be used to extract the text. But in some cases the PDF
> uses a custom mapping which isn't readable.
>
>> Where in the code is the encoding handled? If someone could point me in
>> that direction I can maybe just add a workaround there. Feeling a bit
>> lost here... :/
>>
> I guess there is no workaround. Just do the ultimate test: open the PDF in
> question in Adobe Reader, select the text, and copy and paste it into an
> editor. If the text is readable, PDFBox should be able to extract it too.
> But if it is unreadable, you won't find any way to extract the text
> directly.
>
>> Thanks for helping!
>>
>> Regards, Robin
>> SNIP
>>
>
> BR
> Andreas Lehmkühler
>
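
To illustrate Andreas's point about readable versus custom mappings, here is a toy model in plain Java. It is only a sketch and is not PDFBox code: the class name, the three-entry glyph table, and the opaque glyph names "G01" to "G03" are all made up for illustration. It assumes the common arrangement where a font's encoding maps character codes to glyph names, and text extraction then needs a known glyph name to find a Unicode value. A viewer can still draw the page because it only needs the glyph outline, not its meaning.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not PDFBox internals) of code -> glyph name -> Unicode.
final class GlyphMappingSketch {

    // Tiny stand-in for a table of well-known glyph names and their Unicode text.
    static final Map<String, String> GLYPH_NAME_TO_UNICODE = new HashMap<>();
    static {
        GLYPH_NAME_TO_UNICODE.put("H", "H");
        GLYPH_NAME_TO_UNICODE.put("i", "i");
        GLYPH_NAME_TO_UNICODE.put("exclam", "!");
    }

    // An encoding that uses recognizable glyph names.
    public static Map<Integer, String> readableEncoding() {
        Map<Integer, String> enc = new HashMap<>();
        enc.put(72, "H");
        enc.put(105, "i");
        enc.put(33, "exclam");
        return enc;
    }

    // A custom encoding with opaque, made-up glyph names: the font can still
    // find the outlines to draw, but the names say nothing about the text.
    public static Map<Integer, String> customEncoding() {
        Map<Integer, String> enc = new HashMap<>();
        enc.put(1, "G01");
        enc.put(2, "G02");
        enc.put(3, "G03");
        return enc;
    }

    // Resolve each character code to text via its glyph name. Codes whose
    // glyph name is missing or unknown become U+FFFD (replacement character).
    public static String extract(int[] codes, Map<Integer, String> encoding) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            String glyphName = encoding.get(code);
            String unicode = glyphName == null ? null : GLYPH_NAME_TO_UNICODE.get(glyphName);
            sb.append(unicode != null ? unicode : "\uFFFD");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Readable mapping: prints "Hi!"
        System.out.println(extract(new int[] {72, 105, 33}, readableEncoding()));
        // Custom mapping: prints three U+FFFD replacement characters
        System.out.println(extract(new int[] {1, 2, 3}, customEncoding()));
    }
}
```

In this model the second call has nothing to work with, which matches the copy-and-paste test described above: if the names (or any other code-to-Unicode information) are not meaningful, neither a viewer's clipboard nor an extraction library can recover the text directly.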
