Re: String replace failed

Maruan Sahyoun Wed, 12 Jun 2013 04:53:28 -0700

Hi Markus,

what I meant was using the ReplaceString example as a base and looking at 
ExtractText for how to put together the different bits which make up a visible 
string. As I already wrote, this is a lot of effort and there are several 
potential issues. One is related to the fact that only parts of a font might be 
embedded (font subsetting), so when you try to replace a text string with 
another, the glyph (character) you need may not be available in the font.
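
To make the first part concrete: in the content stream example quoted further 
down, the bits that make up the visible string are the hex strings inside the 
TJ array, and each 4-digit code has to be looked up in the character map of the 
font in use. Below is a minimal sketch of that lookup in plain Java. It does 
not use the PDFBox API; the class and method names are made up for 
illustration, the 2-byte code width is an assumption based on that example, and 
the map contains only the single <0003> -> <003c> entry mentioned there.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TjDecodeSketch {

    // Hypothetical excerpt of the character map for font F409. Only the
    // <0003> -> U+003C entry comes from the thread; a real map has one
    // entry per glyph used and must be read from the PDF itself.
    static final Map<Integer, Integer> CMAP_F409 = new LinkedHashMap<>();
    static {
        CMAP_F409.put(0x0003, 0x003C);
    }

    // Matches one <...> hex string inside a TJ array. The numbers between
    // the strings are position adjustments and are simply skipped.
    static final Pattern HEX_STRING = Pattern.compile("<([0-9A-Fa-f]+)>");

    /** Splits every hex string of a TJ array into 2-byte (4 hex digit) codes. */
    static List<Integer> splitCodes(String tjArray) {
        List<Integer> result = new ArrayList<>();
        Matcher m = HEX_STRING.matcher(tjArray);
        while (m.find()) {
            String hex = m.group(1);
            for (int i = 0; i + 4 <= hex.length(); i += 4) {
                result.add(Integer.parseInt(hex.substring(i, i + 4), 16));
            }
        }
        return result;
    }

    /** Maps codes to text; codes missing from the map become U+FFFD. */
    static String decode(List<Integer> codes, Map<Integer, Integer> cmap) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.appendCodePoint(cmap.getOrDefault(code, 0xFFFD));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String tj = "[<00030004000500010006> -7 <000700080009000A000B>]";
        List<Integer> codes = splitCodes(tj);
        System.out.println(codes.size() + " codes, first decodes to: "
                + decode(codes.subList(0, 1), CMAP_F409));
    }
}
```

With the complete character map loaded, decode(...) would give the text to 
search in; codes without an entry come out as U+FFFD, which is one way an 
"unreadable" string can show up.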


As an example, let's say you needed 'ZF' to be printed in your PDF in, let's 
say, Frutiger. With subsetting, only the glyphs for 'Z' and 'F' will be 
available in your PDF. Now if you try to replace that with 'AF', the glyph for 
'A' will not be available in the embedded font. That would mean that you either 
need to get the glyph information from the font, if the font is still available 
to you, and add that to the font information (or create a new entry), OR 
represent it with one of the inbuilt fonts, which means that the character is a 
new object. Now when you try to extract the text, it's no longer a consecutive 
string.
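
A rough sketch of why this fails, again in plain Java and not PDFBox API. The 
subset is modelled as a hypothetical map from character to glyph code, built 
only from the text originally set in the font; the code numbering and all names 
are assumptions for illustration, and real producers assign codes their own way.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class SubsetEncodeSketch {

    /**
     * Builds a hypothetical subset: only the characters of the original
     * text get a glyph code, numbered 1, 2, ... in order of first use.
     */
    static Map<Character, Integer> subsetFor(String originalText) {
        Map<Character, Integer> glyphCodes = new LinkedHashMap<>();
        for (char ch : originalText.toCharArray()) {
            glyphCodes.putIfAbsent(ch, glyphCodes.size() + 1);
        }
        return glyphCodes;
    }

    /**
     * Encodes the replacement as a PDF hex string using the subset's codes,
     * or returns empty if any character has no glyph in the subset.
     */
    static Optional<String> encode(String replacement, Map<Character, Integer> glyphCodes) {
        StringBuilder hex = new StringBuilder("<");
        for (char ch : replacement.toCharArray()) {
            Integer code = glyphCodes.get(ch);
            if (code == null) {
                return Optional.empty(); // glyph was never embedded
            }
            hex.append(String.format("%04X", code));
        }
        return Optional.of(hex.append('>').toString());
    }

    public static void main(String[] args) {
        Map<Character, Integer> frutigerSubset = subsetFor("ZF");
        // 'F' and 'Z' are both in the subset, so this can be encoded.
        System.out.println("FZ -> " + encode("FZ", frutigerSubset));
        // 'A' is not in the subset, so there is nothing to encode it with.
        System.out.println("AF -> " + encode("AF", frutigerSubset));
    }
}
```

An empty result is the point where you would have to extend the font or fall 
back to another font, with the consequences described above.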

So even if you put in all the effort, you might end up with a solution which 
works in 90% of your cases but not 100%.

If you like you can contact me directly to discuss that further.

Maruan Sahyoun


On 07.06.2013 at 14:36, [email protected] wrote:

> Hi,
> 
> sorry for the delay (vacation).
> This week I tried to merge the stripper with the ReplaceString example, but I 
> couldn't get it to work, because the PDFTextStripper doesn't work the same 
> way the ReplaceString sample does.
> Maybe you could be so kind as to give me another hint.
> 
> Best regards
> 
> Markus
> 
> 
> 
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]] 
> Sent: Friday, 17 May 2013 19:06
> To: [email protected]
> Subject: Re: String replace failed
> 
> Hi Markus,
> 
> a little explanation of what goes on here:
> 
> 1. Your text strings are encoded as PDF hex strings.
> 2. The PDF uses an encoding map.
> 
> So in order to get to the string you need to look at the hex parts of the 
> string and look up the individual parts in the encoding map for the 
> corresponding font which is used for the text.
> 
> Example from the first page of your PDF
> 
> /F409 35 Tf
> 1 0 0 -1 170.59300232 240.93499756 Tm
> [<00030004000500010006> -7 <000700080009000A000B>] TJ
> 
> This means that the font used is F409. The first hex sequence is 0003. That 
> corresponds to the character map entry <0003> <003c>, which means that 0003 
> should be represented using the Unicode character U+003C, which is the 
> LESS-THAN SIGN (<).
> 
> ..
> 
> So in order to come up with a solution, one would need to combine the code 
> used e.g. for ExtractText with the ReplaceString example.
> 
> Unfortunately, as can be seen from the description above, the ReplaceString 
> example is overly simplistic and only works under certain conditions.
> 
> 
> As the PDF you have is being produced using Apache FOP, couldn't you handle 
> the replacement on the PDF generation side? That would be much easier.
> 
> BR
> Maruan Sahyoun
> 
> 
> On 17.05.2013 at 13:39, Maruan Sahyoun <[email protected]> wrote:
> 
>> Hi Markus,
>> 
>> can't look at it atm. Will get back to it later today
>> 
>> BR
>> Maruan
>> 
>> 
>> On 17.05.2013 at 13:02, <[email protected]> wrote:
>> 
>>> https://docs.google.com/file/d/0B9_jmweC39sxQTJycGNKdVVPWVk/edit?usp=sharing
>>> Have a look at the log file.
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: Friday, 17 May 2013 12:58
>>> To: [email protected]
>>> Subject: Re: String replace failed
>>> 
>>> That's not easy :-)
>>> 
>>> You wrote "... parser returned an unreadable string ...". Which string are 
>>> you getting?
>>> 
>>> BR
>>> Maruan
>>> 
>>> 
>>> On 17.05.2013 at 12:37, [email protected] wrote:
>>> 
>>>> My goal is to replace ##VERSION## with "Release 9.8.3.4 (12th April 
>>>> 2013)".
>>>> It's on page 3.
>>>> 
>>>> -----Original Message-----
>>>> From: Maruan Sahyoun [mailto:[email protected]]
>>>> Sent: Friday, 17 May 2013 12:33
>>>> To: [email protected]
>>>> Subject: Re: String replace failed
>>>> 
>>>> fine, I can extract the text. Could you describe what you are doing? E.g. 
>>>> which text would you like to replace? Do you have a sample code snippet to 
>>>> verify? Do you receive an error?
>>>> 
>>>> BR
>>>> Maruan Sahyoun
>>>> 
>>>> On 17.05.2013 at 12:28, [email protected] wrote:
>>>> 
>>>>> OK, here it is:
>>>>> https://docs.google.com/file/d/0B9_jmweC39sxbmp2OXMtaXFTVG8/edit?usp=sharing
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Maruan Sahyoun [mailto:[email protected]]
>>>>> Sent: Friday, 17 May 2013 12:20
>>>>> To: [email protected]
>>>>> Subject: Re: String replace failed
>>>>> 
>>>>> Hi Markus,
>>>>> 
>>>>> No - the mailing list doesn't allow them. Could you upload the file 
>>>>> somewhere so we can download it?
>>>>> 
>>>>> BR
>>>>> Maruan Sahyoun
>>>>> 
>>>>> On 17.05.2013 at 12:09, <[email protected]> wrote:
>>>>> 
>>>>>> Sorry, maybe our Mail-Gateway removes attachments
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Maruan Sahyoun [mailto:[email protected]]
>>>>>> Sent: Friday, 17 May 2013 11:59
>>>>>> To: [email protected]
>>>>>> Subject: Re: String replace failed
>>>>>> 
>>>>>> Hi Markus,
>>>>>> 
>>>>>> could you be a little more specific? Maybe with a sample PDF and some 
>>>>>> code? Replacing a string in a pdf can be much more complex than the 
>>>>>> ReplaceString example suggests. 
>>>>>> 
>>>>>> BR
>>>>>> Maruan Sahyoun
>>>>>> 
>>>>>> On 17.05.2013 at 11:41, <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I tried to use the ReplaceString example, but the parser returned an 
>>>>>>> unreadable string.
>>>>>>> I use the Java code as in the example.
>>>>>>> 
>>>>>>> Best regards
>>>>>>> 
>>>>>>> Markus
>>> 
>
