Re: [poppler] [PATCH] Fixup LaTeX composed characters

Ross Moore Fri, 25 Mar 2011 14:32:00 -0700

Hi Albert and Tim,

>>>> 
>>>> Yes, this is an issue with pdflatex but there are 100,000s of
>>>> TeX-produced
>>>> PDFs for which we don't have source for ...
>>> 
>>> Hmmm, is it supposed to just kill the diacritic mark?
>>> 
>>> R. L¨wen and B. Polster
>>> o
>>> gets converted to
>>> R. Lowen and B. Polster
>>> shouldn't it be
>>> R. Löwen and B. Polster
>>> ?
>> 
>> It should do - can you send me this PDF?
> 
> http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
> 
>> 
>> I get this from TeX:
>> R. L\"owen and B. Polster => R. Löwen and B. Polster


Note that this example has a customized CMap for each font, so it is not
typical of older TeX-produced PDFs. I'm therefore not surprised that Tim's
method does not work with it.

This should just mean that there are further patterns in the output that
could be recognised and replaced by the proper Unicode character, or by a
base character plus combining character pair.
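
To make the idea concrete, here is a rough Python sketch of one such
pattern: a spacing accent immediately followed by its base letter, as in
the TeX output order for L\"owen. It is not the method from Tim's patch and
it is not poppler code; the function name fix_composed and the small
accent table are made up for illustration, and it assumes the accent and
the letter arrive adjacent in the extracted text.

```python
import unicodedata

# Spacing accents as they may appear in extracted text, mapped to the
# matching Unicode combining marks. Illustrative only, not exhaustive.
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # diaeresis
    "\u00b4": "\u0301",  # acute accent
    "\u02c6": "\u0302",  # modifier circumflex
    "\u02dc": "\u0303",  # small tilde
    "\u00b8": "\u0327",  # cedilla
}


def fix_composed(text):
    """Replace 'spacing accent + letter' with the composed character.

    Hypothetical helper: assumes the accent directly precedes its base
    letter. NFC yields a precomposed character where Unicode has one,
    and otherwise leaves a base + combining character pair.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in SPACING_TO_COMBINING and nxt.isalpha():
            pair = nxt + SPACING_TO_COMBINING[ch]
            out.append(unicodedata.normalize("NFC", pair))
            i += 2  # consume both the accent and the base letter
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this, "R. L¨owen and B. Polster" becomes "R. Löwen and B. Polster".
Where no precomposed character exists, the result keeps the base letter
followed by the combining mark.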



>> 
>> NB I just tried extracting from a Word-generated PDF and TextOutputDev
>> didn't see the line with the diacritic at all.
> 
> And are you sure it's not a Word fault?
> 
> Albert
> 


Hope this helps,

       Ross
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
