Tim - be aware that the PDF standard (ISO 32000-1:2008) refers to a specific 
version of Unicode (v4).  Support for any newer version could potentially 
introduce compatibility issues.

For the next version of PDF (2.0, ISO 32000-2) we are evaluating updating that 
reference.

Leonard

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Tim Brody
Sent: Wednesday, May 11, 2011 2:58 AM
To: [email protected]
Subject: Re: [poppler] [PATCH] Fixup LaTeX composed characters

On Tue, 10 May 2011 19:15:51 +0100, Albert Astals Cid <[email protected]>
wrote:
> A Tuesday, May 10, 2011, Tim Brody va escriure:
>> > Sincerely i am quite hesitant to apply your patch since it "breaks"
>> > pdftotext usage in the console (since it seems most of the apps in the
>> > console are not able to understand the non-composed form)
>> 

>> Anyway, my patch is only a fix-up of overprinting characters that would
>> otherwise get mangled by pdftotext. It just makes it more apparent that
>> your tool-chain is broken because it's producing more non-ASCII7
>> code-points.
> 
> By tool-chain you mean pdftotext?

I mean whatever you're piping to. I haven't encountered a problem with
decomposed Unicode in bash/less/vim.

>> I agree that pdftotext should by default output NFC but you need to decide
>> whether to implement an NFC against the out of date poppler tables or link
>> to icu.
> 
> I don't think linking to icu is a good idea (last i checked it is a
> huuuuuuuuuge monster way bigger than poppler itself in size), otoh why do
> you say poppler tables are out of date? Nobody has complained about
> something not working :D

Normalisation relies on the canonical character compositions, which come
from the Unicode tables. The poppler .h files are dated 2008 and there have
been two new Unicode versions since 2008 (assuming the tables used then
were current). I'm not saying they're broken, only that the Unicode tables
have changed since then and will change again.
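
To illustrate the dependence on the tables, here is a minimal sketch in
Python (not poppler code). It assumes the standard unicodedata module, whose
composition data is fixed by the Unicode version bundled with the
interpreter; the helper name compose() is made up for this example.

```python
import unicodedata


def compose(text: str) -> str:
    """Return text in NFC, using the tables bundled with this Python build."""
    return unicodedata.normalize("NFC", text)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: the decomposed form that an
# overprinted accent could plausibly be mapped to.
decomposed = "e\u0301"
composed = compose(decomposed)

# Two code points before composition, one (U+00E9) after.
print(len(decomposed), len(composed))

# The Unicode version of the tables doing the work; differs between builds.
print(unicodedata.unidata_version)
```

A composition only happens if the tables in use know about it, which is why
the age of the tables matters for any NFC implementation.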

Regardless, I will normalise the output from pdftotext to NFKC anyway - I
just need it to not mangle TeX-generated PDFs. I don't see this as
dependent on fixing pdftotext's normalisation.
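
Roughly what I have in mind for the post-processing step, as a sketch only:
it assumes Python's unicodedata module and UTF-8 text, and the function name
normalise_lines() is made up here. Note that NFKC also folds compatibility
characters (e.g. ligatures), so it is lossy in a way NFC is not.

```python
import unicodedata
from typing import Iterable, Iterator


def normalise_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each input line normalised to NFKC."""
    for line in lines:
        # NFKC = compatibility decomposition followed by canonical composition.
        yield unicodedata.normalize("NFKC", line)
```

Used as a filter it would be something like
sys.stdout.writelines(normalise_lines(sys.stdin)), fed from
"pdftotext file.pdf -" on the other end of a pipe.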

-- 
All the best,
Tim.
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler