From: Albert Astals Cid <[EMAIL PROTECTED]>
Subject: Re: [poppler] pdftotext needs support for surrogates outside the BMP plane
Date: Sun, 1 Jun 2008 17:28:11 +0200
Message-ID: <[EMAIL PROTECTED]>

aacid> On Thursday 29 May 2008, Koji Otani wrote:
aacid> > Hi, All.
aacid> >
aacid> > I'd like to commit this patch to the trunk tree.
aacid> > Should I register this in Bugzilla before doing it?
aacid>
aacid> No, but I'd like to confirm that "it works" before committing it. I can see
aacid> that your patch gives a different output, but I don't have any font installed
aacid> on my system that can "draw" the characters. What font are you using?
aacid>
aacid> Albert
aacid>

The output is a UTF-8 text file.
I don't have any fonts that can draw this text file either.
I checked that it is correct with a hexdump application.

This problem was reported by Dr. Ross Moore.
He viewed the output with a Mac text editor,
but I can't view it with my Mac text editor.

> Dr. Ross Moore
What font are you using?

---------------
Koji Otani

aacid> > --------------
aacid> > Koji Otani.
aacid> >
aacid> > From: Ross Moore <[EMAIL PROTECTED]>
aacid> > Subject: Re: [poppler] pdftotext needs support for surrogates outside the BMP plane
aacid> > Date: Thu, 29 May 2008 09:06:24 +1000
aacid> > Message-ID: <[EMAIL PROTECTED]>
aacid> >
aacid> > ross>
aacid> > ross> On 28/05/2008, at 6:25 PM, Koji Otani wrote:
aacid> > ross> > Hi.
aacid> > ross> >
aacid> > ross> > ross> There are many pieces of software that do not regard the 6-byte
aacid> > ross> > ross> sequences as being valid UTF-8. Thus there needs to be an extra
aacid> > ross> > ross> step that translates these 2 x 3 = 6-byte sequences into the
aacid> > ross> > ross> proper UTF-8 4-byte sequence.
aacid> > ross> > ross>
aacid> > ross> > ross> Is anybody working on this kind of thing?
aacid> > ross> > ross>
aacid> > ross> >
aacid> > ross> > I've made a patch that fixes this bug, and attached it to this mail.
aacid> > ross>
aacid> > ross> Thank you very much for this.
aacid> > ross> It works brilliantly.
aacid> > ross>
aacid> > ross> The attached image shows the result of using
aacid> > ross>
aacid> > ross>     pdftotext -layout testmath.pdf
aacid> > ross>
aacid> > ross> on the example PDF from my previous message,
aacid> > ross> viewed with a standard Mac text-editor application.
aacid> > ross>

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
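
For anyone who wants to check the output bytes by hand, as was done with the hexdump: the translation Ross describes in the quoted thread (two 3-byte sequences, one per UTF-16 surrogate, turned into a single 4-byte UTF-8 sequence) can be sketched roughly as below. This is only an illustration in Python, not the attached patch and not poppler code; the function names are made up for this example, and U+1D400 (MATHEMATICAL BOLD CAPITAL A) is just an assumed sample of a character outside the BMP.

```python
from typing import Sequence


def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the code point it represents."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def naive_per_unit_encoding(units: Sequence[int]) -> bytes:
    """Encode every UTF-16 unit on its own (hypothetical helper).

    A surrogate pair comes out as 2 x 3 = 6 bytes, which many programs
    reject as invalid UTF-8.
    """
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def utf16_units_to_utf8(units: Sequence[int]) -> bytes:
    """Encode UTF-16 units as UTF-8, pairing surrogates first (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one code point.
            cp = combine_surrogates(u, units[i + 1])
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute U+FFFD (a choice made for this sketch).
            cp = 0xFFFD
            i += 1
        else:
            cp = u
            i += 1
        out += chr(cp).encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    pair = [0xD835, 0xDC00]  # UTF-16 surrogate pair for U+1D400
    print(naive_per_unit_encoding(pair).hex(" "))  # ed a0 b5 ed b0 80
    print(utf16_units_to_utf8(pair).hex(" "))      # f0 9d 90 80
```

In a hexdump of the text output, a sequence like `ed a0 b5 ed b0 80` would indicate the 6-byte form, while `f0 9d 90 80` is the proper 4-byte UTF-8 form for the same character.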
