On Tuesday, 21 August 2007 at 22:30 +0200, Albert Astals Cid wrote:
> On Monday, 20 August 2007, Carl Worth wrote:
> > On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> > > But the real problem is that it is impossible to recognize :
> > > - "ﬁ" as "fi" too
> > > - "ﬀ" as "ff" too
> > > Would it be possible to add a new parameter to pdftotext to make it
> > > ignore ligatures but still export in UTF-8?
> >
> > It's quite preferable to have the ligatures in your PDF file.
> >
> > The bug to fix is that poppler should expand the ligatures to their
> > normalized forms when extracting the text.
>
> Actually i disagree, if you have æ do you want to get it expanded to ae too?
> If not why you want it with the ff ligature?

I think there are two cases here:
- "ff" is made of two characters that are joined (= ligature) only when
  displayed. When written by hand, it is "ff";
- "æ" is always written as "a" joined with "e".

(Indeed, I do not know which language you have in mind as an example, but I
know the case of the word "cœur" (= heart) in French: writing it "coeur" is
always wrong.)

Laurent.

> Albert
>
> >
> > That bug was first reported here:
> >
> > Text extraction should expand ligatures to their normal form
> > https://bugs.freedesktop.org/show_bug.cgi?id=7002
> >
> > -Carl
>
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler
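
To make the two cases concrete, here is a minimal sketch in Python. It is not how poppler or pdftotext handles ligatures; it only shows that Unicode itself draws the same line. The typographic ligatures in the "Alphabetic Presentation Forms" block (U+FB00 to U+FB06) carry a compatibility decomposition, while "æ" and "œ" are letters with no decomposition at all. The function name `expand_ligatures` and the constant `LATIN_LIGATURES` are made up for this illustration.

```python
import unicodedata

# Latin typographic ligatures in the Unicode "Alphabetic Presentation
# Forms" block: U+FB00 (ff) through U+FB06 (st).
LATIN_LIGATURES = range(0xFB00, 0xFB07)


def expand_ligatures(text: str) -> str:
    """Expand display-only ligatures and leave every other character alone.

    Hypothetical helper for illustration; only code points in
    LATIN_LIGATURES are replaced by their compatibility decomposition.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if ord(ch) in LATIN_LIGATURES else ch
        for ch in text
    )


if __name__ == "__main__":
    print(expand_ligatures("\ufb01n"))      # "fi" ligature followed by "n"
    print(expand_ligatures("o\ufb03ce"))    # "ffi" ligature inside a word
    print(expand_ligatures("c\u0153ur"))    # "œ" is a letter, kept as-is
```

Restricting the expansion to that block is a deliberate choice in this sketch: running NFKC or NFKD over the whole text would also rewrite unrelated characters such as superscript digits or full-width forms.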
