On Wednesday, 11 March 2015 at 23:50:27, Albert Astals Cid wrote:
> On Tuesday, 10 March 2015 at 10:17:12, Børre Gaup wrote:
> > On Tuesday, 10 March 2015 at 00:45:49, Albert Astals Cid wrote:
> > > On Monday, 9 March 2015 at 20:36:12, Børre Gaup wrote:
> > > > Hi!
> > >
> > > Hi
> > >
> > > > pdftohtml -xml sometimes produces invalid XML, resulting in lines like
> > > > this:
> > > >
> > > > <text top="218" left="142" width="532" height="16" font="1"><i>lea
> > > > <b>sadji
> > > > </b></i>(korreláhta),<b> <i>gosa mannat </b>/ Minulla on <b>paikka</b>
> > > > </i>(korreláhta)<b>, <i>jonne </i></b></text>
> > > >
> > > > In our collection of 1078 PDFs, pdftohtml produces 11 documents with
> > > > this 'opening and ending tag mismatch' error.
> > > >
> > > > I made some changes in utils/HtmlOutputDev.cc that make those 11
> > > > documents well-formed and do not break the well-formedness of the
> > > > other documents.
> > > >
> > > > The changes I made are here:
> > > > https://github.com/albbas/poppler/compare/fix_xml_wellformedness
> > > >
> > > > I also made a diff (8174 lines) which shows what kind of changes this
> > > > version makes on our 1078 PDF documents compared to pdftohtml 0.30.0.
> > > > That diff is here:
> > > > https://github.com/albbas/poppler/blob/fix_xml_wellformedness/all-pdf.diff
> > > >
> > > > Would you be interested in incorporating these changes into the main
> > > > branch?
> > >
> > > Can you please link to a PDF with such an error? (If you don't have an
> > > internet link, I'd suggest opening a bug in bugs.freedesktop.org and
> > > attaching both the patch and the file there.)
> >
> > Here are a couple of links:
> > http://www.samediggi.se/31961
> > http://www.samisk.no/attachments/129_Tjaalege_%20J%C3%AFengesne%20h%C3%A5agk odh.pdf
>
> Could you actually please open a bug? It's much easier for me to track all
> the missing things I have to do with bug numbers than over mailing list
> subjects.

I attached the patch and such to
https://bugs.freedesktop.org/show_bug.cgi?id=89239

Regards,
Børre
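
P.S. For anyone who wants to reproduce the 'opening and ending tag mismatch' error without running a full conversion, here is a minimal sketch that feeds the <text> line quoted in this thread to Python's standard XML parser. It is not part of the patch; the helper name is_well_formed and the GOOD fragment are made up for illustration, and the quoted line has been joined onto one line with single spaces.

```python
import xml.etree.ElementTree as ET

# The <text> element quoted in this thread, joined onto a single line.
# The first </b> after "gosa mannat" closes while <i> is still open.
BAD = (
    '<text top="218" left="142" width="532" height="16" font="1">'
    '<i>lea <b>sadji </b></i>(korreláhta),<b> <i>gosa mannat </b>'
    '/ Minulla on <b>paikka</b> </i>(korreláhta)<b>, <i>jonne </i></b>'
    '</text>'
)

# Hypothetical properly nested fragment, for comparison only.
GOOD = (
    '<text top="218" left="142" width="532" height="16" font="1">'
    '<i>lea <b>sadji </b></i>(korreláhta)'
    '</text>'
)


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False on a parse error."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("quoted line well-formed:", is_well_formed(BAD))
    print("nested line well-formed:", is_well_formed(GOOD))
```

The same check could be run over whole output files with ET.parse() to count how many documents in a collection fail.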
